
Apache Spark 3 for Data Engineering & Analytics with Python


Apache Spark 3 for Data Engineering & Analytics with Python, available at $69.99, has an average rating of 4.33 based on 606 reviews, with 89 lectures and 7,423 subscribers.


Enroll now: Apache Spark 3 for Data Engineering & Analytics with Python

Summary

Title: Apache Spark 3 for Data Engineering & Analytics with Python

Price: $69.99

Average Rating: 4.33

Number of Lectures: 89

Number of Published Lectures: 89

Number of Curriculum Items: 89

Number of Published Curriculum Objects: 89

Original Price: $19.99

Quality Status: approved

Status: Live

What You Will Learn

  • Learn the Spark Architecture
  • Learn Spark Execution Concepts
  • Learn Spark Transformations and Actions using the Structured API
  • Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
  • Learn how to set up your own local PySpark Environment
  • Learn how to interpret the Spark Web UI
  • Learn how to interpret DAG (Directed Acyclic Graph) for Spark Execution
  • Learn the RDD (Resilient Distributed Datasets) API (Crash Course)
  • Learn the Spark DataFrame API (Structured APIs)
  • Learn Spark SQL
  • Learn Spark on Databricks
  • Learn to Visualize (Graphs and Dashboards) Data on Databricks

Who Should Attend

  • Python Developers who wish to learn how to use the language for Data Engineering and Analytics with PySpark
  • Aspiring Data Engineering and Analytics Professionals
  • Data Scientists / Analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster
  • Data Managers who want to gain a deeper understanding of managing data over a cluster

The key objectives of this course are as follows:

  • Learn the Spark Architecture

  • Learn Spark Execution Concepts

  • Learn Spark Transformations and Actions using the Structured API

  • Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API

  • Learn how to set up your own local PySpark Environment

  • Learn how to interpret the Spark Web UI

  • Learn how to interpret DAG (Directed Acyclic Graph) for Spark Execution

  • Learn the RDD (Resilient Distributed Datasets) API (Crash Course)

  • RDD Transformations

  • RDD Actions

  • Learn the Spark DataFrame API (Structured APIs)

  • Create Schemas and Assign DataTypes

  • Read and Write Data using the DataFrame Reader and Writer

  • Read Semi-Structured Data such as JSON

  • Create New Data Columns in the DataFrame using Expressions

  • Filter the DataFrame using the “Filter” and “Where” Transformations

  • Ensure that the DataFrame has unique rows

  • Detect and Drop Duplicates

  • Augment the DataFrame by Adding New Rows

  • Combine 2 or More DataFrames

  • Order the DataFrame by Specific Columns

  • Rename and Drop Columns from the DataFrame

  • Clean the DataFrame by Detecting and Removing Missing or Bad Data

  • Create User-Defined Spark Functions

  • Read and Write to/from Parquet File

  • Partition the DataFrame and Write to Parquet File

  • Aggregate the DataFrame using Spark SQL functions (count, countDistinct, max, min, sum, sumDistinct, avg)

  • Perform Aggregations with Grouping

  • Learn Spark SQL and Databricks

  • Create a Databricks Account

  • Create a Databricks Cluster

  • Create Databricks SQL and Python Notebooks

  • Learn Databricks shortcuts

  • Create Databases and Tables using Spark SQL

  • Use DML, DQL, and DDL with Spark SQL

  • Use Spark SQL Functions

  • Learn the differences between Managed and Unmanaged Tables

  • Read CSV Files from the Databricks File System

  • Learn to write Complex SQL

  • Create Visualisations with Databricks

  • Create a Databricks Dashboard

The Python Spark projects that we are going to build together:

    Sales Data

  • Create a Spark Session

  • Read a CSV file into a Spark Dataframe

  • Learn to Infer a Schema

  • Select data from the Spark Dataframe

  • Produce analytics that show the topmost sales orders per Region and Country

  • Convert Fahrenheit to Degrees Centigrade

  • Create a Spark Session

  • Read and Parallelize data using the Spark Context into an RDD

  • Create a Function to Convert Fahrenheit to Degrees Centigrade

  • Use the Map Function to convert data contained within an RDD

  • Filter temperatures greater than or equal to 13 degrees Celsius

  • XYZ Research

  • Create a set of RDDs that hold Research Data

  • Use the union transformation to combine RDDs

  • Learn to use the subtract transformation to minus values from an RDD

  • Use the RDD API to answer the following questions

  • How many research projects were initiated in the first three years?

  • How many projects were completed in the first year?

  • How many projects were completed in the first two years?

  • Sales Analytics

  • Create the Sales Analytics DataFrame from a set of CSV Files

  • Prepare the DataFrame by applying a Structure

  • Remove bad records from the DataFrame (Cleaning)

  • Generate New Columns from the DataFrame

  • Write a Partitioned DataFrame to a Parquet Directory

  • Answer the following questions and create visualizations using Seaborn and Matplotlib

  • What was the best month in sales?

  • What city sold the most products?

  • What time should the business display advertisements to maximize the likelihood of customers buying products?

  • What products are often sold together in the state “NY”?

  • Technology Spec

    1. Python

    2. Jupyter Notebook

    3. Jupyter Lab

    4. PySpark (Spark with Python)

    5. Pandas

    6. Matplotlib

    7. Seaborn

    8. Databricks

    9. SQL

    Course Curriculum

    Chapter 1: Introduction to Spark and Installation

    Lecture 1: Introduction

    Lecture 2: The Spark Architecture

    Lecture 3: The Spark Unified Stack

    Lecture 4: Windows – Download Java

    Lecture 5: Windows – Install Java

    Lecture 6: Windows – Set up Java environment variables

    Lecture 7: Windows – Download Python Installer

    Lecture 8: Windows – Install Python

    Lecture 9: Windows – Set up PATH variable for Python

    Lecture 10: Windows – Install Spark for Python

    Lecture 11: Windows – PySpark Test Program

    Lecture 12: Hadoop Installation

    Lecture 13: Install Microsoft Build Tools

    Lecture 14: Mac OS – Java Installation

    Lecture 15: Mac OS – Python Installation

    Lecture 16: Mac OS – PySpark Installation

    Lecture 17: Mac OS – Testing the Spark Installation

    Lecture 18: Install Jupyter Notebooks

    Lecture 19: The Spark Web UI

    Lecture 20: Section Summary

    Chapter 2: Spark Execution Concepts

    Lecture 1: Section Introduction

    Lecture 2: Spark Application and Session

    Lecture 3: Spark Transformations and Actions Part 1

    Lecture 4: Spark Transformations and Actions Part 2

    Lecture 5: DAG Visualisation

    Chapter 3: RDD Crash Course

    Lecture 1: Introduction to RDDs

    Lecture 2: Data Preparation

    Lecture 3: Distinct and Filter Transformations

    Lecture 4: Map and Flat Map Transformations

    Lecture 5: SortByKey Transformations

    Lecture 6: RDD Actions

    Lecture 7: Challenge – Convert Fahrenheit to Centigrade

    Lecture 8: Challenge – XYZ Research

    Lecture 9: XYZ Research

    Lecture 10: Challenge – XYZ Research Part 1

    Lecture 11: Challenge – XYZ Research Part 2

    Chapter 4: Structured API – Spark DataFrame

    Lecture 1: Structured APIs Introduction

    Lecture 2: Preparing the Project Folder

    Lecture 3: PySpark DataFrame, Schema and DataTypes

    Lecture 4: DataFrame Reader and Writer

    Lecture 5: Challenge Part 1 – Brief

    Lecture 6: Challenge Part 1

    Lecture 7: Challenge Part 1 – Data Preparation

    Lecture 8: Working with Structured Operations

    Lecture 9: Managing Performance Errors

    Lecture 10: Reading a JSON File

    Lecture 11: Columns and Expressions

    Lecture 12: Filter and Where Conditions

    Lecture 13: Distinct Drop Duplicates Order By

    Lecture 14: Rows and Union

    Lecture 15: Adding, Renaming and Dropping Columns

    Lecture 16: Working with Missing or Bad Data

    Lecture 17: Working with User Defined Functions

    Lecture 18: Challenge Part 2 – Brief

    Lecture 19: Challenge Part 2

    Lecture 20: Challenge Part 2 – Remove Null Row and Bad Records

    Lecture 21: Challenge Part 2 – Get the City and State

    Lecture 22: Challenge Part 2 – Rearrange the Schema

    Lecture 23: Challenge Part 2 – Write Partitioned DataFrame to Parquet

    Lecture 24: Aggregations

    Lecture 25: Aggregations – Setting up Flight Summary Data

    Lecture 26: Aggregations – Count and Count Distinct

    Lecture 27: Aggregations – Min Max Sum SumDistinct AVG

    Lecture 28: Aggregations with Grouping

    Lecture 29: Challenge Part 3 – Brief

    Lecture 30: Challenge Part 3

    Lecture 31: Challenge Part 3 – Prepare 2019 Data

    Lecture 32: Challenge Part 3 – Q1 Get the Best Sales Month

    Lecture 33: Challenge Part 3 – Q2 Get the City that sold the most products

    Lecture 34: Challenge Part 3 – Q3 When to advertise

    Lecture 35: Challenge Part 3 – Q4 Products Bought Together

    Chapter 5: Introduction to Spark SQL and Databricks

    Lecture 1: Introduction to Databricks

    Lecture 2: Spark SQL Introduction

    Lecture 3: Register Account on Databricks

    Lecture 4: Create a Databricks Cluster

    Lecture 5: Creating our First 2 Databricks Notebooks

    Lecture 6: Reading CSV Files into DataFrame

    Lecture 7: Creating a Database and Table

    Lecture 8: Inserting Records into a Table

    Lecture 9: Exposing Bad Records

    Lecture 10: Figuring out how to remove bad records

    Lecture 11: Extract the City and State

    Lecture 12: Inserting Records to Final Sales Table

    Lecture 13: What was the best month in sales?

    Lecture 14: Get the City that sold the most products

    Lecture 15: Get the right time to advertise

    Lecture 16: Get the most products sold together

    Lecture 17: Create a Dashboard

    Lecture 18: Summary

    Instructors

  • David Charles Academy
    Senior Big Data Engineer / Consultant at ABN AMRO
  • Rating Distribution

  • 1 stars: 4 votes
  • 2 stars: 7 votes
  • 3 stars: 77 votes
  • 4 stars: 224 votes
  • 5 stars: 294 votes
  • Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!