Problem Solving using PySpark – Regression & Classification

Problem Solving using PySpark – Regression & Classification is available for $19.99. It has 35 lectures, 14 subscribers, and an average rating of 5 based on 1 review.

Summary

Title: Problem Solving using PySpark – Regression & Classification

Price: $19.99

Average Rating: 5

Number of Lectures: 35

Number of Published Lectures: 35

Number of Curriculum Items: 35

Number of Published Curriculum Objects: 35

Original Price: ?799

Quality Status: approved

Status: Live

What You Will Learn

  • Data analysis and descriptive statistics with PySpark – Learning to compute essential descriptive statistics for data understanding and summarization
  • Data Cleaning with PySpark
  • Predictive modeling with PySpark using Regression
  • Applying Classification techniques to a real world problem in PySpark
  • Text analytics using PySpark and Spark NLP
  • Time-Series modeling with PySpark and Prophet
  • Introduction to Spark SQL for data querying

Who Should Attend

  • This course suits anyone interested in analytics using PySpark. It is particularly useful for analysts and engineers interested in Big Data, and for anyone with a basic knowledge of data science and ML principles.

    This course is based on real-world problems in PySpark, covering Data Cleaning, Descriptive Statistics, Classification and Regression Modeling.

    The first segment introduces descriptive statistics in PySpark: computing fundamental measures such as the mean and standard deviation, and generating an extended statistical summary.

    The second segment covers data cleaning in PySpark: handling null values and redundant data, and imputing missing values.

    The third segment covers predictive modeling with PySpark using Gradient Boosted Trees Regression.

    The fourth and fifth segments apply classification techniques in PySpark. The fourth segment applies the Spark XGB Classifier to a classification problem, and the fifth segment uses a deep learning model for text sentiment classification.

    The sixth segment covers time series analytics and modeling using PySpark and Prophet.

    The seventh segment introduces Spark SQL for data querying and analysis.

    These segments also include advanced visualization techniques using the Seaborn and Plotly libraries: box plots to understand the distribution of the data and assess outliers, count plots to check the balance in the proportions of the data, a bar chart of feature importance for the Gradient Boosted Trees Regression model, a word cloud for text analytics, and time series plots to extract seasonality and trend components.

    Each segment includes a Google Colab notebook that aligns with the lectures.

    Course Curriculum

    Chapter 1: Introduction

    Lecture 1: Introduction

    Lecture 2: Problem Solving with PySpark : Regression and Classification

    Chapter 2: Data analysis and descriptive statistics with PySpark

    Lecture 1: Setting up PySpark Environment in Google Colab

    Lecture 2: Understanding Descriptive Statistics in PySpark

    Lecture 3: Understanding Data Filtering and Slicing in PySpark

    Lecture 4: Summary of Descriptive Statistics in PySpark and Quiz

    Chapter 3: Data Cleaning with PySpark

    Lecture 1: Introduction to Data Cleaning with PySpark

    Lecture 2: Setting up PySpark Environment for Data Cleaning on Google Colab

    Lecture 3: Understanding the Dataset : Explanatory Analysis and Data Cleaning with PySpark

    Lecture 4: PySpark Data Cleaning : Assessment of Null Values and Outliers

    Lecture 5: Data Cleaning with PySpark : Imputation Strategy Quiz

    Lecture 6: Introduction to Pivot Tables in PySpark

    Chapter 4: Predictive modeling with PySpark using Regression

    Lecture 1: Introduction to Regression and Classification Problems in PySpark

    Lecture 2: Understanding the Data Set through Explanatory Analysis

    Lecture 3: Correlation Analysis and Data Preparation

    Lecture 4: Modeling the data using Gradient Boosted Trees Regression

    Lecture 5: Understanding Feature Importance

    Lecture 6: Gradient Boosted Trees Regression – Quiz

    Chapter 5: Predictive Modeling with PySpark using Classification

    Lecture 1: Classification Problem Statement : Supervised Machine Learning

    Lecture 2: Data Cleaning and Preparation for XGBoost Classification Model

    Lecture 3: XGBoost Classification Model Pipeline using PySpark

    Lecture 4: Summary of the segment on Spark XGBoost Classifier

    Chapter 6: Text analytics using PySpark and Spark NLP

    Lecture 1: Classification Model for Text Data

    Lecture 2: Understanding the Data for Text Classification

    Lecture 3: Word Cloud : Text Analytics Quiz

    Lecture 4: Spark NLP Pipeline : Classification Model

    Chapter 7: Time Series Analysis and Forecast with PySpark and Prophet

    Lecture 1: Introduction to Time Series Analysis : Setting up the Google Colab Notebook

    Lecture 2: Explanatory Analysis and Data Cleaning

    Lecture 3: Analysis of time series components using advanced visualization techniques

    Lecture 4: Use of Prophet Model for Time Series Forecasting

    Lecture 5: Time Series Forecasting – Quiz

    Chapter 8: Introduction to Spark SQL

    Lecture 1: Introduction to Spark SQL Querying

    Lecture 2: Comparison of PySpark statements and Spark SQL Query

    Lecture 3: Join in Spark SQL

    Lecture 4: Join in Spark SQL – Quiz

    Instructors

  • Sathish Jayaraman
    PySpark – Data Cleaning

Rating Distribution

  • 1 star: 0 votes
  • 2 stars: 0 votes
  • 3 stars: 0 votes
  • 4 stars: 0 votes
  • 5 stars: 1 vote

Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!