Data pre-processing for Machine Learning in Python

Data pre-processing for Machine Learning in Python is priced at $79.99. It has 48 lectures, an average rating of 4.3 based on 96 reviews, and 1835 subscribers.

You will learn how to fill missing values in numerical and categorical variables, how to encode categorical variables, how to transform and scale numerical variables, what Principal Component Analysis is and how to use it, how to apply oversampling using SMOTE, and how to use several useful objects in the scikit-learn library. This course is ideal for Python developers, aspiring data scientists, and people interested in machine learning and artificial intelligence.

Summary

Title: Data pre-processing for Machine Learning in Python

Price: $79.99

Average Rating: 4.3

Number of Lectures: 48

Number of Published Lectures: 48

Number of Curriculum Items: 48

Number of Published Curriculum Objects: 48

Original Price: $29.99

Quality Status: approved

Status: Live

What You Will Learn

  • How to fill missing values in numerical and categorical variables
  • How to encode the categorical variables
  • How to transform the numerical variables
  • How to scale the numerical variables
  • Principal Component Analysis and how to use it
  • How to apply oversampling using SMOTE
  • How to use several useful objects in the scikit-learn library

Who Should Attend

  • Python developers
  • Aspiring data scientists
  • People interested in machine learning and artificial intelligence

    In this course, we are going to focus on pre-processing techniques for machine learning.

    Pre-processing is the set of manipulations that transform a raw dataset so that a machine learning model can use it. It is necessary to make our data suitable for some machine learning models, to reduce the dimensionality, to better identify the relevant data, and to increase model performance. It’s the most important part of a machine learning pipeline and it can strongly affect the success of a project. In fact, if we don’t feed a machine learning model with correctly shaped data, it won’t work at all.

    Sometimes, aspiring data scientists start studying neural networks and other complex models and forget to study how to manipulate a dataset so that their algorithms can use it. As a result, they fail to create good models, and only at the end do they realize that good pre-processing would have saved them a lot of time and increased the performance of their algorithms. So, handling pre-processing techniques is a very important skill. That’s why I have created an entire course that focuses only on data pre-processing.

    With this course, you are going to learn:

    1. Data cleaning

    2. Encoding of the categorical variables

    3. Transformation of the numerical features

    4. Scikit-learn Pipeline and ColumnTransformer objects

    5. Scaling of the numerical features

    6. Principal Component Analysis

    7. Filter-based feature selection

    8. Oversampling using SMOTE

    All the examples will be given using the Python programming language and its powerful scikit-learn library. The environment that will be used is Jupyter, which is a standard in the data science industry. All the sections of this course end with some practical exercises, and the Jupyter notebooks are all downloadable.
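
To give a rough idea of how several of the listed topics fit together in scikit-learn, here is a minimal sketch that combines missing-value imputation, one-hot encoding, and scaling through the Pipeline and ColumnTransformer objects. It is not taken from the course notebooks: the toy DataFrame, its column names, and the choice of imputation strategies are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data invented for this sketch: two numerical columns, one categorical column,
# each with one missing value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0],
    "income": [30000.0, 52000.0, np.nan, 41000.0],
    "city": pd.Series(["Rome", "Milan", np.nan, "Rome"], dtype=object),
})

# Numerical branch: fill blanks with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical branch: fill blanks with the most frequent value, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own branch.
# sparse_threshold=0.0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", categorical, ["city"]),
    ],
    sparse_threshold=0.0,
)

X = preprocess.fit_transform(df)
print(X.shape)
```

The resulting `preprocess` object can be placed as the first step of a larger Pipeline that ends with a model, so the same transformations are fitted on the training data and reapplied unchanged to new data.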

    Course Curriculum

    Chapter 1: Introduction

    Lecture 1: Introduction to the course

    Lecture 2: Numerical and categorical variables

    Lecture 3: The dataset

    Lecture 4: Required Python packages

    Lecture 5: Jupyter notebooks

    Chapter 2: Data cleaning

    Lecture 1: Introduction to data cleaning

    Lecture 2: Selecting numerical and categorical variables

    Lecture 3: Cleaning the numerical features

    Lecture 4: Cleaning the categorical features

    Lecture 5: KNN blank filling

    Lecture 6: ColumnTransformer and make_column_selector

    Lecture 7: Exercises

    Chapter 3: Encoding of the categorical features

    Lecture 1: Introduction to the encoding of categorical variables

    Lecture 2: One-hot encoding

    Lecture 3: Ordinal encoding

    Lecture 4: Label encoding of the target variable

    Lecture 5: Exercise

    Chapter 4: Transformations of the numerical features

    Lecture 1: Introduction to transformations

    Lecture 2: Power Transformation

    Lecture 3: Binning

    Lecture 4: Binarizing

    Lecture 5: Applying an arbitrary transformation

    Lecture 6: Exercise

    Lecture 7: About power transformations

    Chapter 5: Pipelines

    Lecture 1: Define a transformation pipeline

    Lecture 2: Pipelines and ColumnTransformer together

    Lecture 3: Exercises

    Chapter 6: Scaling

    Lecture 1: Introduction to scaling

    Lecture 2: Normalization, Standardization, Robust scaling

    Lecture 3: Exercise

    Chapter 7: Principal Component Analysis

    Lecture 1: Introduction to PCA

    Lecture 2: How to perform PCA

    Lecture 3: Exercise

    Lecture 4: A comment about scaling before PCA

    Chapter 8: Filter-based feature selection

    Lecture 1: Introduction to feature selection

    Lecture 2: Numerical features, numerical target

    Lecture 3: Numerical features, categorical target

    Lecture 4: Categorical features, numerical target

    Lecture 5: Categorical features, categorical target

    Lecture 6: Feature importance according to a model

    Lecture 7: A comment on mutual information

    Lecture 8: A comment on feature selection with categorical variables

    Lecture 9: Exercises

    Chapter 9: A complete pipeline

    Lecture 1: An example of a complete pipeline

    Chapter 10: Oversampling

    Lecture 1: Introduction to SMOTE

    Lecture 2: How to perform SMOTE

    Lecture 3: Exercise

    Chapter 11: General guidelines

    Lecture 1: Practical suggestions
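
The curriculum above includes a comment about scaling before PCA. The listing does not say what that lecture argues, so the following is only a sketch of one commonly cited reason: PCA is driven by variance, so a feature measured on a much larger scale can dominate the first component unless the data is scaled first. The synthetic data and the scale factor of 1000 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: two independent features on very different scales.
X = np.column_stack([
    rng.normal(0.0, 1.0, size=200),     # feature on a unit scale
    rng.normal(0.0, 1000.0, size=200),  # feature on a much larger scale
])

# PCA on the raw data versus PCA after standardization.
raw_pca = PCA(n_components=2).fit(X)
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)

raw_ratio = raw_pca.explained_variance_ratio_
scaled_ratio = scaled_pca.named_steps["pca"].explained_variance_ratio_

print("explained variance ratio without scaling:", raw_ratio)
print("explained variance ratio with scaling:   ", scaled_ratio)
```

Without scaling, the first component captures almost all of the variance simply because the second feature has larger units. After standardization, the two independent features contribute roughly equally.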

    Instructors

  • Gianluca Malato
    Your Data Teacher

    Rating Distribution

  • 1 star: 1 vote
  • 2 stars: 1 vote
  • 3 stars: 8 votes
  • 4 stars: 28 votes
  • 5 stars: 58 votes

    Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!