Apache Spark Hands on Specialization for Big Data Analytics

  • Development
  • Jan 30, 2025
Apache Spark Hands on Specialization for Big Data Analytics is available at $54.99. It has 73 lectures and 2 quizzes, an average rating of 3.75 based on 544 reviews, and 12,453 subscribers.

Summary

Title: Apache Spark Hands on Specialization for Big Data Analytics

Price: $54.99

Average Rating: 3.75

Number of Lectures: 73

Number of Quizzes: 2

Number of Published Lectures: 73

Number of Published Quizzes: 1

Number of Curriculum Items: 75

Number of Published Curriculum Objects: 74

Number of Practice Tests: 1

Number of Published Practice Tests: 1

Original Price: $129.99

Quality Status: approved

Status: Live

What You Will Learn

  • Understand the relationship between Apache Spark and Hadoop Ecosystem
  • Understand Apache Spark use-cases and advanced characteristics
  • Understand Apache Spark Architecture and how it works
  • Understand how Apache Spark on YARN (Hadoop) works in multiple modes
  • Understand the development life-cycle of Apache Spark Applications in Python and Scala
  • Learn the foundations of the Scala programming language
  • Understand Apache Spark’s primary data abstraction (RDDs)
  • Understand and use RDDs’ advanced characteristics (e.g. partitioning)
  • Learn the nuances of loading files from the Hadoop Distributed File System (HDFS) into Apache Spark
  • Learn the implications of delimiters in text files and their processing in Spark
  • Create and use RDDs by parallelizing Scala’s collection objects, and understand the implications
  • Learn the usage of Spark and YARN Web UI to gain in-depth operational insights
  • Understand Spark’s Directed Acyclic Graph (DAG) based execution model and implications
  • Learn Transformations and their lazy execution semantics
  • Learn Map transformation and master its applications in real-world challenges
  • Learn Filter transformation and master its usage in real-world challenges
  • Learn Apache Spark’s advanced Transformations and Actions
  • Learn and use RDDs of different JVM objects, including collections, and understand their critical nuances
  • Learn and use Apache Spark for statistical analysis
  • Learn and master Key Value Pair RDDs and their applications in complex Big Data problems
  • Learn and master Join Operations on complex Key Value Pair RDDs in Apache Spark
  • Learn how RDD caching works and use it for advanced performance optimization
  • Learn how to use Apache Spark for Data Ranking problems
  • Learn how to use Apache Spark for handling and processing structured and unstructured data
  • Learn how to use Apache Spark for advanced Business Analytics
  • Learn how to use Apache Spark for advanced data integrity and quality checks
  • Learn how to use Scala’s advanced features like functional programming and pattern matching
  • Learn how to use Apache Spark for log processing

Who Should Attend

  • Anyone who has the passion to develop expertise in Big Data and specifically Apache Spark
  • Software Engineers or Developers
  • Data Warehousing or Business Intelligence Professionals
  • Data Scientists and Machine Learning Enthusiasts
  • Data Engineers and Big Data Architects

    What if you could catapult your career in one of the most lucrative domains, Big Data, by learning the state-of-the-art Hadoop ecosystem technology (Apache Spark) that is considered mandatory in almost all current jobs in this industry?

    What if you could develop your skill-set in one of the hottest Big Data technologies, Apache Spark, by learning from one of the most comprehensive courses out there (with 10+ hours of content), packed with dozens of hands-on real-world examples, use-cases, challenges and best practices?

    What if you could learn from an instructor who works in the world’s largest consultancy firm, has worked end-to-end on Australia’s biggest Big Data projects to date, and has a proven track record on Udemy with highly positive reviews and thousands of students already enrolled in his previous course(s)?

    If you have such aspirations and goals, then this course and you are a perfect match made in heaven!

    Why Apache Spark?

    Apache Spark has revolutionised and disrupted the way big data processing and machine learning are done, by virtue of its in-memory and optimised computational model. It has been widely hailed as the future of Big Data. It’s the tool of choice all around the world, allowing data scientists, engineers and developers to acquire and process data for a number of use-cases such as scalable machine learning, stream processing and graph analytics, to name a few. Leading organisations like Amazon, eBay and Yahoo, among many others, have embraced this technology to address their Big Data processing requirements.

    Additionally, Gartner has repeatedly highlighted Apache Spark as a leader in Data Science platforms. Certification programs of Hadoop vendors like Cloudera and Hortonworks, which are held in high esteem in the industry, have oriented their curriculum to focus heavily on Apache Spark. Almost all of the jobs in the Big Data and Machine Learning space demand proficiency in Apache Spark.

    This is what John Tripier, Alliances and Ecosystem Lead at Databricks, has to say: “The adoption of Apache Spark by businesses large and small is growing at an incredible rate across a wide range of industries, and the demand for developers with certified expertise is quickly following suit”.

    All of these facts support the notion that learning this technology will give you a strong competitive edge in your career.

    Why this course?

    Firstly, this is the most comprehensive and in-depth course ever produced on Apache Spark. I’ve carefully and critically surveyed the resources out there, and almost all of them fail to cover this technology in the depth that it truly deserves. Some of them lack coverage of Apache Spark’s theoretical concepts, like its architecture and how it works in conjunction with Hadoop. Some fall short in thoroughly describing how to use Apache Spark APIs optimally for complex big data problems. Some ignore the hands-on aspects that demonstrate how to do Apache Spark programming on real-world use-cases. And almost all of them don’t cover the best practices in industry and the mistakes that many professionals make in the field.

    This course addresses all of the limitations that are prevalent in the currently available courses. Apart from that, as I have attended trainings from leading Big Data vendors like Cloudera (for which they charge thousands of dollars), I’ve ensured that the course is aligned with the educational patterns and best practices followed in those trainings, so that you get the best and most effective learning experience.

    Each section of the course covers concepts in extensive detail and from scratch, so that you won’t face any challenges in learning even if you are new to this domain. Also, each section will have an accompanying assignment section where we will work together on a number of real-world challenges and use-cases employing real-world data-sets. The data-sets themselves belong to different niches, ranging from retail and web server logs to telecommunication, and some of them are from Kaggle (the world’s leading Data Science competition platform).

    The course leverages Scala instead of Python. Wherever possible, references to Python development are also given, but the course is mainly based on Scala. The decision was made based on a number of rational factors. Scala is the de-facto language for development in Apache Spark. Apache Spark itself is developed in Scala and, as a result, new features are initially made available in Scala and then in other languages like Python. Additionally, there is a significant performance difference when using Apache Spark with Scala compared to Python. Scala is also one of the highest-paid programming languages, and you will be developing strong skills in that language along the way as well.

    The course also has a number of quizzes to further test your skills. For further support, you can always ask questions, to which you will get a prompt response. I will also be sharing best practices and tips on a regular basis with my students.

    What are you going to learn in this course?

    The course consists of two major sections:

  • Section – 1: We’ll start off with an introduction to Apache Spark and understand its potential and business use-cases in the context of the overall Hadoop ecosystem. We’ll then focus on how Apache Spark actually works and take a deep dive into the architectural components of Spark, as this is crucial for a thorough understanding.

  • Section – 2: After developing an understanding of Spark architecture, we will move to the next section of this course, where we will use the Scala language and Apache Spark APIs to develop distributed computation programs. Please note that you don’t need prior knowledge of Scala for this course, as I will start with the very basics of Scala. As a result, you will also be developing your skills in one of the highest paying programming languages.

    In this section, we will comprehensively cover how Spark performs distributed computation using abstractions like RDDs, what the caveats are in loading data in Apache Spark, what the different ways to create RDDs are, how to leverage parallelism and much more.
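
    To make the idea of distributing a collection a little more concrete, here is a minimal sketch in plain Python (not Spark itself, and not code from the course). The helper `split_into_partitions` is a hypothetical name, and the slicing rule is just one simple way to cut a list into near-equal contiguous chunks; Spark's own partitioning logic may differ. Each chunk is summarised independently and the partial results are then combined, which is the basic pattern behind parallel processing of an RDD.

```python
# Plain-Python illustration of partitioning a collection (not the Spark API).

def split_into_partitions(items, num_partitions):
    """Cut `items` into `num_partitions` contiguous, near-equal chunks."""
    n = len(items)
    return [
        items[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


# Hypothetical input: the numbers 1 to 10, split across 3 "partitions".
data = list(range(1, 11))
partitions = split_into_partitions(data, 3)

# Each partition can be processed on its own (here: a partial sum) ...
partial_sums = [sum(partition) for partition in partitions]

# ... and the partial results are combined into the final answer.
total = sum(partial_sums)

print(partitions)
print(partial_sums, total)
```

    Changing the number of partitions changes how the work is divided but not the final result, which is why the partition count is mostly a performance decision.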

    Furthermore, as transformations and actions constitute the gist of Apache Spark APIs, it’s imperative to have a sound understanding of them. We will therefore focus on a number of Spark transformations and actions that are heavily used in industry and go into the detail of each. Each API usage will be complemented with a series of real-world examples and datasets, e.g. retail, web server logs, customer churn and also from Kaggle. Each section of the course will have a number of assignments where you will be able to practically apply the learned concepts to further consolidate your skills.
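
    As a rough illustration of the split between transformations and actions, the sketch below mimics it in plain Python rather than Spark: generator expressions stand in for lazy transformations such as map and filter, and `sum` plays the role of an action that triggers the work. The log lines and the helper `parse_status` are made up for this example; real Spark code would use the RDD API instead.

```python
# Plain-Python stand-in for lazy transformations and an action (not the Spark API).

# Hypothetical sample data: a few web-server-log-like lines, one of them empty.
log_lines = [
    "GET /index.html 200",
    "",
    "GET /missing 404",
    "POST /login 200",
]

evaluated = []  # records which lines have actually been processed


def parse_status(line):
    """Return the HTTP status code at the end of a log line."""
    evaluated.append(line)
    return int(line.split()[-1])


# "Transformations": nothing runs yet, the generators only describe the work.
non_empty = (line for line in log_lines if line)            # like a filter
statuses = (parse_status(line) for line in non_empty)       # like a map
ok_flags = (1 for status in statuses if status == 200)      # filter, then map

calls_before_action = len(evaluated)

# "Action": consuming the pipeline forces every step to execute.
ok_count = sum(ok_flags)

print(calls_before_action, ok_count, len(evaluated))
```

    No line is parsed until the final `sum` consumes the pipeline, which is the same reason a chain of Spark transformations does no work until an action is called.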

    A significant
    section of the course will also be dedicated to key value RDDs which form the
    basis of working optimally on a number of big data problems.
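
    The key-value idea can be sketched without Spark. The plain-Python example below is only an illustration under stated assumptions: the (platform, sales) pairs are invented, and `reduce_by_key` is a hypothetical helper that merges all values sharing a key, similar in spirit to a reduce-by-key operation on a pair RDD.

```python
# Plain-Python illustration of key-value processing (not the Spark API).

# Hypothetical (platform, sales) pairs, loosely modelled on a video-game data-set.
pairs = [("PS4", 3.5), ("Wii", 2.0), ("PS4", 1.5), ("X360", 4.0), ("Wii", 1.0)]


def reduce_by_key(records, combine):
    """Merge all values that share a key using the two-argument `combine`."""
    merged = {}
    for key, value in records:
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


# Total and maximum sales per platform, using the same helper.
total_sales = reduce_by_key(pairs, lambda a, b: a + b)
max_sale = reduce_by_key(pairs, max)

# Ranking: platforms ordered by total sales, highest first.
ranking = sorted(total_sales.items(), key=lambda kv: kv[1], reverse=True)

print(total_sales)
print(max_sale)
print(ranking)
```

    Totals, maxima and rankings all fall out of the same "group by key, then combine" pattern, which is why pair structures suit so many aggregation problems.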

    In addition to covering the crux of Spark APIs, I will also highlight a number of valuable best practices based on my experience and exposure, and will also point out mistakes that many people make in the field. You will rarely find such information anywhere else.

    Each topic will be
    covered in a lot of detail with strong emphasis on being hands-on thus ensuring
    that you learn Apache Spark in the best possible way.

    The course is applicable to Spark versions 1.6 and 2.0.

    After completing this course, you will have developed a strong foundation and an extended skill-set to use Spark on complex big data processing tasks. Big data is one of the most lucrative career domains, where data engineers command high salaries. This course will also substantially help in your job interviews. Also, if you are looking to excel further in your big data career by passing Hadoop certifications like those of Cloudera and Hortonworks, this course will prove to be extremely helpful in that context as well.

    Lastly, once enrolled, you will have life-time access to the lectures and resources. It’s a self-paced course and you can watch lecture videos on any device, like a smartphone or laptop. Also, you are backed by Udemy’s rock-solid 30-day money-back guarantee. So if you are serious about learning Apache Spark, enrol in this course now and let’s start this amazing journey together!

    Course Curriculum

    Chapter 1: Introduction

    Lecture 1: Breaking the Ice with Warm Welcome!

    Lecture 2: Course Curriculum – Journey to Excellence!

    Chapter 2: Section 1 – Apache Spark Introduction and Architecture Deep Dive

    Lecture 1: Apache Spark in the context of Hadoop Evolution

    Lecture 2: Say Hello to Apache Spark – Thorough Dissemination of Capabilities

    Lecture 3: In-Depth Understanding of Spark’s Ecosystem of High Level Libraries

    Lecture 4: Apache Spark and its integration within Enterprise Lambda Architecture

    Lecture 5: Apache Spark and where it fits in whole Hadoop Ecosystem

    Chapter 3: Working with Text Files to create Resilient Distributed Datasets (RDDs) in Spark

    Lecture 1: Setting up development Environment

    Lecture 2: Better Development Environment Employing Databricks – Part 1 (**New Lecture**)

    Lecture 3: Better Development Environment Employing Databricks – Part 2 (**New Lecture**)

    Lecture 4: Loading Text Files (in HDFS) in Spark to create RDDs

    Lecture 5: Loading All Directory Files (in HDFS) simultaneously in Spark and implications

    Lecture 6: Loading Text Files (in HDFS) in Spark – Continued

    Lecture 7: Using Wildcards to selectively load text files (in HDFS) in Spark and use-cases

    Lecture 8: Real Life Challenge: Different Record Delimiters in Text Files in Spark

    Lecture 9: Solution: Handling Different Record Delimiters in Text Files in Spark

    Chapter 4: Creating RDDs by Distributing Scala Collections in Spark

    Lecture 1: The semantics and implications behind parallelizing Scala Collections

    Lecture 2: Hands-on: Distributing/Parallelizing Scala Collections

    Chapter 5: Understanding the Partitioning and Distributed Nature of RDDs in Spark

    Lecture 1: How Data gets Partitioned and Distributed in Spark Cluster

    Lecture 2: Accessing Hadoop YARN RM and AM Web UIs to understand RDDs Partitioning

    Lecture 3: Manually Changing Partitions of RDDs in Spark and Implications

    Chapter 6: Developing Mastery in Spark’s Map Transformations and lazy DAG Execution Model

    Lecture 1: Demystifying Spark’s Directed Acyclic Graph (DAG) and Lazy Execution Model

    Lecture 2: Introducing Map Transformation – the Swiss Army Knife of Transformations

    Lecture 3: Hands-on: Map Transformation via Scala’s Functional Programming constructs

    Lecture 4: Understanding the Potential of Map Transformation to alter RDDs’ Types

    Lecture 5: Using Your Own Functions, in addition to Anonymous ones, in Map Transformations

    Chapter 7: Assignment – Using Map Transformation on Real World Big Data Retail Analytics

    Lecture 1: Introducing the Real World Online Retail Data-set and Assignment Challenges

    Lecture 2: Detailed Hands-on Comprehension of Assignment Challenges Solutions

    Lecture 3: Conceptual Understanding of Distributing Scala Collections and Implications

    Lecture 4: Hands-on Understanding of Distributing Scala Collections and use-cases

    Chapter 8: Developing Mastery in Spark’s Filter Transformation

    Lecture 1: Introducing Filter Transformation and its Powerful Use-Cases

    Lecture 2: Hands on: Spark’s Filter Transformation in Action

    Chapter 9: Assignment – Using Filter and Map on Apache Web Server Logs and Retail Dataset

    Lecture 1: Introducing the Data-sets and Real-World Assignment Challenges

    Lecture 2: Challenge 1: Removing Empty Lines in Web Logs Data-set

    Lecture 3: Challenge 2: Removing Header Line in Retail Data-set

    Lecture 4: Challenge 3: Selecting rows in Retail Data-set Containing Specific Countries

    Chapter 10: Developing Mastery in RDD of Scala Collections

    Lecture 1: Introducing RDDs of Scala Collections and their Relational Analytics use-cases

    Lecture 2: Transforming Scala Collections using Functional Programming Constructs

    Lecture 3: Creating and Manipulating RDDs of Arrays of String from Different Data Sources

    Chapter 11: Assignment – Customer Churn Analytics using Apache Spark

    Lecture 1: Introducing the Context, Challenges and Data-set of Customer Churn Use-Case

    Lecture 2: Challenge 1: Finding Number of Unique States in the Data-set

    Lecture 3: Challenge 2: Performing Data Integrity Check on Individual Columns of Data-Set

    Lecture 4: Challenge 3: Finding Summary Statistics on number of Voice Mail Messages

    Lecture 5: Challenge 4: Finding Summary Statistics on Voice Mail in Selected States

    Lecture 6: Challenge 5: Finding Average Value of Total Night Calls Minutes

    Lecture 7: Challenge 6: Finding conditioned Total day calls for customers

    Lecture 8: Challenge 7: Using Scala Functions and Pattern Matching for advanced processing

    Lecture 9: Challenge 8: Finding Churned Customers with International and Voice Mail Plan

    Lecture 10: Challenge 9: Performing Data Quality and Type Checks on Individual Columns

    Chapter 12: Developing Mastery in Spark’s Key-Value (Pair) RDDs

    Lecture 1: Introduction

    Lecture 2: Developing Intuition for Solving Big Data Problems using KeyValue Pair Construct

    Lecture 3: Developing Hands-on Understanding of working with KeyValue RDDs in Spark

    Lecture 4: Proof – Transformations exclusivity to KeyValue RDDs

    Lecture 5: Transforming Text File Data to Pair RDDs for KeyValue based Data Processing

    Lecture 6: The Case of Different Data Types of Values in KeyValue RDDs

    Lecture 7: Transforming Complex Delimited Text File to Pair RDDs for KeyValue Processing

    Chapter 13: Assignment – Analyzing Video Games (Kaggle Dataset) using Spark’s KeyValue RDDs

    Lecture 1: Challenge 1: Determining Frequency Distribution of Video Games Platforms

    Lecture 2: Challenge 2: Finding Total Sales of Each Video Games Platform

    Lecture 3: Challenge 3: Finding Global Sales of Video Games Platform

    Lecture 4: Challenge 4: Maximum Sales Value of Each Gaming Console

    Lecture 5: Challenge 5: Data Ranking – Top 10 platforms by global sales

    Chapter 14: Developing Mastery in Join Operations on Key Value Pair RDDs in Apache Spark

    Lecture 1: Introducing Join Operations on Relational Data with Examples

    Lecture 2: Getting started with join operation in Spark with Key Value Pair RDDs

    Lecture 3: Working towards complex Join Operations in Apache Spark with advanced indexing

    Chapter 15: Assignment – A Real Life Relational Dataset about Retail Customers

    Lecture 1: Setting context and developing understanding of relationships in the dataset

    Lecture 2: Challenge 1 – Top 5 states with Most Orders Status as Cancelled

    Lecture 3: Challenge 2 – Top 5 Cities from CA State with Orders Status as Cancelled

    Chapter 16: Apache Spark – Advanced Concepts

    Lecture 1: Introducing Caching in RDDs, Motivation and Relation to DAG Based Execution

    Lecture 2: Caching and Persistence in RDDs in Action

    Lecture 3: Technique: Finding and Filtering Dirty Records in Data-Set using Apache Spark

    Lecture 4: Sentiment Analysis of Trump’s Tweets using Azure Cognitive Services & Databricks

    Chapter 17: Bonus Section

    Lecture 1: My lecture to University of Tromso students – When Databases Meet Hadoop

    Lecture 2: Bonus Lecture: Exceptional Discount on My Course(s)/Book(s)

    Instructors

  • Irfan Elahi
    Data Scientist in the world’s largest consultancy firm

    Rating Distribution

  • 1 star: 20 votes
  • 2 stars: 24 votes
  • 3 stars: 69 votes
  • 4 stars: 152 votes
  • 5 stars: 279 votes

    Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!