
Tuning Apache Spark: Powerful Big Data Processing Recipes

  • Development
  • Apr 29, 2025

Tuning Apache Spark: Powerful Big Data Processing Recipes, available at $49.99, has an average rating of 3.5 based on 30 reviews, includes 84 lectures and 3 quizzes, and has 368 subscribers.


Enroll now: Tuning Apache Spark: Powerful Big Data Processing Recipes

Summary

Title: Tuning Apache Spark: Powerful Big Data Processing Recipes

Price: $49.99

Average Rating: 3.5

Number of Lectures: 84

Number of Quizzes: 3

Number of Published Lectures: 84

Number of Published Quizzes: 3

Number of Curriculum Items: 87

Number of Published Curriculum Objects: 87

Original Price: $199.99

Quality Status: approved

Status: Live

What You Will Learn

  • How to attain a solid foundation in the most powerful and versatile technologies involved in data streaming: Apache Spark and Apache Kafka
  • Form a robust and clean architecture for a data streaming pipeline
  • Ways to implement the correct tools to bring your data streaming architecture to life
  • How to create robust processing pipelines by testing Apache Spark jobs
  • How to create highly concurrent Spark programs by leveraging immutability
  • How to solve repeated problems by leveraging the GraphX API
  • How to solve long-running computation problems by leveraging lazy evaluation in Spark
  • Tips to avoid memory leaks by understanding the internal memory management of Apache Spark
  • Troubleshoot real-time pipelines written in Spark Streaming
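To make the lazy-evaluation outcome above concrete: Spark records transformations such as map and filter as a plan and only runs them when an action (e.g. collect) is called. Here is a minimal plain-Python sketch of that idea, using an illustrative LazySeq class rather than Spark's actual API:

```python
# Plain-Python sketch of Spark-style lazy evaluation:
# transformations only record work; an action triggers it.
class LazySeq:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []          # recorded operations, not yet executed

    def map(self, f):                 # transformation: returns a new plan
        return LazySeq(self.data, self.ops + [("map", f)])

    def filter(self, p):              # transformation: returns a new plan
        return LazySeq(self.data, self.ops + [("filter", p)])

    def collect(self):                # action: runs the whole pipeline now
        out = self.data
        for kind, f in self.ops:
            out = [x for x in out if f(x)] if kind == "filter" else [f(x) for x in out]
        return out

nums = LazySeq([1, 2, 3, 4])
plan = nums.map(lambda x: x * 10).filter(lambda x: x > 15)  # nothing computed yet
print(plan.collect())  # [20, 30, 40]
```

In Spark the same deferral lets the engine optimize the whole chain before touching any data, which is why long-running computations benefit from it.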
Who Should Attend

  • Application developers, data scientists, analysts, statisticians, big data engineers, and anyone who works with large amounts of data on a day-to-day basis will feel perfectly comfortable with the topics presented. Prior experience with Spark is an added advantage, but not required.
Video Learning Path Overview

    A Learning Path is a specially tailored course that brings together two or more different topics that lead you to achieve an end goal. Much thought goes into the selection of the assets for a Learning Path, and this is done through a complete understanding of the requirements to achieve a goal.

    Today, organizations have a difficult time working with large datasets. In addition, big data must be processed and analyzed in real time to gain valuable insights quickly. This is where data streaming and Spark come in.

    In this well thought out Learning Path, you will not only learn how to work with Spark to solve the problem of analyzing massive amounts of data for your organization, but you'll also learn how to tune it for performance. Beginning with a step-by-step approach, you'll get comfortable using Spark and will learn how to implement some practical and proven techniques to improve particular aspects of programming and administration in Apache Spark. You'll be able to perform tasks and get the best out of your data much faster.

    Moving further and accelerating the pace a bit, you'll learn some of the lesser-known techniques to squeeze the best out of Spark, and then you'll learn to overcome several problems you might come across when working with Spark, without breaking a sweat. The simple and practical solutions provided will get you back in action in no time at all!
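One of those techniques, covered in the curriculum below, is preferring reduceByKey or aggregateByKey over groupBy, because values are combined within each partition before anything is shuffled across the network. A plain-Python sketch of that combiner idea, with lists standing in for partitions (illustrative only, not Spark's API):

```python
from collections import Counter

# Two "partitions" of (word, count) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("b", 1), ("a", 1)],
]

# groupBy-style: every record crosses the shuffle boundary -> 5 records moved.
naive_shuffle = sum(len(p) for p in partitions)

# reduceByKey-style: pre-aggregate per partition, then ship only one record
# per key per partition -> 4 records moved here, with identical totals.
local = [Counter() for _ in partitions]
for part, acc in zip(partitions, local):
    for key, n in part:
        acc[key] += n
combined_shuffle = sum(len(acc) for acc in local)

totals = Counter()
for acc in local:
    totals.update(acc)

print(naive_shuffle, combined_shuffle, dict(totals))
# 5 4 {'a': 3, 'b': 2}
```

The saving grows with the number of duplicate keys per partition, which is why this swap is one of the cheapest shuffle reductions available.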

    By the end of the course, you will be well versed in using Spark in your day to day projects.

Key Features

  • From blueprint architecture to complete code solution, this course covers every important aspect involved in architecting and developing a data streaming pipeline

  • Test Spark jobs using the unit, integration, and end-to-end techniques to make your data pipeline robust and bulletproof.

  • Solve several painful issues like slow-running jobs that affect the performance of your application.
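A common cause of such slow-running jobs is recomputing the same lineage for every action, which Spark's cache()/persist() avoids. A plain-Python sketch of the effect, with an illustrative call counter standing in for the expensive recomputation (not Spark's API):

```python
compute_calls = 0

def expensive_transform(data):
    """Stand-in for a costly Spark lineage; counts how often it really runs."""
    global compute_calls
    compute_calls += 1
    return [x * x for x in data]

data = [1, 2, 3]

# Without caching: each "action" recomputes the transformation from scratch.
total = sum(expensive_transform(data))   # action 1 recomputes
peak = max(expensive_transform(data))    # action 2 recomputes
print(compute_calls)  # 2

# With caching (Spark: rdd.persist()): compute once, reuse for both actions.
compute_calls = 0
cached = expensive_transform(data)
total, peak = sum(cached), max(cached)
print(compute_calls)  # 1
```

In Spark the trade-off is memory (or disk) for the cached data versus the cost of recomputation, which the troubleshooting chapter below revisits under in-memory persistence.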

Author Bios

  • Anghel Leonard is currently a Java chief architect. He is a member of the Java EE Guardians and has 20+ years' experience. He has spent most of his career architecting distributed systems. He is also the author of several books, a speaker, and a big fan of working with data.

  • Tomasz Lelek is a Software Engineer who programs mostly in Java and Scala. He has worked with the Spark and ML APIs for the past 5 years, with production experience processing petabytes of data. He is passionate about nearly everything associated with software development and believes that we should always consider different solutions and approaches before solving a problem. He has recently spoken at conferences in Poland, Confitura and JDD (Java Developers Day), and at the Krakow Scala User Group, and has conducted a live coding session at the Geecon Conference. He is a co-founder of initlearn, an e-learning platform built with the Java language. He has also written articles about everything related to the Java world.

Course Curriculum

    Chapter 1: Data Stream Development with Apache Spark, Kafka, and Spring Boot

    Lecture 1: The Course Overview

    Lecture 2: Discovering the Data Streaming Pipeline Blueprint Architecture

    Lecture 3: Analyzing Meetup RSVPs in Real-Time

    Lecture 4: Running the Collection Tier (Part I – Collecting Data)

    Lecture 5: Collecting Data Via the Stream Pattern and Spring WebSocketClient API

    Lecture 6: Explaining the Message Queuing Tier Role

    Lecture 7: Introducing Our Message Queuing Tier – Apache Kafka

    Lecture 8: Running The Collection Tier (Part II – Sending Data)

    Lecture 9: Dissecting the Data Access Tier

    Lecture 10: Introducing Our Data Access Tier – MongoDB

    Lecture 11: Exploring Spring Reactive

    Lecture 12: Exposing the Data Access Tier in Browser

    Lecture 13: Diving into the Analysis Tier

    Lecture 14: Streaming Algorithms For Data Analysis

    Lecture 15: Introducing Our Analysis Tier – Apache Spark

    Lecture 16: Plug-in Spark Analysis Tier to Our Pipeline

    Lecture 17: Brief Overview of Spark RDDs

    Lecture 18: Spark Streaming

    Lecture 19: DataFrames, Datasets and Spark SQL

    Lecture 20: Spark Structured Streaming

    Lecture 21: Machine Learning in 7 Steps

    Lecture 22: MLlib (Spark ML)

    Lecture 23: Spark ML and Structured Streaming

    Lecture 24: Spark GraphX

    Lecture 25: Fault Tolerance (HML)

    Lecture 26: Kafka Connect

    Lecture 27: Securing Communication between Tiers

    Chapter 2: Apache Spark: Tips, Tricks, & Techniques

    Lecture 1: The Course Overview

    Lecture 2: Using Spark Transformations to Defer Computations to a Later Time

    Lecture 3: Avoiding Transformations

    Lecture 4: Using reduce and reduceByKey to Calculate Results

    Lecture 5: Performing Actions That Trigger Computations

    Lecture 6: Reusing the Same RDD for Different Actions

    Lecture 7: Delve into Spark RDDs Parent/Child Chain

    Lecture 8: Using RDD in an Immutable Way

    Lecture 9: Using DataFrame Operations to Transform It

    Lecture 10: Immutability in the Highly Concurrent Environment

    Lecture 11: Using Dataset API in an Immutable Way

    Lecture 12: Detecting a Shuffle in a Processing

    Lecture 13: Testing Operations That Cause Shuffle in Apache Spark

    Lecture 14: Changing Design of Jobs with Wide Dependencies

    Lecture 15: Using keyBy() Operations to Reduce Shuffle

    Lecture 16: Using Custom Partitioner to Reduce Shuffle

    Lecture 17: Saving Data in Plain Text

    Lecture 18: Leveraging JSON as a Data Format

    Lecture 19: Tabular Formats – CSV

    Lecture 20: Using Avro with Spark

    Lecture 21: Columnar Formats – Parquet

    Lecture 22: Available Transformations on Key/Value Pairs

    Lecture 23: Using aggregateByKey Instead of groupBy()

    Lecture 24: Actions on Key/Value Pairs

    Lecture 25: Available Partitioners on Key/Value Data

    Lecture 26: Implementing Custom Partitioner

    Lecture 27: Separating Logic from Spark Engine – Unit Testing

    Lecture 28: Integration Testing Using SparkSession

    Lecture 29: Mocking Data Sources Using Partial Functions

    Lecture 30: Using ScalaCheck for Property-Based Testing

    Lecture 31: Testing in Different Versions of Spark

    Lecture 32: Creating Graph from Datasource

    Lecture 33: Using Vertex API

    Lecture 34: Using Edge API

    Lecture 35: Calculate Degree of Vertex

    Lecture 36: Calculate Page Rank

    Chapter 3: Troubleshooting Apache Spark

    Lecture 1: The Course Overview

    Lecture 2: Eager Computations: Lazy Evaluation

    Lecture 3: Caching Values: In-Memory Persistence

    Lecture 4: Unexpected API Behavior: Picking the Proper RDD API

    Lecture 5: Wide Dependencies: Using Narrow Dependencies

    Lecture 6: Making Computations Parallel: Using Partitions

    Lecture 7: Defining Robust Custom Functions: Understanding User-Defined Functions

    Lecture 8: Logical Plans Hiding the Truth: Examining the Physical Plans

    Lecture 9: Slow Interpreted Lambdas: Code Generation Spark Optimization

    Lecture 10: Avoid Wrong Join Strategies: Using a Join Type Based on Data Volume

    Lecture 11: Slow Joins: Choosing an Execution Plan for Join

    Lecture 12: Distributed Joins Problem: DataFrame API

    Lecture 13: TypeSafe Joins Problem: The Newest DataSet API

    Lecture 14: Minimizing Object Creation: Reusing Existing Objects

    Lecture 15: Iterating Transformations – The mapPartitions() Method

    Lecture 16: Slow Spark Application Start: Reducing Setup Overhead

    Lecture 17: Performing Unnecessary Recomputation: Reusing RDDs

    Lecture 18: Repeating the Same Code in Stream Pipeline: Using Sources and Sinks

    Lecture 19: Long Latency of Jobs: Understanding Batch Internals

    Lecture 20: Fault Tolerance: Using Data Checkpointing

    Lecture 21: Maintaining Batch and Streaming: Using Structured Streaming Pros

    Instructors

  • Packt Publishing
    Tech Knowledge in Motion
Rating Distribution

  • 1 stars: 3 votes
  • 2 stars: 2 votes
  • 3 stars: 7 votes
  • 4 stars: 6 votes
  • 5 stars: 12 votes
Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!