
Learn By Example- Hadoop, MapReduce for Big Data problems

  • Development
  • Apr 30, 2025

Learn By Example: Hadoop, MapReduce for Big Data problems is available at $89.99. It has 74 lectures and 9904 subscribers, and an average rating of 4.75 based on 1110 reviews.

You will learn to develop advanced MapReduce applications to process Big Data; master the art of thinking parallel by breaking up a task into Map/Reduce transformations; set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster, or in the cloud; use Hadoop and MapReduce to solve a wide variety of problems, from NLP to inverted indices to recommendations; understand HDFS, MapReduce, and YARN and how they interact with each other; and understand the basics of performance tuning and managing your own cluster. The course is aimed at analysts who want to leverage the power of HDFS where traditional databases don't cut it anymore, engineers who want to develop complex distributed computing applications to process lots of data, and data scientists who want to add MapReduce to their bag of tricks for processing data.


Summary

Title: Learn By Example: Hadoop, MapReduce for Big Data problems

Price: $89.99

Average Rating: 4.75

Number of Lectures: 74

Number of Published Lectures: 73

Number of Curriculum Items: 74

Number of Published Curriculum Objects: 73

Original Price: $89.99

Quality Status: approved

Status: Live

What You Will Learn

  • Develop advanced MapReduce applications to process Big Data
  • Master the art of thinking parallel – how to break up a task into Map/Reduce transformations
  • Set up your own mini-Hadoop cluster, whether it's a single node, a physical cluster, or in the cloud
  • Use Hadoop + MapReduce to solve a wide variety of problems, from NLP to Inverted Indices to Recommendations
  • Understand HDFS, MapReduce and YARN and how they interact with each other
  • Understand the basics of performance tuning and managing your own cluster
  • Who Should Attend

  • Analysts who want to leverage the power of HDFS where traditional databases don't cut it anymore
  • Engineers who want to develop complex distributed computing applications to process lots of data
  • Data Scientists who want to add MapReduce to their bag of tricks for processing data
  • Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience working with Java and with billions of rows of data.

    This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce, and the art of thinking parallel.

    Let’s parse that.

    Zoom-in, Zoom-out: This course is both broad and deep. It covers the individual components of Hadoop in great detail and also gives you a higher-level picture of how they interact with each other.

    Hands-on workout involving Hadoop, MapReduce: This course will get you hands-on with Hadoop very early on. You’ll learn how to set up your own cluster using both VMs and the cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort.
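Total Sort, as listed in this course, globally sorts data by sampling the input. One common way to realize it, sketched here in plain Python rather than against Hadoop's Java API, is to pick reducer split points from a random sample, route each key to the reducer whose key range contains it, and sort within each reducer; because the ranges do not overlap, concatenating the reducer outputs yields a globally sorted result. The function names and the `sample_rate` parameter are hypothetical, and in-memory lists stand in for input splits and reducers.

```python
import bisect
import random


def choose_split_points(sample, num_reducers):
    """Pick num_reducers - 1 cut points from a sorted sample of keys."""
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def total_order_partition(key, split_points):
    """Return the index of the reducer whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


def total_sort(records, num_reducers, sample_rate=0.1, seed=0):
    """Simulate a total-order sort: sample, partition by range, sort each partition."""
    rng = random.Random(seed)
    # Sample a fraction of the input; fall back to everything if the sample is empty.
    sample = [r for r in records if rng.random() < sample_rate] or list(records)
    splits = choose_split_points(sample, num_reducers)
    partitions = [[] for _ in range(num_reducers)]
    for key in records:  # "map" phase: route each key to its range partition
        partitions[total_order_partition(key, splits)].append(key)
    # "reduce" phase: each reducer sorts its own, non-overlapping key range
    return [sorted(part) for part in partitions]
```

The quality of the sample decides how evenly keys spread across reducers: a skewed sample leaves some reducers with far more work than others, even though the output is still correctly sorted.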

    The art of thinking parallel: MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to “think parallel”.
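As an illustration of “thinking parallel”, this sketch assumes the familiar word-count problem (the course's own Hello World example may differ) and simulates the three phases in plain Python: a mapper that handles each line independently, a shuffle that groups values by key, and a reducer that sums each group. The function names are hypothetical; on a real cluster, Hadoop runs many mappers and reducers in parallel and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    # Each line is processed independently, so lines can be spread across machines.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all values by key, as the framework's shuffle-and-sort step does.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    # Each key's group can be reduced independently of every other key.
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["the cat sat", "The dog sat"])` returns `{'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}`.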

    What’s Covered:

    Lots of cool stuff...

  • Using MapReduce to:
    • Recommend friends in a Social Networking site: Generate Top 10 friend recommendations using a Collaborative filtering algorithm.
    • Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine.
    • Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text.
  • Build your Hadoop cluster:
    • Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes
    • Set up a Hadoop cluster using Linux VMs.
    • Set up a cloud Hadoop cluster on AWS with Cloudera Manager.
    • Understand HDFS, MapReduce and YARN and their interaction
  • Customize your MapReduce Jobs:
    • Chain multiple MR jobs together
    • Write your own Customized Partitioner
    • Total Sort: Globally sort a large amount of data by sampling input files
    • Secondary Sort
    • Unit tests with MRUnit
    • Integrate with Python using the Hadoop Streaming API

  • ...and of course, all the basics:

    • MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort
    • HDFS & YARN: NameNode, DataNode, ResourceManager, NodeManager, the anatomy of a MapReduce application, YARN Scheduling, Configuring HDFS and YARN to performance-tune your cluster.
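The inverted-index use case listed above maps naturally onto a single MapReduce pass: the mapper emits a (word, document id) pair for each distinct word in a document, and the reducer collects the ids for each word into a posting list. Below is a minimal in-memory sketch in Python, assuming whitespace tokenization and lower-casing; the function names and the `docs` dictionary are hypothetical, and a real search-engine index would also handle punctuation, stemming, and term positions.

```python
from collections import defaultdict


def index_mapper(doc_id, text):
    # Emit (word, doc_id) once for every distinct word in the document.
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word, doc_ids):
    # Build the posting list: sorted, de-duplicated document ids.
    return word, sorted(set(doc_ids))


def build_inverted_index(docs):
    # Simulated shuffle: group every emitted doc_id under its word.
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, d in index_mapper(doc_id, text):
            grouped[word].append(d)
    return dict(index_reducer(w, ids) for w, ids in sorted(grouped.items()))
```

With `docs = {"d1": "hadoop stores data", "d2": "mapreduce processes data"}`, `build_inverted_index(docs)["data"]` is `["d1", "d2"]`.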
  • Course Curriculum

    Chapter 1: Introduction

    Lecture 1: You, this course and Us

    Chapter 2: Why is Big Data a Big Deal

    Lecture 1: The Big Data Paradigm

    Lecture 2: Serial vs Distributed Computing

    Lecture 3: What is Hadoop?

    Lecture 4: HDFS or the Hadoop Distributed File System

    Lecture 5: MapReduce Introduced

    Lecture 6: YARN or Yet Another Resource Negotiator

    Chapter 3: Installing Hadoop in a Local Environment

    Lecture 1: Hadoop Install Modes

    Lecture 2: Hadoop Standalone mode Install

    Lecture 3: Hadoop Pseudo-Distributed mode Install

    Chapter 4: The MapReduce Hello World

    Lecture 1: The basic philosophy underlying MapReduce

    Lecture 2: MapReduce – Visualized And Explained

    Lecture 3: MapReduce – Digging a little deeper at every step

    Lecture 4: Hello World in MapReduce

    Lecture 5: The Mapper

    Lecture 6: The Reducer

    Lecture 7: The Job

    Chapter 5: Run a MapReduce Job

    Lecture 1: Get comfortable with HDFS

    Lecture 2: Run your first MapReduce Job

    Chapter 6: Juicing your MapReduce – Combiners, Shuffle and Sort and The Streaming API

    Lecture 1: Parallelize the reduce phase – use the Combiner

    Lecture 2: Not all Reducers are Combiners

    Lecture 3: How many mappers and reducers does your MapReduce have?

    Lecture 4: Parallelizing reduce using Shuffle And Sort

    Lecture 5: MapReduce is not limited to the Java language – Introducing the Streaming API

    Lecture 6: Python for MapReduce

    Chapter 7: HDFS and Yarn

    Lecture 1: HDFS – Protecting against data loss using replication

    Lecture 2: HDFS – Name nodes and why they're critical

    Lecture 3: HDFS – Checkpointing to backup name node information

    Lecture 4: Yarn – Basic components

    Lecture 5: Yarn – Submitting a job to Yarn

    Lecture 6: Yarn – Plug in scheduling policies

    Lecture 7: Yarn – Configure the scheduler

    Chapter 8: MapReduce Customizations For Finer Grained Control

    Lecture 1: Setting up your MapReduce to accept command line arguments

    Lecture 2: The Tool, ToolRunner and GenericOptionsParser

    Lecture 3: Configuring properties of the Job object

    Lecture 4: Customizing the Partitioner, Sort Comparator, and Group Comparator

    Chapter 9: The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!

    Lecture 1: The heart of search engines – The Inverted Index

    Lecture 2: Generating the inverted index using MapReduce

    Lecture 3: Custom data types for keys – The Writable Interface

    Lecture 4: Represent a Bigram using a WritableComparable

    Lecture 5: MapReduce to count the Bigrams in input text

    Lecture 6: Setting up your Hadoop project

    Lecture 7: Test your MapReduce job using MRUnit

    Chapter 10: Input and Output Formats and Customized Partitioning

    Lecture 1: Introducing the File Input Format

    Lecture 2: Text And Sequence File Formats

    Lecture 3: Data partitioning using a custom partitioner

    Lecture 4: Make the custom partitioner real in code

    Lecture 5: Total Order Partitioning

    Lecture 6: Input Sampling, Distribution, Partitioning and configuring these

    Lecture 7: Secondary Sort

    Chapter 11: Recommendation Systems using Collaborative Filtering

    Lecture 1: Introduction to Collaborative Filtering

    Lecture 2: Friend recommendations using chained MR jobs

    Lecture 3: Get common friends for every pair of users – the first MapReduce

    Lecture 4: Top 10 friend recommendation for every user – the second MapReduce

    Chapter 12: Hadoop as a Database

    Lecture 1: Structured data in Hadoop

    Lecture 2: Running an SQL Select with MapReduce

    Lecture 3: Running an SQL Group By with MapReduce

    Lecture 4: A MapReduce Join – The Map Side

    Lecture 5: A MapReduce Join – The Reduce Side

    Lecture 6: A MapReduce Join – Sorting and Partitioning

    Lecture 7: A MapReduce Join – Putting it all together

    Chapter 13: K-Means Clustering

    Lecture 1: What is K-Means Clustering?

    Lecture 2: A MapReduce job for K-Means Clustering

    Lecture 3: K-Means Clustering – Measuring the distance between points

    Lecture 4: K-Means Clustering – Custom Writables for Input/Output

    Lecture 5: K-Means Clustering – Configuring the Job

    Lecture 6: K-Means Clustering – The Mapper and Reducer

    Lecture 7: K-Means Clustering – The Iterative MapReduce Job

    Chapter 14: Setting up a Hadoop Cluster

    Lecture 1: Manually configuring a Hadoop cluster (Linux VMs)

    Lecture 2: Getting started with Amazon Web Services

    Lecture 3: Start a Hadoop Cluster with Cloudera Manager on AWS

    Chapter 15: Appendix

    Lecture 1: Set up a Virtual Linux Instance (For Windows users)

    Lecture 2: [For Linux/Mac OS Shell Newbies] Path and other Environment Variables

    Instructors

  • Loony Corn
    An ex-Google, Stanford and Flipkart team
  • Rating Distribution

  • 1 star: 19 votes
  • 2 stars: 34 votes
  • 3 stars: 126 votes
  • 4 stars: 427 votes
  • 5 stars: 504 votes
  • Frequently Asked Questions

    How long do I have access to the course materials?

    You can view and review the lecture materials indefinitely, like an on-demand channel.

    Can I take my courses with me wherever I go?

    Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don’t have an internet connection, some instructors also let their students download course lectures. That’s up to the instructor though, so make sure you get on their good side!