Module 1 – Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS
- Big Data, Factors constituting Big Data
- What is Hadoop?
- Overview of Hadoop Ecosystem
- MapReduce – Concepts of Map, Reduce, Ordering, Shuffle, and Concurrency
- Hadoop Distributed File System (HDFS) Concepts and its Importance
- Deep Dive in MapReduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
- HDFS Deep Dive – Architecture, Data Replication, Name Node, Data Node, Data Flow
- Parallel Copying with DistCp, Hadoop Archives
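The Map, Shuffle, and Reduce phases above can be illustrated with a minimal word-count sketch in plain Python (a conceptual simulation only, not the Hadoop API; function names are illustrative):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Map: emit (key, value) pairs -- here (word, 1) for every word
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: sort by key and group all values belonging to the same key,
    # mimicking the sort/merge Hadoop performs between map and reduce
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [v for _, v in group]

def reduce_phase(grouped):
    # Reduce: aggregate the value list for each key
    for key, values in grouped:
        yield key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = dict(reduce_phase(shuffle_phase(map_phase(lines))))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the framework handles the shuffle; only the per-record logic is the developer's job.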
Assignment – 1
Module 2 – Hands on Exercises
- Installing Hadoop in Pseudo-Distributed Mode; Understanding the Important Configuration Files, Their Properties, and Daemon Threads
- Accessing HDFS from Command Line
- Map Reduce – Basic Exercises
- Understanding Hadoop Eco-system
- Introduction to Sqoop, use cases and Installation
- Introduction to Hive, use cases and Installation
- Introduction to Pig, use cases and Installation
- Introduction to Oozie, use cases and Installation
- Introduction to Flume, use cases and Installation
- Introduction to YARN
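A few of the HDFS shell commands typically practiced in these exercises (a sketch assuming a running pseudo-distributed cluster; file and directory paths are illustrative):

```shell
# List the HDFS root directory
hdfs dfs -ls /
# Copy a local file into HDFS
hdfs dfs -put access.log /user/hadoop/logs/
# Read a file back from HDFS
hdfs dfs -cat /user/hadoop/logs/access.log
# Remove a directory recursively
hdfs dfs -rm -r /user/hadoop/tmp
```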
Assignment – 2 and 3
Mini Project – Importing MySQL Data Using Sqoop and Querying It Using Hive
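The mini project's flow can be sketched as follows (hostname, database, and table names are placeholders; assumes Sqoop and Hive are installed and a MySQL server is reachable):

```shell
# Import a MySQL table into Hive via Sqoop (-P prompts for the password)
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username hadoop -P \
  --table orders \
  --hive-import --hive-table orders

# Query the imported data from the Hive CLI
hive -e "SELECT COUNT(*) FROM orders;"
```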
Module 3 – Deep Dive in MapReduce and YARN
- How to Develop a MapReduce Application; Writing Unit Tests
- Best Practices for Developing, Writing, and Debugging MapReduce Applications
- Joining Data sets in Map Reduce
- Hadoop APIs
- Introduction to Hadoop YARN
- Difference between Hadoop 1.0 and 2.0
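Joining data sets in MapReduce is typically done as a reduce-side join: mappers tag each record with its source data set, and the reducer combines records that share a key. A plain-Python simulation of the idea (not the Hadoop API; the sample data is illustrative):

```python
from collections import defaultdict

# Two input data sets keyed by user id: user profiles and their orders
users  = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

def mapper(record, tag):
    # Map: tag each record with its source so the reducer can tell them apart
    key, value = record
    return key, (tag, value)

# Shuffle: group the tagged records by join key
grouped = defaultdict(list)
for r in users:
    k, v = mapper(r, "U")
    grouped[k].append(v)
for r in orders:
    k, v = mapper(r, "O")
    grouped[k].append(v)

def reducer(key, tagged_values):
    # Reduce: cross-join the user side with the order side for this key
    names = [v for t, v in tagged_values if t == "U"]
    items = [v for t, v in tagged_values if t == "O"]
    return [(key, n, i) for n in names for i in items]

joined = [row for k in sorted(grouped) for row in reducer(k, grouped[k])]
print(joined)  # [(1, 'alice', 'book'), (1, 'alice', 'pen'), (2, 'bob', 'lamp')]
```

In a real job, each value would also carry a secondary-sort key so the reducer need not buffer both sides in memory.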
Module 3.1
- Project 1 – Hands-on Exercise: End-to-End PoC Using YARN (Hadoop 2.0)
- Handling Real-World Bank Transactions
- Moving data using Sqoop to HDFS
- Incremental update of data to HDFS
- Running Map Reduce Program
- Running Hive queries for data analytics
- Project 2 – Hands-on Exercise: End-to-End PoC Using YARN (Hadoop 2.0)
- Running MapReduce Code on Movie Ratings to Find Each Movie's Fans and Average Rating
Assignment – 4 and 5
Module 4 – Deep Dive in Pig
1. Introduction to Pig
- What Is Pig?
- Pig’s Features
- Pig Use Cases
- Interacting with Pig
2. Basic Data Analysis with Pig
- Pig Latin Syntax
- Loading Data
- Simple Data Types
- Field Definitions
- Data Output
- Viewing the Schema
- Filtering and Sorting Data
- Commonly-Used Functions
- Hands-On Exercise: Using Pig for ETL Processing
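The loading, filtering, sorting, and output steps above come together in a short Pig Latin ETL sketch (the input path, schema, and output path are illustrative):

```pig
-- Load raw web log data with an explicit schema
logs = LOAD '/user/hadoop/weblogs' AS (ip:chararray, url:chararray, bytes:int);
-- Filter out small responses, then sort by response size
big = FILTER logs BY bytes > 1024;
sorted_big = ORDER big BY bytes DESC;
-- Store the cleaned result back into HDFS
STORE sorted_big INTO '/user/hadoop/weblogs_clean';
```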
3. Processing Complex Data with Pig
- Complex/Nested Data Types
- Grouping
- Iterating Grouped Data
- Hands-On Exercise: Analyzing Data with Pig
Assignment – 6
Module 5 – Deep Dive in Hive
1. Introduction to Hive
- What Is Hive?
- Hive Schema and Data Storage
- Comparing Hive to Traditional Databases
- Hive vs. Pig
- Hive Use Cases
- Interacting with Hive
2. Relational Data Analysis with Hive
- Hive Databases and Tables
- Basic HiveQL Syntax
- Data Types
- Joining Data Sets
- Common Built-in Functions
- Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue
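A representative HiveQL query combining the joins and built-in aggregate functions covered above (table and column names are illustrative):

```sql
-- Top customers by total spend: join, aggregate, sort, limit
SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total DESC
LIMIT 10;
```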
3. Hive Data Management
- Hive Data Formats
- Creating Databases and Hive-Managed Tables
- Loading Data into Hive
- Altering Databases and Tables
- Self-Managed Tables
- Simplifying Queries with Views
- Storing Query Results
- Controlling Access to Data
- Hands-On Exercise: Data Management with Hive
4. Hive Optimization
- Understanding Query Performance
- Partitioning
- Bucketing
- Indexing Data
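Partitioning and bucketing are declared when the table is created; a sketch of the DDL (the table name and layout are illustrative):

```sql
-- Partition by date so a query on one day scans only that partition's directory
CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (view_date STRING)
-- Bucketing spreads rows across fixed files, enabling sampling and bucketed joins
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;
```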
Assignment – 7
Module 6 – Introduction to HBase Architecture
- What is HBase?
- Where Does It Fit?
- What is NoSQL?
Assignment – 8
Module 7 – Hadoop Cluster Setup and Running Map Reduce Jobs
- Running Map Reduce Jobs on Cluster
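Submitting and inspecting a job on the cluster typically looks like this (the jar path and HDFS paths vary by installation):

```shell
# Submit the bundled word-count example to the cluster
hadoop jar hadoop-mapreduce-examples.jar wordcount /user/hadoop/input /user/hadoop/output
# Watch running applications via YARN
yarn application -list
# Inspect the reducer output
hdfs dfs -cat /user/hadoop/output/part-r-00000
```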
Assignment – 9, 10
Module 8 – Advanced MapReduce
- Delving Deeper Into The Hadoop API
- More Advanced Map Reduce Programming, Joining Data Sets in Map Reduce
- Graph Manipulation in Hadoop
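Graph algorithms in MapReduce are expressed as iterated passes over an adjacency list; here is one breadth-first-search round sketched in plain Python (a conceptual simulation, not the Hadoop API; the graph is illustrative):

```python
# Adjacency list: node -> list of neighbours
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def bfs_round(distances):
    # Map: each reached node re-emits its own distance and a tentative
    # distance (dist + 1) for every neighbour
    emitted = []
    for node, dist in distances.items():
        emitted.append((node, dist))
        for nbr in graph[node]:
            emitted.append((nbr, dist + 1))
    # Reduce: keep the minimum distance seen for each node
    out = {}
    for node, dist in emitted:
        out[node] = min(dist, out.get(node, dist))
    return out

# Driver loop: repeat map/reduce rounds until distances stop changing
distances = {"A": 0}
while True:
    nxt = bfs_round(distances)
    if nxt == distances:
        break
    distances = nxt
print(distances)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

In Hadoop, each round is a full MapReduce job and the driver program checks a counter to decide whether another iteration is needed.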
Assignment – 11, 12
Module 9 – Job and Certification Support
- Major Project, Hadoop Development, Cloudera Certification Tips and Guidance, Mock Interview Preparation, and Practical Development Tips and Techniques
Project Work
1. Project – Working with MapReduce, Hive, and Sqoop
Problem Statement – Import MySQL data into HDFS using Sqoop, query it using Hive, and run a word-count MapReduce job.
2. Project – Hadoop YARN Project – End-to-End PoC
Problem Statement – It includes:
- Importing movie data
- Appending new data incrementally
- Using Sqoop commands to bring the data into HDFS
- The end-to-end flow of transaction data
- Processing large volumes of real-world data (e.g. movie ratings) with a MapReduce program
Hue:
- More ecosystem components
- Hue (Cloudera)
- Oozie
- Workflows (Start, Action, End, Kill, Join, and Fork nodes), Schedulers, Coordinators, and Bundles
- A workflow showing how to schedule Sqoop, Hive, MapReduce, and Pig jobs
- Real-world use case: finding the top websites used by users of certain age groups, scheduled to run every hour
- ZooKeeper
Impala:
- Impala Query Processing Interfaces
Apache Kafka:
1. Kafka Producers
2. Kafka Consumers
3. Kafka Topics
Use Case:
- Kafka with Spark Streaming
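The producer/consumer/topic model can be illustrated with a tiny in-memory simulation (concepts only; a real deployment uses a Kafka broker and a client library such as kafka-python):

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for a Kafka broker: each topic is an append-only log."""
    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of messages

    def produce(self, topic, message):
        # Producers append messages to the end of a topic's log
        self.topics[topic].append(message)

    def consume(self, topic, offset):
        # Consumers track their own offset and read everything after it
        return self.topics[topic][offset:]

broker = Broker()
broker.produce("ratings", "movie=42,stars=5")
broker.produce("ratings", "movie=7,stars=3")

messages = broker.consume("ratings", offset=0)
print(messages)  # ['movie=42,stars=5', 'movie=7,stars=3']
```

The key idea carried over from real Kafka: the broker retains the log, and each consumer's position is just an offset, so multiple consumers can read the same topic independently.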