Hadoop Online Training Course Content

Module 1
Hadoop Architecture

What is Big Data
Hadoop Architecture
Hadoop ecosystem components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Hadoop Server Roles: NameNode, Secondary NameNode and DataNode
Anatomy of File Write and Read

Module 2
Hadoop Cluster Configuration and Data Loading

Hadoop Cluster Architecture
Hadoop Cluster Configuration files
Hadoop Cluster Modes
Multi-Node Hadoop Cluster
A Typical Production Hadoop Cluster
MapReduce Job execution
Common Hadoop Shell commands
Data Loading Techniques: Flume, Sqoop, Hadoop Copy Commands

Module 3
Hadoop MapReduce framework

Hadoop Data Types
Hadoop MapReduce paradigm
Map and Reduce tasks
MapReduce Execution Framework
Partitioners and Combiners
Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs)
Output Formats (Text Output, Binary Output, Multiple Outputs)

Module 4
Advanced MapReduce

Counters
Custom Writables
Unit Testing: JUnit and MRUnit testing framework
Error Handling
Tuning

Module 5
Pig and Pig Latin

Installing and Running Pig
Grunt
Pig's Data Model
Pig Latin
Developing & Testing Pig Latin Scripts
Writing Evaluation, Filter, Load & Store Functions

Module 6
Hive and HiveQL

Hive Architecture and Installation
Comparison with Traditional Database
HiveQL: Data Types
Operators and Functions
Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables)
Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, Views, Map-Side and Reduce-Side Joins to Optimize Queries)

Module 7
Advanced Hive, NoSQL Databases and HBase

Hive: Data manipulation with Hive
User Defined Functions
Appending Data into existing Hive Table
Custom Map/Reduce in Hive
Hadoop Project: Hive Scripting
HBase: Introduction to HBase
Client APIs and Their Features
Available Clients
HBase Architecture
MapReduce Integration

Module 8
Advanced HBase and ZooKeeper

HBase: Advanced Usage
Schema Design
Advanced Indexing
Coprocessors

Module 9
Hadoop 2.0, MRv2 and YARN

Schedulers: Fair and Capacity
Hadoop 2.0 New Features: NameNode High Availability
HDFS Federation
MRv2
YARN
Running MRv1 in YARN
Upgrading Existing MRv1 Code to MRv2
Programming in the YARN Framework