Big Data (Hadoop) Course Announcement @ Telecom Nagar

 

Check whether your laptop supports virtualization –

3 Easy Ways to Check If Your Processor Supports Virtualization

Topics Covered

Hadoop Training Course Prerequisites

  • Basic Unix Commands
  • Core Java (OOP concepts, Collections, Exceptions) for Map-Reduce programming
  • SQL Query knowledge

Hardware and Software Requirements

  • Any Linux distribution (e.g., Ubuntu, CentOS, Fedora, Red Hat) with 4 GB RAM (minimum) and 100 GB HDD
  • Java 1.6+
  • OpenSSH server & client
  • MySQL database
  • Eclipse IDE
  • VMware (to run Linux alongside Windows)

Hadoop Training Course Duration

  • 55 hours total, 3 hours per day (approximately 20 days)

Hadoop Training Course Content

(Depending on the interest of the students, a few changes to the topics are possible.)

Getting Ready

  • Basics of Unix
  • Basics of RESTful calls
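For orientation, a RESTful call is just an HTTP request against a resource URL. A minimal sketch using the JDK 11+ HttpClient API (the localhost endpoint and path are placeholders, not part of the course material):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RestCallSketch {
    public static void main(String[] args) {
        // Build a GET request against a placeholder endpoint.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/api/users/1"))
                .header("Accept", "application/json")
                .GET()
                .build();

        // Actually sending it requires a running server, e.g.:
        //   HttpClient client = HttpClient.newHttpClient();
        //   HttpResponse<String> response =
        //           client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(request.method() + " " + request.uri());
    }
}
```

The same verbs (GET, POST, PUT, DELETE) reappear later in the course in the HBase and Elasticsearch REST interfaces.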

Introduction to Hadoop & Big Data

  • High Availability
  • Scaling
  • Advantages and Challenges
  • What is Big Data?
  • Big Data Opportunities
  • Big Data Challenges
  • Characteristics of Big Data

Overview of Hadoop

  • Hadoop Distributed File System
  • Comparing Hadoop & SQL
  • Industries using Hadoop
  • Data Locality
  • Hadoop Architecture
  • Map Reduce & HDFS
  • Using the Hadoop single-node image (clone)

The Hadoop Distributed File System (HDFS)

  • HDFS Design & Concepts
  • Blocks, NameNodes and DataNodes
  • HDFS High Availability
  • The HDFS Command-Line Interface
  • Basic File System Operations
  • Anatomy of a File Read
  • Anatomy of a File Write
  • Block Placement Policy
  • Configuration files
  • Metadata, FsImage, Edit Log, Secondary NameNode and Safe Mode
  • FSCK utility (block report)
  • HDFS Federation
  • ZooKeeper Leader Election Algorithm
  • Exercise and small use case on HDFS
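The block concept above can be made concrete with a little arithmetic. A minimal plain-Java sketch, assuming the Hadoop 2.x default block size of 128 MB (configurable via dfs.blocksize); the 300 MB file size is illustrative:

```java
public class BlockSplitSketch {
    // Default HDFS block size in Hadoop 2.x (configurable via dfs.blocksize).
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks a file of the given size occupies in HDFS.
    static long blockCount(long fileSize) {
        if (fileSize == 0) return 0;
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Size of the final (possibly partial) block.
    static long lastBlockSize(long fileSize) {
        if (fileSize == 0) return 0;
        long rem = fileSize % BLOCK_SIZE;
        return rem == 0 ? BLOCK_SIZE : rem;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MB file
        System.out.println(blockCount(fileSize));                    // 3 blocks
        System.out.println(lastBlockSize(fileSize) / (1024 * 1024)); // 44 (MB)
    }
}
```

Note that unlike a local file system, the last partial block only occupies its actual size on disk; with the default replication factor of 3, each block is additionally stored on three DataNodes.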

Map Reduce

  • Functional Programming Basics
  • Map and Reduce Basics
  • How Map Reduce Works
  • Anatomy of a Map Reduce Job Run
  • Hadoop 2.x Architecture
  • Job Completion, Failures
  • Shuffling and Sorting
  • Splits, Record reader, Partition, Types of partitions & Combiner
  • YARN
  • Types of I/O Formats
  • Handling small files using CombineFileInputFormat
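The partitioning step listed above decides which reducer each intermediate key goes to. A plain-Java sketch that mirrors the logic of Hadoop's default HashPartitioner without using any Hadoop classes (key names and reducer count are illustrative):

```java
public class PartitionSketch {
    // Mirrors the logic of Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder by the reducer count.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"hadoop", "hive", "pig", "hbase"}) {
            System.out.println(key + " -> reducer " + partition(key, reducers));
        }
        // The same key always lands on the same reducer, which is what
        // guarantees that all values for one key are grouped together.
    }
}
```

This determinism is the whole point: if "hadoop" could land on two different reducers, its counts could never be summed in one place.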

Map/Reduce Programming – Java Programming

  • Hands-on “Word Count” in Map/Reduce in standalone and pseudo-distributed mode
  • Dictionary translation using Hadoop
  • Average length of words per starting character
  • HDFS Writer and Reader
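The core map and reduce logic of the “Word Count” exercise can be sketched in plain Java, without the Hadoop framework (illustrative only; the real hands-on exercise implements Hadoop's Mapper and Reducer classes and is run on the cluster):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"big data big", "hadoop data"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {big=2, data=2, hadoop=1}
    }
}
```

In the real job, the shuffle/sort between map and reduce is done by the framework across machines; here it collapses into the single `merge` call.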

NOSQL

  • ACID in RDBMS and BASE in NoSQL
  • CAP Theorem and Types of Consistency
  • Types of NoSQL Databases in detail
  • Columnar Databases in detail (HBase and/or Cassandra)
  • TTL, Bloom Filters and Compaction
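A Bloom filter (used by HBase and Cassandra to skip files that cannot possibly contain a key) can be sketched in a few lines of plain Java. This is a toy, not the library implementation; the bit-array size, hash count and mixing constant are arbitrary:

```java
import java.util.BitSet;

public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilterSketch(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position from the key's hashCode plus an offset.
    private int position(String key, int i) {
        int h = key.hashCode() + i * 0x9E3779B9; // arbitrary mixing constant
        return (h & Integer.MAX_VALUE) % size;
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(position(key, i));
    }

    // false -> definitely absent; true -> possibly present (false positives allowed).
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-key-1");
        System.out.println(filter.mightContain("row-key-1")); // true
        System.out.println(filter.mightContain("row-key-2")); // almost certainly false
    }
}
```

The asymmetry is the key property: a negative answer is guaranteed, so a store can skip reading a file from disk with no risk of missing data.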

HBase

  • HBase Installation
  • HBase concepts
  • HBase Data Model and Comparison between RDBMS and NoSQL
  • Master & Region Servers
  • HBase Operations (DDL and DML) through Shell and Programming; HBase Architecture
  • Catalog Tables
  • Block Cache and Sharding
  • Splits
  • Data Modeling (Sequential, Salted, Promoted and Random Keys)
  • Java APIs and REST Interface
  • HBase Filters
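Salting, one of the row-key designs listed above, can be sketched in plain Java (bucket count and key format are illustrative, not a prescribed scheme):

```java
public class SaltedKeySketch {
    static final int BUCKETS = 8;

    // Prefix a sequential row key with a salt derived from its hash, so that
    // writes spread across regions instead of hot-spotting on one region server.
    static String salt(String rowKey) {
        int bucket = (rowKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        return bucket + "-" + rowKey;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user-0001", "user-0002", "user-0003"}) {
            System.out.println(salt(key));
        }
        // Reads must recompute the salt from the original key; range scans
        // must fan out one scan per bucket, which is the cost of salting.
    }
}
```

This trade-off (fast, evenly spread writes versus more expensive range scans) is exactly why the course covers several key designs rather than one.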

Hive

  • Installation
  • Introduction and Architecture
  • Hive Services: Hive Shell, Hive Server and Hive Web Interface (HWI)
  • Metastore
  • Hive QL
  • OLTP vs. OLAP
  • Working with Tables
  • Primitive and complex data types
  • Working with Partitions
  • Hive Bucketed Tables and Sampling
  • External partitioned tables; mapping data to partitions; writing the output of one query to another table; multiple inserts
  • Differences between ORDER BY, DISTRIBUTE BY and SORT BY
  • Hands on Exercises

Big Data Analytics

  • Elasticsearch (NoSQL DB)
    • Data Access
      • read, write, update
    • Clustering
    • Rest Calls
    • Security in ES
  • Kibana
    • Analytical tool
    • Visualize
    • Dashboard
    • Search
    • Establishing Connection

Flume (optional, depending on time availability)

  • Installation
  • Introduction to Flume
  • Flume Agents: Sources, Channels and Sinks
  • Logging user information into HDFS from a Java program using Log4j and the Avro source
  • Logging user information into HDFS from a Java program using the Tail source
  • Logging user information into HBase from a Java program using Log4j and the Avro source
  • Logging user information into HBase from a Java program using the Tail source
  • Flume Commands
  • Use case: flume the data from Twitter into HDFS and HBase, then do some analysis using Hive and Pig

Pig

  • Installation
  • Execution Types
  • Grunt Shell
  • Pig Latin
  • Data Processing
  • Schema on read
  • Primitive and complex data types
  • Tuple, Bag and Map schemas
  • Loading and Storing
  • Filtering
  • Grouping & Joining
  • Debugging commands (ILLUSTRATE and EXPLAIN)
  • Hands on Exercises

Message Brokers

  • Overview of Message Brokers
  • Pub/sub model
  • Where message brokers fit in the Big Data domain
  • Information on Kafka/ActiveMQ/RabbitMQ
  • Hands-on examples
    • ActiveMQ

Schedule

  • Date & timings: second week of August (tentative)
  • To be confirmed once the batch is formed