Big Data Seminar with Demo @ BVRIT

Day – 1

Introduction to Hadoop & Big Data

  • High Availability
  • Scaling
  • Advantages and Challenges
  • What is Big Data?
  • Big Data Opportunities
  • Big Data Challenges
  • Characteristics of Big Data

Overview of Hadoop

  • Hadoop Distributed File System
  • Comparing Hadoop & SQL
  • Industries Using Hadoop
  • Data Locality
  • Hadoop Architecture
  • MapReduce & HDFS
  • Using the Hadoop Single-Node Image (Clone)

The Hadoop Distributed File System (HDFS)

  • HDFS Design & Concepts
  • Blocks, NameNodes and DataNodes
  • HDFS High Availability
  • The HDFS Command-Line Interface
  • Basic File System Operations
  • Anatomy of a File Read
  • Anatomy of a File Write
  • Block Placement Policy
  • Configuration Files
  • Metadata, FsImage, Edit Log, Secondary NameNode and Safe Mode
  • FSCK Utility (Block Report)
  • HDFS Federation
  • ZooKeeper Leader Election Algorithm
  • Exercises and a Small Use Case on HDFS

MapReduce

  • Functional Programming Basics
  • Map and Reduce Basics
  • How MapReduce Works
  • Anatomy of a MapReduce Job Run
  • Hadoop 2.x Architecture
  • Job Completion and Failures
  • Shuffling and Sorting
  • Input Splits, RecordReader, Partitioner, Types of Partitions and Combiner
  • YARN
  • Types of I/O Formats
  • Handling small files using CombineFileInputFormat
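The partitioning step listed above decides which reducer receives each map-output key. Below is a minimal plain-Java sketch, assuming string keys and a hypothetical class name `HashPartitionDemo`. The formula follows the one used by Hadoop's default `HashPartitioner`, but the code does not use the Hadoop API; Hadoop applies the formula to the Writable key's own `hashCode()` (for example `Text`), so real reducer assignments can differ from the ones printed here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java sketch of hash partitioning (hypothetical demo class, not the
 * Hadoop API): every occurrence of a key goes to the same reducer.
 */
class HashPartitionDemo {

    // Mask off the sign bit so negative hash codes still give a valid index,
    // then take the remainder modulo the number of reducers.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Places each key into the bucket of the reducer chosen for it.
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("hadoop", "hive", "pig", "hadoop", "hdfs");
        List<List<String>> buckets = partition(keys, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

Because the reducer index depends only on the key, both occurrences of `hadoop` always land in the same bucket, which is what lets a single reducer see all values for a key.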

MapReduce Programming in Java

  • “Word Count” in MapReduce in Standalone and Pseudo-Distributed Mode
  • Dictionary Translation Using Hadoop
  • Average Word Length by Initial Character
  • A Few More Problems Solved Using MapReduce
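The “Word Count” exercise follows the map, shuffle/sort and reduce flow covered in the MapReduce module. The sketch below is a single-JVM simulation in plain Java with a hypothetical class `WordCountSimulation`; it models only the data flow and is not the Hadoop `Mapper`/`Reducer` API that a real standalone or pseudo-distributed job would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Single-JVM simulation of the Word Count phases (hypothetical demo class,
 * not the Hadoop API): map -> shuffle/sort -> reduce.
 */
class WordCountSimulation {

    // Map phase: emit (word, 1) for every lower-cased token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // leading punctuation yields an empty token
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort: group values by key; TreeMap keeps keys in sorted order,
    // mirroring how a reducer receives its keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped counts for each word.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Runs all three phases over the input lines.
    static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> input = List.of("Big data with Hadoop", "Hadoop stores big files");
        System.out.println(wordCount(input)); // {big=2, data=1, files=1, hadoop=2, stores=1, with=1}
    }
}
```

Keeping the three phases as separate methods makes it easy to compare each one with the corresponding stage of a real job run.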

Hive

  • Installation
  • Introduction and Architecture
  • Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
  • Metastore
  • HiveQL
  • OLTP vs. OLAP
  • Working with Tables
  • Primitive Data Types and Complex Data Types
  • Working with Partitions
  • Hive Bucketed Tables and Sampling
  • External partitioned tables, mapping data to a table partition, writing the output of one query to another table, and multiple inserts
  • Differences between ORDER BY, DISTRIBUTE BY and SORT BY
  • Demo

Day – 2

Pig

  • Installation
  • Execution Types
  • Grunt Shell
  • Pig Latin
  • Data Processing
  • Schema on read
  • Primitive Data Types and Complex Data Types
  • Tuple, Bag and Map Schemas
  • Loading and Storing
  • Filtering
  • Grouping & Joining
  • Demo

NoSQL

  • ACID in RDBMS and BASE in NoSQL
  • CAP Theorem and Types of Consistency
  • Types of NoSQL Databases in Detail
  • Column-Family Databases (HBase and/or Cassandra)

NoSQL Databases

  • Elasticsearch 
    • Basic architecture
    • Querying the DB
  • MongoDB
    • Basic architecture
    • Querying the DB

Big Data Analytics

  • Elasticsearch (NoSQL DB)
    • Data Access
      • Read, Write, Update
    • Clustering
    • REST Calls
    • Security in ES
  • Kibana
    • Analytical Tool
    • Visualize
    • Dashboard
    • Search
    • Establishing a Connection

Message Brokers

  • Overview of Message Brokers
  • Pub/Sub Model
  • Where Message Brokers Fit in the Big Data Domain
  • Introduction to Kafka, ActiveMQ and RabbitMQ
  • Hands-on examples
    • ActiveMQ