Big Data Seminar with Demo @ BVRIT
Day – 1
Introduction to Hadoop & Big Data
- High Availability
- Scaling
- Advantages and Challenges
- What is Big Data
- Big Data opportunities
- Big Data Challenges
- Characteristics of Big Data
Overview of Hadoop
- Hadoop Distributed File System
- Comparing Hadoop & SQL
- Industries using Hadoop
- Data Locality
- Hadoop Architecture
- MapReduce & HDFS
- Using (cloning) the Hadoop single-node image
The Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, NameNodes and DataNodes
- HDFS High Availability
- The Hadoop DFS Command-Line Interface
- Basic File System Operations
- Anatomy of a File Read
- Anatomy of a File Write
- Block Placement Policy
- Configuration files
- Metadata, FsImage, Edit Log, Secondary NameNode and Safe Mode
- fsck Utility (Block Report)
- HDFS Federation
- ZooKeeper Leader Election Algorithm
- Exercises and a small use case on HDFS
MapReduce
- Functional Programming Basics
- Map and Reduce Basics
- How MapReduce Works
- Anatomy of a MapReduce Job Run
- Hadoop 2.x Architecture
- Job Completion and Failures
- Shuffling and Sorting
- Input Splits, RecordReader, Partitioning, Types of Partitioners & Combiner
- YARN
- Types of I/O Formats
- Handling small files using CombineFileInputFormat
MapReduce Programming in Java
- “Word Count” in MapReduce in standalone and pseudo-distributed mode
- Dictionary translation using Hadoop
- Average word length by initial character
- A few more problems solved using MapReduce
Hive
- Installation
- Introduction and Architecture
- Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
- Metastore
- HiveQL
- OLTP vs. OLAP
- Working with Tables
- Primitive and complex data types
- Working with Partitions
- Hive Bucketed Tables and Sampling
- External partitioned tables, mapping data to a table's partitions, writing the output of one query to another table, and multiple inserts
- Differences between ORDER BY, DISTRIBUTE BY and SORT BY
- Demo
Day – 2
Pig
- Installation
- Execution Types
- Grunt Shell
- Pig Latin
- Data Processing
- Schema on read
- Primitive and complex data types
- Tuple, Bag and Map schemas
- Loading and Storing
- Filtering
- Grouping & Joining
- Demo
NoSQL
- ACID in RDBMS and BASE in NoSQL
- CAP Theorem and Types of Consistency
- Types of NoSQL Databases in detail
- Columnar Databases (HBase and/or Cassandra)
NoSQL Databases
- Elasticsearch
  - Basic architecture
  - Querying the DB
- MongoDB
  - Basic architecture
  - Querying the DB
Big Data Analytics
- Elasticsearch (NoSQL DB)
  - Data Access
    - Read, write, update
  - Clustering
  - REST Calls
  - Security in ES
    - Data Access
- Kibana
  - Analytical tool
  - Visualize
  - Dashboard
  - Search
  - Establishing Connection
Message Brokers
- Overview of Message Brokers
- Publish/subscribe (pub/sub) model
- Where message brokers fit in the Big Data domain
- Kafka, ActiveMQ and RabbitMQ
- Hands-on examples
  - ActiveMQ