Summer Internship and Live Projects on NoSQL @ WOIR Software (Near Hitech City) – Course Announcement
Date & Timings: 1st week of June
- Duration: around 3 to 5 weeks
- Limited seats (max 10)
- Location: Jubilee Enclave (near Shilparamam & the Oracle building), Hyderabad
Resource Person – an industry expert and IIT alumnus who has worked at Yahoo and Microsoft
Training Certificate from WOIR Software India Pvt. Ltd. (Private Limited Company)
- Training in the latest technologies used in the project
- 100% Assistance in completing the projects
- Review of Project Reports
Technologies to be used:
- Hadoop (MR, HDFS, YARN, Hive, etc.)
- NoSQL – Elasticsearch
- Message Brokers – ActiveMQ
- Analytics – Kibana, Logstash
Topics Covered
Hadoop Training Course Prerequisites
- Basic Unix Commands
- Core Java (OOP concepts, Collections, Exceptions) – for Map-Reduce programming
- SQL Query knowledge
Hardware and Software Requirements
- Any Linux flavor (e.g. Ubuntu, CentOS, Fedora, RedHat) with 4 GB RAM (minimum) and 100 GB HDD
- Java 1.6+
- Open-SSH server & client
- MySQL database
- Eclipse IDE
- VMware (to run Linux alongside Windows)
Hadoop Training Course Duration
- Approximately 40 to 44 hours
Hadoop Training Course Content
(Depending on students' interest, minor changes to the topics and their depth are possible)
Getting Ready
- Basics of Unix
- Basics of RESTful calls
- Basics of Java Programming
- Compiling and executing a Java program
Introduction to Hadoop & Big Data
- High Availability
- Scaling
- Advantages and Challenges
- What is Big data
- Big Data opportunities
- Big Data Challenges
- Characteristics of Big data
Overview of Hadoop
- Hadoop Distributed File System
- Comparing Hadoop & SQL
- Industries using Hadoop
- Data Locality
- Hadoop Architecture
- Map Reduce & HDFS
- Using the Hadoop single-node image (clone)
The Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, NameNodes and DataNodes
- HDFS High-Availability
- The HDFS Command-Line Interface
- Basic File System Operations
- Anatomy of a File Read
- Anatomy of a File Write
- Block Placement Policy
- HDFS Federation
- Exercise and small use case on HDFS
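The block and replica ideas above can be sketched in plain Python. This is a toy model, not the real HDFS code: the block size, replication factor and node names are all illustrative, and the round-robin placement merely stands in for HDFS's rack-aware placement policy.

```python
# Toy model of HDFS-style block splitting and replica placement.
# Real HDFS defaults (128 MB blocks, 3 replicas) are scaled down here.
BLOCK_SIZE = 4        # bytes per block (illustrative; HDFS uses megabytes)
REPLICATION = 3       # copies of each block
DATA_NODES = ["dn1", "dn2", "dn3", "dn4"]

def split_into_blocks(data: bytes):
    """A file is stored as a sequence of fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_replicas(blocks):
    """Assign each block to REPLICATION distinct data nodes (round-robin
    here; the real placement policy is rack-aware)."""
    placement = {}
    for idx, _ in enumerate(blocks):
        placement[idx] = [DATA_NODES[(idx + r) % len(DATA_NODES)]
                          for r in range(REPLICATION)]
    return placement

blocks = split_into_blocks(b"hello hdfs world")
placement = place_replicas(blocks)
print(len(blocks))      # 4 blocks of up to 4 bytes each
print(placement[0])     # ['dn1', 'dn2', 'dn3']
```

Losing one data node therefore loses only one replica of each affected block, which is the basis of the high-availability discussion above.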
Map Reduce
- Functional Programming Basics
- Map and Reduce Basics
- How Map Reduce Works
- Anatomy of a Map Reduce Job Run
- Hadoop 2.x Architecture
- Job Completion, Failures
- Shuffling and Sorting
- Splits, Record reader, Partition, Types of partitions & Combiner
- YARN
- Types of I/O Formats
- Handling small files using CombineFileInputFormat
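The partition and combiner steps listed above can be previewed in pure Python. This is a conceptual sketch only: the reducer count and the toy hash are illustrative, though Hadoop's default is likewise a hash partitioner.

```python
# Sketch of how a partitioner routes map-output keys to reducers,
# and how a combiner pre-aggregates on the map side before the shuffle.
NUM_REDUCERS = 3

def partition(key: str) -> int:
    """Hash partitioner: the same key always goes to the same reducer."""
    return sum(ord(c) for c in key) % NUM_REDUCERS   # stable toy hash

def combine(pairs):
    """Map-side combiner: sum counts locally to shrink shuffle traffic."""
    combined = {}
    for key, value in pairs:
        combined[key] = combined.get(key, 0) + value
    return sorted(combined.items())

map_output = [("apple", 1), ("banana", 1), ("apple", 1)]
combined = combine(map_output)           # [('apple', 2), ('banana', 1)]
routes = {k: partition(k) for k, _ in combined}
print(combined)
print(routes)
```

Because partitioning is deterministic per key, every occurrence of a word lands on the same reducer, which is what makes the final per-key aggregation correct.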
Map/Reduce Programming – Java Programming
- Hands-on “Word Count” in Map/Reduce in standalone and pseudo-distributed mode
- Dictionary translation using Hadoop
- Average length of words
- HDFS Writer and Reader
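The “Word Count” exercise is classically written in Java against the Hadoop API; the same map → shuffle/sort → reduce flow can be previewed in a few lines of pure Python (a conceptual sketch, not the Hadoop program itself):

```python
from itertools import groupby

def mapper(line: str):
    # Map phase: emit (word, 1) for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def word_count(lines):
    # Shuffle & sort: group all pairs by key, as the framework would.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    # Reduce phase: sum the values for each key.
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

counts = word_count(["to be or not to be"])
print(counts)   # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

In the real program the mapper and reducer run as separate tasks on different nodes, but the data flow is exactly this.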
Hive
- Installation
- Introduction and Architecture.
- Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
- Metastore
- Hive QL
- OLTP vs. OLAP
- Working with Tables.
- Primitive data types and complex data types.
- Working with Partitions.
- Hive Bucketed Tables and Sampling.
- External partitioned tables; mapping data to partitions in a table
- Writing the output of one query to another table; multiple inserts
- Differences between ORDER BY, DISTRIBUTE BY and SORT BY.
- Hands on Exercises
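A Hive partition maps directly to a directory in HDFS, which is what makes partition pruning cheap: only matching directories are scanned. The Python sketch below illustrates that layout; the warehouse path and table name are illustrative, not a fixed Hive contract.

```python
# Sketch: a Hive table PARTITIONED BY (dt STRING) stores each partition's
# rows under its own directory, e.g. .../<table>/dt=<value>/.
WAREHOUSE = "/user/hive/warehouse"   # illustrative warehouse location

def partition_path(table: str, dt: str) -> str:
    return f"{WAREHOUSE}/{table}/dt={dt}"

def prune(partitions, wanted_dt):
    """Partition pruning: a WHERE predicate on the partition column
    narrows the scan to the matching directories only."""
    return [p for p in partitions if p.endswith(f"dt={wanted_dt}")]

parts = [partition_path("sales", d) for d in ("2015-06-01", "2015-06-02")]
print(prune(parts, "2015-06-02"))
```

This is why queries that filter on the partition column are dramatically cheaper than ones that filter on ordinary columns.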
NoSQL
- ACID in RDBMS and BASE in NoSQL.
- CAP Theorem and Types of Consistency.
- Types of NoSQL Databases in detail.
- Columnar Databases in Detail (HBase and/or Cassandra)
- TTL, Bloom Filters and Compaction
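Of the topics above, the Bloom filter is compact enough to sketch directly: a bit array plus k hash functions gives a fast membership test with possible false positives but no false negatives. The sizes here are illustrative; real filters are sized from the expected item count and target false-positive rate.

```python
import hashlib

M = 64   # bits in the filter (illustrative)
K = 3    # number of hash functions (illustrative)

def _positions(item: str):
    # Derive K bit positions from independent-ish hashes of the item.
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
        yield int(digest, 16) % M

class BloomFilter:
    def __init__(self):
        self.bits = 0                      # M-bit array packed in an int
    def add(self, item: str):
        for pos in _positions(item):
            self.bits |= 1 << pos
    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits & (1 << pos) for pos in _positions(item))

bf = BloomFilter()
bf.add("row-key-1")
print(bf.might_contain("row-key-1"))   # True (no false negatives)
```

Stores like HBase and Cassandra use exactly this "definitely absent" guarantee to skip reading data files that cannot contain a requested row key.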
NoSQL Databases/Search Engine
Elasticsearch
- Data Access: read, write, update
- Clustering
- REST Calls
- Security in ES
- Sharding
- Partitioning
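Elasticsearch is driven entirely over REST. The sketch below only constructs a request, without sending it, so it can be read without a running node; the host, index name and document are illustrative, and the exact URL layout for indexing varies by ES version.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"   # assumed local single-node setup

def index_request(index: str, doc_id: str, doc: dict):
    """Build (but do not send) a PUT request that would index a document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_HOST}/{index}/{doc_id}",   # URL layout varies by ES version
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = index_request("students", "1", {"name": "Asha", "course": "NoSQL"})
print(req.get_method(), req.full_url)
```

Reads, searches and cluster-health checks follow the same pattern with GET requests against other endpoints, which is why any HTTP client is enough to drive ES.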
Big Data Analytics
- Elasticsearch (NoSQL DB) – data access (read, write, update), clustering and REST calls, as covered above
- Kibana – analytics tool: visualizations, dashboards and search
- Establishing the connection between Kibana and Elasticsearch
Message Brokers
- Overview of Message Brokers
- Pub/sub model
- Fitment of message brokers in the Big Data Domain
- Information on Kafka/ActiveMQ/RabbitMQ
- Hands-on examples with ActiveMQ
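The pub/sub model behind brokers like ActiveMQ can be sketched in-process. This is a toy broker for illustration only; real brokers add persistence, acknowledgements, and network transport.

```python
from collections import defaultdict

class Broker:
    """Toy publish/subscribe broker: a topic fans each message out to
    every subscriber, unlike a point-to-point queue where exactly one
    consumer receives it."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = Broker()
received_a, received_b = [], []
broker.subscribe("logs", received_a.append)
broker.subscribe("logs", received_b.append)
broker.publish("logs", "user logged in")
print(received_a, received_b)   # both subscribers get the message
```

In the Big Data pipeline this decoupling is the point: producers (log shippers, sensors) never need to know which consumers (HDFS loaders, Elasticsearch indexers) are listening.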
Optional – depending on time availability
FLUME
- Installation
- Introduction to Flume
- Flume Agents: Sources, Channels and Sinks
- Logging user information into HDFS from a Java program using Log4j and the Avro source
- Logging user information into HDFS from a Java program using the Tail source
- Logging user information into HBase from a Java program using Log4j and the Avro source
- Logging user information into HBase from a Java program using the Tail source
- Flume Commands
- Use case: stream Twitter data into HDFS and HBase with Flume, then analyse it with Hive and Pig
Pig
- Installation
- Execution Types
- Grunt Shell
- Pig Latin
- Data Processing
- Schema on read
- Primitive data types and complex data types.
- Tuple schema, BAG Schema and MAP Schema.
- Loading and Storing
- Filtering
- Grouping & Joining
- Debugging commands (Illustrate and Explain).
- Hands on Exercises
HBase
- HBase Installation
- HBase concepts
- HBase Data Model and comparison between RDBMS and NoSQL
- Master & Region Servers
- HBase Architecture; HBase operations (DDL and DML) through the shell and programmatically
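The HBase data model (row key → column family → column qualifier → versioned value) can be mimicked with nested Python dicts. This is a conceptual sketch only; timestamps here are plain integers rather than real epoch millis.

```python
# Toy HBase-style store: table[row_key][family][qualifier] holds a list
# of (timestamp, value) pairs kept newest-first, mirroring versioned cells.
from collections import defaultdict

table = defaultdict(lambda: defaultdict(dict))

def put(row, family, qualifier, value, ts):
    cell = table[row][family].setdefault(qualifier, [])
    cell.append((ts, value))
    cell.sort(reverse=True)          # newest version first

def get(row, family, qualifier):
    """Return the latest version, as HBase does by default."""
    versions = table[row][family].get(qualifier, [])
    return versions[0][1] if versions else None

put("row1", "info", "name", "Asha", ts=1)
put("row1", "info", "name", "Asha K", ts=2)
print(get("row1", "info", "name"))   # 'Asha K' (latest version)
```

Unlike an RDBMS row, each cell here keeps its history, and rows in the same table may have entirely different qualifiers, which is the schema flexibility discussed above.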
Duration
- Date & Timings: 1st week of June (tentative)
- Exact dates will be confirmed once the batch is formed