HIVE Queries

Sample data: employee.csv (save it in /home/woir/Downloads/)
1,Amar Sharma1,42,Gachibowli – Hyderabad-1,40000,Technology1
2,Amar Sharma2,43,Gachibowli – Hyderabad-2,40005,Technology2
3,Amar Sharma3,44,Gachibowli – Hyderabad-3,40010,Technology3
4,Amar Sharma4,45,Gachibowli – Hyderabad-4,40015,Technology4
5,Amar Sharma5,46,Gachibowli – Hyderabad-5,40020,Technology5
6,Amar Sharma6,47,Gachibowli – Hyderabad-6,40025,Technology6
7,Amar Sharma7,48,Gachibowli – Hyderabad-7,40030,Technology7
8,Amar Sharma8,49,Gachibowli – Hyderabad-8,40035,Technology8
9,Amar Sharma9,50,Gachibowli – Hyderabad-9,40040,Technology9
10,Amar Sharma10,51,Gachibowli – Hyderabad-10,40045,Technology10
11,Amar Sharma11,52,Gachibowli – Hyderabad-11,40050,Technology11
12,Amar Sharma12,53,Gachibowli – Hyderabad-12,40055,Technology12
13,Amar Sharma13,54,Gachibowli – Hyderabad-13,40060,Technology13
14,Amar Sharma14,55,Gachibowli – Hyderabad-14,40065,Technology14
15,Amar Sharma15,56,Gachibowli – Hyderabad-15,40070,Technology15
16,Amar Sharma16,57,Gachibowli – Hyderabad-16,40075,Technology16
17,Amar Sharma17,58,Gachibowli – Hyderabad-17,40080,Technology17
18,Amar Sharma18,59,Gachibowli – Hyderabad-18,40085,Technology18
19,Amar Sharma19,60,Gachibowli – Hyderabad-19,40090,Technology19
20,Amar Sharma20,61,Gachibowli – Hyderabad-20,40095,Technology20
21,Amar Sharma21,62,Gachibowli – Hyderabad-21,40100,Technology21
22,Amar Sharma22,63,Gachibowli – Hyderabad-22,40105,Technology22
23,Amar Sharma23,64,Gachibowli – Hyderabad-23,40110,Technology23
24,Amar Sharma24,65,Gachibowli – Hyderabad-24,40115,Technology24
25,Amar Sharma25,66,Gachibowli – Hyderabad-25,40120,Technology25
26,Amar Sharma26,67,Gachibowli – Hyderabad-26,40125,Technology26
27,Amar Sharma27,68,Gachibowli – Hyderabad-27,40130,Technology27
28,Amar Sharma28,69,Gachibowli – Hyderabad-28,40135,Technology28
29,Amar Sharma29,70,Gachibowli – Hyderabad-29,40140,Technology29
30,Amar Sharma30,71,Gachibowli – Hyderabad-30,40145,Technology30
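The sample file follows a simple arithmetic pattern (age 41+i, salary 39995+5*i), so it can be regenerated with a short script if needed. A minimal Python sketch (the helper name `write_employees` is illustrative, not part of the lab):

```python
import csv

def write_employees(path, n=30):
    """Regenerate the sample employee.csv shown above."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        for i in range(1, n + 1):
            # Id, Name, Age, Address, Salary, Department
            w.writerow([i, f"Amar Sharma{i}", 41 + i,
                        f"Gachibowli – Hyderabad-{i}",
                        39995 + 5 * i, f"Technology{i}"])
```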

hdfs dfs -ls /user/hive/warehouse/
hdfs dfs -copyFromLocal /home/woir/Downloads/*.csv /user/amar
hdfs dfs -ls /user/hive/warehouse/woir.db

Create database <DatabaseName>

-> create database woir;
-> create database woir_training;
-> show databases;
-> drop database woir_training;
-> use woir;
-> show tables;
-> create table employees_woir(Id INT, Name STRING, Age INT, Address STRING, Salary FLOAT, Department STRING) Row format delimited Fields terminated by ',';

-> LOAD DATA LOCAL INPATH '/home/woir/Downloads/employee.csv' INTO table employees_woir;

-> create TABLE order_history (OrderId INT,Date1 TIMESTAMP, Id INT, Amount FLOAT) ROW Format delimited Fields terminated by ',';

-> LOAD DATA INPATH '/user/woir/orders.csv' INTO table order_history;
-> create TABLE order_history_tmp (OrderId INT,Date1 TIMESTAMP, Id INT, Amount FLOAT) ROW Format delimited Fields terminated by ',';
-> show tables;
-> alter table order_history_tmp rename to order_history_deleteme;
-> show tables;
-> drop table order_history_deleteme;

To create the internal table

-> CREATE TABLE woirhive_internaltable (id INT,Name STRING) Row format delimited Fields terminated by ',';

Load the data into internal table

-> LOAD DATA INPATH '/user/names.csv' INTO table woirhive_internaltable;

Display the content of the table

-> select * from woirhive_internaltable;
-> select * from woirhive_internaltable where id=1;

To drop the internal table

-> DROP TABLE woirhive_internaltable;

If you drop woirhive_internaltable, both its metadata and its data are deleted from Hive, because an internal (managed) table owns its data.

Create External table

-> CREATE EXTERNAL TABLE woirhive_external(id INT,Name STRING) Row format delimited Fields terminated by ',' LOCATION '/user/woirhive_external';

We can load the data manually

-> LOAD DATA INPATH '/user/names.txt' INTO TABLE woirhive_external;

Display the content of the table

-> select * from woirhive_external;

To drop the external table

-> DROP TABLE woirhive_external;
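The contrast between dropping internal and external tables can be modelled in a few lines of Python: DROP TABLE always removes the metadata, but deletes the underlying data only for internal (managed) tables. This is a toy model with illustrative names, not Hive APIs:

```python
metastore = {}   # table name -> {"external": bool, "location": str}
hdfs = {}        # location -> rows (stands in for an HDFS directory)

def create_table(name, location, external=False):
    metastore[name] = {"external": external, "location": location}
    hdfs.setdefault(location, [])

def drop_table(name):
    info = metastore.pop(name)        # metadata always goes
    if not info["external"]:          # managed table: data goes too
        hdfs.pop(info["location"], None)
```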

Creation of Table allstates

-> create table allstates(state string, District string,Enrolments string) row format delimited fields terminated by ',';

Loading data into the created table allstates

-> Load data local inpath '/home/woir/Downloads/states.csv' into table allstates;

Creation of partition table

-> create table state_part(District string, Enrolments string) PARTITIONED BY(state string);

For dynamic partitioning we have to set this property:

-> set hive.exec.dynamic.partition.mode=nonstrict;

Loading data into partition table

-> INSERT OVERWRITE TABLE state_part PARTITION(state) SELECT district,enrolments,state from allstates;
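Conceptually, the dynamic-partition insert groups rows by the partition column's value into Hive-style `state=<value>` directories, and the partition column itself is not stored inside the data files. A Python sketch of that mechanism (function name illustrative):

```python
from collections import defaultdict

def dynamic_partition(rows, part_col):
    """rows: list of dicts -> {partition_dir: [rows without part_col]}"""
    parts = defaultdict(list)
    for row in rows:
        key = f"{part_col}={row[part_col]}"          # Hive-style directory name
        data = {k: v for k, v in row.items() if k != part_col}
        parts[key].append(data)
    return dict(parts)
```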

In Hive, we have to enable bucketing by setting:

-> set hive.enforce.bucketing=true;

-> create table samplebucket (
    first_name string,
    job_id int,
    department string,
    salary string
) clustered by (department) into 4 buckets
  row format delimited
  fields terminated by ',';

-> from employees_woir insert overwrite table samplebucket select Name, Id, Department, Salary;
# Id INT, Name STRING, Age INT, Address STRING, Salary FLOAT, Department STRING
hdfs dfs -ls -R /user/hive/warehouse/woir.db/samplebucket/
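Conceptually, bucketing assigns each row to one of the 4 bucket files by hashing the clustering column. A toy Python sketch of the mechanism (Hive uses its own hash function; `zlib.crc32` just stands in here to give a stable hash):

```python
import zlib

def bucket_of(value, num_buckets=4):
    # which bucket file a clustering-column value lands in
    return zlib.crc32(str(value).encode()) % num_buckets

def bucketize(rows, key, num_buckets=4):
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets
```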

Example:

-> Create VIEW Sample_View AS SELECT * FROM employees_woir  WHERE salary>40100;

-> select * from Sample_View;
-> Create INDEX sample_Index ON TABLE woirhive_internaltable(id) AS 'COMPACT' WITH DEFERRED REBUILD;
-> # create table employees_woir(Id INT, Name STRING, Age INT, Address STRING, Salary FLOAT, Department STRING) Row format delimited Fields terminated by ',';
-> SELECT * FROM employees_woir ORDER BY Department;

-> SELECT Department, count(*) FROM employees_woir GROUP BY Department;
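The GROUP BY query above is a per-key count. In Python terms (a sketch of the semantics, not Hive internals):

```python
from collections import Counter

def group_count(rows, col):
    # equivalent of: SELECT col, count(*) FROM ... GROUP BY col
    return Counter(row[col] for row in rows)
```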

-> SELECT * from employees_woir SORT BY Id DESC;

-> SELECT Id, Name from employees_woir CLUSTER BY Id;

-> SELECT Id, Name from employees_woir DISTRIBUTE BY Id;
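The difference between these clauses: ORDER BY produces one globally sorted output, DISTRIBUTE BY only routes rows to reducers by key (no sorting), SORT BY sorts within each reducer only, and CLUSTER BY is DISTRIBUTE BY plus SORT BY on the same column. A toy model where reducers are just Python lists:

```python
import zlib

def distribute(rows, key, num_reducers=2):
    # DISTRIBUTE BY: route each row to a reducer by hash of the key
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode()) % num_reducers
        reducers[idx].append(row)
    return reducers

def cluster_by(rows, key, num_reducers=2):
    # CLUSTER BY = DISTRIBUTE BY key + SORT BY key within each reducer
    return [sorted(part, key=lambda r: r[key])
            for part in distribute(rows, key, num_reducers)]
```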

Join queries: different types of joins

Joins are of four types:

  • Inner join
  • Left outer Join
  • Right Outer Join
  • Full Outer Join

Inner Join:

An inner join retrieves only the records common to both tables.

-> SELECT c.Id, c.Name, c.Age, o.Amount FROM employees_woir c JOIN order_history o ON(c.Id=o.Id);

Left Outer Join:

Hive's LEFT OUTER JOIN returns all the rows from the left table, even when there are no matches in the right table. If the ON clause matches zero records in the right table, the join still returns a row in the result, with NULL in each column from the right table.

-> SELECT c.Id, c.Name, o.Amount, o.Date1 FROM employees_woir c LEFT OUTER JOIN order_history o ON(c.Id=o.Id);

Right outer Join:

Hive's RIGHT OUTER JOIN returns all the rows from the right table, even when there are no matches in the left table. If the ON clause matches zero records in the left table, the join still returns a row in the result, with NULL in each column from the left table.

-> SELECT c.Id, c.Name, o.Amount, o.Date1 FROM employees_woir c RIGHT OUTER JOIN order_history o ON(c.Id=o.Id);

Full outer join:

A full outer join combines the records of both tables (here employees_woir and order_history) based on the JOIN condition given in the query. It returns all records from both tables and fills in NULL values for the columns that have no matching value on either side.

-> SELECT c.Id, c.Name, o.Amount, o.Date1 FROM employees_woir c FULL OUTER JOIN order_history o ON(c.Id=o.Id);
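The four join behaviours can be sketched in plain Python over lists of dicts. This is an illustration of the semantics, not Hive internals; the column names used for the NULL fill (`Name`, `Amount`) follow the sample tables:

```python
def join(left, right, key, kind="inner"):
    """kind: 'inner', 'left', 'right', or 'full'."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)
    out, matched = [], set()
    for l in left:
        rs = right_by_key.get(l[key], [])
        if rs:
            matched.add(l[key])
            out.extend({**l, **r} for r in rs)       # matched pair
        elif kind in ("left", "full"):
            out.append({**l, "Amount": None})        # NULL-fill right side
    if kind in ("right", "full"):
        out.extend({"Name": None, **r}               # NULL-fill left side
                   for r in right if r[key] not in matched)
    return out
```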

Download: the employee, names, orders, and states sample files.