Learn Big Data With Me - Hive

Introduction

 
Hive is a query language, which was developed by Facebook. Hadoop can give support to any kind of data.
  • Structured data like database tables.
  • Unstructured data like videos, audios, PDF, TXT files, etc.
  • Semi-structured data like XML.
Hadoop supports HiveQL.
 
Difference between SQL & HiveQL
  • In SQL, we can insert the data values row by row but not in HQL.
  • In SQL we can update any row or column but not in HQL because the data is stored in HDFS, after putting data into HDFS, you shouldn’t change the contents of the data.
  • In SQL, we can use delete but not in HQL.
In Hive, every table is created as a directory.
 
HQL datatypes
 
Like other RDBMS (Oracle, MySQL, SQL Server), it also has databases.
 
TinyInt Float Map
ShortInt Double Array
BigInt String Struct
Here map, array, struct are called collection datatypes.
 
Creating hive tables
 
Hive tables can be created in two ways.
  1. Managed tables or Internal tables.
  2. External tables.
Managed tables or Internal tables
  • user@machine:~$ hive
  • hive> create table employee(id int, name string,salary float)
  • >row format delimited
  • >fields terminated by ‘\t’;
 
Important points
  • String can contain any kind of data.
  • In SQL, if you want to insert the data, you have to first create schema or table but in HQL, you can either create the table and insert the data or you can insert the data and then create the table.
  • If you apply it after the table column in create a statement in HQL, it will give you null but not actual data and it will not give you an error, so you need to write delimiter & terminated line.
Loading data into Hive tables
 
The data can be loaded in two ways.
 
Either from the local file system or from HDFS
 
Loading data from the local file system
 
hive>load data local inpath <filepath> into table <tablename>
 
Loading data from HDFS
 
hive>load data inpath <filepath> into table <tablename>
  • If it is a local file system, the default path is home/user.
  • If it is HDFS, it is a user.
Here, we will hear a few words like metadata, which means the data about the data & metastore, which means keeping metadata to store.
 
External Tables
 
hive> create external table employeeE(id int, name string,salary float),
  • >row format delimited
  • >fields terminated by ‘\t’
  • >location “/vimal/newfolder”;
Concept
  • If we are creating internal tables, the table name is created as a directory on the warehouse. If we are creating the external tables, the table name will never be created as a directory name but is just trying to refresh some location /vimal/newfolder.
  • For global usage, you can refer to an external table but not an internal table.
Internal Table
 
/user / hive / warehouse
employee (directory)
employee (file)
employee1 (file)
 
External Table
 
/vimal / newfolder /
employee (file)


Similar Articles