Hive and HCatalog

@alepoletto

Hive

Hive – What is it?

• A data warehouse layer built on top of Hadoop

• Defines structure for your unstructured big data

• Queries that data using HiveQL, a SQL-like language

Hive – is not… a relational database

• It uses a relational database only to store metadata.

• The data that Hive processes is stored in HDFS.

Hive – is not… designed for online transactions

• It runs on Hadoop (a batch processing system)

• Jobs can have high latency and overhead

Hive – is not… for real-time queries and row updates

• It is suited to batch jobs over large sets of immutable data

Hive – What it does

• Hadoop was built to organize and store massive amounts of data.

• A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats.

• Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight.

Hive – Architecture

Hive – Tables

• Hive tables

• Data: stored as files in HDFS

• Schema: stored as metadata in relational tables

• Schema and data are kept separate

• Hive needs a schema before it can read existing HDFS data
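
The separation of schema and data can be sketched in a few lines of plain Python. This is only a toy model, not Hive's API: `RAW_FILE` stands in for a file already sitting in HDFS, `METASTORE` stands in for the relational metadata store, and the `users` table with its columns is a made-up example.

```python
import csv
import io

# Stand-in for a file that already exists in HDFS: raw delimited text with
# no schema attached. The rows are made-up sample data.
RAW_FILE = "1,alice,34\n2,bob,27\n3,carol,41\n"

# Stand-in for the metastore: the schema lives apart from the data.
# The table name, column names, and types are hypothetical.
METASTORE = {
    "users": {
        "columns": [("id", int), ("name", str), ("age", int)],
        "delimiter": ",",
    }
}


def read_table(table_name, raw_text):
    """Apply the stored schema to the raw text at read time."""
    schema = METASTORE[table_name]
    reader = csv.reader(io.StringIO(raw_text), delimiter=schema["delimiter"])
    rows = []
    for fields in reader:
        # Pair each raw field with its (name, type) entry and cast it.
        row = {
            name: cast(value)
            for (name, cast), value in zip(schema["columns"], fields)
        }
        rows.append(row)
    return rows


rows = read_table("users", RAW_FILE)
older_than_30 = [r["name"] for r in rows if r["age"] > 30]
print(older_than_30)
```

The raw text never changes; only the metastore entry decides how it is interpreted, which is why a schema has to be declared before existing files can be queried.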

Hive – Pig vs. Hive

Pig is good for

• ETL.

• Preparing data for easier analysis.

• Jobs with a long series of steps to perform

Hive is good for

• Querying data

• Answering specific questions

• Users who are already familiar with SQL
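
The difference in style can be illustrated with a small Python sketch. It uses neither Pig Latin nor HiveQL: the step-by-step half only mimics the shape of a dataflow script, and the standard-library `sqlite3` module is an assumed stand-in for a SQL-like engine. The `ORDERS` data and column names are made up.

```python
import sqlite3

# Made-up sample records: (customer, country, amount).
ORDERS = [
    ("ann", "BR", 10),
    ("bob", "US", 25),
    ("cid", "BR", 5),
    ("dee", "US", 40),
]

# Dataflow style (the shape of a Pig script): a series of explicit steps.
filtered = [o for o in ORDERS if o[2] >= 10]             # step 1: filter
grouped = {}
for _, country, amount in filtered:                      # step 2: group
    grouped.setdefault(country, []).append(amount)
totals_steps = {c: sum(a) for c, a in grouped.items()}   # step 3: aggregate

# Query style (the shape of a HiveQL query): one declarative statement.
# sqlite3 is only a stand-in here so the example runs without a cluster.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, country TEXT, amount INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
totals_query = dict(
    conn.execute(
        "SELECT country, SUM(amount) FROM orders "
        "WHERE amount >= 10 GROUP BY country"
    )
)
conn.close()

print(totals_steps)
print(totals_query)
```

Both halves compute the same totals. The first spells out how to get there, which suits long multi-step preparation; the second states what is wanted, which suits a specific question.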

Hive – HiveQL

HCatalog – What it does

• A metadata and table management system for Hadoop.

• A shared schema and data type mechanism for different Hadoop tools such as Pig, Hive, and MapReduce

• Interoperability across data processing tools

• A table abstraction, so you don’t need to worry about where and how the data is stored.
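
The idea of a shared table abstraction can be sketched as a toy catalog in Python. None of this is HCatalog's real interface: `TableInfo`, `CATALOG`, `describe_for_tool`, the `web_logs` table, and its path are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    location: str      # where the files live (hypothetical path)
    file_format: str   # how the files are encoded
    columns: tuple     # (name, type) pairs shared by every tool


# Toy shared catalog: one entry per table, consulted by every tool.
CATALOG = {
    "web_logs": TableInfo(
        location="/data/web_logs",
        file_format="text",
        columns=(("ip", "string"), ("url", "string"), ("bytes", "int")),
    )
}


def describe_for_tool(tool_name, table_name):
    """Resolve a table by name only; path and format come from the catalog."""
    info = CATALOG[table_name]
    names = ", ".join(name for name, _ in info.columns)
    return f"{tool_name} reads {table_name}({names})"


# Two different tools ask for the same table and see the same schema.
pig_view = describe_for_tool("pig", "web_logs")
mapreduce_view = describe_for_tool("mapreduce", "web_logs")
print(pig_view)
print(mapreduce_view)
```

Because each tool refers to the table by name, a change to the storage location or file format only touches the catalog entry, not the jobs that read the table.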

HCatalog – Summary

• “Takes Hive metadata and opens it up to everybody else”

HCatalog – Overview

• Access data through HCatalog

HCatalog – Architecture
