@alepoletto

Hive hcatalog

Hive

Hive – What is it?

• Data warehouse system layer built on top of Hadoop

• Defines structure for your unstructured big data

• Lets you query this data using HiveQL, a SQL-like language
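
To make the "define structure, then query" idea concrete, here is a minimal Python sketch that assembles the kind of HiveQL a user might submit. The table name `page_views`, its columns, and the HDFS path are hypothetical and do not come from the slides; the helper functions are illustrative only, not part of Hive.

```python
def create_table_ddl(table, columns, location, delimiter="\\t"):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for delimited text files."""
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def top_n_query(table, group_col, n):
    """Build a HiveQL query that counts rows per value of group_col."""
    return (
        f"SELECT {group_col}, COUNT(*) AS hits\n"
        f"FROM {table}\n"
        f"GROUP BY {group_col}\n"
        f"ORDER BY hits DESC\n"
        f"LIMIT {n}"
    )


# Hypothetical table, columns, and HDFS path, for illustration only.
ddl = create_table_ddl(
    "page_views",
    [("user_id", "STRING"), ("url", "STRING"), ("view_time", "BIGINT")],
    "/data/page_views",
)
query = top_n_query("page_views", "url", 10)

print(ddl + ";")
print(query + ";")
```

The first statement only describes files that already sit in the given directory; the second reads like ordinary SQL, and Hive turns it into batch jobs on the cluster.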

Hive – is not… a relational database

• Uses a relational database only to store its metadata.

• The data that Hive processes is stored in HDFS.

Hive – is not… designed for online transactions

• Runs on Hadoop (a batch processing system)

• Jobs can have high latency and noticeable overhead

Hive – is not… for real-time queries and row updates

• Best suited for batch jobs over large sets of immutable data

Hive – What it does

• Hadoop was built to organize and store massive amounts of data.

• A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats.

• Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight.

Hive – Architecture

Hive – Tables

• Hive tables
  • Data: in files in HDFS
  • Schema: in metadata stored in relational tables

• Schema and data are separated

• Hive needs a schema for existing HDFS data
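
The separation of schema and data can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Hive is implemented: an in-memory SQLite database stands in for the relational metastore, a local temporary file stands in for an HDFS file, and the table and column names (`hive_tables`, `hive_columns`, `page_views`) are made up for the example.

```python
import os
import sqlite3
import tempfile

# Stand-in for the metastore: relational tables that hold schema only.
metastore = sqlite3.connect(":memory:")
metastore.execute("CREATE TABLE hive_tables (tbl TEXT PRIMARY KEY, location TEXT)")
metastore.execute(
    "CREATE TABLE hive_columns (tbl TEXT, idx INTEGER, col_name TEXT, col_type TEXT)"
)

# Stand-in for HDFS: a raw tab-delimited file that exists before any schema.
data_dir = tempfile.mkdtemp()
data_path = os.path.join(data_dir, "part-00000")
with open(data_path, "w") as f:
    f.write("u1\t/home\t3\n")
    f.write("u2\t/cart\t5\n")

# Registering the table writes metadata only; the data file is untouched.
metastore.execute("INSERT INTO hive_tables VALUES (?, ?)", ("page_views", data_path))
metastore.executemany(
    "INSERT INTO hive_columns VALUES (?, ?, ?, ?)",
    [
        ("page_views", 0, "user_id", "STRING"),
        ("page_views", 1, "url", "STRING"),
        ("page_views", 2, "views", "INT"),
    ],
)

# Toy type mapping; only the two types used above are covered.
CASTS = {"STRING": str, "INT": int}


def read_table(tbl):
    """Look up location and schema, then apply the schema while reading."""
    (location,) = metastore.execute(
        "SELECT location FROM hive_tables WHERE tbl = ?", (tbl,)
    ).fetchone()
    schema = metastore.execute(
        "SELECT col_name, col_type FROM hive_columns WHERE tbl = ? ORDER BY idx",
        (tbl,),
    ).fetchall()
    rows = []
    with open(location) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            rows.append(
                {name: CASTS[typ](value) for (name, typ), value in zip(schema, fields)}
            )
    return rows


rows = read_table("page_views")
print(rows)
```

The file is written first and never modified; the schema is attached afterwards and only takes effect at read time, which is the sense in which Hive needs a schema for data that already exists.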

Hive – Pig vs. Hive

Pig is good for

• ETL

• Preparing data for easier analysis

• Long series of processing steps

Hive is good for

• Querying data

• Getting answers to specific questions

• Users who are already familiar with SQL

Hive – HiveQL

HCatalog – What it does

• Metadata and table management system for Hadoop.

• Shared schema and data type mechanism for different Hadoop tools such as Pig, Hive, and MapReduce

• Interoperability across data processing tools

• Table abstraction, so you don’t need to worry about where and how the data is stored.
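
The table abstraction can be illustrated with a toy registry in Python. This is only a sketch of the idea, not HCatalog's API: the `Catalog` class, the `FILES` dictionary standing in for HDFS, and the table names and paths are all hypothetical.

```python
import csv
import io
import json

# Stand-in for files on HDFS: path -> raw text. Paths are hypothetical.
FILES = {
    "/data/page_views.tsv": "u1\t/home\nu2\t/cart\nu1\t/cart\n",
    "/data/users.json": (
        '{"user_id": "u1", "country": "BR"}\n'
        '{"user_id": "u2", "country": "US"}\n'
    ),
}


class Catalog:
    """Toy table registry: maps a table name to location, format, and columns."""

    def __init__(self):
        self._tables = {}

    def register(self, name, location, fmt, columns):
        self._tables[name] = {"location": location, "format": fmt, "columns": columns}

    def read(self, name):
        """Return rows as dicts; callers never see the path or file format."""
        meta = self._tables[name]
        raw = FILES[meta["location"]]
        if meta["format"] == "tsv":
            reader = csv.reader(io.StringIO(raw), delimiter="\t")
            return [dict(zip(meta["columns"], fields)) for fields in reader]
        if meta["format"] == "json":
            records = [json.loads(line) for line in raw.splitlines()]
            return [{c: rec[c] for c in meta["columns"]} for rec in records]
        raise ValueError(f"unsupported format: {meta['format']}")


catalog = Catalog()
catalog.register("page_views", "/data/page_views.tsv", "tsv", ["user_id", "url"])
catalog.register("users", "/data/users.json", "json", ["user_id", "country"])


def script_style_job(cat):
    """One 'tool': count views per URL."""
    counts = {}
    for row in cat.read("page_views"):
        counts[row["url"]] = counts.get(row["url"], 0) + 1
    return counts


def query_style_job(cat):
    """Another 'tool': join views to users using the same shared tables."""
    country = {u["user_id"]: u["country"] for u in cat.read("users")}
    return sorted({(country[v["user_id"]], v["url"]) for v in cat.read("page_views")})


print(script_style_job(catalog))
print(query_style_job(catalog))
```

Both jobs refer to tables by name only. If `page_views` were later re-registered with a different location or format, neither job would need to change, which is the interoperability benefit the slide describes.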

HCatalog – Summary

• “Takes Hive metadata and opens to everybody else”

HCatalog – Overview

• Access data through HCatalog

HCatalog – Architecture