HPCC Systems Load, Index & Query Big Data the EZ way By Fujio Turner @myhousehippo

Big Data - Load, Index & Query the EZ way - HPCC Systems

DESCRIPTION

Learn how to index your Big Data to get the speed that you want and need. With HPCC Systems, use fewer machines and do more work faster than Hadoop. To install HPCC Systems in just 5 minutes, watch this YouTube video: http://www.youtube.com/watch?v=8SV43DCUqJg

Page 1: Big Data - Load, Index & Query the EZ way - HPCC Systems

HPCC Systems Load, Index & Query

Big Data the EZ way

By Fujio Turner

@myhousehippo

Page 2: Big Data - Load, Index & Query the EZ way - HPCC Systems

Comparison

Java: Petabytes; 1-80,000 Jobs/day; Since 2005

C++: Exabytes; Since 2000; Non-Indexed: 4X-13X; Indexed: 2K-3K Jobs/sec

? ? ? ? ? ?

Page 3: Big Data - Load, Index & Query the EZ way - HPCC Systems

Business Development Customers 1 20

Non-Indexed Full Data Set

http://hpccsystems.com/why-hpcc/benchmarks

Page 4: Big Data - Load, Index & Query the EZ way - HPCC Systems

Map/Reduce

SQL w/ JOINS

GraphDB

Machine Learning

Simple to Complex Queries

Page 5: Big Data - Load, Index & Query the EZ way - HPCC Systems

Plugins, BI, Transport

Security, Query

Encrypted on disk

Page 6: Big Data - Load, Index & Query the EZ way - HPCC Systems

“I’m sub-second fast.”

“I can query all or part of your data.”

Thor

Roxie

Hard Disk

Index (optional)

Hard Disk

Index (optional)

In-memory Index

SSD

Either/Both

Architecture

Page 7: Big Data - Load, Index & Query the EZ way - HPCC Systems

Data Query File

Example 2

Example 1

Page 8: Big Data - Load, Index & Query the EZ way - HPCC Systems

HPCC Systems Sample Data for Examples 1 & 2

Sample Data

http://hpccsystems.com/download/docs/learning-ecl

More Examples

Page 9: Big Data - Load, Index & Query the EZ way - HPCC Systems

Typical

1. Schema

CREATE TABLE layout_person (
  PersonID INT(15) NOT NULL,
  FirstName VARCHAR(15) NOT NULL,
  LastName VARCHAR(25) NOT NULL,
  PRIMARY KEY (PersonID)
);

2. Load

INSERT INTO `layout_person` (`PersonID`, `FirstName`, `LastName`) VALUES (1, 'Joe', 'Smith');

3. Query

SELECT * FROM `layout_person`;

Page 10: Big Data - Load, Index & Query the EZ way - HPCC Systems

1. Load

2. Query w/ Applied Schema on Read

Layout_Person := RECORD
  UNSIGNED1 PersonID;
  STRING15 FirstName;
  STRING25 LastName;
END;

allPeople := DATASET('~file', Layout_Person, THOR);

allPeople;

Structured or Semi-structured or Unstructured

All data has:
1. Origin
2. DateTime
3. Info
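To make "applied schema on read" concrete, here is a minimal sketch in Python rather than ECL. It assumes the file holds fixed-width rows shaped like Layout_Person (a 1-byte unsigned ID, then space-padded 15- and 25-byte strings). The byte layout and the helper names `write_rows` and `read_rows` are illustrative assumptions, not HPCC internals or APIs.

```python
import struct

# Assumed on-disk layout mirroring Layout_Person:
# 1-byte unsigned ID + 15-byte string + 25-byte string = 41 bytes per row.
PERSON = struct.Struct("<B15s25s")

def write_rows(people):
    """'Load': store raw fixed-width bytes; the file itself carries no schema."""
    return b"".join(
        PERSON.pack(pid,
                    first.encode("ascii").ljust(15),
                    last.encode("ascii").ljust(25))
        for pid, first, last in people
    )

def read_rows(raw):
    """'Query': the record layout is applied only when the bytes are read."""
    for pid, first, last in PERSON.iter_unpack(raw):
        yield pid, first.decode("ascii").rstrip(), last.decode("ascii").rstrip()

raw = write_rows([(1, "Joe", "Smith"), (2, "Ann", "Lee")])
all_people = list(read_rows(raw))
```

Because the layout lives in the reader, the same bytes could be read with a different layout later without reloading the file.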

Page 11: Big Data - Load, Index & Query the EZ way - HPCC Systems

Administrator Web GUI on Port 8010

IP / URL of HPCC install

Page 12: Big Data - Load, Index & Query the EZ way - HPCC Systems

Load Data

1. Upload file*
2. Distribute to cluster
3. Name of file in cluster
4. Size of each row
5. Push to cluster

*2GB file size limit through web. No limit if uploaded via SOAP.

Page 13: Big Data - Load, Index & Query the EZ way - HPCC Systems

Loaded in Thor Cluster

Page 14: Big Data - Load, Index & Query the EZ way - HPCC Systems

Query Example 1

Schema

Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5 Zip;
  STRING42 Street;
  STRING20 City;
  STRING2 State;
END;

Data (File Location, File Type; compare “USE DATABASE;” and “FROM Table”)

allPeople := DATASET('~test::originalperson', Layout_People, THOR);

Query (compare WHERE `LastName` = 'Smith')

Smiths := allPeople(LastName = 'Smith');

Output (compare “SELECT * ….”)

Smiths; //Output

Page 15: Big Data - Load, Index & Query the EZ way - HPCC Systems

Practice

1. Go to playground
2. Edit ECL
3. Pick “thor” Cluster
4. Submit

http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE_1/

Page 16: Big Data - Load, Index & Query the EZ way - HPCC Systems

Why Index?

Full Table or Data Scan

++and

from date to date

Page 17: Big Data - Load, Index & Query the EZ way - HPCC Systems

Indexing Example 2: Make Index

Ex. Creating an index by “STATE”

File Position Number: a pseudo record ID, like “Alter Table” (new column)

allPeople := DATASET('~test::originalperson',
                     {Layout_People, UNSIGNED8 RecPtr {virtual(fileposition)}},
                     THOR);

Index Filename: the last argument to INDEX

datax := INDEX(allPeople, {State, RecPtr}, '~test::key_person');

BUILDINDEX(datax);

http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE_2a_-_Create_Index

Page 18: Big Data - Load, Index & Query the EZ way - HPCC Systems

Query w/ Index

Data

allPeople := DATASET('~test::originalperson',
                     {Layout_People, UNSIGNED8 RecPtr {virtual(fileposition)}},
                     THOR);

datax := INDEX(allPeople, {State, RecPtr}, '~test::key_person');

Query (WHERE `State` = 'NJ' from Index)

filterdata := FETCH(allPeople, datax(State = 'NJ'), RIGHT.RecPtr);

filterdata; //Output

http://www.meetup.com/HPCC-SV/pages/ECL_EXAMPLE_2b_-_Query_with_Index
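The idea behind Example 2 (an index that maps a key to file positions, followed by a fetch of only those rows) can be sketched outside ECL. The Python below is a rough, language-neutral illustration. It assumes fixed-width 124-byte rows shaped like Layout_People, and the names `pack_row`, `build_index` and `fetch` are hypothetical helpers, not HPCC functions.

```python
import io
import struct
from collections import defaultdict

# Assumed fixed-width layout mirroring Layout_People (124 bytes per row):
# FirstName, LastName, MiddleName, Zip, Street, City, State.
WIDTHS = (15, 25, 15, 5, 42, 20, 2)
ROW = struct.Struct("<" + "".join(f"{w}s" for w in WIDTHS))

def pack_row(*fields):
    """Encode one person as a space-padded fixed-width record."""
    return ROW.pack(*(f.encode("ascii").ljust(w) for f, w in zip(fields, WIDTHS)))

def build_index(data):
    """One full scan: map each State to the byte offsets of its rows."""
    index = defaultdict(list)
    data.seek(0)
    pos = 0
    while True:
        chunk = data.read(ROW.size)
        if len(chunk) < ROW.size:
            break
        state = ROW.unpack(chunk)[6].decode("ascii")
        index[state].append(pos)  # the offset plays the role of RecPtr
        pos += ROW.size
    return index

def fetch(data, index, state):
    """Seek straight to the matching rows instead of scanning every row."""
    rows = []
    for pos in index.get(state, []):
        data.seek(pos)
        fields = ROW.unpack(data.read(ROW.size))
        rows.append(tuple(f.decode("ascii").rstrip() for f in fields))
    return rows

data = io.BytesIO(
    pack_row("Joe", "Smith", "A", "07001", "1 Main St", "Avenel", "NJ")
    + pack_row("Ann", "Lee", "B", "10001", "2 Broadway", "New York", "NY")
    + pack_row("Sam", "Jones", "C", "07002", "3 Oak Ave", "Bayonne", "NJ")
)
state_index = build_index(data)
nj_rows = fetch(data, state_index, "NJ")
```

Building the index costs one full scan. After that, each query by State reads only the rows whose offsets the index lists.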

Page 19: Big Data - Load, Index & Query the EZ way - HPCC Systems

SuperFile: organizing your files

Logical File: 2013-06 Twitter

Real Files: 2013-06-06 Twitter, 2013-06-07 Twitter, 2013-06-08 Twitter

+ Append new real files
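As a rough mental model, a SuperFile behaves like one logical name over an ordered list of real files. The Python sketch below illustrates that idea only. The `SuperFile` class and its `add` method are hypothetical and are not the HPCC API.

```python
from itertools import chain

class SuperFile:
    """A logical file: an ordered list of real sub-files read as one."""

    def __init__(self, name):
        self.name = name
        self.subfiles = []  # ordered (sub-file name, rows) pairs

    def add(self, name, rows):
        """Append a new real file; existing sub-files are left untouched."""
        self.subfiles.append((name, rows))

    def __iter__(self):
        # Readers see every sub-file's rows back to back, in the order added.
        return chain.from_iterable(rows for _, rows in self.subfiles)

june = SuperFile("2013-06 Twitter")
june.add("2013-06-06 Twitter", ["tweet a", "tweet b"])
june.add("2013-06-07 Twitter", ["tweet c"])
june.add("2013-06-08 Twitter", ["tweet d"])
all_rows = list(june)
```

A query written against the logical name does not need to change when another day's file is appended.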

Page 20: Big Data - Load, Index & Query the EZ way - HPCC Systems

Creating a SuperFile

1. Create New or Update Existing Super File
2. Super File Name
2b. Add new file to existing superfile
3. Create Superfile

Page 21: Big Data - Load, Index & Query the EZ way - HPCC Systems

2013-06-06 Twitter

2013-06-07 Twitter

2013-06-08 Twitter

2013-06 Twitter

2013 Twitter

SuperKeys: organizing your indexes

Page 22: Big Data - Load, Index & Query the EZ way - HPCC Systems

2013-06-06 Twitter

2013-06-07 Twitter

2013-06-08 Twitter

2013 Twitter

SuperKeys: No Sub-Super Files or Keys in Roxie

Page 23: Big Data - Load, Index & Query the EZ way - HPCC Systems

When and where NOT to Index

Filtered Data

80-100% Queries @ Roxie

Index Here

Do Not Index Here

100% of Data Enters Here

100% of Data Enters Here

• Query 100% of all data
• Lots of Regular Expressions
• Few or No DateTime Data

Do Not Index Here