22

Getting the most out of Impala - Best practices for infrastructure optimization

  • Upload
    bigstep

  • View
    940

  • Download
    1

Embed Size (px)

DESCRIPTION

We started testing Impala in an effort to understand what hardware setup would provide the best performance/price for it. We didn’t want to see it perform in extreme cases, but in regular situations that most users would encounter. We aimed to provide a quick practical guide for choosing the infrastructure to run Impala on. With this in mind, we looked at a medium sized deployment of 4 Full Metal Compute Instances, and we scaled the hardware from single CPU, low RAM capacity to dual CPU, high RAM capacity. We didn’t start out aiming to prove any particular assumption. The purpose of the project was to explore and understand how Impala works with hardware and our findings were quite surprising. This presentation was prepared for the London Enterprise Technology Meetup by Alex Bordei - Product Manager at Bigstep, together with Graham Gear - EMEA Director of Systems Engineering at Cloudera. Stay informed: http://blog.bigstep.com/

Citation preview

Page 1: Getting the most out of Impala - Best practices for infrastructure optimization
Page 2: Getting the most out of Impala - Best practices for infrastructure optimization

Who is Cloudera?

Page 3: Getting the most out of Impala - Best practices for infrastructure optimization

The Leading Open Source Distribution of Apache Hadoop

Powerful Suite of System & Data Management Software Built for the Enterprise

Founded: 2008

Employees: 500+

Customers: Over 50% of the Fortune 50 and 65% of the Fortune 500 plus top US intelligence and defense agencies

Partners: 700+ in hardware, software, and services

Education: 15,000+ trained; developers, admins, analysts, data scientists

Community: Founders and top supporters of the Hadoop open source ecosystem

Page 4: Getting the most out of Impala - Best practices for infrastructure optimization
Page 5: Getting the most out of Impala - Best practices for infrastructure optimization
Page 6: Getting the most out of Impala - Best practices for infrastructure optimization

What is Hadoop and how has it evolved?

BATCHPROCESSING

ANALYTICSQL

SEARCHENGINE

MACHINELEARNING

STREAMPROCESSING

3RD PARTYAPPS

WORKLOAD MANAGEMENT

STORAGE FOR ANY TYPE OF DATA

DATAM

ANAG

EMEN

TSYSTEM

MAN

AGEM

ENT

ENTERPRISE DATA HUB

FILESYETEM ONLINE NOSQL

OPEN SOURCE SCALABLE FLEXIBLE COST EFFECTIVE

MANAGED

OPEN ARCHITECTURE

SECURE AND GOVERNED

Page 7: Getting the most out of Impala - Best practices for infrastructure optimization
Page 8: Getting the most out of Impala - Best practices for infrastructure optimization
Page 9: Getting the most out of Impala - Best practices for infrastructure optimization
Page 10: Getting the most out of Impala - Best practices for infrastructure optimization
Page 11: Getting the most out of Impala - Best practices for infrastructure optimization
Page 12: Getting the most out of Impala - Best practices for infrastructure optimization
Page 13: Getting the most out of Impala - Best practices for infrastructure optimization
Page 14: Getting the most out of Impala - Best practices for infrastructure optimization
Page 15: Getting the most out of Impala - Best practices for infrastructure optimization
Page 16: Getting the most out of Impala - Best practices for infrastructure optimization
Page 17: Getting the most out of Impala - Best practices for infrastructure optimization
Page 18: Getting the most out of Impala - Best practices for infrastructure optimization
Page 19: Getting the most out of Impala - Best practices for infrastructure optimization
Page 20: Getting the most out of Impala - Best practices for infrastructure optimization
Page 21: Getting the most out of Impala - Best practices for infrastructure optimization
Page 22: Getting the most out of Impala - Best practices for infrastructure optimization