Build next generation apps with eyes and ears using Google Chrome

Presented at Google Extended I/O Event at Ahmedabad Java Meetup Group

For Ahmedabad Java Meetup Group (300+ members strong now!)

Big Data Workshop – An introduction and workshop launch session

10th May, 2014

Dhruv Gohil, from Ishi Systems

Welcome!

● Why a workshop and not a presentation?

● What should you do in the workshop?

● What is expected from you in this session?

● What should you expect from this session?

● What are the upcoming sessions going to be like?

Seems too serious?

Now, This is much better!

So, let's change the font!

OK... So what are we gonna do today?

➔ Workshop setup and series introduction

➔ Already done! (See, it's easy!)

➔ Big is not only ‘big’.

➔ Why do we need 'Big data'?

➔ What is 'Big data' NOT?

➔ Fear of Big data? Kick it off!

Let me tell you a story..

http://en.wikipedia.org/wiki/Information_Management_System

If you still think about 'Entities' and 'Tables'

Everything you have been taught in college about databases is ALL WRONG.

http://slideshot.epfl.ch/play/suri_stonebraker

Big Data is...

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Big Data is not only ‘big’

Volume, Velocity, Variety

GB/TB vs PB/EB

Centralized vs Distributed

Structured vs Semi-Structured/Unstructured

Data Model vs Schema

Known relationships vs Flexible associations
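
The 'Structured vs Semi-Structured' and 'Data Model vs Schema' contrasts above can be pictured with a small sketch. The field names, sample records, and helper functions are hypothetical and exist only for illustration: a fixed schema rejects rows that do not fit, while a schema-on-read store keeps whatever arrives and interprets it at query time.

```python
import json

# Hypothetical fixed schema: every row must have exactly these columns.
FIXED_COLUMNS = ("user_id", "name", "city")


def insert_structured(table, row):
    """Schema-on-write: reject any row that does not match the fixed columns."""
    if set(row) != set(FIXED_COLUMNS):
        raise ValueError(f"row does not match schema: {sorted(row)}")
    table.append(tuple(row[col] for col in FIXED_COLUMNS))


def insert_semi_structured(store, raw_json):
    """Schema-on-read: keep the document as it arrived; interpret it later."""
    store.append(json.loads(raw_json))


def cities(store):
    """Read-time interpretation: tolerate documents that lack a 'city' field."""
    return [doc.get("city", "unknown") for doc in store]


table, store = [], []
insert_structured(table, {"user_id": 1, "name": "Asha", "city": "Ahmedabad"})
insert_semi_structured(store, '{"user_id": 1, "name": "Asha", "city": "Ahmedabad"}')
# A sensor reading with a completely different shape is still accepted.
insert_semi_structured(store, '{"user_id": 2, "device": "sensor-7", "readings": [21.5, 22.0]}')
```

The structured table would raise an error for the sensor record, while the semi-structured store accepts it and leaves the question of what the fields mean to the reader of the data.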

What 'Big data' is NOT?

Hadoop exists because of Big data; Big data does not exist because of Hadoop!

What 'Big data' is NOT?

Applying for a job here?

Anything less than Hadoop is as good as an insult!

What 'Big data' is NOT?

Why does Hadoop always come to mind with big data?

What else should we know?

Tools vs Methodologies

Being too futuristic vs. being practical/economical

Big Data in your organization

http://www.fakingnews.firstpost.com/2014/04/transcript-of-rahul-gandhis-interview-for-job-of-a-c-programmer/

We brought RTSC. Right To Source Code.

Now, deal with it.

Big Data in your organization

➢ The cost of tools/software decreases, but the cost of knowledge increases

➢ Being agile is the only way to deal with competition

➢ Are you working with:

✔ Social networking and media

✔ Mobile devices

✔ Internet transactions

✔ Networked devices and sensors

Big Data in your product/service

● Think in terms of access rather than storage

● Design based on when/where data is used, not when/where it is produced

● Accept redundancy, even at the cost of extra storage

● Understand NoSQL = Not Only SQL

✔ Streams

✔ In-memory analytics

✔ Massively parallel processing (data crunching)
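
The first three points above (access over storage, design for where data is used, redundancy in exchange for storage) can be sketched as follows. This is a toy in-memory illustration, not any particular database's API; the class name, method names, and sample orders are all hypothetical. Each write updates one view per read pattern, so the same fact is stored more than once and every read becomes a single lookup.

```python
from collections import defaultdict


class OrderStore:
    """Toy in-memory store that keeps redundant, read-optimized views."""

    def __init__(self):
        # One view per question the application actually asks.
        self.by_customer = defaultdict(list)       # "orders for customer X"
        self.revenue_by_day = defaultdict(float)   # "revenue on day D"

    def record_order(self, customer, day, amount):
        # A single write updates every view; storage is traded for read speed.
        self.by_customer[customer].append((day, amount))
        self.revenue_by_day[day] += amount

    def orders_for(self, customer):
        # One lookup, no scan and no join.
        return list(self.by_customer.get(customer, []))

    def revenue_on(self, day):
        return self.revenue_by_day.get(day, 0.0)


store = OrderStore()
store.record_order("asha", "2014-05-10", 250.0)
store.record_order("ravi", "2014-05-10", 100.0)
store.record_order("asha", "2014-05-11", 75.0)
```

The price of this design is that every new read pattern needs its own view, and all views must be kept in step on each write.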

Big Data in your project

Random research says... ➔ 99% of your clients who asked for a Big Data project ended up with fewer paying customers than you have fingers.

A project hits business scalability limits much, much earlier than technical scalability limits.

Big Data for your clients

➢ Business first - technology second

➢ Current reality for client projects:

✔ Use big data tools that work at small scale :-)

✔ Design with the domain in mind, not the database the client suggests.

➢ Always design with read optimization in mind (the golden rule)

Big Data project for small data customers

If you can do it in PostgreSQL, then do it in PostgreSQL

(the blue elephant rule)

A few important tips...

The CAP theorem: basics of NoSQL databases

Read a lot about the design of a database before using any non-traditional database. Or read good critical posts to learn when NOT to use it.

e.g.: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
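
To make the CAP theorem tip a little more concrete, here is a deliberately simplified sketch of the trade-off during a network partition. It models no real database; the class names and the "CP"/"AP" mode labels are assumptions made for illustration. With two replicas cut off from each other, the store either refuses the write (stays consistent, loses availability) or accepts it on one side (stays available, lets replicas diverge).

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when A and B cannot talk to each other

    def write(self, key, value):
        """The client writes through replica A; returns True if accepted."""
        if not self.partitioned:
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            return False  # refuse the write: consistent but unavailable
        self.a.data[key] = value  # AP: accept locally, replica B is now stale
        return True

    def read_from_b(self, key):
        return self.b.data.get(key)


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for s in (cp, ap):
    s.write("x", 1)        # replicated to both nodes
    s.partitioned = True   # the network splits
cp_ok = cp.write("x", 2)   # rejected
ap_ok = ap.write("x", 2)   # accepted on A only
```

After the partition, a read from replica B returns the old value in both stores; the difference is that the CP store never acknowledged the new write, while the AP store did and must reconcile later.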

Now... the good parts !

It's your time to speak now!

Workshop session:

About practical selection of technology and design for real-world use cases.

All references used in the workshop

➔ Basic Hadoop introductory material: http://www.coreservlets.com/hadoop-tutorial/

➔ Evaluate Hadoop without installation: http://go.cloudera.com/cloudera-live.html

➔ PostgreSQL good parts: http://www.slideshare.net/Aveic/postgresql-34323147

➔ PostgreSQL as a NoSQL key-value store (hstore): http://postgresguide.com/sexy/hstore.html

➔ PostgreSQL for basic Elasticsearch-style functionality (full-text search): http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/

➔ Good big-data-compatible open source software: http://netflix.github.io/

➔ Practical HBase usage: https://www.facebook.com/UsingHbase

➔ Using Cassandra for write-heavy applications: http://www.datastax.com/1-million-writes

➔ Online analytics in Storm: http://hortonworks.com/hadoop/storm/

➔ E-commerce domain-specific use case: http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376

➔ Good use case of selecting a data store based on a proper understanding of the CAP theorem: http://tech-blog.flipkart.net/2013/01/nosql-for-a-user-engagement-platform/

➔ Recommendation engine in Big Data scenarios: http://www.slideshare.net/hava101/recommendations-play-flipkart-14115791

➔ High-volume log processing: http://www.splunk.com/view/product-tour/SP-CAAAAGV

➔ Open source alternatives for log processing: http://logstash.net/ and http://graylog2.org/
