Ingesting Click Data for Analytics

Page 1: Ingesting Click Data for Analytics

Ingesting click data for analytics

Francesco Furiani, CTO @

Page 2: $ whoami

Francesco Furiani (@ilfurio):
• Backend Engineer
• Roamed these halls not too long ago

Loves:
• Studying new CS stuff
• PlayStation / Bike / Traveling / Soccer
• O RLY? books

How do I make a living:
• CTO @ ClickMeter
• Backend Engineer @ ClickMeter
• Enum.take_random(IT_ROLES,1) @ ClickMeter

Page 3: ClickMeter

Page 4: ClickMeter

• 100k+ customers
• Receiving events for customers at rates from 10 to 3000 req/sec

Page 5: Getting the data

We receive data anytime someone:
• Clicks our links
• Views our pixels
• Calls our postbacks

Our customers use us:
• Inside a famous app the day of the big release ✔
• Advertising on an extremely big video portal ✔
• A tiny travel blog ✔
• A physical device for advertising ✔

Page 6: The challenge

We need to:
• Try not to lose the events we receive (duh)
• Show customers data for better insight into their campaigns
• Scale up/down according to the incoming traffic
• Improve the product by using the data we get
• Do it as fast as possible (wasn’t this ready a week ago?)
• Do it as cheaply as possible

Page 7: Size

Find the size of the problem you’re trying to solve:
• How much data do you expect? At what rate?
• What do you have to do with it?
• Do you have to do something with ALL of it?
• How fast do you have to do it?

Answers to these questions are a starting point.
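
A back-of-the-envelope sketch of the sizing questions, in Python. The 10 and 3000 req/sec rates are the ones quoted earlier in the deck; the 1 KiB average event size is purely an assumption for illustration, so substitute your own measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(req_per_sec: int, avg_event_bytes: int) -> tuple[int, float]:
    """Return (events per day, GiB per day) for a sustained request rate."""
    events = req_per_sec * SECONDS_PER_DAY
    gib = events * avg_event_bytes / 1024**3
    return events, gib


# Assumption: 1 KiB per event. This number is NOT from the talk.
ASSUMED_EVENT_BYTES = 1024

for rate in (10, 3000):
    events, gib = daily_volume(rate, ASSUMED_EVENT_BYTES)
    print(f"{rate:>5} req/sec -> {events:,} events/day, ~{gib:.1f} GiB/day")
```

Even a rough figure like this helps answer "how much?" and "how fast?" before any brick is chosen. A sustained peak is a worst case, so real daily totals will likely sit somewhere between the two lines printed.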

Page 8: Design

Once we know how big and bad the beast is, we need to design the ranch that will keep it in check.

It’s an iterative process, prone to a lot of failures, but the world is out there to help us.

Think, write and draw a lot.

Page 9: Design

… draw too much …

Page 10: Which bricks should I use

Most of us will never have the joy (and the horror) of creating a new stack, novel in theory and practice.

Still, we need to understand the theory behind every brick.

Read the info, read the opinions, and try little proofs of concept of the moving parts: it helps a lot!

Page 11: The cloud is a brick too

A very important brick.

Elastic computing power, the many *aaS offerings and managed solutions are really a great help in terms of saved manpower and fast iterations.

It comes at a great cost to consider:
• $$$ (ymmv)
• Possible lock-ins

Page 12: Design with bricks

… well it’s never definitive …

Page 13: How we did it

Obviously we haven’t followed those guidelines.

One becomes savvy after crashing and burning many times.

But still thanks to those errors we got there and built, at every iteration, a better infrastructure.

Page 14: How we did it

ClickMeter was already live and growing.

It needed an overhaul of its infrastructure/backend.

The growth fueled the need to be ready for more power to handle more data.

Obviously this had to be a tablecloth-trick migration.

Page 15: How we did it

We were already on the cloud (AWS). We considered a hybrid approach, but it didn’t make sense.

We reviewed the old components already in production to see what to kill, keep or update.

We kept the good stuff and designed some new layers to make it work flawlessly in the new infrastructure.

Page 16: Ingesting Click Data for Analytics

Page 17: Redirect engine, aka events collector

Pretty important. They need to:
• Stay up
• Scale up/down depending on the incoming traffic
• Never lose anything
• Be as fast as possible in processing

They’re a custom web application that undergoes a lot of testing.

We used stuff like Beanstalk, Scaling groups, Load Balancers and Health routing offered by our cloud provider to manage the web app’s scaling and availability.
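
One way to read "never lose anything" together with "be as fast as possible" is: record the event on a queue first, answer with the redirect immediately, and leave the heavy processing to later stages. The Python sketch below illustrates that idea only. The in-memory `queue.Queue` is a stand-in for a durable managed queue, and `ClickEvent`, `LINKS` and `handle_click` are hypothetical names, not ClickMeter's actual code.

```python
import json
import queue
import time
from dataclasses import asdict, dataclass


@dataclass
class ClickEvent:
    link_id: str
    ip: str
    user_agent: str
    received_at: float


# Stand-in for a durable queue; an in-memory queue is NOT durable.
event_queue: "queue.Queue[str]" = queue.Queue()

# Hypothetical link table: tracking link id -> destination URL.
LINKS = {"abc123": "https://example.com/landing"}


def handle_click(link_id: str, ip: str, user_agent: str):
    """Enqueue the event, then return (HTTP status, redirect target)."""
    target = LINKS.get(link_id)
    if target is None:
        return 404, None
    event = ClickEvent(link_id, ip, user_agent, time.time())
    # Enqueue before replying, so a reply is never sent for an unrecorded click.
    event_queue.put(json.dumps(asdict(event)))
    return 302, target
```

Calling `handle_click("abc123", "203.0.113.7", "some-agent")` returns the redirect and leaves one serialized event on the queue for the pipeline to pick up.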

Page 18: Tracking engine and friends

Pipeline

Most of this part uses our cloud provider’s technology.

This simplifies maintenance and provisioning, keeping the focus on the value of our product.

Some moving parts are custom-made by us to interact with the cloud technology (which might be proprietary or just a repackaged version of a known one).

Page 19: Tracking engine and friends

SQS Pipeline

Kinesis

• Events
• Preprocessing
• Postprocessing
• DynamoDB
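
The slide only names the components, so how they are wired together is not stated. As a rough illustration of an events, preprocessing, postprocessing and storage flow, here is a Python sketch with in-memory stand-ins: a `deque` in place of the queue/stream and a `dict` in place of the key-value table. The validation and aggregation rules are invented for the example.

```python
from collections import deque

raw_events = deque()  # stand-in for the incoming queue/stream
store = {}            # stand-in for the key-value table


def preprocess(event):
    """Drop malformed events and normalise fields (hypothetical rules)."""
    if "link_id" not in event:
        return None
    country = event.get("country", "unknown").upper()
    return {"link_id": event["link_id"], "country": country}


def postprocess(event):
    """Aggregate a clean event into a per-link, per-country counter."""
    key = (event["link_id"], event["country"])
    store[key] = store.get(key, 0) + 1


def drain():
    """Run every queued event through both stages; return how many were kept."""
    processed = 0
    while raw_events:
        clean = preprocess(raw_events.popleft())
        if clean is not None:
            postprocess(clean)
            processed += 1
    return processed
```

Keeping each stage a small function with a queue in front of it is what lets the stages be scaled, replaced or replayed independently.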

Page 20: Pipeline

A combination of real-time and batch technologies.

One of the scaling parts that actually provides value to the customers.

Computes analyses on event data, from a simple count to some predictions.

Check the data produced by your processing system to improve the pipeline step-by-step!
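
A minimal Python sketch of the "simple count" case, computed two ways: incrementally as events arrive (real-time) and by recounting the full log (batch). Comparing the two is one possible way to "check the data produced by your processing system". `RealTimeCounter`, `batch_count` and `find_drift` are hypothetical names used only for this illustration.

```python
from collections import Counter


class RealTimeCounter:
    """Updates per-link click counts one event at a time."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["link_id"]] += 1


def batch_count(event_log):
    """Recomputes per-link click counts from the complete event log."""
    return Counter(e["link_id"] for e in event_log)


def find_drift(realtime, batch):
    """Return {link_id: (realtime_count, batch_count)} where the two disagree."""
    keys = set(realtime) | set(batch)
    return {k: (realtime[k], batch[k]) for k in keys if realtime[k] != batch[k]}
```

An empty result from `find_drift` means the fast path and the batch path agree. A non-empty one points at the links where the real-time path dropped or double-counted events.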

Page 21: Pipeline

Page 22: Storage and data delivery

We employ different storage systems depending on the speed of delivery and the data type.

All the data is accessible via a REST API.

This lets us develop a frontend layer with relative ease and allows customers to take control of the data and use it in ways we may not have considered.

Page 23: Operations

Managed services on the cloud help us a lot!

Most of the team can focus on improvements and shipping (users are happy, so is the CEO).

Some of us (me) still have to be the CloudOp/DevOp.

P.S.: always prepare a Plan B for when you break things!

Page 24: Cloud co$t$

The cloud is typically more expensive than your own metal.

This extra money you have to spend is actually well spent:
• Flexibility
• Easier provisioning
• Easier management
• Easier operations

There are different types of clouds, so choose wisely.

Page 25: Conclusions

Creating and managing a “big data”-ready infrastructure is no easy task, but it can be done step by step, even by startups.

The cloud is a cool starting ground providing you with many of the toys you need, so you can focus on which part of “big data” gives you value!

Use the wisdom shared by the big/medium players that have already been there (and built most of the stuff you’re using).

Page 26: Thank You

Any questions?

@il_furio

[email protected]