Data Engineering Tools & Best Practices

Sriram Baskaran, Insight

Best Practices Data Engineering Tools & Insight Sriram ...bytes.usc.edu/cs585/f19_AGI1ml04Us/lectures/Guest/DEToolsEtc.pdf · Learn by collaborating, know all different ways a data

Sriram Baskaran

Program Director, Data Engineer

● Bachelors in CS (Grad 2013)
● Machine Learning Engineer (2013-2016)
● Masters in CS (Data Science) (Grad 2018)
● Insight (2018)

linkedin.com/[email protected]

apply.insightdatascience.com

Some context

Let’s take an example

(Diagram: App and Backend)

Restaurants

id   rest_name      loc
1    Everest Momo   Sunnyvale
2    Cafe Centro    San Francisco
...  ...            ...

Customers

id   user_name  user_base_loc
101  James      San Jose
102  Mark       San Francisco
...  ...        ...

Why Relational?

● Rows of my tables are accessed together.
  ○ Single row, all columns
  ○ Traditional relational databases follow this pattern: Postgres, MySQL, Oracle
  ○ A huge amount of planning is required to design good schemas!
    ■ Little flexibility for schema changes

Restaurants

id   rest_name      loc
1    Everest Momo   Sunnyvale
2    Cafe Centro    San Francisco
...  ...            ...

Customers

id   user_name  user_base_loc
101  James      San Jose
102  Mark       San Francisco
...  ...        ...

Reviews

id    cust_id  rest_id  rating
1001  101      1        3
1002  102      1        5
...   ...      ...      ...
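As a rough illustration, the three tables could be declared like this. It is a minimal sketch that uses an in-memory SQLite database as a stand-in for the backend (the slides name Postgres, MySQL, and Oracle); the column types and foreign keys are assumptions, since the slides only show column names and sample rows.

```python
import sqlite3

# In-memory SQLite stands in for the backend database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY, rest_name TEXT, loc TEXT);
CREATE TABLE customers   (id INTEGER PRIMARY KEY, user_name TEXT, user_base_loc TEXT);
CREATE TABLE reviews (
    id      INTEGER PRIMARY KEY,
    cust_id INTEGER REFERENCES customers(id),
    rest_id INTEGER REFERENCES restaurants(id),
    rating  INTEGER
);
INSERT INTO restaurants VALUES (1, 'Everest Momo', 'Sunnyvale'), (2, 'Cafe Centro', 'San Francisco');
INSERT INTO customers   VALUES (101, 'James', 'San Jose'), (102, 'Mark', 'San Francisco');
INSERT INTO reviews     VALUES (1001, 101, 1, 3), (1002, 102, 1, 5);
""")

# Typical transactional access: fetch one row with all of its columns.
row = conn.execute("SELECT * FROM restaurants WHERE id = ?", (1,)).fetchone()
print(row)
```

The lookup by primary key is the "single row, all columns" pattern the slide describes.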

Backend Databases

● Mostly Relational: Postgres, MySQL are popular.
● Based on Relational Algebra and Codd’s model! It’s important to know this!
● Things to know: SQL, ER modeling.
  ○ Crow’s foot notation
● Most of the data for your data pipelines starts here
  ○ It is important to understand backend databases.
● Binary formats such as images are stored separately
  ○ Caching and Content Delivery Networks

Data Engineering starts here

Data engineering

● Extensions and Analytics on Backend databases.
● Building pipelines to move data from A to B.
● Ingest and store data in efficient storage systems.
● Ability to handle large scale data processing.
● Automating a large part of ETL work

Agenda

Storing / Ingesting Data

Processing Data

Visualizing Data

Scheduling and Monitoring!

Agenda - focus

Storing / Ingesting Data

Processing Data

Visualizing Data

Scheduling and Monitoring!

Storing Data

● Database and storage systems are the most underrated tools.
● Processing hinges on good storage of data
● Good storage removes additional transformations in the processing stage.

(Figure: the Restaurants, Customers, and Reviews tables from the example above)

Normalized: Restaurants, Customers, Ratings

Joins happen every time.
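To make the cost concrete, here is a minimal sketch of an analytical question asked against the normalized tables. It uses in-memory SQLite and the sample rows from the slides; the query itself (average rating per restaurant) is an illustrative assumption, not something the slides specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (id INTEGER PRIMARY KEY, rest_name TEXT, loc TEXT);
CREATE TABLE reviews (id INTEGER PRIMARY KEY, cust_id INTEGER, rest_id INTEGER, rating INTEGER);
INSERT INTO restaurants VALUES (1, 'Everest Momo', 'Sunnyvale'), (2, 'Cafe Centro', 'San Francisco');
INSERT INTO reviews VALUES (1001, 101, 1, 3), (1002, 102, 1, 5);
""")

# Any question that needs a restaurant's name next to its ratings
# has to join reviews back to restaurants.
avg_by_restaurant = conn.execute("""
    SELECT r.rest_name, AVG(v.rating)
    FROM reviews AS v
    JOIN restaurants AS r ON r.id = v.rest_id
    GROUP BY r.rest_name
""").fetchall()
print(avg_by_restaurant)
```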

Denormalized: All Data

Star Schema (but prod is not optimized; let’s fix that in some time)

Joins don’t happen here
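For contrast, a denormalized layout might look like the sketch below. The table name `reviews_wide` and its column names are hypothetical; the rows are the sample reviews from the slides with the restaurant and customer attributes copied into each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One wide table: every review row carries its customer and restaurant attributes.
CREATE TABLE reviews_wide (
    review_id     INTEGER PRIMARY KEY,
    user_name     TEXT,
    user_base_loc TEXT,
    rest_name     TEXT,
    rest_loc      TEXT,
    rating        INTEGER
);
INSERT INTO reviews_wide VALUES
    (1001, 'James', 'San Jose',      'Everest Momo', 'Sunnyvale', 3),
    (1002, 'Mark',  'San Francisco', 'Everest Momo', 'Sunnyvale', 5);
""")

# The same kind of aggregate now needs no join at all.
avg_by_loc = conn.execute(
    "SELECT rest_loc, AVG(rating) FROM reviews_wide GROUP BY rest_loc"
).fetchall()
print(avg_by_loc)
```

The trade-off is duplicated data: the restaurant's name and location are repeated in every review row.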

Denormalized: All Data

Load on the production database.

Joins don’t happen here

Build a warehouse that is independent of your prod database

(Diagram: Transactional Database and Analytical Database, connected by some way to sync)

What are our options?

● You will come across
  ○ Postgres
  ○ MySQL
  ○ Oracle
  ○ Druid
  ○ Redshift
  ○ Elasticsearch
  ○ Cassandra
  ○ Memcached
  ○ Redis
  ○ Dynamo
  ○ Couchbase
  ○ Flat-files (S3)

Pick a database after knowing the access patterns

Analytical in Relational

● OLAP is pretty powerful.
  ○ Use of ROLLUP and CUBE operations
  ○ Star Schema and Snowflake schema are pretty nice.
  ○ Examples: Postgres, Oracle, SQL Server, MySQL
● Good, but it will not scale well, mainly due to the way the data is stored.
● Schema is rigid, so changes are very hard.
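To show what ROLLUP computes, the sketch below emulates `GROUP BY ROLLUP (loc, rest_name)` with a review count in plain Python: one total per (loc, rest_name), one subtotal per loc, and one grand total. The function name `rollup_counts` is hypothetical, and the third input row is made up for illustration. CUBE would additionally produce totals for every other subset of the dimensions.

```python
from collections import defaultdict

def rollup_counts(rows):
    """Count rows for every prefix of the dimensions, like GROUP BY ROLLUP.

    Rolled-up dimensions are reported as None, mirroring the NULLs SQL returns.
    """
    totals = defaultdict(int)
    for dims in rows:
        n = len(dims)
        for keep in range(n, -1, -1):
            key = tuple(dims[:keep]) + (None,) * (n - keep)
            totals[key] += 1
    return dict(totals)

# One (loc, rest_name) pair per review. The last row is made up.
reviews = [
    ("Sunnyvale", "Everest Momo"),
    ("Sunnyvale", "Everest Momo"),
    ("San Francisco", "Cafe Centro"),
]
counts = rollup_counts(reviews)
for key, count in counts.items():
    print(key, count)
```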

Groupings and Aggregations

● Columnar
  ○ Druid
  ○ Redshift

(Figure: the Restaurants, Customers, and Reviews tables from the example above)
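A toy sketch of why a columnar layout suits groupings and aggregations: the same Reviews rows are stored once row by row and once column by column, and the aggregate touches only the one column it needs. This is an in-memory illustration, not how Druid or Redshift are implemented; the helper name `to_columns` is hypothetical.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"id": 1001, "cust_id": 101, "rest_id": 1, "rating": 3},
    {"id": 1002, "cust_id": 102, "rest_id": 1, "rating": 5},
]

def to_columns(records):
    """Pivot a list of records into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}

# Column-oriented layout: all values of a field are stored together.
columns = to_columns(rows)

# An aggregate over ratings reads a single contiguous list
# and never touches id, cust_id, or rest_id.
ratings = columns["rating"]
avg_rating = sum(ratings) / len(ratings)
print(avg_rating)
```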

Search through unstructured text

● LIKE with % wildcards in SQL is not efficient.
  ○ SELECT * FROM reviews WHERE review_text LIKE '%great%'
  ○ SELECT * FROM reviews WHERE review_text LIKE 'Loved%'
● Indexing of unstructured text needs to be really good
  ○ Elasticsearch
  ○ Solr
● E.g., searching the text in the review
● Each tool uses a data structure called a “postings list”, which makes search faster.
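The postings-list idea can be sketched in a few lines: map each token to the sorted list of review ids that contain it, then answer a query by intersecting those lists instead of scanning every review. This is a toy version, not how Elasticsearch or Solr are implemented; the function names and the review texts are made up for illustration.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

def search(index, *terms):
    """Return ids of documents that contain every term."""
    lists = [set(index.get(term.lower(), ())) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []

# Hypothetical review texts keyed by review id.
reviews = {
    1001: "Loved the momos, great service",
    1002: "Great food but slow",
    1003: "Not great, not terrible",
}
index = build_index(reviews)
print(search(index, "great"))
print(search(index, "great", "loved"))
```

A lookup costs roughly one dictionary access per term plus the intersection, regardless of how long each review is.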

Caching

● Temporary in-memory storage
  ○ Redis
  ○ Memcached
● Optimized for fast storage and retrieval. Key-value store (not a document store)
● Use reasonable keys so the hashing algorithm is not a bottleneck
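A minimal sketch of how such a cache is typically used (often called cache-aside): check the key-value store first and fall back to the database only on a miss. The `TTLCache` class is a tiny in-process stand-in written for this example, not the Redis or Memcached API, and `load_restaurant` is a hypothetical slow database lookup.

```python
import time

class TTLCache:
    """Tiny in-memory key-value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # Entry is stale: drop it and report a miss.
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

db_calls = []

def load_restaurant(rest_id):
    """Hypothetical slow database lookup; records each call so misses are visible."""
    db_calls.append(rest_id)
    return {"id": rest_id, "rest_name": "Everest Momo", "loc": "Sunnyvale"}

def get_restaurant(cache, rest_id):
    key = f"restaurant:{rest_id}"  # Short, predictable key.
    value = cache.get(key)
    if value is None:
        value = load_restaurant(rest_id)
        cache.set(key, value)
    return value

cache = TTLCache(ttl_seconds=60)
first = get_restaurant(cache, 1)   # Miss: goes to the database.
second = get_restaurant(cache, 1)  # Hit: served from memory.
print(len(db_calls))
```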

How to pick one?

● Make educated & reasonable assumptions
  ○ Type of Data
  ○ Access Patterns
  ○ Scaling factor (Most databases are designed to scale in their “domain”)
● Read a lot, and never stop reading.
● Use it in a project
  ○ There are hundreds of open large datasets available.
  ○ Start with GDELT (https://www.gdeltproject.org/data.html)

Complexities of communication

● The more tools you have, the more difficult it is to communicate between them
● Keeping databases in sync is one of the main challenges in the industry.
● Kafka may be a solution
  ○ Act as a message bus
  ○ Use Kafka Connect to bridge

Remember our Denormalized issue?

Denormalized: All Data

Star Schema (but prod is not optimized; let’s fix that in some time)

Joins don’t happen here

(Diagram: App and Backend)

Agenda - for completion

Storing / Ingesting Data

Processing Data

Visualizing Data

Scheduling and Monitoring!

We are talking about scale!

● Tackling two problems: Time and Space
  ○ Data size is greater than the size of your “main-memory”
  ○ Data cannot fit entirely.
  ○ It takes too long to compute
● Distributed computing is a popular solution
  ○ Hadoop, Spark, Presto, Hive
  ○ Kafka is gaining popularity in processing too
● Example: Scrape menu items for each restaurant
  ○ Go to each restaurant’s website
  ○ Scrape it
  ○ Parse the website
  ○ Find the menu content and process it.

Yelp - update menu items

1. Get URL
2. Get actual content from the internet
3. Process text and store results

(Diagram: Yelp’s Database, Postgres)

Yelp - update menu items - 1 million urls!

1. Custom way to get URLs
2. Each script accesses the content separately
3. Each script processes text and stores results

(Diagram: Yelp’s Database)
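The fan-out idea can be sketched on a single machine with a thread pool: each worker takes a URL, fetches it, parses it, and returns the result. This is only an illustration of splitting the work, not a distributed system such as Spark. `fetch` is a stand-in that returns fake HTML so the sketch runs offline, `parse_menu` is a toy parser, and the example.com URLs are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for an HTTP GET; returns fake HTML so the sketch runs offline."""
    return f"<html><ul class='menu'><li>item from {url}</li></ul></html>"

def parse_menu(html):
    """Toy parser: return the text found between <li> and </li> tags."""
    return [part.split("</li>")[0] for part in html.split("<li>")[1:]]

def update_menu(url):
    """Steps 2 and 3 for one URL: get the content, then process it."""
    return url, parse_menu(fetch(url))

def scrape_all(urls, workers=8):
    """Run update_menu over all URLs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(update_menu, urls))

urls = [f"https://example.com/restaurant/{i}" for i in range(100)]
menus = scrape_all(urls)
print(len(menus))
```

With a million URLs the same shape applies, but the URL list and the workers would be spread across machines rather than threads.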

ML Training at Scale

● Use distributed computing to scale your training.
● Compute weights in a fast and efficient manner.
  ○ Sparkling Water wrapper: https://github.com/h2oai/sparkling-water
  ○ H2O

What about Speed/Velocity?

● Data can be an unbounded stream of information
● Example: processing reviews for each restaurant, doing POS tagging.

(Diagram: a stream of reviews "..., r50, r52, r53, ..."; the Reviews table; Batch Processing; POS Tagging Model)

What about Speed/Velocity?

● Data can be an unbounded stream of information
● Need a robust system
● Example: Processing reviews

(Diagram: a stream of reviews "..., r50, r52, r53, ..."; Spark Streaming (Micro-batches); the Reviews table; POS Tagging Model)
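The micro-batch idea can be sketched without Spark: cut the incoming stream into small batches and hand each batch to the model. This toy version batches by count, whereas Spark Streaming batches by time interval. `tag_batch` is a stand-in for the POS tagging model that only counts words, and the review ids and text are made up.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield lists of up to batch_size items until the stream is exhausted."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def tag_batch(batch):
    """Stand-in for the POS tagging model: counts the words in each review."""
    return [(review_id, len(text.split())) for review_id, text in batch]

# A finite generator plays the part of the unbounded review stream.
stream = ((f"r{i}", "great momos here") for i in range(50, 55))
results = [tag_batch(batch) for batch in micro_batches(stream, batch_size=2)]
print(results)
```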

Agenda - for completion

Storing / Ingesting Data

Processing Data

Visualizing Data

Scheduling and Monitoring!

Visualize the output data

● It’s like building a software application
  ○ Consider end-users
  ○ What is the most intuitive way to see this information?
● Your professor would have given even better examples
● Do not reinvent the wheel
  ○ Tableau (education edition)
  ○ Kibana (Self-setup)
  ○ Mode (Paid)
  ○ Looker (Paid)
  ○ Plotly (open source, free)
  ○ Dash (abstraction around Plotly, free)
  ○ MATLAB (not used much in industry)

If you are not able to show it in a good way, there was no need to process it!

Agenda - for completion

Storing / Ingesting Data

Processing Data

Visualizing Data

Scheduling and Monitoring!

Putting together a pipeline

(Diagram, built up over several slides: App, Backend, Transactional database, Event Store, Spark Streaming (Micro-batches), POS Tagging Model)

How to automate the tasks?

Scheduling & Monitoring

● Scheduling tasks in a sequence
● Easy to specify dependency
● Code based configuration
● Easy to deploy and manage
● Every Batch pipeline needs a scheduler to automate tasks.
● Handling failure
● Also allows backfill.
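A minimal sketch of what "code based configuration" with dependencies can look like, using the standard-library `graphlib` module (Python 3.9+) to order tasks so that each one runs only after the tasks it depends on. This is not the API of any particular scheduler; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (names are hypothetical).
dag = {
    "extract_reviews": set(),
    "extract_restaurants": set(),
    "join_and_denormalize": {"extract_reviews", "extract_restaurants"},
    "load_warehouse": {"join_and_denormalize"},
    "refresh_dashboard": {"load_warehouse"},
}

def run_pipeline(dag, run_task):
    """Run every task after all of its dependencies and return the order used.

    If run_task raises, the loop stops, so downstream tasks never run.
    """
    order = []
    for task in TopologicalSorter(dag).static_order():
        run_task(task)
        order.append(task)
    return order

executed = run_pipeline(dag, run_task=lambda name: None)
print(executed)
```

A real scheduler adds the pieces the slide lists on top of this ordering: time-based triggers, retries on failure, and backfill.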

Backfill

(Diagram: events in time with a gap marked "??"; the gap is then filled in by a backfill)
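Backfill can be sketched as: find the days in a range for which the pipeline has no completed run, then rerun the job for exactly those days. The function names and the dates below are made up for illustration.

```python
from datetime import date, timedelta

def missing_days(start, end, completed):
    """Return every day from start to end (inclusive) that has no completed run."""
    missing = []
    day = start
    while day <= end:
        if day not in completed:
            missing.append(day)
        day += timedelta(days=1)
    return missing

def backfill(start, end, completed, run_for_day):
    """Rerun the job for each missing day, oldest first, and mark it completed."""
    for day in missing_days(start, end, completed):
        run_for_day(day)
        completed.add(day)

# Hypothetical run history with a two-day gap.
completed = {date(2019, 11, 1), date(2019, 11, 2), date(2019, 11, 5)}
reran = []
backfill(date(2019, 11, 1), date(2019, 11, 5), completed, reran.append)
print(reran)
```

This only works cleanly if the job for a given day is idempotent, so that rerunning it does not duplicate data.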

Think ahead, Think smart

● Get all data into one place (know about data warehousing)
● Understand the why behind any tool choices
● Expect future requests from stakeholders
● Learn by collaborating; know all the different ways data can be stored, processed and visualized.
● Constantly learn; know the latest updates in a tool
  ○ Start with the basics of why the tool was built
● Learn these five: Kafka, Spark, Cassandra, Postgres (PostGIS), Redshift
● Managed: Lambdas, Redshift, Dynamo, S3

Start using cloud resources

● Students get $300 in credits both in AWS and GCP. Start using them.
● Spin up compute resources
● Try out labs for managed services.
● AWS for Students
  ○ AWS Lambdas
  ○ AWS Redshift
  ○ AWS Dynamo

Insight

Insight Offerings - Which one to pick

Data Science Program

● PhD in quantitative fields.

● Have worked in analysing data.

● Good problem solving skills

Data Engineering Program

● Engineering background.

● Worked on building and maintaining engineering systems.

● Java/Python

Health Data Science Program

● Postdoctoral researcher, medical doctors

● Interested in genome sequences, clinical trials.

Artificial Intelligence Program

● Engineering background.

● Have worked on training and deploying ML or NN.

DevOps Engineering Program

● Systems admin and Linux background.

● Problem solver, critical thinker.

● Can understand containerized systems.

New Programs - More focused domains

● Designing security measures

● Building secure applications.

● Blockchain technology

● Smart contract management

● Decentralized architectures

Where are we?

Seattle, Portland, San Francisco, Los Angeles, Austin, Chicago, New York, Boston, Toronto

(Map legend: In Person, Remote)

Apply to Insight

● 3 sessions a year
● Apply when you are ready for full-time
● Prepare a role-driven resume
● Read our blog posts
● Contact alumni
● Application process:
  ○ Resume + Application Form
  ○ Interview

Note: Data Engineering program has a Coding challenge before the interview.

Applications open for June 2020 Session!

apply.insightdatascience.com

Sign up for the notifications list