Big Data Infrastructure for Scientific Computing Mathijs Kattenberg – [email protected]


Page 1: Big Data Infrastructure for Scientific Computing

Big Data Infrastructure for Scientific Computing

Mathijs Kattenberg – [email protected]

Page 2: Big Data Infrastructure for Scientific Computing

Big Data Landscape

Large Hadron Collider:
- Uses: Grid
- Volume: ~15 PB per year (~4 PB @ SURFsara)
- Type of data: structured
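To put the yearly volume in perspective, it can be converted to a sustained data rate. The sketch below is only back-of-the-envelope arithmetic; it assumes decimal petabytes (10^15 bytes), a 365-day year, and a constant rate, none of which the slide states. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s needed to move the given yearly volume.

    Assumes decimal units: 1 PB = 1e15 bytes, 1 MB = 1e6 bytes.
    """
    bytes_per_year = petabytes_per_year * 1e15
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


if __name__ == "__main__":
    rate = sustained_rate_mb_per_s(15)
    # Roughly 476 MB/s, i.e. about 3.8 Gbit/s sustained all year.
    print(f"{rate:.0f} MB/s, {rate * 8 / 1000:.1f} Gbit/s")
```

Real traffic is bursty, so peak rates would be higher than this average.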

Page 3: Big Data Infrastructure for Scientific Computing

Big Data Landscape

Next Generation Sequencing (GoNL):
- Uses: Grid, Cloud, Cluster
- Volume: ~100 GB to 300 TB
- Type of data: various formats and noise

Page 4: Big Data Infrastructure for Scientific Computing

Big Data Landscape

Information retrieval and NLP:
- Uses: Hadoop, Cloud
- Volume: ~70 TB
- Type of data: text, unstructured

http://bit.ly/173ddfz

Page 5: Big Data Infrastructure for Scientific Computing

Effectiveness of Data

Where having and exploiting data leads to insights:
- Brainscanr
- Healthmap

Page 6: Big Data Infrastructure for Scientific Computing

(Open) Data Sources

• Lots of open data:
  - Open data Nederland
  - CitySDK
  - Community of Amsterdam
  - Rijkswaterstaat
  - Twitter
  - Facebook
  - Google

• Different formats:
  - Excel files
  - JSON
  - Webservices

• Different quality:
  - Noise
  - Missing values
  - Availability
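The quality issues listed above (noise, missing values) usually have to be handled before any analysis. A minimal sketch, assuming a JSON array of records with a hypothetical `water_level` field; the field name and the choice to drop bad records are illustrative assumptions, not taken from any of the listed sources.

```python
import json
from typing import List, Optional


def parse_level(raw: object) -> Optional[float]:
    """Return the value as a float, or None if it is missing or unparsable."""
    if raw is None or isinstance(raw, bool):
        return None
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def clean_levels(json_text: str, field: str = "water_level") -> List[float]:
    """Extract usable numeric values for `field`, skipping bad records."""
    records = json.loads(json_text)
    values = (parse_level(rec.get(field)) for rec in records)
    return [v for v in values if v is not None]
```

Dropping records is the simplest policy; depending on the analysis, imputing or flagging missing values may be more appropriate.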

Page 7: Big Data Infrastructure for Scientific Computing

Computing Big Data

Capacity:
• CPU cores
• Hard drive space
• Network bandwidth

Solutions:
• Scale up: get faster tools
• Scale out: work with more tools

Complexity:
• Data:
  - Noise, missing data
  - Formats
  - Access
• Distributed computing:
  - Failures
  - Parallel programming

Solutions:
• Data: deal with it
• Distributed computing:
  - Super/Cluster computer
  - Grid
  - Hadoop
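Hadoop's scale-out approach is built on the MapReduce programming model: independent map tasks process chunks of the input, and a reduce step combines their results. The sketch below simulates that model for a word count in a single Python process. It is only an illustration of the idea, not Hadoop's API; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_words(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk is mapped independently; on a cluster these calls
    # could run on different machines. Here they run sequentially.
    emitted = [pair for chunk in chunks for pair in map_words(chunk)]
    return reduce_counts(shuffle(emitted))
```

Because each map call depends only on its own chunk, a failed task can be rerun on another machine without redoing the rest of the work, which is how this model addresses the "failures" item above.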

Page 8: Big Data Infrastructure for Scientific Computing

Computing Big Data

Page 9: Big Data Infrastructure for Scientific Computing

Computing Big Data

Page 10: Big Data Infrastructure for Scientific Computing

What SURFsara Offers

SURFsara provides:

1. Infrastructure: supercomputer, clusters, grid, cloud, Hadoop

2. Support: development, parallelization, consultancy

3. R&D: piloting new technologies

4. Hosting datasets for common use

Page 11: Big Data Infrastructure for Scientific Computing

www.surfsara.nl

Mathijs Kattenberg – [email protected]

Page 12: Big Data Infrastructure for Scientific Computing

www.sendsteps.com

Prepare to react; keep your phone ready!

TXT

1. Text to +316 4250 0030

2. Type Session <space> WS4 <space> your answer

Internet

1. Go to sendc.com

2. Log in with Session

3. Type WS4 <space> your answer

Posting messages is anonymous. No additional charge per message.

Page 13: Big Data Infrastructure for Scientific Computing

What kind of technologies would you consider using in order to deal with technical Big Data challenges?

Internet: go to sendc.com, log in with Session, and type WS4 <space> your answer

TXT: send to 06 4250 0030 and type Session <space> WS4 <space> your answer