Big Data Infrastructure for Scientific Computing

Mathijs Kattenberg – mathijs.kattenberg@surfsara.nl

Big Data Landscape

Large Hadron Collider:
- Uses: Grid
- Volume: ~15 PB per year (~4 PB @ SURFsara)
- Type of data: structured

Next Generation Sequencing (GoNL):
- Uses: Grid, Cloud, Cluster
- Volume: ~100 GB to 300 TB
- Type of data: various formats and noise

Information retrieval and NLP:
- Uses: Hadoop, Cloud
- Volume: ~70 TB
- Type of data: text, unstructured

http://bit.ly/173ddfz
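To get a feel for these volumes, the following back-of-envelope sketch converts them into data rates and read times. It is an illustration only: decimal units (1 PB = 10^15 bytes), the 100 MB/s per-disk throughput, the disk counts, and the perfect-parallelism assumption are not taken from the slides.

```python
# Back-of-envelope arithmetic for the volumes listed above.
# Assumptions (not from the slides): decimal units, 100 MB/s sequential
# read speed per disk, and perfectly parallel reads across disks.

SECONDS_PER_YEAR = 365 * 24 * 3600
PB = 10**15
TB = 10**12
MB = 10**6


def average_rate_mb_per_s(bytes_per_year):
    """Average sustained rate needed to move a yearly volume, in MB/s."""
    return bytes_per_year / SECONDS_PER_YEAR / MB


def scan_hours(total_bytes, disk_mb_per_s, disks=1):
    """Hours needed to read total_bytes once from `disks` disks in parallel."""
    return total_bytes / (disk_mb_per_s * MB * disks) / 3600


lhc_rate = average_rate_mb_per_s(15 * PB)           # ~15 PB per year
one_disk = scan_hours(70 * TB, 100)                 # ~70 TB text corpus, 1 disk
hundred_disks = scan_hours(70 * TB, 100, disks=100)

print(f"LHC average rate: {lhc_rate:.0f} MB/s")
print(f"70 TB on one disk: {one_disk:.0f} hours")
print(f"70 TB on 100 disks: {hundred_disks:.1f} hours")
```

Under these assumptions a single pass over the 70 TB corpus takes about eight days on one disk and about two hours on a hundred, which is the basic argument for spreading data over many machines.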

Effectiveness of Data

Where having and exploiting data leads to insights:
- Brainscanr
- Healthmap

(Open) Data Sources

• Lots of open data:
- Open data Nederland
- CitySDK
- Community of Amsterdam
- Rijkswaterstaat
- Twitter
- Facebook
- Google

• Different formats:
- Excel files
- JSON
- Web services

• Different quality:
- Noise
- Missing values
- Availability
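Mixed formats and missing values usually mean a normalization step before any analysis. The sketch below is a minimal illustration using only the Python standard library; the sample data, the field names (`station`, `level_cm`) and the list of missing-value markers are hypothetical and not taken from any of the sources listed.

```python
import csv
import io
import json

# Hypothetical sample data: the same kind of measurement in two formats.
CSV_TEXT = "station,level_cm\nA,120\nB,\nC,n/a\n"
JSON_TEXT = '[{"station": "D", "level_cm": 95}, {"station": "E", "level_cm": null}]'

# Assumed spellings of "no value"; real sources will have their own.
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def parse_level(raw):
    """Return the value as a float, or None when missing or unparseable."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    text = str(raw).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def load_records(csv_text, json_text):
    """Read both formats into one list of uniform dictionaries."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows += json.loads(json_text)
    return [
        {"station": r["station"], "level_cm": parse_level(r.get("level_cm"))}
        for r in rows
    ]


records = load_records(CSV_TEXT, JSON_TEXT)
complete = [r for r in records if r["level_cm"] is not None]
print(f"{len(complete)} of {len(records)} records usable")
```

Marking missing values explicitly as `None`, rather than dropping rows while parsing, keeps the decision about how to handle gaps with the analysis that follows.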

Computing Big Data

Capacity:

• CPU cores
• Hard drive space
• Network bandwidth

Solutions:

• Scale up: get faster tools
• Scale out: work with more tools

Complexity:

• Data:
- Noise, missing data
- Formats
- Access

• Distributed computing:
- Failures
- Parallel programming

Solutions:

• Data: deal with it

• Distributed computing:
- Super/Cluster computer
- Grid
- Hadoop
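The scale-out idea behind Hadoop can be sketched as a map step that runs independently on each partition of the data, followed by a reduce step that merges the partial results. The example below is a single-machine illustration in plain Python, not Hadoop code: threads stand in for cluster nodes, and the sample documents and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample input; on a cluster this would be blocks of a large file.
DOCUMENTS = [
    "big data needs big infrastructure",
    "hadoop processes big data",
    "grid and cloud process data",
]


def map_partition(lines):
    """Map step: count words within one partition, independent of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=3):
    """Split the input into `workers` partitions and count them concurrently."""
    partitions = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


totals = word_count(DOCUMENTS)
print(totals["big"], totals["data"], totals["hadoop"])
```

Because each partition is processed without reference to the others, a failed partition can simply be rerun. Frameworks such as Hadoop rely on this property to handle node failures.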

What SURFsara Offers

SURFsara provides:

1. Infrastructure: supercomputer, clusters, grid, cloud, Hadoop
2. Support: development, parallelization, consultancy
3. R&D: piloting new technologies
4. Hosting datasets for common use

www.surfsara.nl

Mathijs Kattenberg – mathijs.kattenberg@surfsara.nl


What kind of technologies would you consider using in order to deal with technical Big Data challenges?

