12
nci.org.au © National Computational Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015 nci.org.au @NCInews Perspec’ves on Big Data in the Geosciences from a major Australian na’onal data center Lesley Wyborn

Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au @NCInews

Perspec'vesonBigDataintheGeosciencesfromamajorAustralianna'onaldatacenter

LesleyWyborn

Page 2: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

TheDataTsunamihasnotyetlanded

Page 3: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

The43PBontheResearchDataStorageInfrastructure

Source:hFps://www.rds.edu.au/

•  Our~43Petabyteshassaveddataon8nodesthatforthemostpart,alreadyexisted

•  Wehavenotconsiderednewini'a'vessuchastheSKA,Copernicus,Himawarrietc.

•  Wearenotusingourcurrentdatatoitsfullestresolu'on

© National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

10PBytes

Page 4: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

ISO/IEC10746RM-ODPofNCICollec'onswithEOData

No. Collec0on Enterprise Technology Informa0on Alloca0on(TB)

Ingested(TB)

Capacity

u39 MODISOceanColour

IMOS THREDDS NetCDF-CF 428 324 76.0%

AusCover TERN THREDDS NetCDF-CF

MODISL1B IMOS/TERN THREDDS NetCDF-CF

AVHRR,VIIRS CSIRO THREDDS NetCDF-CF

_4 SoilMoisture CSIRO THREDDS NetCDF-CF 5 3 54.6%

_2 eMastdataassimila'on

TERN THREDDS

NetCDF-CF 110 13 11.8%

rr9 eMast TERN THREDDS NetCDF-CF 90 28 31.0%

ua6 CMIP5 BoM THREDDS NetCDF-CF 2400 1488 62.0%

rr5 Observa'on(Himawarri)

BoM THREDDS NetCDF-CF 360 240 65.5%

rs0 Landsat GA GeoServer GeoTIFF 1478 1413 95.6%

c4 WOFS GA GeoServer GeoTIFF 22 16 72.8%

Page 5: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

Thebonsaieffect:degradingourdata

Page 6: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

Source:hFp://www.marine.csiro.au/eez_data/doc/comp_bath.gif

Marineexampleofbonsai’eddata

Page 7: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

GridsofGridsvsGridsofActualObserva'ons

AustralianGeophysicsDataCollec'on

Survey2 Survey3 Survey5

Collec'on

Theme

Survey Survey6

Gravity Mag Radio-metrics AEM Seismic MT

Survey4Survey1 Surveyn

Seis-mology

Grids

Lines

Points

GravityNa'onalGrids

Mag Radiometrics

20m

40m

30m

80m

10m

50m

20m

40m

10m

50m

Page 8: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

Version Year Grid cell size

Data file size

3 1999 400m 0.49 GB

4 2004 250m 0.94 GB

5 2010 80m 9.73 GB

6 2013 (?) <80m 3 TB

20042010

(Slide courtesy of Murray Richardson )

Resolu'onimpactsonfilesize:e.g.Magne'cs

1999

Page 9: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

TheDataAggrega'onProbleminClimateScienceResearch

3rd assessment 2001

4th assessment 2007

5th assessment 2013

6th assessment 2020

Slide Courtesy of Andy Pitman�COE Climate System Science

Page 10: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

NumericalWeatherPredic'onRoadmap

ModelTopographyofSydney,NSW

2xdaily10-day&3-dayforecast40kmGlobalModel

4xdaily3-dayforecast12kmRegionalModel

Sydney,NSW(research1.5kmtopography)

4xdaily36-hourforecast4kmCity/StateModel

TC

Increasingmodelresolu'onforimprovedlocalinforma'on

Futuremodelensemblesforlikelihoodofsignificantweather

2xdaily10-day&3-dayforecast12kmGlobalModel

8xdaily3-dayforecast5kmRegionalModel

24xdaily18hor36hforecast1.0kmCity/StateModel

20132020

C/-TimPugh,BoM

Page 11: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

MentallyMovingofftheDesktopforLargerScaleInversions

Density Model (g/cm3)

Susceptibility Model (SI)

~7.56 Million Cells

450 x 450 x 25 km

Each cell = 2 x 2 x 0.25 km

Page 12: Perspec’ves on Big Data in the Geosciences from a major ... · © National Computational nci.org.au Infrastructure 2015 IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

nci.org.au © National Computational Infrastructure 2015

IEEE Big Data in the Geosciences Workshop, Santa Clara 2015

LifeonBigInterdisciplinaryDataPlakorms

•  WearemovingfromDataMy-ningonDesktops–  “Givemethefile,thewholefile,andnothingbutthefileandletmeprocess

itlocallyonMYDESKTOPKIT”•  Tonew,morecomplexDataMiningoncentraliseddataplakorms

–  “PleaseenablemeinrealEmetodiscover,accessandprocessonlythosepartsofmulEplefilesand/ordatabasesthatIneedandletmedoitonlineusingYOURKITandletmedriveitfrommyiPADormySmartPhone”.

•  Butwealsoneedtoleaveourdesktophabitsbehindincluding:–  ConstantlyreformaMngforparEcularapplicaEons–  Downsamplingthedata–  ETCETC.

•  THEDATATSUNAMIHASNOTYETHITTHEBEACH–WEARENOTREADY