Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
nci.org.au @NCInews
Perspec'vesonBigDataintheGeosciencesfromamajorAustralianna'onaldatacenter
LesleyWyborn
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
TheDataTsunamihasnotyetlanded
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
The43PBontheResearchDataStorageInfrastructure
Source:hFps://www.rds.edu.au/
• Our~43Petabyteshassaveddataon8nodesthatforthemostpart,alreadyexisted
• Wehavenotconsiderednewini'a'vessuchastheSKA,Copernicus,Himawarrietc.
• Wearenotusingourcurrentdatatoitsfullestresolu'on
© National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
10PBytes
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
ISO/IEC10746RM-ODPofNCICollec'onswithEOData
No. Collec0on Enterprise Technology Informa0on Alloca0on(TB)
Ingested(TB)
Capacity
u39 MODISOceanColour
IMOS THREDDS NetCDF-CF 428 324 76.0%
AusCover TERN THREDDS NetCDF-CF
MODISL1B IMOS/TERN THREDDS NetCDF-CF
AVHRR,VIIRS CSIRO THREDDS NetCDF-CF
_4 SoilMoisture CSIRO THREDDS NetCDF-CF 5 3 54.6%
_2 eMastdataassimila'on
TERN THREDDS
NetCDF-CF 110 13 11.8%
rr9 eMast TERN THREDDS NetCDF-CF 90 28 31.0%
ua6 CMIP5 BoM THREDDS NetCDF-CF 2400 1488 62.0%
rr5 Observa'on(Himawarri)
BoM THREDDS NetCDF-CF 360 240 65.5%
rs0 Landsat GA GeoServer GeoTIFF 1478 1413 95.6%
c4 WOFS GA GeoServer GeoTIFF 22 16 72.8%
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
Thebonsaieffect:degradingourdata
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
Source:hFp://www.marine.csiro.au/eez_data/doc/comp_bath.gif
Marineexampleofbonsai’eddata
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
GridsofGridsvsGridsofActualObserva'ons
AustralianGeophysicsDataCollec'on
Survey2 Survey3 Survey5
Collec'on
Theme
Survey Survey6
Gravity Mag Radio-metrics AEM Seismic MT
Survey4Survey1 Surveyn
Seis-mology
Grids
Lines
Points
GravityNa'onalGrids
Mag Radiometrics
20m
40m
30m
80m
10m
50m
20m
40m
10m
50m
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
Version Year Grid cell size
Data file size
3 1999 400m 0.49 GB
4 2004 250m 0.94 GB
5 2010 80m 9.73 GB
6 2013 (?) <80m 3 TB
20042010
(Slide courtesy of Murray Richardson )
Resolu'onimpactsonfilesize:e.g.Magne'cs
1999
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
TheDataAggrega'onProbleminClimateScienceResearch
3rd assessment 2001
4th assessment 2007
5th assessment 2013
6th assessment 2020
Slide Courtesy of Andy Pitman�COE Climate System Science
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
NumericalWeatherPredic'onRoadmap
ModelTopographyofSydney,NSW
2xdaily10-day&3-dayforecast40kmGlobalModel
4xdaily3-dayforecast12kmRegionalModel
Sydney,NSW(research1.5kmtopography)
4xdaily36-hourforecast4kmCity/StateModel
TC
Increasingmodelresolu'onforimprovedlocalinforma'on
Futuremodelensemblesforlikelihoodofsignificantweather
2xdaily10-day&3-dayforecast12kmGlobalModel
8xdaily3-dayforecast5kmRegionalModel
24xdaily18hor36hforecast1.0kmCity/StateModel
20132020
C/-TimPugh,BoM
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
MentallyMovingofftheDesktopforLargerScaleInversions
Density Model (g/cm3)
Susceptibility Model (SI)
~7.56 Million Cells
450 x 450 x 25 km
Each cell = 2 x 2 x 0.25 km
nci.org.au © National Computational Infrastructure 2015
IEEE Big Data in the Geosciences Workshop, Santa Clara 2015
LifeonBigInterdisciplinaryDataPlakorms
• WearemovingfromDataMy-ningonDesktops– “Givemethefile,thewholefile,andnothingbutthefileandletmeprocess
itlocallyonMYDESKTOPKIT”• Tonew,morecomplexDataMiningoncentraliseddataplakorms
– “PleaseenablemeinrealEmetodiscover,accessandprocessonlythosepartsofmulEplefilesand/ordatabasesthatIneedandletmedoitonlineusingYOURKITandletmedriveitfrommyiPADormySmartPhone”.
• Butwealsoneedtoleaveourdesktophabitsbehindincluding:– ConstantlyreformaMngforparEcularapplicaEons– Downsamplingthedata– ETCETC.
• THEDATATSUNAMIHASNOTYETHITTHEBEACH–WEARENOTREADY