nci.org.au @NCInews
Big Data is today: key issues for big data
Dr Ben Evans, Associate Director, Research Engagements and Initiatives
Impact of Collaborations around Earth Systems Science Research
Tropical Cyclones: Cyclone Winston, 20-21 Feb 2016
Volcanic Ash: Manam Eruption, 31 July 2015
Bush Fires: Wye Valley and Lorne Fires, 25-31 Dec 2015
Agriculture - Flooding: St George, QLD, February 2011
Societal impacts requiring cross-domain collaboration:
Modelling Extreme & High Impact events: BoM NWP, Climate Coupled Systems & Data Assimilation (BoM, CSIRO, Research Collabs)
Hazards: Geoscience Australia, BoM, States
Geophysics, Potential Fields, Seismic: Geoscience Australia, Universities
Monitoring the Environment & Ocean: ANU, BoM, CSIRO, GA, Research, Fed/State
International research: International agencies and Collaborative Programs
National Computational Infrastructure 2016
Ben Evans, Preparing for your data future, July 2016
Emerging Petascale Geophysics HPC codes
Assess priority Geophysics areas:
3D/4D Geophysics: Magnetotellurics, AEM
Hydrology, Groundwater, Carbon Sequestration
Forward and Inverse Seismic models and analysis (onshore and offshore)
Natural Hazard and Risk models: Tsunami, Ash-cloud
Issues:
Data across domains, data resolution (points, lines, grids), data coverage
Provenance capture and query
Model maturity for running at scale
Ensemble, Uncertainty analysis and Inferencing
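The slide names ensemble and uncertainty analysis as an open issue but does not say which method is used. As a minimal sketch, assuming each ensemble member is a flat list of values sampled at the same grid points, the per-point ensemble mean and sample standard deviation (the "spread") give a first, simple measure of uncertainty. The function name `ensemble_stats` and the numbers are hypothetical.

```python
from statistics import fmean, stdev
from typing import List, Sequence, Tuple


def ensemble_stats(members: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return the per-point ensemble mean and sample spread (standard deviation).

    Each member is one model run sampled at the same grid points (hypothetical layout).
    """
    if len(members) < 2:
        raise ValueError("need at least two members to estimate spread")
    points = list(zip(*members))  # regroup values by grid point
    means = [fmean(p) for p in points]
    spreads = [stdev(p) for p in points]
    return means, spreads


# Three hypothetical members sampled at three grid points
runs = [[1.0, 2.0, 3.0], [3.0, 2.0, 5.0], [2.0, 2.0, 4.0]]
mean, spread = ensemble_stats(runs)
print(mean, spread)  # [2.0, 2.0, 4.0] [1.0, 0.0, 1.0]
```

Real ensembles are usually held as multidimensional arrays, where the same reduction runs along the member axis; the point here is only that spread, not just the mean, has to be carried through the analysis.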
Growth of Genomics data generation and need for analysis
The arrival of the $1,000 genome
c/- Marcel Dinger, Garvan Institute
Ref: Dinger_IMB_Winter_School_2014.pptx
Computational need to access big data
http://www.top500.org/statistics/perfdevel/
Current NCI
Next NCI
High-Performance Data (HPD) (Evans, ISESS 2015, Springer)
HPC turning compute into IO-bound problems
HPD turning IO-bound into ontology + semantic problems
Computational performance increasing
Number of CPU cores increasing
Data needs to scale
Need compute to make full use of data
NCI National Platform to enable collaboration/transformation
NCI Proposal to NCRIS RDSI (RDS) for a High Performance Data Node to:
Enable dramatic increases in the scale and reach of Australian research by providing nationwide access to enabling data collections;
Specialise in nationally significant research collections requiring high-performance computational and data-intensive capabilities for their use in effective research methods;
Realise synergies with related national research infrastructure programs.
As a result, researchers will be able to:
share, use and reuse significant collections of data that were previously either unavailable to them or difficult to access;
access the data in a consistent manner that supports a general interface as well as discipline-specific access;
use the consistent interface established/funded by this project to access data collections at participating institutions and other locations, as well as data held at the Nodes.
NCI National Environment Research Data Collections
1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections
Allocations and Review panels
Science Data Committee
Data Technical Committee
Enable global and continental scales, and scale down to local/catchment/plot:
Water availability and usage over time
Catchment zone
Vegetation changes
Data fusion with point-clouds and local or other measurements
Statistical techniques on key variables
Preparing for:
Better programmatic access
Machine/Deep Learning
Better integration through Semantic/Linked data technologies
Small Data to calibrate, validate and understand the Big Data
Image Credit: Japan Meteorological Agency (JMA)
Diabolical data: understanding data complexity
Data Collections
Data Sub-collections
Data Sets (and granules)
Data subsetting and Dynamic data
Versioning, licensing, provenance, citation, sync, linked/semantic data
Social issues/responsibility management
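The Collection, Sub-collection and Data Set (granule) hierarchy, together with the versioning, licensing, provenance and citation concerns listed on this slide, can be pictured as a small data model. A minimal sketch with Python dataclasses; the class names, fields, identifiers and citation format are hypothetical and are not NCI's catalogue schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSet:
    id: str
    version: str
    licence: str
    granules: List[str] = field(default_factory=list)  # e.g. one file per time step
    derived_from: Optional[str] = None  # provenance: id of the source data set, if any


@dataclass
class SubCollection:
    name: str
    datasets: List[DataSet] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    subcollections: List[SubCollection] = field(default_factory=list)

    def find(self, dataset_id: str, version: str) -> Optional[DataSet]:
        """Look up one specific version of a data set anywhere in the collection."""
        for sub in self.subcollections:
            for ds in sub.datasets:
                if ds.id == dataset_id and ds.version == version:
                    return ds
        return None


def citation(collection: Collection, ds: DataSet) -> str:
    """Build a version-pinned citation string (format is illustrative only)."""
    return f"{collection.name}: {ds.id}, version {ds.version}, licence {ds.licence}"


# Hypothetical example: two versions of the same gridded product
v1 = DataSet("sst_daily", "1.0", "CC-BY-4.0", ["sst_20150101.nc"])
v2 = DataSet("sst_daily", "1.1", "CC-BY-4.0", ["sst_20150101.nc"], derived_from="sst_raw")
ocean = Collection("Example Ocean Collection", [SubCollection("Sea surface temperature", [v1, v2])])
print(citation(ocean, ocean.find("sst_daily", "1.1")))
```

Pinning the citation to an explicit version is the design point: a citation that resolves to "the latest" data set cannot be reproduced once the collection is reprocessed.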
NERDIP simplified view
[Diagram labels: Data Services; NERDIP Data Platform; Compute Intensive; Virtual Laboratories; Fast/Deep Data Access; Portal views; Machine Connected; Program access; Server-side functions]
http://geonetwork.nci.org.au/ - access to metadata
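GeoNetwork catalogues can usually be queried by programs as well as browsed, through the OGC Catalogue Service for the Web (CSW) interface. A minimal sketch that only builds a CSW 2.0.2 GetRecords request URL, assuming the catalogue exposes CSW at the common GeoNetwork path `/geonetwork/srv/eng/csw`. That path, the search term and the function name are assumptions, and the 2016 endpoint may no longer be live.

```python
from urllib.parse import urlencode

# Assumption: the common GeoNetwork CSW path; the slide only gives the site root.
CSW_ENDPOINT = "http://geonetwork.nci.org.au/geonetwork/srv/eng/csw"


def csw_getrecords_url(keyword: str, max_records: int = 10) -> str:
    """Build (but do not send) a CSW 2.0.2 GetRecords request using KVP encoding."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # full-text match against any metadata field
        "constraint": f"AnyText like '%{keyword}%'",
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"


print(csw_getrecords_url("sea surface temperature", max_records=5))
```

Fetching the URL with any HTTP client returns an XML list of brief metadata records, which is the machine-connected counterpart of browsing the portal.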
Licensing and Access for Earth and Environmental Data
All metadata must be open and discoverable through NCI, ANDS, FIND, data.gov.au and partner websites.
Where possible, data will be CC-BY.
Metadata and landing pages will document any access restrictions.
NCI worked with Baden Appleyard QC of AusGOAL (Australian Governments Open Access and Licensing Framework).
Standards: ensure compliance with standards
[Diagram: Australian data infrastructure by domain, spanning Research & Development to Government Operational, with layers for Data Generation, Data Integration, and Portals and Access; each domain is annotated with ISO/OGC standards]
Marine (NCRIS IMOS; Australian Ocean Data Network): data generation by ARGO, SOOP, SOTS, ANFOG, AUV, ANMN, AATAMS, FAIMMS, SRS; data integration by eMII, MACDDAP; AIMS, CSIRO MAR, Geoscience Australia, BOM, Dept. of Defence, AAD, Aust. Ocean Data Centre Joint Facility (AODCJF)
Geoscience (NCRIS AuScope; AuScope Portal, Geoscience Portal): data generation by VCL, Geospatial, SAM, Earth Imaging, Earth Composition, Groundwater; data integration by AuScope Grid, SISS, ARSDC; Govt Geoscience Info. Committee (GGIC), GA, VIC, WA, TAS, NT, QLD, SA, NSW
Spatial (ANZLIC Spatial Information Council): Australian Spatial Data Directory, OSDM, ICSM; VIC, WA, TAS, NT, QLD, SA, NSW, ACT, NZ
Biological (NCRIS Integrated Biological Systems; Atlas of Living Australia): data generation by Aust. Plant Phenomics Facility; data integration by Atlas of Living Australia, Aust. Phenomics Network
Water (Australian Govt Water): Aust. Water Resources Information System; BOM, CSIRO; VIC, WA, TAS, NT, QLD, SA, NSW, ACT; Australian Spatial Consortium, ASIBA, SSI, PSMA, 43 Pty Ltd, CRC for Spatial Information
Ecosystems (NCRIS TERN): e-MAST, BCCVL
Climate & Weather: NCRIS CWSLab
Australian Government: AGIMO, Gov 2.0, CSSDP, NAMF, NSS, AGLS, MDBC, NWC, Aust. Govt. Online Service Point, GANZ; NT, QLD, NSW, VIC, WA, ACT, TAS, SA; CSIRO, Bureau of Met
Data Management Components: ANDS, NCI, RDSI; Australian Research Data Commons
Other Components: AAF, AARNet
Transform data to become transdisciplinary and born-connected
A call to action for a transdisciplinary approach, starting at the conception of data collections and involving researchers across the science disciplines and beyond.
Then we achieve interoperability, and relevant information becomes accessible to all sectors.
Data is moving to being born-connected, which is part of the semantic and linked data world.
Linking data improves its quality assurance.
Getting Serious about Profiling Data Performance
Call-tree analysis
Main general global profiling tools: Scalasca/Score-P; TAU; OpenSpeedShop; HPCToolkit; mpiP; ITAC
IO analysis:
Compare to baselines
Darshan: global profiling focused on IO
Performance Access Factors
Data packing
Variable ordering
Chunking/blocking
Compression
Caching
Subsetting/Sieving
Read vs Write
Parallel IO
Data conversion
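Chunking, variable ordering and subsetting interact: a read only touches the chunks that overlap the requested subset, so the same request can cost one chunk read or hundreds depending on the chunk shape. A minimal sketch in plain Python, assuming a hypothetical daily global grid of 365 x 720 x 1440 (time, lat, lon) and two illustrative chunk shapes. It only counts chunk reads and does not reflect how any particular NCI collection is chunked.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def chunks_for_subset(
    shape: Sequence[int],
    chunk_shape: Sequence[int],
    start: Sequence[int],
    stop: Sequence[int],
) -> Iterator[Tuple[int, ...]]:
    """Yield the index of every chunk overlapping the half-open box [start, stop)."""
    ranges = []
    for n, c, lo, hi in zip(shape, chunk_shape, start, stop):
        if not 0 <= lo < hi <= n:
            raise ValueError(f"invalid range [{lo}, {hi}) for a dimension of size {n}")
        # first and last chunk touched along this dimension
        ranges.append(range(lo // c, (hi - 1) // c + 1))
    yield from product(*ranges)


def count_chunks(shape, chunk_shape, start, stop) -> int:
    return sum(1 for _ in chunks_for_subset(shape, chunk_shape, start, stop))


SHAPE = (365, 720, 1440)  # hypothetical (time, lat, lon) grid
MAP_CHUNKS = (1, 180, 360)  # favours reading whole maps at one time step
SERIES_CHUNKS = (365, 18, 36)  # favours reading long time series at one location

point_series = ((0, 100, 200), (365, 101, 201))  # one grid cell, all days
one_day_map = ((0, 0, 0), (1, 720, 1440))  # whole grid, one day

for label, chunks in (("map-chunked", MAP_CHUNKS), ("series-chunked", SERIES_CHUNKS)):
    print(label, count_chunks(SHAPE, chunks, *point_series), count_chunks(SHAPE, chunks, *one_day_map))
```

The map-chunked layout needs 365 chunk reads for the point time series but only 16 for the daily map; the series-chunked layout reverses this (1 versus 1600). Neither layout suits both access patterns, which is why chunking and variable ordering have to be tuned to how the data will actually be read.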
Data Classified Based On Processing Levels (Level*, Proposed Name: Description*)
Level 0, Raw Data: Instrumental data as received from the sensor. Includes any and all artefacts.
Level 1, Instrument Data: Instrument data that have been converted to sensor units but are otherwise unprocessed. Data includes appended time and platform georeferencing parameters (e.g., satellite ephemeris).
Level 2, Calibrated Data: Data that has undergone the corrections or calibrations necessary to convert instrument data into geophysical values. Data includes calculated position.
Level 3, Gridded Data: Data that has been gridded and has undergone minor processing for completeness and consistency (e.g., replacing missing data).
Level 4, Value-added Data Products: Analytical (modelled) data, such as those derived from the application of algorithms to multiple measurements or sensors.
Level 5, Model-derived Data Products: Data resulting from the simulation of physical processes and/or the application of expert knowledge and interpretation.
*The level numbers and descriptions above follow definitions used in satellite data processing, as defined by NASA. (see ; ; ).
[Diagram labels: HPD; points; grids]
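Because the processing levels are ordered, a catalogue can encode them as an integer-valued enumeration and filter data sets by level. A minimal sketch using Python's `IntEnum`; the member names, the catalogue entries and the "at least calibrated" filter are hypothetical illustrations, not part of the NASA scheme.

```python
from enum import IntEnum


class ProcessingLevel(IntEnum):
    """Processing levels 0-5 as listed on the processing-levels slide."""
    RAW = 0
    INSTRUMENT = 1
    CALIBRATED = 2
    GRIDDED = 3
    VALUE_ADDED = 4
    MODEL_DERIVED = 5

    @property
    def proposed_name(self) -> str:
        return PROPOSED_NAMES[self]


PROPOSED_NAMES = {
    ProcessingLevel.RAW: "Raw Data",
    ProcessingLevel.INSTRUMENT: "Instrument Data",
    ProcessingLevel.CALIBRATED: "Calibrated Data",
    ProcessingLevel.GRIDDED: "Gridded Data",
    ProcessingLevel.VALUE_ADDED: "Value-added Data Products",
    ProcessingLevel.MODEL_DERIVED: "Model-derived Data Products",
}

# Hypothetical catalogue entries tagged with a processing level
catalogue = {
    "sensor_dump": ProcessingLevel.RAW,
    "sst_gridded_daily": ProcessingLevel.GRIDDED,
    "reanalysis_evaporation": ProcessingLevel.MODEL_DERIVED,
}

# IntEnum members compare as integers, so "at least calibrated" is a simple >= filter
usable = sorted(name for name, level in catalogue.items() if level >= ProcessingLevel.CALIBRATED)
print(usable)
print(ProcessingLevel(3).proposed_name)
```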
Quality Assurance, Conventions & Interoperable Standards
O&M ISO standards
CF and ACDD standards
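Convention checks are a cheap first quality-assurance step. ACDD 1.3 marks `title`, `summary`, `keywords` and `Conventions` as highly recommended global attributes, and CF describes physical quantities with variable attributes such as `units` and `standard_name`. A minimal sketch of such a check over plain dictionaries; treating `standard_name` as required is a choice of this sketch (CF itself makes it optional), the metadata values are hypothetical, and a dedicated compliance checker covers far more.

```python
from typing import Dict, List, Mapping

# ACDD 1.3 "highly recommended" global attributes
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")
# Variable attributes this sketch requires; CF itself makes standard_name optional
CF_VARIABLE_ATTRS = ("units", "standard_name")


def qa_report(
    global_attrs: Mapping[str, str],
    variables: Mapping[str, Mapping[str, str]],
) -> Dict[str, List[str]]:
    """Map '<global>' or a variable name to its missing attributes (empty values count as missing)."""
    report: Dict[str, List[str]] = {}
    missing = [a for a in ACDD_HIGHLY_RECOMMENDED if not global_attrs.get(a)]
    if missing:
        report["<global>"] = missing
    for name, attrs in variables.items():
        missing = [a for a in CF_VARIABLE_ATTRS if not attrs.get(a)]
        if missing:
            report[name] = missing
    return report


# Hypothetical metadata for a single file
global_attrs = {"title": "Example SST analysis", "summary": "", "Conventions": "CF-1.6, ACDD-1.3"}
variables = {
    "sst": {"units": "K", "standard_name": "sea_surface_temperature"},
    "evap": {"units": "m"},
}
print(qa_report(global_attrs, variables))
```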
Barriers: Like my Coordinate System?
Mercator grid in the south, tripolar grid in the north
Standards on Nested Grids
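Why one grid is not enough: the spherical Mercator projection stretches distances by a scale factor of sec(latitude), and its northing y = R ln(tan(pi/4 + phi/2)) grows without bound towards the poles, so a Mercator grid cannot cover the North Pole. Ocean models therefore commonly switch to another construction, such as a tripolar grid, in the north. A minimal sketch of the two formulas, assuming a spherical Earth of radius 6371 km; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical-Earth assumption


def mercator_y(lat_deg: float, radius: float = EARTH_RADIUS_M) -> float:
    """Northing (m) of latitude lat_deg on a spherical Mercator projection."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("Mercator is undefined at the poles")
    phi = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_scale(lat_deg: float) -> float:
    """Linear scale factor sec(latitude): how much the map stretches distances."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 80, 89):
    print(f"{lat:>2} deg: y = {mercator_y(lat) / 1000:8.0f} km, scale = {mercator_scale(lat):6.2f}")
```

Running the loop shows the scale factor doubling by 60 degrees and exceeding 50 near 89 degrees, which is the distortion a combined Mercator/tripolar grid avoids, at the cost of a coordinate system that generic tools may not understand.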
Transforming data on-the-fly
Landsat: a mosaic composed from different scenes for the selected area, using the scenes closest to the selected date. An RGB image is composed by mapping three different bands to the red, green and blue channels.
Himawari: a video for the selected date and area with 12 frames covering the period around noon, one frame every 30 minutes. Each frame is an RGB image composed by mapping the three Himawari bands closest to the Landsat bands, so that the image looks similar.
ERA-Interim: a video for the selected date covering 2000 square kilometres around the selected region, showing "ERA-Interim Evaporation [m] forecast on surface". It has 8 frames covering one day (one every 3 hours). Each frame is an RGB image that uses a colormap to represent the different evaporation values.
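The Landsat and Himawari products both build an RGB image by mapping three bands to the red, green and blue channels. A minimal sketch of that step in plain Python, assuming each band is a 2-D grid of surface reflectance and using a simple linear stretch of 0.0 to 0.3 reflectance onto 0-255. The stretch limits, band values and function names are assumptions; which bands are used depends on the product (for example, Landsat 8 OLI bands 4, 3 and 2 give true colour).

```python
from typing import List, Sequence, Tuple

Band = Sequence[Sequence[float]]  # 2-D grid of reflectance values for one band
Pixel = Tuple[int, int, int]


def stretch(band: Band, lo: float, hi: float) -> List[List[int]]:
    """Linearly map [lo, hi] onto 0..255, clipping values outside the range."""
    scale = 255.0 / (hi - lo)
    return [[max(0, min(255, round((v - lo) * scale))) for v in row] for row in band]


def rgb_composite(red: Band, green: Band, blue: Band,
                  lo: float = 0.0, hi: float = 0.3) -> List[List[Pixel]]:
    """Stack three stretched bands into rows of (R, G, B) pixels."""
    r, g, b = (stretch(band, lo, hi) for band in (red, green, blue))
    return [list(zip(rr, gg, bb)) for rr, gg, bb in zip(r, g, b)]


# A hypothetical 1 x 2 scene: the second pixel saturates in red and green
image = rgb_composite(red=[[0.0, 0.3]], green=[[0.06, 0.6]], blue=[[-0.1, 0.12]])
print(image)  # [[(0, 51, 0), (255, 255, 102)]]
```

Applying the same stretch to every band keeps colours comparable between frames, which matters when Himawari frames are meant to resemble a Landsat mosaic.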
Examples of Virtual Labs and web tools
eReefs online analysis portal
Matching to the database of events
[Diagram: Input Image, Convolution Layer 1, Feature Maps]
c/- Rahul Ramachandran, NASA / MSFC
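The diagram shows an input image passing through a first convolution layer to produce feature maps. A minimal sketch of that layer in plain Python, assuming single-channel input, "valid" padding, stride 1 and a ReLU activation (CNN libraries implement this step as cross-correlation, as here). The kernels are hand-picked edge detectors for illustration, not trained weights from the NASA/MSFC work.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> List[List[float]]:
    """2-D 'valid' cross-correlation: slide the kernel over the image, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Grid) -> List[List[float]]:
    return [[max(0.0, v) for v in row] for row in fmap]


def feature_maps(image: Grid, kernels: Sequence[Grid]) -> List[List[List[float]]]:
    """One feature map per kernel, as produced by a single convolution layer."""
    return [relu(conv2d_valid(image, k)) for k in kernels]


# Hypothetical 4 x 4 image with a dark-to-bright vertical edge
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
rising_edge = [[-1.0, 1.0], [-1.0, 1.0]]
falling_edge = [[1.0, -1.0], [1.0, -1.0]]
maps = feature_maps(img, [rising_edge, falling_edge])
print(maps[0])  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

Each kernel responds only to the pattern it encodes, so stacking many learned kernels yields the feature maps that later layers use to match images against the database of events.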
Reasonable progress
Overall Accuracy = 87.88%
MODIS Rapid Response Test Images (Images to Trained scheme)
[Example classifications for Hurricane, Dust and Smoke, labelled True Positive (x3), False Negative (x1) and False Positive (x2)]
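Overall accuracy is the fraction of test images whose predicted class matches the true class: the sum of the confusion-matrix diagonal divided by the total count. Per class, a true positive is a correct prediction of that class, a false negative is a missed instance of it, and a false positive is an image of another class wrongly assigned to it. A minimal sketch with a hypothetical three-class confusion matrix for Hurricane, Dust and Smoke; the counts are invented for illustration and do not reproduce the 87.88% reported on the slide.

```python
from typing import Dict, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = true class, columns = predicted class


def overall_accuracy(cm: Matrix) -> float:
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_counts(cm: Matrix, labels: Sequence[str]) -> Dict[str, Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) for each class."""
    counts: Dict[str, Tuple[int, int, int]] = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp  # truly class k, predicted as something else
        fp = sum(row[k] for row in cm) - tp  # predicted k, but truly another class
        counts[label] = (tp, fp, fn)
    return counts


labels = ["Hurricane", "Dust", "Smoke"]
cm = [
    [9, 1, 0],  # hypothetical counts
    [0, 7, 3],
    [1, 2, 7],
]
print(f"overall accuracy = {overall_accuracy(cm):.2%}")  # 23 of 30 correct
print(per_class_counts(cm, labels))
```

Overall accuracy can hide weak classes, so per-class false positives and false negatives, like those labelled on the slide, are worth reporting alongside it.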
Enabling transparency, reproducibility & informatics techniques
PROMS v3 uses an extension to the PROV ontology as its data model: Entities, Activities, Agents
RD-Switchboard http://www.rd-switchboard.org/
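PROV describes provenance with three core types, entities (data), activities (processes) and agents (people, organisations or software), linked by relations such as wasGeneratedBy, used and wasAssociatedWith. A minimal in-memory sketch of those core relations and a lineage query; the class names and identifiers are hypothetical, and this is not the PROMS data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    id: str  # person, organisation or software


@dataclass
class Activity:
    id: str
    used: List[str] = field(default_factory=list)  # PROV "used": input entity ids
    was_associated_with: Optional[str] = None  # PROV "wasAssociatedWith": agent id


@dataclass
class Entity:
    id: str
    was_generated_by: Optional[str] = None  # PROV "wasGeneratedBy": activity id


class ProvStore:
    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}
        self.activities: Dict[str, Activity] = {}
        self.entities: Dict[str, Entity] = {}

    def add(self, *records) -> None:
        for r in records:
            if isinstance(r, Agent):
                self.agents[r.id] = r
            elif isinstance(r, Activity):
                self.activities[r.id] = r
            else:
                self.entities[r.id] = r

    def lineage(self, entity_id: str) -> List[str]:
        """All upstream entity ids, found by following wasGeneratedBy then used."""
        seen: List[str] = []
        stack = [entity_id]
        while stack:
            generator = self.entities[stack.pop()].was_generated_by
            if generator is None:
                continue  # a source entity with no recorded generating activity
            for src in self.activities[generator].used:
                if src not in seen:
                    seen.append(src)
                    stack.append(src)
        return seen


store = ProvStore()
store.add(
    Agent("example_pipeline"),
    Entity("raw_scene"),
    Entity("land_mask"),
    Activity("calibration", used=["raw_scene"], was_associated_with="example_pipeline"),
    Entity("calibrated", was_generated_by="calibration"),
    Activity("gridding", used=["calibrated", "land_mask"], was_associated_with="example_pipeline"),
    Entity("gridded", was_generated_by="gridding"),
)
print(store.lineage("gridded"))  # ['calibrated', 'land_mask', 'raw_scene']
```

Recording these links at processing time is what makes a derived product reproducible: the lineage query answers "which inputs and which process produced this file" without relying on memory or file naming.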
Key Messages for raising a Data Centre in a Big Data World
Today's scientific computing has to be built across collaborations of national facilities and national institutions that can both scale up and scale down.
Data needs to be born-connected, transdisciplinary, high quality and computationally ready.
Expertise in usability and performance tuning is needed to get the most out of the data.
No one [insert grouping] can do it alone: no one organisation, no one group, no one country has the required resources or expertise.
Collaborative efforts are needed across disciplines and across nations.
Working Collaboratively in the era of Exascale and Big Data