Big Data
Remaking Science and Government
So whattaya mean by big?
1. There's a lot of it
2. It's unstructured
What's changed?
The Cloud
Hadoop
For a number of years we've worked really hard at transforming the information we were collecting into something that computers could understand... The next revolution that's starting to come is, instead of spending a lot of energy turning data into something computers can understand, we can train computers to understand the data and information we humans understand.
Sky Bristol, chief of the USGS's Science Information Services
Data-driven science
There's just a deluge of data. Rather than starting by developing your own hypothesis, now you can do the data analysis first and develop your hypotheses when you're deeper in.
Suzi Iacono, NSF Deputy Assistant Director CISE
Democratization
In the old days if you wanted to know what was going on in the Indian Ocean, you had to get a boat and get a crew... It was easier for men to do that. But big data democratizes things. Now we've got sensors on the whole floor of the Indian Ocean and you can look at that data every morning, afternoon and night. You could look at data from the last five years. That's a whole new ballgame for science and engineering.
Suzi Iacono, NSF CISE
What this means to me
What does it mean for government?
Fraud and abuse
March 2012: The government invested $200 million in big data research
NSF: Geo Deep Dive
Watson for the geosciences
A computer system that crawls scanned science journals, videos, spreadsheets and other data to create a queryable database of all geological knowledge
What does it mean?
Dark data would be uncovered
Geologists could more easily rely on existing data
Geologists could spend less time gathering data
Geologists could ask bigger questions
Some problems were kind of off limits. You couldn't really think about reasonably addressing them in a meaningful way in one lifetime. These new tools have that promise: to change the types of questions we're able to ask and the nature of answers we get.
Prof. Shanan Peters, UW GeoSciences
NIH: 1,000 Genomes in the Cloud
About 1,400 individual genomes placed inside Amazon's EC2 Cloud
Anyone can look at the data for free.
Anyone can do genomic research inside the cloud using a cloud fee structure
What does it mean?
The barriers to entry for genomic research are much lower
Genomic research is opened to small universities, nonprofits and graduate students
Better treatments for diseases with a genetic component such as diabetes and breast cancer
If you rewind seven years, the questions that scientists could ask were constrained by the resources available to them... Now we don't have to worry about arbitrary constraints, so research is significantly accelerated. They don't have to live with the repercussions of making incorrect assumptions or of running an experiment that didn't play out.
Matt Wood, AWS Principal Data Scientist
Defense: Clear Heart
A system that scans hours of video to identify human motions that suggest malicious intent
What does it mean?
Much smarter data coming back from drones and satellites
Human analysts concentrated where they're most needed
Smarter domestic security systems
In a situation like Newtown, if you'd had a video camera connected with this system it could have given an early warning that someone was roaming the halls with a gun. That could have hooked into an alarm system. It would have been an early warning.
Richard McNeight, President Modus Operandi
Challenges
Just because there's a lot of data doesn't mean it's the right data
Quality matters
A lot of data is proprietary