The Federal Big Data Initiative: Where it has been and where it is going
Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
August 4, 2014




Keynote and Panel: COM.BigData 2014

http://www.com-geo.org/conferences/2014/prog_keynotes.htm


Abstract

• Since the White House announced the Big Data Initiative in 2012, there has been a series of activities in which government agencies, academia, and industry can participate to develop data scientists, perform research, and develop applications. This presentation summarizes them.

• The work of the Federal Big Data Senior Steering Work Group, the NSF Big Data Funding Opportunities, and the Federal Big Data Working Group Meetup will be described and specific examples will be shown.

• The roles of the Presidential Digital Government Strategy and Open Data / Open Government Policy, the new Congressional Data Act, and the Open Research Data Policy will be described and specific examples of their implementation will be given.

• Attendees should be able to see where they might participate in the Federal Big Data Initiative as a result of attending this presentation.


President Obama Discovers Big Data in 2009

• United States President Barack Obama looks through a telescope during an Astronomy Night event on the South Lawn of the White House on Wednesday, 7 October 2009.

• The President and First Lady, Michelle Obama, hosted the star party to mark the International Year of Astronomy (IYA2009), which celebrated the 400th anniversary of Galileo Galilei's first use of a telescope.

• The President addressed a group of 150 local school students, and astronauts Buzz Aldrin, Mae Jemison, John Grunsfeld, and Sally Ride also attended.

• The President's science advisor John Holdren guided the President in viewing a double star in the constellation Lyra through an 8-inch diffraction-limited Schmidt–Cassegrain telescope.

Source: http://commons.wikimedia.org/wiki/File:Barack_Obama_looks_through_a_telescope.jpg

Source: The Federal Big Data Initiative: Where it has been and where it is going

Announcement

Prelude

Semantic Community


Semantic Community

• The Semantic Community Semantic Data Science Team pioneered a government big data application for the Federal Big Data Senior Steering Work Group called Semantic Medline on the YarcData Graph Appliance, in which a massive medical publication database (PubMed) was converted to a Semantic Web graph data format (RDF) consisting of about 25 billion triples, whose complex graph relationships are instantaneously visualized for discovery of diseases and treatments by medical scientists and researchers.
• For more details see:
– Finding a Needle in a Digital haystack The Opinion Pages,
– Gartner on YarcData Urika,
– MEDLINE Solutions Brief, and
– Urika Product Brief.
• This was one of the first Big Data Publications (PubMed) in a Data Browser (YarcData).
• We had done hundreds before that in the Spotfire Web Player for the US EPA, Data.gov, etc.
• The Presidential Digital Government Strategy and Open Data / Open Government Policy, the new Congressional Data Act, and the Open Research Data Policy all essentially require Data Science Data Publications in Data Browsers.
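
For readers new to RDF, the idea of storing publication facts as subject-predicate-object triples and then following graph relationships can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples, the predicate names, and the helper functions `build_index` and `subjects_for` are hypothetical and hand-written, and they are not taken from Semantic Medline, PubMed, or the YarcData appliance.

```python
from collections import defaultdict

# Hypothetical, hand-written triples for illustration only; they are not
# taken from Semantic Medline or PubMed.
TRIPLES = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "fever"),
    ("ibuprofen", "treats", "fever"),
    ("influenza", "causes", "fever"),
]


def build_index(triples):
    """Index subjects by (predicate, object) so lookups avoid a full scan."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return index


def subjects_for(index, predicate, obj):
    """Return the sorted subjects linked to `obj` by `predicate`."""
    return sorted(index.get((predicate, obj), set()))


index = build_index(TRIPLES)
# Which subjects are recorded as treating "fever" in this toy graph?
print(subjects_for(index, "treats", "fever"))
```

A real deployment at the scale described above (about 25 billion triples) would rely on a dedicated graph store and query language rather than an in-memory dictionary; the sketch only shows the shape of the data.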

Source: Conclusion

Three Examples in Paper and Examples for 10 Senior Government People


You Can Participate

• Federal Big Data Senior Steering Work Group:
– The Big Data Senior Steering Group (BD SSG) works to facilitate and further the goals of the White House Big Data R&D Initiative. The BD SSG strategic priorities include: Core technologies, Big data infrastructure, Workforce development, and Competitions and challenges.
    • Primarily government with some non-government invited presentations like Semantic Community.
• Faster Administration of Science and Technology Education and Research (FASTER):
– FASTER’s goal is to enhance collaboration and accelerate agencies’ adoption of advanced IT capabilities developed by Government-sponsored IT research. FASTER hosts Expedition and Emerging Technology workshops as well as monthly meetings with invited guest speakers to achieve this goal.
    • Open to public. Get on email list.
• NSF Funding for Big Data and Data Science:
– Recent Program Solicitation: Critical Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA).
    • The Semantic Community Data Science Team submitted a proposal you can see.
• Federal Big Data Working Group Meetup:
– Mission Statement, What We Are Doing, and How We Are Doing It.
    • About 240 members now (government and non-government) with diverse employment and interests. Open to public, just become a member at the web site.


NITRD FASTER

Web Site

FASTER is responding to the Open Government Directive by using the technologies of the Social Data Web (e.g., Linked Open Data and the Semantic Web).


NSF Funding for Big Data and Data Science

Knowledge Base: NSF Funding for BIG DATA and Data Science

Those looking to know more about the NSF BIG DATA Initiative and related programs can search the spreadsheets and Spotfire dashboard to identify projects they might want to partner with in their completions and/or new projects they might want to propose that do not duplicate existing projects.


NSF Grant Proposal Guide and Semantic Community Proposal

Data Publication in Data Browser: NSF Grant Proposal Guide

The NSF Grant Proposal Guide PDF was Converted to MindTouch (Wiki) and Then Used as a Template for the Actual Proposal with Semantically Linked Data!


NSF FastLane Submittal Sheet

https://www.fastlane.nsf.gov/jsp/homepage/proposals.jsp


Federal Big Data Working Group Meetup

http://www.meetup.com/Federal-Big-Data-Working-Group/events/186838842/


Agenda: MIT Big Data Initiative: Sam Madden, & Current Elephants: Michael Stonebraker

• MIT Big Data with Sam Madden and Tamr with Michael Stonebraker
– Background: See Workshops on Extremely Large Databases
– 6:30 pm Welcome and Introduction
– 6:35 pm MIT Big Data Initiative: bigdata@CSAIL and the new Intel Science and Technology Center for Big Data, Sam Madden
– 7:10 pm Brief Member Introductions
– 7:15 pm Alan Wagner, Tamr Demo
– 7:3045 pm Why the current "elephants" are good at nothing, Data Tamer, and data integration issues, Michael Stonebraker
– 8:30 pm Open Discussion
– 8:45 pm Networking
– 9:00 pm Depart
• July 7 and August 4: Once a month
– Silver Line Spring Hill Metro Station Opens in July?


Mission Statement

• Federal: Supports the Federal Big Data Initiative, but not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, and What are the results?); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.

Co-organizers: Brand Niemann and Kate Goodier


What Are We Doing?

• Leadership of the Semantic Data Science Team that produced Semantic Medline running on the YarcData Graph Appliance.
• Founding and co-organizing of the Federal Big Data Working Group Meetup.
• A graduate class prepared for GMU entitled “Practical Data Science for Data Scientists”.
• Using the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000) to build a Data Science Knowledge Base.
• Mining of the Data Science and Digital Earth scientific journals for the CODATA International Workshop on Big Data for International Scientific Programmes (June 8-9, in Beijing).
• Participation in the Data FAIRport (Findable, Accessible, Interoperable, and Reusable) with “Data Publication in Data Browsers”.
• Providing data stories that persuade and presentation materials for public education conferences like the COM.BigData Conference (August 4-6, in Washington, DC).


How Are We Doing It?

• Federating Use Cases: Data Science (Brand Niemann); Environmental and Earth Science (Joan Aron); and Astronomy (Kirk Borne)
• Federating Data Publications: Structured Scientific Content (Papers, journals, books, reports, etc.); Data FAIRports (Findable, Accessible, Interoperable); and Reusable Data Stories That Persuade (Claims and Evidence)
• Federating Solutions & Technologies: Hand-Crafted by Individuals and Teams (Mary Galvin, STEM); Data Mining Standards and Products (Brand Niemann, Data Publications in Data Browsers); Machine Processing (Fredrik Salvesen, Semantic Data Publications on YarcData Graph Appliance); Reading and Reasoning (Kate Goodier and Chuck Rehberg, Semantic Insights on Elsevier Content Text Mining); and Data Curation at Scale (Michael Stonebraker, Tamr on 1000s of Spreadsheets)


Data FAIRPort

http://datafairport.org/

http://semanticommunity.info/Data_Science/Euretos_BRAIN

Final Report, Interview, and Joint Hackathons Started


Fourth Paradigm and Fourth Question

• The Fourth Paradigm of Science (1):
– First Paradigm. Observation, descriptions of natural phenomena, and experimentation.
– Second Paradigm. Theoretical science such as Newton’s laws of motion and Maxwell’s equations.
– Third Paradigm. Simulation and modelling, such as in astronomy.
– Fourth Paradigm. Data-intensive science that exploits the large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.
• The Fourth Question of Big Data for Science (2):
– How was the data collected?
– Where is the data stored?
– What are the data results?
– Does the data story persuade?

(1) Bell, G., Hey, T., & Szalay, A. (2009) Beyond the data deluge, Science 323, 6 March 2009, pp. 1297-1298.

(2) de Waard, Anita (2014) About Stories, that Persuade With Data, Federal Big Data Working Group Meetup, 20 May, 41 slides.


Activities

• Mentoring:
– White House Energy Datapalooza, May 28 (In process with Alexandra Winkler, Knowledge Cities Graduate Student)
• Health Datapalooza V, June 1-3, and HHS Fellowship:
– Story and Application for HHS 12-month External Entrepreneur Fellowship for Innovative Design, Development and Linkages of Databases
• Big Data for Government, June 16-17:
– Keynote from Dr. George Strawn and Presentation by Dr. Tom Rindflesch and Semantic Medline/YarcData Team
• Earth Cube All-Hands Meeting, June 24-26:
– ESIP Earth Science Analytics (In process with Joan Aron, Global Environmental/Climate Change Scientist)
• Keynote and Panel: COM.BigData 2014, August 4-6:
– You can participate and attend


June 30th Meetup: Continue Data Science Tutorial

• Practical Data Science for Data Scientists:
– Reading Assignments:
    • Chapter 15: The Students Speak
        – We invited the students who took Introduction to Data Science version 1.0 to contribute a chapter to the book. They chose to use their chapter to reflect on the course and describe how they experienced it.
    • Chapter 16: Next-Generation Data Scientists, Hubris, and Ethics
        – "The best minds of my generation are thinking about how to make people click ads… That sucks." (Jeff Hammerbacher)
        – We’d like to encourage the next-gen data scientists to become problem solvers and question askers, to think deeply about appropriate design and process, and to use data responsibly and make the world better, not worse.
– Resources: AmericasDataFest Competition
– Team Homework Exercise:
    • Study about Graph Databases, Graph Computing, and Semantic Medline
    • Review Wiki and View Videos: YarcData Videos (Schizo-7 minutes, Cancer-21 minutes).
    • Ask Me Questions and Prepare to Ask Questions Next Week


Practical Data Science for Data Scientists

http://semanticommunity.info/Data_Science/Practical_Data_Science_for_Data_Scientists

Class 8

Providing On-Line Class With Private Tutoring


Follow Ben Shneiderman's 8 Golden Rules of Data Science

• Preparation
– Choose actionable problems & appropriate theories
– Consult domain experts & generalists
• Exploration
– Examine data in isolation & contextually
– Keep cleaning & add related data
– Apply visualizations & statistical patterns, clusters, gaps, outliers, missing & uncertain data
• Decision
– Evaluate your efficacy, refine your theory
– Take responsibility, own your failures
– World is complex, proceed with humility

Source: "8 Golden Rules of Data Science"

http://semanticommunity.info/Data_Science/Ben_Shneiderman


Dr. George Strawn, Director, National Coordination Office/NITRD (USA)

• Semantic Community was following the Data FAIRPort principle and Force11 Data Citation Preamble even before they existed and using Data Stories to "persuade" readers with the facts because of the influence of Dr. George Strawn, Director, National Coordination Office / NITRD (USA) who said recently at the Data FAIRPort (Findable, Accessible, Interoperable, and Reusable) Conference (see Interview Innovation International):


Data FAIRPort Conference Interview Innovation International

• I am an observer from the US federal government and especially interested in this conference given the recent requirement to provide open access to scientific results funded by the US federal government covering both scientific articles as well as the supporting data.

• At the highest level, I am hoping that we will develop the technology and the social willingness to work on interoperability of heterogeneous datasets so that we can combine them in novel ways. If we can truly structure scientific data, we will be able to conduct new science.

• I would just add that not only are these technologies ultimately applicable to all science and other scholarly domains, their ultimate value will hopefully be to promote interdisciplinary research. If we can use electronic technology to help us articulate between and among these scientific fields, I think we will create entire new tiers of knowledge.