Solving the Big Problem

A Pragmatic Approach Towards Information Management at NASA

Andrew Schain, NASA HQ, Government Emerging Technology Subcommittee, Washington, DC, 07/17/2007

Agenda

• Background
• As Built & Rationale for Design
• The Bigger Issues
• What’s Next

Background

It all started innocently enough. I was at a Semantic Technology conference 2 years ago, listening to a W3C talk about FOAF. Jeanne Holm (JPL) was in the audience too and told us of a problem she was having:

– Didn’t have the $3M to build a new expertise locator.
– Even so, there were anticipated issues with information integrity and curation.
– Customer’s expectations were demanding and needs were real.

I opened my big mouth and said:

– NASA already had a directory that could populate a FOAF model, and
– Probably all the data we needed about projects to populate a DOAP model.

We could re-use the information NASA already had and apply it to this new customer requirement!!!

Bijan Parsia, Kendall Clark, Mike Grove, Evren Sirin (now Clark & Parsia LLC.) and other MINDLAB folks built a prototype in 9 weeks and…

Presto! A project is born!!!

Overview

Enable efficient expertise location by:

• Integrating already existing but disparate data sources,
• Providing a dynamic UI for exploring the information integration,
• Visualizing social networks to facilitate communication,
• Supporting incremental integration and incremental annotation.

POPS (People, Organization, Projects & Skills)

Capabilities

• Provides a single integrated view of:
– People, competencies, project participation, publications.
– NASA location information.
• Visualizes perspectival social networks between people.
• Allows for local or sharable annotations of integrated info.
• Aggregates info into a query-able, reusable service.

Jspace

Jspace is a “polyarchical query builder” for federated RDF stores.

Folks can learn a QL, but why? Get the machine to build queries based on customary user input: browsing. Browsing is better than searching!!

Started as a clone, then a massive extension, of mSpace, from the University of Southampton, UK.

Be Different? Then look-feel-&-act different!
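
One way to picture "get the machine to build queries" from browsing: each facet value the user clicks becomes one triple pattern. The sketch below is an illustration only, not how Jspace does it. It emits SPARQL because that syntax is widely known (the design-choices slide names SeRQL), and the facet names, the `ex:` vocabulary, and `build_query` are all hypothetical.

```python
# Hypothetical facet -> predicate mapping; not taken from the Jspace model file.
FACETS = {
    "skill": "ex:hasSkill",
    "project": "ex:worksOn",
    "facility": "ex:locatedAt",
}

PREFIXES = "PREFIX ex: <http://example.org/pops#>\n"


def build_query(selections):
    """Turn browsing selections, e.g. {"skill": "Robotics"}, into a SPARQL SELECT."""
    patterns = []
    for facet in sorted(selections):
        predicate = FACETS[facet]  # raises KeyError for an unknown facet
        # Escape backslashes and quotes so the value is a valid string literal.
        value = selections[facet].replace("\\", "\\\\").replace('"', '\\"')
        patterns.append('  ?person %s "%s" .' % (predicate, value))
    body = "\n".join(patterns)
    return PREFIXES + "SELECT ?person WHERE {\n" + body + "\n}"


if __name__ == "__main__":
    print(build_query({"skill": "Robotics", "facility": "JPL"}))
```

The user never sees the query text; narrowing a facet simply adds another pattern to the same query.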

Goes-innas & Goes-outtas @ 10,000 ft

Goes-outtas & Goes-innas @ 5,000 ft

Some Grit of SW & Interchange

• Database proxy: Apache 2.2.x
• RDF Database: SWI-Prolog RDF DB
• Data adapters software: Python 2.4 and rdflib 2.2.1
• SOAP Library: Java SOAP
• Java 1.4 or Java 1.5
• POPS client: jspace 0.28
• Database proxy application server: Pylons 0.9.3
• NTRS harvester: Java OAI-PMH library
• Social network visualization library: Jung
• XML between client and DB proxy
• RDF loaded into the DB
• Data sources:
– CMS: SOAP → RDF
– NTRS: OAI-PMH → RDF
– WIMS: CSV dump → RDF
– X500: LDAP → RDF
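
Each "→ RDF" adapter boils down to mapping source fields onto RDF properties. Here is a minimal sketch of the X500/LDAP case, assuming the directory entry has already been fetched into a plain dict. The attribute names (`uid`, `cn`, `mail`), the base URI, and the mapping are assumptions for illustration, and the sketch writes N-Triples by hand with the standard library instead of using rdflib as the real adapters did.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/people/"  # hypothetical base URI

# Hypothetical mapping from LDAP attribute names to FOAF properties.
ATTR_TO_FOAF = {"cn": FOAF + "name", "mail": FOAF + "mbox"}


def literal(text):
    """Quote a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % escaped


def entry_to_ntriples(entry):
    """Convert one directory entry (a dict) into a list of N-Triples lines."""
    subject = "<%s%s>" % (BASE, entry["uid"])
    lines = ["%s <%s> <%s> ." % (subject, RDF_TYPE, FOAF + "Person")]
    for attr in sorted(ATTR_TO_FOAF):
        if attr not in entry:
            continue
        if attr == "mail":
            obj = "<mailto:%s>" % entry[attr]  # foaf:mbox takes a mailto: URI
        else:
            obj = literal(entry[attr])
        lines.append("%s <%s> %s ." % (subject, ATTR_TO_FOAF[attr], obj))
    return lines


if __name__ == "__main__":
    sample = {"uid": "jdoe", "cn": "Jane Doe", "mail": "jane.doe@example.org"}
    print("\n".join(entry_to_ntriples(sample)))
```

Because the adapter only reads from the directory, the source system and its owners are left undisturbed.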

Extensible

• Adding new data sources (CiteSeer, PRACA, etc.) is done easily with no end-user or data source disruption.

• Customized views of existing data sources (tweaking the Jspace model file).

• Extend visualization to other facets.

• Everything annotatable by users or groups.

• Open Source, soup to nuts!

No, Really Extensible

• POPS isn’t really an expertise locator; it’s:
– An infrastructure for information integration.
– Generic data services (convert, federate, query, browse) for other apps and services to use.
– A generic client of those services (Jspace).
– Applicable to hundreds of information integration problems at NASA.

• How does this help with NASA’s problem?

The Problem

• Our reliance on data and the information we derive from it touches everything that we do.

• Critical information related to our daily operation is becoming more difficult to find.

• It is difficult to find relevant information that you know is available.

• And it’s virtually impossible to discover critical information that is relevant but unknown.

Our Situation

• The data problem exists within at least 5 dimensions: size, complexity, diversity, rate of growth and trust.

• When we cannot find resources, we often recreate them. When we have trouble integrating information, we often copy it.
– These habits make NASA’s data volume and data integrity problems worse.

• Use-case scenarios and requirements change all the time.
– We cannot anticipate in advance what the next collection of information elements needs to be or for what purpose!!

• NASA needs a strategy to help us be more consistent about our use of, reliance on, and trust in our data, and which would enable information sharing and reuse.

• Our goal is to implement a strategy for organizing our information and data assets so they can be discoverable (by machines and humans) and reusable.

The Challenge

Integrate information from disjoint data sources, ad hoc’ly, to solve customer needs.

Without upsetting delicate info-ecologies (data owners, curators, extant policies & procedures).

Without requiring unrealistic investment in time or money.

The Inspiration

Being a Model Model

What do you need to determine the value / utility of a data model before you use it?

1. Models should be discoverable.
- You or your machine must be able to find it.

2. Models should be written to the applicable standard.
- Easy to incorporate or adopt.

3. Models should indicate:
a) Provenance,
b) Currency,
c) Validation,
d) That they work, function, perform as expected.

Get the Machines Involved!

Who can access the data?

When can they access it? (How often, what duration, etc.)

Why would someone want this data? (What is the data good for?)

Where does the data originate from?

What curation processes?

What is the carrying capacity of the application that supports this data source?

Is there spare capacity for accessing the data source?

What can clients grant/do with the data? What can they not do?
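
These questions could be answered once, in a machine-readable descriptor published alongside each data source, so that clients can check before they ask. A sketch of what such a record might hold follows. Every field and method name is hypothetical; the deck does not prescribe a schema.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescriptor:
    """Hypothetical machine-readable answers to the questions on this slide."""
    name: str
    origin: str                        # where the data originates
    purpose: str                       # what the data is good for
    allowed_roles: set = field(default_factory=set)    # who can access it
    max_requests_per_hour: int = 0     # carrying capacity of the backing app
    current_requests_per_hour: int = 0
    permitted_uses: set = field(default_factory=set)   # what clients may do

    def spare_capacity(self):
        """Requests per hour still available before the source is saturated."""
        return max(self.max_requests_per_hour - self.current_requests_per_hour, 0)

    def may(self, role, use):
        """True if this role may use the data this way and capacity remains."""
        return (role in self.allowed_roles
                and use in self.permitted_uses
                and self.spare_capacity() > 0)


if __name__ == "__main__":
    directory = SourceDescriptor(
        name="directory", origin="X500", purpose="contact lookup",
        allowed_roles={"employee"}, max_requests_per_hour=100,
        current_requests_per_hour=40, permitted_uses={"read"},
    )
    print(directory.may("employee", "read"))
```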

Our Target Customer Experience

• Make information contained within databases and systems across projects and programs discoverable without disruption, without great expense, without loss of original contextual meaning, and without breaches of trust.

• Make attributes of trust, validity, currency and provenance known.

• Make information easier to find and, once it is discovered, make it easier for the next person to find.
– Your experience compiling information benefits the next person’s collection.

Better (or Different) Security

• Some mechanism to manage all of the security and access-control issues.
– Working on a prototype (XACML-DL) that uses W3C’s OWL DL to manage access control policies:
  • Policy verification and consistency.
  • Policy containment.
  • Policy comparison and subsumption.
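
Policy containment asks whether everything one policy permits is also permitted by another. The toy below models a policy as the set of (role, action, resource) requests it permits, so containment reduces to a subset test. It only illustrates the question being asked; it is not how XACML-DL or an OWL DL reasoner works, and all names are hypothetical.

```python
from itertools import product


def expand(roles, actions, resources):
    """Enumerate every (role, action, resource) request a policy permits."""
    return frozenset(product(roles, actions, resources))


def contains(broad, narrow):
    """Containment: every request permitted by `narrow` is permitted by `broad`."""
    return narrow <= broad


def equivalent(a, b):
    """Two policies are equivalent when each contains the other."""
    return contains(a, b) and contains(b, a)


if __name__ == "__main__":
    staff = expand({"employee", "contractor"}, {"read"}, {"directory"})
    employees_only = expand({"employee"}, {"read"}, {"directory"})
    print(contains(staff, employees_only))
```

Enumerating requests only works for tiny, finite policies. A description-logic reasoner answers the same question over policy definitions without listing every case.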

Prepare for the eventual semantic upshift

• A web-based “curation-friendly” catalog of information services.
– A database of databases, data models, and policies built from the fabric of the Web.

• Using current web standards and technologies, computers will be able to negotiate with each other for access and services.

• Customers will be able to browse, query, and search through NASA’s collection of information resources as easily as choosing a hotel or sweater.

• Opportunistic use-pattern matches will assist customers and can make the experiences of others available to you.

• Through your browser (or web service) discover attributes of the information’s currency, provenance, validity, and trust.

• This service’s utility will be enriched by each customer’s use over time and it will grow incrementally, just like the Web.

The Big Picture

Next 18 - 24 Months

• Define the Gold, Silver and Bronze criteria for NASA’s Reference Model Types.
• Build a prototype repository service in collaboration with our communities of practice.
• Assist developers in the construction of initial SLAPs for data and data model discovery & reuse.
• Assist developers in building a proof-of-concept repository for Ontologies and SLAPs.
• Determine best practices and techniques for adding a validation bit.
• Construct go-to standards for new applications and models.
• Participate in key W3C standards groups (e.g. WS-Policy, OWL 1.1).

The Strategy

Given the heterogeneity and diversity of NASA data (e.g., scientific, administrative, operational, financial, analytic), we need a flexible approach to building information integration solutions with sufficient formality to provide cross-system discovery and reuse.

• Establish Information Management standards and mechanisms that promote enriched and ad-hoc information sharing and reuse across NASA data services.

• Define a prospective solution that will augment data management capabilities as newly created data sources are integrated.

• Promote a layered approach, enriching services incrementally, when practical and requirements-driven.

• Enable integration so that the most sought after, useful, and most easily integrated data services (databases, models, web services, etc.) are pushed to the front of the queue.

• Enable discovery and reuse of policy agreements between data providers and customers and between data systems so attributes of confidentiality, integrity, availability and currency are managed uniformly across diverse systems.

• Enable easier query integration across disparate hierarchies by modernizing NASA Information Standards to include a NASA Data Reference Model and definition of “gold, silver and bronze” standards for data and data models.

• Leverage current communities who have demonstrated excellence within their projects and programs.

Acknowledgements

Leadership: Jeanne Holm, Dan Schumacher, Hal Bell, Greg Robinson, the EA Data Team, Ken Griffey, Nitin Niak, many others.

Data sources: Chris Carlson, Calvin Mackey, Robin Land, Tim Sullivan

Code & design: Clark & Parsia, LLC (Mike Grove, Evren Sirin, Bijan Parsia, and Kendall Clark) and Koansys, LLC (Chris Shenton)

R&D, proof of concept: Jim Hendler, m.c. schraefel, Mindlab people

Bibliography

McGuffin & schraefel, A Comparison of Hyperstructures: Zzstructures, mSpaces, and Polyarchies (Proceedings of ACM Conference on Hypertext and Hypermedia)

Clark, Schain, & Parsia: Semantic Web At NASA (XTech 2006)

SWI-Prolog Semantic Web Server

Construction, Collection & Curation Of NASA’s Data Reference Models - Navigating NASA’s Information Space

M. Smith, A. Schain, K. Clark, A. Griffey, and V. Kolovski, Mother, May I? OWL-based Policy Management at NASA

Questions?

Complaints?

Future View

• Develop and deploy new classes of applications that merge data, services, and physical resources into a semantically aware, adaptive environment.

• Create a pervasive collaborative environment by having software “tasking” agents autonomously scan published IT service assets in conference areas, and choreograph them into an interconnected virtual work environment.

• Deploy software agents that can autonomously scan published knowledge and metadata and automatically connect them, or harvest them for information, anticipating users' needs: give the users the data they need when they need it, in a form relevant to their current task.

• Develop agents that can resolve conflicts amongst different data sources and ascertain the trustworthiness of the published data, both within NASA and outside the Agency.

• Develop agents that can learn, anticipate needs, discover relevant data, and enter into transactions, all on behalf of their human users.

Screen Shot

Design Choices

• Java client v. javascript, in-browser client
• Federation v. data consolidation
• HTTP v. SOAP
• RDF v. XML
• SeRQL/RDF v. SQL/RDBMS
• Aggregation v. distributed query
• Via broker v. aggregation via client
• Visual query building v. NLP
• Browsing v. some other “direct query” interface (QBE, forms)
• Browsing v. searching

Mathematics of the who-knows-who relationship visualization

Given a set of people, $P$, and a set of relationships, $R$, that connect people and entities:

We define five types of relationships: 1) same facility, 2) same department, 3) same skill and department, 4) same skill and project, 5) same skill, project, and facility. Call these $r^{1}$ through $r^{5}$.

$r^{i}_{xy}$ indicates a relationship of type $i$ between person $x$ ($p_x$) and person $y$ ($p_y$).

There is a direct connection between users $p_u$ and $p_s$ if there exists an $r^{m}_{us}$.

If there is not a direct connection, we search for a path from $p_u$ to $p_s$ by finding $p_a$ such that there exist $r^{m}_{ua}$ and $r^{n}_{as}$. Then, we add $(p_u, p_s, p_a, r^{m}_{ua}, r^{n}_{as})$ to the graph.

For example, if Alice is the user and Bob is the selected person, we will look for a direct relationship between them, such as if Alice and Bob both work in the same department (i.e. find $r^{m}_{\text{alice},\text{bob}}$).

If the direct relationship does not exist, we look at all the people Alice has relationships with, and check to see if any of them also have relationships with Bob. For example, Alice may work in the same facility as Chuck ($r^{1}_{\text{alice},\text{chuck}}$). Chuck, in turn, may have the same skill and work on the same project as Bob ($r^{4}_{\text{chuck},\text{bob}}$). Chuck then becomes a connection between Alice and Bob. All three people and their relationships are added to the graph.
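
The search described on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the POPS code: relationships are treated as symmetric and stored in a dict from an unordered pair of people to the set of relationship types (1 to 5) they share, and the function names `rels` and `connect` are hypothetical.

```python
def rels(relationships, x, y):
    """Relationship types between x and y; pairs are unordered (assumed symmetric)."""
    return relationships.get(frozenset((x, y)), set())


def connect(relationships, people, user, selected):
    """Return direct edges if any exist, otherwise all two-hop paths via some person a."""
    direct = rels(relationships, user, selected)
    if direct:
        # One (user, selected, type) edge per direct relationship type.
        return [(user, selected, m) for m in sorted(direct)]
    paths = []
    for a in sorted(people):
        if a in (user, selected):
            continue
        for m in sorted(rels(relationships, user, a)):
            for n in sorted(rels(relationships, a, selected)):
                # Mirrors the slide's tuple: (user, selected, intermediary, type m, type n).
                paths.append((user, selected, a, m, n))
    return paths


if __name__ == "__main__":
    people = {"alice", "bob", "chuck"}
    relationships = {
        frozenset(("alice", "chuck")): {1},  # same facility
        frozenset(("chuck", "bob")): {4},    # same skill and project
    }
    print(connect(relationships, people, "alice", "bob"))
```

With the Alice, Bob, and Chuck data from the example, the only result is the path through Chuck with relationship types 1 and 4.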