On our way to Information Overload?


Or to prevent it by appropriate use of technology?

C19881 0.99, C92992 0.67, C02002 0.66, C99229 0.44, C00392 0.33, C93939 0.21

consolidated knowledge

Collexis Fingerprints (CFPs)

People: medical researchers around the world

Activities in electronic text, such as projects, publications and Medline abstracts...

Multilingual Thesaurus Indexer: matches keywords, translates them to identical numbers and ranks them by their relevance

English: Disease #12674, Malaria #24530, Hospital #19994, ...

French: Maladie #12674, Paludisme #24530, Hôpital #19994, ...

Spanish: Enfermedad #12674, Paludismo #24530, Hospital #19994, ...
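A minimal Python sketch of the indexing step, assuming a tiny hypothetical thesaurus built only from the three concepts above. The relevance score (term frequency scaled so the top concept gets 1.0) is an assumption for illustration, not the weighting the Collexis indexer actually uses.

```python
import re
from collections import Counter

# Hypothetical miniature thesaurus from the slide's examples:
# terms in different languages map to the same concept number.
THESAURUS = {
    "disease": 12674, "maladie": 12674, "enfermedad": 12674,
    "malaria": 24530, "paludisme": 24530, "paludismo": 24530,
    "hospital": 19994, "hôpital": 19994,
}


def index_text(text: str) -> dict[int, float]:
    """Translate known words to concept numbers and rank them by relevance.

    Relevance is plain term frequency scaled so the top concept gets 1.0
    (an assumption made for this sketch).
    """
    words = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "hôpital" stays whole
    counts = Counter(THESAURUS[w] for w in words if w in THESAURUS)
    if not counts:
        return {}
    top = max(counts.values())
    return {concept: n / top for concept, n in counts.most_common()}


print(index_text("Malaria est une maladie. Le paludisme est traité à l'hôpital."))
print(index_text("Malaria patients in hospital; malaria is a disease."))
```

French and English text on the same topic end up with the same concept numbers, which is what makes cross-language matching possible.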

The Common Language: each activity is represented as a set of keyword numbers ranked by their relevance

#4256: 1.0, #3627: 0.8, #19994: 0.5, #28746: 0.3, #32874: 0.1, #32874: 0.1, #32874: 0.1

#14325: 1.0, #3627: 0.8, #19994: 0.5, #28746: 0.3, #32874: 0.1, #32874: 0.1, #32874: 0.1

#85643: 1.0, #3627: 0.8, #19994: 0.5, #28746: 0.3, #32874: 0.1, #32874: 0.1, #32874: 0.1

#17345: 1.0, #3627: 0.8, #19994: 0.5, #28746: 0.3, #32874: 0.1, #1c8456: 0.1, #00356: 0.1

“Collexion” of activities
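A sketch of how such a collexion might be held in memory, assuming each fingerprint is a mapping from keyword number to weight. The activity IDs are hypothetical; the weights come from the fingerprints listed above. Collapsing repeated keyword numbers (the slide lists #32874 several times) by keeping the highest weight is an assumption.

```python
from typing import Optional

# A fingerprint maps a concept (keyword) number to its relevance weight.
Fingerprint = dict[int, float]


def make_fingerprint(pairs: list[tuple[int, float]]) -> Fingerprint:
    """Build a fingerprint from (concept number, weight) pairs.

    Repeated concept numbers are collapsed, keeping the highest weight
    (an assumption for this sketch).
    """
    fp: Fingerprint = {}
    for concept, weight in pairs:
        fp[concept] = max(weight, fp.get(concept, 0.0))
    return fp


def ranked(fp: Fingerprint, k: Optional[int] = None) -> list[tuple[int, float]]:
    """Return (concept, weight) pairs by decreasing weight, optionally only the top k."""
    items = sorted(fp.items(), key=lambda item: item[1], reverse=True)
    return items if k is None else items[:k]


# Hypothetical collexion keyed by made-up activity IDs.
collexion: dict[str, Fingerprint] = {
    "activity-1": make_fingerprint([(4256, 1.0), (3627, 0.8), (19994, 0.5),
                                    (28746, 0.3), (32874, 0.1), (32874, 0.1), (32874, 0.1)]),
    "activity-2": make_fingerprint([(14325, 1.0), (3627, 0.8), (19994, 0.5),
                                    (28746, 0.3), (32874, 0.1)]),
}

print(ranked(collexion["activity-1"], k=3))  # [(4256, 1.0), (3627, 0.8), (19994, 0.5)]
```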

You: your activity as text

Submitted and indexed to keyword numbers: #17345: 1.0, #3627: 0.8, #19994: 0.5, #28746: 0.3, #32874: 0.1

Find similar activities and the people behind them
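The matching step can be sketched as ranking the collexion by similarity to your fingerprint. Cosine similarity is an assumption here; the presentation does not say which measure Collexis uses. The activities and their owners are hypothetical. "activity-3" reuses the first entries of the last fingerprint above, which matches yours.

```python
import math

# A fingerprint maps a concept (keyword) number to its relevance weight.
Fingerprint = dict[int, float]


def cosine(a: Fingerprint, b: Fingerprint) -> float:
    """Cosine similarity between two sparse concept-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query: Fingerprint, collexion: dict[str, Fingerprint],
                 owners: dict[str, str], top_n: int = 3) -> list[tuple[str, str, float]]:
    """Rank activities by similarity to the query and report the person behind each."""
    scored = sorted(((cosine(query, fp), name) for name, fp in collexion.items()),
                    reverse=True)
    return [(name, owners[name], round(score, 3)) for score, name in scored[:top_n]]


# Your activity, indexed to keyword numbers (weights from the slide).
you = {17345: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1}

# Hypothetical collexion and owners, for illustration only.
collexion = {
    "activity-1": {4256: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1},
    "activity-2": {85643: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1},
    "activity-3": {17345: 1.0, 3627: 0.8, 19994: 0.5, 28746: 0.3, 32874: 0.1},
}
owners = {"activity-1": "Researcher A", "activity-2": "Researcher B",
          "activity-3": "Researcher C"}

for name, owner, score in find_similar(you, collexion, owners):
    print(name, owner, score)
```

Because only keyword numbers are compared, the texts behind the activities may be written in different languages.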

Cross-language networking

BIOSEMANTICS

• “Cellese”: the language that cells use to communicate internally and externally.

• The Molecular Language and its biological MEANING

• The Group

– Jan Kors PhD
– Erik van Mulligen PhD
– Bob Schijvenaars PhD
– Marc Weeber PhD
– Christiaan v.d. Eyck MSc
– Rob Jelier PhD
– Barend Mons PhD
– Johan van der Lei PhD

SERENDIP

Beyond Publication: semantic meta-analysis of massive data and information sources for discovery

Bsik 2003

A consortium to combine state-of-the-art information and knowledge mining technologies

To support:

•Thesaurus and ontology enrichment

•Disambiguation of concepts

•Semantic meta-analysis of massive information

To enable:

•Information-based discovery

•Evidence-based policy making

Thesaurus and Ontology Enrichment

• New concepts
• Synonyms
• Homonyms
• Genes, proteins
• Pictures
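One simple way to surface potential new concepts from unexplained text is to collect recurring words that no thesaurus term accounts for and pass them on for validation. The sketch assumes single-word candidates, a hypothetical three-term thesaurus, a small stopword list and invented example sentences; real enrichment would also need multi-word terms and proper NLP.

```python
import re
from collections import Counter

KNOWN_TERMS = {"malaria", "hospital", "disease"}  # hypothetical thesaurus entries
STOPWORDS = {"the", "a", "of", "in", "and", "is", "was", "with", "for", "to"}


def potential_concepts(texts: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return recurring words not explained by the thesaurus, as candidates for validation."""
    counts: Counter[str] = Counter()
    for text in texts:
        for word in re.findall(r"[a-z][a-z0-9-]+", text.lower()):
            if word not in KNOWN_TERMS and word not in STOPWORDS:
                counts[word] += 1
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


texts = [
    "Artesunate was given in hospital for malaria.",
    "Artesunate resistance in malaria is rising.",
    "The disease responded to artesunate.",
]
print(potential_concepts(texts))  # [('artesunate', 3)]
```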

[Diagram: thesaurus enrichment workflow. Labels: 1 Fingerprints (known concepts), 2 NLP, 3 Validation, 4 FUA; free text, unexplained text (XML), potential concepts; thesauri: MeSH, HUGO, SwissProt, SAGE, others.]

Partners: E-BioSci, EMBO, Elsevier, TNO, LUMC, HUGO NC, Genebio, AMC, EUR, UVA

SERENDIP

Too much to read: major trends foreseen

• From reading to consulting
• From reading to meta-analysis
• From text to knowledge

Representations

C19881 0.99, C92992 0.67, C02002 0.66, C99229 0.44, C00392 0.33, C93939 0.21

Semantic types

Co-occurrence data
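As an illustration of what co-occurrence data could look like, the sketch counts, for every pair of concepts, how many documents are indexed with both. Counting at document level is an assumption; the presentation does not specify the co-occurrence window. The concept IDs are taken from the fingerprint above and the documents are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents: list[set[str]]) -> Counter:
    """Count, for each unordered concept pair, the number of documents containing both."""
    counts: Counter = Counter()
    for concepts in documents:
        # Sorting makes each pair's key order-independent, e.g. ("C02002", "C19881").
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return counts


documents = [
    {"C19881", "C92992", "C02002"},
    {"C19881", "C92992"},
    {"C02002", "C99229"},
]
pairs = cooccurrence(documents)
print(pairs.most_common(1))  # [(('C19881', 'C92992'), 2)]
```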

The first step: to the Conceptual Semantic Network

[Figure: concept fingerprints for the ambiguous abbreviation AGS1. SwissProt: Activator of G-protein signaling 1 (AGS1), with concepts such as G-protein coupled receptors, G-substrate and Xenopus oocyte. OMIM *225750, AICARDI-GOUTIERES SYNDROME 1 (AGS1), with concepts such as calcium deposition, pleocytosis, basal ganglia, encephalopathy, cerebrospinal fluid, microcephaly, Fahr disease, Linkage (Genetics) and Lod Score.]

Fingerprinting

disambiguation
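The disambiguation idea can be sketched by comparing the concepts found around an ambiguous term with the fingerprint of each candidate meaning and picking the best match. The sense fingerprints here are simplified, unweighted sets of concepts taken from the AGS1 example; plain overlap counting stands in for whatever scoring the group actually used.

```python
from typing import Optional

# Hypothetical, unweighted sense fingerprints for the abbreviation "AGS1".
SENSES: dict[str, set[str]] = {
    "OMIM *225750 Aicardi-Goutieres syndrome 1": {
        "calcium deposition", "pleocytosis", "basal ganglia", "encephalopathy",
        "cerebrospinal fluid", "microcephaly", "lod score",
    },
    "SwissProt Activator of G-protein signaling 1": {
        "g-protein coupled receptors", "g-substrate", "xenopus oocyte",
    },
}


def disambiguate(context: set[str],
                 senses: dict[str, set[str]] = SENSES) -> tuple[Optional[str], dict[str, int]]:
    """Pick the sense whose fingerprint shares the most concepts with the context.

    Returns (None, scores) when no sense shares any concept with the context.
    """
    scores = {sense: len(fp & context) for sense, fp in senses.items()}
    best = max(scores, key=scores.__getitem__)
    return (best if scores[best] > 0 else None), scores


sense, scores = disambiguate({"microcephaly", "calcium deposition", "basal ganglia", "family"})
print(sense, scores)
```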

ACS

META-ANALYSIS

Applications

• Cross-language, jargon and cross-system matching (implemented): www.sharingpoint.shared-global.org

• Information-based discovery (Research)

• Community building (Experts, Policy Making)

• Trendwatching and Indicators (Policy Making)

Seed-Term based Conceptual Semantic Networks

??

Clustering of genes on-the-fly
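The slides do not say how genes are clustered on the fly. One simple possibility, sketched here under that assumption, is to link two genes when the cosine similarity of their concept fingerprints reaches a threshold and to take the connected components as clusters. Gene names, concept numbers and the 0.5 threshold are hypothetical, and selecting the genes from a seed term is left out.

```python
import math


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(fingerprints: dict[str, dict[int, float]], threshold: float = 0.5) -> list[list[str]]:
    """Link genes whose fingerprints have cosine >= threshold; return connected components."""
    names = list(fingerprints)
    neighbours: dict[str, set[str]] = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(fingerprints[a], fingerprints[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects every gene reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


genes = {
    "GENE_A": {1: 1.0, 2: 0.8},
    "GENE_B": {1: 0.9, 2: 0.7, 3: 0.1},
    "GENE_C": {7: 1.0, 8: 0.6},
    "GENE_D": {8: 1.0, 7: 0.5},
}
print(cluster(genes))  # [['GENE_A', 'GENE_B'], ['GENE_C', 'GENE_D']]
```

Connected components keep the sketch short; a stricter method such as hierarchical clustering would avoid chaining unrelated genes through a single shared neighbour.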

Predicting new knowledge?

III: Distribution over distance categories of concept pairs without co-occurrence in the learning set.

IV: Distance categories of concept pairs related to the probability that there is no explicit relationship or co-occurrence in Medline (zero ratio). A ratio of 0 means that an automatic Medline query for the concept pair, with "AND" in between, returns 0 hits.
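A sketch of the per-category computation, assuming the zero ratio of a distance category is the fraction of its concept pairs whose "A AND B" Medline query returns no hits. That reading of the legend is an assumption. The hit counts are supplied as input rather than fetched from Medline, and the example numbers are invented.

```python
from collections import defaultdict


def zero_ratio_by_category(pairs: list[tuple[int, int]]) -> dict[int, float]:
    """Fraction of concept pairs with zero Medline hits, per distance category.

    pairs: (distance_category, hit_count) for each concept pair, where
    hit_count is the number of hits for the query "A AND B".
    """
    totals: dict[int, int] = defaultdict(int)
    zeros: dict[int, int] = defaultdict(int)
    for category, hits in pairs:
        totals[category] += 1
        if hits == 0:
            zeros[category] += 1
    return {cat: zeros[cat] / totals[cat] for cat in sorted(totals)}


example = [(1, 0), (1, 12), (2, 0), (2, 0), (2, 3), (3, 0)]
print(zero_ratio_by_category(example))
```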

New drug discovery?

Semantic Filtering

Knowledge Maps: Nature Biotechnology Map

Knowledge Maps: Medline Bioterrorism Map 1997

Knowledge Maps: Medline Bioterrorism Map 2001

Private Research

DC

Public

E-BioSci, Pharma etc.

ORIEL, SERENDIP, FP6 etc.

I-Research, Ministries, WHO, FAO etc.

SHARED, BIREME/VHL, EDCTP, Oxford initiative etc.
