Defence and Security
© EADS 2009 – All rights reserved. Patrick GIROUX - EADS DS / SDFR1/IPCC. Page 1
The WebLab platform
Prepared by Patrick GIROUX and Arnaud SAVAL, EADS Defence and Security
System Design Centre, Val-de-Reuil, France
Overview
• Who are we?
  – Our department in EADS
  – Our activities
• The WebLab platform
  – Goals
  – Principles
  – Basic concepts
  – Architecture
  – Implementation examples
• Questions
IPCC at a glance
• Information Processing Control & Cognition
• Part of EADS Defence & Security (DS), System Design Centre (SDC)
• SDFR1/IPCC: competence centre specialised in information processing
• Carries out R&T and R&D activities in the following fields:
  – Multimedia mining: understanding of multimedia documents (text, audio, image, video)
  – Data fusion: multi-sensor data fusion applied to surveillance, for the military and civil domains
• Team size: 21 people, including 7 PhD research engineers
• Location: Val-de-Reuil, Normandy, 100 km west of Paris
Media-mining
Context: a vast amount of digital information in today’s media age (Internet, TV, radio, digital telephony, DVD, etc.).
An ever-increasing amount of data is available as digital content:
– text
– audio
– image
– video
Objective: develop advanced semantic tools enabling the automatic or semi-automatic exploitation of unstructured data.
Media mining: potential intelligence targets
Public media such as the Internet, the written press, TV, radio or CD-ROMs can provide significant information:
– different kinds of data: geopolitical, social, cultural, economic, environmental, etc.
– identification of important events: political, military, terrorist, criminal, etc.
– identification of networks
– tracking the evolution of a crisis
– analysis of international situations
– detection of weak signals (indicators, alarms, etc.)
Media mining functions
Mastering the complete chain of information:
– Information retrieval
– Information acquisition
– Information filtering
– Information extraction
– Synthesis & Alert
– Visualisation
Our competencies in media mining
Multi-disciplinary approach combining:
• Statistical analysis
• Linguistic analysis
• Semantic analysis
• Learning techniques
Selection of the best approach depending on the objective.
Free to choose any COTS.
Positioning as an integrator and expert:
• Integration of COTS or components developed by external laboratories
• Development of specific EADS components
• Patents and scientific publications
What is WebLab?
An integration platform
– based on recognised standards (SOA, Web Services, Semantic Web)
– allowing the integration of a selection of software components (search engine, information extraction, translation, knowledge management, graphical representation using maps/networks, etc.)
– allowing the selected components to interoperate.
A set of media mining services for multimedia documents, dedicated to technology/economic watch and OSINT (open-source intelligence) applications.
Integration eased by conformance to standards
An open platform deliberately based on Web Services and Semantic Web standards, which are recognised and adopted by almost all software vendors:
– URI, UTF-8
– XML Namespaces, XML Schema, XPath, XQuery
– RDF, RDF Schema, OWL, SPARQL
– WSDL, SOAP, BPEL
– SOA, JBI, ESB
– Portal/Portlets (JSR 168)
An “open source” technical foundation
The WebLab platform includes an open source technical foundation called WebLab Core:
http://www.weblab-project.org
• developed by EADS and used by more than 30 partners from France and Europe
• used in several projects in various application domains (WebContent, Vitalas, eWokHub & others … )
• recently chosen for new projects and proposals (Virtuoso, SAIMSI, etc.)
• manages all the WebLab services resulting from the integration of various components
The WebLab Platform
The WebLab Logical Architecture
[Figure: WebLab logical architecture. Layers shown: Access, Process, Services, Components, Data. Elements shown: Portal, Business Applications, Business Process Editor, Orchestration Engine, Service Bus (Messaging & Distribution), Acquisition Services, Processing Services, Diffusion Services, Technical Services, Other Services (evaluation), Acquisition, Processing, Diffusion, Technical and Other Components, Repository, Monitoring & Supervision, Security & QoS, Infrastructure.]
Basic principle of the WebLab
A service:
• processes a document or a part of a document
• is independent and shall not invoke other services
• can enrich the document with new information that will be exploited by the other services (processing chain).
[Figure: animated example of a processing chain applied to doc.pdf. The WebCrawling service produces a document resource carrying an annotation with Dublin Core metadata (dc:source, dc:title, dc:author); the Normalisation service adds a text media unit holding the extracted text ("blablablibli"); the Segmentation service outputs a composed unit made of two text media units ("blabla" and "blibli").]
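The service principle above can be sketched in Python. This is a minimal illustration under stated assumptions, not WebLab's actual API: the classes Resource and MediaUnit, the functions web_crawling, normalisation, segmentation and run_chain, and the word-splitting segmentation rule are all hypothetical. Each service only reads and enriches the resource it receives, and the chain order is decided outside the services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaUnit:
    kind: str                                    # "text" or "composed" in this sketch
    content: str = ""
    children: List["MediaUnit"] = field(default_factory=list)


@dataclass
class Resource:
    uri: str
    annotations: Dict[str, str] = field(default_factory=dict)  # simplified metadata
    units: List[MediaUnit] = field(default_factory=list)


# A service receives a resource and returns it, possibly enriched.
Service = Callable[[Resource], Resource]


def web_crawling(res: Resource) -> Resource:
    # Hypothetical crawler: records provenance as a dc:source-like annotation.
    res.annotations["dc:source"] = res.uri
    return res


def normalisation(res: Resource) -> Resource:
    # Hypothetical normaliser: stands in for extracting the text of doc.pdf.
    res.units.append(MediaUnit("text", "blabla blibli"))
    return res


def segmentation(res: Resource) -> Resource:
    # Toy rule: each text unit becomes a composed unit holding one text unit per word.
    segmented = []
    for unit in res.units:
        if unit.kind == "text":
            parts = [MediaUnit("text", word) for word in unit.content.split()]
            segmented.append(MediaUnit("composed", children=parts))
        else:
            segmented.append(unit)
    res.units = segmented
    return res


def run_chain(res: Resource, chain: List[Service]) -> Resource:
    # The orchestrator decides the order; services never call each other.
    for service in chain:
        res = service(res)
    return res


if __name__ == "__main__":
    doc = run_chain(Resource("file://doc.pdf"), [web_crawling, normalisation, segmentation])
    print(doc.annotations)                              # {'dc:source': 'file://doc.pdf'}
    print([u.content for u in doc.units[0].children])   # ['blabla', 'blibli']
```

Because no service calls another, any of them can be replaced, reordered or removed by changing only the list passed to run_chain.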
Annotation system
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://my/name/space/">
  <item rdf:about="file://webConten/paper">
    <dc:date>30-11-2008</dc:date>
    <dc:title>Warehousing Web Resources with the WebContent Platform</dc:title>
    <dc:language>en</dc:language>
    <dc:publisher>WWW2009</dc:publisher>
    <dc:type>Research paper</dc:type>
    <dc:format>.pdf</dc:format>
  </item>
</rdf:RDF>
Separation between contents and annotations
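To illustrate keeping annotations apart from content, the sketch below reads the Dublin Core properties out of the annotation on this slide using Python's standard xml.etree.ElementTree. The helper name dublin_core_fields is hypothetical, and the parser only handles the simple "resource element with property children" form used here; it is not a general RDF/XML parser, for which a dedicated RDF library would be the safer choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The annotation from the slide, stored separately from the document content it describes.
ANNOTATION = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://my/name/space/">
  <item rdf:about="file://webConten/paper">
    <dc:date>30-11-2008</dc:date>
    <dc:title>Warehousing Web Resources with the WebContent Platform</dc:title>
    <dc:language>en</dc:language>
    <dc:publisher>WWW2009</dc:publisher>
    <dc:type>Research paper</dc:type>
    <dc:format>.pdf</dc:format>
  </item>
</rdf:RDF>"""


def dublin_core_fields(rdf_xml: str) -> dict:
    """Map each described resource (rdf:about) to its Dublin Core properties."""
    root = ET.fromstring(rdf_xml)
    dc_prefix = "{" + DC_NS + "}"          # ElementTree writes tags as {namespace}local
    result = {}
    for item in root:
        about = item.get("{" + RDF_NS + "}about")
        props = {}
        for child in item:
            if child.tag.startswith(dc_prefix):
                props[child.tag[len(dc_prefix):]] = (child.text or "").strip()
        result[about] = props
    return result


if __name__ == "__main__":
    fields = dublin_core_fields(ANNOTATION)["file://webConten/paper"]
    print(fields["title"], fields["date"])
```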
Annotation system
Splitting a document into media units
Modelling objects used by media-mining services
[Figure: class diagram with Resource, Annotation (a Resource carries 0..* Annotations), PieceOfKnowledge, Document, ComposedUnit and MediaUnit; MediaUnit kinds are Text and BinaryMediaUnit (Video, Audio, Image).]
The WebLab exchange model
[Figure: UML class diagram of the exchange model. Classes and attributes shown:
– Resource (+uri: URI), carrying 0..* Annotation; Annotation; PieceOfKnowledge (data: RDF/XML)
– Document, MediaUnit, ComposedUnit (+synchronised: boolean)
– Text (<<optional>> +content: String), BinaryMediaUnit (<<optional>> +content: Binary), Video, Audio, Image
– Segments: Segment, LinearSegment (+start: int, +end: int), TemporalSegment (+start: int, +end: int), SpatialSegment with 2..* Coordinate (+x: int, +y: int)
– Content (+offset: int, +size: int), TextContent (+data: String), BinaryContent (+data: Binary)
– LowLevelDescriptor (+key: String), Feature (<<optional>> +label: String, +value: List<any>)
– Queries: Query, ElementaryQuery (weight: float = 1.0), StringQuery (+request: String), SimilarityQuery, ComposedQuery (+boolOperator: {AND, OR, NOT})
– Service, Ontology, User, UsageContext, ResourceCollection, ResultSet]
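The query part of the exchange model lends itself to a small evaluator. The sketch below is an assumption-laden illustration, not WebLab's API: it assumes case-insensitive substring matching for StringQuery, reads NOT as "documents matching none of the sub-queries", carries the ElementaryQuery weight (default 1.0) without using it, and leaves SimilarityQuery out. The Python names (BoolOperator, StringQuery, ComposedQuery, evaluate) mirror the diagram but are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Union


class BoolOperator(Enum):
    AND = "AND"
    OR = "OR"
    NOT = "NOT"


@dataclass
class StringQuery:
    request: str
    weight: float = 1.0   # ElementaryQuery.weight; carried but unused by boolean matching


@dataclass
class ComposedQuery:
    bool_operator: BoolOperator
    queries: List["Query"] = field(default_factory=list)  # 1..* sub-queries


Query = Union[StringQuery, ComposedQuery]


def evaluate(query: Query, corpus: Dict[str, str]) -> Set[str]:
    """Return the ids of the documents in corpus that satisfy query."""
    if isinstance(query, StringQuery):
        term = query.request.lower()
        return {doc_id for doc_id, text in corpus.items() if term in text.lower()}
    if not query.queries:
        raise ValueError("a ComposedQuery needs at least one sub-query")
    results = [evaluate(sub, corpus) for sub in query.queries]
    if query.bool_operator is BoolOperator.AND:
        return set.intersection(*results)
    if query.bool_operator is BoolOperator.OR:
        return set.union(*results)
    # NOT (assumed semantics): documents matching none of the sub-queries.
    return set(corpus) - set.union(*results)


if __name__ == "__main__":
    corpus = {"d1": "Iran and uranium", "d2": "uranium market", "d3": "weather report"}
    q = ComposedQuery(BoolOperator.AND, [
        StringQuery("uranium"),
        ComposedQuery(BoolOperator.NOT, [StringQuery("market")]),
    ])
    print(evaluate(q, corpus))  # {'d1'}
```

Nesting ComposedQuery inside ComposedQuery is what lets the 1..* association express arbitrary boolean trees.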
Classification of WebLab services independent of any COTS
All boxes are <<service>> elements; generic services and their specialisations:
– Acquisition: Web data; paper data; TV or radio broadcast data
– Normalisation: text, audio, video, image
– Cleaning: text, audio, image, video
– Pre-processing: text, audio, video
– Writing recognition: typewritten text, handwritten text
– Audio transcription
– Classification: textual data, image data
– Information extraction: from textual data, image data, audio data
– Summarisation: text summary, video summary
– Translation
– Indexing and search: textual queries and queries on annotations; image and multimodal queries
– Alert
– Business service
Example of WebLab’isation of COTS
[Figure: generic services and their operations, mapped onto COTS tools such as ikona:
– Information extraction from textual data: +definirModele() (define model), +extraireEntités() (extract entities), +extraireRelations() (extract relations)
– Information extraction from image data: +reconnaitreTexte() (recognise text), +reconnaitreVisage() (recognise face), +reconnaitreForme() (recognise shape)
– Image data classification
– Acquisition: +configurer() (configure), +collecter() (collect); specialised as Web data acquisition and TV or radio broadcast data acquisition, the latter adding +enregistrer() (record)]
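The point of WebLab’isation is that callers depend on a generic service, not on a vendor's API. A minimal adapter sketch follows: TextInformationExtraction stands in for the generic information-extraction service (extract_entities is an English rendering of extraireEntités()), while VendorNerEngine, VendorNerAdapter and the label mapping are hypothetical and do not describe any real COTS product.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (surface form, generic entity type)


class TextInformationExtraction(ABC):
    """Generic, COTS-independent service; extract_entities renders extraireEntités()."""

    @abstractmethod
    def extract_entities(self, text: str) -> List[Entity]:
        """Return the named entities found in text."""


class VendorNerEngine:
    """Hypothetical third-party engine exposing its own, incompatible API."""

    def __init__(self, lexicon: Dict[str, str]) -> None:
        self._lexicon = lexicon  # surface form -> vendor-specific label

    def run(self, document: str) -> List[dict]:
        hits = []
        for token in document.split():
            word = token.strip(".,;:!?")
            if word in self._lexicon:
                hits.append({"form": word, "label": self._lexicon[word]})
        return hits


class VendorNerAdapter(TextInformationExtraction):
    """Wraps the vendor engine behind the generic service interface."""

    # Vendor label -> generic type; this mapping is an assumption for the sketch.
    LABELS = {"PER": "Person", "LOC": "Place"}

    def __init__(self, engine: VendorNerEngine) -> None:
        self._engine = engine

    def extract_entities(self, text: str) -> List[Entity]:
        return [(hit["form"], self.LABELS.get(hit["label"], "Other"))
                for hit in self._engine.run(text)]


if __name__ == "__main__":
    service: TextInformationExtraction = VendorNerAdapter(
        VendorNerEngine({"Kim": "PER", "Iran": "LOC", "Acme": "ORG"}))
    print(service.extract_entities("Kim visited Iran with Acme staff."))
```

Swapping to another vendor would only require a new adapter class; code typed against TextInformationExtraction stays unchanged.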
Components already integrated as WebLab services
• Exalead
• Sinequa
• Pertimm
• Temis – Luxid
• Nstein – TME
• Mondeca – ITM
• SailLabs
• Systran
• Analyst’s Notebook / iBase
• Digimind
• Autonomy
• IMEDIA
• …
• HTTrack
• MythTV
• Tyka
• Apache Lucene / Solr
• GATE
• Google Maps
• INRIA Edelweiss – Corese
• Prefuse
• …
And many more to come
WebLab services
Non-exhaustive list of services available in the WebLab platform:
• Data acquisition (Web, databases, folders)
• Normalisation of textual content
• Language identification (more than 30 languages)
• Speech-to-text transcription
• Manual annotation and rating
• Named entity extraction
• Semantic analysis and relation extraction
• Thematic categorisation and clustering
• Automatic summarisation
• Indexing
• Full-text search (keywords, annotations, boolean, etc.)
• Semantic search
• Information mapping
Generic interfaces for services
Generic Interface <<interface>>
Analyser <<interface>>
  +process(uc: UsageContext, res: Resource): Resource
Configurable <<interface>>
  +configure(uc: UsageContext, configuration: PieceOfKnowledge)
  +resetConfiguration(uc: UsageContext)
Trainable <<interface>>
  +addTrainResource(uc: UsageContext, res: Resource)
  +train(uc: UsageContext)
  +resetTrainedModel(uc: UsageContext)
ContentProvider <<interface>>
  +getContent(contentId: URI, offset: int, limit: int): Content
ContentConsumer <<interface>>
  +setContent(content: Content): URI
ResourceContainer <<interface>>
  +saveResource(uc: UsageContext, res: Resource): URI
  +getResource(uc: UsageContext, resourceId: URI): Resource
ReportProvider <<interface>>
  +addInformation(uc: UsageContext, res: Resource)
  +buildReport(uc: UsageContext): Resource
Indexer <<interface>>
  +index(uc: UsageContext, res: Resource)
Searcher <<interface>>
  +search(uc: UsageContext, q: Query, offset: int, limit: int): ResultSet
SourceReader <<interface>>
  +getResource(uc: UsageContext, offset: int, limit: int): ResourceCollection
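A Python rendering of three of these interfaces with typing.Protocol, plus toy implementations, may help show how one component can satisfy several interfaces (here Indexer and Searcher) and how callers depend only on the interfaces. This is a sketch under simplifying assumptions: Resource holds only a URI and plain text, the query is a plain string rather than a Query object, ResultSet is a list of URIs, and LowerCaser, InMemoryIndex and analyse_and_index are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class UsageContext:
    name: str   # simplified: a label identifying the calling context


@dataclass
class Resource:
    uri: str
    text: str = ""   # simplified: real resources carry media units and annotations


class Analyser(Protocol):
    def process(self, uc: UsageContext, res: Resource) -> Resource: ...


class Indexer(Protocol):
    def index(self, uc: UsageContext, res: Resource) -> None: ...


class Searcher(Protocol):
    def search(self, uc: UsageContext, q: str, offset: int, limit: int) -> List[str]: ...


class LowerCaser:
    """Toy Analyser: returns a new resource whose text is lower-cased."""

    def process(self, uc: UsageContext, res: Resource) -> Resource:
        return Resource(res.uri, res.text.lower())


class InMemoryIndex:
    """Toy component satisfying both Indexer and Searcher, one index per usage context."""

    def __init__(self) -> None:
        self._docs: Dict[UsageContext, Dict[str, str]] = {}

    def index(self, uc: UsageContext, res: Resource) -> None:
        self._docs.setdefault(uc, {})[res.uri] = res.text

    def search(self, uc: UsageContext, q: str, offset: int, limit: int) -> List[str]:
        matches = sorted(uri for uri, text in self._docs.get(uc, {}).items()
                         if q.lower() in text.lower())
        return matches[offset:offset + limit]   # offset/limit page through the results


def analyse_and_index(analyser: Analyser, indexer: Indexer,
                      uc: UsageContext, resources: List[Resource]) -> None:
    # Callers depend only on the interfaces, not on the concrete classes.
    for res in resources:
        indexer.index(uc, analyser.process(uc, res))


if __name__ == "__main__":
    uc = UsageContext("demo")
    idx = InMemoryIndex()
    analyse_and_index(LowerCaser(), idx, uc, [
        Resource("doc:2", "Enriched URANIUM"),
        Resource("doc:1", "Uranium talks"),
        Resource("doc:3", "Weather report"),
    ])
    print(idx.search(uc, "uranium", 0, 10))  # ['doc:1', 'doc:2']
```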
An example of a processing chain: from TV to annotations
[Figure: processing stages shown: Video segmentation, Audio transcription, Audio refinement, Information extraction. Data flowing through the chain: VIDEO, AUDIO, VOCAL AUDIO, TEXT, ANNOTATED TEXT.]
Sample transcript: "iran trying to established new facts on the ground there's a couple different timelines even mind that the time line of most intense discussions how long before iran is able to produce enough highly enriched uranium for a nuclear weapon"
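Chaining services only works if each stage accepts what the previous one produces, and that can be checked before anything runs. In the sketch below, the ordering of the stages and the input/output type assigned to each one are an assumption read off the slide's labels (VIDEO, AUDIO, VOCAL AUDIO, TEXT, ANNOTATED TEXT); Stage and check_chain are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str   # data type expected as input
    produces: str   # data type emitted as output


# Assumed order and typing of the slide's stages; the slide does not spell these out.
TV_CHAIN = [
    Stage("Video segmentation", "VIDEO", "AUDIO"),
    Stage("Audio refinement", "AUDIO", "VOCAL AUDIO"),
    Stage("Audio transcription", "VOCAL AUDIO", "TEXT"),
    Stage("Information extraction", "TEXT", "ANNOTATED TEXT"),
]


def check_chain(chain: List[Stage], source: str) -> str:
    """Check that each stage accepts what the previous one produces; return the final type."""
    current = source
    for stage in chain:
        if stage.consumes != current:
            raise TypeError(f"{stage.name} expects {stage.consumes}, got {current}")
        current = stage.produces
    return current


if __name__ == "__main__":
    print(check_chain(TV_CHAIN, "VIDEO"))  # ANNOTATED TEXT
    try:
        check_chain([TV_CHAIN[2], TV_CHAIN[1]], "VOCAL AUDIO")
    except TypeError as err:
        print(err)  # Audio refinement expects AUDIO, got TEXT
```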
VITALAS architecture
Using portlet technology to build specific applications from generic HCI modules
A WebLab application can be implemented by assembling/orchestrating a set of existing portlets.
A portlet is:
– composable in a page
– integrable in an application
– displayable as an information block
– standardised
Example of a WebLab interface using portlet technology
Thanks for your attention …
Questions?
Demo application: http://forge.ow2.org/projects/weblab
Contacts: [email protected]@eads.com