II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh - Linguamatics, UK)

The Road to Federated Text Mining: Are we there yet?

II-SDV 2014

Guy Singh

What is federated search?

“Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines participating in the federation”

- Wikipedia
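The definition above can be illustrated with a minimal sketch, assuming three small in-memory collections stand in for independent search engines. The names `RESOURCES`, `search_resource` and `federated_search` are hypothetical and chosen only for this illustration; a real federation would call remote services instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent search engines: each resource
# name maps to a small in-memory document collection (toy data).
RESOURCES = {
    "literature": ["BRCA1 mutations in breast cancer", "Aspirin dosing study"],
    "patents": ["Method for detecting BRCA1 variants"],
    "trials": ["Phase II trial of aspirin in adults"],
}


def search_resource(name, query):
    """Search one resource and return (resource, document) pairs that match."""
    return [(name, doc) for doc in RESOURCES[name] if query.lower() in doc.lower()]


def federated_search(query):
    """Distribute one query to every resource concurrently and gather the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_resource, name, query) for name in RESOURCES]
        hits = []
        for future in futures:  # collected in submission order
            hits.extend(future.result())
    return hits


results = federated_search("brca1")
```

The user issues a single request, and the fan-out to each participating resource happens behind `federated_search`.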

Current Situation

• Volume of data ever increasing

• Proprietary content can reside within Enterprise

• No need for everyone to keep standard sources up-to-date

• Data from content providers can reside on their sites

Linguamatics Customer Confidential

Internal Content External Content

MEDLINE Clinical Trials

Publisher Content

FDA Drug Labels

Patents

Data Sources

Scientific Literature

Social Media

Web Pages

Internal Documents

Patents

Clinical Trials

Increasing Range of Data Sources

Varying in Structure

How does text mining differ from keyword search?

Example: What genes affect breast cancer?

Why does document structure matter?

• Searching across documents using keywords is relatively trivial

– No need to be aware of where the words occur or in what context

• Text mining documents with varying structure requires a more sophisticated approach. We need to:

– Know where words matching entities/concepts occur

– Disambiguate depending on context and location

– Find terms in particular regions/parts of a document for targeted searches
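The contrast between keyword search and region-aware text mining can be sketched as follows. This is a toy illustration only: the document, its region names and the two-gene vocabulary `GENES` are assumptions made for the example, and real entity recognition is far more involved than exact token matching.

```python
# A hypothetical structured document: region name -> text (toy data).
document = {
    "title": "BRCA1 and breast cancer risk",
    "abstract": "Mutations in BRCA1 increase the risk of breast cancer.",
    "references": "Smith J. TP53 in lung cancer. 2010.",
}

GENES = {"BRCA1", "TP53"}  # toy vocabulary, an assumption for illustration


def keyword_hit(doc, word):
    """Keyword search: true if the word occurs anywhere in the document."""
    return any(word.lower() in text.lower() for text in doc.values())


def genes_in_regions(doc, regions):
    """Region-aware search: report each gene together with the region it occurs in."""
    found = []
    for region in regions:
        for token in doc[region].replace(".", " ").split():
            if token in GENES:
                found.append((region, token))
    return found


anywhere = keyword_hit(document, "TP53")
targeted = genes_in_regions(document, ["title", "abstract"])
```

The keyword search reports TP53 even though it only appears in a cited reference about a different disease, while the targeted search restricted to the title and abstract returns BRCA1 alone.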

Approaches to dealing with different data sources

• Integrate the data into a data warehouse

– Extract, Transform and Load each data source into a new database

– Multiple copies of the data

– Data normalisation can be difficult

– Time-consuming and expensive process

– Most database vendors take this approach

– Allows users to perform a single search across all the content

• Leave the data where it is: federated content

– Data remains in its original form and location

– Multiple data types

– Multiple network locations

– Single search across multiple different data sources

How do we get to Federated Text Mining?

Data Normalisation

Link the Content Servers

Merge Results

Federated Text Mining

Data Normalisation – Virtual Indexes

Pathology Reports Index

Journal Abstracts Index

Virtual Index

Data Normalisation – Document Structure

Pathology Reports

Journal Abstracts
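One way to picture normalising document structure is a mapping from source-specific region names to shared ones, so that a single query can address both sources. The sketch below is an assumption-laden illustration: the region names, the `REGION_MAP` table and the sample texts are all hypothetical and do not describe how I2E virtual indexes are implemented.

```python
# Hypothetical mapping from source-specific region names to shared names.
REGION_MAP = {
    "pathology_reports": {"Diagnosis": "findings", "Specimen": "materials"},
    "journal_abstracts": {"Results": "findings", "Methods": "materials"},
}


def normalise_regions(source, doc):
    """Rename a document's source-specific regions to the shared names."""
    mapping = REGION_MAP[source]
    return {mapping.get(region, region): text for region, text in doc.items()}


# Toy documents with different structures.
pathology = {"Diagnosis": "Invasive ductal carcinoma", "Specimen": "Left breast biopsy"}
abstract = {"Results": "Carcinoma incidence rose with age", "Methods": "Cohort study"}

# A "virtual" view over both sources that shares one set of region names.
virtual_index = [
    normalise_regions("pathology_reports", pathology),
    normalise_regions("journal_abstracts", abstract),
]

# One targeted query now works across both sources.
matches = [doc["findings"] for doc in virtual_index if "carcinoma" in doc["findings"].lower()]
```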

Data Normalisation – Entities

Journal Abstracts

Pathology Reports

Combined (Normalised)
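Entity normalisation can be sketched as mapping the different surface forms found in each source onto one preferred term before results are combined. The synonym table `PREFERRED` and the mention lists below are toy assumptions for illustration, not data from the slides.

```python
from collections import Counter

# Toy synonym table (an assumption): lower-cased surface form -> preferred term.
PREFERRED = {
    "her2": "ERBB2",
    "her-2": "ERBB2",
    "erbb2": "ERBB2",
}


def normalise_entity(mention):
    """Map a mention to its preferred term; unknown mentions pass through unchanged."""
    cleaned = mention.strip()
    return PREFERRED.get(cleaned.lower(), cleaned)


# Hypothetical mentions extracted from two differently written sources.
journal_mentions = ["ERBB2", "HER2"]
pathology_mentions = ["HER-2", "her2", "ER"]

# Counting after normalisation gives one combined row per entity.
combined = Counter(normalise_entity(m) for m in journal_mentions + pathology_mentions)
```

Without the normalisation step the same gene would appear as four separate rows in the combined results.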

Linking Content Servers

Linked Servers

• I2E 4.1 introduced a new feature – Linked Server

• One I2E server can be linked to another I2E server

• Provides access to remote and local indexes and queries through a single I2E interface (Linked Servers)

– Indexes and queries on remote servers on the network appear the same as local indexes
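The idea that remote indexes appear the same as local ones can be sketched with a small class. This is not the I2E API: the `Server` class, its methods and the sample indexes are hypothetical, and the "remote" server here is simply another object in the same process.

```python
class Server:
    """A hypothetical content server holding named indexes (not the I2E API)."""

    def __init__(self, name, indexes):
        self.name = name
        self.indexes = dict(indexes)  # index name -> list of documents
        self.linked = []              # other servers linked to this one

    def link(self, other):
        """Link another server so its indexes become reachable from this one."""
        self.linked.append(other)

    def list_indexes(self):
        """Local index names followed by linked-server index names, in one flat list."""
        names = sorted(self.indexes)
        for server in self.linked:
            names.extend(sorted(server.indexes))
        return names

    def search(self, index_name, term):
        """Search an index by name, whether it lives locally or on a linked server."""
        for server in [self] + self.linked:
            if index_name in server.indexes:
                return [d for d in server.indexes[index_name] if term.lower() in d.lower()]
        raise KeyError(index_name)


enterprise = Server("enterprise", {"internal_docs": ["Aspirin stability report"]})
ondemand = Server("ondemand", {"medline": ["Aspirin and cardiovascular risk", "BRCA1 in breast cancer"]})
enterprise.link(ondemand)

available = enterprise.list_indexes()
hits = enterprise.search("medline", "aspirin")
```

The caller only talks to `enterprise`; where each index is hosted stays hidden behind the single interface.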

Development Status


I2E 4.1 Linked Servers

I2E Enterprise on Customer network

I2E OnDemand SaaS

Infrastructure

In-house Indexes

I2E OnDemand Standard Indexes

I2E Enterprise Access

Custom Indexes

Access via Linked Servers

Access via single UI

Merging Results (Part I)

Single Server, Multiple Queries

I2E 3.0 (2009) – Merging Results (part I) from one server

Profiling Individuals

• Example from news reports related to pharmaceutical industry

• Pick up properties from one document or many

I2E 3.0 – Merging Results (part I) from one server

Document Identifier

Patient information

Disease history

Patient data

Medications and dosages

Hit displayed in context
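Merging the outputs of several queries run on one server can be pictured as a join on the document identifier, giving one profile row per document. The sketch below assumes each query returns `(doc_id, value)` pairs; the function name `merge_by_document`, the column names and the sample values are illustrative assumptions only.

```python
def merge_by_document(query_outputs):
    """Combine the outputs of several queries into one row per document identifier."""
    rows = {}
    for column, hits in query_outputs.items():
        for doc_id, value in hits:
            rows.setdefault(doc_id, {}).setdefault(column, []).append(value)
    return rows


# Toy outputs of two separate queries against the same server.
query_outputs = {
    "disease_history": [("doc1", "type 2 diabetes")],
    "medications": [("doc1", "metformin 500 mg"), ("doc2", "aspirin 75 mg")],
}

profiles = merge_by_document(query_outputs)
```

Each document ends up with whichever columns the queries found for it, so a property picked up by only one query still appears in the merged profile.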

Merging Results (Part II)

Multiple Servers, Multiple Queries

Each Server supplying separate set of results

Content Server 1

Content Server 2

Content Server 3

Content Server 4

Merge into a single set of results
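Merging separate result sets from several content servers into a single set can be sketched as grouping rows by concept and recording which documents and servers support each one. The row layout (`concept`, `doc_id`), the server names and the ranking by document count are assumptions for this illustration, not a description of the product's behaviour.

```python
from collections import defaultdict


def merge_results(result_sets):
    """Merge per-server result rows into one list of (concept, doc count, servers)."""
    merged = defaultdict(lambda: {"docs": set(), "servers": set()})
    for server, rows in result_sets.items():
        for row in rows:
            entry = merged[row["concept"]]
            entry["docs"].add(row["doc_id"])
            entry["servers"].add(server)
    # Rank by number of supporting documents (most first), then by concept name.
    ordered = sorted(merged.items(), key=lambda kv: (-len(kv[1]["docs"]), kv[0]))
    return [(concept, len(e["docs"]), sorted(e["servers"])) for concept, e in ordered]


# Toy result sets, one per hypothetical content server.
result_sets = {
    "server1": [{"concept": "BRCA1", "doc_id": "a1"}, {"concept": "TP53", "doc_id": "a2"}],
    "server2": [{"concept": "BRCA1", "doc_id": "b1"}],
}

single_result = merge_results(result_sets)
```

This step only gives sensible rows if entities were normalised beforehand, since grouping relies on every server reporting the same concept name.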

The Road to Federated Text Mining

I2E 4.0: Multiple Clients, Multiple Results

I2E Server 2 FDA Drug Labels

I2E Server 1 Internal Documents

external network internal network

I2E 4.1/4.2: Single Client, Multiple Results

Linked server

Merging Results (Part II)

Q4 2014: Single Client, Single Result, Multiple Servers

Linked server

Q4 2014: Federated Text Mining Example

• Single Query

• Differently structured data sources on different servers

– Journal Articles (PubMed Central) on Enterprise Server

– MEDLINE on I2E OnDemand

• Single set of results

The Road to Federated – Are we there yet?

I2E 4.0: December 2012

I2E 4.1: October 2013

Next release (in development): Q4 2014

Merging the Results (part II)

Data Normalisation

Cambridge

Linked Server

Journal Abstracts

Pathology Reports

Thank you
