IAHx Search Interface using Apache Solr/Lucene
ABCD & CDS/ISIS Workshop, Elluminate Session
24 May 2012
Vinicius de Andrade, Systems Development, KMC/BIREME
BIREME / PAHO / WHO
Layers
Data Level: Metadata
Conversion of information sources into a common set of metadata (single schema)
Identification of elements for organization into "clusters"
Index Level: Indexes
ISIS: Boolean query
Lucene: Boolean query, ranking and clusters
Interface Level: Services, Interfaces
Multiple interfaces for presenting results
What is Lucene?
High-performance, scalable, full-text search library
Focus: indexing + searching documents
A document is just a list of name+value pairs
No crawlers or document parsing
Flexible text analysis (tokenizers + token filters)
100% Java, no dependencies, no config files
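To make "a document is just a list of name+value pairs" and "tokenizers + token filters" concrete, here is a toy inverted index in Python. It is only an illustration of the idea, not Lucene's API or its on-disk structures; the stopword list and the `TinyIndex`/`analyze` names are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "of", "the"}  # illustrative stopword list (assumption)


def analyze(text):
    """Tokenizer followed by two token filters (lowercase, stopword removal)."""
    tokens = re.findall(r"\w+", text)                  # tokenizer: split into word tokens
    tokens = [t.lower() for t in tokens]               # filter 1: lowercase
    return [t for t in tokens if t not in STOPWORDS]   # filter 2: drop stopwords


class TinyIndex:
    """Toy inverted index: documents are plain dicts of field name -> string value."""

    def __init__(self):
        self.docs = []                    # stored documents; list position = internal doc id
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            for term in analyze(value):
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, text):
        """Return documents whose `field` contains every analyzed query term."""
        terms = analyze(text)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get((field, t), set()) for t in terms))
        return [self.docs[i] for i in sorted(hits)]


index = TinyIndex()
index.add({"title": "Genome", "author": "Matt Ridley", "type": "book"})
index.add({"title": "Chromosomes", "type": "article"})
print(index.search("title", "GENOME"))
```

Because the same `analyze` chain runs at index time and at query time, the query "GENOME" matches the stored title "Genome".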
What is Solr?
A full-text search server based on Lucene
XML/HTTP and JSON interfaces
Faceted search (category counting)
Flexible data schema to define types and fields
Index replication
Extensible open architecture, plugins
Web administration interface
Written in Java, deployable as a WAR
Lucene Architecture
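Entry points (URL paths): admin, update, select
Request handlers: standard request handler, custom request handler
Response writers: XML response writer, JSON response writer
Update handlers: XML update handler, CSV update handler
All built on Lucene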
Basic App
Indexer: sends a Document (title: Genome, author: Matt Ridley, type: book, ...) to http://solr/update
Webapp: sends a Query (title:genome) to http://solr/select, receives the Query Response (matching docs) and renders it as HTML
Solr runs inside a Servlet Container
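Search(Query, Filter[], Sort, offset, n)
Example: query genome, filters language:en and year:2008, sort year asc
The DocList holds the requested page (offset, n) of sorted results; the DocSet holds every matching document
Facet values: subject:chromosomes, subject:diseases, type:article, type:book, journal:Rev. A, journal:Rev B, journal:Rev C
The query DocSet is intersected with the DocSet of each facet value, and Size() of each intersection gives that value's count (counts shown: 594, 382, 247, 689, 104, 92, 75)
Query Response
Clusters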
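The DocList/DocSet idea behind clusters can be sketched with plain Python sets: one search yields a sorted page of ids plus the full set of matches, and each facet count is the size of an intersection. This is a hypothetical illustration with made-up sample documents; Solr itself does this with its own internal bitset structures, not with these functions.

```python
def search(docs, terms, filters, sort_field, offset, n):
    """Sketch of Search(Query, Filter[], Sort, offset, n).

    docs: {doc_id: {"text": str, field: [values, ...]}}
    Returns (doc_list, doc_set): the sorted page of ids and the full set of matching ids.
    """
    doc_set = {
        doc_id
        for doc_id, d in docs.items()
        if all(t in d["text"].split() for t in terms)                  # query: all terms present
        and all(value in d.get(field, []) for field, value in filters)  # every filter must match
    }
    # Sort by the first value of sort_field, then by id so ties are deterministic.
    ordered = sorted(doc_set, key=lambda i: (docs[i][sort_field][0], i))
    return ordered[offset:offset + n], doc_set


def facet_counts(docs, doc_set, field):
    """Per value of `field`: size of (query DocSet intersected with DocSet of field:value)."""
    value_sets = {}
    for doc_id, d in docs.items():
        for value in d.get(field, []):
            value_sets.setdefault(value, set()).add(doc_id)
    counts = {value: len(doc_set & ids) for value, ids in value_sets.items()}
    return {value: c for value, c in counts.items() if c > 0}  # drop empty buckets


# Hypothetical sample documents (not from the slides).
DOCS = {
    1: {"text": "genome sequencing", "language": ["en"], "year": ["2008"],
        "subject": ["chromosomes"], "type": ["article"]},
    2: {"text": "human genome", "language": ["en"], "year": ["2008"],
        "subject": ["chromosomes", "diseases"], "type": ["book"]},
    3: {"text": "genome", "language": ["pt"], "year": ["2008"],
        "subject": ["diseases"], "type": ["article"]},
    4: {"text": "proteins", "language": ["en"], "year": ["2007"],
        "subject": ["diseases"], "type": ["article"]},
}

page, hits = search(DOCS, ["genome"], [("language", "en"), ("year", "2008")], "year", 0, 10)
print(page, facet_counts(DOCS, hits, "subject"))
```

The counts depend only on the DocSet, so changing offset or n changes the displayed page but not the cluster counts.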
Indexing Data
HTTP POST to http://localhost:8080/solr/update
Example document: id: 05991; title: Genome; author: Matt Ridley; subject: genome, disease, chromosomes; language: en
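A minimal sketch of building the Solr XML update message (`<add><doc><field name="...">`) for the example document, using Python's standard `xml.etree.ElementTree`. The field names (id, title, author, subject, language) are assumed from the field names used elsewhere in these slides; multi-valued fields become repeated `<field>` elements.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Return a Solr <add> message with one <doc> per dict in `docs`."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]  # multi-valued field
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


message = build_add_xml([{
    "id": "05991",
    "title": "Genome",
    "author": "Matt Ridley",
    "subject": ["genome", "disease", "chromosomes"],
    "language": "en",
}])
print(message)
```

The resulting string would then be sent as the body of the HTTP POST to the update URL above; the changes only become visible after a commit.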
Index Key Generation
Deleting Documents
Delete by id (most efficient)
05591, 32552
Delete by Query
subject:disease
Commit: makes changes visible
Optimize: same as commit, but also merges all index segments for faster searching
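The delete, commit and optimize operations correspond to small Solr XML update messages (`<delete><id>`, `<delete><query>`, `<commit/>`, `<optimize/>`). The helper names below are hypothetical; the sketch only builds the message strings, escaping ids and queries so characters such as `<` or `&` cannot break the XML.

```python
from xml.sax.saxutils import escape


def delete_by_id(*ids):
    """One <delete> message that removes several documents by unique key."""
    return "<delete>" + "".join(f"<id>{escape(i)}</id>" for i in ids) + "</delete>"


def delete_by_query(query):
    """Remove every document matching a Solr query, e.g. subject:disease."""
    return f"<delete><query>{escape(query)}</query></delete>"


COMMIT = "<commit/>"      # makes pending adds/deletes visible to searchers
OPTIMIZE = "<optimize/>"  # commit plus merging of index segments

print(delete_by_id("05591", "32552"))
print(delete_by_query("subject:disease"))
```

Each string is POSTed to the same update URL used for indexing; a delete is not visible to searches until a commit or optimize follows it.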
_0.fnm _0.fdt _0.fdx _0.frq _0.tis _0.tii _0.prx _0.nrm
_0_1.del
_1.fnm _1.fdt _1.fdx []
Lucene Index Segments
Searching
http://localhost:8080/solr/select?q=genome&start=0&rows=2&fl=title,author
Response (title, author): Genome, Matt Ridley
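The search URL above is just the `select` path plus standard query parameters: `q` (query), `start` (offset), `rows` (page size) and `fl` (fields to return). A small sketch of assembling it with `urllib.parse.urlencode`; the function name and defaults are hypothetical.

```python
from urllib.parse import urlencode


def select_url(base, q, start=0, rows=10, fields=None):
    """Build a Solr select URL; `base` is the Solr root, e.g. http://localhost:8080/solr."""
    params = [("q", q), ("start", start), ("rows", rows)]
    if fields:
        params.append(("fl", ",".join(fields)))  # comma-separated field list
    return f"{base}/select?{urlencode(params)}"


print(select_url("http://localhost:8080/solr", "genome", 0, 2, ["title", "author"]))
```

`urlencode` percent-encodes the comma in `fl` as `%2C`; the server decodes it, so the request is equivalent to the URL on the slide.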
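Update and Query Index
XML: POST to :8080/index/update
QUERY: GET /index/select
Example query (saude is Portuguese for "health"), filtered to articles and returned as JSON:
http://localhost:8080/index/select?q=saude&fq=type:article&wt=json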
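As a client-side sketch of the two paths, the Python below builds (without sending) an XML POST to `/index/update` and issues a JSON query to `/index/select` with a filter query (`fq`). The base URL is taken from the example above; the function names and the timeout are assumptions, and `query` only works against a running server.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "http://localhost:8080/index"  # assumed from the example URL above


def update_request(xml_body):
    """Build (but do not send) a POST of an XML update message to /index/update."""
    return urllib.request.Request(
        BASE + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def iahx_select_url(q, fq=None):
    """Build a /index/select URL asking for a JSON response (wt=json)."""
    params = [("q", q)]
    if fq:
        params.append(("fq", fq))  # filter query: restricts results without affecting ranking
    params.append(("wt", "json"))
    return BASE + "/select?" + urlencode(params)


def query(q, fq=None, timeout=10):
    """Send the query and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(iahx_select_url(q, fq), timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `query("saude", "type:article")` corresponds to the example URL; to apply an update, pass `update_request(...)` to `urllib.request.urlopen`.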
IAHx - Architecture
Client → Interface → Controller → Index
Update scripts
index.sh [xml file] [index]
commit.sh [index]
optimize.sh [index]
deletedocs.sh [index] [query]
lil-7320 | LILACS | BR1.1 | regional | article | Ribeiro, M. V | Gallina, R. A | Sato, T | Hidranencefalia: estudo clinicopatologico de 6 casos. | Hydranencephaly: clinicopathological study of 6 cases | 184-92 | Arq Neuropsiquiatr;40(2)1982. | Arq Neuropsiquiatr | 0004-282X | 40 | 2 | pt | 1982 | BR | 1982000000.0671982 | Foram estudados 6 casos de hidranencefalia do ponto de vista de sua semiologia clinica, de seus exames complementares e das verificacoes anatomopatologicas. Os autores concluem que a transiluminacao e de grande utilidade no diagnostico precoce destes casos. O seguimento dos pacientes e as verificacoes anatomopatologicas demonstram que a hidranencefalia teve como origem lesoes encefaloclasticas (inflamatorias, mecanicas e vasculares) que levaram, antes ou apos o nascimento, a destruicao total do cerebro com preservacao das estruturas sub-tentoriais
^d6984SCAD
Solr Update XML format
Fields marked for relevancy, cluster and order
Solr XML Config schema.xml (1/2)
.....
Solr XML Config schema.xml (2/2)
.....
....
Solr XML Config solrconfig.xml
.....
true type type_of_study mh_cluster ta_cluster year_cluster 201
010
on iahx
BVS-3700 | iAHx integrated search | presentation
Solr XML result
{"responseHeader":{"status":0,"QTime":1,"params":{
"wt":"json","rows":["1","1"],
"start":"0","indent":"on","q":"iahx","version":"2.2"}},"response":{"numFound":2,"start":0,"docs":[
{"id":"BVS-3700","au":"Antonio, Vinicius de Andrade","ti":"iAHx integrated search","type":"presentation"}]}}
Solr JSON result
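Reading the JSON result takes only the standard `json` module. The sketch below uses the response shown above and pulls out the status, `numFound` (total matches) and the returned documents. Note that `numFound` is 2 while only one document comes back, because `rows` limits the page size. The `summarize` helper is hypothetical.

```python
import json

SAMPLE = (
    '{"responseHeader":{"status":0,"QTime":1,"params":{"wt":"json","rows":["1","1"],'
    '"start":"0","indent":"on","q":"iahx","version":"2.2"}},'
    '"response":{"numFound":2,"start":0,"docs":[{"id":"BVS-3700",'
    '"au":"Antonio, Vinicius de Andrade","ti":"iAHx integrated search","type":"presentation"}]}}'
)


def summarize(raw):
    """Extract status, total hits, page size and titles from a Solr JSON response."""
    data = json.loads(raw)
    header, response = data["responseHeader"], data["response"]
    return {
        "status": header["status"],          # 0 means the request succeeded
        "found": response["numFound"],       # total matches in the index
        "returned": len(response["docs"]),   # documents on this page (limited by rows)
        "titles": [doc.get("ti", "").strip() for doc in response["docs"]],
    }


print(summarize(SAMPLE))
```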
IAHx - Search Interface
Project Source Code
RedDes (tickets, documentation)
http://reddes.bvsalud.org/projects/iahx/
GitHub (source code)
http://github.com/bireme/iahx-opac/
http://github.com/bireme/iahx-server/
http://github.com/bireme/iahx-controller/
IAHx - Installation
http://reddes.bvsalud.org/projects/iahx/wiki/Install (available only in Portuguese at this time)
Running iAHx
Solr server installed and running
iahx-server: a custom installation of Tomcat 6 with a Solr deployment and shell scripts for executing basic Solr REST commands
Tomcat 6
iahx-controller: a WAR module used to dispatch and receive Solr requests
Webserver + PHP
iahx-opac: interface that renders the JSON Solr result using Smarty templates
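The interface layer turns the `docs` array of a JSON Solr result into HTML through templates. iahx-opac does this in PHP with Smarty; the sketch below swaps in Python's `string.Template` purely to illustrate the template-per-document idea, and the row markup is hypothetical.

```python
import html
from string import Template

# Stand-in for a Smarty template (assumption): one list item per Solr document.
ROW = Template("<li>$title ($kind)</li>")


def render(docs):
    """Render the 'docs' array of a Solr JSON response as an HTML list."""
    rows = [
        ROW.substitute(
            title=html.escape(d.get("ti", "").strip()),  # escape text taken from the index
            kind=html.escape(d.get("type", "")),
        )
        for d in docs
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


print(render([{"id": "BVS-3700", "ti": "iAHx integrated search", "type": "presentation"}]))
```

Escaping at render time keeps indexed text containing `<` or `&` from breaking the page.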
Prepare data for Solr
ISIS to Solr: conversion via PFT (ISIS formatting language)
OAI-PMH XML to Solr: conversion via XSL
ISIS to Solr
if p(v2) then,
' ',| |v2||/,(| |v4||/),(| |v16^*||/),(| |v18^*||/),
' '/,fi,
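The PFT above emits a document only when field v2 is present (`if p(v2) then ... fi`), outputs v2 and every occurrence of v4, and uses only the first subfield (`^*`) of each v16 and v18 occurrence. A rough Python equivalent is sketched below. The Solr field names in `SPEC` are hypothetical, and `first_subfield` is an approximation of how `^*` selects the first subfield.

```python
from xml.sax.saxutils import escape

# Hypothetical (ISIS tag, Solr field name, first-subfield-only) triples.
SPEC = [(2, "id", False), (4, "db", False), (16, "au", True), (18, "ti", True)]


def first_subfield(occurrence):
    """Approximation of PFT ^*: the first subfield, whatever its delimiter."""
    if occurrence.startswith("^") and len(occurrence) > 2:
        return occurrence[2:].split("^", 1)[0]  # skip "^x", keep text up to next "^"
    return occurrence.split("^", 1)[0]          # text before the first delimiter


def record_to_solr(record):
    """record: {tag: [occurrence, ...]}. Returns a Solr <doc> string, or None."""
    if not record.get(2):                       # if p(v2) then ... fi
        return None
    lines = ["<doc>"]
    for tag, name, first_only in SPEC:
        for occ in record.get(tag, []):         # one <field> per occurrence
            value = first_subfield(occ) if first_only else occ
            lines.append(f'<field name="{name}">{escape(value)}</field>')
    lines.append("</doc>")
    return "\n".join(lines)


record = {2: ["lil-7320"], 4: ["LILACS"], 16: ["Ribeiro, M. V^d6984", "Sato, T"]}
print(record_to_solr(record))
```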
OAI-PMH XML to Solr
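The OAI-PMH path is a transformation from harvested `oai_dc` records into Solr's `<add><doc>` format. The slides do this with XSL. As a sketch of the same mapping, here is a version using Python's `xml.etree.ElementTree` instead. The OAI-PMH, `oai_dc` and Dublin Core namespaces are the standard ones. The Dublin Core to Solr field mapping is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Hypothetical mapping of Dublin Core elements to Solr field names.
DC_TO_SOLR = {"title": "ti", "creator": "au", "subject": "subject", "date": "year"}


def oai_to_solr(oai_xml):
    """Convert an OAI-PMH response with oai_dc records into a Solr <add> message."""
    root = ET.fromstring(oai_xml)
    add = ET.Element("add")
    for record in root.iter(f"{{{NS['oai']}}}record"):
        identifier = record.find("oai:header/oai:identifier", NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if identifier is None or dc is None:
            continue  # e.g. deleted records carry a header but no metadata
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = identifier.text.strip()
        for element, field in DC_TO_SOLR.items():
            for node in dc.findall(f"dc:{element}", NS):
                if node.text and node.text.strip():
                    ET.SubElement(doc, "field", name=field).text = node.text.strip()
    return ET.tostring(add, encoding="unicode")
```

Skipping records without metadata means deleted records are dropped. Removing them from the index would instead need a separate delete message.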
Questions
Vinicius de Andrade, BIREME/PAHO/WHO
Thank you