Rapid Prototyping with Solr
presented by Erik Hatcher, Technical Staff, Lucid Imagination
Abstract
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips in adjusting Solr's schema to match your needs better, and finally showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
Why prototype?
Demonstrate Solr can handle your needs
Buy-in
"Prototyping: faster than teaching a 9-year-old Ju-Jitsu"
It's quick, easy, AND FUN!
The User Interface is the app
Got Data?
Files? Solr Cell
Databases? Data Import Handler
Feeds (Atom/RSS/XML)? Data Import Handler
3rd party repositories? Lucene Connectors Framework
custom indexing scripts using a Solr API
CSV!!! CSV upload handler
UI
Solritas (VelocityResponseWriter)
http://localhost:8983/solr/itas
Documentation: http://wiki.apache.org/solr/VelocityResponseWriter
LucidWorks for Solr
Great starting point, built-in and pre-configured:
Clustering: Carrot2
Search UI: Solritas (VelocityResponseWriter)
Server includes root context, handy for serving static files
Better stemming: KStem
Tomcat, optionally
~/LucidWorks: start.sh
2010-05-21 08:53:49.595::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
2010-05-21 08:53:49.764::INFO: jetty-6.1.3
May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: JNDI not configured for solr (NoInitialContextEx)
May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: using system property solr.solr.home: /Users/erikhatcher/LucidWorks/lucidworks/jetty/../solr
May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader <init>
INFO: Solr home set to '/Users/erikhatcher/LucidWorks/lucidworks/jetty/../solr/'
...
May 21, 2010 8:53:51 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@21fb3211 main
Your Data
First Name,Last Name,Company,Title,Work Country
Erik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA
...
First try
curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv"
undefined field First Name
Schema: dynamic field flexibility
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
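One way to see how CSV columns land on these patterns is to derive the field names programmatically. The sketch below is illustrative only: the helper `to_dynamic_field`, the lowercase-and-underscore naming, and the choice of Title as the only tokenized (`*_t`) column are assumptions, not something the slides prescribe.

```python
import re

def to_dynamic_field(header, suffix="_s"):
    """Turn a CSV column header into a dynamic-field name such as 'first_name_s'."""
    # Lowercase, then collapse every run of non-alphanumerics into one underscore.
    base = re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")
    return base + suffix

headers = ["First Name", "Last Name", "Company", "Title", "Work Country"]

# Assumption: only Title wants tokenized text (*_t); the rest are exact strings (*_s).
text_columns = {"Title"}

fieldnames = [
    to_dynamic_field(h, "_t" if h in text_columns else "_s") for h in headers
]
print(",".join(fieldnames))
```

Any name ending in `_s` or `_t` would match the two patterns; the shorter names used in the talk work just as well.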
Mapping to dynamic fields
curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,last_s,company_s,title_t,country_s&header=true"
Document [null] missing required field: id
Identifying uniqueKey, or not
curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,id,company_s,title_t,country_s&header=true"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">40</int></lst>
</response>
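The upload URL above can be assembled from its parts rather than typed by hand. A minimal sketch, assuming Python's standard library and a hypothetical helper `csv_update_url`; the parameter names (`stream.file`, `fieldnames`, `header`, `commit`) are the ones that appear on these slides.

```python
from urllib.parse import urlencode

def csv_update_url(base, csv_path, fieldnames, header=True, commit=False):
    """Build a CSV update URL; 'base' is the Solr root, e.g. http://localhost:8983/solr."""
    params = [
        ("stream.file", csv_path),
        ("fieldnames", ",".join(fieldnames)),
        ("header", "true" if header else "false"),
    ]
    if commit:
        params.append(("commit", "true"))
    # safe="," keeps the comma-separated field list readable in the URL.
    return base + "/update/csv?" + urlencode(params, safe=",")

url = csv_update_url(
    "http://localhost:8983/solr",
    "EuroCon2010.csv",
    ["first_s", "id", "company_s", "title_t", "country_s"],
)
print(url)
```

The resulting string can be handed to curl or any HTTP client; the sketch only builds the URL and sends nothing.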
http://localhost:8983/solr/itas
Schema tinkering
Removed all example field definitions
Uncomment and adjust catch-all dynamic field: <dynamicField name="*" type="string" multiValued="false"/>
Ensure uniqueKey is appropriate
Unusual in this data example: <!-- <uniqueKey>id</uniqueKey> -->
Make every document/field fully searchable! <copyField source="*" dest="text"/>
Then restart!
Issues with no uniqueKey
Remove from solrconfig.xml references to:
clustering component
query elevation component
data import handler
Then restart!
Reindexing with cleaner field names
# Delete all documents
curl "http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"
# Index your data
curl "http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true"
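The `%3C...%3E` noise in the delete URL is just the XML message `<delete><query>*:*</query></delete>` percent-encoded. A small sketch, assuming Python's `urllib.parse`, that reproduces the same URL; the variable names are illustrative.

```python
from urllib.parse import quote

# The update message that deletes every document matching *:* (i.e. all of them).
delete_all = "<delete><query>*:*</query></delete>"

# Leave '*', ':' and '/' as-is so the result matches the URL shown on the slide;
# the angle brackets become %3C and %3E.
encoded = quote(delete_all, safe="*:/")

url = "http://localhost:8983/solr/update?stream.body=" + encoded + "&commit=true"
print(url)
```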
Faceting
http://localhost:8983/solr/itas?facet.field=country
country normalization
http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true&f.country.map=Great+Britain:United+Kingdom
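The `f.country.map` parameter has the shape `f.<field>.map=<value>:<replacement>`, with spaces encoded as `+`. A sketch, assuming Python's standard library and a hypothetical helper `field_map_params`, that rebuilds the URL above from a dictionary of replacements. Only the single Great Britain mapping comes from the slides; whether several mappings for one field behave as expected is worth checking against the CSV handler documentation.

```python
from urllib.parse import urlencode

def field_map_params(field, replacements):
    """Return (name, value) pairs such as ('f.country.map', 'Great Britain:United Kingdom')."""
    return [(f"f.{field}.map", f"{old}:{new}") for old, new in replacements.items()]

params = [
    ("commit", "true"),
    ("stream.file", "EuroCon2010.csv"),
    ("fieldnames", "first,last,company,title,country"),
    ("header", "true"),
] + field_map_params("country", {"Great Britain": "United Kingdom"})

# safe=",:" keeps commas and the value:replacement colon unescaped; spaces become '+'.
url = "http://localhost:8983/solr/update/csv?" + urlencode(params, safe=",:")
print(url)
```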
UI treatments
Customize request handler mappings
Edit templates
hit display
header/footer
style
Customize request handlers
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="defType">lucene</str>
    <str name="q">*:*</str>
    <str name="debugQuery">true</str>
    <str name="hl">on</str>
    <str name="hl.fl">title</str>
    <str name="hl.fragsize">0</str>
    <str name="hl.alternateField">title</str>
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.missing">true</str>
  </lst>
  <lst name="appends">
    <str name="facet.field">country</str>
  </lst>
</requestHandler>
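The difference between `defaults` and `appends` is easy to miss: a request parameter replaces a default of the same name, while appended values are added to whatever the request sent. The sketch below is a simplified model of that behavior in plain Python, not Solr's own code; the function name and the sample request are assumptions for illustration.

```python
def effective_params(request, defaults, appends):
    """Simplified model: request overrides defaults; appends are always added."""
    merged = {name: list(values) for name, values in defaults.items()}
    for name, values in request.items():
        merged[name] = list(values)                    # request wins over defaults
    for name, values in appends.items():
        merged.setdefault(name, []).extend(values)     # appends are kept regardless
    return merged

defaults = {"q": ["*:*"], "rows": ["10"]}
appends = {"facet.field": ["country"]}

# Hypothetical request: a user query plus an extra facet field.
request = {"q": ["hatcher"], "facet.field": ["company"]}

print(effective_params(request, defaults, appends))
```

In this model the user's `q` replaces `*:*`, `rows` stays at its default, and `country` is faceted on even though the request asked only for `company`.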
hit.vm
<div class="result-document">
  <p>$doc.getFieldValue('first') $doc.getFieldValue('last')</p>
  <p>$!doc.getFieldValue('title'), $!doc.getFieldValue('company')</p>
  <p>$!doc.getFieldValue('country')</p>
</div>
Voila!
Adding bells and whistles
JQuery
<script type="text/javascript" src="/solr/admin/jquery-1.2.3.min.js"></script>
Let's add a tree map
<script type="text/javascript" src="/scripts/treemap.js"></script>
http://plugins.jquery.com/project/Treemap
tree map table
<script type="text/javascript">
  function onLoad() {
    jQuery("#treemap-country").treemap(640,480, {});
  }
</script>
----------------------------
<body onload="onLoad();">
----------------------------
<table id="treemap-country">
#foreach($facet in $response.getFacetField('country').values)
  <tr>
    <td>#if($facet.name)$esc.html($facet.name)#else<Unspecified>#end</td>
    <td>$facet.count</td>
    <td>#if($facet.name)$esc.html($facet.name)#{else}Unspecified#end</td>
  </tr>
#end
</table>
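The `#if($facet.name) ... Unspecified` fallback in the template covers the bucket with no name that `facet.missing=true` adds for documents lacking a country. The same idea outside Velocity, as a sketch: it assumes the facet counts were fetched as JSON in Solr's flat `[name, count, name, count, ...]` layout, and the sample numbers are made up for illustration.

```python
def facet_rows(flat_counts, missing_label="Unspecified"):
    """Pair up a flat [name, count, ...] list, labeling the null-named bucket."""
    names = flat_counts[0::2]
    counts = flat_counts[1::2]
    return [
        (name if name is not None else missing_label, count)
        for name, count in zip(names, counts)
    ]

# Hypothetical facet counts; None is the bucket produced by facet.missing=true.
sample = ["USA", 12, "Germany", 7, None, 3]

for name, count in facet_rows(sample):
    print(name, count)
```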
Tree map
Ajax fun: giveaways
Add "static" templated page
JQuery Ajax request
snippet templated output
"static" Solritas page
solrconfig.xml
<requestHandler name="/giveaways" class="solr.DumpRequestHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">giveaways</str>
    <str name="v.layout">layout</str>
  </lst>
</requestHandler>
giveaways.vm
<input type="button" value="Pick a Winner" onClick="javascript:$('#winner').load('/solr/generate_winner?sort=random_' + new Date().getTime() + '+asc');">
<h2>And the winner is...</h2>
<center><font size="20"><div id="winner"></div></font></center>
fragment template
solrconfig.xml
<requestHandler name="/generate_winner" class="solr.SearchHandler">
  <!-- sort=random_... required -->
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">winner</str>
    <str name="rows">1</str>
    <str name="fl">first,last</str>
    <str name="defType">lucene</str>
    <str name="q">*:* -company:"Lucid Imagination" -company:"Stone Circle Productions"</str>
  </lst>
</requestHandler>
winner.vm
#set($winner=$response.results.get(0))
$winner.getFieldValue('first') $winner.getFieldValue('last')
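The handler relies on the caller supplying a `sort=random_<seed> asc` parameter, and the button in giveaways.vm uses the current time in milliseconds as the seed so each click gets a different ordering. The same request built outside the browser, as a sketch assuming Python's standard library; `winner_url` is a hypothetical helper and the fixed seed is only there to make the output repeatable.

```python
import time
from urllib.parse import urlencode

def winner_url(base, seed=None):
    """Build the /generate_winner URL with a random_<seed> sort."""
    if seed is None:
        # Mirror the page's new Date().getTime(): milliseconds since the epoch.
        seed = int(time.time() * 1000)
    # urlencode turns the space before 'asc' into '+', as in the template.
    return base + "/generate_winner?" + urlencode({"sort": f"random_{seed} asc"})

print(winner_url("http://localhost:8983/solr", seed=1274428429595))
```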
And the winner is...
Prototyping tools
CSV update handler
Schema Browser
Solritas
Solr Explorer https://issues.apache.org/jira/browse/SOLR-1163
Solr Flare http://wiki.apache.org/solr/Flare
Refine, iterate, integrate
What's next?
script full & delta indexing processes
adjust schema: define fields, field types, analysis
tweak configuration: caches, indexing parameters
deploy to staging/production environments
Test
Performance
Scalability
Relevance
Automate all of the above, start baselines and avoid regressions