32
Rapid Prototyping with Solr presented by Erik Hatcher, Technical Sta, Lucid Imagination 1 1

Rapid Prototyping with Solr

Embed Size (px)

DESCRIPTION

Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips in adjusting Solr's schema to match your needs better, and finally showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.

Citation preview

Page 2: Rapid Prototyping with Solr

Abstract

2

Got data?  Let's make it searchable!  This interactive presentation will demonstrate getting documents into Solr quickly, provide some tips in adjusting Solr's schema to match your needs better, and finally showcase your data in a flexible search user interface.  We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging.  Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.

2

Page 3: Rapid Prototyping with Solr

Why prototype? Demonstrate Solr can handle your needs

Buy-in

"Prototyping: faster than teaching a 9-year-old Ju-Jitsu"

It's quick, easy, AND FUN!

The User Interface is the app

3

3

Page 4: Rapid Prototyping with Solr

Got Data? Files?

Solr Cell

Databases? Data Import Handler

Feeds (Atom/RSS/XML)? Data Import Handler

3rd party repositories? Lucene Connectors Framework custom indexing scripts using a Solr API

CSV!!! CSV upload handler

4

4

Page 5: Rapid Prototyping with Solr

UI Solritas (VelocityResponseWriter)

http://localhost:8983/solr/itas

Documentation: http://wiki.apache.org/solr/VelocityResponseWriter

5

5

Page 6: Rapid Prototyping with Solr

LucidWorks for Solr great starting point

built-in and pre-configured: Clustering

Carrot2 Search UI

Solritas (VelocityResponseWriter) Server includes root context, handy for serving static files

Better stemming KStem

Tomcat, optionally

6

6

Page 7: Rapid Prototyping with Solr

~/LucidWorks: start.sh

7

2010-05-21 08:53:49.595::INFO: Logging to STDERR via org.mortbay.log.StdErrLog2010-05-21 08:53:49.764::INFO: jetty-6.1.3May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader locateSolrHomeINFO: JNDI not configured for solr (NoInitialContextEx)May 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader locateSolrHomeINFO: using system property solr.solr.home: /Users/erikhatcher/LucidWorks/lucidworks/jetty/../solrMay 21, 2010 8:53:50 AM org.apache.solr.core.SolrResourceLoader <init>INFO: Solr home set to '/Users/erikhatcher/LucidWorks/lucidworks/jetty/../solr/'...May 21, 2010 8:53:51 AM org.apache.solr.core.SolrCore registerSearcherINFO: [] Registered new searcher Searcher@21fb3211 main

7

Page 8: Rapid Prototyping with Solr

Your Data

8

First Name,Last Name,Company,Title,Work CountryErik,Hatcher,Lucid Imagination,"Member, Technical Staff", USA...

8

Page 9: Rapid Prototyping with Solr

First try

9

curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv"

undefined field First Name

9

Page 10: Rapid Prototyping with Solr

Schema: dynamic field flexibility

10

<dynamicField name="*_s" type="string" indexed="true" stored="true"/><dynamicField name="*_t" type="text" indexed="true" stored="true"/>

10

Page 11: Rapid Prototyping with Solr

Mapping to dynamic fields

11

curl "http://localhost:8983/solr/update/csv?stream.file=EuroCon2010.csv&fieldnames=first_s,last_s,company_s,title_t,country_s&header=true"

Document [null] missing required field: id

11

Page 12: Rapid Prototyping with Solr

Identifying uniqueKey, or not

12

curl "http://localhost:8983/solr/update/csv?

stream.file=EuroCon2010.csv&fieldnames=first_s,id,company_s,title_t,country_s&header=true"

<?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int name="status">0</int><int name="QTime">40</int></lst></response>

12

Page 13: Rapid Prototyping with Solr

http://localhost:8983/solr/itas

13

13

Page 14: Rapid Prototyping with Solr

Schema tinkering Removed all example field definitions

Uncomment and adjust catch-all dynamic field: <dynamicField name="*" type="string" multiValued="false"/>

Ensure uniqueKey is appropriate Unusual in this data example: <!-- <uniqueKey>id</uniqueKey> -->

Make every document/field fully searchable! <copyField source="*" dest="text"/>

Then restart!14

14

Page 15: Rapid Prototyping with Solr

Issues with no uniqueKey Remove from solrconfig.xml references to:

clustering component query elevation component data import handler

Then restart!

15

15

Page 16: Rapid Prototyping with Solr

Reindexing with cleaner field names

16

# Delete all documentscurl "http://localhost:8983/solr/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

# Index your datacurl "http://localhost:8983/solr/update/csv?

commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,title,country&header=true"

16

Page 17: Rapid Prototyping with Solr

http://localhost:8983/solr/itas?facet.field=country

Faceting

17

17

Page 18: Rapid Prototyping with Solr

country normalization

18

http://localhost:8983/solr/update/csv?commit=true&stream.file=EuroCon2010.csv&fieldnames=first,last,company,ti

tle,country&header=true&f.country.map=Great+Britain:United+Kingdom

18

Page 19: Rapid Prototyping with Solr

UI treatments Customize request handler mappings

Edit templates hit display header/footer style

19

19

Page 20: Rapid Prototyping with Solr

Customize request handlers

20

<requestHandler name="/browse" class="solr.SearchHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">browse</str> <str name="v.layout">layout</str>

<str name="rows">10</str> <str name="fl">*,score</str>

<str name="defType">lucene</str> <str name="q">*:*</str> <str name="debugQuery">true</str> <str name="hl">on</str> <str name="hl.fl">title</str> <str name="hl.fragsize">0</str> <str name="hl.alternateField">title</str>

<str name="facet">on</str> <str name="facet.mincount">1</str> <str name="facet.missing">true</str> </lst> <lst name="appends"> <str name="facet.field">country</str> </lst></requestHandler>

20

Page 21: Rapid Prototyping with Solr

hit.vm

21

<div class="result-document"> <p>$doc.getFieldValue('first') $doc.getFieldValue('last')</p> <p>$!doc.getFieldValue('title'), $!doc.getFieldValue('company')</p> <p>$!doc.getFieldValue('country')</p></div>

21

Page 22: Rapid Prototyping with Solr

Voila!

22

22

Page 23: Rapid Prototyping with Solr

Adding bells and whistles JQuery

<script type="text/javascript" src="/solr/admin/jquery-1.2.3.min.js"></script>

Let's add a tree map <script type="text/javascript" src="/scripts/treemap.js"></script> http://plugins.jquery.com/project/Treemap

23

23

Page 24: Rapid Prototyping with Solr

tree map table

24

<script type="text/javascript"> function onLoad() { jQuery("#treemap-country").treemap(640,480, {}); }</script>----------------------------<body onload="onLoad();">----------------------------<table id="treemap-country">#foreach($facet in $response.getFacetField('country').values) <tr> <td>#if($facet.name)$esc.html($facet.name)#else&lt;Unspecified&gt;#end</td> <td>$facet.count</td> <td>#if($facet.name)$esc.html($facet.name)#{else}Unspecified#end</td> </tr>#end</table>

24

Page 25: Rapid Prototyping with Solr

Tree map

25

25

Page 26: Rapid Prototyping with Solr

Ajax fun: giveaways Add "static" templated page

JQuery Ajax request

snippet templated output

26

26

Page 27: Rapid Prototyping with Solr

"static" Solritas page

27

solrconfig.xml<requestHandler name="/giveaways" class="solr.DumpRequestHandler"> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">giveaways</str> <str name="v.layout">layout</str> </lst></requestHandler>

giveaways.vm<input type="button" value="Pick a Winner" onClick="javascript:$('#winner').load('/solr/generate_winner?sort=random_' + new Date().getTime() + '+asc');"><h2>And the winner is...</h2><center><font size="20"><div id="winner"></div></font></center>

27

Page 28: Rapid Prototyping with Solr

fragment template

28

solrconfig.xml<requestHandler name="/generate_winner" class="solr.SearchHandler"> <!-- sort=random_... required --> <lst name="defaults"> <str name="wt">velocity</str> <str name="v.template">winner</str>

<str name="rows">1</str> <str name="fl">first,last</str>

<str name="defType">lucene</str> <str name="q">*:* -company:"Lucid Imagination" -company:"Stone Circle Productions"</str> </lst> </requestHandler>

winner.vm#set($winner=$response.results.get(0))$winner.getFieldValue('first') $winner.getFieldValue('last')

28

Page 29: Rapid Prototyping with Solr

And the winner is...

29

29

Page 30: Rapid Prototyping with Solr

Prototyping tools CSV update handler

Schema Browser

Solritas

Solr Explorer https://issues.apache.org/jira/browse/SOLR-1163

Solr Flare http://wiki.apache.org/solr/Flare

30

30

Page 31: Rapid Prototyping with Solr

Refine, iterate, integrate What's next?

script full & delta indexing processes adjust schema

define fields, field types, analysis tweak configuration

caches, indexing parameters deploy to staging/production environments

31

31

Page 32: Rapid Prototyping with Solr

Test Performance

Scalability

Relevance

Automate all of the above, start baselines and avoid regressions

32

32