Word Up!

Using Lucene for full-text search of your data set

Full-text search
- Review of full-text search options
- Focus on Lucene
- Integrating Lucene with JPA/Hibernate

Full-text search options
- ‘LIKE’ queries
- SQL extensions
- Kludge with web search engine
- Kludge with web search appliance
- Embeddable search library

‘LIKE’ queries
- Simple, straightforward
- Fast, easy to implement
- Large result set
- Limited fuzziness (wildcard or regex)
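
To make the "limited fuzziness" point concrete, here is a minimal sketch of how a user's search term might be turned into a ‘LIKE’ pattern. The class `LikePattern`, its `contains` method, and the table and column names in the comment are hypothetical and not from the slides.

```java
// Hypothetical helper (not from the slides): builds a "contains" pattern
// for use as a bound parameter in a SQL LIKE query.
class LikePattern {

    // Escapes the escape character itself and the LIKE wildcards (% and _),
    // then wraps the term in % so it matches anywhere in the column value.
    static String contains(String term) {
        String escaped = term
                .replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    public static void main(String[] args) {
        // Intended use (table and column names are assumptions):
        //   SELECT * FROM marker WHERE description LIKE ? ESCAPE '\'
        System.out.println(contains("lucene"));
        System.out.println(contains("50%_off"));
    }
}
```

A pattern like this only finds exact substrings, so a misspelled term such as "lucine" would not match "lucene". A leading wildcard also typically prevents the database from using an ordinary index on the column.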

Full-text search extensions
- No standard syntax (Sybase, MSSQL, DB2, etc. all different)
- Administrative overhead for text search indices
- Other limitations

Kludge with search engine
- External indexing/search software
  - ht://Dig
  - mnoGoSearch
  - Sphinx
  - Xapian
- Not necessarily pure Java
- Can be database-intensive
- Lag in updating search index

Kludge with search appliance
- “Black-box” solutions
  - Thunderstone
  - Google Search Appliance
- Your data set mixes with public content
- Doesn’t always work as advertised
- Can’t fine-tune search

Embeddable search library
- Example: Apache Lucene
- Deploys as part of your application
- 100% Java
- Fuzzy full-text search (Levenshtein algorithm)
- Searches against text, numeric, boolean fields with multiple options
- Can be integrated with JPA/Hibernate via Hibernate Search, Compass
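
The Levenshtein algorithm mentioned above measures how many single-character edits (insertions, deletions, substitutions) separate two strings. The following is a minimal sketch of the textbook dynamic-programming version, for illustration only. The class name `Levenshtein` is hypothetical, and Lucene's own fuzzy matching is more elaborate than this.

```java
// Textbook Levenshtein edit distance (illustrative sketch, not Lucene's code).
class Levenshtein {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row

        // Turning an empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucine"));  // one substitution apart
        System.out.println(distance("kitten", "sitting"));
    }
}
```

A fuzzy search treats two terms as a match when this distance is small relative to their length, which is what lets a query for "lucine" still find "lucene".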

About Lucene
- Search index stored on file system (also JDBC and BDB options)
- Can store/retrieve data to/from search index (Lucene Projections)
- Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools
- Supports extended and multi-byte character sets by default

More about Lucene
- Indexes records as Lucene Document objects
- A Lucene Document doesn’t have to be a literal document – it can be any arbitrary object
- A Document can have any number of name-value pairs
- Synchronizing your data with the search index is someone else’s problem …
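
The idea of a document as a bag of name-value pairs can be sketched with plain Java collections. This is a toy model, not Lucene's API: the class `ToyIndex` and its methods are hypothetical, each name holds only one value here, and the "index" is a simple map from `field:term` to document ids.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a search index (illustrative only, not Lucene's API).
class ToyIndex {

    // Each "document" is just a set of name-value pairs.
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the document and indexes every lowercased word of every field.
    int add(Map<String, String> fields) {
        int id = docs.size();
        docs.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String term : e.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // split() can yield an empty leading token
                }
                postings.computeIfAbsent(e.getKey() + ":" + term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns the ids of documents whose given field contains the exact term.
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(), Collections.emptySet());
    }

    // Retrieves the stored name-value pairs without going back to the source data.
    Map<String, String> get(int id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> marker = new HashMap<>();
        marker.put("name", "Town Hall");
        marker.put("city", "Springfield");
        int id = index.add(marker);
        System.out.println(index.search("name", "hall").contains(id));
    }
}
```

Nothing in this model knows when the underlying record changes, which is the synchronization problem the last bullet refers to.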

Integrating with JPA / Hibernate
- Most common method: Hibernate Search
- Supports only Hibernate provider
- Automatically updates search index when object persisted to database
- Entity classes mapped to separate indexes
- Entity fields mapped to Lucene index fields using Java annotations

Integrating with JPA/Hibernate …
- Alternate method: Compass Project
- Supports Hibernate, OpenJPA, others
- No release since 2009 – effectively unsupported

Annotated class example …

    @Indexed
    @Entity
    @Cacheable(true)
    @Table(name="MARKER", schema="MAPLINK")
    public class Marker extends MarkerA implements Serializable {

        @Id
        @Column(name="MKR_MARKERID")
        @Field(store=Store.YES)
        private long mkrMarkerid;

        @Column(name="MKR_LAT", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLat;

        @Column(name="MKR_LONG", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLong;

        // ...
    }

@Indexed – tells Hibernate Search that this entity class should be indexed

@Field – tells Hibernate to create a matching name-value pair in the search index for this entity class

Store.YES – stores the value for retrieval directly from the index, without touching the database

@NumericField – index as a numeric value, enables greater than / less than / range searches
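
One way to see why numeric indexing matters for range searches: values indexed as plain text are compared character by character, which gives wrong answers for numbers. The sketch below only illustrates that ordering problem. The class `RangeCompare` and its methods are hypothetical and do not use the Hibernate Search or Lucene APIs.

```java
// Illustrates why a latitude stored as text cannot be range-searched reliably.
class RangeCompare {

    // Range check using lexicographic (text) ordering, as for a plain text term.
    static boolean inRangeAsText(String value, String low, String high) {
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    // Range check using numeric ordering, as for a numerically indexed value.
    static boolean inRangeAsNumber(double value, double low, double high) {
        return value >= low && value <= high;
    }

    public static void main(String[] args) {
        // Is latitude 5.0 between 10.0 and 90.0? Numerically no, but as text
        // "5.0" sorts after "10.0" and before "90.0", so the text check says yes.
        System.out.println(inRangeAsText("5.0", "10.0", "90.0"));
        System.out.println(inRangeAsNumber(5.0, 10.0, 90.0));

        // Is latitude 45.0 between 5.0 and 100.0? Numerically yes, but as text
        // "45.0" sorts before "5.0", so the text check says no.
        System.out.println(inRangeAsText("45.0", "5.0", "100.0"));
        System.out.println(inRangeAsNumber(45.0, 5.0, 100.0));
    }
}
```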

Let’s take a Luke at the index …

Practical search exercise

Questions!