Word Up! Using Lucene for full-text search of your data set

Page 1: Word Up!

Word Up!
Using Lucene for full-text search of your data set

Page 2: Word Up!

Full-text search
  • Review of full-text search options
  • Focus on Lucene
  • Integrating Lucene with JPA/Hibernate

Page 3: Word Up!

Full-text search options
  • ‘LIKE’ queries
  • SQL extensions
  • Kludge with web search engine
  • Kludge with web search appliance
  • Embeddable search library

Page 4: Word Up!

‘LIKE’ queries

Page 5: Word Up!

‘LIKE’ queries
  • Simple, straightforward
  • Fast, easy to implement
  • Large result set
  • Limited fuzziness (wildcard or regex)
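
To make the "limited fuzziness" point concrete, here is a minimal sketch in plain Java that emulates SQL LIKE matching by translating the pattern into a regular expression. The class and method names (LikeDemo, likeToRegex, like) are hypothetical and not from the slides, and real databases differ in case sensitivity and escape handling.

```java
import java.util.regex.Pattern;

public class LikeDemo {
    // Translate a SQL LIKE pattern into a regex:
    // '%' matches any run of characters, '_' matches exactly one,
    // and every other character is matched literally.
    public static String likeToRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    // True if the whole value matches the LIKE pattern (case-sensitive here).
    public static boolean like(String value, String pattern) {
        return Pattern.compile(likeToRegex(pattern), Pattern.DOTALL)
                      .matcher(value)
                      .matches();
    }

    public static void main(String[] args) {
        System.out.println(like("Lucene in Action", "%Lucene%")); // true
        System.out.println(like("Lucnee in Action", "%Lucene%")); // false: one typo defeats the wildcard
    }
}
```

A wildcard widens the match around the term, but it cannot forgive a misspelling inside the term itself.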

Page 6: Word Up!

Full-text search extensions
  • No standard syntax (Sybase, MSSQL, DB2, etc. all different)
  • Administrative overhead for text search indices
  • Other limitations

Page 7: Word Up!

Kludge with search engine
  • External indexing/search software
      • ht://Dig
      • mnoGoSearch
      • Sphinx
      • Xapian
  • Not necessarily pure Java
  • Can be database-intensive
  • Lag in updating search index

Page 8: Word Up!

Kludge with search appliance
  • “Black-box” solutions
      • Thunderstone
      • Google Search Appliance
  • Your data set mixes with public content
  • Doesn’t always work as advertised
  • Can’t fine-tune search

Page 9: Word Up!

Embeddable search library

Page 10: Word Up!

Search library
  • Example: Apache Lucene
  • Deploys as part of your application
  • 100% Java
  • Fuzzy full-text search (Levenshtein algorithm)
  • Searches against text, numeric, boolean fields with multiple options
  • Can be integrated with JPA/Hibernate via Hibernate Search, Compass
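
The Levenshtein algorithm mentioned above measures how many single-character insertions, deletions, or substitutions separate two strings. The following is a minimal sketch of the textbook dynamic-programming version; the class name Levenshtein is hypothetical, and this is an illustration of the metric, not Lucene's own implementation.

```java
public class Levenshtein {
    // Classic edit distance, keeping only two rows of the DP table in memory.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b is j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a's prefix into the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Swap rows so curr becomes prev for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee"));  // 2
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

A fuzzy search accepts terms whose distance from the query term stays under some threshold, so "lucnee" can still find "lucene".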

Page 11: Word Up!

About Lucene
  • Search index stored on file system (also JDBC and BDB options)
  • Can store/retrieve data to/from search index (Lucene Projections)
  • Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools
  • Supports extended and multi-byte character sets by default

Page 12: Word Up!

More about Lucene
  • Indexes records as Lucene Document objects
  • A Lucene Document doesn’t have to be a literal document – it can be any arbitrary object
  • A Document can have any number of name-value pairs
  • Synchronizing your data with the search index is someone else’s problem …
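
As a rough illustration of the "document as name-value pairs" idea, here is a toy inverted index in plain Java. It is a conceptual sketch only: ToyIndex and its methods are hypothetical names, and none of this is the Lucene API, which handles analysis, scoring, and storage very differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyIndex {
    // Each "document" is just a bag of field name -> value pairs.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Add a document, index every word of every field, and return its id.
    public int add(Map<String, String> doc) {
        int id = docs.size();
        docs.add(doc);
        for (Map.Entry<String, String> field : doc.entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // split() can yield a leading empty token
                }
                postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>())
                        .add(id);
            }
        }
        return id;
    }

    // Exact-term lookup within one field; returns matching document ids.
    public Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(),
                                     Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> marker = new HashMap<>();
        marker.put("name", "Old Water Tower");
        marker.put("city", "Chicago");
        int id = index.add(marker);

        System.out.println(index.search("name", "tower").contains(id)); // true
        System.out.println(index.search("city", "tower").isEmpty());    // true
    }
}
```

Note that nothing here keeps the index in step with the original data store; that synchronization gap is exactly what the last bullet refers to.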

Page 13: Word Up!

Integrating with JPA / Hibernate
  • Most common method: Hibernate Search
      • Supports only Hibernate provider
      • Automatically updates search index when object persisted to database
      • Entity classes mapped to separate indexes
      • Entity fields mapped to Lucene index fields using Java annotations

Page 14: Word Up!

Integrating with JPA/Hibernate …
  • Alternate method: Compass Project
      • Supports Hibernate, OpenJPA, others
      • No release since 2009 – effectively unsupported

Page 15: Word Up!

Annotated class example …

@Indexed
@Entity
@Cacheable(true)
@Table(name="MARKER", schema="MAPLINK")
public class Marker extends MarkerA implements Serializable {

    @Id
    @Column(name="MKR_MARKERID")
    @Field(store=Store.YES)
    private long mkrMarkerid;

    @Column(name="MKR_LAT", nullable = true)
    @Field(store=Store.YES)
    @NumericField
    private Double mkrLat;

    @Column(name="MKR_LONG", nullable = true)
    @Field(store=Store.YES)
    @NumericField
    private Double mkrLong;

    // remaining members are not shown on the slide
}

@Indexed – tells Hibernate Search that this entity class should be indexed

Page 16: Word Up!

Annotated class example …

@Field – tells Hibernate Search to create a matching name-value pair in the search index for this entity class

Store.YES – stores the value for retrieval directly from the index, without touching the database

Page 17: Word Up!

Annotated class example …

@NumericField – index as a numeric value, enables greater than / less than / range searches
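
To see why indexing a value as a number rather than as text matters for range searches, consider this small plain-Java sketch. It uses a sorted map as a stand-in for a numeric index; NumericRangeDemo and idsInRange are hypothetical names, and Lucene uses its own numeric encoding rather than a TreeMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NumericRangeDemo {
    // Collect the ids of all entries whose latitude lies in [from, to].
    public static List<Long> idsInRange(NavigableMap<Double, List<Long>> byLat,
                                        double from, double to) {
        List<Long> ids = new ArrayList<>();
        for (List<Long> bucket : byLat.subMap(from, true, to, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Compared as text, "9.5" sorts after "10.25", so a text range is wrong.
        System.out.println("9.5".compareTo("10.25") > 0); // true

        // Compared as numbers, the ordering (and therefore the range) is correct.
        NavigableMap<Double, List<Long>> byLat = new TreeMap<>();
        byLat.computeIfAbsent(9.5, k -> new ArrayList<>()).add(1L);
        byLat.computeIfAbsent(10.25, k -> new ArrayList<>()).add(2L);
        byLat.computeIfAbsent(41.88, k -> new ArrayList<>()).add(3L);

        System.out.println(idsInRange(byLat, 9.0, 11.0)); // [1, 2]
    }
}
```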

Page 18: Word Up!

Let’s take a Luke at the index …

Page 19: Word Up!

Practical search exercise

Page 20: Word Up!

Questions!

Page 21: Word Up!