Lucene in Action: Using Lucene to Build High-Performance Systems. Evgeny Gavrilenko, Lead Developer, Artezio

Page 1: Lucene in Action

Lucene in Action

Using Lucene to Build High-Performance Systems

Evgeny Gavrilenko, Lead Developer, Artezio

Page 2: Lucene in Action

Lucene

• What is it?
• Twitter: 1 billion queries per day
• hh.ru: 400 queries per second
• LinkedIn, FedEx…

Page 3: Lucene in Action

Core indexing components

• IndexWriter
• Directory (FSDirectory, RAMDirectory)
• Analyzer
• Document
• Field / Multivalued fields

Page 4: Lucene in Action

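Building the index

using System;
using Lucene.Net.Analysis.Ru;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

var directory = new RAMDirectory();
// var directory = FSDirectory.Open("/tmp/testindex"); // on-disk alternative

var analyzer = new RussianAnalyzer(Version.LUCENE_30);
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    for (var i = 0; i < 1000000; i++)
    {
        var doc = new Document();
        doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        // Adding two fields with the same name makes "text" a multivalued field.
        // The Russian sample text ("строка" = "line") is kept so that RussianAnalyzer has something to stem.
        doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        if (i % 100000 == 0)
            Console.WriteLine("[{1}]: {0} documents saved.", i, DateTime.Now);
    }
    writer.Optimize(); // merge segments once, after the bulk load
}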

Page 5: Lucene in Action

Data schema

// i and value are defined elsewhere.
var doc1 = new Document();
doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
var field = new NumericField("numericField1", Field.Store.NO, true);
doc1.Add(field.SetDoubleValue(value));

// doc2 carries a different set of fields than doc1.
var doc2 = new Document();
doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));

Page 6: Lucene in Action

Core search components

• IndexSearcher/MultiSearcher/ParallelMultiSearcher
• Term
• Query
• TermQuery
• TopDocs

Page 7: Lucene in Action

Query

• TermQuery
• MultiFieldQueryParser
• BooleanQuery
• NumericRangeQuery
• SpanQuery
• …
• QueryParser
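A minimal sketch of how a few of these query types can be combined programmatically in Lucene.Net 3.0. The class name QueryExamples, the term "lucene" and the price range are assumptions made up for the illustration. The "price" field is assumed to have been indexed as a NumericField, since NumericRangeQuery only matches fields indexed that way.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of the original slides.
public static class QueryExamples
{
    public static BooleanQuery BuildQuery()
    {
        // TermQuery matches one exact term; it is NOT run through an analyzer,
        // so the term must look the way it was written to the index.
        var textQuery = new TermQuery(new Term("text", "lucene"));

        // Inclusive range 10.0 <= price <= 20.0 (assumed values).
        var priceQuery = NumericRangeQuery.NewDoubleRange("price", 10.0, 20.0, true, true);

        // Both clauses are required (logical AND).
        var query = new BooleanQuery();
        query.Add(textQuery, Occur.MUST);
        query.Add(priceQuery, Occur.MUST);
        return query;
    }
}
```

The resulting query can be passed to IndexSearcher.Search in the same way as a query produced by QueryParser.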

Page 8: Lucene in Action

Search

var reader = IndexReader.Open(directory, true); // true = read-only
var searcher = new IndexSearcher(reader);

var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
var query = parser.Parse("20 строку"); // "строку" is an inflected form of "строка"

var hits = searcher.Search(query, 100); // top 100 hits

Console.WriteLine("total hits: {0}", hits.TotalHits);
if (hits.TotalHits == 0) return;

var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
Console.WriteLine("value:{0}", rdoc.Get("text"));

Page 9: Lucene in Action

Sorted search

// sl is the name of the field to sort by; indexDir is passed as SortField's "reverse" flag.
switch (sl)
{
    case "barcode":
    case "code":
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
    case "price":
        indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
        break;
    default:
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
}
...
// Track per-hit scores while sorting by field, but skip computing the max score.
searcher.SetDefaultFieldSortScoring(true, false);
var hits = searcher.Search(query, filter, count, indexSort);

Page 10: Lucene in Action

Paging
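The slide gives only the title. As one possible approach (an assumption, not necessarily what the talk showed): IndexSearcher.Search in Lucene.Net 3.0 takes a hit count but no offset, so a page can be obtained by requesting the top (pageIndex + 1) * pageSize hits and skipping the earlier pages. The helper names Paging, HitsToRequest and TakePage are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical paging helper; works on any ranked list, including TopDocs.ScoreDocs.
public static class Paging
{
    // How many top hits to request so that the given zero-based page is covered.
    public static int HitsToRequest(int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");
        return checked((pageIndex + 1) * pageSize);
    }

    // Returns the items of one zero-based page; empty if the page lies past the end.
    public static List<T> TakePage<T>(IList<T> ranked, int pageIndex, int pageSize)
    {
        if (ranked == null) throw new ArgumentNullException("ranked");
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        var page = new List<T>();
        var start = pageIndex * pageSize;
        var end = Math.Min(start + pageSize, ranked.Count);
        for (var i = start; i < end; i++)
        {
            page.Add(ranked[i]);
        }
        return page;
    }
}
```

With a searcher at hand this would be used roughly as searcher.Search(query, Paging.HitsToRequest(page, size)) followed by Paging.TakePage(hits.ScoreDocs, page, size). Deep pages get more expensive, because every request re-collects all hits up to the end of the requested page.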

Page 11: Lucene in Action

Analyzers

• StandardAnalyzer
• SnowballAnalyzer
• KeywordAnalyzer
• WhitespaceAnalyzer
• RussianAnalyzer
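To compare analyzers, it can help to print the terms each one produces for the same input. The sketch below assumes Lucene.Net 3.0.3; the helper AnalyzerDemo.Tokenize and the field name "text" passed to the analyzer are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper, not part of the original slides.
public static class AnalyzerDemo
{
    // Runs the analyzer over the text and collects the emitted terms in order.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        var stream = analyzer.TokenStream("text", new StringReader(text));
        try
        {
            var termAttr = stream.AddAttribute<ITermAttribute>();
            while (stream.IncrementToken())
            {
                terms.Add(termAttr.Term);
            }
            stream.End();
        }
        finally
        {
            stream.Dispose();
        }
        return terms;
    }
}
```

For example, Tokenize(new WhitespaceAnalyzer(), "Hello Lucene World") splits on whitespace only and keeps the original case, while KeywordAnalyzer emits the entire input as a single term. The same helper can be pointed at RussianAnalyzer or SnowballAnalyzer to inspect their stemming.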

Page 12: Lucene in Action

Use in E-Commerce

[Diagram components: EcommerceDB, Service/Daemon, LuceneIndex, searchservice, Search backend]

Page 13: Lucene in Action

Linq to Lucene

public class Article
{
    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Author { get; set; }

    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Title { get; set; }

    public DateTimeOffset PublishDate { get; set; }

    [NumericField]
    public long Id { get; set; }

    [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
    public string BodyText { get; set; }

    [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
    public string SearchText
    {
        get { return string.Join(" ", new[] {Author, Title, BodyText}); }
    }
}

Page 14: Lucene in Action

Linq to Lucene

var directory = new RAMDirectory();

var provider = new LuceneDataProvider(directory, Version.LUCENE_30);

using (var session = provider.OpenSession<Article>())
{
    session.Add(new Article {Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow});
}

var articles = provider.AsQueryable<Article>();
var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));

var articlesByJohn = from a in articles
                     where a.Author == "John Doe" && a.PublishDate > threshold
                     orderby a.Title
                     select a;
Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());

var searchResults = from a in articles
                    where a.SearchText == "some search query"
                    select a;
Console.WriteLine("Search Results: " + searchResults.Count());

Page 15: Lucene in Action

Useful resources

• Lucene http://lucene.apache.org/

• Lucene.Net http://lucenenet.apache.org

• Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq

• “Lucene in Action” http://it-ebooks.info/book/2112