devowl-meetup
Lucene in Action
Using Lucene to Build High-Performance Systems
Evgeny Gavrilenko, Lead Developer, Artezio
Lucene
• What is it?
• Twitter: 1 billion queries per day
• hh.ru: 400 queries per second
• LinkedIn, FedEx…
Core indexing components
• IndexWriter
• Directory (FSDirectory, RAMDirectory)
• Analyzer
• Document
• Field / Multivalued fields
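Building the index
using System;
using Lucene.Net.Analysis.Ru;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// In-memory index; switch to FSDirectory to persist it on disk.
var directory = new RAMDirectory();
// var directory = FSDirectory.Open("/tmp/testindex");
var analyzer = new RussianAnalyzer(Version.LUCENE_30);
using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    for (var i = 0; i < 1000000; i++)
    {
        var doc = new Document();
        // Exact-match identifier: stored, indexed as a single token, no norms.
        doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        // Two values under the same name make "text" a multivalued field.
        // The sample values are Russian ("строка" means "line") to match RussianAnalyzer.
        doc.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("text", string.Format("{0} строка 2.", i), Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        if (i % 100000 == 0)
            Console.WriteLine("[{1}]: {0} documents saved.", i, DateTime.Now);
    }
    // Merge segments once after the bulk load.
    writer.Optimize();
}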
Data schema
// Documents in the same index do not have to share the same set of fields.
var i = 1;        // sample values, only so the snippet compiles
var value = 9.99;

var doc1 = new Document();
doc1.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
// Numeric field: indexed for range queries, not stored.
var field = new NumericField("numericField1", Field.Store.NO, true);
doc1.Add(field.SetDoubleValue(value));

var doc2 = new Document();
doc2.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc2.Add(new Field("text", string.Format("{0} строка.", i), Field.Store.YES, Field.Index.ANALYZED));
// A field that exists only in doc2.
doc2.Add(new Field("blablaFild1", "blabla-body", Field.Store.YES, Field.Index.ANALYZED));
Core search components
• IndexSearcher / MultiSearcher / ParallelMultiSearcher
• Term
• Query
• TermQuery
• TopDocs
Query
• TermQuery
• MultiFieldQueryParser
• BooleanQuery
• NumericRangeQuery
• SpanQuery
• …
• QueryParser
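To illustrate how the query types listed above can be combined, here is a minimal sketch against Lucene.Net 3.0.3. The field names ("category", "price"), the sample data, and the class name QueryDemo are assumptions made up for this example; they do not come from the talk.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class QueryDemo
{
    public static void Main()
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // Index ten hypothetical products: alternating categories, prices 0, 10, ..., 90.
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            for (var i = 0; i < 10; i++)
            {
                var doc = new Document();
                doc.Add(new Field("category", i % 2 == 0 ? "book" : "toy",
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new NumericField("price", Field.Store.YES, true).SetDoubleValue(i * 10.0));
                writer.AddDocument(doc);
            }
        }

        // BooleanQuery combines clauses; Occur.MUST means every clause has to match.
        var query = new BooleanQuery();
        query.Add(new TermQuery(new Term("category", "book")), Occur.MUST);
        query.Add(NumericRangeQuery.NewDoubleRange("price", 20.0, 60.0, true, true), Occur.MUST);

        using (var searcher = new IndexSearcher(directory, true))
        {
            var hits = searcher.Search(query, 10);
            Console.WriteLine("total hits: {0}", hits.TotalHits);
        }
    }
}
```

TermQuery matches the exact indexed term, so it suits NOT_ANALYZED fields; for analyzed text, QueryParser is usually the safer entry point because it runs the query string through the same analyzer as the index.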
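Search
// Open the index read-only.
var reader = IndexReader.Open(directory, true);
var searcher = new IndexSearcher(reader);

// Parse the query with the same analyzer that was used for indexing,
// so that the inflected form "строку" is stemmed like the indexed "строка".
var parser = new QueryParser(Version.LUCENE_30, "text", analyzer);
var query = parser.Parse("20 строку");

// Fetch the top 100 hits.
var hits = searcher.Search(query, 100);

Console.WriteLine("total hits: {0}", hits.TotalHits);
if (hits.TotalHits == 0) return;

// Load the stored fields of the best match.
var rdoc = reader.Document(hits.ScoreDocs[0].Doc);
Console.WriteLine("value:{0}", rdoc.Get("text"));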
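Search with sorting
// sl is the name of the field to sort by; indexDir is passed as SortField's "reverse" flag.
switch (sl)
{
    case "barcode":
    case "code":
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
    case "price":
        // Numeric fields need a numeric sort type, otherwise they sort lexicographically.
        indexSort = new Sort(new SortField(sl, SortField.DOUBLE, indexDir));
        break;
    default:
        indexSort = new Sort(new SortField(sl, SortField.STRING, indexDir));
        break;
}
...
// Keep computing per-document scores when sorting by field; skip the max score.
searcher.SetDefaultFieldSortScoring(true, false);
var hits = searcher.Search(query, filter, count, indexSort);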
Paging
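The slide gives only the title, so the following is a sketch of one common way to page through results in Lucene.Net 3.0.3, not necessarily the approach shown in the talk. The class name Paging, the method GetPage, and its parameters are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Documents;
using Lucene.Net.Search;

public static class Paging
{
    // Returns the stored documents for one page; pageIndex is zero-based.
    public static IList<Document> GetPage(IndexSearcher searcher, Query query,
                                          int pageIndex, int pageSize)
    {
        // Search returns the top N hits, so request enough hits to cover
        // every page up to and including the requested one.
        var hits = searcher.Search(query, (pageIndex + 1) * pageSize);

        return hits.ScoreDocs
            .Skip(pageIndex * pageSize)          // drop the hits of earlier pages
            .Take(pageSize)                      // keep at most one page
            .Select(sd => searcher.Doc(sd.Doc))  // load stored fields only for this page
            .ToList();
    }
}
```

For example, GetPage(searcher, query, 2, 20) would return hits 41 to 60. Each call re-runs the query, which is typically acceptable for shallow pages; the cost grows with the page index because Lucene has to collect all preceding hits.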
Analyzers
• StandardAnalyzer
• SnowballAnalyzer
• KeywordAnalyzer
• WhitespaceAnalyzer
• RussianAnalyzer
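To make the difference between the analyzers listed above concrete, the sketch below prints the tokens that three of the core analyzers produce for the same input (Lucene.Net 3.0.3 API). The class name AnalyzerDemo, the helper PrintTokens, and the sample sentence are made up for this example. SnowballAnalyzer and RussianAnalyzer ship in the contrib analyzers package and can be passed to the same helper.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

public static class AnalyzerDemo
{
    // Runs the text through the analyzer and prints each emitted term in brackets.
    static void PrintTokens(Analyzer analyzer, string text)
    {
        var stream = analyzer.TokenStream("text", new StringReader(text));
        var term = stream.AddAttribute<ITermAttribute>();

        Console.Write("{0}: ", analyzer.GetType().Name);
        while (stream.IncrementToken())
            Console.Write("[{0}] ", term.Term);
        Console.WriteLine();
    }

    public static void Main()
    {
        const string text = "The Quick-Brown fox, ID AB-123";

        PrintTokens(new StandardAnalyzer(Version.LUCENE_30), text); // lowercases, drops stop words
        PrintTokens(new WhitespaceAnalyzer(), text);                // splits on whitespace only
        PrintTokens(new KeywordAnalyzer(), text);                   // whole input as one token
    }
}
```

Whichever analyzer is chosen, the same one should be used at index time and at query time; otherwise the query terms may not match the indexed terms.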
Application in e-commerce
[Diagram: E-commerce DB, Service/Daemon, Lucene index, search service, search backend]
Linq to Lucene
public class Article
{
    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Author { get; set; }

    [Field(Analyzer = typeof(StandardAnalyzer))]
    public string Title { get; set; }

    public DateTimeOffset PublishDate { get; set; }

    [NumericField]
    public long Id { get; set; }

    // Stored so it can be returned with results, but not searchable on its own.
    [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
    public string BodyText { get; set; }

    // Combined full-text field: indexed under the name "text", not stored.
    [Field("text", Store = StoreMode.No, Analyzer = typeof(PorterStemAnalyzer))]
    public string SearchText
    {
        get { return string.Join(" ", new[] {Author, Title, BodyText}); }
    }
}
Linq to Lucene
var directory = new RAMDirectory();
var provider = new LuceneDataProvider(directory, Version.LUCENE_30);

// Changes are committed when the session is disposed.
using (var session = provider.OpenSession<Article>())
{
    session.Add(new Article
    {
        Author = "John Doe",
        BodyText = "some body text",
        PublishDate = DateTimeOffset.UtcNow
    });
}

var articles = provider.AsQueryable<Article>();
var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));

var articlesByJohn = from a in articles
                     where a.Author == "John Doe" && a.PublishDate > threshold
                     orderby a.Title
                     select a;
Console.WriteLine("Articles by John Doe: " + articlesByJohn.Count());

var searchResults = from a in articles
                    where a.SearchText == "some search query"
                    select a;
Console.WriteLine("Search Results: " + searchResults.Count());
Useful resources
• Lucene http://lucene.apache.org/
• Lucene.Net http://lucenenet.apache.org
• Linq to Lucene https://github.com/themotleyfool/Lucene.Net.Linq
• “Lucene in Action” http://it-ebooks.info/book/2112