MOVIE QUOTES SEARCH ENGINE (MQSE 3): Industrial Project – Final Presentation, 19.6.12. Students: Meytal Bialik, Zvi Cahana. Supervisors: Hayim Makabee, Oren Somekh. Technion – Israel Institute Of Technology, Computer Science Department.

Page 1: MOVIE QUOTES SEARCH ENGINE

Students: Meytal Bialik, Zvi Cahana

Supervisors: Hayim Makabee, Oren Somekh

Technion – Israel Institute Of Technology, Computer Science Department

19.6.12 MQSE 3

Industrial Project – Final Presentation

Page 2: Introduction

The Movie Quotes Search Engine project focuses on creating a search engine that lets a user search for terms appearing in movie dialogues.

The project consists of two main components:

A web application used as a user interface to the search engine.

A crawling engine used to maintain a searchable index and a content database.

Introduction

Goals

Methodology

System Diagram

Achievements

Testing

Screenshots

Conclusions

Page 3: Goals

Relevant search results

Modern UI design

Rich search options

Video play option

Browser-agnostic website

Large-scale movie database

Incremental, priority-based crawling

Page 4: Methodology

IMDb & OpenSubtitles.org dump files

SRT subtitle files

OpenSubtitles.org XML-RPC API

SQLite database

Apache Lucene

Java Servlets / JSP

HTML5 / CSS / JavaScript
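
The SRT subtitle files listed above use a simple plain-text layout: a numeric sequence number, a `start --> end` timing line in `HH:MM:SS,mmm` form, one or more text lines, and a blank separator. The project's parser is not shown in the presentation, so the following is only a minimal Java sketch, assuming the file is already decoded into a `String`, of parsing and validating cues with `java.util.regex`. The `SrtCue` and `SrtParser` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One subtitle cue: sequence number, start/end time in milliseconds, and text.
record SrtCue(int index, long startMs, long endMs, String text) {}

// Hypothetical minimal SRT parser; malformed blocks are skipped.
final class SrtParser {
    // "HH:MM:SS,mmm --> HH:MM:SS,mmm" ('.' is also accepted before the milliseconds).
    private static final Pattern TIMING = Pattern.compile(
        "(\\d{1,2}):(\\d{2}):(\\d{2})[,.](\\d{3})\\s*-->\\s*"
        + "(\\d{1,2}):(\\d{2}):(\\d{2})[,.](\\d{3})");

    private SrtParser() {}

    // Converts four consecutive regex groups (h, m, s, ms) to milliseconds.
    private static long toMillis(Matcher m, int first) {
        long h = Long.parseLong(m.group(first));
        long min = Long.parseLong(m.group(first + 1));
        long s = Long.parseLong(m.group(first + 2));
        long ms = Long.parseLong(m.group(first + 3));
        return ((h * 60 + min) * 60 + s) * 1000 + ms;
    }

    static List<SrtCue> parse(String srt) {
        List<SrtCue> cues = new ArrayList<>();
        // Drop a byte-order mark and normalize Windows line endings.
        String text = srt.replace("\uFEFF", "").replace("\r\n", "\n").trim();
        // Cues are separated by one or more blank lines.
        for (String block : text.split("\n\\s*\n")) {
            String[] lines = block.trim().split("\n");
            if (lines.length < 3) continue;              // index, timing, >= 1 text line
            int index;
            try {
                index = Integer.parseInt(lines[0].trim());
            } catch (NumberFormatException e) {
                continue;                                // invalid sequence number
            }
            Matcher m = TIMING.matcher(lines[1].trim());
            if (!m.lookingAt()) continue;                // invalid timing line
            long start = toMillis(m, 1);
            long end = toMillis(m, 5);
            if (end < start) continue;                   // inconsistent timing
            String body = String.join("\n", Arrays.copyOfRange(lines, 2, lines.length)).trim();
            if (!body.isEmpty()) cues.add(new SrtCue(index, start, end, body));
        }
        return cues;
    }
}
```

Malformed blocks are skipped rather than failing the whole file, on the assumption that subtitle files from public sources often contain a few broken entries.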

Page 5: System Diagram

[Figure: system diagram]

Page 6: Achievements – Crawling

Command-line tool

Dump file parsing

OpenSubtitles.org API-based subtitle downloading & indexing

Cover art downloading

Multithreaded pipelined execution

Priority-based crawling

Index recovery
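
The crawler's code is not part of the presentation. As a rough illustration only, the sketch below combines two items from this slide, priority-based ordering and multithreaded pipelined execution, using standard `java.util.concurrent` classes: download threads always take the highest-priority movie from a `PriorityBlockingQueue` and hand results to a separate indexing thread through a `LinkedBlockingQueue`, so indexing overlaps with downloading. The `CrawlTask` and `PipelinedCrawler` names, the numeric priority (for example popularity), and the `download` function are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.function.Function;

// Hypothetical unit of crawl work; a higher priority (e.g. popularity) is crawled first.
record CrawlTask(String movieId, double priority) implements Comparable<CrawlTask> {
    @Override
    public int compareTo(CrawlTask other) {
        return Double.compare(other.priority, priority); // descending priority
    }
}

final class PipelinedCrawler {
    private PipelinedCrawler() {}

    // Stage 1: `downloaders` threads fetch subtitles in priority order.
    // Stage 2: one indexing thread consumes results as soon as they arrive.
    static List<String> crawl(List<CrawlTask> tasks, int downloaders,
                              Function<String, String> download) {
        if (downloaders < 1) throw new IllegalArgumentException("need at least one downloader");
        PriorityBlockingQueue<CrawlTask> pending = new PriorityBlockingQueue<>(tasks);
        // Optional.empty() signals that one downloader has run out of work.
        BlockingQueue<Optional<String>> downloaded = new LinkedBlockingQueue<>();
        List<String> indexed = Collections.synchronizedList(new ArrayList<>());

        Thread indexer = new Thread(() -> {
            int finished = 0;
            try {
                while (finished < downloaders) {
                    Optional<String> item = downloaded.take();
                    if (item.isPresent()) {
                        indexed.add(item.get());        // stand-in for Lucene indexing
                    } else {
                        finished++;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.start();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < downloaders; i++) {
            Thread worker = new Thread(() -> {
                CrawlTask task;
                while ((task = pending.poll()) != null) {   // highest priority first
                    downloaded.add(Optional.of(download.apply(task.movieId())));
                }
                downloaded.add(Optional.empty());
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) worker.join();
            indexer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("crawl interrupted", e);
        }
        return indexed;
    }
}
```

With a single download thread the indexing order follows the priority order exactly; with several threads the order is only approximately by priority, which is usually acceptable for a crawler.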

Page 7: Achievements – Storage

SQLite-based database

Movie metadata (popularity, rating, IMDb link...)

Cover art

~20,000 subtitles downloaded & indexed

Local video repository

Page 8: Achievements – Indexing

SRT file parsing & validation

SRT file filtering:
  Translator comments
  Hearing-impaired comments
  Format tags

Partitioning into overlapping search units

Indexing using Lucene Core:
  Stemming
  Stop-word removal
  Actual indexing of the search units

~250 ms per average SRT file
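
The filtering and partitioning steps above can be sketched as follows. This is not the project's code: the regular expressions are assumptions covering common cases (HTML-style tags such as `<i>`, brace overrides such as `{\an8}`, and hearing-impaired notes in brackets or parentheses), and the credit-line check is a hypothetical heuristic for translator comments. Overlapping search units are produced with a sliding window of `size` cues that advances by `step` cues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

final class SubtitleUnits {
    // Format tags such as <i>, </font> and brace overrides such as {\an8}.
    private static final Pattern FORMAT_TAGS = Pattern.compile("<[^>]*>|\\{[^}]*\\}");
    // Hearing-impaired annotations such as [door slams] or (laughs).
    private static final Pattern HI_COMMENTS = Pattern.compile("\\[[^\\]]*\\]|\\([^)]*\\)");
    // Hypothetical heuristic for translator/credit cues; not taken from the project.
    private static final List<String> CREDIT_MARKERS =
            List.of("subtitles by", "translated by", "sync by", "www.");

    private SubtitleUnits() {}

    // Returns the cleaned dialogue text of one cue, or "" if nothing is left.
    static String clean(String cueText) {
        String lower = cueText.toLowerCase(Locale.ROOT);
        for (String marker : CREDIT_MARKERS) {
            if (lower.contains(marker)) return "";
        }
        String text = FORMAT_TAGS.matcher(cueText).replaceAll("");
        text = HI_COMMENTS.matcher(text).replaceAll("");
        return text.replaceAll("\\s+", " ").trim();
    }

    // Joins consecutive cleaned cues into windows of `size` cues starting every
    // `step` cues. With step < size, neighbouring units share size - step cues, so
    // any quote spanning at most size - step + 1 cues lies entirely in one unit.
    static List<String> partition(List<String> cues, int size, int step) {
        if (size <= 0 || step <= 0 || step > size) {
            throw new IllegalArgumentException("need 0 < step <= size");
        }
        List<String> cleaned = new ArrayList<>();
        for (String cue : cues) {
            String c = clean(cue);
            if (!c.isEmpty()) cleaned.add(c);
        }
        List<String> units = new ArrayList<>();
        for (int start = 0; start < cleaned.size(); start += step) {
            int end = Math.min(start + size, cleaned.size());
            units.add(String.join(" ", cleaned.subList(start, end)));
            if (end == cleaned.size()) break;           // last window reached the end
        }
        return units;
    }
}
```

Each returned unit would then be handed to the Lucene indexing step (stemming, stop-word removal, indexing) as one searchable document.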

Page 9: Achievements – Searching

Searching using Lucene Core:
  Query parsing
  Search operator support
  Stemming
  Stop-word removal
  Relevant bucket retrieval & ranking

Aggregating buckets into movies

Merging of overlapping buckets

Highlighting search words using Lucene Core

Trimming buckets to the most relevant text

Configurable weighted movie ranking:
  Lucene rank
  Popularity
  Rating
  Year
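
The aggregation, merging, and weighted ranking steps above might look roughly like this. It is a sketch under stated assumptions, not the project's ranking code: a bucket is modeled as a cue range with its Lucene score, a movie's text score is taken to be its best bucket score, rating is assumed to be on IMDb's 0 to 10 scale, popularity is assumed to be pre-normalized to [0, 1], and the year is scaled over a hypothetical 1900 to 2012 range. All type and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A matching search unit: the cue range it covers and its Lucene score.
record Bucket(String movieId, int firstCue, int lastCue, double luceneScore) {}

// Hypothetical per-movie metadata; popularity assumed pre-normalized to [0, 1].
record MovieInfo(double popularity, double rating, int year) {}

record RankedMovie(String movieId, double score, List<Bucket> hits) {}

// Weighted re-ranking; the four weights are the configurable part.
record MovieRanker(double wLucene, double wPopularity, double wRating, double wYear) {
    private static final int MIN_YEAR = 1900, MAX_YEAR = 2012; // hypothetical range

    // Merges one movie's buckets whose cue ranges overlap, keeping the best score.
    static List<Bucket> mergeOverlapping(List<Bucket> buckets) {
        List<Bucket> sorted = new ArrayList<>(buckets);
        sorted.sort(Comparator.comparingInt(Bucket::firstCue));
        List<Bucket> merged = new ArrayList<>();
        for (Bucket b : sorted) {
            int lastIdx = merged.size() - 1;
            if (lastIdx >= 0 && b.firstCue() <= merged.get(lastIdx).lastCue()) {
                Bucket prev = merged.get(lastIdx);
                merged.set(lastIdx, new Bucket(prev.movieId(), prev.firstCue(),
                        Math.max(prev.lastCue(), b.lastCue()),
                        Math.max(prev.luceneScore(), b.luceneScore())));
            } else {
                merged.add(b);
            }
        }
        return merged;
    }

    // Groups buckets by movie, merges overlaps, and scores each movie as a
    // weighted sum of its best Lucene score and its normalized metadata.
    List<RankedMovie> rank(List<Bucket> buckets, Map<String, MovieInfo> info) {
        Map<String, List<Bucket>> byMovie = new LinkedHashMap<>();
        for (Bucket b : buckets) {
            byMovie.computeIfAbsent(b.movieId(), id -> new ArrayList<>()).add(b);
        }
        List<RankedMovie> ranked = new ArrayList<>();
        for (Map.Entry<String, List<Bucket>> e : byMovie.entrySet()) {
            List<Bucket> hits = mergeOverlapping(e.getValue());
            double best = hits.stream().mapToDouble(Bucket::luceneScore).max().orElse(0.0);
            double score = wLucene * best;
            MovieInfo m = info.get(e.getKey());
            if (m != null) {
                double year = (double) (m.year() - MIN_YEAR) / (MAX_YEAR - MIN_YEAR);
                score += wPopularity * m.popularity()
                        + wRating * (m.rating() / 10.0)
                        + wYear * Math.max(0.0, Math.min(1.0, year));
            }
            ranked.add(new RankedMovie(e.getKey(), score, hits));
        }
        ranked.sort(Comparator.comparingDouble(RankedMovie::score).reversed());
        return ranked;
    }
}
```

Raising `wPopularity` is one way to obtain a configuration biased toward well-known movies; the presentation does not state which weights the project actually used.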

Page 10: Achievements – Web Application

JSP/HTML5/CSS/JavaScript-based

Full support for IE9

Modern UI design

Search result snippets

Multiple hits per movie

Paging

Video play option:
  Per-result snippet
  Relevant scene
  Captions
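
One possible way to implement the "relevant scene" and "captions" parts of the video play option in an HTML5 page is sketched below; the presentation does not say how the project did it. The sketch assumes the matched snippet's start and end times are known (for example from the SRT cue timings), uses a Media Fragments URI (`#t=start,end`) so the browser starts playback at the scene, and converts SRT to WebVTT, the caption format browsers support for the `<track>` element. Support for media fragments varies between browsers. The `ScenePlayer` name, the URLs, and the `srclang` value are hypothetical.

```java
import java.util.Locale;

// Hypothetical helpers for playing the scene behind a search result.
final class ScenePlayer {
    private ScenePlayer() {}

    // SRT and WebVTT cues are close: WebVTT needs a "WEBVTT" header and uses '.'
    // instead of ',' before the milliseconds. Numeric SRT indices can stay, since
    // WebVTT allows optional cue identifiers.
    static String srtToWebVtt(String srt) {
        String body = srt.replace("\uFEFF", "")
                .replace("\r\n", "\n")
                .replaceAll("(\\d{2}:\\d{2}:\\d{2}),(\\d{3})", "$1.$2");
        return "WEBVTT\n\n" + body;
    }

    // Builds a <video> element whose source carries a Media Fragments time range
    // (#t=start,end, in seconds) plus a captions track. URLs are assumed to be
    // already HTML-escaped.
    static String videoTag(String videoUrl, String captionsUrl, double startSec, double endSec) {
        return String.format(Locale.ROOT,
                "<video controls preload=\"metadata\" src=\"%s#t=%.1f,%.1f\">"
                + "<track kind=\"captions\" src=\"%s\" srclang=\"en\" default>"
                + "</video>",
                videoUrl, startSec, endSec, captionsUrl);
    }
}
```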

Page 11: Testing

A testing platform allows comparing the “quality” of search results across different system configurations.

In each test, the search engine is queried with a famous quote.

A test passes if the relevant movie is found in the top-K results.
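
The pass/fail rule above can be expressed as a small harness. This is a sketch, not the project's testing platform: it assumes each system configuration can be wrapped as a function from a query string to a ranked list of movie IDs, best first. `QuoteTest` and `TopKEvaluator` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// A famous quote and the movie it should retrieve.
record QuoteTest(String quote, String expectedMovieId) {}

final class TopKEvaluator {
    private TopKEvaluator() {}

    // Fraction of tests whose expected movie appears among the first k results.
    static double passRate(List<QuoteTest> tests, Function<String, List<String>> search, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be positive");
        if (tests.isEmpty()) return 0.0;
        int passed = 0;
        for (QuoteTest t : tests) {
            List<String> results = search.apply(t.quote());
            List<String> topK = results.subList(0, Math.min(k, results.size()));
            if (topK.contains(t.expectedMovieId())) passed++;
        }
        return (double) passed / tests.size();
    }

    // Runs the same test set against several named system configurations.
    static Map<String, Double> compare(List<QuoteTest> tests,
                                       Map<String, Function<String, List<String>>> configs, int k) {
        Map<String, Double> rates = new LinkedHashMap<>();
        configs.forEach((name, search) -> rates.put(name, passRate(tests, search, k)));
        return rates;
    }
}
```

Comparing configurations then amounts to passing one search function per configuration to `compare` with the same quote set and the same K.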

Page 12: Testing

We tested the system with a set of ~100 famous movie quotes. With a biased system configuration and K = 9, we achieved a ~90% pass rate.

Page 13: Screenshots

[Figure: screenshots]

Page 14: Screenshots

[Figure: screenshots]

Page 15: Conclusions

Lucene is a powerful search platform

Optimal search results are difficult to define

Subtitle files from public sources should be further validated

HTML5 video support is still limited & browser-dependent

Source control systems make life easier
