A Big Data Architecture for Search Kamran Khan, CEO The expert in the search space

Enterprise Search Summit Keynote: A Big Data Architecture for Search

DESCRIPTION

This presentation was given by Search Technologies' CEO Kamran Khan at the November 2013 Enterprise Search Summit / KMWorld in Washington DC. He discussed how modern search engines are being combined with powerful, independent content processing pipelines and the distributed processing technologies from big data to form new enterprise search architectures, delivering results that in the past were available only to the biggest companies with the deepest pockets. For more information visit http://www.searchtechnologies.com/.

Page 1: Enterprise Search Summit Keynote: A Big Data Architecture for Search

A Big Data Architecture for Search

Kamran Khan, CEO

The expert in the search space

Page 2: Enterprise Search Summit Keynote: A Big Data Architecture for Search

Search Technologies Overview

San Diego, CA

San Jose, CR

Herndon, VA

Ascot, UK

Cincinnati, OH

Karlsruhe, DE

• The leading IT Services company dedicated to Enterprise Search & Search-based Applications

• Implementation, Consulting, Managed Services

• 120 employees and growing

• Independent, working with all of the leading software vendors and open source alternatives

Page 4: Enterprise Search Summit Keynote: A Big Data Architecture for Search

What Is Big Data?

Page 5: Enterprise Search Summit Keynote: A Big Data Architecture for Search

Where Did Modern Big Data Come From?

[Diagram: several Web Servers, each serving Content and each producing LOG FILES]

Page 6: Enterprise Search Summit Keynote: A Big Data Architecture for Search

What is Big Data?

[Diagram: a large number of LOG FILES]

Page 7: Enterprise Search Summit Keynote: A Big Data Architecture for Search

What is Big Data?

Too big for a single machine
• Physically impossible for a single machine

Data Aggregation & Analysis
• Simply transforming data records is not enough
• Must aggregate / “boil down” the data

Batch Processing
• Very long running jobs (not real-time)

Message: Lots of Data ≠ “Big Data”
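
The "boil down" step on this slide can be illustrated with a small map/reduce-style aggregation. This is only a sketch under stated assumptions: the tab-separated log format (timestamp, user, query) and all function names are hypothetical and do not come from the presentation. On a real cluster, a framework such as Hadoop would distribute the same map and reduce steps across many machines as a batch job.

```python
from collections import Counter
from itertools import chain


def map_log_line(line):
    """Map step: emit (query, 1) for one 'timestamp<TAB>user<TAB>query' line.

    The three-column format is an assumption made for this example.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3 or not parts[2].strip():
        return []  # skip malformed records instead of failing the whole job
    return [(parts[2].strip().lower(), 1)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per query."""
    totals = Counter()
    for query, count in pairs:
        totals[query] += count
    return totals


def aggregate_logs(log_files):
    """Boil many log files (each an iterable of lines) down to query frequencies."""
    mapped = chain.from_iterable(
        map_log_line(line) for log_file in log_files for line in log_file
    )
    return reduce_counts(mapped)


# Two tiny in-memory "log files" stand in for the real thing.
logs = [
    ["2013-11-06T10:00\tu1\tBig Data", "2013-11-06T10:01\tu2\thadoop"],
    ["2013-11-06T10:02\tu3\tbig data", "bad line"],
]
query_counts = aggregate_logs(logs)
print(query_counts.most_common(2))
```

The point of the example is that the output (a handful of counts) is far smaller than the input, which is what separates aggregation from simply transforming each record.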

Page 8: Enterprise Search Summit Keynote: A Big Data Architecture for Search

Enabling Technologies

• Modern Statistical Analysis
• Elastic / Cloud Computing
• Hadoop

Big Data For Search

Page 9: Enterprise Search Summit Keynote: A Big Data Architecture for Search

What is Big Data?

[Diagram: Unstructured Data; many Content items; Hadoop]

Page 10: Enterprise Search Summit Keynote: A Big Data Architecture for Search

A Traditional Integrated Architecture

[Diagram labels: Content Sources; SharePoint; File System; RDBMS; Employee Directory; etc.; Connectors; Aspire Connector; Search Engine; Index Pipeline; Search Index]

Does a lot of what we need for Enterprise Search

Limitations
• Limited support for modern analytics
• Limited support for content processing
• Re-indexing takes too long
• Limits ability to do continuous improvement cycle

Page 11: Enterprise Search Summit Keynote: A Big Data Architecture for Search

Why Content Processing is Important

Powerful & Complete Content Processing Service
• Clean and consistent data and metadata
• Ability to supplement metadata

Support for Continuous Improvement Cycle
• Develop and maintain processing IP
• Ability to easily migrate to new search engines

[Diagram labels: Content Sources; File System; RDBMS; Employee Directory; etc.; Connectors; Aspire Connector; Content Processing; Search Engine; Index Pipeline; Search Index]
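
A content processing stage that sits outside the search engine can be pictured as a chain of small functions over plain document records. The sketch below is illustrative only: the stage names, the field names, and the employee-directory lookup are assumptions made for the example and are not Aspire's API or anything shown in the slides.

```python
def clean_metadata(doc):
    """Normalize field names and trim string values so metadata is consistent."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in doc.items()
    }


def normalize_author(doc):
    """Collapse whitespace and lowercase the author so lookups are reliable."""
    normalized = dict(doc)
    if isinstance(normalized.get("author"), str):
        normalized["author"] = " ".join(normalized["author"].split()).lower()
    return normalized


def make_supplement_stage(directory):
    """Build a stage that adds a department from a (hypothetical) employee directory."""
    def supplement(doc):
        enriched = dict(doc)
        author = enriched.get("author")
        if author in directory:
            enriched["department"] = directory[author]
        return enriched
    return supplement


def run_pipeline(doc, stages):
    """Apply each stage in order; the result is an engine-neutral dict."""
    for stage in stages:
        doc = stage(doc)
    return doc


directory = {"jane doe": "Legal"}  # made-up lookup data
stages = [clean_metadata, normalize_author, make_supplement_stage(directory)]

raw = {" Author ": "  Jane   Doe ", "Title": "Contract review"}
processed = run_pipeline(raw, stages)
print(processed)
```

Because the output is a plain record rather than an engine-specific object, the same stages could feed a different search engine later, which is one way to read the slide's point about migration.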

Page 12: Enterprise Search Summit Keynote: A Big Data Architecture for Search

A New Enterprise Search Architecture

• Integrated Platform (Docs, Log Files and External data)
• Reduced Cost
• Better Agility and Scalability
• Fast Reindexing
• Expanded Functionality

[Diagram labels: Content Sources; File System; RDBMS; Employee Directory; Docs, Log files, Supplemental Data; etc.; Connectors; Aspire Connector; Content Processing & Tokenization; Secure Cache; Analytics; Search Engine; Index Pipeline; Search Index]
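
One plausible reading of why a cache of processed content makes reindexing fast is that the index can be rebuilt from the cache without re-crawling and re-processing the content sources. The slides do not describe how the Secure Cache works, so the sketch below is an assumption throughout: the class, its methods, and the hash-based change check are hypothetical, and the security aspect is not modeled at all.

```python
import hashlib


class ProcessedCache:
    """In-memory stand-in for a cache of processed documents (hypothetical)."""

    def __init__(self):
        self._entries = {}      # doc_id -> (content digest, processed form)
        self.process_calls = 0  # counts how often real processing ran

    def get_or_process(self, doc_id, raw_text, process):
        """Return the cached result unless the raw content has changed."""
        digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
        entry = self._entries.get(doc_id)
        if entry is not None and entry[0] == digest:
            return entry[1]
        self.process_calls += 1
        processed = process(raw_text)
        self._entries[doc_id] = (digest, processed)
        return processed

    def all_processed(self):
        """Return every cached document as doc_id -> processed form."""
        return {doc_id: processed for doc_id, (_, processed) in self._entries.items()}


def reindex(cache, index_document):
    """Rebuild an index from the cache alone, without touching content sources."""
    for doc_id, processed in cache.all_processed().items():
        index_document(doc_id, processed)


def tokenize(text):
    """Toy processing step: lowercase and split on whitespace."""
    return text.lower().split()


cache = ProcessedCache()
cache.get_or_process("doc-1", "Big Data for Search", tokenize)
cache.get_or_process("doc-1", "Big Data for Search", tokenize)  # unchanged, so cached

new_index = {}
reindex(cache, lambda doc_id, tokens: new_index.update({doc_id: tokens}))
print(cache.process_calls, new_index)
```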

Page 13: Enterprise Search Summit Keynote: A Big Data Architecture for Search

Advanced Features & Analytics Enabled

• Search and Match
• Forward and Reverse Citation
• Latent Semantic Analysis
• More Precise Term Weighting Beyond TF/IDF
• Near Duplicate Detection
• Document Topic Tagging
• Results ranking including popularity
• Recommendations based on user behavior
• Suggested queries based on user behavior
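
As an illustration of one item on this list, near duplicate detection is often approached by comparing sets of word shingles with Jaccard similarity. The slides do not say which method the speaker had in mind, so this is a generic sketch: the shingle size, the threshold, and the sample documents are arbitrary choices for the example. The all-pairs comparison shown here is quadratic, which is the kind of workload that benefits from distributed batch processing at scale.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5):
    """Return pairs of document ids whose shingle sets meet the threshold."""
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            if jaccard(sets[left], sets[right]) >= threshold:
                pairs.append((left, right))
    return pairs


docs = {
    "a": "big data will move enterprise search forward",
    "b": "big data will move enterprise search forward significantly",
    "c": "connectors feed the index pipeline",
}
duplicate_pairs = near_duplicates(docs)
print(duplicate_pairs)
```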

Page 14: Enterprise Search Summit Keynote: A Big Data Architecture for Search

In Summary

Structured Big Data Technology Will Revolutionize Enterprise Search

New architecture for search providing better:
• Analytics and other functionality
• Content processing
• Agility
• Economics and scalability

Big Data architectures will significantly move search forward

Page 15: Enterprise Search Summit Keynote: A Big Data Architecture for Search

For further information: www.searchtechnologies.com
