Real-time Analytics from Small Data, Big Data and Huge Data
Raanan Dagan, Big Data Solutions, Splunk
Copyright © 2012 Splunk Inc.
What I’ll Talk About
Machine Data
Splunk and Big Data, Real-time Analytics
Customer Use Cases
Big Data Comes from Machines
Volume | Velocity | Variety | Variability
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data
What Does Machine Data Look Like?
Sources
Care IVR
Middleware Error
Order Processing
Machine Data Contains Critical Insights
Order ID
Customer ID
Product ID
Time Waiting On Hold
Customer’s Tweet
Company’s Twitter ID
Sources
Care IVR
Middleware Error
Order Processing
Splunk: The Platform for Machine Data
Insight and Visualizations for Executives
Statistical Analysis
Proactive Monitoring
Search and Investigation
Machine Data Operational Intelligence
Splunk Index
Splunk Collects and Indexes Machine Data
No upfront schema. No RDBMS. No custom connectors.
Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
Networking: Configurations, syslog, SNMP, netflow
Databases: Configurations, Audit/query logs, Tables, Schemas
Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud
Linux/Unix: Configurations, syslog, File system, ps, iostat, top
Windows: Registry, Event logs, File system, sysinternals
Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Tickets, Changes
Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data
Outside the Datacenter: Manufacturing, logistics…, CDRs & IPDRs, Power consumption, RFID data, GPS data
Operational Intelligence for IT and Business Users
IT Operations Management
Application Management
Security & Compliance
Web Intelligence
Business Analytics
Users: LOB Owners/Executives, Customer Support, System Administrators, Operations Teams, Security Analysts, IT Executives, Development Teams, Auditors, Website/Business Analysts
The Technical Part
Splunk Has Four Primary Functions
• Searching and Reporting (Search Head)
• Indexing and Search Services (Indexer)
• Local and Distributed Management (Deployment Server)
• Data Collection and Forwarding (Forwarder)
A single Splunk install can fill one of these roles or all of them…
Scalability to Tens of TBs/Day on Commodity Servers
Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
Offload search load to Splunk Search Heads
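The load-balanced forwarding idea above can be pictured with a small Python sketch. This is an illustration only, not how Splunk Forwarders are implemented: it swaps in plain per-event round-robin, and the class name RoundRobinForwarder and the indexer names idx1 to idx3 are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinForwarder:
    """Toy forwarder that hands events to a pool of indexers in turn.

    Hypothetical illustration: real forwarders have their own
    load-balancing rules; this only shows the general idea.
    """

    def __init__(self, indexers):
        if not indexers:
            raise ValueError("at least one indexer is required")
        self._targets = cycle(indexers)  # endless rotation over the pool
        self.sent = Counter()            # events delivered per indexer

    def forward(self, event):
        """Pick the next indexer and record that it received the event."""
        target = next(self._targets)
        self.sent[target] += 1
        return target


# Hypothetical indexer names, used only for this example.
forwarder = RoundRobinForwarder(["idx1", "idx2", "idx3"])
routes = [forwarder.forward(f"event {i}") for i in range(9)]
```

In this sketch, adding an indexer to the pool lowers the share of events each one receives, which is the scaling property the slide describes.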
Analyzing Heterogeneous Data
No data normalization
Automatically handles timestamps
Parsers not required
Index every term & pattern “blindly”
No attempt to “understand” up front
Normalization as it’s needed
Faster implementation
Easy search language
Multiple views into the same data
Knowledge applied at search-time
No brittle schema to work around
Multiple views into the same data
Find transactions, patterns and trends
Universal Indexing
Late Structure Binding
Analysis and Visualization
Rapid time-to-deploy: hours or days
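Universal indexing and late structure binding can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Splunk's index format or search language: TinyIndex, extract, the tokenizer and the sample events are all hypothetical. Every term is indexed without a schema, and a field such as order_id is only defined when a search asks for it.

```python
import re
from collections import defaultdict

# Assumed tokenizer for this sketch: runs of letters, digits, "_" and ".".
TOKEN = re.compile(r"[A-Za-z0-9_.]+")


class TinyIndex:
    """Indexes every term of every raw event, with no upfront schema."""

    def __init__(self):
        self.events = []                  # raw events, kept unmodified
        self.postings = defaultdict(set)  # term -> ids of events containing it

    def add(self, raw):
        event_id = len(self.events)
        self.events.append(raw)
        for term in TOKEN.findall(raw.lower()):
            self.postings[term].add(event_id)

    def search(self, *terms):
        """Return the raw events that contain all of the given terms."""
        ids = None
        for term in terms:
            hits = self.postings.get(term.lower(), set())
            ids = hits if ids is None else ids & hits
        return [self.events[i] for i in sorted(ids or ())]


def extract(events, pattern):
    """Late binding: apply a field-extraction regex at search time."""
    rx = re.compile(pattern)
    return [m.groupdict() for m in map(rx.search, events) if m]


index = TinyIndex()
index.add("2012-10-01 12:00:01 ERROR order_id=1001 customer=alice timeout")
index.add("2012-10-01 12:00:05 INFO order_id=1002 customer=bob shipped")
index.add("hold time 340s for customer=alice on care IVR")

errors = index.search("error")
orders = extract(index.search("customer"), r"order_id=(?P<order_id>\d+)")
```

Because the raw events are stored untouched, a different regex can later give a second view of the same data without re-indexing.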
Real-time Analytics
Data
Monitor Input
TCP/UDP Input
Scripted Input
Parsing Queue
Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms
Index Queue
Indexing Pipeline
Real-time Buffer
Real-time Search Process
Splunk Index
Raw data
Index Files
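The parsing steps in the diagram above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Splunk's pipeline: the timestamp format, the card-masking transform, the buffer size and all function names are hypothetical. It covers line breaking, timestamp identification and one regex transform, then hands each event to both a real-time buffer and the index.

```python
import re
from collections import deque
from datetime import datetime

# Assumed timestamp layout for this sketch: "YYYY-MM-DD HH:MM:SS".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# Hypothetical regex transform: mask all but the last 4 card digits.
MASK_CARD = (re.compile(r"card=\d{12}(\d{4})"), r"card=XXXX\1")


def break_lines(chunk):
    """Line breaking: a line starting with a timestamp begins a new event."""
    events = []
    for line in chunk.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line  # continuation, e.g. a stack trace
    return events


def parse(chunk, source):
    """Tag the source, identify the timestamp, apply the regex transform."""
    pattern, replacement = MASK_CARD
    parsed = []
    for raw in break_lines(chunk):
        match = TIMESTAMP.search(raw)
        when = datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S") if match else None
        parsed.append({"source": source, "time": when,
                       "raw": pattern.sub(replacement, raw)})
    return parsed


realtime_buffer = deque(maxlen=1000)  # recent events for real-time searches
index_files = []                      # stand-in for the on-disk index

chunk = (
    "2012-10-01 12:00:01 ERROR payment failed card=1234567812345678\n"
    "  at com.example.Pay.charge\n"
    "2012-10-01 12:00:02 INFO retry scheduled\n"
)
for event in parse(chunk, source="app.log"):
    realtime_buffer.append(event)  # visible to real-time search at once
    index_files.append(event)      # also kept for historical search

live_errors = [e for e in realtime_buffer if "ERROR" in e["raw"]]
```

The bounded deque stands in for the real-time buffer: a real-time search reads recent events from it without waiting for them to be written to the index files.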
Splunk and Hadoop
Splunk Hadoop Connect
Reliable Data Export
Import Hadoop Data
Splunk App for HadoopOps
End-to-end monitoring, troubleshooting and analysis of the Hadoop environment
Real-time Collection and Analysis
Dashboards, Reports, Access Controls
Splunk Hadoop Connect
Delivers reliable integration between Splunk and Hadoop
Export events collected and aggregated in Splunk to HDFS
Explore and browse HDFS directories and files
Import and index data from HDFS for secure searching, reporting, analysis and visualizations in Splunk
Splunk App for HadoopOps
End-to-end monitoring and troubleshooting for Hadoop
Monitoring of entire Hadoop environment (Network, Switch, Operating System and Database)
Integrated alerting to track and respond to activities from MapReduce to the individual node in the cluster
Centralized real-time view of Hadoop nodes using intuitive heatmap display
Summary - Splunk Big Data Solution
Product-based Solution
Performance at scale
Integrated and End-to-end
Thank You