Exploring Machine Data
@michaelwilde, Co-CTO, Splunk
Hi... I work at Splunk.
We stare at data all day.
WTF is Machine Data?!
is it logs?
is it netflow?
is it TWEETS?
Aaaahhh, well... kind of.
a simple way to describe the exhaust from technology
*or a big giant pain in the butt.
Volume | Velocity | Variety | Variability
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
Machine-generated data is one of the fastest-growing, most complex, and most valuable segments of big data
Machine data is the BIGgest DATA
no, not us. We’re just nice guys who want to show you cool stuff
you are a producer and consumer of data
building a service?
using an app?
Location-Based Messaging and Intelligence For Your App and Your Customers
Seth Rabinowitz, CEO
James Rodmell, CTO
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 12:17:32,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.755001,-73.963886,70
2011-11-06 12:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754982,-73.963849,75
2011-11-06 12:57:34,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754984,-73.963883,85
2011-11-06 13:17:35,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754941,-73.9639,90
2011-11-06 13:37:36,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754948,-73.963874,90
2011-11-06 13:57:37,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754931,-73.963892,95
2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
2011-11-06 14:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754979,-73.9639,100
Data! Good!
DATE/TIME
DEVICE ID
LAT/LONG
BATTERY STRENGTH
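Once the columns are labeled, records like these are easy to pull apart in code. A minimal Python sketch, assuming the field order follows the slide's labels; the second column (65 or 50) is not labeled on the slide, so it is kept as an opaque string, and battery strength is assumed to be an integer:

```python
import csv
import io
from datetime import datetime

# Three of the location records from the slide.
RAW = """\
2011-11-06 11:57:31,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.75496,-73.963853,60
2011-11-06 14:17:38,50,00027d27-ae02-627d-a79a-fa0004d3a347,40.755232,-73.963522,100
2011-11-06 14:37:33,65,00027d27-ae02-627d-a79a-fa0004d3a347,40.754979,-73.9639,100
"""


def parse_location_records(text):
    """Turn comma-separated location records into dicts.

    Field order is an assumption based on the slide's labels; the second
    column is unlabeled, so it is stored as-is under "unlabeled".
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        ts, unlabeled, device_id, lat, lon, battery = row
        records.append({
            "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
            "unlabeled": unlabeled,
            "device_id": device_id,
            "lat": float(lat),
            "lon": float(lon),
            "battery": int(battery),  # assumed to be an integer reading
        })
    return records


if __name__ == "__main__":
    for rec in parse_location_records(RAW):
        print(rec["time"], rec["lat"], rec["lon"], rec["battery"])
```

With the timestamps parsed and the coordinates as floats, plotting a device's track or its battery level over time becomes a one-liner in most charting tools.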
show them something cool already!
Oh, real quick. Did you check in or tweet #splunk #interop?
...please
All this data can be pretty cool and empowering
except for one little PROBLEM:
a lot of it looks like this
13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1099,135,epmap,(empty),0,1
13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1100,43025,43025_tcp,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1048,135,epmap,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1049,43025,43025_tcp,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1051,135,epmap,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1052,43025,43025_tcp,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.64,192.168.1.6,(empty),(empty),1694,135,epmap,(empty),0,1
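Even ugly firewall records like these split cleanly on commas. Here is a minimal Python sketch that counts Teardown events per source address. The log carries no header, so the field names are assumptions inferred from the values (an ephemeral client port connecting to port 135/epmap on 192.168.1.6); columns whose meaning isn't clear get placeholder names, and `(empty)` becomes `None`.

```python
from collections import Counter

# A few of the firewall records from the slide, one record per line.
LOG = """\
13/Apr/2011 08:52:53,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.16,192.168.1.6,(empty),(empty),1099,135,epmap,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1048,135,epmap,(empty),0,1
13/Apr/2011 08:52:55,Info,Teardown,ASA-session-6-302014,TCP,192.168.2.75,192.168.1.6,(empty),(empty),1049,43025,43025_tcp,(empty),0,1
"""

# Assumed field names; col8, col9, col13-col15 are placeholders for
# columns whose meaning is not evident from the data.
FIELDS = ["time", "severity", "action", "message_id", "protocol",
          "src_ip", "dst_ip", "col8", "col9", "src_port", "dst_port",
          "service", "col13", "col14", "col15"]


def parse_line(line):
    """Split one record into a dict, mapping "(empty)" to None."""
    values = [None if v == "(empty)" else v for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def teardowns_by_source(text):
    """Count Teardown events per source IP."""
    counts = Counter()
    for line in text.splitlines():
        if line.strip():
            event = parse_line(line)
            if event["action"] == "Teardown":
                counts[event["src_ip"]] += 1
    return counts


if __name__ == "__main__":
    print(teardowns_by_source(LOG))
```

The strict field-count check is deliberate: if a device changes its format, the parser fails loudly instead of silently shifting every column.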
and we’re expected to talk to it like this
select (select max(answer.answer) from answer where answer.member_id in (select member_id from team_members where project_id in (select project_id from project where Business_stream='Upstream' and stage='Appraise' and project_id in (select project_id from projectextra where subteam<>1))) and answer.page_id=page.page_id) as thinl, (select max(avgscore) from task_project where task_project.project_id not in (select project_id from projectextra where subteam=1) and task_project.project_id in (select project_id from project where stage='Appraise' and Business_stream = 'Upstream') and task_project.page_id=page.page_id) as bmax, (select max(answer) from answer where answer.page_id=page.page_id) as datamax, (select avg(avgscore) from task_project where project_id=1 and task_project.page_id=page.page_id) as projavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and task_project.page_id=page.page_id) as companyavg, (select avg(avgscore) from task_project where project_id not in (select project_id from projectextra where subteam=1) and project_id in (select project_id from project where Business_stream = 'Upstream') and task_project.page_id=page.page_id) as businessavg, page.* from page, riverorder where page.category_name='BusinessBoundaries' and stage_name='Appraise' and riverorder.category_name=page.category_name order by riverorder.riverorder, page.order_id
It could be better, yes? Better is good!
{
  "checkin": {
    "badges": [],
    "created": 1331454784,
    "geolat": "30.2640941786",
    "geolong": "-97.7414819408",
    "mayor": { "type": "nochange" },
    "primarycategory": {
      "fullpathname": "Food:American Restaurants",
      "iconurl": "https://foursquare.com/img/categories/food/default.png",
      "id": "4bf58dd8d48988d14e941735",
      "nodename": "American Restaurants"
    },
    "timezone": "America/Chicago",
    "user": { "gender": "male" },
    "venue": {
      "id": "4d752b1bba682d43e7563876",
      "name": "CNN Grill @ SXSW (Max's Wine Dive)"
    }
  }
}
readable, ya think?
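Structured events like this check-in can be read directly with a stock JSON parser, no regexes required. A minimal Python sketch, assuming `created` is a Unix timestamp in seconds (the name suggests it, but the slide doesn't say); note that the coordinates arrive as strings and need converting:

```python
import json
from datetime import datetime, timezone

# The check-in event from the slide, written as strict JSON.
CHECKIN = '''{"checkin": {"badges": [], "created": 1331454784,
  "geolat": "30.2640941786", "geolong": "-97.7414819408",
  "mayor": {"type": "nochange"},
  "primarycategory": {"fullpathname": "Food:American Restaurants",
    "iconurl": "https://foursquare.com/img/categories/food/default.png",
    "id": "4bf58dd8d48988d14e941735", "nodename": "American Restaurants"},
  "timezone": "America/Chicago", "user": {"gender": "male"},
  "venue": {"id": "4d752b1bba682d43e7563876",
            "name": "CNN Grill @ SXSW (Max's Wine Dive)"}}}'''


def summarize_checkin(raw):
    """Pull a few useful fields out of one check-in event."""
    c = json.loads(raw)["checkin"]
    return {
        # Assumes "created" is seconds since the Unix epoch.
        "when_utc": datetime.fromtimestamp(c["created"], tz=timezone.utc),
        "lat": float(c["geolat"]),   # coordinates are sent as strings
        "lon": float(c["geolong"]),
        "category": c["primarycategory"]["fullpathname"],
        "venue": c["venue"]["name"],
    }


if __name__ == "__main__":
    print(summarize_checkin(CHECKIN))
```

Because the keys name themselves, the code stays readable even when the vendor adds new fields later; unknown keys are simply ignored.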
failed password | timechart count by client_ip
The languages to talk to data are getting better for us humans
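For readers who haven't used Splunk: the search above keeps events containing both `failed` and `password`, then `timechart count by client_ip` counts them per time bucket, one series per `client_ip`. Here is a rough Python approximation, assuming the events are already parsed into `(timestamp, message, client_ip)` tuples. It uses case-insensitive substring matching rather than Splunk's token matching and a fixed one-hour bucket (Splunk normally picks the span automatically). The sample events and the `timechart_count_by` name are made up for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Hypothetical, already-parsed events: (timestamp, raw message, client_ip).
EVENTS = [
    (datetime(2012, 5, 8, 9, 5), "sshd: Failed password for root", "10.0.0.5"),
    (datetime(2012, 5, 8, 9, 40), "sshd: Failed password for admin", "10.0.0.5"),
    (datetime(2012, 5, 8, 9, 41), "sshd: Accepted password for alice", "10.0.0.7"),
    (datetime(2012, 5, 8, 10, 2), "sshd: Failed password for guest", "10.0.0.9"),
]


def timechart_count_by(events, terms=("failed", "password"),
                       span=timedelta(hours=1)):
    """Count matching events per time bucket, split by client IP."""
    table = defaultdict(lambda: defaultdict(int))
    for ts, message, client_ip in events:
        text = message.lower()
        if not all(term in text for term in terms):
            continue  # every search term must appear in the event
        # Floor the timestamp to the start of its bucket.
        bucket = EPOCH + ((ts - EPOCH) // span) * span
        table[bucket][client_ip] += 1
    return {bucket: dict(counts) for bucket, counts in sorted(table.items())}


if __name__ == "__main__":
    for bucket, counts in timechart_count_by(EVENTS).items():
        print(bucket, counts)
```

Twenty-odd lines of Python versus one line of search language is the point of the slide.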
Guys... come on! Go back to the data, please.
a simple way to describe a massive problem
A friend in Boulder can help
Need data?
The Social Media API
Jud Valeski, Co-Founder, CEO
Sometimes machine data is helpful to those OTHER than IT
Someone with a different perspective sees your exhaust as a source of fuel
please, please, please CALL THE VP OF ENGINEERING at all of your vendors.
DEMAND REAL-TIME DATA IN A STREAM OVER THE WEB
IN JSON FORMAT
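Once a vendor does stream JSON over the web, consuming it takes very little code. A minimal Python sketch, assuming the feed is newline-delimited JSON (one object per line) with blank lines used as keep-alives; the function name `iter_stream_events` is hypothetical, and the sample stream below is simulated rather than a real feed:

```python
import io
import json


def iter_stream_events(lines):
    """Yield one dict per non-blank line of a newline-delimited JSON stream.

    Accepts str or bytes lines; blank lines (assumed keep-alives) are skipped.
    """
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


# Simulated stream. With a real feed, `lines` could be the response from
# urllib.request.urlopen(...), which iterates line by line as bytes.
sample = io.StringIO('{"id": 1, "text": "hello"}\n\n{"id": 2, "text": "world"}\n')
for event in iter_stream_events(sample):
    print(event["id"], event["text"])
```

Making it a generator means events are handled as they arrive instead of waiting for the stream to end, which is what "real-time" asks for.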
Hey audience! We still have a few minutes.
What questions might you have been saving until this exact moment?
Thanks.
@michaelwilde
Michael Wilde, Splunk Ninja
Co-CTO, Splunk
Who else sends you on your way with a cute dog photo?