Monetize the Noise: How Naming data junk became a security data treasure
Paul Sitowitz & Scott King Walker
September 28th, 2015
Verisign Confidential and Proprietary 2
Reduce, Reuse, Recycle
• Restore
• Repurpose
• Remake
• Reinvent
• Reimagine
Image by Jakub Jankiewicz (jcubic / Kuba) (Open Clip Art Library, detail page) CC0, via Wikimedia Commons
Noise, Noise, Noise, and Pigeon Droppings
• Excerpt is from: Wired Magazine, “Accept Defeat: The Neuroscience of Screwing Up” by Jonah Lehrer, 12.21.09.
• http://www.wired.com/2009/12/fail_accept_defeat/
Junk & Noise
• These are the unwanted things that we usually discard or else try to block out
• Junk
  • Trash
  • Unused items
  • Unneeded items
  • Useless items
  • Disliked items
• Noise
  • Loud sounds
  • Interference
  • Malicious signals
  • Harmful irritants
  • Bad smells
Noise in our Data
• "Photon-noise" by Mdf - Photon-noise.jpg. Licensed under CC BY-SA 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Photon-noise.jpg#/media/File:Photon-noise.jpg
Data Analyzer and Signal from Noise
• YXD: Resolution Success = Signal
• NXD: Resolution Failure = Noise (or is it?)
• The Data Analyzer product is based on finding signal in this noise.
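In DNS terms, the two buckets map onto response codes. Assuming YXD denotes names that resolve normally (RCODE 0, NOERROR) and NXD denotes NXDOMAIN responses (RCODE 3, as defined in RFC 1035), a minimal classification sketch looks like this; `classify` is a hypothetical helper, not part of Data Analyzer.

```python
from collections import Counter

# Response codes from RFC 1035, section 4.1.1
NOERROR = 0
NXDOMAIN = 3

def classify(rcode: int) -> str:
    """Hypothetical helper: map a DNS RCODE to the YXD/NXD buckets used on this slide."""
    if rcode == NOERROR:
        return "YXD"    # resolution success: the usual "signal"
    if rcode == NXDOMAIN:
        return "NXD"    # name does not exist: the usual "noise"
    return "OTHER"      # SERVFAIL, REFUSED, and other failures

sample_rcodes = [0, 3, 3, 0, 2, 3]
print(Counter(classify(r) for r in sample_rcodes))
```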
Looking at NXDs
• When a name server cannot resolve a domain, it returns an NXD response
• This data is typically discarded as “junk”
• Data Analyzer analyzes this data to identify domains:
  • with sufficient traffic
  • requested during business hours
  • requested from specific locations around the world
  • … and with many other desirable characteristics (like clickable traffic)
• We rate and score these NXDs (from 1 to 10) and allow our customers to query them
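The deck does not describe how the 1 to 10 rating is computed. As a purely illustrative sketch, assuming three hypothetical per-domain aggregates (total requests, requests during business hours, and number of distinct request locations) and invented weights, a score could combine the characteristics listed above like this:

```python
import math
from dataclasses import dataclass

@dataclass
class NxdStats:
    # Hypothetical per-domain aggregates; not Data Analyzer's actual schema.
    total_requests: int
    business_hour_requests: int   # requests made during local business hours
    distinct_regions: int         # distinct locations the requests came from

def score(stats: NxdStats, max_regions: int = 20) -> int:
    """Return an illustrative 1-10 score. The weights are assumptions, not DA's model."""
    if stats.total_requests <= 0:
        return 1
    # Log-scale the volume so a few huge domains do not dominate; saturate at 1e6 requests.
    volume = min(math.log10(stats.total_requests) / 6.0, 1.0)
    business = stats.business_hour_requests / stats.total_requests
    spread = min(stats.distinct_regions / max_regions, 1.0)
    combined = 0.5 * volume + 0.3 * business + 0.2 * spread   # value in [0, 1]
    return 1 + round(combined * 9)                             # map onto 1..10

print(score(NxdStats(total_requests=5000, business_hour_requests=3000, distinct_regions=8)))
```

Log-scaling the volume is one common choice for heavy-tailed traffic counts; without it, raw volume alone would decide almost every score.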
NXD Domains
• Sample of NXDs with sufficient traffic (according to DA):
  • GENTLEMANMILLION.NET
  • CDSYHD.NET
  • SILVERHORSETRADER
  • A2VERISIGNDNS.COM
  • SARAH.COM
  • 3RDBILLION.NET
  • XN--JJEEP-3F5FW08B.COM
  • PAULHUNTHOMES.COM
  • CAT-HUSE
  • SCOTTSTORAGE2.COM
  • MANNYSGOLFWORLD.COM
EDAS Record Format
• The NXD DNS traffic data available to Verisign is stored in the EDAS record format
• A single EDAS-formatted record contains:
  • The requesting IP (recursive name server)
  • The requested domain, including TLD (up to the 3rd level)
  • The time of day that the request was made
  • The name of the site where the request was received
  • The DNS record type of the request (typically A and AAAA)
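The field list above maps naturally onto a small record type. The sketch below assumes a hypothetical tab-separated line layout and hypothetical field names (the actual EDAS encoding is not described in this deck), and shows one way to enforce the "up to the 3rd level" rule.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class EdasRecord:
    # Hypothetical field names and order; the real EDAS layout may differ.
    requester_ip: str       # recursive name server that sent the query
    domain: str             # requested domain incl. TLD, truncated to 3 labels
    timestamp: datetime     # when the request was made
    site: str               # site that received the request
    qtype: str              # DNS record type, typically A or AAAA

def truncate_to_third_level(name: str) -> str:
    """Keep at most the last three labels, e.g. www.mail.example.com -> mail.example.com."""
    labels = name.lower().rstrip(".").split(".")
    return ".".join(labels[-3:])

def parse_edas_line(line: str) -> EdasRecord:
    """Parse one line in an assumed tab-separated layout: ip, domain, ISO time, site, type."""
    ip, domain, ts, site, qtype = line.rstrip("\n").split("\t")
    return EdasRecord(
        requester_ip=ip,
        domain=truncate_to_third_level(domain),
        timestamp=datetime.fromisoformat(ts),
        site=site,
        qtype=qtype.upper(),
    )

# "SITE1" is a made-up site name; 192.0.2.53 is a documentation address.
rec = parse_edas_line("192.0.2.53\tWWW.Example-NXD.com.\t2015-09-28T14:05:00\tSITE1\taaaa")
print(rec.domain, rec.qtype, rec.timestamp.hour)
```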
Big Data
• NXD request data is captured by the Traffic Monitoring team from our Edge sites for COM/NET/CC/TV
• Comprises 90% of NXD traffic
• The data is then ingested into the VSCC
  • an average of 300 Gigabytes each day
• Data Analyzer allows customers to query up to 26 weeks of raw NXD data
• That’s 42.2 Terabytes of data needed by a single customer query
• If we factor in a 3X replication model used by the VSCC at both the BRN and ILG sites, that adds up to about 250 Terabytes of raw storage!!!
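The storage figure follows from multiplying the per-query data volume by the replication factor and the number of sites. The calculation below uses only numbers from the slide. Its second part assumes the 300 GB daily average held for all 182 days, which would give about 54.6 TB rather than 42.2 TB; the 42.2 TB figure implies an average closer to 232 GB per day, so the totals are best read as approximations.

```python
# All inputs are taken from the slide.
query_window_tb = 42.2   # raw NXD data scanned by one 26-week query
replicas = 3             # VSCC replication factor
sites = 2                # BRN and ILG

raw_storage_tb = query_window_tb * replicas * sites
print(f"Raw storage: {raw_storage_tb:.1f} TB")   # roughly the "about 250 TB" on the slide

# Cross-check, assuming the 300 GB/day average held for the whole window.
days = 26 * 7
ingest_tb = 0.300 * days
print(f"26 weeks at 300 GB/day: {ingest_tb:.1f} TB")
print(f"Implied daily average: {query_window_tb * 1000 / days:.0f} GB/day")
```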
Query Processing Time
• A 26-week query on raw NXD data can take more than 8 hours to complete
• And that’s running across more than a hundred powerful data node machines in the VSCC
• With this in mind, the Data Analyzer product also stores 60 days of aggregated data for our Complete Index, which adds more value while taking less time to produce results
• Our indexes take a few hours to build every night
• This index-based data supports very flexible filter-based queries
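The deck does not show how the nightly index is built. The sketch below illustrates the general idea under stated assumptions: one day's raw (domain, timestamp) pairs are collapsed into per-domain totals and a 24-bucket hourly profile, and daily aggregates older than the 60-day retention window are dropped. `build_daily_index` and `prune` are hypothetical names.

```python
from collections import Counter, defaultdict
from datetime import date, datetime, timedelta

RETENTION_DAYS = 60  # the Complete Index keeps 60 days of aggregated data

def build_daily_index(records):
    """Collapse one day's raw (domain, timestamp) pairs into per-domain summaries."""
    totals = Counter()
    hourly = defaultdict(lambda: [0] * 24)   # request counts per hour of day
    for domain, ts in records:
        totals[domain] += 1
        hourly[domain][ts.hour] += 1
    return {d: {"total": totals[d], "hourly": hourly[d]} for d in totals}

def prune(index_by_day, today, retention=RETENTION_DAYS):
    """Keep only the daily aggregates inside the retention window (today included)."""
    cutoff = today - timedelta(days=retention)
    return {day: idx for day, idx in index_by_day.items() if day > cutoff}

today = date(2015, 9, 28)
raw = [("example.com", datetime(2015, 9, 28, 10)), ("example.com", datetime(2015, 9, 28, 10))]
index_by_day = prune({today: build_daily_index(raw)}, today)
print(index_by_day[today]["example.com"]["total"])
```

Queries then scan compact per-domain summaries instead of every raw record, which is the trade the slide describes.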
Noisy Data
• With so much data comes the potential for a lot of noise:
  • But what kinds of noise?
  • How much noise?
  • How can we find this noise?
Finding Noise in the NXD Data - Sample 100K
[Chart: request frequency per NXD domain across the 100K-domain sample; y-axis from 0 to 80,000]
Classic hockey stick pattern, very dramatic, but nothing to see here. Right?
Top 1K NXD domains from 100K Sample
[Chart: request frequencies of the top 1K NXD domains from the 100K sample; y-axis from 0 to 80,000]
“Gap”: no domains in the sample have request frequencies between 2820 and 9918.
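One simple way to locate such a gap automatically is sketched below. It is an illustrative heuristic, not how DA or BDS actually find it: sort the counts and look for the largest multiplicative drop between neighbors. `min_count` is a hypothetical knob that ignores the low-volume tail, where small integers produce large ratios, and the sample counts are made up.

```python
def largest_gap(frequencies, min_count=1):
    """
    Sort request counts in descending order and return the (upper, lower) pair
    with the largest ratio between neighbors, or None if fewer than two counts remain.
    """
    floor = max(min_count, 1)   # never divide by zero
    ordered = sorted((f for f in frequencies if f >= floor), reverse=True)
    if len(ordered) < 2:
        return None
    i = max(range(len(ordered) - 1), key=lambda k: ordered[k] / ordered[k + 1])
    return ordered[i], ordered[i + 1]

def split_spike(freq_by_domain, min_count=1):
    """Domains at or above the gap form the 'spike'; everything else is the long tail."""
    gap = largest_gap(freq_by_domain.values(), min_count)
    if gap is None:
        return set(), set(freq_by_domain)
    upper, _ = gap
    spike = {d for d, f in freq_by_domain.items() if f >= upper}
    return spike, set(freq_by_domain) - spike

# Hypothetical counts, chosen only to show the shape of a spike followed by a tail.
sample = {"a.example": 70000, "b.example": 65000, "c.example": 52000,
          "d.example": 2800, "e.example": 2100, "f.example": 1500}
print(largest_gap(sample.values()), sorted(split_spike(sample)[0]))
```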
The Spike
• The large “spike” in the previous graphs shows an unusually large number of NXD requests
• Do we give these NXDs very high scores since they get lots of traffic?
• Or is this just plain noise in our data that we should discard?
• It also turns out that these high-volume NXDs exhibit similar request traffic patterns
• We believe that these requests are from “Botnets”
The Botnet Problem
Traffic from Botnets is:
• Automatic, behind-the-scenes traffic
• From infected computers
• Algorithmically generated based on time/date
Traffic from Botnets can be detected by:
• High traffic levels from consistent sets of recursive name servers
• Lack of traffic from other name servers
[Chart: request frequencies of the highest-traffic NXD domains, by rank; y-axis from 0 to 80,000]
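The two detection cues can be turned into a simple check. The sketch below is illustrative only: BDS's actual (patented, Mahout-based) method is not reproduced here, and the function names and all thresholds are hypothetical. It flags a domain whose heavy daily traffic keeps arriving from the same small set of recursive resolvers, measuring "consistent" as the Jaccard overlap of resolver sets on consecutive days.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """Overlap between two resolver-IP sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def looks_automated(daily_resolver_counts, min_daily_requests=10_000,
                    min_consistency=0.8, max_resolvers=50):
    """
    daily_resolver_counts: one Counter({resolver_ip: requests}) per day for a single domain.
    Thresholds are illustrative assumptions, not BDS parameters.
    """
    if len(daily_resolver_counts) < 2:
        return False
    heavy = all(sum(c.values()) >= min_daily_requests for c in daily_resolver_counts)
    sets = [set(c) for c in daily_resolver_counts]
    consistency = min(jaccard(x, y) for x, y in zip(sets, sets[1:]))
    narrow = max(len(s) for s in sets) <= max_resolvers   # little traffic from other resolvers
    return heavy and consistency >= min_consistency and narrow

# Made-up traffic using documentation address ranges.
bot_like = [Counter({"198.51.100.1": 8000, "198.51.100.2": 7000})] * 3
human_like = [Counter({f"203.0.113.{i}": 300 for i in range(0, 40)}),
              Counter({f"203.0.113.{i}": 300 for i in range(20, 60)})]
print(looks_automated(bot_like), looks_automated(human_like))
```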
What Are Botnets
• Botnets enable the most sophisticated and popular types of cybercrime today
• They allow hackers to take control of many computers at a time, which then operate as part of a powerful “botnet”
• Many of these computers are infected without their owners’ knowledge
• Bots often spread themselves across the Internet by infecting unprotected computers
• Their goal is to stay hidden until they are instructed to carry out a task by a Command and Control server
About Botnets
• Botnets use an algorithm to generate domain names, which makes them difficult to identify. While many of these names may be NXDs, some are not
• These Botnet domains, if registered, would connect a Botnet to a Command and Control server that issues instructions to commit attacks
• Botnets are just “zombie” machines without C&C servers to tell them what to do
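A toy example shows why such traffic looks the way it does. The generator below is a hypothetical illustration that mimics only the general idea of date-seeded name generation; it is not any real malware family. Every infected host computes the same names for the same date, so the operator needs to register just one of them for a C&C rendezvous, while the rest stay unregistered and surface as NXDs.

```python
import hashlib
from datetime import date

def toy_dga(day: date, count: int = 5, tld: str = "com", length: int = 12):
    """Illustrative date-seeded domain generator (same date -> same names everywhere)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    names = []
    for i in range(count):
        seed = f"{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(seed).digest()
        label = "".join(alphabet[b % 26] for b in digest[:length])
        names.append(f"{label}.{tld}")
    return names

print(toy_dga(date(2015, 9, 28)))
```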
Botnet Detection
• NXDs with very large amounts of traffic that exhibit similar traffic patterns are most likely NOT requested by humans
• These domains are classified by the Botnet Detection Service (BDS) as “suspicious” and the requests are considered to be from “botnets”:
  • CDSYHD.NET
  • A2VERISIGNDNS.COM
  • 3RDBILLION.NET
  • PAULHUNTHOMES.COM
  • SCOTTSTORAGE2.COM
Full sample of NXDs with sufficient traffic: GENTLEMANMILLION.NET, CDSYHD.NET, SILVERHORSETRADER, A2VERISIGNDNS.COM, SARAH.COM, 3RDBILLION.NET, XN--JJEEP-3F5FW08B.COM, PAULHUNTHOMES.COM, CAT-HUSE, SCOTTSTORAGE2.COM, MANNYSGOLFWORLD.COM
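"Similar traffic patterns" can be made concrete by comparing each domain's hourly request profile. The sketch below is a stand-in using plain cosine similarity and greedy grouping; it is not the BDS algorithm, and `similar_pattern_groups`, its thresholds, and the example profiles are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similar_pattern_groups(hourly_by_domain, threshold=0.95, min_total=10_000):
    """
    Greedily group high-traffic domains whose 24-hour request profiles nearly coincide.
    Each group is compared through its first member; groups of 2+ domains are returned.
    """
    heavy = {d: h for d, h in hourly_by_domain.items() if sum(h) >= min_total}
    groups = []
    for domain, profile in heavy.items():
        for group in groups:
            if cosine(profile, heavy[group[0]]) >= threshold:
                group.append(domain)
                break
        else:
            groups.append([domain])
    return [g for g in groups if len(g) > 1]

profiles = {
    "bot-a.example": [1000] * 24,                           # flat, around-the-clock traffic
    "bot-b.example": [2000] * 24,                           # same shape, twice the volume
    "human.example": [100] * 8 + [3000] * 10 + [100] * 6,   # daytime-heavy profile
}
print(similar_pattern_groups(profiles))
```

Cosine similarity ignores absolute volume, so two domains driven by the same schedule group together even when one receives far more requests.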
About BDS
• Implemented using Hadoop streaming and the Mahout machine learning library
• Identifies similar NXD traffic patterns across many different name servers
• Runs once a day at 4:30pm EST
• Analyzes 1 day of NXD data for COM/NET/CC/TV and produces a “suspicious” domains list
• Collects the past 60 days of suspicious domains and publishes the unique collection to an HDFS folder in the VSCC
• Exposed to other products via a DAG data retrieval API
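The publishing step (collect 60 days of daily lists, keep the unique domains) can be sketched as a rolling-window union. This assumes the window includes the run date itself. The HDFS write and the DAG API are not shown, and `unique_suspicious` and the example domains are hypothetical.

```python
from datetime import date, timedelta

WINDOW_DAYS = 60  # BDS publishes the unique suspicious domains from the past 60 days

def unique_suspicious(daily_lists, run_date, window=WINDOW_DAYS):
    """
    daily_lists: {date: iterable of domains flagged by that day's run}.
    Returns the sorted, de-duplicated domains flagged within the window ending on run_date.
    """
    start = run_date - timedelta(days=window - 1)   # window includes run_date itself
    unique = set()
    for day, domains in daily_lists.items():
        if start <= day <= run_date:
            unique.update(d.lower().rstrip(".") for d in domains)
    return sorted(unique)

run = date(2015, 9, 28)
daily = {
    run: ["BOT-A.EXAMPLE"],
    run - timedelta(days=59): ["bot-a.example", "Bot-B.Example."],
    run - timedelta(days=60): ["old.example"],   # falls just outside the window
}
print("\n".join(unique_suspicious(daily, run)))  # content a caller might write to HDFS
```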
A Patented Technology
• https://www.google.com/patents/US8745737
DA Use Case of BDS
• Prevents promotion of these suspicious domains to our customers
• Provides two major benefits:
  • Customers avoid registering high-traffic domains that won’t see human traffic
  • System efficiency: fewer domains to query
Monetize the Noise
• Remember that “One engineer’s noise is another engineer’s signal.”
• The effort to make use of the BDS data earlier this year started with a joke of an idea. It was something like: “If we know what domains the infected computers are looking for, we could register those domains, take over their botnets and use them for ourselves!”
• (Probably not really, because they tend to use encrypted instructions to prevent this, but maybe.)
Monetize the Noise
This silly starting point led to a list of other options:
• Prevent the registration of these domains to clean up .COM and .NET
• Sell the data to a security company that would pay for it in order to block traffic to these domains
• Use the data itself to target the companies that most desperately need the blocking service
How the connection happened
• Eventually, we found a security company interested. You may have heard of them… Verisign.
• Paul had a discussion about BDS with Jim Gould, who asked him to present it at a PESAB meeting. That led to an engineering-to-engineering discussion about its usefulness to the security side of the business.
• Once the engineering feasibility was in place, we had their product people talk to our product people, and the security use of the data was quickly approved.
• Takeaway is: “Don’t let the organizational structure stand in the way of a good use for your data.”
Current Usage
• Data Analyzer uses these domains as a “blacklist” to filter them out of our indexes, so they are never returned to our customers; this helps prevent their potential registration
• Recursive DNS uses these to ensure that resolution requests for them are ignored to prevent potential “botnet” transmissions
• How else might we use the suspicious Botnet domain list?
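Both current uses amount to checking names against the published list. The sketch below is illustrative: the function names, the `{domain: score}` index shape, and the choice to also ignore subdomains of a blacklisted name in the resolver check are assumptions, not a description of the production systems.

```python
def normalize(name: str) -> str:
    return name.lower().rstrip(".")

def filter_index(index, blacklist):
    """Drop blacklisted domains from a {domain: score} index before results reach customers."""
    blocked = {normalize(d) for d in blacklist}
    return {d: s for d, s in index.items() if normalize(d) not in blocked}

def should_ignore(qname, blacklist):
    """Resolver-side check: True if the query name or any parent domain is blacklisted."""
    blocked = {normalize(d) for d in blacklist}   # a real resolver would precompute this set
    labels = normalize(qname).split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))

# Hypothetical names and scores.
blacklist = ["BOT-A.EXAMPLE", "bot-b.example"]
index = {"bot-a.example": 9, "human.example": 7}
print(filter_index(index, blacklist), should_ignore("www.bot-a.example.", blacklist))
```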
Future work
• BDS data from DA could be used in several ways within the company to improve security products. Blocking traffic within the Recursive service is just one use.
• How about:
  • Selling the BDS data feed as a standalone or add-on security product
  • Using traffic to BDS domains to prioritize Recursive sales leads
  • Using BDS domains within a Recursive appliance to identify infected computers on a network (don’t just block, disinfect!)
Going Further
• Block the registration of these suspected domains in Core
• Use the registration attempts to identify criminals
• While our Botnet domain list only comprises COM/NET/CC/TV domains, we can use BDS for other TLDs
• Maybe a service we could provide to other registries
Botnet Domains And Request Information
Digging Deeper
Infection Detection
• With help from a Recursive Server appliance that captures the IPs of the original requests:
  • We can track back from the Recursive server to the actual “Bots!”
  • If we can find these Bots, then we can help to shut them down
• Another possibility might be to include the IP of the actual requesting machines inside the DNS messages using EDNS0 (Extension Mechanisms for DNS), which:
  • Allows more information to be stored in DNS messages
  • Is currently used in about 10% of DNS messages to enable things like geolocation
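The EDNS0 idea corresponds to the Client Subnet option (RFC 7871), which travels in the OPT pseudo-record defined by EDNS0 (RFC 6891). As a sketch that builds only the option itself, not a full DNS message, its wire format can be produced with the standard library. ECS carries a client subnet: RFC 7871 suggests truncating to /24 for IPv4 and /56 for IPv6 by default, and a full /32 would be needed to pinpoint a single machine. `encode_ecs_option` is a hypothetical helper.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8   # EDNS Client Subnet option code (RFC 7871)

def encode_ecs_option(client_ip: str, source_prefix: int) -> bytes:
    """
    Build the wire form of an EDNS0 Client Subnet option for a query:
    OPTION-CODE, OPTION-LENGTH, FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH, ADDRESS.
    """
    addr = ipaddress.ip_address(client_ip)
    if not 0 <= source_prefix <= addr.max_prefixlen:
        raise ValueError("source_prefix out of range for this address family")
    family = 1 if addr.version == 4 else 2   # IANA address family numbers
    # Zero the host bits, then send only the octets covered by the prefix.
    network = ipaddress.ip_network(f"{addr}/{source_prefix}", strict=False)
    address = network.network_address.packed[: (source_prefix + 7) // 8]
    payload = struct.pack("!HBB", family, source_prefix, 0) + address   # scope is 0 in queries
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload

print(encode_ecs_option("198.51.100.77", 24).hex())
print(len(encode_ecs_option("2001:db8::1", 56)))
```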
Information Gathering
• So far, the suspected domains have all been NXDs, so the intended C&C servers have not yet been registered
• We can use BDS to identify suspected domains based on YXD traffic data that point to real, live C&C servers
• While the BDS algorithm would definitely work on YXD data, we might have some challenges:
  • TTL-based caching by resolvers
  • Frequent IP switching by C&C server domains to avoid detection
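A back-of-the-envelope bound makes the caching challenge concrete. It assumes each recursive resolver keeps one cache and re-queries only after the TTL expires (no prefetching, no multi-node resolver farms). Under those assumptions, the number of queries reaching the authoritative side depends on the resolver count and the TTL, not on how many infected hosts sit behind those resolvers. The function name is hypothetical.

```python
import math

def max_upstream_queries(resolvers: int, window_seconds: int, ttl_seconds: int) -> int:
    """Upper bound on queries seen upstream when every resolver caches answers for the TTL."""
    refetches_per_resolver = math.ceil(window_seconds / ttl_seconds)
    return resolvers * refetches_per_resolver

# 5 resolvers over one day with a 1-hour TTL, however many clients query through them:
print(max_upstream_queries(5, 86_400, 3_600))
# A 5-minute TTL raises the ceiling, but it is still bounded by the resolver count:
print(max_upstream_queries(5, 86_400, 300))
```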
Taking Down C&C servers
• If we can identify the domains for suspected live C&C servers, perhaps we can:
  • Block DNS resolution on EDGE and Electra servers
  • Use “Core” to suspend the registration of these domains so they appear “out of zone”
  • Fine registrars
  • Go after domain owners
• While this would be a service to the entire Internet, any of the above would most likely have legal implications
Room For Improvement
• While a great service, BDS does not help with Botnet transmissions that are NOT DNS based
• Add support for IPv6 traffic
• Add monitoring to track the rate of false positives
• Use for analyzing traffic data for other TLDs
• Use for analyzing YXD traffic data
• Potentially look at additional data points in the AVRO summary feed currently used by the Real-Time Cluster (RTC), which will soon replace the existing Traffic Monitor feed (end of Q2 next year)
  • Will also include traffic data from our Electra sites
  • And the missing 10% of NXD traffic data!
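Tracking "the rate of false positives" needs a definition before it can be monitored. One sketch, assuming a periodically hand-labeled review sample exists (hypothetical; the deck does not describe one), reports two related numbers: the false-positive rate among benign domains, FP / (FP + TN), and the fraction of labeled flags that were wrong. `error_rates` and the example domains are hypothetical.

```python
def error_rates(flagged, labeled_benign, labeled_malicious):
    """
    flagged: domains BDS marked as suspicious.
    labeled_benign / labeled_malicious: a hand-reviewed ground-truth sample.
    Returns (false_positive_rate, false_discovery_rate) over the labeled domains only.
    """
    flagged = set(flagged)
    benign, malicious = set(labeled_benign), set(labeled_malicious)
    fp = len(flagged & benign)       # benign domains that were flagged anyway
    tp = len(flagged & malicious)    # malicious domains that were correctly flagged
    fpr = fp / len(benign) if benign else 0.0     # FP / (FP + TN)
    fdr = fp / (fp + tp) if fp + tp else 0.0      # share of labeled flags that were wrong
    return fpr, fdr

fpr, fdr = error_rates(
    flagged={"a.example", "b.example", "c.example", "x.example"},
    labeled_benign={"a.example", "d.example", "e.example", "f.example"},
    labeled_malicious={"b.example", "c.example", "g.example"},
)
print(f"FPR={fpr:.2f} FDR={fdr:.2f}")
```

Flagged domains outside the labeled sample (here `x.example`) are ignored, so both numbers are only as representative as the review sample.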
Eureka!
• Excerpt is from: Wired Magazine, “Accept Defeat: The Neuroscience of Screwing Up” by Jonah Lehrer, 12.21.09.
• http://www.wired.com/2009/12/fail_accept_defeat/
Takeaways
• Your Noise could be MY signal
• Reimagine and Reuse & Find reasons to Keep more of your data
• Find more Value & Throw more effectively
• Don’t let the organization stand in the way
© 2015 VeriSign, Inc. All rights reserved. VERISIGN and other trademarks, service marks, and designs are registered or unregistered trademarks of VeriSign, Inc. and its subsidiaries in the United States and in foreign countries. All other trademarks are property of their respective owners.