©2013
LEVERAGING OPEN SOURCE INTELLIGENCE (OSINT) TO COMBAT FRAUD
The illicit practices of diversion, theft of trade secrets, and counterfeiting pharmaceutical drugs
have been estimated to be a $200 billion per year industry. Managing and identifying intellectual
property infringement can help leaders maintain a competitive advantage and avoid loss in
market share. This presentation will review how OSINT can be used to reduce risk exposure,
identify potential loss incidents, and assist in loss recovery efforts. As fraud investigators and
other risk management practitioners seek to harness the overwhelming body of information
available through OSINT, this session will provide proactive new solutions to use in the field.
TYSON JOHNSON, CFE, CPP
Director, Global Risk Management,
ATS Automation
Oakville, ON
Tyson Johnson is a well-travelled risk management executive. He has worked in government,
global banking, and global manufacturing. Johnson has effectively led investigations in Mexico,
Thailand, China, India, and Malaysia, as well as throughout North America and Europe. He has been
a pioneer in the field of developing Open Source Intelligence (OSINT) programs to support risk
reduction, loss prevention, and recovery for the past decade. As a former intelligence officer,
Johnson understands the need for strong information collection and analysis to support proactive
risk management. Johnson obtained his master’s degree from the Fletcher School of International
Law & Diplomacy, and has participated in leadership programs including the Governor
General’s Canadian Leadership Conference.
“Association of Certified Fraud Examiners,” “Certified Fraud Examiner,” “CFE,” “ACFE,” and the
ACFE Logo are trademarks owned by the Association of Certified Fraud Examiners, Inc. The contents of
this paper may not be transmitted, re-published, modified, reproduced, distributed, copied, or sold without
the prior consent of the author.
2013 ACFE Canadian Fraud Conference ©2013 1
Let’s start by learning a common language so we will be in
sync throughout the presentation:
Intelligence Cycle—While we all intuitively know this
cycle, it is worth reviewing for the purpose of this
presentation. The basic equation is: Information +
Analysis = Intelligence
The cycle we must go through involves these stages:
1. Collection of open source information
2. Filtering of the information collected to ensure we
   have relevant and reliable content for the next stage
3. Analysis of the filtered, relevant content, utilizing
   inductive reasoning skills to identify the “so what”
   from all data
4. Production of insights and inferences that are
   actionable and proactive in the identification of
   fraud, fraud avoidance, fraud recovery, or loss
   reduction
Big Data—This refers to all content available for
research, analysis, and review. Big data consists of both
structured and unstructured data, and the goal is for
fraud investigators to make sense out of all data.
Traditional anti-fraud tools are very good at running
analysis on structured content—looking for duplicate
payees, duplicate addresses, and so on.
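As an illustration of the structured analysis described
above, here is a minimal Python sketch that groups vendor
records sharing a payee name or address. The record layout,
the field names, and the normalization rule are assumptions
made for this example, not a description of any particular
anti-fraud tool.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and drop punctuation so near-identical entries compare equal."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def find_duplicates(records, field):
    """Return {normalized value: [record ids]} for values seen more than once."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record[field])].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical vendor master entries, invented for illustration.
vendors = [
    {"id": 1, "payee": "Acme Supply Ltd.", "address": "12 King St W, Toronto"},
    {"id": 2, "payee": "ACME SUPPLY LTD", "address": "99 Bay St, Toronto"},
    {"id": 3, "payee": "Northern Tools", "address": "12 King St. W, Toronto"},
]

duplicate_payees = find_duplicates(vendors, "payee")
duplicate_addresses = find_duplicates(vendors, "address")
print(duplicate_payees)
print(duplicate_addresses)
```

Exact matching after normalization is the simplest possible
rule; a real review would likely add fuzzy matching for
misspellings and abbreviations.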
How do we harvest and tag unstructured content from
the Internet (any and all formats) to develop more
structured data for analysis? That is what we will cover
in this presentation today.
Deep Web—When a typical search engine is used for
searching a term, the engine returns search results based
on popularity, page ranking, advertisements, and
ultimately all pages that the bots were able to capture
from crawling the indexed surface Web. Any content that is
not indexed (i.e., content that a bot cannot reach or
crawl) is not identified or flagged for you, the
investigator.
The content visible on the surface Web is like the part
of an iceberg that we can see above the water line. The
Deep Web is an order of magnitude larger than the surface
Web, and its content remains largely unknown or invisible
to typical search engines.
Open Source Intelligence (OSINT)—This refers to an
investigator being able to create actionable insights and
inferences from the wealth of related content that exists
in the Deep Web. OSINT is the production component
of the intelligence cycle.
Force Multiplier—The ability to harvest content from
the Deep Web in a manner that is largely automated and
filtered reduces the fraud investigator’s burden of
collecting and filtering online, open source
information. Reducing the time spent collecting and
filtering the massive amounts of available online
content allows the investigator to spend more time
analyzing filtered results, identifying fraud, and
working toward loss reduction and recovery.
The term force multiplier also refers to the ability to
harvest content on a scale that is otherwise not possible
for an investigator or group of investigators (see
WebMD slides).
Normalizing unstructured content to semi-structured to
enable analytics—The concept of unstructured content
was introduced earlier. Most fraud investigators are
familiar with the analytics that can be run on structured
data (employee lists, vendor lists, AP/AR lists, etc.).
When we harvest from the Deep Web, we receive data in
many formats and perhaps many languages. For an
investigator to be able to run analysis on the data, it
must be tagged, tuned, normalized, and enriched so that
it is usable, that is, semi-structured. This is referred
to as data curation. Technology exists that can curate
all data collected into a format (content silo) that is
searchable and ready for analysis.
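To make the idea of curation concrete, the following Python
sketch turns a block of raw page text into a semi-structured
record by tagging e-mail addresses and phone numbers. The
regular expressions, the record fields, and the sample page
are all assumptions chosen for illustration; production
curation would also handle language detection, translation,
and many more entity types.

```python
import re

# Deliberately simple patterns; both are illustrative assumptions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def curate(source_url, raw_text):
    """Normalize whitespace and tag contact details found in raw page text."""
    text = " ".join(raw_text.split())
    emails = sorted({match.lower() for match in EMAIL_RE.findall(text)})
    # Keep digits only so differently formatted numbers compare equal.
    phones = sorted({re.sub(r"\D", "", match) for match in PHONE_RE.findall(text)})
    return {"source": source_url, "emails": emails, "phones": phones, "text": text}


# Hypothetical harvested page, invented for illustration.
page = """
  Order now!   Call +1 (555) 010-0199 or
  e-mail Sales@Example-Pharmacy.test for bulk pricing.
"""

record = curate("http://example-pharmacy.test/contact", page)
print(record["emails"], record["phones"])
```

Once every harvested page is reduced to a record like this,
the same duplicate and link checks used on structured data
can be applied to content gathered from the Web.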
The Goal: Creating New Intelligence to Identify Fraud,
Identify Perpetrators, and Aid in Fraud Recovery and Loss
Reduction
In the slides that follow, we will review case studies in the
Big Pharma sector that focus on IP theft and on fraudulent,
counterfeit, and diverted product, all sold and transacted
online. We will explore the size and scale of the problem
that global online pharmacies (OLPs) pose for Big Pharma,
and the difficulty of rooting out the OLPs that are
fraudulent and harmful to people and companies.
We will also briefly review one case study in the original
equipment manufacturing (OEM) space and dissect the
problem and the solutions that assist with the proactive
identification of possible theft, diversion, or counterfeiting.
Before jumping into the case study reviews, remember that
the volume of online content is massive, and a growing share
of it is in foreign languages. The ability to extract and
machine-translate foreign language content is critical to
truly global fraud management programs.
Pharmaceutical Fraud—Setting the Stage
The large pharmaceutical companies spend significant
resources on research and development (R&D) to develop
new drugs, patent them, bring them through trials, and
eventually receive permission to sell them to the mass market.
Reporting indicates that U.S. pharmaceutical companies are
losing billions of dollars each year to counterfeiting, IP
theft, and diversion of products. The counterfeit drug
industry alone is estimated to cost pharmaceutical providers
$200 billion per year.
Online Pharmacies
Many of us have read about the rapid growth of OLPs
across the world, with Canada a particular source of
concern to the United States in this regard.
It might surprise you to know that there are currently at
least 50,000 OLPs across the world, with many new
domains registered each day for new OLPs. The problem
with traditional online searching is that it would be
impossible to find all OLPs, let alone effectively harvest
and compare their online content to determine if it is
fraudulent or connected to the other 49,999 OLPs you are
looking at. Developing usable OSINT from this
methodology is difficult, if not impossible. (Try
bookmarking all OLPs and then reviewing each one every day
to see what content has changed since the last time you
looked.)
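By contrast, an automated harvester can make this daily
comparison mechanical. The Python sketch below is one
possible approach, assuming page content has already been
collected: it stores a hash of each page and reports which
sites are new, changed, unchanged, or missing on the next
pass. The function names and the sample domains are
hypothetical.

```python
import hashlib


def fingerprint(content):
    """Hash page text after collapsing whitespace and case differences."""
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, current_pages):
    """Compare today's pages with stored fingerprints.

    previous: dict of url -> fingerprint from the last harvest
    current_pages: dict of url -> page text from today's harvest
    """
    report = {"new": [], "changed": [], "unchanged": [], "missing": []}
    for url, content in current_pages.items():
        digest = fingerprint(content)
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != digest:
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    report["missing"] = [url for url in previous if url not in current_pages]
    return report


# Hypothetical harvest results, invented for illustration.
yesterday = {
    "a.test": fingerprint("Cheap meds   here"),
    "b.test": fingerprint("Contact us"),
    "c.test": fingerprint("Closing sale"),
}
today = {
    "a.test": "cheap meds here",
    "b.test": "Contact us at our new number",
    "d.test": "Brand new store",
}

report = detect_changes(yesterday, today)
print(report)
```

Hashing only tells us that a page changed, not what changed;
storing the curated text alongside the hash would allow a
line-by-line comparison where it matters.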
Running link analysis on all OLPs harvested is now a
possibility. Once the data from the OLPs has been curated,
the semi-structured data can be searched for common
attributes, such as phone numbers, emails, payment systems,
domain addresses (ranges), registrants, and so on. The goal
is to identify the hubs that control multiple domains so
that law enforcement can take down one company and, with
it, hundreds or thousands of fraudulent OLPs.
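The link analysis described above can be sketched as a
connected-components problem: two domains are linked if they
share any attribute value, and each cluster of linked domains
is a candidate hub. The Python example below uses a simple
union-find structure. The field names, the sample domains,
and the choice of union-find are assumptions for
illustration, not the method used by any specific product.

```python
from collections import defaultdict


def find_hubs(sites, fields=("phone", "email", "registrant")):
    """Cluster domains that share at least one attribute value."""
    parent = {site["domain"]: site["domain"] for site in sites}

    def find(node):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> first domain observed with that value
    for site in sites:
        for field in fields:
            value = site.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], site["domain"])
            else:
                first_seen[key] = site["domain"]

    clusters = defaultdict(list)
    for domain in parent:
        clusters[find(domain)].append(domain)
    linked = [sorted(members) for members in clusters.values() if len(members) > 1]
    return sorted(linked, key=len, reverse=True)


# Hypothetical curated OLP records, invented for illustration.
sites = [
    {"domain": "meds-a.test", "phone": "15550100101", "email": "sales@hub.test", "registrant": "R1"},
    {"domain": "meds-b.test", "phone": "15550100102", "email": "sales@hub.test", "registrant": "R2"},
    {"domain": "meds-c.test", "phone": "15550100102", "email": "info@other.test", "registrant": "R3"},
    {"domain": "solo.test", "phone": "15550100199", "email": "hello@solo.test", "registrant": "R4"},
]

hubs = find_hubs(sites)
print(hubs)
```

Note that meds-a.test and meds-c.test share nothing directly;
they end up in the same cluster only because each is linked
to meds-b.test. That transitive linking is what exposes a hub
hiding behind many storefronts.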
OEM Trade Board Reviews
A similar problem is the theft of IP and the counterfeiting
and diversion of OEM products in the marketplace.
Again, as with pharmaceutical companies, OEMs spend
significant resources on developing cutting-edge products
and protecting them with patents, hoping to capture market
share based on quick time to market and increasing the
hurdles for competitors to enter with a similar product.
New product launches or pre-production technologies are
often met with diverted or counterfeit products that erode
pricing models and profits and cause quality and
reputational problems when counterfeit products circulating
in the market are attributed to the OEM. Recent media
reporting on Hewlett-Packard serves as a case in point.
Harvesting technology allows us to continuously harvest
listings for products at risk of counterfeiting or theft by
reviewing the prevalence of certain makes and models offered
for sale online. Prolific sellers in proximity to
manufacturing sites are often indicators of product theft and
diversion. The ability to identify such products for sale,
within proximity to facilities or key markets, enables
fraud investigators and loss prevention management to get
in front of these actions in near real time, avoiding
potentially months of loss and identifying the internal
weaknesses that permit the theft and diversion of products.
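The proximity indicator described above can be sketched in
Python as follows: compute the great-circle distance from
each online listing to a manufacturing site and flag sellers
with several listings inside a chosen radius. The 50 km
radius, the three-listing threshold, the coordinates, and the
seller names are all assumptions for illustration; real
thresholds would be tuned to the product and the market.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flag_sellers(listings, plant, radius_km=50.0, min_listings=3):
    """Return sellers with at least min_listings listings within radius_km of the plant."""
    counts = Counter()
    for listing in listings:
        distance = haversine_km(listing["lat"], listing["lon"], plant["lat"], plant["lon"])
        if distance <= radius_km:
            counts[listing["seller"]] += 1
    return sorted(seller for seller, n in counts.items() if n >= min_listings)


# Hypothetical plant location and harvested listings, invented for illustration.
plant = {"lat": 43.45, "lon": -79.68}
listings = (
    [{"seller": "seller_x", "lat": 43.50, "lon": -79.70}] * 3
    + [{"seller": "seller_y", "lat": 49.28, "lon": -123.12}] * 3
    + [{"seller": "seller_z", "lat": 43.46, "lon": -79.69}]
)

flagged = flag_sellers(listings, plant)
print(flagged)
```

A flagged seller is a lead for further review, not proof of
theft; legitimate resellers also cluster near manufacturing
sites.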