

LEVERAGING OPEN SOURCE INTELLIGENCE (OSINT) TO COMBAT FRAUD

The illicit practices of diversion, theft of trade secrets, and counterfeiting of pharmaceutical drugs

have been estimated to be a $200 billion per year industry. Managing and identifying intellectual

property infringement can help leaders maintain a competitive advantage and avoid loss in

market share. This presentation will review how OSINT can be used to reduce risk exposure,

identify potential loss incidents, and assist in loss recovery efforts. As fraud investigators and

other risk management practitioners seek to harness the overwhelming body of information

available through OSINT, this session will provide proactive new solutions to use in the field.

TYSON JOHNSON, CFE, CPP

Director, Global Risk Management,

ATS Automation

Oakville, ON

Tyson Johnson is a well-travelled risk management executive. He has worked in government,

global banking, and global manufacturing. Johnson has effectively led investigations in Mexico,

Thailand, China, India, and Malaysia, as well as throughout North America and Europe. He has been

a pioneer in the field of developing Open Source Intelligence (OSINT) programs to support risk

reduction, loss prevention, and recovery for the past decade. As a former intelligence officer,

Johnson understands the need for strong information collection and analysis to support proactive

risk management. Johnson obtained his master’s degree from the Fletcher School of Law and

Diplomacy and has participated in leadership programs including the Governor

General’s Canadian Leadership Conference.

“Association of Certified Fraud Examiners,” “Certified Fraud Examiner,” “CFE,” “ACFE,” and the

ACFE Logo are trademarks owned by the Association of Certified Fraud Examiners, Inc. The contents of

this paper may not be transmitted, re-published, modified, reproduced, distributed, copied, or sold without

the prior consent of the author.


2013 ACFE Canadian Fraud Conference ©2013

Let’s start by learning a common language so we will be in

sync throughout the presentation:

Intelligence Cycle—While we all intuitively know this

cycle, it is worth reviewing for the purpose of this

presentation. The basic equation is: Information +

Analysis = Intelligence

The cycle we must go through involves these stages:

1. Collection of open source information

2. Filtering of the information collected to ensure we have relevant and reliable content for the next stage

3. Analysis of the filtered, relevant content, utilizing inductive reasoning skills to identify the “so what” from all data

4. Production of insights and inferences that are actionable and proactive in the identification of fraud, fraud avoidance, fraud recovery, or loss reduction
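
The four stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the sample items, the keyword list, and the function names (collect, filter_relevant, analyze, produce) are hypothetical and do not come from the presentation or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str  # where the item was collected (hypothetical label)
    text: str    # raw collected content


def collect():
    # Collection: hard-coded sample items standing in for harvested content.
    return [
        Item("forum", "Cheap brand-name tablets, no prescription needed"),
        Item("news", "Local weather forecast for the weekend"),
        Item("marketplace", "Bulk tablets, no prescription, ships worldwide"),
    ]


def filter_relevant(items, keywords):
    # Filtering: keep only items that mention at least one keyword.
    return [i for i in items if any(k in i.text.lower() for k in keywords)]


def analyze(items):
    # Analysis: count how many relevant items each source contributed.
    counts = {}
    for item in items:
        counts[item.source] = counts.get(item.source, 0) + 1
    return counts


def produce(counts):
    # Production: one short, reviewable line per source.
    return [f"{src}: {n} relevant item(s) to review"
            for src, n in sorted(counts.items())]


relevant = filter_relevant(collect(), ["prescription", "tablets"])
report = produce(analyze(relevant))
print(report)
```

In practice the analysis stage relies on the investigator's judgment; the counting step here only marks where that judgment would sit in the flow.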

Big Data—This refers to all content available for

research, analysis, and review. Big data consists of both

structured and unstructured data, and the goal is for

fraud investigators to make sense out of all data.

Traditional anti-fraud tools are very good at running

analysis on structured content—looking for duplicate

payees, duplicate addresses, and so on.

How do we harvest and tag unstructured content from

the Internet (any and all formats) to develop more

structured data for analysis? That is what we will cover

in this presentation today.
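
As a point of reference for the structured checks mentioned above (duplicate payees, duplicate addresses), here is a minimal sketch of a duplicate-address test on a vendor list. The vendor names, addresses, and the normalization rule are assumptions made up for illustration; real anti-fraud tools apply far richer matching.

```python
from collections import defaultdict


def normalize_address(addr):
    # Lowercase, drop punctuation, and collapse whitespace so that
    # trivially different spellings of one address compare equal.
    cleaned = "".join(ch for ch in addr.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def find_duplicate_addresses(vendors):
    # Group vendor names by normalized address; keep groups with 2+ names.
    groups = defaultdict(list)
    for name, addr in vendors:
        groups[normalize_address(addr)].append(name)
    return {addr: names for addr, names in groups.items() if len(names) > 1}


# Hypothetical structured data (a vendor master list).
vendors = [
    ("Acme Supply", "12 Main St., Suite 4"),
    ("A.C.M.E. Supplies", "12 main st suite 4"),
    ("Northern Tools", "90 Lake Road"),
]
duplicates = find_duplicate_addresses(vendors)
print(duplicates)
```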

Deep Web—When a typical search engine is used for

searching a term, the engine returns search results based

on popularity, page ranking, advertisements, and

ultimately all pages that the bots were able to capture

from crawling the indexed surface Web. Any content not

indexed (i.e., content that a bot cannot reach or crawl) is

not identified or flagged for you, the

investigator.

The content visible on the surface Web is like the part

of an iceberg that we can see above the water line.

The Deep Web is an order of magnitude

larger than the surface Web and the content remains

largely unknown or invisible to typical search engines.

Open Source Intelligence (OSINT)—This refers to an

investigator being able to create actionable insights and

inferences from the wealth of related content that exists

in the Deep Web. OSINT is the production component

of the intelligence cycle.

Force Multiplier—The ability to harvest content from

the Deep Web in a manner that is largely automated and

filtered can and does reduce the fraud investigator’s

burden for collecting and filtering online, open source

information. Reducing the time spent collecting and

filtering the massive amounts of available online

content allows the investigator to focus more time on

analysis of filtered results and identifying fraud, and

working toward loss reduction and recovery efforts.

The term force multiplier also refers to the ability to

harvest content on a scale that is otherwise not possible

for an investigator or group of investigators (see

WebMD slides).

Normalizing unstructured content to semi-structured to

enable analytics—The concept of unstructured content

was introduced earlier. Most fraud investigators are

familiar with the analytics that can be run on structured



data (employee lists, vendor lists, AP/AR lists, etc.).

When we harvest from the Deep Web, we are receiving

data in many formats and perhaps many languages. For

an investigator to be able to run analysis on the data, it

must be tagged, tuned, normalized, and enriched so that

it is usable, that is, semi-structured. This is referred to as

data curation. Technology exists that can indeed curate

all data collected into a format (content silo) that is

searchable and ready for analysis.
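
One small piece of this curation step can be sketched as follows: pulling contact details out of raw page text and storing them as a semi-structured record. The regular expressions, the curate function, and the sample page are assumptions for illustration only; they are deliberately simple and would miss many real-world formats and languages.

```python
import re

# Simplified patterns (assumptions): real pages need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def curate(url, raw_text):
    # Tag: find emails and phone numbers in the unstructured text.
    # Normalize: lowercase emails, reduce phone numbers to digits only.
    emails = sorted({e.lower() for e in EMAIL_RE.findall(raw_text)})
    phones = sorted({re.sub(r"\D", "", p) for p in PHONE_RE.findall(raw_text)})
    return {"url": url, "emails": emails, "phones": phones}


# Hypothetical harvested page text.
page = "Order now! Call +1 (555) 010-0199 or write to Sales@Example.com today."
record = curate("http://example.com", page)
print(record)
```

Records in this shape can be compared field by field, which is what makes the later analysis possible.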

The Goal: Creating New Intelligence to Identify Fraud,

Identify Perpetrators, Aid in Fraud Recovery and Loss

Reduction

In the slides that follow, we will review case studies in the

Big Pharma sector that focus on IP theft and fraudulent,

counterfeit, and diverted product, all sold and transacted

online. The size and scale of the problems facing Big

Pharma from the global online pharmacies (OLPs) and the

difficulty in rooting out the OLPs that are fraudulent and

harmful to people and companies will be explored.

We will also briefly review one case study in the original

equipment manufacturing (OEM) space and dissect the

problem and solutions to assist with the proactive

identification of possible theft, diversion, or counterfeiting.

Before jumping into the case study reviews, remember that

online content is massive, and a growing share of it is in

foreign languages. The ability to extract and

translate (machine translation) foreign language content is

critical to truly global fraud management programs.

Pharmaceutical Fraud—Setting the Stage

The large pharmaceutical companies spend significant

resources on research and development (R&D) to develop



new drugs, patent them, bring them through trials, and eventually

receive permission to sell them to the mass market.

Reporting indicates that U.S. pharmaceutical companies are

losing billions of dollars each year to counterfeiting, IP

theft, and diversion of products. The counterfeit drug

industry alone is estimated to cost pharmaceutical providers

$200 billion per year.

Online Pharmacies

Many of us have read about the prolific growth of OLPs

across the world, with Canada presenting a major

concern to the United States in this regard.

It might surprise you to know that there are currently at

least 50,000 OLPs across the world, with many new

domains registered each day for new OLPs. The problem

with traditional online searching is that it would be

impossible to find all OLPs, let alone effectively harvest

and compare their online content to determine if it is

fraudulent or connected to the other 49,999 OLPs you are

looking at. Developing usable OSINT with this manual

methodology is difficult, if not impossible. (Try

bookmarking all OLPs and then reviewing each one daily to see

what content has changed since the last time you

looked.)
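
To show why automation helps with the daily review described above, here is a minimal sketch of change detection between two snapshots of harvested pages. The snapshot dictionaries (domain mapped to page text), the domain names, and the fingerprint function are hypothetical; the example assumes page content has already been collected by some other means.

```python
import hashlib


def fingerprint(text):
    # Hash the page text after collapsing whitespace and case, so that
    # cosmetic spacing differences do not count as a change.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    # Compare two {domain: page_text} snapshots taken on different days.
    new = sorted(d for d in current if d not in previous)
    removed = sorted(d for d in previous if d not in current)
    changed = sorted(
        d for d in current
        if d in previous and fingerprint(current[d]) != fingerprint(previous[d])
    )
    return {"new": new, "removed": removed, "changed": changed}


# Hypothetical snapshots for illustration only.
yesterday = {"olp-a.example": "Pills  $10", "olp-b.example": "Contact us"}
today = {"olp-a.example": "Pills $8", "olp-b.example": "contact us",
         "olp-c.example": "Grand opening"}
changes = diff_snapshots(yesterday, today)
print(changes)
```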

Running link analysis on all OLPs harvested is now a

possibility. The data from each OLP is curated so that

the semi-structured data can be searched for common attributes, such

as phone numbers, emails, payment systems, domain

addresses (ranges), registrants, and so on. The goal is to

identify the hubs that control multiple domains so that, by

taking down one company, law enforcement can shut down hundreds

or thousands of fraudulent OLPs.
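
The link analysis idea can be sketched as grouping domains that share any attribute (phone, email, or registrant) into clusters. This is an illustrative sketch, not the method used in the case study: the record fields, the find_hubs function, and the sample domains are all hypothetical, and the grouping uses a standard union-find structure.

```python
from collections import defaultdict


def find_hubs(records):
    # Union-find over domains: two domains join the same cluster
    # whenever they share a phone, email, or registrant value.
    parent = {r["domain"]: r["domain"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (attribute name, value) -> first domain that used it
    for r in records:
        for key in ("phone", "email", "registrant"):
            value = r.get(key)
            if not value:
                continue
            attr = (key, value)
            if attr in first_seen:
                union(first_seen[attr], r["domain"])
            else:
                first_seen[attr] = r["domain"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["domain"])].append(r["domain"])
    # Report only clusters with more than one domain, largest first.
    multi = [sorted(c) for c in clusters.values() if len(c) > 1]
    return sorted(multi, key=len, reverse=True)


# Hypothetical curated records for illustration only.
records = [
    {"domain": "pills-a.example", "phone": "15550100", "email": "a@x.example"},
    {"domain": "pills-b.example", "phone": "15550100", "email": "b@x.example"},
    {"domain": "pills-c.example", "email": "b@x.example", "registrant": "R1"},
    {"domain": "legit.example", "phone": "15559999", "registrant": "R2"},
]
hubs = find_hubs(records)
print(hubs)
```

Note that pills-a and pills-c share nothing directly; they land in the same cluster only through pills-b, which is the kind of indirect connection that is hard to spot by hand.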



OEM Trade Board Reviews

A similar problem is the theft of IP and the counterfeiting and

diversion of products from OEMs in the marketplace.

Again, as with pharmaceutical companies, OEMs spend

significant resources on developing cutting-edge products

and protecting them with patents, hoping to capture market

share based on quick time to market and increasing the

hurdles for competitors to enter with a similar product.

New product launches or pre-production technologies are

often met with diverted or counterfeit products that erode

pricing models and profits and cause quality and

reputational problems when counterfeit products circulate in

the market as if they came from the OEM. Recent media

reporting on Hewlett-Packard serves as a case in point.

Harvesting technology allows us to continuously identify

products at risk of counterfeiting or theft by reviewing the

prevalence of certain makes and models of products for

sale online. Prolific sellers that are in proximity to

manufacturing sites are often indicators of product theft and

diversion. The ability to identify such products for sale,

within proximity to facilities or key markets, will enable

fraud investigators and loss prevention management to get

in front of these actions in near real time,

potentially avoiding months of loss and identifying the internal

weaknesses that permit the theft and diversion of products.
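
The proximity indicator described above can be sketched as a distance filter over harvested listings. Everything specific here is an assumption for illustration: the coordinates, seller names, the 50 km radius, and the three-listing threshold are invented, and the sketch assumes each listing has already been geolocated.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, using a mean
    # Earth radius of 6371 km.
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def flag_sellers(listings, site, radius_km=50, min_listings=3):
    # Count each seller's listings that fall within radius_km of the
    # site and flag sellers at or above the min_listings threshold.
    counts = {}
    for seller, lat, lon in listings:
        if haversine_km(lat, lon, site[0], site[1]) <= radius_km:
            counts[seller] = counts.get(seller, 0) + 1
    return sorted(s for s, n in counts.items() if n >= min_listings)


# Hypothetical manufacturing site and listings: (seller, lat, lon).
site = (10.0, 10.0)
listings = (
    [("seller_a", 10.1, 10.1)] * 3      # prolific and nearby
    + [("seller_b", 10.05, 10.0)]       # nearby, but only one listing
    + [("seller_c", 20.0, 20.0)] * 5    # prolific, but far away
)
flagged = flag_sellers(listings, site)
print(flagged)
```

A flagged seller is only a lead for the investigator to follow up, not evidence of theft or diversion on its own.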