EBOOK

The Essential Guide to Evaluating Traffic Data

10 areas you should be considering as you select data providers


Capitalizing on the Traffic Data Explosion

A decade ago, few companies had access to timely, accurate traffic data. Today, companies can choose from numerous, diverse data sources to power machine learning algorithms for mapping, traffic optimization, congestion analysis, smart city applications, location intelligence, and other analyses that pertain to understanding traffic behavior in a particular geographic area. These include public datasets and ones that are commercially available. Data may be generated by consumer surveys, video analytics, roadway sensors, mobile phones, telematics devices (onboard devices (OBD) or third party), or connected vehicles.

How your traffic data is collected, aggregated, structured, and refreshed determines the success of your initiative. As you evaluate different data providers and map their capabilities to your needs, you should get answers from each potential provider in 10 information categories.

THE ESSENTIAL GUIDE TO EVALUATING TRAFFIC DATA

GET STARTED RIGHT: DEFINE AND DOCUMENT YOUR NEEDS

Of course, the first step is to define the goals of your initiative and resulting data needs, since different data sources have distinct strengths and gaps. Clearly document your needs and refer to them as you explore potential data providers.

THEN, COLLECT INFORMATION FROM POTENTIAL DATA PROVIDERS

The next step is to ask the following questions in each of the 10 information categories so you can map potential providers’ capabilities to your needs.


1. Data density

If you’re trying to predict traffic in a specific geographic area, you need to have a significant proportion of vehicle traffic (10-15 percent) in that area. If you’re working on improving a different type of algorithm, you may simply need a large amount of data from any region.

WHAT TO ASK

What is the data density for the geographic areas in which I operate?

How much data should I expect every day?

2. Data consistency

Data needs to be consistent in order for you to use it in applications. If data is coming from multiple upstream sources (e.g., different vehicle manufacturers and models), this data must be normalized across sources in terms of attribute names, values, and units.

WHAT TO ASK

Is the data timestamped?

Are all timestamps in a consistent time zone and format?

Are other attributes and data values normalized? If so, how?

What do vehicle or device IDs represent?

Do the identifiers change over time? If so, does the same change logic apply to all of the data?
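To make the consistency questions concrete, here is a minimal sketch of what normalizing records from two upstream sources might involve. The source names (`oem_a`, `oem_b`), the attribute names, and the units are all hypothetical and invented for illustration; a real provider's schema will differ.

```python
from datetime import datetime, timezone

MPH_TO_KMH = 1.609344

# Hypothetical per-source schemas: names and units are assumptions for illustration.
SOURCE_SCHEMAS = {
    "oem_a": {"id": "vin_hash", "time": "ts", "speed": "speed_kmh", "speed_unit": "km/h"},
    "oem_b": {"id": "deviceId", "time": "timestamp", "speed": "spd_mph", "speed_unit": "mph"},
}


def normalize(record: dict, source: str) -> dict:
    """Map one source-specific record onto common attribute names, UTC time, and km/h."""
    schema = SOURCE_SCHEMAS[source]

    raw_time = record[schema["time"]]
    if isinstance(raw_time, (int, float)):
        # Numeric timestamps are treated as Unix epoch seconds.
        ts = datetime.fromtimestamp(raw_time, tz=timezone.utc)
    else:
        # String timestamps must be ISO 8601 with an explicit UTC offset.
        ts = datetime.fromisoformat(raw_time)
        if ts.tzinfo is None:
            raise ValueError("timestamp has no UTC offset, time zone is ambiguous")
        ts = ts.astimezone(timezone.utc)

    speed = float(record[schema["speed"]])
    if schema["speed_unit"] == "mph":
        speed *= MPH_TO_KMH

    return {
        "source": source,
        "vehicle_id": str(record[schema["id"]]),
        "timestamp_utc": ts.isoformat(),
        "speed_kmh": round(speed, 2),
    }
```

Rejecting timestamps without an offset, rather than guessing a time zone, is a deliberate choice in this sketch: it surfaces the inconsistency instead of silently baking it into downstream analysis.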


3. Data latency

Some applications, such as navigation, require very fresh data to work effectively. Others depend more on accuracy or precision than on latency or recency.

WHAT TO ASK

How close is this dataset to real time?

4. Data frequency

Frequency refers to how often data points are generated. On the one hand, high frequency creates additional cost since there is more data to handle. On the other hand, higher frequency usually provides better accuracy. For example, if you want to analyze a car accident, a frequency of one data point every five seconds won't be very helpful since an entire accident lasts about five seconds. For traffic analysis, however, one data point per five seconds will result in a lot of data that won't add any precision to traffic analysis.

WHAT TO ASK

How frequent are the data points?
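The trade-off between frequency and data volume can be illustrated with a small sketch that thins a high-frequency trace down to a coarser interval. The function name and the `(time, payload)` tuple layout are assumptions made for this example, not part of any provider's API.

```python
def downsample(points, min_interval_s):
    """Keep the first point, then only points at least min_interval_s after the last kept one.

    points: iterable of (t_seconds, payload) tuples, sorted by time.
    """
    kept = []
    last_t = None
    for t, payload in points:
        if last_t is None or t - last_t >= min_interval_s:
            kept.append((t, payload))
            last_t = t
    return kept


# One minute of a 1 Hz trace (payload is a placeholder value).
trace = [(t, {"speed_kmh": 50}) for t in range(61)]
coarse = downsample(trace, 30)  # suitable for coarse traffic analysis
fine = downsample(trace, 1)     # keeps everything, e.g. for incident reconstruction
```

If a provider can deliver data at the frequency you actually need, this kind of thinning (and the cost of receiving the discarded points) is avoided altogether.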


5. Historical data

Different applications use historical data for different purposes. For example, location planning may require several years’ data to reveal seasonality patterns, while route planning may only need a few months’ data to derive inferences. The goals of your application will determine how much data you need.

WHAT TO ASK

How much historical data is available for a specific location?

How far back does this data go?

Was data collected at the same rate over the entire historical time period, or have data volumes increased/decreased?

6. Data granularity

The more specific the goal of your application, the more granular data you will need.

WHAT TO ASK

How granular is the data? For example, can I get location data based on a street, postal code, province, or country?

What do vehicle or device IDs represent?


7. Data richness

Some datasets provide a bird’s eye view of traffic (e.g., a trip’s starting location, ending location, distance, and speed), and some provide much more detail about what’s happening at every point in time. Your application may or may not benefit from additional data parameters.

WHAT TO ASK

Does the dataset just include time and location, or are there other parameters, such as speed, heading, acceleration, brake usage, air bag activation, seat belt status, or door status?

Can the provider define events (e.g., hard braking) and send those instead of the raw data?

Does the dataset include points of interest? If so, for which geographies are points of interest available?

Is reverse geocoding available, or is it something you will have to do yourself?

Can you define polygons that represent points of interest?
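If a provider only sends raw data, events such as hard braking have to be derived on your side. The sketch below shows one simple way this might be done from timestamped speed samples. The 3.0 m/s² threshold, the event fields, and the input layout are assumptions chosen for illustration; they are not a standard definition of hard braking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardBrakeEvent:
    start_s: float    # time of the sample before the speed drop
    end_s: float      # time of the sample after the speed drop
    decel_ms2: float  # average deceleration between the two samples


def detect_hard_braking(samples, threshold_ms2=3.0):
    """Return an event for each pair of consecutive samples that decelerates hard.

    samples: list of (t_seconds, speed_kmh) tuples, sorted by time.
    threshold_ms2: assumed cut-off in m/s^2; tune it for your application.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        decel = (v0 - v1) / 3.6 / dt  # km/h converted to m/s, divided by seconds
        if decel >= threshold_ms2:
            events.append(HardBrakeEvent(t0, t1, round(decel, 2)))
    return events
```

Note that this only works when samples are frequent enough: at one point every five seconds, a short braking maneuver is averaged away, which ties this question back to data frequency.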


8. Data access/integration

Development resources are precious, and there is a lot of variation in how easy it is to integrate with the different traffic datasets available today. Many of these datasets were not designed to be used for more than one purpose. For example, your application logic may work better based on trips rather than individual data points.

It’s also important to consider whether you actually need all of the data that is available. For example, you may only need data points for vehicles that behave in a certain way (e.g., drove more than 100 kilometers/hour on a highway or less than 10 kilometers/hour in an urban area). Being able to specify the data you need in advance can save a lot of analyst or data science hours throughout the life of your project.

WHAT TO ASK

Is there one API or multiple APIs to access different datasets?

Are the APIs designed to deliver large data volumes?

Do the APIs support both “push” and “pull” modes?

How do the APIs support high performance and reliability for real-time use cases like traffic management?

What are the types of datasets provided: individual data points, trips, and/or events?

How do the APIs allow you to filter the data you are receiving so you only receive the data you need?

Are you able to specify some attributes as mandatory and only receive data/pay for data when these attributes are included?

How easy would it be to integrate the dataset with other datasets you own? For example, how easy would it be to enhance the data with weather condition data from another dataset?
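The filtering example above (more than 100 kilometers/hour on a highway, or less than 10 kilometers/hour in an urban area) can be sketched as a simple predicate, combined with a check for mandatory attributes. The attribute names and the `road_type` values are hypothetical; ideally the provider's API applies an equivalent filter before the data is delivered.

```python
# Hypothetical attribute names; a point missing any of these is dropped.
MANDATORY_ATTRIBUTES = ("vehicle_id", "timestamp", "speed_kmh", "road_type")


def is_relevant(point: dict) -> bool:
    """True if the point has all mandatory attributes and matches the behavior of interest."""
    if any(point.get(name) is None for name in MANDATORY_ATTRIBUTES):
        return False
    speed, road = point["speed_kmh"], point["road_type"]
    fast_on_highway = road == "highway" and speed > 100
    slow_in_city = road == "urban" and speed < 10
    return fast_on_highway or slow_in_city


def filter_points(points):
    """Keep only the data points the application actually needs."""
    return [p for p in points if is_relevant(p)]
```

Running this client-side still means receiving, and possibly paying for, every point. The sketch is mainly useful for writing down the filter precisely so you can ask whether the provider supports it natively.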


9. Data cleansing and noise reduction

Data cleansing can be an incredibly time-consuming exercise that does not add value to your core business. You should gain a clear understanding of how much cleansing your potential data provider will do for you, and what you’ll have to do yourself.

WHAT TO ASK

What strategies will be available to mitigate noise in the data?

If the dataset is based on mobile phones, is there a practical way to filter out irrelevant data points like people walking, cycling, or riding on trains?

For vehicle datasets, is there a way to flag and filter out invalid data points (e.g., impossible GPS coordinates)?
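As an illustration of the kind of cleansing you may otherwise have to do yourself, the sketch below flags GPS points that are out of range, sit at exactly (0, 0), or imply an impossible speed relative to the last accepted point. The flag names, the 250 km/h limit, and the `(time, lat, lon)` layout are assumptions for this example only.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_valid_coordinate(lat, lon):
    """Reject out-of-range values and the (0, 0) placeholder often emitted on GPS failure."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    return not (lat == 0 and lon == 0)


def flag_points(trace, max_speed_kmh=250.0):
    """Label each (t_seconds, lat, lon) point in a time-sorted trace.

    max_speed_kmh is an assumed plausibility limit, not a standard value.
    """
    flagged = []
    prev = None  # last point accepted as "ok"
    for point in trace:
        t, lat, lon = point
        if not is_valid_coordinate(lat, lon):
            flagged.append((point, "invalid_coordinate"))
            continue
        if prev is not None and t > prev[0]:
            hours = (t - prev[0]) / 3600
            if haversine_km(prev[1], prev[2], lat, lon) / hours > max_speed_kmh:
                flagged.append((point, "implausible_jump"))
                continue
        flagged.append((point, "ok"))
        prev = point
    return flagged
```

One limitation of this simple approach: the first in-range point is always trusted, so if it is wrong, the valid points that follow will be flagged as jumps. A production pipeline would need something more robust.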

10. Regulatory compliance

Data privacy is an important concern when it comes to traffic data, and it has the attention of governments and citizens alike. Managing regulatory compliance is an important issue. For any traffic data source, you should understand where it’s coming from and how it’s collected.

WHAT TO ASK

Is the data provider transparent about its data sources?

Is this data obtained via opted-in sources?

How does the data provider support regulatory compliance?


Ready to take the next step? Contact Otonomo for a sample of our traffic data generated by connected vehicles in 15 countries worldwide.

GET DATA SAMPLE

About Otonomo

The Otonomo Automotive Data Services Platform fuels an ecosystem of 15 OEMs and more than 100 service providers. Our neutral platform securely ingests more than 2.6 billion data points per day from over 18 million global connected vehicles, then reshapes and enriches it, to accelerate time to market for new services that delight drivers. Privacy by design is at the core of our platform, which enables GDPR and other privacy-regulation-compliant solutions using both personal and aggregate data. Use cases include emergency services, mapping, EV management, subscription-based fueling, parking, predictive maintenance, usage-based insurance, media measurement, in-vehicle package delivery, and dozens of smart city services. With an R&D center in Israel, and a presence in the United States, Europe, and Japan, Otonomo collaborated with twelve industries to transform their business with car data. More information is available at otonomo.io.

OTONOMO IN NUMBERS

>530B curated kilometers

>2.6B daily data points

>18M accessible cars

>15 OEMs engaged

@otonomo_ | otonomo.io | otonomo © 2020