Stanford Computer Forum - Secure IoT Workshop, April 2016
forum.stanford.edu/events/2016/slides/iot/Luke.pdf

Secure IoT Workshop, April 2016


Overview


● Building a Component Library

○ Motivation

○ Approach

○ Key Insights and Results

○ Future Work

○ Summary

Problem 1: There are a lot of components

 Component Family    | Number of Entries on Parts.io
---------------------+-------------------------------
 Connectors          | 35,835,399
 Power Circuits      | 4,308,382
 Diodes              | 2,200,889
 Sensors/Transducers | 1,429,819
 Memories            | 1,319,651
 Microcontrollers    | 818,329
 Transistors         | 659,628
 Drivers/Interfaces  | 121,955
 Amplifiers          | 98,506
 Transformers        | 86,021


Problem 2: Components are complicated


DigiKey Information for Atmel SAMD21

 Characteristic             | Value
----------------------------+-------------------------------------------------
 Manufacturer Part Number   | ATSAMD21E17A-MUT
 Description                | IC MCU 32BIT 128KB FLASH 32QFN
 Core Processor / Core Size | ARM Cortex M0+ / 32-Bit
 Speed                      | 48MHz
 Connectivity               | I²C, LIN, SPI, UART/USART
 Peripherals                | Brown-out Detect/Reset, DMA, I²S, POR, PWM, WDT
 Number of I/O              | 26
 Program Memory Size / Type | 128KB (128K x 8) / FLASH
 RAM Size                   | 16K x 8
 Voltage Supply             | 1.62 V ~ 3.6 V
 Data Converters            | A/D 10x12b, D/A 1x10b
 Operating Temperature      | -40°C ~ 85°C

The datasheet for this part contains 1108 pages


● Websites lack the details necessary to actually build boards

● Requires designer to consult PDFs


Step 1: Transistors and Tables

● Datasheets are usually small (1-4 pgs)

● Variety of formats

● Relatively small schema

● Data primarily in tables

● Example: match part numbers to minimum storage temperatures


Approach

Table Extractor


 doc   | part_num | storage_temp_min
-------+----------+------------------
 X.pdf | BC546    | -55
 X.pdf | BC547    | -55
 X.pdf | BC548    | -55

Extracting Tables from PDFs

● Rule-based tools exist to extract data from PDFs

● Results are noisy and depend on the format of the PDF

● Potential signals are lost in the output

● Hard problem with active research: ICDAR, JIS, CIKM

● We made our own simple extractor for testing
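The slides do not show how the simple extractor works. As a rough illustration of what a rule-based approach can look like, the sketch below groups positioned words into table rows by their vertical position. It is not the authors' extractor: the `Word` record, the `group_into_rows` function, and the `y_tolerance` parameter are all hypothetical, and it assumes some earlier step has already produced word coordinates from the PDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical positioned word, as a PDF text layer might provide."""
    text: str
    x: float  # left edge of the word on the page
    y: float  # vertical position of the word on the page


def group_into_rows(words, y_tolerance=2.0):
    """Cluster words into rows when their y positions are close together."""
    rows = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # Compare against the most recently placed word in the last row.
        if rows and abs(rows[-1][-1].y - word.y) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order each row left to right so cells line up with the columns.
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


words = [
    Word("BC546", 10, 100.0), Word("-55", 80, 100.5),
    Word("BC547", 10, 112.0), Word("-55", 80, 111.8),
]
table = group_into_rows(words)
```

A fixed tolerance like this is exactly the kind of rule that makes results depend on the format of the PDF: a datasheet with tighter line spacing or multi-line cells would need a different value.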




Challenges in Table Extraction


DeepDive

● Framework for building machine learning systems

● DeepDive applications have achieved better-than-human accuracy

● Operates on two key components: candidates and features


Candidate Extraction

● A candidate is a (part number, minimum storage temperature) pair

● Design decisions significantly impact performance

● Generality is better

○ Match part numbers and numbers rather than part numbers and storage temps

○ More features to train on
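The "generality is better" point can be sketched as follows: rather than trying to recognize storage temperatures up front, pair every part number with every number found in a table and let the learned model decide which pairs are real. This is a minimal illustration, not DeepDive's API; the two regular expressions and the `extract_candidates` function are assumptions made for the example, and the part-number pattern only covers names shaped like BC546.

```python
import re
from itertools import product

# Hypothetical pattern for transistor-style part numbers such as BC546.
PART_RE = re.compile(r"^[A-Z]{2,3}\d{3,4}[A-Z]?$")
# Any signed integer or decimal counts as a number mention.
NUMBER_RE = re.compile(r"^[-+]?\d+(?:\.\d+)?$")


def extract_candidates(rows):
    """Pair every part-number cell with every numeric cell in a table.

    Each mention is a (text, row_index, col_index) tuple so that later
    feature extraction can use its position.
    """
    parts, numbers = [], []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            text = cell.strip()
            if PART_RE.match(text):
                parts.append((text, r, c))
            elif NUMBER_RE.match(text):
                numbers.append((text, r, c))
    return list(product(parts, numbers))


rows = [
    ["Part", "Tstg min", "Tstg max"],
    ["BC546", "-55", "150"],
    ["BC547", "-55", "150"],
]
candidates = extract_candidates(rows)
```

With two part numbers and four numeric cells this yields eight candidates, most of them wrong on purpose: the negative examples give the model more to train on.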


Feature Extraction

● Not all features are created equal

○ Alignment and sibling words

○ Number characteristics

○ Nearest part number

○ Position

● Ideally minimize computations and data needed to label candidates with features

○ Candidate features as a function of individual features
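One way to read "candidate features as a function of individual features" is to compute features once per mention and then derive each pair's features from the two cached results, instead of revisiting the document for every pair. The sketch below assumes mentions are `(text, row, col)` tuples; the function names and the specific features (a "min" header word, sign, row alignment) are illustrative assumptions, not the features used in the talk.

```python
def mention_features(mention, header_row):
    """Features of a single mention, computed once and reused across pairs."""
    text, row, col = mention
    # Sibling word: the header cell above the mention, if the column has one.
    header = header_row[col].lower() if col < len(header_row) else ""
    feats = {"row": row, "col": col, "header_has_min": "min" in header}
    try:
        value = float(text)
    except ValueError:
        return feats  # part numbers carry no numeric characteristics
    feats["is_negative"] = value < 0
    feats["is_integer"] = value.is_integer()
    return feats


def candidate_features(part_feats, num_feats):
    """Features of a (part, number) pair, derived only from mention features."""
    return {
        "same_row": part_feats["row"] == num_feats["row"],  # alignment
        "col_distance": abs(part_feats["col"] - num_feats["col"]),  # position
        "is_negative": num_feats["is_negative"],  # number characteristics
        "is_integer": num_feats["is_integer"],
        "header_has_min": num_feats["header_has_min"],  # sibling words
    }


header = ["Part", "Tstg min", "Tstg max"]
part = mention_features(("BC546", 1, 0), header)
number = mention_features(("-55", 1, 1), header)
features = candidate_features(part, number)
```

Because the number of candidates grows with the product of part numbers and numbers, keeping the per-pair step this cheap is what limits the total computation.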


Results

● When run on a set of 100 PDFs with 384 unique pairs

                | Predicted Incorrectly | Predicted Correctly
----------------+-----------------------+---------------------
 Positive Cases | 35                    | 349

90.9% of positives predicted correctly
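The 90.9% figure is the share of the 384 true pairs that the system recovered, which is usually called recall. The arithmetic can be checked directly from the two counts on the slide; the `recall` helper name is just for illustration.

```python
def recall(true_positives, false_negatives):
    """Fraction of actual positives that were predicted correctly."""
    return true_positives / (true_positives + false_negatives)


# 349 pairs predicted correctly, 35 missed, out of 384 unique pairs.
print(f"{recall(349, 35):.1%}")  # prints 90.9%
```

The slide reports only positive cases, so this says nothing about how many incorrect pairs were also predicted.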

Results: Error Analysis

● Of those 35 entries we didn’t get…

● Error analysis guides improvement of features

● Build robust systems


Future Work

● Parse data from non-table elements


Future Work

● Analyze more complex datasheets

○ Microcontrollers that contain subcomponents

■ USB, Bluetooth, etc.

○ Datasheets that explain part numbers


Future Work

● Process data from sources outside of PDFs

○ Information from distributors

■ Pricing, Popularity, Availability

○ Drivers

○ Reference schematics

○ Example application code

○ Development tools


Summary

● Building a Component Library

○ There are millions of components of varying complexity

○ Machine learning can be used to extract data from PDFs

○ Success will enable exciting applications

■ Embedded Device Generation

■ Detailed search engine for components

■ Data analytics, and more


Questions?
