Secure IoT Workshop, April 2016
Overview
● Building a Component Library
○ Motivation
○ Approach
○ Key Insights and Results
○ Future Work
○ Summary
Problem 1: There are a lot of components
Component Family    | Number of Entries on Parts.io
--------------------+------------------------------
Connectors          | 35,835,399
Power Circuits      | 4,308,382
Diodes              | 2,200,889
Sensors/Transducers | 1,429,819
Memories            | 1,319,651
Microcontrollers    | 818,329
Transistors         | 659,628
Drivers/Interfaces  | 121,955
Amplifiers          | 98,506
Transformers        | 86,021
Problem 2: Components are complicated
DigiKey Information for Atmel SAMD21
Characteristic             | Value
---------------------------+-------------------------------------------------
Manufacturer Part Number   | ATSAMD21E17A-MUT
Description                | IC MCU 32BIT 128KB FLASH 32QFN
Core Processor / Core Size | ARM Cortex M0+ / 32-Bit
Speed                      | 48MHz
Connectivity               | I²C, LIN, SPI, UART/USART
Peripherals                | Brown-out Detect/Reset, DMA, I²S, POR, PWM, WDT
Number of I/O              | 26
Program Memory Size / Type | 128KB (128K x 8) / FLASH
RAM Size                   | 16K x 8
Voltage Supply             | 1.62 V ~ 3.6 V
Data Converters            | A/D 10x12b, D/A 1x10b
Operating Temperature      | -40°C ~ 85°C
The datasheet for this part contains 1108 pages
Problem 2: Components are complicated
● Websites lack the details necessary to actually build boards
● Requires designer to consult PDFs
Step 1: Transistors and Tables
● Datasheets are usually small (1-4 pgs)
● Variety of formats
● Relatively small schema
● Data primarily in tables
● Example: match part numbers to minimum storage temperatures
Approach
Table Extractor
doc   | part_num | storage_temp_min
------+----------+-----------------
X.pdf | BC546    | -55
X.pdf | BC547    | -55
X.pdf | BC548    | -55
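The target relation above can be pictured as a list of typed records. The sketch below is only an illustration: the class name `StorageTempRow` is hypothetical, and the unit of `storage_temp_min` is assumed to be degrees Celsius, which the slide does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageTempRow:
    """One row of the extracted relation (hypothetical record type)."""
    doc: str               # source PDF the pair was found in
    part_num: str          # part number as printed in the datasheet
    storage_temp_min: int  # minimum storage temperature (assumed to be in °C)

# The three rows shown on the slide.
rows = [
    StorageTempRow("X.pdf", "BC546", -55),
    StorageTempRow("X.pdf", "BC547", -55),
    StorageTempRow("X.pdf", "BC548", -55),
]

# Look up the minimum storage temperature by part number.
by_part = {r.part_num: r.storage_temp_min for r in rows}
print(by_part["BC547"])
```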
Extracting Tables from PDFs
● Rule-based tools exist to extract data from PDFs
● Results are noisy and depend on the format of the PDF
● Potential signals are lost in the output
● Hard problem with active research: ICDAR, JIS, CIKM
● We made our own simple extractor for testing
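To make the "noisy results" point concrete, here is a minimal rule-based sketch. It is not the extractor used in this work; it assumes the PDF text has already been dumped with whitespace-aligned columns, and the sample table is made up for illustration.

```python
import re

def split_rows(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows

# Hypothetical text dump of a datasheet table; the last row has an empty Min cell.
sample = """\
Parameter                  Symbol   Min    Max   Unit
Storage temperature        Tstg     -55    150   C
Junction temperature       Tj              150   C
"""

rows = split_rows(sample)
for row in rows:
    print(row)
```

The empty Min cell in the last row disappears, so `150` slides into the Min position. Column alignment, one of the signals a learner could use, is lost in this kind of output.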
Challenges in Table Extraction
DeepDive
● Framework for building machine learning systems
● DeepDive applications have achieved better-than-human accuracy
● Operates on two key components: candidates and features
Candidate Extraction
● Candidate is part number and minimum storage temperature pair
● Design decisions significantly impact performance
● Generality is better
○ Match part numbers and numbers rather than part numbers and storage temps
○ More features to train on
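The general pairing strategy can be sketched as follows. This is a plain-Python illustration and does not use DeepDive's API; the `Cell` type, both regular expressions, and the sample table are assumptions made for the example.

```python
import re
from itertools import product
from typing import NamedTuple

class Cell(NamedTuple):
    """A table cell with its grid position (hypothetical representation)."""
    row: int
    col: int
    text: str

# Assumed patterns: a BC546-style part number, and any signed number.
PART_RE = re.compile(r"^[A-Z]{2}\d{3}[A-Z]?$")
NUMBER_RE = re.compile(r"^[+-]?\d+(\.\d+)?$")

def extract_candidates(cells):
    """Pair every part number with every number, not only storage temperatures."""
    parts = [c for c in cells if PART_RE.match(c.text)]
    numbers = [c for c in cells if NUMBER_RE.match(c.text)]
    return list(product(parts, numbers))

# Made-up table: one row per part, with min and max storage temperature columns.
cells = [
    Cell(0, 0, "Type"), Cell(0, 1, "Tstg min"), Cell(0, 2, "Tstg max"),
    Cell(1, 0, "BC546"), Cell(1, 1, "-55"), Cell(1, 2, "150"),
    Cell(2, 0, "BC547"), Cell(2, 1, "-55"), Cell(2, 2, "150"),
]

candidates = extract_candidates(cells)
print(len(candidates))
```

Most of these pairs are wrong matches. Under this reading, the candidate stage stays simple and general, and the job of telling correct pairs from incorrect ones is left to the features and the learned model.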
Feature Extraction
● Not all features are created equal
○ Alignment and sibling words
○ Number characteristics
○ Nearest part number
○ Position
● Ideally minimize computations and data needed to label candidates with features
○ Candidate features as a function of individual features
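One possible reading of "candidate features as a function of individual features" is to compute features once per number and reuse them for every candidate containing that number. The sketch below follows that reading; the `Cell` type, the feature names, and the sample table are all hypothetical and are not taken from the original system.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    row: int
    col: int
    text: str

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def number_features(num, cells):
    """Features of a single number: its characteristics, sibling words, position."""
    value = float(num.text)
    siblings = [c for c in cells
                if c != num and (c.row == num.row or c.col == num.col)
                and not is_number(c.text)]
    return {
        "is_negative": value < 0,
        "is_integer": value.is_integer(),
        "sibling_words": sorted({w.lower() for c in siblings for w in c.text.split()}),
        "position": (num.row, num.col),
    }

def candidate_features(part, num, parts, per_number):
    """Combine cached per-number features with cheap pairwise features."""
    nearest = min(parts, key=lambda p: abs(p.row - num.row) + abs(p.col - num.col))
    feats = dict(per_number[num])           # reuse; no recomputation per candidate
    feats["same_row"] = part.row == num.row  # alignment
    feats["part_is_nearest"] = nearest == part
    return feats

# Made-up table: one row per part, with min and max storage temperature columns.
cells = [
    Cell(0, 0, "Type"), Cell(0, 1, "Tstg min"), Cell(0, 2, "Tstg max"),
    Cell(1, 0, "BC546"), Cell(1, 1, "-55"), Cell(1, 2, "150"),
    Cell(2, 0, "BC547"), Cell(2, 1, "-55"), Cell(2, 2, "150"),
]
parts = [c for c in cells if c.text.startswith("BC")]
numbers = [c for c in cells if is_number(c.text)]
per_number = {n: number_features(n, cells) for n in numbers}

feats = candidate_features(parts[0], numbers[0], parts, per_number)
print(feats)
```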
Results
● When run on a set of 100 PDFs with 384 unique pairs
               | Predicted Incorrectly | Predicted Correctly
---------------+-----------------------+--------------------
Positive Cases | 35                    | 349
90.9% of positives predicted correctly
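The reported percentage follows directly from the two counts. A short check, with variable names chosen only for this example:

```python
predicted_correctly = 349
predicted_incorrectly = 35

# All positive cases: the 384 unique pairs in the 100-PDF test set.
total_positives = predicted_correctly + predicted_incorrectly

# Fraction of positives predicted correctly.
fraction_correct = predicted_correctly / total_positives
print(f"{total_positives} pairs, {fraction_correct:.1%} predicted correctly")
```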
Results: Error Analysis
● Of those 35 entries we didn’t get…
● Error analysis guides improvement of features
● Build robust systems
Future Work
● Parse data from non-table elements
Future Work
● Analyze more complex datasheets
○ Microcontrollers that contain subcomponents
■ USB, Bluetooth, etc.
○ Datasheets that explain part numbers
Future Work
● Process data from sources outside of PDFs
○ Information from distributors
■ Pricing, Popularity, Availability
○ Drivers
○ Reference schematics
○ Example application code
○ Development tools
Summary
● Building a Component Library
○ There are millions of components of varying complexity
○ Machine learning can be used to extract data from PDFs
○ Success will enable exciting applications
■ Embedded Device Generation
■ Detailed search engine for components
■ Data analytics, and more
Questions?