8/8/2019 12-SOC and Embedded System
Research Interests
Chien-Chao Tseng
Wireless Access to Internet
GPRS/3G/PHS/WiMax/...
Internet WLAN/Bluetooth/PAN/...
Ad hoc
Mesh
Mobile NW
3G/GPRS/PHS/WiMax/WLAN/Bluetooth/PAN
Heterogeneous Wireless Overlay Networks
Multi-interface Handheld Devices
Roaming and Handoffs
My Interests :-)
Embedded WLAN, PHS, 3G/GPRS
Embedded OS for Multi-interface Handheld Devices
Linux/Windows XP/CE
Cross-layer design for Real-time Applications
Driver, Network, and Application Layers (VoIP)
Heterogeneous Wireless Networks
WLAN/WiMax/3G/GPRS/PHS
Roaming and Handovers
Multi-tier Wireless Network
WPAN, WLAN and Mobile Router
Roaming and Routing
Wireless Mesh and Sensor Networks
Address Assignment and Routing
Secured and Fast Accesses to Wireless Network
Embedded Systems R&D Results
Low Power and Fast Handover
Cellular/WLAN Dual Mode Mobiles
Awarded by
2005 Mobile Communications Contest of Industrial Development Bureau, MOEA
2005 Software Contest of National Center of High-Performance Computing
2006 Embedded Software Contest of MOE
Power Consumption Evaluation
Handover Latencies Evaluation
System Architecture and Prototype of Cellular/WLAN Dual Mode Mobile
Prof. Li-Pin Chang
Recent research directions
Embedded storage systems
Real-time systems and scheduling algorithms
Hardware-software co-design
Embedded Storage: Efficient wear-leveling algorithm for flash memory
To capture uneven usage across millions of blocks and to level it
Result: the fastest, most effective, and most economical approach available!
[Figure: access pattern (LBA vs. time) and block usage (erase cycles vs. block #); annotation: worn-out quickly!]
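To make the idea of leveling uneven block usage concrete, here is a minimal Python sketch of a toy flash translation layer. It is not the algorithm from this slide, whose details are not given; it is a generic textbook-style scheme, and the class name `WearLevelingFTL`, the `threshold` parameter, and the one-LBA-per-block simplification are all assumptions made for illustration. Writes go to the least-worn free block (dynamic leveling), and cold data is relocated onto the most-worn free block whenever the wear gap exceeds the threshold (static leveling).

```python
class WearLevelingFTL:
    """Toy flash translation layer (hypothetical): one logical address per block."""

    def __init__(self, num_blocks, threshold=4):
        if threshold < 0:
            raise ValueError("threshold must be non-negative")
        self.erase_count = [0] * num_blocks   # erase cycles per physical block
        self.mapping = {}                     # logical block address -> physical block
        self.free = set(range(num_blocks))    # erased blocks ready for writing
        self.threshold = threshold            # tolerated wear gap before relocating cold data

    def write(self, lba):
        """Out-of-place update: write to the least-worn free block, then erase the old one."""
        if not self.free:
            raise RuntimeError("no free block left")
        new = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(new)
        old = self.mapping.get(lba)
        self.mapping[lba] = new
        if old is not None:
            self._erase(old)
        self._level()

    def _erase(self, block):
        self.erase_count[block] += 1
        self.free.add(block)

    def _level(self):
        """Static leveling: park cold data on worn blocks so young blocks rejoin the free pool."""
        while self.free and self.mapping:
            worn = max(self.free, key=lambda b: (self.erase_count[b], b))
            cold_lba = min(self.mapping,
                           key=lambda l: (self.erase_count[self.mapping[l]], l))
            cold = self.mapping[cold_lba]
            if self.erase_count[worn] - self.erase_count[cold] <= self.threshold:
                break
            self.free.remove(worn)
            self.mapping[cold_lba] = worn     # copy the cold data onto the worn block
            self._erase(cold)                 # the young block becomes available again

    def spread(self):
        """Difference between the most and least erased blocks."""
        return max(self.erase_count) - min(self.erase_count)
```

A skewed workload, such as writing eight addresses once and then rewriting address 0 thousands of times, can be replayed against this class and inspected with `spread()`. After every write, the most-worn free block is at most `threshold` erase cycles ahead of the least-worn block that holds data.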
Real-Time Systems: Overload Management for Real-Time Object Tracking
Firm-real-time: (c,4) -> ((4,7),c,4)
Proportional Adjustment: (c,4) -> (c,7)
[Figure: average RMS error under each approach]
Inter-arrival time of frames : 4ms. Workload-scaling factor: 4/7 (57%)
[Figure: example task timelines. FE: ((4,7),2,4) and (1,4), with two dropped jobs; PP: (2,7) and (1,4)]
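The two overload strategies above can be compared with a small Python sketch. It assumes (the slide does not spell this out) that (c,4) denotes a task with execution time c and period 4 ms, and that ((4,7),c,4) adds a (4,7)-firm constraint, meaning 4 out of every 7 jobs are processed and the rest are dropped. The evenly spaced skip pattern used here is one common choice from the (m,k)-firm scheduling literature, not necessarily the one used in this work, and all function names are hypothetical.

```python
from fractions import Fraction


def is_mandatory(job_index, m, k):
    """Evenly distributed (m,k) pattern: True if this job must be processed."""
    j = job_index % k
    return (((j * m + k - 1) // k) * k) // m == j


def skip_pattern(m, k):
    """One period of the pattern: k flags, exactly m of them True."""
    return [is_mandatory(j, m, k) for j in range(k)]


def firm_utilization(c, period, m, k):
    """Firm real-time: keep the period, process only m of every k jobs."""
    return Fraction(c, period) * Fraction(m, k)


def proportional_utilization(c, period, m, k):
    """Proportional adjustment: stretch the period by k/m and process every job."""
    stretched_period = Fraction(period * k, m)
    return Fraction(c) / stretched_period
```

Under these assumptions both strategies scale the workload by the same factor 4/7: for c = 2, `firm_utilization(2, 4, 4, 7)` and `proportional_utilization(2, 4, 4, 7)` both give 2/7. The difference lies in which frames are processed and when, which is what the RMS error comparison on the slide evaluates.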
Hardware-Software Co-design: Reconfigurable computing for overload management
Past achievement:
Overload management for event-driven real-time embedded systems
Work in progress:
To deal with transient workload bursts with hardware acceleration
Move critical tasks onto FPGA
Computing resource reclamation
On-line floor planning
On-line topology reconfiguration for network-on-chip (NoC)
Embedded Systems: Research Directions
Low-power embedded systems
Video compression/decompression
Plan in the near future
Low-power AVC/H.264 video CODEC algorithm and system design
Multimedia Embedded Systems Lab: Research Directions
SoC Design for Advanced Video Codecs
DVB/MHP middleware & Java Runtime
Java Processor for DVB/MHP
Flexible Multimedia Codec SoC Platforms
OS Kernel Scheduler for Tightly-coupled Heterogeneous Multi-core Platforms
Multimedia Embedded Systems Lab: R&D Results
H.264 Codec Accelerators on ARM Integrator
Java Processor Accelerating Technologies on Spartan 3 and ML-310 Platforms (based on the open source JOP project)
Video Rate Control for HW/SW Co-designed SoCs (patent application)
Tightly-coupled H.264 encoder on TI-OMAP 5912
Tightly-coupled kernel scheduler module for ARM-Linux on TI-OMAP 5912
Architecture and Systems: Research Directions
Embedded processor and SoC
Java processor, JIT compilation & VM
DSP designs and compilation
Low-power systems
Graphic processor
Superscalar ARM processor
Reconfigurable computing
Architecture and Systems: R&D Results
ARM9-compatible processor with video/audio capabilities
Java stack operations folding
Memory Constrained Java Just-in-time Compiler
DSP instruction set extensions
Low-power Branch-Target-Buffer
Low-power bus encodings
Low-power cache memory
Graphic processor design techniques
Superscalar ARM
Reconfigurable computing
DSP Instruction Set Extensions
Current research topics
Multiple-issue architecture
Exploring ISE in a multiple-issue architecture, such as superscalar or Very Long Instruction Word (VLIW)
Hardware reusability
Reuse the same or similar hardware resources in different ASFUs while keeping the same performance
Overcome register file read/write port constraint
Try to schedule the inputs and outputs of an ASFU at different time slots
Low-power Bus Encodings
T0 + Discontinuous Address Table
BIBITS with Register Relabeling
T0_BI_1, Variable-Stride, SRWEC
Leading-bytes encoding
I/D Selector, T0 DAT + Stride-Table
I/D Selector, BIBITS_RR + Leading-bytes
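Several of the scheme names above appear to build on bus-invert (BI) coding, a classic low-power bus encoding. The Python sketch below shows only the textbook bus-invert idea, not any of the combined schemes listed on this slide, whose details are not given here. If more than half of the data lines would toggle, the sender drives the complemented word and raises an extra invert line. The function names and the 8-bit default width are assumptions.

```python
def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_flag) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    bus = 0                       # value currently held on the data lines
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(word ^ bus).count("1")
        if toggles > width // 2:
            bus, invert = (~word) & mask, 1   # send the complement instead
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~value) & mask if invert else value for value, invert in encoded]


def data_line_toggles(encoded, width=8):
    """Number of data lines that switch at each step, starting from an all-zero bus."""
    bus, toggles = 0, []
    for value, _ in encoded:
        toggles.append(bin(value ^ bus).count("1"))
        bus = value
    return toggles
```

With this rule no step toggles more than half of the data lines, at the cost of one extra wire. Address-bus schemes such as T0 exploit a different property (mostly sequential addresses), which is presumably why the slide combines several techniques.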
Low-power Cache Memory
Loop Buffer: loop code, loop buffer
Power Manager
[Figure: CPU with a loop buffer and a power manager switching between normal and low-power modes; labels: 50%, 70%, 30%, low-power accesses, normal accesses]
Graphic Processor
[Figure: rendering pipeline with numbered stages 1 to 6, including Pixel Shader, Texture, and Depth Processing]
1. A dynamically reconfigurable graphics hardware for resource reallocatable rendering pipeline
2. A Reconfigurable Texture Mapping Architecture
3. Implementation of texture Compression by GPU Driver
4. Register Renaming for Pixel Shaders data/value management
5. Instruction scheduling mechanism for 3D GPU pixel shader
6. An Efficient Texture Memory System Design
7. Alpha Blending without Z Sort
Superscalar ARM
Goal: a superscalar embedded processor featuring
800MHz clock rate @ 0.13um
1.8 DMIPS/MHz superscalar performance under tough pipeline latency
800K gate count cost-effective design
Directions and achievements
Micro-architecture
A 12-stage dual-issue superscalar processor with good instruction fetch rate, issue rate, and efficient forwarding
Simulator
A cycle-accurate simulator modeling more details than the well-known SimpleScalar simulator
Compiler
Working on GCC machine description to optimize performance
Reconfigurable Computing (1/2)
Motivations:
Improving the Design Methodology of Embedded System Hardware
Providing Better Performance with Low Development Cost
Shortening the Time-to-Market of SoC Products
Research Issues:
Hardware/Software Partition
Synthesis Technology
Reconfigurable Processing Element Design
[Figure: Reconfigurable Architecture. Blocks: Processor (ARM7 / MIPS), On-Chip Mem / Cache Mem, Data Engine, Reconfigurable Logic, Configuration Controller, Memory Management Unit, Off-Chip Memory, and Memory-mapped IO, connected by the main bus, data bus, and external bus]
Research overview in SOC and Embedded Systems
Research theme:
Content networking with deep packet inspection by software and hardware solutions, with applications in Internet security (intrusion detection, anti-virus, anti-spam, content filtering, MSN/P2P management)
Embedded software
Embedded Linux solutions: 7-in-1 to 10-in-1
A startup company, L7 Networks (L7-Networks.com), 2002, for all-in-one security gateways
SoC
Key component in content networking: string matching; hardware acceleration needed!
FPGA-based development to accelerate Aho-Corasick and Bloom filtering algorithms
Embedded and SoC Group: Selected R&D Results (2/2)
7-in-1 integrated security gateway
String Matching Engine to Accelerate Aho-Corasick Machine
Unified Content Filtering Hardware Platform
String Matching Hardware with Bloom Filters
7-in-1 Integrated Security Gateway
[Figure: packet flow through the gateway between LAN/DMZ and WAN. Blocks on the LAN/DMZ to WAN outbound path include MAC Filter, In-LAN Filter, Policy Route, Redirect, Out-WAN Filter, NAT, IPsec VPN, and Bandwidth Mgt. Blocks on the WAN to DMZ/LAN inbound path include In-WAN Filter, IPsec deVPN, deNAT, Route, Redirect, Out-LAN Filter, and Bandwidth Mgt. Side paths: sniff to the Intrusion Detection Alerting System, and the FTP/POP3/SMTP/Web/URL Filter with Many-to-One NAT]
7-in-1: VPN, Firewall, NAT, Routing, Content Filtering, Intrusion Detection, Bandwidth Management
Launched a startup in 2002: L7 Networks Inc.
String Matching Engine to Accelerate Aho-Corasick Machine
New Parallel Architecture with Pre-Hashing and Root-Indexing
[Figure: String Matching Coprocessor attached to the processor over a bus. Three matching units work on the text: Pre-Hashing matching (hashes H1 and H2 load bit vectors from a bit vector table to decide "possibly matched?"), Root-Indexing matching (root index table and root next table), and AC matching (state table: load state, compute next state). Signals: current state, next state address, next state]
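For reference, the Aho-Corasick machine that this engine accelerates can be sketched in software as follows. This is the standard textbook automaton (a trie with failure links), written in Python as a minimal illustration. It does not model the pre-hashing or root-indexing units of the hardware, and the function names are assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of string patterns."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            nxt = goto[state].get(ch)
            if nxt is None:                   # extend the trie with a new state
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(pattern)

    # Breadth-first pass: depth-1 states keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]   # inherit matches ending here
            queue.append(nxt)
    return goto, fail, out


def search(text, automaton):
    """Return (start_offset, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]               # follow failure links on a mismatch
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits
```

Each input character costs one state-table lookup plus occasional failure-link hops. That per-byte table access is the kind of work a hardware state table with a "next state address" path, as drawn in the figure, takes over from the CPU.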
Unified Content Filtering Hardware Platform
Resolve content filtering issues
Match without interrupting the CPU
Multiple connections management
On-the-fly matching of non-fixed payloads
Multiple patterns and multiple matched outputs
[Figure: Content Filtering Hardware. Text descriptors in DRAM, each with Length, Text Pointer, Status, Text ID, First Matched Offset, Last Matched Offset, and FA State, are read by a String Matching Specific DMA into a DMA/SM dual-port SRAM (Address1/Data1, Address2/Data2) and scanned by the String Matching Engine. The CPU controls the hardware through a register file. Signals: Start, Read, Write, Text_Start_Address, Text_End_Address, Matched_Interrupt, Matched_Pattern_ID, Matched_Text_ID, Matched_Text_Offset, Matched_Address, Finished_Interrupt, Status]
String Matching Hardware with Bloom Filters
[Figure: a text window with an entering byte and a leaving byte feeds Bloom filter (1), Bloom filter (2), and Bloom filter (3), labeled detect prefix(p,1), detect prefix(p,2), and detect factor in p, which drive a shift controller]
Feature Set:
1. Allow maximum shift distance if possible.
2. Reconfigure rules easily.
3. Keep constant hardware complexity.
Platform:
Xilinx ML310 Embedded Development Platform with embedded PowerPC 405 processor
Xilinx Virtex-II Pro XC2VP30 FPGA
MontaVista Linux Professional Edition 3.0
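The role of a Bloom filter in string matching can be illustrated with a short Python sketch. It shows only the basic membership test used as a fast pre-filter before an exact comparison. It does not reproduce the shift controller or the prefix/factor filters of this design, and the class name, bit-array size, hash construction, and the `find_matches` helper are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash functions; no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive the hash functions by prefixing a one-byte seed (illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def find_matches(text, patterns):
    """Slide a window over text; verify exactly only when the filter says 'maybe'."""
    by_length = {}
    for pattern in patterns:
        bloom, exact = by_length.setdefault(len(pattern), (BloomFilter(), set()))
        bloom.add(pattern)
        exact.add(pattern)
    hits = []
    for length, (bloom, exact) in by_length.items():
        for i in range(len(text) - length + 1):
            window = text[i:i + length]
            if window in bloom and window in exact:
                hits.append((i, window))
    return sorted(hits)
```

Because a Bloom filter never reports a false negative, most windows are rejected by a few bit lookups, and the exact check only runs on the rare "possibly matched" windows. Adding a rule just sets more bits in the array, which fits the slide's goals of easy rule reconfiguration and constant hardware complexity.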
Embedded and SoC Group: Major Projects
Excellence Project: Next Generation Information Communication Networks (2004~2008), with 24 faculty members
Network Benchmarking Lab (www.nbl.org.tw, 2003~2007)
Attack Session Extraction and Comparison with Nessus (Cisco San Jose, 2005~2006)
Content-based Network Security - Content Classification: Design, Implementation, and Evaluation (2004~2006)
Open Source Product Testing Tools: In-Lab Live Testing (2005~2006)
Biography of Ying-Dar Lin
Areas of research interests
Design, implementation, analysis, and benchmarking of Internet gateway devices (10-in-1: routing, NAT, firewall, VPN, IDP, CF, anti-virus, anti-spam, IM, P2P, bandwidth management, link load balance, etc.)
Internet security and QoS
Content networking
Test technologies of switch, router, WLAN, security, and VoIP
Publications
International journal: 39
International conference: 33
IETF Internet Draft: 1
Industrial articles: 124
Books: 2
Patents: 16
Tech transfers: 8
B.S., NTU-CSIE, 1988
Ph.D., UCLA-CS, 1993
Professor, NCTU-CS, 1999~
Founder and Director, ITRI-NCTU Network Benchmarking Lab (NBL; www.nbl.org.tw), 2002~
Co-Founder, L7 Networks Inc. (www.L7.com.tw), co-invested by D-Link, ZyXEL, and Advantech, 2002
Consultant, CCL/ITRI, 2002~
Well-cited paper: Multihop Cellular: A New Architecture for Wireless Communications, INFOCOM 2000, YD Lin and YC Hsu; # of citations: 150
Wireless Baseband Processor
Block              Gate Count   Max. Freq
Spreading          500          80MHz
Clock Generator    2600         165MHz
PAM Match Filter   4800         80MHz
Digital Divider    900          60MHz
CTRL               1500         80MHz
Clock Recovery     1500         178MHz
Data Rate          4/2/1 Mbps
PN Length          11 Chips
Freq. (MHz)        44 (outer) / 132 (inner)
Max. IF            22MHz
Core Size          3700 x 3700 um2
Power              420mW @ 4Mbps
Prototype 802.11b Baseband+MAC chip
Item                Specification
Technology          0.25um CMOS 1P5M
VLSI Type           Cell-Based Design
Function            802.11b Baseband+MAC
System Frequency    44MHz
Package             208 QFP
Gate Count          Not available
Chip Size           Not available
Power supply        2.5V (digital), 3.3V (analog)
Power Dissipation   650mW
[Figure: chip layout showing PLL, A/D (I), A/D (Q), and D/A blocks]
Architecture and Systems: R&D Results
ARM9-compatible processor with video/audio capabilities (technology transfer)
Java stack operations folding (patents)
Asynchronous 8051 on FPGA
Low-power Branch-Target-Buffer (patent application)
Low-power bus encodings (patent applications)
Graphic processor design techniques
SOC Electronic Design Automation: Research Directions
Reliable Interconnect Design
Crosstalk-driven Interconnect Design
Design-for-Manufacture (DFM) Interconnect Design
Layout Migration
VLSI Cell Migration with Topology Preservation
Post-Layout Platform for Verification and Optimization
SOC Electronic Design Automation: R&D Results
Tile-based Gridless ECO Router with Graph Reduction
Two times faster than existing tile-based routers.
NEMO: A New Full-Chip Gridless Router
Faster than all academic gridless routers
Crosstalk-driven Track Assignment
Pre-Detailed Routing Design Flow Considering Capacitive- and Inductive-Noise Constraints
SOC EDA Group R&D Results - New ECO Routing Design Flow
SOC EDA Group R&D Results - Full-Chip Gridless Router
Electronic System Level Design
[Figure: two design flows side by side.
Traditional Design Flow, blocks shown: Requirement Definition, Specification Development, Specification Model, System Architecture Model Development, Hardware RTL Development, Software Development, FPGA Prototype, System Integration and Verification with RTL, Synthesis, Placement and Route, Chip Fabrication; a design regression loop leads back to earlier stages.
Design Flow with ESL, blocks shown: Requirement Definition, Specification Development, Specification Model, System Architecture and TLM Development, TLM to RTL HW Refinement, SW Design and Development, HW Verification Environment Development, System Level Verification and Integration; outcome: First Time Silicon Success]
http://mapl.nctu.edu.tw
Design Practice: Transaction Level Modeling for H.264 Decoder
[Figure: H.264 decoder platform model. An ARM9 CPU with Instruction Memory and Data Memory and the decoder modules share a 32-bit AHB Control Bus and a 128-bit AHB Data Bus. Modules: Bit-stream FIFO, NAL Parsing, CABAC/CAVLC, IQ/IDCT, MB Texture Buffer, MB Motion Buffer, Data Fetch, Intra/Inter Prediction, Subblock Reconstruct Buffer, Subblock Processing Unit, DeBlocking, DeInterlacer, IIP FIFO, DB FIFO, DI FIFO, Sync FIFO, Hardware Input Interface, and HDMI Interface, each marked S and/or M. An External Memory Interface connects to SDRAM 0 to SDRAM 3, each with Bank 0 to Bank 3. Callouts: Bus Arbitration, Control Bus, Output Interface, Data Transaction, SDRAM Controller, Video Pipe, CPU, Memory, Cache]
http://mapl.nctu.edu.tw
SoC for Multi-Standard Video Codec
[Figure: System on Chip with ARM-9 CPU, Video Codec, HD Capturing, Color Transform, 3-A Functionalities, Networking, Embedded SRAM and On-Chip Bus, and Bus Arbitration; Architecture C Model]
http://mapl.nctu.edu.tw
VLSI/SOC Research for Graphics System
3-D Graphics Demo Here!
VLSI Information Processing Lab
Advisor: Lan-Da Van
VLSI/SOC Research for Adaptive Communications
(Virtual SOC Platform) CoWare Platform Architect
[Figure: block diagram of platform. ARM926 with iTCM and dTCM, Instruction and Data AHB buses, APB, ROM, RAM, Display, stub, SW, clock, reset, and FFT HW]
Memory location (size): 0x0 (0x100000); 0x400 0000 (0x100000); 0xc000 0000 (0x1); FFT HW 0x1000 0000 (0x4)
AddrBits / DataBits: 20 / 32; 20 / 32; 1 / 8; 32 / 32; 32 / 32
FFT/IFFT Chip Design
[Figure: chip blocks IU, CU, MULT, R8-FFT, DFM, RAM]
IP Implementation