Introduction to VLSI Algorithmic Design Automation Lab Research
Jun-Dong Cho
VLSI Algorithmic Design Automation Lab. (http://vada.skku.ac.kr)
School of Information and Telecommunication, Sungkyunkwan Univ.
Lab Introduction
The VLSI Algorithmic Design Automation Lab (vada.skku.ac.kr), directed by Prof. Jun-Dong Cho, studies SoC design problems and is devoted to VLSI/SoC design automation and communication SoCs. The lab consists of 5 Ph.D. students and 7 master's students. Lab tools and equipment: SPW, Matlab Signal Processing Workshop, Code Composer, Cadence, Xilinx, and high-performance PCs and workstations.
Post-PC = Mobile computing + Intelligent environment
10^9 times bandwidth and 10^6 times power consumption
3 GOPS to search for a song in 0.5 sec by humming, from a database containing 2000 songs; 3D TV also requires several GOPS.
According to the National Technology Roadmap for Semiconductors, in 2010, 4 billion transistors at 50 nm will be integrated into one chip, and its clock speed would be 10 GHz.
A new design methodology is required to handle wiring delay and intrinsic electrical noise.
Ultra-low energy (10-100 MOPS/mW), ultra-low cost
S/W and H/W co-design, S/W-driven design reuse (e.g., Software-Defined Radio)
~2003 VADA Research Themes
Low Power Reconfigurable MODEM (CDMA, WCDMA, OFDM) Architecture Design
Low Power VLSI CAD: H/W & S/W co-design, architecture-level/logic-level optimizer, placement/routing layout optimizer
2004 VADA Research Theme
Low Power Multiprocessor System on a Chip
VADA Class Lectures
2004: SoC Architecture
2003: Software Defined Radio, Embedded System Design
~2002: VLSI Design for Digital Signal Processing, Introduction to Digital Communication, Introduction to Computer Aided Design of Integrated Circuits, Computer Architecture - Microsystems for Multimedia Applications
1999: Low Power VLSI Design Optimization
Biography of Prof. Jun-Dong Cho
1983-1987: Samsung Electronics, CAD
1989: Polytechnic Univ., Computer Science, M.S.
1993: Northwestern Univ., EECS, Ph.D.
1993.6: Best Paper, the 30th Design Automation Conference (Dallas, TX)
1996.6: IEEE Senior Member
2000.8-2001.8: Visiting Scientist, Design Automation Team, IBM T.J. Watson Research Center (Yorktown Heights, NY) (2001.5: IBM Invention Achievement Award)
2000.10: Sungkyunkwan Univ. Best Professor Award
1990: Reviewer for IEEE Trans. on VLSI Systems, IEEE Trans. on Circuits & Systems, IEEE Trans. on CAD of Integrated Circuits
Program committee member of ICCAD, ISQED, SLIP, ASPDAC, ICVC, ASPASIC
Books:
High-Performance Physical Design for MCM and Packages, World Scientific Co., Oct. 1996
"VLSI Circuit Layout," Wiley Encyclopedia of Electrical and Electronics Eng., John Wiley and Sons, Inc., co-authored with M. Sarrafzadeh, April 1999
Chapter "Steiner Tree Problems in VLSI Layout Designs" in "Steiner Trees in Industries," Kluwer Academic Publishers, May 2001
"Lower Power Digital Core Design for Multimedia and Telecommunications," to be published through IDEC, 2002
Software Implementation of an OFDM-Based DVB-T Receiver System
[Figure: Real-DSP co-simulation board and model creation process for DVB-T modeling. Labels: DVB-T model, SignalMaster, TI CCS, SPW, operation cycle extraction, C simulation, real DSP model, SPW model, C code (floating point), Simulink model, TI Code Composer, real-DSP performance.]
Step #1: 100% HOST simulated model (Process A: signal source, Process B: scope, both on the HOST)
Step #2: HOST-TARGET co-simulation (TARGET DSP and HOST)
SignalMaster™ Emulation Platform: virtexII XCV6000 FPGA + TMS320C6701 VLIW DSP
Hardware/Software Partitioning of the COFDM DVB-T Receiver
Required computation load and feasibility of real-time operation
Synchronization and operation schedule of each functional module
The multiprocessor case
[Figure: COFDM DVB-T receiver block diagram. Blocks: I/Q gen., DeMod, FFT, Delay & Phase Rotator, FEC, TPS, Equalizer, GI Remove, Coarse STR, Carrier Recovery, Fine STR, NCO, Timing Proc., MRC, GI/Mode Detect. Legend: hardware; C software code / hardware; hardware or software.]
Design of On-Chip-Bus: Network-On-Chip(NoC)
[Figure: two on-chip buses joined by a bridge with a master/slave interface on each side. Each bus has its own arbiter, and masters (Master 1, 3, 4) and slaves (Slave 1 to 4) attach to the buses through interfaces.]
Multiprocessor SoC Platform Architecture
[Figure: multiprocessor SoC platform. Blocks: ARM926, DMA, Teak DSP, DPRAM, shared memory, IPs, AMBA AHB, arbiter & decoder, BIU/decoder, memory, communication interface. Partition labels: ARM, DSP, HW.]
DVB-T Baseband Receiver => HW/SW partitioning
HW/SW Co-design based on Multiprocessor SoC Platform for DVB-T Baseband Receiver
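Teak DSP Platform Implementation
1. Teak DSP platform implementation: DMA for Teak, arithmetic blocks, XY memory interface, and a BIU (Bus Interface Unit) for interworking with the ARM platform
2. HW/SW partitioning of the DVB-T receiver (in progress)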
[Figure: Teak platform. Blocks: Teak core, P RAM, XRAM, YRAM, X,Y MIU, DMA, BIU, DPRAM, communication interface.]
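▶ CI (Communication Interface): designed as a crossbar switch to make access to the IPs and shared memory more efficient in the multiprocessor platform architecture.
▶ CI Controller: performs arbitration, redistributing ownership of the slaves among the masters according to user-defined priorities, and controls the CI cells.
▶ CI Cell: connects a master (Teak, ARM) to a slave (shared memory, IPs), serving the master's transfer request according to information from the controller.
Communication Interface Implementation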
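The priority-based arbitration of the CI controller can be illustrated with a minimal sketch. It is not the lab's implementation: the function name `arbitrate`, the master and slave names, and the one-request-per-master-per-cycle model are all assumptions made for illustration. Because the interface is a crossbar, requests for different slaves can be granted in the same cycle.

```python
from typing import Dict, List


def arbitrate(requests: Dict[str, str], priority: List[str]) -> Dict[str, str]:
    """Grant each requested slave to its highest-priority requesting master.

    requests: master name -> slave it requests this cycle (hypothetical model).
    priority: master names ordered from highest to lowest priority.
    Returns:  slave name -> master that is granted the slave.
    """
    grants: Dict[str, str] = {}
    for master in priority:                  # visit masters in priority order
        slave = requests.get(master)
        if slave is not None and slave not in grants:
            grants[slave] = master           # highest-priority requester wins
    return grants


# Hypothetical cycle: both masters want the shared memory, Teak has priority.
conflict = arbitrate({"ARM": "shared_mem", "Teak": "shared_mem"}, ["Teak", "ARM"])
# Hypothetical cycle: different slaves, so the crossbar serves both masters.
parallel = arbitrate({"ARM": "ip0", "Teak": "shared_mem"}, ["Teak", "ARM"])
```

A master that loses arbitration simply receives no grant in that cycle and would have to repeat its request.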
Low-Power MPEG4 Codec Design
Low-Power Architecture for MPEG4 SoC
Reduction of loop memory size (Fig. 1)
Array address translation for low row activation (Fig. 2)
Memory mapping for low data bus transitions
[Fig. 1 and Fig. 2: Embedded Compression Video Encoder. Blocks: loop memory, ME, MC, DCT, IMC, Q-IQ, IDCT, ZZ, RLE, VLE, DEC, ENC, DISP; the loop memory path operates in the DCT domain.]
3D Image Sensing Platform Implementation
• H/W design of the image processing for real-time operation
• FPGA implementation through HDL coding of the image processing algorithm and the SDRAM controller
[Figure: 3D image sensing platform. Blocks: CCD camera, I2C, video decoder, median filter (FPGA), 3D processing module (FPGA) with PE block, SRAM & MUX, and SDRAM controller, SDRAMs, FIFO1, FIFO2, PCI interface or USB 2.0 interface, computer.]
Conditional Access System POD Cryptographic Module Using SystemC
A virtual platform based on the ARM926EJS core and AMBA AHB is being designed with CoWare's ConvergenSC, and is under study using Transaction Level Modeling in SystemC
[Figure: Point of deployment (POD) module. Blocks: PCMCIA connector, POD interface logic, MPEG-2 transport demultiplexer and remultiplexer, out-of-band processing, copy protection engine, payload decryption engine, CPU, secure microprocessor, memory controller, FLASH, RAM.]
Sensor-Network-Based Mobile Home Care System
• Wireless interface module that transmits blood glucose data measured by a glucose meter over wireless LAN
• Intel XScale PXA255
• 16MB Flash, 32MB SDRAM
• Embedded Linux (2.4.18) used as the OS
• 10Mbps wired Ethernet, 11Mbps WLAN
Other VADA Research
2000.1 - 2000.12 : Low Power and High Performance Reconfigurable Equalizer for Cable MODEM, Samsung Electronics
2000.5 - 2000.11 : Fast and Low Power Search Engine for Speech Recognition, Samsung Electronics
2000.1 - 2000.12 : Reticle Frame Key Layout Placer for IC Reticles, Samsung Electronics
1999.2 - 1999.11 : Lower Power Decoder For Convolutional Encoder, Samsung Advanced Institute of Technology
Cryptographic Processor Development
Processes at high speed the cryptographic algorithms needed to handle security protocols (SSL, SET, IPSEC, etc.) in server/client systems, security tokens, and smart cards.
Features
- Secret-key algorithms: DES, 3DES, AES, SEED
- Hash algorithms: MD-5, SHA-1, HAS-160
- Message authentication algorithms: HMAC-MD5, HMAC-SHA1, HMAC-HAS160
- Public-key algorithms: RSA1024, DSA, DH, ECC160
- Modular arithmetic: addition, multiplication, exponentiation
- True hardware random number generator
- 16 Kbyte internal SRAM
- PCI 2.1 master/slave mode support
- MPC860 interface
- Device driver for Windows 2000 Server
Cable Modem Equalizer
[Figure: Downstream channel receiver for a cable modem (receiver IC with adaptive equalizer). Signal path: RF down-converter, SAW, AGC, tuner (1st IF = 36 MHz, 2nd IF = Fs/4), LPF, A/D, I/Q mixing with cos(πn/2) and sin(πn/2), LPF, interpolator, Nyquist filter, 8-tap FFE and phase derotator, 12-tap DFE, Reed-Solomon decoder (204,188) with t = 8, corrected bits. Control loops: AGC loop, carrier recovery loop with QDDFS, symbol recovery loop, PLL clock generation.]
Low Power Multimedia Design
Low Power Motion Estimator
MPEG-2 real-time motion estimator
2-dimensional systolic array, dual PE (Processing Element)
Motion estimation block: memory access reduction of 70%
Fast and Low Power Viterbi Search Engine Using Inverse Hidden Markov Model
Shape-based Area Router for SoC designs
Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing
Jun-Dong Cho, Sungkyunkwan Univ.
Reconfigurable Platform-Based Design Methodology
Real-time reconfiguration architecture with minimum configuration time
Design space exploration
Dynamic Memory and Power management On a Chip (MPoC)
66% of chips are not OK on first silicon (2004)
Mid-90s: 6 months late => 31% earnings loss
Today: 3 months late => $500M loss
Motivation
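Wireless processing systems require high throughput and heavy computation, yet face strict power constraints
Reconfigurable SoC implementations seek performance gains through parallelism and make use of IP reuse
Algorithm partitioning through performance prediction based on hot-spot bottlenecks
Direction of Development
Expand communication bandwidth to keep up with the growth of multimedia applications and the large burst data transfers they require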
Dual-Core Architecture (ARM+DSP) -> Multiprocessor SoC
HIERARCHY OF PLATFORMS
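SDR solution
Tier 0: HR (Hardware Radio)
Traditional hardware implementation
Tier 1: SCR (Software Controlled Radio)
Control functions for multiple hardware elements are implemented in software
Tier 2: SDR (Software Defined Radio)
Modulation and baseband processing are implemented in software, while multi-frequency RF is implemented in fixed-function hardware
Sand-Bridge (ARM + 4 DSPs)
Tier 3: ISR (Ideal Software Radio)
Programmability is extended through an RF implementation that performs analog conversion at the antenna
Tier 4: USR (Ultimate Software Radio)
In addition to digital processing capability, provides fast (within a few milliseconds) switching between communication protocols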
Recent Research Trends
Intel's Reconfigurable Radio Architecture (mesh + nearest neighbor)
Reconfigurable Baseband Processing, Picochip
Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc.
A configurable platform: Altera Excalibur, Xilinx Virtex FPGA
Adaptive Computing Machine, Quicksilver Tech.
Mercury, Sky, Galileo, Tundra (crossbars, bridges)
Virginia Tech's reconfigurable hardware
Full Application Platform
Users design full applications on top of hardware and software architectures
Nexperia
Texas Instruments' OMAP multimedia platform
Infineon's M-Gold 3G wireless platform
Parthus' Bluetooth platforms
ARM's PrimeXsys wireless platform
OMAP™ (Open Multimedia Application Platform)
The OMAP architecture has SW/OS support that controls the overall clocking and the idle modes of the platform.
The dual-core architecture makes it possible to assign each task to the most suitable processor.
Processor-centric platform
Focuses on access to a configurable processor but doesn't model complete applications
Program-in Chip-out (PICO), HP Lab.
UC Berkeley, GARP
Improv Systems
ARC
Tensilica
Triscend
Fully programmable platform
consisting of FPGA logic and a processor core
System on a programmable chip (SOPC)
Altera's Excalibur, Xilinx's Virtex-II Pro, and Quicklogic's QuickMIPS
Xilinx-IBM XBlue architecture
Communication- centric platform
Provides an interconnect architecture but doesn't typically provide a processor or a full application
Sonics' SiliconBackplane
PalmChip's CoreFrame architectures
IBM's CoreConnect
Bandwidth extended from the initial 32 bits up to 128 bits
Sonics Smart Interconnect IP
SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
plug-and-play on-chip communications network
Packet-based
50 employees in a year
Provides IP and a design environment; supports SoC design
Alliance with Cadence
SiliconBackplane III targets communications + media
Nexperia Digital Video Platform
Designing the initial platform, along with the pnx8500, wasn't quick and easy.
It involved about 300 hardware, software and systems people working between 1999 and 2001, of which 60 were involved with hardware.
Adaptive System on Chip
Scheduled Communication
A tiled architecture: each tile is a computational core, and the interfaces together form the network
The core interface supports heterogeneous processing across one or more tiles
The system is connected using a statically scheduled mesh of interconnect
Data moves between neighboring tiles through a communication pipeline, which allows a fast clock rate and time-sharing of the interconnection resources
The reconfigurability of the cores and of the runtime interconnect enables dynamic power management.
Communication Interface
- Stream data that passes through a communication interface is scheduled for a specific communication clock cycle based on data link availability.
- The result of scheduling for each interface is a set of instructions for its associated interconnect memory.
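As a rough illustration of scheduling streams onto communication clock cycles by link availability, the sketch below uses a greedy first-fit policy. The policy, the function name `schedule_streams`, the link names, and the assumption that a stream advances one link per cycle are all hypothetical and are not taken from the slides. The returned per-link table plays the role of the instruction set for each interface's interconnect memory.

```python
from typing import Dict, List


def schedule_streams(streams: Dict[str, List[str]]) -> Dict[str, Dict[int, str]]:
    """Assign each stream the earliest start cycle at which every link on its
    path is free at the cycle the data would reach it (one hop per cycle).

    streams: stream name -> ordered list of links it traverses.
    Returns: link name -> {cycle: stream using the link in that cycle}.
    """
    table: Dict[str, Dict[int, str]] = {}
    for name, path in streams.items():
        start = 0
        # Delay the stream until no link on its path is already reserved.
        while any((start + hop) in table.get(link, {})
                  for hop, link in enumerate(path)):
            start += 1
        for hop, link in enumerate(path):
            table.setdefault(link, {})[start + hop] = name
    return table


# Hypothetical streams on a three-tile row with links "t0-t1" and "t1-t2".
table = schedule_streams({
    "a": ["t0-t1", "t1-t2"],   # crosses both links
    "b": ["t0-t1"],            # competes with "a" for the first link
    "c": ["t1-t2"],            # can use the second link before "a" arrives
})
```

Because the schedule is fixed before run time, no arbitration is needed while the data is moving; each interface only replays its table.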
Scheduler
The scheduler manages 5 lists of threads.
Symmetric Multi-Processor (SMP): the scheduler may be shared by all processors.
Distributed: a scheduler exists on every processor.
Access to the scheduler must be performed in a critical section, under the protection of a lock.
Other implemented objects
Spin lock: the low-level test-and-set access
Mutex: serializes access to shared data
Semaphore: sem_post is the only function that can be called in interrupt handlers.
Review several types of scheduler
Symmetric Multiprocessor (SMP)
• Unique scheduler shared by all processors and protected
• The threads can run on any processor, and migrate
Centralized Non SMP (NON_SMP_CS)
• Unique scheduler shared by all processors and protected
• Every thread is assigned to a given processor and can run only on it
Distributed Non SMP (NON_SMP_DS)
• As many schedulers as processors, and as many locks as schedulers
• Every thread is assigned to a given processor and can run only on it
Implementation
• The scheduler_created variable must be declared with the volatile type qualifier to ensure that the compiler will not optimize away this seemingly infinite loop.
◈ Booting sequence
Experimental setup
[Figures: execution times of the MJPEG application; cycles spent in the CPU idle loop]
◈ Motion JPEG application
Experimental setup
◈ COMM application • Does not exchange data between processors.
• The only resource shared here is the bus
• The application uses the processors at about full power.
9-core and 16-core Mode
Evaluation Methodology
Compiler Research Issues
Synthesis of RTOS elements in the compiler
On the application side: generation of an efficient application-specific static/run-time scheduler and synchronization
On the hardware side: generation of device drivers, memory management primitives, etc. using hardware specifications
Automatic retargetability for a family of target architectures while preserving aggressive optimization
Automatic application partitioning
Mapping of process/task-level concurrency onto multiple PEs using programmer guidance in the programmer's model
Dynamic Power Management
Dynamic power management saves power by lowering the frequency, using different clock domains according to the run-time variation of the data content
Pre-computation eliminates repeated switching
A stream is connected only when it carries valid data, which eliminates unnecessary switching
Reconfigurable clock-based system balancing creates an environment of just-in-time computing, which can reduce overall power usage.
Prefetch many frames in an optimal-sized buffer [[email protected]]
Power Metric
Based on network activity and HSPICE circuit simulation of interconnect, the network power consumption(Pint) is:
T: the number of tiles
PIF/D: the overhead of the instruction memory fetch and decode
s: the number of streams
Nvs and Nivs: the numbers of valid and invalid transfers for stream s, while Ps is the power consumed in transferring 1 bit through stream s
Dynamic Power Management in On-Chip Communication?
Encoding/decoding relationship
• E.g. Bus invert coding, …
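Bus invert coding can be sketched as follows. This is a simplified version that counts toggles on the data lines only and assumes the bus starts at all zeros; the function names and the 8-bit default width are illustrative assumptions. Each word is sent inverted, with an extra invert line raised, whenever sending it as-is would toggle more than half of the data lines.

```python
from typing import List, Tuple


def bus_invert_encode(words: List[int], width: int = 8) -> List[Tuple[int, int]]:
    """Return (value driven on the bus, invert-line bit) for each word."""
    mask = (1 << width) - 1
    prev = 0                                  # assumed initial bus state
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(word ^ prev).count("1")  # data lines that would switch
        if toggles > width // 2:
            sent, invert = (~word) & mask, 1   # cheaper to send the complement
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded: List[Tuple[int, int]], width: int = 8) -> List[int]:
    """Recover the original words from (bus value, invert bit) pairs."""
    mask = (1 << width) - 1
    return [((~sent) & mask) if invert else sent for sent, invert in encoded]


words = [0x00, 0xFF, 0x0F]
encoded = bus_invert_encode(words)
decoded = bus_invert_decode(encoded)
```

With this scheme, no transfer toggles more than half of the data lines, at the cost of one extra wire and the encoder/decoder logic.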
Advanced Bus Architecture: Error-resilient Coding
Error-detection code or error-correction code
Energy trade-off between
• Retransmission
• Error-correction coder/decoder
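The trade-off can be made concrete with a toy energy model. Everything here is an assumption for illustration: the function names, the energy units, and the simplifications that every error is detected (retransmission case) or corrected (error-correction case) and that word errors are independent, so a word needs 1/(1 - p) transmissions on average.

```python
def retransmission_energy(e_word: float, e_detect: float, p_err: float) -> float:
    """Expected energy per delivered word with error detection + retransmission.

    e_word:   energy to send one word once (hypothetical units)
    e_detect: energy of the error-detection coder/decoder per attempt
    p_err:    probability that a transmitted word arrives corrupted
    """
    if not 0.0 <= p_err < 1.0:
        raise ValueError("p_err must be in [0, 1)")
    return (e_word + e_detect) / (1.0 - p_err)   # geometric number of attempts


def correction_energy(e_word: float, code_overhead: float, e_codec: float) -> float:
    """Energy per word with error correction: extra code bits plus codec energy."""
    return e_word * (1.0 + code_overhead) + e_codec


# Hypothetical numbers: retransmission wins on a clean bus, correction on a noisy one.
clean = retransmission_energy(1.0, 0.1, 0.01) < correction_energy(1.0, 0.25, 0.3)
noisy = retransmission_energy(1.0, 0.1, 0.50) < correction_energy(1.0, 0.25, 0.3)
```

Under these assumptions the break-even error rate is the value of p_err at which the two expressions are equal; real codes and buses would need measured energies.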
Energy Issue in On-chip Bus Arbitration
Centralized bus arbitration
As the bus scales up, it becomes energy inefficient
• The energy cost of communicating with the arbiter and the arbiter complexity grow more than linearly.
Distributed bus arbitration
Code division multiple access (ISSCC'00)
Research on this problem has only just begun.
Memory vs Reused-IP
Embedded Multiprocessor SoC Memory Management
Time-Space Exploration
Enumerate all trade-offs and select the one with the most benefit.
Branch-and-bound method for estimating SoC metrics.
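A minimal sketch of branch-and-bound exploration over a time-space trade-off, under assumptions of our own: each block offers a few (time, area) implementation options, the goal is the smallest total time within an area budget, and the bound is the sum of the fastest remaining options. The function name `best_mapping` and the numbers are hypothetical and do not come from the slides.

```python
from typing import List, Optional, Tuple

Option = Tuple[int, int]  # (time, area) of one implementation choice


def best_mapping(blocks: List[List[Option]],
                 area_budget: int) -> Optional[Tuple[int, List[int]]]:
    """Return (total time, chosen option index per block), or None if infeasible."""
    n = len(blocks)
    # min_rest[i]: optimistic time still needed for blocks i..n-1 (the bound).
    min_rest = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + min(t for t, _ in blocks[i])

    best: List[Optional[Tuple[int, List[int]]]] = [None]

    def search(i: int, time: int, area: int, picks: List[int]) -> None:
        if area > area_budget:
            return                              # prune: budget exceeded
        if best[0] is not None and time + min_rest[i] >= best[0][0]:
            return                              # prune: cannot beat incumbent
        if i == n:
            best[0] = (time, list(picks))       # new best complete mapping
            return
        for k, (t, a) in enumerate(blocks[i]):  # branch on each option
            picks.append(k)
            search(i + 1, time + t, area + a, picks)
            picks.pop()

    search(0, 0, 0, [])
    return best[0]


# Hypothetical blocks: option 0 is slow/small (software), option 1 fast/large (hardware).
blocks = [[(10, 1), (4, 5)], [(8, 1), (3, 4)], [(6, 1), (2, 6)]]
result = best_mapping(blocks, area_budget=10)
```

Pruning never discards the optimum here because the bound is optimistic: it ignores area and assumes every remaining block takes its fastest option.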
Jiang Xu and Wayne Wolf
Princeton University
A Multimedia Embedded Chip
iSoC
iSoC is an on-chip communication architecture for improving the scalability and flexibility of SoC design
Dynamic Configuration
With its regular and flexible structure, it provides a predictable framework for modeling the traffic, power, speed, and area requirements of global communication
iSOC Compiler
Divides applications into parts, each of which fits into a specific core.
Determines data communications between the cores in a space-time fashion.
Generates interconnect memory contents for each individual interface.
Application-specific multiprocessor SoC design flow
Cont.
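Application Areas
- Supports protocols that guarantee selective QoS, making it suitable for real-time applications and for applications that require high data bandwidth
- SoC design for high-volume multimedia applications such as high-frame-rate video and 3D graphics
- Builds a platform-based design environment that combines the core on-chip network IP and the design support tools into a single platform, for use in a variety of SoC designs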
Mission Statement
To carry out R&D programs which are 3 to 10 years ahead of today’s industrial needs in the field of ..
Design Technology for Integrated Information and Communication Systems for Human Well-Being
Reconfigurable SoC, Multi-media multi-Mode terminals, BAN for health-monitoring
Conclusion
Design space exploration
Real-time reconfiguration architecture with minimum configuration time
Dynamic Memory and Power management On a Chip (MPoC)
References
aSOC: A Scalable, Single-Chip Communications Architecture. Jian Liang, Sriram Swaminathan, and Russell Tessier. Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003. {jliang, tessier}@ecs.umass.edu
Configurable Platforms With Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips. Krishna Sekar, Kanishka Lahiri, and Sujit Dey. Dept. of ECE, UC San Diego, La Jolla, CA; NEC Laboratories America, Princeton, NJ.