Communication Protocols

Communication Protocols - Home | Computer Science



Communication Protocols

• Layering

– Lower levels provide services to higher levels

– Easier to design

– Physical layer

• Lowest level in the hierarchy

• Medium to carry data from one actor (device or node) to another

• Protocols: real-time or best effort

– Parallel

– Serial

– Wireless


Parallel communication

• Multiple data, control, and power wires

– One bit per wire

• High data throughput with short distances

• Typically used when connecting devices on the same IC or the same circuit board

– Bus must be kept short

• Long parallel wires result in high capacitance values, which require more time to charge/discharge

• Data misalignment between wires increases as length increases

• Higher cost, bulky


Parallel Protocols: PCI Bus

• PCI Bus (Peripheral Component Interconnect)

– High-performance bus designed by Intel in the 1990s

– Interconnects CPUs, expansion boards, memory

– Data transfer rates up to 1 GB/s with a 64-bit data bus

– Synchronous bus architecture

– Multiplexed data/address lines

• PCI express

– Serial, point-to-point protocol

Source: http://computer.howstuffworks.com


Parallel Protocols: ARM Bus

• ARM Bus

– Designed and used internally by ARM

– Interfaces with ARM line of processors

– Many IC design companies have their own bus protocols

– Data transfer rate is a function of clock speed

– 32-bit addressing


Serial Communication

• Single data wire – transmit one bit at a time

• Higher data throughput over long distances

– Less average capacitance, so more bits per unit of time

• Complex protocol and interfacing logic

– Sender needs to decompose word into bits

– Receiver needs to recompose bits into word

– Control signals often share the same wire, increasing protocol complexity


Serial Communication

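(Figure: one character on the line over time: a start bit, data bits bit 0, bit 1, …, bit n-1, then a stop bit; no char is sent while the line is idle.)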

• Parameters:

– Baud (bit) rate.

– Number of bits per character.

– Parity/no parity.

– Even/odd parity.

– Length of stop bit (1, 1.5, 2 bits).
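To make these parameters concrete, here is a minimal Python sketch of how a sender decomposes one character into line levels and how a receiver recomposes it. It assumes an idle-high line, data sent LSB first (the usual UART convention), and whole stop bits only (1.5 stop bits are not modelled); the function names are hypothetical.

```python
def uart_frame(byte, data_bits=8, parity=None, stop_bits=1):
    """Line levels (1 = idle/mark, 0 = space) for one asynchronous character.

    parity is None, "even" or "odd"; stop_bits must be a whole number here.
    """
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("value does not fit in data_bits")
    data = [(byte >> i) & 1 for i in range(data_bits)]  # decompose word, LSB first
    bits = [0] + data                                   # start bit pulls the line low
    ones = sum(data)
    if parity == "even":
        bits.append(ones % 2)          # total number of 1s becomes even
    elif parity == "odd":
        bits.append(1 - ones % 2)      # total number of 1s becomes odd
    elif parity is not None:
        raise ValueError("parity must be None, 'even' or 'odd'")
    bits += [1] * stop_bits            # stop bit(s) return the line to idle
    return bits


def uart_decode(bits, data_bits=8):
    """Receiver side: check framing and recompose the data bits into a word."""
    if bits[0] != 0 or bits[-1] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:1 + data_bits]))


frame = uart_frame(0x41, parity="even")
print(frame)               # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(uart_decode(frame))  # 65
```

With 8 data bits, even parity and one stop bit, each character occupies 11 bit times on the line, so the useful data rate is 8/11 of the baud rate.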


Serial Protocol: 8251 UART

• Universal asynchronous receiver transmitter

• Takes parallel data and transmits it serially at up to 450 Kbps

• 8251 chip functions are integrated into the standard PC interface chip

(Figure: the CPU connects to the 8251 through 8-bit status and 8-bit data connections; the 8251 transmits and receives (xmit/rcv) on the serial port.)


Serial Protocols: I2C

• I2C (Inter-IC)

– Two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s

– Enables peripheral ICs to communicate using simple communication hardware

• appropriate for peripherals where simplicity and low manufacturing cost are more important than speed

– Normal (standard) mode: 100 Kbps with 7-bit addresses

– Fast mode: 400 Kbps; high-speed mode: 3.4 Mbps; 10-bit addressing extends the address space

– Common devices capable of interfacing to the I2C bus:

• EEPROMs, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers

• Raspberry Pi
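The addressing modes above work by packing the address and a read/write bit into the first byte(s) after the START condition. A minimal Python sketch, assuming the standard I2C convention (R/W = 1 for read, 0 for write; 10-bit addresses begin with the reserved prefix 11110); the function name is hypothetical and no bus hardware is accessed. A real 10-bit read additionally needs a repeated START, which this sketch does not model.

```python
READ, WRITE = 1, 0  # value of the R/W bit


def i2c_address_bytes(address, rw, ten_bit=False):
    """Address byte(s) a master sends right after the START condition."""
    if rw not in (READ, WRITE):
        raise ValueError("rw must be READ or WRITE")
    if not ten_bit:
        if not 0 <= address <= 0x7F:
            raise ValueError("7-bit address out of range")
        return [(address << 1) | rw]          # A6..A0 followed by R/W
    if not 0 <= address <= 0x3FF:
        raise ValueError("10-bit address out of range")
    # First byte: 1 1 1 1 0 A9 A8 R/W; second byte: A7..A0
    first = 0b11110000 | (((address >> 8) & 0b11) << 1) | rw
    return [first, address & 0xFF]


print([hex(b) for b in i2c_address_bytes(0x50, WRITE)])                 # ['0xa0']
print([hex(b) for b in i2c_address_bytes(0x50, READ)])                  # ['0xa1']
print([hex(b) for b in i2c_address_bytes(0x3A5, WRITE, ten_bit=True)])  # ['0xf6', '0xa5']
```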


Serial Protocols: USB

• USB (Universal Serial Bus)

– Easier connection between PC and peripherals

– USB 1.1 has 2 data rates:

• 12 Mbps for higher-bandwidth devices

• 1.5 Mbps for lower-speed devices (joysticks, game pads)

– USB 2.0 runs at 480 Mbps; USB 3.1 up to 10 Gbps

– Tiered star topology can be used

• One USB device (hub) connected to the PC

• Up to 127 USB devices, including hubs, per host controller

– USB host controller

• Manages and controls bandwidth and driver software required by each peripheral

• Dynamically allocates power downstream according to devices connected/disconnected


PCI Express (PCIe)

• Serial, point-to-point protocol

• Bandwidth is very scalable: x1 to x16 links

• Max 6.4 GB/s in either direction on x16

• Switches for connecting different devices

Source: http://computer.howstuffworks.com


Real-Time Communication & Protocol Examples


Class Overview

• What we've covered so far in SW:

– Real-time scheduling, RTOS, RTIO started

• Where we are going today:

– RTIO, HW/SW codesign

• Due today:

– Article on RTIO

• Upcoming:

– HW2 assigned

– Individual project part 2 deadline extended to the end of the day Sunday, 2/19


Real-time Comm. Requirements

– Real-time behavior

– Efficient, economical (e.g., centralized power supply)

– Appropriate bandwidth and communication delay

– Robustness

– Fault tolerance

– Maintainability

– Diagnosability

– Security

– Safety


Real-time IO

• Field bus:

– A family of industrial computer network protocols used for real-time distributed control

• Carrier-sense multiple-access/collision-detection (CSMA/CD); used in Ethernet & CAN

• Alternatives:

– Token rings, token busses

– Carrier-sense multiple-access/collision-avoidance (CSMA/CA)

• Each partner gets an ID (priority). After each bus transfer, all partners try setting their ID on the bus; partners detecting a higher-priority ID disconnect themselves from the bus. The highest-priority partner gets a guaranteed response time; others communicate only if they are given a chance.


Event vs. time triggered

• Event Triggered (ET):

– Computation/communication triggered by an external event

– Events are primarily generated by changes in the environment

– Efficient — only do things when they need to be done; rest and save energy/cpu time/bandwidth

– High peak-load if multiple events happen at once

– Hard to analyze due to asynchronous nature of events

• Time Triggered (TT):

– Computation/communication triggered by the system clock

– Events happen according to a fixed schedule:

• Inefficient: does things periodically, whether needed or not

– Enhanced analyzability due to easily characterizable load, predictable interaction sequences, bus use, etc.


Time division multiple access

• Each node is assigned a fixed time slot:

http://www.ece.cmu.edu/~koopman/jtdma/jtdma.html#classical

– Master sends sync

– Some waiting time

– Each slave transmits in its time slot

• Variations (truncating unused slots, several slots per slave) exist
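The classical TDMA cycle just described (sync, waiting time, one fixed slot per slave) can be expressed as a small timing model. This Python sketch assumes equal slot lengths and integer time units; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TdmaCycle:
    sync_time: int          # duration of the master's sync message
    wait_time: int          # waiting time between sync and the first slot
    slot_time: int          # fixed slot length, identical for every slave
    slaves: list = field(default_factory=list)  # slave names in slot order

    def cycle_length(self):
        return self.sync_time + self.wait_time + len(self.slaves) * self.slot_time

    def slot_start(self, slave):
        """Offset from the start of the cycle at which `slave` may transmit."""
        return self.sync_time + self.wait_time + self.slaves.index(slave) * self.slot_time

    def owner_at(self, t):
        """Slave allowed to transmit at absolute time t (None during sync/wait)."""
        offset = t % self.cycle_length() - (self.sync_time + self.wait_time)
        if offset < 0:
            return None
        return self.slaves[offset // self.slot_time]


cycle = TdmaCycle(sync_time=2, wait_time=1, slot_time=5, slaves=["A", "B", "C"])
print(cycle.cycle_length())    # 18
print(cycle.slot_start("C"))   # 13
print(cycle.owner_at(26))      # B
```

Because every slot start is a fixed offset, a slave never waits more than one cycle length for its next slot, which is the deterministic timing behavior attributed to TDMA.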


Advantages of TDMA busses over priority-driven schemes

– Can provide QoS guarantees

– TDMA resources support temporal composability, by separating resource access of different subsystems

– TDMA resources have a very deterministic timing behavior

– Can be made fault tolerant

– Support for error detection

– Support for error containment

• a faulty subsystem does not affect the correct behavior of the remaining system

[Ernesto Wandeler, Lothar Thiele: Optimal TDMA Time Slot and Cycle Length Allocation for Hard Real-Time Systems, ASP-DAC, 2006]


Field busses: Profibus

• Process Field Bus (Profibus):

• PROFIBUS DP (Decentralized Peripherals) is used to operate sensors and actuators via a centralized controller in factory automation apps; runs at 9.6 kbps to 12 Mbps; RS-485 allows max. 126 devices, but expansion is possible

• PROFIBUS PA (Process Automation) is used to monitor measuring equipment via a process control system in process automation apps; runs at 31.25 kbps; same message format as DP

– Focus on safety; 20% market share for field busses

– Integration with Ethernet via Profinet

[http://www.profibus.com/]


Profibus: Application & Data Layers

• Application layer:

– DP-V0 for cyclic exchange of data and diagnosis

– DP-V1 for acyclic data exchange and alarm handling

– DP-V2 for slave-to-slave communication and data exchange broadcast

• Data link:

– FDL (Field bus Data Link) combines token passing with a master-slave method for Profibus-DP

• Each byte uses even parity and is transferred asynchronously with a start and stop bit

• Master signals the start of a new telegram with a SYN pause of at least 33 bits

• Various messages possible:

– Token

– Variable data length

– Fixed data length

– No data

– Brief ack

OSI layer | PROFIBUS
7 Application | DPV0, DPV1, DPV2; Management
6 Presentation | --
5 Session | --
4 Transport | --
3 Network | --
2 Data Link | FDL
1 Physical | EIA-485, Optical, MBP
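Since every PROFIBUS byte is sent asynchronously with a start bit, eight data bits, an even parity bit and a stop bit, one character occupies 11 bit times, and the SYN pause adds at least 33 idle bit times before each telegram. A rough Python estimate of the resulting telegram time follows; it assumes back-to-back characters and ignores station reaction times and other idle periods, so it is a lower bound rather than a conformant timing model. The function name is hypothetical.

```python
CHAR_BITS = 11   # start + 8 data + even parity + stop
SYN_BITS = 33    # minimum idle time before a new telegram, in bit times


def telegram_time_us(num_chars, baud):
    """Lower bound for SYN pause + num_chars characters, in microseconds."""
    bits = SYN_BITS + CHAR_BITS * num_chars
    return bits * 1e6 / baud


for baud in (9_600, 1_500_000, 12_000_000):
    print(baud, round(telegram_time_us(10, baud), 2))
```

A 10-character telegram therefore needs roughly 14.9 ms at 9.6 kbps but under 12 µs at 12 Mbps, which is why the bit rate dominates cycle times in DP networks.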


Controller area network (CAN)

– Developed by Bosch in the 1980s, with Intel producing the first CAN controller chip

– Key concept:

• Every device can be connected by a single set of wires, and every device that is connected can freely exchange data with any other device

– Originally designed for cars; now also used for:

• elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

– Binary countdown arbitration (CSMA/CD)

• Start from MSB, transmit each bit of priority

• Highest priority wins

– Throughput: 10 kbit/s to 1 Mbit/s

– Low and high-priority signals

• Maximum latency of 134 µs for high priority

www.can.bosch.com
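Binary countdown arbitration relies on the bus acting as a wired-AND: a dominant 0 overrides a recessive 1, so the node with the numerically lowest identifier (the highest priority in CAN) wins without its frame being destroyed. A minimal Python simulation, assuming 11-bit standard identifiers and ideal bit timing; the function name is hypothetical.

```python
def can_arbitrate(ids, id_bits=11):
    """Return the identifier that wins bitwise (binary countdown) arbitration."""
    contenders = list(ids)
    if not contenders or len(set(contenders)) != len(contenders):
        raise ValueError("need at least one node and unique identifiers")
    for bit in reversed(range(id_bits)):                  # MSB first
        sent = {node_id: (node_id >> bit) & 1 for node_id in contenders}
        bus = min(sent.values())                          # wired-AND: any 0 wins
        # A node that sent recessive 1 but reads dominant 0 stops transmitting.
        contenders = [n for n in contenders if sent[n] == bus]
        if len(contenders) == 1:
            break
    return contenders[0]


print(hex(can_arbitrate([0x123, 0x065, 0x0F2])))  # 0x65
```

Losing nodes simply stop and retry after the current frame, so the winner's frame goes through undamaged; this is the key difference from Ethernet-style collision detection, where all colliding frames are aborted.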


Aircraft communication systems

– Information exchange

• Information: many bytes of data, e.g., digital map, flight plan

• Exchange: a response is expected, at minimum an acknowledgment

• A higher-speed data link is needed

– Control platform: sampling and data transmission

• Data: digital value of an analog parameter, e.g., speed, height

• No response is expected, but:

– Time, integrity and availability are the key drivers

– The stability of the flight relies on this transmission

• Aeronautical response: ARINC 429 protocol


ARINC 429 overview

• Developed by Aeronautical Radio, Incorporated (ARINC)

• Commonly used standard for aircraft

• Electrical and data format standard for a 2-wire serial bus with one sender and many listeners

• Each data item is individually identified (by a label) and sent

(Figure: protocol stack (physical connection, data link/MAC, network, transport, application) carried over A429; each 32-bit A429 word consists of label, data, and parity.)
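The 32-bit word can be packed in a few lines. This Python sketch assumes the common ARINC 429 field layout (bits 1-8 label, 9-10 source/destination identifier (SDI), 11-29 data, 30-31 sign/status matrix (SSM), bit 32 odd parity), which is more detailed than the label/data/parity split shown above; it builds the logical word only and does not model the reversed transmission order of the label bits. Function names are hypothetical.

```python
def arinc429_word(label, data, sdi=0, ssm=0):
    """Pack an ARINC 429 word; bit 1 of the standard is the integer's LSB."""
    if not (0 <= label < 1 << 8 and 0 <= sdi < 1 << 2
            and 0 <= data < 1 << 19 and 0 <= ssm < 1 << 2):
        raise ValueError("field out of range")
    word = label | (sdi << 8) | (data << 10) | (ssm << 29)
    parity = 1 - bin(word).count("1") % 2     # choose bit 32 so the total is odd
    return word | (parity << 31)


def parity_ok(word):
    """A receiver accepts the word only if it contains an odd number of 1s."""
    return bin(word).count("1") % 2 == 1


w = arinc429_word(label=0o203, data=1234)
print(hex(w), parity_ok(w), parity_ok(w ^ (1 << 12)))  # flipping one bit breaks parity
```

Labels are conventionally written in octal, hence the `0o203` literal in the example.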


Information system requirements

• Ensure that the information is transmitted without any error.

– Data needs to be acknowledged

– Messages can be sent again in case of error

• Earlier aircraft used A429 with an added acknowledgement mechanism.

(Figure: protocol stack (physical connection, data link/MAC, network, transport, application) with the A429 Williamsburg protocol running on top of A429.)


ARINC 629

• Multi-transmitter protocol where many units share the same bus; originally designed for the Boeing 777.

• Based on a "waiting room" protocol:

– Each node is assigned a unique number of minislots that must elapse with silence on the channel before its data transmission begins

• Three (groups of) time-out parameters:

– SG: synchronization gap, controlling access to the waiting room

– TG_i: terminal gap, the personal time-out of node i

– TI: transmit interval, preventing monopolization of the channel

– TI > SG > max_i(TG_i)
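The interplay of the three time-outs can be illustrated with a simplified timeline. This Python sketch is not a conformant ARINC 629 implementation; it assumes a fixed message length, that TG_i is counted from the moment the bus last went quiet, that an SG of silence after the last message opens the next waiting room, and that a node whose TI has not yet expired when its turn comes sits out that room. Names and numbers are hypothetical.

```python
def arinc629_schedule(tg, sg, ti, msg_len, horizon):
    """Simplified 'waiting room' timeline: list of (start_time, node).

    tg:      {node: terminal gap TG_i}
    sg:      synchronization gap; this much bus silence opens a new waiting room
    ti:      transmit interval; minimum time between two starts of the same node
    msg_len: duration of every message (simplification)
    """
    if not ti > sg > max(tg.values()):
        raise ValueError("protocol requires TI > SG > max(TG_i)")
    last_start = {node: float("-inf") for node in tg}
    log = []
    room_open = 0                                 # time the current room opens
    while room_open < horizon:
        quiet_since = room_open
        for node in sorted(tg, key=tg.get):       # shortest terminal gap goes first
            start = quiet_since + tg[node]        # needs TG_i of silence
            if start - last_start[node] < ti:     # TI still running: sit this room out
                continue
            log.append((start, node))
            last_start[node] = start
            quiet_since = start + msg_len         # bus is quiet again after the message
        room_open = quiet_since + sg              # SG of silence opens the next room
    return log


print(arinc629_schedule({"A": 1, "B": 2, "C": 3}, sg=4, ti=20, msg_len=2, horizon=30))
# [(1, 'A'), (5, 'B'), (10, 'C'), (21, 'A'), (25, 'B'), (30, 'C')]
```

In the output each node sends at most once per waiting room, and the room opened at time 16 stays empty because no node's TI has expired yet, so no single node can monopolize the channel.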


TTP (Time-Triggered Protocol)

Sources: Dr. Insup Lee & H. Kopetz

TTP – more than just a protocol

– Network protocol

– Operating system scheduling philosophy

– Fault tolerance approach

Time-Triggered approach

– Simple to implement

– Stable time base

– Cyclic schedules


TTP versions

• TTP/A (Automotive Class A = soft real time)

– A scaled-down version of TTP

– A cheaper master/slave variant

• Distributed master slave is expensive

• TTP/C (Automotive Class C = hard real time)

– A full version of TTP

– A fault-tolerant distributed variant


Protocol Layer in TTP/A


TTP/A: Polling

• Operation

– Master polls the other nodes (slaves)

– Non-master nodes transmit messages when they are polled

– Inter-slave communication through the master
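A generic master/slave polling round can be sketched in Python as follows. This is an illustration of polling in general, not the TTP/A message format (TTP/A organizes its rounds on a time-triggered schedule); relaying within the same round and the message tuple format are assumptions, and all names are hypothetical.

```python
def polling_round(slaves, outbox):
    """One polling round driven by the master.

    slaves: slave names in polling order.
    outbox: {slave: [(destination, payload), ...]}; entries are consumed.
    A slave transmits only when polled; a message for another slave is
    relayed by the master. Returns (bus_log, inbox).
    """
    bus_log = []
    inbox = {s: [] for s in slaves}
    for slave in slaves:
        bus_log.append(("master", slave, "poll"))
        for dest, payload in outbox.pop(slave, []):
            bus_log.append((slave, "master", payload))
            if dest in inbox:                              # inter-slave message
                bus_log.append(("master", dest, payload))  # relayed by the master
                inbox[dest].append((slave, payload))
    return bus_log, inbox


log, inbox = polling_round(["s1", "s2", "s3"], {"s1": [("s3", "temp=21")]})
print(len(log), inbox["s3"])  # 5 [('s1', 'temp=21')]
```

Even in this small round, three of the five bus transactions are polls, which shows the bandwidth overhead of polling.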


Polling Tradeoffs

• Advantages

– Simple protocol to implement

– Historically very popular

– Bounded latency for real-time applications

• Disadvantages

– Single point of failure from centralized master

– Polling consumes bandwidth

– Network size is fixed during installation

• Master can also discover nodes during reconfiguration


TTP/C

• TTP/C

– A time-triggered communication protocol for safety-critical (fault-tolerant) distributed real-time control systems

– Based on a TDMA media access strategy

• Clock synchronization: each node measures the difference between the expected and the observed arrival time of a message to calculate the difference between the sender’s and receiver’s clocks

– Fail silence

• A subsystem is fail-silent if it either produces correct results or no results at all, i.e., it is quiet in case it cannot deliver the correct service
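The clock synchronization idea can be sketched by collecting one deviation per received frame and combining them with a fault-tolerant average, which discards the k most extreme values on each side before averaging. This Python sketch is similar in spirit to the fault-tolerant averaging used in TTP but is simplified and not taken from the TTP/C specification (no measurement windows, no rate correction); the sign convention, names and numbers are hypothetical.

```python
def clock_deviations(expected, observed):
    """Deviation per received frame; positive means the sender's clock is ahead."""
    return [e - o for e, o in zip(expected, observed)]


def fault_tolerant_average(deviations, k=1):
    """Discard the k largest and k smallest values, then average the rest."""
    if len(deviations) <= 2 * k:
        raise ValueError("need more than 2k measurements")
    kept = sorted(deviations)[k:len(deviations) - k]
    return sum(kept) / len(kept)


expected = [100, 200, 300, 400]       # arrival times according to the local schedule
observed = [98, 201, 297, 450]        # one sender (the last) is badly off
devs = clock_deviations(expected, observed)
print(devs, fault_tolerant_average(devs))   # [2, -1, 3, -50] 0.5
```

Discarding the extremes means a single faulty clock (the -50 here) cannot drag the correction term, which fits the fault-tolerance goals of TTP/C.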


TTP/C Protocol Layer

(Figure: TTP/C protocol stack. From top to bottom, the layers handle application software in the host, FTU membership, redundancy management (RM), SRU membership, clock synchronization and media access (TDMA); the FTU CNI and the basic CNI (CNI: communication network interface) are the interfaces in the stack.)

Host Layer

Application software in host

FTU Layer (FTU: fault tolerance unit)

Group two or more nodes into FTUs

RM Layer (RM: redundancy management)

Provide the mechanisms for the cold start of a TTP/C cluster

SRU Layer (SRU: smallest replaceable unit)

Store the data fields of the received frames

Data Link/Physical Layer

Provide the means to exchange frames between the nodes


FTU Configuration Examples in TTP/C

(a) Two active nodes, two shadow nodes

(b) Triple modular redundancy: three active nodes with one shadow

(c) Two active nodes without a shadow node


• Controller to run the protocol

• DPRAM (dual-ported RAM)

– Used for a memory-mapped network interface

• BG (Bus Guard)

– Hardware watchdog to ensure “fail silent” behavior

• HW must use highly accurate time sources

– Dual redundant crystal oscillators are used for the Boeing 777


TTP/C Frame

• I-Frames used for initialization

• N-Frames used for normal messages


Cycle in TTP/C

• TDMA Cycle

– One FTU sends results twice

– Then next FTU sends results

– And so on, until back to the next message from the first FTU

• Cluster Cycle

– Cluster cycle involves scheduling all messages and tasks


TTP/A vs. TTP/C

Service | TTP/A | TTP/C
Clock synchronization | Central, multimaster | Distributed, fault-tolerant
Mode switches | yes | yes
Communication error detection | Parity | 16/24-bit CRC
Membership service | simple | full
External clock synchronization | yes | yes
Time-redundant transmission | yes | yes
Duplex nodes | no | yes
Duplex channels | no | yes
Redundancy management | no | yes
Shadow node | no | yes


Pros and Cons of TTP

• Advantages

– Simple protocol to implement

– Deterministic response time

– No wasted time for master polling messages

• Disadvantages

– Wasted bandwidth when some nodes are idle

– Requires stable clocks

– Fixed network size during installation


FlexRay

• Robust, scalable, deterministic, and fault-tolerant digital serial bus system designed for use in automotive applications

• Developed by a consortium: BMW, Ford, Bosch, Daimler-Chrysler…

– Specified in SDL; finalized in 2009

• Built as an extension of the TTP and Byteflight protocols

• Improved error tolerance and time-determinism

• Meets requirements with transfer rates much higher than CAN

– Initially targeted at ~10 Mbit/s

– Design allows much higher data rates

• TDMA (Time Division Multiple Access) protocol: fixed time slots with exclusive access to the bus

• Cycle subdivided into a static and a dynamic segment


TDMA in FlexRay

• Exclusive bus access enabled for a short time in each case

• Dynamic segment for transmission of variable-length information

• Fixed priorities in dynamic segment: minislots for each potential sender

• Bandwidth used only when it is actually needed

http://www.tzm.de/FlexRay/FlexRay_Introduction.html
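The minislot mechanism of the dynamic segment can be sketched in Python. Assumptions beyond the slide: a single channel, frame lengths given directly in minislots, and a frame that no longer fits in the remaining segment is simply not sent this cycle; action point offsets, the static segment and the network idle time are not modelled. Function and parameter names are hypothetical.

```python
def dynamic_segment(pending, frame_minislots, n_minislots):
    """Simulate one FlexRay dynamic segment; return (minislot, frame_id, node) sent.

    pending:         {frame_id: node} for nodes with data in this cycle
    frame_minislots: {frame_id: number of minislots the frame occupies}
    n_minislots:     length of the dynamic segment in minislots
    The slot counter starts at 1 and advances once per minislot or frame.
    """
    sent = []
    minislot, slot_counter = 0, 1
    while minislot < n_minislots:
        length = frame_minislots.get(slot_counter, 1)
        if slot_counter in pending and minislot + length <= n_minislots:
            sent.append((minislot, slot_counter, pending[slot_counter]))
            minislot += length          # a real frame stretches over several minislots
        else:
            minislot += 1               # nobody sends: only one short minislot is used
        slot_counter += 1
    return sent


print(dynamic_segment({2: "B", 5: "E"}, {2: 3, 5: 4}, n_minislots=10))
# [(1, 2, 'B'), (6, 5, 'E')]
```

Lower frame IDs are reached first, which gives them fixed priority, and each unused ID costs only one minislot, so bandwidth is consumed only when a node actually has data.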


Structure of FlexRay networks

• Bus Guardian (BG) protects the system against failing processors by gating access to the Bus Driver (BD)


Comparison of real-time protocols

FIP = Factory Instrumentation Protocol (WorldFIP); statically scheduled with centralized arbitration

LON = for building automation; uses TDMA with CSMA/CA and dynamically varies the number of slots per device for each schedule


Wireless communication

• Infrared (IR)

– Frequencies just below the visible light spectrum

– Diode emits infrared light to generate signal

– Infrared transistor detects signal

– Cheap to build but needs line of sight, limited range

– Data transfer rates between 9.6 kbps and 4 Mbps

• Radio frequency (RF)

– Electromagnetic wave frequencies in radio spectrum

– Analog circuitry and antenna needed on both sides

– Line of sight not needed, transmitter power determines range


RFID

• Use of EM field to transfer data, for identifying and tracking tags attached to objects; no need for line of sight

• Active vs. passive tags

– Active tags transmit their ID; they are low power (~10-100 µA) but higher cost ($10-$200/unit retail)

– Passive tags can be read by RF, have no intrinsic power consumption (powered by EM induction), and are cheaper ($0.20-0.40)

• Readers

– $100+ to $1000s, ranging from read and report to smart tracking, etc.

• Using RFID for real-time location systems (RTLS)

– Only active tags work; range of 100 m+ in line of sight, or 1-20 m obstructed

– Battery lasts up to years on a single charge at <1 Hz transmission rate

– Location accuracy as close as 30 cm with reader presence


Bluetooth, BLE, ZigBee

Bluetooth

• IEEE 802.15.1

• Developed and licensed by the Bluetooth Special Interest Group (SIG)

BLE

• Adopted into the Bluetooth specification

• Bluetooth Low Energy technology

ZigBee

• IEEE 802.15.4

• Maintained and published by the ZigBee Alliance


Side-by-Side Comparison

Feature | Bluetooth | BLE | ZigBee
Band | 2.4 GHz | 2.4 GHz | 2.4 GHz, 868 MHz, 915 MHz
Antenna/HW | Shared | Shared | Independent
Power | 100 mW | ~10 mW | 30 mW
Battery life | Days to months | 1-2 years | 6 months to 2 years
Range | 10-30 m | 10 m | 10-75 m
Data rate | 1-3 Mbps | 1 Mbps | 25-250 Kbps
Network topologies | Ad hoc, point to point, star | Ad hoc, point to point, star | Mesh, ad hoc, star
Time to wake and transmit | 3 s | 3 ms | 15 ms
Security | 128-bit encryption | 128-bit encryption | 128-bit encryption


Wireless Protocols: 802.11

• IEEE 802.11

– Standard for wireless LANs

– Specifies parameters for the PHY and MAC layers of the network

• PHY layer

– Handles transmission of data between nodes

– Data transfer rates up to 600 Mbit/s for 802.11n

– Operates in the 2.4/5 GHz frequency bands (RF)

• MAC layer

– Medium access control layer

– Protocol responsible for maintaining order in the shared medium

– Collision avoidance (CSMA/CA)


Summary

• Interfacing: on & off chip

• Real-time IO

– Profibus

– CAN

– ARINC

– TTP/A & TTP/C

– FlexRay

• Wireless

– IR, BLE, ZigBee, RFID, 802.11


Hardware/Software Codesign

Tajana Simunic Rosing

Department of Computer Science and Engineering

University of California, San Diego.


ES Design

(Figure: embedded system design flow, highlighting hardware components and verification and validation.)


System Architecture: Yesterday

(Figure: PCB design in which processor, cache, cache/DRAM controller, DRAM, graphics and video with VRAM, audio, motion, LAN, SCSI/IDE and external bus I/O are separate components, connected through the PCI bus and ISA/EISA add-in boards.)


A System Architecture: Today

HW/SW Codesign of a SoC

(Figure: a single SoC containing memory, cache/SRAM, a processor core, a DSP processor core, graphics and video with VRAM, encryption/decryption, motion, glue logic, and PCI, EISA, I/O and LAN interfaces plus SCSI.)


System Design Problem Areas

(Figure: a system with processor, ASIC, memory, interface, analog I/O and DMA blocks, annotated with four problem areas:)

1. Design environment, co-simulation, constraint analysis

2. HDL modeling: architectural synthesis, logic synthesis, physical synthesis

3. Software synthesis, optimization, retargetable code generation, debugging & programming environment

4. Test issues


HW-centric view of a Platform

(Figure: layers from the application space down through a HW-SW kernel (processor(s), RTOS(es) and SW architecture; MEM, CPU, FPGA) to scalable bus, test, power, I/O, clock and timing architectures plus a reference design. Blocks: programmable SW IP, hardware IP, pre-qualified/verified foundation IP*, foundry-specific HW qualification, a reconfigurable hardware region (FPGA, LPGA, …), and SW architecture characterisation.)

IP can be:

• HW or SW

• hard, soft or ‘firm’ (HW)

• source or object (SW)

Source: Grant Martin and Henry Chang, “Platform-Based Design: A Tutorial,” ISQED 2002, 18 March 2002, San Jose, CA.


SW-Centric View of Platforms

(Figure: application software runs on a platform API provided by the software platform (RTOS, BIOS, device drivers, network communication), which sits on the hardware platform with input devices, output devices and a network connection.)

Source: Grant Martin and Henry Chang, “Platform-Based Design: A Tutorial,” ISQED 2002, 18 March 2002, San Jose, CA.


HW/SW Codesign: Motivations

• Benefit from both HW and SW

– HW:

• Parallelism -> better performance, lower power

• Higher implementation cost

– SW:

• Sequential implementation -> great for some problems

• Lower implementation cost, but often slower and higher power


Software or hardware?

Decision based on hardware/software partitioning


Hardware/software codesign

(Figure: the specification is mapped onto processor P1, processor P2, and hardware.)


System Partitioning

– Good partitioning mechanism:

1) Minimize communication across the bus

2) Allow parallelism: both HW and CPU operate concurrently

3) Near-peak processor utilization at all times

(Figure: design steps labelled capture, model, partition and synthesize turn a specification into a processor part, HW, and an interface between them. Specification fragment:

    process (a, b, c)
        in port a, b;
        out port c;
    {
        read(a);
        write(c);
    }

Processor-side fragment:

    Line ()
    {
        a = …
        detach
    }
)


Determining Communication Level

– Easier to program at application level (send, receive, wait), but difficult to predict

– More difficult to specify at low level: difficult to extract from the program, but timing and resources are easier to predict

(Figure: communication levels between an application program (on an operating system and I/O driver) and custom application hardware (with its own I/O driver), both attached to the I/O bus: send, receive, wait at the application level; register reads/writes and interrupt service at the driver level; bus transactions and interrupts on the I/O bus.)


Partitioning Costs

• Software Resources

– Performance and power consumption

– Lines of code: development and testing cost

– Cost of components

• Hardware Resources

– Fixed number of gates, limited memory & I/O

– Difficult to estimate timing for custom hardware

– Recent design shift towards IP

• Well-defined resource and timing characteristics


Software Cost Analysis Process

(Figure: factors in the software cost analysis: functional blocks, feature points, source lines of code (SLOC), calibration, language conversion, equivalent SLOC including reuse, software development effort, software maintenance effort, software schedule, and software development and testing cost.)


Hardware Cost Analysis Process

(Figure: factors in the hardware cost analysis: I/O count, I/O format, Rent’s rule, die area, core area, gate count, S/G ratio, feature size, interconnect length, wafer characteristics, die yield, number up, design cost, tooling cost, test development cost, productivity and reuse, wafer fabrication and sawing cost, single-chip package cost, die cost, and chip hardware cost.)


Hardware/Software Partitioning

(Figure: a processor, memory, and two ASICs on a bus.)

Simple architectural model: CPU + 1 or more ASICs on a bus

• Properties of classic partitioning algorithms

– Single rate; Single-thread: CPU waits for ASIC

– Type of CPU is known; ASIC is synthesized


HW/SW Partitioning Styles

• HW first approach

– start with all-ASIC solution which satisfies constraints

– migrate functions to software to reduce cost

• SW first approach

– start with all-software solution which does not satisfy constraints

– migrate functions to hardware to meet constraints
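The SW-first style can be written as a greedy loop. This Python sketch assumes that per-function SW and HW execution times and HW costs are known and uses the single-thread model (the CPU waits for the ASIC, so times simply add); the "time saved per unit of HW cost" metric is an illustrative choice, not prescribed by the slides. A HW-first variant would run the mirror loop, moving functions back to software while the constraints still hold. All names and numbers are hypothetical.

```python
def sw_first_partition(functions, deadline):
    """Greedy SW-first partitioning; returns the set of functions moved to HW.

    functions: {name: (sw_time, hw_time, hw_cost)} with hw_cost > 0.
    Uses the single-thread model: the CPU waits for the ASIC, so the total
    execution time is the plain sum over all functions.
    Returns None if moving every useful function to HW still misses the deadline.
    """
    in_hw = set()

    def total_time():
        return sum(hw if name in in_hw else sw
                   for name, (sw, hw, _cost) in functions.items())

    while total_time() > deadline:
        candidates = [n for n, (sw, hw, _cost) in functions.items()
                      if n not in in_hw and hw < sw]
        if not candidates:
            return None
        # Migrate the function that saves the most time per unit of HW cost.
        best = max(candidates,
                   key=lambda n: (functions[n][0] - functions[n][1]) / functions[n][2])
        in_hw.add(best)
    return in_hw


funcs = {"fft": (50, 5, 30), "filter": (30, 3, 10), "ctrl": (10, 8, 20)}
print(sorted(sw_first_partition(funcs, deadline=60)))  # ['fft', 'filter']
```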


Codesign Verification

• Run SW on the CPU

• Simulate HW (Verilog)

(Figure: software processes 1 and 2 communicate over Unix sockets with a Verilog simulator; inside the simulator, a bus interface built with the Verilog PLI connects them to the hardware processes of the application-specific hardware.)


SpecC model


Foresight Co-Design Integrated Toolset

(Figure: co-design process. Inputs: system requirements capture (functional behavior block diagram, state machines, mini-specs, library elements, user-defined reusables) and resource specification (architecture block diagram, data flow, monitors, system characteristics). A cost analysis step (Ghost) uses gate count and lines of code derived from Foresight, together with I/O count, number up, die size, fab. cost, test cost, SCP cost, HW and SW development cost, development schedule and maintenance cost. Outputs: system performance metrics and system cost.)


Industry Initiatives

• Seamless Co-Verification Environment (CVE)

• Proridium (Foresight)

– Customers: Boeing, Microsoft, Raytheon, Oracle etc.

• CoWare (now in Synopsys)

– Cosimulation and IP integration

– One of founding members of SystemC (language)

• New FPGA synthesis tools incorporate CPUs

• Platform-based design

– Platform: predesigned architecture that designers can use to build systems for a given range of applications


ILP for HW/SW Partitioning

Ingredients:

• Cost function

• Constraints

Both involve linear expressions of integer variables from a set X:

Cost function:

$$C = \sum_{x_i \in X} a_i x_i \quad \text{with } a_i \in \mathbb{R},\ x_i \in \mathbb{N} \qquad (1)$$

Constraints:

$$\forall j \in J: \sum_{x_i \in X} b_{i,j} x_i \ge c_j \quad \text{with } b_{i,j}, c_j \in \mathbb{R} \qquad (2)$$

Def.: The problem of minimizing (1) subject to the constraints (2) is called an integer programming (IP) problem.

If all $x_i$ are constrained to be either 0 or 1, the IP problem is said to be a 0/1 integer programming problem.


FAQ on integer programming

Maximizing the cost can be done by minimizing C' = -C

Integer programming is NP-complete.

Running times can increase exponentially with problem size

Commercial solvers can handle problems with thousands of variables

IP models are a good starting point for modelling even if in the end heuristics have to be used to solve them.


IP model for HW/SW partitioning

Notation:

• Index set I denotes task graph nodes.

• Index set L denotes task graph node types, e.g., square root, DCT or FFT.

• Index set KH denotes hardware component types, e.g., hardware components for the DCT or the FFT.

• Index set J denotes hardware component instances.

• Index set KP denotes processors. All processors are assumed to be of the same type.

• T is a mapping from task graph nodes to their types: $T: I \to L$.

Therefore:

• $X_{i,k} = 1$ if node $v_i$ is mapped to hardware component type $k \in KH$

• $Y_{i,k} = 1$ if node $v_i$ is mapped to processor $k \in KP$

• $NY_{\ell,k} = 1$ if at least one node of type $\ell$ is mapped to processor $k \in KP$


Constraints

Operation assignment constraints

$$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$$

All task graph nodes have to be mapped either in software or in hardware.

Variables are assumed to be integers.

Additional constraints to guarantee they are either 0 or 1:

$$\forall i \in I,\ \forall k \in KH: X_{i,k} \le 1$$

$$\forall i \in I,\ \forall k \in KP: Y_{i,k} \le 1$$


Operation assignment constraints

$$\forall \ell \in L,\ \forall i: T(v_i) = c_\ell,\ \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$$

• For all types ℓ of operations and for all nodes i of this type:

– if i is mapped to some processor k, then that processor must implement the functionality of ℓ.

• Decision variables must also be 0/1 variables:

$$\forall \ell \in L,\ \forall k \in KP: NY_{\ell,k} \le 1$$


Resource & design constraints

• $\forall k \in KH$: the cost for components of that type should not exceed its maximum.

• $\forall k \in KP$: the cost for the associated data storage area should not exceed its maximum.

• $\forall k \in KP$: the cost for storing instructions should not exceed its maximum.

• The total cost ($\sum_{k \in KH}$) of HW components should not exceed its maximum.

• The total cost of data memories ($\sum_{k \in KP}$) should not exceed its maximum.

• The total cost of instruction memories ($\sum_{k \in KP}$) should not exceed its maximum.


Scheduling

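(Figure: task graph with nodes v1 to v11 mapped to processor p1, ASIC h1 (implementing FIR1 and FIR2), and communication channel c1 carrying edges e3 and e4. Along the time axis t, the order on each shared resource is open: v7 before v8 or v8 before v7 on p1; v3 before v4 or v4 before v3 for FIR2 on h1; e3 before e4 or e4 before e3 on c1.)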


Scheduling / precedence constraints

• For all nodes $v_{i1}$ and $v_{i2}$ that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable $b_{i1,i2}$ with $b_{i1,i2} = 1$ if $v_{i1}$ is executed before $v_{i2}$ and $b_{i1,i2} = 0$ otherwise. Define constraints of the type (end time of $v_{i1}$) ≤ (start time of $v_{i2}$) if $b_{i1,i2} = 1$, and (end time of $v_{i2}$) ≤ (start time of $v_{i1}$) if $b_{i1,i2} = 0$.

• Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.

• Timing constraints need to be met
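The three kinds of scheduling constraints can be illustrated by checking a finished schedule. The Python sketch below derives each $b_{i1,i2}$ from the start times of nodes that share a resource. It then checks the resource-sharing constraints, the task-graph precedence constraints and a deadline. In the ILP, the b values are decision variables rather than derived values. The class `ScheduledOp`, the durations and the edge v3 → v8 are made-up illustrations and are not taken from the figure.

```python
from dataclasses import dataclass


@dataclass
class ScheduledOp:
    start: int
    duration: int
    resource: str  # processor or hardware component instance the node is mapped to

    @property
    def end(self) -> int:
        return self.start + self.duration


def order_bits(ops: dict) -> dict:
    """b[(i1, i2)] for every pair sharing a resource: 1 if i1 starts first, else 0."""
    names = sorted(ops)
    bits = {}
    for pos, i1 in enumerate(names):
        for i2 in names[pos + 1:]:
            if ops[i1].resource == ops[i2].resource:
                bits[(i1, i2)] = 1 if ops[i1].start <= ops[i2].start else 0
    return bits


def schedule_is_feasible(ops: dict, edges: list, deadline: int) -> bool:
    # Resource sharing: end(i1) <= start(i2) if b = 1, end(i2) <= start(i1) if b = 0.
    for (i1, i2), b in order_bits(ops).items():
        first, second = (i1, i2) if b == 1 else (i2, i1)
        if ops[first].end > ops[second].start:
            return False
    # Precedence constraints from the task graph: a successor starts after its predecessor ends.
    if any(ops[src].end > ops[dst].start for src, dst in edges):
        return False
    # Timing constraint: every operation finishes by the deadline.
    return all(op.end <= deadline for op in ops.values())


ops = {
    "v7": ScheduledOp(start=0, duration=5, resource="p1"),
    "v8": ScheduledOp(start=5, duration=3, resource="p1"),
    "v3": ScheduledOp(start=0, duration=4, resource="h1"),
}
print(schedule_is_feasible(ops, edges=[("v3", "v8")], deadline=10))  # True
ops["v8"].start = 3  # v8 now overlaps v7 on p1
print(schedule_is_feasible(ops, edges=[("v3", "v8")], deadline=10))  # False
```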


Example

• HW types H1, H2 and H3 with costs of 20, 25, and 30.

• Processors of type P.

• Tasks T1 to T5.

• Execution times (a dash means the task cannot be executed on that resource):

T    H1   H2   H3   P
1    20   -    -    100
2    -    20   -    100
3    -    -    12   10
4    -    -    12   10
5    20   -    -    100


Operation assignment constraint


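$X_{1,1} + Y_{1,1} = 1$ (task 1 mapped to H1 or to P)

$X_{2,2} + Y_{2,1} = 1$

$X_{3,3} + Y_{3,1} = 1$

$X_{4,3} + Y_{4,1} = 1$

$X_{5,1} + Y_{5,1} = 1$

These instantiate $\forall i \in I: \sum_{k \in K_H} X_{i,k} + \sum_{k \in K_P} Y_{i,k} = 1$.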


Operation assignment constraint

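• Assume the types of tasks 1 to 5 are ℓ = 1, 2, 3, 3, and 1.

$\forall \ell \in L, \forall i: T(v_i) = c_\ell, \forall k \in K_P: NY_{\ell,k} \ge Y_{i,k}$

Functionality 3 must be implemented on the processor if node 4 is mapped to it, i.e. $NY_{3,1} \ge Y_{4,1}$.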
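The Python sketch below generates every instance of this constraint for the example. It uses the task types from the slide and indexes the single processor type P as k = 1, matching $Y_{i,1}$ above. The text format of the generated constraints is only for display and is not a solver input format.

```python
# Types of tasks 1..5 as given on the slide: l = 1, 2, 3, 3, 1.
TASK_TYPES = {1: 1, 2: 2, 3: 3, 4: 3, 5: 1}
# The example has a single processor type P, indexed here as k = 1 (as in Y_{i,1}).
PROCESSORS = [1]


def functionality_constraints(task_types: dict, processors: list) -> list:
    """One constraint NY[l,k] >= Y[i,k] per node i and processor k, where l is the type of i."""
    return [
        f"NY[{task_types[i]},{k}] >= Y[{i},{k}]"
        for i in sorted(task_types)
        for k in processors
    ]


for constraint in functionality_constraints(TASK_TYPES, PROCESSORS):
    print(constraint)
```

The output includes both `NY[3,1] >= Y[3,1]` and `NY[3,1] >= Y[4,1]`. Tasks 3 and 4 share functionality 3, so mapping either of them to the processor forces the processor to implement it.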


Other equations

• Time constraint: application-specific hardware is required for time constraints under 100 time units.


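Cost function:

$C = 20 \cdot \#(H1) + 25 \cdot \#(H2) + 30 \cdot \#(H3) + \mathrm{cost}(\text{processor}) + \mathrm{cost}(\text{memory})$

where #(Hk) is the number of instances of hardware type Hk.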


Result

• For a time constraint of 100 time units and cost(P) < cost(H3):


Solution:

T1 → H1

T2 → H2

T3 → P

T4 → P

T5 → H1
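The Python sketch below reproduces this result for the five-task example. It uses brute-force enumeration of all 2^5 mappings instead of an ILP solver, which is feasible only because the example is so small. The sketch rests on three assumptions that the slides do not state:

* The tasks form a chain T1 → … → T5, so execution times simply add up.
* Each hardware type used costs one instance, and memory cost is ignored.
* `PROCESSOR_COST = 10` is a hypothetical value. The slides only require cost(P) < cost(H3).

```python
from itertools import product

# Execution times from the example table; a missing entry means the task
# cannot run on that resource.
EXEC_TIME = {
    "T1": {"H1": 20, "P": 100},
    "T2": {"H2": 20, "P": 100},
    "T3": {"H3": 12, "P": 10},
    "T4": {"H3": 12, "P": 10},
    "T5": {"H1": 20, "P": 100},
}
HW_COST = {"H1": 20, "H2": 25, "H3": 30}
PROCESSOR_COST = 10  # hypothetical: the slides only state cost(P) < cost(H3)
DEADLINE = 100


def cost(mapping: dict) -> int:
    """C = 20 #(H1) + 25 #(H2) + 30 #(H3) + cost(P); assumes one instance per type, no memory cost."""
    used = set(mapping.values())
    hw = sum(HW_COST[r] for r in used if r in HW_COST)
    return hw + (PROCESSOR_COST if "P" in used else 0)


def makespan(mapping: dict) -> int:
    """Assumes the tasks run one after another (chain T1 -> ... -> T5), so times add up."""
    return sum(EXEC_TIME[task][res] for task, res in mapping.items())


def cheapest_feasible_mapping():
    tasks = list(EXEC_TIME)
    best = None
    # Enumerate all 2^5 mappings instead of solving the ILP.
    for choice in product(*(EXEC_TIME[t] for t in tasks)):
        mapping = dict(zip(tasks, choice))
        if makespan(mapping) > DEADLINE:
            continue
        if best is None or cost(mapping) < cost(best):
            best = mapping
    return best


best = cheapest_feasible_mapping()
print(best, cost(best), makespan(best))
```

Under these assumptions, any of T1, T2 or T5 on P already uses the whole 100-unit budget, so those three tasks must go to hardware. T3 and T4 then cost 45 + cost(P) on the processor versus 75 on H3. This is why the slide's result depends on cost(P) < cost(H3).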


Separation of scheduling and partitioning

• Combined scheduling and partitioning is very complex. Heuristic:

1. Compute an estimated schedule.

2. Perform partitioning for the estimated schedule.

3. Perform final scheduling.

4. If the final schedule does not meet the time constraint, go to 1 using a reduced overall timing constraint.

Figure: 1st and 2nd iterations on a time axis t. Each compares the approximate execution time and the actual execution time against the specification (1st iteration) or the new, reduced specification (2nd iteration).
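The four-step heuristic can be sketched as a loop in Python. The partitioner and the final scheduler are passed in as callables because the slides do not specify them. `toy_partition` and `toy_final_schedule` are hypothetical stand-ins in which the estimate always misses 8 time units. The sketch reduces the constraint by the observed overshoot. This is one possible choice; the slides only say "reduced".

```python
from typing import Callable, Optional, Tuple


def partition_iteratively(
    deadline: int,
    partition_for: Callable[[int], dict],
    final_schedule_length: Callable[[dict], int],
    max_iterations: int = 10,
) -> Optional[Tuple[dict, int]]:
    """Repeat estimate -> partition -> final schedule, tightening the constraint on failure."""
    constraint = deadline
    for _ in range(max_iterations):
        partition = partition_for(constraint)       # steps 1-2: estimated schedule + partitioning
        actual = final_schedule_length(partition)   # step 3: final scheduling
        if actual <= deadline:
            return partition, actual
        constraint -= actual - deadline             # step 4: reduce the timing constraint (by the overshoot)
    return None  # no partition met the deadline within max_iterations


# Hypothetical stand-ins: the partitioner exactly meets the constraint according to
# its estimate, but the final schedule is always 8 time units longer.
def toy_partition(constraint: int) -> dict:
    return {"estimated_time": constraint}


def toy_final_schedule(partition: dict) -> int:
    return partition["estimated_time"] + 8


print(partition_iteratively(100, toy_partition, toy_final_schedule))
```

With these stand-ins, the 1st iteration yields 108 > 100. The 2nd iteration runs with a constraint of 92 and meets the deadline exactly, mirroring the "new specification" in the figure.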


Summary

• HW/SW codesign is complicated and limited by performance estimates

• Algorithms are an active area of research and development

– much of the work is still done by expert designers


Sources and References

• Peter Marwedel, “Embedded Systems Design,” 2004.

• Giovanni De Micheli @ EPFL

• Vincent Mooney @ Gatech

• Nikil Dutt @ UCI


CMOS VLSI Trends

Figure: CMOS implementation styles for Yesterday (1980s), Today and Tomorrow. Labels include memory, gate arrays, ASICs, processors, structured ASICs, reconfigurable logic, SoCs, platform SoCs, custom SoCs, structured SoCs, and structured ASICs and reconfigurable logic with no processor.


Increasing Customization Cost

Example: a design with 80 M transistors in 100 nm technology

• Estimated cost: $85 M to $90 M; 12 to 18 months

• Top cost drivers:

– Verification (40%)

– Architecture design (26%)

– Embedded design: 1400 man-months (SW), 1150 man-months (HW)

– HW/SW integration

*Handel H. Jones, "How to Slow the Design Cost Spiral," Electronics Design Chain, September 2002, www.designchain.com


Responses to Increasing Cost

• General purpose ISA

– Universality → high volumes and reuse

– Abstraction → compilation technologies and high application/development productivity

• Custom silicon for embedded platforms in sufficiently high volumes

– Domain specific ISAs, e.g., DSPs

– Application Specific Standard Products

– Reconfigurable hardware

• HW/SW Codesign


HW/SW Codesign Issues

• Task-level concurrency management: which tasks will exist in the final system?

• High-level transformations: transformations outside the scope of traditional compilers

• Hardware/software partitioning: which operations are mapped to hardware, and which to software?

• Compilation: hardware-aware compilation

• Scheduling: performed several times, with varying precision

• Design space exploration: a set of possible designs, not just one


Partitioning Analysis

• The result of compilation is synthesizable HDL and assembly code for the processor.

• The compiler and profiler determine dependences and rough performance estimates.


HW & SW Foundries

• HW1

– LSI Logic ASIC Wafer Foundry Data: 0.18 µm feature size, 8-inch wafers, 6 layers

– TSMC 018 Wafer Processing

• HW2

– Samsung Semiconductor ASIC Wafer Foundry Data: 0.35 µm feature size, 6-inch wafers, 4 layers

– TSMC 035 Wafer Processing

• SW1: nominal to high development effort

• SW2: low to nominal development effort


Figure: MIXED implementation using HW1 and SW1. Stacked bars show software development, packaging, fabrication, tooling, design and testing as a percent of total cost. The bars are plotted against recurring production quantity (1,000, 10,000 and 100,000 units) and level of reuse of gate-level IP and code (none, 20%, 40%).


Figure: total cost per chip ($/chip, 0 to 45) versus percent custom hardware (0 to 100%) at 10,000 units, for the HW1/SW1, HW1/SW2, HW2/SW1 and HW2/SW2 combinations.


Co-simulation for HW & SW

• Transistor-level accurate

– post-layout SPICE model

• Gate-level accurate

– precise HDL gate delay model

• Cycle accurate

– correct transitions at clock edges

– timing information between edges is thrown away

• Bus accurate

– cycle-accurate bus model

– behavioral model of processor, hardware

• Instruction set accurate

– instruction set simulator used for processors

– used for early design space exploration