High Performance Embedded Systems July 2020 Electronics Engineering Department Electronics Master Program MPSoCs

Page 1: High Performance Embedded Systems MPSoCs

High Performance Embedded Systems

July 2020

Electronics Engineering Department

Electronics Master Program

MPSoCs

Page 2: High Performance Embedded Systems MPSoCs

Outline

2

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

Page 3: High Performance Embedded Systems MPSoCs

3

Multiprocessors Architecture and Taxonomy

Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/

Intel 4004 Core i9??

Page 5: High Performance Embedded Systems MPSoCs

5

Multiprocessors Architecture and Taxonomy

Taken from: https://arstechnica.com/gadgets/2020/05/intels-comet-lake-desktop-cpus-are-here/

Exynos 7420 finFET transistors

Page 7: High Performance Embedded Systems MPSoCs

7

Multiprocessors Architecture and Taxonomy

Taken from: https://www.researchgate.net/publication/257711815_Where_Photovoltaics_Meets_Microelectronics/figures?lo=1

Page 8: High Performance Embedded Systems MPSoCs

8

Multiprocessors Architecture and Taxonomy

Taken from: https://www.semiconductor-digest.com/2020/03/10/transistor-count-trends-continue-to-track-with-moores-law/

Page 9: High Performance Embedded Systems MPSoCs

9

Multiprocessors Architecture and Taxonomy

Taken from: https://www.elprocus.com/difference-between-soc-system-on-chip-single-board-computer/

SoC

Page 10: High Performance Embedded Systems MPSoCs

10

Multiprocessors Architecture and Taxonomy

Taken from: http://soc.inha.ac.kr/index.php/Project

2-Parallel Radix-2^4 FFT/IFFT Processor Chip for MB-OFDM UWB communications

Page 11: High Performance Embedded Systems MPSoCs

11

Multiprocessors Architecture and Taxonomy

Taken from: PrSoC: Programmable System-on-chip (SoC) for silicon prototyping IEEE 2008

Page 12: High Performance Embedded Systems MPSoCs

12

Multiprocessors Architecture and Taxonomy

Taken from: https://www.elprocus.com/difference-between-soc-system-on-chip-single-board-computer/

SoC

MPSoC

Page 13: High Performance Embedded Systems MPSoCs

13

Multiprocessors Architecture and Taxonomy

Taken from: https://commons.wikimedia.org/wiki/File:ARM-Cortex-A9.gif

MPSoCs?

Page 14: High Performance Embedded Systems MPSoCs

14

Multiprocessors Architecture and Taxonomy

SoC

Taken from: W. Wolf Multiprocessor Systems-On-Chip

• An SoC is an integrated circuit that implements most or all of the functions of a complete electronic system.

• The most fundamental characteristic of an SoC is complexity.

Page 15: High Performance Embedded Systems MPSoCs

15

Multiprocessors Architecture and Taxonomy

SoC

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Many product categories:

• Cell phones.

• Telecommunications and networking.

• Digital television.

• Video games.

• …

Page 16: High Performance Embedded Systems MPSoCs

16

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Processing Elements

Page 17: High Performance Embedded Systems MPSoCs

17

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Memory

Page 18: High Performance Embedded Systems MPSoCs

18

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Communications

Page 19: High Performance Embedded Systems MPSoCs

19

Multiprocessors Architecture and Taxonomy

SoC Example

Taken from: W. Wolf Multiprocessor Systems-On-Chip

MPSoCs?

Page 20: High Performance Embedded Systems MPSoCs

20

Multiprocessors Architecture and Taxonomy

MPSoCs?

Wait!

What is a Parallel Architecture?

Page 26: High Performance Embedded Systems MPSoCs

26

Multiprocessors Architecture and Taxonomy

Parallel Architecture

“A large collection of processing elements that communicate and cooperate to solve large problems fast”. - Almasi.

Taken from: M. Aguilar MPSoCs

SoC (HW+SW) → MPSoCs, as technology improved

Page 27: High Performance Embedded Systems MPSoCs

27

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Serial Communication

Parallel Communication

Page 28: High Performance Embedded Systems MPSoCs

28

Multiprocessors Architecture and Taxonomy

Here we go

What are MPSoCs?

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Page 30: High Performance Embedded Systems MPSoCs

30

Multiprocessors Architecture and Taxonomy

What are MPSoCs?

“Are the latest incarnation of very-large-scale integration (VLSI) technology”

Taken from: W. Wolf Multiprocessor Systems-On-Chip

???

• Silicon

• Power

• Area

• …

Page 31: High Performance Embedded Systems MPSoCs

31

Multiprocessors Architecture and Taxonomy

What are MPSoCs?

“Are the latest incarnation of very-large-scale integration (VLSI) technology”

“A single integrated circuit can contain over 100 million transistors, and the International Technology Roadmap for Semiconductors predicts that chips with a billion transistors are within reach”

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Page 32: High Performance Embedded Systems MPSoCs

32

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

“The multiprocessor System-on-Chip (MPSoC) is a system-on-a-chip (SoC) which uses multiple processors (see multi-core), usually targeted for embedded applications”.

SoC

HW+SW

MPSoCs Understood!!

Page 33: High Performance Embedded Systems MPSoCs

33

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

“The multiprocessor system-on-chip (MPSoC) uses multiple CPUs along with other hardware subsystems to implement a system”. - Wayne Wolf.

Multiprocessor = Multicore?

Page 34: High Performance Embedded Systems MPSoCs

34

Multiprocessors Architecture and Taxonomy

General Structure of MPSoCs

Processing Elements (PE)

• Chosen according to the application context and requirements.

• Homogeneous MPSoCs.

• Heterogeneous MPSoCs.

Interconnection Element

• Buses.

• NoCs (Networks on Chip).

Taken from: M. Aguilar MPSoCs

Page 35: High Performance Embedded Systems MPSoCs

35

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Advantages of MPSoCs

Performance

• Powerful platform (cores).

• Users.

• Applications.

• Tasks within the same application.

Power Consumption

• Lower power consumption thanks to the parallel approach.
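
One common way to reason about the low-power claim is the CMOS dynamic power relation P = C · V² · f: if the work is split across two cores, each core can run at half the clock, which usually allows a lower supply voltage. The sketch below uses assumed, illustrative values (`C_EFF`, `V_FULL`, `V_HALF` and `F_FULL` are not taken from the lecture or from any real chip) and ignores static leakage and parallelization overhead.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power of CMOS logic: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Illustrative (assumed) numbers, not measurements of any real device.
C_EFF = 1e-9    # effective switched capacitance per core, in farads
F_FULL = 1.0e9  # clock one core needs to meet the deadline, in hertz
V_FULL = 1.0    # supply voltage at F_FULL, in volts
V_HALF = 0.7    # assumed supply voltage that suffices at F_FULL / 2

# One core doing all the work at full speed.
single_core = dynamic_power(C_EFF, V_FULL, F_FULL)

# Two cores sharing the work, each at half the clock and a lower voltage.
dual_core = 2 * dynamic_power(C_EFF, V_HALF, F_FULL / 2)

print(f"one core  at full clock: {single_core:.2f} W")
print(f"two cores at half clock: {dual_core:.2f} W")
```

Under these assumptions the two-core configuration delivers the same throughput for roughly half the dynamic power, because the voltage term is squared. The saving shrinks if the voltage cannot be lowered or if the workload does not parallelize well.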

Page 36: High Performance Embedded Systems MPSoCs

36

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Page 37: High Performance Embedded Systems MPSoCs

37

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Benefits

• Wireless.

• Multimedia: video and audio.

• Health.

• Military.

• Avionics.

• Aerospace.

Page 38: High Performance Embedded Systems MPSoCs

38

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Multiprocessor = Multicore?

Page 39: High Performance Embedded Systems MPSoCs

39

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Multiprocessor

• Platform with several CPUs.

• Uses a parallel approach.

Multicore

• Platform with only one CPU.

• Multiple cores inside that CPU.

Page 40: High Performance Embedded Systems MPSoCs

40

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Software

Page 42: High Performance Embedded Systems MPSoCs

42

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Parallel Approaches

• Bits

• Threads

• Tasks

• Instructions

• Data

Page 43: High Performance Embedded Systems MPSoCs

43

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Architecture?

Page 45: High Performance Embedded Systems MPSoCs

45

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs

PEs

• Homogeneous

• Heterogeneous

Page 46: High Performance Embedded Systems MPSoCs

46

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Heterogeneous MPSoCs

• Different PEs, for example:

• GPU (Graphics Processing Unit).

• DSPs.

• HW acceleration.

• NoC infrastructure.

• Better performance and power consumption.

• Used in embedded systems:

• Portable systems.

• Power consumption.

Page 47: High Performance Embedded Systems MPSoCs

47

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Homogeneous MPSoCs

• Identical PEs make up the SoC.

• The same PE is instantiated several times.

• The instances are connected by a communication infrastructure.

• Flexibility and scalability.

• Worse power consumption than heterogeneous MPSoCs.

Page 48: High Performance Embedded Systems MPSoCs

48

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Taxonomy?

Page 49: High Performance Embedded Systems MPSoCs

49

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Processor Organization

• Serial
    • SISD: uniprocessor, multi-ALU, overlapped operations
• Parallel
    • SIMD: vector processor, array processor
    • MISD
    • MIMD
        • Tightly coupled (shared memory): symmetric multiprocessor (SMP), nonuniform memory access (NUMA)
        • Loosely coupled (distributed memory): clusters

Page 50: High Performance Embedded Systems MPSoCs

50

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Where are MPSoCs located?

Page 51: High Performance Embedded Systems MPSoCs

51

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Processor Organization

• Serial
    • SISD: uniprocessor, multi-ALU, overlapped operations
• Parallel
    • SIMD: vector processor, array processor
    • MISD
    • MIMD
        • Tightly coupled (shared memory): symmetric multiprocessor (SMP), nonuniform memory access (NUMA)
        • Loosely coupled (distributed memory): clusters

MPSoCs

Page 52: High Performance Embedded Systems MPSoCs

52

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs and Parallel Computing Lectures Notes

MIMD

• This architecture executes different operations (multiple instruction streams) on different data streams.

• Multiprocessing approaches and MPSoCs fall into this category.

Page 53: High Performance Embedded Systems MPSoCs

53

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

MPSoCs Architecture

• PEs: Homogeneous, Heterogeneous

• Memory Access: Uniform Access (UMA), Non-Uniform Access (NUMA)

• Processors Symmetry: SMP (Symmetric Multi-processing), AMP (Asymmetric Multi-processing)

• Memory Architecture: Shared memory, Distributed memory

Page 54: High Performance Embedded Systems MPSoCs

54

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

ARM Cortex A9

Page 55: High Performance Embedded Systems MPSoCs

55

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Analog Devices - Blackfin

Page 56: High Performance Embedded Systems MPSoCs

56

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

TI Davinci DM355

Page 57: High Performance Embedded Systems MPSoCs

57

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

TI OMAP5

Page 58: High Performance Embedded Systems MPSoCs

58

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

STMicroelectronics Nomadik

Page 59: High Performance Embedded Systems MPSoCs

59

Multiprocessors Architecture and Taxonomy

Taken from: M. Aguilar MPSoCs

Nexperia

Page 60: High Performance Embedded Systems MPSoCs

60

Multiprocessors Architecture and Taxonomy

Taken from: http://linuxgizmos.com/new-arm-cortex-a72-nearly-twice-as-fast-as-cortex-a57/

Cortex-A72

Page 62: High Performance Embedded Systems MPSoCs

62

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Page 63: High Performance Embedded Systems MPSoCs

63

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others

All these can be implemented on any architecture.

Page 65: High Performance Embedded Systems MPSoCs

65

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Shared Memory

• Tasks share a common address space, which they read and write asynchronously.

• Various mechanisms such as locks/semaphores may be used to control access to the shared memory.

• Advantage

• No need to explicitly communicate data between tasks, which simplifies programming.

• Disadvantages

• Care is needed when managing memory, to avoid synchronization conflicts.

• Harder to control data locality.
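
The lock-based access control described above can be sketched as follows. This is a minimal illustration using Python's standard `threading` module; the names `counter`, `counter_lock`, `worker` and `run` are made up for the example.

```python
import threading

counter = 0                      # shared state in the common address space
counter_lock = threading.Lock()  # guards every update to `counter`


def worker(increments):
    """Task that updates the shared counter under the lock."""
    global counter
    for _ in range(increments):
        with counter_lock:       # only one task may update at a time
            counter += 1


def run(num_threads=4, increments=10_000):
    """Start the tasks, wait for them, and return the final counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run()
print(total)
```

No data is passed between the tasks explicitly; they only agree on the lock. Removing the `with counter_lock:` line turns the read-modify-write into a race, which is the kind of synchronization conflict the slide warns about.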

Page 66: High Performance Embedded Systems MPSoCs

66

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

In Hardware

• Shared memory systems use:

• UMA (Uniform Memory Access)

• NUMA (Non-Uniform Memory Access)

• COMA (Cache-only memory architecture)

In Software

• Inter-process communication (IPC).

• Virtual memory mapping.

Page 68: High Performance Embedded Systems MPSoCs

68

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Threads

• A thread can be considered as a subroutine in the main program.

• Threads communicate with each other through the global memory.

• Commonly associated with shared memory architectures and operating systems.

• Implementations: POSIX Threads (pthreads) and OpenMP.
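
The idea of a thread as a subroutine of the main program can be sketched with Python's standard `threading` module, used here only as a stand-in for pthreads or OpenMP. The function `square_into` and the lists `values` and `results` are hypothetical names for the example.

```python
import threading


def square_into(results, index, value):
    """Subroutine executed by one thread; it writes only to its own slot."""
    results[index] = value * value


values = [1, 2, 3, 4]
results = [None] * len(values)   # global memory visible to every thread

# The main program launches one thread per subroutine call.
threads = [threading.Thread(target=square_into, args=(results, i, v))
           for i, v in enumerate(values)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait until every subroutine has finished

print(results)
```

Each thread writes to a distinct slot, so no lock is needed here. In CPython the global interpreter lock keeps this from speeding up CPU-bound work, so the sketch shows the programming model rather than a performance gain.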

Page 69: High Performance Embedded Systems MPSoCs

69

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Threads

Advantages

• Responsiveness.

• Faster execution.

• Lower resource consumption.

• Better system utilization.

• Simplified sharing and communication.

• Parallelization.

Drawbacks

• Synchronization.

• A thread that crashes can bring down the whole process.

Page 71: High Performance Embedded Systems MPSoCs

71

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Message Passing

• A set of tasks that use their own local memory during computation.

• Data is exchanged by sending and receiving messages.

• Data transfer usually requires cooperative operations to be performed by each process.

• For example, a send operation must have a matching receive operation.

• Typical implementation: MPI (Message Passing Interface).
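
The matching send/receive pattern can be sketched without an MPI installation. In the example below, threads and `queue.Queue` from the Python standard library stand in for processes with separate memories and for the message channel; this is an assumption made for brevity, and the names `producer`, `consumer`, `channel` and `replies` are hypothetical.

```python
import queue
import threading


def producer(outbox, items):
    """Task with private data; it shares it only by sending messages."""
    for item in items:
        outbox.put(item)         # send
    outbox.put(None)             # sentinel: no more messages


def consumer(inbox, replies):
    """Task that keeps receiving until the sentinel arrives."""
    local_sum = 0                # local memory of this task
    while True:
        message = inbox.get()    # matching receive for each send
        if message is None:
            break
        local_sum += message
    replies.put(local_sum)       # the result is reported as a message too


channel = queue.Queue()
replies = queue.Queue()
tasks = [threading.Thread(target=producer, args=(channel, [1, 2, 3, 4])),
         threading.Thread(target=consumer, args=(channel, replies))]
for t in tasks:
    t.start()
for t in tasks:
    t.join()

total = replies.get()
print(total)
```

The two tasks never touch each other's variables; everything crosses the channel. With a real MPI binding the `put`/`get` pairs would become send and receive calls between ranks.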

Page 73: High Performance Embedded Systems MPSoCs

73

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Data Parallel

• Consider the following characteristics:

• Parallel work performs operations on a data set organized into a common structure.

• Tasks work collectively on the same data structure, with each task working on a different partition.

• Tasks perform the same operation on their partition.

• On shared memory architectures, all tasks may have access to the data structure through global memory.

• On distributed memory architectures, the data structure is split up and resides as “chunks” in the local memory of each task.
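
A minimal sketch of the data-parallel model: one data structure is split into partitions and every task applies the same operation to its own partition. The helper names `partition` and `scale_chunk` are hypothetical, and a thread pool from the standard library is assumed as the set of tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split `data` into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def scale_chunk(chunk, factor=2):
    """The single operation every task applies to its own partition."""
    return [factor * x for x in chunk]


data = list(range(10))
chunks = partition(data, 4)

# Each worker receives one partition and runs the same operation on it.
with ThreadPoolExecutor(max_workers=4) as pool:
    scaled_chunks = list(pool.map(scale_chunk, chunks))

# Reassemble the partitions in their original order.
scaled = [x for chunk in scaled_chunks for x in chunk]
print(scaled)
```

On a distributed memory system the chunks would live in each task's local memory and the final gather would be an explicit communication step.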

Page 75: High Performance Embedded Systems MPSoCs

75

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Hybrid

• Combines several models (for example OpenMP/MPI).

• Single Program Multiple Data (SPMD)

• A single program is executed by all tasks simultaneously.

• Multiple Program Multiple Data (MPMD)

• Has multiple executables. Each task can execute the same program as the other tasks or a different one.
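
The difference between SPMD and MPMD can be sketched as follows. A thread pool is assumed as the set of tasks, and `spmd_program`, `mpmd_min` and `mpmd_max` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_program(rank, size, data):
    """Same program for every task; the rank selects the data slice."""
    return sum(data[rank::size])


def mpmd_min(data):
    """First of two different programs."""
    return min(data)


def mpmd_max(data):
    """Second of two different programs."""
    return max(data)


data = list(range(1, 9))   # the values 1..8
size = 4                   # number of SPMD tasks

with ThreadPoolExecutor(max_workers=size) as pool:
    # SPMD: four tasks run one program, each on a different part of the data.
    partial_sums = list(
        pool.map(lambda rank: spmd_program(rank, size, data), range(size)))
    # MPMD: two tasks run two different programs on the data.
    lowest = pool.submit(mpmd_min, data)
    highest = pool.submit(mpmd_max, data)
    extremes = (lowest.result(), highest.result())

print(partial_sums, sum(partial_sums), extremes)
```

In the SPMD half only the `rank` argument differs between tasks, which mirrors how MPI programs typically branch on the process rank.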

Page 76: High Performance Embedded Systems MPSoCs

76

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Consider the following approaches:

• Shared memory.

• Threads.

• Message Passing.

• Data Parallel.

• Hybrid.

• Others. (Depends on the architecture)

All these can be implemented on any architecture.

Page 77: High Performance Embedded Systems MPSoCs

77

Parallel Execution Mechanism

Taken from: Parallel Computing Lectures Notes

Others

• MCAPI (Multicore Association)

• Poly-Platform

• CUDA

Page 79: High Performance Embedded Systems MPSoCs

79

Parallel Execution Mechanism

Taken from: https://en.wikipedia.org/wiki/Multicore_Association

MCAPI (Multicore Association)

• The Multicore Association was founded in 2005.

• Its first specification is referred to as MCAPI (Multicore Communications API).

• Based on message passing.

• Targets systems that are heterogeneous in hardware, toolchain and programming language.

• Active working groups:

• MCAPI.

• Virtualization.

• Open Asymmetric Multiprocessing (OpenAMP).

Page 81: High Performance Embedded Systems MPSoCs

81

Parallel Execution Mechanism

Taken from: http://polycoresoftware.com/poly-platform

Poly-Platform

• A collection of productivity tools.

• Supports the migration process.

• Mainly targets multicore platforms.

• Provides support for several SoCs, OSes and transports.

Page 83: High Performance Embedded Systems MPSoCs

83

Parallel Execution Mechanism

Taken from: https://en.wikipedia.org/wiki/CUDA

CUDA

• Initial release: 2007.

• Parallel computing platform and application programming interface.

• Created by NVIDIA.

• GPU approach.

• Supported on Windows, Linux and macOS.

Page 85: High Performance Embedded Systems MPSoCs

85

Multiprocessors Design Techniques

Taken from: W.Wolf High-Performance Embedded Computing

Embedded Systems Design Flows

• Co-design flows.

• Platform-based design.

• Two-stage process.

• Programming platforms.

• Standards-Based design.

MPSoCs?

Page 86: High Performance Embedded Systems MPSoCs

86

Multiprocessors Design Techniques

Challenges

• Software development is a major challenge for MPSoC designers.

• Software that runs on the multiprocessor must be high performance, real time, and low power.

• Each MPSoC requires its own software development environment: compiler, debugger, simulator, and other tools.

• A better understanding is needed of how to abstract tasks properly to capture the essential characteristics of their low-level behavior for system-level analysis.

Taken from: W. Wolf Multiprocessor Systems-On-Chip

Page 87: High Performance Embedded Systems MPSoCs

87

Multiprocessors Design Techniques

Taken from: W. Wolf Multiprocessor Systems on Chip

Challenges

• Networks-on-chips have emerged over the past few years as an architectural approach to the design of single-chip multiprocessors.

• FPGAs have emerged as a viable alternative to application-specific integrated circuits (ASICs) in many markets. FPGA fabrics are also starting to be integrated into SoCs.

Page 88: High Performance Embedded Systems MPSoCs

88

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Challenges

• Sequential C code is not easy to replace.

• The algorithm specification contains parallel specifications (models of computation such as KPN, SDF, etc.).

• No new programming languages.

• Automatic parallel programming.

• Platform-based design (SW synthesis) or SW and HW synthesis.

Page 89: High Performance Embedded Systems MPSoCs

89

Multiprocessors Design Techniques

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

Challenges

All MPSoC designs have the following requirements:

• Speed.

• Power.

• Area.

• Application Performance.

• Time to market.

Page 90: High Performance Embedded Systems MPSoCs

90

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

MPSoCs Programming

• Task mapping to processors or cores.

• Inter-processor communication management.

• Data transfer engine management.

• Shared resource management.

• Memory management.

• Debugging.

Page 91: High Performance Embedded Systems MPSoCs

91

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

MPSoCs Exploration

• Separate computation from communication.

Page 92: High Performance Embedded Systems MPSoCs

92

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Processing Unit VPU

• Load simulator: a high-level simulation of the core behavior.

• Functional simulator: native execution of tasks; scheduling is given by the VPU OS.

Page 93: High Performance Embedded Systems MPSoCs

93

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Processing Unit VPU

Allows spatial and temporal modeling of task mapping to PE

Page 94: High Performance Embedded Systems MPSoCs

94

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Platform

• It is a software model that allows the exploration of hardware and software.

• It allows hardware platform exploration and optimization.

• Software development, debugging and optimization.

• Concurrent hardware and software design.

Page 95: High Performance Embedded Systems MPSoCs

95

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Virtual Platform

• Requirements:

• High simulation speed.

• Compromise between simulation speed and precision.

• Flexibility.

• Usable by developers who are not hardware experts.

Page 96: High Performance Embedded Systems MPSoCs

96

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

Page 98: High Performance Embedded Systems MPSoCs

98

Multiprocessors Design Techniques

Core-based Strategy

• Core-based synthesis strategy for the IBM CoreConnect bus.

• The Coral tool automates many of the tasks required to stitch together multiple cores using virtual components.

• Each virtual component describes the interfaces for a class of real components.

• Coral can synthesize some combinational logic.

• Coral also checks the connections between cores using Boolean decision diagrams.

Taken from: W.Wolf High-Performance Embedded Computing

Page 99: High Performance Embedded Systems MPSoCs

99

Multiprocessors Design Techniques

Core-based Strategy

CoreConnect provides three types of buses:

• A high-speed processor local bus (PLB).

• An on-chip peripheral bus (OPB).

• A device control register (DCR) bus for configuration and status information.

Taken from: W.Wolf High-Performance Embedded Computing

Page 100: High Performance Embedded Systems MPSoCs

100

Multiprocessors Design Techniques

Taken from: SoC Lectures Notes

Core-based Strategy

Page 102: High Performance Embedded Systems MPSoCs

102

Multiprocessors Design Techniques

Wrappers

• Treats both hardware and software as components.

• A wrapper is a design unit that interfaces a module to another module.

• A wrapper can be hardware or software and may include both.

• The wrapper performs only low-level adaptations, such as protocol transformation.

Taken from: W.Wolf High-Performance Embedded Computing
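
A software-only sketch of what a wrapper does: it adapts one module's interface to the protocol another module expects, without adding functionality of its own. The classes `ByteStreamModule` and `WordWrapper` are hypothetical and exist only for this illustration; a real wrapper may be hardware, software, or both.

```python
class ByteStreamModule:
    """Hypothetical core that accepts exactly one byte per transfer."""

    def __init__(self):
        self.received = []

    def write_byte(self, value):
        if not 0 <= value <= 0xFF:
            raise ValueError("byte out of range")
        self.received.append(value)


class WordWrapper:
    """Wrapper that adapts a 32-bit word interface to the byte interface."""

    def __init__(self, module):
        self._module = module

    def write_word(self, word):
        # Protocol transformation only: split the word into four bytes,
        # least significant byte first, and forward each one.
        for byte in word.to_bytes(4, "little"):
            self._module.write_byte(byte)


core = ByteStreamModule()
WordWrapper(core).write_word(0x12345678)
print([hex(b) for b in core.received])
```

The caller sees a word-oriented interface while the wrapped core stays unchanged, which is the low-level adaptation the slide describes.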

Page 103: High Performance Embedded Systems MPSoCs

103

Multiprocessors Design Techniques

Wrappers

Heterogeneous multiprocessors introduce several types of problems:

• Many chips have multiple communication networks to match the network to the processing needs. Synchronizing communication across network boundaries is more difficult than communicating within a network.

• Specialized hardware is often needed to accelerate interprocess communication and free the CPU for more interesting computations.

• The communication primitives should be at a higher level of abstraction than shared memory.

Taken from: W.Wolf High-Performance Embedded Computing

Page 104: High Performance Embedded Systems MPSoCs

104

Multiprocessors Design Techniques

Wrappers

When a dedicated CPU is added to the system, its software must be adapted in several ways:

1. The software must be updated to support the platform’s communication primitives.

2. Optimized implementations of the host processor’s communication functions must be provided for interprocessor communication.

3. Synchronization functions must be provided.

Taken from: W.Wolf High-Performance Embedded Computing

Page 106: High Performance Embedded Systems MPSoCs

106

Multiprocessors Design Techniques

System-Level Design

• An abstract platform is created from a combination of system requirements, models of the software, and models of the hardware components.

• The abstract platform is analyzed to determine the application’s performance and power/energy consumption.

• Based on the results of this analysis, software is allocated and scheduled onto the platform.

• The result is a golden abstract architecture that can be used to build the implementation.

Taken from: W.Wolf High-Performance Embedded Computing

Page 107: High Performance Embedded Systems MPSoCs

107

Multiprocessors Design Techniques

System-Level Design

Taken from: W.Wolf High-Performance Embedded Computing

Page 108: High Performance Embedded Systems MPSoCs

108

Multiprocessors Design Techniques

System-Level Design

Major elements of an abstract architecture:

1. Software tasks are described by their data and scheduling dependencies; they interface to an API.

2. Hardware components consist of a core and an interface.

3. The hardware/software integration is modeled by the communication network that connects the CPUs that run the software and the hardware IP cores.

Taken from: W.Wolf High-Performance Embedded Computing

Page 110: High Performance Embedded Systems MPSoCs

110

Multiprocessors Design Techniques

Platform-based Design

• Design space: platform selection

• Platform programming

• Multi-CPUs

• Concurrency

• Real-Time

• The platform developer must be provided with tools (compilers, editors, debuggers, simulators, etc.).

Taken from: Introduction to Embedded Systems

Page 111: High Performance Embedded Systems MPSoCs

111

Multiprocessors Design Techniques

Platform-based Design

• Start with functional specifications

• Task graphs.

• Nodes: tasks to complete.

• Edges: communication and dependence between tasks.

• Execution time on the nodes.

• Data communicated on the edges.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/
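The task-graph model described on this slide can be sketched in a few lines of Python. The task names, execution times, and data volumes below are hypothetical values chosen for illustration. The sketch computes the critical path (the longest chain of dependent execution times), which bounds the completion time from below no matter how many processors the platform offers, and the total data carried on the edges. Communication delays are ignored here, which is a simplifying assumption.

```python
from functools import lru_cache

# Hypothetical task graph: node -> execution time (arbitrary time units).
exec_time = {"A": 3, "B": 2, "C": 4, "D": 1}

# Hypothetical edges: (producer, consumer) -> data communicated (bytes).
edges = {("A", "B"): 64, ("A", "C"): 128, ("B", "D"): 32, ("C", "D"): 32}


def successors(task):
    """Tasks that depend directly on `task`."""
    return [dst for (src, dst) in edges if src == task]


@lru_cache(maxsize=None)
def longest_path_from(task):
    """Execution time of the longest dependency chain starting at `task`."""
    tail = max((longest_path_from(s) for s in successors(task)), default=0)
    return exec_time[task] + tail


def critical_path_length():
    """Lower bound on completion time, ignoring communication delays."""
    return max(longest_path_from(t) for t in exec_time)


def total_communication():
    """Sum of the data volumes annotated on all edges."""
    return sum(edges.values())
```

With these numbers the chain A, C, D dominates, so no mapping can finish in fewer than 8 time units.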

Page 112: High Performance Embedded Systems MPSoCs

112

Multiprocessors Design Techniques

Platform-based Design

• Map tasks onto pre-designed HW.

• Use an extended task graph for SW and communication.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

Page 114: High Performance Embedded Systems MPSoCs

114

Multiprocessors Design Techniques

Design Techniques

• Core-based Strategy.

• Wrappers.

• System-level design flow.

• Platform-based design.

• Component-based design.

Taken from: W.Wolf High-Performance Embedded Computing

Page 115: High Performance Embedded Systems MPSoCs

115

Multiprocessors Design Techniques

Component Based Design

• Conceptual MPSoC platform.

• SW, Processor, IP, Communication

Fabric.

• Parallel Development

• Use APIs.

• Quicker time to market.

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

Page 116: High Performance Embedded Systems MPSoCs

116

Multiprocessors Design Techniques

Component Based Design

Taken from: MPSoCs https://slideplayer.com/slide/8773117/

Page 117: High Performance Embedded Systems MPSoCs

117

Multiprocessors Design Techniques

Multicore Application Programming Studio (MAPS)

• Developed at RWTH Aachen University in Germany.

• It is a platform that offers tools and technologies for MPSoC programming.

• Main features are:

• Sequential C code partition.

• Parallel programming model.

• Mapping and scheduling.

• Different types of applications.

• Functional Verification (Virtual Platform).

• Multiple applications environment.

• Easy-to-use IDE.

Taken from: M. Aguilar SoC Lectures Notes

Page 118: High Performance Embedded Systems MPSoCs

118

Multiprocessors Design Techniques

MAPS Flow

Taken from: M. Aguilar SoC Lectures Notes

Page 120: High Performance Embedded Systems MPSoCs

120

Multiprocessors Design Techniques

MAPS Programming Model: C for Process Networks (CPN)

• Embedded systems have traditionally been programmed in C.

• CPN is a language developed as an extension of ANSI C to describe process networks (KPN and SDF).

• A compiler called cpn-cc performs a source-to-source transformation that converts CPN code into standard C code that uses the APIs of the target architecture.

Taken from: M. Aguilar SoC Lectures Notes
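To make the process-network idea concrete, here is a minimal sketch of a Kahn-style network in Python. It does not use CPN syntax (the slides do not show it); threads stand in for processes and `queue.Queue` objects stand in for unbounded FIFO channels with blocking reads. The process names and the `END` sentinel are assumptions of this sketch.

```python
import queue
import threading

END = object()  # sentinel marking end of stream (an assumption of this sketch)


def producer(out_ch, n):
    """Source process: emits the tokens 0 .. n-1, then the end marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(END)


def scale(in_ch, out_ch, factor):
    """Filter process: blocking read, multiply, write."""
    while True:
        token = in_ch.get()  # blocks until a token is available
        if token is END:
            out_ch.put(END)
            return
        out_ch.put(token * factor)


def run_network(n=5, factor=10):
    """Wire producer -> scale -> (caller acts as the sink) and collect output."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for p in procs:
        p.start()
    results = []
    while True:
        token = b.get()
        if token is END:
            break
        results.append(token)
    for p in procs:
        p.join()
    return results
```

Because processes interact only through FIFO channels with blocking reads, the output is the same regardless of how the threads are interleaved, which is the property that makes such networks attractive for mapping onto several processors.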

Page 121: High Performance Embedded Systems MPSoCs

121

Multiprocessors Design Techniques

MAPS Programming Model: C for Process Networks (CPN)

Taken from: M. Aguilar SoC Lectures Notes

Page 122: High Performance Embedded Systems MPSoCs

122

Multiprocessors Design Techniques

MAPS Virtual Platform (MVP)

• MAPS Virtual Platform (MVP)

• High level: abstract PEs based on SystemC.

• Low level: virtual platform based on instruction set simulators (ISS).

• “mPhone” virtual smartphone.

Taken from: M. Aguilar SoC Lectures Notes

Page 123: High Performance Embedded Systems MPSoCs

123

Multiprocessors Design Techniques

Virtual Processing Element

• It is a parameterizable processing element.

• Clock frequency.

• Type (RISC, VLIW, DSP, etc).

• Scheduling algorithm (Round robin, EDF, based on priorities, etc).

Taken from: M. Aguilar SoC Lectures Notes
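The parameters listed above can be pictured as a small configuration record. The sketch below is not the MAPS data model; the class name, field names, and the `pick_next_edf` helper are hypothetical and only illustrate how clock frequency and scheduling policy might enter a simple timing estimate.

```python
from dataclasses import dataclass


@dataclass
class VirtualPE:
    """Hypothetical record for a parameterizable processing element."""
    name: str
    clock_hz: float   # clock frequency
    pe_type: str      # e.g. "RISC", "VLIW", "DSP"
    scheduler: str    # e.g. "round_robin", "edf", "priority"

    def exec_time_s(self, cycles):
        """Time to execute `cycles` clock cycles at this PE's frequency."""
        return cycles / self.clock_hz


def pick_next_edf(ready_tasks):
    """EDF policy: choose the ready task with the earliest absolute deadline.

    `ready_tasks` is a list of (task_name, absolute_deadline) pairs.
    """
    return min(ready_tasks, key=lambda t: t[1])[0]
```

For example, a 200 MHz element needs about 5 ms for one million cycles, and under EDF a task with deadline 10 runs before a task with deadline 30.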

Page 124: High Performance Embedded Systems MPSoCs

Outline

124

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

Page 125: High Performance Embedded Systems MPSoCs

125

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Page 126: High Performance Embedded Systems MPSoCs

126

Memory Systems

Memory Systems

• The memory system is a traditional bottleneck in computing.

• Not only are memories slower than processors, but processor clock rates

are increasing much faster than memory cycle times.

Taken from: W. Wolf High-Performance Embedded Computing and

https://www.taringa.net/+serviciotecnico/consulta-cuello-de-botella-cpu-debil-en-gpu-potente_15casq

Page 127: High Performance Embedded Systems MPSoCs

127

Memory Systems

Memory Systems

Taken from: Multi-core architectures

Page 128: High Performance Embedded Systems MPSoCs

128

Memory Systems

Memory Systems

Taken from: MPSoCs Hardware platforms Lectures Notes

Page 129: High Performance Embedded Systems MPSoCs

129

Memory Systems

Memory Systems

• Start with a look at parallel memory systems in scientific multiprocessors.

• Consider models for memory and motivations for heterogeneous memory

systems.

• Look at what sorts of consistency mechanisms are needed in embedded

multiprocessors.

Taken from: W. Wolf High-Performance Embedded Computing

Page 130: High Performance Embedded Systems MPSoCs

130

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogeneous

Page 132: High Performance Embedded Systems MPSoCs

132

Memory Systems

Memory Systems

To understand memory systems, consider the following case study:

• Scientific processors traditionally use parallel, homogeneous memory

systems to increase system performance.

• Multiple memory banks allow several memory accesses to occur

simultaneously.

Taken from: W. Wolf High-Performance Embedded Computing

Page 133: High Performance Embedded Systems MPSoCs

133

Memory Systems

Memory Systems

• Each bank is separately addressable.

Taken from: W. Wolf High-Performance Embedded Computing

Page 134: High Performance Embedded Systems MPSoCs

134

Memory Systems

Memory Systems

• If the memory system has n banks,

then n accesses can be performed in

parallel.

• This is known as the peak access

rate.

Taken from: W. Wolf High-Performance Embedded Computing
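A short sketch shows why the peak access rate is only reached when accesses fall in different banks. It assumes low-order interleaving (bank = address mod n) and that each bank serves one access per cycle; both are assumptions of this sketch, since the slides do not fix the address-to-bank mapping.

```python
from collections import Counter


def bank_of(address, n_banks):
    """Low-order interleaving: consecutive addresses go to consecutive banks."""
    return address % n_banks


def cycles_needed(addresses, n_banks):
    """Cycles to serve a batch of accesses issued together.

    Each bank serves one access per cycle, so the busiest bank
    determines how long the whole batch takes.
    """
    load = Counter(bank_of(a, n_banks) for a in addresses)
    return max(load.values(), default=0)
```

With 4 banks, the sequential addresses 0, 1, 2, 3 are served in one cycle (the peak rate), while 0, 4, 8, 12 all hit bank 0 and take four cycles.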

Page 135: High Performance Embedded Systems MPSoCs

135

Memory Systems

Memory Systems

• Cannot keep the memory busy all of

the time.

• A simple statistical model lets us

estimate performance of a random-

access program.

Taken from: W. Wolf High-Performance Embedded Computing

Page 136: High Performance Embedded Systems MPSoCs

136

Memory Systems

Memory Systems

• Assume that the program accesses a

certain number of sequential

locations, then moves to some other

location.

• Where:

• λ is the probability of a nonsequential memory access (a branch in code or a nonconsecutive access in data).

• k is the number of sequential accesses in a string.

Taken from: W. Wolf High-Performance Embedded Computing

Page 137: High Performance Embedded Systems MPSoCs

137

Memory Systems

Memory Systems

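• The probability of a string of k sequential accesses is:

• $p(k) = \lambda (1 - \lambda)^{k-1}$

• And the mean length of a sequential access sequence is:

• $L_b = \dfrac{1 - (1 - \lambda)^{m}}{\lambda}$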

Taken from: W. Wolf High-Performance Embedded Computing
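The closed form for the mean sequence length can be checked numerically. The slide does not define m; the sketch below assumes m is the maximum sequence length that is counted (for instance the number of memory banks), so that longer sequential runs are truncated to m. Under that assumption, summing k·p(k) for k < m and adding m times the probability of a run of length m or more reproduces the closed form.

```python
def p_sequence(k, lam):
    """Probability of a string of exactly k sequential accesses."""
    return lam * (1.0 - lam) ** (k - 1)


def mean_sequence_length(lam, m):
    """Closed form: L_b = (1 - (1 - lam)^m) / lam."""
    return (1.0 - (1.0 - lam) ** m) / lam


def mean_sequence_length_by_sum(lam, m):
    """Same quantity from the definition, with runs truncated at m (assumed)."""
    total = sum(k * p_sequence(k, lam) for k in range(1, m))
    total += m * (1.0 - lam) ** (m - 1)  # all runs of length >= m count as m
    return total
```

A small λ (few branches or scattered data accesses) pushes the mean toward m, which is why software techniques that lengthen sequential runs help keep all banks busy.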

Page 138: High Performance Embedded Systems MPSoCs

138

Memory Systems

Memory Systems

• Use program statistics to estimate

the average probability of

nonsequential accesses, design the

memory system accordingly.

• Use software techniques to

maximize the length of access

sequences wherever possible.

Taken from: W. Wolf High-Performance Embedded Computing

Page 139: High Performance Embedded Systems MPSoCs

139

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogeneous

Page 140: High Performance Embedded Systems MPSoCs

140

Memory Systems

Memory Systems

• Embedded systems can make use of multiple-bank memory systems, but they

also make use of more heterogeneous memory architectures.

• They do so to improve the real-time performance and lower the power

consumption of the memory system.

Taken from: W. Wolf High-Performance Embedded Computing

Page 141: High Performance Embedded Systems MPSoCs

141

Memory Systems

Memory Systems

Why do heterogeneous memory systems

improve real-time performance?

Taken from: W. Wolf High-Performance Embedded Computing

Page 142: High Performance Embedded Systems MPSoCs

142

Memory Systems

Memory Systems

• The energy required to perform a memory access depends in part on the size of

the memory block being accessed.

• A heterogeneous memory may be able to use smaller memory blocks, reducing

the access time.

• Energy per access also depends on the number of ports on the memory block.

• By reducing the number of units that can access a given part of memory, the

heterogeneous memory system can reduce the energy required to access that

part of the memory space.

Taken from: W. Wolf High-Performance Embedded Computing

Page 143: High Performance Embedded Systems MPSoCs

143

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory Systems

Homogeneous Heterogeneous

Consistent Memory Systems

Page 144: High Performance Embedded Systems MPSoCs

144

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Consistent memory systems: shared variables, cache consistency, snooping caches

Page 145: High Performance Embedded Systems MPSoCs

145

Memory Systems

Memory Systems

• Shared variables

• We must worry about whether two processors see the same state of a shared variable.

• If the reads and writes of two processors are interleaved, one processor may write the variable after another has written it, causing the first writer to wrongly assume the variable still holds its value.

• Use critical sections, guarded by semaphores, to ensure that critical operations occur in the right order.

• Use atomic test-and-set operations (often called spin locks) to guard small pieces of memory.

Taken from: W. Wolf High-Performance Embedded Computing
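The spin-lock idea can be sketched as follows. Python does not expose a hardware test-and-set instruction, so this sketch assumes that a non-blocking `threading.Lock.acquire` is an acceptable stand-in: it atomically tests the flag and sets it if it was clear. The `SpinLock` class and the counter demo are illustrative, not code from the course.

```python
import threading
import time


class SpinLock:
    """Busy-waiting lock built on an atomic test-and-set stand-in."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Non-blocking acquire returns True only if the flag was clear,
        # and sets it in the same atomic step (test-and-set).
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # yield so the current holder can make progress

    def release(self):
        self._flag.release()


def run_counter_demo(n_threads=4, increments=1000):
    """Several threads update one shared variable inside a critical section."""
    lock = SpinLock()
    state = {"counter": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                state["counter"] += 1  # critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

Spinning is only reasonable when the guarded region is short, which matches the slide's remark about guarding small pieces of memory.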

Page 146: High Performance Embedded Systems MPSoCs

146

Memory Systems

Memory Systems

• Cache consistency

• If two processors access the same

memory location, then each may have

a copy of the location in its own cache.

• If one processing element writes that

location, then the other will not

immediately see the change and will

make an incorrect computation.

Taken from: W. Wolf High-Performance Embedded Computing

Page 147: High Performance Embedded Systems MPSoCs

147

Memory Systems

Memory Systems

• Snooping Cache

• This type of cache contains extra

logic that watches the

multiprocessor interconnect for

memory transactions.

• When it sees a write to a location

that it currently contains, it

invalidates that location.

Taken from: W. Wolf High-Performance Embedded Computing
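The invalidate-on-write behaviour can be illustrated with a toy model. The sketch assumes a single shared bus, write-through caches, and whole-word cache lines; the `Bus` and `SnoopingCache` classes are hypothetical and leave out everything a real protocol (such as MESI) would track.

```python
class Bus:
    """Shared interconnect: holds main memory and the attached caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, writer, addr, value):
        """Write through to memory and let every other cache snoop the write."""
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)


class SnoopingCache:
    """Private cache with extra logic that watches bus writes."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # addr -> cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)

    def snoop_write(self, addr):
        """Another processor wrote `addr`: invalidate our copy if present."""
        self.lines.pop(addr, None)
```

After one cache writes a location, the other cache's copy disappears, so its next read misses and fetches the new value instead of computing with a stale one.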

Page 148: High Performance Embedded Systems MPSoCs

148

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory systems architecture: shared memory, distributed memory, hybrid memory

Page 150: High Performance Embedded Systems MPSoCs

150

Memory Systems

Memory Systems

• Shared Memory

• Shared memory parallel computers vary

widely, but generally have in common the

ability for all processors to access all

memory as global address space.

• Multiple processors can operate

independently but share the same memory

resources.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 151: High Performance Embedded Systems MPSoCs

151

Memory Systems

Memory Systems

• Shared Memory

• Changes in a memory location effected by

one processor are visible to all other

processors.

• Historically, shared memory machines

have been classified as UMA and NUMA,

based upon memory access times.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 152: High Performance Embedded Systems MPSoCs

152

Memory Systems

Memory Systems

• Shared Memory (Uniform Memory

Access UMA)

• Most commonly represented today by

Symmetric Multiprocessor (SMP)

machines.

• Identical processors.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 153: High Performance Embedded Systems MPSoCs

153

Memory Systems

Memory Systems

• Shared Memory (Uniform Memory

Access UMA)

• Equal access and access times to

memory.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 154: High Performance Embedded Systems MPSoCs

154

Memory Systems

Memory Systems

• Shared Memory (Uniform Memory Access

UMA)

• Sometimes called CC-UMA - Cache

Coherent UMA. Cache coherent means if one

processor updates a location in shared

memory, all the other processors know about

the update. Cache coherency is accomplished

at the hardware level.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 155: High Performance Embedded Systems MPSoCs

155

Memory Systems

Memory Systems

• Shared Memory (Non-Uniform Memory

Access NUMA)

• Often made by physically linking two or

more SMPs.

• One SMP can directly access memory of

another SMP.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 156: High Performance Embedded Systems MPSoCs

156

Memory Systems

Memory Systems

• Shared Memory (Non-Uniform Memory

Access NUMA)

• Not all processors have equal access time to

all memories.

• Memory access across link is slower

• If cache coherency is maintained, then may

also be called CC-NUMA - Cache Coherent

NUMA.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
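The cost of slower accesses across the link can be estimated with simple arithmetic. The sketch below assumes only two latency classes (local and remote) and a known fraction of local accesses; the function name and the numbers used in the example are hypothetical.

```python
def mean_access_time_ns(local_fraction, t_local_ns, t_remote_ns):
    """Average memory access time on a two-level NUMA system.

    local_fraction: share of accesses served by the processor's own memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns
```

For instance, with 90% local accesses at 100 ns and remote accesses at 300 ns, the average is about 120 ns, so placing data near the processor that uses it matters even when remote accesses are rare.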

Page 157: High Performance Embedded Systems MPSoCs

157

Memory Systems

Memory Systems

• Shared Memory

• Advantages

• Global address space provides a user-

friendly programming perspective to

memory.

• Data sharing between tasks is both fast

and uniform due to the proximity of

memory to CPUs.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 158: High Performance Embedded Systems MPSoCs

158

Memory Systems

Memory Systems

• Shared Memory

• Disadvantages

• The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 159: High Performance Embedded Systems MPSoCs

159

Memory Systems

Memory Systems

• Shared Memory

• Disadvantages

• Programmer responsibility for

synchronization constructs that ensure

"correct" access of global memory.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 160: High Performance Embedded Systems MPSoCs

160

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory systems architecture: shared memory, distributed memory, hybrid memory

Page 161: High Performance Embedded Systems MPSoCs

161

Memory Systems

Memory Systems

• Distributed Memory

• Like shared memory systems, distributed

memory systems vary widely but share a

common characteristic.

• Distributed memory systems require a

communication network to connect inter-

processor memory.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 162: High Performance Embedded Systems MPSoCs

162

Memory Systems

Memory Systems

• Distributed Memory

• Processors have their own local memory.

Memory addresses in one processor do not

map to another processor, so there is no

concept of global address space across all

processors.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 163: High Performance Embedded Systems MPSoCs

163

Memory Systems

Memory Systems

• Distributed Memory

• Because each processor has its own local

memory, it operates independently.

Changes it makes to its local memory have

no effect on the memory of other

processors. Hence, the concept of cache

coherency does not apply.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 164: High Performance Embedded Systems MPSoCs

164

Memory Systems

Memory Systems

• Distributed Memory

• When a processor needs access to data in

another processor, it is usually the task of

the programmer to explicitly define how

and when data is communicated.

Synchronization between tasks is likewise

the programmer's responsibility.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch
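Explicit communication between private memories can be sketched as follows. Threads stand in for processors and a `queue.Queue` per node stands in for the network endpoint; the `Node` class and its `send` and `receive` methods are hypothetical names, not an MPI-style API.

```python
import queue
import threading


class Node:
    """A processor with private local memory and a network inbox."""

    def __init__(self, name):
        self.name = name
        self.local = {}             # memory no other node can address
        self.inbox = queue.Queue()  # endpoint on the communication network

    def send(self, other, key):
        """Explicitly copy one local value to another node."""
        other.inbox.put((key, self.local[key]))

    def receive(self):
        """Block until a message arrives, then store it in local memory."""
        key, value = self.inbox.get()
        self.local[key] = value
        return key


def demo():
    n0, n1 = Node("n0"), Node("n1")
    n0.local["x"] = 42
    result = {}

    def consumer():
        n1.receive()  # blocking receive doubles as synchronization
        result["y"] = n1.local["x"] + 1

    t = threading.Thread(target=consumer)
    t.start()
    n0.send(n1, "x")  # the programmer decides what is sent and when
    t.join()
    return result["y"], ("x" in n1.local)
```

Node n1 never reads n0's memory directly; it only sees the copy that was sent, which is why both data movement and synchronization fall to the programmer.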

Page 165: High Performance Embedded Systems MPSoCs

165

Memory Systems

Memory Systems

• Distributed Memory

• The network "fabric" used for data transfer

varies widely, though it can be as simple as

Ethernet.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 166: High Performance Embedded Systems MPSoCs

166

Memory Systems

Memory Systems

• Distributed Memory

• Advantages

• Memory is scalable with the number

of processors. Increase the number of

processors and the size of memory

increases proportionately.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 167: High Performance Embedded Systems MPSoCs

167

Memory Systems

Memory Systems

• Distributed Memory

• Advantages

• Each processor can rapidly access its

own memory without interference and

without the overhead incurred with

trying to maintain global cache

coherency.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 168: High Performance Embedded Systems MPSoCs

168

Memory Systems

Memory Systems

• Distributed Memory

• Advantages

• Cost effectiveness: can use

commodity, off-the-shelf processors

and networking.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 169: High Performance Embedded Systems MPSoCs

169

Memory Systems

Memory Systems

• Distributed Memory

• Disadvantages

• The programmer is responsible for

many of the details associated with data

communication between processors.

• It may be difficult to map existing data

structures, based on global memory, to

this memory organization.

Taken from: W. Wolf High-Performance Embedded Computing,

https://en.wikipedia.org/wiki/Shared_memory#/media/File:Shared_memory.svg,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 170: High Performance Embedded Systems MPSoCs

170

Memory Systems

Memory Systems

• Distributed Memory

• Disadvantages

• Non-uniform memory access times -

data residing on a remote node takes

longer to access than node local data.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 171: High Performance Embedded Systems MPSoCs

171

Memory Systems

Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing

Memory systems architecture: shared memory, distributed memory, hybrid memory

Page 172: High Performance Embedded Systems MPSoCs

172

Memory Systems

Memory Systems

• Hybrid Memory

• The largest and fastest computers in the

world today employ both shared and

distributed memory architectures.

• The shared memory component can be a

shared memory machine and/or graphics

processing units (GPU).

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 173: High Performance Embedded Systems MPSoCs

173

Memory Systems

Memory Systems

• Hybrid Memory

• The distributed memory component is

the networking of multiple shared

memory/GPU machines, which know

only about their own memory - not the

memory on another machine. Therefore,

network communications are required to

move data from one machine to another.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 174: High Performance Embedded Systems MPSoCs

174

Memory Systems

Memory Systems

• Hybrid Memory

• Current trends seem to indicate that this

type of memory architecture will

continue to prevail and increase at the

high end of computing for the

foreseeable future.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 175: High Performance Embedded Systems MPSoCs

175

Memory Systems

Memory Systems

• Hybrid Memory

• Advantages and Disadvantages

• Whatever is common to both shared and

distributed memory architectures.

• Increased scalability is an important

advantage.

• Increased programming complexity is an important disadvantage.

Taken from: W. Wolf High-Performance Embedded Computing,

https://computing.llnl.gov/tutorials/parallel_comp/#MemoryArch

Page 176: High Performance Embedded Systems MPSoCs

176

Memory Systems

Design Memory Systems?

Taken from: W. Wolf High-Performance Embedded Computing,

Page 177: High Performance Embedded Systems MPSoCs

177

Memory Systems

Design Memory Systems

A simple model of memory components for parallel memory design would include

three major parameters of a memory component of a given size.

• Area: The physical size of the logical component. This is most important in chip design, but it also

relates to cost in board design.

• Performance: The access time of the component. There may be more than one parameter, with

variations for read and write times, page mode accesses, and so on.

• Energy: The energy required per access. If performance is characterized by multiple modes, energy

consumption will exhibit similar modes.

Taken from: W. Wolf High-Performance Embedded Computing,
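The three-parameter model above maps naturally onto a small record. The sketch below assumes separate read and write modes and strictly sequential accesses; the class name, field names, units, and numbers are hypothetical and are not taken from any datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryComponent:
    """Area, performance, and energy parameters of one memory block."""
    name: str
    area_mm2: float         # physical size
    read_time_ns: float     # performance: read access time
    write_time_ns: float    # performance: write access time
    read_energy_nj: float   # energy per read access
    write_energy_nj: float  # energy per write access


def cost_of_trace(mem, n_reads, n_writes):
    """Total time and energy for a trace, assuming sequential accesses."""
    time_ns = n_reads * mem.read_time_ns + n_writes * mem.write_time_ns
    energy_nj = n_reads * mem.read_energy_nj + n_writes * mem.write_energy_nj
    return time_ns, energy_nj
```

Comparing `cost_of_trace` for several candidate components against the same access trace gives a first-order view of the area, time, and energy trade-off.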

Page 178: High Performance Embedded Systems MPSoCs

178

Memory Systems

Design Memory Systems

Taken from: W. Wolf High-Performance Embedded Computing,

Page 179: High Performance Embedded Systems MPSoCs

179

Memory Systems

Memory Systems

Taken from: https://www.xataka.com/ordenadores/el-cuello-de-botella-de-la-ley-de-moore-no-esta-en-los-procesadores-sino-en-las-memorias

Page 181: High Performance Embedded Systems MPSoCs

Outline

181

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

Page 182: High Performance Embedded Systems MPSoCs

182

Processors Symmetry

Taken from: W. Wolf High-Performance Embedded Computing

Multi-processing: symmetric (SMP) and asymmetric (AMP)

Page 184: High Performance Embedded Systems MPSoCs

184

Processors Symmetry

Taken from: M. Aguilar SoCs

Symmetric Multi-processing (SMP)

• A system with multiple processors or cores that communicate through a single shared memory and are controlled by a single operating system.

Page 185: High Performance Embedded Systems MPSoCs

185

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Identical: All processors are identical and are treated equally.

• Communication: Processors communicate through shared memory.

• Complexity: The design is complex, since all units share the same memory and data bus.

• Expensive: These systems cost more.

• Unlike asymmetric multiprocessing, where operating system tasks are handled only by the master processor, here each processor handles operating system tasks itself.

Page 186: High Performance Embedded Systems MPSoCs

186

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Applications

• SMP is used in parallel processing, where time-sharing systems (TSS) assign tasks to different processors running in parallel, and in TSS that use multithreading, i.e. multiple threads running simultaneously.

Page 187: High Performance Embedded Systems MPSoCs

187

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Advantages

• Throughput: Since tasks can run on all processors, unlike in asymmetric multiprocessing, throughput (processes executed per unit time) increases.

• Reliability: The failure of one processor does not bring down the whole system, as all processors are equally capable, though throughput drops somewhat.

Page 188: High Performance Embedded Systems MPSoCs

188

Processors Symmetry

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Symmetric Multi-processing (SMP)

• Disadvantages

• Complex design: Since the OS treats all processors equally, designing and managing such an OS is difficult.

• Costlier: Since all processors share the common main memory, a larger memory is required, which is more expensive.

Page 189: High Performance Embedded Systems MPSoCs

189

Processors Symmetry

Taken from: https://www.enea.com/globalassets/downloads/operating-systems/enea-oseck/enea-smp-platform-for-xilinx-zynq-datasheet.pdf

Symmetric Multi-processing (SMP)

Page 191: High Performance Embedded Systems MPSoCs

191

Processors Symmetry

Taken from: W. Wolf High-Performance Embedded Computing

Multi-processing: symmetric (SMP) and asymmetric (AMP)

Page 192: High Performance Embedded Systems MPSoCs

192

Processors Symmetry

Taken from: M. Aguilar SoC Lectures Notes

Asymmetric Multi-processing (AMP)

• A system with multiple processors or cores that communicate through a single shared memory, where each processor or core is controlled by its own operating system (the same or a different one).

Page 193: High Performance Embedded Systems MPSoCs

193

Processors Symmetry

Asymmetric Multi-processing (AMP)

• Characteristics

• Processors are not treated equally.

• Tasks of the operating system are done by master processor.

• No Communication between Processors as they are controlled by the

master processor.

• Processors work in a master-slave arrangement.

• Systems are cheaper.

• Systems are easier to design.

Taken from: https://www.geeksforgeeks.org/what-is-smp-symmetric-multi-processing/

Page 194: High Performance Embedded Systems MPSoCs

194

Processors Symmetry

Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf

Asymmetric Multi-processing (AMP)

Page 197: High Performance Embedded Systems MPSoCs

197

Processors Symmetry

Asymmetric Multi-processing (AMP)

Taken from: https://github.com/OpenAMP/open-amp

Page 199: High Performance Embedded Systems MPSoCs

199

Processors Symmetry

Taken from: https://www.openampproject.org/old_website/docs/mca/BKK19%20OpenAMP%20Introduction.pdf

Asymmetric Multi-processing (AMP)

Page 200: High Performance Embedded Systems MPSoCs

Outline

200

• Multiprocessors Architecture and Taxonomy

• Parallel Execution Mechanism

• Multiprocessors Design Techniques

• Memory Systems

• Processors Symmetry

• Co-processing

Page 201: High Performance Embedded Systems MPSoCs

201

Co-processing

Taken from: https://www.researchgate.net/publication/250840737_Automatic_Generation_of_Application-Specific_Architectures_for_Heterogeneous_MPSoC_through_Combination_of_Processors/figures

Page 204: High Performance Embedded Systems MPSoCs

204

Co-processing

Taken from: http://www.cecs.uci.edu/~papers/esweek06/codes/p288.pdf

Page 205: High Performance Embedded Systems MPSoCs

205

Co-processing

Taken from: https://www.researchgate.net/publication/221656884_A_Generic_Wrapper_Architecture_for_Multi-Processor_SoC_Cosimulation_and_Design/figures?lo=1

Page 206: High Performance Embedded Systems MPSoCs

206

Co-processing

Taken from: https://link.springer.com/chapter/10.1007/978-3-319-01113-4_1

Page 207: High Performance Embedded Systems MPSoCs

207

Co-processing

What is a coprocessor?

Page 208: High Performance Embedded Systems MPSoCs

208

Co-processing

A coprocessor is:

• A computer processor used to supplement functions of the primary processor.

• Operations commonly performed by a coprocessor include:

• Floating point (FPU).

• Graphics processing.

• Signal processing.

• Cryptography.

• Etc.

Taken from: https://youtu.be/xrMUv9ZVKY0

Page 209: High Performance Embedded Systems MPSoCs

209

Co-processing

A coprocessor is:

• By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance.

• Coprocessors allow a line of computers to be customized, so that customers who

do not need extra performance need not pay for it.

Taken from: https://youtu.be/xrMUv9ZVKY0

Page 210: High Performance Embedded Systems MPSoCs

210

Co-processing

Functions

• A coprocessor need not be a general-purpose processor.

• Some coprocessors cannot fetch instructions from memory, execute program flow control instructions, perform input/output operations, manage memory, and so on.

• Such a coprocessor requires the host (main) processor to fetch the coprocessor instructions and handle all operations other than the coprocessor functions.

• In some architectures the coprocessor is a more general-purpose computer, but it carries out only a limited range of functions under the close control of a supervisory processor.

Taken from: https://youtu.be/xrMUv9ZVKY0
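
The host/coprocessor split described in the Functions list can be sketched in C as a toy simulation. Everything here is hypothetical and invented for illustration: the three-opcode instruction set, the names `host_run` and `coproc_execute`, and the single accumulator. It is not a model of any real ISA. The point it shows is that the host fetches every instruction and handles control flow, while the coprocessor only executes the operation handed to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, invented for this sketch only. */
typedef enum { OP_HOST_ADD, OP_COPROC_MUL, OP_HALT } opcode_t;

typedef struct {
    opcode_t op;
    int operand;
} instr_t;

/* Coprocessor: executes one operation handed over by the host.
 * It never reads program memory and has no program counter. */
static int coproc_execute(int acc, int operand)
{
    return acc * operand;
}

/* Host: fetches every instruction, handles control flow, and forwards
 * coprocessor instructions to the coprocessor. Returns the accumulator. */
int host_run(const instr_t *program, size_t len, int acc)
{
    assert(program != NULL);
    for (size_t pc = 0; pc < len; ++pc) {
        const instr_t in = program[pc];      /* fetch is done by the host */
        switch (in.op) {
        case OP_HOST_ADD:                    /* executed by the host itself */
            acc += in.operand;
            break;
        case OP_COPROC_MUL:                  /* dispatched to the coprocessor */
            acc = coproc_execute(acc, in.operand);
            break;
        case OP_HALT:                        /* flow control stays in the host */
            return acc;
        }
    }
    return acc;
}
```

In this sketch the coprocessor is a plain function call; in hardware the hand-off would typically go through a dedicated coprocessor interface, which is outside the scope of the example.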

Co-processing

Taken from: https://www.doulos.com/knowhow/arm/using_your_c_compiler_to_exploit_neon/Resources/using_your_c_compiler_to_exploit_neon.pdf

Coprocessor

Co-processing

NEON Arm

• With the ARMv7-A architecture, ARM introduced a powerful SIMD implementation called NEON™.

• NEON is a coprocessor that comes with its own instruction set for vector operations.

• Most vector operations carry out the same operation on all elements of their operand vector(s) in parallel.

• A C compiler can be used to exploit NEON™ Advanced SIMD.

Taken from: https://youtu.be/xrMUv9ZVKY0
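
A minimal sketch of "the same operation on all elements in parallel", written in C with NEON intrinsics from `arm_neon.h`. The function name `add_i32` is an assumption made for this example. The NEON path is compiled only when the compiler defines `__ARM_NEON`; on other targets the scalar loop does all the work, so the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Element-wise addition: dst[i] = a[i] + b[i] for i in [0, n).
 * add_i32 is a hypothetical helper name used only in this sketch. */
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    /* NEON path: one vaddq_s32 adds four 32-bit lanes at once. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);     /* load 4 elements of a */
        int32x4_t vb = vld1q_s32(b + i);     /* load 4 elements of b */
        vst1q_s32(dst + i, vaddq_s32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when NEON is unavailable. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Writing the plain scalar loop and letting the compiler auto-vectorize it is the other common route; the intrinsics here only make the four-lane operation explicit.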

Co-processing

NEON Arm

• The goal of NEON is to provide a powerful yet comparatively easy-to-program SIMD instruction set that covers integer data types up to 64 bits wide as well as single-precision (32-bit) floating point.

• Rather than having a register file of its own, NEON shares its sixteen 128-bit registers with the vector floating-point unit.

• Because NEON code executes on the same processor core, its performance is influenced by context-switching overhead, non-deterministic memory access latency (cache/MMU access), and interrupt handling.

Taken from: https://youtu.be/xrMUv9ZVKY0
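
The element widths and the 128-bit register size mentioned above determine how many elements one instruction can process. The following C sketch works out that arithmetic; the helper names `neon_lanes_per_q` and `neon_q_registers_needed` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Width of one NEON quadword register, as stated in the slide above. */
enum { NEON_Q_REGISTER_BITS = 128 };

/* Number of elements (lanes) of the given width that fit in one 128-bit
 * register. Returns 0 for widths outside the 8/16/32/64-bit set. */
unsigned neon_lanes_per_q(unsigned element_bits)
{
    switch (element_bits) {
    case 8:
    case 16:
    case 32:
    case 64:
        return NEON_Q_REGISTER_BITS / element_bits;
    default:
        return 0;
    }
}

/* Number of 128-bit registers needed to hold n_elements of the given
 * width, rounding up for a partially filled last register. */
unsigned neon_q_registers_needed(unsigned n_elements, unsigned element_bits)
{
    unsigned lanes = neon_lanes_per_q(element_bits);
    assert(lanes != 0 && "unsupported element width");
    return (n_elements + lanes - 1) / lanes;
}
```

For example, 32-bit elements (integer or single-precision floating point) give four lanes per register, so ten such elements occupy three registers, the last one only partly filled.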

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

Co-processing

NEON Arm

Taken from: https://youtu.be/xrMUv9ZVKY0

Co-processing

DSPs

Taken from: Introduccion a los Sistemas Empotrados, Lecture Notes

Co-processing

DSPs

Taken from: M. Aguilar, SoC Lecture Notes

Co-processing

DSPs

Taken from: M. Aguilar, SoC Lecture Notes

Co-processing

GPU

Taken from: https://www.anandtech.com/show/14101/nvidia-announces-jetson-nano

Co-processing

GPU

Taken from: https://www.anandtech.com/show/14101/nvidia-announces-jetson-nano

Co-processing

Flight controller UAV

Taken from: https://cdn.sparkfun.com/assets/d/d/9/9/3/Pixhawk4-DataSheet.pdf

Co-processing

Flight controller UAV

Taken from: https://cdn.sparkfun.com/assets/d/d/9/9/3/Pixhawk4-DataSheet.pdf

References

[1] Lecture Notes, SoC course, Tecnologico de Costa Rica.

[2] W. Wolf. High-Performance Embedded Computing: Architectures, Applications, and Methodologies. Elsevier, United States of America, 2007.

[3] E. A. Lee and S. A. Seshia. Introduction to Embedded Systems. 2017.

Lecture notes and materials are available in TEC-Digital and on the web portal:

www.ie.tec.ac.cr/sarriola/HPEC

www.ie.tec.ac.cr/joaraya
