Slides presented at the FlexTiles Workshop at FPL'2014. Presentation #1: FlexTiles overview. FlexTiles is a heterogeneous many-core platform, reconfigurable at run-time, developed within an FP7 project.
FlexTiles
The FP7 FlexTiles project: overview
www.flextiles.eu
Philippe MILLET, PhD, FPL 2014
philippe.millet@thalesgroup.com
Project coordinator: THALES
Funding budget: 3,670,000€
Starting date: 15/10/2011
Duration: 36 months (42)
The information contained in this document and any attachments are the property of FlexTiles consortium. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document must be done in accordance with the CA of the project (TRT/DJ/624412785.2011). Template version 1.0
© CEA. All rights reserved – R. LEMAIRE – AHS-2014
Some challenging applications within THALES
Cognitive radio
Source:
the India economy review
Continuously adapt the frequency and protocol to those available
Avoid jammers or obfuscated communications
Smart camera
Highway: follow cars, detect traffic jams or accidents.
Airport: find and follow people, detect abandoned luggage and strange or dangerous behaviours.
Dynamicity depends on the number of detections.
Cameras have local processing capability to send data only when something "interesting" has been detected.
UAV
Autonomous: takes decisions with little or no control.
Reacts to the environment.
Self-repair.
Adapts the mission to what the UAV finds.
Activates software parts to match the actual situation.
The software is dynamically activated and mapped to the available resources.
Real-time embedded products at THALES
Embedded Real-Time Market
Low power consumption
Target range: 10 W to 40 W
Some products are designed for less than 1 W
General-purpose processors are too power-hungry
Low volumes (fewer than 1000 pieces/year)
Designing a dedicated ASIC is not an option
Long lifetime (~20 years)
Long life, no maintenance
Hardware upgrade or retrofit must cost as little as possible
A programmable device is preferred
Adapt to environment dynamicity, flexibility & dependability
We need more than static dataflow.
We need adaptability in the software as well as in the hardware.
Homogeneous Manycore: a solution?
One way to get high performance per watt is parallelism.
• Instead of one big core with high computation power but also high power consumption,
• use several smaller cores in parallel.
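A rough sketch of why several smaller cores can improve performance per watt, using the standard first-order CMOS dynamic power model P = C·V²·f. The capacitance, voltage and frequency values below are illustrative assumptions, not FlexTiles measurements, and the model ignores static leakage and any parallelisation overhead.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS dynamic power in watts: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points, chosen only for illustration.
big_core = dynamic_power(1e-9, 1.2, 2.0e9)     # one core at 2 GHz and 1.2 V
small_core = dynamic_power(1e-9, 0.8, 0.5e9)   # one core at 500 MHz and 0.8 V
four_small_cores = 4 * small_core              # same aggregate clock rate

print(f"1 big core:    {big_core:.2f} W")
print(f"4 small cores: {four_small_cores:.2f} W")
print(f"power ratio:   {four_small_cores / big_core:.2f}")
```

With these assumed values the four slower cores deliver the same aggregate clock rate for less than half the dynamic power, because the lower frequency allows a lower supply voltage and power scales with the square of that voltage.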
Parallelisation is not enough? Did we miss something?
Homogeneous?
Challenge
Available architectures are already heterogeneous systems: processors (GPPs), FPGA, DSP.
With manycores and integration, the architectures are changing...
Source: http://www.gamearenaph.com
Source: http://www.vision.caltech.edu
APPLICATIONS: computation-demanding applications
Usual way: put as many resources as necessary to execute the application in any situation.
=> hardware must allow the hardest case to execute
Dynamicity:
=> the hardest case is unknown
=> too costly, too heavy, too high power consumption.
Source: http://www.funtoosh.com
How can we fit big applications in the hardware?
How to efficiently map complex applications to heterogeneous many-core architectures with a limited budget (power, performance, …)?
LIMITED BUDGET
Source: http://www.lnci.org.au
Customized/Customizable chips vs. FPGA
ST – P2012 aka STHORM (heterogeneous manycore fabric: clusters, fabric controller core)
GOOD Parallelization
POOR Customization
Once done: dedicated to a specific domain of applications
Affordable only for large series of products
Main issue: domain dedication; the same applies to MPSoCs (TI OMAPs)
Xilinx – ZYNQ: FPGA with a dual ARM A9 core (MPCore with reconfiguration capabilities)
POOR Parallelization
GOOD Customization
FlexTiles Proposes
A 3D stacked chip based on:
A manycore layer
GPPs
DSPs
An FPGA layer
A 3D-NoC
GOOD Parallelization
GOOD Customization
Customization at low price
Opportunity: self-adaptive capabilities
Future application needs
Consortium and questions
Partners & Third Party   Country          Main scientific and technical contributions
THALES                   France           Infrastructure and applications
KIT                      Germany          Virtualisation layer
TUE                      Netherlands      Kernel; NoC
CSEM                     Switzerland      DSP
CEA                      France           NoC; 3D stacking
UR1                      France           Reconfigurable technology
SUNDANCE                 United Kingdom   FPGA demonstrator
ACE                      Netherlands      Parallelisation and compilation tools
RUB                      Germany          Integration; FPGA scheduling
9 partners in 5 countries
3D IC: Some Definitions
3D-IC chip stacking:
Adding a new vertical interconnect chain in CMOS: Through Silicon Via (TSV) + microbump
[Figure: cross-sections of a 3D stack; labels: Top Die, Silicon Interposer, Cu Pillars, TSV, Substrate, backside micro-bump, bump]
BEOL: Back-End-Of-Line
FEOL: Front-End-Of-Line
TSV Technologies
Via First TSV (polysilicon filled)
AR 20, 5 x 100 µm
Processed before CMOS front-end steps
Pitch: ~10 µm
Density: 10000 TSV/mm²
SOI substrate, high-voltage
AR 7, 2 x 15 µm
Via Middle TSV (copper filled)
AR 10, 10 x 100 µm
Processed after CMOS front-end steps
Pitch: 40 µm to 50 µm
Density: 500 TSV/mm²
Best flexibility in layout and design
Higher density of I/Os
Via Last TSV (copper liner)
AR 1, 80 x 80 µm; AR 2, 60 x 120 µm; AR 3, 40 x 120 µm
Processed after metallization
Pitch: ~100 µm
Density: 100 TSV/mm²
Minimal impact on circuit layout
3 families of TSVs with different characteristics
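The figures quoted for the three TSV families can be cross-checked with simple arithmetic. The sketch below assumes that the aspect ratio (AR) is depth divided by diameter and that TSVs sit on a square grid with one via per pitch x pitch cell; both are assumptions made for illustration, not statements from the slides. The function names are hypothetical.

```python
def aspect_ratio(diameter_um: float, depth_um: float) -> float:
    """Aspect ratio (AR) of a via, assumed to be depth divided by diameter."""
    return depth_um / diameter_um


def density_per_mm2(pitch_um: float) -> float:
    """TSVs per mm^2, assuming one TSV per pitch x pitch square cell."""
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (pitch_mm * pitch_mm)


# (family, diameter in um, depth in um) as quoted on the slide
geometries = [
    ("via first", 5, 100),
    ("via middle", 10, 100),
    ("via last", 80, 80),
    ("via last", 60, 120),
    ("via last", 40, 120),
]
for family, diameter, depth in geometries:
    print(f"{family}: {diameter} x {depth} um -> AR {aspect_ratio(diameter, depth):g}")

# (family, pitch in um); 45 um is taken as the middle of the 40-50 um range
for family, pitch in [("via first", 10), ("via middle", 45), ("via last", 100)]:
    print(f"{family}: pitch {pitch} um -> about {density_per_mm2(pitch):.0f} TSV/mm^2")
```

Under the square-grid assumption, the 10 µm and 100 µm pitches reproduce the quoted 10000 and 100 TSV/mm², and a 45 µm pitch gives roughly 494 TSV/mm², close to the quoted 500. The 2 x 15 µm geometry gives 7.5, which the slide quotes as AR 7.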
3D-IC: Stacking Options
Face-to-Face:
Inter-die connection: TSVs are not required
Package connection: TSVs are required to propagate the I/Os
Face-to-Back:
Inter-die connection: TSVs are required
Package connection: flip-chip
[Figure: cross-sections a) and b) of the two stacking options; labels: bulk, metal layers, TSVs, micro-bumps, bumps, package]
TSV Cost Evaluation
[Figure: F2F and F2B stacks; labels: I/O pads, µbuffers, package I/O, package I/O power, 3D I/O, 3D I/O power, top die power, bottom die power]
                    Face-to-Face             Face-to-Back
                    Nb of TSV    Area (%)    Nb of TSV    Area (%)
TOTAL POWER TSVs    1847 (~×2)   1.34%       792          0.57%
TOTAL I/O TSVs      305          0.22%       3603         2.62%
TOTAL TSVs          2152         1.56%       4395 (~×2)   3.19%
Face-to-Face stacking option:
• High number of power TSVs
• Half the total number of TSVs compared to the Face-to-Back alternative
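The totals and the "~×2" annotations in the table can be checked directly from the per-category counts. This is only a consistency check on the numbers quoted in the slide; the dictionary layout and variable names are illustrative.

```python
# TSV counts per stacking option, as quoted in the table
tsv_counts = {
    "face_to_face": {"power": 1847, "io": 305},
    "face_to_back": {"power": 792, "io": 3603},
}

totals = {option: c["power"] + c["io"] for option, c in tsv_counts.items()}

# How many times more power TSVs Face-to-Face needs than Face-to-Back
power_ratio = tsv_counts["face_to_face"]["power"] / tsv_counts["face_to_back"]["power"]
# How many times more TSVs in total Face-to-Back needs than Face-to-Face
total_ratio = totals["face_to_back"] / totals["face_to_face"]

print(totals)
print(f"power TSVs, F2F / F2B: {power_ratio:.2f}")
print(f"total TSVs, F2B / F2F: {total_ratio:.2f}")
```

The sums match the table (2152 and 4395), and both ratios come out slightly above 2, consistent with the "~×2" marks.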
FlexTiles 3D-IC Floorplanning (1/2)
Floorplan validation using Spyglass® Physical (Atrenta)
Early prototyping tool for 3D design exploration
Comparison of Face-to-Face and Face-to-Back 3D-stacking options
Definition of microbump/TSV areas
400 connections to be placed per NoC link
40 µm pitch
6 µm TSV diameter
4 µm keep-out-zone
15 GPPs, 6 DSPs...
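To give a feel for the floorplanning constraint, the sketch below estimates the area one NoC link's connections would occupy. It assumes the 400 connections are placed on a square 20 x 20 grid at the quoted pitch and that the keep-out zone applies on each side of a TSV; neither assumption comes from the slides.

```python
import math

connections_per_link = 400
pitch_um = 40.0
tsv_diameter_um = 6.0
keep_out_um = 4.0

# Assumed layout: a square grid of connections
side_count = math.isqrt(connections_per_link)
assert side_count * side_count == connections_per_link

side_um = side_count * pitch_um            # edge length of the connection array
area_mm2 = (side_um / 1000.0) ** 2         # array area in mm^2

# Footprint of one TSV including its keep-out zone on both sides
footprint_um = tsv_diameter_um + 2 * keep_out_um

print(f"array: {side_count} x {side_count}, {side_um:.0f} um per side, {area_mm2:.2f} mm^2")
print(f"TSV footprint with keep-out: {footprint_um:.0f} um (pitch {pitch_um:.0f} um)")
```

Under these assumptions each link needs an 800 µm x 800 µm array (0.64 mm²), and the 14 µm footprint fits comfortably inside the 40 µm pitch.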
FlexTiles 3D-IC Floorplanning (2/2)
Detailed floorplan snapshots
[Figure: floorplan snapshots of the manycore layer and the reconfigurable layer; dimensions 5500 µm x 4900 µm]
Self-adaptive?
Adapt the architecture to application requests in "real time"
Improve yield and extend the lifetime of sub-micron technologies
Fault tolerance
Increase energy efficiency
Give the right task to the best available processor
Finalize the mapping at runtime
Temperature management: re-mapping
Triplication, voting: fault / error detection
Self-repair: re-mapping taking dead cores into account
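The ideas listed above (runtime mapping to the best available processor, re-mapping around hot or dead cores, and triplication with voting) can be illustrated with a minimal sketch. The `Core` class, the temperature threshold and the selection policy are hypothetical and are not the FlexTiles kernel's actual mechanism.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    kind: str                    # "GPP" or "DSP"
    alive: bool = True
    temperature_c: float = 40.0


def pick_core(cores, preferred_kind, max_temp_c=85.0):
    """Pick a usable core: preferred kind first, then the coolest one."""
    usable = [c for c in cores if c.alive and c.temperature_c < max_temp_c]
    if not usable:
        raise RuntimeError("no usable core left")
    # False sorts before True, so cores of the preferred kind win ties
    return min(usable, key=lambda c: (c.kind != preferred_kind, c.temperature_c))


def majority_vote(results):
    """Triplication and voting: return the majority value or raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority: fault cannot be masked")
    return value


cores = [Core("gpp0", "GPP"), Core("gpp1", "GPP", temperature_c=90.0), Core("dsp0", "DSP")]
first = pick_core(cores, "DSP")      # the DSP is available
cores[2].alive = False               # the DSP dies
second = pick_core(cores, "DSP")     # re-mapping falls back to a cool GPP
voted = majority_vote([7, 7, 9])     # one faulty replica is outvoted
print(first.name, second.name, voted)
```

The selection key is deliberately simple; a real allocator would also weigh load, energy and communication cost.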
How to program it?
FlexTiles … a complete platform
[Figure: platform overview in three parts.
Toolchain: Application; Parallelisation, partitioning; Compilation (relocatable binary code); Synthesis, P&R (relocatable bitstream).
Operating library: Virtualisation layer; Operating Library API; Kernel; Resource Monitoring & Allocation; Hardware Abstraction Layer API; Hardware Abstraction Layer.
Heterogeneous manycore: Hardware Nodes.
Control loop labels: MONITORING, DIAGNOSIS, ACTION, SYSTEM; O = F(L).]
Tool flow and MoC
[Figure: tool flow; boxes:
Application (C code)
C to SpearDE representation conversion (Thales)
Data parallelisation, mapping (Thales); graphic input (manual) + C kernels
Streaming optimisation (ACE)
Compilation & link (ACE)
Architecture representation: master cores (GPP); slave cores (eFPGA, DSP); masters control slaves
Library of IPs; Acc compiler or C2VHDL tools (CSEM / UR1 / RUB)
Architecture configuration GUI (KIT)
Binaries]
Model of Computation & Model of Programming
Optimisation and parallelisation tools work on static applications
Find static clusters inside the applications based on the SDF/CSDF MoC
Bring dynamicity with a higher hierarchical level
Actor: consumes and produces tokens of data with predefined and static rules (SDF, CSDF MoC)
[Figure: a cluster group whose states management selects state 1, state 2 or state 3 on a cluster group event. Legend: actor (Act) ~ task or tasks; static cluster; cluster input/output; cluster group managed by a state management; cluster group input/output]
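Because SDF actors consume and produce tokens at fixed rates, a static schedule can be derived by solving the balance equations: for every edge, firings of the producer times tokens produced must equal firings of the consumer times tokens consumed. The sketch below computes the smallest integer repetition vector for a connected graph. The function name, the edge format and the example graph are illustrative assumptions, not taken from the FlexTiles tools.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """edges: list of (producer, consumer, tokens_produced, tokens_consumed)."""
    rate = {actors[0]: Fraction(1)}   # relative firing rates, anchored on one actor
    pending = list(edges)
    while pending:
        progress = False
        for edge in list(pending):
            src, dst, prod, cons = edge
            if src in rate and dst in rate:
                # Both ends known: the balance equation must already hold
                if rate[src] * prod != rate[dst] * cons:
                    raise ValueError("inconsistent SDF graph")
            elif src in rate:
                rate[dst] = rate[src] * prod / cons
            elif dst in rate:
                rate[src] = rate[dst] * cons / prod
            else:
                continue              # neither end reached yet, retry later
            pending.remove(edge)
            progress = True
        if not progress:
            raise ValueError("graph is not connected")
    # Scale the fractional rates to the smallest integers
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Hypothetical three-actor chain: A -(2:3)-> B -(1:2)-> C
q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
print(q)
```

In this example A fires 3 times, B twice and C once per iteration: A produces 6 tokens that B consumes in two firings, and B produces 2 tokens that C consumes in one firing.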
Model of Programming
[Figure: sensor data flows through cluster groups 1 to 5, including a scatter/gather stage. Each cluster group has a states management block that reacts to events and selects a state (state 1, state 2 or nop) made of a static cluster of actors (Act). Legend: actor; static cluster; cluster input/output; cluster group managed by one state management; cluster group input/output]
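The programming model in the diagram, where each cluster group holds several static clusters and a states management block picks the active one when an event arrives, can be sketched as a small state machine. The class, the event names and the actors below are hypothetical and serve only to illustrate the idea.

```python
class ClusterGroup:
    """A group of static clusters; exactly one state is active at a time."""

    def __init__(self, name, states, transitions, initial):
        self.name = name
        self.states = states            # state name -> list of actor callables
        self.transitions = transitions  # (state, event) -> next state
        self.current = initial

    def on_event(self, event):
        # States management: switch cluster only if a transition is defined
        self.current = self.transitions.get((self.current, event), self.current)
        return self.current

    def fire(self, token):
        # Run the active static cluster as a simple pipeline of actors
        for actor in self.states[self.current]:
            token = actor(token)
        return token


group = ClusterGroup(
    name="cluster group 1",
    states={
        "nop": [],                                       # idle: pass data through
        "state 1": [lambda x: x + 1, lambda x: x * 2],   # two illustrative actors
    },
    transitions={
        ("nop", "target_detected"): "state 1",
        ("state 1", "target_lost"): "nop",
    },
    initial="nop",
)

idle_output = group.fire(3)
group.on_event("target_detected")
active_output = group.fire(3)
print(idle_output, active_output)
```

Each state stays static, so it can still be analysed and optimised with SDF/CSDF tools, while the dynamic behaviour lives only in the transition table.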
Dynamicity at cluster group level
[Figure: cluster groups 1 to 5 between sensor data input and output, each with a states management block driven by events; states shown: state 1, state 2, nop and, between scatter and gather, states 1.1 and 1.2]
Start a new part of the application
[Figure: cluster-group diagram (cluster groups 1 to 5, states management blocks, events, scatter/gather with states 1.1 and 1.2); the nop entry is no longer shown and an additional state 2 cluster of actors appears]
Modification of the behaviour
[Figure: cluster-group diagram after a behaviour change (cluster groups 1 to 5, states management blocks, events, scatter/gather with states 1.1 and 1.2); two additional state 2 clusters of actors are shown]
Modification of the parallelisation level
[Figure: cluster-group diagram after a change of parallelisation level (cluster groups 1 to 5, states management blocks, events); the sub-states 1.1 and 1.2 between scatter and gather are no longer shown]
Model of Programming
[Figure: recap of the cluster-group diagram: sensor data, cluster groups 1 to 5 with states management blocks and events, scatter/gather with states 1.1 and 1.2, and an additional state 2 cluster. Legend: actor; static cluster; cluster input/output; cluster group managed by one state management; cluster group input/output]
Flexible Tiles, Modularity and scalability: common interfaces
[Figure: two tiles (Tile 1, Tile 2) around a NoC. Homogeneous GPP nodes and heterogeneous accelerator nodes (DSP node, dedicated accelerator nodes, eFPGA domain for reconfigurable HW accelerators) attach through generic interfaces (NI, AI). Config. Ctrl., DDR Ctrl. and I/O also attach to the NoC through NIs.]
FlexTiles: a 3D stack chip
[Figure: 3 x 3 grid of tiles forming a homogeneous manycore connected by a NoC, with a 3D stacked reconfigurable layer (new dynamic reconfigurable technology)]
Map Accelerated functions
Duplicate
Migrate
Conclusion
Parallelisation: the only way to reach HPC with low power consumption.
More efficiency only with customisation
ASIC: only affordable for high volumes
Reconfigurable customisation is the solution:
Increases accessibility to the heterogeneous manycore technology
Offers self-adaptive capabilities
Perspectives
Extended FlexTiles architecture with 3D interposer
Flexibility on computation power thanks to manycore chiplets
Specialized chiplets for high-constrained applications
System Architecture
• Scale-out architecture
• Coherent island with remote communications
Chiplet
• Small chips
• Advanced technology node
• Generic
• High volume
• Low cost
Interposer
• Passive or active
• Mature technology node
• Application specific
• Medium volume
• Cost effective assembly
• European fabs
Follow us
Self-Adaptive Heterogeneous Many-Core Technology
Based on Flexible Tiles
www.flextiles.eu www.flextiles.biz