Lecture 04: ISA Principles Supplements
CSE564 Computer Architecture Summer 2017
Department of Computer Science and Engineering
Yonghong Yan
[email protected]/~yan
Contents
1. Introduction
2. Classifying Instruction Set Architectures
3. Memory Addressing
4. Type and Size of Operands
5. Operations in the Instruction Set
6. Instructions for Control Flow
7. Encoding an Instruction Set
8. Crosscutting Issues: The Role of Compilers
9. RISC-V ISA
• Supplements
Lecture 03 Supplements
• MIPS ISA
• RISC vs CISC
• Compiler compilation stages
• ISA Historical – Appendix L
• Comparison of ISA – Appendix K
Putting it all together: the MIPS architecture (A simple 64-bit load-store architecture)
• Use general-purpose registers with a load-store architecture
• Support these addressing modes: displacement (with address offset of 12-16 bits), immediate (size 8-16 bits), and register indirect.
• Support these data sizes and types: 8-, 16-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
• Support these simple instructions: load, store, add, subtract, move register-register, and shift.
• Compare equal, compare not equal, compare less, branch, jump, call, and return.
• Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.
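The displacement and register indirect modes listed above can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and the 16-bit offset width is an assumption taken from the upper end of the 12-16 bit range quoted in the list.

```c
#include <stdint.h>

/* Displacement addressing: effective address = base register + offset.
   Assumption: the offset is a signed 16-bit field. */
uint64_t effective_address(uint64_t base, int16_t displacement)
{
    /* Going through int64_t sign-extends a negative offset; the addition
       then wraps modulo 2^64, as unsigned address arithmetic does. */
    return base + (uint64_t)(int64_t)displacement;
}

/* Register indirect is the special case of a zero displacement. */
uint64_t register_indirect(uint64_t base)
{
    return effective_address(base, 0);
}
```

For example, `effective_address(0x1000, -8)` yields `0x0FF8`, which is how a load can reach a stack slot just below a base pointer.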
MIPS emphasized
• A simple load-store instruction set
• Design for pipelining efficiency
• Efficiency as a compiler target.
[Figure: Instruction layout for MIPS]
[Figure: The load and store instructions in MIPS]
[Figure: Examples of arithmetic/logical instructions]
[Figure: Typical control flow instructions in MIPS]
[Figure: Subset of the instructions in MIPS64]
[Figure: MIPS dynamic instruction mix for five SPECint2000 programs]
[Figure: MIPS dynamic instruction mix for five SPECfp2000 programs]
[Figure: Graphical display of instructions]
[Figure: Ratio of execution time and code size for compiled code versus handwritten code]
Summary: Instruction Set Design (MIPS)
• Use general-purpose registers with a load-store architecture: YES
• Provide at least 16 general-purpose registers plus separate floating-point registers: 31 GPR & 32 FPR
• Support basic addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register deferred: YES: 16 bits for immediate, displacement (disp = 0 => register deferred)
• All addressing modes apply to all data transfer instructions: YES
• Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size: Fixed
• Support these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating-point numbers: YES
• Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return: YES
• Aim for a minimalist instruction set: YES
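To make the "Fixed" encoding and the 16-bit immediate in the summary concrete, here is a minimal sketch that packs one 32-bit word in the classic MIPS I-type layout (6-bit opcode, two 5-bit register fields, 16-bit immediate). The helper names are hypothetical, and the opcode in the usage note is the MIPS32 `lw` opcode, chosen for illustration; the MIPS64 subset shown in the slides may use other mnemonics.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an I-type instruction: opcode(6) | rs(5) | rt(5) | immediate(16). */
uint32_t encode_i_type(uint32_t opcode, uint32_t rs, uint32_t rt, int16_t imm)
{
    assert(opcode < 64 && rs < 32 && rt < 32);  /* fields must fit their widths */
    return (opcode << 26) | (rs << 21) | (rt << 16) | (uint16_t)imm;
}

/* Recover the signed 16-bit immediate from an encoded word. */
int16_t decode_immediate(uint32_t word)
{
    uint16_t low = (uint16_t)(word & 0xFFFFu);
    /* Sign-extend by hand so the result does not rely on
       implementation-defined narrowing conversions. */
    return (low & 0x8000u) ? (int16_t)((int32_t)low - 0x10000) : (int16_t)low;
}
```

With opcode `0x23`, base register 29 and target register 8, `encode_i_type(0x23, 29, 8, 4)` gives `0x8FA80004`, the word for `lw $t0, 4($sp)`. Because every instruction has the same width and the fields sit at fixed bit positions, a decoder can extract them with shifts and masks before it knows which instruction it is looking at.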
RISC vs. CISC
• CISC (complex instruction set computer)
  – VAX, Intel X86, IBM 360/370, etc.
• RISC (reduced instruction set computer)
  – MIPS, DEC Alpha, SUN Sparc, IBM 801
• Characteristics of ISAs

CISC                         RISC
Variable length instruction  Single word instruction
Variable format              Fixed-field decoding
Memory operands              Load/store architecture
Complex operations           Simple operations
RISC vs. CISC Instruction Set Design
• The historical background:
  – In first 25 years (1945-70) performance came from both technology and design.
  – Design constraints:
    • small and slow memories: compact programs are fast.
    • small no. of registers: memory operands.
    • attempts to bridge the semantic gap: model high level language features in instructions.
    • no need for portability: same vendor application, OS and hardware.
    • backward compatibility: every new ISA must carry the good and bad of all past ones.
  – Result: powerful and complex instructions that are rarely used.
  – IC technology and microprocessors in 1970s: lower costs, low power consumption, higher clock rates, cheaper and larger memories.
RISC vs. CISC Instruction Set Design
• Emergence of RISC
  – Very large scale integration (processor on a chip): silicon real-estate at a premium. Micro-store occupies about 70% of chip area: replace micro-store with registers ==> load/store ISA.
  – Increased difference between CPU and memory speeds.
  – Complex instructions were not used by new compilers.
  – Software changes:
    • reduced reliance on assembly programming, new ISA can be introduced.
    • standardized vendor independent OS (Unix) became very popular in some market segments (academia and research) – need for portability
  – Early RISC projects: IBM 801 (America), Berkeley SPUR, RISC I and RISC II and Stanford MIPS.
Complex vs. Simple Instructions
• Complex instruction: An instruction does a lot of work, e.g. many operations
  – Insert in a doubly linked list
  – Compute FFT
  – String copy
• Simple instruction: An instruction does a small amount of work; it is a primitive from which complex operations can be built
  – Add
  – XOR
  – Multiply
Complex vs. Simple Instructions
• Advantages of Complex instructions
  + Denser encoding -> smaller code size -> better memory utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions)
  + Simpler compiler: no need to optimize small instructions as much
• Disadvantages of Complex Instructions
  - Larger chunks of work -> compiler has less opportunity to optimize (limited in fine-grained optimizations it can do)
  - More complex hardware -> translation from a high level to control signals and optimization needs to be done by hardware
ISA-level Tradeoffs: Semantic Gap
• Where to place the ISA? Semantic gap
  – Closer to high-level language (HLL) -> Small semantic gap, complex instructions
  – Closer to hardware control signals? -> Large semantic gap, simple instructions
• RISC vs. CISC machines
  – RISC: Reduced instruction set computer
  – CISC: Complex instruction set computer
    • FFT, QUICKSORT, POLY, FP instructions?
    • VAX INDEX instruction (array access with bounds checking)
ISA-level Tradeoffs: Semantic Gap
• Some tradeoffs (for you to think about)
• Simple compiler, complex hardware vs. complex compiler, simple hardware
  – Caveat: Translation (indirection) can change the tradeoff!
• Burden of backward compatibility
• Performance?
  – Optimization opportunity: Example of VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?
  – Instruction size, code size
X86: Small Semantic Gap: String Operations
• An instruction operates on a string
  – Move one string of arbitrary length to another location
  – Compare two strings
• Enabled by the ability to specify repeated execution of an instruction (in the ISA)
  – Using a “prefix” called REP prefix
• Example: REP MOVS instruction
  – Only two bytes: REP prefix byte and MOVS opcode byte (F3 A4)
  – Implicit source and destination registers pointing to the two strings (ESI, EDI)
  – Implicit count register (ECX) specifies how long the string is
X86: Small Semantic Gap: String Operations
REP MOVS (DEST SRC)
How many instructions does this take in MIPS?
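One way to approach the question is to write out what the single REP MOVS instruction does as an explicit loop. The sketch below models the byte variant with the direction flag assumed clear (addresses increasing); the function name is hypothetical, and the parameters stand in for the implicit registers EDI, ESI and ECX.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise model of REP MOVSB: dst plays the role of EDI,
   src of ESI, and count of ECX. */
uint8_t *rep_movsb_model(uint8_t *dst, const uint8_t *src, size_t count)
{
    while (count != 0) {   /* REP: repeat until the count reaches zero */
        *dst++ = *src++;   /* MOVSB: copy one byte, advance both pointers */
        count--;           /* the count is decremented on each iteration */
    }
    return dst;            /* final destination pointer, like EDI afterwards */
}
```

On a load-store ISA each part of the loop body becomes at least one instruction of its own (a load, a store, the pointer updates, the count update and a branch), and those instructions are executed once per element. The x86 version encodes the whole loop in two bytes.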
Small Semantic Gap Examples in VAX
• FIND FIRST
  – Find the first set bit in a bit field
  – Helps OS resource allocation operations
• SAVE CONTEXT, LOAD CONTEXT
  – Special context switching instructions
• INSQUEUE, REMQUEUE
  – Operations on doubly linked list
• INDEX
  – Array access with bounds checking
• STRING Operations
  – Compare strings, find substrings, …
• Cyclic Redundancy Check Instruction
• EDITPC
  – Implements editing functions to display fixed format output
• Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.
Small versus Large Semantic Gap
• CISC vs. RISC
  – Complex instruction set computer -> complex instructions
    • Initially motivated by “not good enough” code generation
  – Reduced instruction set computer -> simple instructions
    • John Cocke, mid 1970s, IBM 801
      – Goal: enable better compiler control and optimization
• RISC motivated by
  – Memory stalls (no work done in a complex instruction when there is a memory stall?)
    • When is this correct?
  – Simplifying the hardware -> lower cost, higher frequency
  – Enabling the compiler to optimize the code better
    • Find fine-grained parallelism to reduce stalls
How High or Low Can You Go?
• Very large semantic gap
  – Each instruction specifies the complete set of control signals in the machine
  – Compiler generates control signals
  – Open microcode (John Cocke, circa 1970s)
    • Gave way to optimizing compilers
• Very small semantic gap
  – ISA is (almost) the same as high-level language
  – Java machines, LISP machines, object-oriented machines, capability-based machines
A Note on ISA Evolution
• ISAs have evolved to reflect/satisfy the concerns of the day
• Examples:
  – Limited on-chip and off-chip memory size
  – Limited compiler optimization technology
  – Limited memory bandwidth
  – Need for specialization in important applications (e.g., MMX)
• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA
  – Concept of dynamic/static interface
  – Contrast it with hardware/software interface
Effect of Translation
• One can translate from one ISA to another ISA to change the semantic gap tradeoffs
• Examples
  – Intel’s and AMD’s x86 implementations translate x86 instructions into programmer-invisible microoperations (simple instructions) in hardware
  – Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW instructions in software (code morphing software)
• Think about the tradeoffs
Compilation Process in C
• Compilation process: gcc hello.c -o hello
  – Constructing an executable image for an application
  – FOUR stages
  – Command: gcc <options> <source_file.c>
• Compiler Tool
  – gcc (GNU Compiler)
    • man gcc (on Linux m/c)
  – icc (Intel C compiler)
4 Stages of Compilation Process
Preprocessing: gcc -E hello.c -o hello.i (hello.c -> hello.i)
Compilation (after preprocessing): gcc -S hello.i -o hello.s
Assembling (after compilation): gcc -c hello.s -o hello.o
Linking object files: gcc hello.o -o hello
Output -> Executable (hello here; a.out if -o is omitted)
Run -> ./hello (Loader)
4 Stages of Compilation Process
1. Preprocessing (Those with #…)
  – Expansion of Header files (#include …)
  – Substitute macros (#define …); inline functions are expanded later, by the compiler
2. Compilation
  – Generates assembly language
  – Verification of functions usage using prototypes
  – Header files: Prototypes declaration
3. Assembling
  – Generates re-locatable object file (contains machine instructions)
  – nm app.o
    0000000000000000 T main
                     U puts
  – nm or objdump tool used to view object files
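The first three stages can be observed on a very small translation unit. The file below is a hypothetical example (call it greet.c), not one from the lecture; it has one macro for the preprocessor to substitute and one library call whose prototype comes from a header.

```c
#include <stdio.h>               /* preprocessing pastes in the header, including the puts prototype */

#define GREETING "Hello, world"  /* preprocessing replaces each use with the string literal */

/* Compilation checks this call against the puts prototype;
   assembling leaves puts as an undefined symbol for the linker. */
int say_hello(void)
{
    return puts(GREETING);       /* puts returns a non-negative value on success */
}
```

Running `gcc -E greet.c` shows the expanded source with GREETING already replaced, `gcc -S greet.c` produces the assembly, and `gcc -c greet.c` followed by `nm greet.o` should list say_hello as defined (T) and puts as undefined (U), much like the nm output above. The file defines no main, so it cannot be linked into an executable on its own.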
4 Stages of Compilation Process (contd..)
4. Linking
  – Generates executable file (nm tool used to view exe file)
  – Binds appropriate libraries
    • Static Linking
    • Dynamic Linking (default)
• Loading and Execution (of an executable file)
  – Evaluate size of code and data segment
  – Allocates address space in the user mode and transfers them into memory
  – Load dependent libraries needed by program and links them
  – Invokes Process Manager -> Program registration
Compiling a C Program
• gcc <options> program_name.c
• Options:
  -Wall: Enables most of the commonly useful warnings (not literally every warning).
  -o output_file_name: By default, an a.out executable file is created when we compile our program with gcc. Instead, we can specify the output file name using the "-o" option.
  -g: Include debugging information in the binary.
• man gcc
Four stages into one
Linking Multiple files to make executable file
• Two programs, prog1.c and prog2.c for one single task
  – To make single executable file using following instructions
First, compile these two files with option "-c"
gcc -c prog1.c
gcc -c prog2.c
-c: Tells gcc to compile and assemble the code, but not link.
We get two files as output, prog1.o and prog2.o
Then, we can link these object files into single executable file using below instruction.
gcc -o prog prog1.o prog2.o
Now, the output is prog executable file.
We can run our program using
./prog
Linking with other libraries
• Normally, compiler will read/link libraries from /usr/lib directory to our program during compilation process.
  – Libraries are precompiled object files
• To link our programs with libraries like pthreads and realtime libraries (rt library).
  – gcc <options> program_name.c -lpthread -lrt
-lpthread: Link with pthread library -> libpthread.so file
-lrt: Link with rt library -> librt.so file
Option here is "-l<library>"
Another option, "-L<dir>", tells gcc to search for library files in the given <dir> directory.
[Figure: Compilation, Linking, Execution of C/C++ Programs. Source files 1..N are compiled into object files 1..N; these are linked (relocation + linking) with library object files 1..M into a load file. Figure note: "usually performed by a compiler, usually in one uninterrupted sequence".]
Source: http://www.tenouk.com/ModuleW.html