Data Representation and Architecture Modelling


Revision

Binary system
- Conversion
  - Convert decimal to binary
  - Convert binary to decimal and hexadecimal
- Integer representation
  - Unsigned notation
  - Signed notation
  - Excess notation
  - Two's complement
  - Advantages of using two's complement

Floating point representation

What decimal floating point number is represented by the following 32 bits (single precision format)? Show your workings.

1 10000111 00010100000000000000000

- What is the range of negative numbers in this representation?
- Define negative overflow and underflow in this representation.

Solution

Method
- The sign bit is 1, so the number is negative.
- Biased exponent = 10000111 (binary) = 128 + 4 + 2 + 1 = 135.
- Real exponent = 135 - 127 = 8.
- Normalized mantissa = 000 1010 0000 0000 0000 0000.
- Real mantissa = 1.000101 (binary), after restoring the implied leading 1.
- Value represented = -(1.000101 (binary) × 2^8) = -100010100 (binary) = -(256 + 16 + 4) = -276.

Negative range: -(2 - 2^-23) × 2^127 to -2^-127.

Negative overflow and underflow
- Negative overflow: value less than -(2 - 2^-23) × 2^127.
- Negative underflow: -2^-127 < value < 0.
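The worked decoding above can be reproduced with a short Python sketch. The function name `decode_single` is only illustrative, and the manual decoding assumes a normalized number (implied leading 1, bias 127). Python's `struct` module follows IEEE 754 single precision, which agrees with the hand calculation for this bit pattern.

```python
import struct


def decode_single(bits):
    """Decode a 32-character bit string as a normalized single-precision value."""
    sign = -1 if bits[0] == "1" else 1
    biased_exponent = int(bits[1:9], 2)          # 8 exponent bits
    fraction = int(bits[9:], 2) / 2**23          # 23 mantissa bits after the point
    return sign * (1 + fraction) * 2.0 ** (biased_exponent - 127)


# sign | exponent | mantissa, exactly as in the question
pattern = "1" + "10000111" + "0001010" + "0" * 16

value = decode_single(pattern)
# Cross-check against the machine's own single-precision decoding
reference = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]

print(value, reference)  # -276.0 -276.0
```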

CPU
- CPU registers: PC, IR, AC, MAR, MBR
- System bus: data bus, address bus, and control bus
- Pipelining
  - Role of pipelining
  - Pipelining hazards (control hazards, data hazards, and structural hazards)
  - What is the disadvantage of using a very long pipeline (one with many stages)?

Exercise

Suppose you have designed a processor implementation whose five pipeline stages take the following amounts of time: IF (instruction fetch) = 20 ns, ID (instruction decode) = 10 ns, EX (execution) = 20 ns, MEM (memory operation) = 35 ns and WB (write back) = 10 ns.
(a) What is the minimum clock period for which your processor functions properly?
(b) What should be redesigned first to improve this processor's performance?
(c) Assume this processor is redesigned with 50 pipeline stages. Is it true to say that the new processor is 10 times faster than the previous design with 5 pipeline stages?

Solution
(a) The minimum clock period is the time of the longest stage: MEM takes 35 ns.
(b) The MEM stage should be redesigned first, because it sets the clock cycle.
(c) Probably not.
- Longer pipelines can be faster because they allow higher clock rates, but the clock rate is unlikely to be 10 times higher, because of uneven pipeline stages and pipeline register overheads.
- Longer pipelines tend to make data and control hazards require longer stalls.
- A processor with a higher clock rate is likely to be more power-hungry, roughly in proportion to the increase in clock speed.

Question 2

An instruction requires four stages to execute: stage 1 (instruction fetch) requires 30 ns, stage 2 (instruction decode) = 9 ns, stage 3 (instruction execute) = 20 ns and stage 4 (store results) = 10 ns. An instruction must proceed through the stages in sequence.
- What is the minimum asynchronous time for any single instruction to complete?
- We want to set this up as a pipelined operation. How many stages should we have and at what rate should we clock the pipeline?

Hints

The minimum asynchronous time is the time it takes to execute all 4 stages of one instruction in sequence.
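The stage times from the exercise and from Question 2 can be checked with a small Python sketch. The function names are illustrative, and the model is deliberately simple: it assumes the clock period equals the longest stage time and ignores pipeline register overhead.

```python
def unpipelined_latency_ns(stage_times_ns):
    """Time for one instruction to pass through every stage in sequence."""
    return sum(stage_times_ns)


def min_clock_period_ns(stage_times_ns):
    """A synchronous pipeline cannot be clocked faster than its slowest stage."""
    return max(stage_times_ns)


exercise = {"IF": 20, "ID": 10, "EX": 20, "MEM": 35, "WB": 10}
question2 = {"fetch": 30, "decode": 9, "execute": 20, "store": 10}

for name, stages in (("Exercise", exercise), ("Question 2", question2)):
    times = list(stages.values())
    period = min_clock_period_ns(times)
    slowest = max(stages, key=stages.get)
    print(f"{name}: latency {unpipelined_latency_ns(times)} ns, "
          f"clock period {period} ns (set by {slowest}), "
          f"clock rate {1000 / period:.2f} MHz")
```

Under these assumptions the exercise gives a 35 ns period set by MEM, and Question 2 gives a 69 ns single-instruction time and a 30 ns period set by the fetch stage.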

We have 4 natural stages given and no information on how we might be able to further subdivide them, so we use 4 stages in our pipeline.

Clock rate?
- Use the longest stage, or
- use a clock period close to the shortest stage time that divides evenly into the other stage times.
DISCUSS EACH CASE.

Question 3

The pipeline for these instructions runs with a 100 MHz clock with the following stages: instruction fetch = 2 clocks, instruction decode = 1 clock, fetch operands = 1 clock, execute = 2 clocks, and store result = 1 clock.

Hints for Question 3
- The longest stage takes two cycles, so we can complete one instruction every 2 cycles. What is the rate then?
- With a dependency, the operand fetch unit must wait until the prior instruction stores its result before it can retrieve one of its operands (e.g. Op Fetch for instruction 2 must wait until Op Store for instruction 1 completes). As a result, things begin backing up in the pipeline, and we produce one instruction output only every 4 cycles.
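The two rates asked for in the Question 3 hints follow from dividing the clock rate by the number of cycles between completed instructions. A minimal sketch, with an illustrative function name and assuming a steady stream of instructions:

```python
CLOCK_HZ = 100_000_000  # 100 MHz clock from Question 3


def instructions_per_second(clock_hz, cycles_per_instruction):
    """Steady-state throughput when one instruction completes every N cycles."""
    return clock_hz / cycles_per_instruction


no_dependency = instructions_per_second(CLOCK_HZ, 2)    # limited by the 2-cycle stages
with_dependency = instructions_per_second(CLOCK_HZ, 4)  # operand fetch waits for the store

print(f"No dependencies: {no_dependency / 1e6:.0f} million instructions per second")
print(f"With dependency: {with_dependency / 1e6:.0f} million instructions per second")
```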

No dependencies

Execute one instruction every 2 cycles. Clock rate?

Dependency

From the table, we still begin fetching instructions every two cycles. However, the operand fetch for instruction 2 must wait until Op Store for instruction 1 completes (a wait of another 2 cycles). Hence, the rate?

Memories
- CPU registers
- Cache memory
- Main memory (electronic memory)
- Magnetic memory (hard drive)
- Optical memory
- Magnetic tape

Cache memory

Why is cache memory needed?
- The CPU is slowed down by the main memory.

Cache memory:
- Resides between the CPU and the main memory.
- Operates at a speed near to that of the CPU.
- Data is exchanged between the CPU and main memory through the cache memory.
- Uses the temporal and spatial locality principles to enhance computer performance.

Temporal locality principle
- When a program references a memory location, it is likely to reference that same memory location again soon.
- Cache memory therefore keeps copies of recently used data.

Spatial locality principle
- A memory location that is near a recently referenced location is more likely to be referenced than a memory location that is far away.
- Cache memory therefore copies not only the recently referenced memory location but also its neighbours.

Cache mapping

Commonly used methods:
- Associative Mapped Cache
- Direct-Mapped Cache
- Set-Associative Mapped Cache
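The effect of the two locality principles can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: the cache never runs out of space, a whole block is loaded on every miss, and the 32-byte block size is an assumed value chosen for the illustration. The name `count_hits` is hypothetical.

```python
def count_hits(addresses, block_size=32):
    """Count cache hits when every miss loads the whole enclosing block."""
    cached_blocks = set()
    hits = 0
    for addr in addresses:
        block = addr // block_size
        if block in cached_blocks:
            hits += 1
        else:
            cached_blocks.add(block)  # miss: bring in the block and its neighbours
    return hits


repeated = [100] * 10                   # temporal locality: same location again
sequential = list(range(64))            # spatial locality: neighbouring locations
scattered = list(range(0, 64 * 32, 32)) # no locality: every access is a new block

print(count_hits(repeated))    # 9
print(count_hits(sequential))  # 62
print(count_hits(scattered))   # 0
```

Only the first access to each block misses, so programs that reuse data or walk through neighbouring addresses are served mostly from the cache.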

Associative Mapped Cache
- Any main memory block can be mapped into any cache slot.
- To keep track of which of the 2^27 possible blocks is in each slot, a 27-bit tag field is added to each slot.
- A valid bit is needed to indicate whether or not the slot holds a line that belongs to the program being executed.
- A dirty bit keeps track of whether or not a line has been modified while it is in the cache.
- The mapping from main memory blocks to cache slots is performed by partitioning an address into fields.
- For each slot, if the valid bit is 1, then the tag field of the referenced address is compared with the tag field of the slot.

Example: how an access to the memory location (A035F014)_16 is mapped to the cache. If the addressed word is in the cache, it will be found in word (14)_16 of a slot that has a tag of (501AF80)_16, which is made up of the 27 most significant bits of the address.
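The tag and word values in this example can be checked by splitting the address with shifts and masks. The sketch below assumes a 32-bit address made of the 27-bit tag and a 5-bit word field (the word width used in the direct-mapped example of the same cache); `split_associative` is an illustrative name.

```python
WORD_BITS = 5  # assumed: 32-bit address = 27-bit tag + 5-bit word field


def split_associative(addr):
    """Split an address into (tag, word) for the associative mapped cache."""
    word = addr & ((1 << WORD_BITS) - 1)  # low 5 bits select the word in the slot
    tag = addr >> WORD_BITS               # remaining 27 bits identify the block
    return tag, word


tag, word = split_associative(0xA035F014)
print(f"tag={tag:X} word={word:X}")  # tag=501AF80 word=14
```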

Advantages of the associative mapped cache
- Any main memory block can be placed into any cache slot.
- Regardless of how irregular the data and program references are, if a slot is available for the block, it can be stored in the cache.

Disadvantages of the associative mapped cache
- Considerable hardware overhead is needed for cache bookkeeping.
- There must be a mechanism for searching the tag memory in parallel.

Direct-Mapped Cache
- Each cache slot corresponds to an explicit set of main memory blocks.
- In our example we have 2^27 memory blocks and 2^14 cache slots.
- A total of 2^27 / 2^14 = 2^13 main memory blocks can be mapped onto each cache slot.
- The 32-bit main memory address is partitioned into a 13-bit tag field, followed by a 14-bit slot field, followed by a 5-bit word field.
- When a reference is made to a main memory address, the slot field identifies in which of the 2^14 slots the block will be found.
- If the valid bit is 1, then the tag field of the referenced address is compared with the tag field of the slot.

Example: how an access to memory location (A035F014)_16 is mapped to the cache. If the addressed word is in the cache, it will be found in word (14)_16 of slot (2F80)_16, which will have a tag of (1406)_16.
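The direct-mapped lookup can be sketched in Python using the 13/14/5-bit field split from the example. The class and function names are illustrative, and the model stores only tags: a slot with no stored tag stands for a valid bit of 0.

```python
WORD_BITS = 5
SLOT_BITS = 14


def split_direct(addr):
    """Split a 32-bit address into (tag, slot, word) for the direct-mapped cache."""
    word = addr & ((1 << WORD_BITS) - 1)
    slot = (addr >> WORD_BITS) & ((1 << SLOT_BITS) - 1)
    tag = addr >> (WORD_BITS + SLOT_BITS)
    return tag, slot, word


class DirectMappedCache:
    def __init__(self):
        self.tags = {}  # slot -> tag; a missing slot means valid bit = 0

    def access(self, addr):
        """Return True on a hit; on a miss, load the block into its fixed slot."""
        tag, slot, _ = split_direct(addr)
        hit = self.tags.get(slot) == tag
        self.tags[slot] = tag
        return hit


tag, slot, word = split_direct(0xA035F014)
print(f"tag={tag:X} slot={slot:X} word={word:X}")  # tag=1406 slot=2F80 word=14

cache = DirectMappedCache()
other = 0xA035F014 ^ (1 << 19)   # same slot, different tag
print(cache.access(0xA035F014))  # False (first access misses)
print(cache.access(0xA035F014))  # True
print(cache.access(other))       # False (evicts the first block)
print(cache.access(0xA035F014))  # False (the two blocks keep replacing each other)
```

Only one tag comparison is needed per access, but two blocks that share a slot cannot be cached at the same time.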

Advantages of the direct-mapped cache
- Simple and inexpensive.
- The tag memory is much smaller than in an associative mapped cache.
- No need for an associative search, since the slot field is used to direct the comparison to a single field.

Disadvantages of the direct-mapped cache
- Fixed location for a given memory block.
- If a program repeatedly accesses 2 blocks that map to the same slot, the cache miss rate is very high.

Set-Associative Mapped Cache
- Combines the simplicity of direct mapping with the flexibility of associative mapping.
- For this example, two slots make up a set. Since there are 2^14 slots in the cache, there are 2^14 / 2 = 2^13 sets.
- When an address is mapped to a set, the direct mapping scheme is used, and then associative mapping is used within the set.
- The address format has a 14-bit tag field, followed by a 13-bit set field, which identifies the set in which the addressed word will be found, followed by a 5-bit word field.
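For comparison with the other two schemes, the same example address can be split using the set-associative format. The slides do not give these field values; the sketch below computes them from the stated field widths (14-bit tag, 13-bit set, 5-bit word), and `split_address` is an illustrative name.

```python
def split_address(addr, set_bits=13, word_bits=5):
    """Split a 32-bit address into (tag, set index, word) fields."""
    word = addr & ((1 << word_bits) - 1)
    set_index = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag = addr >> (word_bits + set_bits)  # remaining 14 bits
    return tag, set_index, word


tag, set_index, word = split_address(0xA035F014)
print(f"tag={tag:X} set={set_index:X} word={word:X}")  # tag=280D set=F80 word=14
```

The set field is one bit narrower than the direct-mapped slot field, and that bit moves into the tag, which is then compared against both slots of the selected set.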

Typical exam questions
- Explain the difference between direct mapped cache and associative mapped cache.
- Explain how cache memory uses temporal and spatial locality principles to enhance computer performance.

Web languages (HTML, XML, XHTML)
- Differences between these languages
- Disadvantages of using HTML
- How does XHTML solve these problems?
- Advantages of CSS
- Difference between HTML selectors, class selectors and ID selectors

HTML selector:
h1 { background-color: green; color: red; font-weight: bold; }

Class selector:
.section { color: red; font-weight: bold; }

ID selector:
#section { color: red; font-weight: bold; }

An ID selector applies styles to an element in the same way as a class selector. The main difference is that an ID can be used only once on each page, whereas a class can be used many times.

Computer networks
- Network classes and default masks
- TCP/IP model (internet model)
  - The role of each layer
  - Examples of protocols at each layer and their roles
- TCP vs UDP
  - How are error and flow control achieved? Which layer is responsible for this?
- Subnetting
  - Role of subnetting
  - Subnet address
  - Host address
  - Broadcast address
  - Range of addresses in a subnet

Exercise

Given a host configuration with an IP address 192.158.15.33 and a subnet mask 255.255.255.248:
- What is the subnet address?
- What is the host address?
- What is the broadcast address?
- What is the number of possible hosts and the range of host addresses in this subnet?

Solution
- Subnet address: 192.158.15.32
- Host address: 0.0.0.1
- Broadcast address: 192.158.15.39
- The number of host bits is 3, so the number of hosts allowed in this subnet is 2^3 - 2 = 6. The range of host addresses is 192.158.15.33 to 192.158.15.38.

Exam
- Duration: 1:30 hours
- 3 questions: 30 minutes each
- Time: May

Preparation:
- Past exam papers
- Revise all the questions given in the two assignments
- Consult the revision slides
- Concentrate on the preparation list
- Attempt the mock exam on my website
- Next week: mock exam
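The subnetting exercise can be checked with Python's standard `ipaddress` module, using the address and mask exactly as given in the question. This is a sketch for self-checking, not part of the expected exam working; the variable names are illustrative.

```python
import ipaddress

# Address and mask from the exercise
iface = ipaddress.ip_interface("192.158.15.33/255.255.255.248")
net = iface.network

# Host part: the bits of the address not covered by the subnet mask
host_part = ipaddress.IPv4Address(int(iface.ip) & int(net.hostmask))
hosts = list(net.hosts())  # usable addresses, excluding subnet and broadcast

print(net.network_address)              # 192.158.15.32
print(host_part)                        # 0.0.0.1
print(net.broadcast_address)            # 192.158.15.39
print(len(hosts), hosts[0], hosts[-1])  # 6 192.158.15.33 192.158.15.38
```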

Fin Good Luck
