1993 an Application-Specific DSP for Portable Applications


  • 7/27/2019 1993 an Application-Specific DSP for Portable Applications


    An Application Specific DSP for Portable Applications

    Yair Be'ery, Shaul Berger, Bat-Sheva Ovadia

    This paper describes the features, the architecture and a few applications of the PINE™ DSP core. PINE is a 16-bit Digital Signal Processor designed for portable telecommunications and consumer electronics products requiring smart power management. It enables low-cost, low-power DSP processing with several levels of modularity in the RAM, ROM and I/O, permitting efficient DSP-based ASIC development. PINE's modular design approach allows the same DSP core to be used for various applications.

    1. INTRODUCTION

    In recent years we have seen a proliferation of compact, powerful consumer electronic devices, including CD players, minidisk players, personal digital assistants (PDAs) and cellular telephone products. These devices push the boundaries of both performance and miniaturization. Given the cost constraints of the consumer markets, they require cost-efficient, performance-optimized DSP hardware and sophisticated, application-tuned DSP algorithms.

    PINE™ is a 16-bit Digital Signal Processor designed for portable telecommunications and consumer electronics products requiring smart power management. It enables low-cost, low-power DSP processing with several levels of modularity in the RAM, ROM and I/O, permitting efficient DSP-based ASIC development.

    Its design was application-driven, based on extensive analysis of the target applications and of the instructions and architectural features essential to those applications. A detailed static and dynamic statistical analysis of instruction usage in the applications was performed. The instruction set was defined based on this analysis and was also enhanced with instructions to support microcontroller functions.

    PINE's modular design approach allows the same DSP core to be used for various applications simply by adding on-chip memory, peripherals and custom logic. Moving quickly from concept to silicon requires the right development tools, especially software development tools, that support this architectural approach.

    2. DSP CORE KEY FEATURES

    The following list describes the main features of the PINE DSP core.

    Technology
      Double metal CMOS technology
      25 ns cycle time @ 5V
      Power Management:
        - fully static design
        - wide operating voltage range 3-5.5V
        - low power dissipation
        - SLOW mode
        - STOP mode

    Architecture
      16-bit fixed-point DSP core with a high level of modularity:
        - expandable Data RAM/ROM
        - expandable Program ROM
        - up to 8 user-defined registers
        - up to 16 levels of stack
      16x16-bit multiplier; single cycle multiply-accumulate instructions
      36-bit ALU; two 36-bit accumulators
      Six general-purpose, 16-bit pointer-registers with two dedicated address arithmetic units for data memory indirect addressing, circular buffering, loop counters, and program memory indirect addressing
      Zero Overhead Looping, REPEAT and BLOCK-REPEAT instructions
      Automatic saturation mode
      Divide and Normalize step support
      Advanced, Windows™-based Development Tools


    3. ARCHITECTURE

    A block diagram of the DSP core is shown in Figure 1. The main blocks of the DSP core are a Computation Unit (CU), which includes ALU, multiplier and accumulators, a Data Address Arithmetic Unit (DAAU), and a Program Control Unit (PCU). All other peripheral blocks including program and data memory, which are application-specific, are defined as part of the user-specific logic, implemented around the core on the same silicon die.

    Data is transferred on the following 16-bit buses: a bidirectional X Data Bus (XDB), and two unidirectional buses - the Y Data Bus (YDB) and the Program Data Bus (PDB). The XDB is the main data bus, where most of the data transfers occur. Instruction word fetches take place in parallel over the PDB. The bus structure can transfer up to two 16-bit words in the same instruction cycle.

    Figure 1. PINE Block Diagram


    The X Address Bus (XAB) and the Y Address Bus (YAB) drive the addresses for XRAM and YRAM. The Program Address Bus (PAB) drives the program memory addresses.

    Computation Unit

    The computation unit is the heart of the DSP. It contains the multiplier, the ALU, and the two accumulators.

    The multiplier unit consists of a 16x16-bit single-cycle, non-pipelined multiplier, two 16-bit input registers (X and Y), a 32-bit output register (P), and an output shifter. Together with the Data ALU, PINE can perform a single-cycle Multiply-Accumulate (MAC) instruction. The P register is updated only after a multiply instruction and not after a change in the X and/or Y registers.

    The P register is sign-extended into 36 bits and then shifted. The output shifter is capable of shifting data from the P register into the ALU.

    The Data ALU performs all arithmetic, logical and shifting operations on data operands. The Data ALU consists of a 36-bit, single-cycle, non-pipelined ALU unit, two 36-bit accumulator registers (A0 and A1), and a saturation unit.

    The ALU can perform a positive or negative accumulate, add, subtract, compare, shift, logical, and several other operations, most of them in one instruction cycle. It uses 2's complement arithmetic.

    Each accumulator is organized as two regular 16-bit registers (A0H, A0L, A1H and A1L) and a 4-bit extension nibble (A0E and A1E). The extension nibbles A0E and A1E offer protection against 32-bit overflow. Saturation arithmetic is provided to selectively limit overflow from the A0H or A1H portion of an accumulator into the A0E or A1E portion, respectively, when reading A0 or A1. The ALU unit supports normalization, division and rounding.
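
The multiply-accumulate path described above can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions: the function names (`to_signed`, `mac`, `saturate32`) are hypothetical, the output shifter is left out, and saturation is modelled as a clamp to the 32-bit range when the accumulator is read. It shows how the 4-bit extension lets a sum grow past 32 bits without wrapping.

```python
ACC_BITS = 36  # accumulator width stated in the text (32 bits + 4-bit extension)


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, x, y):
    """One multiply-accumulate step: acc + x*y, wrapped to 36 bits.

    x and y are 16-bit operands; their signed product always fits in the
    32-bit P register, and is sign-extended before being added.
    """
    p = to_signed(x, 16) * to_signed(y, 16)
    return to_signed(acc + p, ACC_BITS)


def saturate32(acc):
    """Clamp a 36-bit accumulator value to the signed 32-bit range.

    Assumption: this stands in for the saturation unit applied on a read.
    """
    return max(-(1 << 31), min((1 << 31) - 1, acc))


acc = 0
for _ in range(4):
    acc = mac(acc, 0x7FFF, 0x7FFF)  # four large positive products
print(acc, saturate32(acc))
```

After four such products the running sum no longer fits in 32 bits, yet the 36-bit value is still exact; only the saturated read is limited to the largest positive 32-bit value.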

    Data Address Arithmetic Unit

    The Data Address Arithmetic Unit (DAAU) performs all address calculations necessary to address data operands in data and program memories. In addition, it supports loop counter operations. This unit operates in parallel with other core resources to minimize address generation overhead. The DAAU can implement linear and modulo arithmetic. The DAAU contains six address registers for indirect addressing, and two configuration registers for modulo and increment/decrement step control. The registers are divided into two groups for simultaneous addressing over XAB and YAB (or PAB).

    The DAAU can generate two addresses every instruction cycle which can be post-modified by two modifiers: linear and modulo. The address modifiers allow the creation of data structures in memory for circular buffers, delay lines, FIFOs, software stacks, etc.

    The modulo modifier implemented in PINE is a simplified mechanism especially designed to achieve small silicon area [1]. It requires that the modulo size should be an integer multiple of the step size. This restriction has a minor effect on algorithms' implementation, but has a major effect on hardware complexity.
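
The paper does not spell out how the simplified modulo modifier works; the details are in the referenced patent application. The sketch below is therefore only one plausible illustration of why the restriction helps, with a hypothetical function name and parameters. When the buffer size is a multiple of the step and the pointer starts on a step boundary, the pointer always lands exactly on the buffer boundary, so a single equality comparison can replace a general modulo calculation.

```python
def post_modify(addr, base, size, step):
    """Return the next address inside the circular buffer [base, base + size).

    Hypothetical model, not the patented mechanism. It assumes `addr` is
    aligned, i.e. (addr - base) is a multiple of abs(step).
    """
    if step == 0 or size % abs(step) != 0:
        raise ValueError("modulo size must be an integer multiple of the step size")
    nxt = addr + step
    if step > 0 and nxt == base + size:  # stepped exactly onto the upper boundary
        return base
    if step < 0 and addr == base:        # stepping down from the lower boundary
        return base + size + step
    return nxt


# Walk an 8-word delay line at 0x100 with a step of 2.
addr = 0x100
trace = []
for _ in range(6):
    trace.append(addr)
    addr = post_modify(addr, 0x100, 8, 2)
print([hex(a) for a in trace])
```

A step that does not divide the buffer size is rejected, which mirrors the restriction stated in the text.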

    Program Control Unit

    The Program Control Unit (PCU) performs instruction fetch, instruction decoding, exception handling, and wait state support. In addition, it supports a mechanism for internal PROM protection.

    The PCU generates the next address to the program memory and controls hardware loops. It contains the Repeat/Block-Repeat unit, and two 16-bit registers: the Program Counter and the Loop Counter.

    The Repeat/Block-Repeat unit performs hardware-loop calculations and control, with no overhead other than the one-time execution of the set-up instructions REP and BKREP. It consists of two 16-bit dedicated registers for the start and end addresses of the block-repeat, and two 8-bit repeat and block-repeat counters.

    The program controller implements a three-level pipeline architecture. In the operation of the pipeline, instruction fetch, operand fetch and execution occur concurrently. This allows instruction execution to overlap. Thus, the effective execution time for most instructions is one cycle. Each pipeline stage is completed before its result is needed by the next instruction. The pipeline is an interlocking pipeline, transparent to the user, which simplifies programming.
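
The effect of the three-level pipeline on throughput can be illustrated with a small model. This is a sketch under idealized assumptions (one cycle per stage, no wait states or interlock stalls); the function names are hypothetical and the stage labels simply follow the wording above.

```python
STAGES = ("fetch", "operand fetch", "execute")


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to run n instructions through an ideal pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + n_stages - 1


def schedule(n_instructions):
    """For each cycle, list which instruction index occupies each stage."""
    rows = []
    for cycle in range(total_cycles(n_instructions)):
        row = {}
        for depth, name in enumerate(STAGES):
            index = cycle - depth
            row[name] = index if 0 <= index < n_instructions else None
        rows.append(row)
    return rows


for cycle, row in enumerate(schedule(3)):
    print(cycle, row)
print(total_cycles(100))
```

Once the pipeline is full, one instruction completes per cycle, so a long instruction stream costs only two cycles more than its instruction count in this model.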


    Memory Organization

    Two independent 64K word memory spaces are available: the data space (XRAM and YRAM) and the program space (PROM).

    The data space of PINE has a unique and novel structure [2]. It is divided into an X data space for the XRAM, and a Y data space for the YRAM, both residing in the 64K word data memory space. The XRAM space has an internal space (on-core data RAM or ROM) of 1K words, and an external off-core space of 62K words. The YRAM space is 1K words, internal only (RAM or ROM). This data space partition allows modular expansion of the internal XRAM and YRAM, and at the same time enables the two RAMs to be viewed as a single continuous data RAM. This data structure virtually emulates a dual-port RAM but consumes about 70% of its silicon area.

    PINE core peripherals are memory-mapped I/O in the data space and depend on the Application Specific DSP (ASDSP) configuration.

    The program memory PROM can be implemented as internal and/or external memory up to a total of 64K words.

    The PINE DSP core supports eight optional user-definable registers, which can be located off-core. This enables expansion of the core. These registers appear in the data register fields of all relevant instructions. With these registers, external computation units can be loaded with data and read at the end of the computation directly into internal registers in a single cycle. Operations such as parity calculation, location of the first 1-bit in a word, special shifts, and min/max/mid value can be easily performed in parallel in a few cycles with the PINE core.
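
The region sizes quoted above add up to the full 64K word data space, which a short sketch can make concrete. The sizes come from the text, but the placement of the regions is purely an assumption made for illustration: the paper does not give address ranges, so the ordering used here (internal X at the bottom, internal Y at the top) and the `region` helper are hypothetical.

```python
K = 1024
DATA_SPACE_WORDS = 64 * K

# Region sizes as stated in the text.
X_INTERNAL_WORDS = 1 * K
X_EXTERNAL_WORDS = 62 * K
Y_INTERNAL_WORDS = 1 * K

# Hypothetical placement: the paper does not specify where each region sits.
X_INTERNAL = range(0, X_INTERNAL_WORDS)
X_EXTERNAL = range(X_INTERNAL.stop, X_INTERNAL.stop + X_EXTERNAL_WORDS)
Y_INTERNAL = range(X_EXTERNAL.stop, X_EXTERNAL.stop + Y_INTERNAL_WORDS)


def region(addr):
    """Classify a 16-bit data address under the assumed layout."""
    if not 0 <= addr < DATA_SPACE_WORDS:
        raise ValueError("address outside the 64K word data space")
    if addr in X_INTERNAL:
        return "X internal"
    if addr in X_EXTERNAL:
        return "X external"
    return "Y internal"


print(len(X_INTERNAL) + len(X_EXTERNAL) + len(Y_INTERNAL) == DATA_SPACE_WORDS)
print(region(0x0000), region(0x0400), region(0xFFFF))
```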

    Power Management

    PINE has two power-saving operating modes, the SLOW and STOP modes.

    SLOW mode: The PINE clock can be slowed down by writing a 4-bit value to a special memory-mapped register located external to the core. The clock division factor can be 1, 2, ... 16. The core power dissipation is reduced as follows:

    3 + 22/N mA for 5V operation, and
    1.3 + 9.7/N mA for 3.3V operation,

    where N is the clock division factor.
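
As a worked example of the SLOW mode expressions, the helper below evaluates the core current for a given division factor. It is a sketch with a hypothetical function name. The 3.3V expression is read as 1.3 + 9.7/N by analogy with the 5V one, which is an assumption because the operator is not legible in the source.

```python
def core_current_ma(n, supply="5V"):
    """Core current in mA for clock division factor n (1..16)."""
    if not 1 <= n <= 16:
        raise ValueError("clock division factor must be between 1 and 16")
    if supply == "5V":
        return 3 + 22 / n
    if supply == "3.3V":
        return 1.3 + 9.7 / n  # '+' assumed by analogy with the 5V expression
    raise ValueError("supply must be '5V' or '3.3V'")


for n in (1, 2, 4, 8, 16):
    print(n, core_current_ma(n, "5V"), core_current_ma(n, "3.3V"))
```

The constant term sets a floor that clock division cannot remove, so the saving flattens out at large N.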


    STOP mode: Since the PINE core VLSI design is fully static, the clock can be stopped by setting a special bit. The reset signal will reactivate the core. The RAM content and all registers which are not defined as being affected by the reset will remain unaffected.

    Instruction Set

    PINE's 16-bit instruction encoding has been optimized to support the highest parallelism allowed by the architecture. Several of the most common DSP benchmarks are shown in Table 1. The instructions fall into 6 groups: arithmetic and logical, multiply and multiply-accumulate, move, branch and call, loop, and control. The instruction set was designed to support both DSP requirements and general-purpose control functions.

    PINE DSP Benchmark            Execution Time
    N taps FIR filter             N x 25 ns
    N taps LMS                    4N x 25 ns
    N taps complex FIR filter     4N x 25 ns
    N Cascaded IIR Biquads        5N x 25 ns

    Table 1. PINE Benchmarks
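
To turn the Table 1 entries into concrete figures, the short sketch below multiplies the per-unit cycle counts by the 25 ns cycle time. The dictionary keys and the function name are hypothetical labels; the multipliers are read from the table as N, 4N, 4N and 5N.

```python
CYCLE_NS = 25  # cycle time quoted for 5V operation

# Cycles per tap (or per biquad section), as read from Table 1.
CYCLES_PER_UNIT = {
    "fir": 1,
    "lms": 4,
    "complex_fir": 4,
    "iir_biquads": 5,
}


def execution_time_ns(benchmark, n):
    """Execution time in nanoseconds for n taps or n biquad sections."""
    return CYCLES_PER_UNIT[benchmark] * n * CYCLE_NS


for name in CYCLES_PER_UNIT:
    print(name, execution_time_ns(name, 64), "ns")
```

For example, a 64-tap FIR filter takes 64 cycles per output sample in this model.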

    4. APPLICATIONS

    The PINE architecture, which provides powerful DSP processing while keeping power dissipation low, is attractive to many portable applications such as mobile computing, cellular phones, wireless PBX, wireless data, Personal Digital Assistants (PDAs) and personal audio. Following are two applications which benefit from this architecture.

    Cellular Communications

    The new digital cellular standards call for state-of-the-art speech coding, channel coding and modem techniques needed to implement speech coders such as VSELP or half-rate GSM, channel coders and modems such as DQPSK. The speech and channel coders can be realized most efficiently with DSP techniques. Similarly, the modulation/demodulation schemes set forth in the various standards are best implemented with DSP-based solutions. Integrating all these functions in a single application specific DSP, tightly tailored to meet the customer requirements, will provide the best solution at the lowest possible cost. Additionally, the reduced battery power requirements will extend battery life or, alternatively, permit the batteries to be smaller. Other capabilities soon to appear on the market, including acoustical echo cancellation, voice-activated dialing and noise reduction, are all DSP-based and can be easily integrated either by adding more software (more program memory) or by adding more on-chip peripherals to support these functions.

    Personal Digital Assistants

    The market for handheld computers, sometimes called Personal Digital Assistants, is expected to grow from 2 million units to over 10 million units annually by 1995. These mobile, handheld information management devices offer multiple interfaces such as pen, voice, mouse and keyboard. By incorporating wireless communication technologies such as cellular or data packet radio, they can allow the user to access information services and send and receive email, faxes and voice messages.

    Because these new systems push the absolute limits of performance, integration and low power consumption, they present a formidable challenge to IC suppliers and systems designers alike. The DSP processing requirements for such advanced capabilities as speech and handwriting recognition, on top of modem and wireless capabilities, are substantial and require a highly optimized DSP-based solution. An ASDSP solution is essential to achieve the high degree of integration required by these pico-computers.

    5. SUMMARY

    Faced with escalating demands for high performance, high functionality, miniaturization and smart power utilization, designers are becoming increasingly aware of the limitations of general-purpose DSP architectures. PINE was developed specifically to fill this growing need. Its highly optimized, modular architecture, combined with a complete set of development tools, will permit designers not only to achieve their performance, size and power goals, but to meet the ever shrinking windows of opportunity in today's consumer electronics markets.


    References

    [1] G. Wertheizer, Y. Be'ery, B. Ovadia, Y. Gross, R. Perets, Y. Milstein, A simplified cyclical buffer, U.S.A. Patent application, Serial No. 003 640, January 13, 1993.

    [2] R. Perets, Y. Be'ery, B. Ovadia, Y. Gross, Y. Milstein, G. Wertheizer, An expandable memory for a digital signal processor, U.S.A. Patent application, June 1993.

    Acknowledgements

    The authors would like to thank Shabtai Adlersberg, Bo w Ben-Nun, Carina Ben-Zvi, Eitan Eliyahu, Yael Gross, Asher Kaminker, Yakov Milstein, Avi Mintz, Ronen Perets, Shimon Raviv, and Gideon Wertheizer for their contributions to the definition and development of the PINE DSP.

    Yair Be'ery*, Bat-Sheva Ovadia
    DSP Semiconductors, Ltd
    Ben Gurion St.
    Givat Shmuel, 51905
    Israel
    e-mail: ybeery@mcimail.com

    * and Tel Aviv University
    Dept. of Electrical Engineering
    Ramat Aviv, 69978
    Israel
    e-mail: ybeery@eng.tau.ac.il

    Shaul Berger
    DSP Semiconductors USA, Inc.
    2855 Kifer Rd., Suite 202
    Santa Clara, CA 95051
    USA
    e-mail: shaul-berger@mcimail.com

    PINE is a trademark of DSP Semiconductors Inc.
