
Master Thesis

Open Core Platform based on OpenRISC Processor and DE2-70 Board

Xiang LI

Company: ENEA

University: Royal Institute of Technology, School of Information and Communication Technology, Stockholm, Sweden

Industry Supervisor: Johan Jorgensen

KTH Supervisor & Examiner: Ingo Sander

Master Thesis Number: TRITA-ICT-EX-2011:62


Abstract

The trend of IP core reuse has been accelerating for years because of the increasing complexity of System-on-Chip (SoC) designs. As a result, many IP cores of different types have been produced. Meanwhile, similar to the free software movement, an open core community has emerged because some designers choose to share their IP cores under open source licenses. Open cores are growing fast thanks to inherently attractive properties such as an accessible internal structure and, usually, no license cost.

Against this background, this master thesis was proposed by the company ENEA (Malmo/Lund branch), Sweden. It was intended to evaluate the quality of the open cores, as well as the difficulty and feasibility of building an embedded platform exclusively from open cores.

We contributed such an open core platform. It includes 5 open cores from the OpenCores organization: the OpenRISC OR1200 processor, the CONMAX WISHBONE interconnection IP core, the Memory Controller IP core, UART16550, and the General Purpose IOs (GPIO) IP core. In addition, we added support for the DM9000A and WM8731 ICs to provide Ethernet and audio features. On the software side, the uC/OS-II RTOS and the uC/TCP-IP stack have been ported to the platform. The OpenRISC toolchain for software development was tested, and an MP3 music player application was created to demonstrate the system.

The open core platform targets Terasic’s DE2-70 board with an ALTERA Cyclone II FPGA. It aims to offer high flexibility for a wide range of embedded applications at very low cost.

The design files of the thesis project are fully open and available online. We hope our work will be useful in the future as a starting point or a reference, both for academic research and for commercial purposes.

Keywords: SoC, OpenCores, OpenRISC, WISHBONE, DE2-70, uC/OS-II



Acknowledgement

This master thesis was started in January 2008. It took me and my partner Lin Zuo more than 6 months to implement the project. There were many difficulties in solving the technical issues, but writing the thesis was an even greater challenge than any I had faced before. The writing was finally completed in January 2011.

During this time, many people generously gave me considerable help. Without their support the thesis would not have come this far. So I would like to take the opportunity to express my deepest gratitude to the following people:

Johan Jorgensen, the ENEA hardware team leader and our industry supervisor. Johan is the best supervisor that I could expect. He is knowledgeable, experienced and full of brilliant ideas. He is good at communicating with and encouraging people. With a short conversation he can take my pressure away. As a supervisor, he participated in almost all aspects of the thesis. He proposed the topic, made the detailed plan together with us, helped to set up the working environment, and kept checking our progress. When we were in trouble, he not only provided useful suggestions but sometimes even looked down to the source code level. When I was upset because the thesis writing went slowly, he was always supportive. It has been my pleasure to have Johan as a supervisor.

Ingo Sander, our KTH supervisor and examiner. Ingo’s lectures were very impressive and valuable to me, and they opened many windows onto new fields. This was the reason I wanted to be his thesis student. Ingo actively monitored the thesis although we were in different cities. He followed our weekly reports and replied with advice and guidance. He set higher requirements for us, which created bigger challenges but also gave me more experience. I especially appreciate his patience when the thesis writing failed to finish on time. The knowledge learnt from Ingo truly benefits me, and I use it every day in my career.


Lin Zuo, my thesis partner and good friend. As a partner he contributed a solid part of the thesis. He worked really hard and took over many heavy tasks. For some of them I was not confident, but Lin made them come true. As a friend he is a very funny guy. When our work seemed to be heading for a dead end, Lin could easily amuse me in his special way and ease the anxiety. Thanks to him the thesis has become a successful and interesting memory when we look back on our life in Malmo.

And my parents. Without their love and support throughout my life, I would never have had this chance to write a master thesis acknowledgement.

Sincere thanks to all the people who helped me on the way to today’s achievements. Thank you all!

Contents

Abstract

Acknowledgement

List of Figures

List of Tables

1 Introduction

1.1 Background and Motivation

1.2 Thesis Objectives

1.3 Chapter Overview

References

2 Open Cores in a Commercial Perspective

2.1 Basic Concepts

2.1.1 What is Open Core?

2.1.2 Formal Definition of Open Source

2.1.3 Licenses Involved to Evaluate

2.1.4 GNU and FSF

2.1.5 Free Software ≠ Free of Charge

2.2 BSD License

2.3 GNU Licenses

2.3.1 GPL

2.3.2 LGPL

2.3.3 Evaluations on the GNU Licenses

2.4 The Price for Freedom: Comments on the GNU Philosophy

2.5 Developing Open Cores or Not: How Open Source Products could Benefit

2.6 Utilizing Open Cores or Not: Pros and Cons

2.7 The Future of Open Cores

2.8 Conclusion

References

3 Platform Overview

3.1 DE2-70 Board

3.2 Digital System with Open Cores

3.2.1 System Block Diagram

3.2.2 Summary of Addresses

3.3 Software Development and Operating System Layer

3.3.1 Software Development

3.3.2 Operating System and uC/OS-II RTOS

3.3.3 uC/TCP-IP Protocol Stack

3.3.4 Hardware Abstraction Layer (HAL) Library

3.4 Demo Application: A MP3 Music Player

3.5 Summary

References

4 OpenRISC 1200 Processor

4.1 Introduction

4.2 OR1200 Features and Architecture

4.3 OR1200 Registers

4.4 Interrupt Vectors

4.5 Tick Timer (TT)

4.6 Programmable Interrupt Controller (PIC)

4.7 Porting uC/OS-II to OR1200

4.7.1 Introduction

4.7.2 uC/OS-II Context Switching

4.7.3 Context Switching in OR1200 Timer Interrupt

4.7.4 Summary

References

5 WISHBONE Specification and CONMAX IP Core

5.1 Importance of Interconnection Standard

5.2 WISHBONE in a Nutshell

5.2.1 Overview

5.2.2 WISHBONE Interface Signals

5.2.3 WISHBONE Bus Transactions

5.2.3.1 Single Read/Write Transaction

5.2.3.2 Block Read/Write Transaction

5.2.3.3 Read-Modify-Write (RMW) Transaction

5.2.3.4 Burst Transaction

5.3 CONMAX IP Core

5.3.1 Introduction

5.3.2 CONMAX Architecture

5.3.3 Register File

5.3.4 Parameters and Address Allocation

5.3.5 Functional Notices

5.3.6 Arbitration

References

6 Memory Blocks and Peripherals

6.1 On-chip RAM and its Interface

6.1.1 On-chip RAM Pros and Cons

6.1.2 ALTERA 1-Port RAM IP Core and its Parameters

6.1.3 Interface Logic to the WISHBONE bus

6.1.4 Data Organization and Address Line Connection

6.1.5 Memory Alignment and Programming Tips

6.1.6 Miscellaneous

6.2 Memory Controller IP Core

6.2.1 Introduction and Highlights

6.2.2 Hardware Configurations

6.2.2.1 Address Allocation

6.2.2.2 Power-On Configuration (POC)

6.2.2.3 Tri-state Bus

6.2.2.4 Miscellaneous SDRAM Configurations

6.2.2.5 Use Same Type of Devices on One Memory Controller

6.2.3 Configurations for SSRAM and SDRAM

6.2.4 Performance Improvement by Burst Transactions

6.3 UART16550 IP Core

6.4 GPIO IP Core

6.5 WM8731 Interface

6.5.1 Introduction

6.5.2 Structure of the WM8731 Interface

6.5.3 HDL Source Files and Software Programming

6.6 DM9000A Interface

6.7 Summary

References

7 Conclusion and Future Work

7.1 Conclusions

7.2 Future Works

7.2.1 Improve and Optimize the Existing System

7.2.2 Extension and Research Topics

7.3 What’s New Since 2008

References

A Thesis Announcement

A.1 Building a reconfigurable SoC using open source IP

A.2 Further information

B A Step-by-Step Instruction to Repeat the Thesis Project

B.1 Hardware / Software Developing Environment

B.1.1 DE2-70 Board

B.1.2 A RS232-to-USB Cable

B.1.3 An Ethernet Cable

B.1.4 A Speaker or an Earphone

B.1.5 A PC

B.1.6 Cygwin

B.1.7 OpenRISC Toolchain

B.1.8 Quartus II

B.1.9 The Thesis Archive File

B.2 Step-by-Step Instructions

B.2.1 Quartus Project and Program FPGA

B.2.2 Download Software Project by Bootloader

B.2.3 Download Music Data and Play

References

List of Figures

2.1 Difference between LGPL and GPL

2.2 Difference between open software and open core

2.3 Increasing competitors force the price lower

3.1 Platform overview

3.2 DE2-70 board

3.3 Hardware block diagram of the DE2-70

3.4 Open core system block diagram

3.5 Software development workflow

4.1 OR1200 architecture

4.2 Improve performance with Harvard architecture

4.3 16-bit SPR address format

4.4 Tick timer structure and registers

4.5 PIC structure

5.1 Interconnection is important in multiprocessor systems

5.2 Overview of the WISHBONE

5.3 An example of WISHBONE signal connections

5.4 An example of WISHBONE single transactions

5.5 An example of a WISHBONE block write transaction

5.6 Block transactions are helpful in multi-master systems

5.7 Master B is blocked by master A

5.8 An example of a WISHBONE RMW transaction

5.9 Maximum throughput with single transactions

5.10 Maximum throughput with burst transactions

5.11 An example of a constant writing burst transaction

5.12 An example of an incrementing reading burst transaction

5.13 A burst transaction with wait states

5.14 Core architecture overview

5.15 An example of using CONMAX

6.1 Hardware platform architecture

6.2 On-chip RAM module internal structure

6.3 Overview of data organization in OpenRISC systems

6.4 Legal and illegal memory accesses

6.5 Example of memory padding

6.6 Memory Controller in the OpenRISC system

6.7 Logic to activate a CS

6.8 32-bit address configuration for Memory Controller

6.9 WM8731 Interface in the OpenRISC system

6.10 WM8731 Interface internal structure

6.11 32-bit music data format

B.1 Top level entity of the project

B.2 Program ALTERA FPGA

B.3 Software project

B.4 Makefile script

B.5 Build software project

B.6 Check USB-to-serial port ID

B.7 Downloading and flushing finished

B.8 IP configuration

B.9 Start software project via bootloader

B.10 Ethernet connected

B.11 Decode MP3 file

B.12 Set volume and play the music


List of Tables

3.1 Summary of addresses

4.1 OR1200 GPRs (part)

4.2 Exception types and causal conditions

6.1 Parameters of 1-port RAM IP core

6.2 Data organization for 32-bit ports

6.3 Memory Controller register configurations

6.4 System performance test results

6.5 NIOS II processor comparison (part)

6.6 List of memory blocks and peripherals


Chapter 1

Introduction

This chapter gives an introduction to, and a chapter overview of, the master thesis project carried out by Xiang Li and Lin Zuo.

1.1 Background and Motivation

The goal of the thesis is to implement a low-cost computing platform exclusively from open source Intellectual Property (IP) cores.

The idea came from Johan Jorgensen, the hardware team leader at ENEA [1]. We discussed the reasons for starting this project, which are listed below:

• ENEA wants to gain knowledge of and insight into the viability of using open cores.

• Based on this knowledge they can decide whether or not to start business involving HW design (ASIC/FPGA soft/hard IP cores).

• If the platform is successful, it might be continued and improved as a product.

• The project also proves that some embedded applications can be built with open cores.

Three factors in the recent development and innovation of System-on-Chip (SoC) technology make this thesis possible:


1. The Design Reuse methodology with IP cores

With the increasing complexity of digital electronic systems and the growing time-to-market pressure, engineers are forced to use as many previously made blocks, i.e. IP cores, as possible in new designs. This is called the design reuse methodology.

The methodology produces more and more IP cores that are ready to use. These IP cores provide great convenience when designing new systems. For example, in this thesis project we could quickly build a system with the selected IP cores. If we had had to create those IP cores ourselves, it would have been impossible for us to finish the system within the limited time.

2. The appearance of the Open Core community

Most IP cores are implemented in Hardware Description Languages (HDLs), either VHDL or Verilog HDL. The HDL source code can then be synthesized into digital circuits by software tools.

Similarly to the free software community, an open core community has emerged, in which designers publish their IP core HDL source code under open source licenses. Open source IP cores are called Open Cores for short.

The OpenCores organization is the world’s largest site/community for development/discussion of open source hardware IPs [2, 3]. The website hosts hundreds of ongoing or finished open core projects, ranging from CPUs to all kinds of peripherals. Some of the open cores are of very good quality and have been used successfully in commercial/industrial projects.

One of the most interesting advantages of open cores is that they are free to access. This is especially important for thesis students like us who do not have the budget to acquire commercial IP cores. Besides, building a low-cost system with open cores for commercial purposes is also an attractive topic.

3. The fast system prototyping via FPGA development board

Compared to the traditional ASIC design flow, which takes a long time and is also more costly/risky, fast system prototyping on an FPGA-based development board has proven to be a good methodology for accelerating functional verification. Many vendors have designed high-performance FPGA boards for this purpose. One example is Terasic’s DE2-70 board [4, 5], which we used for the thesis project.

The DE2-70 is equipped with a high-density ALTERA Cyclone II FPGA, large-volume RAM/ROM components, and plenty of peripherals including audio devices and an Ethernet interface. It provided us with an ideal hardware platform for trying out open cores, and allowed us to quickly build an open core based digital system with demonstrations.

Because of the 3 factors described above, it became feasible to execute the project, which implements an open core based computing system on a DE2-70 FPGA board that is low cost but powerful and versatile. Such a system is interesting for both commercial and academic purposes, and the platform can be extended in many further directions.

As mentioned before, the idea was initially proposed by Johan Jorgensen from ENEA, who is our industrial supervisor. The thesis was also coached by Ingo Sander, our supervisor from KTH.

The thesis was carried out by Xiang Li (me) and my partner Lin Zuo at the ENEA Malmo/Lund branch, Sweden. Most of the implementation was done from January to July 2008. For administrative reasons we had to write 2 separate theses, which have similar structures but different focuses based on our responsibilities. For Lin Zuo’s thesis, please refer to [6].

The theses and the archived project files are available online [7].

1.2 Thesis Objectives

The thesis is to implement a computing platform with open cores. However, this is a very broad topic. After discussion with and approval by the supervisors, we elaborated and refined the thesis into detailed tasks. This section summarizes the tasks of the thesis, including both those we achieved and those we failed to complete because of the time limitation.

The original thesis announcement made by Johan Jorgensen is reproduced as Appendix A, which gives an overall idea of the initial purposes of the thesis:

1. Evaluate quality, difficulty of use and the feasibility of open source IPs

2. Design the system in an FPGA and also evaluate the system performance

3. Investigate license issues and their impact on commercial use of open source IP

4. Port embedded Linux to the system


Because it was hard to foresee how much work could be completed within the limited time, after discussion we defined the thesis tasks at 3 different levels: Level One/Two/Three.

Below is the list of tasks. The responsible person is given in brackets.

The Level One tasks are mandatory for the thesis. They cover the very basic goals of creating a working system with open cores:

• Study the open source licenses and investigate the impact of the licenses on commercial usage (Xiang and Lin)

• Build an FPGA system on a DE2-70 development board with the following open cores:

– OpenRISC CPU (Xiang)

– WISHBONE bus protocol and CONMAX IP core (Lin)

– Memory Controller for SSRAM/SDRAM (Lin)

– UART connection (Xiang)

• Set up the toolchain (compiler etc.) for software development, and design applications to demonstrate the hardware platform (Xiang)

• Port the uC/OS-II Real Time Operating System (RTOS) to the platform (Xiang)

• Build an equivalent system with ALTERA technology and evaluate its performance compared to our system (Lin)

The Level Two tasks are mainly to extend the system with more features:

• Add support for the WM8731 audio CODEC so that the system can play music (Xiang)

• Add support for the DM9000A Ethernet controller (Lin)

• Port the uC/TCP-IP stack to the system so that the system can communicate with a PC over TCP/UDP (Lin)

• Design a software application to demonstrate these features (Xiang)

The Level Three tasks are advanced tasks:

• Build a multiprocessor system with more than one OpenRISC CPU


• Port Linux to the platform

In the end, we finished most tasks at Levels One and Two, but could not start the Level Three tasks because of the limited time available for the thesis.

1.3 Chapter Overview

The thesis contains 7 chapters. An overview of the chapters is given below.

As mentioned before, the content of this thesis focuses mainly on the tasks for which Xiang Li was responsible. For Lin Zuo’s part, e.g. the system performance comparison, please refer to Lin’s thesis [6].

Chapter 1 is the chapter you are reading, which gives an introduction to the thesis.

Chapter 2 introduces 3 widely used open source licenses: GPL, LGPL and the BSD license, and discusses the impact of the licenses on open cores. The chapter is placed before the system implementation chapters because investigating the licenses was a primary task: we do not want to violate the licenses while using open cores. It is also interesting to know the influence of the open source licenses if an open core based system is to be used for commercial purposes.

Chapters 3 to 6 describe the implementation of the open core based computing platform.

Chapter 3 gives an overall impression of the system architecture, including the hardware block diagram, the software development workflow, and the description of a demonstration application.

Chapter 4 focuses on the OpenRISC OR1200 CPU, which is the heart of the computing system. The processor is discussed from several aspects, including hardware, software, and the porting of the uC/OS-II RTOS.

Chapter 5 is dedicated to the WISHBONE bus protocol and an open core implementation of the protocol, i.e. the CONMAX IP core. The bus protocol organizes the whole system, and it is important enough to deserve a separate chapter.

Chapter 6 introduces the memory blocks and the peripherals of the system, including the Memory Controller IP core, the UART16550 IP core, the GPIO IP core, and the interfaces for the WM8731 and DM9000A.


Chapter 7 concludes the thesis and provides a to-do list of topics that may be interesting to research in the future.

At the end, 2 appendices are attached. Appendix A is a copy of the thesis announcement made by Johan Jorgensen. Appendix B is a step-by-step instruction on how to reproduce the project on a DE2-70 board with the files downloaded from [7].

References:

[1] Website, ENEA, http://www.enea.com/, Last visit: 2011.01.31. Enea is a global software and services company focused on solutions for communication-driven products.

[2] Website, OpenCores.org, http://www.opencores.org/, Last visit: 2011.01.31. OpenCores is a community that enables engineers to develop open source hardware, with a similar ethos to the free software movement.

[3] Webpage, Advertising at OpenCores, from OpenCores.org, http://opencores.org/opencores,advertise, Last visit: 2011.01.31.

[4] Website, Terasic Technologies, http://www.terasic.com.tw/, Last visit: 2011.01.31.

[5] Webpage, Altera DE2-70 Board, from Terasic Technologies, http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=226, Last visit: 2011.01.31.

[6] Lin Zuo, System-on-Chip design with Open Cores, Master Thesis, Royal Institute of Technology (KTH), ENEA, Sweden, 2008, Document Number: KTH/ICT/ECS-2008-112.

[7] Webpage, The online page of the thesis and project archive, http://www.olivercamel.com/post/master_thesis.html, Last visit: 2011.01.31.

Chapter 2

Open Cores in a Commercial Perspective

This chapter might look a bit out of place in an engineering thesis, but it was the very first task to complete, even before the implementation started.

Because the thesis involves open cores, and most open cores are covered by various open source licenses such as GPL, LGPL and the BSD license, a good understanding of those licenses is needed when evaluating open cores. To quote Johan Jorgensen, our thesis supervisor: “We want a platform that is reconfigurable, no-frills and cheap that can be used for in-house projects, but we do not want to hit the mines of open source. So we need to know if open cores are a safe bet and what kind of performance they will have on a given FPGA.” In this chapter, 3 of the most widely used open source licenses, GPL, LGPL and the BSD license, are introduced.

If building an open core based system is really feasible, another topic is also interesting for companies: would the open source licenses limit the use of the system for commercial purposes? What are the benefits and the trade-offs? As the thesis project was executed at the company ENEA, we believe an analysis of open cores from a commercial perspective would be useful for them. So the chapter also includes some general discussion of utilizing open cores from a company’s point of view.

We hope that the introductions to and evaluations of open cores and open source licenses in this chapter can answer 2 questions that a company project manager might ask: Should we develop new open cores? Or should we improve existing open cores and integrate them into our next products?


2.1 Basic Concepts

In this section several basic concepts regarding open cores are introduced.

2.1.1 What is Open Core?

Open core is short for Open Source Intellectual Property (IP) Core, which combines the concepts of “open source” and “IP core”.

There is no need to spend many words on introducing IP cores because they are well known and already discussed in plenty of articles such as [1, 2]. In short, the increasing complexity of electronic systems and time-to-market pressure force engineers to use as many ready-made blocks, i.e. IP cores, as possible in the next system. This is the so-called design reuse methodology.

Semiconductor IP cores actually have a broad definition and can take any form of “reusable unit of logic, cell, or chip layout design” [3], but in this thesis the discussion of “IP cores” is narrowed down to digital logic blocks designed in HDL and targeted at FPGAs/ASICs. This is because all open cores used in the thesis project are of this type.

2.1.2 Formal Definition of Open Source

When it comes to the other concept in open core, “open source”, most people simply take it literally as “you can look at or get a copy of source code”. However, this is a common misunderstanding.

The Open Source Initiative (OSI) [4], an official organization of the open source community, has a formal definition of the term “open source”, which can be found online [5]. It is too long to cite in full here.

If you have not read the definition before, you will find that open source is a more complicated concept than you might imagine. It does not only refer to access to the source code, but also defines many criteria that must be followed. It is important to be aware of the actual effects of those criteria before publishing products as open source or utilizing anything in any form of open source, which naturally includes open cores.

Because the OSI definition of open source is too long, in this thesis I simplify the definition to “source code that is covered by certain open source licenses”, so that we can concentrate on studying the open source licenses only.

A list of approved open source licenses can be found on the OSI website [6]. Different licenses in the list may have different rules and regulations, but if the source code is covered by any one of them, it may be called open source.

2.1.3 Licenses Involved to Evaluate

We have now established the open source licenses as the target of study. The next question is which licenses are involved, because the OSI list of licenses is too long to go through in full.

All open cores used in the thesis project are covered by either the LGPL or the BSD license. Because the LGPL is based on the GPL, all 3 licenses, i.e. the GPL, the LGPL and the BSD license, will be introduced in the later sections.

Strictly speaking, the open cores used in the thesis are not covered by the exact BSD license but by a “BSD-style” license. However, the introduction to the original BSD license is still useful.

2.1.4 GNU and FSF

Because the GPL and the LGPL will be introduced later, a few words on their background are worth mentioning.

The letter “G” in GPL and LGPL stands for “GNU”, the name of a project to develop a Unix-like operating system that is completely free software. The GNU licenses were initially designed to protect the freedom of that project's software from being violated. Later they became more and more popular, and they are now widely used to cover a large proportion of free software all over the world.

GNU is sometimes incorrectly treated as the organization responsible for the project, probably because its website is named www.gnu.org. In fact it is the Free Software Foundation (FSF) that takes this role. The FSF is a corporation founded to support the free software movement and to sponsor the GNU project. So do not be confused when seeing “Copyright (c) <year>, Free Software Foundation, Inc” in the GNU licenses.

Both the GNU project and the FSF were started by Richard Stallman, a legend because of his contributions to the free software movement. The thesis will not go further into his story, but lists some references instead [7–14].

2.1.5 Free Software ≠ Free of Charge

The free software movement mentioned above is a social movement aiming to promote people's freedom to access and improve the source code of software. If a piece of software truly assures this freedom for its users, it can be called free software.

Free software is a concept close to open source, but it puts more emphasis on the users' freedom [15]. Similarly, it is often misunderstood literally as “software that is free of charge”. We will come back to this concept in a later section; for now, please just remember the official explanation: “ ‘Free software’ is a matter of liberty, not price. To understand the concept, you should think of ‘free’ as in ‘free speech’, not as in ‘free beer’ ” [16].

The explanation is meaningful. It implies that people may sell free software for a price, provided the freedom is guaranteed at the same time. It is allowed to make a profit by using free software, and the same holds for open cores.

So far, all related concepts have been introduced. From the next section on, we will discuss the open source licenses and then analyze the open cores from a commercial perspective.

2.2 BSD License

The Berkeley Software Distribution (BSD) license gets its name because it was first designed to cover a Unix-like operating system developed at the University of California, Berkeley [17]. It was later revised by removing a clause that was impracticable and kept the license from being widely accepted. This story can be found in [17, 18]. The BSD license is now also called the “new” BSD license1.

The BSD license is the 3rd most popular open source license according to the article [19], which also mentions that the two ahead of it are the GPL and the LGPL, and that those two account for almost 80% of all open source licenses in use. The article does not say how the statistics were gathered, but it indicates that the 3 licenses are widely used in the open source world.

1 Or modified, or simplified, or 3-clause BSD license.

The BSD license is popular probably because it has very few restrictions, both for authors and especially for users. The full text of the BSD license can be found at [20].

Let’s take a look at the 3 clauses of the license:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

• Neither the name of the <ORGANIZATION> nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

The 3 clauses impose only minimal requirements compared to other open source licenses.

To interpret the clauses: Clause 1 means that people cannot remove the text of the BSD license from the source code. Clause 2 says that if someone redistributes the code in another form, for example by selling a software product, that person has to state that the product contains something designed by someone else and covered by the BSD license. Both Clause 1 and Clause 2 are basic things that everyone should do as a matter of conscience, even if they were not required by a license, just as an article needs a list of references to acknowledge the work of previous contributors.

Clause 3 is a little harder to understand. It prevents the name of the original author from being abused, and prevents shortcomings of the derived code from being incorrectly attributed to that author. For example, if someone obtains BSD-licensed source code written by a famous expert, that person can improve the code and redistribute or sell it, but cannot use the expert's name in advertisements without permission.

Below the 3 clauses, the BSD license contains a long disclaimer paragraph. According to it, source code covered by the BSD license basically comes with no warranty for the users. This is good for the authors, since they promise nothing through the license. It is also fair, because most authors give the source code away for free when they choose the BSD license, so they should not bear any liability. Users, however, may take risks when working with source code under the BSD license.

Now let’s evaluate the BSD license from a business perspective.

First of all, the license says nothing about money. Technically, this means people can develop source code under the BSD license and sell it at any price they want, although this is unusual: someone who wants to make money by selling products would rather choose a stricter commercial license that prohibits free copying.

Another interesting point is that the BSD license does not require the same license to be kept after redistribution. This is different from the GNU licenses. If the open source code is available, everyone can take the code, modify it, improve it and redistribute it. The BSD license requires nothing about which license to use for the modified versions, so it is entirely possible to put a commercial license on top of the BSD license. This is very good news for vendors, because they can take a BSD-licensed IP core, integrate it into a new product, and then sell it the way they want under their own commercial license.

For open cores, perhaps the most attractive part of the BSD license is that it does not ask you to publish the modified source code when redistributing (or selling) products in a non-source form. The GNU licenses, on the contrary, do. Open cores start out as HDL source code, but once the products are finally built and ready to sell, the open cores are no longer HDL code but have become silicon chips. Because the BSD license does not force anyone to provide source code, companies can safely modify BSD-licensed open cores, combine them with their own patents, and sell the new chips without revealing the design details to their competitors.

To conclude, the BSD license is a quite loosely restricted open source license. Modified source code that was originally covered by the BSD license does not even have to remain open source. This makes the license fully compatible with commercial purposes. Companies are welcome to use open cores under the BSD license in their business under their own commercial licenses, as long as they keep a notice with the products showing that certain BSD-licensed open cores are contained, for example by including a copy of the license in the user manuals.

2.3 GNU Licenses

In this section we introduce the GNU licenses, including the GPL and the LGPL.

The GNU General Public License (GPL), published by the FSF, is one of the most popular open source licenses and perhaps the strictest in the world. The GNU Lesser General Public License (LGPL) is a set of additional permissions on top of the GPL that removes some of its limitations. In other words, the LGPL is based on the GPL rather than being an independent license. We will start with the GPL and then look at how the LGPL is “lesser” than the GPL.

2.3.1 GPL

The GPL is a very strict open source license that is designed to make source code free software and keep it free software forever. The full text of the GPL can be found at [21].

According to the formal definition of “free software” [16], a program is free software if its users have all 4 freedoms:

• The freedom to run the program, for any purpose;

• The freedom to study how the program works, and adapt it to your needs;

• The freedom to redistribute copies so you can help your neighbor;

• The freedom to improve the program, and release your improvements to the public, so that the whole community benefits.

So generally, someone who uses source code covered by the GPL has the rights to “run, copy, distribute, study, change and improve the software” [16]. Conversely, someone who decides to cover his/her programs with the GPL when publishing them grants these rights to future users and customers.

To make sure the freedom is guaranteed for all current and future GPL users, the license designers also defined obligations that GPL users must follow while enjoying the freedom.

There are 4 main obligations that can be summarized from the long text of the GPL:

1. Keep using the GPL forever;

2. Provide source codes in any case;

3. Publish the modified parts;

4. Make the GPL cover the entire new project.

First and foremost, the GPL keeps covering the source code forever, and there is no way to remove or bypass it once it is applied. If people decide to edit GPLed source code and redistribute it, the GPL is the only choice of license for the revised work. They cannot use any alternative license to cover the new work instead of the GPL, because “This License gives no permission to license the work in any other way . . . ” (GPL section 5). There are many licenses that in principle do not conflict with the GPL, such as the BSD license; these are called “GPL compatible”. People can create a new project by combining 2 separate works, one covered by the BSD license and the other by the GPL, without any problem. However, only the GPL can be used to cover the new project, not the BSD license. So once the GPL is there, it will always be there: it must be kept for all improved versions in the future.

The 2nd obligation expresses well the idea of the GPL as an open source license: the source code has to be provided in any case. When a GPLed work is modified and redistributed in the form of source code, it is easy to understand that the code should be open. But even if the final product is in a non-source form, according to section 6 of the GPL, the source code that generates the final product still has to be provided with the product. For example, if a company produces and sells GPLed software in a non-source form such as binary or executable files, it normally has to provide either an extra CD containing the source code or a web server from which users can download those files freely. This obligation makes the GPL stronger than many other open source licenses such as the BSD license, which do not ask for a copy of the source code. For open cores in the hardware world, this means that when silicon chips are sold, certain HDL source code and/or design details have to be made public.

The 3rd obligation makes the GPL even stronger. It forces users to publish the modified parts as well when providing the source code, i.e. ALL source code that generates the final product must be shown. For example, if someone started with a GPLed work and spent a lot of time improving it, he/she cannot just provide the original source code without the details of the modifications when selling the improved version. Under the GPL, no secrets may be hidden. All details of the modifications and improvements have to be public, although not everyone might be happy to do so.

Most disagreements and arguments come from the last obligation, that the GPL must cover the entire project, which has already drawn much criticism [22, 23]. In section 0 the GPL defines the term “modify”: “To ‘modify’ a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy.” And for “Conveying Modified Source Versions”, section 5 item (c) of the GPL rules: “You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, . . . , to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way . . . ” These clauses make the GPL spread to parts that previously did not belong to a GPL-covered work, which is why some people call the GPL “infectious”. Reusing some GPLed source code in a new product will surely fall under the definition of “modify”. And when the improved version is redistributed, section 5 takes effect and forces the GPL onto the whole new work, which means we unfortunately have to open the source of the entire work, including the parts that were not initially covered by the GPL. As a simple example, if a program has 4,950 lines of code and a 50-line function copied from GPLed software is added, all of the final 5,000 lines have to become open source and be made available to future users, even though the GPLed code accounts for only 1% of the total.
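The arithmetic of the 5,000-line example can be sketched in a few lines of Python. This is only an illustration of the simplified reading of the licenses used in this chapter, not a restatement of the license texts; the function name lines_to_publish and the license labels "GPL" and "BSD" are hypothetical names introduced here.

```python
def lines_to_publish(own_lines: int, copied_lines: int, copied_license: str) -> int:
    """Lines of source code that must be opened when the combined program
    is redistributed, under the simplified reading used in this chapter.

    copied_license is the license of the copied code (hypothetical labels).
    """
    total = own_lines + copied_lines
    if copied_license == "GPL":
        # The GPL must cover the entire work, so every line is opened.
        return total
    if copied_license == "BSD":
        # The BSD license does not ask for source code to be published.
        return 0
    raise ValueError(f"license not modelled in this sketch: {copied_license}")


own, copied = 4950, 50
gpl_share = copied / (own + copied)

print(f"GPLed share of the program: {gpl_share:.0%}")              # 1%
print("Lines to publish (GPL):", lines_to_publish(own, copied, "GPL"))  # 5000
print("Lines to publish (BSD):", lines_to_publish(own, copied, "BSD"))  # 0
```

The sketch makes the asymmetry visible: the amount of copied code does not matter, only the license it comes with.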

2.3.2 LGPL

The last obligation of the GPL described in the previous section is obviously stronger than many users, who are not that enthusiastic about the free software movement, can accept. Indeed, many companies are afraid that their patents and proprietary designs could be compromised when combined with GPLed material, so they keep away from the GPL. Therefore, the FSF developed a lighter version of the GPL by removing its infectious attribute.

The LGPL was previously named the “GNU Library General Public License”, which shows that it was primarily developed for libraries. Because the GPL is strict and infectious, if it is applied to a library, all programs linking to that library have to become open source. This is not what many developers, especially commercial companies, would like to see, and as a result a GPLed library may gradually be forgotten. To solve the problem, a compromise license, i.e. the LGPL, was designed and used particularly for libraries. Soon the FSF realized that (1) the LGPL can be used not only for libraries but for much other software as well, and (2) the choice between the GPL and the LGPL is a matter of development strategy for the authors [24], and does not depend only on whether the work is a library. The LGPL was therefore renamed to “GNU Lesser General Public License”.

The LGPL is a set of additional permissions added to the GPL. With those permissions, the LGPL removes the last of the four obligations described above. This is the only difference between the LGPL and the GPL; the other three obligations of the GPL remain.

In section 0, the LGPL defines several more terms than the GPL:

“The Library” refers to a covered work governed by this License, other than an Application or a Combined Work as defined below.

An “Application” is any work that makes use of an interface provided by the Library, but which is not otherwise based on the Library.

A “Combined Work” is a work produced by combining or linking an Application with the Library.

In section 4 “Combined Works” the LGPL says:

You may convey a Combined Work under terms of your choice that, taken together, effectively do not restrict modification of the portions of the Library contained in the Combined Work and reverse engineering for debugging such modifications . . .

And in section 5 “Combined Libraries” it says:

You may place library facilities that are a work based on the Library side by side in a single library together with other library facilities that are not Applications and are not covered by this License, and convey such a combined library under terms of your choice . . .

With these clauses, the LGPL clearly separates 2 different sets of components: the “Library”, which is an LGPLed module, and an “Application” from somewhere else, which is not governed by the license. Because the LGPL was initially designed for libraries, it keeps using terms like library and application.

According to the LGPL, the license does not affect other modules that merely connect to an LGPLed module. We can draw a picture to explain the rules. Suppose we have a software project that includes 2 existing libraries and a newly designed application linked to them. As Figure 2.1 (a) shows, Library A is covered by the LGPL, while Library B comes from a company and is covered by a commercial license. The Application links to the 2 libraries. In this case, the whole project is a combined work, and it may be licensed under user-defined terms, as long as these terms do not conflict with the LGPL and guarantee that the LGPLed library remains open source.

(a) LGPL stays along with other licenses (b) GPL infects other modules

Figure 2.1: Difference between LGPL and GPL

However, the scenario is totally different if Library A is covered by the GPL. In that case, because the GPL forces the whole project to be licensed only under the GPL, all the remaining parts of the project, i.e. Library B and the Application, have to become GPLed whatever their previous licenses were.
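The two cases of Figure 2.1 can also be written down as a small Python sketch. It models only the simplified rule used in this chapter (a GPLed module pulls every linked module under the GPL, an LGPLed module does not); the Module class, the function licenses_after_combining and the license labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    license: str  # hypothetical labels: "GPL", "LGPL", "BSD", "commercial"


def licenses_after_combining(modules: list[Module]) -> dict[str, str]:
    """Return the license each module ends up with once the modules are
    linked into one project, under the simplified rule of this chapter."""
    if any(m.license == "GPL" for m in modules):
        # Case (b): the GPL spreads to every part of the project.
        return {m.name: "GPL" for m in modules}
    # Case (a): every module keeps its own license.
    return {m.name: m.license for m in modules}


library_b = Module("Library B", "commercial")
application = Module("Application", "commercial")

# Figure 2.1 (a): Library A under the LGPL
print(licenses_after_combining([Module("Library A", "LGPL"), library_b, application]))

# Figure 2.1 (b): Library A under the GPL
print(licenses_after_combining([Module("Library A", "GPL"), library_b, application]))
```

In case (a) only Library A stays under the LGPL and must remain open source; in case (b) all three modules end up under the GPL.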

2.3.3 Evaluations on the GNU Licenses

So far, the GPL and the LGPL have been introduced. It is time to return to our topic and discuss how open cores are influenced by these licenses.

For commercial purposes, my answer is: generally, open cores covered by the LGPL are fine to use if the company will not spend too much effort on improving the cores, while GPLed open cores are not recommended for commercial purposes unless each is used alone on its own silicon chip.

Open cores covered by the LGPL are basically free to use because they are not infectious like the GPL, so there is no worry when connecting open cores and proprietary IP cores together to compose a larger system. Besides, the open cores used will most likely not be modified much compared with the initial version; otherwise companies would rather design a new core than reuse an existing one. This means that opening the source of the improved parts, as required by the LGPL, should be acceptable for the companies.

Companies need to be careful with GPLed open cores for 2 reasons: (1) the GPL is infectious, and (2) open cores differ from open source software.

For software, the GPL defines a concept called “aggregate” in section 5:

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” . . . Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

By this definition, the GPL does not spread if 2 programs merely sit together and are not related to each other. For example, if a web browser and a media player are stored on the same computer and the media player is GPLed, its license does not influence the web browser. As another example, if a compiler is GPLed, the source code processed by the compiler does not have to be open source.

Unfortunately, the concept of an aggregate does not work for open cores in the hardware world. Although open source software and open cores both start out as source code, software is finally compiled into a binary or executable format that is still software, whereas the HDL code of open cores is synthesized by EDA tools and transformed into hardware in the end. As Figure 2.2 shows, in a hardware system the open cores are connected together via a bus or other interconnection structures. As a result, the definition of an aggregate is no longer satisfied, and the GPL applies to the whole system.

(a) Software aren’t linked in computers (b) IP cores link with others in chips

Figure 2.2: Difference between open software and open core

In other words, GPLed open cores are not recommended because they force the other parts of the system to become open source, which companies generally do not want. There are indeed some really good open cores under the GPL, such as the LEON processor with SPARC architecture [25], but unless such a core is used alone to form a system, i.e. the silicon chip includes only one open core so that no other cores are infected, open cores covered by the LGPL will always be a better choice than those under the GPL.

To sum up this section, both the GPL and the LGPL are very strict open source licenses. They grant users the freedom to use the source code, but at the same time stipulate obligations that must be followed. Generally, open cores under the LGPL pose no problem for commercial purposes, but GPLed open cores do because they are infectious. When products that include open cores under the LGPL are published, a company basically has to do the following:

1. Announce the open cores contained in the product;

2. Attach a copy of both the GPL and the LGPL;

3. Publish the source code that generates the final product.

2.4 The Price for Freedom: Comments on the GNU Philosophy

The GNU GPL and LGPL have been introduced in the previous section. Now it is time to think about something at a higher level: the GNU philosophy that we can learn from the licenses. Let’s start with a misunderstanding about free software.

Many people misunderstand the word “free” in “free software” as “free of charge” or “zero price”. They therefore find it strange and uncomfortable when distributors ask them to pay for copies of free software (like some Linux products). They may ask, “Isn’t this FREE software?”

This is a common misunderstanding of free software. Dictionaries give 2 meanings for the word “free”. One is “not under control or not subject to obligations”, i.e. freedom, and the other is “available without charge”, i.e. at no cost. The GNU project states clearly in the 3rd paragraph of the GPL preamble: “When we speak of free software, we are referring to freedom, not price.” It also emphasizes this point many times by saying “Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer.” [15, 16, 26] In fact, the GNU project even encourages people to charge as much as they can when selling free software [26].

From this we can sense that while the GNU project is struggling to protect the freedom of access to knowledge, it is also trying to clarify that free software licenses do not prevent people from making money. On the one hand, GNU believes that knowledge should be the wealth of all human beings and that no one should set barriers to limit its propagation, so that people can benefit from it. On the other hand, it hopes that businesses can charge any price they want and make a substantial profit by selling free software, as long as they follow the licenses and place no restrictions on the freedom.

This sounds great, like a win-win solution from which everyone benefits: the producers get the money and the consumers get the knowledge. But the question is: will this come true?

Personally, I really like this philosophy, because it describes a great idea in which everyone shares their knowledge and works together to make the world a better place. I also want to express my respect and highest praise for the free software developers and their work.

However, it is also necessary to point out a problem: a paradox in the concept of “free”. GNU argues that freedom does not conflict with making money. From my perspective, however, “freedom” and “free of charge” are actually the same thing, or rather, “freedom” will finally result in “free of charge”. In other words, the GNU philosophy is incompatible with the business rules that are widely applied in today's economy.

This is easy to understand, because it is human nature to always choose the better way for ourselves. If something can be obtained for free, no one will pay for it anymore. Take air as an example, which everyone can breathe. Would you spend money on it? If a salesman came to you and said “Hey, we have a new product called ‘free air’, which is no different from normal air but costs just 100 dollars”, would you buy it? The easier something is to get, the cheaper it is, and vice versa. This explains why water is free of charge in some countries, while in others it has to be paid for and, in a desert, may be valued even more highly than a life. Therefore, when the GNU licenses grant freedom to users, they also make free software easier to get. This reduces the price of the free products and lowers the profit that developers could have made.

Now let’s think about what will happen from a business perspective. Suppose we are a software company named Company A that is going to sell a brilliant, newly developed product. Deeply impressed by the free software movement, we decide to cover our source code with the GPL to make the product free software. For each copy of the software sold we receive 1000 dollars. That sounds perfect! Everything is fine in the first several days, because the product is excellent and has a good market, until Company B appears. Company B was initially a customer of Company A, so it received a copy of the source code together with the product. Soon, however, it starts selling a similar product, easily developed from our source code, for only 500 dollars. This seems immoral, yet surprisingly we cannot stop it under the GPL, because (1) the GPL asks us to provide the source code when selling products; (2) licensees are allowed to redistribute (convey) the source code under the GPL; and (3) “You may charge any price or no price for each copy that you convey . . . ” (GPL section 4). All of the above show that Company B's behavior is legal under the GPL! Now assume you are the next customer who comes to buy the product and has to choose between A and B. Even if you know that B is the immoral one, what do you do?

The story above includes only 2 competitors, A and B. In reality, if a price of 500 dollars is still high enough for a decent profit, more companies will keep joining in and sell the product at an even lower price. All of these actions force the price of the product lower and lower, until the profit is so low that no one else wants to spend time on this any more. Figure 2.3 shows the trend.

Figure 2.3: Increasing competitors force the price lower
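The trend can be reproduced with a toy model in Python. The model is an assumption made only for illustration and is not taken from any market data: every new competitor is assumed to undercut the previous price by a fixed factor (0.5, matching the step from 1000 to 500 dollars in the story), and competitors stop entering once the next price would fall below a minimum price (100 dollars, chosen arbitrarily). The function name price_trend and its parameters are hypothetical.

```python
def price_trend(start_price: float, undercut: float, min_price: float) -> list[float]:
    """Prices offered by successive competitors.

    Each new entrant multiplies the previous price by `undercut`.
    Entry stops once the next price would drop below `min_price`,
    i.e. once the remaining profit is no longer worth the effort.
    """
    prices = [start_price]
    while prices[-1] * undercut >= min_price:
        prices.append(prices[-1] * undercut)
    return prices


# Company A starts at 1000 dollars, Company B follows at 500 dollars, and so on.
print(price_trend(1000.0, 0.5, 100.0))  # [1000.0, 500.0, 250.0, 125.0]
```

With these assumed numbers the market settles after four sellers, at one eighth of the original price.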

Besides, the Internet is available to nearly everyone in modern society. If the source code can be obtained freely from the network, people will likely download it instead of buying the product if they feel the price is too high. Although compiling source code is a skill that not everyone has, many people would still rather find the instructions and learn than pay the companies for the product.

In a sentence, open source makes products unprofitable. This is the price paid for the freedom. That is why most free software is free of charge or has a very low price: the market forces it to be. That is also why most big companies neither support the free software movement nor want to make their products open source: it would reduce the money they could have earned.

2.5 Developing Open Cores or Not: How Open Source Products Could Benefit

Two questions were mentioned at the very beginning of the chapter. Now it is time to answer the first one: as a company, should we develop new open cores for sale?

Generally speaking, the answer is NO if you expect to make money with the cores, because, as discussed in the previous section, making a commercial product open source results in less profit than keeping it closed.

However, open source is still good in some other ways. In this section, we introduce how open source products can benefit a business.

The first benefit is that developing open source products is always a good strategy for gaining publicity. Many open source products are considered to be of better quality, at least in some aspects1, because bugs are easier to find when everyone can look into the source. Open source products therefore spread easily through the Internet without much advertising. This makes producing an open source product a good way to promote the company itself, comparable to the discount offers of supermarkets. Every store frequently puts up eye-catching posters or slogans like “buy a dozen cokes and get 70% off”. Once people are attracted into the store, they are likely to buy many things other than the discounted products. So open source products can play the role of advertisements.

The second benefit of open source products is that they are a good way to sell services. Many users of open source products may not know how to use them, and a company that is going to include an open core in its next product may not understand the design details of the core or how to adapt it. In such cases a good market for services emerges, especially for consulting companies with a strong background in providing services. In fact, almost all companies that provide IP core products also offer services at the same time.

1 Just think about Linux versus Windows.

The third benefit is that developing open source products may boost the sales of related physical products. The GNU philosophy makes the knowledge free to get, but in the end the source code has to run on some physical device. If the software is free, we can earn that part back by selling hardware products. So silicon companies might like open cores such as the LEON processor if they increase chip sales. Likewise, an embedded systems company may be happy to develop open source software for its hardware, because this does not affect the sales of products such as digital cameras, mobile phones or PDAs.

2.6 Utilizing Open Cores or Not — Pros and Cons

In this section we try to answer the other question: if developing open cores for sale is not recommended, how about utilizing the existing ones? Is it a good idea or not?

Please note the difference. “Developing” open cores means designing an open core from scratch for sale, while “utilizing” open cores means reusing existing open cores in the next products.

Answering the question requires an evaluation of the pros and cons of using open cores. This is why we discuss the advantages as well as the possible risks in this section.

The most obvious advantage of using open cores is that they accelerate the design of new products. Reusing blocks that have already been designed is much better than redesigning them from the beginning.

Another very attractive point of open cores is the price, because most open cores are free to get. This greatly lowers the budget needed to start a new project, which is especially good for small companies. Take our thesis project as an example: all we needed was a development board which cost $329, a PC, and several cables. All the IP cores were free of charge. If we had used commercial cores, the price would have been too high to afford.

The third advantage is that open cores are adaptable. Because all the source code is open, the changes needed to integrate open cores into new systems can be made easily. Commercial cores, in comparison, usually use certain techniques to hide the design details from the users. Making changes to these cores therefore takes a long time and requires help from the vendors.

One more advantage that often gets ignored is that open cores can make the users feel safe. Because open cores have all their design details in public, users can inspect those details, although they may not necessarily do so. This makes it impossible for the developers to hide harmful designs in the products without telling the customers. This advantage makes open cores well suited for security-critical products that are used for national or military purposes.

At the same time, open cores also have disadvantages that need to be taken into consideration.

The one that impressed me most is that open cores are often less supported than commercial products. How fast the open cores can be adapted into new products therefore depends on the capability of the development team in a company. “Less supported” shows in many aspects, such as less documentation and no telephone support. Most open cores are designed by individual engineers or enthusiasts out of interest. After the tough design work, only a few of them still have the passion to work on services like writing good documents. For example, in our project we used an open core named OpenRISC. For a long time we followed exactly what one of its documents said, but there were always problems. Finally it turned out that the document was written in 2001, while the latest revision of the source code was from 2006. Apart from the documents, when there are questions regarding the open cores, there is no way to get quick technical support. Although there may be a forum on the Internet where you can post a message for discussion, replies are not guaranteed to arrive within an expected time.

Another disadvantage is that open cores are often less verified than commercial ones. Because verifying hardware cores requires many complicated procedures and much equipment, only powerful companies can do it well. They can even build a real chip to test the quality of the cores, which is too hard for individual developers.

Besides, please notice that using open cores could carry high legal or patent risks. These risks may come from:

1. Legal charges from business competitors, big companies, and open source opponents;

2. The open source licenses were initially designed to cover software. When it comes to hardware, they may not be as reliable in protecting open cores;

3. The interpretation and enforcement of the law protecting the licenses may vary from place to place.

To summarize, the question “should a company utilize existing open cores and integrate them into its next products?” has no absolute answer. It depends on the judgment of the smartest project managers. It is definitely not an easy decision for a company to use an open core with which it is not familiar. The decision can be affected by factors like budget, time, product volume, design team, and many others.

However, there is one definite conclusion we can draw here: every company should keep an eye on the open cores. Some big companies have specialized departments and engineers who continuously collect information about IP cores and evaluate them, so that they can find the right cores in a short time and use them to accelerate development. If we constantly spend effort on studying and testing open cores, maybe one of them will be exactly what we need next time.

2.7 The Future of Open Cores

As a part of the discussion of open cores, it is interesting to look at the future. In this section I’d like to give my prediction.

Basically, it is certain that open cores will keep growing. First, this is because people need to reuse cores to accelerate system development. Second, because the source code of open cores is fully open, everyone can get in touch with the open cores and work together to improve them. So although some open cores may not look good enough today, they will become better and better. If we take the free software community, which is growing all the time, as a reference, we can foresee that open cores will follow the same track, because both of them are open source.

But at the same time open source is against the business rules, as discussed in the previous sections. So open cores will not be easily accepted by big companies, nor will they be welcomed by investors, because in most cases investors only like things or industries that can earn more money back. This means there will be no strong stimulus that forces open cores to grow rapidly.

In a little more detail, I guess that open cores with a simple structure, like a UART, a mouse controller or something similar that is easy to design and implement, will get a better chance to become popular. They are easy for small teams or even individual engineers to develop, and it is easy to achieve good quality with them. As they are free and good enough, why not use them to accelerate new system design? On the contrary, complex cores like processors will be held tightly by big companies for a long time to come. This is because complicated cores are much more profitable than simple ones. If all computers used open core processors instead of the products from Intel or ARM, it would drive those companies crazy.

So my prediction for the future of the open cores is: the open cores will grow, but not too fast. And simple cores will get a better chance to become popular.

2.8 Conclusion

In this chapter we talked about open cores from a commercial perspective.

First we introduced some basic concepts of open cores. Open cores are IP cores whose source code is covered by open source licenses, for example the BSD license and the GNU licenses.

Then we introduced the BSD license, the GPL and the LGPL in detail, and pointed out that there is no problem in using open cores covered by the BSD license or the LGPL in commercial products. But the GPL is not recommended, mainly because it will infect other parts of the system and force them to become open source as well.

After that we discussed a little about the free software philosophy, which depicts an attractive image in which everyone shares knowledge. However, this is against the business rules because it makes knowledge free to get and thus unprofitable. So this philosophy will not be appreciated by those big companies which earn a lot of money from selling proprietary products.

For the same reason, it is not recommended for a company to develop a new open core for sale, because it will not earn enough money back. However, this is not absolute. There are also some good effects that selling open cores could bring.

Regarding the question of whether a company should utilize existing open cores in its next commercial products, my answer is that there is no definite answer. This is the responsibility of the smartest project managers, because using open cores does have advantages but also carries risks. So it is a practical question that depends on the situation.

We also talked about the future of open cores. I guess open cores will keep growing, yet not too fast, because there is currently no strong power (big companies, huge investment) that pays enough attention to pushing the open cores.

Perhaps the only definite conclusion in this chapter is that everyone should keep an eye on the open cores, because the next person who benefits from the open cores may be you.

References:

[1] Michael Keating and Pierre Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs, Kluwer Academic, 3rd Edition, Jun. 2002, ISBN: 1-4020-7141-8.

[2] R. K. Gupta and Y. Zorian, Introducing Core-Based System Design, IEEE Design & Test of Computers, 14(4): 15–25, Oct.–Dec. 1997.

[3] Webpage, Semiconductor Intellectual Property Core, from Wikipedia, http://en.wikipedia.org/wiki/IP_core, Last visit: 2011.01.31.

[4] Website, Open Source Initiative (OSI), http://www.opensource.org/, Last visit: 2011.01.31. OSI is a non-profit corporation formed to educate about and advocate for the benefits of open source and to build bridges among different constituencies in the open-source community.

[5] Webpage, The Open Source Definition, from OSI, http://www.opensource.org/docs/osd, Last visit: 2011.01.31.

[6] Webpage, Open Source Licenses, from OSI, http://www.opensource.org/licenses, Last visit: 2011.01.31. The licenses approved by the OSI comply with the open source definition.

[7] Webpage, GNU, from Wikipedia, http://en.wikipedia.org/wiki/GNU, Last visit: 2011.01.31.

[8] Webpage, GNU Project, from Wikipedia, http://en.wikipedia.org/wiki/GNU_Project, Last visit: 2011.01.31.

[9] Webpage, Richard Stallman, from Wikipedia, http://en.wikipedia.org/wiki/Richard_Stallman, Last visit: 2011.01.31.

[10] Webpage, Free Software Foundation, from Wikipedia, http://en.wikipedia.org/wiki/Free_Software_Foundation, Last visit: 2011.01.31.

[11] Webpage, Free Software, from Wikipedia, http://en.wikipedia.org/wiki/Free_software, Last visit: 2011.01.31.

[12] Webpage, Free Software Movement, from Wikipedia, http://en.wikipedia.org/wiki/Free_software_movement, Last visit: 2011.01.31.

[13] Website, GNU Operating System, http://www.gnu.org/, Last visit: 2011.01.31.

[14] Website, Free Software Foundation (FSF), http://www.fsf.org/, Last visit: 2011.01.31.

[15] Webpage, Why “Open Source” Misses the Point of Free Software, by Richard Stallman, from GNU, http://www.gnu.org/philosophy/open-source-misses-the-point.html, Last visit: 2011.01.31.

[16] Webpage, The Free Software Definition, from GNU, http://www.gnu.org/philosophy/free-sw.html, Last visit: 2011.01.31.

[17] Webpage, BSD Licenses, from Wikipedia, http://en.wikipedia.org/wiki/BSD_licence, Last visit: 2011.01.31.

[18] Webpage, The BSD License Problem, from GNU, http://www.gnu.org/philosophy/bsd.html, Last visit: 2011.01.31.

[19] Webpage, The Modified BSD License—An Overview, from OSS Watch, http://www.oss-watch.ac.uk/resources/modbsd.xml, Last visit: 2011.01.31.

[20] Webpage, The BSD License, from OSI, http://www.opensource.org/licenses/bsd-license.php, Last visit: 2011.01.31.

[21] Webpage, GNU General Public License, from GNU, http://www.gnu.org/copyleft/gpl.html, Last visit: 2011.01.31.

[22] Greg R. Vetter, “Infectious” Open Source Software: Spreading Incentives or Promoting Resistance?, Rutgers Law Journal, 36:53–162, 2004, SSRN: http://ssrn.com/abstract=585922, Last visit: 2011.01.31.

[23] Webpage, Microsoft’s Ballmer: “Linux is a cancer”, Jun. 1st, 2001, from Linux, http://www.linux.org/news/2001/06/01/0003.html, Last visit: 2011.01.31.

[24] Webpage, Why You Shouldn’t Use the Lesser GPL for Your Next Library, from GNU, http://www.gnu.org/licenses/why-not-lgpl.html, Last visit: 2011.01.31.

[25] Website, Gaisler Research, http://www.gaisler.com/, Last visit: 2011.01.31. Gaisler Research is a privately owned company that provides IP cores and supporting development tools for embedded processors based on the SPARC architecture.

[26] Webpage, Selling Free Software, from GNU, http://www.gnu.org/philosophy/selling.html, Last visit: 2011.01.31.

[27] Webpage, The GNU Manifesto, from GNU, http://www.gnu.org/gnu/manifesto.html, Last visit: 2011.01.31. This is an old but good article that is recommended reading and describes the GNU philosophy well. An interesting point is in the section “Won’t programmers starve?”: by saying “just not paid as much as now”, it reflects that GNU does realize that the free software movement will lower the money people can earn.


Chapter 3

Platform Overview

In the last chapter we discussed open source licenses and open cores in general. From this chapter through the rest of the thesis, we come back to the engineering topics and focus on the thesis project, which implemented a computing platform with open cores.

Before going too far into technical details, it is always good to have an overview of the system. This is the purpose of this chapter: Chapter 3 tries to give the readers an overall impression of the platform. The following chapters then cover more details regarding the processor, the bus structure and the peripherals of the system.

The open core computing platform can be divided into 4 layers, as shown in Figure 3.1. From bottom to top, these are the Hardware layer, the Digital/FPGA layer, the Operating System layer, and the Software/Application layer. We will follow this thread to describe the system in this chapter.

Figure 3.1: Platform overview

3.1 DE2-70 Board

The thesis project is based on a DE2-70 FPGA board.

The DE2-70 board is produced by Terasic [1, 2]. It is equipped with an ALTERA Cyclone II 2C70 FPGA, together with large-capacity RAM/ROM components and plenty of peripherals, including audio devices and an Ethernet interface.

The DE2-70 board gives us a reliable and powerful hardware platform. With a careful FPGA design to drive the hardware, the board can implement different systems with multimedia, networking and many other possible features. The DE2-70 board is the foundation of the thesis project.

The layout of the DE2-70 board is shown in Figure 3.2 [3].

Figure 3.2: DE2-70 board

A hardware block diagram is also given in Figure 3.3 [3]. In Figure 3.3, the hardware resources used by the thesis project are marked in grey. The 2MB SSRAM and the 64MB SDRAM store the program code and data. The 24-bit Audio CODEC plays the music. The 10/100 Ethernet PHY/MAC connects the board to a PC with TCP/UDP protocols. The RS-232 port is used for serial communication. Buttons and 7-SEG LEDs work as general inputs/outputs. And of course, most of the design work is done inside the Cyclone II FPGA.

Figure 3.3: Hardware block diagram of the DE2-70

Because the DE2-70 board is a product of Terasic, we won’t spend much text on the details of the board. For more information, please refer to the DE2-70 user manual [3].

3.2 Digital System with Open Cores

A large amount of the thesis project time was spent on designing a digital system on the Cyclone II FPGA of the DE2-70 board, where we adapted the selected open cores to the FPGA and connected them together as a computing platform. This section gives an overview of the open core system inside the FPGA.

3.2.1 System Block Diagram

Taking a closer look at the digital system, it can be drawn as the block diagram shown in Figure 3.4. The white blocks are the digital blocks inside the FPGA. The blue blocks are the hardware ICs on the DE2-70 board.

Figure 3.4: Open core system block diagram

The OpenRISC OR1200 IP core is chosen as the CPU of the computing platform, because it is the most famous open source processor IP core. The CPU clock is set to 50MHz in the system. More details of the OpenRISC OR1200 are given in Chapter 4.

Because the OpenRISC uses WISHBONE as its bus protocol, we naturally had to organize the system around the WISHBONE bus. The CONMAX IP core is used to construct the WISHBONE infrastructure. It connects the CPU with the memory blocks and the peripherals. The WISHBONE bus protocol and the CONMAX IP core are discussed in detail in Chapter 5.

Various types of memories are used in the system. The ALTERA on-chip RAM IP core provides an easy way to access the FPGA on-chip RAM, but we had to write an interface to adapt it to the WISHBONE network. The ALTERA on-chip RAM is not an open core, but it is free to use on ALTERA’s FPGAs. The Memory Controller IP core makes it possible for the system to access the external SSRAM and SDRAM. It generates the signals to read/write those ICs.

The system is also extended with multiple peripherals. The UART16550 IP core provides the serial communication between the system and a PC. The GPIO IP core sets the output signals to the LEDs and captures the input signals from the buttons and switches. To use the external Audio CODEC WM8731 and the Ethernet PHY/MAC DM9000A, we designed 2 interfaces to control those ICs.

The memory blocks and the peripherals are described in Chapter 6.

In total, 5 open cores are used in the system. All of them are available at opencores.org [4].

• OpenRISC OR1200 IP Core

• CONMAX IP Core

• Memory Controller IP Core

• UART16550 IP Core

• GPIO IP Core

My partner Lin Zuo and I have decided to open the designs we made to the public as well. So most parts of the FPGA project, i.e. the open cores and our designs, are open source. The FPGA project can be found in the project archive file at [5].

3.2.2 Summary of Addresses

From the perspective of the programmers, a hardware system is more or less a group of registers with different addresses. So it is helpful to understand the system through a list of addresses.

A summary of the addresses of the system is given in Table 3.1 below. Because the OpenRISC has a 32-bit address bus, all addresses are 32 bits long.

All addresses are statically allocated and can be accessed directly by software. The values of the addresses are decided partly by the CONMAX IP core, and partly by the design of the peripheral IP cores themselves.

Address range               Description
0x00000000 ∼ 0x00007FFC     32KB On-chip RAM
0x10000000 ∼ 0x101FFFFC     2MB SSRAM
0x18000000 ∼ 0x1800004C     1st Memory Controller Control Registers
0x20000000 ∼ 0x23FFFFFF     64MB SDRAM
0x28000000 ∼ 0x2800004C     2nd Memory Controller Control Registers
0xC0000000                  DM9000A Index Register
0xC0000004                  DM9000A Data Register
0xD0000000                  WM8731 Control Register
0xD0000010                  WM8731 DAC Data Register
0xE0000000 ∼ 0xE0000006     UART16550 Registers (8-bit)
0xF0000000 ∼ 0xF0000024     GPIO Registers
0xFF000000 ∼ 0xFF00003C     CONMAX Registers

Table 3.1: Summary of addresses
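For illustration, the addresses of Table 3.1 can be collected in a C header. The following is only a minimal sketch: the macro names and the helper slave_index() are hypothetical and are not taken from the project archive, and selecting the slave by the upper 4 address bits is an assumption about the CONMAX configuration that merely agrees with the address values in the table.

```c
#include <stdint.h>

/* Base addresses from Table 3.1 (macro names are illustrative only). */
#define ONCHIP_RAM_BASE   0x00000000u  /* 32KB on-chip RAM               */
#define SSRAM_BASE        0x10000000u  /* 2MB SSRAM                      */
#define MC1_REGS_BASE     0x18000000u  /* 1st Memory Controller registers */
#define SDRAM_BASE        0x20000000u  /* 64MB SDRAM                     */
#define MC2_REGS_BASE     0x28000000u  /* 2nd Memory Controller registers */
#define DM9000A_INDEX     0xC0000000u  /* DM9000A Index Register         */
#define DM9000A_DATA      0xC0000004u  /* DM9000A Data Register          */
#define WM8731_CTRL       0xD0000000u  /* WM8731 Control Register        */
#define WM8731_DAC_DATA   0xD0000010u  /* WM8731 DAC Data Register       */
#define UART_BASE         0xE0000000u  /* UART16550 registers (8-bit)    */
#define GPIO_BASE         0xF0000000u  /* GPIO registers                 */
#define CONMAX_REGS_BASE  0xFF000000u  /* CONMAX registers               */

/* Access a 32-bit memory-mapped register (only meaningful on the target). */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* Assumption: the interconnect selects a slave by address bits 31..28. */
static inline unsigned slave_index(uint32_t addr)
{
    return (unsigned)(addr >> 28);
}
```

Under this assumption, the SSRAM and the registers of the 1st Memory Controller share one slave port (index 1), and the GPIO and CONMAX registers share index 15, which matches the grouping visible in the table.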

3.3 Software Development and Operating System Layer

The hardware system cannot work without software. So we also spent much effort on developing software for the platform. In this section, the software development workflow is introduced first.

In complex systems, the software is often divided into multiple layers for a better architecture. In our case there are 2 layers: the operating system layer and the application layer. The operating system layer contains an operating system, hardware drivers, Application Programming Interface (API) functions etc. They are designed to control the hardware platform and simplify application development. On top of the operating system layer, the application layer implements a program with specific features, for example a music player.

The operating system layer is described in this section, and the applicationlayer in the next section.

3.3.1 Software Development

The software development was done on a PC with Windows XP SP2. We didn’t use Linux, mainly for 2 reasons:

1. ALTERA’s Quartus is required to design the FPGA project, but we had problems with the Linux version of Quartus.

2. We didn’t have adequate knowledge of Linux, so working with Windows was more productive.

To create a Linux-like environment under Windows, we used Cygwin [6]. The Linux-like environment is needed for the GNU toolchain.

The GNU toolchain is a collection of programming tools produced by the GNU project [7]. Natively the GNU toolchain doesn’t support the OpenRISC CPU, but because the tools are open source, the OpenRISC developers adapted them for the OpenRISC processor.

We use the C programming language to develop software. The modified GNU toolchain for the OpenRISC converts C source code into executable OpenRISC instructions.

It is easy to install the OpenRISC toolchain under Cygwin: just download the tools and follow the installation instructions from the official webpage [8]. When doing the thesis, we used a very old version of the OpenRISC toolchain; the toolchain package was dated 2003-04-13. The OpenRISC development team has since released a much newer version.¹

The OpenRISC toolchain contains a set of tools. We mostly use the ones below [9–13]:

• GCC: It compiles the C source files into object files for the OpenRISC.

• GNU Binutils: It provides the linker, which links all object files, maps the absolute addresses based on the linker script, and produces the executable target file.

• Or1ksim: Or1ksim is a low-level simulator for the OpenRISC 1000 architecture. Based on the executable file, it can simulate the behavior of the OpenRISC processor. For example, it can execute instruction by instruction and display the contents of the CPU registers. This is a good way to study how the CPU works and how the system handles the stack and function calls. Or1ksim can also work as a debug target connected to GDB, where we can perform C source level debugging.

• GDB: GDB is the GNU Project Debugger. It is used together with Or1ksim to debug the C source code.

• Makefile: GNU Make is a utility which automatically compiles the source files and builds them into an executable target. It saves us from typing the GCC commands line by line.

¹ The old toolchain is no longer available from the official website, but we included a copy in the thesis project archive.

With these tools, the software development workflow can be described by Figure 3.5.

Figure 3.5: Software development workflow

First the source files are compiled into object files by GCC. Then the object files are linked together by the GNU Binutils linker into the executable target file. The linker follows the linker script configuration. The executable file is used in multiple ways. It can be translated into Intel-HEX format by objcopy. It can also be converted back to assembly code by objdump. Both objcopy and objdump are parts of the GNU Binutils. All the steps above can be done by the Makefile with 1 command in Cygwin.

With the generated executable file, the next steps are to download it to the DE2-70 board and then run and debug the software program. However, these steps were not easy. Because we didn’t have a JTAG connection between the PC and the OpenRISC CPU, it was not possible to send data to the target CPU directly.¹ ²

We made workarounds for both downloading and debugging. For downloading the programs to the DE2-70 board, 2 methods were used:

1. With a MIF file linked to the ALTERA on-chip RAM IP core

2. With a bootloader and the serial communication

In the FPGA project, it is possible to link a MIF-format file to the ALTERA on-chip RAM IP core. In this way the data stored inside the on-chip RAM is initialized when the FPGA is programmed, and the CPU can immediately access the on-chip RAM and execute programs from there. We designed a software tool, “ihex2mif”. After the or32 executable files are translated into Intel-HEX files, ihex2mif further converts them to the MIF format. This downloading method has 2 limitations: (1) the FPGA project has to be updated every time the MIF file is updated; (2) the software program must be small enough to fit into the on-chip RAM, which is 32KB in our project.
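The kind of conversion done by a tool like ihex2mif can be sketched as follows. This is not the code of the tool in the project archive; it is a minimal illustration which assumes that only data records are handled and that four consecutive bytes form one big-endian 32-bit memory word. The names ihex_record, ihex_parse_record() and mif_format_word() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One decoded Intel-HEX record of the form ":LLAAAATTDD...CC". */
typedef struct {
    uint8_t  len;        /* number of data bytes (LL)           */
    uint16_t addr;       /* 16-bit load offset (AAAA)           */
    uint8_t  type;       /* record type (TT), 0x00 = data       */
    uint8_t  data[255];  /* data bytes (DD...)                  */
} ihex_record;

/* Convert two hex digits to a byte value; returns -1 on a bad digit. */
static int hex_byte(const char *s)
{
    int v = 0;
    for (int i = 0; i < 2; i++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Parse one record and verify its checksum. Returns 0 on success. */
int ihex_parse_record(const char *line, ihex_record *rec)
{
    size_t n = strlen(line);
    if (n < 11 || line[0] != ':')
        return -1;
    int len = hex_byte(line + 1);
    if (len < 0 || n < 11 + 2 * (size_t)len)
        return -1;

    uint8_t sum = 0;
    uint16_t addr = 0;
    /* Bytes: length, address high, address low, type, data..., checksum. */
    for (int i = 0; i < len + 5; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0)
            return -1;
        sum = (uint8_t)(sum + b);
        if (i == 1 || i == 2)           addr = (uint16_t)((addr << 8) | b);
        else if (i == 3)                rec->type = (uint8_t)b;
        else if (i >= 4 && i < len + 4) rec->data[i - 4] = (uint8_t)b;
    }
    rec->len = (uint8_t)len;
    rec->addr = addr;
    /* All bytes including the checksum must add up to 0 modulo 256. */
    return sum == 0 ? 0 : -1;
}

/* Format one MIF content line "AAAA : DDDDDDDD;" from 4 data bytes. */
int mif_format_word(char *out, size_t size, uint32_t word_addr,
                    const uint8_t *b)
{
    uint32_t w = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                 ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return snprintf(out, size, "%04X : %08X;",
                    (unsigned)word_addr, (unsigned)w);
}
```

A complete MIF file additionally needs the header fields (WIDTH, DEPTH, ADDRESS_RADIX, DATA_RADIX) and the surrounding CONTENT BEGIN ... END; block of the ALTERA MIF format. With a 32-bit word width, the word address is the byte address from the HEX record divided by 4.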

For larger programs, we designed a bootloader to download them via the serial connection. The bootloader is a small program that is stored in the on-chip RAM. It is downloaded to the DE2-70 board through the first downloading method described above, so when the system starts, the CPU executes the bootloader from the on-chip RAM. The bootloader initializes the hardware and then listens to the UART port to receive the program data from the PC. On the PC side, a software tool was designed to interpret the Intel-HEX files. It reads the addresses and the data from the HEX files and sends them over the serial connection. A USB-to-RS232 cable was used to connect the PC and the DE2-70 board. When the bootloader receives data, it stores them at the correct addresses. When all data of a program have been received, the bootloader makes the OpenRISC jump to a specific address and execute the downloaded program from there.
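The step “store the received data at the correct addresses” can be illustrated with a small sketch. The serial frame layout used here (a 4-byte big-endian target address, a 1-byte length, then the payload) is purely hypothetical, since the wire format of our bootloader is not described in this chapter. The function name bl_store_frame() is also hypothetical. To keep the sketch testable on a PC, the SSRAM is represented by an ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSRAM_BASE 0x10000000u   /* from Table 3.1 */
#define SSRAM_SIZE 0x00200000u   /* 2MB            */

/*
 * Store one received frame into `mem`, which stands in for the SSRAM.
 * Hypothetical frame layout: addr[4] (big-endian), len[1], payload[len].
 * Returns 0 on success, -1 if the frame is truncated or would write
 * outside the SSRAM address range.
 */
int bl_store_frame(uint8_t *mem, const uint8_t *raw, size_t raw_len)
{
    if (raw_len < 5)
        return -1;

    uint32_t addr = ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
                    ((uint32_t)raw[2] << 8)  |  (uint32_t)raw[3];
    uint8_t len = raw[4];

    if (raw_len < 5u + len)
        return -1;
    /* Range check: refuse anything that is not completely inside SSRAM. */
    if (addr < SSRAM_BASE || addr - SSRAM_BASE > SSRAM_SIZE - len)
        return -1;

    memcpy(mem + (addr - SSRAM_BASE), raw + 5, len);
    return 0;
}
```

The range check matters in practice: without it, a corrupted address received over the serial line could overwrite the bootloader itself in the on-chip RAM or hit a peripheral register.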

¹ To set up a JTAG connection for downloading and debugging, one needs (1) to activate the OpenRISC CPU debug interface in the FPGA project, (2) a JTAG cable, and more importantly (3) software support on the PC side. For a commercial processor these are normally provided by the vendor. But for the OpenRISC on the DE2-70, we didn’t have them when doing the thesis.

² By the way, it is now possible to buy an OpenRISC development board plus a JTAG debugger from ORSoC [14].

In the thesis project, the 32KB on-chip RAM is dedicated to the bootloader, so the FPGA project doesn’t have to be updated all the time. The 2MB SSRAM is used to store the program received by the bootloader.

Both the ihex2mif and the bootloader can be found in the project archive file [5].

Without a direct JTAG connection, there are 3 workarounds for debugging the programs:

1. With the serial communication

2. With the Or1ksim simulator and the GDB

3. With FPGA verification tools

The easiest way of debugging the software is to print characters through the serial connection to the PC. When certain information is displayed on the screen, it is certain that the program has reached a specific location in the source code.
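Such a debug print can be sketched in a few lines of C. The register offsets and the status bit below are those of a standard 16550 UART (transmitter holding register at offset 0, line status register at offset 5, “transmitter holding register empty” in bit 5). The function names are hypothetical, and it is an assumption that the 8-bit registers appear at consecutive byte addresses starting at the UART base address from Table 3.1. The base address is passed as a parameter so that the sketch can also be tried on a PC with a fake register array.

```c
#include <stdint.h>

#define UART_THR       0      /* transmitter holding register (write) */
#define UART_LSR       5      /* line status register (read)          */
#define UART_LSR_THRE  0x20   /* THR empty: next byte can be written  */

/* Send one character, busy-waiting until the transmitter is free. */
void uart_putc(volatile uint8_t *uart, char c)
{
    while ((uart[UART_LSR] & UART_LSR_THRE) == 0)
        ;  /* wait for the previous character to leave the THR */
    uart[UART_THR] = (uint8_t)c;
}

/* Send a zero-terminated string, e.g. a marker such as "reached main". */
void uart_puts(volatile uint8_t *uart, const char *s)
{
    while (*s)
        uart_putc(uart, *s++);
}
```

On the target, a call could look like uart_puts((volatile uint8_t *)0xE0000000u, "reached main\r\n"), placed at the source location of interest. The UART must have been initialized (baud rate, frame format) before the first call.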

Another workaround is to simulate the programs with Or1ksim, which shows the values of the CPU registers after the program is paused. Or1ksim can also be connected to GDB, where we can simulate at the higher source code level. For example, a breakpoint can be placed to observe the content of a variable when the program reaches it. Or1ksim can even simulate timer interrupts, which we used to debug the RTOS context switch. The limitations of the method are that (1) it is only off-line simulation, and (2) it cannot simulate most DE2-70 hardware peripherals.

It is also possible to borrow the FPGA verification tools for debugging the software, mainly to place probes and observe the signals on the CPU buses. This can be done either with off-line simulation, e.g. the Quartus built-in waveform simulator or ModelSim, or with in-system debugging, e.g. the Quartus SignalTap. The FPGA tools inspect the waveforms on the CPU buses. From the waveforms we can further analyze which instruction the CPU is executing or what address the CPU is accessing. This method is mostly used for examining combined software/hardware issues. It is sometimes difficult to set up the trigger conditions for the signal sampling.

This concludes the introduction of the software development workflow.

3.3.2 Operating System and uC/OS-II RTOS

An Operating System (OS) is an important program that runs on a CPU, manages hardware resources and provides common services for efficient execution of various software applications [15]. With an operating system, the performance of a computing platform will be greatly improved.

Most existing OSs can benefit a platform in the aspects listed below:

• Support multitasking and schedule the tasks;

• Manage hardware resources, e.g. provide drivers for popular hardware devices;

• Simplify application development through API libraries and service functions;

• Standardize software development;

• Include useful middleware packages, like a TCP/IP stack, a command line console, file systems etc.

Because of the advantages of an operating system, it was determined when planning the thesis that we were going to port an existing OS to the computing platform.

We discussed which operating system to use. Our supervisor Johan Jorgensen proposed Linux. But because of our limited knowledge of Linux and the uncertainty about how fast the hardware platform could be created, we were not sure at the beginning whether there would be enough time to port Linux. So finally we decided to start with uC/OS-II.

The uC/OS-II is a famous Real-Time Operating System (RTOS) from Micrium [16]. It has the following advantages for us:

• The source code of the uC/OS-II is available and can be used for academic purposes without requiring a license [17].

• It is simpler compared to Linux, and there are sufficient materials, books and online resources to help understand how it works.

• We had lectures in school on the uC/OS-II, and already had programming experience with it.

• Other people had successfully ported the uC/OS-II to the OpenRISC CPU before, and their work could be taken as a reference.

• The uC/OS-II is an RTOS. It can better serve applications with real-time constraints. This suits our needs because the computing platform of the thesis project mainly aims at embedded applications.

In Chapter 4, where the OpenRISC CPU is discussed, a section is reserved for the details of porting the uC/OS-II RTOS to the OpenRISC.

3.3.3 uC/TCP-IP Protocol Stack

Many possibilities can be extended if a platform provides the networkingfeature. On the DE2-70 board there is a DM9000A Ethernet interface. So itbecame an interesting topic to add the Ethernet support in our system.Tobe able to communicate over an Ethernet connection, the TCP/IP protocolsare necessary. A software implementation of the TCP/IP protocols is calleda TCP/IP stack. To design a TCP/IP stack from scratch is a huge work andimpossible for us to finish, but luckily there are existing ones.

We decided to use the uC/TCP-IP protocol stack for 2 obvious reasons:

1. uC/TCP-IP is also a product of Micrium [16]. It is designed to work together with the uC/OS-II RTOS.

2. The uC/TCP-IP provides the hardware driver for the DM9000A.

uC/TCP-IP was apparently the easiest option, but some work still had to be done to make it work with the OpenRISC CPU. My partner Lin Zuo was responsible for uC/TCP-IP and the DM9000A. For more details, please refer to his thesis [18].

3.3.4 Hardware Abstraction Layer (HAL) Library

Another attempt we made in the thesis was to build a library that collects the functions controlling the hardware. The functions hide the details of operating the hardware. In this way, programmers at a higher level can develop software applications without learning exactly how the hardware system works. This library of functions is also referred to as the Hardware Abstraction Layer (HAL).

Because of the limited time, we could take only a very preliminary step towards the HAL. Some basic functions were designed to control the OpenRISC Programmable Interrupt Controller (PIC), the Tick Timer (TT), and the UART16550 IP core.

The HAL functions can be found in the thesis archive file [5].


3.4 Demo Application: An MP3 Music Player

In the previous sections, the hardware layer, the digital/FPGA layer and the operating system layer of the computing platform were introduced. Based on those, user applications can be developed easily.

We decided to implement an MP3 music player as a demo application because:

1. It is very popular. Almost everyone plays MP3 files.

2. It demonstrates most features of the platform, including the Audio CODEC and the Ethernet.

The MP3 player was designed in 2 parts: a music player running on the DE2-70 board, and a client program running on the PC.

On the PC side, the client program first converts the selected MP3 file into WAV format. Then it sends the data of the WAV file to the DE2-70 board in UDP packets via the Ethernet connection. After all the music data has been transferred, the client program issues a PLAY command to the music player. The client program can also send other commands, e.g. to control the volume.

The client program uses libmad to decode the MP3 files. Libmad [19] is a high-quality MPEG audio decoder which is capable of 24-bit output. It is free software under the GPL. Using libmad saved us the time of implementing an MP3 decoding algorithm¹.

On the DE2-70 side, after the hardware is initialized, the music player keeps checking the uC/TCP-IP stack and the UART port. When a new UDP packet is received, the music data is buffered in the external 64MB SDRAM. When a PLAY command arrives through the serial connection, the music player copies the music data from the SDRAM and forwards it to the Audio CODEC. The Audio CODEC converts the music data to analog signals, which can then be amplified and fed to a speaker etc.

The demo application is included in the thesis archive file [5]. It can be reproduced on any DE2-70 board. In Appendix B, detailed step-by-step instructions are given for interested readers who want to try out the demo MP3 player.

¹ At the beginning, libmad was planned to be integrated into the music player on the DE2-70 board, which would have let us send less data over the Ethernet. But because support for some common C library functions, like malloc(), was missing for the OpenRISC, libmad had to be moved to the PC side.


3.5 Summary

In the sections above, we have introduced the computing platform from 4 different layers. To summarize this chapter as well as the computing platform, a feature list is given below:

• General purpose and multi-functional embedded platform

• Low cost by using open cores and open source software

• Most source codes and design details are free

– except for the ALTERA’s built-in IPs like RAM, FIFO, PLL

– except for uC/OS-II and uC/TCP-IP

• Based on Terasic’s DE2-70 board and ALTERA’s Cyclone II FPGA

• OpenRISC OR1200 processor (50MHz, no cache, no MMU)

• WISHBONE bus standard implemented with CONMAX IP core

• Memory Controller IP core (for 2MB SSRAM and 64MB SDRAM)

• RS232 by UART16550 IP core

• Buttons, LEDs, 7-segments by GPIO IP core

• WISHBONE interface for WM8731 audio CODEC (DAC only)

• WISHBONE interface for DM9000A Ethernet controller

• Porting uC/OS-II to OpenRISC processor

• Porting uC/TCP-IP to OpenRISC processor (achieved 3KB speed for a stable connection)

• LibMAD (running on PC) is used to convert MP3 to WAV format

• Bootloader that downloads software binary files via RS-232

• ihex2mif that converts ihex format to ALTERA’s mif format


References:



[3] User Manual, DE2-70 Development and Education Board User Manual, Terasic Technologies, Version 1.03, http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=53&No=226&PartNo=4, Last visit: 2011.01.31.

[4] Website, OpenCores.org, http://www.opencores.org/, Last visit: 2011.01.31.

[6] Website, Cygwin, http://www.cygwin.com/, Last visit: 2011.01.31.

[7] Webpage, GNU Toolchain, from Wikipedia, http://en.wikipedia.org/wiki/GNU_toolchain, Last visit: 2011.01.31.

[8] Webpage, GNU Toolchain for OpenRISC, http://opencores.org/openrisc,gnu_toolchain, Last visit: 2011.01.31.

[9] Website, GCC: the GNU Compiler Collection, http://gcc.gnu.org/, Last visit: 2011.01.31.

[10] Website, GNU Binutils, http://www.gnu.org/software/binutils/, Last visit: 2011.01.31.

[11] Webpage, OpenRISC 1000: Architectural simulator, http://opencores.org/openrisc,or1ksim, Last visit: 2011.01.31.

[12] Website, GDB: The GNU Project Debugger, http://www.gnu.org/software/gdb/, Last visit: 2011.01.31.

[13] Website, GNU Make, http://www.gnu.org/software/make/, Last visit: 2011.01.31.





[14] Website, ORSoC, http://www.orsoc.se/, Last visit: 2011.01.31.

[15] Webpage, Operating System, from Wikipedia, http://en.wikipedia.org/wiki/Operating_system, Last visit: 2011.01.31.

[16] Website, Micrium, http://www.micrium.com/, Last visit: 2011.01.31.

[17] Webpage, uC/OS-II License, from Micrium, http://micrium.com/page/downloads/os-ii_evaluation_download, Last visit: 2011.01.31.

[19] Website, underbit technologies, http://www.underbit.com/products/mad/, Last visit: 2011.01.31. This is where to find information about LibMAD.


Chapter 4

OpenRISC 1200 Processor

4.1 Introduction

OpenRISC 1200 is an open source 32-bit processor IP core. It is well known and has been widely used in many industrial and academic projects.

From the start of the thesis until today, the OpenRISC project has always been at the top of the “Most popular projects” list of opencores.org [1]. Considering that opencores.org is the foremost well-known open core website, we can conclude that the OpenRISC processor is one of the most popular open core processors in the world.

Information about the OpenRISC can be found at its official webpage [2], from where the HDL source code and the documents are free to download after registration. There is also a forum on the website for OpenRISC related discussions. This is the main way to get technical support for the OpenRISC. Otherwise it is also possible to consult commercial companies.

There are several confusing terms regarding the OpenRISC. In fact, “OpenRISC” is a project that aims to create a free, open source computing platform available under the GNU (L)GPL licenses [2]. “OpenRISC 1000”, or “OR1K” for short, is the name of a CPU architecture. Based on the OR1K architecture, many different derivatives are possible. “OpenRISC 1200”, or “OR1200” for short, is a 32-bit processor IP core implemented following the OR1K architecture. So when talking about a processor, OR1200 should be used as the exact name. But because the OR1200 is currently the only active implementation of the OR1K family, and the OR1K is the only architecture under the OpenRISC project, the name OpenRISC is sometimes used interchangeably with OR1200 to refer to the processor. There is another term called “OpenRISC Reference Platform (ORP)”. It stands for the computing platforms with the OR1200 at the center as the processor. Our thesis project actually worked out an ORP.

The OpenRISC has a long history: the project was created in September 2001 [2]. It was initiated by Damjan Lampret, who had also founded opencores.org a little earlier, in October 1999 [3]. It is not hard to imagine that promoting and supporting the OpenRISC was a major reason for the birth of opencores.org. As the OpenRISC is an open source project, many other people later became involved and contributed. Their names are listed under Past Contributors and Project Maintainers [2]. From 2005, Damjan started his own company and gradually became less active in the OpenRISC community. In November 2007, the Swedish company ORSoC [4] took over the maintenance of opencores.org as well as the OpenRISC project, and it has maintained them until today. It is good to have a commercial company pushing the project. In a posted message, Marcus Erlandsson, the CTO of ORSoC, mentioned: “Our mission/goal is to make the OpenRISC a worldclass ‘open source’ 32-bit RISC processor aimed both for commercial companies as well as for non-commercial products.” [5] So it looks like the OpenRISC has a promising future.

Several companies provide OpenRISC related products or services. Beyond Semiconductor [6], the company started by Damjan Lampret, provides enhanced commercial versions of the OpenRISC, which have been renamed the BA processor family. ORSoC [4] has developed an OpenRISC development kit which includes an FPGA (CPU) board, an I/O board, a debugger etc.¹ Because ORSoC is currently the maintainer of opencores.org, it is possible that the kit will be promoted as the standard OpenRISC development platform. Another company called Embecosm [7] mainly focuses on software development. They have done a lot of work on porting the GNU tool chain and the OpenRISC simulator. And there is a Korean company, Dynalith Systems [8], which has made a complete SW/HW environment called OpenIDEA².

¹ The FPGA board plus a debugger, excluding the I/O board, cost me about 260 EUR. Strangely, ORSoC chose an Actel FPGA for the board rather than products from Xilinx or ALTERA.

² OpenIDEA looks very attractive on Dynalith’s website, because it is highly integrated and might be the easiest way to start implementing real OpenRISC based projects. But the price was about 800 USD when we asked, which I think was too high for a thesis project.

As for the OpenRISC documents and tutorials, honestly only a few are useful for beginners. On the hardware side, the OpenRISC 1000 Architecture Manual [9] is always the standard reference. The recently updated OpenRISC 1200 IP Core Specification [10] and OpenRISC 1200 Supplementary Programmer’s Reference Manual [11] could also be useful¹. On the software side, people can just follow the instructions on the official OpenRISC toolchain webpage [12] to get the OpenRISC compiler and debugging toolset. When building the toolchain manually, an application note [13] from Jeremy Bennett is recommended as a reference.

So far, we have introduced the OpenRISC processor from many aspects. In the following sections, the OR1200 architecture is described first, followed by several topics related to the OR1200: the registers, the exceptions, the Tick Timer (TT) and the Programmable Interrupt Controller (PIC). After that, a section is reserved for the details of porting the uC/OS-II Real-Time Operating System (RTOS) to the OR1200.

4.2 OR1200 Features and Architecture

To give an overview of the CPU performance, some of the OR1200 features [10] are listed below:

• 32-bit RISC processor

• Harvard architecture

• 5-stage pipeline

• Cache and MMU supported

• WISHBONE bus interface

• 300 Dhrystone 2.1 MIPS at 300MHz using 0.18u process

• Target medium and high performance networking and embedded applications

• Competitors include ARM10, ARC and Tensilica RISC processors

The OR1200 has a clear architecture, shown in Figure 4.1. It has a CPU/DSP core at the center and includes data/instruction caches, data/instruction Memory Management Units (MMUs), a timer, an interrupt controller, a debug unit and a power management unit.

¹ These 2 documents were updated in late 2010. We didn’t have them when doing the thesis.


Figure 4.1: OR1200 architecture

The OR1200 has a Harvard architecture, i.e. it has separate instruction and data buses. To maximize the CPU performance by exploiting the Harvard architecture, it is recommended to use 2 physically separate memories to store the instructions and the data. This is shown in Figure 4.2(a). Unfortunately, in the thesis project we used a non-optimized structure like Figure 4.2(b), which limited the CPU performance. It was too late to change the design when we realized it.

(a) Instruction/Data accesses in parallel

(b) Instruction/Data accesses share the same RAM port

Figure 4.2: Improve performance with Harvard architecture


In the system of Figure 4.2(a), the CPU can access the instructions and the data stored in 2 memory devices in parallel. But in Figure 4.2(b), the instruction port and the data port of the OR1200 have to share the only memory device, i.e. only 1 of them is allowed to access the memory at a time. Furthermore, because the CPU usually needs to access instructions and data alternately, the WISHBONE network, i.e. the CONMAX IP core, has to arbitrate between the ports to decide which one is granted the WISHBONE bus. This is another negative factor that affects the system performance. So the CPU performance is considerably more limited in Figure 4.2(b) than in (a).

The OR1200 uses WISHBONE as the bus standard for both the instruction and the data buses. The WISHBONE bus as well as the CONMAX IP core will be discussed in Chapter 5.

As mentioned before, all OR1200 HDL source files can be downloaded from the opencores.org website [2]. In most cases there is no need for users to modify the files, except for “or1200_defines.v”. In or1200_defines.v, users can easily configure the OR1200 implementation, for example to enable or disable functional blocks, or to select the target FPGA memory types etc. It is necessary to go through this file before compiling the OR1200 FPGA project.

Of all 8 functional blocks shown in Figure 4.1, we disabled the instruction/data caches and MMUs in the thesis project. The debug unit and the power management unit were implemented but not used¹. Only the tick timer and the PIC were tested in hardware, and we understand completely how they work. So the later sections cover the TT and the PIC in some detail, but before that the OR1200 registers and the exceptions are introduced.

¹ This is because we wanted an easy start by simplifying the system as much as possible. Later on, when we wanted to enable the caches and the MMUs, sadly there was not enough time left.

4.3 OR1200 Registers

There are 2 types of registers in the OR1200, the General Purpose Registers (GPRs) and the Special Purpose Registers (SPRs).

The OR1200 has 32 GPRs, all of them 32 bits wide. Software can access these registers directly by the names r0 to r31 in assembly code. For example, the following assembly code adds 128 to the value stored in register r1, and then uses the result as the target address at which to save the value of register r9. Because r1 is usually used as the stack pointer, this line actually stores r9 on the stack at an offset of 128.

l.sw 128(r1), r9 ¹

When writing in higher level languages like C or C++, the compiler manages the usage of the GPRs. In this case the GPRs are transparent to the programmers.

A curious fact is that, although the registers are called “general purpose”, the compiler assigns them special roles. This happens not only in the OR1200 but in many other CPUs as well.

Table 4.1 is partly copied from page 334 of the OpenRISC 1000 Architectural Manual [9]. It lists the usage of some GPRs. The value of r0 is always fixed to 0. r1 and r2 are the stack pointer (SP) and the frame pointer (FP), which point to the top and bottom of the stack. r3 to r8 are used to pass parameters during function calls. If a function has more than 6 parameters, the extra parameters have to be stored on the stack. r9 holds the return address and r11 holds the returned data.

Table 4.1: OR1200 GPRs (part)
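The register roles described above can be written down as a small C sketch. The enum and the helper `arg_register()` are hypothetical names introduced here for illustration only; the register numbers are the ones given in the text.

```c
/* GPR roles as described in the text (names are illustrative, not from the manual). */
enum or1k_gpr_role {
    GPR_ZERO      = 0,   /* r0: always reads as 0            */
    GPR_SP        = 1,   /* r1: stack pointer                */
    GPR_FP        = 2,   /* r2: frame pointer                */
    GPR_ARG_FIRST = 3,   /* r3..r8: function parameters      */
    GPR_ARG_LAST  = 8,
    GPR_LINK      = 9,   /* r9: return address               */
    GPR_RETVAL    = 11   /* r11: returned data               */
};

#define GPR_ARG_COUNT 6u  /* r3..r8 carry at most 6 parameters */

/* Register number that carries parameter i (0-based),
 * or -1 when the parameter has to be passed on the stack. */
static inline int arg_register(unsigned i)
{
    return (i < GPR_ARG_COUNT) ? (int)(GPR_ARG_FIRST + i) : -1;
}
```

For example, the first parameter travels in r3 and the seventh one no longer fits into a register.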

All OR1200 SPRs are listed in Section 4.3 of the Architectural Manual [9]. All of them use 16-bit addresses in the format shown in Figure 4.3. Bits 15–11 are the group index, and bits 10–0 are the register address.

¹ OpenRISC 1000 instructions are described in Chapter 5 of the OpenRISC 1000 Architectural Manual [9].

Figure 4.3: 16-bit SPR address format
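The 16-bit address format can be illustrated with a few lines of C. This is only a sketch: the helper names `spr_addr()`, `spr_group()` and `spr_index()` are assumptions made for this example, while the field positions (bits 15–11 and bits 10–0) are taken from the text.

```c
#include <stdint.h>

/* Field layout from the text: bits 15-11 = group index, bits 10-0 = register address. */
#define SPR_GROUP_SHIFT 11u
#define SPR_GROUP_MASK  0x1Fu
#define SPR_INDEX_MASK  0x07FFu

/* Build a 16-bit SPR address from a group index and a register address. */
static inline uint16_t spr_addr(unsigned group, unsigned index)
{
    return (uint16_t)(((group & SPR_GROUP_MASK) << SPR_GROUP_SHIFT) |
                      (index & SPR_INDEX_MASK));
}

/* Split a 16-bit SPR address back into its two fields. */
static inline unsigned spr_group(uint16_t addr) { return (unsigned)addr >> SPR_GROUP_SHIFT; }
static inline unsigned spr_index(uint16_t addr) { return addr & SPR_INDEX_MASK; }
```

With these helpers, group 0 and register address 32 give the SPR address 32, which is the EPCR0 example used later in this section.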

However, it is not possible to access the SPRs directly by their 16-bit addresses in the OR1200. They must be accessed with the 2 instructions “l.mtspr” and “l.mfspr”, with the help of the GPRs.

The instruction l.mtspr moves a value to an SPR. It has the following format in assembly code:

l.mtspr rA, rB, K

This instruction performs a logical OR of the value stored in the GPR rA with the constant K. The result is the target address of the SPR. Then it copies the data stored in the GPR rB to the SPR at the calculated address.

For example, in the instruction l.mtspr r0, r9, 32, first 32 is ORed with the value in r0, which is always 0. The result is 32, and it is the target address of the SPR. Comparing 32 with the address format shown in Figure 4.3, the group index is 0 and the register address is 32. This is the register EPCR0 [9]. So the instruction copies the value stored in r9 to the SPR EPCR0.

Similarly, the instruction l.mfspr rD, rA, K reads a value from an SPR. It calculates the logical OR of the data stored in the GPR rA and the constant K. The result defines the target address of the SPR to be read. The value read from the SPR is stored in the GPR rD.
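The behavior of the two instructions can be mimicked in plain C. The following is a software model for illustration only, not code from the thesis archive: the type `or1k_model` and the functions `model_mtspr()` and `model_mfspr()` are hypothetical, and the SPR space is simplified to a flat array.

```c
#include <stdint.h>

#define SPR_SPACE 65536u            /* 16-bit SPR address space */

typedef struct {
    uint32_t gpr[32];               /* r0..r31; gpr[0] is kept at 0 */
    uint32_t spr[SPR_SPACE];        /* flat model of all SPRs       */
} or1k_model;

/* Target SPR address: value of rA ORed with the constant K. */
static inline uint16_t spr_target(uint32_t ra_value, uint16_t k)
{
    return (uint16_t)((ra_value | k) & 0xFFFFu);
}

/* l.mtspr rA, rB, K : SPR[rA | K] <- rB */
static inline void model_mtspr(or1k_model *m, unsigned ra, unsigned rb, uint16_t k)
{
    m->spr[spr_target(m->gpr[ra], k)] = m->gpr[rb];
}

/* l.mfspr rD, rA, K : rD <- SPR[rA | K] (writes to r0 are ignored in this model) */
static inline void model_mfspr(or1k_model *m, unsigned rd, unsigned ra, uint16_t k)
{
    if (rd != 0) {
        m->gpr[rd] = m->spr[spr_target(m->gpr[ra], k)];
    }
}
```

In this model, `model_mtspr(&m, 0, 9, 32)` corresponds to l.mtspr r0, r9, 32 and writes the value of r9 to address 32, i.e. EPCR0.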

4.4 Interrupt Vectors

The OR1200 provides a vector based interrupt system, which reserves a range of specific memory addresses. Once an exception happens, the OR1200 CPU stops normal operation, automatically branches to a certain address and fetches instructions from there to handle the exception. The address that the CPU jumps to depends on the type of the exception.

Table 4.2 is copied from Section 6.2 of the Architectural Manual [9]. It gives the entry addresses of the supported exceptions.

Table 4.2: Exception types and causal conditions

The address 0x100 is the OR1200 starting address. Every time the power comes up or the reset button is pressed, the OR1200 jumps to the address 0x100 and executes from there. So the startup program, or at least a proper jump instruction, has to be placed at the address 0x100.

If there is an error on the WISHBONE bus, for example when reading data from a non-existing physical address, the CPU jumps to the address 0x200.

For memory alignment errors, the CPU goes to the address 0x600. In Chapter 6, we will discuss memory alignment in the OR1200.


The addresses 0x500 and 0x800 are for the tick timer interrupt and the external interrupts triggered via the PIC.

The address 0xC00 is for the system call. It is only taken when the CPU executes the instruction “l.sys”. This instruction manually raises an exception and forces the CPU to branch to the address 0xC00. The system call is especially useful for operating systems when they need to do context switches. We will come back to it later, in the section where porting uC/OS-II to the OR1200 is discussed.

To serve the different types of exceptions, the programmers need to write Interrupt Service Routines (ISRs). More importantly, the ISRs have to be placed exactly at the correct entry addresses. Locating a piece of program at a specified physical address is the job of the linker.

Also note that the addresses from 0x0 to 0x2000 are all reserved for the interrupt vector table. Although the addresses from 0xF00 to 0x2000 are not defined yet, for compatibility it is suggested to link user programs starting from the address 0x2000.
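The entry addresses mentioned in this section can be collected in a C header-style sketch. The constant and function names are assumptions chosen for this example; only the addresses themselves come from the text, and exceptions not discussed above are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception entry addresses discussed in the text (names are illustrative). */
enum or1200_vector {
    VEC_RESET      = 0x100,  /* power-up or reset button        */
    VEC_BUS_ERROR  = 0x200,  /* error on the WISHBONE bus       */
    VEC_TICK_TIMER = 0x500,  /* tick timer interrupt            */
    VEC_ALIGNMENT  = 0x600,  /* memory alignment error          */
    VEC_EXT_INT    = 0x800,  /* external interrupt via the PIC  */
    VEC_SYSCALL    = 0xC00   /* raised by the l.sys instruction */
};

/* First address that is suggested for user programs. */
#define USER_PROGRAM_START 0x2000u

/* True if addr lies in the area reserved for the interrupt vector table. */
static inline bool in_vector_area(uint32_t addr)
{
    return addr < USER_PROGRAM_START;
}
```

A check such as `in_vector_area()` could, for instance, be used in a build script or a start-up self test to catch user code that was accidentally linked below 0x2000.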

For people who program the OR1200 without a debugger, it might be helpful to place a few simple instructions at each exception entry, for example to turn on an LED. In this way it is easier to check whether an exception has been triggered when a program stops responding. We learnt this from experience in the thesis project.

4.5 Tick Timer (TT)

The OR1200 has a built-in Tick Timer (TT) unit, which counts the system clock pulses and therefore provides time information if the clock frequency is known.

The TT is useful for many purposes. It can generate interrupts at fixed time intervals, which provide the system ticks for Real-Time Operating Systems (RTOS). For example, in the thesis project we used TT interrupts for the uC/OS-II RTOS to schedule tasks every 0.01s. The TT can also be used to measure software execution time: starting the timer before a function call and stopping it afterwards gives the running time of the function. My partner Lin used this method to compare the performance of the OR1200 based platform and the ALTERA NIOS II based platform. The details are documented in Chapter 8 of his thesis [14].

The structure of the tick timer is simple, as presented in Figure 4.4.

Figure 4.4: Tick timer structure and registers

There are 2 SPRs controlling the tick timer, the TTMR and the TTCR. The TTMR sets up the timer operation, and the TTCR holds the counted value. The value stored in the TTCR is incremented by 1 on every rising edge of the input clock, provided the timer is enabled.

Bits [27:0] of the TTMR contain a user defined value, which is continuously compared with bits [27:0] of the TTCR. When a match happens, the TTCR can restart counting from 0, keep counting regardless of the match, or stop. The behavior of the TTCR depends on the timer working mode. Bits 31 and 30 of the TTMR select among 3 different timer working modes. If both bits are 0, the tick timer is disabled.

Bit 29 of the TTMR enables the timer interrupt. When it is set, an interrupt is generated on every match between the TTMR and the TTCR, and the interrupt pending bit (bit 28) is set to 1. The user needs to clear this bit manually in the timer ISR.
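As a worked example, the TTMR value for a 0.01s tick can be computed as follows. This is a sketch under stated assumptions: the 50MHz clock is the one listed in the platform summary, the bit positions are those described above, and the numeric mode encoding (01 restart, 10 stop on match, 11 keep counting) is quoted from memory of the OpenRISC 1000 Architecture Manual and should be checked against it. All names are hypothetical.

```c
#include <stdint.h>

#define TTMR_PERIOD_MASK 0x0FFFFFFFu   /* bits 27:0, compare value */
#define TTMR_IP          (1u << 28)    /* interrupt pending        */
#define TTMR_IE          (1u << 29)    /* interrupt enable         */
#define TTMR_MODE_SHIFT  30            /* bits 31:30, working mode */

/* Assumed mode encoding; 0 (disabled) is the only value stated in the text. */
enum tt_mode { TT_DISABLED = 0, TT_RESTART = 1, TT_ONE_SHOT = 2, TT_CONTINUOUS = 3 };

/* Number of clock cycles between two ticks. */
static inline uint32_t tt_period_cycles(uint32_t clk_hz, uint32_t ticks_per_second)
{
    return clk_hz / ticks_per_second;
}

/* Compose a TTMR value from mode, interrupt enable flag and period. */
static inline uint32_t ttmr_value(enum tt_mode mode, int irq_enable, uint32_t period)
{
    return ((uint32_t)mode << TTMR_MODE_SHIFT) |
           (irq_enable ? TTMR_IE : 0u) |
           (period & TTMR_PERIOD_MASK);
}

/* Value to write back in the timer ISR to clear the pending bit. */
static inline uint32_t ttmr_clear_pending(uint32_t ttmr)
{
    return ttmr & ~TTMR_IP;
}
```

With a 50MHz clock and 100 ticks per second, the period is 500000 cycles, so a restart-mode timer with interrupts enabled would use the value 0x6007A120 under the assumed encoding.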

4.6 Programmable Interrupt Controller (PIC)

The OR1200 supports a maximum of 32 external interrupt inputs. Those input signals come from other hardware modules. For example, in the thesis project we have 3 interrupts, from the UART16550, GPIO and DM9000A IP cores. The number of supported interrupts can be configured in “or1200_defines.v”.

The Programmable Interrupt Controller (PIC) block of the OR1200 manages all external interrupts. The PIC structure is shown in Figure 4.5.

Figure 4.5: PIC structure

The OR1200 PIC has 2 registers: a mask register PICMR and a status register PICSR.

The PICMR enables/disables the external interrupts. The interrupt input signals are first logically ANDed with the value stored in the PICMR. Only unmasked interrupts can set their flags in the PICSR. Note that bits 0 and 1 in the PICMR are fixed to 1, so INT0 and INT1 are always enabled.

All 32 flags in the PICSR are logically ORed together. If the result is not 0, an interrupt is triggered in the OR1200, which stops the normal CPU operation and jumps to the interrupt vector 0x800 to execute the ISR program.

All external interrupts to the PIC have the same priority. This implies that interrupt nesting cannot happen in the OR1200. If more than 1 interrupt is pending at the same time, the programmers must check the flags in the PICSR and decide the order in which to handle the interrupts.

There is no need to clear the flags in the PICSR after the interrupts are served. However, the interrupts must be cleared at their sources. For example, if the GPIO IP core triggers an interrupt, the ISR must read the GPIO registers to clear the interrupt signal there. The PIC keeps sampling all external inputs and refreshes the PICSR flags automatically. So when the GPIO IP core pulls down the interrupt line, the flag in the PICSR is cleared at the same time.
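The masking and OR-reduction described in this section can be modelled in a few lines of C. This is a behavioral sketch, not the HDL of the PIC: the function names are hypothetical, and serving the lowest-numbered pending line first is just one possible software policy, since the hardware itself assigns no priorities.

```c
#include <stdint.h>

/* Bits 0 and 1 of the PICMR are fixed to 1, as stated in the text. */
#define PICMR_FIXED_BITS 0x3u

/* PICSR as a function of the sampled interrupt lines and the mask register. */
static inline uint32_t pic_status(uint32_t irq_lines, uint32_t picmr)
{
    return irq_lines & (picmr | PICMR_FIXED_BITS);
}

/* The CPU interrupt is raised when any PICSR flag is set (OR of all 32 flags). */
static inline int pic_irq_asserted(uint32_t picsr)
{
    return picsr != 0u;
}

/* Software policy example: lowest-numbered pending line, or -1 if none. */
static inline int pic_next_pending(uint32_t picsr)
{
    for (int line = 0; line < 32; line++) {
        if (picsr & (1u << line)) {
            return line;
        }
    }
    return -1;
}
```

An ISR at 0x800 could call something like `pic_next_pending()` in a loop, serving each source until the PICSR reads 0.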

4.7 Porting uC/OS-II to OR1200

4.7.1 Introduction

In this section we discuss how to port the uC/OS-II RTOS to the OR1200 processor. “Porting” means modifying uC/OS-II so that it can work on the OR1200 based hardware platform.

Speaking of uC/OS-II, the standard reference is the well-known book MicroC/OS-II: The Real-Time Kernel [15]. It is written by Jean J. Labrosse, the creator of uC/OS. Chapter 13 of the book describes in general how to port uC/OS-II to different types of processors. It was a very important reference for us.

The uC/OS-II RTOS has good portability because most of the source code is written in ANSI C. When porting it to the OR1200, only a few files need to be adapted. They are “OS_CPU.h” and “OS_CPU_C.c”. In our thesis project archive [16] these files can be found under /software/uCOS-II/Port.

4.7.2 uC/OS-II Context Switching

The essence of the port is to make the OR1200 processor support context switching for uC/OS-II. Context switching means switching a CPU from one process to another by storing and restoring the process and CPU related states. This is the essence of multitasking. uC/OS-II has many features, like semaphores, mutexes etc. Most of these features uC/OS-II can manage by itself, so they do not have to be adapted for different CPUs. Only the multitasking feature cannot be implemented by uC/OS-II alone without knowing the hardware details, because how to perform a context switch is tightly coupled with the CPU architecture.

In uC/OS-II, there are 2 ways to initiate a context switch: it is either actively triggered by a task, or passively managed by the uC/OS-II RTOS in the timer ISR.

When a task has finished its job, it should give up the CPU so that the other tasks get the chance to use it. In this case, the task can call functions like OSTimeDly() or OSTaskSuspend() to suspend itself. These functions invoke a uC/OS-II internal function OS_Sched() to do the task scheduling and find the next task with the highest priority. After that, OS_Sched() calls OS_TASK_SW() to perform a context switch. OS_TASK_SW() is a part of the port and has to be defined based on the CPU type. For the OR1200, OS_TASK_SW() uses the instruction “l.sys” to make a system call, which manually raises an exception. The context switching is done in the ISR. For more information about “l.sys”, see also Section 4.4.

uC/OS-II requires a CPU timer to provide interrupts (also called ticks) at a fixed time interval. When a timer interrupt comes, uC/OS-II interrupts the running task and checks whether another, higher priority task has become ready. If so, uC/OS-II performs a context switch in the timer ISR. So after any timer ISR, it is always the task in the running or ready state with the highest priority that gets the CPU. This is why uC/OS-II is called a preemptive kernel.
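The decision taken on each tick can be sketched in C as below. This is a deliberately simplified model with hypothetical names: the ready list is a single 32-bit mask that also contains the running task, whereas the real kernel uses its own ready tables. The convention that a lower number means a higher priority is the one used by uC/OS-II.

```c
#include <stdint.h>

/* Bit n set means the task with priority n is ready or running.
 * Returns the highest priority (lowest number) in the mask, or -1 if empty. */
static inline int highest_ready(uint32_t ready_mask)
{
    for (int prio = 0; prio < 32; prio++) {
        if (ready_mask & (1u << prio)) {
            return prio;
        }
    }
    return -1;
}

/* True if the tick ISR should switch away from the current task. */
static inline int need_switch(int current_prio, uint32_t ready_mask)
{
    int best = highest_ready(ready_mask);
    return best >= 0 && best != current_prio;
}
```

For example, if tasks with priorities 3 and 5 are ready and task 5 is running, the tick ISR would switch to task 3; if task 3 is already running, nothing changes.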

4.7.3 Context Switching in OR1200 Timer Interrupt

No matter how a context switch is initiated, it is always done in a similar way at the ISR level. Here we only describe the context switching that happens in the OR1200 timer ISR.

Generally, the context switching is done in the following steps:

1. The OR1200 jumps to 0x500 when a timer interrupt comes

2. Back up the context of the current task, including 3 OR1200 SPRs (EPCR, EEAR, ESR) and all GPRs

3. Store the SP of the current task

4. Call the function OSTickISR()

5. If a higher priority task is ready, call OSIntCtxSw() to perform a context switch; otherwise resume the context of the current task

As mentioned in Section 4.4, when a timer interrupt comes the OR1200 CPU jumps to the interrupt vector address 0x500 and executes the interrupt handler program from there. This part of the source code can be found in /software/Application/Board/reset.S of the thesis project archive.

In the timer interrupt, the first thing we do is always to back up the context of the current task. If another, higher priority task has become ready, its context is loaded. In this case a context switch is really performed. Otherwise, the context of the current task is resumed, as if the timer interrupt had never happened.

The OR1200 context includes 3 SPRs and all GPRs. In a multitasking system, each task needs its own copy of these registers. The 3 SPRs, EPCR, EEAR and ESR, are updated by the OR1200 when an interrupt is triggered; they store the program counter (PC), the effective address and the CPU status.

The following assembly code demonstrates how to save a context. All registers are pushed onto the stack of the current task, using the stack pointer (SP) GPR r1 as the base address plus different offsets.

l.addi  r1,r1,-140    // get enough space for a context
l.sw    36(r1),r9     // store r9 before using it temporarily

l.mfspr r9,r0,32      // copy EPCR to r9
l.sw    128(r1),r9    // save EPCR
l.mfspr r9,r0,48      // copy EEAR to r9
l.sw    132(r1),r9    // save EEAR
l.mfspr r9,r0,64      // copy ESR to r9
l.sw    136(r1),r9    // save ESR

l.sw    8(r1),r2      // save GPRs to stack except r0, r1, r9
l.sw    12(r1),r3     // because r0 is always 0
l.sw    16(r1),r4     // r9 has been saved before
l.sw    20(r1),r5     // r1 as the SP will be saved later
l.sw    24(r1),r6
l.sw    28(r1),r7
l.sw    32(r1),r8
l.sw    40(r1),r10
...
l.sw    120(r1),r30
l.sw    124(r1),r31
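The stack frame built by the listing above can be described as a C structure, which makes the offsets easy to check. The type name `or1200_context_frame` and the helper `gpr_slot_offset()` are assumptions for this sketch and do not come from the thesis archive; the offsets are read off the assembly listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the 140-byte context frame, relative to the new SP (r1). */
typedef struct {
    uint32_t gpr[32];  /* slot n at offset 4*n holds rn; slots 0 and 1 are
                          not written by the listing (r0 is 0, r1 is the SP) */
    uint32_t epcr;     /* offset 128: saved program counter   */
    uint32_t eear;     /* offset 132: saved effective address */
    uint32_t esr;      /* offset 136: saved status register   */
} or1200_context_frame;

/* Compile-time checks against the offsets used in the assembly code. */
_Static_assert(sizeof(or1200_context_frame) == 140, "frame must be 140 bytes");
_Static_assert(offsetof(or1200_context_frame, epcr) == 128, "EPCR at 128");
_Static_assert(offsetof(or1200_context_frame, eear) == 132, "EEAR at 132");
_Static_assert(offsetof(or1200_context_frame, esr) == 136, "ESR at 136");

/* Byte offset of the stack slot that holds GPR rn. */
static inline size_t gpr_slot_offset(unsigned n)
{
    return offsetof(or1200_context_frame, gpr) + 4u * n;
}
```

For instance, `gpr_slot_offset(9)` gives 36, matching the line l.sw 36(r1),r9 at the top of the listing.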

After a context has been pushed onto the stack, all registers of the context can be accessed through the SP register r1. So the context is actually linked to the SP. The next step is to store the SP in the uC/OS-II Task Control Block (OS_TCB). In uC/OS-II, each task has its own stack and OS_TCB. The OS_TCB is used to maintain all information related to a task, including the SP of the task. When uC/OS-II needs to switch to another task, it first finds the OS_TCB and then the stack pointer. With the SP, uC/OS-II can refer to the correct context of the task to be switched to.

The following code stores the SP (r1) in the OS_TCB. “OSTCBCur” is a pointer to the OS_TCB of the currently running task, and the SP is the first member of the struct OS_TCB.

l.movhi r3,hi(_OSTCBCur)     // load the high 16 bits of the address of OSTCBCur into r3
l.ori   r3,r3,lo(_OSTCBCur)  // OR in the low 16 bits of the address
l.lwz   r4,0(r3)             // load the address of where to save the SP
l.sw    0(r4),r1             // save the SP (r1) to that address
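In C terms, the 4 instructions above amount to a single pointer assignment. The sketch below uses a reduced, hypothetical `demo_tcb` structure and a `demo_tcb_cur` pointer that play the roles of OS_TCB and OSTCBCur; the real OS_TCB has many more members, but the trick only relies on the stack pointer being the first one.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for OS_TCB: the stack pointer must be the first member. */
typedef struct demo_tcb {
    uint32_t *stk_ptr;   /* saved SP of the task (offset 0) */
    uint8_t   prio;      /* task priority                   */
} demo_tcb;

_Static_assert(offsetof(demo_tcb, stk_ptr) == 0, "SP must be the first member");

/* Stand-in for OSTCBCur: points to the TCB of the running task. */
static demo_tcb *demo_tcb_cur;

/* Equivalent of "l.lwz r4,0(r3)" followed by "l.sw 0(r4),r1". */
static inline void save_sp(uint32_t *sp)
{
    demo_tcb_cur->stk_ptr = sp;
}

/* When switching to another task, its context is found through its TCB. */
static inline uint32_t *load_sp(const demo_tcb *next)
{
    return next->stk_ptr;
}
```

Because the stack pointer sits at offset 0, the assembly code can store r1 through the TCB address directly, without knowing anything else about the structure.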

Afterwards, the function OSTickISR() is called. This is required according to Chapter 13 of the uC/OS-II book [15]. OSTickISR() calls a uC/OS-II internal function OSIntExit() to detect whether another, higher priority task has become ready. If so, the context of the higher priority task should be resumed, which is done by the function OSIntCtxSw(). Otherwise, OSTickISR() returns and the context of the previously running task is resumed, which is done in the file reset.S.

Resuming a context is very similar to storing a context, but in the opposite direction. All saved registers are copied back from the stack to the CPU.

At the end, an OR1200 instruction “l.rfe” is used to return from the in-terrupt. The instruction reloads the EPCR, EEAR and ESR, such that theCPU is having the new program counter (PC) and status register if a contextswitch has happened.

4.7.4 Summary

Now we have introduced how a context switch is performed in the OR1200 timer interrupt. The other type of context switch, triggered by “l.sys”, is done in practically the same way with another function, OSCtxSw().

In summary, supporting context switching is the core of porting uC/OS-II to the OR1200 processor. Once this part has been carefully taken care of, the rest of the port is not difficult to understand.

References:

[1] Website, OpenCores.org, http://www.opencores.org/, Last visit: 2011.01.31.

[2] Webpage, OpenRISC Project Overview, from OpenCores.org, http://opencores.org/project,or1k, Last visit: 2011.01.31.



[3] Webpage, Damjan Lampret’s homepage, http://www.lampret.com/, Last visit: 2011.01.31.


[5] Marcus Erlandsson, OpenRISC project update, http://opencores.org/forum,OpenRISC,0,2888, Last visit: 2011.01.31.

[6] Website, Beyond Semiconductor, http://www.beyondsemi.com/, Last visit: 2011.01.31.

[7] Website, EMBECOSM, http://www.embecosm.com/, Last visit: 2011.01.31.

[8] Website, Dynalith Systems, http://www.dynalith.com/, Last visit: 2011.01.31.

[9] User Manual, OpenRISC 1000 Architecture Manual, OpenCores Organization, Revision 1.3, April 5, 2006.

[10] Damjan Lampret, OpenRISC 1200 IP Core Specification, Revision 0.10, November, 2010.

[11] Jeremy Bennett, Julius Baxter, OpenRISC Supplementary Programmer’s Reference Manual, Revision 0.2.1, November 23, 2010.

[12] Webpage, GNU Toolchain for OpenRISC, http://opencores.org/openrisc,gnu_toolchain, Last visit: 2011.01.31.

[13] Jeremy Bennett, The OpenCores OpenRISC 1000 Simulator and Tool Chain Installation Guide, Application Note 2, Issue 3, November, 2008.


[15] Jean J. Labrosse, MicroC/OS II: The Real Time Kernel, 2nd Edition, Publisher: Newnes, June 15, 2002, ISBN: 978-1578201037.






Chapter 5

WISHBONE Specification and CONMAX IP Core

In this chapter the WISHBONE specification and the CONMAX IP core will be introduced. Because they are very important in the system, a separate chapter is reserved for them.

WISHBONE is the specification of a System-on-Chip (SoC) interconnection architecture. It is a bus standard like ARM’s AMBA. It defines the interfaces of IP cores and therefore specifies how the cores should communicate with each other. This in turn influences what the IP core network looks like.

The WISHBONE is adopted as the interconnection standard in our thesis project, because it is the official standard suggested by opencores.org [1] and most open cores support (and only support) the WISHBONE standard.

If the WISHBONE is a blueprint, the CONMAX is a building. The WISHBONE interCONnect MAtriX IP core (CONMAX) is a real IP core that implements a matrix interconnection (also called a crossbar switch structure) that complies with the WISHBONE standard. In the thesis project, all we had to do was connect all other IP cores to the CONMAX. It helped us to control the data traffic and handle bus transactions in the system.

Everything about the WISHBONE and the CONMAX is fully documented in their specifications [2, 3]. To avoid just copying text from the specifications, this chapter adds more explanations to help understand the WISHBONE standard and the CONMAX IP core.


5.1 Importance of Interconnection Standard

Before going into detail, it is necessary to stress the importance of the interconnection standard, which is the WISHBONE in our case.

Nowadays, people have gradually realized that the interconnection architecture may be the most significant part of an electronic system, even more so than the processors. Several reasons are listed below:

1. First, the bus or the interconnection is becoming the bottleneck of the system performance.

In the past, it was the processor that limited the system performance. One way to improve it was to increase the system frequency. For example, the clock frequency of home PCs rose from the 100 MHz level to the GHz level during the last decades.

But it turned out that increasing the CPU frequency was not always helpful. There are at least two serious drawbacks:

• Higher frequency consumes more power.

• Most peripherals cannot keep up with such high speeds. As a result, processors spend a lot of time idle, doing nothing while waiting for the peripherals to complete an operation.

To solve the problem, multiprocessor systems appeared. By using more than one processor, peripherals can be driven in parallel. And in theory the frequency can be lowered because the work is now shared by more processors. So the trend is that the multiprocessor structure will become popular. However, another problem arose: the traditional shared bus hugely limits the performance of multiprocessor systems, as Figure 5.1 shows.

In Figure 5.1(a), only one processor is able to access the peripherals at a time. The others have to wait until the first one releases the bus. This is actually no different from a single processor system. Figure 5.1(b) shows an improved version. The bus is replaced by a matrix interconnection. Now two processors can access different peripherals in parallel, but they still need to be arbitrated when accessing the same peripheral.

In fact, some systems have more than two and even dozens of processors, and the number is still increasing. As a result, the traditional shared bus is evolving to look more and more like a network. So as we can see, how to communicate efficiently in a multiprocessor system has emerged as a critical problem. Carefully designing the interconnection architecture to get the maximum throughput for the IP core network has become an attractive issue that engineers care about now.

(a) Traditional bus limits performance (b) Matrix structure is better

Figure 5.1: Interconnection is important in multiprocessor systems

2. Second, IP cores need standards for their interfaces. This is easy to understand. Lots of companies produce IP cores. If there is a standard that everyone follows, all of the cores can be connected easily. This will definitely accelerate the development of new products. So people need a unified interconnection architecture.

3. Third, a standard for the interconnection has great market potential. If a company owns a standard which takes the dominant role in the market, all other vendors have to ask for a license to give their products standard interfaces. Furthermore, every time the standard is updated, the owner gets the chance to lead the direction of future development.

4. Sometimes, the standard can even be the first criterion for IP core selection. For instance, because we used the WISHBONE in the thesis project, all the open cores chosen for the system have to be WISHBONE compliant. Some IP cores with different interfaces were given up because they could not be adapted to the system easily, although they might have been of very good quality.
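The first reason above, the difference between a shared bus and a matrix interconnection as in Figure 5.1, can be illustrated with a small cycle-counting model. This is a rough sketch under simplifying assumptions that are not part of the thesis: every access takes exactly one cycle, masters issue their accesses in order, and the master with the lower index wins when two masters want the same slave.

```python
from collections import deque

def cycles_needed(queues, shared_bus):
    """queues: one list of target-slave names per master, served in order.
    Each access is assumed to take exactly one cycle."""
    pending = [deque(q) for q in queues]
    cycles = 0
    while any(pending):
        busy = set()                    # slaves already claimed in this cycle
        granted = 0
        for q in pending:               # lower index = higher priority (assumption)
            if not q:
                continue
            if shared_bus and granted:  # shared bus: only one access per cycle
                break
            if q[0] not in busy:        # matrix: the target slave must be free
                busy.add(q.popleft())
                granted += 1
        cycles += 1
    return cycles
```

With two masters that each access a different slave twice, the shared bus needs four cycles, while the matrix finishes in two. When both masters target the same slave, the matrix is no faster than the bus, because the accesses still have to be arbitrated.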

So now we know how important the interconnection architecture can be. We must then emphasize a special feature of the WISHBONE standard: it is in the public domain.

That the WISHBONE is in the public domain means it is not copyrighted, which is another way of saying that anyone can do anything with the WISHBONE without any limitation, but of course also without any warranty. This is a great gesture from the developers of the WISHBONE: they gave up the ownership they could have had, so that everyone else benefits and is allowed to develop new products based on the WISHBONE without asking for a license. Developers could even turn new WISHBONE based products into their own proprietary products because the WISHBONE is not copyrighted. Besides, the WISHBONE will always be free. In comparison, ARM’s AMBA is a copyrighted open standard: it is free to use right now, but ARM has never promised that it will stay free forever. So it is possible that after the next upgrade you may have to pay a license fee for each copy of a product which applies the AMBA standard. The public WISHBONE standard also gives a meaningful push to the open core community. Developers can happily design new open cores by following the WISHBONE standard, with no trouble regarding interface compatibility and with less worry about legal problems.

So far, the importance of the WISHBONE has been introduced. Regrettably, all we actually did in the thesis was to include a CONMAX core in the system, without further investigation. In the future, the WISHBONE would be a really good starting point for research on interconnection architectures, for example how to adapt the standard to a Network-on-Chip (NoC) system while keeping a certain throughput, or how the WISHBONE performs in benchmarks against other interconnection standards.

5.2 WISHBONE in a Nutshell

In this section the WISHBONE standard will be introduced. The section can be seen as an explanation of, or a supplement to, the official specification where the descriptions in the standard are not that easy to understand. So it is recommended to read the WISHBONE specification first. Here are several suggestions for reading the official specification.

1. Start with the tutorial in the appendix. It gives a very good quick overview of the WISHBONE. But there is no need to spend too much time on the appendix examples of how to implement a WISHBONE interconnection, because we use the CONMAX in the thesis project, which will be introduced in a later section of this chapter.

2. There are lots of Rules, Recommendations, Suggestions, Permissions and Observations in the specification. It might be helpful to skip them all when reading the document for the first time, and to read only the Rules the second time.

3. The timing diagrams in sections 3.2 to 3.5 of the WISHBONE specification are not easy to understand and a bit confusing. Those diagrams are re-drawn in this thesis.

5.2.1 Overview

The WISHBONE is not a complicated standard, but it contains almost all the main features that other bus standards have. Considering that it was published in 2002, it was really a brilliant piece of work at that time.

One of the major things the WISHBONE did was to define interface signals. If an IP core supports all basic signals with the correct functions as specified by the WISHBONE, it will be compatible with other IP cores which have WISHBONE interfaces.

On top of the signals, the WISHBONE defines a set of protocols that standardize how the interfaces communicate, i.e. how the data is packaged and sent/received over the signals. These are called “bus transactions”. There are 4 types of transactions described in the specification: single, block, RMW, and burst.

In fact, it is the IP cores’ responsibility to implement these signals and protocols. To be WISHBONE compatible, the cores have to include specialized logic to provide the interface signals, as well as to be able to send and recognize bus transactions correctly. Because the implementation of the WISHBONE interface logic is tightly coupled with the IP functions, which vary from one core to another, there is no universal solution for how to design a WISHBONE interface. So this chapter will not introduce the details of the interface implementation. However, all IP cores used in the thesis project have WISHBONE interfaces. Their source code can be taken as examples to study the interface logic.

When more than one WISHBONE compliant IP core is used to form a larger system, a WISHBONE network is constructed. In Appendix A.2 of the specification (pages 96–99) [2], 4 types of interconnections are introduced. They are the most common ways to compose a WISHBONE network. But of course there are more solutions than these four. Users can design interconnections with new structures, as long as the solutions guarantee that all bus transactions are transferred correctly and efficiently through the interconnection. How to organize a network is still a good WISHBONE research topic.

There are many IP cores already built to help construct WISHBONE networks. In our thesis project, we didn’t spend much time on designing an interconnection. Instead we used the CONMAX IP core, which implements a crossbar switch structure. With the CONMAX, we can easily create a network just by connecting all other cores to it. This saved us lots of time and made the system more reliable.

To sum up, the WISHBONE standard can be divided into 3 aspects: the signals, the transactions and the interconnections. Both the signals and the transactions are relatively strictly defined by the standard, and all WISHBONE compliant cores have to follow them. The interconnection, in contrast, is more flexible to implement and depends on the situation of each project. Figure 5.2 gives an overview of the WISHBONE.

Figure 5.2: Overview of the WISHBONE

5.2.2 WISHBONE Interface Signals

The WISHBONE defines 2 types of interfaces: master and slave. An interface has to be either a master or a slave. Masters always start actions, such as requests for reads or writes. Slaves always respond to the requests. A connection can be made only between a master and a slave, not between two interfaces of the same type.

This implies that the signals located at masters and slaves come in pairs, or rather that they are complementary. For instance, if a master has a signal named ADR_O which outputs an address, there should be an ADR_I in the slave which receives it (the suffix “_I” marks an input port and “_O” an output port). For simplicity, only the signals on the master side are introduced below.

The WISHBONE standard describes a lot of signals and lists them in alphabetical order, which is not easy for beginners to follow. In this thesis the signals are divided into 6 groups based on how frequently they are used, and each group is marked as basic or extended. When designing an IP core, not all signals specified in the standard have to be implemented, only the signals in the basic groups. So some WISHBONE compliant IP cores may have only the minimum basic signals, while others may have additional extended signals to support advanced WISHBONE functions, like block read/write, burst, etc.

Here are the 6 groups of WISHBONE signals. Only groups 1 and 2 contain the basic signals that every WISHBONE compliant IP core has to implement.

Group 1

CLK_I: The system clock
RST_I: The system reset

Clock and reset are the most basic signals that all WISHBONE interfaces should have. They are the only two signals that are always inputs, whether on the master side or the slave side.

All CLK_I inputs and all RST_I inputs are connected together in a WISHBONE network. So all IP cores in the network share the same clock source and are reset at the same time.

Sometimes an IP core may have more than one reset input. In such cases all reset inputs should normally be connected together so that only one reset signal drives the whole system.

The WISHBONE is a synchronous interconnection standard, which means all IP cores in the system examine their inputs and change their outputs at each rising clock edge. This is clearly stated in the specification when describing the WISHBONE features and objectives. On page 9, one of the features is described as “Synchronous design assures portability, simplicity and ease of use.” And on page 12, the last several objectives are about synchronization, like “to create a synchronous protocol to insure ease of use, good reliability and easy testing. Furthermore, all transactions can be coordinated by a single clock.” etc. [2]


Group 2

STB_O: When STB_O=‘1’, either a read or a write operation is ongoing. Meanwhile all data signals, like ADR, DAT and WE, are valid.

CYC_O: CYC_O stays high during the whole bus transaction. Besides indicating bus transactions, it is also used to request grants from bus arbiters when multiple masters are accessing one slave at the same time.

ACK_I: Acknowledgement from the slave. When a read or write operation is finished, the slave informs the master by returning an ACK that lasts one clock cycle. When a master finds ACK=‘1’ on a rising clock edge, it knows the current read/write operation is done and it is OK to start the next one.

ADR_O: The address to access.

WE_O: Indicates either read or write. It is a write operation if WE=‘1’, else a read.

DAT_O: Data output from a master when writing to a slave.

DAT_I: Data input to a master when reading from a slave.

SEL_O: Indicates which fragments of the data are valid. For instance, for a 32-bit bus, SEL_O is 4 bits wide. If SEL_O[3:0]=“1000” during a read, the master only wants the highest byte of the data, DAT_I[31:24]. All other bits are not valid and won’t be processed.

The 8 signals in group 2 are enough to perform basic WISHBONE functions. Together with CLK_I and RST_I, these 10 signals are necessary for every WISHBONE compliant IP core.
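The meaning of the select signal can be made concrete with a small sketch. The following Python model assumes byte-wide (8-bit) select granularity, as in the 32-bit example above; the helper names selected_bits and extract are hypothetical and not taken from the specification.

```python
def selected_bits(sel, bus_width=32):
    """Return the (high, low) bit ranges of DAT selected by the SEL mask."""
    lanes = bus_width // 8                    # one SEL bit per byte lane (assumption)
    ranges = []
    for lane in reversed(range(lanes)):       # lane 3 is DAT[31:24] on a 32-bit bus
        if (sel >> lane) & 1:
            ranges.append((lane * 8 + 7, lane * 8))
    return ranges

def extract(dat, sel, bus_width=32):
    """Mask away the byte lanes that SEL marks as not valid."""
    mask = 0
    for high, low in selected_bits(sel, bus_width):
        mask |= ((1 << (high - low + 1)) - 1) << low
    return dat & mask
```

With SEL=“1000”, selected_bits(0b1000) yields the single range (31, 24), and extract(0xAABBCCDD, 0b1000) keeps only the highest byte.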

Group 3

ERR_I: Indicates errors
RTY_I: Retry signal, asks for a repeat of the last read/write operation

Some IP cores have these signals in their interfaces. They are similar to ACK_I but have different meanings. Once a read/write operation has finished successfully, a one cycle ACK is returned. If it has not, the slave may return a one cycle ERR to tell the master that an error occurred in the last read/write operation, or send an RTY to ask for a retry.

To enable this function, both the master and the slave have to support it, i.e. the master should have ERR_I and RTY_I and the slave should have ERR_O and RTY_O. This implies that certain functional logic has to be designed into the IP cores. But these features are not compulsory according to the WISHBONE standard.

If a slave doesn’t have an ERR_O or a RTY_O, the ERR_I and the RTY_I of the master can be wired to ground. If a master does not support these signals, the ERR_O, RTY_O and ACK_O of the slave may be combined with an OR gate and then sent to the ACK_I of the master. But in that case the ERR and the RTY signals are treated as an ACK, so their meaning is ignored.
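The two wiring options can be written down as simple combinational functions. This is only an illustrative Python sketch of the glue logic described above; the function names are hypothetical.

```python
def master_ack_in(ack_o, err_o=0, rty_o=0):
    """ACK_I seen by a master without ERR_I/RTY_I: the OR of all three outputs."""
    return ack_o | err_o | rty_o

def master_err_rty_in(slave_has_err_rty, err_o=0, rty_o=0):
    """ERR_I/RTY_I of a master; tied to ground when the slave lacks the outputs."""
    return (err_o, rty_o) if slave_has_err_rty else (0, 0)
```

Note that with the OR gate the master can no longer distinguish a failed operation from a successful one, which is the price of connecting a full-featured slave to a minimal master.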

Group 4

TGD_I: Tag of input data
TGD_O: Tag of output data
TGA_O: Tag of address
TGC_O: Tag of bus cycle

These 4 signals are called tag signals because they are attached to other signals to provide extra information, just like tags. For example, when a master is sending data serially to a slave and some of the data is more special than the rest, that data can be marked by a TGD_O which is sent at the same time, so that the slave can recognize the special data when receiving the TGD_O signal. As another example, when CYC=‘1’, the value of TGC could be used to determine which kind of transaction is being transferred; for instance, 00, 01, 10 and 11 could be used as tags that stand for single, block, RMW and burst transactions respectively.
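The TGC example can be captured as a small lookup. Since the tag encoding is user defined, the following Python sketch is just one possible convention, namely the 2-bit encoding suggested in the paragraph above; the names CycleTag and decode_tgc are hypothetical.

```python
from enum import IntEnum

class CycleTag(IntEnum):
    """One possible 2-bit TGC encoding (a user-defined convention)."""
    SINGLE = 0b00
    BLOCK = 0b01
    RMW = 0b10
    BURST = 0b11

def decode_tgc(cyc, tgc):
    """Return the transaction type, or None when no bus cycle is in progress."""
    return CycleTag(tgc & 0b11) if cyc else None
```

A slave would only evaluate the tag while CYC is asserted, which is why decode_tgc returns None when cyc is 0.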

Interestingly, the WISHBONE doesn’t specify the 4 signals in detail, such as the width of the signals or the meaning of the data patterns. This leaves the users great freedom in how to utilize the signals. In principle, it is entirely possible to define a custom protocol for a system, as long as both the master and the slave understand the tags in the same way.

The tag signals need specialized logic in both the master and the slave IP cores too. Again, these signals are not mandatory. Most existing WISHBONE IP cores do not have them.

Group 5

LOCK_O: Indicates that the current bus transaction is uninterruptible.

LOCK is another signal that is almost never used.

The usage of the signal is not clearly described and is only mentioned in chapter 3.3 of the WISHBONE specification (page 51) [2]. According to the waveform in the figure there, when block read/write transactions happen, LOCK could be used together with CYC to keep a transaction uninterrupted.

The information in the specification is not sufficient, but in the WISHBONE forum there was a message [4] about LOCK from Richard Herveille, the author of the WISHBONE specification, which explains a little more about the signal. The message is copied below:

What is the purpose of the LOCK_O signal? From the description it says that it indicates that the bus cycle is uninterruptible. However, the statement:

Once the transfer has started, the INTERCON does not grant the bus to any other MASTER, until the current MASTER negates [LOCK_O] or [CYC_O].

If deasserting CYC_O causes the lock to end, then this signal doesn’t really do anything more than asserting CYC_O by itself, does it?

[rih] Not really. CYC is a bus-request signal. If it’s asserted it validates allother signals. So if CYC is negated, LOCK is invalid. If a higher prioritybus master asserts CYC then the bus arbiter might grant that master thebus. Asserting LOCK prevents this. So far nobody uses LOCK.
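The behaviour described in the reply can be sketched as a tiny priority arbiter. This is a hypothetical Python model, not code from CONMAX or from the specification: it assumes a preemptive arbiter in which the master with the lower index has the higher priority.

```python
def next_grant(current, cyc, lock):
    """current: index of the granted master or None; cyc/lock: lists of 0/1.
    Lower index = higher priority (an assumption of this sketch)."""
    if current is not None and cyc[current] and lock[current]:
        return current                  # LOCK holds the bus against preemption
    for master, requesting in enumerate(cyc):
        if requesting:
            return master               # highest-priority requester wins
    return None                         # nobody is requesting the bus
```

If master 1 owns the bus and master 0 asserts CYC, next_grant(1, [1, 1], [0, 0]) hands the bus to master 0, whereas next_grant(1, [1, 1], [0, 1]) keeps it with master 1. Once master 1 negates CYC, its LOCK is no longer valid and the bus is released.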

Group 6

CTI_O: Cycle type identification
BTE_O: Burst type extension

These 2 signals are used even less, but they are clearly defined in chapter 4 of the specification. They are designed for the WISHBONE registered feedback bus cycles, i.e. the burst transactions. Basically they are similar to the tag signals in that they also provide extra information. Through CTI and BTE, the slave knows the status of the burst and can prepare to handle it. The 2 signals will be discussed again later in the burst transaction section.

5.2.3 WISHBONE Bus Transactions

In the WISHBONE specification, each process of data transfer is called a bus cycle. There are 4 types of bus cycles defined in the specification: single, block, RMW and registered feedback bus cycles (the first 3 types are in Chapter 3, and the last one in Chapter 4 of the WISHBONE specification).

In this thesis, however, the name “bus cycle” is replaced by “bus transaction”, because the word “cycle” could be confused between clock cycles and bus cycles. The 4 bus cycles are here named single, block, RMW and burst¹ bus transactions respectively.

Each read or write action is called an “operation” in this thesis. A single transaction contains only one read or one write operation, while block, RMW and burst transactions can have multiple operations within one transaction.

According to the specification, all 4 types of transactions are optional. But to be WISHBONE compliant, an IP core has to support at least 1 type of transaction in order to communicate with the others. In fact almost all IP cores choose to implement the single transaction because it is the simplest, whereas the other 3 are rarely used. For example, all IP cores in the thesis project only work with single transactions.

5.2.3.1 Single Read/Write Transaction

The single read/write transaction is the most frequently used. Each transaction contains only 1 read or write operation initiated by the master. The slave responds to the request by returning the wanted data in the case of a read, or by accepting the sent data in the case of a write.

Figure 5.3 gives an example block diagram of the WISHBONE signal connections between a master and a slave. The diagram helps to demonstrate how the bus transactions are transmitted from the master to the slave through the connections. The reader can imagine all waveforms shown below in this chapter as happening on the wires in this figure.

Figure 5.3: An example of WISHBONE signal connections

¹ The name “burst” is more popular than “registered feedback”.


Figure 5.4 depicts what single read/write transactions typically look like. Four transactions are contained in the figure.

Figure 5.4: An example of WISHBONE single transactions

At the 3rd rising clock edge, STB became ‘1’ to signal the start of a single read transaction (WE=‘0’). In the next cycle, since the slave did not respond, the master held all signals unchanged. After a while, the slave answered by asserting ACK to ‘1’. On the next rising edge, the master detected this ACK and latched the returned data. This is the first transaction.

After another 3 cycles, the master was ready again. This time it wrote data to the slave. The slave responded as soon as possible, in the following clock cycle. However, the slave somehow didn’t finish this operation correctly, so it returned an RTY to request another try. The master then repeated the operation, and this time got a successful result (ACK). These are the 2nd and 3rd transactions.

After that, the master started another transaction. Because of the limited space in the figure, not all of the waveform is drawn. The master has to hold all signals until it gets a response from the slave. This could even last forever if the slave doesn’t respond at all.

So far, we have given a general idea of how the WISHBONE works with single read/write transactions. Beyond that, there are several important notes, listed below:

1. One of the most important things to notice is that the slave always gives ACKs that last exactly 1 clock cycle. This is a sort of one-way handshake protocol, i.e. the master holds the sending signals and observes the replies from the slave all the time, but the slave only sends back a 1-clock-cycle ACK signal and doesn’t care whether the master receives the ACK or not.

Please remember that a master will be confused when it sees an ACK signal lasting multiple cycles. This will be interpreted as several operations having been completed.

Because of the one-way protocol, if a WISHBONE interconnection is so complicated that a master could somehow miss an ACK, the current bus transaction will be unable to finish and will last forever. This is quite exceptional, but if such cases do appear, a watchdog inside the master may be considered, which forces the master to restart or skip the transaction after a timeout.

2. The CYC signal should not be ignored, although, as everyone can see, CYC and STB have exactly the same waveforms in single transactions.

According to the specification, the slaves are only allowed to respond when CYC=‘1’. This means the slaves only respond to transactions when the logical AND of STB and CYC is true.

For example, in complicated WISHBONE networks, CYC is usually used to request grants from bus arbiters. There are situations in which the arbiter broadcasts the bus transaction signals to all slaves but delivers the CYC signal only to the addressed slave. In such cases, the slaves may respond incorrectly if they don’t check the input value of CYC.

3. All 3 signals, ACK, RTY and ERR, can be used to reply to the master, just as the 2nd transaction is finished with an RTY. But both the master and the slave IP cores have to support the signals and the function. Usually they have only the ACK signal, because RTY and ERR are not mandatory according to the specification.

4. In the first transaction, the master received the response after 4 clock cycles, but the delay is not necessarily always 4 cycles. The slaves may need several cycles to process the data. No ACK will be returned until they are finished. Therefore, a bus transaction can last a long time because too much time is spent waiting for a slow slave.

Similarly, when the masters receive ACKs, they may need some time to process the data too. In such cases, there will be breaks between 2 transactions, just like the delay between the 1st and 2nd transaction in the figure.

In the best situation, without any delay, the ACK is set to ‘1’ by the slave on the next rising edge after STB is asserted, and the master starts another transaction as soon as it gets the ACK. The 2nd to 4th transactions in the figure describe this situation.
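The notes above can be tied together in a small cycle-level model of a single transaction. This is a simplified Python sketch, not the behaviour of any particular IP core: the classes and parameters (Slave, wait_states, timeout) are hypothetical, the slave is modeled as a plain memory, and one call to clock() stands for one rising clock edge. The timeout plays the role of the watchdog mentioned in note 1.

```python
class Slave:
    """Memory-like slave that acknowledges after `wait_states` extra cycles."""
    def __init__(self, wait_states=0):
        self.mem = {}
        self.wait_states = wait_states
        self.count = 0

    def clock(self, cyc, stb, we, adr, dat_w):
        """One rising edge; returns (ack, dat_r)."""
        if not (cyc and stb):           # respond only when CYC AND STB is true
            self.count = 0
            return 0, 0
        if self.count < self.wait_states:
            self.count += 1             # still busy: no ACK yet
            return 0, 0
        self.count = 0                  # ACK lasts exactly one cycle
        if we:
            self.mem[adr] = dat_w
            return 1, 0
        return 1, self.mem.get(adr, 0)

def single_transaction(slave, adr, we, dat_w=0, timeout=16):
    """Master side: hold all signals until ACK, or give up after `timeout` cycles."""
    for cycle in range(1, timeout + 1):
        ack, dat_r = slave.clock(1, 1, we, adr, dat_w)
        if ack:
            return cycle, dat_r         # number of cycles used, data read
    raise TimeoutError("no ACK from slave")
```

With Slave(wait_states=3), a write followed by a read of the same address each take 4 cycles, and the read returns the value just written. With wait_states=0 the transaction completes in a single cycle, which corresponds to the best case described in note 4.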


5.2.3.2 Block Read/Write Transaction

The WISHBONE specification defines block read/write transactions to transfer more than one data item in a bus transaction. They are almost the same as single transactions, except that multiple single read/write operations are now encapsulated in one transaction. To indicate that the current bus transaction is a block transaction, CYC has to stay high during the whole transaction period.

Figure 5.5: An example of a WISHBONE block write transaction

Figure 5.5 shows a block read transaction which includes 4 read operations. As we can see, CYC now stays high for the duration of the transaction, while the 4 read operations are no different from single read/write operations.

According to the specification, block transactions must contain either all read or all write operations, but cannot have both types in one transaction. However, from my perspective this should not be a constraint. In principle both reads and writes are operations, and a block transaction is actually a batch of single operations. To this extent the slaves can always behave correctly if they are able to deal with single read or write operations, without considering whether the current block transaction is a block read or a block write at all.

By the way, please don’t mix up the “block read/write” of the WISHBONE with a “blocking read/write”. They are quite confusing sometimes. A block read/write means reading/writing a batch of data at a time, while a blocking read/write means the system is stuck until the last read/write operation has finished. As a simple example of a blocking read, when a C program uses the function scanf() to read from the keyboard, the program won’t continue until some keys are pressed.

Some people may wonder why we need block transactions, because they look pretty much like single transactions and essentially do not increase the bus throughput. The following example is designed to give an answer.

In Figure 5.6, there are 2 masters and 1 slave. Both masters want to access the slave but only one of them is allowed to do so at a time. So there is an arbiter which makes the decisions. The arbiter gives grants to the masters according to their CYC signals, as Figure 5.7 shows.

Figure 5.6: Block transactions are helpful in multi-master systems

Figure 5.7: Master B is blocked by master A

In Figure 5.7, only the CYC, STB and ACK signals are given, because they are enough to describe the WISHBONE protocol. First, master A started a block transaction and asserted its CYC at the 3rd rising clock edge. At that time, because no one was using the bus, the arbiter gave the grant to master A and connected it to the slave. In the following 2 clock cycles, one operation was done. However, the next operation from master A was somehow delayed, so its CYC had to stay asserted because this is a block transaction. Because the arbiter gives grants by judging CYC, master A possessed the network for as long as its CYC was ‘1’. As a result, master B had to wait until the whole block transaction of A was finished, although it had been ready since the 6th rising clock edge.

To summarize the example, the WISHBONE masters use CYC to request grants from the arbiter when accessing slaves in multi-master systems. If the system has no preemption, i.e. once the grant is given to a master it won’t be withdrawn even if a higher priority master becomes ready, CYC can actually be used to hold the line until all data from a master has been transferred.
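The situation of Figure 5.7 can be reproduced with a minimal model of a non-preemptive arbiter. The following Python sketch is a hypothetical simplification: it looks only at the per-cycle CYC values of two masters, A and B, and gives A priority when both request a free bus in the same cycle.

```python
def arbitrate(cyc_a, cyc_b):
    """Non-preemptive arbiter: cyc_a/cyc_b are per-cycle CYC values (0/1).
    Returns the owner of the bus in each cycle: 'A', 'B' or None."""
    owner, timeline = None, []
    for a, b in zip(cyc_a, cyc_b):
        cyc = {"A": a, "B": b}
        if owner is not None and not cyc[owner]:
            owner = None                # the owner negated CYC: bus is released
        if owner is None:               # free bus: grant it to a requester
            owner = "A" if a else ("B" if b else None)
        timeline.append(owner)
    return timeline
```

For cyc_a = [0, 1, 1, 1, 1, 0, 0] and cyc_b = [0, 0, 0, 1, 1, 1, 1], the bus owner per cycle is [None, 'A', 'A', 'A', 'A', 'B', 'B']: master B requests the bus from the 4th cycle on, but only gets it once A has negated its CYC.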

Usually the signals of groups 1 and 2 are enough to perform block transactions through the WISHBONE network, but the specification also mentions other signals, like the tag signal TGC or LOCK, which may be involved in block transactions. TGC could be used to identify which type of transaction is ongoing, so that the slave knows whether the current transaction is a single or a block transaction. However, this is in fact not necessary, because if a slave can process read/write operations correctly, it does not have to recognize what the current transaction is. And in systems without preemption, LOCK is useless too.

5.2.3.3 Read-Modify-Write (RMW) Transaction

The WISHBONE specification also defines a kind of transaction named Read-Modify-Write (RMW). It says the RMW transactions are used for “indivisible semaphore operations” [2].

Despite the complicated name, the RMW transactions are fairly simple. In fact, an RMW can be seen as a block transaction with 2 different operations: the first one is a read operation, and the second is a write operation. So with an RMW transaction we can easily read data, modify it, and then write it back to the same address. The RMW waveform is shown in Figure 5.8.

Figure 5.8: An example of a WISHBONE RMW transaction

In my opinion the RMW and the block transactions can be merged together. The block transactions are essentially a batch of bus operations which have to be either all reads or all writes. The RMW transactions contain 2 bus operations: a read followed by a write. If we redefined the rules of the WISHBONE specification to allow block transactions to include any type and any number of operations, the RMW would become a subcategory of the block transaction.


5.2.3.4 Burst Transaction

Throughput is always an important criterion for evaluating the performance of an interconnection architecture. Higher throughput means that a larger amount of data can be transferred in a given time period. Or, if the bus width is given, it means finishing as many read/write operations as possible in that period.

The WISHBONE interconnection tries to achieve a good throughput too. This is why the specification devotes its whole chapter 4 to registered feedback bus cycles, i.e. “burst transactions”.

The burst transactions are one of the four types of WISHBONE transactions and are different from the block transactions. In principle, the block transactions do not increase the throughput of a system. Sometimes they may even ruin the performance if a master holds a line too long without transferring data. The burst transactions, however, do improve the throughput, through a set of carefully defined schemes.

The main idea of the scheme is to inform the slaves in advance that they are going to be addressed again and again within a bus transaction, so that they are prepared to respond continuously. At the same time the masters can initiate operations one after another without waiting for the responses from the slaves, because the slave is assumed to know that the data is sent continuously and to be able to handle that.

For the single or block transactions, the masters normally have to hold their signals after initiating an operation until an ACK comes back. In the best case, communicating without any delay, the waveform looks like Figure 5.9.

Figure 5.9: Maximum throughput with single transactions

In the figure, the STB is first asserted to start an operation. The slave then replies as soon as possible, at the next clock rising edge, and gives a valid ACK back. After the ACK is received, the master sends the next operation immediately. As we can see in this scenario, each operation takes 2 clock cycles to finish. This means we get 50% bus utilization in the best case.


To get an even higher throughput, the burst transactions are used. A demo waveform is shown in Figure 5.10. In the figure, the master starts a request by asserting the STB at the 3rd clock rising edge. Meanwhile it somehow tells the slave that this is a burst transaction. At the 4th clock rising edge, the slave receives the message and gets prepared to handle the burst. By giving back a one-cycle ACK the slave indicates that it is ready for one more read/write operation. The master sees the ACK at the 5th edge and continues. Then another 3 operations are done from the 5th to the 7th clock cycles.

Figure 5.10: Maximum throughput with burst transactions

As we can see, N operations are now completed in N+1 clock cycles, so the bus utilization is N/(N+1). If N is a very large number, the utilization theoretically approaches 100%. So the burst transactions, if they perform properly, are the best scheme to achieve the top throughput.
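The two best-case figures can be compared numerically. The sketch below only restates the arithmetic from the text (2 clock cycles per operation for classic cycles, N+1 clock cycles for a burst of N operations); the function names are hypothetical.

```python
from fractions import Fraction


def classic_utilization(n_ops):
    # Best case for single/block transactions: 2 clock cycles per operation.
    return Fraction(n_ops, 2 * n_ops)


def burst_utilization(n_ops):
    # Best case for a burst: N operations complete in N + 1 clock cycles.
    return Fraction(n_ops, n_ops + 1)


for n in (4, 16, 256):
    print(n, float(classic_utilization(n)), float(burst_utilization(n)))
```

The classic figure stays at one half regardless of N, while the burst figure grows towards 1 as the burst gets longer.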

By the way, the burst transaction is also referred to as the advanced synchronous cycle termination at the beginning of chapter 4 of the specification. The WISHBONE specification compares 3 different ways of termination (asynchronous, synchronous and advanced synchronous) and points out that the advanced synchronous one is the best. But the conclusion is not well presented, because the table in the specification shows that the asynchronous cycle termination is always the smallest of the three. The writer forgot to stress an important point again here: the WISHBONE is a synchronous bus standard. Even though asynchronous circuits are considered faster, we have to pick a regular and stable way to communicate in the WISHBONE. So in the end the advanced synchronous termination is chosen because it is the better of the synchronous schemes.

More WISHBONE signals are involved in the burst transactions, because their protocols are more complicated than those of the classical WISHBONE single, block and RMW transactions. These signals are the CTI and the BTE, i.e. the signals of group 6 described in the previous section.

The CTI is used to identify the burst transactions. At every rising edge, the slaves examine the value of the CTI to see whether any preparation is needed to handle a burst transaction. If the CTI is “000”, the current transaction is a classical transaction and no special preparation is needed. If the CTI is “001”, the current transaction is a constant burst, and if it is “010”, it is an incrementing burst. When a burst is about to terminate, the master gives a CTI of “111” to tell the slaves to go back to the normal state.
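As a small illustration, the four CTI values named above can be put into a lookup table. The names `CTI_NAMES` and `decode_cti` are hypothetical, and treating every other 3-bit code as reserved is an assumption of this sketch that should be checked against the specification.

```python
# CTI codes mentioned in the text; the other codes are assumed to be reserved.
CTI_NAMES = {
    0b000: "classic cycle",
    0b001: "constant address burst",
    0b010: "incrementing burst",
    0b111: "end of burst",
}


def decode_cti(cti):
    """Return a readable name for a 3-bit CTI value."""
    return CTI_NAMES.get(cti & 0b111, "reserved")


for code in (0b000, 0b001, 0b010, 0b111):
    print(format(code, "03b"), decode_cti(code))
```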

There are 2 kinds of burst. The constant burst always reads or writes the same address. This is useful for accessing FIFOs or certain I/Os which have volatile data. The incrementing burst, in contrast, contains operations targeted at adjacent addresses. It is particularly designed for reading or writing a block of data from/to memories.

When the incrementing burst is used, the BTE is additionally needed to indicate how the address grows. The definition of the BTE is clearly described in tables 4-2 and 4-3 of the WISHBONE specification.

Now it is time to go through the details of how the burst transactions work. To avoid describing them with plain text only, 3 examples are designed with waveforms, which can be seen as supplements to the WISHBONE specification.

The first example is a constant write burst, which is shown in Figure 5.11. The first line of the waveform marks the number of each clock rising edge. Below that, only the related signals are drawn. As we can see, the CTI and the BTE are now included for burst transactions. The value “CON” of the CTI shows this is a constant burst, and “EOB” stands for End-Of-Burst. The BTE is not needed for constant burst transactions, so its value is not cared about (‘X’) during the whole period. Besides, the signal WE is always ‘1’, which indicates that the current burst is a burst write.

Figure 5.11: An example of a constant writing burst transaction

Edge 1:

Master: The master is ready to initiate a burst transaction. It sets the STB to ‘1’ to start the transaction. Meanwhile it gives the 1st valid address, outputs the 1st data to be written, and sets WE to ‘1’. More importantly, it sets the CTI to CON to inform the slave that this is a constant burst.

Slave: The slave does nothing at edge 1 because it cannot see anything yet from its perspective.


Edge 2:

Master: The master checks the value of the ACK to see if the slave has replied. Since the ACK is still ‘0’, the master holds all signals unchanged for one more clock cycle.

Slave: The slave now receives the information about the burst from the master. The slave checks the CTI and knows it is a constant burst. Because the slave is idle and can handle the 1st data of the burst, it asserts the ACK to inform the master. This ACK means: the slave is capable of accepting the 1st data, please continue. But NOTE that in burst transactions the 1st data is actually not processed here, but at the next clock rising edge.

Edge 3:

Master: The master checks the value of the ACK again. Now that the ACK is ‘1’, the master knows that the slave can take care of the 1st data of the burst and wants more. So it puts the 2nd data onto the bus. Because constant bursts always access one address, the values of the 1st, 2nd, 3rd and 4th addresses are actually the same.

Slave: The slave first latches the 1st data at edge 3. The 1st data is accepted now. Meanwhile the slave checks the CTI and knows it is still a CON. As the slave is still capable of receiving more data, it keeps the ACK at ‘1’.

Edge 4:

Master: The master checks the ACK and finds that it is still ‘1’. The master sends the next data (3rd) to the slave.

Slave: The slave latches the 2nd data. The slave checks the CTI and knows it is still a CON. The slave keeps asserting the ACK because it is capable of handling more data.

Edge 5:

Master: The master checks the ACK and finds that it is still ‘1’. The master sends the next data (4th) to the slave. Because the master knows the 4th data is the last one of the burst, it sets the CTI to EOB.

Slave: The slave latches the 3rd data. The slave checks the CTI and knows it is still a CON. The slave keeps asserting the ACK because it is capable of handling more data.

Edge 6:

Master: The master checks the ACK and finds that it is still ‘1’. The ACK shows that the last data of the burst will be taken care of, so the master de-asserts all signals and terminates the burst.

Slave: The slave latches the 4th data. The slave checks the CTI and notices that it is now an EOB. So it knows the 4th data is the last one and does not need to assert the ACK anymore.

After the first example we hope the readers have understood more about the burst transactions. The next example is an incrementing read burst transaction, shown in Figure 5.12.

Figure 5.12: An example of an incrementing reading burst transaction

Now the value of the CTI is no longer CON but INC, which shows that the current transaction is an incrementing burst. The BTE is now needed to indicate how the address grows. LIN means the address increases in a linear manner, so the addresses are B+0, B+1, B+2 . . . The “B” represents the “Base” address. “B+1” does not mean exactly B plus 1, but rather the address next to the base address. For instance, if the data bus is 32 bits wide, the value of B+1 is the base address plus 4.
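The address step can be illustrated with a short sketch. It assumes byte addressing, and it assumes the BTE encoding as I read it in the specification ("00" linear, "01", "10" and "11" for 4-, 8- and 16-beat wrap bursts); only the linear case is discussed in this thesis, so the wrap branch should be checked against tables 4-2 and 4-3 of the specification. The function name is hypothetical.

```python
def next_burst_address(addr, bte=0b00, bus_bytes=4):
    """Address of the next beat of an incrementing burst (byte addressing assumed)."""
    if bte == 0b00:
        # LIN: plain linear increment by one bus word.
        return addr + bus_bytes
    # Wrap bursts (assumed encoding): the address wraps inside an aligned window.
    beats = {0b01: 4, 0b10: 8, 0b11: 16}[bte]
    window = beats * bus_bytes
    base = addr - (addr % window)
    return base + (addr + bus_bytes) % window


# With a 32-bit data bus, "B+1" is the base address plus 4.
base_address = 0x1000
print(hex(next_burst_address(base_address)))
```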

Edge 1:

Master: The master starts the burst transaction. The master sets the CTI to INC and the BTE to LIN. Because this is a read transaction, the value of the input data is unknown at edge 1.

Slave: The slave does nothing because it cannot yet see that the transaction has been initiated.

Edge 2:

Master: The master checks the ACK and finds that it is ‘0’. So the master holds all signals.

Slave: The slave sees all signals and knows this is an incrementing read burst. The slave is capable of handling the burst. So the slave (1) returns the 1st data according to the address B+0; (2) sets the ACK to ‘1’ to inform the master to keep transferring; (3) calculates the next address based on the value of the current address and the BTE.


Edge 3:

Master: The master checks the ACK and finds that it is ‘1’, so it knows the slave is capable of handling more data. The master puts the next address B+1 and the other signals onto the bus.

Slave: The slave checks the STB, CTI and BTE, and knows the burst is still in progress. The slave is capable of handling more data. So the slave (1) returns the 2nd data according to the address B+1, which is the address calculated by the slave itself at the previous edge; (2) keeps the ACK at ‘1’ to inform the master to keep transferring; (3) calculates the next address based on the current address (B+1) and the BTE.

Edge 4 is similar to the Edge 3.

Edge 5:

Master: The master checks the ACK and finds that it is ‘1’, so it knows the slave is capable of handling more data. The master puts the next address B+3 and the other signals onto the bus. The master knows this will be the last one of the burst, so it changes the CTI from INC to EOB.

Slave: The slave checks the STB, CTI and BTE, and knows the burst is still in progress. The slave is capable of handling more data. So the slave (1) returns the 4th data according to the previously calculated address B+3; (2) keeps the ACK at ‘1’ to inform the master to keep transferring; (3) calculates the next address, which is the address B+4, although this address will not be used.

Edge 6:

Master: The master checks the ACK and finds that it is ‘1’, so it knows the slave is still working and the last data is ready to read. The master latches the 4th data. The master de-asserts all signals to terminate the burst.

Slave: The slave checks the STB, CTI and BTE, and knows this is the end of the burst. So there is nothing to do now except for de-asserting the ACK to ‘0’.

So far we have seen 2 examples, which explained what the master and the slave actually do at every clock rising edge. After these specific descriptions it is time to summarize some general rules about the WISHBONE burst transactions.

1. According to the specification, all burst transactions have to be either read or write, i.e. a burst transaction must contain either all read operations or all write operations, but cannot have both within one burst.

2. The CTI and the BTE signals are used to help the slaves identify the type and the status of the current burst. All bursts end with an EOB in the CTI.

3. The slaves behave differently for read bursts and write bursts. For a write, the slaves do nothing special other than latching the written data at every rising edge. For a read, however, the slaves need to pre-calculate the next address based on the current address and the value of the BTE, and then use the calculated address to access the next data for the masters.

4. The WISHBONE specification does not mention that the slaves have the ability to predict the next address automatically. But this is true: I found a message on the forum of opencores.org which said so, and with this assumption all waveforms in the specification are well explained.

5. When a master asserts the STB for one clock cycle, it says that there is more data to read or write. When a slave asserts the ACK for one clock cycle, it implies that the last operation has been taken care of and that it is ready to handle the next read or write.

6. Both the masters and the slaves are allowed to pause the current burst, i.e. to insert wait states (WSM or WSS), at any time during the burst. This will be described later.

A general algorithm is given for the WISHBONE burst transactions. It lists in detail everything that the masters and the slaves should do at every clock rising edge to perform bursts.

For masters:

Firstly at a clock rising edge, initiate a burst by asserting the STB, CTI, BTE and other signals.
After that at every clock rising edge, check the value of the ACK.
IF: the ACK=‘0’, set the STB to ‘1’ and hold all other signals unchanged for one more clock cycle.
ELSE: the ACK=‘1’,
    IF: the current CTI is not an EOB, set the STB to ‘1’, meanwhile
        IF: it is a write burst, output the next data to the slaves.
        ELSE: if it is a read burst, latch the current data and output the next address.
    ELSE: if the CTI=EOB, set the STB to ‘0’, meanwhile
        IF: it is a write burst, de-assert all signals to finish the burst.
        ELSE: if it is a read burst, latch the last data and de-assert all signals to finish the burst.
IF: no wait state is needed, then that’s it. Go to the next clock rising edge.
ELSE: if the masters currently are unable to handle more data, a wait state is inserted by resetting the STB to ‘0’. All other signals can output as ‘X’. However, the masters should still go through the previous IF-ELSE block, and somehow remember what the output signals should be. In case of read bursts and the ACK is ‘1’, the masters should latch the current data before they turn into the wait states. When the masters come back from the wait states, they should resume all signals remembered before they fell into the wait states. Note that, at the edges when masters return from wait states, the only thing they do is to resume the remembered signals. The first IF-ELSE block is skipped at that clock edge.

For slaves:

At every clock rising edge, check the value of the STB.
IF: the STB=‘0’,
    IF: no burst is started yet, do nothing.
    ELSE: if a burst has already started, set the ACK to ‘1’ and hold all other signals unchanged for one more clock cycle.
ELSE: the STB=‘1’, check the CTI, BTE, WE and other signals.
    IF: the current CTI is not an EOB, set the ACK to ‘1’, meanwhile,
        IF: it is a write burst, accept and write the data to the address currently transferred through the bus. Exception: if this is the first write operation of the burst, don’t process the data.
        ELSE: if it is a read burst, do: (1) return the data to the master based on the previously calculated address. Exception: if this is the first read operation of the burst, i.e. there’s no pre-calculated address, return the data based on the current address sent by the master. (2) calculate the next address to read according to the value of the current address, the CTI, and the BTE.
    ELSE: if the CTI=EOB, set the ACK to ‘0’, meanwhile,
        IF: it is a write burst, latch the last data and de-assert all signals.
        ELSE: if it is a read burst, just de-assert all signals.
IF: no wait state is needed, then that’s it. Go to the next clock rising edge.
ELSE: if the slaves currently are unable to handle more data, a wait state is inserted by resetting the ACK to ‘0’. All other signals can output as ‘X’. However, the slaves should still go through the previous IF-ELSE block, and somehow remember what the next output signals should be. In case of write bursts and the STB is ‘1’, the slaves should latch the current data before they turn into the wait state. When the slaves come back from the wait states, they should resume all signals remembered before they fell into the wait states. Note that, at the edges when slaves return from wait states, the only thing they do is to resume the remembered signals. The first IF-ELSE block is skipped at that clock edge.
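The path of the two algorithms without wait states can be replayed in software. The following is a minimal sketch for a constant write burst only, under several assumptions that are not part of the thesis: the function name and the dictionary of "wires" are hypothetical, both sides sample the values that were on the bus before the edge, and the slave uses the sampled ACK to tell the first beat (nothing to latch yet) from the later ones. Wait states and read bursts are not modeled, and `data` must not be empty.

```python
def simulate_constant_write_burst(data):
    """Replay a constant write burst without wait states.

    Returns a list of (edge, value) pairs showing at which clock rising
    edge the slave latches each value.
    """
    wires = {"stb": 0, "ack": 0, "cti": None, "dat": None}
    latched = []
    index = 0
    started = False
    done = False
    edge = 0
    while not done:
        edge += 1
        seen = dict(wires)  # both sides sample the values present before the edge

        # Master side.
        if not started:
            started = True
            first_cti = "EOB" if len(data) == 1 else "CON"
            wires.update(stb=1, dat=data[0], cti=first_cti)
        elif seen["ack"]:
            if seen["cti"] == "EOB":
                # The last data is taken care of: de-assert everything.
                wires.update(stb=0, dat=None, cti=None)
                done = True
            else:
                index += 1
                last = index == len(data) - 1
                wires.update(dat=data[index], cti="EOB" if last else "CON")
        # Otherwise the ACK is low and the master holds all signals.

        # Slave side.
        if seen["stb"]:
            if seen["ack"]:
                # Data acknowledged one edge earlier is latched now.
                latched.append((edge, seen["dat"]))
            end_of_burst = seen["ack"] and seen["cti"] == "EOB"
            wires["ack"] = 0 if end_of_burst else 1
    return latched


print(simulate_constant_write_burst(["D1", "D2", "D3", "D4"]))
```

For four data items the slave latches them at edges 3, 4, 5 and 6, which agrees with the edge-by-edge walk-through of Figure 5.11.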

The last thing to explain about the burst transaction is the wait state. According to the WISHBONE specification, both the masters and the slaves are allowed to insert wait states at any time during a burst transaction when they temporarily cannot accept more data. The following is an example of wait states in a burst transaction. Since the rules summarized above also apply to the wait state cases, the readers are encouraged to check them against the example.

Figure 5.13: A burst transaction with wait states

Figure 5.13 shows a constant read burst in which both the master and the slave deliberately insert some wait states. When the master or the slave goes into a wait state, it outputs ‘X’, i.e. an unknown signal.

Edge 1: The master initiates the burst. The slave does nothing.

Edge 2:

Master: The master wants to insert a wait state. Without a wait state, the master would hold all signals unchanged because the ACK is ‘0’. But now the master must remember the values of all output signals and repeat them when it resumes from the wait state. By resetting the STB to ‘0’, the master goes into the wait state.

Slave: The slave, however, sees the STB=‘1’ at edge 2. Since it can handle this read request, the slave sets the ACK to ‘1’ and returns the 1st data. Because this is the first read operation in the burst, the slave outputs the data based on the 1st address sent from the master. Besides, the slave needs to predict the next address based on the 1st address and the CTI.

Edge 3:

Master: The master is back from the wait state. Now it should recall all signals logged at edge 2. Because at edge 2 the master would have kept its signals unchanged, the signals at edge 3 are the same as those at edge 1.

Slave: The slave checks the STB and finds that it is ‘0’, so it holds all outputs (the 1st data) unchanged for 1 more cycle.

Edge 4:

Master: The master does not feel like working again. Without a wait state, the master would latch the current data and output the next address because the current ACK is ‘1’. But since the master inserts another wait state, it only latches the 1st data and outputs ‘X’ for the other signals. By now the first read operation to the 1st address is finished.

Slave: The slave checks the STB, which is ‘1’. Since it is capable of continuing, it (1) sets the ACK to ‘1’ for 1 more cycle; (2) outputs the 2nd data based on the address pre-calculated at edge 2; (3) calculates the next address.

Edge 5:

Master: The master comes back from the wait state and resumes the signals it should have sent at edge 4.

Slave: The slave wants to insert a wait state. As the STB is ‘0’ at edge 5, the slave would normally keep all signals unchanged. But since this is a wait state, it remembers the data and outputs ‘X’ instead.

Edge 6:

Master: The master plans to insert another wait state. Without a wait state, the master would repeat the signals resumed at edge 5, because right now the ACK=‘0’. So once more it remembers these signals, which will be resumed later.

Slave: The slave has been in the wait state.

Edge 7: Both the master and the slave resume from the wait state. The master recalls the signals remembered at edge 6. The slave repeats the signals remembered at edge 5.


Edge 8:

Master: The master wants to work. So it (1) sets the STB to ‘1’; (2) latches the current data (2nd); (3) outputs the next address (3rd) and the other signals.

Slave: The slave wants to work too. It (1) asserts the ACK to ‘1’; (2) returns the 3rd data according to the address pre-calculated at edge 4; (3) calculates the next address based on the current address, the CTI and the BTE.

Explanations to the Edge 9 and 10 are skipped.

Edge 11:

Master: Since the ACK is ‘1’ and the CTI is EOB, the master knows it is time to finish the burst. Because this is a read burst, the master has to latch the last (4th) data and then de-assert all signals.

Slave: The slave also realizes from the STB and the CTI that it is the end of the burst. Because it is a read burst, the slave has nothing to do except for de-asserting its signals.

In all, almost all the important aspects of the WISHBONE specification have been covered, from the interface signals to the bus transactions. We hope this work helps people to understand the specification more easily, so that the WISHBONE standard becomes even more widely accepted and applied in real projects. Best wishes to the WISHBONE in the coming competition among interconnection standards.

5.3 CONMAX IP Core

5.3.1 Introduction

The WISHBONE interCONnect MAtriX IP Core (CONMAX) is an IP core designed by Rudolf Usselmann in Verilog HDL. It constructs a WISHBONE interconnection with a crossbar switch structure, which can be used as the “bus” of a system. The CONMAX core supports up to 8 masters and 16 slaves, as well as 4 priority levels. This is already enough to compose a quite complicated network. Using the core saves the designers a lot of time in thinking about how to organize all modules as a system. Because the CONMAX handles the traffic within the system, all that users need to do is to connect all other IP cores to the CONMAX.

If we take a look at the opencores.org website, there are several other IP cores that implement similar functions. But we finally chose the CONMAX among the competitors for the following reasons: (1) it supports more masters and slaves; (2) it provides more priority levels; and (3) it is designed by Rudolf Usselmann from asics.ws [5]. According to our experience, the IP cores from that team have better quality and more detailed documents.

The CONMAX IP core is not complicated. Its source code is well structured and comes with a clear and concise document [3]. People who are good at Verilog HDL can skip this part of the thesis and study the source code and the official IP core document instead.

5.3.2 CONMAX Architecture

There is a figure in the CONMAX document that exhibits the structure of the core well. It is copied here as Figure 5.14. As we can see, there are 8 master interfaces supporting a maximum of 8 WISHBONE masters, and 16 slave interfaces for up to 16 slaves. Besides, there is a Register File included in slave interface 15, which is used to store the priorities of the master interfaces.

Figure 5.14: Core architecture overview

5.3.3 Register File

The Register File is a group of 16 registers. The specification says each register is 32 bits wide, but the source code actually implements only 16 bits per register. So writing to the upper 16 bits does nothing and reading from those bits always returns zero. There is 1 arbiter in each slave interface, which reads the data stored in the registers to identify the priority of each master.

For instance, if the register CFG12 contains the value “0x000080F0” (only the last 16 bits are valid), it means that for slave interface 12 the priorities from master 7 down to master 0 are 2, 0, 0, 0, 3, 3, 0, 0, i.e. masters 2 and 3 both have the highest priority “3” to access slave 12. The next highest is master 7, and all other masters are on the third level. Please note that this configuration only applies to slave interface 12. It is possible to configure the other 15 registers individually with different priorities.
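The CFG12 example can be checked with a few lines of Python. The sketch assumes the field layout implied by the example, i.e. 2 bits per master with master 0 in the lowest bits; the function name is hypothetical.

```python
def decode_priorities(cfg):
    """Split a priority register into 8 two-bit fields, master 0 first."""
    value = cfg & 0xFFFF  # only the lower 16 bits are implemented
    return [(value >> (2 * master)) & 0b11 for master in range(8)]


priorities = decode_priorities(0x000080F0)
for master, level in enumerate(priorities):
    print("master", master, "priority", level)
```

Read from master 7 down to master 0, the result is 2, 0, 0, 0, 3, 3, 0, 0, as stated in the text.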

5.3.4 Parameters and Address Allocation

There are several parameters used to configure the CONMAX core. They are dw, aw, rf_addr, and pri_selN. All of them can be changed ONLY in the Verilog HDL source file. This means that after the CONMAX is compiled and downloaded into an FPGA, these parameters cannot be modified any further. So the designers have to decide the values of the parameters when designing the FPGA system.

The dw and the aw stand for the widths of the data bus and the address bus. They are allowed to be set to different numbers but are usually both set to 32 bits. The rf_addr is a 4-bit argument which defines the base address of the Register File. The pri_selN is a group of 16 2-bit arguments, from pri_sel0 to pri_sel15, each of which corresponds to one slave interface. The pri_selN specifies how many priority levels are supported. The 16 slaves can be set to support different numbers of priority levels if necessary.

The CONMAX uses the highest 4 bits of the address to decide which one of the 16 slaves is accessed. For example, if the address width is 32 bits, a bus transaction accessing the addresses from 0xB0000000 to 0xBFFFFFFF will be sent to slave 11, regardless of which master the transaction comes from.

This also implies that once a slave is connected to the CONMAX, its address range is determined, e.g. the slave attached to slave port 8 will have the address range from 0x80000000 to 0x8FFFFFFF.

The only exception is slave interface 15, because the Register File is included there and its base address is set by the rf_addr. If the 2nd highest 4 bits of the address match the rf_addr, the Register File is selected.


For example, if the 4-bit rf_addr is configured as “0101”, then when the number 0x12345678 is written to the address 0xF500000C, the value 0x5678 is written into the register for slave interface 3 (the last “C” is the address of register 3), because the highest 0xF selects slave interface 15 and the 2nd highest 4 bits, 0x5, match the rf_addr “0101”.

As we can see from the example, because some addresses are reserved for the Register File, the address space that can be used by the external slave is reduced. In the last example, the IP core connected to slave port 15 has 2 valid address ranges, from 0xF0000000 to 0xF4FFFFFF and from 0xF6000000 to 0xFFFFFFFF. All addresses starting with 0xF5 are reserved for the Register File.
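The address decoding described in this section can be sketched as follows, for a 32-bit address. The function name and the return format are hypothetical, and the register index is taken from address bits 5 to 2, which is an assumption derived from the example above (offset 0xC selects register 3), not from the source code.

```python
def decode_address(addr, rf_addr=0b0101):
    """Return ("rf", register_index) or ("slave", slave_index) for a 32-bit address."""
    slave = (addr >> 28) & 0xF  # the highest 4 bits select the slave interface
    if slave == 15 and ((addr >> 24) & 0xF) == rf_addr:
        # Register File hit; registers are assumed to be 4 bytes apart.
        return ("rf", (addr >> 2) & 0xF)
    return ("slave", slave)


for address in (0xB0001234, 0x80000000, 0xF500000C, 0xF6000000):
    print(hex(address), decode_address(address))
```

With rf_addr set to "0101", only addresses starting with 0xF5 reach the Register File; all other addresses starting with 0xF go to the external slave on port 15.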

5.3.5 Functional Notices

To describe some points to note about the functions of the CONMAX, an example is designed with the waveform shown in Figure 5.15.

Figure 5.15: An example of using CONMAX

The figure displays 2 masters and 2 slaves working with the CONMAX. The m0_xxx_i and m1_xxx_i are the signals coming from the 2 masters, which are connected to master interfaces 0 and 1 of the CONMAX. Similarly, the s0_xxx_o and s15_xxx_o are the signals output from the CONMAX to the 2 slaves. All other signals are ignored in this example.

The scenario demonstrated in the example is fairly simple. First, master 0 accesses slave 0 and gets the grant from the CONMAX. Later master 1 wants to access slave 0 too, but is blocked because the grant is still held by master 0. After a while master 0 turns to slave 15, so the grant of slave 0 is released to master 1 at that time.

There are several things to note in the example:

1. The CONMAX uses the CYC for its arbiters to determine which master should be given the grant at the moment. But to activate the arbiter, both the CYC and the STB have to be ‘1’ at the beginning.

In Figure 5.15 the CYC of master 0 becomes ‘1’ quite early, several cycles before the STB. But the s0_cyc_o is output only after a long time. This is because the arbiter is not activated until both the CYC and the STB are ‘1’. Later, between the 2 accesses to the addresses 0x8 and 0xC, there is a short gap on the STB. Yet master 0 keeps holding the grant although master 1 is ready at that time. This is because the arbiter is already activated and judges only the CYC to make decisions.

2. The CONMAX uses a finite state machine (FSM) to judge the CYC inputs in order to perform complicated arbitration, i.e. round-robin. As a result, the CYC is usually delayed by several cycles relative to the other signals, as the figure shows. As mentioned before, according to the WISHBONE specification, an interface should respond only when the logic AND of the CYC and the STB is ‘1’. So the delay of the CYC sacrifices the performance of the whole system to achieve a complicated arbitration scheme.

3. The CONMAX broadcasts bus transactions to idle slave ports, except for the CYC and the STB signals. In the example, s15_addr_o is the same as s0_addr_o although no one is accessing 0x8 and 0xC in s15.

The designers have to be very careful in dealing with the broadcasting. In some cases, even the STB may be delivered incorrectly for about 1 to 2 cycles, because the arbiter is judging the CYC and has not made its decision yet. The only safe way is to always strictly check the CYC and the STB at every clock rising edge, and not to respond to the bus transaction unless the logic AND of the 2 signals is ‘1’.

5.3.6 Arbitration

In the CONMAX specification, the only thing that is not so clear is the arbitration. For example, it says that if all priorities are equal, the arbiters work in a round-robin way. But how is the round-robin performed? Based on a study of the source code, this section tries to summarize the rules of the CONMAX arbitration.

1. The CONMAX is reset by the RST signal of the WISHBONE. All RST signals in a WISHBONE network are normally connected together. If any RST input becomes ‘1’, the CONMAX will reset.


2. After resetting, all registers of the Register File of the CONMAX arecleared. In this default case, all priorities of the masters are reset to‘0’, i.e. all masters have equal priorities.

3. The CONMAX works in a round-robin way for the masters with equalpriorities. In the arbiter of every slave interface, there is a FSM with 8states. The states are used to decide which master is allowed to accessthis slave at the moment. Let’s name the 8 states m0, m1, m2 . . . m7.If the current state is m(n), the master N will have the highest priorityto access the slave.

After master N have accessed successfully, the FSM will jump to m(n)state. All other priorities are arranged in a circle. If the current stateis m6, the priorities are m6 > m7 > m0 > m1 > m2 > m3 > m4 >m5.

For example, on power-up the FSM resets to state m0. Now the priorities for the 8 masters are m0 > m1 > m2 > . . . > m7. If at the next moment 2 masters m0 and m1 compete for the grant, m0 will always win because the current state is m0, even though both masters have equal priorities in the Register File. Note that the round-robin arbitration here has no random selection mechanism.

Besides, please remember that when m(n) is accessing, the FSM will turn to the m(n) state. Thus m(n) will have the highest priority. This implies that if m(n) does not quit and is always involved in the subsequent competitions, no one else at the same priority level can get the grant of the bus any more.

4. The masters can be set to different priorities by writing numbers from 0 to 3 into the Register File. The priorities are 3 > 2 > 1 > 0. The masters with higher priorities always win the arbitration. But the masters with the same priority are still arbitrated in the round-robin way.

5. To support multiple levels of priorities, the value of pri_selN has to be correctly configured. When pri_selN is set to “00”, it supports only 1 priority level. Writing the numbers 1, 2, 3 into the registers has no effect. All masters are treated as having the same priority level.

6. When pri_selN is set to “01”, the CONMAX supports 2 priority levels. Now it is allowed to write “00” or “01” into the registers. The masters with “01” have higher priorities than those with “00”. If the users somehow write “11” or “10” into the register, the order will be “11” = “01” > “10” = “00”, because when pri_selN is “01” the arbiter only evaluates the last bit in the Register File for the masters.

7. When pri_selN is “10”, the CONMAX supports 4 levels. In this case all priorities from 0 to 3 are valid and “11” > “10” > “01” > “00”.

8. Configuring pri_selN to “11” has the same effect as setting it to “01”.

9. When a master gets a grant, there is no way to interrupt it, unless the master gives it up by de-asserting the CYC. Even if higher priority masters become ready during this time, they still have to wait for the next competition until the current master terminates itself and gives up the grant.

10. The 16 slaves can be configured individually in the Register File. This means one master can have different priorities when accessing different slaves.
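The rules above can be condensed into a small behavioral model in C. This is a sketch of the rules as summarized in this section, not a translation of the CONMAX Verilog source. The names `effective_priority` and `arbitrate` are hypothetical, `state` is assumed to be the index of the master the FSM currently points at (0 after reset), and the model only covers the moment of a competition. Rule 9 (no preemption while CYC is held) is left to the caller.

```c
#include <stdint.h>

#define NUM_MASTERS 8

/* Priority actually used by the arbiter, given the 2-bit pri_sel setting
 * of the slave port and the 2-bit value in the Register File. */
static unsigned effective_priority(unsigned pri_sel, unsigned pri)
{
    switch (pri_sel & 0x3u) {
    case 0x0u: return 0u;         /* 1 level: register value is ignored */
    case 0x2u: return pri & 0x3u; /* 4 levels: both bits are used       */
    default:   return pri & 0x1u; /* "01" and "11": only the last bit   */
    }
}

/* Pick the winner of one competition for one slave port.
 * req_mask bit n is set when master n requests the slave.
 * Returns the winning master index, or -1 when nobody requests. */
static int arbitrate(unsigned state, uint8_t req_mask,
                     const uint8_t pri[NUM_MASTERS], unsigned pri_sel)
{
    int winner = -1;
    unsigned best = 0u;

    /* Walk the masters in circular order starting at the current state,
     * e.g. state m6 gives m6, m7, m0, ..., m5. */
    for (unsigned k = 0; k < NUM_MASTERS; k++) {
        unsigned m = (state + k) % NUM_MASTERS;
        if (!(req_mask & (1u << m)))
            continue;
        unsigned p = effective_priority(pri_sel, pri[m]);
        /* Strictly greater: among equal priorities the master that
         * comes first in the circle keeps the grant. */
        if (winner < 0 || p > best) {
            winner = (int)m;
            best = p;
        }
    }
    return winner;
}
```

After a grant, the caller would set `state` to the returned index, which reproduces the behavior that a master which never quits keeps winning against masters at the same priority level.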

This completes the description of the CONMAX. We hope it is now clear how the interconnection is organized and operated by the CONMAX, which is one of the most important components in the system.

References:

[1] Webpage, SoC Interconnection: WISHBONE, OpenCores Organization, http://opencores.org/opencores,wishbone, Last visit: 2011.01.31. This is the official WISHBONE page, which recommends the WISHBONE as the preferred bus standard.

[2] Specification, Specification for the: WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, OpenCores Organization, Revision: B.3, Released: September 7, 2002.

[3] Rudolf Usselmann, WISHBONE Interconnect Matrix IP Core, Rev. 1.1, October 3, 2002.

[4] Webpage, A message that used to be posted at the OpenCores forum, Wishbone spec clarification, http://osdir.com/ml/hardware.opencores.cores/2007-04/msg00030.html, Last visit: 2011.01.31. This message was posted at the OpenCores forum but has now been removed. A snapshot of the message is found at the link above. It contains useful information explaining the LOCK signal of the WISHBONE standard.

[5] Website, ASICS.ws, http://asics.ws/, Last visit: 2011.01.31. This is the website of a specialized FPGA/ASIC design team. They have produced many free IP cores with very good quality and documentation.

[6] Rudolf Usselmann, OpenCores SoC Bus Review, Rev. 1.0, January 9, 2001. This article compares several bus standards. It is a little out of date but worth taking a look at.

Chapter 6

Memory Blocks and Peripherals

In the previous chapters, the 2 important modules of the hardware system, the OpenRISC processor and the WISHBONE interconnection IP core, have been introduced separately. This chapter covers all the other hardware modules of the platform: the memory blocks and the peripherals.

First let’s review the hardware system architecture, which was shown before in Chapter 3.

Figure 6.1: Hardware platform architecture

As we can see, the CONMAX IP core constructs the WISHBONE network for the whole system. All other blocks are connected to the WISHBONE network through the CONMAX. The CONMAX has 8 master interfaces and 16 slave interfaces. 2 of the master interfaces are taken by the OpenRISC processor on the left side. On the right side are the memory blocks and peripherals acting as the WISHBONE slaves, which are going to be discussed in this chapter. The numbers starting with “m” or “s” mark the interface IDs of the CONMAX used in the hardware design.

There are 2 colors. The blue blocks are real silicon chips on the DE2-70 board produced by different manufacturers. They are not part of the FPGA design. In this chapter we will not focus on the details of those IC chips; please refer to their specifications and application notes for those. We care more about how to interface them with the FPGA system. The white blocks are the IP cores implemented inside the FPGA. We spent quite some time working on the IP cores during the thesis project, so naturally they are emphasized. These blocks play the leading roles in this chapter and are presented one by one in the subsections below.

Of all the IP cores, the Memory Controller, UART16550, and GPIO come from opencores.org. They are of very good quality and helped a lot to accelerate the system design because we did not have to implement those blocks again from scratch. Apart from these 3 IP cores, the on-chip RAM is an IP core from ALTERA. The other white blocks, i.e. the on-chip RAM interface and the DM9000A and WM8731 interfaces, were designed by me or my partner Lin Zuo.

This thesis was done by 2 people. On the FPGA level, my partner Lin was responsible for the CONMAX, Memory Controller and DM9000A Interface, while I took charge of the remaining blocks as well as the system level integration. For administrative reasons, each of us had to write a separate thesis. So please refer to Lin’s thesis [1] as well. It contains more information about the IP cores that Lin was working on.

This chapter intends not only to introduce those IP cores, but also to share some design experiences. If the thesis were all “introduction”, it would probably become another specification of the IP cores, and would definitely not be as good as the ones written by the IP core designers.

6.1 On-chip RAM and its Interface

All processors need memories to store instructions and data. The OpenRISC OR1200 is no exception. When designing the platform, we tried to add various types of memories to satisfy this requirement. One of them is the FPGA on-chip RAM.

The ALTERA Cyclone II EP2C70 FPGA on the DE2-70 board contains 1152000 memory bits, which is about 140KB. Apart from the memory used or reserved for the other hardware modules, the rest can be organized as an on-chip RAM block for the OpenRISC processor. In the thesis project we made an on-chip RAM block of 32KB.

ALTERA provides designers with a way to organize and utilize the on-chip memory resources easily and efficiently: ALTERA’s memory IP cores. Several types of memory IP cores are supported by ALTERA’s Quartus II software. In this project we chose the most general and simplest one, the 1-port RAM IP core. In Figure 6.1, the ALTERA 1-port RAM core is shown as the “on-chip RAM”.

Because the 1-port RAM core has interface signals different from the WISHBONE bus signals, some extra conversion logic is needed to connect it to the WISHBONE network. The conversion logic is shown as the “on-chip RAM interface” in Figure 6.1. When the OpenRISC CPU tries to access the on-chip memory, the WISHBONE bus transactions are sent to the on-chip RAM interface through the CONMAX. There the signals are translated and forwarded to the 1-port RAM core.

In the thesis archive, 3 design files of the on-chip RAM block are saved under the “/hardware/components/ram” folder. The file “ram0.vhd” is automatically generated by Quartus. It is the description file of the ALTERA 1-port RAM. The file “ram0_top.vhd”, designed by us, includes the interface conversion logic. The file “ram0.mif” contains the data to be stored in the 1-port RAM. The RAM is initialized with the data in ram0.mif every time the FPGA is programmed.

In the following sections, we first discuss the advantages of using the on-chip RAM, which gave us reasons to spend time on it. Then the ALTERA 1-port RAM core as well as its interface logic are introduced. After that we explain how to organize the memory block, because this has to follow the WISHBONE bus specification.

6.1.1 On-chip RAM Pros and Cons

2 advantages gave us reasons to make an on-chip RAM block for the project:

1. Access to the on-chip RAM is much faster compared to the external RAMs.

2. The contents of the on-chip RAM can be easily programmed, modifiedand monitored by Quartus.

The first advantage is quite well known and thus needs no detailed explanation. For the FPGA on-chip RAMs, because both the processor and the memory block reside in the same FPGA chip, the interfacing logic is simpler and the access time shorter. The ALTERA 1-port RAM core takes only 1 clock cycle to read or write. This makes the on-chip RAM especially suitable for working as a cache or stack, or for storing global variables.

The second advantage provides great convenience for developing software. ALTERA has comprehensive tools to control the on-chip RAM. First, designers can create a MIF file and link it to the 1-port RAM core. When programming the FPGA, Quartus will download the data in the MIF file into the on-chip RAM. Second, there is a tool in Quartus called “In-System Memory Content Editor”. It can be used to examine or modify the data stored in the on-chip RAM dynamically. With these features, designers practically get a programmer and a basic debugger. They can already make some simple software applications with these tools.

The limitation of the on-chip RAM is as obvious as its advantages. It is always much smaller than the external RAMs because of the higher cost. In the thesis project, we have only 32KB of on-chip RAM, but 2MB of SSRAM and 64MB of SDRAM externally. If the program data exceeds 32KB, it has to be stored in the external RAMs. The external memories will be discussed in later sections.

6.1.2 ALTERA 1-Port RAM IP Core and its Parameters

For the ALTERA 1-port RAM core itself, there is not too much to talk about because the core is quite simple and straightforward to use. Like all other ALTERA IP cores, the 1-port RAM core is included in and installed together with the Quartus II software package. It can be launched from Quartus II –> Tools –> MegaWizard Plug-In Manager by unfolding the “Memory Compiler” category. After the wizard is started, Quartus will ask for some parameters for the RAM to be implemented. Thanks to the ample explanations in the wizard, it should not be hard to understand what the parameters are about. There is also a user guide document covering more details about this IP core [2].

In the following table, all chosen parameters are listed. The table should be helpful when readers want to recreate the same RAM block.

Parameter                   Value
Data Width (q)              32-bit
How Many 32-bit Words       8192 (8192 * 32-bit = 32KB)
Memory Type                 auto
Clocking Method             single clock
Extra Functions and Pins    register output port q; create a byte enable; create an “aclr”
Memory Initialization       ram0.mif
File Generation             only the vhd file would be enough

Table 6.1: Parameters of 1-port RAM IP core

Please note that we were using Quartus 8.0 sp1 web edition, which determines the version of the on-chip RAM IP core.

The more interesting part is how to decide the values of the parameters. In the later sections we are going to discuss several of them.

6.1.3 Interface Logic to the WISHBONE bus

Because the signals of the ALTERA 1-port RAM IP core are not exactly the same as the WISHBONE signals, some interface conversion logic is needed. The file “ram0_top.vhd” describes the hardware design of this part.

Figure 6.2 gives an overview of the structure. The interface logic is essentially a wrapper around the 1-port RAM core. The internal blue block is the 1-port RAM generated by Quartus. The grey part behaves as a wrapper that converts the signals of the 1-port RAM to the standard WISHBONE signals.

Figure 6.2: On-chip RAM module internal structure

One of the WISHBONE signals is the “acknowledgement” (ACK), which the 1-port RAM does not have. As you may have noticed, this signal is generated by separate logic (the big white block) rather than by the 1-port RAM core. So when receiving a RAM read/write request we always give a 1-cycle ACK after a certain time, i.e. the ACK signal does not rely on the 1-port RAM core outputs. This may seem unreasonable, because we are acknowledging whatever the outputs of the 1-port RAM core are. But because the 1-port RAM practically never fails and has a fixed delay, we can assume that the outputs will always be ready at a certain point, and therefore set the ACK signal at that moment. Also note that each ACK should be set high for only 1 clock cycle, as mentioned in the WISHBONE chapter. Otherwise the CPU will be confused because it interprets this as multiple ACKs returned from the RAM core.
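The fixed-delay acknowledgement can be illustrated with a cycle-based C model. This is a sketch under stated assumptions, not the logic in the thesis's VHDL file: the type `ack_gen`, the function `ack_gen_clock` and especially the value of `ACK_DELAY_CYCLES` are hypothetical, since the text only says that the ACK comes "after a certain time".

```c
#include <stdbool.h>

/* Assumed latency in clock cycles between seeing a request and raising
 * ACK. The real value depends on the RAM configuration and is not
 * stated in the text. */
#define ACK_DELAY_CYCLES 2u

typedef struct {
    unsigned count; /* cycles the current request has been pending */
} ack_gen;

/* Call once per rising clock edge with the sampled CYC and STB values.
 * Returns the ACK level for that cycle. ACK is derived only from the
 * request and a counter, never from the RAM outputs. */
static bool ack_gen_clock(ack_gen *g, bool cyc, bool stb)
{
    if (!(cyc && stb)) {
        g->count = 0u; /* no valid request: stay idle */
        return false;
    }
    g->count++;
    if (g->count == ACK_DELAY_CYCLES) {
        g->count = 0u; /* ACK is a single-cycle pulse */
        return true;
    }
    return false;
}
```

Because the counter is cleared in the same cycle in which ACK is raised, ACK can never stay high for two consecutive cycles, even if the master keeps the strobe asserted.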

6.1.4 Data Organization and Address Line Connection

If you take a close look at the source code in “ram0_top.vhd”, you may feel curious about the line:

ram_address <= wb_addr_i(14 downto 2)

It means that only the WISHBONE address inputs 14–2 of the 32 WISHBONE address lines are connected to the 1-port RAM core (address lines 12–0). This is also illustrated in Figure 6.2.

The address connections were not decided arbitrarily. On the contrary, they are clearly specified in the WISHBONE bus standard. Table 6.2 below is copied from the WISHBONE specification [3] section 3.5 page 66. It tells how to organize data in systems with 32-bit bus width and 8-bit granularity. Note that this is a WISHBONE “RULE”, i.e. all WISHBONE implementations must obey it.

Table 6.2: Data organization for 32-bit ports

Table 6.2 shows that the system should use address range 63–2 if there are 64 address lines. In our case, only addresses 14–2 are used because we have only 8192 address entries (2^13 = 8192). Besides the valid address range, the table also describes how the SEL signal should select the active portion of the 32-bit data bus.

The rest of this section is dedicated to explaining Table 6.2, which should make the configuration easier to understand.

Like most CPUs, the OpenRISC OR1200 has 8-bit (1 byte) granularity, i.e. from the CPU’s perspective there is one byte stored at each unique address. This sometimes gives people the wrong impression that an N-byte memory block is organized as 8 bits wide and N entries long. However, this is not true. For example, in this project we set the data width to 32 bits on the 1-port RAM core, as mentioned in Table 6.1. The memory block is organized as 32-bit * 8192, which gives 32KB in total.

The data width of the memory has to be 32-bit because the OpenRISC OR1200 is a 32-bit processor. A “32-bit” processor implies that the width of the data bus of the CPU is 32-bit and all registers inside the CPU are 32-bit as well. This means that, to be efficient, the processor should be able to (and in fact often does) read and write data with the same width as its registers. When fetching data to fill any of the CPU registers, 32 bits of data should be returned. If the RAM core were set to 8-bit width, the CPU would have to access the memory block 4 times to assemble 32 bits of data. This certainly takes longer and therefore should be avoided. This is the reason why we set the data width (q) parameter of the 1-port RAM to 32.

The difference between the memory block width (32-bit) and the data granularity (8-bit) causes a problem: the CPU thinks there is 1 byte mapped to each address, but in fact 4 bytes are stored at each physical address of the memory block. To reconcile the width mismatch, the solution is to shift the address connections by 2, as well as to introduce the SEL signals.

Shifting the address connections by 2, i.e. connecting WISHBONE address lines 14–2 to the 1-port RAM address inputs 12–0, discards the last 2 bits of the WISHBONE addresses. As a result, all addresses are implicitly converted during the transmission. For example, an access to the WISHBONE address 0x7 (“0111” in binary) becomes an access to the physical address 0x1 (“01” in binary) because the last 2 bits are ignored and the address is shifted.

The address shifting maps 4 consecutive addresses on the CPU side onto 1 physical address on the memory block side. This resolves the data width mismatch. For example, CPU accesses to the addresses 0x0, 0x1, 0x2, and 0x3 all go to the same physical address 0x0 in the memory block.
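The address conversion can be written down as a one-line C function. This is an illustrative model of the wiring of WISHBONE address lines 14–2 to RAM address lines 12–0; the names `RAM_WORDS` and `wb_to_ram_index` are hypothetical and do not appear in the design files.

```c
#include <stdint.h>

#define RAM_WORDS 8192u /* 2^13 entries of 32 bits each = 32KB */

/* Model of connecting WISHBONE address bits 14..2 to the RAM address
 * bits 12..0: drop the 2 lowest bits, keep the next 13 bits. Address
 * bits above bit 14 are not connected to the RAM at all. */
static uint32_t wb_to_ram_index(uint32_t wb_addr)
{
    return (wb_addr >> 2) & (RAM_WORDS - 1u);
}
```

For instance, `wb_to_ram_index(0x7)` yields 0x1, and the byte addresses 0x0 to 0x3 all yield 0x0, matching the examples in the text.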

The side effect of grouping 4 bytes into one 32-bit memory block entry is that, whichever of the 4 consecutive bytes the CPU is trying to access, the memory block always returns the same 32-bit data. To identify which byte (or bytes) among the 4 is wanted, the SEL signal has to be used.

According to the definition in the WISHBONE specification [3], the Active Select Line (SEL) signal indicates where valid data is expected on the DATA_I signal during read cycles, and where it is placed on the DATA_O signal during write cycles. The SEL signal is always transmitted from the WISHBONE master to the slave, i.e. from the OpenRISC CPU to the memory block. The width of the SEL signal equals the width of the data bus divided by the data granularity, so it is 32/8 = 4 in our case. As its name suggests, the SEL signal selects the valid portion of the data signals. Each line of the SEL signal matches one byte on the data bus. If 1 of the 4 SEL bits is high, the corresponding byte on the data bus is valid and will be accepted and processed. The other bytes are ignored regardless of their values. Table 6.2 also shows how the bytes on the data bus are matched with the SEL lines, for big and little endian respectively.

Figure 6.3 demonstrates a simple example. The CPU wants to write a byte (0xbb) to the address 0xB. It outputs 32 bits of data onto the data bus while setting bit 3 of the SEL to high. During the transmission, the address 0xB is converted to 0x2 because of the address shifting. When the memory block sees the bus transaction, it checks the SEL lines. The highest byte, marked by SEL3, will be accepted and put into bits 31–24 at the physical address 0x2 of the memory block.

Figure 6.3: Overview of data organization in OpenRISC systems

The “byteena” signal of the ALTERA 1-port RAM core is the perfect option to handle the SEL input. Although the names are different, Byte Enable and Active Select lines are in fact the same thing. That is why we turned on the “byteena” signal when parameterizing the 1-port RAM core, and fed the SEL inputs directly to the byteena port as shown in Figure 6.2.
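The effect of the byte enable lines on a write can be modeled in a few lines of C. This is a behavioral sketch, not the ALTERA implementation: the function names `sel_to_mask` and `ram_write_word` are hypothetical, and the model assumes that SEL bit n guards data bits 8n+7 down to 8n, which is the pairing used in the Figure 6.3 example (SEL3 with bits 31–24).

```c
#include <stdint.h>

/* Expand a 4-bit SEL / byteena value into a 32-bit mask.
 * Assumption: SEL bit n corresponds to data bits (8n+7)..(8n). */
static uint32_t sel_to_mask(unsigned sel)
{
    uint32_t mask = 0u;
    for (unsigned lane = 0; lane < 4u; lane++) {
        if (sel & (1u << lane))
            mask |= (uint32_t)0xFFu << (8u * lane);
    }
    return mask;
}

/* Result of a write to one 32-bit RAM entry: only the byte lanes
 * selected by SEL take the new data, the others keep the old value. */
static uint32_t ram_write_word(uint32_t old_word, uint32_t wdata,
                               unsigned sel)
{
    uint32_t mask = sel_to_mask(sel);
    return (old_word & ~mask) | (wdata & mask);
}
```

Replaying the example from the text, writing 0xbb with only SEL3 high changes bits 31–24 of the entry and leaves the other 3 bytes untouched.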

To conclude this section: because we need to be able to efficiently read and write 32-bit data for the 32-bit OpenRISC processor, the data width of the memory block must be 32-bit. To reconcile the mismatch between the CPU’s 8-bit granularity and the memory block’s 32-bit data width, the address connections are shifted by 2. So 4 consecutive addresses on the CPU side map to the same physical location of the memory block. Besides, the SEL signals are introduced to identify the valid portion of the data bus. The example shown in Figure 6.3 also gives an overview of the data organization scheme.

6.1.5 Memory Alignment and Programming Tips

As discussed in the last section, any 4 consecutive bytes that, from the CPU’s perspective, are stored starting at an address ending in 0x0, 0x4, 0x8 or 0xC (e.g. bytes 0x0–0x3, bytes 0x4–0x7 etc.) are in fact stored at the same 32-bit physical location of the memory block. If the CPU needs all 4 bytes, instead of accessing the memory block 4 times, it can simply get 32 bits of data at once. This operation can be done by, for example, the following C code:

unsigned int i = *(unsigned int *) 0xXXXXXXX0; (legal)

Here the “X” can be any hex digit from 0 to F.

When executing the line above in hardware, the CPU will send the address 0xXXXXXXX0 onto the WISHBONE bus, while turning on all 4 SEL signals. As soon as the memory block finds that the 4 SEL lines are high, it knows that all 32 bits of data stored at that location are wanted by the CPU. It then feeds back 32-bit data with 4 valid bytes, and the CPU considers that it has received the 4 bytes stored at the consecutive addresses ending in 0x0–0x3.

However, it is not possible to access integer type data at addresses that do not end with 0x0, 0x4, 0x8 or 0xC. For example, the following lines are illegal:

unsigned int i = *(unsigned int *) 0xXXXXXXX1; (illegal)
unsigned int i = *(unsigned int *) 0xXXXXXXX2; (illegal)
unsigned int i = *(unsigned int *) 0xXXXXXXX3; (illegal)

Those lines are illegal because the 4 bytes to be accessed are not stored in the same physical location of the memory block. Figure 6.4 below shows the case. In the figure, each line is a physical address that contains 32-bit data. The memory block can only read or write a whole line with one operation. So 32-bit data at an address with address mod 4 = 0 can be accessed at once. If we try to access 32-bit data stored, for example, at address 0x2, the 4 bytes of the data (i.e. 0x2–0x5) cross the line. So the access cannot be done in 1 operation, but needs 2 instead.

Figure 6.4: Legal and illegal memory accesses

To avoid illegal accesses, the OpenRISC architecture defines rules for data storage, which are called the Memory Model or the Memory Alignment. According to the OpenRISC Architectural Manual [4] Section 7.1 (also mentioned in 3.2.2 and 16.1.1):

Memory is byte-addressed with halfword accesses aligned on 2-byte boundaries, singleword accesses aligned on 4-byte boundaries, and doubleword accesses aligned on 8-byte boundaries.

In other words, 32-bit data (int type) must always be placed at addresses ending in 0x0, 0x4, 0x8 or 0xC. A long or double type that takes 8 bytes should be placed at addresses ending in 0x0 or 0x8. A short type (2 bytes) should be placed at even addresses. And 1-byte char type data can be placed anywhere.

After compiling a software project, all variables are converted into memory objects. Normally it is the linker that takes care of the memory alignment and gives the objects correct absolute addresses in the physical memory, so the programmers do not have to think about it. But they should be careful when addresses are explicitly referenced in the source code, like:

unsigned int i = *(unsigned int *) 0xXXXXXXXA; (illegal)
unsigned short i = *(unsigned short *) 0xXXXXXXX8; (legal)
unsigned short i = *(unsigned short *) 0xXXXXXXXF; (illegal)
unsigned char i = *(unsigned char *) 0xXXXXXXXF; (legal)
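When an address is computed or hard-coded in software, the alignment rule can be checked before the pointer is dereferenced. The helper below is a minimal sketch. The name `is_aligned` is hypothetical, and the function assumes the access size is a power of two (1, 2, 4 or 8 bytes), as in the alignment rules quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when an access of `size` bytes at `addr` is naturally
 * aligned. `size` must be a power of two (1, 2, 4 or 8). */
static bool is_aligned(uint32_t addr, size_t size)
{
    return (addr & (uint32_t)(size - 1u)) == 0u;
}
```

Applied to the four lines above with the X digits set to 0, it reports the int access at 0xA and the short access at 0xF as misaligned, and accepts the short access at 0x8 and the char access at 0xF.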

The OpenRISC OR1200 itself has an exception mechanism to handle illegal accesses. When it detects that the next read/write operation does not go to an aligned location, the CPU raises an alignment exception internally. The access is discarded and the CPU program counter (PC) jumps to the address 0x600. Please refer to the OpenRISC Architectural Manual [4] Chapter 6 for more information.

For programmers, some tips regarding the memory alignment are good to know, as they can help to make better software.

One tip is that using 32-bit type variables in OpenRISC OR1200 systems is the most efficient in terms of performance. Bus transactions accessing char (8-bit), short (16-bit) and int (32-bit) type variables take the same time to complete. When accessing 8-bit data, 75% of the bandwidth of the 32-bit bus is not utilized, which is a big waste. So if several adjacent bytes are accessed frequently in a program, like a part of an array, it is better to define an integer pointer to access them all together. Besides, accessing a 64-bit long type takes twice as long as accessing an int type. So if possible, using int types instead of long helps to shorten the program execution time.

Another tip helps to save memory space when defining structures. When there is a struct variable in the source code, the compiler lays out the members of the structure from top to bottom, but the memory alignment rule still has to be followed. So where necessary, certain bytes of memory are skipped and left unused. This is called memory padding. A carefully chosen order of the members in a structure helps to eliminate the padding and achieve the maximum memory utilization.

Figure 6.5: Example of memory padding

Figure 6.5 gives an example. In the first structure, the system has to do memory padding, because after allocating the first char, the next integer must be 4-byte aligned. In the second structure the padding is not needed. The 2 structures have the same members, but the first one wastes 4 bytes in total due to the memory padding. Please refer to the OpenRISC 1000 Architecture Manual [4] Chapter 16.1.2 for more information on this topic.
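The effect of member ordering can be reproduced with a pair of C structures. The member names and types below are hypothetical and are not necessarily those drawn in Figure 6.5. They are chosen so that, on a target where a 32-bit integer is 4-byte aligned and a 16-bit integer is 2-byte aligned, the first layout loses 4 bytes to padding in total.

```c
#include <stddef.h>
#include <stdint.h>

/* Members declared in an unfavorable order: 3 padding bytes after c1
 * (so that i is 4-byte aligned) and 1 padding byte after c2 (so that s
 * is 2-byte aligned). */
struct padded {
    uint8_t  c1;
    uint32_t i;
    uint8_t  c2;
    uint16_t s;
};

/* Same members, largest first: no padding is needed. */
struct reordered {
    uint32_t i;
    uint16_t s;
    uint8_t  c1;
    uint8_t  c2;
};

/* Number of bytes saved by the second ordering on this target. */
static size_t bytes_saved_by_reordering(void)
{
    return sizeof(struct padded) - sizeof(struct reordered);
}
```

On such a target `sizeof(struct padded)` is 12 and `sizeof(struct reordered)` is 8. Placing the widest members first is a simple rule of thumb that usually removes the padding.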

6.1.6 Miscellaneous

2 more settings of the ALTERA 1-port RAM core are described in this section.

We enabled the asynchronous reset (aclr) signal of the 1-port RAM core, for interfacing the WISHBONE RST signal. Note that when the 1-port RAM core is reset, only the output register is cleared. The contents of the memory block remain as before.

The registered output (q) is also enabled in the 1-port RAM core, because this is the prerequisite for adding the asynchronous reset signal. The registered output delays the result by 1 clock cycle, but makes the system more stable because both input and output signals are synchronized to the system clock.

So far, we have introduced the on-chip RAM module of the hardware platform, including the ALTERA 1-port RAM core and its parameters, the interface logic that connects the 1-port RAM core to the WISHBONE bus, and some specific concerns like address connection shifting, memory alignment and memory padding.

6.2 Memory Controller IP Core

As described in the previous section, the FPGA on-chip RAM gives the best performance compared to the other types of memories, but it usually has very limited storage capacity. To satisfy the need for memory, external RAMs have to be used. There are a 2MB SSRAM chip and 64MB of SDRAM on the DE2-70 FPGA board. They are directly wired to the FPGA chip in hardware. With the help of the Memory Controller IP core, we easily managed to utilize these external RAMs for the system.

The Memory Controller IP core works like a bridge between the OpenRISC CPU and the external RAM ICs. It has a WISHBONE interface, so it can simply be connected to the WISHBONE network or directly to the OpenRISC CPU. It also has an external memory interface to drive the memory ICs. When the CPU or other WISHBONE masters try to access the external memories, the WISHBONE bus transactions are sent to the Memory Controller, where the read/write requests are translated to the expected logic signals with correct timing according to the type of the external memory chips and the Memory Controller configuration.

Figure 6.6: Memory Controller in the OpenRISC system

This section does not intend to analyze the internal architecture and the HDL design of the Memory Controller IP core, but rather focuses on explaining how to use the IP core.

In the following sections, first the IP core and its attractive features are introduced. Then we continue with the configuration of the IP core from both the hardware side and the software side. At the end, the possibility of performance improvement is discussed.

6.2.1 Introduction and Highlights

The Memory Controller is one of the open source IP cores from the OpenCores organization. Its source code is completely open and free to download at the link [5] on the opencores.org website. It is worth mentioning that the source code of the core is very well organized¹. This benefits users who want to study, modify or improve the core.

The IP core is released under a BSD-like license, which is not exactly the BSD license but even less restrictive. The full text of the license can be found in the header of every source file. For convenience, the license is copied here:

This source file may be used and distributed without restriction provided that this copyright statement is not removed from the file and that any derivative works contains the original copyright notice and the associated disclaimer.

After this paragraph there is a long disclaimer, which is the same as in the BSD license, disclaiming any warranty and liability for the usage of the Memory Controller IP core.

As we can see from the license, everyone is allowed to modify the IP core, integrate it into another project, and redistribute² the project, as long as the header information is kept in all source files related to the Memory Controller. The restriction of the license is really negligible compared to what we gain from the core. For more information about the BSD license, please refer to Chapter 2.

The Memory Controller is another IP core used in this project that was produced by Rudolf Usselmann and ASICS.ws [7]. Like many other IP cores coming from ASICS.ws, for example the CONMAX IP core, the Memory Controller also gave us a very good impression because of its good quality, useful documentation and a lot more. We hereby thank the author again for this great contribution.

¹ See Figure 27 on page 43 of the Memory Controller IP core user manual [6].
² Or sell, in other words.

Below are several highlights of the Memory Controller IP core. They may be reasons that convince you to use the core in your next project.

• Improve the productivity

The top reason to use an IP core is always to achieve design reuse, and therefore to accelerate the development of new projects. But sometimes the decision to use an IP core can be tricky. If documentation is lacking or the IP core is troublesome to get working, debugging it can cost much more time than expected. However, this is not the case for the Memory Controller. The behavior of the IP core always matches the descriptions in the user manual. So when doing the project, we had good trust in the IP core and in fact did not do a lot of simulation at the IP core level to verify the timing etc. before integrating it into the system. And the system did work without costing us extra time.

The Memory Controller indeed greatly improved the productivity of our project. With the IP core, we managed to make the SDRAM IC work with the OpenRISC system within a week. If we had had to write a controller for the SDRAM ourselves, implementing the read/write timing, the dynamic refreshing and so on, it is hard to imagine how much time we would have spent on it. According to our supervisor Johan, it could take half a year even for experienced engineers.

• WISHBONE compatible

The Memory Controller IP core has a WISHBONE interface, which is fully compatible with the WISHBONE bus specification Rev. B [3]. The widths of the address and data buses of the IP core are fixed to 32-bit, but this is not a problem because the OpenRISC OR1200 CPU is also 32 bits wide. Unlike many other so called “WISHBONE compliant” IP cores that only support single read/write operations, the Memory Controller also supports WISHBONE burst transactions. With burst transactions, the performance of the communication between the CPU and the memory can be enhanced a lot. But due to the limited time, we did not try out this feature in the thesis project, which we regret.

• Support a wide range of memory devices

The Memory Controller is designed for general purpose use rather than for a specific type of memory IC. It supports a wide range of memory devices, including SSRAM, SDRAM, FLASH, ROM, EEPROM etc. Almost all commonly used memory chips that are configured for 32-bit computing systems can be easily driven by the Memory Controller. On the DE2-70 board, we have a 512K*36 SSRAM chip and two 16M*16 SDRAM chips. The Memory Controller can work with both types after the parameters are set correctly.

6.2.2 Hardware Configurations

Before the Memory Controller can be used, it has to be correctly configured. There are 2 types of configuration. The first has to be done at the hardware level, for example the address allocation. Hardware configuration requires modifying the parameters in the HDL code. Those parameters take effect in the FPGA internal logic after compilation by Quartus. The other type of configuration is done at the software level by writing the Memory Controller registers, for example the timing configuration for the external memory ICs. The software is responsible for doing this, usually during the initialization phase.

In this section, we introduce the hardware configurations, including the address allocation, the power-on configuration (POC), and some other HDL modifications. The software configurations, like the timing parameters for the SSRAM and the SDRAM, will be discussed in the next section.

All the Memory Controller hardware parameters are contained in the file “mc_defines.v”, where users can easily modify them based on the system requirements. The file is globally included by all other source files.

6.2.2.1 Address Allocation

The Memory Controller IP core and the external memory devices attached to it have to be given a proper range of addresses, so that the CPU knows where to access the data in the physical address space.

The Memory Controller IP core has a 32-bit address bus. For the purpose of address allocation it can be divided into 4 sections.

First, as mentioned before, the Memory Controller is directly connected to the WISHBONE network through the CONMAX IP core. The slave port of the CONMAX that the Memory Controller is connected to decides the highest 4 bits (31–28) of the addresses given to the Memory Controller (see Section 5.3.4). For example, if the Memory Controller is on port 10 of the CONMAX, the assigned addresses range from 0xA000 0000 to 0xAFFF FFFF.

Second, in mc_defines.v, the 2 definitions MC_REG_SEL and MC_MEM_SEL determine whether the internal registers of the Memory Controller or the external memory spaces are being accessed. If the MC_REG_SEL expression is true, the Memory Controller internal registers are selected. If the MC_MEM_SEL expression is true, the external memories are selected.

In our project, MC_REG_SEL is set to

    wb_addr_i[27] == 1'b1

and MC_MEM_SEL is set to

    wb_addr_i[27] == 1'b0

This means that bit 27 of the 32-bit address is used to select between the internal registers and the external memories. For example, page 31 of the Memory Controller user manual [6] gives a list of registers. Assuming CONMAX port 10 is in use, the address 0xA800 0010 is mapped to the register CSC0, because bit 27 here is “1”.

Third, the Chip Select (CS) configuration is the next step to consider. The Memory Controller supports a maximum of 8 CS signals, i.e. it is possible to connect up to 8 external memory ICs to the same Memory Controller. In mc_defines.v, users can define how many CS signals are implemented. Unused chip selects are best commented out to save FPGA resources. CS0 is always enabled by default. There is no way to disable it.

Activating a CS requires a combination of the CSCn and BA_MASK registers. For each CS signal there is a Chip Select Configuration Register (CSCn), while BA_MASK is valid for all the CS signals. The values of the CSCn and BA_MASK registers are initialized by software. If the following equation is true, the corresponding CS signal is set low to select the external IC. Only the address bits selected by BA_MASK take part in the comparison:

    (CSCn[23:16] AND BA_MASK[7:0]) = (input address[28:21] AND BA_MASK[7:0])

Figure 6.7: Logic to activate a CS

3 bits are enough to identify 8 chips. Bits 26–24 of the input address may be reserved for this purpose. For example, if bits 21–19 of CSC5 are set to “101” and BA_MASK is set to 0x38, all input addresses of the form 0xX5XX XXXX or 0xXDXX XXXX will activate CS5.
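The chip-select comparison can be sketched in C. This is a minimal sketch, assuming that both sides of the comparison are masked with BA_MASK, which is what the 0xX5XX XXXX / 0xXDXX XXXX example implies. The function name mc_cs_match is hypothetical, and the enable bit of the CSCn register is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masked chip-select comparison (sketch, not the core's HDL).
 * csc:     value of the CSCn register (only bits 23:16 are used here)
 * ba_mask: value of the BA_MASK register (only bits 7:0 are used)
 * addr:    32-bit WISHBONE address
 * Only the bits that are set in BA_MASK take part in the comparison. */
bool mc_cs_match(uint32_t csc, uint32_t ba_mask, uint32_t addr)
{
    uint32_t mask    = ba_mask & 0xFFu;
    uint32_t sel     = (csc  >> 16) & 0xFFu;   /* CSCn[23:16]          */
    uint32_t addr_hi = (addr >> 21) & 0xFFu;   /* input address[28:21] */

    return (sel & mask) == (addr_hi & mask);
}
```

With CSC5 bits 21–19 set to “101” (0x5 << 19) and BA_MASK = 0x38, the addresses 0xA500 0000 and 0xAD00 0000 both match, while 0xA400 0000 does not.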

It is rarely the case that 8 memory chips are connected to the same Memory Controller, so it is possible to reserve fewer bits for the CS. For example, in our project only 1 bit is configured to enable CS0 because we need just 1 chip select signal¹.

Figure 6.8: 32-bit address configuration for Memory Controller

Figure 6.8 above summarizes the address allocation. The first 4-bit address section is decided by the CONMAX IP core slave port ID. The following bit selects the internal registers or the external memory ICs. After that, 3 bits can be reserved for 8 chip select signals. The remaining bits are left for the external memory address space.

The format in Figure 6.8 has 24-bit addresses for the external memory IC, so the memory size can go up to 2^24 bytes = 16 MBytes per chip. If only one CS is required, there is no need to spend 3 bits on identifying the CS signals. In this case the memory size can be 2^27 bytes = 128 MBytes. For most embedded systems, 128 MB of RAM would be enough. If it is not, it is an option to use multiple Memory Controllers on different CONMAX slave ports. There is always a way to allocate the addresses.
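The address layout can be illustrated with a small decoder. This is only a sketch of the field split described for Figure 6.8 (4 bits for the CONMAX port, 1 bit for the register/memory select, 3 bits for the chip select, 24 bits of offset); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a 32-bit address, following the layout of Figure 6.8. */
struct mc_addr_fields {
    unsigned port;     /* bits 31:28, CONMAX slave port             */
    unsigned reg_sel;  /* bit 27, 1 = internal registers            */
    unsigned cs;       /* bits 26:24, chip select index             */
    uint32_t offset;   /* bits 23:0, offset inside the memory chip  */
};

/* Split an address into the four sections described in the text. */
struct mc_addr_fields mc_decode_addr(uint32_t addr)
{
    struct mc_addr_fields f;

    f.port    = (addr >> 28) & 0xFu;
    f.reg_sel = (addr >> 27) & 0x1u;
    f.cs      = (addr >> 24) & 0x7u;
    f.offset  = addr & 0x00FFFFFFu;
    return f;
}
```

For the address 0xA800 0010 used earlier, this gives port 10, register select 1 and offset 0x10, i.e. the CSC0 register.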

6.2.2.2 Power-On Configuration (POC)

The initialization of the Memory Controller can look like an interesting paradox. To make the Memory Controller work properly, its internal registers have to be correctly configured by the CPU. But the CPU needs to read the software instructions stored in the external memories to know how to configure the Memory Controller. And the external memories cannot be accessed if the Memory Controller is not working properly.

¹ Actually we do not even need this bit, because CS0 can always be selected. The bit was reserved mainly for the purpose of testing the CS signal.

The Power-On Configuration of the Memory Controller tries to solve this problem. Every time the Memory Controller resets, it reads the signal levels from the external bus. The value is stored in the POC register. The last 4 bits of the POC are then used to initialize the CSCn register and give the Memory Controller a basic default working state, so that the external memories become accessible even though the timing configuration is probably not optimized.

To give a definite logic value to the Memory Controller POC register, the last 4 bits of the external bus must be pulled high or low with resistors in hardware. Unfortunately the DE2-70 board has no such resistors. Besides, we did not need the power-on configuration, because the software startup code was stored in the FPGA on-chip RAM, where the CPU can find out how to initialize the Memory Controller. As a result, we decided to disable the power-on configuration with a small modification to the Memory Controller source code.

To disable the power-on configuration, we first created a new definition:

    `define MC_POC_VAL 32'h00000002

Then in “mc_rf.v” we changed the POC line to:

    if(rst_r3) poc <= #1 `MC_POC_VAL;

In this way the POC register always gets the default value 0x0000 0002 when the Memory Controller resets, which sets a 32-bit bus width and disables external devices.

In a similar way we also changed the reset value of the BA_MASK register based on the address configuration.

6.2.2.3 Tri-state Bus

For most memory ICs the data ports are bidirectional. So the outputs of the Memory Controller must be tri-stated to high impedance when reading data from external memories.

The Memory Controller does not include a design for the tri-state outputs; see page 45 of the user manual [6]. This is because there is no way to know the FPGA architecture in advance. The ALTERA FPGAs only have tri-state gates at the I/O pins, but some Xilinx FPGAs have internal tri-state resources available.

Users have to implement the tri-state buffers for the Memory Controller on their own. In the thesis project, we made 2 files of tri-state buffers, for the SSRAM and the SDRAM respectively. The files are stored in the folder /hardware/components/memif/.

6.2.2.4 Miscellaneous SDRAM Configurations

In “mc_defines.v”, there are 2 hardware configurations that apply to SDRAM only: the refresh cycles and the power-on operation delay. The values for these parameters can be found in the SDRAM datasheet.

6.2.2.5 Use Same Type of Devices on One Memory Controller

One important lesson we learnt from the thesis project is to always attach the same type of memory devices to one Memory Controller. For example, if a Memory Controller has been set up to support an external SSRAM chip, do not attach any SDRAM or FLASH memory device to this controller.

In fact, we tried to put the SSRAM and the SDRAM on the same Memory Controller with different CS signals. In this way we could make better use of the 8 CS signals and also save FPGA resources, because only 1 Memory Controller would be needed for both the SSRAM and the SDRAM. But the trial was not successful. The SSRAM became slower when sharing the controller with the SDRAM, because the Memory Controller keeps refreshing the SDRAM and accesses to the SSRAM are blocked during the refresh. Also, wrong data was sometimes read back. In the end we decided to separate the SSRAM and the SDRAM onto 2 different Memory Controllers.

6.2.3 Configurations for SSRAM and SDRAM

In the last section we discussed the Memory Controller configurations at the FPGA level. After the Memory Controller IP core is connected to the WISHBONE network and the hardware configurations have been done correctly, the OpenRISC CPU should be able to address its internal registers.

This section describes how to configure the Memory Controller internal registers for the SSRAM and the SDRAM. The registers have to be properly initialized before the Memory Controller can drive the external memory ICs.

On the DE2-70 board, there is 1 SSRAM chip, the ISSI IS61LPS51236A, and 2 SDRAM chips, the ISSI IS42S16160B. All the configurations were decided based on their datasheets [8, 9].

Section 4 of the Memory Controller user manual [6] gives the full list of the internal registers.

There are 3 global registers in the Memory Controller that are valid for all CS signals: CSR, POC and BA_MASK.

The POC and BA_MASK registers have been described before. We modified the HDL code to give the 2 registers default values at power-up.

The Control Status Register (CSR) is used only for SDRAM and FLASH memory. For the SSRAM there is no need to change this register. For the SDRAM, the REF_INT field is set to 3 (7.812us) in our case. This value comes from Table 1 on page 32 of the user manual [6], because there are 2 chips of 16M*16 SDRAM on the DE2-70 board. The Refresh Prescaler field also has to be configured for the SDRAM. The value of the following expression must be as close as possible to 488.28 ns:

    (Prescaler + 1) / System Clock ≈ 488.28 ns

The system for the thesis project uses a 50 MHz clock, so the prescaler is set to 23, i.e. “10111” in binary, which gives (23 + 1) / 50 MHz = 480 ns. The other fields of the CSR can be left unchanged for the SDRAM.
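The prescaler arithmetic can be written down as a short C sketch. The function names are hypothetical. The field positions used in mc_csr_value (prescaler in bits 31:24, REF_INT in bits 10:8) are an assumption that is consistent with the CSR value 0x17000300 listed in Table 6.3; they should be checked against the user manual [6].

```c
#include <stdint.h>

#define MC_REFRESH_TICK_PS 488280ULL   /* 488.28 ns expressed in picoseconds */

/* Prescaler such that (prescaler + 1) / clk_hz is as close as possible
 * to 488.28 ns. The cycle count is rounded to the nearest integer. */
uint32_t mc_refresh_prescaler(uint64_t clk_hz)
{
    uint64_t cycles = (MC_REFRESH_TICK_PS * clk_hz + 500000000000ULL)
                      / 1000000000000ULL;

    return cycles ? (uint32_t)(cycles - 1) : 0;
}

/* Assemble a CSR value (assumed layout: prescaler in bits 31:24,
 * REF_INT in bits 10:8, all other fields left at zero). */
uint32_t mc_csr_value(uint32_t prescaler, uint32_t ref_int)
{
    return ((prescaler & 0xFFu) << 24) | ((ref_int & 0x7u) << 8);
}
```

For a 50 MHz clock, mc_refresh_prescaler returns 23, and mc_csr_value(23, 3) gives 0x17000300.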

For each chip select signal, there is a pair of registers that needs to be configured: CSC and TMS.

The Chip Select Configuration (CSC) register determines the address range, the external memory device type etc.

The Timing Select (TMS) register decides the timing parameters for the attached memory devices. For the SSRAM, the TMS is not used. The value can be left at its default of 0xFFFF FFFF¹. For the SDRAM, the TMS value is defined by the datasheet and by Table 2 on page 16 of the Memory Controller user manual [6]. In our case, the value is set to 0x0724 0230. Note that the timing parameters here are very conservative. For example, the read/write burst between the Memory Controller and the SDRAM chip is disabled. These parameters may be reconsidered to improve the SDRAM access performance.

¹ Section 4.5 of the user manual [6] says the reset value is 0, but this is a mistake. After a reset, the hardware initializes all TMS registers to 0xFFFF FFFF. See the HDL source file mc_rf.v.

The following table summarizes the register values for the SSRAM and the SDRAM.

Register    SSRAM         SDRAM
CSR         0x00000000    0x17000300
POC         0x00000002    0x00000002
BA_MASK     0x00000020    0x00000020
CSC         0x00000823    0x00000691
TMS         0xFFFFFFFF    0x07240230

Table 6.3: Memory Controller register configurations
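As an illustration, the SDRAM column of Table 6.3 could be written from software roughly as follows. This is a sketch under assumptions, not the code used in the project: the function name is hypothetical, the register offsets (CSR at 0x00, POC at 0x04, BA_MASK at 0x08, CSC0 at 0x10, TMS0 at 0x14) should be verified against the user manual [6] (only the CSC0 offset 0x10 is confirmed by the text above), and the order of the writes is illustrative. The POC register is not written because its value is fixed in the HDL.

```c
#include <stdint.h>

/* Register word indices (byte offset / 4). Assumed, check the manual. */
enum {
    MC_CSR     = 0,   /* 0x00 */
    MC_POC     = 1,   /* 0x04, set by the modified HDL, not written here */
    MC_BA_MASK = 2,   /* 0x08 */
    MC_CSC0    = 4,   /* 0x10 */
    MC_TMS0    = 5    /* 0x14 */
};

/* Write the SDRAM values of Table 6.3 to a Memory Controller whose
 * register block starts at 'regs'. */
void mc_init_sdram(volatile uint32_t *regs)
{
    regs[MC_BA_MASK] = 0x00000020u;
    regs[MC_CSR]     = 0x17000300u;   /* prescaler 23, REF_INT 3 */
    regs[MC_TMS0]    = 0x07240230u;   /* conservative SDRAM timing */
    regs[MC_CSC0]    = 0x00000691u;   /* enable CS0 last */
}
```

On the target, regs would point at the register block of the controller, for example (volatile uint32_t *)0xA8000000 if the controller sat on CONMAX port 10 as in the earlier example; that base address is a hypothetical value.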

6.2.4 Performance Improvement by Burst Transactions

In the previous sections we have introduced the Memory Controller IP core. Now let us look at its performance.

As mentioned at the beginning, the Memory Controller IP core supports WISHBONE burst transactions. But due to the limited time of the thesis project, we did not manage to investigate this feature. All data accesses in the project between the CPU and the Memory Controller are single reads or writes.

Compared to burst accesses, single accesses consume a lot more time in the following 3 respects:

1. For each new bus transaction, the CPU has to win the bus arbitration from the CONMAX. This takes at least 1 bus cycle if no other bus transaction is ongoing, and longer otherwise. Burst transactions, which contain multiple read/write operations, are more efficient.

2. When a bus transaction arrives, the Memory Controller has to take several steps to handle it, like analyzing the target address, converting the WISHBONE signals for the external memory devices etc. For burst transactions, the Memory Controller knows in time which data will be accessed next, so it can better pipeline the internal operations to save time.

3. Burst transactions also give the Memory Controller the possibility to use the burst capabilities of the external memory devices. For example, SDRAMs can read out data stored contiguously in a bank once the bank and the row are open. For single transactions, the bank has to be re-selected on every read/write operation, which wastes a lot of time.

My partner Lin Zuo gives a performance comparison between the OpenRISC system and the ALTERA NIOS II system in his thesis [1]. Table 8–8 of his thesis is cited below.

Platform            Open Cores   NIOS II/e   NIOS II/s   NIOS II/f
Clock (MHz)         20           20          20          20
Writing Time (s)    3.696        2.045       0.682       0.476
Reading Time (s)    3.880        2.123       0.760       0.527

Table 6.4: System performance test results

The table gives the time the CPU needs to completely read or write a 2MB external SSRAM. “Open Cores” means the OpenRISC CPU accessing the external SSRAM through the CONMAX IP core and the Memory Controller IP core, while the ALTERA systems use the NIOS II processor, the Avalon bus, and ALTERA’s SSRAM controller.
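To put the numbers of Table 6.4 in perspective, the measured times can be converted into average clock cycles per access. The sketch below assumes that the test accessed the 2MB (taken as 2 x 1024 x 1024 bytes) SSRAM one 32-bit word at a time; that access width is an assumption, not something stated in the table. The result includes the overhead of the test loop itself, so it is an upper bound on the bus cost of a single access. The function name is hypothetical.

```c
#include <stdint.h>

/* Average CPU clock cycles spent per 32-bit word, given the total time
 * in seconds, the clock frequency in Hz and the memory size in bytes. */
double cycles_per_word(double seconds, double clk_hz, uint32_t bytes)
{
    double words = bytes / 4.0;   /* assumed: one 32-bit access per word */

    return seconds * clk_hz / words;
}
```

With the values from Table 6.4, cycles_per_word(3.696, 20e6, 2u * 1024u * 1024u) comes to roughly 141 cycles per written word for the open core system, against roughly 18 cycles for the NIOS II/f (0.476 s).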

It is obvious that the ALTERA systems have better performance, but the interesting part is that, when comparing the 3 types of NIOS II processors (economy/standard/fast), the NIOS II/e takes remarkably longer than the other 2. Lin concludes that the cache plays the significant role, since the NIOS II/e has no cache but the others do. This conclusion is not completely right. The cache is indeed important, but the higher bus throughput should be the decisive factor for the shorter access time, and whether burst transactions are supported has a great effect on the bus throughput. It is easy to see why the cache is not the main reason. If data access on the bus is much slower than the CPU doing calculations, the CPU still has to stop and wait for the cache to be filled, regardless of the cache size. In this case, it makes no big difference whether there is a cache or not.

The following table is copied from Chapter 5 of the NIOS II Processor Reference Handbook [10]. The table compares the features of the 3 types of NIOS II cores. It clearly shows that the NIOS II/s and /f support the instruction cache and, more importantly, the pipelined memory access (burst), but the NIOS II/e does not. The pipelined memory access largely increases the bus throughput, and the cache provides space for data buffering. This is the reason that the NIOS II/s and /f have much better computation performance than the NIOS II/e.

Table 6.5: NIOS II processor comparison (part)

To conclude this section, based on the tables and the analysis above we can make a reasonable assumption: our open core system is likely to gain a dramatic performance improvement once WISHBONE burst transactions are enabled between the OpenRISC CPU and the Memory Controller.

6.3 UART16550 IP Core

The UART16550 IP core is another open source IP core from opencores.org. The source code can be found at the link [11]. Jacob Gorban is the author of the IP core.

The UART16550 IP core gets its name because it is designed to be maximally compatible with the industry standard National Semiconductor 16550A device. More information about the 16550 can be found on National’s website [12, 13]. Note that at the time of writing the latest version is the 16550D, not the “A” any more.

The UART16550 IP core is not fully identical to National’s 16550. For example, its FIFOs cannot be disabled. But in fact, most people like us do not care about the differences between the UART16550 and National’s 16550, because they are more interested in the “UART” part than in the “16550” part.

The Universal Asynchronous Receiver/Transmitter (UART) is a piece of hardware that helps to create a serial connection for exchanging data between 2 machines, usually used together with the RS-232 standard. For embedded systems, the UART is the easiest and most popular way to set up a connection between the PC and the target board. With the serial connection, it is possible to use terminal software like PuTTY on the PC to control or debug the target system. For the thesis project, we also needed such a connection for the open core platform. That was the reason to include the UART16550 IP core.

Several features of the UART16550 are worth stressing:

1. The UART16550 is WISHBONE compliant. It has a WISHBONE interface which makes the IP core easy to integrate into an OpenRISC based system. Thanks to this feature, we spent only half a day adding the IP core to the project.

2. The UART16550 has 2 FIFOs, always enabled, for transmitting and receiving. The FIFO size is 16 bytes [14], and it is possible to set different interrupt trigger levels. The FIFOs greatly improve the UART communication performance, because more data can be buffered before an overflow occurs while the CPU interrupts its current task to process the UART data.

3. The UART16550 supports 4 interrupts. All of them share the same output signal INT_O, which needs to be routed to the OpenRISC processor.

4. The UART16550 IP core implements the UART logic only. It still needs an RS-232 transceiver, like the ADM3202 used on the DE2-70 board, to meet the RS-232 electrical characteristics.
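The features listed above translate into a short initialization sequence on the software side. The following is a minimal sketch based on the standard 16550 register layout (divisor latches at offsets 0 and 1 while DLAB is set, FCR at offset 2, LCR at offset 3). It assumes byte-wide register access; the function names, the 8N1 frame format and the 14-byte trigger level are illustrative choices, not the settings used in the project.

```c
#include <stdint.h>

/* Standard 16550 register offsets (byte addresses). */
enum { UART_DLL = 0, UART_DLM = 1, UART_FCR = 2, UART_LCR = 3 };

#define UART_LCR_8N1     0x03u  /* 8 data bits, no parity, 1 stop bit   */
#define UART_LCR_DLAB    0x80u  /* divisor latch access bit             */
#define UART_FCR_TRIG14  0xC0u  /* RX FIFO interrupt trigger: 14 bytes  */
#define UART_FCR_CLEAR   0x06u  /* clear receiver and transmitter FIFOs */

/* Baud rate divisor of a 16550: clk / (16 * baud), rounded to nearest. */
uint16_t uart_divisor(uint32_t clk_hz, uint32_t baud)
{
    return (uint16_t)((clk_hz + 8u * baud) / (16u * baud));
}

/* Program the divisor, the frame format and the FIFO trigger level. */
void uart_init(volatile uint8_t *regs, uint32_t clk_hz, uint32_t baud)
{
    uint16_t div = uart_divisor(clk_hz, baud);

    regs[UART_LCR] = UART_LCR_DLAB | UART_LCR_8N1;  /* open divisor latches  */
    regs[UART_DLL] = (uint8_t)(div & 0xFFu);
    regs[UART_DLM] = (uint8_t)(div >> 8);
    regs[UART_LCR] = UART_LCR_8N1;                  /* close divisor latches */
    regs[UART_FCR] = UART_FCR_TRIG14 | UART_FCR_CLEAR;
}
```

For a 50 MHz system clock and a hypothetical baud rate of 115200, uart_divisor returns 27.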

For details of how to use the IP core, please refer to the UART16550 specification [15].

To conclude this section, the UART16550 is an open source IP core designed with a WISHBONE interface. This makes it suitable for platforms that use the WISHBONE interconnection internally, for example OpenRISC based systems. Our experience also shows that a serial connection can be built quickly with the UART16550 IP core, which saves development time and improves productivity.

6.4 GPIO IP Core

The General Purpose Input/Output (GPIO) IP core is used to drive 4 7-segment LEDs and monitor 4 buttons on the DE2-70 board. The IP core is available at opencores.org [16]. It was designed by Damjan Lampret and Goran Djakovic.

The GPIO IP core might sound a little too “simple” to be called an “IP core”, but it is really helpful to have such a ready-to-use IP core at hand. If we had had to start the I/O design from scratch, it would most likely have taken longer than expected.

The GPIO IP core is tiny yet powerful and multifunctional. Several features are highlighted below:

1. Supports up to 32 pairs of general inputs and outputs;

2. Supports an external clock input, so the input signals can be sampled on the rising edge of that clock;

3. Supports bidirectional ports, so the I/Os can be set to tri-state or open-drain for external buses, provided the FPGA has such gate resources;

4. Has a WISHBONE interface, which makes it easy to work with WISHBONE or OpenRISC based systems.

The GPIO IP core is easy to use. First it has to be correctly configured. This can be done in the settings file “gpio_defines.v”. Then the IP core needs to be connected to the WISHBONE network. If the interrupt is used, the WB_INTA_O signal has to be wired to the OpenRISC CPU. After that, the GPIO registers are accessible by the CPU. In most cases the RGPIO_IN register is read for the input values, or the RGPIO_OUT register is written to set the output signals.
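The typical register accesses can be sketched in C as follows. This is only an illustration: the helper names are hypothetical, and the word offsets of RGPIO_IN (0x00) and RGPIO_OUT (0x04) are assumptions to be checked against the GPIO IP Core Specification [17]. Which bits carry the buttons and the 7-segment LEDs depends on the pin assignment of the design and is not shown here.

```c
#include <stdint.h>

/* Register word indices (byte offset / 4). Assumed, check the spec. */
enum { RGPIO_IN = 0, RGPIO_OUT = 1 };

/* Read all input pins at once. */
uint32_t gpio_read_inputs(volatile uint32_t *regs)
{
    return regs[RGPIO_IN];
}

/* Return the state (0 or 1) of a single input pin. */
unsigned gpio_input_bit(volatile uint32_t *regs, unsigned bit)
{
    return (regs[RGPIO_IN] >> bit) & 1u;
}

/* Drive all output pins at once. */
void gpio_write_outputs(volatile uint32_t *regs, uint32_t value)
{
    regs[RGPIO_OUT] = value;
}
```

On the target, regs would be the base address assigned to the GPIO IP core on the WISHBONE network.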

For more details, please refer to the GPIO IP Core Specification [17]. It includes enough information for using the IP core.

To conclude this section, the GPIO IP core is not complicated in its functionality, but it is handy to have the IP core ready. Most projects require general I/O features for things like buttons, switches, LEDs or external buses. Using the GPIO IP core can certainly accelerate project development.

6.5 WM8731 Interface

6.5.1 Introduction

On the DE2-70 board, there is a WM8731 audio chip connected to the FPGA. The WM8731 is produced by Wolfson Microelectronics. It is a low power stereo CODEC (enCOder and DECoder) with an integrated headphone driver. It supports microphone-in, line-in and line-out, and it is designed for audio applications like portable MP3 players, speech players and recorders. For more information about the WM8731, please refer to its datasheet [18].

In the thesis project, we wanted to play music with the open core platform, so the WM8731 audio CODEC had to be driven by the OpenRISC processor. Therefore, an interface that connects the external WM8731 device to the WISHBONE network is required. Figure 6.9 below gives the overall connections.

Figure 6.9: WM8731 Interface in the OpenRISC system

We designed the interface as an IP core. With the interface, it is possible to configure the WM8731 from the OpenRISC processor and to send the music data. Receiving data from the WM8731 is not implemented, so the microphone and the line-in are not supported.

6.5.2 Structure of the WM8731 Interface

The WM8731 chip has 28 pins, but most of them have been taken care of by the designers of the DE2-70 board. All we need to do is implement the digital logic in the FPGA to communicate with the WM8731.

The WM8731 has a control interface and a data interface. The control interface is used to configure the WM8731 internal registers, while the data interface is used to transmit the music data from/to the WM8731. The control interface can be selected to work as either a 3-wire SPI or a 2-wire I2C interface. Unluckily the DE2-70 board has the mode fixed to I2C. This saves 1 pin on the FPGA, but the I2C bus timing is more complicated to implement than the SPI. The WM8731 data interface also has multiple modes to choose from. We selected the Left Justified mode because it is the most straightforward.

Regarding the I2C bus, there is something interesting to mention. When we were working on the thesis project, we did not know anything about I2C. Moreover, the WM8731 datasheet [18] does not mention the term “I2C” at all. It uses “2-wire MPU serial control interface” instead. Only several months after the thesis project was over did we find out that what we had made was exactly an I2C interface. Had we known it earlier, a ready-to-use IP core from opencores.org like the I2C Controller [19] could have been used to save precious project time. We could also have followed the I2C bus standard more closely, for example by using a 100k or 400k baud rate. We wonder whether the WM8731 producer had some special considerations about the I2C licensing [20].

Figure 6.10 below gives the internal structure of the WM8731 interface. It contains 3 blocks, one for the WISHBONE bus, and two for the WM8731 I2C bus and data bus respectively.

Figure 6.10: WM8731 Interface internal structure

As shown in Figure 6.10, the WISHBONE interface receives WISHBONE transactions from the CPU. It also decides whether a received WISHBONE transaction contains control data or music data. 1 of the 32 lines of the WISHBONE address signal makes the decision. If the address line is low, the data is treated as control data and is written into the target WM8731 register through the I2C bus. Otherwise the data is music data to be sent through the data bus.

The control interface sends out the control data on the I2C bus. It uses a state machine to transmit the data serially, bit by bit, according to the I2C timing. It also adds an extra I2C control byte.

The digital interface sends out the music data in Left Justified mode. To improve the real-time performance, a 32-bit * 8192 stage (32KB) FIFO is included for buffering the music data.

6.5.3 HDL Source Files and Software Programming

The WM8731 Interface is designed in VHDL. The source files can be found in /hardware/components/wm8731/. There are 5 files. The hierarchy of the files is shown in Figure 6.10.

In the project, the only address line of the WM8731 Interface is connected to the fifth line (bit 4) of the 32-bit WISHBONE address bus. The address 0xD000 0000 is assigned for writing the WM8731 registers, and the address 0xD000 0010 for playing the music data¹.

We used the following code to initialize the WM8731:

    #define WM8731_REG 0xD0000000
    *(volatile unsigned int *)(WM8731_REG) = 0x00000080;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000280;
    *(volatile unsigned int *)(WM8731_REG) = 0x0000047F;
    *(volatile unsigned int *)(WM8731_REG) = 0x0000067F;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000812;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000A00;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000C00;
    *(volatile unsigned int *)(WM8731_REG) = 0x00000E41;
    *(volatile unsigned int *)(WM8731_REG) = 0x00001023;
    *(volatile unsigned int *)(WM8731_REG) = 0x00001201;
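The initialization values become easier to read once they are split into register address and data. According to the WM8731 datasheet [18], a control word is 16 bits wide, with a 7-bit register address in bits 15:9 and 9 bits of data in bits 8:0. The helper below is a hypothetical sketch that builds such a word; it assumes that the interface forwards the lower 16 bits of the written value unchanged, which matches the values used in the initialization sequence.

```c
#include <stdint.h>

/* Build a 16-bit WM8731 control word:
 * register address in bits 15:9, register data in bits 8:0. */
uint16_t wm8731_ctrl_word(uint8_t reg, uint16_t data)
{
    return (uint16_t)(((reg & 0x7Fu) << 9) | (data & 0x1FFu));
}
```

For example, wm8731_ctrl_word(7, 0x041) gives 0x0E41, the value written to the digital audio interface format register (R7) to select the Left Justified format, and wm8731_ctrl_word(9, 0x001) gives 0x1201, which sets the active control register (R9).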

Then the music data can be read from a file and sent word by word like:

    #define WM8731_DAC_DATA 0xD0000010
    *(volatile unsigned int *)(WM8731_DAC_DATA) = music_data_1;
    *(volatile unsigned int *)(WM8731_DAC_DATA) = music_data_2;
    *(volatile unsigned int *)(WM8731_DAC_DATA) = music_data_3;

The music data are 32 bits wide and have the following format.

¹ The addresses start with 0xD because the IP core is connected to CONMAX slave port 13.

Figure 6.11: 32-bit music data format

6.6 DM9000A Interface

The DM9000A is a fully integrated and cost-effective low pin count single chip fast Ethernet controller with a general processor interface, a 10M/100M PHY and 4K Dword SRAM [21]. The DE2-70 has a DM9000A IC on the board as the Ethernet solution. As with the WM8731, a WISHBONE interface is needed in the FPGA to drive the DM9000A chip.

In the thesis project, my partner Lin Zuo was responsible for the DM9000A interface. He has documented this part comprehensively in Chapter 4.5 of his thesis [1]. Please refer to Lin’s thesis for more information.

6.7 Summary

In Chapter 6, we have introduced the memory blocks and the peripherals. They are important components of the OpenRISC reference platform. The table below gives an overview of all IP cores.

Name                          Section   Category   License
On-chip RAM Interface         6.1       Memory     DBU¹
ALTERA 1-Port RAM IP Core     6.1       Memory     Commercial
Memory Controller IP Core     6.2       Memory     BSD-like
UART16550 IP Core             6.3       UART       LGPL
GPIO IP Core                  6.4       IO         LGPL
WM8731 Interface              6.5       Audio      DBU
DM9000A Interface             6.6       Ethernet   DBU

Table 6.6: List of memory blocks and peripherals

¹ Designed By Us

The Memory Controller, UART16550 and GPIO IP cores are open cores from opencores.org. They impressed us with their high quality source code and well documented user manuals. We showed that they work well in an FPGA system. The open cores come with either the less restrictive BSD license or the LGPL, which does not cause trouble if the IP cores are used for commercial purposes. We believe the IP cores are good options for future projects.

References:


[2] User manual, RAM Megafunction User Guide, ALTERA Corporation, December 2008.

[3] Specification, Specification for the: WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, OpenCores Organization, Revision: B.3, Released: September 7, 2002.

[4] User Manual, OpenRISC 1000 Architecture Manual, OpenCores Organization, Revision 1.3, April 5, 2006.

[5] Webpage, Memory Controller IP Core, OpenCores Organization, http://www.opencores.org/project,mem_ctrl, Last visit: 2011.01.31.

[6] Rudolf Usselmann, Memory Controller IP Core, Rev 1.7, January 21, 2002.

[7] Website, ASICS.ws, http://asics.ws/, Last visit: 2011.01.31. This is the website of a specialized FPGA/ASIC design team. They produced many free IP cores with very good quality and documents.

[8] Datasheet, 256K x 72, 512K x 36, 1024K x 18 18Mb Synchronous Pipelined, Single Cycle Deselect Static RAM, Rev. I, Integrated Silicon Solution, Inc. (ISSI), October 14, 2008.

[9] Datasheet, 32Meg x 8, 16Meg x16 256-MBit Synchronous DRAM, Rev. D, Integrated Silicon Solution, Inc. (ISSI), July 28, 2008.

[10] User manual, Nios II Processor Reference Handbook, ALTERA Corporation, November 2009, Available at: http://www.altera.com/literature/hb/nios2/n2cpu_nii5v1.pdf, Last visit: 2011.01.31.

[11] Webpage, UART 16550 Core, OpenCores Organization, http://www.opencores.org/project,uart16550, Last visit: 2011.01.31.

[12] Webpage, PC16550D Universal Asynchronous Receiver/Transmitter with FIFO’s, National Semiconductor Corporation, http://www.national.com/mpf/PC/PC16550D.html, Last visit: 2011.01.31.

[13] Datasheet, PC16550D Universal Asynchronous Receiver/Transmitter with FIFO’s, National Semiconductor Corporation, June 1995, Available at: http://www.national.com/ds/PC/PC16550D.pdf, Last visit: 2011.01.31.

[14] Webpage, 16550 UART, from Wikipedia, http://en.wikipedia.org/wiki/16550_UART, Last visit: 2011.01.31.

[15] Jacob Gorban, UART IP Core Specification, Rev 0.6, August 11, 2002.

[16] Webpage, General-Purpose I/O (GPIO) Core, OpenCores Organization, http://www.opencores.org/project,gpio, Last visit: 2011.01.31.

[17] Damjan Lampret, Goran Djakovic, GPIO IP Core Specification, Rev 1.1, December 17, 2003.

[18] Datasheet, WM8731/WM8731L Portable Internet Audio CODEC with Headphone Driver and Programmable Sample Rates, Wolfson Microelectronics, Rev 4.7, August 2008.

[19] Webpage, I2C controller core, OpenCores Organization, http://www.opencores.org/project,i2c, Last visit: 2011.01.31.

[20] Webpage, I2C Licensing Information, NXP Semiconductors, http://www.nxp.com/products/interface_control/i2c/licensing/, Last visit: 2011.01.31.

[21] Datasheet, DM9000A Ethernet Controller with General Processor Interface, Final, DAVICOM Semiconductor, Inc., Version: DM9000A-DS-F01, May 10, 2006.

Chapter 7

Conclusion and Future Work

This final chapter concludes the thesis. Some interesting topics that we did not have enough time to try out are collected as future work.

The thesis project started in January 2008, and most of the implementation was done in about 6 months. For personal reasons, however, the writing of the thesis was not finished until January 2011. An extra section is therefore added to cover the technology updates of the last 2 years.

7.1 Conclusions

The goal of the thesis is to implement an open core based computing platform on a DE2-70 FPGA board. First, a summary of the tasks that we achieved is given. Readers can compare them with the tasks listed in Chapter 1 Section 2.

1. Studied open source licenses including the GPL, the LGPL and the BSD license; analyzed the influence of the licenses on the project, for both academic and commercial usage.

2. Created a digital system on the DE2-70 board using ALTERA’s tools and techniques, such as Quartus II and the on-chip RAM IP core.

3. Utilized and integrated 5 open cores in the digital system: OR1200 processor, CONMAX, Memory Controller, UART16550, and GPIO.

4. Studied the WISHBONE interconnection protocol and added some explanations of the WISHBONE bus transactions to the thesis.

5. Learnt to use the OpenRISC toolchain for software development, including GCC, GNU Binutils, GDB, Makefile, and the OpenRISC simulator.

6. Designed 2 software tools, ihex2mif and proloader, to help download user programs to the DE2-70 board. The ihex2mif converts HEX files to the ALTERA MIF format. The proloader acts as a bootloader that loads programs from a PC over a serial connection.

7. Started to create a Hardware Abstraction Layer (HAL) library that collects the hardware interface functions for easier software development at a higher level.

8. Understood how to support context switches with the OpenRISC CPU and ported the uC/OS-II RTOS to the OR1200.

9. Extended the computing platform with Audio and Ethernet features.

10. Developed an MP3 player application to demonstrate the whole system.

11. My partner Lin Zuo did some more tasks, like porting the uC/TCP-IP stack and comparing the performance with an equivalent system built with the ALTERA IP cores. For these parts please refer to his thesis [1].

Chapter 1 also mentioned 4 initial purposes of the thesis. Here they are repeated:

1. Evaluate quality, difficulty of use and the feasibility of open source IPs

2. Design the system in an FPGA and also evaluate the system performance


4. Port embedded Linux to the system

Of these, only the port of embedded Linux could not be completed, due to the time limitation. The evaluation of the system performance is described in my partner’s thesis [1].

An important motivation for this thesis was to evaluate the quality, difficulty and feasibility of the open cores. The way of the evaluation was to create a computing platform.

5 open cores were used in the platform, and once again they were verified to work. So we can conclude that it is feasible to use the open cores.

Regarding quality, it can vary from one open core to another because they are made by different designers and teams. For the 5 open cores involved in the thesis project, we think they are of good quality. After the system was built, they worked stably. This is enough to show that the 5 open cores are good enough for academic and research purposes. For commercial usage, more professional verification might still be needed.

Compared to commercial IP cores, open cores generally lack documentation and reliable technical support. How difficult it is to study an open core and integrate it into a new system therefore depends on the qualifications of the design team.

Another motivation of the thesis was to investigate the impact of the open source licenses. In Chapter 2, we introduced 3 widely used licenses: the GPL, the LGPL and the BSD license, and discussed the influence of the licenses.

For commercial usage, we conclude that it is not a problem to use open cores covered by the LGPL and the BSD license, but the requirements of the licenses must be met. For example, for the LGPL a copy of the license text and the source code of the IP core need to be distributed with the products. When it comes to the GPL, we suggest that users think carefully, because the GPL will force opening the design details of the other parts of the system.

For academic usage, all 3 licenses are possible, provided that it is not an issue to open the design details in the case of the GPL.

One definite conclusion we made for the thesis is that everyone should keep an eye on the open cores. The open core community might grow slowly because the investments are lower compared to the commercial world, but it will never walk backwards. The existing open cores will become better and better, and new open cores will appear sooner or later. If more people join in and each gives even a small contribution, the community will grow much faster. Meanwhile, people will be able to find the proper IP core right away when it is needed.


7.2 Future Works

Due to the time limitation, there were some tasks we wanted to do better but could not. Those tasks are described as future work in this section.

7.2.1 Improve and Optimize the Existing System

The open core computing platform we contributed is able to work, but it is far from perfect. Many improvements are possible to increase the system performance and to make it more stable and easier to use.

On the hardware side, there is still much room to improve the CPU efficiency by reducing the CPU waiting time. As already analyzed in Section 4.2 and Section 6.2.4, if we enable the OR1200 cache and MMU, separate the instruction and data memories, use burst transactions on the WISHBONE bus, and increase the throughput between the Memory Controller and the external memory devices, more instructions can be fetched in a given time. This improves the MIPS directly because the CPU does not have to stay idle while waiting for new instructions. Increasing the CPU frequency also improves the performance. For now the OR1200 has a system clock of 50MHz. With a higher frequency, Quartus gave warnings about the internal timing. We believe the CPU clock frequency can still go higher if some optimizations are made at the FPGA level.
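The relation between waiting time and MIPS can be illustrated with a small calculation. The sketch below is only a rough model: it assumes every instruction costs a fixed base number of cycles plus an average number of memory wait cycles. The function name and the wait-cycle values are made up for illustration and are not measurements from our platform; only the 50MHz clock comes from the text.

```python
def effective_mips(clock_hz: float, base_cpi: float, wait_cycles: float) -> float:
    """Millions of instructions per second when each instruction takes
    base_cpi cycles plus wait_cycles cycles spent waiting for memory."""
    cycles_per_instruction = base_cpi + wait_cycles
    return clock_hz / cycles_per_instruction / 1e6


# 50 MHz is the system clock stated in the text; the wait-cycle
# counts below are assumed values chosen only to show the trend.
CLOCK_HZ = 50e6
for wait in (0, 1, 4, 9):
    mips = effective_mips(CLOCK_HZ, base_cpi=1.0, wait_cycles=wait)
    print(f"{wait} wait cycles per instruction: {mips:.1f} MIPS")
```

In this model, halving the wait cycles helps more than a modest increase in clock frequency, which is why the cache, burst transfers and memory throughput are listed first.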

On the software side, more effort can be invested in the OpenRISC toolchain to increase the productivity of software development. For example: update to the latest toolchain; combine it with a front-end IDE like Eclipse [2]; build a JTAG connection and a debugger to download user programs more easily and to develop more complicated applications. There are also other interesting topics, such as porting the Linux operating system and supporting more library functions.

Besides, testing of the system is always welcome, since it detects existing bugs and makes the platform more stable. Writing user manuals is another thing worth doing, as it attracts people to use and improve the system.

7.2.2 Extension and Research Topics

New features can be added based on the requirements of the applications, for example new open cores to support a VGA/LCD display, a keyboard and a mouse.

For academic research, we believe the WISHBONE interconnection is a good starting point, as mentioned in Section 5.1. Some interesting topics are: how to improve the bus throughput; how to adapt the WISHBONE to multi-processor systems; how to bridge the WISHBONE with other bus standards like PCI or ALTERA’s Avalon.

7.3 What’s New Since 2008

For personal reasons, the thesis writing was not finished until January 2011. This section lists the latest news on the open core technologies since 2008:

• Since November 2007, the Swedish company ORSoC [3] has taken over the maintenance of the OpenCores Organization and the OpenRISC project. Thanks to them the open core community grows steadily. Some new features are provided by opencores.org, such as a monthly newsletter, an online shop, an SVN repository, and even a translation of the webpages into Chinese.

• For the OpenRISC OR1200 hardware, there were 2 big updates according to the project news webpage [4]:

On August 30th, 2010, “Big OR1200 update. Addition of verilog FPU, adapted from fpu100 and fpu projects, data cache now has choice of write-back or write-through modes.”

On January 19th, 2011, “OR1200 update, increasing cache configurability, improving Wishbone behavior, adding optional serial integer multiply and divide.”

• The OpenRISC toolchain [5] has greatly improved. The latest toolchain includes GCC-4.2.2 with uClibc-0.9.29, GDB-6.8 and or1ksim-0.3.0. A precompiled toolchain package for Cygwin is also available.

• Some OpenRISC documents have been updated or created [6–8].

• The other 4 open cores, i.e. CONMAX, Memory Controller, UART16550 and GPIO, have not changed since 2008.

• A new WISHBONE standard, Revision B.4 [9], has been released. This new version supports a pipelined traffic mode.

• Micrium published a new RTOS kernel, uC/OS-III, but its source code is no longer open to academic users.

The uC/OS-II remains the same as before, but the example of porting the uC/OS-II to the OpenRISC was removed from the website. Luckily, another porting example has now been added to the SVN of the OpenRISC project at opencores.org [4].

The source code of the uC/TCP-IP was also removed from the Micrium website.

• An enhanced DE2-115 board [10] with a Cyclone IV FPGA and more memory is available from Terasic [11]. Again a lower price is offered for academic users.

References:


[2] Website, Eclipse, http://www.eclipse.org/, Last visit: 2011.01.31.


[4] Webpage, OpenRISC News, from OpenCores Organization, http://opencores.org/openrisc,news, Last visit: 2011.01.31.

[5] Webpage, GNU Toolchain for OpenRISC, from OpenCores Organization, http://opencores.org/openrisc,gnu_toolchain, Last visit: 2011.01.31.

[6] Damjan Lampret, OpenRISC 1200 IP Core Specification, Revision 0.10, November, 2010.

[7] Jeremy Bennett, Julius Baxter, OpenRISC Supplementary Programmer’s Reference Manual, Revision 0.2.1, November 23, 2010.

[8] Jeremy Bennett, Or1ksim User Guide, Issue 1 for Or1ksim 0.4.0, June, 2010.

[9] Specification, WISHBONE B4: WISHBONE System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores, OpenCores Organization, Revision: B.4, Pre-Released: June 22, 2010.

[10] Webpage, Altera DE2-115 Development and Education Board, from Terasic Technologies, http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=502, Last visit: 2011.01.31.







Appendix A

Thesis Announcement

This is the thesis announcement written by our industry supervisor Johan Jorgensen.

A.1 Building a reconfigurable SoC using open source IP

The goal of this thesis is to evaluate the quality of open source IP blocks and their suitability for use in commercially available embedded systems. The project aims at building a low-cost SoC in an FPGA through the exclusive use of Open source IP. The purpose of the thesis is to investigate the following:

1. Evaluate quality, difficulty of use and the feasibility of open source IP

2. Take the design through synthesis and place & route in order to evaluate the highest possible clock frequency that can be achieved when running a system in a low cost FPGA. This phase should also include FPGA utilization (# LUTs required). Apart from these two metrics, the students should identify other metrics that can be used for performance evaluation and also define and measure a quality-metric.


4. Test and run the system on an evaluation platform using embedded Linux.

We expect the thesis to be carried out by two students at our office in Malmo. We further expect you to be an SoC major.

As a master thesis student at ENEA you will be offered a great deal of flexibility and freedom. We expect you to be self motivated and able to work independently.

A.2 Further information

For further information regarding this thesis project please contact Johan Jorgensen

Appendix B

A Step-by-Step Instruction to Repeat the Thesis Project

This is a step-by-step instruction that shows how to repeat the thesis project. For people who are interested, following it should make it easier to reproduce the same MP3 player as we did.

B.1 Hardware / Software Developing Environment

Below is a list of the hardware devices and software tools used for the thesis project. If you can manage to get the same development environment, it will be helpful for repeating the project.

• DE2–70 Board

• A RS232-to-USB Cable

• An Ethernet Cable

• A Speaker or an Earphone

• A PC

• Cygwin

• OpenRISC Toolchain

• Quartus II

• Our Thesis Archive File


B.1.1 DE2-70 Board

The most important hardware needed for the thesis project is a DE2–70 FPGA board. The DE2–70 is a Development and Education board based on ALTERA’s Cyclone II FPGA (EP2C70). It is produced by Terasic. Information about the DE2–70 can be found on their website [1]. The board costs 599 USD, or 329 USD for academic users, which is not cheap, but luckily many universities already have lectures or labs with the board. So if you are a student, try to borrow one from your professor or laboratory, just like we did for the thesis.

Some people might have Terasic’s DE2 board instead. It is possible to port the thesis project from the DE2–70 to the DE2, because the 2 boards are very similar. The main difference between the DE2 and the DE2–70 is that the DE2 uses the Cyclone II EP2C35 as the FPGA, which has fewer on-chip resources than the EP2C70. Besides, the DE2 has only 512KB SSRAM and 8MB SDRAM, while the DE2–70 has 2MB SSRAM and 64MB SDRAM. But the resources on the DE2 are already enough for the thesis project.

Some other people may have different FPGA boards, like Terasic’s DE1 or maybe even a Xilinx FPGA board. In these cases, I will have to say “I wish I could help, but . . . ” The main reason is that those boards probably do not have the audio CODEC WM8731 to play music or the DM9000A to support an Ethernet connection. This makes porting the project too difficult or even impossible.

B.1.2 A RS232-to-USB Cable

A RS232-to-USB cable sets up a UART connection between a PC and the DE2–70 board, so it is possible to communicate with the OpenRISC processor using serial terminal software¹. Most PCs do not have RS232 ports nowadays, which is why we need a RS232-to-USB cable. The cable is easy to find. For example, just go to www.amazon.com and search for “RS232” + “USB”.

Another important reason to have a UART connection is the bootloader. Because we do not have a programmer or debugger, a bootloader was designed to download the program data to the DE2–70’s SSRAM/SDRAM².

¹ We enclosed a terminal software tool in the thesis project zip file under the folder /tools/uart_terminal/.

² Terasic’s “control panel” is another option, but it is not as handy as our bootloader.


B.1.3 An Ethernet Cable

An Ethernet cable [2] connects the DE2–70 to the PC’s network adapter, so that UDP connections can be created to transfer music data.

B.1.4 A Speaker or an Earphone

To hear the music, an earphone or a speaker is needed.

B.1.5 A PC

Of course, we need a PC. Our PC for the thesis project has 32-bit Windows XP SP3 installed. I guess Windows Vista should work too.

B.1.6 Cygwin

Cygwin is a Linux-like environment for Windows [3]. It provides a good Linux-like environment and is small compared to virtual machine software (e.g. VMware). The reason we need a Linux environment is the OpenRISC toolchain. The OpenRISC toolchain is derived from the GNU toolchain, including GCC, GNU Binutils, GDB etc. To compile source code into binary files for the OpenRISC processor we have to use these tools, and they are not so friendly to Windows.

Then why not just use a native Linux PC? Hmmm, this is a good question. Actually we tried CentOS at the beginning, but I gave up soon. The official excuse is that we need ALTERA’s Quartus for the FPGA project, and we had some unenjoyable experience with the Quartus Linux edition. But to be honest, the real reason is that I do not know Linux well enough and usually got stuck on some very basic operations. So finally I switched to Windows / Cygwin, where I can be more productive. For Linux pros, the OS should not be a problem. Feel free to try out the project on a Linux PC, but remember to recompile everything that we compiled in Cygwin.

Cygwin is good, but its installation is somewhat complicated, because it asks the user to choose the components to be installed from a long list. The user has to remember these components every time Cygwin is reinstalled or an exact Cygwin environment is copied to another PC.

To fix this trouble, I spent some time looking for help and now there is a solution. The file “installed.db” under the folder /cygwin/etc/setup/ is actually a list of the names of the packages that have been installed. If we back up the installed.db and store it in the same path, the system will display those packages as installed when “setup.exe” is run. Then we can simply choose to reinstall all these packages, which creates a Cygwin environment exactly as the installed.db file specifies. See this webpage for more information [4]. My installed.db is included in the thesis archive.
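To check which packages a backed-up installed.db would restore, the file can be read with a few lines of script. The sketch below assumes the usual layout of the file: a header line starting with “INSTALLED.DB”, followed by one line per package whose first field is the package name. This layout is an assumption and should be checked against your own file; the function name is made up for illustration.

```python
from typing import List


def parse_installed_db(text: str) -> List[str]:
    """Return the package names listed in the text of an installed.db file.

    Assumed layout: a header line "INSTALLED.DB <version>", then one line
    per package of the form "<name> <archive-file> <flag>".
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "INSTALLED.DB":
            continue  # skip blank lines and the header
        names.append(fields[0])
    return names


# Small made-up sample in the assumed layout.
sample = "INSTALLED.DB 2\nbash bash-3.2.49-23.tar.bz2 0\nmake make-3.81-2.tar.bz2 0\n"
print(parse_installed_db(sample))
```

For a real file, read it with open("/etc/setup/installed.db").read() inside Cygwin and pass the text to the function.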

B.1.7 OpenRISC Toolchain

The OpenRISC Toolchain converts a high level programming language, like C, into binary instructions for the OpenRISC processor.

When I was a beginner, the most difficult part of the thesis project was to get a working toolchain. Most commercial CPUs, for example ALTERA’s NIOS, already provide an IDE that includes everything. With just several clicks, all the compiling, downloading and debugging is done. But in the OpenRISC world without IDEs, we have to experience all these difficulties in person.

The OpenRISC toolchain is a modification of the GNU toolchain, which is free software under the GPL. The OpenRISC toolchain developers performed their duties well, as the GPL asks. The source code on the SVN of opencores.org can be downloaded freely. There are also instructions that guide the user through setting up the tools. For a Linux pro, this should not be a problem.

In the past, compiling from the source code used to be the main way to get the toolchain. Unfortunately I am not a Linux pro, and I tried but failed to compile a working toolchain under Cygwin. The toolchain I built myself never worked properly or efficiently.

Luckily, the OpenRISC team now provides a pre-compiled toolchain package for Cygwin. Just unzip the package to the system path, and all the tools for OpenRISC software development will be ready to use. This toolchain package is available from the opencores.org website [5]. An old version that we used during the thesis is also included in the thesis zip file.

B.1.8 Quartus II

Quartus II is the FPGA development software designed by ALTERA. Because the DE2–70 board uses ALTERA’s FPGA, Quartus cannot be replaced.

We used Quartus II 8.0sp1 Web Edition for the thesis, with the web license acquired freely from www.altera.com. There is no need to pay for the license.

B.1.9 The Thesis Archive File

At last, don’t forget to get the project archive file. This file can be downloaded from my Blog [6]. It includes both the hardware and software projects, as well as some tools and other stuff. The file will be downloadable until the end of year 2011, but there is no guarantee after that.

B.2 Step-by-Step Instructions

Now let’s start the step by step instructions. Basically there are 3 big steps:

1. Review the Quartus project and program the FPGA on the DE2–70

2. Download the software project to the DE2–70 with the bootloader

3. Run the software to send music data and play on the board

B.2.1 Quartus Project and Program FPGA

1.1 Start Quartus II and open the Quartus project in the /hardware folder.

1.2 The top level entity is in /hardware/component/top/orpXL_top.vhd. The entity includes all the pins allocated on the EP2C70. Figure B.1 shows the file.

Figure B.1: Top level entity of the project

1.3 As we can see, all modules of the project are saved in different folders under /hardware. One module needs to be mentioned a little more.

The /ram folder contains an on-chip RAM module. It is configured as 64KB. The /ram/ram0.mif in the same folder is the data file that will be written into the RAM when the FPGA is programmed. In our thesis zip file, this ram0.mif contains the data of the bootloader. So if you do not change this file, the bootloader will be downloaded to the board at the same time as the FPGA is programmed.

It is possible to replace this ram0.mif with something else. For example, you may write your own program and convert it to the MIF format with our ihex2mif tool under the folder /tools/ihex2mif. In this way we can run any program as long as it is smaller than 64KB. However, if the program is bigger than this limit, the bootloader is needed anyway to load the program into the external 2MB SSRAM, which is large enough for many embedded programs.

If the default ram0.mif is modified and you want the bootloader back, rename the file /tools/program_loader/server_openrisc/proloader_server.mif to ram0.mif and copy it back to the /ram folder.

P.S. When only the MIF file is updated without other hardware modifications, recompiling the whole FPGA project is not necessary. Just choose to update the MIF and run the assembler to generate a new SOF file again: the shortcut in Quartus II is Alt+R+U then Alt+R+A+A.

1.4 Now it is time to set up the DE2–70 and get all cables connected: the power cable, the USB cable for FPGA programming, the USB-to-Serial cable and the Ethernet cable.

Don’t connect the speaker to the board for now. Because the DE2–70’s built-in default demonstration program plays a high frequency sine wave at power on, there will be noise if the speaker is connected. Switch 17 of the board can be used to switch off the sine wave. To switch it off we need to turn the switch up. By saying “up” I mean pushing the switch closer to the LEDR17.

1.5 Now the board is ready. Please program the FPGA with the orpXL_top.sof file in Quartus, as shown in Figure B.2.

Figure B.2: Program ALTERA FPGA

1.6 By programming the FPGA, the OpenRISC hardware platform is downloaded to the board, and the bootloader is placed into the on-chip RAM. Meanwhile the default DE2–70 demonstration project is overwritten, so the sine wave noise is not there anymore. From now on don’t be afraid to connect the speaker.

1.7 In the project, we used Switch 17 as the reset key of the hardware system. When Switch 17 is pulled down, the reset is on. When Switch 17 is pushed up, the system starts working.

After the FPGA is programmed, please reset the hardware system by putting Switch 17 down and then up. After that the bootloader starts running. It stays in an endless loop, waiting while reading the RS232 port.
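For readers who want to see what the ram0.mif file of step 1.3 looks like, the following sketch writes a program image as an ALTERA MIF file. It is not the source of our ihex2mif tool, only an illustration. It assumes that the 64KB on-chip RAM is organized as 16384 words of 32 bits (the word width is an assumption) and packs the bytes most significant byte first, since the OpenRISC is big-endian. The function name is made up.

```python
def bytes_to_mif(image: bytes, depth: int = 16384, width: int = 32) -> str:
    """Format a raw program image as the text of an ALTERA MIF file.

    depth is the number of RAM words and width the word size in bits;
    16384 x 32 bits = 64KB is an assumed organization of the on-chip RAM.
    """
    bytes_per_word = width // 8
    if len(image) > depth * bytes_per_word:
        raise ValueError("image does not fit into the RAM")
    # Pad the image with zeros up to a whole number of words.
    padded = image + b"\x00" * (-len(image) % bytes_per_word)
    used = len(padded) // bytes_per_word

    lines = [
        f"DEPTH = {depth};",
        f"WIDTH = {width};",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT",
        "BEGIN",
    ]
    for addr in range(used):
        word = padded[addr * bytes_per_word:(addr + 1) * bytes_per_word]
        lines.append(f"{addr:04X} : {word.hex().upper()};")
    if used < depth:
        # Fill the rest of the RAM with zeros using a MIF address range.
        lines.append(f"[{used:04X}..{depth - 1:04X}] : {'0' * (bytes_per_word * 2)};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


# Tiny made-up image in a 4-word RAM, just to show the layout.
print(bytes_to_mif(bytes([0x18, 0x00, 0x00, 0x00, 0xA8]), depth=4))
```

The size check mirrors the 64KB limit mentioned in step 1.3: a larger program has to go through the bootloader into the external SSRAM instead.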

B.2.2 Download Software Project by Bootloader

2.1 Start Cygwin, and enter the folder of the software project, i.e. /software. Figure B.3 shows the folder structure.

Figure B.3: Software project

2.2 The /software/build is the folder in which the software project is compiled.

To avoid typing all the commands every time, a makefile script was written for the “Make” tool. Please double check in the makefile that the OpenRISC toolchain is placed under the correct path; otherwise the commands will not work. The makefile script is shown in Figure B.4.

2.3 Now let’s recompile the software project. This is not necessary but interesting to try.

In the /build folder, run the command “make all clean”, as shown in Figure B.5. You will get a myPrj.ihex at the end of the compilation. It is an Intel HEX format file, which will be downloaded to the board by the bootloader. You will also get a myPrj.dis. This is a disassembly file that shows all instructions of the project. It is very helpful to check the disassembly file to understand what your software is actually doing.

2.4 Now let’s start the bootloader and download the software to the DE2–70 board.

The bootloader consists of 2 parts: a server that is already running on the OpenRISC processor in the FPGA, and a client that is started now to send the HEX data file from the PC.

The executable /build/proloader_client.exe is the bootloader client that we are talking about. It was compiled by Cygwin-GCC and thus can only run in Cygwin. Run the following command under the /build folder: ./proloader_client.exe -d /dev/com5 -f myPrj.ihex -p

Figure B.4: Makefile script

Figure B.5: Build software project

The parameter “-d” specifies the RS232 port. Please check in the Windows Device Manager which port is allocated, as shown in Figure B.6. In Windows the RS232 port is in the format of “/dev/comX”, where X is the port number. In Linux, the format is usually like “/dev/ttySX”.

Figure B.6: Check USB-to-serial port ID

The parameter “-f” specifies the path of the HEX file.

The “-p” tells the bootloader to display the data being transferred in the Cygwin window. The reason to do this is that the loading time is quite long: it takes about 5 minutes to finish downloading the HEX file. With “-p” the output confirms that the system is still running. The drawback of “-p” is that the large amount of data floods the screen.

After downloading the HEX file, it will look like Figure B.7.

Usually the bootloader works fine without any problem, but I have had the bad experience that occasionally the bootloader gets stuck and then Windows XP gives a blue screen. I am not sure about the reason for the problem. Probably it is because some driver crashes.

Figure B.7: Downloading and flushing finished
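The myPrj.ihex file sent by the bootloader client is a plain text file in the Intel HEX format. The sketch below shows how such a file can be read and checked on the PC side. It is not the source of proloader_client.exe, and the serial protocol of our bootloader is not shown; it only illustrates the record layout (byte count, 16-bit address, record type, data, checksum). The function name is made up.

```python
from typing import Dict


def parse_ihex(text: str) -> Dict[int, int]:
    """Parse Intel HEX text into a mapping of absolute address -> byte value."""
    memory: Dict[int, int] = {}
    base = 0  # upper address bits set by record types 02 and 04
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: missing ':' start code")
        raw = bytes.fromhex(line[1:])
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"line {lineno}: wrong record length")
        # All bytes of a record, checksum included, must sum to 0 modulo 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"line {lineno}: bad checksum")
        count = raw[0]
        addr = int.from_bytes(raw[1:3], "big")
        rtype = raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:    # data record
            for i, value in enumerate(data):
                memory[base + addr + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:  # extended linear address
            base = int.from_bytes(data, "big") << 16
        # Start address records (03, 05) are ignored in this sketch.
    return memory


example = ":020000040800F2\n:0300300002337A1E\n:00000001FF\n"
for address, value in sorted(parse_ihex(example).items()):
    print(f"{address:08X}: {value:02X}")
```

Checking the checksum of every record before sending is cheap on the PC and avoids spending the long loading time on a damaged file.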

B.2.3 Download Music Data and Play

3.1 Now the software project has been downloaded into the SSRAM of the DE2–70. Before starting the program, there are some configurations to do.

The first thing is to edit the IP address of the PC’s network adapter. Please set it to 192.168.0.3, the IP mask to 255.255.255.0 and the default gateway to 192.168.0.1, as shown in Figure B.8. By the way, the IP address of the DE2–70 board is set to 192.168.0.2, which is where the UDP packets will be sent. The music player program running on the DE2–70 uses these numbers as the IP addresses.

Figure B.8: IP configuration

3.2 After that, please disable all other network adapters on your PC if there is more than one. For example, my laptop has 2 network cards. One is wireless and the other is a normal 100/1000Mbps network adapter. In this case, please disable the wireless network card.

3.3 Meanwhile, please close all other software that may send TCP/IP packets to the Internet, like IE, MSN, and anti-virus software that might upgrade itself automatically.

The steps 3.1–3.3 make sure that only one program (our music player) sends UDP packets to the only target address (the DE2–70 board). The reason is that the thesis application is not reliable enough to handle all kinds of packets. If another program broadcasts a UDP packet while we are downloading the music file, it will ruin the data communication.

3.4 Now start the music player that we just downloaded through the bootloader. It is a similar command but with other parameters: ./proloader_client.exe -d /dev/com5 -r

The “-d” specifies the RS232 port. The “-r” here tells the bootloader to jump to the entry point in the external SSRAM, so the program stored there starts working and the job of the bootloader is done.

If the program starts without any problem, you will see that the LAN is connected, as shown in Figures B.9 and B.10.

Figure B.9: Start software project via bootloader

Figure B.10: Ethernet connected

3.5 Now I want to spend some words on explaining the reset switch and the bootloader.

In the project, Switch 17 is used to reset the system. When a reset is needed, push Switch 17 first down and then up.

Because the default starting address points to the on-chip RAM, it is always the bootloader that starts after a reset. To reset the software application, run the command from step 3.4 again to let the bootloader jump to the external RAM. As long as the power of the DE2–70 board is not turned off, the data stored in the FPGA and the external SSRAM remain valid. So there is no need to reprogram the FPGA and download the software project on every reset.

3.6 We used 4 7-segment LEDs in the project. Each 7-segment display plus the digit can display 8 bits, so they are organized to show the values of 2 16-bit counters. On the DE2–70 board, HEX0 and HEX1 form the first counter. Its value shows how many valid UDP packets have been received. HEX2 and HEX3 form the other counter, which shows the number of invalid UDP packets received.

3.7 The /software/build/WINDOWS.mp3 is the MP3 file used as the demo in our project. It is a very small MP3 file which lasts only 7 seconds.

First convert the MP3 file to WAV format. This is done by another application, player.exe, a client running on the PC that decodes the MP3 file into WAV format with LibMAD and then sends the music data to the DE2–70. For LibMAD, check their website for more information [7].

The command is: ./player.exe -m WINDOWS.mp3. The “-m” parameter specifies the path of the MP3 file. This is shown in Figure B.11.

Figure B.11: Decode MP3 file

3.8 After that, a “temp.wav” will be generated in the /build folder. It is 1.28MB while the MP3 is only 123KB. This is the file that is going to be split up into UDP packets and transferred to the board.

Why not send the MP3 file directly? In that case we would need LibMAD running on the DE2–70. This is not so difficult because LibMAD is written in ANSI C, but it uses several C standard library functions like malloc(), which we had no time to make work on our hardware platform.

3.9 Download the temp.wav into the DE2–70 board. It will be placed in the 64MB SDRAM.

The command is: ./player.exe -w temp.wav -d. The “-w” specifies the path of the WAV file. The “-d” tells the system to download. After this command is executed, the counter on the DE2–70, i.e. the 7-segment HEX1 and HEX0, should start counting.

In theory, we can download any WAV file smaller than 64MB, but you probably won’t because the downloading speed is too slow. We can achieve only 3KB stable UDP speed with the system. This may be enough to open a webpage but is not sufficient for music files.

The low speed has multiple reasons: first, the CPU is running at only 50MHz, and more importantly the platform, both the hardware and the software, is not working efficiently enough. There are lots of optimizations we could do but just had no time for.

3.10 When the WAV file has been downloaded successfully, it looks like Figure B.12. But the downloading process could go wrong if it is interfered with by other software on the PC. In case HEX0 and HEX1 stop changing their value but the “Downloading finished” message does not show up, we need to restart the system and try again. If unfortunately this step always fails, a software tool like Wireshark [8] might be needed to monitor the TCP/IP packets and check what exactly is wrong.

3.11 Before playing the music, please lower the volume with the command: ./player -v. The software will ask for a value between 0–80. A value between 30 and 60 would be fine, and here we just fill in 40. The default value is set to 80, which is a mistake. Too loud a sound will hurt your ears if you are wearing an earphone. Also, the WM8731 sometimes does not work properly when it is set to 80.

Figure B.12: Set volume and play the music

3.12 Finally type the “play” command: ./player.exe -p. If everything is fine, you should hear the music. Cheers!
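The transfer of temp.wav in steps 3.8–3.9 can be pictured with the following sketch, which splits a file into UDP payloads and estimates the transfer time. It is not the source of player.exe: the real packet format is not described here, and the port number 5000, the payload size of 1024 bytes and the function names are assumptions made for illustration. The estimate further assumes that the 3KB speed mentioned in step 3.9 means 3KB per second. Only the board address 192.168.0.2 comes from step 3.1.

```python
import socket
from typing import List


def split_into_payloads(data: bytes, payload_size: int = 1024) -> List[bytes]:
    """Cut the data into consecutive chunks of at most payload_size bytes."""
    return [data[i:i + payload_size] for i in range(0, len(data), payload_size)]


def estimate_seconds(size_bytes: int, rate_bytes_per_s: float = 3 * 1024) -> float:
    """Transfer time for a file at a constant rate (assumed 3KB per second)."""
    return size_bytes / rate_bytes_per_s


def send_file(path: str, host: str = "192.168.0.2", port: int = 5000,
              payload_size: int = 1024) -> int:
    """Send a file as plain UDP datagrams and return the number sent.

    The port and the header-less payload are hypothetical; a real client
    would have to follow the format expected by the program on the board.
    """
    with open(path, "rb") as f:
        payloads = split_into_payloads(f.read(), payload_size)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for payload in payloads:
            sock.sendto(payload, (host, port))
    return len(payloads)


# Estimate only; nothing is sent here.
wav_size = int(1.28 * 1024 * 1024)  # size of temp.wav given in step 3.8
print(f"about {estimate_seconds(wav_size) / 60:.1f} minutes at 3KB per second")
```

Under these assumptions the 1.28MB demo file already needs several minutes, which is why larger WAV files are impractical with the current platform.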

References:




[2] Webpage, Category 5 cable, from Wikipedia, http://en.wikipedia.org/wiki/Category_5_cable, Last visit: 2011.01.31.

[3] Website, Cygwin, http://www.cygwin.com/, Last visit: 2011.01.31.

[4] Webpage, A discussion on how to duplicate Cygwin environment, http://cygwin.com/ml/cygwin/2008-04/msg00100.html, Last visit: 2011.01.31.

[5] Webpage, GNU Toolchain for OpenRISC, http://opencores.org/openrisc,gnu_toolchain, Last visit: 2011.01.31.


[7] Website, underbit technologies, http://www.underbit.com/products/mad/, Last visit: 2011.01.31. This is where to find the information of the LibMAD.

[8] Website, WireShark, http://www.wireshark.org/, Last visit: 2011.01.31. Wireshark is the world’s foremost network protocol analyzer.

