Reverse Engineering and the Computing Profession
Cristina Cifuentes, Sun Microsystems Laboratories

THE PROFESSION (Computer)

Recent actions against reverse engineering threaten to remove this valuable practice from the computing profession’s toolkit.

Reverse engineering (the process of analyzing an existing system to identify its components and interrelationships so that we can create a representation of the system at a higher abstraction level) raises complicated legal, ethical, and technical issues for our profession. Unfortunately, many people first heard of reverse engineering in the context of piracy: Programmers would disassemble a third-party software program’s object code, then copy those instructions into their own software, catching a free ride on someone else’s effort.

Yet how realistic is this scenario today, when programs sprawl across hundreds of megabytes? If we disassembled Microsoft Word, for example, we would get thousands or millions of instructions and endless reams of data. Which of those millions of instructions would we want to copy? And how do we guarantee that the code integrates well into our program? It may often be simpler to study the program’s behavior and write suitable code to implement it.

SOFTWARE REVERSE ENGINEERING

Although nearly any manufactured product can be reverse engineered, the practice generates the most controversy when it targets software. Developers often go through a process of understanding and refining the software’s source base to write a new piece of code and debug a program, whether in their own source base or a colleague’s.

When making large changes to a system, managers evaluate existing software to determine the interrelationships between different parts of that system. When purchasing a software company, investors evaluate software to determine its worth and maintainability in the long run. When buying a database for a business need, a company evaluates the available options to determine their maintainability and scalability. Therefore, many aspects of software development and evaluation deal with some form of program comprehension and thus reverse engineering.

Practical reasons for reverse engineering include the following:

• the original programmers have long since departed,
• the developers wrote the application in an obsolete language and it must now be migrated to a newer one,
• the system lacks documentation,
• the business relies on software that no one understands,
• the company acquired the program as part of a corporate acquisition and thus lacks access to all the source code,
• a program requires adaptations or enhancements, or
• the software doesn’t function as anticipated.

These examples imply both high- and low-level reverse engineering. High-level reverse engineering refers to abstracting design, architecture, or documentation from source code. Low-level reverse engineering refers to abstracting source code, whether in assembly or high-level form, from object code or assembly code, and thus the disassembly or decompilation of that code.

In essence, reverse engineering involves program comprehension, program transformation, and information abstraction. Most available tools aid program comprehension and visualization, or parse legacy languages or dialects of existing languages, for which the language’s specification may not even exist. Researchers in the Americas and Europe have devised several such tools, which the “State of the Art” sidebar describes.

REVERSE ENGINEERING AND COPYRIGHT

A lawyer will tell you that reverse engineering consists of examining or pulling an article or piece of machinery apart to see how it works. Courts usually examine the legality of reverse engineering in the context of patents, asking if the reverse engineering of an article or machine infringes on a patent. Copyright protects literary and artistic works such as books, drawings, and musical compositions. Copyright is also the form of protection normally applied to software. However, patents have grown in popularity recently, a trend that causes complications of its own, as Neville Holmes explored in “The Evitability of Software Patents” (Computer, Mar. 2000, pp. 30-34). Therefore, for software, the
courts have had to examine the legality of reverse engineering in the context of copyright.

In the early 1980s, copyright became the default form for protecting computer programs because such programs can be read. Copyright also covers object code even though such code cannot be read per se. Many countries have enacted specific legislation that extends copyright protection to computer programs as literary works. This practice has been adopted in two recent international treaties:

• the 1993 Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS), and
• the 1996 World Intellectual Property Organization (WIPO) Copyright Treaty.

Simply stated, copyright protects a program’s owners from the unauthorized reproduction and adaptation of that program. A reproduction is the creation of an exact copy, while adaptation creates a copy that does not duplicate the program exactly, such as an altered version of the original or its translation into another programming language.

Limits and exceptions

Because the mere act of running a program in a computer creates a copy of it in memory, and this act technically infringes on the owner’s copyright, exceptions to copyright law have been established so that such acts are not considered violations. Likewise, users usually have permission to make backup copies.

The intermediate copying of a program while performing its disassembly has raised debate, however. In the seminal US case from the past decade, 1992’s Sega v. Accolade, the Ninth Circuit Court ruled that the fair use doctrine permitted the reverse engineering of Sega’s code to write compatible games. Specifically, the court ruled that fair use allowed the intermediate copying of the program into memory, and onto printouts and disks, while the reverse engineer strove to access the machine’s interface and write programs for the Sega machine. Since then, it has been widely held that the US courts permit the reverse engineering of software for interoperability purposes.

In 1991, the European Union carefully drafted software copyright legislation for adoption by member countries. Specifically, Article 6 of the 1991 EU Directive on the legal protection of computer programs provides two exceptions that permit reverse engineering of computer programs:

• to let a product interoperate with another application or platform, or
• to determine the source of a bug in some third-party code.

These exceptions apply, however, only if the copyright owner has not made the information sought by the reverse engineers available within a reasonable amount of time.

In 2000, Australian copyright legislation incorporated these exceptions to reverse engineering, then added one for computer security testing, along the lines of the one in the US Digital Millennium Copyright Act of 1998. The US Congress drafted the DMCA to deal with copyright in the digital world, enacting legislation to implement Article 11 of the WIPO Copyright Treaty. This article requires participating parties to effectively prevent circumvention of the technological measures authors use to preserve copyright. The parties must also outlaw circumvention of the technological protection measures that copyright owners use to prevent unauthorized copying.

When it became apparent that this law would prevent encryption research and the security testing of computer programs and networks, the US added the security testing exemption to its legislation. Unfortunately, the wording of this exception has caused problems for bona fide researchers in the computer security area.

Sticking points

Many in the computer security community believe that the text used to frame the security-testing exception makes it very difficult to use, given that any findings uncovered through reverse engineering cannot be made available to others. For example, if a developer analyzes virus code and determines how to circumvent its techniques, that information cannot be shared.

Even academic researchers have received legal warnings: Edward Felten’s team, which sought to publish experimental results derived from reverse engineering digital watermarks on music, was warned by the Secure Digital Music Initiative (SDMI) not to publish those results. Some believe that by acting in this way, the SDMI prohibits free speech.

More aggressively still, Sony sued Connectix last year for reverse engineering and trademark dilution of Sony’s games when Connectix made a PlayStation emulator for the Mac environment. Emulators, popular and widely used since the 1960s, would seem safe from lawsuits 40 years later, and indeed this proved to be the case.

Citing fair use, the Ninth Circuit Court held in favor of reverse engineering software to allow the running of different software products on different hardware platforms. The court explained that Connectix’s intermediate copying of Sony’s BIOS was a fair use for the purpose of gaining access to the unprotected elements of Sony’s software. If that software had been protected by a patent, however, the result might have been different.

SOCIAL AND PROFESSIONAL ISSUES

The reverse engineering of computer software gives rise to many social and professional issues, most of which stem from the intellectual property system used to protect software. This system has generated considerable controversy, such as that which arose at a recent Santa Clara University symposium. During the one-day symposium, Donald Chisum wondered why, if we place such heavy restrictions on reverse engineering software, it’s okay to buy a TV, pull it apart, and reverse engineer it. On the other hand, Michael Lehmann questioned why we should limit the reverse engineering of software when the reverse engineering of any other literature has always been permitted.

In reality, software is not a literary work per se, and therefore has become the first technology protected by both copyright and patents. This dual IP protection scheme has created some jarring inconsistencies that have led to confusion and sometimes paralyzing litigation. The emerging legalities of reverse engineering affect the profession by threatening or appearing to threaten customers and some researchers, causing an uncertainty that can lead to less innovation in the reverse engineering community.

Computing professionals must educate the legal community about the technology itself, and they must be proactive in commenting on proposed legislation that affects software in general. The DMCA is a case in point: much of the thinking behind this act derived from the entertainment industry’s influence, not the computing community’s.

The legal community must educate the computing community in the meaning of the different intellectual property systems. It can clarify, for example, how copyright does not apply to each single byte of an object file but only protects parts of the program and certainly does not protect its ideas.

Unless we develop better communication between the legal and computing professions, we will continue to suffer the consequences of legislation such as the DMCA. By making the distribution of research results more difficult, such legislation stifles the sharing of ideas, causing an uncertainty that can lead to less innovation and fewer benefits to society.

Cristina Cifuentes is a senior staff engineer at Sun Microsystems Laboratories. Contact her at [email protected].

December 2001 167

State of the Art

American and European researchers have developed several good high-level tools for reverse engineering. Tools such as Rigi (http://www.rigi.csc.uvic.ca), PBS (http://swag.uwaterloo.ca/pbs/), and GUPRO (http://www.gupro.de/) aid in program understanding and software architecture recovery. Other tools, such as SHriMP (http://www.csr.uvic.ca/shrimpviews), significantly contribute to understanding a large piece of software through different visualization techniques.

Most tools focus on one aspect of reverse engineering. They may specialize in parsing code well, producing different types of graph views, or producing architecture diagrams in UML. Unlike program transformation tools such as compilers, which build an intermediate representation in memory and apply transformations to that representation internally, reverse engineering tools tend to cooperate with each other to support different parts of the reverse engineering process.

The Graph eXchange Language (http://www.gupro.de/GXL/) has simplified reverse-engineering tool interoperability. GXL, based on XML, resulted from an international collaboration between researchers in academia and industry. An extensible language, it supports any graph-based data format. For example, you can describe a control flow graph or an abstract syntax tree in GXL.
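
To make the control flow graph case concrete, here is a minimal sketch in Python that writes a small graph as GXL-style XML using only the standard library. The element and attribute names (gxl, graph, node, edge, attr, string, edgemode) reflect our reading of the GXL format and should be checked against the published DTD; the function name cfg_to_gxl and the four-block example graph are hypothetical and do not come from any of the tools named above.

```python
import xml.etree.ElementTree as ET


def cfg_to_gxl(graph_id, blocks, edges):
    """Serialize a control flow graph as a GXL-style XML string.

    blocks: dict mapping a basic-block id to a human-readable label
    edges:  list of (source_id, target_id) pairs
    Element names are assumed from the GXL format, not verified here.
    """
    root = ET.Element("gxl")
    graph = ET.SubElement(root, "graph", {"id": graph_id, "edgemode": "directed"})
    for block_id, label in blocks.items():
        node = ET.SubElement(graph, "node", {"id": block_id})
        # Each node carries its label as a typed attribute value.
        attr = ET.SubElement(node, "attr", {"name": "label"})
        ET.SubElement(attr, "string").text = label
    for source, target in edges:
        ET.SubElement(graph, "edge", {"from": source, "to": target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical CFG for: if (x > 0) y = 1; else y = 2; return y;
blocks = {"b0": "if x > 0", "b1": "y = 1", "b2": "y = 2", "b3": "return y"}
edges = [("b0", "b1"), ("b0", "b2"), ("b1", "b3"), ("b2", "b3")]
gxl_text = cfg_to_gxl("cfg_example", blocks, edges)
print(gxl_text)
```

Because the result is ordinary XML, a second tool can read it back with any XML parser, which is the kind of cooperation between tools that an exchange format is meant to support.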

A key difficulty in using reverse engineering tools arises from their need to support a variety of languages or be capable of extension to support another language. Although building them has often proven to be a daunting task, tools have been successfully designed to convert Cobol code to C, and to translate C code to C++ or Java code.

Some low-level reverse engineering tools have also been successful, including interactive commercial disassemblers such as IDA Pro (http://www.datarescue.com/idabase/ida.htm) and Sourcer (http://www.v-com.com/product/devsou1.html), which provide good quality assembly code for a variety of machines. During the Y2K crisis, a few companies provided decompilation services for Cobol binaries because many large organizations have vast legacy applications written in that language. Java decompilers have also been written, more easily than some other decompilers because the Java program’s binary format is not machine code but rather an intermediate representation called Java bytecode. Otherwise, most decompilation is still done by hand: The engineer decompiles assembly code mentally and annotates the representation with its high-level equivalent.
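
As a rough illustration of why an intermediate representation is easier to work back from than machine code, the sketch below disassembles a small function with the dis module from Python’s standard library. Python bytecode stands in for Java bytecode here purely as an analogy, which is our assumption and not something the tools above rely on; the function add_tax is hypothetical. The exact instruction names differ between Python versions, so none are listed.

```python
import dis


def add_tax(price, rate):
    """Hypothetical function whose compiled form we inspect."""
    return price + price * rate


# get_instructions yields one record per bytecode instruction.
instructions = list(dis.get_instructions(add_tax))
for ins in instructions:
    # opname is the operation; argrepr is a readable form of its argument.
    print(ins.opname, ins.argrepr)
```

In the listing, the local variable names price and rate are still visible in the instruction arguments, and the operations map closely onto the source expression. Native object code usually keeps far less of this information, which is one reason an engineer working from assembly has to reconstruct the high-level form by hand.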

Editor: Neville Holmes, School of Computing, University of Tasmania, Locked Bag 1-359, Launceston 7250; [email protected]