
Graduate Project Summary


MySQL/JVM: A Framework for Enabling Java Language Stored Procedures in MySQL

Kevin Tankersley
Sacred Heart University

5151 Park Avenue
Fairfield, CT 06825

[email protected]

Abstract
Database procedural languages tend to be special-purpose languages, with constructs and libraries designed to support common data access methods and flow control. Some tasks cannot be solved with such basic data access functionality. To support them, many database vendors embed the runtime environment of a more general-purpose language in the database server, which allows stored programs to be written in that external language. This approach leverages the work that has already gone into designing, implementing, and testing the runtime library of the external language. It also keeps the learning curve for advanced functionality low, since many developers will already be fluent in that language. This paper presents the design, implementation, and use of the MySQL/JVM system. MySQL/JVM is a framework for embedding the Java Virtual Machine runtime environment into the MySQL database server, so that stored procedures and stored functions in the MySQL database can be written in the Java programming language.

Categories and Subject Descriptors H.2.3 [Database Management]: Languages—Database (persistent) programming languages; D.3.4 [Programming Languages]: Processors—Run-time environments; D.3.3 [Programming Languages]: Language Constructs and Features—Data types and structures

General Terms Design, Languages, Security

Keywords MySQL, Java Native Interface, JNI, Stored Procedures, SQL/JRT, ISO/IEC 9075-13

1. Introduction
Most relational database systems provide a procedural language. Such a language allows stored procedures to be hosted in the database to encapsulate common business logic, and allows user-defined stored functions to be created to calculate common metrics. The nature of these languages and the robustness of the function libraries available to them can vary widely from one vendor to another. For example:

- MySQL includes a procedural language that offers basic control flow and has a fairly small library.
- Microsoft SQL Server offers control flow and basic exception handling with a larger library.
- Oracle includes an object-oriented language with a fairly robust library.

Most procedures hosted inside a database system are primarily intended to execute basic algorithms over data sets and cursors. As a result, most of the functionality that developers need is present even in the least feature-rich stored procedure languages. Some tasks, however, need features typically found in the libraries of more general-purpose languages. For example, processing and transmitting XML documents has become a more common task in many databases, as XML standards have become widely accepted for data transfer. Security policies for sensitive data may require custom encryption routines and processes. Access to the file system, or to network sockets, may be needed to acquire or export data. Most database procedural languages offer little support for these tasks.

To solve such problems, several database vendors allow stored procedures to be written in a general-purpose programming language, in addition to the database's native procedural language. This exposes the libraries of that language to developers. For example, the Oracle database allows stored procedures to be written in Java, and Microsoft SQL Server allows them to be written in any of the .NET languages. Further, every version of the SQL standard since SQL:2003 [1] has been divided into parts numbered up to 14. One of these parts (SQL/JRT [2], [5]) is dedicated entirely to defining the behavior of Java language stored procedures within a database server. Currently, however, the MySQL database does not support Java language stored procedures.

This paper presents the MySQL/JVM system, a project that integrates the Java Virtual Machine runtime environment into the MySQL database server process so that stored procedures can be written in Java. The rest of the paper is organized as follows:

- The remainder of this section presents the features and characteristics of stored procedures in the MySQL database.
- Section 2 presents the scope of the project.
- Section 3 discusses the high-level design.
- Section 4 presents lower-level design issues and noteworthy highlights of the implementation.

1.1 Stored Procedures, Functions, and Triggers

Relational databases are ubiquitous in application architecture. Most of the major information systems used by a typical organization rely on a relational database server to store and retrieve their data. Because the database is both the origin and the final destination of data, it is a good candidate to take on data access control. The centralization of the database and the use of network protocols for data transfer also make it a potential performance bottleneck. As a result, some program logic has migrated out of the applications that use the database and into the database itself, in the form of stored procedures.

In this paper, the term stored procedures means subroutines that meet two conditions:

- they are stored in a location accessible to a database server process;
- the process may execute them in response to an event or on behalf of a client.

Stored procedures are usually distinguished from stored functions (or user-defined functions), in that stored functions can return a value to the caller. Triggers can be distinguished further: a client does not call a trigger explicitly, and the server instead executes it when some predefined event occurs. When these distinctions matter in the following sections, they will be mentioned explicitly. Otherwise, the term stored procedures refers broadly to all of these classes of stored code.

Migrating common business logic out of applications and into stored procedures can bring several benefits:

- Stored procedures may implement logic with multiple decision points more efficiently than a client application, since a stored procedure does not need to send each data request over the network.
- On many systems, the statements in a stored procedure are precompiled. Executing the procedure is then faster than sending the same block of statements from a client to be parsed and executed.
- Stored procedures make the business logic they encapsulate reusable across applications.
- Stored procedures can be used to create fine-grained access control policies.

1.2 Stored Procedures in the MySQL Database

MySQL is a relational database management system. Development on the MySQL project began as early as 1994, and the server's features have grown steadily since. MySQL now supports most of the SQL:1999 standard and has become extremely popular, with more than 100 million distributions to date. The MySQL server is widely used as the underlying data store for many web applications. Its source code is freely available under the terms of the GNU General Public License.

Stored procedures were added to MySQL in version 5.0, released in 2005. The syntax for creating and executing stored procedures loosely adheres to the SQL:2003 Persistent Stored Modules standard [3]; see [11] for full details of stored procedure syntax and features. The stored procedure language provides:

- flow control via statements such as IF, LOOP, and WHILE;
- a BEGIN ... END syntax for blocks;
- a DECLARE statement for variable declaration and a SET statement for variable assignment;
- OPEN, CLOSE, and FETCH statements for cursors;
- a RETURN statement for functions;
- a DECLARE ... HANDLER statement for exception handling.

User-defined types, packages, and objects are not supported. The language provides about 250 functions and operators, covering control flow, string manipulation, mathematics, date and time manipulation, type casting, XML processing, aggregation, spatial data manipulation, binary data operations, encryption, and compression.

1.3 Limitations of Stored Procedures in MySQL

The procedural statements and function libraries discussed in Section 1.2 are sufficient for a large number of data processing tasks, but they offer little support for more advanced functionality. Several use cases cannot be easily achieved with MySQL's existing stored procedure language. Each could be implemented within individual applications instead of in the database. Such a solution, however, would lose all of the benefits discussed in Section 1.1. When a database does provide robust, general-purpose libraries, choosing whether to implement common business logic in the database or in each application that uses the data becomes an important design decision. The following cases would be good candidates for stored procedures, if MySQL had sufficient support to develop solutions for them:

1. The database regularly receives and stores XML documents that are supposed to adhere to a particular XML Schema. Several source systems generate the documents independently, each implemented in a different language and using a different XML platform. To detect and control errors, the database should verify that each document is valid and conforms to the expected schema.

2. The database needs to store sensitive information in encrypted form. Symmetric encryption is deemed unsuitable because the shared encryption key is hard to protect properly. A public key encryption protocol is desired to protect the most sensitive data, preferably one that does not have to be reimplemented in each client application.

3. An organization uses a service-oriented application architecture, and valuable data services are available over the network. Reimplementing the functionality of these services in the database would be both costly and undesirable. Ideally, a procedure could be written to access such services whenever the database needs them.
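The following sketch suggests what use case 1 could look like once Java routines are available. It is a hypothetical Java stored function built only on the standard javax.xml.validation API. The class name XmlSchemaCheck is an assumption for illustration, as is the convention of returning 1 or 0 (an int, since primitive mappings are the ones in scope). Neither is part of the MySQL/JVM system.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical Java stored function for use case 1: 1 if the document conforms, else 0. */
public class XmlSchemaCheck {
    public static int conforms(String xsd, String document) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Compile the schema from its text; a malformed schema also yields 0 below.
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validate() throws SAXException on the first violation.
            validator.validate(new StreamSource(new StringReader(document)));
            return 1;
        } catch (SAXException | IOException e) {
            return 0;
        }
    }
}
```

Passing the schema and the document as String parameters relies on the java.lang.String mapping discussed in Section 2.2.3.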

2. Scope
Bringing Java language stored procedures to MySQL is a very high-level goal. Both the Java runtime environment and the MySQL database are complex systems, and they can already interact independently over network protocols. Java technology is also highly standardized through the Java Community Process. Expert groups of representatives from multiple product vendors draft technology specifications as Java Specification Requests. These in turn become the standards to follow when working in a given Java technology area.

The SQL language is also governed by a defining standard, the most recent version of which is defined by [4]. The standard consists of the nine interrelated parts in Table 1. Each part is identified by:

- a standard ID (e.g. ISO/IEC 9075-1:2008);
- a full name (e.g. Information Technology – Database Language – SQL – Part 1: Framework);
- a short mnemonic identifier (e.g. SQL/Framework).

Official claims of conformance to any of the nine parts of this standard are verified by a conformance audit. No vendor currently claims full official conformance to all nine parts. Some vendors do not pursue official conformance at all. They instead design their products to comply with the standards as closely as possible, making exceptions or extensions as needed.

The features and behavior of the MySQL database server comply closely with several of the nine parts. In particular, MySQL's stored procedure language is one of the few vendor languages that closely conforms to the language specified in the SQL/PSM substandard for stored routines. Notably, part 13 (SQL/JRT) defines a standard for Java stored procedures that builds on the syntax and standards defined in SQL/PSM. Since MySQL's stored routine language already complies closely with SQL/PSM, the primary scope decisions concern how far this project will conform to each element of the SQL/JRT standard.

ISO/IEC ID    Mnemonic        Name
9075-1:2008   SQL/Framework   Framework
9075-2:2008   SQL/Foundation  Foundation
9075-3:2008   SQL/CLI         Call-Level Interface
9075-4:2008   SQL/PSM         Persistent Stored Modules
9075-9:2008   SQL/MED         Management of External Data
9075-10:2008  SQL/OLB         Object Language Bindings
9075-11:2008  SQL/Schemata    Information and Definition Schemas
9075-13:2008  SQL/JRT         SQL Routines and Types Using the Java™ Programming Language
9075-14:2008  SQL/XML         XML-Related Specifications

Table 1. ISO/IEC 9075:2008 Substandards

#   Feature  Feature Name            Compliance
1   J511     Commands                In Scope
2   J521     JDBC data types         Out of Scope
3   J531     Deployment              No Compliance
4   J541     SERIALIZABLE            Out of Scope
5   J551     SQLDATA                 Out of Scope
6   J561     JAR privileges          Out of Scope
7   J571     NEW operator            Out of Scope
8   J581     Output parameters       In Scope
9   J591     Overloading             Out of Scope
10  J601     SQL-Java paths          No Compliance
11  J611     References              Out of Scope
12  J621     External Java routines  In Scope
13  J622     External Java types     Out of Scope
14  J631     Java signatures         In Scope
15  J641     Static fields           Out of Scope
16  J651     Information Schema      Out of Scope
17  J652     Usage tables            Out of Scope

Table 2. SQL/JRT Feature Sets

2.1 ISO Standard Compliance

The SQL/JRT ISO standard [4] is large enough that it groups its feature requirements into seventeen feature sets. Table 2 assigns each feature set one of three statuses for this project:

- In scope: the feature will be implemented in close compliance with the SQL/JRT specification.
- Out of scope: the feature will not be implemented, but the implementation will be structured so that it can be added in the future.
- No compliance: the feature will not be implemented, and it is unlikely that it could be added without a substantial redesign.

Omitting features does not necessarily preclude a claim of conformance to the specification. An official claim of conformance requires, at a minimum, one of the features J621, J541, or J551 together with one of the features J511 or J531.
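The minimal-conformance rule in the last sentence is a simple boolean condition over feature identifiers. The hypothetical helper below encodes only that rule, not the rest of a conformance audit, and checks it against the in-scope feature sets of Table 2.

```java
import java.util.Set;

/** Hypothetical helper encoding only the minimal SQL/JRT conformance rule stated above. */
public class JrtConformance {
    public static boolean minimalClaimPossible(Set<String> implemented) {
        // At least one of J621, J541, or J551 ...
        boolean first = implemented.contains("J621")
                || implemented.contains("J541")
                || implemented.contains("J551");
        // ... together with at least one of J511 or J531.
        boolean second = implemented.contains("J511") || implemented.contains("J531");
        return first && second;
    }
}
```

The in-scope set (J511, J581, J621, J631) satisfies the rule through J621 and J511. This is why leaving out J531 and J601 does not rule out a minimal claim.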

More broadly, the features fall into four groups:

- features that support the definition and execution of Java stored routines;
- features that support the definition and execution of user-defined Java types;
- features that define the interaction between the database and the Java runtime environment;
- features that define the tables and views to be exposed as database metadata.

The primary scope of this project is to integrate the Java Virtual Machine into the database engine and to provide an API through which calls can be made from the Java runtime to the database or vice versa. The project will comply closely with the feature sets in Table 2 that fall within that scope.

The SQL/JRT specification devotes roughly half of its features to defining and invoking Java routines, and the other half to defining and using Java language user-defined types. Any feature relating to the creation of user-defined types in Java is out of scope and left for future development. MySQL does not currently support user-defined types at all, even in its host language, so adding them would be too large a task for the project's timeframe. Such a type system could be added later, though, and could easily leverage the framework that will be built to support routine calls.

For reasons discussed in Section 3, the subsystem for locating Java class files will differ significantly from the recommendations in SQL/JRT. As a result, the system will not comply with the features in Table 2 for deploying Java class files (J531) and resolving SQL-Java paths (J601). Bringing the system into compliance with these features would also require a major rewrite, possibly a total one. As noted above, this does not prevent an official claim of conformance, since a minimal claim can be made without either J531 or J601.

2.2 Other Scope Considerations

Within the features defined in Section 2.1, a number of scope decisions remain. The SQL/JRT standard defines the features that a compliant database server must provide at a fairly high level, but it says little about the design details of implementing them. In particular, the Java runtime and the MySQL database server have several subsystems in common. A seamless integration would ideally merge each such subsystem fully. Since there will not be enough time to do so for every subsystem, the remainder of this section discusses the scoping decisions for each major touch point between the database and the Java runtime.

2.2.1 Access Control

Security in MySQL is managed in a fairly standard way, through a remote login process and access control lists. The access control lists govern access to resources such as tables, views, and stored procedures. They cover actions such as CREATE, DROP, SELECT, and EXECUTE against these resources, and privileges for these actions can be granted (GRANT) or revoked (REVOKE). See [11] for a more complete listing of MySQL access control commands.

Security in the Java runtime, however, is managed quite differently. The default Java security model assigns permissions based on a code source (the CodeSource class). A code source is primarily a combination of a URL identifying where an archive originated and, optionally, the cryptographic signatures on the code. This policy essentially lets local Java code access the entire runtime, while restricting remotely downloaded code such as Java applets.

This default policy is difficult to integrate meaningfully with MySQL's user-driven access control. System administrators could write a custom security policy, however, and some Java specifications and APIs allow user-driven access control to be enforced. See [6] for more details on access control options in Java.
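The sketch below shows why the code-source model sits awkwardly beside MySQL's per-user grants. It builds a java.security.ProtectionDomain from a CodeSource and a Permissions collection, then asks whether the domain implies a requested permission. The directory /opt/myjvm/routines/ and the granted path /tmp/myjvm/- are hypothetical. The decision depends only on where the code came from; nothing in the check refers to the MySQL account that invoked the routine.

```java
import java.io.FilePermission;
import java.net.MalformedURLException;
import java.net.URI;
import java.security.CodeSource;
import java.security.Permission;
import java.security.Permissions;
import java.security.ProtectionDomain;
import java.security.cert.Certificate;

/** Sketch: permissions attach to where code came from, not to the MySQL user calling it. */
public class CodeSourceDemo {
    private static final ProtectionDomain ROUTINES = buildDomain();

    private static ProtectionDomain buildDomain() {
        try {
            // Hypothetical location of routine classes, with no signing certificates.
            CodeSource source = new CodeSource(
                    URI.create("file:/opt/myjvm/routines/").toURL(), (Certificate[]) null);
            Permissions granted = new Permissions();
            // Read access to everything below the hypothetical /tmp/myjvm directory.
            granted.add(new FilePermission("/tmp/myjvm/-", "read"));
            // Two-argument constructor: the domain uses only these static permissions.
            return new ProtectionDomain(source, granted);
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if code from the routine directory holds the permission, whoever invoked it. */
    public static boolean allowed(Permission requested) {
        return ROUTINES.implies(requested);
    }
}
```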

The Java runtime provides access to some very powerful resources (e.g. network sockets and file operations), which is exactly why it is useful as a language for stored routines. Some of these resources can consume significant memory or processor time, however. That is a serious problem in a database server, which is typically multi-user and performance-sensitive. Ultimately, the database administrator is responsible for ensuring that access control is set up optimally, and should have a simple way to control access to the sensitive resources in the Java runtime. Ideally, this would take the form of extending the GRANT and REVOKE actions to Java resources, for example 'GRANT OPEN SOCKET TO USER1' or 'GRANT WRITE FILE TO USER2'. The implementation differences between the Java and MySQL security models currently make this an unreasonable goal, although it is an interesting area for future development.

2.2.2 Output

Several database vendors provide a channel over which stored procedures can send basic messages. Microsoft SQL Server, for example, provides a PRINT statement, and the Oracle database provides the DBMS_OUTPUT.PUT_LINE procedure. In some cases, the client can even choose whether to process information received through this channel. This makes it a useful tool for diagnostic or debugging output.

The MySQL database, however, does not provide such a channel. This is more than a missing language feature: the network protocol that the client and server use to communicate does not define any structure that could carry such data. See [10] for a description of the MySQL network protocol.

The scope of this project is limited to the server process itself. Even a small change to the communication protocol would render all existing clients unable to connect to the server, so no diagnostic channel will be created or assumed. The Java runtime, however, frequently sends output through the System.out and System.err streams. With no convenient way to redirect these to the user, the output will end up in the MySQL server log files. This is almost certainly not the ideal place for it, especially since the MySQL log file conventionally follows a specific format for its diagnostic messages. A future enhancement could either disable these streams as harmlessly as possible or redirect them to a dedicated Java log file.
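Both enhancements mentioned above can be sketched with the standard System.setOut and System.setErr calls. The class name JavaStreamRedirect is hypothetical. It is also an assumption when and where the server would call redirect (for example, right after creating the JVM); the design does not specify this.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

/** Sketch: route the JVM's standard streams away from the MySQL server log. */
public final class JavaStreamRedirect {
    private static PrintStream savedOut;
    private static PrintStream savedErr;

    /** Sends System.out and System.err to sink, e.g. a stream on a Java-only log file. */
    public static synchronized void redirect(OutputStream sink) {
        if (savedOut == null) { // remember the streams active before the first redirect
            savedOut = System.out;
            savedErr = System.err;
        }
        PrintStream stream = new PrintStream(sink, true, StandardCharsets.UTF_8); // autoflush on println
        System.setOut(stream);
        System.setErr(stream);
    }

    /** Flushes the redirected streams and reinstates the original ones. */
    public static synchronized void restore() {
        if (savedOut == null) {
            return;
        }
        System.out.flush();
        System.err.flush();
        System.setOut(savedOut);
        System.setErr(savedErr);
        savedOut = null;
        savedErr = null;
    }
}
```

To redirect the output, pass a FileOutputStream opened in append mode on a Java-specific log file. To disable the streams instead, pass OutputStream.nullOutputStream() (Java 11 and later), which discards everything written to it.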

2.2.3 Data Type Translation

When a Java routine is called, its parameters must be translated from their MySQL data types to the equivalent Java data types. For stored functions, the same holds for the return value when the Java method completes. The type mappings in scope for this project are:

- mappings to and from Java primitive types;
- mappings to and from java.lang.String;
- mappings to and from one-dimensional arrays of char and byte.

Mappings to and from any other Java reference type are not in scope. This is an issue of time, not feasibility. The parameter translation should therefore be easy to extend, so that it can later accommodate more complex MySQL types that need a Java reference type to represent them properly. Since no straightforward mapping exists for result sets, this version of the system will offer no way for a Java routine to return a result set. Adding parameter support for result sets and cursors would be another interesting area for future development.

Charset   Description                  Default collation
big5      Big5 Traditional Chinese     big5_chinese_ci
dec8      DEC West European            dec8_swedish_ci
cp850     DOS West European            cp850_general_ci
hp8       HP West European             hp8_english_ci
koi8r     KOI8-R Relcom Russian        koi8r_general_ci
latin1    cp1252 West European         latin1_swedish_ci
latin2    ISO 8859-2 Central European  latin2_general_ci
swe7      7bit Swedish                 swe7_swedish_ci
ascii     US ASCII                     ascii_general_ci
ujis      EUC-JP Japanese              ujis_japanese_ci
sjis      Shift-JIS Japanese           sjis_japanese_ci
hebrew    ISO 8859-8 Hebrew            hebrew_general_ci
tis620    TIS620 Thai                  tis620_thai_ci
euckr     EUC-KR Korean                euckr_korean_ci
koi8u     KOI8-U Ukrainian             koi8u_general_ci
gb2312    GB2312 Simplified Chinese    gb2312_chinese_ci
greek     ISO 8859-7 Greek             greek_general_ci
cp1250    Windows Central European     cp1250_general_ci
gbk       GBK Simplified Chinese       gbk_chinese_ci
latin5    ISO 8859-9 Turkish           latin5_turkish_ci
armscii8  ARMSCII-8 Armenian           armscii8_general_ci
utf8      UTF-8 Unicode                utf8_general_ci
ucs2      UCS-2 Unicode                ucs2_general_ci
cp866     DOS Russian                  cp866_general_ci
keybcs2   DOS Kamenicky Czech-Slovak   keybcs2_general_ci
macce     Mac Central European         macce_general_ci
macroman  Mac West European            macroman_general_ci
cp852     DOS Central European         cp852_general_ci
latin7    ISO 8859-13 Baltic           latin7_general_ci
cp1251    Windows Cyrillic             cp1251_general_ci
cp1256    Windows Arabic               cp1256_general_ci
cp1257    Windows Baltic               cp1257_general_ci
binary    Binary pseudo charset        binary
geostd8   GEOSTD8 Georgian             geostd8_general_ci
cp932     SJIS for Windows Japanese    cp932_japanese_ci
eucjpms   UJIS for Windows Japanese    eucjpms_japanese_ci

Table 3. Supported Character Sets
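To make the parameter translation in this section concrete, the sketch below takes a list of MySQL parameter types and builds the JNI method descriptor for a static Java routine. That descriptor is the signature string passed to JNI's GetStaticMethodID. The type table is an assumed mapping onto primitives, java.lang.String, and byte[]; it is not the project's actual table. OUT and INOUT parameters become one-element arrays, which follows the convention SQL/JRT-style Java routines use for output parameters (feature J581).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch: build the JNI descriptor used to look up a static Java routine. */
public class JniSignature {
    public enum Mode { IN, OUT, INOUT }

    public static final class Param {
        final String sqlType; // base type name without length, e.g. "VARCHAR"
        final Mode mode;

        public Param(String sqlType, Mode mode) {
            this.sqlType = sqlType;
            this.mode = mode;
        }
    }

    // Assumed MySQL-to-Java mapping (primitives, String, byte[]); illustrative only.
    private static final Map<String, String> DESCRIPTORS = Map.of(
            "TINYINT", "B", "SMALLINT", "S", "INT", "I", "BIGINT", "J",
            "FLOAT", "F", "DOUBLE", "D",
            "CHAR", "Ljava/lang/String;", "VARCHAR", "Ljava/lang/String;",
            "VARBINARY", "[B");

    static String descriptorFor(String sqlType) {
        String d = DESCRIPTORS.get(sqlType.toUpperCase(Locale.ROOT));
        if (d == null) {
            throw new IllegalArgumentException("unsupported type: " + sqlType);
        }
        return d;
    }

    /** returnType == null means a procedure (void); OUT/INOUT parameters become arrays. */
    public static String methodDescriptor(List<Param> params, String returnType) {
        StringBuilder sb = new StringBuilder("(");
        for (Param p : params) {
            if (p.mode != Mode.IN) {
                sb.append('['); // one-element array carries the value back to the caller
            }
            sb.append(descriptorFor(p.sqlType));
        }
        sb.append(')');
        sb.append(returnType == null ? "V" : descriptorFor(returnType));
        return sb.toString();
    }
}
```

For a procedure with a single INOUT VARCHAR parameter, such as the hello routine in Listing 1, this yields ([Ljava/lang/String;)V.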

For java.lang.String and char[] parameters, the character set encodings available in MySQL and in the Java runtime must be considered. Table 3 lists the character set encodings supported in MySQL 5.1 (see [11] for details). The two-byte UCS-2 Unicode character set will serve as the common encoding: all other character sets are translated into it before values are passed into the Java runtime. As a consequence, characters outside the Unicode Basic Multilingual Plane cannot be represented. MySQL currently provides no support for such characters anyway.
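The sketch below covers the Java side of this UCS-2 hand-off. It assumes the server delivers string parameters as big-endian UCS-2 bytes, the byte order MySQL uses for its ucs2 character set. UCS-2 is the BMP-only subset of UTF-16, so Java's UTF_16BE decoder can read it directly. The same class rejects return values containing characters outside the BMP. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of the Java side of UCS-2 string passing; names are hypothetical. */
public class Ucs2Bridge {
    /** Decodes big-endian UCS-2 bytes into a Java String. */
    public static String fromUcs2(byte[] ucs2) {
        if (ucs2.length % 2 != 0) {
            throw new IllegalArgumentException("UCS-2 data must have an even length");
        }
        return new String(ucs2, StandardCharsets.UTF_16BE);
    }

    /** True if every character lies in the BMP and is not a surrogate, i.e. fits in UCS-2. */
    public static boolean fitsInUcs2(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF && !Character.isSurrogate((char) cp));
    }

    /** Encodes a String for return to the server, rejecting characters UCS-2 cannot hold. */
    public static byte[] toUcs2(String s) {
        if (!fitsInUcs2(s)) {
            throw new IllegalArgumentException("character outside the Basic Multilingual Plane");
        }
        return s.getBytes(StandardCharsets.UTF_16BE);
    }
}
```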

2.2.4 Other Server/Runtime Communication

The MySQL database server has an extensible exception handling mechanism, including the DECLARE ... HANDLER stored procedure instruction for catching exceptions. The Java language has a very powerful exception handling mechanism as well. However, the runtime necessarily defines what happens to uncaught exceptions that propagate all the way out of the entry method. Integrating these two mechanisms is in scope:

- uncaught Java exceptions will continue to propagate outward from the Java routine as MySQL exceptions;
- new exceptions will be created for errors caused by incorrect Java routine definitions or by failed parameter translation.
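The exception integration described above implies two distinct error paths. Either the routine definition itself is wrong (no such method, or the method is not static), or the routine ran and let an exception escape. The sketch below approximates both paths on the Java side using reflection. The project instead calls into the JVM through JNI, where the corresponding steps are a method lookup and a check for a pending exception. The names RoutineInvoker, Kind, and Outcome are hypothetical. How each Kind maps to a concrete MySQL error is left open.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Sketch: separate definition errors from uncaught exceptions when calling a routine. */
public class RoutineInvoker {
    public enum Kind { OK, BAD_DEFINITION, UNCAUGHT_EXCEPTION }

    public static final class Outcome {
        public final Kind kind;
        public final Object value;   // return value when kind == OK
        public final String message; // error text otherwise

        Outcome(Kind kind, Object value, String message) {
            this.kind = kind;
            this.value = value;
            this.message = message;
        }
    }

    public static Outcome call(Class<?> cls, String name, Class<?>[] types, Object[] args) {
        Method method;
        try {
            method = cls.getMethod(name, types);
        } catch (NoSuchMethodException e) {
            return new Outcome(Kind.BAD_DEFINITION, null, "no public method " + name + " in " + cls.getName());
        }
        if (!Modifier.isStatic(method.getModifiers())) {
            return new Outcome(Kind.BAD_DEFINITION, null, name + " is not static");
        }
        try {
            return new Outcome(Kind.OK, method.invoke(null, args), null);
        } catch (InvocationTargetException e) {
            // The routine itself threw: report the original exception, not the reflective wrapper.
            Throwable cause = e.getCause();
            return new Outcome(Kind.UNCAUGHT_EXCEPTION, null,
                    cause.getClass().getName() + ": " + cause.getMessage());
        } catch (IllegalAccessException | IllegalArgumentException e) {
            // Inaccessible method, or arguments that do not match the declared parameter types.
            return new Outcome(Kind.BAD_DEFINITION, null, e.toString());
        }
    }
}
```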

Java code calls into a database through the Java Database Connectivity API (JDBC). JDBC versions are specified through the Java Community Process; JDBC 3.0, for example, is defined by JSR 54. The JDBC API provides a fixed interface for all database vendors, and each vendor supplies an implementation of that interface (a JDBC driver) for its product. A driver communicates with the database server in four steps:

1. it opens a TCP connection to the server;
2. it provides login credentials;
3. it sends the desired command;
4. it receives the result.

This is an effective mechanism for Java code that does not run on the same machine as the MySQL server. Java stored procedures, however, will execute not only on the same machine as the database server, but in the same process. JDBC calls could potentially be much faster if they called the appropriate MySQL server function directly, instead of sending commands over TCP sockets that require authentication, statement parsing, and result interpretation. Unfortunately, the JDBC API is too large to reimplement within this project, so a general-purpose native driver is out of scope. As a special exception, however, the custom class loader edu.sacredheart.cs.myjvm.MyClassLoader (see Section 3.2.3) calls native MySQL functions directly, without routing anything over a TCP connection.

3. Design
The features scoped in Section 2 could be added to the MySQL server in a number of ways. The design of these additions and modifications could affect the platform availability of the server, the performance of the Java routines, and the memory consumption of client threads.

The most pressing design issue is how to invoke the Java Virtual Machine and call class methods within it. The available design choices differ primarily in how tightly the MySQL server and the JVM are integrated:

- At one extreme, the MySQL server could simply make a system() call or similar, invoking the java executable and passing the class name, path, and arguments as strings.
- At the other extreme, the JVM source code could be included with that of the MySQL server, and MySQL could call directly into the JVM's internal processing logic.

Section 3.2 presents the major design decisions made in this project, and Section 3.2.2 presents the decisions made specifically to let the MySQL server call into the JVM.

Before presenting these design decisions, it is useful to summarize the current MySQL server design. The server is quite complex, offering platform-independent support for threads, transactions, locking, logging, and replication. A full presentation of the server design is beyond the scope of this paper, but Section 3.1 summarizes the design elements that support stored routines.

3.1 MySQL Design

The MySQL server is implemented as a fairly standard client-server application. When the server first starts, it goes through an initialization procedure that sets up the structures and parameters it needs to serve requests. See [10] for a much more complete description of the initialization process and of many other details of the server implementation. After initialization, the server begins listening for network connections, by default on port 3306 (administrators can change this). From this point on, the main server thread does very little other than listen for incoming connections and spawn new threads to handle them.

[Figure 1: sequence diagram with lifelines :Client, :Server, :ClientThread, :Parser, and :ParseTree. Messages, in order: request(), create(thd), handle_one_connection(thd), do_command(thd), dispatch_command(thd, packet), mysql_parse(thd, command), create(), return(), mysql_execute_command(thd).]

Figure 1. MySQL New Thread Prolog

Once a client is authenticated and a thread has been created for it, events typically proceed as in Figure 1:

1. The client makes a request, and the server creates a new thread to handle it.
2. The thread begins execution by calling the handle_one_connection server function.
3. handle_one_connection calls do_command, which calls dispatch_command.
4. dispatch_command invokes the parser via the mysql_parse function.
5. The parser parses the input and creates a parse tree with objects and structures representing the client request.
6. mysql_execute_command is called. From this point, processing differs according to the type of command the client requested.

This thread initialization prolog illustrates a few features of the server design. First, the server is not subdivided into loosely coupled subsystems or classes. Most core server features are implemented as globally accessible functions, although features added more recently are more likely to be encapsulated in classes. Second, the prolog introduces a few of the elements that will matter most in the design of Java routines.

After the server creates a new thread in Figure 1, most of the remaining function calls pass a variable named thd. This variable is a MySQL thread descriptor, and it is passed as the first argument to almost every function in the core server library. The thread descriptor contains essentially all data structures relevant to a specific client request. These include the objects that represent the operating system thread, and much more:

- the parse tree;
- flags and states;
- references to the protocol handlers and the table handlers;
- object caches;
- status variables.

One element of the thread initialization prolog that is a separate module is the parser. MySQL uses the GNU Bison parser generator to build a parser for its language from a specification grammar. Bison-generated parsers are normally paired with a lexical analyzer generated by GNU Flex, but for performance reasons MySQL uses a custom-built lexical analyzer. The parser's job is to create the parse tree: a data structure holding the class instances, structures, and flags that represent the command requested by the client.

CREATE DEFINER = 'root'@'localhost' PROCEDURE `hello`(
  INOUT str VARCHAR(100)
)
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'Outputs a greeting.'
BEGIN
  SET str = 'Hello, World!';
END;

Listing 1. A Basic MySQL Stored Routine
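For comparison, a Java version of the routine in Listing 1 might look like the hypothetical class below. Java passes references by value, so the INOUT parameter is represented as a one-element String array. This is the convention SQL/JRT-style routines use for output parameters. How the routine would be declared to MySQL/JVM is not shown here.

```java
/** Hypothetical Java counterpart of the SQL routine in Listing 1. */
public class Greeting {
    /**
     * INOUT str VARCHAR(100): the caller passes a one-element array, and the
     * routine writes its result into str[0] so the caller can read it back.
     */
    public static void hello(String[] str) {
        str[0] = "Hello, World!";
    }
}
```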

3.1.1 Creating Stored Routines

Suppose that the client sends the request in Listing 1. The thread prolog described in Section 3.1 will execute, and the parser will process the routine definition. The most important object created by the parser for stored routines is the sp_head object summarized in Listing 2.

The sp_head object stores all of the information that applies to the stored procedure as a whole. There are several fields of type LEX_STRING which the parser uses to store the parts of the original client request string. Not presented in Listing 2 are many class functions and fields related to the processing of individual instructions within the procedure, which the parser is also responsible for creating from the definition. Note that the same sp_head class is used for functions, procedures, and triggers, and that each of the three stored routine types has its own execution entry point in Listing 2 (execute_trigger, execute_function, and execute_procedure). These three entry points differ only in their context and usage, however, and all three defer to the private execute function for the actual execution of instructions.

For performance reasons, after the parser creates a new sp_head object, it is placed in the stored procedure cache. Procedures in this cache are available across client threads, so unless the cache is flushed this newly defined stored procedure will be immediately accessible to any client who has privileges to execute it. The relevant parts of the stored procedure definition are then stored in the system table proc in the mysql schema (see Table 4).

When a stored routine is called, the parser first processes the parameters passed, and for each one it creates an instance of the Item class and adds it to the value list in the parse tree structure. The sp_head object representing this procedure is then retrieved: the stored procedure cache is checked first, and if no copy is found there, the definition statement is retrieved from the mysql.proc table and passed to the parser, which creates the sp_head object. The appropriate function from Listing 2 is then invoked (for example, execute_procedure if the routine is a stored procedure), passing in the value list created by the parser.
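The lookup order described above (cache first, then mysql.proc followed by a re-parse) can be sketched as follows. RoutineHead, ProcTable, and RoutineCache are hypothetical names; a std::map stands in for both the stored procedure cache and the mysql.proc table, and "parsing" is reduced to copying the stored body.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an sp_head: just the routine name and body.
struct RoutineHead {
  std::string name;
  std::string body;
};

// Hypothetical stand-in for mysql.proc: routine name -> stored definition.
using ProcTable = std::map<std::string, std::string>;

class RoutineCache {
 public:
  explicit RoutineCache(const ProcTable &proc) : proc_(proc) {}

  // Return the cached object if present; otherwise build one from the
  // stored definition (the "re-parse"), cache it, and return it.
  // Returns nullptr if the routine is not defined at all.
  const RoutineHead *lookup(const std::string &name) {
    auto hit = cache_.find(name);
    if (hit != cache_.end()) {
      ++hits_;
      return hit->second.get();
    }
    auto row = proc_.find(name);
    if (row == proc_.end()) return nullptr;
    ++misses_;
    auto head = std::make_unique<RoutineHead>(RoutineHead{name, row->second});
    const RoutineHead *raw = head.get();
    cache_[name] = std::move(head);
    return raw;
  }

  void flush() { cache_.clear(); }  // forces the next lookup to re-parse
  int hits() const { return hits_; }
  int misses() const { return misses_; }

 private:
  const ProcTable &proc_;
  std::map<std::string, std::unique_ptr<RoutineHead>> cache_;
  int hits_ = 0;
  int misses_ = 0;
};
```

A second lookup of the same routine returns the cached object, while a flush forces the next lookup back to the table.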

3.2 Design Changes

As mentioned in Section 3, the most important design choices to be made are those decisions regarding how the database should link to the Java runtime. In effect, since the MySQL database and the Java runtime are already functional systems separately, this amounts to saying that the most crucial element of their integration is the boundary between the two systems.

class sp_head :private Query_arena
{
  MEM_ROOT main_mem_root;
public:
  int m_type;
  Create_field m_return_field_def;
  const char *m_tmp_query;
  st_sp_chistics *m_chistics;
  ulong m_sql_mode;
  LEX_STRING m_qname;
  bool m_explicit_name;
  LEX_STRING m_sroutines_key;
  LEX_STRING m_db;
  LEX_STRING m_name;
  LEX_STRING m_params;
  LEX_STRING m_body;
  static void *operator new(size_t size) throw ();
  static void operator delete(void *ptr, size_t size) throw ();
  sp_head();
  void init(LEX *lex);
  void init_sp_name(THD *thd, sp_name *spname);
  int create(THD *thd);
  virtual ~sp_head();
  bool execute_trigger(THD *thd, const LEX_STRING *db_name,
                       const LEX_STRING *table_name,
                       GRANT_INFO *grant_info);
  bool execute_function(THD *thd, Item **args, uint argcount,
                        Field *return_fld);
  bool execute_procedure(THD *thd, List<Item> *args);

private:
  sp_pcontext *m_pcont;
  DYNAMIC_ARRAY m_instr;
  bool execute(THD *thd);
};

Listing 2. The sp_head Class

Column                 Data Type
db                     char(64)
name                   char(64)
type                   enum('FUNCTION','PROCEDURE')
specific_name          char(64)
language               enum('SQL')
sql_data_access        enum(...)
is_deterministic       enum('YES','NO')
security_type          enum('INVOKER','DEFINER')
param_list             blob
returns                longblob
body                   longblob
definer                char(77)
created                timestamp
modified               timestamp
sql_mode               set(...)
comment                char(64)
character_set_client   char(32)
collation_connection   char(32)
db_collation           char(32)
body_utf8              longblob

Table 4. Table mysql.proc

The primary vehicle for that integration will be the Java Native Interface (JNI). Section 3.2.1 discusses the JNI in general, and Section 3.2.2 discusses the design of a subsystem which manages the JVM linkage using JNI. Section 3.2.3 discusses the choice of where and how to store compiled Java code so that the database can find and execute it at runtime, and Sections 3.2.4 and 3.2.5 discuss changes to the objects introduced in Section 3.1.1 to add Java routine functionality.

3.2.1 The Java Native Interface

The Java Native Interface is an API which provides a powerful bi-directional communication channel between native code and code running within the Java Virtual Machine. The JNI can be an ideal framework with which to integrate C or C++ applications with Java applications. A brief introduction to the JNI will be presented here, but the interested reader can find much more detail in [8].

Since the Java Virtual Machine is not a specific software package, but rather a standard which many vendors have provided implementations for, the features exposed through the JNI treat the internal structure of the JVM as a black box. This is accomplished through the use of the JNI environment pointer, defined in the header file jni.h as type JNIEnv *. The environment pointer provides an interface through which native code can request services from the JVM without the internal structure of the virtual machine being revealed.
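One way to picture this black-box design is as a table of function pointers plus an opaque state pointer that only the implementation dereferences. The miniature below follows that idea; MiniEnv, MiniEnvFunctions, and VMState are hypothetical names, and the real JNIEnv table in jni.h contains more than two hundred entries (GetVersion and FindClass among them) with different signatures.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical miniature of the JNIEnv design: callers see a table of
// function pointers and an opaque pointer they never dereference.
struct MiniEnv;

struct MiniEnvFunctions {
  int (*GetVersion)(MiniEnv *env);
  bool (*FindClass)(MiniEnv *env, const char *name);
};

struct MiniEnv {
  const MiniEnvFunctions *functions;  // public: the service table
  void *reserved;                     // private: implementation state
};

// ---- implementation side; only this code knows what `reserved` is ----
namespace impl {
struct VMState {
  int version;
  std::set<std::string> classes;  // classes this toy "VM" can find
};

int GetVersion(MiniEnv *env) {
  return static_cast<VMState *>(env->reserved)->version;
}

bool FindClass(MiniEnv *env, const char *name) {
  auto *vm = static_cast<VMState *>(env->reserved);
  return vm->classes.count(name) != 0;
}

const MiniEnvFunctions kTable = {&GetVersion, &FindClass};
}  // namespace impl
```

Native callers would only ever write calls of the form env->functions->FindClass(env, ...), so the layout of VMState can change without affecting them.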

The JNI allows running Java methods to call C or C++ ("native") functions, and it allows running C or C++ code to call methods of Java classes. Calls from Java to native code are facilitated by the native keyword in Java, which informs the compiler that the definition of a method will be provided by a C or C++ function from a library which will be linked at runtime. The appropriate function to call is determined either by following specific naming conventions and exporting the function from a shared library, or by explicitly registering the appropriate native function with the JVM at runtime. Making calls from native code to Java methods is achieved through the JNI invocation interface. The invocation interface allows native code to create an instance of the JVM, then create class instances within the created JVM and call methods of those classes.

Since the JVM is multithreaded, the JNI provides a mechanism for native code to interact with the JVM in a multithreaded way. A request can be made to attach the current native thread to the JVM, which creates a new instance of java.lang.Thread to represent the native thread in the JVM and provides the native thread with an environment pointer through which it can request JVM services.

Since the Java language allows method overloading, it is necessary to identify methods by both their name and their signature. The signature of a method is formatted using the internal signature format defined in the JVM specification (see [9]). In this format, primitive types are represented with a single character, and reference types have a form similar to Ljava/lang/String;, in which the descriptor begins with L, ends with ;, and contains the fully-qualified class name in between, with packages separated by slashes. Arrays of any type are represented by prepending one [ character per array dimension to the type descriptor, so that a three-dimensional array of strings is identified as [[[Ljava/lang/String;. This type format is important to understand when working with the JVM, as many calls need to specify either a variable type or a method signature in this way.

#include "jni.h"

class MyJVM {
  // JNI version requested when the JVM is created (1.4)
  static const jint vm_version = JNI_VERSION_1_4;
  // Singleton instance of this class
  static class MyJVM *myjvm;
  // The pointer to the JNI jvm descriptor
  JavaVM *jvm;
  // Environment descriptor for main thread
  JNIEnv *env;
public:
  static MyJVM *getMyJVM();
  int startMyJVM();
  int restartMyJVM();
  int shutdownMyJVM();
  ~MyJVM();
  JNIEnv *attachThread();
  int detachThread();
  // Transition table and character-class map of the signature parser (Figure 2)
  static const unsigned char sigmap[NUM_STATES][NUM_CHARS];
  static const unsigned char chmap[NUM_ASCII_CHARS];
private:
  MyJVM();
};

Listing 3. The MyJVM Class
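The descriptor rules described above are mechanical enough to express in a few lines. The sketch below converts source-level type names into descriptors and assembles a full method descriptor; to_descriptor and method_descriptor are hypothetical helpers, nested classes must already be given in binary form (for example java.util.Map$Entry), and void is only meaningful as a return type.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Translate a source-level type such as "int", "java.lang.String" or
// "java.lang.String[][]" into JVM descriptor form.
std::string to_descriptor(std::string type) {
  static const std::map<std::string, char> primitives = {
      {"boolean", 'Z'}, {"byte", 'B'}, {"char", 'C'},  {"short", 'S'},
      {"int", 'I'},     {"long", 'J'}, {"float", 'F'}, {"double", 'D'},
      {"void", 'V'}};

  // Each trailing "[]" becomes one leading '['.
  std::string prefix;
  while (type.size() >= 2 && type.compare(type.size() - 2, 2, "[]") == 0) {
    prefix += '[';
    type.erase(type.size() - 2);
  }

  auto prim = primitives.find(type);
  if (prim != primitives.end()) return prefix + prim->second;

  // Reference type: 'L' + slash-separated class name + ';'.
  for (char &c : type)
    if (c == '.') c = '/';
  return prefix + "L" + type + ";";
}

// Assemble "(parameter descriptors)return descriptor".
std::string method_descriptor(const std::vector<std::string> &params,
                              const std::string &ret) {
  std::string d = "(";
  for (const auto &p : params) d += to_descriptor(p);
  return d + ")" + to_descriptor(ret);
}
```

For example, a static method taking a single String and returning nothing yields (Ljava/lang/String;)V, the form the JNI expects when looking up a method.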

3.2.2 Linking to the Java Virtual Machine

Linking to the JVM will be accomplished by the class MyJVM, presented in Listing 3. The class will encapsulate all of the JNI-related processing that needs to be done to create and attach to the virtual machine, so that other parts of the server do not have to make JNI calls or even include JNI headers.

The MyJVM class is implemented as a singleton. During the server initialization process described in Section 3.1, the getMyJVM() function will be called for the first time, which will in turn call the private constructor to create the static instance myjvm. Subsequent calls to the getMyJVM() function by native client threads will return this static instance. This design guarantees that there will never be more than one JVM defined in a single database instance. Native client threads can also call the attachThread() function to attach the current native thread to this JVM.
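The sketch below shows one way to obtain the same single-instance guarantee in C++11 and later, using a function-local static rather than the explicitly managed static pointer of Listing 3 (a different technique, named plainly here). VmSingleton is hypothetical, and the counter stands in for the call that would actually create the JVM; the JNI specification in any case does not support creating more than one VM in a single process.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical singleton in the spirit of MyJVM.
class VmSingleton {
 public:
  static VmSingleton *get() {
    // C++11 guarantees this local static is initialized exactly once,
    // even if several client threads race on the first call.
    static VmSingleton instance;
    return &instance;
  }

  // Idempotent start: only the first call "creates" the VM.
  int start() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!started_) {
      started_ = true;
      ++create_calls_;  // stands in for actually creating the JVM
    }
    return 0;
  }

  int create_calls() const { return create_calls_; }

  VmSingleton(const VmSingleton &) = delete;
  VmSingleton &operator=(const VmSingleton &) = delete;

 private:
  VmSingleton() = default;  // only get() can construct the instance
  std::mutex mu_;
  bool started_ = false;
  int create_calls_ = 0;
};
```

The design choice mirrors MyJVM: the constructor is private, so the only way to reach the VM wrapper is through the static accessor.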

The arrays sigmap and chmap implement the finite state machine in Figure 2, which parses the language of method signatures mentioned in Section 3.2.1. They are defined at the JVM level in part because the internal method signature format is defined by the JVM, and in part because this ensures that the arrays will not be defined more than once in the application. In Figure 2, transitions labelled with α represent the character class [a-zA-Z0-9$_] (characters which [9] defines as legal in a Java class name), and the transitions labelled β represent the character class [ZBCSIJFD] (the JNI single-character representations of the Java primitive types).
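A table-driven recognizer in the same spirit as sigmap and chmap is sketched below. The states and character classes are our own illustrative numbering, not a transcription of Figure 2: classify plays the role of chmap, build_table plays the role of sigmap, and inside a class name the letters L, V, and [ZBCSIJFD] are treated as ordinary α characters.

```cpp
#include <array>
#include <cassert>
#include <string>

// Character classes (the role played by chmap).
enum CharClass { C_OPEN, C_CLOSE, C_ARRAY, C_L, C_SLASH, C_SEMI, C_V,
                 C_BASE, C_IDENT, C_OTHER, NUM_CLASSES };

// States (illustrative numbering, not Figure 2's).
enum State { S_START, S_PARAMS, S_P_ARRAY, S_P_L, S_P_NAME, S_P_SLASH,
             S_RET, S_R_ARRAY, S_R_L, S_R_NAME, S_R_SLASH, S_ACCEPT,
             S_ERROR, NUM_STATES };

CharClass classify(char c) {
  switch (c) {
    case '(': return C_OPEN;
    case ')': return C_CLOSE;
    case '[': return C_ARRAY;
    case 'L': return C_L;
    case '/': return C_SLASH;
    case ';': return C_SEMI;
    case 'V': return C_V;
    case 'Z': case 'B': case 'C': case 'S':
    case 'I': case 'J': case 'F': case 'D': return C_BASE;
  }
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
      (c >= '0' && c <= '9') || c == '$' || c == '_')
    return C_IDENT;
  return C_OTHER;
}

using Table = std::array<std::array<State, NUM_CLASSES>, NUM_STATES>;

// Transition table (the role played by sigmap); unlisted moves are errors.
Table build_table() {
  Table t;
  for (auto &row : t) row.fill(S_ERROR);
  t[S_START][C_OPEN] = S_PARAMS;
  t[S_PARAMS][C_BASE] = S_PARAMS;
  t[S_PARAMS][C_ARRAY] = S_P_ARRAY;
  t[S_PARAMS][C_L] = S_P_L;
  t[S_PARAMS][C_CLOSE] = S_RET;
  t[S_P_ARRAY][C_BASE] = S_PARAMS;
  t[S_P_ARRAY][C_ARRAY] = S_P_ARRAY;
  t[S_P_ARRAY][C_L] = S_P_L;
  t[S_RET][C_BASE] = S_ACCEPT;
  t[S_RET][C_V] = S_ACCEPT;  // void is only legal as a return type
  t[S_RET][C_ARRAY] = S_R_ARRAY;
  t[S_RET][C_L] = S_R_L;
  t[S_R_ARRAY][C_BASE] = S_ACCEPT;
  t[S_R_ARRAY][C_ARRAY] = S_R_ARRAY;
  t[S_R_ARRAY][C_L] = S_R_L;
  // Inside a class name, these classes are all ordinary name characters.
  const CharClass ident[] = {C_IDENT, C_BASE, C_L, C_V};
  for (CharClass c : ident) {
    t[S_P_L][c] = t[S_P_NAME][c] = t[S_P_SLASH][c] = S_P_NAME;
    t[S_R_L][c] = t[S_R_NAME][c] = t[S_R_SLASH][c] = S_R_NAME;
  }
  t[S_P_NAME][C_SLASH] = S_P_SLASH;
  t[S_P_NAME][C_SEMI] = S_PARAMS;
  t[S_R_NAME][C_SLASH] = S_R_SLASH;
  t[S_R_NAME][C_SEMI] = S_ACCEPT;
  return t;
}

bool is_valid_method_descriptor(const std::string &sig) {
  static const Table table = build_table();
  State s = S_START;
  for (char c : sig) {
    s = table[s][classify(c)];
    if (s == S_ERROR) return false;
  }
  return s == S_ACCEPT;  // anything after the return type is rejected
}
```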

3.2.3 Locating Java Class Files

After linking to the JVM, the next most pressing decision to make is where and how to store the Java code. The simplest solution would be to store the Java classes on the file system of the same physical or virtual machine that the database instance is running on. This is, in fact, the solution which the ISO specifications [2, 5] assume systems will use. There are a number of potential problems with this solution, however. The most pressing concern is that this solution requires that all developers who will be allowed to write Java routines be given access to the file system that the database resides on. The database is a very sensitive resource, and access to the file system of its server is a very powerful privilege to grant. Further, this increases the surface area which must be reliably secured. Security concerns aside, storing Java code locally also makes administration more difficult, as database administrators would then have to work partly with the file system and partly with the database to properly manage privileges and resolve issues.

Figure 2. Method Signature Parser (state diagram, not reproduced: states 0 to 13, with transitions labelled (, ), [, L, /, ;, V, α and β)

Given these drawbacks, this project will not assume that Java code is stored in individual class files on the database server. Rather, the Java code will be stored in a table in the mysql schema. Of course, this means that ultimately the Java code is stored on the file system used by the database, but this code will be stored in files which are already secured and managed by the database itself, and administrative tasks related to this code can be carried out using only the features of the database. Table 5 describes the jclass table in which Java code will be stored.

The biggest drawback to storing Java code in the database itself is that the runtime environment will not know how to find it. Java code is located by the runtime with the use of Classloaders, and the default Classloader searches the file system directories specified by a classpath variable to find class bytecode representations when resolving new class references. However, for flexibility, custom Classloaders can be created which locate Java bytecode by other means, and in fact these Classloaders can be arranged hierarchically such that a Classloader delegates the task of locating a class first to its parent, and then employs its own techniques if the parent Classloader is unable to find the requested class (see [7] for a more thorough treatment of the subject).
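Parent-first delegation can be modelled as below. Loader and its label field are hypothetical; in the real system the chain consists of java.lang.ClassLoader subclasses, and the database-backed loader's own lookup would be the native findClass0 call rather than an in-memory map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Bytecode = std::vector<unsigned char>;

// Hypothetical model of parent-first delegation: each loader asks its
// parent first and consults its own source only if the parent fails.
class Loader {
 public:
  Loader(const Loader *parent, std::map<std::string, Bytecode> source,
         std::string label)
      : parent_(parent), source_(std::move(source)), label_(std::move(label)) {}

  // Returns the label of the loader that supplied the class, or an empty
  // string if no loader in the chain could find it.
  std::string load(const std::string &name) const {
    if (parent_ != nullptr) {
      std::string found = parent_->load(name);
      if (!found.empty()) return found;
    }
    return source_.count(name) != 0 ? label_ : std::string();
  }

 private:
  const Loader *parent_;
  std::map<std::string, Bytecode> source_;
  std::string label_;
};
```

Because the parent is always asked first, core classes such as java.lang.String keep coming from the standard loader even when a database loader sits at the end of the chain.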

To locate the Java class files stored in the jclass table, the custom Classloader MyClassLoader (see Listing 4) will be added to the Classloader chain of the first class defined as part of executing a Java routine. For performance reasons, the work of actually retrieving the class definition from the database is done by the native method findClass0, which makes a direct call into the MySQL table handler for the jclass table and retrieves the bytecode. A similar set of native calls for more general features such as executing queries, opening and iterating over cursors, and managing transactions could be the basis for a fully native JDBC driver.

Field             Type          Description
class_name        varchar(200)  The fully-qualified name of the class
package_name      varchar(100)  The package which the class resides in
internal_name     varchar(200)  The fully-qualified name of the class, in JVM internal format
library_name      char(50)      The name of the JAR archive which this class was loaded from
short_name        varchar(100)  The unqualified name of the class
major_version     tinyint(3)    The major class version number
minor_version     tinyint(3)    The minor class version number
platform_version  enum(...)     The Java platform version which this class was compiled under
is_interface      enum(...)     Indicates whether or not this class is an interface
modifiers         set(...)      Indicates what modifiers were listed for this class
size              int(10)       The size of the bytecode for this class, in bytes
created           timestamp     The date this class was loaded into the database
bytecode          longblob      The binary definition of this class

Table 5. The mysql.jclass Table

package edu.sacredheart.cs.myjvm.launcher;

public final class MyClassLoader extends ClassLoader {
  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException { ... }
  // Implemented in C++; fetches the class bytecode from mysql.jclass
  private native byte[] findClass0(String className);
}

Listing 4. The MyClassLoader Class

In addition to locating class files, the decision to store Java code in the database also raises the question of how to catalog methods and resources. As for methods, to simplify the design, only static class methods will be permissible as Java routines. Any attempt to allow instance methods to be used as Java routines would necessarily imply that the database has to have a means of creating class instances. Further, restricting routine definitions to statically defined methods imposes no loss of generality, since a static wrapper method could be written to perform any instantiation which the database itself could be expected to perform. To keep a catalog of which methods are available in which classes, the tool which loads classes into the mysql.jclass table should also populate the mysql.jmethod table described in Table 6. This table tracks which static methods are available in which classes, and provides method-level details for summary and analysis.


Field               Type           Description
signature           varchar(1000)  The fully-qualified class name and parameter list for the method
class_name          varchar(200)   The fully-qualified class name for the method
method_name         varchar(100)   The name of the method
method_descriptor   varchar(500)   The JNI method descriptor for the method
num_args            int(11)        The number of parameters the method accepts
has_return          enum(...)      Indicates whether or not this method has a return value
return_type         varchar(100)   If the method has a return value, the fully-qualified type which is returned
modifiers           set(...)       The list of modifiers which the method was defined with
throws_exceptions   enum(...)      Indicates whether or not this method throws any checked exceptions
exceptions          varchar(300)   A list of the exceptions thrown by this method, if any

Table 6. The mysql.jmethod Table

Field           Type               Description
resource_name   varchar(200)       The file name (minus the path)
file_name       varchar(300)       The file name, with the path included
package_name    varchar(100)       The name of the Java package which this resource is contained in
library_name    char(50)           The name of the JAR file which this resource was loaded from
size            int(10) unsigned   The size of this resource, in bytes
contents        longblob           The resource, represented in raw binary form

Table 7. The mysql.jresource Table

It is also necessary to track class resources in the database. Resources are file system objects which would be stored with the Java class file definitions and accessible at runtime. Frequently, this includes objects like property configuration files, XML-based configuration files, or documents like XSD schemas. As with class files, resource files will be stored in the database, in the mysql.jresource table defined in Table 7.

3.2.4 Creating Java Stored Routines

A primary goal of this project is that calling Java routines should be as similar as possible to calling native routines. From a design perspective, that means that the classes and tables presented in Section 3.1.1 should also be used to represent Java routines. Making all changes internally within the functions which are already defined in these classes will ensure that calling and executing Java routines is as seamless as possible.

CREATE DEFINER = 'root'@'localhost' PROCEDURE `hello`(
  IN str VARCHAR(100)
)
LANGUAGE JAVA PARAMETER STYLE JAVA
EXTERNAL NAME 'edu.sacredheart.cs.myjvm.hello.Hello(java.lang.String)'
DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT 'Outputs a greeting.';

Listing 5. A Basic Java Stored Routine

Column          Data Type            Change
external_name   varchar(1000)        Column added
language        enum('SQL','JAVA')   Column can now store either SQL or JAVA
is_external     enum('YES','NO')     Column added
body            longblob             Can now be null, since the 'body' of external routines is stored elsewhere
body_utf8       longblob             Can now be null, since the 'body' of external routines is stored elsewhere

Table 8. Changes to the mysql.proc table

Changes will obviously have to be made to the grammar itself, to accommodate the slightly different syntax required for defining Java routines. Note that there are directives in Listing 1 between the end of the parameter list and the beginning of the body of the routine. These directives are referred to as the characteristics of the routine. The ISO standard [5] distinguishes Java routines from native ones using a new set of options for these characteristics, as in Listing 5. Specifically, the LANGUAGE characteristic may now specify JAVA, and an optional PARAMETER STYLE JAVA characteristic may now appear. Routines defined in languages other than the database's native language are referred to as external routines in the specifications, and the characteristic EXTERNAL NAME is followed by a string which tells the database where to find the code for the routine. For example, in Listing 5, the bytecode for the class edu.sacredheart.cs.myjvm.hello should be in the mysql.jclass table, and this class should have a static method named Hello, described in the mysql.jmethod table, which takes a single String argument.
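To make the EXTERNAL NAME convention concrete, the sketch below splits such a string into class name, method name, and parameter types, following the reading above in which everything before the last dot of the qualified name is the class. ExternalName and parse_external_name are hypothetical; the sketch does not check that the names are legal Java identifiers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical decomposition of an EXTERNAL NAME string.
struct ExternalName {
  std::string class_name;               // everything before the last '.'
  std::string method_name;              // identifier after the last '.'
  std::vector<std::string> param_types; // source-level type names
};

// Returns false unless the string has the shape class.method(types).
bool parse_external_name(const std::string &ext, ExternalName *out) {
  auto open = ext.find('(');
  if (open == std::string::npos || ext.back() != ')') return false;
  std::string qualified = ext.substr(0, open);
  auto dot = qualified.rfind('.');
  if (dot == std::string::npos || dot == 0 || dot + 1 == qualified.size())
    return false;
  out->class_name = qualified.substr(0, dot);
  out->method_name = qualified.substr(dot + 1);
  out->param_types.clear();

  // Text between the parentheses, split on commas, spaces dropped.
  std::string params = ext.substr(open + 1, ext.size() - open - 2);
  std::string current;
  for (char c : params) {
    if (c == ',') {
      if (current.empty()) return false;  // empty parameter type
      out->param_types.push_back(current);
      current.clear();
    } else if (c != ' ') {
      current += c;
    }
  }
  if (!current.empty())
    out->param_types.push_back(current);
  else if (!out->param_types.empty())
    return false;  // trailing comma
  return true;
}
```

The parameter list recovered this way could then be turned into a JNI descriptor and matched against the method_descriptor column of mysql.jmethod.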

As mentioned in Section 3.1.1, the most important data structures in the creation of a stored procedure are the table mysql.proc and the class sp_head. The mysql.proc table will be modified as summarized in Table 8. The design of the sp_head class will not change very much (although the implementation of some of its functions will need modification), but a single new private class variable of type MyJThread will be added. See Listing 7 in Section 3.2.5 for a description of the MyJThread class. Note that the sp_head class in Listing 2 has a member variable of pointer type st_sp_chistics. This structure defines the characteristics of the routine, and the definition of this structure with the changes needed for Java routines is presented in Listing 6.


struct st_sp_chistics
{
  LEX_STRING comment;
  enum enum_sp_suid_behaviour suid;
  bool detistic;
  enum enum_sp_data_access daccess;
  enum enum_sp_lang splang;   // now SP_LANG_SQL or SP_LANG_JAVA
  bool external;              // true for routines with an EXTERNAL NAME
  LEX_STRING extname;         // the EXTERNAL NAME string
};

Listing 6. The modified st_sp_chistics structure

#include "myjvm.h"

class MyJThread
{
  MyJVM *myjvm;
  JNIEnv *env;
  THD *thd;
  jobject jclassLoader;
  MyJThread(const MyJThread &);
  void operator=(MyJThread &);

public:
  static void *operator new(size_t size, THD *mythd) throw ();
  static void operator delete(void *ptr) throw ();
  MyJThread();
  ~MyJThread();
  inline JNIEnv *get_env() { return env; };
  inline THD *get_thd() { return thd; };
  int run_jmethod(sp_head* const sph, int nargs, Item_field *retval);
private:
  int parseSignature(st_invocation *invk);
};

Listing 7. Class MyJThread

3.2.5 Calling Java Stored Routines

Once Java routines are created, the syntax for calling them will be exactly the same as for native routines. After the parsing of a CALL statement for a Java routine or a SELECT statement including a Java function, either the execute_procedure function or the execute_function function of the sp_head class is called. These functions will check the splang member of the characteristics structure, and if it indicates that the procedure is a Java routine, then a new instance of the MyJThread class summarized in Listing 7 is created.

The MyJThread class is intended to encapsulate all of the JNI code which is needed to invoke Java classes, so that the rest of the core server library can simply make use of the MyJThread API instead of using JNI directly. When a new MyJThread is created, the constructor attaches the current native thread to the JVM and creates an instance of the edu.sacredheart.cs.myjvm.launcher.MyClassLoader class to use for loading Java class files from the mysql.jclass table.

After creating a new MyJThread, the sp_head instance can call the run_jmethod function. This function will load the class of which the desired Java routine is a member, using the MyClassLoader instance created in the constructor. The function will then translate each of the parameters of the routine (which are currently of type Item_field) into data types that the JVM can use. This translation of data types from MySQL formats to JVM formats is complicated enough to deserve a class dedicated to it, which is the purpose of the JParam class summarized in Listing 8.

#include "jni.h"
#include "myjthread.h"

class JParam {
  MyJThread *jthd;
  jparam_type type;
  String *base_type_name;
  bool primitive;
  int arrdepth;
  jvalue jval;
  JParam(const JParam &);
  void operator=(JParam &);

public:
  static void *operator new(size_t size, MyJThread *jthread) throw ();
  static void operator delete(void *ptr) throw ();
  JParam(const char *type_name, bool is_primitive, int array_depth);
  ~JParam();
  int set_value(Item_field *ifld);
  int get_retval(Item_field *item, jvalue jni_ret, String **result);
  jparam_type get_type();
  jvalue get_jvalue();
  bool is_string(jobject obj);

private:
  jparam_type get_primitive_type(char ptype);
  int get_byteorder();
  // Swap the two bytes of a 16-bit code unit
  inline void endian_swap(unsigned short& x) { x = (x>>8) | (x<<8); };
  int get_ucs2_str(String *paramstr, String *ucs2str);
};

Listing 8. Class JParam
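The core of such a translation is choosing which member of JNI's jvalue union to fill for each parameter. The sketch below does this for textual input and a primitive type descriptor; JValueLike is a hypothetical stand-in for jvalue, and both the use of plain text as input and the ASCII-only handling of char are simplifying assumptions, not a description of JParam.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical mirror of JNI's jvalue union (jni.h defines the real one
// with jboolean, jbyte, jchar, jshort, jint, jlong, jfloat, jdouble, ...).
union JValueLike {
  bool z;
  std::int8_t b;
  std::uint16_t c;
  std::int16_t s;
  std::int32_t i;
  std::int64_t j;
  float f;
  double d;
};

// Narrow a parsed integer, rejecting values the Java type cannot hold.
template <typename T>
T narrow_checked(long long x) {
  if (x < static_cast<long long>(std::numeric_limits<T>::min()) ||
      x > static_cast<long long>(std::numeric_limits<T>::max()))
    throw std::out_of_range("value does not fit the Java type");
  return static_cast<T>(x);
}

// Fill the union member selected by a JVM primitive descriptor character.
JValueLike to_jvalue(const std::string &text, char descriptor) {
  JValueLike v{};
  switch (descriptor) {
    case 'Z': v.z = std::stoll(text) != 0; break;
    case 'B': v.b = narrow_checked<std::int8_t>(std::stoll(text)); break;
    case 'S': v.s = narrow_checked<std::int16_t>(std::stoll(text)); break;
    case 'I': v.i = narrow_checked<std::int32_t>(std::stoll(text)); break;
    case 'J': v.j = std::stoll(text); break;
    case 'F': v.f = std::stof(text); break;
    case 'D': v.d = std::stod(text); break;
    case 'C':
      // Only a single ASCII character is accepted here; a real version
      // would decode the column's character set into a UTF-16 code unit.
      if (text.size() != 1 || static_cast<unsigned char>(text[0]) > 0x7F)
        throw std::invalid_argument("expected one ASCII character");
      v.c = static_cast<std::uint16_t>(text[0]);
      break;
    default:
      throw std::invalid_argument("not a primitive type descriptor");
  }
  return v;
}
```

Rejecting out-of-range values, rather than silently truncating them, keeps a bad SQL argument from reaching the Java method as a different number.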

After converting all of the parameters to JVM types, the MyJThread instance will call the target method with JNI, passing in the converted parameters, and will store the return value. If an uncaught Java exception occurs while processing the method, then an error message is sent back to the client. Otherwise, if the return type was not void and the routine is a stored function, then the return value is set in the stored procedure runtime context and processing continues as normal. The invocation process is illustrated in Figure 3.
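The post-call decision logic just described might look like this. InvocationOutcome, CallContext, and finish_invocation are hypothetical; in a real JNI build the exception flag would come from ExceptionCheck, and the value would still need converting back to a MySQL type.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of what came back from the JVM call.
struct InvocationOutcome {
  bool exception_pending;      // e.g. the result of JNI's ExceptionCheck
  std::string exception_text;  // description of the uncaught exception
  bool returns_void;
  std::string return_value;    // assumed already converted to SQL text
};

enum class RoutineKind { Procedure, Function };

// Hypothetical stand-ins for the client error channel and the stored
// procedure runtime context.
struct CallContext {
  std::string client_error;
  std::string function_result;
  bool has_result = false;
};

// Returns true on success, false if an error was reported to the client.
bool finish_invocation(const InvocationOutcome &out, RoutineKind kind,
                       CallContext *ctx) {
  if (out.exception_pending) {
    ctx->client_error = "Uncaught Java exception: " + out.exception_text;
    return false;
  }
  // Only stored functions with a non-void return produce a value.
  if (kind == RoutineKind::Function && !out.returns_void) {
    ctx->function_result = out.return_value;
    ctx->has_result = true;
  }
  return true;
}
```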

4. Implementation

The design elements in Section 3 were implemented in a build of MySQL version 5.1.39. A number of changes were necessary to introduce the new design elements or modify the existing ones, but the most architecturally important ones involved linking to the virtual machine (Section 4.1), making the necessary changes to the lexical analyzer and the grammar (Section 4.2), creating a framework for native classloading (Section 4.3), and implementing the invocation of routines (Section 4.4).

4.1 Linking to the JVM

Since the Java runtime is not part of the standard MySQL build at all, the first major implementation issue is a modification of the build system to link the code to the JVM. Linking the code to the JVM requires that the static or shared libraries which export the functions needed by the MySQL code be available to the compiler and linker, and that the header files declaring any needed prototypes be available to the compiler. To meet these requirements, the shared library jvm.dll and the import library jvm.lib (for Windows platforms) were copied to the sql/lib source code directory, and the header file jni.h was copied to the sql/include source code directory. These files are available from any standard Java Development Kit.

Figure 3. Java Routine Invocation (sequence diagram, not reproduced: lifelines Server, Parser, SpHead, MyJThread, Loader, JParam and JVM; messages in order: parse(), create(), call(), create(), attach(), create(), run_jmethod(sp_head *sph, Item_field *params), loadClass(), jparams = create(params), invoke(jparams), then three return() messages)

MySQL uses a cross-platform build system named CMake¹ to manage the build process. CMake allows the developer to define abstract libraries, which can be sets of code files from the current project, code files from other projects, or shared native libraries. The CMake build system is rather interesting in that it does not actually build the project. Rather, it generates configuration files for the development environment or build system of your choice. For instance, on Windows platforms CMake can generate Visual Studio solution files, and on Linux platforms it can generate makefiles. The CMake system maintains a set of properties for each library that the user defines and allows these libraries to be linked; when it is run, the appropriate commands or syntax are generated in the target build system to carry out the declared directives. The major changes made to the CMake configuration file are presented in Listing 9.

Although making the shared JVM library, the import library (on Windows), and the jni.h header file available to the build system is enough to compile and link the application, the full Java Runtime Environment is required at execution time for the application to operate successfully. Further, the JRE must be compatible with the shared JVM library linked by the build system: compiling the application against a Java 6 JVM and then running it under a Java 5 JRE will likely lead to crashes.

1 http://www.cmake.org

# Directories into which jvm.dll/jvm.lib and jni.h were copied
SET (JVM_HOME ${PROJECT_SOURCE_DIR}/sql/lib )
SET (JNI_HOME ${PROJECT_SOURCE_DIR}/sql/include )

INCLUDE_DIRECTORIES( ${JNI_HOME} )

# Declare the prebuilt JVM as an imported shared library
ADD_LIBRARY(jvm SHARED IMPORTED)

SET_TARGET_PROPERTIES(jvm PROPERTIES
  IMPORTED_IMPLIB ${JVM_HOME}/jvm.lib
  IMPORTED_LOCATION ${JVM_HOME}/jvm.dll
  IMPORT_PREFIX ""
  IMPORT_SUFFIX .dll
)

# Link mysqld against the JVM along with its usual core libraries
SET (MYSQLD_CORE_LIBS mysys zlib dbug strings yassl taocrypt vio regex sql jvm)
TARGET_LINK_LIBRARIES(mysqld ${MYSQLD_CORE_LIBS} ${MYSQLD_STATIC_ENGINE_LIBS})

Listing 9. CMakeLists.txt

4.2 Modifying the Grammar

Fortunately, since the MySQL language for routines is already strongly compliant with the ISO standard [1], only a fairly small set of changes had to be made to the language processing subsystem. Since new keywords need to be added to the grammar, the first changes to make are in the lexical analyzer. MySQL uses a custom lexical analyzer which relies on constructing a perfect hash of symbols at compile time. The symbols are defined in lex.h, and the keywords JAVA, PARAMETER, STYLE, EXTERNAL, and NAME were added to the definition of syntactic symbols.
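MySQL builds its keyword table into a perfect hash at compile time. The sketch below uses a different, simpler technique, binary search over a sorted array, purely to illustrate how the five new keywords map to grammar tokens; the token values and names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical token codes; in MySQL these come from the Bison grammar.
enum Token { IDENT = 0, EXTERNAL_SYM, JAVA_SYM, LANGUAGE_SYM, NAME_SYM,
             PARAMETER_SYM, STYLE_SYM };

struct Symbol {
  const char *name;
  Token token;
};

// Must stay sorted by name for the binary search below.
const Symbol kSymbols[] = {
    {"EXTERNAL", EXTERNAL_SYM}, {"JAVA", JAVA_SYM},
    {"LANGUAGE", LANGUAGE_SYM}, {"NAME", NAME_SYM},
    {"PARAMETER", PARAMETER_SYM}, {"STYLE", STYLE_SYM},
};

// SQL keywords are case-insensitive, so normalise to upper case first;
// any word that is not a keyword is treated as an identifier.
Token lookup_keyword(std::string word) {
  for (char &c : word)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  const Symbol *begin = std::begin(kSymbols);
  const Symbol *end = std::end(kSymbols);
  const Symbol *it = std::lower_bound(
      begin, end, word, [](const Symbol &s, const std::string &w) {
        return std::strcmp(s.name, w.c_str()) < 0;
      });
  if (it != end && word == it->name) return it->token;
  return IDENT;
}
```

A perfect hash avoids the logarithmic search entirely, which is why MySQL generates one at build time; the sorted array is simply easier to show.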

For parsing, MySQL uses the GNU Bison parser generator. Bison creates the parser from a language specification grammar, which for MySQL is defined in the file sql_yacc.yy. For each symbol added to lex.h, a corresponding %token was added to the header of the grammar. The production rule for stored routine characteristics (see Section 3.1.1 for a discussion and examples of routine characteristics) was then modified as in Listing 10. Note the presence of the LANGUAGE_SYM JAVA_SYM rule, which allows a routine to be declared as a Java routine, and the EXTERNAL_SYM NAME_SYM TEXT_STRING_sys rule, which sets the external property of the sp_chistics object in the parse tree and stores the fully qualified name of the Java method in the extname field.

Beyond this, only two other changes to the grammar are necessary. A stored routine is normally ended with an sp_proc_stmt production rule, which can be a single statement or a BEGIN...END block. External routines will not have such a statement, however, as the "body" of an external routine is defined in a separate code file. Listing 11 relaxes the condition that a stored procedure statement cannot be empty for external routines. Additionally, stored functions must include a RETURN statement as one of the statements in the sp_proc_stmt body. However, external functions will not have such a RETURN statement, as the return value will be managed separately by the language runtime. Listing 12 shows modifications to the sf_tail production rule which relax this constraint for external functions.
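The combined effect of the two relaxed checks can be summarized as a single predicate. RoutineDef, CheckResult, and check_routine are hypothetical names; the actual checks live inside the Bison actions of Listings 11 and 12.

```cpp
#include <cassert>

enum class CheckResult { Ok, NoBody, NoReturn };

// Hypothetical summary of what the parser knows about a routine.
struct RoutineDef {
  bool is_function;
  bool external;    // LANGUAGE JAVA ... EXTERNAL NAME '...'
  bool has_body;    // a non-empty sp_proc_stmt was parsed
  bool has_return;  // the body contains a RETURN statement
};

CheckResult check_routine(const RoutineDef &r) {
  // External routines keep their body and return value in Java code.
  if (r.external) return CheckResult::Ok;
  // Native routines still need a body (cf. ER_SP_NOBODY in Listing 11).
  if (!r.has_body) return CheckResult::NoBody;
  // Native functions still need a RETURN (cf. ER_SP_NORETURN in Listing 12).
  if (r.is_function && !r.has_return) return CheckResult::NoReturn;
  return CheckResult::Ok;
}
```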

4.3 Classloading

Implementing native classloading was one of the most interesting challenges of this project. The design ideas were discussed in Section 3.2.3, but a number of choices remain for implementation.


/* Characteristics for both create and alter */
sp_chistic:
    COMMENT_SYM TEXT_STRING_sys
    { Lex->sp_chistics.comment= $2; }
  | LANGUAGE_SYM SQL_SYM
    { Lex->sp_chistics.splang= SP_LANG_SQL; }
  | LANGUAGE_SYM JAVA_SYM
    { Lex->sp_chistics.splang= SP_LANG_JAVA; }
  | PARAMETER_SYM STYLE_SYM JAVA_SYM
    { /* Parse, but take no other action at this time */ }
  | EXTERNAL_SYM NAME_SYM TEXT_STRING_sys
    { Lex->sp_chistics.external= TRUE; Lex->sp_chistics.extname= $3; }
  | NO_SYM SQL_SYM
    { Lex->sp_chistics.daccess= SP_NO_SQL; }
  | CONTAINS_SYM SQL_SYM
    { Lex->sp_chistics.daccess= SP_CONTAINS_SQL; }
  | READS_SYM SQL_SYM DATA_SYM
    { Lex->sp_chistics.daccess= SP_READS_SQL_DATA; }
  | MODIFIES_SYM SQL_SYM DATA_SYM
    { Lex->sp_chistics.daccess= SP_MODIFIES_SQL_DATA; }
  | sp_suid
    {}
  ;

Listing 10. Bison Production Rule for Characteristics

sp_proc_stmt:
    /* Empty */
    {
      // Have to allow potentially empty routine body statements now for
      // external Java routines, but it should still be an error for native routines.
      if(!Lex->sp_chistics.external)
      {
        my_error(ER_SP_NOBODY, MYF(0));
        MYSQL_YYABORT;
      }
    }
  | sp_proc_stmt_statement
  | sp_proc_stmt_return
  | sp_proc_stmt_if
  | case_stmt_specification
  | sp_labeled_block
  | sp_unlabeled_block
  | sp_labeled_control
  | sp_proc_stmt_unlabeled
  | sp_proc_stmt_leave
  | sp_proc_stmt_iterate
  | sp_proc_stmt_open
  | sp_proc_stmt_fetch
  | sp_proc_stmt_close
  ;

Listing 11. Bison Production Rule for Routine Bodies

sp_proc_stmt /* $15 */
{
  THD *thd= YYTHD;
  LEX *lex= thd->lex;
  sp_head *sp= lex->sphead;

  if (sp->is_not_allowed_in_function("function"))
    MYSQL_YYABORT;

  lex->sql_command= SQLCOM_CREATE_SPFUNCTION;
  sp->set_stmt_end(thd);
  if ( !( (sp->m_flags & sp_head::HAS_RETURN) ||
          sp->m_chistics->external ) )
  {
    /* Error if a native function has no RETURN statement
       (not a problem for external functions, though) */
    my_error(ER_SP_NORETURN, MYF(0), sp->m_qname.str);
    MYSQL_YYABORT;
  }

Listing 12. Bison Production Rule for Function Returns

struct st_bytecode
{
  const char *name;
  size_t len;
  void *data;
};

struct st_bytecode *sp_find_jclass(THD *thd, const char *name);

Listing 13. The st_bytecode Structure

The structure st_bytecode (see Listing 13) was defined in the myjvm.h file to contain the fields necessary to represent a class file in memory, and the function sp_find_jclass was declared in sp.h; it returns the appropriate instance of this structure for the given class file name. Internally, sp_find_jclass calls the private db_find_jclass function (presented in Listing 14). This function opens the mysql.jclass table, placing the appropriate locks on it and retrieving a table handler (an instance of TABLE*) to operate on the table. An index scan is used to locate the desired row, and the values in the row are copied into the fields of an st_bytecode structure, which is handed back to the caller through the clazz output parameter.
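To make the contract of sp_find_jclass concrete without a running server, the sketch below replaces the mysql.jclass table with an in-memory map keyed by class name. It assumes, as Listing 13 suggests, that a successful lookup yields the class name, the byte length, and a pointer to the raw bytes, and that a missing class is reported as a null pointer. `JClassStore` is a hypothetical name; unlike db_find_jclass, which allocates from thd->mem_root, the store here owns the memory it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Same shape as st_bytecode in Listing 13.
struct st_bytecode {
  const char *name;
  size_t len;
  void *data;
};

// Hypothetical in-memory stand-in for the mysql.jclass table.
class JClassStore {
 public:
  void put(const std::string &name, std::vector<unsigned char> bytes) {
    Row &row = rows_[name];
    row.name = name;
    row.bytes = std::move(bytes);
  }

  // Returns a descriptor owned by the store, or nullptr if the class is not
  // present (the analogue of the table lookup failing).
  const st_bytecode *find(const std::string &name) {
    std::map<std::string, Row>::iterator it = rows_.find(name);
    if (it == rows_.end()) return nullptr;
    Row &row = it->second;
    row.desc.name = row.name.c_str();
    row.desc.len = row.bytes.size();
    row.desc.data = row.bytes.data();
    return &row.desc;
  }

 private:
  struct Row {
    std::string name;
    std::vector<unsigned char> bytes;
    st_bytecode desc{nullptr, 0, nullptr};
  };
  std::map<std::string, Row> rows_;  // map nodes keep returned pointers stable
};
```

A caller would pass `desc->data` and `desc->len` straight to DefineClass, exactly as the MyJThread code does with the real structure.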

static int
db_find_jclass(THD *thd, const char *name, st_bytecode **clazz)
{
  TABLE *table;
  int ret;
  char *ptr;

  *clazz= 0; // In case of errors
  if (!(table= open_jclass_table_for_read(thd, &open_tables_state_backup)))
    DBUG_RETURN(SP_OPEN_TABLE_FAILED);

  st_bytecode *tmp= (st_bytecode *) alloc_root(thd->mem_root, sizeof(st_bytecode));
  if ((ptr= get_field(thd->mem_root,
                      table->field[MYSQL_JCLASS_FIELD_INTERNAL_NAME])) == NULL)
  {
    ret= SP_GET_FIELD_FAILED;
    goto done;
  }
  tmp->name= ptr;
  tmp->len= table->field[MYSQL_JCLASS_FIELD_SIZE]->val_int();

  if ((ptr= get_field(thd->mem_root,
                      table->field[MYSQL_JCLASS_FIELD_BYTECODE])) == NULL)
  {
    ret= SP_GET_FIELD_FAILED;
    goto done;
  }

  tmp->data= ptr;
  (*clazz)= tmp;

  close_system_tables(thd, &open_tables_state_backup);
  table= 0;

  ret= SP_OK;
  ...
  DBUG_RETURN(ret);
}

Listing 14. The db_find_jclass Function

When an instance of the MyJThread class is first created, it calls the sp_find_jclass function, passing the class name edu.sacredheart.cs.myjvm.launcher.MyClassLoader, which was presented in Listing 4. The raw bytecode returned in the st_bytecode structure is passed to the JNI function DefineClass, which parses the bytecode and creates a Class object in memory. Recall that the findClass0 method of this classloader was declared native (see Listing 4). At runtime, a native method must be linked to a C or C++ function before it is called, or the JVM throws an UnsatisfiedLinkError (a subclass of LinkageError). Typically, this linkage is accomplished by writing the native function in a shared library according to strict naming conventions and then loading the library at runtime, usually with a System.loadLibrary call in the static initializer of the Java class that declares the native method. In this instance, however, it is more convenient to link the native method to its implementation directly with the JNI function RegisterNatives. The MyJThread constructor registers the findClass0 method of the MyClassLoader class to a function named get_jclass_bytes, which is presented in Listing 16. The constructor then creates an instance of the MyClassLoader class, which will be used as the defining classloader for the class that is called when the MyJThread object is executed. See Listing 15 for the full MyJThread constructor (with some exception handling elided).
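For comparison, the "strict naming conventions" that RegisterNatives sidesteps are the JNI short-name rules: without explicit registration, the JVM searches loaded libraries for a symbol made of `Java_`, the mangled fully qualified class name, an underscore, and the mangled method name. The helper below sketches those rules under the assumption that class and method names are ASCII (the full rules also cover Unicode characters and the longer names used for overloaded native methods); `jni_short_name` is a hypothetical helper, not part of MySQL/JVM.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Mangles one name component using the JNI escape rules (ASCII input only).
static std::string jni_mangle(const std::string &s) {
  std::string out;
  for (unsigned char ch : s) {
    if (ch == '.' || ch == '/') out += '_';  // package separator
    else if (ch == '_') out += "_1";
    else if (ch == ';') out += "_2";
    else if (ch == '[') out += "_3";
    else if (std::isalnum(ch)) out += static_cast<char>(ch);
    else {                                   // e.g. '$' becomes _00024
      char buf[8];
      std::snprintf(buf, sizeof buf, "_0%04x", static_cast<unsigned>(ch));
      out += buf;
    }
  }
  return out;
}

// Symbol the JVM would look up for a native method that was never registered.
std::string jni_short_name(const std::string &cls, const std::string &method) {
  return "Java_" + jni_mangle(cls) + "_" + jni_mangle(method);
}
```

For the classloader above, `jni_short_name("edu.sacredheart.cs.myjvm.launcher.MyClassLoader", "findClass0")` gives `Java_edu_sacredheart_cs_myjvm_launcher_MyClassLoader_findClass0`, a symbol the server would otherwise have to export from a shared library; RegisterNatives avoids that by binding the method to a function pointer already inside mysqld.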

MyJThread::MyJThread() {
  myjvm= MyJVM::getMyJVM();
  env= myjvm->attachThread();
  ...
  // Thread Bootstrapping: Natively define the MyClassLoader loader
  st_bytecode *my_cls_loader_cd= sp_find_jclass(thd,
      "edu.sacredheart.cs.myjvm.launcher.MyClassLoader");

  jclass myClassLoaderClass= env->DefineClass(my_cls_loader_cd->name, NULL,
      (jbyte *) my_cls_loader_cd->data, my_cls_loader_cd->len);
  ...
  jclass launchClassLoader= env->FindClass(MY_JAVA_ENTRY_CLASSLOADER);
  // Linkage: Native method registration
  JNINativeMethod nm;
  nm.name= (char *) "findClass0";
  nm.signature= (char *) "(Ljava/lang/String;)[B";
  nm.fnPtr= (void *) &get_jclass_bytes;  // fnPtr is a void*; C++ requires the cast
  env->RegisterNatives(launchClassLoader, &nm, 1);
  ...
  // Create classloader instance
  jmethodID myClassLoaderCtor= env->GetMethodID(myClassLoaderClass, "<init>", "()V");
  ...
  jclassLoader= env->NewObject(myClassLoaderClass, myClassLoaderCtor);
  ...
}

Listing 15. MyJThread Constructor

#include "myjthread.h"

jbyteArray JNICALL get_jclass_bytes(JNIEnv *env, jobject obj, jstring lkp_class_nm)
{
  // Get the THD* for this pthread
  THD *thd= my_pthread_getspecific_ptr(THD*, THR_THD);
  const char *class_name= env->GetStringUTFChars(lkp_class_nm, NULL);
  if (class_name == NULL)  // out of memory; an OutOfMemoryError is pending
    return NULL;
  st_bytecode *clazz= sp_find_jclass(thd, class_name);
  env->ReleaseStringUTFChars(lkp_class_nm, class_name);
  if (clazz == NULL)       // class not stored in mysql.jclass: return null to Java
    return NULL;
  jbyteArray data= env->NewByteArray(clazz->len);
  env->SetByteArrayRegion(data, 0, clazz->len, (jbyte *) clazz->data);
  return data;
}

Listing 16. MyClassLoader Callback

At this point, native classloading is set up. If the Java class invoked when the MyJThread instance runs refers to a class that is not yet defined in the runtime, the defining classloader of the invoking class (namely, the MyClassLoader instance created in the MyJThread constructor) first delegates to its parent classloader (the bootstrap classloader, in this instance). The bootstrap classloader cannot find the class definition, since it is stored in database tables rather than on the file system, so it reports failure. The MyClassLoader instance then calls the native method findClass0, which has been linked to the function get_jclass_bytes. Note that code running inside the JVM is calling the get_jclass_bytes function directly, in the same process space, rather than opening a new socket and making a database request over JDBC. This is a strong advantage of the tight coupling that JNI provides for an integration project like this, and it is easy to see how a similar set of linked functions could be used to construct a fully native JDBC driver.
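The lookup order described above, parent first and the database second, can be modelled without a JVM. In this sketch the parent loader and the mysql.jclass table are both hypothetical in-memory maps, and `resolve` simply reports which source supplied the bytes; it illustrates the delegation order only and is not the MyClassLoader code from Listing 4.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;
using ClassSource = std::map<std::string, Bytes>;

// Which source satisfied a class request.
enum class Origin { Parent, Database, NotFound };

// Parent-first delegation with a database fallback (hypothetical model).
Origin resolve(const ClassSource &parent, const ClassSource &database,
               const std::string &name, Bytes *out) {
  // 1. Ask the parent first; it cannot see classes stored in tables.
  ClassSource::const_iterator it = parent.find(name);
  if (it != parent.end()) {
    *out = it->second;
    return Origin::Parent;
  }
  // 2. Parent failed: fall back to the database-backed lookup, which in
  //    MySQL/JVM is the native callback get_jclass_bytes.
  it = database.find(name);
  if (it != database.end()) {
    *out = it->second;
    return Origin::Database;
  }
  // 3. Neither source has it; a Java loader would throw ClassNotFoundException.
  return Origin::NotFound;
}
```

Because the parent always wins, a class stored in mysql.jclass under the same name as a class on the parent's path would be shadowed; parent-first ordering is the standard Java delegation model, and it keeps core classes such as java.lang.String from being redefined by database contents.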

4.4 Invocation

With classloading set up and working, all that remains is to create the functions that invoke the target methods of Java routines.


This responsibility belongs to the run_jmethod function, which is presented in Listings 17 and 18. The first half of the function finds and loads the correct Java class file, retrieves the method details from the mysql.jmethod table, parses the method signature, and uses the JParam class to translate the routine parameters from MySQL data types to Java data types. The second half of the function makes JNI calls to execute the method, passing in the translated parameters, and saves the return value (if the return type is not void), which is then set as the return value of the routine.

The run_jmethod function starts by calling the sp_find_jmethod function. This function is similar to sp_find_jclass in that it accesses the mysql.jmethod table with native table handlers and stores the information for the desired method in a structure in memory. The parser in Figure 2 is then called on this structure; it parses the method signature and creates an array of parameter and return types appropriate for this method. The sp_find_jclass function is then called to get the bytecode for the class that defines the target method, and this class is defined with a JNI DefineClass call. Note that the instance of the MyClassLoader class created in the constructor is passed in this call to DefineClass. This makes the jclassLoader instance of the MyClassLoader class the defining classloader for the target class, which means that jclassLoader will be called upon whenever any other unknown class is encountered during this method call (as described in Section 4.3). The JNI ID of the target method is then retrieved with a JNI call to GetStaticMethodID. The reason for requiring the methods that implement Java routines to be static is evident here: if instance methods were allowed, which instance would the method be invoked on? No instance of the target class is readily available, although the class itself is, which makes invoking static methods straightforward.
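The parser of Figure 2 is not reproduced in this section, but the format it consumes is the JVM method descriptor defined by the Java Virtual Machine Specification, for example `(Ljava/lang/String;)[B` as registered in Listing 15: one String parameter, returning byte[]. The following is a minimal, independent sketch of such a parser. `parse_descriptor` and `MethodTypes` are hypothetical names, and producing one descriptor token per parameter plus a return token is an assumption about what the JParam array needs, not the project's actual data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result: each entry is one complete field descriptor,
// e.g. "I", "[B", "Ljava/lang/String;".
struct MethodTypes {
  std::vector<std::string> params;
  std::string ret;
};

// Reads one field descriptor starting at s[pos] and advances pos past it.
static bool read_type(const std::string &s, size_t &pos, std::string &out) {
  size_t start = pos;
  while (pos < s.size() && s[pos] == '[') ++pos;  // array dimensions
  if (pos >= s.size()) return false;
  char c = s[pos];
  if (c == 'L') {                                  // object: L<internal name>;
    size_t semi = s.find(';', pos);
    if (semi == std::string::npos) return false;
    pos = semi + 1;
  } else if (std::string("BCDFIJSZ").find(c) != std::string::npos) {
    ++pos;                                         // primitive type
  } else {
    return false;
  }
  out = s.substr(start, pos - start);
  return true;
}

// Parses a JVM method descriptor such as "(ILjava/lang/String;)[B".
bool parse_descriptor(const std::string &desc, MethodTypes &mt) {
  mt.params.clear();
  if (desc.empty() || desc[0] != '(') return false;
  size_t pos = 1;
  while (pos < desc.size() && desc[pos] != ')') {
    std::string t;
    if (!read_type(desc, pos, t)) return false;
    mt.params.push_back(t);
  }
  if (pos >= desc.size()) return false;            // missing ')'
  ++pos;                                           // skip ')'
  if (pos < desc.size() && desc[pos] == 'V') {     // void return
    mt.ret = "V";
    ++pos;
  } else if (!read_type(desc, pos, mt.ret)) {
    return false;
  }
  return pos == desc.size();                       // reject trailing characters
}
```

The same descriptor string is what GetStaticMethodID expects as its third argument, so a parser of this kind can validate a signature once and reuse it for both the JNI lookup and the parameter translation.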

The final loop in Listing 17 translates the routine parameters from their MySQL data type (an instance of the Item_field class) to their associated Java data type. This is done primarily with the set_value and get_jvalue functions of the JParam class, which is dedicated solely to translating data between MySQL types and Java types. All MySQL integer types can be translated to some Java numeric type, and the MySQL FLOAT and DOUBLE types translate as well. The exact-precision numeric type (DECIMAL) cannot be translated to any Java primitive, although in the future it could be translated to a Java object type. Note that MySQL allows integer data types to be either signed or unsigned, whereas Java's integer primitives (other than char) are always signed. This means that an incompatibility may arise at runtime if the value passed in an unsigned MySQL type is too large to fit into the corresponding signed Java type. For example, the TINYINT type in MySQL is a one-byte value, so its unsigned variant can store numbers from 0 to 255. The Java byte primitive is also a one-byte value, but it is always signed, so it can only hold values from −128 to 127. If an unsigned TINYINT with a value of 250 is passed to a method that accepts a byte, the translation fails and the caller is notified with an error (see the ER_JPARAM_CAST check in Listing 17).
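The overflow case above reduces to a range check before narrowing. The sketch below is hypothetical (the real check lives in JParam, which is not shown here): it takes a MySQL integer as a 64-bit value plus an unsigned flag, which is how MySQL items report integers, and converts it to a signed Java-sized type, reporting failure instead of silently wrapping. Java byte, short, int, and long are modelled as int8_t, int16_t, int32_t, and int64_t.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: convert a MySQL integer (64-bit value + unsigned flag)
// into a signed Java primitive type. Returns false if the value does not fit.
template <typename JavaT>
bool to_java_int(long long raw, bool is_unsigned, JavaT *out) {
  static_assert(std::numeric_limits<JavaT>::is_signed,
                "Java integer types are signed");
  if (is_unsigned) {
    // Reinterpret the 64-bit pattern as unsigned before comparing, so that
    // an unsigned BIGINT above 2^63 - 1 is not mistaken for a negative value.
    unsigned long long u = static_cast<unsigned long long>(raw);
    if (u > static_cast<unsigned long long>(std::numeric_limits<JavaT>::max()))
      return false;
    *out = static_cast<JavaT>(u);
  } else {
    if (raw < static_cast<long long>(std::numeric_limits<JavaT>::min()) ||
        raw > static_cast<long long>(std::numeric_limits<JavaT>::max()))
      return false;
    *out = static_cast<JavaT>(raw);
  }
  return true;
}
```

The TINYINT case from the text is `to_java_int<int8_t>(250, true, &b)`, which returns false, while the same value converts cleanly to a Java short with `to_java_int<int16_t>`.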

int MyJThread::run_jmethod(sp_head* const sph, int nargs, Item_field *retval)
{
  st_invocation *invk= sp_find_jmethod(thd, sph->m_chistics->extname.str);

  int sig_parse_ret= this->parseSignature(invk);
  ...
  st_bytecode *target_def= sp_find_jclass(thd, invk->className);
  ...
  jclass target_class= env->DefineClass(target_def->name, jclassLoader,
      (jbyte *) target_def->data, target_def->len);
  ...
  jmethodID target_method= env->GetStaticMethodID(target_class,
      invk->methodName, invk->internalSignature);
  ...
  sp_rcontext *rctx= thd->spcont;
  jvalue *target_method_params= (jvalue *) alloc_root(thd->mem_root,
      nargs * sizeof(jvalue));
  if(nargs)
  {
    for(int k= 0; k < nargs; k++)
    {
      Item_field *nxt_arg= (Item_field *) rctx->get_item(k);
      int cast_failed= invk->jparams[k]->set_value(nxt_arg);
      if(cast_failed)
      {
        my_error(ER_JPARAM_CAST, MYF(0), k+1, invk->fullSignature);
        return ER_JPARAM_CAST;
      }
      target_method_params[k]= invk->jparams[k]->get_jvalue();
    }
  }

Listing 17. Invoking Java Routines (Part 1)

Some special consideration is given to character types during data type translation. The MySQL CHAR and VARCHAR data types map to the Java types byte[], char[], or java.lang.String. The implementation of this mapping needs careful treatment, however, since MySQL and Java use different character set encodings for strings. MySQL allows character data to be stored in many encodings, as listed in Table 3. Java, on the other hand, represents all string and character data internally using the UTF-16 encoding. MySQL does not currently support UTF-16, although it does support the older UCS2 encoding, and UTF-16 is backwards-compatible with UCS2. The general procedure, then, is to convert MySQL strings from whatever encoding they are currently in to UCS2, and then create Java strings or characters from this UCS2 data. One more implementation issue remains: UCS2 is a multi-byte format (each character is represented by two bytes), so big-endian and little-endian systems expect different layouts of this data in memory. The JParam class therefore has functions to detect the endianness of the platform and to swap the bytes in each UCS2 character if the data is not in the byte order the platform expects. The allowed data type translations are listed in Table 9, in which square brackets indicate that the MySQL type can map to a one-dimensional array of the Java type, ‘Y’ indicates that the two types can map, and ‘S’ indicates that the two types can map but that unsigned values could overflow.
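The endianness handling described above can be sketched as follows. This is an illustration, not the JParam code: it assumes the incoming UCS2 bytes are big-endian (most significant byte first), detects the platform byte order at run time, and produces native-order 16-bit code units, which is the representation a JNI jchar array uses. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Detects the byte order of the running platform.
bool platform_is_little_endian() {
  const uint16_t probe = 0x0102;
  unsigned char first;
  std::memcpy(&first, &probe, 1);  // lowest-addressed byte of probe
  return first == 0x02;
}

// Converts big-endian UCS2 bytes into native-order 16-bit code units.
// Assumes n is even (two bytes per UCS2 character).
std::vector<uint16_t> ucs2_be_to_native(const unsigned char *bytes, size_t n) {
  assert(n % 2 == 0);
  std::vector<uint16_t> units(n / 2);
  const bool swap = platform_is_little_endian();
  for (size_t i = 0; i < units.size(); ++i) {
    uint16_t u;
    std::memcpy(&u, bytes + 2 * i, 2);  // raw copy, still big-endian
    if (swap) u = static_cast<uint16_t>((u >> 8) | (u << 8));
    units[i] = u;
  }
  return units;
}
```

An alternative design avoids the run-time check entirely by assembling each unit arithmetically as `(bytes[2*i] << 8) | bytes[2*i + 1]`, which yields the correct value on either byte order; the detect-and-swap form is shown because it matches the approach the text describes.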

The rest of the run_jmethod function is presented in Listing 18. After the routine parameters have been translated to Java types, the target method is invoked. If the return type is void, the function returns; otherwise, the return value is stored in the jni_ret variable. The JParam class is then used to translate this value back into a MySQL type, which is stored in the return value field. At this point the Java routine has been called successfully, and the MySQL server completes via its usual path and returns results to the caller over the network connection.
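Listing 18 stores the Java result in a jvalue union and then asks JParam::get_retval for a String that save_str_value_in_field can store. The sketch below imitates that second step with a hypothetical tagged union (`JValueLike`, `ReturnTag`) standing in for the real JNI jvalue so that it compiles without jni.h; rendering every result as decimal text is an assumption made for illustration, since get_retval itself is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the JNI jvalue union used in Listing 18.
union JValueLike {
  uint8_t z;   // boolean
  int8_t b;    // byte
  uint16_t c;  // char (UTF-16 code unit)
  int16_t s;   // short
  int32_t i;   // int
  int64_t j;   // long
  float f;
  double d;
};

enum class ReturnTag { Void, Boolean, Byte, Char, Short, Int, Long, Float, Double };

// Renders the member selected by tag as text. Returns false for void, which
// has no value to store. Mirrors the dispatch on return type in Listing 18.
bool render_return(ReturnTag tag, const JValueLike &v, std::string *out) {
  switch (tag) {
    case ReturnTag::Boolean: *out = v.z ? "1" : "0"; return true;
    case ReturnTag::Byte:    *out = std::to_string(v.b); return true;
    case ReturnTag::Char:    *out = std::to_string(v.c); return true;
    case ReturnTag::Short:   *out = std::to_string(v.s); return true;
    case ReturnTag::Int:     *out = std::to_string(v.i); return true;
    case ReturnTag::Long:    *out = std::to_string(static_cast<long long>(v.j)); return true;
    case ReturnTag::Float:   *out = std::to_string(v.f); return true;
    case ReturnTag::Double:  *out = std::to_string(v.d); return true;
    case ReturnTag::Void:
    default:                 return false;
  }
}
```

As in Listing 18, only the union member that matches the declared return type is read, which is why the type tag from the parsed signature must travel alongside the value.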


  jvalue jni_ret;

  switch(invk->jreturn->get_type())
  {
    case JPARAM_TYPE_VOID :
      env->CallStaticVoidMethodA(target_class, target_method,
          target_method_params);
      ...
      break;
    case JPARAM_TYPE_BOOLEAN :
      jni_ret.z= env->CallStaticBooleanMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_BYTE :
      jni_ret.b= env->CallStaticByteMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_CHAR :
      jni_ret.c= env->CallStaticCharMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_SHORT :
      jni_ret.s= env->CallStaticShortMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_INT :
      jni_ret.i= env->CallStaticIntMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_LONG :
      jni_ret.j= env->CallStaticLongMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_FLOAT :
      jni_ret.f= env->CallStaticFloatMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_DOUBLE :
      jni_ret.d= env->CallStaticDoubleMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_OBJECT :
      jni_ret.l= env->CallStaticObjectMethodA(target_class,
          target_method, target_method_params);
      break;
    case JPARAM_TYPE_UNKNOWN :
      // Fall-through
    default :
      return 1;
  }
  ...
  String *ret_bytes;
  if(invk->jreturn->get_retval(retval, jni_ret, &ret_bytes))
  {
    return 1;
  }
  retval->save_str_value_in_field(retval->field, ret_bytes);
  return 0;
}

Listing 18. Invoking Java Routines (Part 2)

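Java types (columns, in order B S I J F D Z C Str): byte, short, int, long, float, double, boolean, char, String
MySQL types (rows), each with its marks listed left to right:
CHAR: [] [] Y
BINARY: []
TEXT: [] [] Y
BLOB: none
ENUM: none
SET: none
BIT: S Y
TINYINT: S Y Y Y Y
BOOLEAN: S Y Y Y Y
SMALLINT: S Y Y
MEDIUMINT: S Y
INT: S Y
BIGINT: S
FLOAT: Y
DOUBLE: Y
DECIMAL: none
DATE: none
DATETIME: none
TIMESTAMP: none
TIME: none
YEAR: S Y Y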

Table 9. Allowed data type translations

5. Summary and Future Work

The design of the project’s major architectural elements leaves room for the addition of several new features in the future. Several optimization features could improve the performance or memory profile of Java routines, and a number of additional features could be added which would considerably extend the flexibility or usability of Java routines in the server.

For performance optimization and administration purposes, it would be interesting to create several new server variables related to Java routines. MySQL server variables control many aspects of the system, such as the size of certain object caches and memory pools. Modifying these configuration variables is an important part of performance optimization, and it could be important to manage JVM variables in the same way. For instance, such variables could control the amount of memory allocated to the JVM, or the size of the stack allocated for each thread.
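As an illustration of how such variables might be applied, the sketch below turns two hypothetical server settings (an initial and a maximum heap size, in bytes) into the standard HotSpot `-Xms` and `-Xmx` option strings, which would then be handed to the JVM when it is created, for example through the JavaVMOption array of the JNI invocation API. The setting names are invented for this example; MySQL/JVM defines no such variables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Formats a byte count using the largest exact JVM size suffix (g, m, k).
std::string jvm_size(unsigned long long bytes) {
  const unsigned long long kib = 1024ULL, mib = kib * 1024, gib = mib * 1024;
  if (bytes != 0 && bytes % gib == 0) return std::to_string(bytes / gib) + "g";
  if (bytes != 0 && bytes % mib == 0) return std::to_string(bytes / mib) + "m";
  if (bytes != 0 && bytes % kib == 0) return std::to_string(bytes / kib) + "k";
  return std::to_string(bytes);
}

// Hypothetical server settings for the embedded JVM (0 means "JVM default").
struct JvmSettings {
  unsigned long long java_heap_initial;  // initial heap size, bytes
  unsigned long long java_heap_max;      // maximum heap size, bytes
};

// Builds the option strings to pass to the JVM at creation time.
std::vector<std::string> jvm_options(const JvmSettings &s) {
  std::vector<std::string> opts;
  if (s.java_heap_initial) opts.push_back("-Xms" + jvm_size(s.java_heap_initial));
  if (s.java_heap_max) opts.push_back("-Xmx" + jvm_size(s.java_heap_max));
  return opts;
}
```

Heap settings fit this scheme well because the JVM reads them once at startup. A per-thread stack limit is less direct: threads that MySQL/JVM attaches to the JVM keep the native stack that mysqld gave them, so a stack variable would have to be coordinated with the server's own thread stack setting rather than expressed as a JVM option.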

To further optimize performance, a caching structure could be implemented for the Java bytecode lookups and the Java routine definitions. MySQL makes use of caches for many other database objects, including stored procedure definitions, statements, and some result sets, all to good effect.
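One possible shape for such a cache is a small least-recently-used map in front of the table lookup, so that repeated calls to the same routine skip the index scan on mysql.jclass. The sketch below is hypothetical (`BytecodeCache` and its loader callback are not part of MySQL/JVM), and it deliberately ignores two things a server cache would need: locking for concurrent connections and invalidation when a class is replaced in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical LRU cache in front of a slow class-file lookup.
class BytecodeCache {
 public:
  // loader fills *out and returns true, or returns false if the class is unknown.
  BytecodeCache(size_t capacity,
                std::function<bool(const std::string &, Bytes *)> loader)
      : capacity_(capacity), loader_(std::move(loader)) {
    assert(capacity_ > 0);
  }

  // Returns nullptr for unknown classes; the pointer is valid until eviction.
  const Bytes *get(const std::string &name) {
    auto it = index_.find(name);
    if (it != index_.end()) {
      // Hit: move the entry to the front (most recently used).
      order_.splice(order_.begin(), order_, it->second);
      return &it->second->second;
    }
    Bytes bytes;
    if (!loader_(name, &bytes)) return nullptr;  // not in the backing store
    if (order_.size() == capacity_) {            // evict least recently used
      index_.erase(order_.back().first);
      order_.pop_back();
    }
    order_.emplace_front(name, std::move(bytes));
    index_[name] = order_.begin();
    return &order_.front().second;
  }

 private:
  using Entry = std::pair<std::string, Bytes>;
  size_t capacity_;
  std::function<bool(const std::string &, Bytes *)> loader_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

In the server, the loader callback would wrap sp_find_jclass; a list plus hash map gives constant-time hits and evictions, which keeps the cache cheap relative to the table access it replaces.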

It would be interesting to integrate the Java Authentication and Authorization Service (JAAS) into the system, so that user-based access control could be seamlessly integrated. Such a security solution would ideally also involve extending the set of available GRANT and REVOKE targets, so that database administrators could manage access to sensitive Java resources the same way they manage access to sensitive database resources.

The most interesting additional feature that could be added to this framework would be support for a fully native JDBC driver. A native driver should perform considerably better for JDBC calls than one that routes requests and responses over TCP, since it avoids protocol encoding and socket round trips, which are incurred even when the packets travel over the loopback interface on the server. A native JDBC driver would take full advantage of the fact that the JVM and the database are running in the same process space, and the native classloader described in Section 4.3 demonstrates that the framework could support such a driver.

An even more ambitious goal would be to implement the other half of the ISO specification and bring user-defined types to MySQL using the Java language. This would require a much more extensive change to the grammar than the one implemented here, but the payoff could be worth the effort, as MySQL does not support any form of user-defined types at the present time.

Finally, the data type translation layer could be extended. This layer currently supports translations for basic integer and floating-point types, as well as character and string types. Support could be added for date and time types, exact numeric types, and even more exotic types like ENUM, SET, or GEOMETRY.
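For DATE, one possible mapping is a count of days since 1970-01-01, the representation behind java.time.LocalDate.ofEpochDay and LocalDate.toEpochDay. The sketch below computes that count from a MySQL year/month/day triple using the well-known "days from civil" algorithm for the proleptic Gregorian calendar. Choosing this representation is an assumption for illustration, not a design the project specifies; MySQL's zero date ('0000-00-00') has no such mapping and would have to be rejected or translated to null.

```cpp
#include <cassert>

// Days since 1970-01-01 for a proleptic Gregorian date. Assumes a valid,
// non-zero MySQL DATE (month 1-12, day 1-31).
long long days_from_civil(long long y, unsigned m, unsigned d) {
  assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
  y -= (m <= 2) ? 1 : 0;                               // count years from March
  const long long era = (y >= 0 ? y : y - 399) / 400;  // 400-year cycles
  const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<long long>(doe) - 719468;
}
```

For example, 2000-03-01 maps to day 11017, and 1969-12-31 maps to -1, so dates before the epoch stay representable as a Java long.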

The primary goal of this project, however, which was to build a robust and extensible framework for linking the MySQL database server to the Java runtime environment, has been achieved. The MySQL/JVM framework provides a fully functional environment for loading, creating, and calling Java routines; a manageable framework for storing and locating class files; and a well-encapsulated API for invoking Java methods and translating data types. Further optimizations could be applied, and more features could be added, but even as it stands now, the system brings the power of the Java language and its class library to MySQL stored routines.

References[1] I. O. for Standardization (ISO). Information technology–database

language–sql, standard no. iso/iec 9075:2003, 2003.

[2] I. O. for Standardization (ISO). Information technology–databaselanguage–sql– part 13: Sql routines and types using the java program-ming language (sql/jrt), standard no. iso/iec 9075-13:2003, 2003.

[3] I. O. for Standardization (ISO). Information technology–databaselanguage–sql– part 4: Persistent stored modules (sql/psm), standardno. iso/iec 9075-4:2003, 2003.

[4] I. O. for Standardization (ISO). Information technology–databaselanguage–sql, standard no. iso/iec 9075:2008, 2008.

[5] I. O. for Standardization (ISO). Information technology–databaselanguage–sql– part 13: Sql routines and types using the java program-ming language (sql/jrt), standard no. iso/iec 9075-13:2008, 2008.

[6] L. Gong and G. Ellison. Inside Java(TM) 2 Platform Security: Archi-tecture, API Design, and Implementation. Pearson Education, 2003.ISBN 0201787911.

[7] J. Gosling, B. Joy, G. Steele, and G. Bracha. Java Language Specifi-cation, Second Edition: The Java Series. Addison-Wesley LongmanPublishing Co., Inc., Boston, MA, USA, 2000. ISBN 0201310082.

[8] S. Liang. Java Native Interface: Programmer’s Guide and Reference.Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA,1999. ISBN 0201325772.

[9] T. Lindholm and F. Yellin. Java Virtual Machine Specification.Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA,1999. ISBN 0201432943.

[10] S. Pachev. Understanding MySQL Internals. O’Reilly Media, Inc.,2007. ISBN 0596009577.

[11] M. Widenius and D. Axmark. Mysql Reference Manual. O’Reilly &Associates, Inc., Sebastopol, CA, USA, 2002. ISBN 0596002653.

A. Tables, Figures, and Listings

List of Tables
1 ISO/IEC 9075:2008 Substandards
2 SQL/JRT Feature Sets
3 Supported Character Sets
4 Table mysql.proc
5 The mysql.jclass Table
6 The mysql.jmethod Table
7 The mysql.jresource Table
8 Changes to the mysql.proc table
9 Allowed data type translations

List of Figures
1 MySQL New Thread Prolog
2 Method Signature Parser
3 Java Routine Invocation

List of Listings
1 A Basic MySQL Stored Routine
2 The sp_head Class
3 The MyJVM Class
4 The MyClassLoader Class
5 A Basic Java Stored Routine
6 The modified st_sp_chistics structure
7 Class MyJThread
8 Class JParam
9 CMakeList.txt
10 Bison Production Rule for Characteristics
11 Bison Production Rule for Routine Bodies
12 Bison Production Rule for Function Returns
13 The st_bytecode Structure
14 The db_find_jclass Function
15 MyJThread Constructor
16 MyClassLoader Callback
17 Invoking Java Routines (Part 1)
18 Invoking Java Routines (Part 2)