FA_RPE_UDF2_0405EN_01


  • 8/8/2019 FA_RPE_UDF2_0405EN_01


    User Defined Functions in DB2
    by Rosmarie Peter, Trivadis AG

    User Defined Functions (UDFs) enable users to write their own functions which can be used in SQL statements. This article describes how this option can be used for DB2 UDB for z/OS and OS/390. The difference between UDFs for DB2 mainframe installations and DB2 in the Linux, Unix and Windows environment is small and is mostly restricted to differences in the operating systems. Current practice is to develop UDFs and stored procedures for both environments. This article focuses on those issues where external UDFs differ from stored procedures (SPs).

    1 Introduction

    1.1 What are functions?

    Functions vastly enhance the power of SQL. They are invoked with SQL language elements, in other words, from within the SELECT clause, the FROM clause, or the WHERE clause, depending on the type of function.

    Functions can be classified differently:

    - Built-in functions: These are called built-in since they are incorporated in the supplied DB2 code. Examples of built-in functions are MAX and SUBSTR.
    - User Defined Functions (UDFs): These are functions written by the user which can be used in SQL statements. They are written by customers, or by IBM itself. Examples of UDFs supplied by IBM are the MQ functions, or the functions included in extenders.

    Functions can also be classified in another way:

    - Column functions: The input is a collection of column values; they return a single value. Examples are SUM, MIN, MAX. They are used in the SELECT clause. In the WHERE clause, these functions must be embedded in subselects. Users cannot write their own new column functions. New column functions can only be made available as sourced UDFs, typically for User Defined Types.
    - Scalar functions have one or more input values as function parameters. The function returns a scalar value. SUBSTR is an example of this type of function. Scalar functions can be used in the SELECT or WHERE clause wherever a single value is permitted. Most built-in functions supplied with DB2 are scalar. But it is also possible for users to write their own scalar functions and to make them available as UDFs. Scalar UDFs can be written in the SQL Procedure Language or in other programming languages; the latter are called external functions.
    - Table functions have one or more input values as function parameters. The table function returns a table. It is used in the FROM clause. There are no built-in table functions. All table functions are external functions.


    Functions are part of the SQL standard. But the DBMSs differ considerably with regard to built-in functions. With User Defined Functions, the degree of portability can be significantly improved.

    1.2 User Defined Functions

    User Defined Functions (UDFs) enable users to write their own functions, which can be used in SQL, DDL or DML statements. Three types of UDF exist:

    - Sourced UDFs are based on an existing function. The base function can either be a built-in function or another UDF.
    - Scalar functions can be written in a high-level programming language or in SQL Procedure Language.
    - Table functions must be written in a high-level programming language.

                          Scalar   Column   Table
    Built-in function     yes      yes      -
    UDF   SQL             yes      -        -
          External        yes      -        yes
          Sourced         yes      yes      -

    Other UDF features:

    - All UDFs are entered in the DB2 catalog. This is done using a CREATE FUNCTION statement.
    - The name of the UDF consists of a schema name and the name of the function.
    - Relational and other data, such as IMS or flat files, can be read within UDFs.
    - UDFs can be altered to a certain extent.
    - UDFs can be nested by up to 16 layers. The call hierarchy can contain both UDFs and stored procedures.
    - UDFs can be invoked from triggers.
    - UDFs always run under WLM control.
    - UDFs offer the option of function overload. Several functions can be defined with the same name, only differing from each other in their parameters. The number of parameters can differ, as well as their data types. When the function is invoked, the DBMS will be able to select and execute the correct UDF.

    1.3 Why UDFs?

    The functional scope of SQL can be considerably enhanced with UDFs. Built-in functions are very useful; however, they may not always cover all requirements. Reasons for using UDFs:

    - Special transformations, for example, converting the account number from an internal to an external format.
    - Simple calculations, for example, company-specific calculation of years of service.
    - Option of standardization.
    - Built-in functions for User Defined Types by means of sourced functions.
    - Migration from other DBMSs: The different DBMSs differ considerably in the scope and specification of their supplied functions. Many functions used frequently in other DBMSs have different names or parameters, or they do not exist in DB2 at all.
    - Complex SQL logic can be embedded in UDFs. This enables users to write simpler SQL statements.

    2 Sourced UDFs

    Sourced UDFs are based on existing functions. They are absolutely necessary if User Defined Types are being used. The built-in functions cannot simply be applied to User Defined Types. If they are required, then UDFs based on the desired built-in functions must be created, as indicated in the example below.


    CREATE DISTINCT TYPE KM
      AS INTEGER
      WITH COMPARISONS;

    CREATE FUNCTION KM_MAX(KM)
      RETURNS KM
      SOURCE SYSIBM.MAX(INTEGER);

    3 Generating external UDFs

    The following are the steps involved in creating UDFs:

    - CREATE FUNCTION: Introduces a UDF to DB2. CREATE FUNCTION needs to be used once. The function program itself can be written after CREATE FUNCTION has been issued. If the definitions need to be changed later, this can be done using ALTER.
    - Writing the program. This can be done in one of the following languages: C, C++, COBOL, PL/I, Java. Naturally, the particular features of the individual languages have to be taken into account for UDFs, too.
    - The program must then be compiled and bound like any other DB2 program; a package has to be created for it.

    The procedure for creating UDFs is very similar to that for stored procedures. Some particular features of functions, which are discussed further on, do have to be taken into account:

    - DETERMINISTIC option
    - Linkage convention
    - SCRATCHPAD option
    - FINAL CALL option
    - Program logic for scalar functions
    - Program logic for table functions

    3.1 DETERMINISTIC option

    This option is one of many that must be defined for CREATE FUNCTION. If a function is defined as DETERMINISTIC, it means that it always returns the same result for the same input. SUBSTR is an example of this type of function. RAND, on the other hand, is not deterministic, because RAND will return a different result every time it is invoked. The default is NOT DETERMINISTIC. Since the description is far from spectacular, it would be all too easy to dispense with this option. This can be harmful, as the following example illustrates:


    The UDF in this example does nothing more than convert a number from an internal to an external format. Several minutes of processing time for a simple statement are unacceptable.

    The reason for the processing time was obvious from the EXPLAIN result: a tablespace scan was executed for the SELECT statement in the bracketed section of the FROM clause. The result was then materialized and read with another tablespace scan. A look at the SQL Reference Manual gave us our explanation.

    Readers should take a close look at this short text. NOT DETERMINISTIC can mean a worse access path and unexpected results.

    In the example above, the definition of the UDF was changed to DETERMINISTIC. The result was delivered a split second later, since the existing index was used for the SELECT in the bracketed section.

    3.2 Linkage for UDFs

    The linkage convention defines how the UDF communicates externally. The structure corresponds mostly to the structure of stored procedures that have been defined with PARAMETER STYLE DB2SQL and DBINFO.


    There are two additional fields:

    - The scratchpad: It is possible to define an area where information can be passed on from one call to the next.
    - The CALL type: This is used for program control.

    As is the case for stored procedures, SQLSTATEs can be defined with UDFs. Whatever is inserted in the SQLSTATE field is visible to the invoking element in the SQLCA, together with the text from the diagnostic data.
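The parameter list described above can be pictured with a small C sketch. It is not taken from the DB2 manuals: the function name udf_safe_div, the struct udf_scratchpad, the buffer sizes and the SQLSTATE value 38601 are assumptions chosen for illustration, and the argument order is only modelled on the PARAMETER STYLE DB2SQL convention (inputs, result, null indicators, SQLSTATE, names, diagnostic text, scratchpad, call type). DBINFO is left out.

```c
#include <stdint.h>
#include <string.h>

/* Assumed scratchpad layout: a length field followed by the data area. */
struct udf_scratchpad {
    int32_t length;
    char    data[100];
};

/*
 * Sketch of a scalar UDF entry point: two INTEGER inputs, one INTEGER result.
 * Argument order and sizes are assumptions, not a verified DB2 signature.
 */
void udf_safe_div(const int32_t *dividend, const int32_t *divisor, int32_t *result,
                  const short *dividend_null, const short *divisor_null,
                  short *result_null,
                  char sqlstate[6], const char *func_name, const char *specific_name,
                  char diag_msg[71],
                  struct udf_scratchpad *pad, const int32_t *call_type)
{
    /* Not needed for this simple function. */
    (void)func_name; (void)specific_name; (void)pad; (void)call_type;

    /* A valid SQLSTATE must be set on every return path. */
    strcpy(sqlstate, "00000");
    diag_msg[0] = '\0';

    /* A negative indicator means the input is NULL: return NULL. */
    if (*dividend_null < 0 || *divisor_null < 0) {
        *result_null = -1;
        return;
    }

    /* Report an application error through SQLSTATE and diagnostic text. */
    if (*divisor == 0) {
        strcpy(sqlstate, "38601");              /* hypothetical user-defined state */
        strcpy(diag_msg, "division by zero");
        *result_null = -1;
        return;
    }

    *result = *dividend / *divisor;
    *result_null = 0;
}
```

The invoking statement would see the SQLSTATE and the diagnostic text set here, which is the mechanism the paragraph above refers to.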

    3.3 SCRATCHPAD option

    A scratchpad is a memory area supplied by DB2 for passing information from one call of the UDF to the next. CREATE FUNCTION defines whether a scratchpad will be created for a UDF and what size it should be.

    DB2 provides one scratchpad per

    - SQL statement
    - occurrence within the SQL statement
    - parallel task

    We will use an example to show what that means. Let's take a look at the following statement:

    SELECT MYUDF(C1,1), MYUDF(C2,1)
      FROM TABA;

    Suppose the optimizer decides to execute this statement in three parallel tasks. With two occurrences of the UDF in the statement, this means that DB2 makes 2 x 3 = 6 scratchpads available!

    The scratchpad is initialized by DB2 to X'00'. The programmers themselves are responsible for complying with the maximum length. Initializing scratchpads is time-consuming, so if scratchpads have to be initialized for singleton SELECTs, users will immediately notice longer response times.
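A minimal C sketch may make the scratchpad rules more concrete. It assumes a scratchpad made up of a length field followed by a data area that has been set to binary zeros before the first call; the names udf_scratchpad, PAD_DATA_LEN, scratchpads_needed and count_calls are hypothetical. The first function reproduces the arithmetic of the example (occurrences times parallel tasks), the second keeps a call counter in the scratchpad and refuses to write beyond its length.

```c
#include <string.h>

#define PAD_DATA_LEN 100   /* assumed size of the scratchpad data area */

/* Assumed layout: length of the data area, then the data area itself. */
struct udf_scratchpad {
    long length;
    char data[PAD_DATA_LEN];
};

/* One scratchpad per UDF occurrence in the statement and per parallel task. */
int scratchpads_needed(int udf_occurrences, int parallel_tasks)
{
    return udf_occurrences * parallel_tasks;
}

/*
 * Keeps a running call counter in the scratchpad.
 * Returns the new counter value, or -1 if the counter would not fit.
 */
long count_calls(struct udf_scratchpad *pad)
{
    long counter;

    /* The program, not DB2, has to respect the scratchpad length. */
    if (pad->length < (long)sizeof counter) {
        return -1;
    }

    /* The area starts as binary zeros, so the first call reads 0. */
    memcpy(&counter, pad->data, sizeof counter);
    counter += 1;
    memcpy(pad->data, &counter, sizeof counter);
    return counter;
}
```

Because each occurrence and each parallel task has its own scratchpad, a counter like this one counts calls per scratchpad, not per statement.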

    3.4 FINAL CALL option

    With FINAL CALL, DB2 makes one special call to the UDF for initialization tasks and an extra call for finalization tasks. These initialization and termination calls are invoked per

    - SQL statement
    - occurrence within the SQL statement
    - parallel task

    FINAL CALL must be specified if special resources have to be allocated for the UDF. These must then be explicitly released in the final call.

    3.5 Program logic for scalar functions

    The program control of UDFs is decided by CALL TYPE, as can be seen from the following pseudo code:

    If FINAL CALL is specified, then CALL TYPE is set by DB2:


    > -1 first call
    > 0 normal call
    > +1 final call
    > 255 final call, if the calling application terminates the unit of work

    Without FINAL CALL, the CALL_TYPE for scalar UDFs is irrelevant.

    Error messages:

    > UDF_SQLSTATE
    > UDF_DIAG_MSG

    Scalar functions are used in the SELECT or WHERE clause and can thus vastly extend the functional scope of SQL. They may be invoked several times per SQL statement. For this reason, performance aspects must be given due consideration when writing the function.
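The call-type handling described for scalar functions can be sketched in C as follows. This is an illustration under assumptions, not the pseudo code of the original article: the names scalar_udf and scalar_state and the SQLSTATE values 38601 to 38603 are hypothetical, the "work" is simply doubling the input, and the sketch assumes that the first call (-1) also carries argument values and therefore falls through to normal processing.

```c
#include <stdlib.h>
#include <string.h>

/* Call types for scalar UDFs defined with FINAL CALL. */
enum {
    CALL_FIRST         = -1,
    CALL_NORMAL        = 0,
    CALL_FINAL         = 1,
    CALL_FINAL_UOW_END = 255
};

/* State that would be kept in the scratchpad between calls (hypothetical). */
struct scalar_state {
    long *calls;   /* resource allocated on the first call */
};

void scalar_udf(long call_type, struct scalar_state *st,
                long in, long *out, char sqlstate[6])
{
    strcpy(sqlstate, "00000");

    switch (call_type) {
    case CALL_FIRST:
        /* Initialization: allocate the resource once. */
        st->calls = calloc(1, sizeof *st->calls);
        if (st->calls == NULL) {
            strcpy(sqlstate, "38602");
            return;
        }
        /* fall through */
    case CALL_NORMAL:
        if (st->calls == NULL) {          /* normal call without first call */
            strcpy(sqlstate, "38603");
            return;
        }
        *st->calls += 1;
        *out = in * 2;                    /* the actual function logic */
        break;
    case CALL_FINAL:
    case CALL_FINAL_UOW_END:
        /* Finalization: release what the first call allocated. */
        free(st->calls);
        st->calls = NULL;
        break;
    default:
        strcpy(sqlstate, "38601");        /* unexpected call type */
        break;
    }
}
```

Keeping the normal path short matters here, since it runs once per qualifying row.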

    3.6 Program logic for table functions

    A table UDF returns a table. It can thus be used as an alternative to stored procedures with result sets. Table UDFs are used in FROM clauses:

    SELECT COUNT(*)
      FROM TABLE(TEST.TABUDF(1,2)) AS A;

    For DB2, a table UDF behaves much like a cursor:

    - DB2 sends the UDF program a message to open the cursor.
    - In the subsequent calls, the UDF program receives the command to execute a FETCH and to return a result row. If no more rows are available, the program sets SQLSTATE 02000.
    - At the end, DB2 requests the UDF program to execute CLOSE for the cursor.

    The requested program logic is again controlled by CALL TYPE.

    Call types without FINAL CALL:

    > -1 open call
    > 0 fetch call
    > +1 close call

    Call types with FINAL CALL:

    > -2 first call
    > -1 open call
    > 0 fetch call
    > +1 close call
    > +2 final call
    > 255 final call, if the unit of recovery (UOR) is terminated by the calling application

    Error messages:

    > UDF_SQLSTATE
    > UDF_DIAG_MSG

    Table UDFs are invoked in the FROM clause. They can also be used as part of join operations. Depending on the access path selected by the optimizer, table functions can also become inner tables. To give the optimizer the chance to select an efficient access path, the CARDINALITY parameter should be specified for CREATE FUNCTION. It specifies the expected number of result rows.
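The open, fetch and close sequence of a table UDF defined without FINAL CALL can be sketched in C like this. The example is hypothetical: range_table_udf returns one row per integer between two bounds, range_cursor stands in for state that would be kept in the scratchpad, and SQLSTATE 38601 is an assumed user-defined error value. Only SQLSTATE 02000 for "no more rows" comes from the description above.

```c
#include <string.h>

/* Call types for table UDFs defined without FINAL CALL. */
enum { CALL_OPEN = -1, CALL_FETCH = 0, CALL_CLOSE = 1 };

/* Cursor position kept between calls (hypothetical scratchpad content). */
struct range_cursor {
    long next;      /* next value to return */
    long last;      /* last value of the range */
    int  is_open;   /* set by the open call, cleared by the close call */
};

void range_table_udf(long call_type, struct range_cursor *cur,
                     long from, long to, long *row, char sqlstate[6])
{
    strcpy(sqlstate, "00000");

    switch (call_type) {
    case CALL_OPEN:
        /* Position the "cursor" on the first row. */
        cur->next = from;
        cur->last = to;
        cur->is_open = 1;
        break;
    case CALL_FETCH:
        if (!cur->is_open) {
            strcpy(sqlstate, "38601");    /* fetch without open */
        } else if (cur->next > cur->last) {
            strcpy(sqlstate, "02000");    /* no more rows */
        } else {
            *row = cur->next;             /* return one result row */
            cur->next += 1;
        }
        break;
    case CALL_CLOSE:
        cur->is_open = 0;
        break;
    default:
        strcpy(sqlstate, "38601");        /* unexpected call type */
        break;
    }
}
```

With FINAL CALL, the same switch would gain the two extra cases for the first call (-2) and the final call (+2 or 255), where resources are allocated and released.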

    4 Design Considerations

    4.1 What is feasible and what isn't?

    The DB2 manual has lots of information on UDFs. We recommend taking all of it into account, even those subjects which seem to require little attention. The most important things to consider are:

    - SQLSTATE is a CHAR(5) field requiring a valid value at each function end.
    - Modifying SQL statements are only possible in scalar UDFs if invoked by an UPDATE or INSERT statement.
    - In the UDF programs, the cursors must all be closed explicitly, otherwise SQL will return a negative code. There is no implicit CLOSE.
    - If the UDF is defined as NOT DETERMINISTIC, the following restrictions apply:
      > The UDF cannot be used in CASE expressions.
      > It cannot be used in the ORDER BY clause.
      > It should not be used in WHERE clause predicates as the results would be unexpected.
    - DISALLOW PARALLEL should be specified in the following cases:
      > If the UDF is NOT DETERMINISTIC
      > If a scratchpad is used
      > If FINAL CALL is specified
      > If MODIFIES SQL DATA is specified for scalar UDFs
      > If EXTERNAL ACTION is specified
      > If a table UDF is being used.

    4.2 UDF Efficiency

    One of the most important differences between built-in functions and external UDFs is the fact that all UDFs are FENCED. This protects DB2 against errors in the application code. UDFs therefore do not run in the DB2 address space, but under the control of the Language Environment in a WLM address space. Built-in functions, on the other hand, are a component of the DB2 code and run in the DB2 address space. Nonetheless, the developers themselves can contribute to the efficiency of the UDFs:

    - The number of input parameters should be kept as low as possible since each input parameter increases the overhead.
    - UDF code should be re-entrant so that the STAY RESIDENT YES option can be defined. This is especially important if the same UDF is invoked several times in a single SQL statement. STAY RESIDENT YES has the following impact:
      > Once the load module is active, it remains in memory.
      > This single copy can then be used across several UDF calls.
    - When using UDFs, the access path should always be checked using EXPLAIN, as UDFs can change the access path.


    5 Summary

    UDFs extend the functional scope of SQL considerably. They can embed application code to make it available to SQL. Writing UDFs is not difficult; however, the use of UDFs should be well planned in order to avoid unpleasant surprises.