
In This Issue – Presentation Papers from the December 2008, and March, June, and September 2009 General Meetings

How Innovations in Storage Change Your Oracle Playing Field, by Ari Kaplan
ADF On-Ramp: What You Need to Know to Use the ADF Fusion Technology Stack, by Peter Koletzke
Tuning the Oracle Grid, by Rich Niemiec

www.nyoug.org 212.978.8890

TechJournal
New York Oracle Users Group

Fourth Quarter 2009

25th Anniversary/NYC Metro Area Meeting

Tuesday, December 8, 2009 The New Yorker Hotel

481 Eighth Ave. (at 34th Street)

Sponsored by: Oracle Corporation, Sun Microsystems, Confio Software, Corporate Technologies, Fadel Partners, Oracle GoldenGate, IBM Systems, Texas Memory Systems, Rolta TUSC, and XDuce/LearnDBA

Free for Paid 2009 Members Don’t Miss It!


Sometimes the problem is obvious.

Usually, it’s harder to pinpoint. Amazing what you can accomplish once you have the information you need. When the source of a database-driven application slowdown isn’t immediately obvious, try a tool that can get you up to speed. One that pinpoints database bottlenecks and calculates application wait time at each step. Confio lets you unravel slowdowns at the database level with no installed agents. And solving problems where they exist costs a tenth of working around them by adding new server CPUs. Now that’s a vision that can take you places.

A smarter solution makes everyone look brilliant.

Download our FREE whitepaper by visiting www.oraclewhitepapers.com/listc/confio
Download your FREE trial of Confio Ignite™ at www.confio.com/obvious


NYOUG Officers / Chairpersons

ELECTED OFFICERS – 2009
President: Michael Olin [email protected]
Vice President: Mike La Magna [email protected]
Executive Director: Caryl Lee Fisher [email protected]
Treasurer: Robert Edwards [email protected]
Secretary: Thomas Petite [email protected]

CHAIRPERSONS
Chairperson / WebMaster: Thomas Petite [email protected]
Chairperson / Technical Journal Editor: Melanie Caffrey [email protected]
Chairperson / Member Services: Robert Edwards [email protected]
Chairperson / Speaker Coordinator: Caryl Lee Fisher [email protected]
Co-Chairpersons / Vendor Relations: Sean Hull, Irina Cotler [email protected]
Chairperson / DBA SIG: Simay Alpoge [email protected]
Chairperson / Data Warehousing SIG: Vikas Sawhney [email protected]
Chairperson / Web SIG: Coleman Leviter [email protected]
Chairperson / Long Island SIG: Simay Alpoge [email protected]
Director / Strategic Planning: Carl Esposito [email protected]
Chairperson / Venue Coordinator: Michael Medved [email protected]

EDITORS – TECH JOURNAL
Associate Editor: Jonathan F. Miller [email protected]
Contributing Editor: Arup Nanda – DBA Corner
Contributing Editor: Jeff Bernknopf – Developers Corner

ORACLE LIAISON
Kim Marie Mancusi [email protected]

PRESIDENTS EMERITUS OF NYOUG
Founder / President Emeritus: Moshe Tamir
President Emeritus: Tony Ziemba
Chairman / President Emeritus: Carl Esposito [email protected]
President Emeritus: Dr. Paul Dorsey


Table of Contents

Message from the President’s Desk .......... 12
The New York Oracle Users Group (NYOUG) Celebrates 25 Years of Serving the Greater NYC Area Oracle User Community .......... 14
Upgrading to 11g – Best Practices .......... 17
ADF On-Ramp: What You Need to Know to Use the ADF Fusion Technology Stack .......... 30
How Innovations in Storage Change Your Oracle Playing Field .......... 49
How Long is Long Enough? Using Statistics to Determine Optimum Field Length .......... 62
Migrating Database Character Sets to Unicode .......... 69
Tuna Helper – A Proven Process for Tuning SQL .......... 86
Get More for Less: Enhance Data Security and Cut Costs .......... 98
Advanced Report Printing with Oracle APEX .......... 113
Tuning the Oracle Grid .......... 121
Practical Data Masking: How to Address Development and QA Teams’ Seven Most Common Data Masking-related Reactions and Concerns .......... 133

Legal Notice
Copyright © 2009 New York Oracle Users Group, Inc. unless otherwise indicated. All rights reserved. No part of this publication may be reprinted or reproduced without permission. The information is provided on an “as is” basis. The authors, contributors, editors, publishers, NYOUG, and Oracle Corporation shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from information contained in this publication or from use of programs or program segments that are included. This magazine is not a publication of Oracle Corporation, nor was it produced in conjunction with Oracle Corporation.

New York Oracle Users Group, Inc. #0208
67 Wall Street, 22nd Floor
New York, NY 10005-3198
(212) 978-8890


NYOUG 25th Anniversary/NYC Metro Area Meeting – presented by NYOUG & Oracle Corporation
Tuesday, December 8, 2009 at the New Yorker Hotel – 481 Eighth Ave. (at 34th Street) in Manhattan

8:00-9:00 REGISTRATION AND BREAKFAST
9:00-9:30 Introduction and Welcome: Michael Olin – NYOUG President
9:30-10:15 KEYNOTE: Jeff Henley – Chairman, Oracle Corporation – “Oracle’s Business Transformation: The Next Phase”

TRACKS: DBA Track 1 | Developer Track | DBA Track 2 | Developer/BI/DW Track
LOCATIONS: Crystal Ballroom | Herald Square | Gramercy Park | Sutton Place

10:30-11:30 Session 1

DBA 1-1 Case Study: 11g Upgrade with RAT, SPM and Snapshot Standby Arup Nanda Starwood Hotels

DEV 1 Oracle Forms To Oracle APEX: A Migration Roadmap Daniel McGhan SkillBuilders

DBA 2-1 Oracle on SSD Technology for Performance Steve Fluge Bank of America

DEV/BI/DW 1 De-Mystifying OBIEE/Oracle BI Applications Shyam Varan Nath IBM/Oracle BIWA SIG

11:45-12:45 Session 2

DBA 1-2 Zero Slides: Scripts and Tools to Make Your Life Easier Tanel Poder

DEV 2 Oracle APEX API Primer Josh Millinger Niantic Systems

DBA 2-2 Using Preferred Read Groups in Oracle ASM Michael Ault Texas Memory Systems

DEV/BI/DW 2 Fraud and Anomaly Detection Using Oracle Data Mining Charles Berger Oracle Corporation

12:45-2:00 LUNCH sponsored by Sun Microsystems – “Maximizing Oracle Performance with Sun Technologies”

2:00-3:00 Session 3

DBA 1-3 Oracle 11g Cache Features Put to the Test Dave Anderson SkillBuilders

DEV 3 Effective Utilization of the Database in Web Development Dr. Paul Dorsey Dulcian, Inc.

DBA 2-3 Custom Monitoring Your Database with PL/SQL Bill Schott DTE Energy

DEV/BI/DW 3 Benefits of Agile SOA Methodologies Jordan Braunstein Rolta TUSC

3:15-4:15 Session 4

DBA 1-4 Advanced SQL Performance Tuning Dean Richards Confio Software

DEV 4 Tips & Techniques Integrating Oracle XML DB Coleman Leviter Arrow Electronics

DBA 2-4 Managing Risk: Understanding the New Options in Data Protection Ulf Mattsson Protegrity

DEV/BI/DW 4 What is ITIL and Why Should You Care? Leslie Tierstein newScale, Inc.

4:15-5:00 VENDOR RAFFLES

Sponsored by Oracle Corporation, Sun Microsystems, Confio Software, Corporate Technologies, Fadel Partners, Oracle GoldenGate, IBM Systems, Texas Memory Systems, Rolta TUSC, XDuce/LearnDBA


New York Metro Area Oracle Users Group Day December 8, 2009

ABSTRACTS

KEYNOTE: “Oracle’s Business Transformation: The Next Phase” Oracle’s business transformation strategy has enabled the company to grow revenues and maintain strong operating margins during the downturn, while freeing up funds to expand Oracle’s investment in innovative new products and services. Oracle Chairman Jeff Henley will share how Oracle is preparing internally for the recovery, from integrating the Sun acquisition to deploying new Oracle applications to help support the company’s ongoing transformation into an integrated technology solutions provider.

Jeffrey O. Henley is Chairman of Oracle Corporation. He has held this position since January 2004. Mr. Henley was Oracle’s Chief Financial Officer and an Executive Vice President from March 1991 to July 2004, and he has been a member of Oracle’s Board of Directors since June 1995. He also serves on Oracle’s Executive Management Committee. Prior to joining Oracle in 1991, Mr. Henley served as Executive Vice President and Chief Financial Officer at Pacific Holding Company, a privately held company with diversified interests in manufacturing and real estate, and as Executive Vice President and Chief Financial Officer at Saga Corporation, a multibillion-dollar food service company. He also served as Director of Finance at Memorex Corporation in its large storage division, and as Controller of International Operations at Fairchild Camera and Instrument Corporation. Mr. Henley has a BA in economics from the University of California at Santa Barbara and an MBA in finance from UCLA. In 2004, he received the UCLA Anderson School’s Outstanding Alumnus award.

10:30-11:30 AM - Session 1 Presentations

DBA 1-1 10:30-11:30 Crystal Ballroom "Case Study: 11g Upgrade with RAT, SPM & Snapshot Standby" This session describes a real-life case study, with detailed steps, commands and screenshots, of our 11g upgrade experience from 10g. Attendees will learn how to use Database Replay and SQL Performance Analyzer to gauge the impact of the upgrade in advance and fine-tune parameters by replaying the captured workload from production. Arup will explain the differences between SQL Plan Management baselines and SQL Profiles, show how they were used to stabilize the plans, and show how SQL Tuning Advisor was used to tune the regressed queries. A Snapshot Standby database was used to further tune the production database without jeopardizing operations. Attendees will learn from our successes and failures to better plan their own upgrade process. Arup Nanda has been an Oracle DBA for more than 16 years, spanning every aspect of a DBA's work, from modeling to performance tuning. He is the Lead Database Architect at Starwood Hotels.

DEV 1 10:30-11:30 Herald Square "Oracle Forms to Oracle APEX – A Migration Roadmap" Many organizations are seeking an HTML-based migration path for their Oracle Forms applications. The main platforms available, including JSP/JSF, .NET, and PHP, are object oriented and may require extensive time and training for those new to them. Another solution is Oracle Application Express, which can provide an HTML-based solution while leveraging existing SQL and PL/SQL knowledge. Attendees will learn about the main differences between the two platforms, the pros and cons of the alternatives (JSP/JSF, .NET, PHP), the general migration process and how to avoid some pitfalls along the way.


Dan McGhan is a senior APEX developer and instructor at SkillBuilders.com. He has been developing APEX applications since its HTML DB days and is the designer and coder of the famous Brown University Energy Efficiency application (BEE). Dan can be reached at [email protected].

DBA 2-1 10:30-11:30 Gramercy Park "Oracle SSD Technology for Performance" Results of performance tests conducted with SAN storage, SAN SSD and FusionIO SSD technologies will be reviewed, with comparisons of each for TPC-C and Oracle disk I/O performance. The testing methods and tools used (Benchmark Factory and Orion) will also be briefly covered. Steve Fluge has 15 years of Oracle experience as a DBA and consultant. He has presented at Oracle OpenWorld on the topics of migration to Oracle RAC with OCFS and 10g performance on Itanium. Steve has also created a demo database for Oracle Spatial on Itanium for an Intel keynote presentation.

DEV/BI/DW 1 10:30-11:30 Sutton Place "De-Mystifying OBIEE/Oracle BI Applications" Attendees will learn how to meet their BI and analytical reporting needs using OBIEE and/or Oracle BI Applications. They will learn how to simplify the implementation cycles, deal with real-world challenges and learn strategies for overcoming them. A product demo will show the step-by-step installation and configuration processes. Pointers to vital resources, drawn from real-world customer implementation experience with Oracle BI Apps 7.9.6 and higher versions, will also be discussed. Shyam Nath is an OBIEE Architect at IBM. He has implemented several OBIEE and OBI Applications projects. He has been a certified DBA since 1998 and has worked on several Oracle data warehousing and BI projects, including at Citigroup in NY. He is the founder and President of the Oracle BIWA SIG (http://OracleBIWA.org) and a regular speaker at NYOUG, IOUG-Collaborate and Oracle OpenWorld. Shyam is also a regular blogger/micro-blogger on Oracle BI/DW topics. Currently (Aug '09), Shyam is the top expert in the Oracle BI Apps Forum, helping several users to troubleshoot and plan their BI projects.

11:45 AM-12:45 PM - Session 2 Presentations

DBA 1-2 11:45-12:45 Crystal Ballroom "Zero Slides: The Scripts and Tools to Make Your Life Easier" As the title already says, this presentation has no slides at all! Instead, Tanel will demonstrate some of the most useful tools he uses for accurate Oracle performance troubleshooting. The toolset ranges from plain SQL statements to UNIX and DTrace scripts and some custom-built GUI tools; of course, a new version of Tanel's Oracle Session Snapper will also be covered. During the session you will be taken through a few troubleshooting case studies following a systematic troubleshooting methodology. For anyone interested in Oracle internals and performance tuning, this session should be a fun learning exercise! Tanel Poder is an experienced consultant with deep expertise in Oracle database internals, advanced performance tuning and end-to-end troubleshooting. He is one of the first Oracle Certified Masters in the world, having passed the OCM DBA exam in 2002, an Oracle ACE Director and a proud member of the professional OakTable Network. In addition to teaching his seminars and speaking at major Oracle conferences worldwide, he publishes his work at his Oracle performance tuning blog, http://blog.tanelpoder.com.


DEV 2 11:45-12:45 Herald Square "Oracle Application Express API Primer" Oracle Application Express comes with a series of provided APIs such as APEX_UTIL, APEX_MAIL, APEX_ITEM, APEX_APPLICATION, APEX_CUSTOM_AUTH, etc. This presentation will go through the ones that we have found to be the most valuable and describe how and where to use them in your applications. Why build something when Oracle has already done it for you?! This presentation will walk through examples of using the provided APEX APIs. We demonstrate how to build a manual tabular form using the APEX_ITEM API and how to reference arrays and debug using the APEX_APPLICATION API, in addition to a host of others. Josh Millinger is the Founder of Niantic Systems, LLC, a Princeton, NJ-based consulting/development firm. He worked at Oracle for 11 years, serving the partner community as their technical liaison. He currently focuses on Application Express development and related technologies.

DBA 2-2 11:45-12:45 Gramercy Park "Using Preferred Read Groups in Oracle ASM" Oracle 11g Release 2 ASM provides the capability to use a preferred read mirror, allowing geo-mirroring and the use of high- and low-speed assets in the same ASM disk group. This presentation shows how to utilize the preferred read mirror concept to optimize your performance and reliability. Attendees will see by example how to set up and utilize the new preferred read mirror in Oracle ASM. Examples from actual installations will be used, as well as performance comparisons for when high- and low-speed disk groups are used with and without preferred read. Mike Ault has worked with Oracle since 1990 and computers since 1980. Mike has written over two dozen books about Oracle and Oracle-related technologies. Mike is a frequent presenter at international, national and regional user groups. Mike has also written numerous articles for various Oracle-related publications. Mike works for Texas Memory Systems as their Oracle Guru.

DEV/BI/DW 2 11:45-12:45 Sutton Place "Fraud and Anomaly Detection Using Oracle Data Mining" How do you find needles in haystacks? How can you detect rare events, fraud, anomalies, and network intrusions in your corporate databases and operations? Oracle Data Mining's (ODM) anomaly detection algorithm trains on what is considered "normal" and then flags any record(s) that, on a multi-dimensional basis, appear not to fit in. Find suspicious expense report submissions, find non-compliant tax submissions, combat fraud in healthcare claims and save huge amounts of money in fraud and abuse. This presentation and demo will show several use cases of ODM's anomaly detection. Charlie Berger is the Senior Director of Product Management, Data Mining Technologies at Oracle Corporation and has been with Oracle for ten years. Previously, he was the VP of Marketing at Thinking Machines prior to its acquisition by Oracle in 1999. He holds a Master of Science in Engineering and a Master of Business Administration from Boston University, as well as a Bachelor of Science in Industrial Engineering/Operations Research from the University of Massachusetts at Amherst.

Lunch Presentation

1:25-1:55 Grand Ballroom - Maximizing Oracle Performance with Sun Technologies For over 25 years, Sun and Oracle have collaborated to build highly available solutions with massive scalability, unmatched performance and savings to meet the needs of growing enterprises. This presentation examines the latest


intelligent optimizations with the Oracle Exadata V2 Database Machine, as well as innovation around key Oracle solutions such as data warehousing, Oracle 11g, Siebel CRM, and Oracle Business Intelligence Enterprise Edition. Mike Seto is Sun's Global Partner Executive for Oracle. The Sun/Oracle global alliance is a multi-billion dollar relationship dedicated to providing joint customers with secure, reliable, and scalable enterprise-class solutions. Mr. Seto leads a global team focused on developing deeper and broader relationships between Oracle and Sun executives, sales and partners to deliver the best IT platforms available to address the business challenges of today and tomorrow. Mr. Seto has over eighteen years of sales and sales management experience. He joined Sun ten years ago as an enterprise account manager for iPlanet. In 2004, Mr. Seto began managing the BEA alliance, and he transitioned into the leadership role for the global Oracle team in 2006.

2:00-3:00 PM - Session 3 Presentations

DBA 1-3 2:00-3:00 Crystal Ballroom "Oracle 11g Cache Features Put to the Test" Dave follows up his sought-after Oracle 10g New Features presentation (http://www.nyoug.org/Presentations/2004/10ginfo.pdf) with a look at the SQL and PL/SQL caching features found in 11g. Learn how to implement caching and see Dave's benchmarks. This will be a technical and practical presentation, with working examples presented where possible. Dave Anderson founded SkillBuilders.com in 1994. He has worked in the IT field since March 1980 and has 25 years of hands-on relational database experience. He currently makes Oracle databases work well and fast for large banks, insurance companies and small companies like SkillBuilders.com.

DEV 3 2:00-3:00 Herald Square "Effective Utilization of the Database in Web Development" With the ever-changing web application development environment, the benefits of placing more information in the database are becoming increasingly evident. This presentation (targeted at experienced PL/SQL developers) will discuss how to implement this approach in real-world projects, with a brief overview of its advantages along with specific details about how to implement it in SQL and PL/SQL. Emphasis will be placed on using collections, dynamic SQL/PL/SQL and server-side support for stateless programming. Dr. Paul Dorsey is the founder and president of Dulcian, Inc., an Oracle consulting firm specializing in business rules and web-based application development. He is the chief architect of Dulcian's Business Rules Information Manager (BRIM®) tool. Paul is the co-author of seven Oracle Press books on Designer, Database Design, Developer, and JDeveloper, which have been translated into nine languages, as well as the Wiley Press book PL/SQL for Dummies. Paul is an Oracle ACE Director. He is President Emeritus of NYOUG and the Associate Editor of the International Oracle User Group's SELECT Journal. In 2003, Dr. Dorsey was honored by ODTUG as volunteer of the year, and in 2001 by IOUG as volunteer of the year and by Oracle as one of the six initial honorary Oracle 9i Certified Masters. Paul is also the founder and Chairperson of the ODTUG Symposium, currently in its tenth year. Dr. Dorsey's submission of a Survey Generator built to collect data for The Preeclampsia Foundation was the winner of the 2007 Oracle Fusion Middleware Developer Challenge, and Oracle selected him as the 2007 PL/SQL Developer of the Year. [email protected]

DBA 2-3 2:00-3:00 Gramercy Park "Custom Monitoring Your Database with PL/SQL" Want to know when a particular table is locked and who is locking it? Are current archive logs being shipped to that standby database? You can automate monitoring for these and a whole lot more conditions by using PL/SQL packages and DBMS_JOB to customize the monitoring, and alerting, of an Oracle database. This presentation describes the


fundamental concept of using jobs to periodically call a PL/SQL procedure to watch for specific conditions and then take action. Bill Schott has been a DBA at DTE Energy for 14 years, with experience using Oracle 7.3 through 10.2. He is a past presenter at ECO, MOUG, IOUG-A and other conferences, and he is passionate about sharing knowledge, tips and techniques across the Oracle DBA community.

DEV/BI/DW 3 2:00-3:00 Sutton Place "Benefits of Agile SOA Methodologies" Service Oriented Architecture (SOA) is complex, lengthy, and riddled with challenges. Is SOA dead, as claimed earlier this year? Many organizations have had frustrating experiences on the path they chose; some have even given up on their SOA initiatives and are struggling to show immediate results. This session will present a refreshing and effective approach to achieving rapid and incremental results in SOA, all the while educating participants on how to keep their projects aligned with the "big-picture" SOA roadmap. Approaches include how to leverage an Agile methodology with SOA, when to use top-down vs. bottom-up vs. middle-out, and how to ensure business success through the use of SOA pilots. The session will also cover fundamental principles of Service Oriented Architecture by explaining definitions, concepts, and practices. This will help participants gain a common understanding of what SOA is, what constitutes SOA, and why it is so important to businesses. Jordan Braunstein serves as the Business Integration & Architecture Partner at Rolta TUSC and is responsible for managing the TUSC SOA Center of Excellence. This includes delivering tangible results and support to customers through consulting, education, and overall partnership support, to ensure clients achieve positive returns throughout their SOA journey. Prior to joining Rolta TUSC, Braunstein started and managed BearingPoint's Public Services SOA practice, focusing on deriving value and hard-line results for defense and civilian government agencies, state and local governments, and higher education institutions.

3:15-4:15 PM - Session 4 Presentations

DBA 1-4 3:15-4:15 Crystal Ballroom "Advanced Oracle SQL Performance Tuning" Many DBAs and developers are faced with tuning poorly performing SQL statements. However, many tuning projects fail because the process being used is inefficient. This presentation will walk through a process Confio Software uses with great success and will include topics such as indexing strategies, use of histograms, SQL wait event data, column selectivity and several more that will help you succeed in future tuning projects. Dean Richards has over 20 years of Oracle and SQL Server project development, implementation and strategic database architecting experience. Before coming to Confio Software, Dean held engineering positions at McDonnell Douglas and Daugherty Systems and was also a technical director for Oracle Corporation, managing all aspects of key accounts, including short- and long-term technical planning and strategic alliances.

DEV 4 3:15-4:15 Herald Square "Tips & Techniques: Integrating Oracle XML DB" Many times during a project life cycle, new technology is introduced that presents first-time challenges. The author will present a project using Oracle XML DB. The presentation will cover the project, why XML DB was chosen, how XML DB was used and the technical issues encountered. This presentation describes several examples using XMLTYPE, CLOBs, XML DB methods, XMLAGG, XMLELEMENT and XMLFOREST. Additionally, namespace examples and an introduction to XML Schema Definition (XSD) will be presented.


Coleman Leviter is employed as an IT Software Systems Engineer at Arrow Electronics. He has presented at IOUG's Collaborate 07, 08 and 09. He is the Web SIG chair and sits on the steering committee at the NY Oracle Users' Group (www.nyoug.org). He has worked in the financial services industry and in the aerospace industry, where he developed navigation, flight control and reconnaissance software for the F-14D Tomcat at Grumman Aerospace. Coleman has a BSEE from Rochester Institute of Technology, an MBA from C.W. Post and an MSCS from New York Institute of Technology. Coleman recently completed Oracle's Certified Professional exam (OCP). He may be contacted at [email protected].

DBA 2-4 3:15-4:15 Gramercy Park "Managing Risk: Understanding the New Options in Data Protection" Sometimes data security and business processes do not play well with each other. The situation becomes even uglier when regulatory compliance is added into the mix. Too often, enterprises feel that they have to choose among data security, compliance and business needs. This session will detail the latest methodologies and technologies, such as type-preserving encryption, data masking, tokenization and database activity monitoring, to protect data in an Oracle environment, with a focus on solutions designed specifically to support critical business processes. Attendees will also learn how to conduct a risk-based analysis to determine the scenarios where these new technologies are best suited in their environments. We'll also explore new ways to measure and manage risk and compliance. This presentation also includes anonymous case studies that detail risk management security plans in an Oracle environment. Ulf Mattsson created the initial architecture of Protegrity's database security technology, working closely with Oracle R&D and creating several key patents in the area of database security. His extensive IT and security industry experience includes 20 years with IBM as a manager of software development and a consulting resource to IBM's Research and Development organization in the areas of IT architecture and IT security. Ulf holds a degree in electrical engineering from Polhem University, a degree in finance from the University of Stockholm and a master's degree in physics from Chalmers University of Technology.

DEV/BI/DW 4 3:15-4:15 Sutton Place "What is ITIL and Why Should You Care?" ITIL stands for "IT Infrastructure Library"; it consists of a set of practices and guidelines for standardizing IT practices. This presentation will start with an overview of some of those practices and discuss how they can affect your daily life as a database developer, DBA, server administrator or practitioner of another IT discipline. It will also include a discussion of commercial software products available for ITIL implementation and a demonstration of one such product (without marketing hype). Leslie Tierstein is Manager of Knowledge Services for newScale, Inc., a provider of catalog and portfolio management software located in San Mateo, California. Before joining newScale she accrued many years of experience as an Oracle developer and technical project manager. She is an experienced presenter, having presented at conferences such as NYOUG, ODTUG, IOUG, and RMOUG, and is the technical editor of several Oracle Press books, including Oracle JDeveloper 10g Handbook and Oracle Designer Handbook.


Message from the President’s Desk
Michael Olin

Winter 2009

As NYOUG’s 25th Anniversary year draws to a close and preparations are being finalized for our final event of the year, I have decided, after much serious contemplation, to take the easy way out on my final column of the year. At the start of the year, I reviewed my personal archive of NYOUG publications. I have a bookshelf filled with newsletters, conference proceedings and even the three volumes of the “International Oracle User’s Journal,” a perfect-bound paperback conceived and edited by former NYOUG leader Tony Ziemba and published by the IOUG in the early 1990s. Reviewing this treasure trove of Oracle User Group history, I documented NYOUG’s story, our story.

I had high hopes for this “official history” of NYOUG. After all, our user group has really been around since the beginning. I wanted to get our story published in Oracle Magazine and in the IOUG’s Select Journal. This was a great public relations story for both Oracle Corporation and the IOUG. NYOUG has been at the forefront of the local Oracle user community for twenty-five years and is still going strong. It turns out that not everyone thinks that we’re the center of the Oracle universe. Select published a greatly abridged version of our history. Oracle Magazine highlighted our upcoming Metro Day event and included a bit of our story as part of an interview with NYOUG President Emeritus Paul Dorsey. The full history, however, has just been bouncing back and forth between my hard drive and that of our Executive Director. Since I can commandeer as much space as I need in the NYOUG Technical Journal, we’ll just have to publish it ourselves.

I do want to take this opportunity to thank all of the people who contribute to making NYOUG such a valuable resource for the Oracle user community. Thank you to our officers and the members of our Steering Committee: Mike LaMagna, Robert Edwards, Thomas Petite, Simay Alpoge, Melanie Caffrey, Irina Cotler, Dr. Paul Dorsey, Carl Esposito, Coleman Leviter, Michael Medved and Vikas Sawhney. Thanks are also due to all of the Oracle professionals who volunteer to speak at our General and SIG meetings. I also appreciate the efforts of the experts who make it possible to have so many successful Training Day events, providing their expertise to our membership (at a cost that makes it an exceptional value their managers would never find elsewhere).

Finally, I extend my heartfelt gratitude and appreciation to two people who really make it all possible. Our Executive Director, Caryl Lee Fisher, is really the heart and soul of NYOUG. She has never brought up (or crashed) an Oracle database and hasn’t ever written a single SQL query or PL/SQL procedure, yet nothing that NYOUG does would be possible without her. If Paul Dorsey had not delegated the administration of NYOUG to Caryl Lee (and NYOUG had not brought her on as Executive Director when Dr. Dorsey stepped down), I doubt there would have been a 20th anniversary to celebrate. Thank you, Caryl Lee, for all you do (especially the stuff that I never find out about) for NYOUG.

Our liaison with Oracle Corporation, Kim Marie Mancusi, regularly goes “above and beyond” for NYOUG. Kim Marie hosts our SIG meetings and arranges for speakers, especially the keynotes at our Metro Day meetings. She helps with our meeting logistics, makes Oracle’s partners aware of NYOUG (and sponsorship opportunities) and, of course, provides a bit of financial support for our activities. Most importantly, Kim Marie is a valuable sounding board. Her advice has been crucial in helping to market and grow the NYOUG. We have had many Oracle liaisons over the past 25 years. Somehow, the only ones who stick around are our fellow New Yorkers (we don’t seem to make them cry very often). Thank you, Kim Marie, for your steadfast support of NYOUG and for allowing us to be ourselves.

Conference Schedule – 2010
Be sure to participate in the following Oracle events and other events of interest in 2010:

RMOUG Training Days – Denver, CO – February 16-18
NYOUG Spring General Meeting – New York, NY – March 9
IOUG-OAUG-QUEST Collaborate 2010 – Las Vegas, Nevada – April 18-22
ODTUG Kaleidoscope 2010 – Washington, DC – June 27-July 1


The New York Oracle Users Group (NYOUG) Celebrates 25 Years of Serving the Greater NYC Area Oracle User Community

2009 marks the 25th anniversary of the founding of the New York Oracle Users Group. It was formed in 1984 with the goal of exchanging ideas and providing assistance and support to users of Oracle software products. The organization consists of volunteer users, consultants, and vendors of Oracle-related products and services. NYOUG is one of the oldest and largest Oracle user groups in North America.

It all started simply enough. In 1983, Moshe Tamir read about this new “Oracle database product,” a commercial implementation of E.F. Codd’s relational database using the SQL language described in IBM’s research papers. Mr. Tamir, realizing that he had just glimpsed the future of databases, quit his job writing microcode for telephone switches and searched for a job where he could work with this amazing new software product. He landed a position as both DBA and developer at Hebrew National Kosher Foods, one of the first Oracle customers in the New York City area. At that time, Hebrew National was running Oracle Version 3 on Data General computers.

Looking to share his experiences with Oracle and to learn from others who were using the software, Tamir asked Ken Jacobs (almost two decades before he earned the moniker “Dr. DBA”), who was the Oracle employee responsible for sales on the entire East Coast, if he could provide contact information for others who were using Oracle on Data General hardware. That contact information became the mailing list for the first Oracle Users’ Newsletter, and the stage was set for the founding of NYOUG.

In 1984, Tamir began working at the New York Blood Center. Along with IT manager Ken Brown, he organized the first official meeting of NYOUG, which was held in a conference room at the United Nations and attracted 10 attendees. The group set up its initial steering committee, adding Richard Drechsler, the director of the computing center at the City University of New York’s (CUNY) Graduate Center; Carol Ann Greff, an engineer with Allied Bendix Aerospace; William Kinahan from Merck; and Martin Rosman from Ebasco Services. This committee organized meetings that featured many speakers from “Belmont,” the location of Oracle’s headquarters at the time. These speakers included Oracle’s top technical staff, among them the first manager of Oracle’s centralized Support Center, Mary Winslow, who told the group about the new 800 number they could use to reach her staff of 30 support technicians. NYOUG has been in the forefront of the Oracle user community ever since.

NYOUG meetings have been held at venues throughout New York City. Columbia University, Baruch College, the CUNY Graduate Center and St. John’s University have all hosted NYOUG general meetings. NYOUG has met in Merrill Lynch’s offices in the World Financial Center, New York Life’s headquarters, and in the auditorium of the Swiss Bank Corporation’s conference center (which was used as the set for Gordon Gekko’s office in the movie Wall Street). The keynote speaker at an NYOUG meeting at the Fashion Institute of Technology was Oracle’s president, Ray Lane. Mr. Lane and his walkie-talkie-carrying entourage arrived several hours late, and he tried to cut his appearance short so that he could get to the meetings he had scheduled for later in the day. The membership of NYOUG was anything but cooperative, asking question after question of Oracle’s second in command until all hope of making those afternoon meetings was lost.
NYOUG also met at the midtown offices of Morgan Stanley, hosted by one of the top research analysts following the software industry, Charles E. Phillips, Jr. Long before he became one of Oracle’s presidents, Mr. Phillips (or, as his toll-free number and website referred to him, “Mr. Chuck”) would regularly host the NYOUG. He routinely polled the membership after returning from meetings with Oracle’s executives on the West Coast. “This is what they told me in California,” he would say, “now what is the truth?”

In the late 1980s, some new faces appeared on the NYOUG Steering Committee. The NYOUG Newsletter, which disappeared after 1987, was revived in 1989, with Carmine Tedesco as editor and Anthony Ziemba as publisher. At the end of 1989, Michael LaMagna joined the steering committee as treasurer. After the first election of officers at the end of 1989, Tedesco and LaMagna remained in their positions. Tony Ziemba was elected chairman, and Carol Ann Greff, a founding member of the group, became the first representative to the fledgling IOUG. While no president was elected, Meir Feig was appointed to serve in that role.

The early ’90s saw huge growth in user group activity on the East Coast. Ziemba, along with Mike Corey, the newly elected chairman of the Boston-based Northeast Oracle Users Group, and Dale Lowery, chairman of the Mid-Atlantic


Oracle Users Group, formed a joint publishing venture to create and distribute a combined “super” newsletter. The three also began discussions regarding a regional users’ conference. These talks led to the creation of the Oracle User Resource, Inc. OUR added the Southeast (SEOUG) and Eastern Canada (ORA*GEC) to the consortium, and the first East Coast Oracle conference, ECO ’91, was held in Washington, D.C. in March of that year. ECO thrived as a smaller, less expensive alternative to what had become Oracle’s premier conference, “International Oracle Users Week.” For a decade, ECO consistently drew 500 attendees to what many felt was the best technical conference the Oracle world had ever seen.

At the end of 1991, two new officers joined the NYOUG Steering Committee: Carl Esposito from Merrill Lynch as Vice President and Michael Olin, an independent consultant and Tamir protégé, as Membership Coordinator. Moshe Tamir was once again given a title, this time as Vendor Policy Chair. The following year, Ziemba stepped down as chairman but remained in charge of communications. Feig left the Steering Committee and Esposito replaced him as president. The group went back to the New York Blood Center and recruited Jonathan Intner to serve as VP. While Ziemba continued to publish the OUR newsletter, NYOUG restarted its own journal in 1993, with Tamir serving as editor. Dr. Paul Dorsey, a professor at Rider College, joined the Steering Committee that year as secretary. At the end of that year, Guy Yasika took over as editor and Dorsey became VP. There was only one NYOUG SIG at the time, the DBA SIG, chaired by Shaul Ganel. The SIG added a programming coordinator, Steven Zanone, towards the end of 1993.

Carl Esposito was a man with a mission. At every opportunity he extended an invitation to Oracle CEO Larry Ellison to come and speak to what he (Esposito) described as the largest and most dynamic of all Oracle Users Groups. Offers were made to schedule an NYOUG meeting around Ellison’s travel schedule, so that he could meet with the group at his convenience, when he was already in New York. These invitations were regularly and routinely declined. When two speakers from Oracle simply didn’t show up at the NYOUG general meeting on June 29, 1994, the fireworks began. Esposito fired off an indignant letter to Ellison the next day, with copies to Oracle’s New York office. A reporter for Computerworld, the leading trade journal of the time, picked up on the story and ran a brief item about Oracle’s snub of NYOUG. This led Charles Wang, the CEO of Computer Associates (the Long Island, NY-based firm that had just purchased Ask/Ingres, an Oracle competitor), to track down Esposito at work. Wang called Esposito directly and offered to come and speak at an NYOUG meeting on any number of generic IT topics. Of course, the offer was preceded by some discussion between Wang and Esposito regarding the proper pronunciation of the CEO’s name.

Wang: “This is Charles Wong from Computer Associates.”
Esposito: “Who?”
Wang: “Charles Wong. I’m the CEO of Computer Associates.”
Esposito: “You must mean Charles Wang…”

No matter how the name was pronounced, the turmoil generated by Esposito’s letter, Wang’s call, and the reporting of the entire incident in the trade press caused a rift between Oracle and NYOUG that required the intervention of at least three senior Oracle executives to repair. When Ellison appeared at an Oracle sales event in Lower Manhattan in December 1994, he referred to the NYOUG members in the audience as “spies.”

NYOUG expanded its SIG offerings in 1994, adding a CASE SIG, chaired by Tony Ziemba. That year also saw Rachel Carmichael take over as chair of the DBA SIG. The following year, a Web SIG was added, chaired by Diane Hickry from Northrop-Grumman. She would be replaced by Jeff Bernknopf in 1996. Peter Koletzke joined the steering committee as Marketing Director and managed the release of the first NYOUG DBA Utilities Disk.

The next major organizational change for NYOUG occurred in 1999. Dr. Paul Dorsey took over as President, Zanone moved to Vice President, Tamir became Treasurer and Michael Medved joined the officer ranks as Secretary. Completing the shuffling of the ranks, Mike LaMagna became the group’s first Webmaster. A Data Warehousing SIG was also added, with Rob Edwards as chair. As NYOUG moved into the new millennium, Jeff Bernknopf added the Vice Presidency to his responsibilities and Edwards took over the newsletter from Guy Yasika.

NYOUG had been holding most of its general meetings in Lower Manhattan throughout the 1990s. Meeting locations had varied, but most meetings were either held at Merrill Lynch’s offices at the World Financial Center or at the Swiss Bank Corporation’s conference center at the foot of City Hall Park. By the end of the decade, the College of Insurance, two blocks north of the World Trade Center, had become the regular meeting location. NYOUG had scheduled its fall general meeting at that location for September 12, 2001. When the first plane hit the World Trade Center, Steering Committee members started calling each other to discuss the impact that the apparent crash would have on our meeting the next day.


By the time everyone was contacted, the magnitude of the attacks had become clear. Discussion of meeting logistics gave way to concerns about friends, family and NYOUG members who we knew worked in the Twin Towers. The group later heard from Guy Yasika, who had gone down from his office on the upper floors to the cafeteria to get coffee with some colleagues shortly before the building was struck. Luckily, they emerged unharmed. There have been discussions since then about redesigning the NYOUG logo to remove the image of the World Trade Center towers. The Steering Committee decided that the towers should stay, as a reminder of who we are, and what we all lost on that day.

The fall 2001 meeting was rescheduled for March 2002 and moved uptown to Columbia University’s Faculty House. NYOUG continued to meet either uptown or in midtown throughout the year. A Long Island SIG, with Jason Cohen as chair, was added to provide an opportunity for the growing membership in the suburbs to attend local events. A short-lived attempt was also made to create a local SIG for the northern suburbs, meeting in Westchester County and chaired by Arup Nanda.

At the start of 2003, NYOUG returned downtown to the former College of Insurance site, which had become the Manhattan campus of St. John’s University. While the location is certainly convenient for members, it has bedeviled speakers from out of town. All too frequently, speakers who get into a cab at the airport and ask to be taken to St. John’s find themselves frantically calling for directions when their cab driver takes them to the University’s main campus in Queens.

The officer shuffle continued over the next few years. Rachel Carmichael left IT completely when she began working towards becoming a veterinary technician. Simay Alpoge, a longtime NYOUG member, took over the DBA SIG and later added the Long Island SIG to her portfolio. Rob Edwards was elected Treasurer, and Melanie Caffrey took over responsibility for the growing newsletter, now the “NYOUG Technical Journal.” John McKenna took over the Warehousing SIG for a year or so before it landed back with Edwards. At the end of 2005, the shuffling seemed complete. Tom Petite joined the board as Secretary, Michael Medved became Venue Coordinator and Sean Hull took over Bernknopf’s responsibilities as Vendor Coordinator. Finally, Coleman Leviter became chair of the Web SIG.

Paul Dorsey finally succumbed to the demands of fatherhood. At the end of 2006 he declined nomination for another term as President and was replaced by Michael Olin. This presented the group with a major crisis. While Dorsey had done a spectacular job in growing both the group’s membership and prestige, none of that would have been possible without the extraordinary efforts of his right (and possibly left) hand, Caryl Lee Fisher. She was the logistical genius who had enabled the growth of NYOUG. The crisis was resolved when NYOUG hired Fisher as Executive Director. Now she had a title that accurately described what she did for the group (which is pretty much everything). A few minor changes to the leadership of the group have occurred during Olin’s (relatively) short tenure. Vikas Sawhney is now chair of the Warehousing SIG and Irina Cotler has recently joined the steering committee as Vendor Relations Chair.

NYOUG’s members and officers have authored numerous books on Oracle-related topics, presented at Oracle conferences throughout the world, and even taught database systems and Oracle to university students. Among the membership are Oracle ACEs, members of the OakTable Network and even an Oracle “DBA of the Year” (2003) and “PL/SQL Developer of the Year” (2007). NYOUG’s general and SIG meetings attract some of the top speakers and leaders in the industry. Past meeting keynote speakers have included Oracle’s Ray Lane, Charles Phillips, Ken Jacobs, Tom Kyte, Thomas Kurian, Dai Clegg, Roel Stalman, Vijay Tella, and Wim Coekaerts, as well as other leading Oracle gurus like Steven Feuerstein, Rich Niemiec, Michael Abbey, and Arup Nanda. While the regular general meetings routinely attract over 100 attendees, NYOUG’s “Metro Area” events draw over 500. These events have been held on university campuses, in corporate auditoriums, on a boat cruising the Hudson River and New York Harbor, and in Midtown Manhattan hotels.

In addition to its meetings, NYOUG also offers its members exclusive training sessions and a Technical Journal full of useful white papers, tips and techniques from the meeting speakers and members. The entire archive of presentations from NYOUG meetings is available on the web (http://www.nyoug.org) and the group is exploring the possibility of podcasting meeting presentations. Over the past 25 years, NYOUG has seen its paid membership grow to over 700 and its weekly email blasts reach over 4,000 IT professionals. Although part of the IOUG Regional User Group structure, NYOUG is a completely independent and self-funded group, supported by revenue from its membership dues and meeting vendor sponsors. NYOUG has been a resource for the Oracle user community for almost as long as there have been Oracle users. The group has accomplished quite a bit in that time and is well positioned for the next 25 years.


Upgrading to 11g – Best Practices
Ashish Agrawal, Oracle Corporation

INTRODUCTION
This white paper describes best practices that can be used when upgrading a database to 11g. These best practices were derived by Oracle technical staff and offer an accumulation of real-world knowledge and experience obtained while working with our customers. Following them will help you avoid the most common challenges faced while upgrading a database to 11g. The white paper discusses Oracle’s Upgrade Companion, testing methodology, SQL Plan Management, and Real Application Testing. Before pursuing an upgrade, it is imperative that one fully understands the upgrade process and planning, potential upgrade paths, the steps to upgrade, and the testing involved. The following sample workflow illustrates this approach:

1. Upgrade Planning – Evaluate and document the plan for configuring and testing the upgrade procedure in your test environment. The documented plan resulting from this step will be relevant for Test, Stage, and Production environments.

2. Prepare and Preserve – Evaluate, document, and perform the steps to prepare your test environment. Decisions and steps outlined here will be relevant for both Test and Production environments.

3. Upgrade – Upgrade your test environment. Document any lessons learned from this step to ensure smooth execution when upgrading your production database.

4. Post-upgrade – Use the tips and techniques documented here to ensure your test environment is performing up to a standard required for production.

5. At this point, you have upgraded the test environment. Consider the following:
   o Have you adjusted your plan to include everything you learned from the test upgrade? During your production upgrade, an accurate plan is important to avoid problems that were encountered during the test upgrade.
   o Are you comfortable that you have a repeatable plan to upgrade production? If not, test the upgrade procedure again.
   o Are you comfortable that the system was tested adequately for functionality and stability and will adhere to all of your performance and availability requirements?
   o Have you tested your fallback plans and procedures?

6. Once you are comfortable that you can move on to upgrade the Stage or Production environment, execute steps 2 through 4 on that environment.

UPGRADE COMPANION
The Oracle 11g Upgrade Companion (Metalink Note 601807.1) helps with upgrading an Oracle database from Oracle9i Release 2 or Oracle Database 10g to Oracle Database 11g. The guide is not an automated tool, but provides guidance for pre-upgrade, upgrade, and post-upgrade steps. It is constantly being updated and makes it easier to find upgrade information without sifting through multiple pieces of documentation, Metalink notes, and white papers. Reference: Metalink Note 601807.1, “Upgrade Companion 11g.”

Challenge
One of the most highly visible problems attributed to an upgrade does not occur during the upgrade itself, but appears as unanticipated performance degradation after the upgrade operation is completed. Fixing this performance problem becomes one of the most important challenges. A typical cause is execution plan changes, which can have a variety of underlying reasons. Other causes of performance problems are incorrect setup or issues with the operating system, storage, or network.

Best Practices

1. Upgrade a copy of the production database on a test system and perform functional testing, stress testing, integration testing, and performance testing using real production loads.

2. Apply the latest patchset and recommended patches. This applies to server-side as well as client-side software.

3. Preserve as much information as possible BEFORE upgrading the production environment to the new release; this will help with before-and-after comparison. This includes:
   1. AWR or Statspack reports and the repository.
   2. Operating system performance statistics, such as CPU usage and memory usage, gathered using OS utilities like vmstat.
   3. You may use OSWatcher from Oracle.
   4. Old configuration information. You may use Oracle's Software Configuration Manager and RDA.
   5. The current optimal, tuned execution plans. You may use tkprof or SQL Tuning Sets.
   6. A backup of the current optimizer statistics (see the sketch after this list).

4. Minimize the number of system and application component changes when doing the upgrade.

5. Validate the statistics-gathering strategy for the database. In 11g, users are encouraged to use AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT. In 11g, AUTO_SAMPLE_SIZE is much faster than in earlier versions and gives accuracy close to that of a 100% sample size. Create system statistics during a regular workload period. Create fixed table statistics. Create dictionary statistics prior to the upgrade; otherwise the upgrade may take significantly longer.

6. Create a fallback strategy, and test and verify that it works. Rehearse both the upgrade and the back-out procedures.

7. Document all changes clearly and in detail in a change log. Follow a checklist that covers the database and application, operating system setup, storage, and network.
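As a minimal sketch of backing up optimizer statistics (item 3.6) and gathering the dictionary and fixed-object statistics mentioned in item 5; the schema name SCOTT and statistics table name MYSTATS are illustrative assumptions, not values from this paper:

sqlplus / as sysdba
-- Back up the current optimizer statistics for a schema into a
-- user-created statistics table so they can be restored or compared
-- after the upgrade.
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => 'SCOTT', stattab => 'MYSTATS');
EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'SCOTT', stattab => 'MYSTATS');
-- Gather dictionary and fixed-object statistics (item 5).
EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;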

The following features can help in a successful upgrade:

1. SQL PLAN MANAGEMENT (SPM)
2. REAL APPLICATION TESTING
   a. SQL PERFORMANCE ANALYZER (SPA)
   b. DATABASE REPLAY

SQL PLAN MANAGEMENT (SPM)

SQL Plan Management is a preventative mechanism that records and evaluates the execution plans of SQL statements over time, and builds SQL plan baselines composed of a set of existing plans known to be efficient. The SQL plan baselines are then used to preserve the performance of the corresponding SQL statements, regardless of changes occurring in the system. With SPM, the optimizer uses only known, verified, and accepted plans from the plan baselines; if a new plan is found, it is recorded in the plan history and will be used only after performance verification. A common usage scenario in which SQL Plan Management can improve or preserve SQL performance is a database upgrade. A database upgrade that installs a new optimizer version usually results in plan changes for a small percentage of SQL statements, with most of the plan changes resulting in either no performance change or an improvement. However, certain plan changes may cause performance regressions. The use of SQL plan baselines significantly minimizes potential performance regressions resulting from a database upgrade.


SPM has three phases:

Phase 1 – Capture. There are two ways to capture execution plans in the SPM management base:
   A. Automatic capture of execution plans by setting OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES.
   B. Manual or bulk loading of execution plans using DBMS_SPM.LOAD_PLANS_FROM_SQLSET or DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE.

Phase 2 – Selection. Only accepted plans will be used. New execution plans will be recorded in the plan history.

Phase 3 – Evolution. All unverified plans for a given statement in the plan history are evaluated to become either accepted or rejected. DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE is used to evolve plans.

The following steps can be used with SPM when upgrading from a 10g R2 database to an 11g database (a minimal evolution sketch appears after this list):

1. Create a SQL Tuning Set (STS) of the desired SQL on 10g Release 2 from AWR or the cursor cache.
2. Transport the STS from 10g Release 2 to the 11g database and unpack the STS on 11g.
3. On 11g, use DBMS_SPM.LOAD_PLANS_FROM_SQLSET to load the execution plans into the SPM baseline.
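As a minimal sketch of Phase 1 (automatic capture) and Phase 3 (evolution) on the upgraded 11g database; the sql_handle value is a hypothetical placeholder you would take from DBA_SQL_PLAN_BASELINES:

-- Phase 1: automatically capture plans for repeatable statements.
ALTER SESSION SET optimizer_capture_sql_plan_baselines = TRUE;
-- ... run the representative workload here ...
ALTER SESSION SET optimizer_capture_sql_plan_baselines = FALSE;

-- Phase 3: verify and evolve any unaccepted plans for one statement.
SET SERVEROUTPUT ON
DECLARE
  report CLOB;
BEGIN
  report := DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE(
              sql_handle => 'SYS_SQL_209d10fabbedc741',  -- hypothetical handle
              verify     => 'YES',
              commit     => 'YES');
  DBMS_OUTPUT.PUT_LINE(report);
END;
/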

Create a SQL Tuning Set (STS) of Desired SQLs on 10g Release 2

Examples to create an STS:

Example 1:

-- From snap IDs 1 and 2, load SQL ID dmqch2g6rtvzf with plan_hash_value 1421641795,
-- along with its SQL plan and all execution statistics, into STS small_sh_sts_4.
-- This first creates an empty SQL Tuning Set small_sh_sts_4 with owner SYS.
-- Please use the correct snap IDs.
sqlplus / as sysdba

EXEC DBMS_SQLTUNE.CREATE_SQLSET('SMALL_SH_STS_4', 'SYS');

DECLARE
  BASELINE_REF_CURSOR DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  OPEN BASELINE_REF_CURSOR FOR
    SELECT VALUE(P)
    FROM TABLE(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(1, 2,
         'SQL_ID='||CHR(39)||'DMQCH2G6RTVZF'||CHR(39)||
         ' AND PLAN_HASH_VALUE=1421641795',
         NULL, NULL, NULL, NULL, NULL, NULL, 'ALL')) P;
  DBMS_SQLTUNE.LOAD_SQLSET('SMALL_SH_STS_4', BASELINE_REF_CURSOR);
END;
/

Example 2:

sqlplus scott/tiger

-- This creates an STS named test_sts owned by scott.
EXEC SYS.DBMS_SQLTUNE.CREATE_SQLSET(SQLSET_NAME => 'TEST_STS', -
     SQLSET_OWNER => 'SCOTT');

-- This loads SQL starting with 'SELECT /*MY_CRITICAL_SQL*/%' from the cursor
-- cache into STS test_sts.
DECLARE
  STSCUR DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  OPEN STSCUR FOR
    SELECT VALUE(P)
    FROM TABLE(DBMS_SQLTUNE.SELECT_CURSOR_CACHE(
         'SQL_TEXT LIKE ''SELECT /*MY_CRITICAL_SQL*/%''',
         NULL, NULL, NULL, NULL, NULL, NULL, 'ALL')) P;
  DBMS_SQLTUNE.LOAD_SQLSET(SQLSET_NAME => 'TEST_STS',
       POPULATE_CURSOR => STSCUR, SQLSET_OWNER => 'SCOTT');
END;
/

Example 3:

sqlplus scott/tiger

-- This creates an STS named my_sql_tuning_set owned by scott.
EXEC SYS.DBMS_SQLTUNE.CREATE_SQLSET(SQLSET_NAME => 'MY_SQL_TUNING_SET', -
     SQLSET_OWNER => 'SCOTT');

-- The filter used here is 'executions > 1 and disk_reads > 10000'.
DECLARE
  BASELINE_REF_CURSOR DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  OPEN BASELINE_REF_CURSOR FOR
    SELECT VALUE(P)
    FROM TABLE(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(22853, 23034,
         'EXECUTIONS > 1 AND DISK_READS > 10000',
         NULL, NULL, NULL, NULL, NULL, NULL, 'ALL')) P;
  DBMS_SQLTUNE.LOAD_SQLSET('MY_SQL_TUNING_SET', BASELINE_REF_CURSOR);
END;
/

Example 4:

EXEC SYS.DBMS_SQLTUNE.CREATE_SQLSET(SQLSET_NAME => 'MY_SQL_TUNING_SET_EXAMPLE', -
     SQLSET_OWNER => 'SCOTT');

DECLARE
  BASELINE_REF_CURSOR DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
  OPEN BASELINE_REF_CURSOR FOR
    SELECT VALUE(P)
    FROM TABLE(DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY(22853, 23034,
         'BUFFER_GETS>10000 AND PARSING_SCHEMA_NAME <> ''SYS''',
         NULL, NULL, NULL, NULL, NULL, 30, 'ALL')) P;
  DBMS_SQLTUNE.LOAD_SQLSET('MY_SQL_TUNING_SET_EXAMPLE', BASELINE_REF_CURSOR);
END;
/

-- Verify the SQL statements in the STS.
SELECT SQL_ID, SUBSTR(SQL_TEXT, 1, 15) TEXT
FROM DBA_SQLSET_STATEMENTS
WHERE SQLSET_NAME = 'NAME OF YOUR STS'
ORDER BY SQL_ID;

-- Verify the execution plan of a given SQL statement in the STS.
SELECT * FROM TABLE(
  DBMS_XPLAN.DISPLAY_SQLSET('NAME OF YOUR STS', 'DMQCH2G6RTVZF'));

Transport the STS from the 10.2.0.X to the 11.1.0.X Database and Unpack the STS on 11.1.0.X

Example:

1. Create a staging table on the source (10g R2) system.


EXECUTE DBMS_SQLTUNE.CREATE_STGTAB_SQLSET(TABLE_NAME => 'TEST');

2. Populate the table TEST using DBMS_SQLTUNE.PACK_STGTAB_SQLSET on the source (10g R2) system.

EXECUTE DBMS_SQLTUNE.PACK_STGTAB_SQLSET(SQLSET_NAME => 'SMALL_SH_STS_4', -
  STAGING_TABLE_NAME => 'TEST');

3. Export the table TEST from the source (10g R2) system and import it into the destination (11g) server. The staging table TEST can also be moved using the mechanism of your choice, such as Data Pump (sketched below) or a database link.
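A minimal sketch of step 3 using Data Pump; the directory object DP_DIR, the dump file name, the table owner SCOTT, and the credentials are illustrative assumptions:

-- On the source (10g R2) host: export the staging table.
expdp system/manager DIRECTORY=DP_DIR DUMPFILE=sts_stage.dmp TABLES=SCOTT.TEST

-- Copy sts_stage.dmp to the destination host, then import it on 11g.
impdp system/manager DIRECTORY=DP_DIR DUMPFILE=sts_stage.dmp TABLES=SCOTT.TEST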

4. Unpack the table using DBMS_SQLTUNE.UNPACK_STGTAB_SQLSET on the destination (11g) server.

EXECUTE DBMS_SQLTUNE.UNPACK_STGTAB_SQLSET(SQLSET_NAME => '%', -
  REPLACE => TRUE, -
  STAGING_TABLE_NAME => 'TEST');

On the 11g Database, Load Plans from the STS into the SQL Plan Baseline

SET SERVEROUTPUT ON
DECLARE
  MY_INTEGER PLS_INTEGER;
BEGIN
  MY_INTEGER := DBMS_SPM.LOAD_PLANS_FROM_SQLSET(
                  SQLSET_NAME  => 'SMALL_SH_STS_4',
                  SQLSET_OWNER => 'SYS',
                  FIXED        => 'YES',
                  ENABLED      => 'YES');
  DBMS_OUTPUT.PUT_LINE(MY_INTEGER);
END;
/

-- Verify that the plan baselines were populated.
SELECT COUNT(*) FROM DBA_SQL_PLAN_BASELINES;

SELECT SQL_HANDLE, SQL_TEXT, PLAN_NAME, ORIGIN, ENABLED, ACCEPTED, FIXED
FROM DBA_SQL_PLAN_BASELINES
WHERE SQL_TEXT LIKE 'SELECT%SH.SALES%';

To enable the use of SQL plan baselines, set the OPTIMIZER_USE_SQL_PLAN_BASELINES initialization parameter to TRUE (the default). This parameter can be set at the session level as well as the system level and can be modified dynamically. Each time a SQL statement is compiled, the optimizer first uses a cost-based search method to build a best-cost plan, then tries to find a matching plan in the SQL plan baseline. If a match is found, the optimizer proceeds using this plan. Otherwise, it evaluates the cost of each accepted plan in the SQL plan baseline and selects the plan with the lowest cost. A best-cost plan found by the optimizer that does not match any plan in the plan history for the SQL statement represents a new plan and is added as a non-accepted plan to the plan history. The new plan is not used until it is verified not to cause a performance regression; use the DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE function (sketched earlier) to verify these new plans. The Note section of an explain plan verifies the use of an execution plan from SPM. For example:


select * from table(dbms_xplan.display(null, null, 'basic +note'));

Note
-----
- SQL plan baseline "SYS_SQL_PLAN_0f3e54d211df68d0" used for this statement

To view the plans stored in the SQL plan baseline for a given statement, use the DISPLAY_SQL_PLAN_BASELINE function of the DBMS_XPLAN package:

select * from table(
  dbms_xplan.display_sql_plan_baseline(
    sql_handle => 'SYS_SQL_209d10fabbedc741',
    format     => 'basic'));

If you are currently using stored outlines, it is recommended that you migrate the outlines to SPM in 11g.

Real Application Testing

Real Application Testing can be used for testing the upgrade on a test instance. Real Application Testing has two options: SQL Performance Analyzer and Database Replay. The Database Replay feature can capture workloads starting with database version 9.2.0.8.0. SQL Performance Analyzer can help with upgrades from database version 9.x onward. For more details, reference Metalink Note 560977.1, Real Application Testing Now Available for Earlier Releases, and Metalink Note 562899.1 for SQL Performance Analyzer.

SQL PERFORMANCE ANALYZER (SPA)

SQL Performance Analyzer provides a granular view of the impact of changes on SQL execution plans and execution statistics by running the SQL statements in isolation before and after a change. SQL Performance Analyzer compares the SQL execution results, before and after the change, and generates a report outlining the net benefit to the workload due to the changes as well as the set of regressed SQL statements. For regressed SQL statements, the appropriate execution plan details are provided, along with recommendations to remedy them. The workflow for SPA is:

1. Capture the SQL workload in a SQL Tuning Set (STS) on 10g R2.
2. Measure the performance of the workload before the change.
3. Make a change.
4. Measure the performance of the workload after the change.
5. Compare performance, review the comparison report, and tune any regressions.

The following example shows how SPA can be used for 11g upgrade testing.

1. On 10g R2, create the SQL Tuning Set (STS).

-- create_sts.sql
-- Create my SQL Tuning Set and populate it from the cursor cache.
var sts_name varchar2(30);
exec :sts_name := 'small_sh_sts_4';
exec dbms_sqltune.drop_sqlset(:sts_name);
exec dbms_sqltune.create_sqlset(:sts_name, 'small demo workload to test SQLPA');

DECLARE
  stscur dbms_sqltune.sqlset_cursor;
BEGIN
  OPEN stscur FOR
    SELECT VALUE(P)
    FROM TABLE(dbms_sqltune.select_cursor_cache(
         'sql_text like ''SELECT /*+ my_query%''',
         null, null, null, null, null, null, 'ALL')) P;
  -- Populate the SQL set.
  dbms_sqltune.load_sqlset(:sts_name, stscur);
END;
/

Upgrade a test database to 11g and transport the STS to this 11g database. On the 11g database, create a task to run SQL Performance Analyzer:

-- create_sqlpa_task.sql
-- 1. Create a task with a purpose of change impact analysis.
-- Declare the variables.
var tname varchar2(30);
var sname varchar2(30);
-- Initialize the variables.
exec :sname := 'small_sh_sts_4';
exec :tname := 'my_sqlpa_demo_task';
exec :tname := dbms_sqlpa.create_analysis_task(sqlset_name => :sname, -
  task_name => :tname);

-- 2. Check the task status.
SELECT task_name, status
FROM user_advisor_tasks
WHERE task_name = :tname;

Set OPTIMIZER_FEATURES_ENABLE to the version from which the database is being upgraded, for example:

alter system set optimizer_features_enable='10.2.0.4';

Also make sure your optimizer statistics are exactly the same as on the 10.2 database.

2. Before Change Test Execute

-- beforechange.sql
-- Now I am ready to run the BEFORE CHANGE EXECUTE.
begin
  DBMS_SQLPA.EXECUTE_ANALYSIS_TASK(
    task_name      => 'my_sqlpa_demo_task',
    execution_type => 'TEST EXECUTE',
    execution_name => 'BEFORECHANGE');
end;
/


3. Now Make a Change

alter system set optimizer_features_enable='11.1.0.7';

-- Flush the shared pool and buffer cache for a proper comparison
-- (commands sketched below).
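The flush mentioned in the comment above can be done with standard commands such as these (FLUSH BUFFER_CACHE is intended for test systems like this one, not for production):

alter system flush shared_pool;
alter system flush buffer_cache;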

4. After Change Test Execute

-- afterchange.sql
-- Now I am ready to run the AFTER CHANGE EXECUTE.
begin
  DBMS_SQLPA.EXECUTE_ANALYSIS_TASK(
    task_name      => 'my_sqlpa_demo_task',
    execution_type => 'TEST EXECUTE',
    execution_name => 'AFTERCHANGE');
end;
/

5. Generate the Report and Compare the Performance

-- compare_runs.sql
-- Now we need to compare the two executions, BEFORECHANGE and AFTERCHANGE.
-- We are selecting BUFFER_GETS as the comparison metric.
begin
  DBMS_SQLPA.EXECUTE_ANALYSIS_TASK(
    task_name        => 'my_sqlpa_demo_task',
    execution_type   => 'COMPARE PERFORMANCE',
    execution_name   => 'DEMOTASK',
    execution_params => dbms_advisor.arglist('comparison_metric', 'buffer_gets'));
end;
/

-- report.sql
-- Now we will generate a report.
-- The report format can be TEXT, HTML, or XML.
set long 100000 longchunksize 100000 linesize 200 head off feedback off echo off
spool report.html
SELECT dbms_sqlpa.report_analysis_task('my_sqlpa_demo_task', 'HTML', 'ALL', 'ALL')
FROM dual;
spool off

Review this report and tune any regressions. SQL Performance Analyzer is well integrated with the existing SQL Tuning Set (STS), SQL Tuning Advisor, and SQL Plan Management functionality. SQL Performance Analyzer completely automates and simplifies the manual and time-consuming process of assessing the impact of an upgrade on extremely large SQL workloads (thousands of SQL statements). DBAs can use SQL plan baselines and SQL Tuning Advisor to remediate the regressed SQL statements in test environments and generate optimally performing execution plans. These plans are then exported back into production and used for future executions of SQL statements. Thus, using SQL Performance Analyzer, DBAs can validate with a high degree of confidence that an upgrade to a production environment in fact results in a net positive improvement.


Database Replay

Database Replay can be used for 11g upgrade testing. One can capture the real, live workload on a production system running database version 9.2.0.8.0 or above and replay it on an 11g test system while following the exact timing, concurrency, and transactional properties of the original workload. The end result of Database Replay testing is reports on the following:

- Errors
- Data divergence
- Performance divergence

If these three points are taken care of before the actual production upgrade, the upgrade will be successful. The following is the workflow for Database Replay:

1. Workload CAPTURE
2. Workload PREPROCESSING
3. Workload REPLAY

Workload Capture

The first step in using Database Replay is to capture the production workload. Capturing a workload involves recording all requests made by external clients to Oracle Database. When workload capture is enabled, all external client requests directed to Oracle Database are tracked and stored in binary files, called capture files, on the file system. You can specify the location where the capture files will be stored. Once workload capture begins, all external database calls are written to the capture files. The capture files contain all relevant information about the client request, such as SQL text, bind values, and transaction information. Background activities and database scheduler jobs are not captured. These capture files are platform independent and can be transported to another system.

Workload Preprocessing

Once the workload has been captured, the information in the capture files needs to be preprocessed. Preprocessing transforms the captured data into replay files and creates all necessary metadata needed for replaying the workload. This must be done once for every captured workload before it can be replayed. After the captured workload is preprocessed, it can be replayed repeatedly on a replay system running the same version of Oracle Database. Typically, the capture files should be copied to another system for preprocessing. As workload preprocessing can be time consuming and resource intensive, it is recommended that this step be performed on the test system where the workload will be replayed.

Workload Replay

After a captured workload has been preprocessed, it can be replayed on a test system. During the workload replay phase, Oracle Database performs the actions recorded during the workload capture phase on the test system by re-creating all captured external client requests with the same timing, concurrency, and transaction dependencies as the production system. Database Replay uses a client program called the replay client to re-create all external client requests recorded during workload capture. Depending on the captured workload, you may need one or more replay clients to properly replay the workload. A calibration tool is provided to help determine the number of replay clients needed for a particular workload. Because the entire workload is replayed, including DML and SQL queries, the data in the replay system should be as logically similar to the data in the capture system as possible. This will minimize data divergence and enable a more reliable analysis of the replay.

Analysis and Reporting

Once the workload is replayed, in-depth reporting is provided for you to perform detailed analysis of both workload capture and replay. The report summary provides basic information about the workload capture and replay, such as errors encountered during replay and data divergence in rows returned by DML or SQL queries. A comparison of several statistics, such as database time, average active sessions, and user calls, between the workload capture and the workload replay is also provided. For advanced analysis, Automatic Workload Repository (AWR) reports are available to enable detailed comparison of performance statistics between the workload capture and the workload replay. The information available in these reports is very detailed, and some differences between the workload capture and replay can be expected. For application-level validation, you should consider developing a script to assess the overall success of the replay. For example, if 10,000 orders are processed during workload capture, you should validate that a similar number of orders are also processed during replay. After the replay analysis is complete, you can restore the database to its original state at the time of workload capture and repeat workload replay to test other changes to the system, once the workload directory object is backed up to another physical location. The following are the steps for Database Replay.
On the Production System (9.2.0.8.0 to 10.2.X)

1. Create a directory where the capture files will be recorded.

sqlplus / as sysdba
create directory "TEST" as '/home/prod/capture';

2. Start the capture.


sqlplus / as sysdba
execute dbms_workload_capture.start_capture(name => 'prod-capture', dir => 'TEST', duration => NULL);

3. Stop the capture after the desired time.

execute dbms_workload_capture.finish_capture;

Restore the production system on a test system up to the SCN at which the capture began; the capture report shows this SCN. Move the capture files into a directory on the test system and create the corresponding directory object:

create directory "TEST" as '/home/test/replay';

On the Test 11g System

1. Preprocess the workload.

sqlplus / as sysdba
execute dbms_workload_replay.process_capture('TEST');

2. Initialize replay data.

execute dbms_workload_replay.initialize_replay(replay_name => 'test-replay', replay_dir => 'TEST');

3. Remap connections using DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION.

execute DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION(connection_id => 101, replay_connection => 'dlsun244:3434/bjava21');

In this example, the connection that corresponds to connection ID 101 will use the new connection string defined by the replay_connection parameter.
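To review the captured connection strings and their current mappings before remapping, one option is to query the connection-map dictionary view (a minimal sketch; only the essential columns are shown, and the exact column list should be confirmed against your release's documentation):

-- List captured connections and their current replay mappings.
SELECT conn_id, capture_conn, replay_conn
FROM dba_workload_connection_map
ORDER BY conn_id;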

4. To prepare workload replay on the replay system, use the PREPARE_REPLAY procedure.

execute dbms_workload_replay.prepare_replay(synchronization => TRUE, connect_time_scale => 100, think_time_scale => 100, think_time_auto_correct => TRUE);

5. Before starting the workload replay, you must start the replay clients.

a. To estimate the number of replay clients and hosts required to replay a particular workload, run the wrc executable in calibrate mode:

   wrc mode=calibrate replaydir=/home/test/replay

b. Start the required number of replay clients as follows:

   wrc username/passwd@inst replaydir=/home/test/replay

6. Start the replay.

sqlplus / as sysdba execute dbms_workload_replay.start_replay;


Once the replay is finished, you can generate the replay report. To generate a workload replay report, use the REPORT function:

DECLARE
  cap_id  NUMBER;
  rep_id  NUMBER;
  rep_rpt CLOB;
BEGIN
  cap_id := DBMS_WORKLOAD_REPLAY.GET_REPLAY_INFO(dir => 'TEST');
  /* Get the latest replay for that capture. */
  SELECT MAX(id)
  INTO rep_id
  FROM dba_workload_replays
  WHERE capture_id = cap_id;
  rep_rpt := DBMS_WORKLOAD_REPLAY.REPORT(replay_id => rep_id,
               format => DBMS_WORKLOAD_REPLAY.TYPE_TEXT);
END;
/

Please check the Oracle documentation for a detailed description of the steps and their restrictions.

Conclusion

By following these best practices, an 11g upgrade will be much easier and highly successful.

Acknowledgement

This white paper is based on Metalink Note 601807.1, Upgrade Companion 11g. I would like to thank my colleagues from the Oracle Centre of Excellence and the Upgrade Companion team and acknowledge their help.

NOTE: This document is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document.


ADF On-Ramp: What You Need to Know to Use the ADF Fusion Technology Stack

Peter Koletzke, Quovera

Si nous ne trouvons pas des choses agréables, nous trouverons du moins des choses nouvelles.

(If we do not find anything pleasant, at least we shall find something new.)

—Voltaire (1694-1778), Candide (Ch. xvii)

Developing a Java-oriented web application these days is an experience that many Oracle technologists find to be new but not necessarily very pleasant. Architecting such an application requires selecting a set of technologies from a dauntingly large and ever-growing list. Up to now, the responsibility for combining these technologies and ensuring that they communicate and work together in an orderly way has been left up to each organization. The path of selecting and working with different Java-oriented frameworks inevitably leads to some wrong turns, especially for those who are new to the Java world. Depending upon when those wrong turns occur, the effect on the project can range from mild to devastating and will likely require rewriting some or most of the application. Fortunately, Oracle has now provided guidance in the form of the set of technologies they have selected to build Fusion Applications—the next version of E-Business Suite (Oracle Applications). Oracle has chosen open-standards technologies in the Java realm so parts of the application can be easily extended with little reliance on a specific vendor's product line, hardware set, or operating system. (Many other reasons for selecting open-standards technologies—such as customer preferences—exist, but they are part of a larger discussion that is not critical to the focus of this article.)

Fusion Technology Stack

Fusion developers within Oracle have been creating Fusion Applications using Application Development Framework (ADF) in JDeveloper with the following core technologies:

- ADF Business Components (ADF BC)
- ADF Faces Rich Client (ADF Faces RC)
- ADF Bindings and ADF Data Controls
- ADF Controller

In addition to those core technologies, Oracle uses high-level technologies or strategies such as the following to coordinate Fusion Applications' components and to fulfill additional architectural requirements:

- Service Oriented Architecture (SOA) with Business Process Execution Language (BPEL)
- Enterprise Service Bus (ESB)
- Oracle Business Rules
- Oracle WebCenter

Since Oracle is using Application Development Framework (ADF)—a facility in JDeveloper for working with code in a common way—to create Fusion Applications, you can use the term "ADF Fusion Technology Stack" to refer to all technologies in the core and high-level lists. Packaged application software is a large part of Oracle's business, and Oracle has a very compelling business reason to ensure that the technologies used in Fusion Applications will integrate properly and work successfully. Therefore, you can be relatively assured that you, too, can be successful in creating applications with the same technologies.


Retooling for Fusion Technology Work

Determining the list of technologies that an application will use is not enough. Planning for any application development effort must also include tasks and strategies for bringing current development staff up to speed on the techniques required for the new environment. If you determine that your current development staff cannot reach acceptable skill levels in the available time, you may need to employ additional resources. You will need to understand what tools, development techniques, and languages a developer needs to learn (for current staff) or to know (for additional resources) to be productive in the ADF Fusion Technology environment. The main objective of this article is to explain just that—what developers need to know to be productive writing applications using the ADF Fusion Technology Stack. If you think of Oracle internal developers as drivers already speeding along on the Fusion Development Highway, this article is the on-ramp for others who are not yet on that road but who need to be there. To extend that analogy a bit, while the exit for Oracle developers is "Oracle Fusion Applications Production" and yours will be different, all have the same vehicle—ADF in JDeveloper—and type of fuel—the Fusion Technology Stack. This article starts by explaining some preliminary concepts; then it explains and shows the kinds of code and techniques needed for productive work in ADF with the core technologies in the Fusion Technology Stack. The goal is to explain the main development techniques for only the core technology set. The high-level technologies are more strategic systems that an enterprise architect will select for a particular application. While heads-down developers may need to know about techniques for the high-level technologies, that type of work will vary depending upon architectural decisions and on the enterprise's environment. In fact, the core technology stack may suffice for some applications, so no high-level technologies would be needed at all. The article closes by discussing the languages developers use for this type of work.

Note: Although this article does not specifically address techniques required to extend Fusion Applications, if you currently develop or maintain custom extensions to Oracle E-Business Suite (Oracle Applications) or think you will find yourself doing so in the future, you will be using the same techniques and technologies discussed in this article for that work in Fusion Applications.

What is Fusion?

The word "Fusion" is used these days to refer to almost everything from cars to food to razors to drinks. Therefore, the first concept to understand is what Oracle means when they use that word. The word "Fusion" is used in various ways within the Oracle product line, but it generally refers to a strategic reorganization of Oracle products. Oracle invented Oracle Fusion after acquiring various companies who offered their own application products. Oracle's objective with Fusion is to merge the best of all those products into a single (fused) applications suite. This effort will take many years, but Oracle has started this work and we expect to see the premier version of these application products in the near future. You can summarize Oracle Fusion with the following uses of the term:

- Oracle Fusion Applications, mentioned before, is the next version of Oracle E-Business Suite.

- Oracle Fusion Middleware is the toolset that Oracle is using to develop and deploy Fusion Applications. This toolset consists of virtually all Oracle development and runtime products (except for the Oracle database and Oracle packaged applications), such as JDeveloper and Oracle WebLogic Server (but not legacy tools such as Oracle Forms and Reports).

- Oracle Fusion Architecture outlines the way various technologies are used to build the applications. This Fusion usage is not as frequently used or seen as Oracle Fusion Middleware and Oracle Fusion Applications.

JDeveloper (now at its first fully-featured 11g production release) is the Fusion Middleware development tool. It is the common tool used for developing all types of code, regardless of the technology. Moreover, as mentioned, JDeveloper is the container for ADF. Therefore, Oracle is very focused on enabling JDeveloper 11g to support all requirements of the new Fusion Applications.


What is ADF?

To answer that question, you need to know that the word "framework" in the Java world refers to an application development technology. A framework is like an Application Programming Interface (API) or a code library in other disciplines: all offer generically built code that you can use in your application. The code that implements the framework supplies an entire service that you can access using a certain development method and calling interface. Although APIs and code libraries may have these characteristics, frameworks are built around the idea of a service. For example, instead of building from scratch some key facility such as a connection layer to the database, you use an existing framework such as ADF BC to supply that service. One reason to use a framework is to tap into a standard way of supplying the functionality of the service to your application. You do not need to invent a service that you need for a piece of your application. Another related reason is that you do not need to redevelop code that many applications share. When using a framework, you leverage solid and (hopefully) well-debugged code in all your applications. In addition, the most popular frameworks offer solid support at least from the user community, if not from a vendor. The sidebar "Working with Java Frameworks" describes how you use frameworks in your application code.

Working with Java Frameworks

Framework code in the Java world usually consists of prebuilt Java classes. Those classes offer complete functionality for a service (like database access). They are set up to read configuration or application-specific definitions coded inside an Extensible Markup Language (XML) file. Therefore, the primary code you are responsible for when using a framework is XML-based. A good framework offers enough flexibility to handle most applications with this type of work. Moreover, developers using frameworks are most effective when they understand what the framework can accomplish so they can design their application code to fully leverage the framework.

Occasionally (and if you are using frameworks properly, it should only be occasionally), a developer will need to replace or add to a part of the service that cannot fulfill an application’s requirement. In this case, the developer subclasses one or more framework classes and adds some code to customize the framework’s behavior. This type of work requires intermediate-level knowledge of Java as well as a deep knowledge of the framework. Therefore, it is a technique to be used sparingly.

So ADF is…

Application Development Framework (ADF) is an architectural strategy within JDeveloper that allows you to build applications using common declarative and visual methods. For example, you can build database access code into your application using Enterprise JavaBeans (EJBs), ADF Business Components (ADF BC), or web services (among others). The code details and libraries that support these frameworks are different, but the actions you use in JDeveloper to create user interfaces based on these frameworks are the same. ADF, therefore, is really a meta-framework that integrates and offers common development methods to many other frameworks.

ADF Architecture

The ADF architecture model, depicted in Figure 1, divides the frameworks it supports into various code layers that loosely follow the Java EE design pattern Model-View-Controller (MVC). MVC defines three main layers of application code: Model, to manage the data portion of the application; View, to handle drawing the user interface screen; and Controller, to process user interface events (such as button clicks) and to control page flow (how one page is called from another page). The ADF architecture layers follow the definition of MVC for the most part, but ADF adds another layer, ADF Business Services, a spin-off from the Model layer. ADF Business Services provides code for accessing data sources such as a database. Business services are responsible for persistence—the physical storage of data for future retrieval—and object-relational (OR) mapping—translating storage units such as rows and columns in relational database tables to object-oriented structures such as arrays of objects with property values. ADF Business Components is a core Fusion technology in this layer.


The ADF View layer corresponds directly to the MVC View layer. It includes technologies that you use to draw the user interface. In the case of web client code—application code that is run in a Java runtime on an application server rather than locally on the desktop (as is application client code)—ADF View supports JavaServer Faces (JSF) and ADF Faces RC, core Fusion technologies. The ADF Controller layer, which defines separate frameworks only for web client code, supports the popular JSF and Struts controller frameworks. In addition, it adds an ADF-specific framework—ADF Controller ("ADF Task Flow Controller")—that allows you to create and control parts of a page. ADF Controller is a core Fusion technology in this layer. The ADF Model layer corresponds to part of the MVC Model layer but specifically represents the connection mechanism from the Business Services layer to the View layer (through the Controller layer). The ADF Model layer is composed of the following two aspects:

- ADF Bindings: This framework (really just an aspect of ADF Model) provides a standard way to access data values in the ADF Business Services layer from an ADF View user interface component such as a pulldown item. For example, if you defined a business service item to query the DEPARTMENTS table, you could add an expression to the Value attribute of a text input item referring to the DEPARTMENT_ID column of the query. When the screen is drawn, the data would automatically flow from the ADF Business Services object to the text item in the View layer by using ADF Bindings.

- ADF Data Controls: This aspect of ADF Model supplies a list of prebound components based on the data model (data sources) defined in the ADF Business Services layer. For example, in JDeveloper, you could drag and drop a node from the Data Controls panel that represents the DEPARTMENTS query onto a JSF page. The IDE will determine the type of business service (in this case a collection—multiple rows and multiple columns) and will present a selection menu of different styles of display components (for example, forms, tables, trees, or navigation buttons). Selecting one of those options causes JDeveloper to lay out the appropriate display on the screen and bind the items on the screen to the business service.

Using both of those aspects, you do not need to write code to present data (for query and also for insert, update, and delete operations) in the user interface. Although no Java EE standard exists yet for bindings and data controls, Oracle and other parties are working on a Java Specification Request (JSR, the process by which a new feature or revision is made to the Java platform) to include this mechanism in the Java standards just as JavaServer Faces is supported by standards. (You can read more about this JSR by searching at jcp.org for JSR-227.)
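As an illustrative sketch of what an ADF Bindings expression looks like inside a JSF page (the binding name DepartmentId follows the DEPARTMENTS example above; the component id is an assumption, not code from the sample application):

<!-- The item's value is wired to the DepartmentId attribute binding;
     the label is pulled from the binding's hint metadata. -->
<af:inputText id="it1"
              value="#{bindings.DepartmentId.inputValue}"
              label="#{bindings.DepartmentId.hints.label}"/>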

Figure 1. ADF Architecture Model


Dans ce meilleur des mondes possibles ... tout est au mieux.

(In this best of all possible worlds ... everything is for the best.)

—Voltaire (1694-1778), Candide (Ch. i)

Core ADF Fusion Technologies

The easiest way to describe the core ADF Fusion technologies is in the context of a working application. Although the ADF frameworks have many advanced features, the purpose of this article (to understand what you need to know) will be best served by looking at a simple application (shown in Figure 2) that provides the following basic data handling functions:

1. Querying the DEPARTMENTS table in read-only mode when the page opens.
2. Querying EMPLOYEES table records that are related to the displayed DEPARTMENTS record.
3. Navigating between DEPARTMENTS table records using First, Previous, Next, and Last buttons.
4. Editing the displayed DEPARTMENTS table using a separate page accessed with the Edit Department button.
5. Creating a DEPARTMENTS record using the edit page in Create mode accessed with the New Department button.

Figure 2. Sample Application Containing Basic Data Handling Functions

This application uses basic examples of these core Fusion technologies:


- ADF Business Components for ADF Business Services layer functions that access the database
- ADF Faces Rich Client for ADF View layer functions that render the user interface in the web browser
- ADF Bindings and ADF Data Controls for ADF Model layer functions that connect database data to components on the web page
- ADF Controller for Controller layer functions that manage page flow and handle user event interactions

Let’s see where those technologies are used in this sample application.

ADF Business Components

This application queries and updates data in an Oracle database. ADF Business Components (ADF BC) is the framework from the ADF Business Services layer used to perform the database-specific operations. For example, a representation of the DEPARTMENTS table is defined in an ADF BC entity object. You work with the entity object code in a declarative way. When you create an entity object, you follow a set of wizard pages. To change the entity object, you interact with a property editor such as the following for the Departments entity object:

Entity objects contain attributes that represent columns in the database table or view. The Attributes tab in the Entity Object Editor just shown allows you to modify the details about a specific entity attribute. Figure 3 shows an example of that screen. Each attribute defines a Java field (for example, DepartmentId with a Java type of Number and a SQL type of NUMBER(4,0)) that ADF BC will use to prepare INSERT, UPDATE, and DELETE statements based on instructions issued through the user interface. These SQL statements are then passed to the database through Java Database Connectivity (JDBC) communication paths. All of the code that handles the JDBC calls, as well as the code to create the SQL statements, is provided by ADF BC. All you need do is declare which table and columns the ADF BC framework should target. The entity object wizard pages and property editor screens create XML code that is read by the framework files. The following code listing is a snippet from Departments.xml, the entity object definition file for the DEPARTMENTS table:

<Entity
  xmlns="http://xmlns.oracle.com/bc4j"
  Name="Departments"
  Version="11.1.1.53.41"
  DBObjectType="table"
  DBObjectName="DEPARTMENTS"
  AliasName="Departments"
  BindingStyle="OracleName"
  UseGlueCode="false">
  <DesignTime>
    <Attr Name="_codeGenFlag2" Value="Access"/>
    <AttrArray Name="_publishEvents"/>
  </DesignTime>
  <Attribute
    Name="DepartmentId"
    IsNotNull="true"
    Precision="4"
    Scale="0"
    ColumnName="DEPARTMENT_ID"
    SQLType="NUMERIC"
    Type="oracle.jbo.domain.Number"
    ColumnType="NUMBER"
    TableName="DEPARTMENTS"
    PrimaryKey="true">
    <DesignTime>
      <Attr Name="_DisplaySize" Value="22"/>
    </DesignTime>
    <Properties>
      <SchemaBasedProperties>
        <LABEL ResId="hr.model.Departments.DepartmentId_LABEL"/>
      </SchemaBasedProperties>
    </Properties>
  </Attribute>

This snippet shows how the entity object is declared and associated with the DEPARTMENTS table; it also sets up the DepartmentId attribute based on the DEPARTMENT_ID column. Similar definitions appear for other attributes in the entity object. When you change the entity object properties, the XML code is modified appropriately. Therefore, you do not need to modify (or even look at) entity object XML code.

Note: This declarative style of programming is found throughout work in JDeveloper and is a core strength of ADF.


Figure 3. Attributes Page of the Entity Object Editor

Just as entity objects supply INSERT, UPDATE, and DELETE operations, view objects represent SELECT statements. View objects can be based on one or more entity objects, which then supply details about the table and columns, or on SELECT statements. You create and edit view objects in the same declarative way as entity objects. An XML code snippet for a view object follows:

<ViewObject
  xmlns="http://xmlns.oracle.com/bc4j"
  Name="AllEmployees"
  Version="11.1.1.53.41"
  SelectList="Employees.EMPLOYEE_ID, Employees.FIRST_NAME, Employees.LAST_NAME,
              Employees.JOB_ID, Employees.EMAIL, Employees.HIRE_DATE,
              Departments.DEPARTMENT_NAME, Departments.DEPARTMENT_ID,
              Departments.LOCATION_ID"
  FromList="DEPARTMENTS Departments, EMPLOYEES Employees"
  Where="Departments.MANAGER_ID = Employees.EMPLOYEE_ID"
  BindingStyle="OracleName"
  CustomQuery="false"
  PageIterMode="Full"
  UseGlueCode="false">
  ...
  <ViewAttribute
    Name="EmployeeId"
    IsUpdateable="false"
    IsNotNull="true"
    PrecisionRule="true"
    EntityAttrName="EmployeeId"
    EntityUsage="Employees"
    AliasName="EMPLOYEE_ID"/>


This view object is based on two entity objects, Employees and Departments; in the view object’s XML you will find clauses used to construct a SELECT statement from those two tables. You can also read this query more directly in the view object editor as shown here:

The Bind Variables section of the editor just shown allows you to create variables that you work into the query so you can filter rows by values supplied by the application or by the user. You can also create view links that represent foreign key constraints, master-detail relationships, or other logical attribute pairs that relate one view object to another. In the sample application, a view link is defined between the DepartmentsView and EmployeesView view objects so when a department record is displayed, the employees for that department will be displayed. ADF BC automatically handles the master-detail synchronization if you define a view link.

ADF Controller

The JavaServer Faces standard of the Java Enterprise Edition platform specifications defines Controller functionality, which manages page flow (which page is loaded) as well as handling user events (for example, by passing data from the Model layer to the View layer). ADF supplements the standard JSF Controller with the ADF Controller framework (also called the "ADF Task Flow Controller"), which adds the ability to handle page fragments (parts of pages). This ability has the following advantages over the standard JSF Controller:

- Page fragment processing can be faster (because fewer components are rerendered)
- Fragments can be reused more easily than full pages
- Additional functions or logic can be added into the flow between pages
- Flows between pages can be reused in different parts of the application

The sample application does not specifically demonstrate page fragments; instead, as a simpler example, it shows a more standard set of two full pages: browse and edit. Navigating from one to the other is handled by the Controller as is the activity triggered by button clicks—for example, the Next and Previous buttons. Defining page flow is easiest using the diagrammer shown in Figure 4. You first create an ADF Controller file, and then drop View (page) and Control Flow Case (flow) components onto it. You then name all objects so you can refer to them in code later on. As with ADF BC, when you interact with the diagram editor, JDeveloper creates XML code such as the following:


<task-flow-definition id="dept-flow">
  <default-activity>deptBrowse</default-activity>
  <view id="deptBrowse">
    <page>/deptBrowse.jspx</page>
  </view>
  <view id="deptEdit">
    <page>/deptEdit.jspx</page>
  </view>
  <control-flow-rule>
    <from-activity-id>deptBrowse</from-activity-id>
    <control-flow-case>
      <from-outcome>toEdit</from-outcome>
      <to-activity-id>deptEdit</to-activity-id>
    </control-flow-case>
  </control-flow-rule>
  <control-flow-rule>
    <from-activity-id>deptEdit</from-activity-id>
    <control-flow-case>
      <from-outcome>toBrowse</from-outcome>
      <to-activity-id>deptBrowse</to-activity-id>
    </control-flow-case>
  </control-flow-rule>
</task-flow-definition>

After you set up a JSF page file, you can drop components such as buttons into the page. The button component's Action property can refer directly to the name of the control flow case. For example, the sample application's Edit Department button is defined in the JSF page using the following code:

<af:commandButton text="Edit Department" id="cb2" action="toEdit"/>

When the user clicks this button, the Controller finds the definition of the toEdit action in the task flow file. This code (listed earlier) declares that the flow toEdit defined in the from-outcome tag will load the deptEdit activity (in this case, a JSF page). The Browse Departments button on the edit page reverses this navigation using the toBrowse flow.

Note: With ADF Controller, as well as with ADF BC, you can always write Java code to supplement or replace functionality. However, the more functionality you can define declaratively, the more you will be using the power of these frameworks.


Figure 4. Task Flow Diagram

ADF Faces Rich Client

The ADF View layer constructs the user interface. In the case of a web application, the user interface is rendered in a Hypertext Markup Language (HTML) browser. Native HTML items such as text input items, buttons, selection lists, and radio buttons are limited in functionality. JSF defines higher-level items (called "components") that add functionality to HTML. ADF Faces Rich Client (available in JDeveloper 11g and abbreviated hereafter as "ADF Faces") is a set of JSF components with "rich" functionality. For example, ADF Faces offers a component called af:table (ADF Faces components are prefixed with "af," denoting the tag library in which they are found) that represents an HTML table in a web browser. Combining af:table with one or more af:column components allows you to define an entire HTML table without writing HTML. Here is a snippet of code for the Employees read-only table in the sample application:

<af:table value="#{bindings.EmployeesView3.collectionModel}" var="row"
          rows="#{bindings.EmployeesView3.rangeSize}"
          emptyText="#{bindings.EmployeesView3.viewable ?
                       'No data to display.' : 'Access Denied.'}"
          fetchSize="#{bindings.EmployeesView3.rangeSize}"
          rowBandingInterval="0"
          selectedRowKeys="#{bindings.EmployeesView3.collectionModel.selectedRow}"
          selectionListener="#{bindings.EmployeesView3.collectionModel.makeCurrent}"
          rowSelection="single" id="t1" inlineStyle="width:100.0%;">
  <af:column sortProperty="EmployeeId" sortable="true"
             headerText="#{bindings.EmployeesView3.hints.EmployeeId.label}"
             id="c2">
    <af:outputText value="#{row.EmployeeId}" id="ot6"/>
  </af:column>
  <af:column sortProperty="FirstName" sortable="true"
             headerText="#{bindings.EmployeesView3.hints.FirstName.label}"
             id="c5">
    <af:outputText value="#{row.FirstName}" id="ot7"/>
  </af:column>
  ...
</af:table>

Notice that, like the Business Services and Controller layer code, ADF Faces is also XML code consisting of elements ("components" in ADF Faces) and attributes ("properties" in ADF Faces). The power of ADF Faces is in the flexibility of the component properties. In this sample code listing, the value property of af:table connects the table component to a data source (EmployeesView3 in this case—an instance of the EmployeesView view object) and assigns a variable name (called "row") to each record in the result set of that view object. Nested within the af:table component are two af:column components—representing the EmployeeId and FirstName attributes. Within each column component is an af:outputText (read-only text) component whose value property identifies the table data element within a single record (using the variable "row") that will be displayed in the HTML table cell. The af:table component is responsible for iterating rows appropriately for the data set.

Note: As discussed more in the next section of this article, the “bindings” reference in the af:table component’s value property points to the page binding, which connects the ADF BC objects to the components on the page.

Although the af:table code in the preceding snippet is functional code (many properties are defaulted, and properties with default values are not represented in code), many more properties are available. Figure 5 shows JDeveloper's Property Inspector (the default property editor for most XML files) displaying the complete set of properties for af:table. (This display spreads across three columns, although JDeveloper shows all properties in a single column.) You can zoom in for a closer look at individual property names within this article, but the main point is that this component offers a lot of options for modifying its behavior or appearance. Some properties are data-oriented as just explained, but some supply user-friendly features such as the following:

- rowSelection: Setting this property to "single" will allow the user to select a row at runtime (by clicking it). The selected row can then be processed in a way you define (for example, to display a popup showing more detail). You can also define the ability to select multiple rows.

- rowBandingInterval: Setting this property to "1" will shade every other row in the table to make rows visually easier to follow across a wide display.

- filterVisible: If you set this property to "true," the table component will display input fields above each column heading. The user can type a value into one or more of these fields and the displayed rows will be filtered by the entered values.


Figure 5. Property Inspector View of the af:table Properties

Declarative AJAX

The recent movement to make web applications more interactive has led to acceptance and wide use of Asynchronous JavaScript and XML (AJAX). AJAX (sometimes spelled as "Ajax") consists of a number of technologies that have existed for some time (such as JavaScript and XML); it allows you to write code that refreshes only part of the page instead of the entire page. This enhances the user experience because the user does not need to wait for the entire page to redraw after clicking a button or link, changing a data value, or interacting with the page in some other way. ADF Faces components are written with embedded AJAX features. For example, in the preceding code listing, the sortable property of the af:column components is set to "true." This sets up functionality so that if the user clicks a column heading, the displayed rows will be sorted based on the values in that column. Clicking the same column heading again reverses the sort. As the table is redrawn to display the rows in a different order, the rest of the page stays in place. That is, only the table contents are redrawn. This partial page drawing uses AJAX technology. AJAX is built into the ADF Faces components so you do not need to write any JavaScript or XML code to cause the partial page redraw. However, with a small handful of properties, you can write your own partial page events, again without writing AJAX code. For example, by declaring property values for Price, Quantity, and Line Total fields, you can cause a refresh of the Line Total field when the user changes either the Price or Quantity field. The rest of the page would remain static. Only the value in the Line Total field would change when Price or Quantity changes.

Note: AJAX within ADF Faces is more properly called “Partial Page Rendering” (PPR), which specifically refers to the capability to define AJAX functionality by just declaring property values.

Visual Editor

In addition to the Property Inspector and source code view of the ADF Faces components in a JSF file, you can view the components in a visual editor that emulates the component runtime. Figure 6 shows the Departments browse page as it appears in the visual editor.


This tool supports drag-and-drop actions for repositioning components. Changes you make in the visual editor are reflected in the source code just as changes you make in the Task Flow Diagram are reflected in the controller source code. As an ADF application developer, you can create code in whichever way is most efficient and intuitive. For example, it is probably easier to reposition buttons by dragging and dropping them in the visual editor rather than reordering lines of code in the source code editor.

Figure 6. JDeveloper Visual Editor Display of the Departments Browse Page

In addition to the visual editor, you can interact with ADF Faces source code (as well as most other types of code) using the Structure window, shown here:


This window displays the hierarchy of ADF Faces and JSF component tags and allows repositioning them using drag-and-drop operations. In addition, you can select, delete, and copy nodes in this window to change the source code. The right-click menu on any node allows you to add components above, below, or inside that component. Errors and warnings are summarized at the top of this view, and double-clicking an error opens the source code editor at the problem line of code. Although the sample application displays relatively standard interface components, ADF Faces offers nearly 150 components that you can use to create virtually any user interface you can envision. In addition to simple user input items—for example, text items and pulldowns—ADF Faces also supplies more complex input items such as a date input item with calendar popup, a shuttle control that serves as a multiple selection list, and a full-featured calendar widget. It also provides layout components that allow you to manage the relative positioning of components. In addition, a separate set of ADF Faces components called Data Visualization Tools (DVT) provides highly interactive, Web 2.0, Flash-aware components such as graph, chart, gauge, hierarchy viewer, Gantt chart, map, and pivot table.

Le superflu, chose très nécessaire.

(The superfluous, a very necessary thing.)

—Voltaire (1694-1778), Le Mondain

ADF Bindings and ADF Data Controls

The Model layer in ADF is composed of two aspects—ADF Data Controls and ADF Bindings. These frameworks link the database components written in ADF BC to user interface components (through the management of pages in the Controller layer). Wiring user interface components to database objects is relatively easy with these two technologies. The story of how these ADF Model layer technologies work starts back in the ADF Business Services layer. An ADF BC component, the application module, manages database transactions (COMMIT and ROLLBACK) and defines the data model, a list of view objects and view links that the application uses. The data model is depicted within the Application Module Editor as a hierarchy, as shown here:

The Data Model area in this example defines view objects for DepartmentsView with a detail of EmployeesView (the suffix numbers indicate distinct usages of the view objects in the data model) linked through a view link. A master-level instance of JobsView and LocationsView (used to supply unfiltered data for pulldowns or LOVs) is also part of this data model. This data model is defined completely within the ADF Business Components application module in the Business Services layer.

Returning to the Model layer, whenever you create a JSF page or page fragment, the Data Controls panel in the JDeveloper navigator will display the ADF BC application module's data model nodes as shown in the adjacent screenshot. Additional nodes appear under each view object for attributes (for example, DepartmentId under DepartmentsView1), Operations (that provide actions you can take on the data collection, such as navigating the current record to the Next, Previous, First, or Last record in the set), and Named Criteria (which define which fields will be available for queries using search forms).

An almost magical thing occurs when you drag one of these nodes onto a JSF page or page fragment. For example, to build the sample application, the DepartmentsView1 node was dragged from the Data Controls panel and dropped onto the JSF page. The ADF Data Controls framework determines that the node is a collection-level (table-level) item and displays a menu of applicable components or component combinations as shown here with the Forms menu expanded. In the sample application, selecting ADF Read-only Form caused JDeveloper to create a display containing labeled fields with navigation buttons at the top of the Departments browse page. This drag-and-drop-and-select action interacts with the ADF Data Controls list. If an individual attribute node (such as DepartmentId) is dragged, a list of data controls appropriate to a single data value (for example, input text items, output items, and pulldowns) displays instead.

In addition to drawing user interface components on the screen, the drag-and-drop operation also creates bindings for those components. Bindings are code or definitions that declare which data from a business service will be connected to a user interface control or structure. Bindings appear in the ADF Faces components' property values. The following example is an ADF Faces input text component from the Edit Department page:

<af:inputText value="#{bindings.DepartmentId.inputValue}"
              label="#{bindings.DepartmentId.hints.label}"
              required="#{bindings.DepartmentId.hints.mandatory}"
              columns="#{bindings.DepartmentId.hints.displayWidth}"
              maximumLength="#{bindings.DepartmentId.hints.precision}"
              shortDesc="#{bindings.DepartmentId.hints.tooltip}"
              id="it1"/>

All of this code was created by the Data Controls panel drag-and-drop operation. This is one of the main advantages of the Data Controls panel: it builds all the property values for you and automatically binds the components to data. The property values defined using the "#{ }" delimiters are Expression Language expressions. Expression Language (EL) is a high-level, non-procedural language specified in the JavaServer Pages standards. It is used within JSF pages to refer to potentially dynamic sources of data that will supply property values at runtime. In this case, all EL expressions begin with "bindings," which is the context for the values. This context refers to a PageDef (Page Definition bindings) file that JDeveloper creates for each JSF page. You can view the bindings in this file using the Bindings viewer for the page as shown here:


If you need to look at or manipulate the bindings code, you click the link next to the Page Definition File label to open the PageDef file—the container for the bindings definitions. The Structure window view of this page is shown next.

You will see an executables section for the queries (iterators) that occur when the page opens. You will also see a bindings section for the objects that refer to view object attributes. By now, you will not be surprised that JDeveloper creates XML code to define bindings; you will rarely need to touch this code. Here is a code snippet from the deptEditPageDef.xml file:

<bindings>
  <attributeValues IterBinding="DepartmentsView1Iterator" id="DepartmentId">
    <AttrNames>
      <Item Value="DepartmentId"/>
    </AttrNames>
  </attributeValues>
  ...
</bindings>

This file is processed by the ADF Bindings framework code and links the attribute, DepartmentId, to the iterator, DepartmentsView1Iterator. That iterator is defined for the DepartmentsView1 view object instance in the data model, and therefore represents a query of data. The EL bindings expressions in the ADF Faces component code point to this communication path and therefore to data. The EL expressions also further drill into a specific property of the ADF BC view attribute; for example, the label property of the example component is defined as "#{bindings.DepartmentId.hints.label}", which refers to the label property of the view attribute (in the hints property category). If no label property is defined, the default label is the attribute name.


Which Languages Are Important?

Now that you have sampled some ADF techniques for working with each of the core technologies, you know that JDeveloper creates a lot of application code automatically when you interact with its visual and declarative tools. However, you may still be wondering which languages you will use when you need to supplement this code. First, remember that ADF was created as a visual and declarative environment for interacting with many frameworks. Therefore, a key skill is knowing how to squeeze the most functionality out of the technologies by just defining property values and laying out components visually. The less code you need to write, the less code you need to debug. With the goal of "declarative if at all possible" in mind, you can be quite productive without writing much code. However, you will come to a point where writing code is necessary, and you will be using a combination of languages. The following list summarizes the main languages you need to know and how you will use them:

XML – As you have seen, work with the frameworks makes heavy use of XML code. However, you work with most XML code in JDeveloper using declarative and visual tools. You will rarely need to type XML elements and attributes in these files, and even then the level of skill you will need is very basic. You mainly need to know three things about XML: elements need ending tags; elements have attributes that refine the element's use; and elements can be nested within elements to create an element hierarchy.

Java – You will write snippets of Java inside ADF BC classes and View layer code to perform customized tasks that the frameworks cannot provide. You can be quite productive in the ADF Fusion Technology Stack with a novice-level knowledge of Java if you have someone on your team who understands Java at an expert level. This person can step in to assist if you run into a requirement that cannot be handled with a basic knowledge of Java.

HTML – For best use of JSF and ADF Faces concepts, you will avoid writing HTML code. Instead, you use high-level components that generate HTML for you.

Cascading Style Sheets (CSS) – ADF Faces components use CSS styles defined in a skin, a set of style selectors that provide a common look-and-feel to all your pages. You will use CSS to define the skin at the start of the first ADF application project, but will not need it much after that because you will apply the same skin to all applications in your organization.

JavaScript – ADF Faces components use JavaScript internally to provide user-friendly features such as refreshing part of a page when scrolling to the next set of records. You will usually not need to write JavaScript or AJAX code because the components provide many of the features you would normally need other languages to supply and allow you to declare AJAX functionality using only property values.

Expression Language – EL is used to supply dynamic values to JSF components' properties. The main learning curve for EL is in knowing how to start to build the correct expression. Fortunately, JDeveloper can assist. In the pulldown for most properties is an item for "Expression Builder." This selection displays a navigator that helps you create properly formatted EL expressions. It is a good learning tool as well as a way to enter proper EL.

Groovy – ADF Business Components allow you to write validation and message code using this language. As with EL, Groovy is used at a very basic level, and understanding a few fundamentals as explained in the JDeveloper online help system will suffice.

Additional Resources

The intention of this article is to get you started thinking about ADF, Fusion, and the techniques you will be performing in JDeveloper to build web applications. The main source of all things ADF is the JDeveloper home page on Oracle Technology Network (www.oracle.com/technology/products/jdev/). Follow the links on that page to access tutorials and articles about specific techniques. In addition, the "Learn More About" section at the bottom of that page currently displays a link to information about ADF. Keep your eyes open for the "ADF Fusion Developer's Guide" (although the title may change a bit over time). It contains a wealth of information and techniques. This guide is available within the JDeveloper help system as well. Speaking of the help system (technically called the "Help Center"), don't forget to use that resource; especially helpful when learning ADF are the Help Center's cue cards, which step you through creating a specific type of code. Another extremely useful Oracle resource can help in your ADF learning process: the Rich Enterprise Applications website (rea.oracle.com). This website allows you to preview and learn about Fusion technologies. One more highly recommended website is the ADF Faces Rich Client Components Hosted Demo (linked on the ADF Faces RC home page). This demo shows all ADF Faces components and allows you to change properties to see how they work.

Le secret d'ennuyer est celui de tout dire.

(The secret of being a bore is to tell everything.)

—Voltaire (1694-1778), Sept Discours en Vers sur l’Homme

Conclusion

Admittedly, this is a lot of information, but hopefully you now have a better idea about Oracle Fusion and ADF, as well as the basics of each of the core technologies in the ADF Fusion Technology Stack: ADF BC, ADF Controller, ADF Faces RC, and ADF Bindings and ADF Data Controls. This article has shown the type of code you will be creating and the style of development work you will be performing to create that code in each of these technologies. This overview should help you understand what you need to know to be productive with ADF and Fusion technologies and to start up the on-ramp to the Fusion Development Highway. May that road rise up to meet you!

Il faut cultiver notre jardin.

(Let us cultivate our garden.)

—Voltaire (1694-1778), Candide (Ch. xx)

About the Author

Peter Koletzke is a technical director and principal instructor for the Enterprise e-Commerce Solutions practice at Quovera, in Mountain View, California, and has 25 years of industry experience. Peter has presented at various Oracle users group conferences more than 250 times and has won awards such as Pinnacle Publishing's Technical Achievement, Oracle Development Tools Users Group (ODTUG) Editor's Choice (twice), ODTUG Best Speaker, ECO/SEOUC Oracle Designer Award, ODTUG Volunteer of the Year, and NYOUG Editor's Choice (twice). He is an Oracle Certified Master, Oracle ACE Director, and coauthor of the Oracle Press books Oracle JDeveloper 11g Handbook (with Duncan Mills and Avrom Roy-Faderman), Oracle JDeveloper 10g for Forms & PL/SQL Developers (with Duncan Mills), Oracle JDeveloper 10g Handbook and Oracle9i JDeveloper Handbook (with Dr. Paul Dorsey and Avrom Roy-Faderman), and Oracle JDeveloper 3 Handbook, Oracle Developer Advanced Forms and Reports, Oracle Designer Handbook, 2nd Edition, and Oracle Designer/2000 Handbook (all with Dr. Paul Dorsey).


How Innovations in Storage Change Your Oracle Playing Field

Ari Kaplan, Datalink

Abstract

Storage is no longer a "black box" in which to store data. New advances fundamentally change Oracle-based environments. When Oracle databases get large, it becomes impossible to back up, recover, provision, and replicate according to business needs without the right architecture and solutions in place. This presentation discusses the trends and new technologies in storage related to Oracle and answers these questions: How can you back up and recover multi-terabyte databases in minutes? How can you provision a new copy of production in minutes while saving 90% of the storage? How do the features of disk-to-disk backup, virtual tape libraries (VTLs), storage-based replication, RAID-DP, aggregates, de-duplication, and tiered storage architectures change the playing field? What are the different types of snapshots available? Where do RMAN and hot backups work; where do they fall short; and how can they be integrated with storage solutions?

Organizations house some of their most business-critical information in Oracle® databases, making it imperative to have sound backup and recovery processes in place to protect this data. Backing up and recovering Oracle environments can present unique challenges. A variety of technologies are available that—if used correctly—can assist companies in overcoming these challenges. This paper provides an introductory overview of some of the enterprise backup options available for Oracle environments. It is aimed at IT professionals—including database administrators, storage administrators and managers—and anyone who plays a role in architecting and implementing the backup and restore processes of Oracle systems.

Introduction

Database Data Is Extremely Valuable

Oracle databases generally host mission-critical data for business operations. This content is often revenue generating, driven by business applications that span the enterprise and support a wide variety of activities performed by an organization's customers, employees and other stakeholders. These activities can range from online e-commerce purchases to deposits in a savings account or the tracking of global inventory and shipping. While it is important for companies to have a solid enterprise backup plan in place for all corporate information assets, the requirements are especially stringent for database applications and data. Business requirements demand that organizations be able to recover database environments in a timely manner and without corrupting or losing more data than the company can afford to lose. Data loss or the inability to quickly recover data could cause severe financial repercussions for a company and its customers and partners.

According to an IOUG survey in 2007:

57% reported database growth has impinged upon available storage resources
60% reported that the lack of available storage has impacted the performance of their databases
43% delayed the rollout of an application within the past two years because of a lack of storage resources
31% are managing 1 TB+ databases, up from 13% the previous year

And according to a Storage IO / Datalink survey in August 2009, which asked what challenges customers face in their virtual server environments:

Backup and recovery – 31.2% indicated this was a challenge
Performance issues – 25% indicated this was a challenge
Contention among networks, storage and servers – 18.8% indicated this was a challenge
Difficult to manage – 12.5% indicated this was a challenge
None – 31.2% had no challenges
All of the above – 12.5%

The takeaway here is that roughly 70% identified issues in their virtual server environment.

DBA Business Challenges

These are the main challenges with Oracle environments that storage solutions (from Oracle Corporation or third parties) can help resolve:

Backup Issues

Scalability: physically copying 2 TB of data to tape or disk is time consuming
Cost: it is expensive to purchase 200 TB of storage to perform physical image backups of 200 TB databases; it is costly even to purchase hardware to test backups
Performance: keeping large databases in hot backup mode negatively affects the performance of high-transaction systems (inserts, updates, deletes)
Complexity of systems: multiple databases, interlinked systems, different database versions, RMAN/non-RMAN, RAC, ASM, etc.
Manageability: setting up, managing, and testing backups is often difficult

Recovery Issues

Manageability: human errors, lost data, inconsistent data, physical failures, and corruption can require restores; recovering to a consistent point in time is a manual and daunting process
Performance: how do you recover a 2 TB database in 15 minutes?

Disaster Recovery / Replication

How do you architect your database and surrounding environment for DR? With no data loss? With a 15-minute failover timeframe?

Data Growth

Cost: DBAs tend to put data on a single class of storage without archiving or tiering considerations
Performance: system response time is 5 seconds now—what happens when data triples in size?
Manageability: getting additional storage from non-DBA groups is often a political process

Development and Testing

Cost: purchasing 20 TB of storage to provide several image copies of production to test and development is costly
Scalability: providing 5, 10, 15 or more copies to development and testing teams is unrealistic
Manageability: managing the cloning process can take 25% or more of a DBA's time

Backup Options for Oracle

Various Alternatives Are Available

There are a myriad of enterprise backup and recovery solutions on the market for Oracle-based environments. Oracle Corporation provides several native options such as hot backups, Recovery Manager (RMAN), Data Guard, Export/Import, Data Pump, and Oracle Flashback.


Storage vendors, such as Network Appliance, Hitachi Data Systems, and EMC, offer a variety of solutions that can further enhance the Oracle backup and recovery process and, in some cases, address limitations of Oracle-only solutions. These include triple-mirroring, array-based replication, and snapshots. Beyond this, several hardware and software manufacturers provide technologies that can be integrated within backup solutions to help monitor and manage the backup environment. Also, other products can enhance backup performance, security, and compliance of Oracle data through features such as deduplication and encryption.

Deploying the Right Solution

Need to Weigh Several Factors

The type of backup and recovery solution that an organization deploys for its Oracle environment should be driven by recovery point objectives (RPOs) and recovery time objectives (RTOs). The RPO describes how much data a company can lose and how frequently the data must be captured in the backup system. For example, some organizations may decide that they can afford to lose an hour's worth of transactions or even a day of transactions. Meanwhile, other organizations, such as financial institutions, might determine that they cannot afford to lose any data. For them it is well worth the extra cost and effort required to deploy a solution that captures every transaction and ensures no records are ever lost. RTO describes how long applications or parts of applications can be down while the IT area is recovering or failing over data. In determining RTO, organizations should ask questions such as, "How long could our business wait to recover a specific table?" or "How long could our business wait to recover the entire data center?" Organizations should define RPO and RTO on an application-by-application basis. For example, when defining RTO, recovering the database for the payroll application may not be as time sensitive as recovering the database that services an e-commerce application. Or, as it relates to RPO, companies likely will not be able to afford losing any information from applications that generate credit card transactions. Meanwhile, they may be able to lose some data from an application where the data is refreshed every evening or where the data could be completely re-created from another source. As an organization defines its RPOs and RTOs, there must be buy-in from upper management so that company-wide standards can be established. Ideally, management sets the business objectives and the IT department can then scope out the cost and effort involved to meet these objectives. There is a spectrum of solutions that meet varying business requirements, with differing costs associated with each. It is important to note that the stringency of the RPO and RTO requirements and the cost of the backup and recovery solution to meet those requirements are, for the most part, directly proportional. The good news is that there are now hardware and software technologies available that make these solutions much more cost effective.

Backup in an Oracle Environment

Systems Are Complex

When backing up an Oracle environment, there are some unique challenges that present themselves. In the "non-Oracle" world, flat files such as Microsoft Word® documents or executables can generally be backed up and recovered on an individual basis. Oracle databases, on the other hand, comprise a complex system of interrelated files. They cannot be backed up in isolation, but rather must be backed up in sets and in the correct order. The challenges in doing this span several fronts.

Unique Challenges

Ensuring Data Consistency

First, it can be difficult to know which sets of files to back up and recover in consistent groups. For instance, multiple databases and applications often talk to one another. It is essential to know which files are interrelated so that the proper sets of files can be grouped together in a backup. Otherwise, it may not be possible to restore the system to a consistent state. A simple illustration of this would be a bank customer transferring funds from Account "A" to Account "B." The sets of files that contain information about the deposit transaction and withdrawal transaction may be located in different areas, but they are interrelated. The bank would need to ensure that both of these file sets were contained in the same backup. Otherwise, if the system crashed and was subsequently restored, it may only reflect the withdrawal from one account and not the deposit to the other (or vice versa). Ensuring data consistency across multiple databases and applications can be challenging, but it is extremely important. These interrelationships are common in business applications such as Oracle E-Business Suite, SAP, Siebel, PeopleSoft, and home-grown applications. Another example of this is a customer placing an order for a product. Depending on how the system is architected, the order information could potentially be housed in two different databases (one where the order is placed and one where the order is fulfilled). If there is an outage and the databases are not restored to consistent states, then it may appear in one database as if the order was placed while the other database would not show the order. Data consistency is extremely relevant if an application relies on multiple databases or if data is distributed through Oracle snapshots, Oracle Streams, Oracle multi-master replication, or application-level replication.

The Landscape of RPO and RTO

The following outlines the relative RPO and RTO benefits of the solutions discussed further in this whitepaper:

Oracle Backup Methods

Oracle offers several options for backup and recovery that are native to the software. Depending on which version of Oracle an organization has, these options may vary. Some benefits and drawbacks of the various Oracle backup methods are listed below.

Oracle Hot Backup Mode

Oracle's hot backup mode enables physically copying files to another disk or tape while the database is active.

PROS: This method provides recovery without any data loss and can be used with other backup methods as well. For example, performing nightly hot backups enables a company to recover a database to the previous night; then applying all redo log changes (provided these logs were online, or mirrored or copied to another system) brings the database back to a consistent point in time, with a strong RPO. Most database administrators are likely familiar with this method.

CONS: The process of backing up and recovering entire file systems can be prohibitively lengthy and a challenge for RTO. Recovering a 500 GB database alone can take several hours, and a multi-terabyte environment can take many times that long. System performance degrades during the backup process (10-15 percent on standard systems). Additionally, this method only backs up the database itself and not customized code or non-Oracle systems such as Exchange or SQL Server. And, if online and archived redo logs are not mirrored or copied to another system, data may be lost in the event of a recovery.
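For reference, here is a minimal sketch of the database side of a scripted hot backup on Oracle 10g and later (the copy step is a placeholder; any OS or storage tool can perform it):

ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER DATABASE BEGIN BACKUP;        -- all tablespaces enter hot backup mode
-- copy the datafiles here, e.g., HOST cp /u01/oradata/PROD/*.dbf /backup/PROD/
ALTER DATABASE END BACKUP;
ALTER SYSTEM ARCHIVE LOG CURRENT;   -- archive the redo generated during the copy

On releases before 10g, each tablespace is placed in backup mode individually with ALTER TABLESPACE ... BEGIN BACKUP.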


Oracle RMAN Oracle’s RMAN software has an incremental option which copies only database blocks that have changed since the previous backup. This significantly improves backup and recovery times over copying full files with hot or cold backup methods. In fact, many companies see a 10x improvement (or more) for both backup and recovery, depending on the amount of updates that were made between backups. PROS: RMAN enables a company to significantly reduce its RTO. For example, a 30-hour backup of two terabytes might only take three hours using RMAN. In addition, because only changed data is backed up, the backups will require less storage space. This method also backs up in parallel streams/channels. RMAN works well with other Oracle and third-party solutions to provide additional capabilities, such as deduplication and encryption. CONS: RMAN only backs up the database, not non-Oracle systems such as custom applications, executables or scripts. As with the RPO of hot backups, if redo logs are not mirrored or copied to another system, the result may be lost data in the event of a recovery.

Oracle Data Guard

Oracle's Data Guard replicates Oracle databases from one data center to another. It also enables companies to perform backups from Data Guard's standby database instead of the production database, improving production performance during the backup window. Data Guard can be used in either a physical or a logical version. The physical version applies the changes from the Oracle redo logs to the destination target. The logical version, meanwhile, sends one SQL statement at a time to change the records at the destination target.

PROS: Depending on the application, the logical version of Data Guard can provide huge improvements in RTO and replication performance. With the logical standby architecture, one SQL command can instruct the system to change millions of records (versus sending a million record changes in database blocks). This can also significantly reduce bandwidth. Additionally, with the logical version it is not necessary to have identical structures at the primary and secondary sites (as is required with the physical version). This makes it possible to use less expensive disk as the replication target. The logical version is most effective in environments where SQL statements are used to update large amounts of data at once. Using Data Guard's synchronous mode (in the physical version) enables quick RTO with an RPO of no data loss. Because the alternate database is already up and running and in sync with the primary database, data recovery is just a matter of switching over to the alternate site while running in synchronous mode. Data Guard works well in conjunction with array-based replication, but there are bandwidth and physical site distance latency issues with replication if the volume of transactions is significant.

CONS: Data Guard's most robust features are only supported in the Oracle 10g and 11g releases, so it may not provide the required functionality in Oracle 9i or earlier environments. Data Guard only supports Oracle databases (and even then not application and database source code). If running in asynchronous mode, there could be data loss during a failure, hurting the RPO. And data will not be replicated if a table or loading process is in NOLOGGING mode. Additionally, organizations may need to purchase a license for the standby database. The standby database must also be running in order for changes to be applied, which impacts the performance of any other applications running there.
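For illustration, redo transport to a physical standby is driven by initialization parameters on the primary, with the standby applying redo in managed recovery mode; a minimal sketch (the service and database names are hypothetical):

-- On the primary: ship redo asynchronously to the standby service "prodsb"
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=prodsb LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=prodsb' SCOPE=BOTH;

-- On the standby: apply the shipped redo in the background
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;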

Oracle Export/Import Utilities

Oracle's Export/Import utilities generate logical backups of tables. That is, rather than copying the physical blocks of data, the utilities copy the series of commands used to recreate the tables.

PROS: Export/Import offers the ability to recover on a table-by-table basis. So if only one table is needed, the backup administrator can selectively import that table versus recovering, for example, a 2 TB database.

CONS: The backup and recovery process with this method is fairly lengthy. Recovering a table of just a few hundred gigabytes can take several hours. RPO is also an issue, as this solution does not provide the ability to recover past the point at which the export started. Consequently, this method is not very feasible for tables that change.

Oracle Data Pump

Oracle's Data Pump provides much of the same functionality as Export/Import and adds several new features.


PROS: Like the Export/Import utilities, this method enables recovery on a table-by-table basis. It also uses multiple parallel streams for faster performance. In addition, it provides the ability to suspend and restart data transfer. That way, organizations can pause backups and restart them at their convenience (without having to start over). This might be necessary if an organization needs to add additional storage. Or, perhaps the backup is impacting performance and the organization wants to resume it during off-peak hours.

CONS: Like the Export/Import utilities, RPO is an issue, as it is not possible to recover past the point in time at which the Data Pump process started. As for RTO, even though Data Pump is typically 15-45 times faster than Export/Import, it can still take a long time to back up and recover compared to snapshot methods (i.e., hours versus seconds).
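To illustrate the suspend-and-restart capability, a minimal sketch using the expdp command-line utility (the directory object, schema, and job name are hypothetical):

# start a parallel schema export under a named job
expdp system DIRECTORY=dpump_dir DUMPFILE=hr_%U.dmp SCHEMAS=hr PARALLEL=4 JOB_NAME=hr_exp

# later, attach to the running job to pause it...
expdp system ATTACH=hr_exp
Export> STOP_JOB=IMMEDIATE

# ...and re-attach whenever convenient to resume where it left off
expdp system ATTACH=hr_exp
Export> START_JOB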

Oracle Flashback

Oracle's Flashback technology allows for recovering a table (or the whole database) to a point in time in the past by keeping images of changed data online.

PROS: This method provides online backup and recovery, eliminating the need to recover from tape and saving valuable recovery time and management effort. It provides a great improvement in RTO through extremely fast recovery of tables with simple SQL commands.

CONS: Flashback requires a significant amount of Flashback Area kept online, taking up lots of storage. Also, the database must be up and active to connect to the Flashback Recovery Area. Although some functionality was introduced in Oracle 9i, the more robust features only work with Oracle 10g and 11g. While Oracle's Flashback capabilities improve RTO, backups of the database must still occur to meet RPO and avoid lost data.
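For example, rewinding an accidentally modified table takes only a couple of SQL commands (the table name and time interval are illustrative):

-- row movement must be enabled once before a table can be flashed back
ALTER TABLE employees ENABLE ROW MOVEMENT;

-- return the table to its state of 15 minutes ago
FLASHBACK TABLE employees TO TIMESTAMP SYSTIMESTAMP - INTERVAL '15' MINUTE;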

Storage-based Backup Methods for Oracle Environments

‘Undiscovered’ Technologies

Many third-party hardware- and software-based backup and recovery technologies can be applied to Oracle data to improve backup and recovery performance. These technologies are independent of Oracle and typically are also integrated in the backup of non-Oracle applications and files. These solutions are sometimes "undiscovered" in the world of database administrators. They may already be deployed within the backup architecture for non-Oracle applications and files; however, the database staff may be unaware of how the technology is being used or its benefits to the database environment. When architected properly, these solutions can prove extremely valuable and meet some needs that simply cannot be met with Oracle-only technology.

Triple Mirroring

Triple mirroring is a backup strategy where the organization copies the data in real time to three sets of redundant disks. One of those three sets is broken off while the database is in hot backup mode, and the data is then backed up at a more leisurely pace from the mirror slices. The other two redundant data sets remain in use for production. Once the backup is complete, the third mirror is synced back up with the primary copies.

PROS: It is possible to split the mirror almost instantaneously and back up from the slices, thereby eliminating the performance hit of being in hot backup mode for extended periods of time.

CONS: It can be expensive to keep a production-sized set of disks. Companies still may not be able to meet backup windows, especially if backing up from the mirror takes more than 24 hours. Because of the amount of disk involved, it is not feasible to maintain multiple recovery points for the data.
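The database-side choreography around a mirror split is brief; a minimal sketch (the split itself is performed with the storage array vendor's own tooling, shown here only as a placeholder):

ALTER DATABASE BEGIN BACKUP;
-- split the third mirror here using the array vendor's CLI or GUI
ALTER DATABASE END BACKUP;
-- the split mirror can now be backed up to tape at leisure, then resilvered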

Array-based Replication

As its name implies, array-based replication offers replication between two storage arrays—in most cases the devices must be from the same manufacturer. This method sends storage-layer blocks to a standby site whenever there is a storage change at the primary site.

PROS: Array-based replication solutions fill in the gaps of Data Guard by replicating non-Oracle systems and source code as well as tables in NOLOGGING mode. This works best in conjunction with Data Guard, jointly reducing replication traffic and reducing or eliminating single points of failure. Regardless of whether or not it is used with Data Guard, array-based replication will significantly improve RTO and RPO.

CONS: There is a cost associated with purchasing and maintaining a third-party array-based replication solution. An ROI analysis may demonstrate that this cost can be justified. The point at which it is recouped will depend on the value of the data.

Snapshots

Snapshots, often referred to as point-in-time copies, allow near-instant backup and recovery of large data sets using a sophisticated, scalable, and failsafe pointer system of storage blocks. Snapshots represent a read-only view of data taken at a specific point in time. Data and entire environments can be restored to a known stable point prior to the event that caused the disruption or corruption.

PROS: Online backup and recovery is fast and seamless and eliminates the need to recover from tape. It is possible to recover a 4 TB database in a matter of minutes and keep hundreds of snapshots (and therefore hundreds of recovery points) online. RTO improves as the number of snapshots a vendor can manage increases. The reason is that database recovery has two phases: image recovery of the database files, followed by applying redo log changes. If a storage vendor can only provide a handful of snapshots, then typically 12-24 hours of redo logs must be applied before starting the database. If a storage vendor can offer dozens or hundreds of snapshots, then snapshots can be taken every 15 minutes, requiring no more than 15 minutes of redo log applies. This greatly reduces the RTO. Some third-party vendors also have management software for scheduling and integrating with Oracle RMAN.

CONS: There is a cost associated with purchasing third-party snapshot-based storage products. As with the other storage-based methods, an ROI analysis will demonstrate whether this cost can be justified. The point at which it is recouped will depend on the value of the data and the value of the speed of backup or recovery. Also, some vendors use copy-on-write technology for snapshots, which may degrade performance of the primary disk system during snapshot activity. This can be addressed with alternate vendors providing "zero performance impact" snapshots.

In short, taking snapshots every hour – or even every 15 minutes – greatly shortens recovery time: you must not only recover the image of the database from a snapshot but also replay the redo logs to bring the database to a point in time, and frequent snapshots mean you play through only an hour or less of redo logs versus 24 hours or more with traditional recovery methods.
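After restoring a snapshot image of the datafiles, the database need only roll forward through the redo generated since the snapshot; a minimal SQL*Plus sketch of that point-in-time recovery (the timestamp is illustrative):

STARTUP MOUNT;
-- roll forward from the snapshot image to the desired point in time
RECOVER DATABASE UNTIL TIME '2009-12-08:14:45:00';
ALTER DATABASE OPEN RESETLOGS;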

Below is a diagram of how snapshots work under the covers:


Other Enhancements

Using Cloning to Accelerate Application Development

Typically, making five test environments from a 1 TB environment would take days and require 5 x 1 TB = 5 TB of storage. Now, leveraging "writeable snapshots," you can clone a database, even a multi-TB database, in minutes. And, making it even more enticing, it often takes up 90% less storage. For example, instead of requiring 5 TB of storage for 5 copies, writeable snapshots might only require 500 GB of storage. It all depends on how much data you update during the test. Stress testing and regression testing take up little space. On the other hand, updating all the data in a large table can consume almost as much space as the original database. The main benefit is that you can quickly reconfigure multiple test, development, QA, DW, auditing, and staging environments. This fundamentally accelerates test cycles and helps deliver new applications quickly.


Enterprise Backup Scheduling Software

Additional solutions that monitor and manage backup and recovery processes include enterprise scheduling software such as IBM Tivoli®, HP OpenView, CA Unicenter®, and BMC® Patrol®. These solutions help schedule and monitor the end-to-end process of backing up and recovering an enterprise—from the applications to the databases and the rest of the enterprise storage data. By using these solutions, companies can reduce backup and recovery risks and simplify the management of complex and heterogeneous environments that often have dozens or even hundreds of applications. Also of note are GUI solutions from leading storage manufacturers that integrate and schedule snapshot technology with backup environments, integrating with Oracle's RMAN, ASM, RAC, and more.

Deduplication

Deduplication through third-party solutions can dramatically reduce the backup stream size—typically 10-20 times—and the bandwidth required for backup. Deduplication works by detecting redundant data patterns during the backup process and saving references to that data (versus actual streams of blocks of data) when duplicate streams are detected. When set up properly, deduplication can work well with multiple Oracle RMAN channels. Deduplication can work for data at rest at the storage layer, within the database (Oracle 11g compression), through virtual tape libraries (VTLs), or across a network. In practice, deduplication can reduce the amount of physical disk storage required for backup and recovery by factors ranging from 10:1 to 30:1, or even greater for sparse or slowly changing environments.


Encryption

Companies often store unencrypted yet sensitive data on tape. This opens the company up to vulnerabilities—whether it is somebody stealing the tape or the tape being lost in the warehouse or while being transported off-site. Encrypting sensitive data, or the entire backup, can quickly and easily solve this issue. Beyond this, encryption of backups may even be a corporate or government mandate for an organization to be in compliance. Options for encrypting data exist within Oracle and with third-party providers as well. It is worth noting that with any type of encryption strategy, encryption key management is an important consideration. Methods for encryption include:

1. Oracle Secure Backup – Using Oracle's Secure Backup method, organizations can encrypt Oracle's RMAN backups before they are written to tape (see the sketch after this list). Oracle also has capabilities for storing data encrypted within the database and, thus, also encrypted on tape. By encrypting at the database level, an organization reduces much of its exposure. This method is also free (for one direct-attached storage device). A downside to this type of encryption is that there are no auto-destruct or multi-master key management features. Typically, the DBA is solely responsible for retaining the key, and there are inherent weaknesses in this strategy.

2. Third-party encryption – Third-party vendors provide encryption at the storage level. Many of these solutions provide multi-master key management that prevents single points of failure for key loss. By using a hardware and software appliance, backup streams can be encrypted with negligible impact on performance.
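As referenced in option 1, RMAN's own backup encryption can be switched on with a couple of commands; a minimal sketch using password-based encryption (the password is a placeholder, and feature availability depends on release and licensing):

# encrypt all subsequent RMAN backups of this database
CONFIGURE ENCRYPTION FOR DATABASE ON;

# or protect a single backup with a password
SET ENCRYPTION ON IDENTIFIED BY "backup_pw" ONLY;
BACKUP DATABASE PLUS ARCHIVELOG;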

Summary

Backup for Oracle Environments Requires Careful Analysis

Many companies use combinations of Oracle-based and third-party solutions for better protection and improved performance of their mission-critical Oracle data. Because the criticality and the interrelated nature of the data will vary, there is no "one size fits all" solution. Organizations should consider several factors—including the RPO and RTO of the individual applications, total cost of ownership (TCO), and management complexity—before implementing a solution. Because of the complexities involved in developing a well-designed backup and recovery architecture for an Oracle environment, it is often helpful to work with a company that is knowledgeable about the alternatives available and that has experience in weighing the various benefits and drawbacks of each.


Datalink

As a leading information storage architect, Datalink analyzes, designs, implements, and supports information storage infrastructures in a variety of environments. Our capabilities and solutions span storage area networks, network-attached storage, direct-attached storage, and IP-based storage, using industry-leading hardware, software, and technical services. For several years, Datalink has worked with organizations to help define and implement strategic storage solutions for their information assets, which include a combination of Oracle and non-Oracle data. Our independence allows us to recommend hardware and software technologies that provide the most optimal fit for an organization's environment and enable them to meet their business initiatives. Datalink has extensive field experience with a wide range of technologies. This, combined with the knowledge we glean from in-depth testing conducted in our interoperability labs, provides us with invaluable insight that we can pass on to our clients as we design and implement storage solutions. For more information, contact Datalink at (800) 448-6314 or visit www.datalink.com.

SIGS, SIGS and more SIGS!

The following Special Interest Groups (SIG) hold meetings throughout the year for the benefit of NYOUG members:

DBA SIG – Database Administration

Data Warehouse SIG – Business Intelligence

Web SIG – Web / XML / Java / 9iAS

Long Island SIG – Nassau/Suffolk area - All topics (Sponsored by Quest Software)


How Long is Long Enough? Using Statistics to Determine Optimum Field Length

Suzanne Michelle, NYC Transit

Introduction

The database my team and I designed and continue to support and evolve is called "UGOS," the Unified General Order System. New York City Transit uses it to coordinate all subway work that requires track access. If you've ever ridden an NYC subway and found yourself reading subway service diversion signs, the generation process that creates those signs often starts in this database. If you look in a sign's lower right-hand corner, the identifying numbers there come from UGOS. But don't think my team and I know anything about how to plan service diversions – it is more like we know how to build the car, but have no idea how to drive it, beyond the fact that a car needs steering, brakes, gears, seats, etc.

The database is an amalgam of five data modules, each its own star schema, four major supporting sub-modules, and various other categorizing code tables (the car drivers don't really care about the wires and car components until an error occurs). UGOS is built loosely in "3rd Normal Form," rarely in "5th," and has useful hooks / interfaces to various Transit subsystems. It has worked reasonably well for the last 12-odd years, with few serious design changes, except when we found a need to change a supporting module and this change needed to ripple up. We're using Forms 6i against a 10g database, delivered to our users via Terminal Services and Citrix platforms, neither of which is "really" supported by Oracle. Yes, UGOS has its design and reporting flaws, but, well, it works. It has too much code in the major data-entry screens (one per major schema), and various view and dependence traceability issues, but we're working on that: moving code from the forms to the database, improving error trapping and documentation, and experimenting with web-enabled reporting views (via APEX, at this writing) to allow users to extract data however they like.

We've also been rewriting our current reports to move code to the database. As part of this process, we decided we needed an "aggregate view" of the data, per star, such that we could standardize reporting formats. At the outset, some aspect of this was done, but it was too easy to copy a report and tweak it for new circumstances (which of course we did) – so we're trying to move that code back to the database, in as flexible a manner as we can. It is fair to say we understood the data and our reporting needs differently a decade ago. And so this topic emerged: given the data available to the team, how do we determine optimal field length? Can we analyze our users' data-entry predilections? It need not be a "perfect" analysis, but what will give me reasonable results when I consider various ways to aggregate those star schemas?

Summary

What Was Done:

Created "Count Views" that summarized relevant parts of star schema data, including lengths of certain field data as well as numbers of related records.
Analyzed those numbers using SQL to determine best aggregate lengths.
Applied the results to the particular problems needing resolution.

What Was Learned:

Where primary keys are composed of several columns, and any of those columns are themselves foreign keys, indexing those foreign key parts is crucial for view performance and accurate counts.
SQL "with" statements are very useful for simple aggregation.
"Count Views" with sufficient columns for selection are useful in a variety of ways for studying data and creating test cases for reviewing and testing code, and can be built in simple or complex ways.

The details follow. Some of "what was learned" may seem obvious in hindsight, but we all have our "Aha!" moments when the light bulb goes on and the evidence is clear.


[Figure 1 is a diagram of the general universe of UGOS data: the five star-schema modules – DR (Diversion Requests), SP (Service Plans), GO (General Orders), GT (GO Textual Information, with GTMerge and Form B data), and WT (Work Trains) – plus SIGNDATA and RCC / ATS data. Track, People, and Accounting Codes, Boilerplate Codes, and the Help Module are supporting sub-modules; "Other Supporting Codes" includes things like Status, Type, Time, and Limit Codes. Arrows mark POSSIBLE and DEFINITE data paths; double-ended arrows mean data flows both ways.]

Figure 1. General Universe of Data in the Unified General Order System

[Figure 2 is a diagram of UGOS data flow and ownership. Participants include the Diversion Requestor Pool (CPM and other Force Account users, the MTA Capital Construction Company, MOW), Operations Planning's Sub 'C' GO Writers, PSR data, Diversion Requests and Service Plans, Work Trains (Linden Shop / CPM), RTO, ATS, and RCC. OMB closes the data loop with ridership use vs. GO effect analysis, accounting data availability is indicated, and output reaches the public via Schedule Supplements, Service Diversion Notices, the MTA Website, and other service announcements.]

Figure 2. Flow of Data within the UGOS Universe


Details

Without going into a long and involved description of our database, Figure 1 contains a bird's-eye view of it, with its various star-schema modules, and Figure 2 shows how data moves through the system. Figure 3 is a list of selected tables within one star (Service Plans or "SPs"), all related by the "master" table's primary key, its "SPID" or "Service Plan ID". Of interest are the numbers of records in the related child tables, and the lengths of data in some of the child tables' columns. How do we summarize these child table records? While self-writing, convoluted dynamic SQL methods exist to build a "Count View" for these star schemas, we began with something simple.

Selected "SP" Star Tables (table / purpose / summary needs and comments)

SP0SERVICEPLANS
  Purpose: Generates primary key, certain identifiers
  Summary needs: Count; key values (like year, week, status, last modified), in case needed elsewhere

SP0SIGNATURES
  Purpose: Contains / does not contain a record that locks a record set against any further change
  Summary needs: Count (always 1 or 0)

SP1STATIONAREAS, SP2WORKAREAS, SPBALTSERVICE
  Purpose: Stations and work areas for any diversion; track put in service instead
  Summary needs: Count (at least 1 across all record types, usually 2-10 across all 3)

SP3TIMESASSIGNED
  Purpose: When the diversion runs
  Summary needs: 2 kinds of count (regular and exception, e.g., service runs except on date X)

SP4DESCRIPTIVETEXT
  Purpose: Text describing who does what where, to "run" a diversion
  Summary needs: Count; max length per primary key; total length of all text per primary key

SP5ADJUSTMENTS
  Purpose: Specially formatted text describing exactly how the service will run, if not running normally
  Summary needs: Count; max length of select columns per primary key

SP6WORKSWITHSPS
  Purpose: "Sibling" records – SPs that work with other SPs
  Summary needs: Count of all related records; count of all records that start after the starting week of the PK

SPX_SP_XREF, SP_DR_XREFERENCE, SP_GO_XREFERENCE
  Purpose: Cross-reference tables for Signs, original Diversion Requests, and General Orders for any SP
  Summary needs: Counts of each type

SP_ACONTRACTNOS, SP_AFUNCTIONNOS, SP_AJOBNOS, SP_ARCNOS
  Purpose: Cross-reference tables for accounting records related to any SP
  Summary needs: Counts of each type

Figure 3. Sample of Service Plan-related Tables within the UGOS Universe

To summarize the master and child tables to one record per primary key (per star), we created views, so that results from all the child tables would be visible at once. While it was initially useful to have these "ALL COUNT" views to analyze for field length, it became apparent that we could mine these counts to find records that met specific cases. This now allows easy testing of various functions and procedures for nulls, for multiple rows, and for how data-specific errors are handled. With the thousands of data records available to us across many years, until we created these views and learned how to use them, it was "hit or miss" to find specific cases to test normal data vs. anomalies vs. errors.

This may seem obvious in hindsight, especially to folks who work with large data models, but it was not obvious to us when the users began creating and working with the data in the mid-'90s. Over time, after seeing various Oracle presentations about data warehousing and business intelligence, purpose and aggregation uses became clearer. We began to see what we could do with the various data stars, in a very simple fashion (and we have a very limited budget for buying Oracle or 3rd party tools). Applying Steven Feuerstein's testing logic and Paul Dorsey's "thin client" logic, the results from these views have greatly improved our development team's ability to accurately test our data. We no longer need to rely solely on user memory to find "good records" or "bad records" to use for testing.

But first we needed to create the views. Figure 4 is a snippet of the code needed, with various kinds of fields referenced or created, showing only the tables mentioned in Figure 3 (there are more). Depending on your own table groupings, your list would of course vary. The view field names appear in the column list of the CREATE VIEW statement. In the UGOS database, naming standards include beginning a field with "c" if it's any sort of LOV / code table identifier of fixed length, or "d" if it's a date / time field.

Create View VW_SPALLCNTS
  (CSPID, cPlanStatus, SP_TYPE, cYear, cSPWk, dGOStart, dGOStop,
   CNTSIGS, CNTSTATS, CNTAREAS, CNTTIMES, CNTEXCPS, CNTTEXT, MAXTEXT,
   CNTCOMS, CNTADJ, CNTWW, CNTWWF, CNTALT, CNTCONT, CNTFUNC, CNTJOBS,
   CNTRCNS, CNTSIGNS, CNTDRs, CNTGOMS, CNTPKG, CNTCANC, CNTWTS) AS
select CSPID,
       cPlanStatus,
       substr(SP_TYPE(CSPID),1,4),
       cYear,
       nvl(cSPWk, (CASE WHEN dGOStart is null THEN '54'
                        ELSE GET_WORK_WEEK(dGOStart) END)),
       dGOStart,
       dGOStop,
       (select count(*) from SP0Signatures Zero
         where Zero.CSPID = Base.CSPID),
       (select count(*) from SP1StationAreas One
         where One.CSPID = Base.CSPID),
       (select count(*) from SP2WorkAreas Two
         where Two.CSPID = Base.CSPID),
       (select count(*) from SP3TimesAssigned Three
         where Three.CSPID = Base.CSPID and Three.CDATETYPE <> 'AE'),
       (select count(*) from SP3TimesAssigned Threea
         where Threea.CSPID = Base.CSPID and Threea.CDATETYPE = 'AE'),
       (select count(*) from SP4DescriptiveText Four
         where Four.CSPID = Base.CSPID and Four.CTEXTTYPE <> 'CN'),
       nvl((select max(length(VNOTETEXT)) from SP4DescriptiveText Foura
             where Foura.CSPID = Base.CSPID and Foura.CTEXTTYPE <> 'CN'),0),
       (select count(*) from SP4DescriptiveText Fourb
         where Fourb.CSPID = Base.CSPID and Fourb.CTEXTTYPE = 'CN'),
       (select count(*) from SP5Adjustments Five
         where Five.CSPID = Base.CSPID),
       (select count(*) from SP6WorksWithSPs Six
         where Six.CSPID = Base.CSPID),
       (select count(*) from SP6WorksWithSPs Sixb
         where Sixb.CSPID = Base.CSPID
           and (select Reld.CSPWK from SP0ServicePlans Reld
                 where Reld.cSPID = Sixb.cSPIDRelated) > Base.CSPWK),
       (select count(*) from SPBAltService Sixa
         where Sixa.CSPID = Base.CSPID),
       (select count(*) from SP_AContractNos Eight
         where Eight.CSPID = Base.CSPID),
       (select count(*) from SP_AFunctionNos Nine
         where Nine.CSPID = Base.CSPID),
       (select count(*) from SP_AJobNos Ten
         where Ten.CSPID = Base.CSPID),
       (select count(*) from SP_ARCNos Eleven
         where Eleven.CSPID = Base.CSPID),
       (select count(*) from SPX_SP_XREF Thirteen
         where Thirteen.CSPID = Base.CSPID),
       (select count(*) from SP_DR_XREFERENCE Sixteen
         where Sixteen.CSPID = Base.CSPID),
       (select count(*) from SP_GO_XREFERENCE Seventeen
         where Seventeen.CSPID = Base.CSPID),
       (select count(*) from UTGOMTGDETAILS Nineteen
         where Nineteen.CSPID = Base.CSPID),
       (select count(*) from UTGOCANCELLATIONS Twenty
         where Twenty.CSPID = Base.CSPID),
       (select count(*) from WT0WORKTRAINS TwentyOne
         where TwentyOne.cSPID = Base.CSPID)
  from SP0SERVICEPLANS Base;

Figure 4. Snippet of Summary View Code

For every table set we summarized in this fashion, in all the sub-selects to any desired child tables where the child's PK included the master parent's PK as a foreign key, we had to be sure all the foreign keys were indexed. We had been able to get away without these FK child indexes for so long because the total number of records we are dealing with is in the tens of thousands – this is a planning and decision support database, after all. Had it been an OLTP system, for example, we would have needed these indexes from day one. In most cases these indexes were already there, but in some cases not.

Next, we used "WITH" statements (and Oracle analytic functions) to aggregate and summarize data, as needed for the task at hand. Figure 5 shows one code analysis sample and three results (the hows / whys of aggregation are not discussed herein). We will look more closely at the reasoning behind the query and what the results reveal.

with data as
  (select cyear, cspid, cntadj,
          (case when cntadj >50 then '10 = 51+'
                when cntadj >45 then '09 = 46-50'
                when cntadj >40 then '08 = 41-45'
                when cntadj >35 then '07 = 36-40'
                when cntadj >30 then '06 = 31-35'
                when cntadj >25 then '05 = 26-30'
                when cntadj >20 then '04 = 21-25'
                when cntadj >15 then '03 = 16-20'
                when cntadj >10 then '02 = 11-15'
                when cntadj > 5 then '01 = 6-10'
                when cntadj > 0 then '00 = 1- 5'
                else '00 = 0' end) as Catg
     from vw_SPALLCNTS)
select catg as grp, count(cspid) as cnt_sp, sum(cntadj) as sum_adj
  from data
 group by catg
 order by catg;

GRP           CNT_SP    SUM_ADJ
----------  --------  ---------
00 = 0          6213          0
00 = 1- 5      34534      77248
01 = 6-10       3143      22769
02 = 11-15       500       6218
03 = 16-20       270       4685
04 = 21-25        14        323
05 = 26-30        14        395
06 = 31-35         5        167
07 = 36-40         1         39
9 rows selected.

GRP           CNT_SP   SUM_TEXT
----------  --------  ---------
00 = 0            43          0
00 = 1- 5      39006     103382
01 = 6-10       4507      32247
02 = 11-15       670       8300
03 = 16-20       235       4167
04 = 21-25       149       3576
05 = 26-30         4        117
06 = 31-35         5        158
07 = 36-40         9        341
08 = 41-45        65       2789
09 = 46-50         1         46
11 rows selected.

GRP           CNT_SP     SUM_WW
----------  --------  ---------
00 = 0         15393          0
00 = 1- 5      12935      38206
01 = 6-10       8914      68775
02 = 11-15      3783      47925
03 = 16-20      1728      30712
04 = 21-25       737      16757
05 = 26-30       334       9284
06 = 31-35       182       5938
07 = 36-40       115       4361
08 = 41-45        83       3554
09 = 46-50        55       2634
10 = 51+         435      62367
12 rows selected.

(A) Issued using “CNTADJ” (T) Issued using “CNTTEXT” (W) Issued using “CNTWW”

Figure 5. Selection from the SP Summary View, Using “WITH” Statement, Including Data Results

The data results in Figure 5 show very different patterns of usage:

- For the (A) Adjustments-to-Normal-Service group, the bulk of the data has between 1 and 5 adjustment records, and nearly all have fewer than 10 child records per primary key.
- For the (T) Text records group, the bulk of records also use between 1 and 5 text records to describe day-of-service operation duties, and again, nearly all have fewer than 10 such records. There are, however, a few very large records (66) that have more than 40 descriptive text records. It turns out that the more text needed to describe service operation duties, the more significant the diversion, and the more complicated the actual service operation. Remember when the 2/3 and 4/5 trains had unusual weekend routes to accommodate building / testing the new South Ferry Station?
- For the (W) "Works With" records, the summary presents other interesting observations: most plans work with 20 or fewer other operations. But SOME plans (435) coordinate service with more than 50 other operations over the course of their lives (these tend to be year-long "master" plans upon which all others are based – it takes a while to "work with" that many plans). Data inspection shows that a few of those 435 SPs coordinate with hundreds of other operations.
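Having the Count View makes pulling up those monster plans trivial. A minimal sketch, using the VW_SPALLCNTS view from Figure 4 (the ordering here is our own illustrative choice, not from the original scripts):

-- List the plans that "work with" more than 50 other operations,
-- largest coordinators first
select cspid, cyear, cntww
  from vw_SPALLCNTS
 where cntww > 50
 order by cntww desc;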


And now, you may ask, why do we care? Suppose you want a reporting view of your data, with one record per primary key, that represents as MUCH of all those child tables as can reasonably fit in a given column. Remember that, to the user entering this information, it is one record. In the user's mind, the times a diversion is to operate, the area in which a diversion is to operate, and the work required to perform that operation (setup / run / return to normal) are all part and parcel of the same, indivisible operation. The location means nothing without the out-of-service times. The time means relatively little if the operation doesn't affect a specific track area. And if there are no adjustments or operation descriptions, it can't be that important. So we designed these kinds of views (one per star), experimented with population methods, and are currently working to web-enable and auto-generate the data.

While the design and workings of these "aggregate" views (AVs) are another story, how we determined the length of the AV's underlying column is relevant here. After examining all the types of count data available to us, we settled on 1000 characters. It is overkill for things like accounting data (very few SPs have more than 4 accounting codes of any type associated with them), just right for most work area and station listing data, and not so good for all that text (only four 250-character fields will concatenate into a single 1000-char column). For the Works With records, we faced a special challenge: we could only concatenate "N" records without reaching the column length limit (1000) or the test variable length limit (4000). To resolve this, we settled on "the first 35", which is fine for 98% of the records, as is evident from the chart. For the remaining 2%, we included a message, "N records exist, most recent 35 shown", so that a user who somehow selects these monsters, for whatever reason, would have at least some useful data. For the statistics buffs: you're looking at the data mean plus / minus a bit more than 2 standard deviations (about 95% of the data accounted for), where ±3 standard deviations would account for over 99% of the data being examined.

Initially, to determine the best length for our aggregating column, we analyzed each kind of data being summarized per star group – we wanted the best fit across all the data types, because we wanted to use the same table, with one "flat" record per type of child data per primary key. As we refined the programmatic code that would do the aggregating, we paid attention to the individual differences, such as the 2% of "works with" records that "won't fit" in the 1000-char field. In the code cases displayed in Figure 5, we simply counted everything in clumps of 5 records (the greater-than / less-than parts of the CASE statement in the WITH clause). How you "clump" your data together will determine the usefulness of your results.

The CASE clause in the "WITH" statement is crucial to how the data is broken up and counted. The highest category has to come first, and then each next lower category. The sum of the related records per primary key will sort itself into the proper bin: PK1 with 30 text records will fall into group 5, while PK2 with 10 text records will drop into bin 1. By carefully arranging the order of the categories, the data sift themselves, largest bin first. Before our team member, Irina Yevtukhova, discovered the WITH statement, this analysis had been done in Excel. A properly constructed WITH statement is a lot faster and easier.
Because the statement allows an inner and an outer group / sort, this can be put to work – the inner aggregation is largest-bin first, but the outer statement displays the smallest bin first. This is very useful as we create aggregating functions per field type across all the stars – we can find the records that will cause errors before the users stumble on them.

The Count Views turn out to have additional benefits besides having helped to determine how much room was needed to aggregate child records into one column per record type for a reporting view. We can use them to precisely select test data to cover every case that needs testing (in general, "no records", an "average" amount, or a "break the field" amount of records). A sample test script can be found in Figure 6.

PROMPT Get_Acctg Data Sample ...
Get_Acctg Data Sample ...
column Useful Format a100 word_wrapped

with data as
  (select cSPID from vw_SPALLCNTS
    where (cntCont+CntFunc+CntJobs+CntRCNs) >3 and cYear >'2006')
select cspid||': '||substr(get_Accounting(cSPID,'A'),1,100) as useful
  from data where rownum <4;

USEFUL
----------------------------------------------------------------------------------------------------
2007IRT8077: CNs:C-33293 JNs:15705 RNs:2832 FNs:500
2007IRT9437: CNs:S-32728 JNs:15693 RNs:2822 FNs:510
2007IND1000: CNs:P-36279 JNs:15694 RNs:2837 FNs:770
with data as
  (select cSPID from vw_SPALLCNTS
    where (cntCont+CntFunc+CntJobs+CntRCNs) =0 and cYear >'2006')
select cspid||': '||substr(get_Accounting(cSPID,'A'),1,100) as useful
  from data where rownum <4;

USEFUL
----------------------------------------------------------------------------------------------------
2007IND7996: No Acctg Data
2007BMT7998: No Acctg Data
2007IND8136: No Acctg Data

with data as
  (select cSPID from vw_SPALLCNTS
    where (cntCont+CntFunc+CntJobs+CntRCNs) >3 and cYear >'2006')
select cspid||': C/'||substr(get_Accounting(cSPID,'C'),1,100)||
       ' J/'||substr(get_Accounting(cSPID,'J'),1,100) as useful
  from data where rownum <4;

USEFUL
----------------------------------------------------------------------------------------------------
2007IRT8077: C/CNs:C-33293 J/JNs:15705
2007IRT9437: C/CNs:S-32728 J/JNs:15693
2007IND1000: C/CNs:P-36279 J/JNs:15694

with data as
  (select cSPID from vw_SPALLCNTS
    where (cntCont+CntJobs) =0 and cYear >'2006')
select cspid||': C/'||substr(get_Accounting(cSPID,'C'),1,100)||
       ' J/'||substr(get_Accounting(cSPID,'J'),1,100) as useful
  from data where rownum <4;

USEFUL
----------------------------------------------------------------------------------------------------
2007IND7996: C/No Contract Nos J/No Job Nos
2007BMT7998: C/No Contract Nos J/No Job Nos
2007IND8136: C/No Contract Nos J/No Job Nos

Set heading off
PROMPT Show error messages via bad selects ...
Show error messages via bad selects ...

select GET_ACCOUNTING('2007IRT2524','P') as useful from dual;
Bad Type Entered

select GET_ACCOUNTING('2007I24','A') as useful from dual;
Bad Key Length

Figure 6. Sample Test Scripts, Using "WITH" Statement, Including Data Results

As you can see, this testing is not elaborate or refined, and there are certainly other ways to do it. But by including these simple selects wherever possible, we can quickly and easily see which code does not succeed. When we finally run our scripts in production, they are error free, and the installation log reflects this.
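For readers who want to reproduce the mean / standard deviation arithmetic mentioned earlier, a minimal sketch against the same Count View might look like the following; the choice of CNTWW as the column to profile is illustrative, not part of the original scripts.

-- Rough distribution statistics for one count column
select round(avg(cntww), 2)                  as mean_ww,
       round(stddev(cntww), 2)               as stddev_ww,
       round(avg(cntww) + 2 * stddev(cntww)) as mean_plus_2sd,
       max(cntww)                            as max_ww
  from vw_SPALLCNTS;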

Conclusion

Creating simple data views that count the related data elements in a star schema can be a useful tool for analyzing data in that schema. The count information (or length information) can be easily summarized using the SQL WITH statement to get a quick picture of how data is being created within the schema or database as a whole. Additionally, these "Count Views" can be used as source material for writing easy-to-implement error-checking cases. If you choose to implement view creation in this manner, take time to review your indexing strategies for optimal performance; at the very least, all foreign keys should be indexed.
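As a quick sanity check on that last point, a sketch along the following lines can flag foreign keys whose leading column has no supporting index. It is deliberately simplified (current schema only, leading columns only) and is our illustration, not part of the original article.

-- Foreign-key constraints whose leading column is not the leading
-- column of any index on the same table
select c.table_name, cc.column_name, c.constraint_name
  from user_constraints c, user_cons_columns cc
 where cc.constraint_name = c.constraint_name
   and cc.position = 1
   and c.constraint_type = 'R'
   and not exists
       (select 1
          from user_ind_columns ic
         where ic.table_name      = c.table_name
           and ic.column_name     = cc.column_name
           and ic.column_position = 1);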


Migrating Database Character Sets to Unicode

Yan Li

INTRODUCTION

As companies internationalize their operations and expand services to customers all over the world, they often need to support more languages than are available within their existing database character set. Many companies today are finding Unicode to be essential to supporting their global businesses. What is Unicode? How do you choose the right Unicode character set as the database character set? How do you install and use the Character Set Scanner? What are the migration methods? How do you resolve data dictionary and application issues? What do "changeless", "lossy", "truncation", and "convertible" data mean, and how should each be resolved? This paper uses the WE8ISO8859P1 to AL32UTF8 migration as an example to answer the above questions and to provide step-by-step implementation approaches.

ASSUMPTIONS

Database version: 10g and up
Source NLS character set: WE8ISO8859P1
Target NLS character set: AL32UTF8
Character set scope: NLS_CHARACTERSET (database level)

Note: The national character set, NLS_NCHAR_CHARACTERSET, is always Unicode and can be used for NCHAR, NVARCHAR2, and NCLOB columns; it is out of the scope of this paper.

STANDARD UNICODE AND ORACLE CHARACTER SETS

Unicode is a universal encoded character set that allows you to store information in ANY language. Unicode provides a unique code point for each character, regardless of the platform, program, or language. Unicode version 5.1.0 contains over 100,000 characters.

Major Unicode Encoding Formats

UTF-8    8-bit variable-width, 1-4 bytes per character
UCS-2    16-bit fixed-width, 2 bytes per character; supports standard Unicode up to version 3.0
UTF-16   16-bit, 2 or 4 bytes per character; an extension of UCS-2 that supports new standard Unicode versions

There are some less popular formats such as UCS-4, UTF-7, and UTF-32. Among the above Unicode formats, Oracle databases mainly use two: UTF-8 and UTF-16. The corresponding Unicode character sets for Oracle are:

UNICODE   ORACLE Character Set
UTF-8     UTF8 and AL32UTF8 (mainly used for NLS_CHARACTERSET)
UTF-16    AL16UTF16 (mainly used for NLS_NCHAR_CHARACTERSET)


Note: Standard Unicode format names have a dash (e.g., UTF-8) while Oracle character set names do not.

Oracle's Unicode Character Sets

UTF8 is an 8-bit variable-width encoding, with each character taking 1 to 3 bytes, supporting standard Unicode up to version 3.0 only. AL32UTF8 is an 8-bit variable-width encoding, with each character taking 1 to 4 bytes, and will continue supporting future Unicode standards. AL32UTF8 is appropriate for XMLType data. It seems AL32UTF8 is the best Unicode choice. However, AL32UTF8 is not recognized by pre-9i clients and server systems. If you still have 8i servers and clients connecting to the 9i/10g system, then you must use UTF8 until you can upgrade the older versions to 9i or higher.

AL16UTF16 is a 16-bit encoding, with each character taking 2 or 4 bytes. It is NOT used as a normal database character set, but rather as a national character set. AL16UTF16 will also continue supporting future Unicode standards.

Most Western European and North American languages use US7ASCII or WE8-series character sets in Oracle, such as WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252, etc. The migration steps are similar, but you should refer to the documentation for your specific source character set to resolve specific problems. This paper uses WE8ISO8859P1 as an example to discuss the migration to Unicode AL32UTF8.

Note: In 11g, if you don't specify the character set in the CREATE DATABASE command, the default will be AL32UTF8, not US7ASCII.

Oracle Database Character Sets and Corresponding Unicode Versions:

ORACLE          RDBMS        UNICODE  UNICODE
Character Set   Version      Format   Version  Bit  Byte / Note
AL24UTFFSS      7.2 - 8.1    UTF-8    1.1      8    (obsolete)
UTFE            8.0 - 8.1.6  UTF-8    2.1      8    UTF8 for EBCDIC platform
                8.1.7 - 11g           3.0
UTF8            8.0 - 8.1.6  UTF-8    2.1      8    variable-width, 1-3 bytes
                8.1.7 - 11g           3.0
AL32UTF8        9.0          UTF-8    3.0      8    variable-width, 1-4 bytes
                9.2                   3.1
                10.1                  3.2
                10.2                  4.0
                11.1                  5.0
AL16UTF16       9.0          UTF-16   3.0      16   fixed-width, 2 or 4 bytes
                9.2                   3.1
                10.1                  3.2
                10.2                  4.0
                11.1                  5.0

MIGRATION STEPS OUTLINE

Based on a given character set and data, migration to Unicode can be simple or complicated. It requires the proper strategy and tools. Not fully understanding the issues and the suitable conversion steps can lead to problematic outcomes. The complexity of the migration approach depends on how "clean" the source data is; if the data is relatively clean, some steps are not necessary. The major steps can be outlined as follows:

1. Pre-check the current database environment, including the current character set, database size, init parameters, etc.


2. Install the character set scanner (CSSCAN), run a sample scan from the source character set to the target Unicode character set, and analyze the output.

3. If CSSCAN reports everything as "changeless", in both the data dictionary and the application data, then run the CSALTER script to convert the database to Unicode. No further steps are needed.

4. If the CSSCAN output has "convertible" data but no other exceptions, then an export/import will resolve it. There are two methods: a full export/import into a pre-created Unicode database, or a partial export/import with CSALTER.

5. If the partial export/import with CSALTER method will be used and CSSCAN reports data dictionary exceptions, then the data dictionary issues must be handled.

6. If CSSCAN reports "lossy" data, it must be handled before export/import. "Lossy" data may be resolved by first converting to a "strict" superset using CSALTER.

7. If CSSCAN reports "truncation", it must also be handled before export/import. "Truncation" can be resolved by changing nls_length_semantics from BYTE to CHAR at either the database level or the column level.

CSALTER is not required to convert a non-Unicode database to Unicode. If the database is small, you may create a new Unicode database and do a full export/import from the source to the target Unicode database. However, if the database is large, a full export/import may be very time consuming, and you may consider using CSALTER with a partial export/import. If a large portion of data needs to be converted anyway, you may still use a full export/import because in that case the partial method may not reduce much of the downtime. Normally, convertible data can be handled as the last step. You must address lossy and truncation issues earlier regardless of using the full or partial method. If the full export/import method is chosen, data dictionary issues from the source database can be ignored. This paper will discuss the migration steps based on the implementation order best used for our example.


[Unicode Migration Flowchart (Summary): a decision diagram of the steps outlined above. Install and run CSSCAN; if everything is changeless, run CSALTER to Unicode and stop. If there is convertible data but no truncation or lossy data, either do a full export/import into a pre-installed Unicode database or, after fixing any data dictionary issues, do a partial export/import with CSALTER. Lossy data is first resolved by running CSALTER to the "strict" superset; truncation is resolved by changing BYTE to CHAR semantics; the remaining convertible data is then handled as above.]


[Unicode Migration Flowchart (Detail): an expanded version of the summary flowchart, with the detailed decision points. For lossy data: check whether a strict superset exists, run CSSCAN and then CSALTER to the strict superset, and fix any remaining lossy data manually. For truncation: check for byte-based products and for data exceeding maximum bytes; change columns (or the database) from byte to char length semantics; where PL/SQL dependencies exist, change the code explicitly or implicitly; shorten data or change columns to CLOB where the data exceeds the maximum bytes. For convertible data: either export full from the source database and import full into a pre-installed Unicode database, or fix the data dictionary, export the convertible tables, truncate them, run CSSCAN and then CSALTER to Unicode, import the convertible tables, and validate the data.]


PRE-MIGRATION PREPARATIONS

1.1 Gather Information about the Current Database Environment
Before starting the migration, current database information should be gathered and analyzed, including the current character set, database size, installed Oracle products, and init parameters such as nls_length_semantics, job_queue_processes, aq_tm_processes, etc.
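A few queries of this sort can collect most of that information; this is a sketch, not an exhaustive pre-check list:

-- Current database and national character sets, plus length semantics
SELECT parameter, value
FROM   nls_database_parameters
WHERE  parameter IN ('NLS_CHARACTERSET',
                     'NLS_NCHAR_CHARACTERSET',
                     'NLS_LENGTH_SEMANTICS');

-- Rough database size
SELECT ROUND(SUM(bytes)/1024/1024/1024, 2) AS size_gb
FROM   dba_data_files;

-- The init parameters mentioned above
SELECT name, value
FROM   v$parameter
WHERE  name IN ('nls_length_semantics',
                'job_queue_processes',
                'aq_tm_processes');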

1.2 Check If There Are Any Non-ASCII Object Names
Next, check whether any existing object name, username, or password uses non-ASCII characters. The following query returns object names containing non-ASCII characters; you must rename any that exist.

SELECT object_name FROM dba_objects
WHERE  object_name <> convert(object_name, 'US7ASCII');

Similar queries can be used to check usernames and passwords stored in the database.
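For example, a sketch of the username check, in the same spirit as the object-name query above:

-- Usernames containing non-ASCII characters
SELECT username
FROM   dba_users
WHERE  username <> convert(username, 'US7ASCII');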

1.3 Clean Up Unused or Unwanted Objects
The migration tasks and duration depend on the amount of data that needs to be converted. It is better to clean up the schemas, users, tables, columns, and any other objects which are no longer needed.

2. CHARACTER SET SCANNER - CSSCAN UTILITY
No matter which migration method is used, it is essential to pre-scan the source data in order to assure a successful migration. The CSSCAN utility, also known as the Character Set Scanner, is a very useful tool for character set migration. Before changing the NLS_CHARACTERSET, using CSSCAN is mandatory. During the migration process, CSSCAN will be run quite a few times at different stages.

2.1 Install CSSCAN
To install CSSCAN, run the script $ORACLE_HOME/rdbms/admin/csminst.sql as sysdba.

Note: You should modify the script to change the tablespace name; otherwise, the SYSTEM tablespace will be used.

This script creates the schema owner CSMIG and its objects, such as tables (CSM$...) and views (CSMV$...). Each time CSSCAN is run, the objects are replaced; therefore, only one set of scan results is kept in the database.

2.2 Running CSSCAN
Running CSSCAN is similar to running other Oracle utilities. Use the following to list the command line options:

CSSCAN HELP=Y

You can scan at the full database level, owner level, or table level. You can also put your options into a parfile, just as with other Oracle utilities. However, before running the CSALTER script (discussed later) you must run a FULL scan. Always run CSSCAN as "sys as sysdba", not as "system" or "csmig", etc. After installing CSSCAN, you may want to run a sample scan from your current character set to your target Unicode character set to get a taste of the utility and your existing data. Consider the following as an example to explain the CSSCAN output; do not use this scan result for CSALTER at this stage.

csscan FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=AL32UTF8
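The same options can go into a parfile instead, say scan.par (a hypothetical name; the ARRAY and PROCESS values below are illustrative sizing only, and LOG sets the base name of the output files):

FULL=Y
FROMCHAR=WE8ISO8859P1
TOCHAR=AL32UTF8
ARRAY=1024000
PROCESS=2
LOG=scan

and then invoke it as, for example (quoting varies by shell):

csscan "sys as sysdba" parfile=scan.par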


Note: Be cautious when using the option "CAPTURE=Y", especially with "FULL=Y". It will put a large amount of data into the database and onto the OS (the output files). This option captures every "convertible" row, which is usually not necessary. (The default is "CAPTURE=N".)

2.3 Understanding the CSSCAN Output
Running CSSCAN produces three output files with the extensions txt, err, and out. If no file name is given, the default name is "scan".

scan.txt file - summary of exceptions
scan.err file - detailed exceptions with ROWIDs
scan.out file - log of the scan process (e.g., any ORA- error that occurred during CSSCAN)

The following is a portion of the scan.txt file:

[Database Size]

Tablespace    Used            Free             Total            Expansion
----------    -------------   --------------   --------------   ---------
SYSTEM        12,586.25M      5,413.75M        18,000.00M       343.00K
TSABCD        9,807.38M       2,192.63M        12,000.00M       13.38M
TSXXX         267.44M         1,732.56M        2,000.00M        .00K
TSYYY         64.00K          1,999.94M        2,000.00M        .00K
TSZZZ         7,285.88M       714.13M          8,000.00M        .00K
TSINDEX       640.06M         1,359.94M        2,000.00M        .00K
TSUNDO1       2,141.13M       93,858.88M       96,000.00M       .00K
TSUNDO2       24,519.00M      35,481.00M       60,000.00M       .00K
...
----------    -------------   --------------   --------------   ---------
Total         3,581,509.00M   586,521.98M      4,168,030.98M    586.55M

[Data Dictionary Conversion Summary]

Datatype               Changeless   Convertible   Truncation    Lossy
-----------------------------------------------------------------------
VARCHAR2              136,913,559            10            0      154
CHAR                            7             0            0        0
LONG                    6,614,799             0            0        0
CLOB                           58            71            0        0
-----------------------------------------------------------------------
Total                 143,528,423            81            0      154
Total in percentage          100%            0%           0%       0%

[Application Data Conversion Summary]

Datatype               Changeless   Convertible   Truncation      Lossy
-------------------------------------------------------------------------
VARCHAR2          189,317,244,480       113,000       14,326  7,455,330
CHAR                  977,410,022             0            0          0
LONG                   36,450,616             0            0          0
CLOB                  585,086,356       100,702            0        523
-------------------------------------------------------------------------
Total             190,916,191,474       213,702       14,326  7,455,853
Total in percentage          100%            0%           0%         0%

The database size section helps to estimate the expansion in size after migration. The conversion summary section gives the total numbers and exception types for the data dictionary and the application data. The summary also tells you how
many tables must be converted in your database. This information can help you choose a migration method that minimizes downtime. The scan output can also be queried from the CSMIG-owned tables and views.

2.3.1 CSSCAN Results in Four Types of Data
1. Changeless - data that does not change binary representation when converted from the source character set to the target.
2. Lossy conversion (data loss) - data for which no conversion path to the target character set exists, or for which the round-trip conversion gives data different from the original.
3. Truncation - data that, after conversion, does not fit within the column's maximum length. Truncations are also convertibles.
4. Convertible - data for which a conversion path from the source character set to the target exists, but the binary representation will be different.

Note: Before using CSALTER, the CSSCAN output must be:
- Changeless for all CHAR, VARCHAR2, and LONG data (data dictionary and user data)
- Changeless for all USER CLOB data
- Convertible and changeless for all data dictionary CLOB data

3. HANDLING LOSSY DATA

3.1 What Is "Lossy Conversion" or "Data Loss"?
Lossy data is data that is not a valid code point in the source (most likely) character set or in the target character set. For instance, the Euro symbol "€" does not have a binary code in the WE8ISO8859P1 character set. If no action is taken, the "lossy" data will be "lost" in the conversion: every "lossy" character is converted to the same "default replacement character", usually a question mark "?" or an inverted question mark "¿".

Note: If you have lossy data, never export/import into a database with a different character set. Once such a conversion has been performed, there is no way to "recover" the original data. Lossy data can only be resolved in place, with CSALTER.

A common cause of "lossy" data is an incorrect NLS_LANG client setting. In an environment using English or Western European languages, Windows clients' NLS_LANG might have been set to WE8ISO8859P1. This is NOT correct. As a result, actual WE8MSWIN1252 codes are stored in the WE8ISO8859P1 database. Up to Oracle 8i, the default NLS_LANG of a Windows client installation was WE8ISO8859P1, which is in fact incorrect, as the correct value is WE8MSWIN1252. This should be corrected as soon as possible and must be addressed before converting to Unicode. It is important to set NLS_LANG to match the actual client operating system environment, not necessarily the database character set. For example, the following registry value can be checked to determine the Windows ACP setting:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePages\ACP

If the value is 1252, NLS_LANG should be set to WE8MSWIN1252.
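On the Windows command line, this can be read with the standard reg utility; a sketch:

reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePages" /v ACP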

3.2 Use CSSCAN to Find "Lossy" Data in the Current Character Set
If there is data that is NOT defined in the source character set, running CSSCAN with both the FROMCHAR and TOCHAR parameters set to the current character set will report it as "lossy":

csscan FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=WE8ISO8859P1

Page 77: TechJournal - NYOUG · Tanel Poder DEV 2 Oracle APEX API Primer Josh Millinger Niantic Systems DBA 2-2 ... Advanced SQL Performance Tuning Dean Richards Confio Software DEV 4 Tips

www.nyoug.org 212.978.8890 77

The following is a sample of “lossy” data in the scan.err output:

ROWID               Exception Type      Size  Cell Data (first 30 bytes)
------------------  ------------------  ----  ------------------------------
AACQtnAC3AAAp2XAAF  lossy conversion          XT¬Q QVRLTQ
AACQtnAC3AAAqHsAAG  lossy conversion          VO¬C KTLUOC

3.3 Can "Lossy" Data Be Fixed? How? Under What Conditions?
Most lossy data can be resolved by converting the character set to its "strict" superset. The target character set is a "strict" superset if and only if each and every character in the source character set is available in the target character set, with the same corresponding character value. For instance, all characters included in WE8ISO8859P1 are defined in WE8MSWIN1252 with the same code points (same binary values); therefore, WE8MSWIN1252 is a "strict" superset of WE8ISO8859P1 (as well as of US7ASCII).

Note: You can create a database on UNIX with a "Windows" character set like WE8MSWIN1252. Oracle does not depend on the OS character set for the database (or national) character set.

Character set    Strict Superset
US7ASCII         WE8ISO8859P1, WE8ISO8859P15, WE8MSWIN1252, etc.
WE8ISO8859P1     WE8MSWIN1252
UTF8              AL32UTF8

To determine whether or not converting to the "strict" superset WE8MSWIN1252 can resolve the "lossy" data of the current WE8ISO8859P1 database, run the following CSSCAN:

csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252

If the following 3 lines are generated in the summary output file (scan.txt), then running CSALTER can resolve the “lossy” issue:

All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
The data dictionary can be safely migrated using the CSALTER script

3.4 Converting WE8ISO8859P1 to WE8MSWIN1252 Using CSALTER
Now it is time to get rid of the "lossy" data. Make sure your last CSSCAN's FROMCHAR and TOCHAR were both WE8MSWIN1252: CSALTER is based on your last CSSCAN's results.

Note: Although we are converting from WE8ISO8859P1 to WE8MSWIN1252, the CSSCAN FROMCHAR and TOCHAR are both WE8MSWIN1252. Otherwise, the 3 clean lines required in order to use CSALTER won't be generated. CSALTER checks the TOCHAR of the last CSSCAN; it does not consider the FROMCHAR.

[Diagram: nested supersets – US7ASCII within WE8ISO8859P1 within WE8MSWIN1252.]


The CSALTER script (csalter.plb) is a PL/SQL script located in $ORACLE_HOME/rdbms/admin. It ships with 10g; before 10g, to change the database character set you must use "ALTER DATABASE CHARACTER SET". Before running CSALTER, be sure to back up the database, shut down the listener and any applications connected to the database, and purge dba_recyclebin; then run CSALTER as sysdba:

SELECT * from nls_database_parameters WHERE parameter like '%CHARACTERSET%';
shutdown immediate
startup restrict
SPOOL To_WE8MSWIN1252.log
@$ORACLE_HOME/rdbms/admin/csalter.plb
SPOOL OFF
-- change job_queue_processes, etc. back to their original values
shutdown
startup
SELECT * from nls_database_parameters WHERE parameter like '%CHARACTERSET%';

Your database character set is now WE8MSWIN1252 and the lossy data should be gone.

Note: The inverse operation, WE8MSWIN1252 to WE8ISO8859P1 (or WE8MSWIN1252 to US7ASCII), is normally NOT possible without losing data.

4. HANDLING TRUNCATION DATA
Once we've eliminated the "lossy" data, we can continue to the next step. Do a CSSCAN from the current character set (for our example, WE8MSWIN1252) to the chosen Unicode character set (e.g., AL32UTF8):

csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=AL32UTF8

Most likely you will get “truncation” and “convertible” data now.

4.1 What Is Truncation?
Typically, a column's maximum length is expressed in bytes. A column defined as VARCHAR2(1) will hold one byte of binary code. This is sufficient for storing one character in any single-byte encoding, but it may be insufficient for storing a multi-byte character. In Unicode, a single character may take anywhere from 1 to 4 bytes of storage. Using the Euro symbol "€" as an example: in our now-WE8MSWIN1252 database it is 1 byte, while in AL32UTF8 it will use 3 bytes. Suppose you have a table named CURRENCY with a column SYMBOL defined as VARCHAR2(1) to store the Euro symbol; it is like trying to put a 3-inch block into a 1-inch box. In order to fit into the box, the block will be truncated.

Note: SQL "dump" can be used to find the code and byte length of a character, e.g.:

SELECT dump(SYMBOL, 1016) FROM CURRENCY;

The output for the Euro symbol in an AL32UTF8 database would be:

Typ=1 Len=3 CharacterSet=AL32UTF8: e2,82,ac

If you don't adjust the "truncation" data, you will get an ORA-12899 error ("value too large for column") when importing the data.
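The affected columns can be listed from the scan results. A sketch against the CSMIG views used later in this paper (column names as in csmv$columns):

-- Columns whose post-conversion data exceeds their maximum size
SELECT owner_name, table_name, column_name, exceed_size_rows
FROM   csmig.csmv$columns
WHERE  exceed_size_rows > 0
ORDER  BY owner_name, table_name, column_name;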


4.2 How Do You Resolve Truncation Issues?
There are a few methods:

4.2.1 Enlarge the Column Using More Bytes
In our example, modifying VARCHAR2(1) to VARCHAR2(3) can resolve the problem. However, this method is not logical, as we only need 1 character.

4.2.2 Change Length Semantics
Changing the database parameter NLS_LENGTH_SEMANTICS to "CHAR" is another solution. It can be done at the database level or the session level.

4.2.2.1 Database Level

ALTER SYSTEM SET NLS_LENGTH_SEMANTICS='CHAR' SCOPE=BOTH;

However, some installed Oracle products, such as Oracle Text (context indexes, etc.), don't support database-level "CHAR" length semantics. Also, the changed NLS_LENGTH_SEMANTICS value is only used when creating NEW columns. In other words, if you have columns using BYTE now and you change the instance parameter to CHAR, those columns will still be BYTE. To change existing tables you must use "alter table".

Note: Don't create a database with NLS_LENGTH_SEMANTICS=CHAR; you would have to change it after the database is created, because the data dictionary must use BYTE length semantics.

4.2.2.2 Column Level
You may use "alter table" to modify CHAR and VARCHAR2 columns from BYTE to CHAR. Although it is enough for the actual conversion to act only on the columns with "truncation" data, it is strongly recommended to use CHAR semantics for all columns when moving to a variable-width character set like AL32UTF8. The following can be used to generate a script that changes all CHAR and VARCHAR2 columns from BYTE to CHAR semantics explicitly:

SET pagesize 0 linesize 120 feedback off verify off
SPOOL byte_to_char.sql
SELECT 'alter table '||t.owner||'.'||t.table_name||
       ' modify ('||c.column_name||' '||c.data_type||'('||c.data_length||
       ' CHAR));'
FROM dba_tab_columns c, dba_tables t
WHERE c.owner = t.owner
AND   t.owner not in ('SYS','SYSTEM',...)   -- exclude system owners
AND   c.table_name = t.table_name
AND   c.char_used = 'B'                     -- only if not already CHAR semantics
AND   t.PARTITIONED != 'YES'                -- exclude partitioned tables
AND   c.table_name not in
      (select table_name from dba_external_tables)  -- exclude external tables
AND   c.data_type in ('VARCHAR2', 'CHAR');
SPOOL OFF

Note: Byte semantics is the default for the database character set. Character length semantics is the default, and the only allowable length semantics, for NCHAR data types.


If you have partitioned tables with CHAR semantics, you must fix them manually (see Doc 330964.1). Also, if you have function-based indexes on affected columns, you must drop them before changing to CHAR semantics and recreate them afterwards.

4.3 How Do You Handle Data Exceeding Maximum Bytes?
What if a VARCHAR2 column stretches beyond 4000 bytes after changing to CHAR length semantics? Oracle's maximum length for VARCHAR2 is 4000 bytes (not characters)! The same problem exists for CHAR data exceeding 2000 bytes. Suppose you have a table named RECORDS which has a column named DESC defined as VARCHAR2(4000), and you see the following in the CSSCAN output:

User  : ABC
Table : RECORDS
Column: DESC
Type  : VARCHAR2(4000)
Number of Exceptions          : 1
Max Post Conversion Data Size : 4082

And you find that the violating ROWID is 'AACQtnAC3AAAp2XAAF'. You can't put 4082 blocks into a box with a maximum of 4000 cells. Now what should you do?

4.3.1 Shorten the Violating Data

UPDATE ABC.RECORDS
SET    DESC = 'abcdefg … …'   -- a value <= 4000 bytes
WHERE  ROWID = 'AACQtnAC3AAAp2XAAF';

But if you are not allowed to change the data, or you have too many violating rows in the column, you should consider using CLOB instead of VARCHAR2.

4.3.2 Change a VARCHAR2 Column to CLOB
You cannot simply alter a VARCHAR2 column to CLOB. You may need to do the following:

ALTER TABLE ABC.RECORDS ADD (tmp CLOB);
UPDATE ABC.RECORDS SET tmp = TO_CLOB(DESC);
COMMIT;
ALTER TABLE ABC.RECORDS DROP COLUMN DESC;
ALTER TABLE ABC.RECORDS RENAME COLUMN tmp TO DESC;

4.4 Dependent PL/SQL Objects Related to the Length Semantics Change
If a column is modified from BYTE to CHAR, you must consider other dependencies such as user-defined types, stored procedures, functions, packages, and other PL/SQL objects. Using a stored procedure as an example: there may be code declared as VARCHAR2(5), which means VARCHAR2(5 BYTE). Now that the table column definition is changed to VARCHAR2(5 CHAR), you have to make the corresponding change in the PL/SQL objects. There are at least two methods to do so:

4.4.1 Use Explicit CHAR Semantics in Code
Define variables explicitly with "CHAR" in PL/SQL code:

x CHAR(5 CHAR);       -- not implicitly as x CHAR(5);
y VARCHAR2(10 CHAR);  -- not implicitly as y VARCHAR2(10);


This method is preferred, to avoid confusion.

4.4.2 Change the Session Length Semantics Setting (Implicit Change)
The implicit method is to alter the session to use "CHAR" length semantics before compiling the PL/SQL objects. For example:

alter session set NLS_LENGTH_SEMANTICS='CHAR';
Create or replace procedure XYZ…

The following query can be used to find out which length semantics a PL/SQL object (procedure, function, package/body, type/body, trigger, etc.) was compiled with:

SELECT owner, type, name, nls_length_semantics
FROM   DBA_PLSQL_OBJECT_SETTINGS
WHERE  owner = 'ABC'
AND    name = 'XYZ';

5. HANDLING DATA DICTIONARY ISSUES
If you choose to perform the partial export/import method with CSALTER, you must fix data dictionary issues. If you choose a full export/import into a pre-created Unicode database, you can ignore them in your source database, as your data dictionary will come from the pre-installed Unicode database. As mentioned, in order to use CSALTER, the CSSCAN results for the data dictionary must be:
- Changeless for all CHAR, VARCHAR2, and LONG data
- Convertible and changeless for all CLOB data

The following query gives a list of data dictionary objects you must address manually:

SELECT table_name, column_type, column_name,
       conv_rows, exceed_size_rows, data_loss_rows
FROM   csmig.csmv$columns
WHERE  owner_name in (select distinct username from csmig.csm$dictusers)
AND    (conv_rows > 0 or exceed_size_rows > 0 or data_loss_rows > 0)
AND    NOT (column_type = 'CLOB' and conv_rows > 0);

You may face different kinds of data dictionary issues depending on your system data and the installed products, and you must address them on a case-by-case basis. Oracle has documents and workarounds regarding data dictionary issues when changing character sets. If you cannot find an answer, or if you are not sure about the resolutions, open a Service Request with Oracle Support. The following are some of the possible data dictionary issues and workarounds:

5.1 SYS.JOB$ and/or SYS.SCHEDULER$_JOB
Remove the problematic jobs and resubmit them after the conversion.

5.2 SYS.WRH$ and/or SYS.WRI$
Stop the GATHER_STATS_JOB and drop the AWR snapshots. Sample SQL script:

SET pagesize 0 linesize 130 feedback off verify off
exec DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
SPOOL drop_snapshot.sql
SELECT 'execute dbms_workload_repository.drop_snapshot_range('||
       min(snap_id)||','||max(snap_id)||');'
FROM SYS.WRH$_SQLTEXT;
SPOOL OFF

Then run the generated script to drop the snapshots. Remember to restart the job and gather fresh statistics after the conversion.

5.3 SYS.HISTGRM$
Delete statistics on the tables reported by running the following:

$ORACLE_HOME/nls/csscan/sql/analyze_histgrm.sql

Or, you may use the following SQL to generate a script that deletes the problematic statistics:

SET pagesize 0 linesize 120 feedback off verify off
SPOOL delete_histgrm_stats.sql
SELECT DISTINCT 'ANALYZE TABLE '||o.owner||'.'||o.object_name||
       ' delete statistics;'
FROM dba_objects o, SYS.HISTGRM$ h
WHERE o.object_id = h.obj#
AND   h.ROWID IN (SELECT data_rowid FROM csmv$errors
                  WHERE owner_name||'.'||table_name = 'SYS.HISTGRM$'
                  AND   error_type in ('CONVERTIBLE','DATA_LOSS'));
SPOOL OFF

Then run the generated script. You may re-analyze the tables after converting to Unicode.

5.4 SYS.SOURCE$
Analyze and investigate the output from running the following:

$ORACLE_HOME/nls/csscan/sql/analyze_source.sql

You may want to drop the objects, then fix and recompile them later in the Unicode database (Doc 291858.1).

Data dictionary problems are sensitive. The above cases may or may not work for your specific situation. Be cautious when making changes; when in doubt, open an SR with Oracle.

6. HANDLING CONVERTIBLE DATA
"Convertible" data is valid data whose characters will map to different code points in the new character set. For example, the pound sign £ is code 163 (A3 in hex) in the WE8ISO8859P1 and WE8MSWIN1252 character sets, but in AL32UTF8 it is code 49827 (C2 A3 in hex). Any application data that is "convertible" must be exported before changing the character set and imported afterwards. Oracle suggests not using Data Pump (expdp/impdp) when converting to Unicode or other character sets on all 10g versions lower than 10.2.0.4 (and 11.1.0.6): it will provoke data corruption unless you have applied Patch 5874989 on the impdp side (expdp is not affected). The "old" exp/imp tools are not affected. You may choose one of the two following methods to handle the convertibles, based on the amount of convertible data:

6.1 Method 1: Full Export / Import (exp / imp)


6.1.1 Pre-create a Unicode Database (AL32UTF8)

6.1.2 Set NLS_LANG to the Source Database Character Set and Perform a Full Export

NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
exp FULL=Y ...

6.1.3 Set NLS_LANG to the Same (Source) Character Set and Import into the Unicode Database

NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
imp FULL=Y ...

This is the traditional method. If your database is large, this method can be very time consuming. We will focus more on the next method.

6.2 Method 2: Partial Export/Import with CSALTER

6.2.1 Export the Convertible Tables with NLS_LANG Set to the Source Character Set
At this stage, you may generate scripts using the CSMIG-owned objects from the last CSSCAN: scripts for truncating tables, export/import parfiles, scripts to disable/enable triggers and foreign keys, etc.

Note: When you re-scan, or when table structures change after the last CSSCAN, your CSMIG tables/views will change accordingly. To generate an export parfile for the convertible tables of a schema:

SET echo off feedback off pagesize 0 linesize 100
SPOOL exp_conv_tab.&1..par
SELECT 'file=exp_conv_tab.&1..dmp'||chr(10)||
       'log=exp_conv_tab.&1..log'||chr(10)||
       'buffer=100000'||chr(10)||
       '….'                             -- (other options if needed)
FROM dual;
SELECT 'TABLES='||owner||'.'||table_name
FROM dba_tables
WHERE owner||table_name IN
      (SELECT distinct owner_name||table_name
       FROM   csmig.csmv$columns
       WHERE  conv_rows > 0
       AND    owner_name = upper('&1'));
SPOOL OFF

Other necessary scripts can be generated using similar methods. You must generate them before the next CSSCAN and before truncating the tables; those activities will change the CSMIG-owned data and result in inaccurate or incorrect scripts. Then run the export (in parallel, if desired):

NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
exp / parfile=...   (use the pre-generated parfiles)
exp / parfile=...

6.2.2 Truncate the Convertible Tables
Before truncating the tables, you may need to disable triggers, foreign key constraints, etc.; a sketch for generating such disable statements follows below.
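In the spirit of the other script-generating queries in this paper, a simplified sketch that generates the disable statements; ideally you would restrict it to the constraints and triggers referencing the convertible tables, which this version does not do.

SET pagesize 0 linesize 120 feedback off verify off
SPOOL disable_cons_trig.sql
-- Disable enabled foreign-key constraints (non-system owners)
SELECT 'alter table '||owner||'.'||table_name||
       ' disable constraint '||constraint_name||';'
FROM   dba_constraints
WHERE  constraint_type = 'R'
AND    status = 'ENABLED'
AND    owner NOT IN ('SYS','SYSTEM');
-- Disable enabled triggers (non-system owners)
SELECT 'alter trigger '||owner||'.'||trigger_name||' disable;'
FROM   dba_triggers
WHERE  status = 'ENABLED'
AND    owner NOT IN ('SYS','SYSTEM');
SPOOL OFF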


Also consider whether or not to drop indexes in order to reduce export/import time. Then run the generated script to truncate all "convertible" tables.

6.2.3 Run CSSCAN Again after Truncating the Tables

csscan FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=AL32UTF8

This time the application data should be "changeless" only, and the following 3 lines should be listed in the summary scan.txt file:

All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
The data dictionary can be safely migrated using the CSALTER script

6.2.4 Convert to AL32UTF8 Using CSALTER

Again, before running CSALTER, be sure to back up your database, shut down the listener and any applications connected to the database, and purge dba_recyclebin; then run CSALTER as sysdba:

SELECT * from nls_database_parameters WHERE parameter like '%CHARACTERSET%';
shutdown immediate
startup restrict
SPOOL To_AL32UTF8.log
@$ORACLE_HOME/rdbms/admin/csalter.plb
SPOOL OFF
-- change job_queue_processes, etc. back to their original values
shutdown
startup
SELECT * from nls_database_parameters WHERE parameter like '%CHARACTERSET%';

Your database character set should be AL32UTF8 now.

6.2.5 Re-install Datapump Packages

Run the following scripts from $ORACLE_HOME/rdbms/admin as sysdba to de-install and re-install the Datapump packages (see Doc 260192.1).

For 10.2.X

catnodp.sql
catdph.sql
catdbp.sql

For 10.1.X

catnodp.sql
catdp.sql

6.2.6 Import the Convertible Tables

You may use the pre-generated scripts or parfiles to import, and the imports can run in parallel. Again, remember that NLS_LANG should be set to the source character set. In our case:



NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
imp / parfile=...   (use the pre-generated parfiles)
imp / parfile=...
...

Note: It is critical to set the correct NLS_LANG. For both export and import, it should be set to the source character set of the "old" database you exported from.

6.2.7 Enable or Recreate Disabled / Dropped Objects

Re-enable the triggers and constraints you disabled, and recreate any indexes you dropped, using the scripts generated earlier; a sketch of regenerating one of those scripts follows.
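If the pre-generated scripts are no longer at hand, the statements to re-enable the disabled foreign keys can be regenerated from the dictionary. A minimal sketch, assuming the schema name is passed as &1 (verify the spooled output before running it):

SET echo off feedback off pagesize 0 linesize 200
SPOOL enable_fk.sql
SELECT 'ALTER TABLE '||owner||'.'||table_name||
       ' ENABLE CONSTRAINT '||constraint_name||';'
  FROM dba_constraints
 WHERE constraint_type = 'R'
   AND status = 'DISABLED'
   AND owner = UPPER('&1');
SPOOL OFF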

6.2.8 Validate Using a Unicode Client

The best tool to validate and display Unicode data is Oracle's SQL Developer. It is a fully supported free GUI tool and a Unicode client, and it does not depend on the NLS_LANG setting. SQL Developer can be downloaded from Oracle's website: http://www.oracle.com/technology/products/database/sql_developer/index.html

Another Unicode client is iSQL*Plus, which can be started by running:

$ORACLE_HOME/bin/isqlplusctl start

and accessed from the client URL http://machine_name:5560/isqlplus/

Now our journey to Unicode has reached its destination. The database and data are AL32UTF8. Congratulations!

SUMMARY More and more companies are realizing the need for and benefits of using Unicode. Deploying in Unicode offers many advantages in usability, compatibility, and extensibility. With careful planning and full comprehension of the implementation methods and steps, migration to Unicode can be smooth and successful.

REFERENCES

Oracle White Paper: Oracle Unicode database support (May 2005)
Oracle White Paper: Character Set Migration Best Practices (October 2002)
Doc 260893.1: Unicode character sets in the Oracle database
Doc 306411.1: Character Set Consolidation for the Oracle Database
Doc 333489.1: Choosing a database character set means choosing Unicode
Doc 788156.1: AL32UTF8 / UTF8 (Unicode) Database Character Set Implications
Doc 225938.1: Database Character Set Healthcheck
Doc 158577.1: NLS_LANG Explained (How does Client-Server Character Conversion Work?)
Doc 745809.1: Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
Doc 444701.1: Csscan Output Explained
Doc 276914.1: The National Character Set in Oracle 9i, 10g and 11g
Doc 260192.1: Changing WE8ISO8859P1 / WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8
Doc 555823.1: Changing US7ASCII or WE8ISO8859P1 to WE8MSWIN1252
Doc 341676.1: Difference between WE8MSWIN1252 and WE8ISO8859P1 characterset
Doc 144808.1: Examples and limits of BYTE and CHAR semantics usage (NLS_LENGTH_SEMANTICS)
Doc 274507.1: Finding out the length semantics of type attributes
Doc 227332.1: NLS considerations in Import/Export - Frequently Asked Questions
Doc 237593.1: Problems connecting to AL32UTF8 databases from older versions (8i and lower)
Doc 258904.1: Solving Convertible or Lossy data in Data Dictionary objects when changing the NLS_CHARACTERSET



Tuna Helper – A Proven Process for Tuning SQL

Dean Richards, Confio Software

Introduction

Many who undertake SQL tuning projects look for a magic bullet in the form of a software tool. While such tools can help tune simple SQL statements, complex ones that contain unions, subqueries, outer joins, and the like give tools problems. I have used a variety of tools and have been less than impressed. If tools are less than adequate, how do we go about tuning SQL statements and applications? Instead of relying on tools that do not work well, I suggest that you learn how to perform SQL tuning the old-fashioned way, i.e. by doing it yourself. Too many people I cross paths with in my work do not understand even the basics of SQL tuning, so the fundamental goal of this paper is to provide a process that will help you get started down the tuning path. Many people fail when it comes to tuning, so if you can become good at it, your image and worth to your company will undoubtedly rise.

The reason most tools do not work, and the reason people fail at SQL tuning projects, is quite simple: it's hard. Many people make their living training others to do SQL tuning well, and those classes usually run 3-5 days. If it were easy, there would be no performance issues, because they would have been solved by now. This short paper will by no means make you an expert, but hopefully it presents a framework you can use to become a better SQL tuner, and from that framework you can start to build your own process. As I mentioned before, there is no magic bullet or tool that you can point at your database and, voila, the database is tuned. Becoming an expert SQL tuner takes time and practice, so if you are scared by the prospect of undertaking tuning projects, just go for it. You will never learn by sitting on the sidelines and doing the same old things.

However, when beginning the project, make sure you are not standing on an island by yourself. Most likely you did not write the application or the SQL statements that are performing poorly. One of the biggest mistakes I see is DBAs who work in isolation when tuning. It's their nature – "I'm smart and I can do this by myself", or possibly "I'm a recluse and I don't want to deal with anyone else". It is always better to include other technical people who are familiar with the application. If the application was custom built by your company, work with the developer who wrote or maintains the code. Work with the people who designed the application to get a better understanding of the requirements. Work with the business people to understand what they do in the application when the performance problem occurs. All of these groups are typically more than willing to help, because they will benefit from a faster application.

Challenges

There are many challenges in a tuning project. I already said this, but it's worth saying again – SQL tuning is difficult. To do it correctly, you, or the people on your team, need to be very familiar with many different aspects of the application. From a technical standpoint, you need to understand execution plans – how Oracle is executing the SQL and how the data is being accessed. You also need to be familiar with SQL design concepts, because sometimes tuning the SQL means rewriting it. When you are working with the end users, understand how the application is used and why they do things. Why do they fill in those fields? Why do they use this screen? Understanding the purpose of the SQL and the application will help you make better decisions down the road.

Put a Face on the Project - Many people are impatient, but SQL tuning takes time. That's why working with other people is also about putting a face on the project. Instead of the users saying something like "I've been complaining about this problem for weeks and the DBA team says they're doing something, but no one knows what", they might say "I've been working with Bob from the DBA group and he definitely asks a lot of questions and seems to care about my experience". A very different circumstance for sure.

Large Number of SQL Statements - Another huge challenge is the sheer number of SQL statements: how do you know for sure you are working on the right one? I'll talk about this in more detail later, but this is also where



the end users can help you focus on the correct things. Instead of worrying about 100s or 1000s of SQL statements, worry about the ones that affect this user and this screen in this application they are complaining about.

All Statements are Different - All SQL statements are different. Just because you solved the last problem in 30 minutes by tuning in a certain way does not mean the next project will be that easy or can be tuned the same way.

Lack of Priority - Some companies simply do not care about performance. As long as the application gives the correct results, they care much less about how long it takes to get them.

Indifference - Some users get used to the way things work – "I always press this button first thing in the morning and then go get coffee, because I know it takes an hour". Bad performance becomes a way of life, and sometimes people actually get upset when the application becomes faster: they can no longer walk around interrupting everyone else with what their kids did yesterday.

Never Ending Task - There always seems to be a next problem. Once you tune something, other people want their applications tuned as well – but this is a good thing for you.

Process

Working with many other customers and our Ignite for Oracle product, I have developed a process that works very well for me. This does not mean it will work for you, but I think the basics are a good starting point for anyone. The process centers around four main steps:

1. Identify – pick the correct SQL statement to tune and avoid wasting your time.
2. Gather – gather the proper information that will help you make the best tuning decisions.
3. Tune – tune the SQL statement based on the gathered information.
4. Monitor – ensure the SQL statement is tuned and stays tuned. Monitoring also helps you understand the exact benefits achieved. This step also starts the process over again and helps you identify the next project.

Identify – Which SQL to Tune

Once you get back to the office and want to tune something, where do you start? Don't just pick a statement that looks interesting; have some method for choosing the SQL. The SQL statement could come from a discussion with users and an understanding of their complaints. It may stem from a batch job that runs longer and longer. It could also come from tracing a session because of user complaints. Have some reason for tuning, even if it's because the statement is the number one SQL in the database from a wait time perspective, or because it does the most logical I/O (LIO) on a daily basis. You may notice your application performs a lot of full table scans and from that determine the top SQL statements affected by the problem. Maybe it's a known poorly performing statement that a developer asks for your help with (be careful, as the top SQL statement from a developer may not be the one affecting your end users the most). Whatever it is, have a method for picking the SQL statements. At Confio we believe the best measurement is end user wait time, which is the focus of our Ignite product line.
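If you have no better starting point, the dynamic performance views can surface candidates. A minimal sketch, assuming 10g or later (where v$sql exposes SQL_ID) and using cumulative logical I/O as the yardstick:

-- top 10 statements by logical I/O since instance startup
SELECT *
  FROM (SELECT sql_id, buffer_gets, disk_reads, executions,
               ROUND(elapsed_time/1e6) AS elapsed_secs, sql_text
          FROM v$sql
         ORDER BY buffer_gets DESC)
 WHERE ROWNUM <= 10;

Swap the ORDER BY column for disk_reads or elapsed_time to rank by a different measure.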

Identify – End to End View

When I mention an end-to-end view, you may think of the performance of the application from the web browser or client application, through the application server, and down to the database. That is technically correct, but from a business perspective you should also know the SQL and the application end to end. I encourage you to understand what the SQL is used for, why the business needs this information, how often the data is needed, and who consumes it. It is helpful (I think it should be a requirement) to get the big picture when jumping into a tuning project. When the user complains, how do you know the problem is with the database and a SQL statement? The problem could be in the Java application tier or the Oracle E-Business forms tier. Even if you tuned the worst performing SQL statement for that process, you may not make much of a perceived difference if only 10% of the total time was spent in the database. However, if you know that 90% of the time is spent in the database, and most of that time is spent executing a specific SQL statement performing a full table scan (wait events will be discussed soon), you can start to make predictions about the performance gains. To get this type of information you will probably need a tool of some sort, but you can also get it via debugging, logging times to files, and instrumenting code, among many other ways.



Identify – Database Wait Time View

I already mentioned that I feel wait event information is critical when you jump into SQL tuning. Oracle has instrumented its code to report where a SQL statement spends its time as it executes. If it runs for 3 minutes, where did it spend its time? This is where v$session and v$session_wait give those clues (a sketch follows the example below). You can also get this information from tracing, which is a great way to get detailed information about a problem. A detailed explanation of wait events and tracing is outside the scope of this paper, but I encourage you to understand those topics very well. Speaking of wait time and wait events, here is a quick test. Which of these scenarios is worse?

SQL Statement 1

o executed 1000 times
o made the end users wait for 10 minutes
o waited 99% of the time on "db file sequential read"

SQL Statement 2
o executed 1 time
o made the end user wait for 10 minutes
o waited 99% of its time on a locking problem

I think the answer is that both are equally bad because they both made the end user wait for 10 minutes. End users don’t care what they wait for, only that it took 10 minutes. SQL statement 2 may be harder to tune, because locking problems are typically application design issues, but both are equal in the eyes of the user. However, one caveat is that if you are faced with two tuning projects like this, I would choose to work on SQL Statement 1 first, because it may be easier to get results for someone in a shorter amount of time.
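To see where active sessions are waiting right now, here is a quick sketch against v$session (in 10g the wait columns from v$session_wait are also exposed directly in v$session):

SELECT sid, sql_id, event, state, seconds_in_wait
  FROM v$session
 WHERE status = 'ACTIVE'
   AND wait_class <> 'Idle';

Sampling this view repeatedly, or tracing the session, builds the wait time profile discussed above.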

Identify – Simplification

So far we have identified:

1. The users or business impact of the performance issue
2. The end-to-end view of the process that specifies the layer of the application where performance is suffering
3. The SQL statements being executed in the database and the ones (there may be more than one) causing the problems

So we now have identified some things to work on, and these statements are typically not "select * from table1 where col1 = X"; they are usually much more difficult. They may be pages long and include subqueries, unions, NOT EXISTS clauses, and all sorts of things. The first step I take is to simplify. Break the SQL down into manageable pieces that can be more easily comprehended. A complex SQL statement may become five smaller, simpler SQL components: tune each of these separately. If you have five statements all unioned together, tune each statement separately. If views are used, get the definitions and tune the views separately as well. If synonyms are used, know where they point. I'll talk about this more later when we dive into execution plans.

Identify – High Level Analysis

From a high-level view, determine the limiting factors for the query. They will look similar to the following:

ColumnX = :1
ColumnX IN ('A','B')
ColumnX LIKE 'ABCD%'   -- note the % is not at the front

Eventually you will match existing indexes to these limiting factors; this will be discussed more when we review the execution plan. A hidden limiting factor is also the size of a table. If you are joining a million-row table to a 10-row table, the 10-row table will probably limit the number of rows retrieved from the million-row table (not always – it still depends on the join criteria). This will be discussed in more detail below. You should also watch for non-limiting factors like the following, as Oracle will never use indexes on these criteria. These could become candidates for rewriting the statement if you have no other choice:



<>, !=
NOT LIKE
NOT IN
LIKE '%ABCD%'

Gather – SQL Statement Metrics

The next phase is all about gathering critical information and metrics about the SQL statement. These metrics should include the following:

How long does the statement take now? What is acceptable to the end users? If they want the query to return in 10 seconds and it is reading tons of data, you may have to stop the tuning project immediately, because you will not satisfy expectations.

Collect wait time information, because all performance problems are not created equal.

o If you find a query waiting on locking/blocking wait events, you know that reducing the end user wait time is not about tuning the SQL statement. Fixing locking issues is usually an application design matter, and in this case you should become very good friends with the developers and the people who designed the application.

o If you find a statement waiting on I/O wait events, e.g. db file sequential read or db file scattered read, this also helps you get to the next step. Waits on db file scattered read indicate full table scans, while waits on db file sequential read typically indicate inefficient indexes being used for the query.

o If you have latch contention, you have to determine the exact latch causing the bottleneck and understand why it is so popular.

o If you see network waits, determine whether the statement returns a lot of data, either as many rows or as large columns such as LOBs. There could also be real network latency between the database and the client; in that case, make friends with your network administrator.

o Quite likely there will be multiple problems. There could be a full table scan on Table1 and an inefficient index being used to access Table2 which is also causing latching problems.

If a query executes for 3 minutes, understand what makes up those 3 minutes from a wait event view, e.g. it waits 2:30 on db file scattered read, 15 seconds on latching, and 15 seconds on CPU (service time). When you have gathered the necessary metrics, document them in simple language. Examples:

o "This query spends 85% of its time locked because other processes are accessing the same data", vs. "This query spends 85% of its time waiting on 'enq: TX – row lock contention'". Not many people know what this wait event means, but they will likely understand "locking problem".

o "This query takes a long time because it is inefficient and reads every row of a million-row table", vs. "This query spends 85% of its time waiting on db file scattered read".

o "The query takes 20 ms to execute, which sounds efficient, but every time the user executes this process the query runs 100,000 times, making the user wait 2,000 seconds – over 30 minutes". Don't say "The query executes in 20 ms and there is nothing I can do" – the user will not be happy about this at all, because they are still waiting on their end.

Gather – SQL Execution Plan

Once you have broken the query down, you need to understand how each component behaves. This is where execution plans help. They supply costing information, data access paths, join operations, and many other things. However, not all plans are the same. Did you know that EXPLAIN PLAN can be wrong and not match how Oracle is really executing the statement? How can that be? It happens because the EXPLAIN PLAN command is typically executed in a completely different environment, i.e. via SQL*Plus or Toad and not from the application code and environment. There could be different session settings, bind variable data types, and many other differences that are outside the scope of this document. I feel the best places to get execution plan information are:



1. V$SQL_PLAN – contains the raw data for execution plans of executed SQL statements. It provides the exact plan Oracle used, so why not go straight to the source? Use DBMS_XPLAN to get the execution plan in a readable format (see the sketch after this list).

2. Tracing – gives all sorts of great information, as well as execution plans, for an entire session or process.
3. Historical Data – if you can, collect and save execution plan information so you can go back to a week ago and understand why the SQL statement started performing poorly.

Once you have an execution plan, determine how each of the SQL components is being executed. Based on wait time data, you should already have a feel for whether full table scans are significant, whether more time is spent reading indexes, or whether you have other issues. Execution plans help determine where those problems are – look for expensive steps.
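A minimal sketch of pulling the real plan from the shared pool with DBMS_XPLAN (10g and later; the sql_id and child number come from V$SQL):

-- plan of the last statement executed in this session
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

-- plan of a specific cursor
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', 0));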

Gather – Not All Plans Created Equal

I mentioned that data from the EXPLAIN PLAN command can be wrong. Here is an example where, based on wait time data, we were pretty sure the query was doing a full table scan. However, when we reviewed the EXPLAIN PLAN output, the query looked very efficient and seemed to be using a unique index to retrieve one row of data. Here is the statement and the explain plan information:

SELECT company, attribute
FROM data_out
WHERE segment = :B1;

[Figure: Data from EXPLAIN PLAN Command]

How could this be? It is a very simple query with one criterion in the WHERE clause, so if it's executing very efficiently, why are my end users waiting 20 seconds for it? The answer is revealed when we review the execution plan from V$SQL_PLAN using DBMS_XPLAN:

[Figure: Execution plan from V$SQL_PLAN via DBMS_XPLAN, showing a full table scan of DATA_OUT]

Having multiple sources of performance data – wait time data, execution plans, end user documented performance, end-to-end views, etc. – is helpful. If we had reviewed only the explain plan data in this case, we might have dismissed the issue and pointed our fingers back at the development group, claiming the problem was in the application tier: the query appears to use a unique index to get one row and cannot get any faster. They would probably then instrument their code to better understand where the performance problem is, and would soon realize that this query is indeed the problem. Now you look bad, because they have evidence the statement is performing poorly. However, if you had reviewed the execution plan above, you would very quickly understand what is happening. The data from V$SQL_PLAN points to a full table scan on the DATA_OUT table. Furthermore, the predicate information under the plan tells you why. The :B1 bind variable is being passed as a BINARY_DOUBLE number from Java. The SEGMENT column is defined as a standard NUMBER datatype, so Oracle is forced to implicitly convert the data in the table to BINARY_DOUBLE to apply the criteria. Any time a function is applied to a column, even an implicit one, standard indexes on that column cannot be used. You now have a plan of action:



1. Change Application – go back to the developers and ask them to modify the Java code to pass in a standard NUMBER.

2. Modify Table Definition – redefine the SEGMENT column of the DATA_OUT table as BINARY_DOUBLE, if it needs to be.

3. Create Function-Based Index – this will immediately help performance while the development team figures out how to change the code.

You are a hero for telling them what they need to do to help performance, and you can create a second, function-based index to help performance in the meantime (a sketch follows). Everyone wins when the correct information is applied to the problem.
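A minimal sketch of option 3, assuming the implicit conversion being applied is TO_BINARY_DOUBLE (the index name here is made up for illustration):

-- function-based index matching the implicitly converted predicate
CREATE INDEX data_out_seg_fbi
    ON data_out (TO_BINARY_DOUBLE(segment));

With statistics gathered, the optimizer can then match the converted predicate to the function-based index instead of scanning the table.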

Gather – Bind Values

The third step of gathering information is to understand the data being passed into any bind variables. You can get this information from the V$SQL_BIND_CAPTURE view. One caveat is that the database parameter STATISTICS_LEVEL should be set to ALL or TYPICAL. Values are only captured every 15 minutes, so the data you get here is not exact, but it can be very helpful. Had we used it in the previous example, we would have seen :B1 passed in as BINARY_DOUBLE and started asking questions. To get the best bind variable information, trace the process and collect the bind values; that is the most accurate method, and you will see every value passed in while the trace was running. Use level 4 or 12 to get binds; level 12 gives both wait events and binds, so use it if possible.
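A quick sketch of both approaches (sql_id is a placeholder you would look up in V$SQL):

-- captured bind values and datatypes for one statement
SELECT name, position, datatype_string, value_string, last_captured
  FROM v$sql_bind_capture
 WHERE sql_id = '&sql_id'
 ORDER BY position;

-- 10046 trace at level 12 (binds + waits) for the current session
ALTER SESSION SET events '10046 trace name context forever, level 12';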

Gather – Table and Index Statistics

The next step is to gather information about each table being accessed inefficiently. These tables come from a review of the execution plan and are the ones behind the highest-cost execution steps; there is no use gathering data for objects that are already being accessed efficiently. Confio has a script on its support site (http://support.confio.com/kb/1534) that helps with this process and gathers information such as:

1. Table sizes – a full table scan on a 20-row table is better than using an index even if one existed, so understand whether you have small, medium, large, or very large tables involved.
   a. Also understand segment sizes. We had an experience with a 20-row table where Oracle was doing a full table scan to access it. We kept overlooking it because it only had 20 rows; however, the underlying segment for the table was 80MB. At some point in the life of the table it had over 500,000 rows, so it grew to 80MB, but, as you know, when the rows were deleted the segment remained the same size. Each time the full table scan was done, Oracle did not just read 20 rows, it read 80MB of data with most of the blocks empty. Truncating and reloading the data solved the problem.
2. Existing indexes – get a list of all indexes that already exist on these tables.
3. Understand the selectivity or cardinality of the columns from your limiting-factors discovery. If a query includes a criterion such as STATUS='D', how many values does the STATUS column have, i.e. how selective is it?
   a. Also understand data skew at this point. If there are only 5 values for the STATUS column, but only 1% of the rows have STATUS='D', an index with histograms may help this statement. A simple query for determining data skew is provided below.

Gather – Entity Relationship Diagram

Understand the relationships of the tables involved in the statement. The sample SQL statement we will tune in this paper answers the question "Who registered for the SQL Tuning class today?":

SELECT s.fname, s.lname, r.signup_date
FROM student s, active_registrations r, class c
WHERE s.student_id = r.student_id
  AND r.class_id = c.class_id
  AND UPPER(c.name) = 'SQL TUNING'
  AND c.class_level = 101



  AND r.signup_date BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE-1)

The ERD for this query is very simple and restricts the universe to the objects the query accesses. If the statement only accesses 3 tables, do not review a full ERD containing 100s or 1000s of tables; only review the relationships of the three tables. You will be surprised by the number of times you can find mistakes in the SQL statement by reviewing the ERD.

Tune – Review Execution Plan

The execution plan for our sample SQL statement is given below:

-----------------------------------------------------------------------------
| Id  | Operation                       | Name         | Rows | Bytes | Cost |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |              |      |       |   79 |
|   1 |  NESTED LOOPS                   |              |    1 |   167 |   79 |
|   2 |   NESTED LOOPS                  |              |    1 |    81 |   78 |
|   3 |    NESTED LOOPS                 |              |    1 |    51 |   77 |
|   4 |     VIEW                        | VW_SQ_1      |    1 |    35 |   77 |
|*  5 |      FILTER                     |              |      |       |      |
|   6 |       HASH GROUP BY             |              |    1 |    17 |   77 |
|*  7 |        FILTER                   |              |      |       |      |
|*  8 |         TABLE ACCESS FULL       | REGISTRATION |    1 |    17 |   76 |
|*  9 |     INDEX UNIQUE SCAN           | SYS_C0020876 |    1 |    16 |    0 |
|  10 |    TABLE ACCESS BY INDEX ROWID  | STUDENT      |    1 |    30 |    1 |
|* 11 |     INDEX UNIQUE SCAN           | SYS_C0020874 |    1 |       |    0 |
|* 12 |   TABLE ACCESS BY INDEX ROWID   | CLASS        |    1 |    86 |    1 |
|* 13 |    INDEX UNIQUE SCAN            | SYS_C0020875 |    1 |       |    0 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter((MAX("SIGNUP_DATE")>=SYSDATE@! AND MAX("SIGNUP_DATE")<=TRUNC(SYSDATE@!-1)))
   7 - filter(SYSDATE@!<=TRUNC(SYSDATE@!-1))
   8 - filter("CANCELLED"='N')
   9 - access("R1"."STUDENT_ID"="STUDENT_ID" AND "R1"."CLASS_ID"="CLASS_ID" AND
              "SIGNUP_DATE"="VW_COL_1")
       filter(("SIGNUP_DATE">=SYSDATE@! AND "SIGNUP_DATE"<=TRUNC(SYSDATE@!-1)))
  11 - access("S"."STUDENT_ID"="STUDENT_ID")
  12 - filter(("C"."CLASS_LEVEL"=101 AND UPPER("C"."NAME")='SQL TUNING'))
  13 - access("CLASS_ID"="C"."CLASS_ID")

When reviewing the plan at a high level, look for costly data access steps, i.e. "TABLE ACCESS FULL" and "TABLE ACCESS BY INDEX ROWID", along with the indexes and tables involved. Also, start from the inside and work your way out, because that is the order in which Oracle accesses the objects. In this execution plan, the innermost and most indented step is the "TABLE ACCESS FULL" on the REGISTRATION table. Notice that the REGISTRATION table is not used directly in the query; something named ACTIVE_REGISTRATIONS is. This typically indicates a view or synonym, so go gather those definitions as well. Also review the predicate information for any strange things like our previous example.

[Figure: ERD – CLASS (class_id, name, class_level, ...); REGISTRATION (class_id, student_id, signup_date, cancelled, ...); STUDENT (student_id, fname, lname)]



When you match up step 8 in the execution plan with the predicate information, you can see the REGISTRATION table is being filtered by a CANCELLED='N' criterion as well as by MAX(SIGNUP_DATE) >= now and <= a day ago. The STUDENT and CLASS tables are accessed by unique IDs and appear to be very efficient. Based on this information, Oracle first goes to the REGISTRATION table and looks for rows that are not cancelled, then filters on the date criteria. Oracle then joins to the CLASS and STUDENT tables using unique indexes, which probably represent foreign keys back to the main tables. From the example we can definitely tell that the most expensive portion of the plan is the full table scan on the REGISTRATION table. The plan of action is to understand why a full table scan is necessary, and whether an index could help.

Tune – View Simplification

We saw that the execution plan referenced a table named REGISTRATION, but something named ACTIVE_REGISTRATIONS is used in the query. Most likely this indicates that a view or synonym is being used. Check for synonyms and view definitions – in this case it is a view, defined as follows:

set long 8000
select text from user_views where view_name = 'ACTIVE_REGISTRATIONS';

TEXT
----------------------------------------
SELECT student_id, class_id, signup_date
FROM registration r1
WHERE signup_date =
      (SELECT MAX(signup_date)
       FROM registration r2
       WHERE r1.class_id = r2.class_id
         AND r1.student_id = r2.student_id
         AND r2.cancelled = 'N')

Things now start to make sense between the execution plan and the query itself. This view – essentially a subquery – is also being executed, and it is the cause of the full table scan on the REGISTRATION table.

Tune – Review Table and Index Statistics

Below is information about the REGISTRATION table from the TableTuningInfo.sql script referenced above. To understand what you need to do to tune a statement, you first need the basic details of the tables involved.

Name                        Null?    Type
--------------------------- -------- ----------------
STUDENT_ID                  NOT NULL NUMBER
CLASS_ID                    NOT NULL NUMBER
SIGNUP_DATE                 NOT NULL DATE
CANCELLED                            CHAR(1)

INDEX_NAME                     UNIQUENES COLUMN_NAME     COLUMN_POSITION
------------------------------ --------- --------------- ---------------
SYS_C0020876                   UNIQUE    STUDENT_ID                    1
SYS_C0020876                   UNIQUE    CLASS_ID                      2
SYS_C0020876                   UNIQUE    SIGNUP_DATE                   3

COLUMN_NAME               NUM_DISTINCT  NUM_NULLS    DENSITY SAMPLE_SIZE
------------------------- ------------ ---------- ---------- -----------



CANCELLED                            2          0         .5        5443
CLASS_ID                           998          0 .001002004        5443
SIGNUP_DATE                      32817          0 .000030472       79983
STUDENT_ID                        9999          0  .00010001       79983

The top portion of the output shows the table definition, while the second section shows all existing indexes and the columns they contain. The third section includes other column data, e.g. the number of distinct values per column, which is very important when thinking about potential indexes. Density, shown in the last section, is a statistic used by the Cost Based Optimizer (CBO) to estimate selectivity for columns where better information (i.e. histograms) is unavailable. In this case, the CANCELLED column would be estimated to return 50% of the rows, based on its 2 distinct values. Assuming an even data distribution for a column, density tells us the percentage of rows that will be returned by any random value in a WHERE clause. As long as the density is at or below .07 - .10 (7% - 10% of the rows will be returned), the column may make a good candidate for indexing. The formula for calculating density is:

Density = 1 / Num Distinct
or
Density = Number of Rows for a Value / Total Rows in Table

However, when we review the actual data distribution for the CANCELLED column, it tells a far different story than Oracle currently knows about:

select cancelled, count(1)
from registration
group by cancelled;

C   COUNT(1)
- ----------
Y        638
N      79345

In this case, 99.2% of the rows in the REGISTRATION table contain the value N for the CANCELLED column. Unfortunately, our query is looking for those rows, so an index will not help. The other criterion used in the subquery is based on the SIGNUP_DATE column and retrieves rows for the past 24 hours. Reviewing the data distribution for this column provides some interesting information:

select trunc(signup_date), count(1)
from registration
group by trunc(signup_date);

TRUNC(SIGNUP_D   COUNT(1)
-------------- ----------
01/01/09 00:00        100
01/02/09 00:00        290
01/03/09 00:00        107
01/04/09 00:00        845
01/05/09 00:00       3190
01/06/09 00:00       2727
...
01/29/09 00:00       2693

Assuming today is January 30th (when our customer ran the tests), this output tells us that 2,693 rows would be retrieved by the query. Calculating density/selectivity for this column gives us 2693 / 79983 = 0.0337 (about 3%), below the 7-10% threshold, signifying this criterion may be a decent candidate for an index. This sounds like a good start.



Tune – Test and Review Results

In our first test, we created an index on the SIGNUP_DATE column, gathered statistics, and executed the query; the new execution plan is much different. The new plan has a cost of 10, and it shows that our new index REG_SUDT is being used. We appear to have been successful because the cost is much lower. However, a strong word of caution: be sure to test the query and the new index in a good test environment before claiming success. A lower cost does not always equate to better execution times.

create index reg_sudt on registration(signup_date);

------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows | Bytes | Cost |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |      |       |   10 |
|*  1 |  FILTER                             |              |      |       |      |
|   2 |   HASH GROUP BY                     |              |    1 |   174 |   10 |
|*  3 |    FILTER                           |              |      |       |      |
|*  4 |     TABLE ACCESS BY INDEX ROWID     | REGISTRATION |    1 |    17 |    3 |
|   5 |     NESTED LOOPS                    |              |    1 |   174 |    9 |
|   6 |      NESTED LOOPS                   |              |    1 |   157 |    6 |
|   7 |       NESTED LOOPS                  |              |    1 |    59 |    5 |
|   8 |        TABLE ACCESS BY INDEX ROWID  | REGISTRATION |    1 |    17 |    4 |
|*  9 |         INDEX RANGE SCAN            | REG_SUDT     |    2 |       |    2 |
|  10 |       TABLE ACCESS BY INDEX ROWID   | STUDENT      |    1 |    42 |    1 |
|* 11 |        INDEX UNIQUE SCAN            | SYS_C0020874 |    1 |       |    0 |
|* 12 |      TABLE ACCESS BY INDEX ROWID    | CLASS        |    1 |    98 |    1 |
|* 13 |       INDEX UNIQUE SCAN             | SYS_C0020875 |    1 |       |    0 |
|* 14 |    INDEX RANGE SCAN                 | SYS_C0020876 |    1 |       |    2 |
------------------------------------------------------------------------------------

Tune – Review All Alternatives

Reviewing other portions of the query for limiting factors, we find the criterion UPPER(c.name) = 'SQL TUNING'. If we know our data well, most likely there are very few classes named "SQL TUNING", so in theory this may make a good candidate for a function-based index. This criterion should return one row (or a few) from the CLASS table, which could be joined to the REGISTRATION table to pick out everything for that CLASS_ID. The STUDENT table would then be joined to that to get the final results. However, remember the index on the REGISTRATION table begins with STUDENT_ID, not CLASS_ID. To make this work, we can try creating the function-based index plus a new index on REGISTRATION leading with CLASS_ID (or recreating the primary key for REGISTRATION starting with CLASS_ID); a sketch follows. This option is better (cost of 7 vs. 10) than the other plan and is an example of how knowing the data helps you tune the query. If we had no idea there was only one class named 'SQL TUNING', we might not have looked at this option. However, if you do your due diligence and check every condition in the WHERE clause, you would have known this with a simple query:

SELECT COUNT(1) FROM class WHERE UPPER(name) = 'SQL TUNING';
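A minimal sketch of this alternative (the index names are made up for illustration):

-- function-based index matching the UPPER(name) criterion
CREATE INDEX class_name_fbi ON class (UPPER(name));

-- index on REGISTRATION leading with CLASS_ID
CREATE INDEX reg_clid ON registration (class_id);

As always, gather statistics afterwards and verify the plan and timings in a test environment first.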

Tune – Example 2

Here is another example on the same set of tables; it answers the question "Who cancelled a class within the last week?".

SELECT s.lname, c.name, r.signup_date cancel_date
FROM registration r, student s, class c
WHERE r.signup_date BETWEEN SYSDATE AND SYSDATE-7
  AND r.cancelled = 'Y'



  AND r.student_id = s.student_id
  AND r.class_id = c.class_id

------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |              |      |       |   78 |
|*  1 |  FILTER                          |              |      |       |      |
|   2 |   NESTED LOOPS                   |              |    1 |   103 |   78 |
|   3 |    NESTED LOOPS                  |              |    1 |    82 |   77 |
|*  4 |     TABLE ACCESS FULL            | REGISTRATION |    1 |    18 |   76 |
|   5 |     TABLE ACCESS BY INDEX ROWID  | CLASS        |    1 |    64 |    1 |
|*  6 |      INDEX UNIQUE SCAN           | SYS_C0020875 |    1 |       |    0 |
|   7 |    TABLE ACCESS BY INDEX ROWID   | STUDENT      |    1 |    21 |    1 |
|*  8 |     INDEX UNIQUE SCAN            | SYS_C0020874 |    1 |       |    0 |
------------------------------------------------------------------------------

Using our process, the first step is to review the execution plan and understand the wait events for the query. It waits exclusively on "db file scattered read", and the execution plan clearly shows a full table scan on the REGISTRATION table. We also know from our data (based on the query against the SIGNUP_DATE column above) that almost 30% of the rows in the REGISTRATION table have a SIGNUP_DATE value from the past week – not a good candidate for index usage. However, remember our data skew query for the CANCELLED column: only 638 rows contain the value 'Y', so the selectivity of this column becomes 638 / 79983 = .008, which is very good. As a first test we will create an index on the CANCELLED column. Will this help? Not by itself: Oracle assumes an even data distribution, thinks this column will return half the rows in the table, and will not use the index. We also need to collect histograms so Oracle knows about the uneven distribution of the data.

Tune – Histograms

Collecting histograms gives Oracle the same picture as our data skew query:

select cancelled, count(1)
from registration
group by cancelled;

C   COUNT(1)
- ----------
Y        638
N      79345

Oracle then sees good selectivity for cancelled='Y' and decides to use the index. Histograms work best with literal values. If a bind variable is used, then the first time the query is parsed Oracle performs bind variable peeking to determine the execution plan. But what if the bind variable is 'N' on that first execution? Oracle will perform a full table scan for that execution and all subsequent ones, even though later executions may pass a bind value of 'Y' and would benefit from the index. Collecting histograms is very easy; in this case we might use something like the following:

create index reg_can on registration(cancelled);

BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'STDMGMT',
    tabname    => 'REGISTRATION',
    method_opt => 'FOR COLUMNS cancelled SIZE AUTO');
END;
/

Based on this new data, the execution plan becomes:

----------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows | Bytes | Cost |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |      |       |    7 |



|*  1 |  FILTER                      |              |      |       |      |
|*  2 |   TABLE ACCESS BY INDEX ROWID| REGISTRATION |    1 |    17 |    7 |
|*  3 |    INDEX RANGE SCAN          | REG_CAN      |  754 |       |    2 |
----------------------------------------------------------------------------
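To confirm the histogram was actually created, one quick check against the 10g dictionary:

SELECT column_name, num_distinct, histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'REGISTRATION';

A value such as FREQUENCY in the HISTOGRAM column shows that Oracle now knows about the skew on CANCELLED.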

Monitor

The final step is to monitor the results. Even with very good testing, you will sometimes find that what worked in the test database does not work in production. Be sure to monitor the results when your end users begin using the newly tuned statement. Collect all the metrics again and understand how quickly it executes, what it waits on, etc. Then document the improvements to show the success of the tuning project. Again, document everything in easy-to-understand language. Monitoring is also important because it really is the start of the next tuning opportunity. Tuning is iterative, so now go after the next worst query.

About the Author

Dean Richards, Senior DBA, Confio Software

Dean Richards has over 20 years of Oracle and SQL Server project development, implementation, and strategic database architecture experience. Before coming to Confio, Dean held engineering positions at McDonnell Douglas and Daugherty Systems, an object-oriented solution provider for Anheuser-Busch. Dean was also a technical director for Oracle Corporation, managing all aspects of its broadband account, including short- and long-term technical planning and strategic alliances. As a highly successful liaison between management and technical staff, Dean has proven to be an effective collaborator implementing cutting-edge solutions. A double major in Computer Science and Mathematics, Dean graduated cum laude from Southern Illinois University.

About Confio Software

Confio Software develops performance management solutions for Oracle, SQL Server, DB2, and Sybase databases. Confio Ignite PI, which applies business intelligence analysis to IT operations, improves service levels and reduces costs for database and application infrastructure. The Confio Igniter Suite is an open, multi-vendor, agentless monitoring solution that gives DBAs and management the ability to detect problems, analyze trends, and resolve bottlenecks impacting database response time. Built on an industry best-practice Wait-Time methodology, Confio's Igniter Suite improves service levels for IT end users and reduces the total cost of operating IT infrastructure. Confio Software products are used today by customers in North America, Europe, South America, Africa, and Asia whose mission includes getting the most value out of their business-critical IT systems. Customers are reached directly through the Confio sales force and through a network of partners in the US and internationally. For more detailed information about Confio, e-mail us at [email protected], telephone us at 1.303.938.8282, or see us on the web at www.confio.com.

Confio Software
Boulder, Colorado, USA
(303) 938-8282
[email protected]
www.confio.com



Get More for Less: Enhance Data Security and Cut Costs

Ulf Mattsson, CTO, Protegrity Corporation

Dominic Dougherty, Protegrity Technical Support

Introduction

Data security plans often center on the "more is better" concept. These plans call for locking everything down with the strongest available protection, which results in unnecessary expense, frequent availability problems, and system performance lags. Alternatively, IT will sometimes shape its data security efforts around the demands of compliance and best-practices guidance, and then find itself struggling with fractured security projects and the never-ending task of staying abreast of regulatory changes. There is a better way – a risk-based classification process that enables organizations to determine their most significant security exposures, target their budgets toward the most critical issues, and achieve the right balance between cost and security.

In this article, I discuss the risk-analysis processes that can help companies achieve cost savings while measurably enhancing their overall data security profile by implementing a holistic plan that protects data from acquisition to deletion. This paper reviews different options for data protection in an Oracle environment and answers the question "How can IT security professionals provide data protection in the most cost-effective manner?" The paper presents methods to protect the entire data flow across the systems of an enterprise while minimizing the need for cryptographic services. It also reviews some PCI requirements and corresponding approaches to protecting the data, including secure encryption, robust key management, separation of duties, and auditing.

Payment Card Industry Data Security Standard (PCI DSS)

Encryption is a critical component of cardholder data protection. If an intruder circumvents other network security controls and gains access to encrypted data, then without the proper cryptographic keys the data is unreadable and unusable to that person. Other effective methods of protecting stored data should be considered as potential risk mitigation opportunities. For example, methods for minimizing risk include not storing cardholder data unless absolutely necessary, and truncating cardholder data if the full PAN (Primary Account Number) is not needed.

Oracle Database Security and PCI DSS

Oracle Database security provides powerful data protection and access control solutions to address PCI DSS requirements. Oracle Database Vault prevents highly privileged users from accessing credit card information and helps reduce the risk of insider threats with separation of duty, multi-factor authorization, and command rules. Oracle Advanced Security Transparent Data Encryption (TDE) provides the industry's most advanced database encryption solution, enabling encryption of credit card numbers with complete transparency to the existing application. Oracle Audit Vault consolidates and protects database audit data from across the enterprise; its reports and alerts provide proactive notification of access to credit card information. Oracle Enterprise Manager provides secure configuration scanning to ensure your databases stay configured securely. Oracle Label Security extends user security authorizations to help enforce the need-to-know principle. Here are examples of some powerful general functionality that Oracle provides to address different data security requirements, including PCI requirements:

To Make Data Un-readable

1. Data Masking
2. Column Level Encryption
3. Table Level Encryption

Page 99: TechJournal - NYOUG · Tanel Poder DEV 2 Oracle APEX API Primer Josh Millinger Niantic Systems DBA 2-2 ... Advanced SQL Performance Tuning Dean Richards Confio Software DEV 4 Tips

www.nyoug.org 212.978.8890 99

4. Database Backup Encryption
5. Network Traffic Encryption

Access Control

1. Access control with column filtering
2. Segregation of duties - protect data access from DBAs and privileged users
3. Maintain the identity of application users in access control
4. Multi-factor authorization - approved subnets, authentication methods, and time-based constraints
5. Row level access control
6. Multi-level security (MLS) & Mandatory Access Control (MAC)

Reporting

1. A table is accessed between 9 p.m. and 6 a.m. or on Saturday and Sunday.
2. A specific column has been selected or updated.
3. A specific value for this column has been used.
4. Capture the identity of application users in the database audit trail.
5. An IP address from outside the corporate network is used.
6. Reporting across multiple database brands.

PCI DSS - Protect Stored Cardholder Data

Render PAN, at minimum, unreadable anywhere it is stored (including data on portable digital media, backup media, in logs, and data received from or stored by wireless networks) by using any of the following approaches:

1. One-way hashes (hashed indexes)
2. Truncation
3. Index tokens and pads (pads must be securely stored)
4. Strong cryptography with associated key management processes and procedures

Protect Stored Cardholder Data in Oracle Databases

Oracle Advanced Security Transparent Data Encryption (TDE) can be used to encrypt the PAN on media and backups. Optionally, TDE can be used with Oracle RMAN to encrypt the entire backup when it is backed up to disk. Oracle Secure Backup provides a solution for backing up and encrypting directly to tape storage. Supported encryption algorithms include AES and 3DES with 128-, 192- (default), or 256-bit key lengths. TDE has key management built in. Encrypted column data stays encrypted in the data files, undo logs, and redo logs, as well as in the buffer cache of the system global area (SGA). SHA-1 and MD5 are used for integrity. TDE keys are stored in the database and encrypted using a separate master key that is stored in the Oracle Wallet, a PKCS#12 file on the operating system. The Oracle Wallet is encrypted using the wallet password; opening the wallet from within the database requires the 'alter system' privilege. Oracle Database Vault command rules can be implemented to further restrict who, when, and where the 'alter system' privilege can be executed. Please see 'Oracle Implementation - Sample code' for some implementation examples.
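As a flavor of what TDE column encryption looks like, a minimal sketch (the table and column names are made up for illustration; the wallet must already be created and configured in sqlnet.ora):

-- open the wallet so the master key is available
ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "wallet_password";

-- encrypt an existing PAN column transparently to the application
ALTER TABLE payment_card MODIFY (pan ENCRYPT USING 'AES192');

Queries and DML against PAYMENT_CARD are unchanged; encryption and decryption happen inside the database.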

Protection of the Oracle Encryption Keys

PCI DSS requires that encryption keys used for the encryption of cardholder data be protected against both disclosure and misuse. TDE keys are stored in the database and encrypted using a separate master key that is stored in the Oracle Wallet, a PKCS#12 file on the operating system. The Oracle Wallet is encrypted using the wallet password; opening the wallet from within the database requires the 'alter system'

Page 100: TechJournal - NYOUG · Tanel Poder DEV 2 Oracle APEX API Primer Josh Millinger Niantic Systems DBA 2-2 ... Advanced SQL Performance Tuning Dean Richards Confio Software DEV 4 Tips

www.nyoug.org 212.978.8890 100

privilege. Oracle Database Vault command rules can be implemented to further restrict who, when, and where the 'alter system' privilege can be executed. Oracle Database 11g TDE integrates with PKCS#11-compliant hardware vendors for centralized master key generation and management. TDE uses a FIPS-certified RNG (random number generator); the master key can also be generated using certificates or PKI key pairs. Please review 'Key Management with Oracle 11g TDE using PKCS#11' for more information on this topic.

The Risk Based Data Classification Process

Step 1: Determine Data Risk Classification Levels

The first step in developing a risk-based data security management plan is to determine the risk profile of all relevant data collected and stored by the enterprise, and then classify the data according to its designated risk level. This sounds complicated, but it is really just a matter of using common sense. Data that is resalable for a profit – typically financial, personally identifiable, and confidential information – is high-risk data and requires the most rigorous protection. Protection levels for other data should be determined according to the data's value to your organization and the anticipated cost of its exposure: would business processes be impacted? Would it be difficult to manage media coverage and public response to the breach? Then assign a numeric value to each class of data; for example, high risk = 5, low risk = 1. Classifying data precisely according to risk levels enables you to develop a sensible plan that invests budget and effort where they matter most.

Step 2: Map the Data Flow

Data flows through a company, into and out of numerous applications and systems. A complete understanding of this data flow enables an enterprise to implement a cohesive data security strategy that provides comprehensive protection and easier management, resulting in reduced costs. Begin by locating all the places relevant data resides – applications, databases, files, data transfers across internal and external networks, etc. – and determine where the highest-risk data resides and who has, or can gain, access to it (see the 'attack vectors' section below). Organizations with robust data classification typically use an automated tool to assist in the discovery of the subject data. Available tools will examine file metadata and content, index the selected files, and re-examine them on a periodic basis for changes. The indexing process provides a complete listing of, and rapid access to, data that meets the criteria defined in the scanning and classification process. Most often, the indices created for files or data reflect the classification schema of data sensitivity, data type, and geographic region. High-risk data residing in places where many people can access it is obviously data that needs the strongest possible protection. When the classification schema is linked to the retention policy, as described above, retention action can be taken based on the file indices, and reports based on the indices can be used to track the effectiveness of the data retention program.

While we're discussing data retention policies, it's important to remember that data disposal also needs to be a secure process; usually you'll opt to delete, truncate, or hash the data the enterprise no longer needs to retain. Truncation discards part of the input field. These approaches can reduce the cost of securing data fields in situations where you do not need the data to do business and never need the original data back again. It is a major business decision to destroy, truncate, or hash data: your business can never get that data back, and it may be more cost effective to transparently encrypt the data and not impact current or future business processes. In addition, the sensitive data may still be exposed in your data flow and logs prior to any deletion or truncation step.

Hash algorithms are one-way functions that turn a message into a fingerprint – a binary string at least twenty bytes long, to limit the risk of collisions. PCI DSS provides standards for strong encryption keys and key management but is vague on several points regarding hashing. Hashing can be used to secure data fields in situations where you do not need the data to do business and never need the original data back again. Unfortunately, a hash will be non-transparent to applications and database schemas, since it requires a long binary data type string. An attacker can easily build a (rainbow) table to expose the relation between hash values and real credit card numbers if the solution is not based on HMAC and a rigorous key management system. Salting the hash can also be used if the data is not needed for analytics.

Done properly, data classification begins with categorization of the sensitivity of the data (e.g. "public", "sensitive", "confidential"). Classification goes on to include the type of data being classified, for example "sensitive, marketing program", and, where applicable, the countries to which the data classification applies. The classification allows the


Done properly, data classification begins with categorization of the sensitivity of the data (e.g., "public," "sensitive," "confidential"). Classification goes on to include the type of data being classified, for example "sensitive, marketing program," and, where applicable, the countries to which the data classification applies. The classification allows the organization to automate the routines for flagging, removing, or archiving applicable data. Pay particular attention when automating the removal of data; consider instead alerting users with the appropriate privileges that the data requires attention. Additionally, an understanding of where all the sensitive data resides usually results in a project to reduce the number of places where the sensitive data is stored. Once the number of protection points has been reduced, a project to encrypt the remaining sensitive data with a comprehensive data protection solution provides the best protection, gives the business the flexibility it needs, and requires a smaller investment in data protection costs.

Step 3: Understand Attack Vectors (Know Your Enemy)
Use your data risk classification plan and the data flow map, along with a good understanding of criminals' favored attack vectors, to identify the highest risk areas in the enterprise ecosystem. Currently web services, databases, and data-in-transit are at high risk. The type of asset compromised most frequently is online data, not offline data on laptops, back-up tapes, and other media. Hacking and malware have proved to be the attack methods of choice among cybercriminals, targeting the application layer and data more than the operating system. But these vectors change, so keep an eye on security news sites to stay abreast of how criminals are attempting to steal data.

There are two countervailing trends in malware, both likely to continue. One trend is toward the use of highly automated malware that uses basic building blocks and can be easily adapted to identify and exploit new vulnerabilities. This is the malware that exploits unpatched servers, poorly defined firewall rules, the OWASP top ten, etc. It is aimed at the mass market: SMEs and consumers. The other trend is the use of high-end malware that employs the "personal touch": customization to specific companies, often combined with social engineering to ensure it is installed on the right systems. This is the type of malware that got TJX, Hannaford, and now Heartland, according to a recent report published on KnowPCI (http://www.knowpci.com). The point is: the more we create concentrations of valuable data, the more worthwhile it is for malware manufacturers to put the effort into customizing a "campaign" to go after specific targets. So, if you are charged with securing an enterprise system that is a prime target (or partner with or outsource to a business that is a major target), you need to ensure that the level of due diligence you apply to data security equals or exceeds that expended by malicious hackers, who are more than willing to work really, really hard to access that data.

Reports about recent data breaches paint an ugly picture. As of mid-March, Heartland Payment Systems had yet, by its own account, to determine exactly how many records were compromised in the breach that gave attackers access to Heartland's systems, which are used to process 100 million payment card transactions per month for 175,000 merchants. Given the size and sophistication of Heartland's business (it is one of the top payment-processing companies in the United States), computer-security experts say that a standard, in-the-wild computer worm or Trojan is unlikely to be responsible for the data breach. Heartland spokespeople have said publicly that the company believes the break-in could be part of a "widespread global cyber fraud operation."

Step 4: Choose Cost-Effective Protections
Cost-cutting is typically accomplished in one of two ways: reducing quality, or getting the most out of a business' investment. Assuming you've wisely opted for the latter, look for multi-tasking solutions that protect data according to its risk classification level, support business processes, and can change with the environment, so that you can easily add new defenses for future threats and integrate with other systems as necessary. Concerns about performance degradation, invasiveness, application support, and how to manage broad and heterogeneous database encryption implementations too often become hard barriers to adopting this important security measure. Some aspects to consider when evaluating data security solutions for effectiveness and cost-control include:

Access Controls and Monitoring
The threat from internal sources, including administrators, requires solutions that go beyond traditional access controls. Effective encryption solutions must provide separation of duties to protect the encryption keys. A centralized solution can also provide the most cost effective strategy for an organization with a heterogeneous environment. Although some of the legal data privacy and security requirements can be met by native DBMS security features, many DBMSes do not offer a comprehensive set of advanced security options; notably, many lack separation of duties, enterprise key management, security assessment, intrusion detection and prevention, data-in-motion encryption, and intelligent auditing capabilities. This approach is suitable for protection of low risk data.


Tokenization
The basic idea behind tokens is that each credit card number that previously resided in an application or database is replaced with a token that references the credit card number. A token can be thought of as a claim check that an authorized user or system can use to obtain the associated credit card number. Rule 3.1 of the PCI standard advises that organizations "Keep cardholder data storage to a minimum." To do so, organizations must first identify precisely where all payment data is stored. While this may seem simple, for many large enterprises it is a complicated task; the data discovery process can take months of staff time to complete. Security administrators must then determine where payment data should and should not be kept. It's pretty obvious that the fewer repositories housing credit card information, the fewer points of exposure and the lower the cost of encryption and PCI initiatives. All credit card numbers stored in disparate business applications and databases are removed from those systems and placed in a highly secure, centralized tokenization server that can be protected and monitored utilizing robust encryption technology. In the event of a breach of one of the business applications or databases, only the tokens could be accessed, which would be of no value to a would-be attacker.

Tokenization is a very hot buzzword, but it still means different things to different people, and some implementations can pose additional risk relative to mature encryption solutions. Companies are still required to implement encryption and key management systems to lock down various data across the enterprise, including PII data, transaction logs, and temporary storage, and a tokenization solution itself requires a solid encryption and key management system to protect the tokenizer. Organizations use card numbers and PII data in many different places in their business processes, and applications would need to be rewritten to work with the token numbers instead. This approach is suitable for protection of high risk data. Please see the discussion of tokenization in http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1126002 .

File-level Database Encryption
File-level database encryption has proven to be fairly straightforward to deploy, with minimal performance overhead, while providing convenient key management. This approach is cost effective since it installs quickly, in a matter of days, utilizes existing server hardware platforms, and can easily extend protection to log files, configuration files, and other database output. It also decrypts at the fastest possible place: installed just above the file system, it encrypts and decrypts data as the database process reads from or writes to its database files. This enables cryptographic operations on file-system blocks in chunks, rather than individually row by row, since the data is decrypted before it is read into the database cache; subsequent hits on this data in the cache incur no additional overhead. Nor does the solution architecture diminish database index effectiveness, but remember that the index is in clear text and unprotected within the database. This approach can also selectively encrypt individual files; it does not require that "the entire database" be encrypted. Database administrators can assign one or more tables to a table space file, and then policies can specify which table spaces to encrypt. Therefore, one need only encrypt the database tables that have sensitive data and leave the other tables unencrypted. However, some organizations choose to encrypt all of their database files because there is little performance penalty and no additional implementation effort in doing so.

Production environments often use batch operations to import or export bulk data files. If these files contain sensitive data, they should be encrypted when at rest, no matter how short the time they are at rest. (Note: some message queues, such as MQ Series, write payload data to a file if the message queue is backed up, sometimes for a few seconds and up to hours if the downstream network is unavailable.) It may be difficult to protect these files with column-level encryption solutions. File-level encryption can encrypt while still allowing transparent access to authorized applications and users. This approach is suitable for protection of low risk data. Be aware of the limitations of this approach: there is no separation of DBA duties, and operating system patches can cause issues. File encryption doesn't protect against database-level attacks; how are you going to effectively and easily keep administrators from seeing what they don't need to see with file-level encryption? Protection of high risk data is discussed below in the sections 'Field-level Encryption' and 'End-to-end encryption'.
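As a concrete illustration of the table space technique just described, Oracle 11g tablespace encryption (a file-level feature, distinct from the column-level TDE discussed later in this paper) lets a DBA create an encrypted table space and move sensitive tables into it. A minimal sketch, assuming a configured and open wallet; file and object names are illustrative:

CREATE TABLESPACE secure_ts
  DATAFILE '/u01/app/oracle/oradata/orcl/secure_ts01.dbf' SIZE 100M
  ENCRYPTION USING 'AES256'
  DEFAULT STORAGE (ENCRYPT);

-- Move a table holding sensitive data into the encrypted table space:
ALTER TABLE orders MOVE TABLESPACE secure_ts;

Everything written to that table space's datafile is then encrypted on disk, while access through the database remains transparent to applications.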


Experience from some organizations has shown that the added performance overhead for this type of database encryption is often less than 5%. However, before deciding on any database file encryption solution, you should test its performance in the only environment that matters: your own.

Field-level Encryption
Field-level full or partial encryption/tokenization can provide cost effective protection of data fields in databases and files. Most applications do not operate on, and should not be exposed to, all bytes in fields like credit card numbers and social security numbers; for those that do require full exposure, an appropriate security policy with key management and full encryption is fully acceptable. This approach is suitable for protection of high risk data. Continuous protection via end-to-end encryption at the field level safeguards information by cryptographic protection, or other field-level protection, from point of creation to point of deletion, keeping sensitive data or data fields locked down across applications, databases, and files, including ETL data loading tools, FTP processes, and EDI data transfers. (ETL, or Extract, Transform, and Load, tools are typically used to load data into a data warehousing environment.) This end-to-end encryption may utilize partial encryption of data fields and can be highly cost effective for selected applications like an e-business data flow.

Field-level Encryption and End-to-End Encryption
End-to-end encryption is an elegant solution to a number of messy problems. It is not perfect; field-level end-to-end encryption can, for example, break some applications, but its benefits in protecting sensitive data far outweigh these correctable issues. The capability to protect at the point of entry helps ensure that the information will be both properly secured and appropriately accessible when needed at any point in its enterprise information life cycle. End-to-end data encryption can protect sensitive fields in a multi-tiered data flow from storage all the way to the client requesting the data. The protected data fields may flow from legacy back-end databases and applications via a layer of Web services before reaching the client. If required, the sensitive data can be decrypted close to the client after validating the credentials and data-level authorization.

Today PCI requires that data be encrypted when it leaves the network, but not when it stays internal. Adding end-to-end encryption might negate some requirements PCI has today, such as protecting data with monitoring and logging, so the PCI Security Standards Council is looking at that in 2009. Data encryption and auditing/monitoring are both necessary for a properly secured system; it is not a matter of one versus the other. There are many protections that a mature database encryption solution can offer today that cannot be had with some of the monitoring solutions available. Installing malicious software on internal networks to sniff cardholder data and export it is becoming a more common attack vector, and by our estimates it is the most common vector in massive breaches, including TJX, Hannaford, Heartland, and CardSystems. Storage-layer or file-layer encryption doesn't provide the comprehensive protection we need against these attacks.

There is a slew of research indicating that advanced attacks against internal data flows (transit, applications, databases, and files) are increasing, and many successful attacks were conducted against data that the enterprise did not know was on a particular system. Using lower-level encryption at the SAN/NAS or storage system level can result in questionable PCI compliance, and separation of duties between data management and security management is impossible to achieve there. Please see the discussion of end-to-end encryption in http://papers.ssrn.com/sol3/papers.cfm?abstract_id=940287 . The Operational Impact Profile of different Database Protection Approaches is summarized in this diagram:


Compensating Controls
PCI compensating controls are temporary measures you can use while you put an action plan in place. Compensating controls have a "shelf life", and the goal is to facilitate compliance, not obviate it. The effort of implementing, documenting, and operating a set of compensating controls may not be cost effective in the long run. This approach is only suitable for temporary protection of low risk data.

Software-based Encryption
Many businesses also find themselves grappling with the decision between hardware-based and software-based encryption. Vendors selling database encryption appliances have been vociferously hawking their wares as a faster and more powerful alternative to software database encryption, and many organizations have bought into this hype based on their experiences with hardware-based network encryption technology. The right question, however, is about the topology, or data flow. The topology is crucial: it dictates performance, scalability, availability, and other very important factors. The topic is important, but the question is usually not well understood. Usually hardware-based encryption is remote and software-based encryption is local, but that has nothing to do with the form factor itself; it is about where the encryption happens relative to the servers processing the database information. Software that encrypts data at the table or column level within relational database management systems is far more scalable, and performs better on most enterprise platforms, when executing locally on the database server box. Software-based encryption combined with an optional low-cost HSM for key management operations provides a cost effective solution that proves scalable in an enterprise environment. The most cost effective solutions can be deployed as software, a soft appliance, a hardware appliance, or any combination of the three, depending on the security and operational requirements of each system. The ability to deploy a completely "green" solution, coupled with deployment flexibility, makes these alternatives very cost effective for shared hosting and virtual server environments as well. The green solution is not going away; there's too much at stake. Local encryption services are most often implemented in software and remote encryption services are frequently implemented in hardware:


Step 5: Deployment
Focus initial efforts on hardening the areas that handle critical data and are a high-risk target for attacks. Continue to work your way down the list, securing less critical data and systems with appropriate levels of protection. Be aware, though, that the conventional "linked chain" risk model used in IT security (treat the system as a chain, find the weakest link, and make it stronger) isn't the complete answer to the problem; there will always be a weakest link. Layers of security, including integrated key management, identity management, and policy-based enforcement, as well as encryption of data throughout its entire lifecycle, are essential for a truly secure environment for sensitive data.

It is critical to have a good understanding of the data flow in order to select the optimal protection approach at different points in the enterprise. By properly understanding the data flow we can avoid less cost effective point solutions and instead implement an enterprise protection strategy. A holistic, layered approach to security is far more powerful than the fragmented practices present at too many companies. Think of your network as a municipal transit system: the system is not just about the station platforms; the tracks, trains, switches, and passengers are equally critical components. Many companies approach security as if they are trying to protect the station platforms, and by focusing on this single detail they lose sight of the importance of securing the flow of information. It is critical to take time from managing the crisis of the moment to look at the bigger picture. One size doesn't fit all in security, so assess the data flow and risk environment within your company and devise a comprehensive plan to manage information security that dovetails with business needs. Careful analysis of use cases and the associated threats and attack vectors can provide a good starting point in this area.

It is important to note that implementing a series of point solutions at each protection point introduces complexity that will ultimately cause significant rework. Protecting each system or data set as part of an overall strategy and system allows the security team to monitor and effectively administer the encryption environment, including managing keys and key domains, without creating a multi-headed monster that is difficult to control. Centralized management of encryption keys can provide the most cost effective solution for an organization with multiple locations and heterogeneous operating systems and databases. All standards now require rotation of the Data Encryption Keys (DEKs) annually, and some organizations choose to rotate some DEKs more frequently (for example, those used by a disconnected terminal outside the corporate firewall, such as a point-of-sale system). Manual key rotation in a point solution would require an individual to deliver and install new keys every year on all the servers. Automated key rotation through a central key management system removes most of this cost and can reduce down time. Distributed point solutions for key management would require an initial investment for each platform, plus integration effort and the maintenance and operation of several disparate solutions. It is our experience that manual key rotation in a point solution environment inevitably leads to increased down time, increased resource requirements, and rework. Key management and key rotation are important enablers for several of the protection methods discussed above. Please see http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1051481 for more information on that topic.

Centralized management of reporting and alerting can also provide a cost effective solution for an organization with multiple heterogeneous operating systems and databases. This solution should track all activity, including attempts to change security policy, and should encrypt its logs to ensure evidence-quality auditing. Just as the keys should not be managed by the system and business owners, those owners should not have access to or control over the reporting and alerting logs and system. A system with manual or nonexistent alerting and auditing functionality can increase the risk of undetected breaches and increase audit and reporting costs.
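In Oracle TDE, for example, both rotation operations described above map onto a single DDL statement each (a sketch; the table name is illustrative, and the wallet must already be configured):

-- Regenerate the master key held in the wallet (or HSM):
ALTER SYSTEM SET ENCRYPTION KEY IDENTIFIED BY "wallet_password";

-- Re-key a table key, re-encrypting that table's encrypted columns:
ALTER TABLE credit_cards REKEY USING 'AES256';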


The impact of implementing the different Database Protection Approaches at different system layers is summarized in this diagram:

Build vs. Buy
Many projects have made the build vs. buy decision based purely on misconceived notions from management about one option or the other. This is a decision that requires analysis and insight: why re-invent the wheel if several vendors already sell what you want to build? Use a build-or-buy analysis to determine whether it is more appropriate to custom build or purchase a product. When comparing costs, include indirect as well as direct costs on both sides of the analysis. For example, the buy side should include both the actual out-of-pocket cost to purchase the packaged solution and the indirect costs of managing the procurement process. Be sure to look at the entire life-cycle costs for the solutions. Questions to ask include:

1. Is the additional risk of developing a custom system acceptable? 2. Is there enough money to analyze, design, and develop a custom system? 3. Does the source code have to be owned or controlled? 4. Does the system have to be installed as quickly as possible? 5. Is there a qualified internal team available to analyze, design, and develop a custom system? 6. Is there a qualified internal team available to provide support and maintenance for a custom developed system? 7. Is there a qualified internal team available to provide training on a custom developed system? 8. Is there a qualified internal team available to produce documentation for a custom developed system? 9. Would it be acceptable to change current procedures and processes to fit with the packaged software?

Data Protection in Testing, Staging and Production Environments
In some cases traditional data masking cannot provide the quality of data that is needed in a test environment. Masking can only be used to reduce the cost of securing data fields in situations where you do not need the data to do business and never need the original data back again. During the development lifecycle there is a need to perform high quality test scenarios on production-quality test data by reversing the data hiding process. Consistency of data protection tools is a very important strategy for assuring that sensitive data in each environment across the enterprise is properly protected and in compliance with growing regulatory requirements. Encryption and data protection is a strategically important process in an enterprise across test, operational, and archive environments. Several of the data protection approaches discussed earlier can assure that sensitive data is protected across development, testing, staging, and production environments. These approaches can also be used in outsourced environments and on virtual servers, where data-level protection can be a powerful way of enforcing separation of duties.

Oracle Data Masking
The Data Masking Pack for Databases helps organizations share production data in compliance with privacy and confidentiality policies by replacing sensitive data with realistic but scrubbed data based on masking rules. There are two


primary use cases for the Data Masking Pack. First, DBAs want to take a copy of production for testing purposes, use the Data Masking Pack to replace all sensitive data with innocuous but realistic information, and then make this database available to developers. Second, organizations want to share production data with third parties while hiding sensitive or personally identifiable information.
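Conceptually, a masking rule behaves like a scrambling update run against the cloned test schema. A rough sketch only (this is not the Data Masking Pack's actual mechanism, and the schema and column names are illustrative):

-- Replace real identifiers in the test copy with random but realistic values.
UPDATE test_copy.employees
   SET ssn    = LPAD(TO_CHAR(TRUNC(DBMS_RANDOM.VALUE(0, 999999999))), 9, '0'),
       salary = ROUND(DBMS_RANDOM.VALUE(40000, 120000), -2);
COMMIT;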

Limitations of Data Masking

1. The data masking process cannot be reversed. It is a one-way transformation (scramble, obfuscate, etc.).
2. Data used for test/development is masked according to different rules.
3. It can be used to de-identify confidential information to protect privacy only in a non-production environment.
4. It cannot generally be used for securing data in a production environment.
5. There is a need in the development lifecycle to support required test scenarios by reversing the data protection process to be able to perform a high quality test. A reversible protection process, auditability, and accountability are important in these situations.
6. If it is necessary to reload the test database, the data masking rules would need to be applied to the original data again, and this could take some time for a large database.

Data Protection in the Development Life Cycle:

Key Management with Oracle 11g TDE Using PKCS#11
Starting with Oracle 10.2, Oracle supports Transparent Data Encryption (TDE). Individual columns may be encrypted and, as the name indicates, this encryption is transparent to the database user. TDE is part of Oracle Advanced Security. Though TDE encryption is specified per column, keys are per table: all encrypted columns within the same table share the same key. These table keys are stored inside the database in a dictionary table, encrypted with a master key. The master key, which may be either a symmetric key or an asymmetric key pair, is not stored within the database but rather in an external module. This enables separation of duties, but also means a separate backup procedure for the master key is necessary.
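A minimal sketch of column-level TDE in practice (object names are illustrative; the wallet location must be configured in SQLNET.ORA as shown later in this paper):

-- Generate the master key and, on first use, the wallet itself:
ALTER SYSTEM SET ENCRYPTION KEY IDENTIFIED BY "wallet_password";

-- Encrypt a column; the table key is created and stored in the dictionary:
CREATE TABLE customers (
  id    NUMBER PRIMARY KEY,
  name  VARCHAR2(100),
  card  VARCHAR2(19) ENCRYPT USING 'AES192'
);

-- Or encrypt a column of an existing table:
ALTER TABLE customers MODIFY (card ENCRYPT USING 'AES192');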


In Oracle 10.2 the TDE master key is stored in an Oracle Wallet. This is a password-protected container in which different security objects, like certificates and private keys, may be stored, normally in PKCS#12 format. Wallets are not used solely for TDE; they are also used for keys and certificates associated with SSL communication. There are two wallet types. A standard-type wallet stores all credentials inside the wallet, whereas a PKCS#11-type wallet keeps some objects in an external PKCS#11 token. In the latter case the token password is stored inside the wallet, enabling access to the external token objects. Even though Oracle 10.2 uses wallets to protect the TDE master key, only the standard-type wallet is supported for TDE. If a PKCS#11 type is used, either errors are raised or the objects inside the PKCS#11 token aren't used. Hence TDE master keys can't be protected inside an HSM if the 10.2 database version is used.

Starting with Oracle 11.1, HSM support for Oracle TDE has been added. This support doesn't use a PKCS#11-type wallet, though; it has a separate, non-wallet-based implementation. The master key may be stored directly inside an HSM instead of a wallet. The HSM interface used is, however, still PKCS#11. Oracle 11.1 has also added support for Tablespace Encryption, which is basically file encryption; it has a separate master key without HSM support. Tablespace encryption is not considered in this paper, which looks only at Oracle TDE key management integration. As such, this paper also doesn't cover access control/audit. A general TDE consideration is that it really is transparent; access control is not part of TDE. Access control is instead accomplished with the Oracle Database Vault feature.

TDE uses its own internal format for encryption. As specified in the 11.1 version of the Oracle Advanced Security Administrator's Guide: "…Each encrypted value is associated with a 20 byte integrity check. In addition, transparent data encryption pads out encrypted value to 16 bytes. This means that if a credit card number requires 9 bytes for storage, then an encrypted credit card value will require an additional 7 bytes. Also, if data has been encrypted with salt, then each encrypted value requires an additional 16 bytes of storage. To summarize, encrypting a single column would require between 32 and 48 bytes of additional storage for each row, on average…" This is a bit cryptic, but presumably all encrypted values include a SHA-1 checksum; the cleartext plus checksum is padded to the block size before encryption, and there may also be an IV (salt) value attached to the data. This data format would be similar to DPS encryption, though Oracle TDE uses SHA-1 instead of CRC-32 for integrity, and integrity is not optional.
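Working through the quoted numbers for a 9-byte credit card number: padding to the 16-byte block size adds 7 bytes and the integrity check adds 20 bytes, for 27 bytes of overhead; salt, when used, adds another 16 bytes, for 43 bytes in total. Since padding overhead varies between 1 and 16 bytes with the cleartext length, the per-row overhead ranges roughly from 21 bytes (no salt, minimal padding) to 52 bytes (salt, maximal padding), which is consistent with the guide's stated average of 32 to 48 bytes.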

Performance When Using Oracle TDE
Since TDE is database internal it may be very fast, but performance will depend on how it is used:

1. Using a symmetric master key inside a wallet
2. Using an asymmetric master key inside a wallet
3. Using a symmetric master key inside an HSM

As seen in the Using Oracle TDE with HSM Master Key section below, table keys are loaded once per statement as necessary for execution. This doesn't affect performance much for the first option, where the key load process is fast, but for the others performance may suffer. The second option includes a private key decryption, which is rather slow compared to using a symmetric key. The third option includes an HSM decryption, which may be slow depending on the HSM used; here the HSM callout time is critical.

To give an example of Oracle TDE encryption, a script performance test may be used. This script measures performance when doing INSERT/UPDATE/SELECT of 10,000 rows in a table with a single encrypted column. In the tables below, the columns mean the following (reconstructed from the test description): SQL is the statement type tested (Sel Enc appears to be a SELECT on the encrypted column); Rows is the number of rows per statement; Iter is the number of iterations; CLEAR is the elapsed time in seconds without encryption; NOIDX and INDEX are two timed runs each with the column encrypted, without and with an index on it; and NOIDX/CLEAR and INDEX/CLEAR are the corresponding overhead ratios (999.99 where CLEAR rounds to zero).

If running a test with Oracle TDE 10.2 and a symmetric master key inside a wallet, one may get numbers as below. Each encryption test is performed twice, with and without an index on the encrypted column.

SQL       Rows   Iter  CLEAR   NOIDX          INDEX          NOIDX/CLEAR      INDEX/CLEAR
Ins          1  10000   0.58    0.81   0.90    1.22   1.31     1.40    1.55     2.10    2.26
Sel          1  10000   8.52    8.86   8.89    8.89   9.11     1.04    1.04     1.04    1.07
Sel         10   1000   1.03    1.17   1.17    1.16   1.19     1.14    1.14     1.13    1.16
Sel        100    100   0.27    0.39   0.39    0.39   0.39     1.44    1.44     1.44    1.44
Sel       1000     10   0.20    0.32   0.30    0.31   0.31     1.60    1.50     1.55    1.55
Sel      10000      1   0.20    0.29   0.31    0.30   0.30     1.45    1.55     1.50    1.50
Sel Enc      1      1   0.00    0.08   0.09    0.00   0.03   999.99  999.99   999.99  999.99
Upd          1  10000   0.50    0.77   0.78    1.93   1.72     1.54    1.56     3.86    3.44
Upd         10   1000   0.10    0.21   0.22    0.67   2.89     2.10    2.20     6.70   28.90
Upd        100    100   0.03    0.18   0.21    2.83   2.80     6.00    7.00    94.33   93.33
Upd       1000     10   0.04    0.17   0.17    3.80   5.50     4.25    4.25    95.00  137.50
Upd      10000      1   0.10    0.30   0.26    2.42   2.76     3.00    2.60    24.20   27.60
Del          1  10000   0.59    0.59   0.74    0.99   1.29     1.00    1.25     1.68    2.19

A TDE index will be used for equality comparisons (as used in the test script), but not for range-type comparisons, for which a table scan will occur. Concerning the use of an asymmetric master key inside a wallet, the Oracle Advanced Security Administrator's Guide says: "… encryption using current PKI algorithms requires significantly more system resources than symmetric key encryption. Using a PKI key pair as a master encryption key may result in greater performance degradation when accessing encrypted columns in the database..." Running the same performance test in 10.2 as for the symmetric master key above, using a 1024-bit RSA key, the performance numbers may look like:

SQL       Rows   Iter  CLEAR   NOIDX          INDEX          NOIDX/CLEAR      INDEX/CLEAR
Ins          1  10000   1.67   44.77  45.10   45.25  45.93    26.81   27.01    27.10   27.50
Sel          1  10000   8.39   53.58  53.64   53.57  53.52     6.39    6.39     6.38    6.38
Sel         10   1000   1.02    5.65   5.65    5.63   5.62     5.54    5.54     5.52    5.51
Sel        100    100   0.27    0.85   0.85    0.84   0.85     3.15    3.15     3.11    3.15
Sel       1000     10   0.20    0.35   0.36    0.36   0.36     1.75    1.80     1.80    1.80
Sel      10000      1   0.20    0.30   0.31    0.31   0.31     1.50    1.55     1.55    1.55
Sel Enc      1      1   0.00    0.10   0.09    0.00   0.01   999.99  999.99   999.99  999.99
Upd          1  10000   0.50   44.85  44.71   46.21  46.18    89.70   89.42    92.42   92.36
Upd         10   1000   0.10    4.63   4.61    5.12   5.12    46.30   46.10    51.20   51.20
Upd        100    100   0.03    0.61   0.61    1.41   0.88    20.33   20.33    47.00   29.33
Upd       1000     10   0.04    0.59   0.21    2.94   0.51    14.75    5.25    73.50   12.75
Upd      10000      1   0.33    0.27   0.43    2.56   1.11     0.82    1.30     7.76    3.36
Del          1  10000   0.66    0.59   0.62    1.26   0.85     0.89    0.94     1.91    1.29

If using a 512-bit key for comparison, the numbers may look like:

SQL       Rows   Iter  CLEAR   NOIDX          INDEX          NOIDX/CLEAR      INDEX/CLEAR
Ins          1  10000   1.43   26.82  27.22   27.85  27.38    18.76   19.03    19.48   19.15
Sel          1  10000   9.00   36.23  36.89   35.65  35.73     4.03    4.10     3.96    3.97
Sel         10   1000   1.07    3.92   3.92    3.85   3.86     3.66    3.66     3.60    3.61
Sel        100    100   0.28    0.67   0.67    0.67   0.66     2.39    2.39     2.39    2.36
Sel       1000     10   0.21    0.35   0.35    0.34   0.34     1.67    1.67     1.62    1.62
Sel      10000      1   0.20    0.31   0.31    0.31   0.32     1.55    1.55     1.55    1.60
Sel Enc      1      1   0.02    0.08   0.11    0.00   0.01     4.00    5.50     0.00    0.50
Upd          1  10000   0.51   26.94  26.81   28.46  28.33    52.82   52.57    55.80   55.55
Upd         10   1000   0.08    2.79   2.83    3.37   3.78    34.88   35.38    42.13   47.25
Upd        100    100   0.05    0.67   0.44    1.67   0.95    13.40    8.80    33.40   19.00
Upd       1000     10   0.03    0.29   0.20    1.88   1.25     9.67    6.67    62.67   41.67
Upd      10000      1   0.17    0.23   0.34    5.20   3.39     1.35    2.00    30.59   19.94
Del          1  10000   0.66    0.59   0.61    0.83   1.05     0.89    0.92     1.26    1.59


As seen above, using an asymmetric master key may in many cases be much slower than using a symmetric key. The numbers depend on the key size used, since the time for asymmetric operations is heavily dependent on that size. Some numbers are, however, still about the same as in the symmetric master key case. This variance is explained by Oracle's loading of the table keys once for each new statement. Single-row statements like Upd 1 have the table key loaded for each encryption, and hence are much slower in the asymmetric case; multi-row statements like Sel 10000, for which many decryptions are performed per key load, aren't affected that much.

What about performance for a symmetric master key inside an HSM? This will be rather dependent on the PKCS#11 library/HSM used. With a slow HSM, the results would typically look like the asymmetric case, since an HSM decryption is performed each time the table key is loaded; a fast, software-based library would be closer to the symmetric case.

Using Oracle TDE with HSM Master Key
To investigate how the TDE keys are handled when using an HSM for master key storage, and whether some type of integration would be possible, tests have been performed to see what types of PKCS#11 calls are produced by Oracle 11.1 in this context. To use the HSM support, TDE must first be configured for HSM utilization. If using the normal wallet-based approach, as also supported in 10.2, one would normally add a parameter in SQLNET.ORA like:

ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=FILE)(METHOD_DATA =
    (DIRECTORY = c:\oracle\admin\orcl\wallet)))

Here DIRECTORY points to where to find/put the Oracle wallet used. When using the HSM support, the ENCRYPTION_WALLET_LOCATION parameter should be changed to one of:

ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=HSM))

ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=HSM)(METHOD_DATA =
    (DIRECTORY = c:\oracle\admin\orcl\wallet)))

The first applies if there isn't any existing software wallet; the second would be used if migrating from a software wallet to HSM support. Oracle HSM integration is based on PKCS#11. Oracle searches for the PKCS#11 DLL to use based on a directory; the DLL should be put into a pre-defined path, which differs between UNIX and Windows:

UNIX:    /opt/oracle/extapi/[32,64]/hsm/{VENDOR}/{VER}/libapiname.ext
Windows: %SYSTEM_DRIVE%\oracle\extapi\[32,64]\hsm\{VENDOR}\{VER}\libapiname.ext

Here [32,64] indicates a 32- or 64-bit library, VENDOR is the library vendor, VER is the library version, and apiname.ext is the name of the PKCS#11 DLL. A 32-bit library named cryptoki.dll, with version 1.2.0, produced by Protegrity, would in a Windows environment with %SYSTEM_DRIVE% = c: be placed in:

c:\oracle\extapi\32\hsm\Protegrity\1.2.0\libcryptoki.dll

This type of DLL definition seems a bit premature; it would seem more logical to specify the path in ENCRYPTION_WALLET_LOCATION. But it is the type of handling used.


Example of Oracle Software Pricing
A company will typically require the following Oracle software options, at the list costs shown below, in addition to its Enterprise Edition licenses:

1. Oracle Advanced Security (TDE) 10g or 11g ($11.5K per processor)
2. Oracle Vault ($23K per processor)
3. Oracle Audit Vault ($57.5K per processor for Oracle sources and $3.5K per processor for SQL Server sources)
4. Oracle Label Security ($5,750 per processor)

Oracle Implementation - Sample Code Using DBMS_CRYPTO (Oracle 10G+)
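A minimal sketch of encrypting and decrypting a value with DBMS_CRYPTO (requires EXECUTE on the package; the inline key is for illustration only, since a real system keeps keys in a key store, never in code):

DECLARE
  l_mode   PLS_INTEGER := DBMS_CRYPTO.ENCRYPT_AES256
                        + DBMS_CRYPTO.CHAIN_CBC
                        + DBMS_CRYPTO.PAD_PKCS5;
  l_key    RAW(32)     := DBMS_CRYPTO.RANDOMBYTES(32);  -- illustration only
  l_clear  RAW(2000)   := UTL_I18N.STRING_TO_RAW('4111111111111111', 'AL32UTF8');
  l_cipher RAW(2000);
BEGIN
  l_cipher := DBMS_CRYPTO.ENCRYPT(src => l_clear, typ => l_mode, key => l_key);
  DBMS_OUTPUT.PUT_LINE(UTL_I18N.RAW_TO_CHAR(
      DBMS_CRYPTO.DECRYPT(src => l_cipher, typ => l_mode, key => l_key),
      'AL32UTF8'));
END;
/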

Using TDE

alter system set encryption wallet open authenticated by "remnant";

Using DBMS_FGA
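A minimal sketch of a fine-grained auditing policy with DBMS_FGA (schema, table, and policy names are illustrative); matching access is then recorded in DBA_FGA_AUDIT_TRAIL:

BEGIN
  DBMS_FGA.ADD_POLICY(
    object_schema   => 'APP',
    object_name     => 'CREDIT_CARDS',
    policy_name     => 'AUDIT_CARD_READS',
    audit_condition => NULL,             -- audit every matching statement
    audit_column    => 'CARD_NUMBER',    -- only when this column is touched
    statement_types => 'SELECT,UPDATE');
END;
/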


Conclusion
Risk-based prioritization replaces the all-too-common and costly triage security model (which is ineffective whether you are triaging based on compliance needs or the security threat of the moment) with a thought-out, logical plan that takes into account long-range costs and benefits and enables enterprises to target their budgets at the most critical issues first. It is a balanced approach that delivers enhanced security and reduced costs and labor, with the least impact on business processes and the user community.

The Oracle PCI compliance features are in general limited to the Oracle platform, except for the Oracle Audit Vault collection agent, which also works with SQL Server. Compliance auditing requires Oracle Audit Vault, controlling access to specific columns requires Oracle Label Security and Oracle Vault, and encrypting specific columns or entire tables requires Oracle Advanced Security (TDE). Consider adding a third-party product to provide cost effective management of the different data protection options, including encryption keys, policies, and reporting, across the entire enterprise environment. Alternatively, to meet PCI compliance on Oracle you can purchase Oracle Advanced Security TDE and tokenize the data while turning on full auditing of database table access, which adds at minimum a 10% overhead just for the auditing on 10g, and still does not provide detail-level reporting on access to the data. This also will not work with all applications, since in 10g data types change when the Oracle Advanced Security TDE option is used. In 11g, Oracle has overcome most of the limitations on data type alterations, which provides more compatibility with other applications.

Your Ad Here!

Vendors, place your advertisement in the NYOUG Tech Journal. Let our members know you want to do business with them.

Ad Options Available: Full Page – Black/White or Color

Half-Page – B/W only

Sponsorships: General Meeting – Primary and Secondary Special Interest Group

Journal Ad only Most sponsorship packages include color and/or black/white ads.


Advanced Report Printing with Oracle APEX Marc Sewtz, Oracle USA, Inc.

Introduction
Oracle Application Express (Oracle APEX), a feature of the Oracle Database, combines rapid web application development with the power of the Oracle Database. Its easy-to-use, browser-based application builder enables developers and non-programmers to develop and deploy data-driven web applications in very little time. This paper discusses the built-in reporting capabilities of Oracle APEX, with a focus on high-fidelity report printing and the integration of Oracle APEX with Oracle Business Intelligence (BI) Publisher and Apache FOP.

Oracle APEX provides a number of different reporting options. The first option, classic reports, makes it easy to create reports based on SQL queries and PL/SQL functions returning SQL statements, and provides full control over the generated HTML through report templates. Classic reports are widely used throughout the Oracle APEX IDE and are also the basis for the built-in multi-row update capabilities (tabular forms). The second option, interactive reports, introduced with Oracle APEX 3.1 and based on a new Web 2.0 reporting engine, provides the end user with advanced AJAX-based filtering and customization capabilities. The third reporting option is called report queries. Report queries are defined as shared components and allow reports to be defined centrally for use in many different places throughout the application. A report query can be based on multiple SQL source statements, allowing multiple reports, as well as charts, to be combined in a single high-fidelity report document.

In Oracle APEX, reports can be exported as PDF, Word, Excel, HTML, or XML documents. Once a reporting server has been configured and the definition details entered into Application Express, any report region can easily be exported as a printable report, including customizable report attributes and automatic page numbering. With the declarative print customization interface, a range of attributes associated with the output document can be defined, e.g. the report page size, background color, page header, etc. Using Oracle BI Publisher as the report server, "high fidelity" reports can be incorporated. Oracle BI Publisher provides an MS Word plug-in to develop complex report templates, which can include multiple tables (e.g. master-detail), charts, and parameters passed from Application Express. These report templates can then be loaded into Application Express and associated with report queries and report regions. Report queries can be accessed via a URL and called from a button or link.

Classic Reports
Using the application builder, reports can be quickly created as the formatted result of a SQL query. An easy report wizard guides the user through building a report without requiring knowledge of SQL. Features of the reporting engine include:
- Controlling report layout and pagination
- Column linking to other reports or charts
- Column breaks
- Column-based sorting
- Controlling when columns display
- Applying HTML-based expressions to column values
- Output to comma-delimited files, or printing to PDF, Word, or Excel documents

Interactive Reports
The interactive reporting region is an innovative new technology implementation that allows end users to customize reports. This reduces development time and effort while simultaneously enhancing application functionality. This dynamic reporting region allows users to:
- Customize the layout of the data by choosing the columns they wish to view/display and applying filters
- Enable highlighting and sorting
- Define breaks, aggregations, different charts, and their own computations


- Create multiple variations of the report and save them as named reports
- Output to comma-delimited files, or print to PDF, Word, or Excel documents

Figure 1: Interactive Report Region

Report Queries
Reports can be generated and printed by defining a report query as a shared component. A report query identifies the data to be extracted. Unlike SQL statements contained in regions, the SQL statements in report queries are always validated when the query is saved. Note that report queries must be based on SQL statements; PL/SQL functions returning SQL statements are currently not supported. A report query can be associated with a report layout and downloaded as a formatted document. If no report layout is selected, a generic layout is used. The generic layout is intended for testing and verifying a report query. When the generic layout option is used and multiple source queries are defined, only the first result set is included in the print document. The reports can include session state of the current application. To make these reports available to end users, they need to be integrated into an application: for example, they can be associated with a button, list item, branch, or other navigational component that allows the use of URLs as targets. Selecting that item then initiates the printing process.

Architecture All printing options in Oracle APEX share the same architecture. When the end user of the application clicks on a print link, the request is sent to the Oracle APEX engine (which is part of the Oracle database). The Oracle APEX engine then generates the corresponding report data in XML format and sends the XML file to the external reporting engine along with the report template in XSL-FO or RTF format. The external reporting engine then transforms the data and the template into a PDF document, which is returned to the database using either the convert Servlet that ships with BI Publisher 10.1.3.2 (formerly known as Oracle XML Publisher) or the apex_fop.jsp Java Server Page, which ships with Oracle APEX 3.0 and above. All of the architectural complexity is transparent to the end user and developer. Developers only need to enable report printing on the report attributes page and end users just need to click on a print link to initiate the download of the print report.


Figure 2: Oracle APEX Report Printing Architecture

Configuration Options
There are a number of print servers available that can be used with Oracle APEX. Oracle BI Publisher, OC4J with Apache FOP, or other XSL-FO processing engines, such as Cocoon, can be configured to process the supplied XML data and RTF or XSL-FO style sheets. If Oracle BI Publisher is chosen as the report server, a higher level of functionality is available. To accommodate the difference in functionality, Oracle APEX provides two levels of printing support: Standard and Advanced. With standard printing support, reports are limited to XSL-FO based report templates.

Standard Report Printing
The standard configuration can be implemented with Apache FOP or another standard XSL-FO processing engine. Beginning with Oracle APEX 3.0.1, a supported configuration of Apache FOP in conjunction with Oracle Containers for J2EE (10.1.3.2) is included. This provides declarative formatting of report regions and report queries with basic control over page attributes. These attributes include orientation, size, column heading formats, page header, and page footer.

Advanced Report Printing
The advanced configuration requires a valid license for Oracle BI Publisher. With the advanced configuration, all the capabilities of the standard configuration are available, plus the ability to define RTF-based report layouts developed using the BI Publisher Word Template Plug-In, and additional export formats. This provides easy graphical control over every aspect of your report. Logos can be added to pages, complex control breaks can be used, and full pagination control is available. Even charts can be embedded, and reports can be created that look exactly like standard government forms.

Advantages of Using BI Publisher within Application Express
Using Oracle BI Publisher as the Oracle APEX print server provides the following benefits:
- Seamless: print capabilities are fully integrated into Oracle APEX
- Multiple output formats: export to Word, Excel, and HTML output in addition to PDF


- Robust report layout: developers can use RTF-based templates, providing significantly greater control over control breaks, headers, and footers, and the ability to embed charts

- Support for non-western European fonts: superior localization capabilities
- Single-file export/import: RTF-based report layouts are part of your application definition, so they are exported and imported along with the application

In addition to the advantages of using Oracle BI Publisher within Oracle APEX, Oracle BI Publisher can benefit organizations in other ways:
- Scheduling and delivery
- Reports based on multiple SQL queries
- Heterogeneous database support
- Bursting and report caching

High Fidelity Report Printing Using Report Queries
Report queries are defined as shared components of an application. They are integrated with the UI of an application by associating them with a button, list item, branch, or other navigational component that allows the use of URLs as targets. Integration with buttons can be done declaratively as part of the create button wizard. For other components, report queries can be referenced by their unique names or IDs as part of a URL:

f?p=&APP_ID.:0:&SESSION.:PRINT_REPORT=[unique name or ID]

Creating a Report Query
The following output formats can be selected when creating and editing a report query. When using Oracle BI Publisher, all options are available; when using Apache FOP, only PDF and XML can be selected. To enable the end user to choose the output format at runtime, the format can also be determined based on either a page or application item.
- PDF: Adobe Portable Document Format
- MS Word: Microsoft Word Rich Text Format
- MS Excel: Microsoft Excel format
- HTML
- XML: Extensible Markup Language

When a print document is returned to the client, the user is typically presented with an open-file/save-as dialog. This is controlled by the so-called content disposition. To show the open-file/save-as dialog, the content disposition needs to be set to "Attachment". If the print document is to be shown inside the browser window rather than in an external window, the content disposition should be set to "Inline". The "Inline" option should also be chosen if MS Internet Explorer is used and writing temporary files to the local file system is prohibited by company policy; when opening a print document with content disposition set to "Attachment", MS Internet Explorer first attempts to write a temporary file to the file system.

Another important attribute when creating and editing report queries is the inclusion of session state. This is useful if, in addition to the report result set, application data is required for rendering the print document. An example is printing a master-detail form, where the detail data is stored in a report but the master record set is stored in page items. If the values of these items are to be shown in the print document, inclusion of session state needs to be enabled and the relevant items need to be selected.

A report query can be based on one or multiple SQL statements. Multiple SQL statements are used to include multiple reports in a single document, or to combine reports and charts. The SQL is parsed and validated upon creation of the report query and when editing. PL/SQL functions returning SQL statements are currently not supported with report queries. However, PL/SQL functions are supported with report regions, and report regions can be associated with RTF layouts just like report queries. After all SQL source queries have been entered, the report data can be exported in XML format. This can be either an XML file based on the XML structure of the result set, or an XML schema document. Both formats are supported data sources


for the Oracle BI Publisher Word Plug-In. Some third-party tools that can be used for the report design instead of Oracle BI Publisher only support XML schema as a data source.

<?xml version="1.0"?>
<ROWSET>
 <ROW num="1">
  <EMPNO>100</EMPNO>
  <ENAME>Jo Bloggs</ENAME>
  <JOB>CLERK</JOB>
  <SAL>100</SAL>
  <DNAME>ACCOUNTING</DNAME>
 </ROW>
 <ROW num="2">
  <EMPNO>100</EMPNO>
  <ENAME>Jane Doe</ENAME>
  <JOB>CLERK</JOB>
  <SAL>100</SAL>
 </ROW>
 ...
</ROWSET>

After creating a report layout, either as an RTF file using the Oracle BI Publisher Word Plug-In or as an XSL-FO style sheet, it can be uploaded into Oracle APEX as part of the report query creation wizard, or directly as a report layout under shared components. When uploading the layout in the report query wizard, a separate report layout holding the file is created automatically. As discussed above, report queries can be integrated with an application either declaratively in the create-button wizard, or by referencing a URL containing the report query name or ID.

The Oracle BI Publisher Word Plug-In allows data to be included in several different ways. Single values, such as page and application item values, can be included as fields. Report data can be included using the table wizard or the table/data dialog. Additionally, data can be visualized as a chart, rendered as a cross tab, and conditionally included or formatted. The chart wizard provides many common chart types, such as vertical and horizontal bar charts, stacked charts, line charts, pie charts, etc. There are also a number of color schemes, called styles, available, as well as options to apply a 3D look and gradient effects. The data chosen to be included in the chart can be aggregated using sum, average, or count operators.


Figure 3: PDF Report with Report and Chart

Including Dynamic Images in Report Queries
In addition to including data-driven components, such as reports and charts, it is also possible to include images in the report layout. Images can be included as static images, e.g. company logos, by simply placing them into the Word document. It is also possible to dynamically include images that are stored in the database. A common use case would be a report on product data that also includes product images. To include those images in a PDF report, two things are needed: the image needs to be included in the generated XML representation of the report data, and the RTF report layout needs to include instructions on what to do with the image information stored in the XML data. To get the BLOB values included in the XML data, they need to be base64 encoded. The Oracle BI Publisher Word Plug-In provides the ability to edit the attributes of a form field or report column and associate additional instructions with those fields. In order to render an image, the XSL-FO code snippet below needs to be added to the column holding the image data. This code specifies the image format (image/jpg in this case) and the column name:

<fo:instream-foreign-object content-type="image/jpg">
 <xsl:value-of select="IMAGE_ELEMENT"/>
</fo:instream-foreign-object>
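One way to produce that base64-encoded image data on the database side is a small PL/SQL helper around UTL_ENCODE (a sketch; the function name is illustrative, and the chunk size is kept a multiple of 3 bytes so each chunk encodes cleanly):

CREATE OR REPLACE FUNCTION blob_to_base64(p_blob BLOB) RETURN CLOB IS
  l_step PLS_INTEGER := 12000;  -- multiple of 3
  l_out  CLOB;
BEGIN
  FOR i IN 0 .. TRUNC((DBMS_LOB.GETLENGTH(p_blob) - 1) / l_step) LOOP
    l_out := l_out || UTL_RAW.CAST_TO_VARCHAR2(
               UTL_ENCODE.BASE64_ENCODE(
                 DBMS_LOB.SUBSTR(p_blob, l_step, i * l_step + 1)));
  END LOOP;
  RETURN l_out;
END;
/

The function's result can then be selected as a column of the report query, so the encoded image travels inside the generated XML.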

Using Web Services to Call Reports in Oracle BI Publisher from Oracle APEX
As discussed above, the integration of Oracle APEX with Oracle BI Publisher makes it easy to print high-fidelity reports directly from within your Oracle APEX applications, with full access to the database schemas associated with a workspace and access to an application’s session state. The integration of both products is transparent to the application developer and end user, and security and authentication need to be configured in only one place, the Oracle APEX workspace setup.

Yet some customers might already have reports defined within the Oracle BI Publisher repository and look for ways to call these directly from Oracle APEX, or want to take advantage of Oracle BI Publisher’s additional report delivery options (FTP, WebDAV, network printers, etc.). For those customers, Oracle APEX and Oracle BI Publisher can be integrated through Web Services, using the Web Services API introduced with Oracle BI Publisher 10.1.3.3.2.

Figure 4: Web Services-based Integration with Oracle BI Publisher

This technique differs from the previously discussed integration in several respects. With the built-in integration, Oracle APEX and the data on which the reports are based typically reside in the same database. Oracle BI Publisher, on the other hand, is installed in the middle tier on an application server, and the report data can come from any JDBC-compliant database (Oracle, SQL Server, MySQL, etc.), Web Service, or file-based data source. This requires, however, that the database connection information for the report data also be configured in Oracle BI Publisher itself, in addition to the existing configuration of the Oracle APEX application.

Before an Oracle APEX application can access a report in Oracle BI Publisher, a number of steps need to be taken. Using the Oracle BI Publisher administration interface, a JDBC database connection needs to be created. This connection should point to the same database and database schema that is used by the Oracle APEX application, unless the report and the application are based on different data sources. Once a database connection is defined, a report can be created. The Oracle BI Publisher user interface allows defining the underlying data sets by either typing in the SQL statement directly or using a graphical query builder similar to the one found in Oracle APEX.

After the creation of the data model for the Oracle BI Publisher report, a layout needs to be created. This is typically done using the Oracle BI Publisher Word Plug-In discussed earlier. As before, the data source for the layout creation can be the actual report data exported in XML format or an XML schema definition; in this scenario, the report data can also be accessed directly from the Word Plug-In by supplying the login information for the Oracle BI Publisher instance.

Once the report is completed, it can be viewed inside Oracle BI Publisher or delivered via FTP, WebDAV, email, fax, or a network printer. Delivery can be done directly or using the built-in scheduler. Using the scheduler, the report can be executed immediately, at a specific date and time, or at daily, weekly, or monthly intervals. Additionally, the delivery of a report can be requested through a Web Service, using the following URL for the WSDL request:

http://[host]:[port]/xmlpserver/services/PublicReportService?WSDL

A sample SOAP request for initiating the delivery of a report via Web Service is shown below. This sample assumes delivery to an FTP server configured as “myftp”, with “demo/demo” as username and password. The report name in this sample is “tasks” and the Oracle BI Publisher user account/password is "Administrator/Administrator".

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
 <soapenv:Body>
  <pub:scheduleReport xmlns:pub="…">
   <scheduleRequest>
    <deliveryRequest>
     <ftpOption>
      <ftpServerName>myftp</ftpServerName>
      <ftpUserName>demo</ftpUserName>
      <ftpUserPassword>demo</ftpUserPassword>
      <remoteFile>/myreport.pdf</remoteFile>
     </ftpOption>
    </deliveryRequest>
    <notificationTo>[email protected]</notificationTo>
    <notifyWhenFailed>true</notifyWhenFailed>
    <reportRequest>
     <attributeFormat>pdf</attributeFormat>
     <reportAbsolutePath>…/tasks.xdo</reportAbsolutePath>
    </reportRequest>
    <userJobName>tasks</userJobName>
   </scheduleRequest>
   <userID>Administrator</userID>
   <password>Administrator</password>
  </pub:scheduleReport>
 </soapenv:Body>
</soapenv:Envelope>

The generation and delivery of an Oracle BI Publisher report can be requested from an Oracle APEX application using the built-in support for Web Services. Using the WSDL URL and the SOAP request shown above, a Web Service Reference can be created under the Shared Components of an application. Web Service references are then available in the create page process wizards. So in order to initiate the report generation, all that is needed is an Oracle APEX page with a button that submits the page and executes the page process calling the Web Service reference.

This technique also allows for dynamically supplying arguments to Oracle BI Publisher. If the report has parameters defined, then these parameters can be included in the SOAP request. In the SOAP request, application and page item values can be referenced using the #ITEM_NAME# syntax.

Calling reports that are defined in Oracle BI Publisher from Oracle APEX, instead of defining them directly in Oracle APEX, can be useful, for example, when these reports had previously been defined in Oracle BI Publisher, to publish a report in HTML or PDF format to a web site via FTP, for long-running reports that are to be delivered asynchronously via email, or to print reports to a network-attached printer.
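To illustrate the #ITEM_NAME# substitution mentioned above, a fragment of the delivery request could, for instance, take the target file name from a page item (P1_FILE_NAME is a hypothetical item, not part of the sample above):

<remoteFile>/reports/#P1_FILE_NAME#.pdf</remoteFile>

At runtime, Oracle APEX replaces the #P1_FILE_NAME# reference with the item’s session state value before the SOAP request is sent.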

Tuning the Oracle Grid
Richard J. Niemiec, TUSC

Abstract
With Oracle 10g Enterprise Manager Grid Control, you now have a tremendous product at your side. This paper will look at a few of the screens that you can use to monitor the grid. BUT, I warn you, there are so many great screens and tools that I could never do the product justice in such a short paper. See the Oracle Database 10g Performance Tips and Techniques book for a detailed look at this product.

Clustering and an Acceleration toward Grid Computing
In June 1970, IBM’s Ted Codd published the 11-page paper “A Relational Model of Data for Large Shared Data Banks.” This article would lead to the relational databases that hold the world’s data today. Advances in information storage/extraction/analysis, as well as in predicting customer needs, have driven us deep into the information age. With 64-bit processing and grid control, it is theoretically possible to store all of the information currently held in every database in a single Oracle10g database (Oracle10g allows an 8E database) and load all of it in memory (64-bit addressing allows 16E). The advances of the last decade will be dwarfed by those of the next 10 years. Welcome to the 21st century DBA!

The information age is about to take a drastic step forward. The power of 64-bit computing can be imagined when you consider the theoretical limit for addressable memory. In unsigned 32-bit computing, we could directly address 4G of memory (the sign will cost you 2G). For a standard Oracle database, this allowed a tremendously increased System Global Area (SGA). The SGA is where the most often used data can be stored and kept in memory for fast access. The move to 64-bit starts to accelerate the information age exponentially. With 64-bit, the theoretical limit of addressable memory (2 to the power of 64) becomes 16E (exabytes), or 18,446,744,073,709,551,616 bytes of memory. Since it is estimated that there are only 12 exabytes of information in the entire world, 16 exabytes is a pretty healthy amount of memory (Larry can now run the entire world in a single Oracle database - WOL). Imagine storing every single piece of information on earth in one database and IN MEMORY. It is theoretically possible, although the physical architecture has not been needed (and hence, not built). Author Note: If you did store all databases on a single system, in memory, I predict there would be a major CPU bottleneck.

Now that we know that the future of hardware theoretically solves any amount of data we will ever need to store in our systems, let’s move to the database. How will we quickly and securely access that information? The answer, of course, is Oracle. While competing databases have some Oracle features, the information age requires a database that is incredibly fast and tunable while the system is running, completely available 24x7x52, completely recoverable at a moment’s notice, allows maintenance on information that is being accessed, altered, or even recovered, and allows for test recoveries and resumable recoveries of full or partial information. We also now require encrypted backups, encrypted data, and a way to manage it all with fewer resources. Welcome to the world of Oracle!

How do we scale the hardware that runs this database when we need more CPU power or want to service additional users? Welcome to the world of RAC (Real Application Clusters)! Real Application Clusters allow us to share ONE database while having MULTIPLE System Global Areas (SGAs) on multiple pieces of hardware. Using Oracle’s “cache fusion,” a process that moves data from one SGA to another when needed (via a high-speed fiber interconnect, saving costly disk I/O), we get the most scalable Oracle to date. Imagine a single database running on an 8-machine cluster (8 machines hooked up by cache fusion to the same database) with 256G of memory on each instance. That’s 2T of physical memory, making 1+ terabyte of combined SGA not impossible to imagine, along with 256 CPUs and 10T filesystems (required to get to an 8E database). I’m still anxiously awaiting the 16-exabyte hardware that will run my 8E database. Consider the amount of directly addressable memory from 4-bit to 64-bit and you can see where we are headed now that most hardware is moving rapidly toward 64-bit computing.

Bits     Memory to Directly Address    Indirect/Extended
4-bit    16                            (640)
8-bit    256                           (65,536)
16-bit   65,536                        (1,048,576)
32-bit   4,294,967,296
64-bit   18,446,744,073,709,551,616


Managing the Grid
One of the best screens for managing the grid is displayed below: click on a cluster to see whether its nodes are up or down, as well as the individual nodes themselves. Here is the cluster “IOUG” showing six nodes that are all up. To get to this screen, I just went to the Targets tab and clicked on the IOUG cluster.

Figure 1: Looking at the IOUG Cluster Database under Targets/Databases

If you move down the page a bit, you can see the instances (all using ASM) that are associated with this cluster, as shown in Figure 2.

Figure 2: Further down the Page of Figure 1 We See the Individual Nodes 1-6

If I click on the topology tab (see Figure 3), we can see the topology for all six instances (each instance is on a separate node, so there are also six separate nodes). Notice that my mouse is over one of the instances and additional information about the instance is provided.

Figure 3: Looking at the Topology of the 6 Nodes in the IOUG Cluster

If I click on the Performance tab and then click on the CPU Used chart (see Figure 4), I can see CPU performance for all nodes in the “IOUG” cluster, each in a different color.

Figure 4: Looking at CPU for 4 of the Selected Nodes in the IOUG Cluster

Running the AWR Report from Enterprise Manager
The Database Administration tab of Enterprise Manager can also be used at the instance level to run the Automatic Workload Repository (AWR) report. The link to the Automatic Workload Repository is an option available from the Administration screen only at the instance level. Once the AWR option is clicked, the AWR general information is displayed. This screen includes information on all snapshots and collection levels.

Figure 5: Database Administration Links (Instance Level)

In the example in Figure 6, there are 40 snapshots, with a retention of 25 days and an interval of 10 minutes (way too often - an hour may be a better interval).

Figure 6: Automatic Workload Repository (AWR)

By clicking the “Edit” button (see Figure 7), the interval or retention of the information may be changed. The collection level can also be edited here.
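The same settings can also be changed outside Enterprise Manager. A minimal sketch using the DBMS_WORKLOAD_REPOSITORY package (both parameters are in minutes; the values here are chosen to match the example above and the hourly interval suggested earlier):

BEGIN
   DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
      retention => 25 * 24 * 60,  -- keep 25 days of snapshots
      interval  => 60);           -- snapshot every hour instead of every 10 minutes
END;
/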

Figure 7: Automatic Workload Repository (AWR) Edit Settings

By clicking on the number of snapshots displayed on the AWR general information screen (the number 40, as shown in Figure 6), the 40 snapshots are listed as shown in Figure 8. The time that each snapshot was generated is listed along with its collection level.

Figure 8: Automatic Workload Repository (AWR) Snapshot Listing

Clicking on a specific snapshot to begin with and one to end with generates basic snapshot details, as shown in Figure 9 (like a very mini-Statspack); alternatively, clicking Report runs and displays the full AWR Report (Figure 10).
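Outside Enterprise Manager, the same report can be generated from SQL*Plus with the AWR reporting script shipped with the database; it prompts for the report format, the begin and end snapshot IDs, and a file name:

@?/rdbms/admin/awrrpt.sql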

Figure 9: Automatic Workload Repository (AWR) Snapshot Details

Figure 10: AWR Report Output

Figure 11 shows an extremely helpful screen, the Hang Analysis screen of Enterprise Manager, which allows you to view blocking sessions and resolve them. The screen remains very fast even when the system is overwhelmed. I usually kill the blocking sessions in SQL*Plus after finding them here.
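For reference, a typical kill from SQL*Plus looks like the statement below; the SID and serial# shown are placeholders for the values identified on the Hang Analysis screen:

ALTER SYSTEM KILL SESSION '123,456' IMMEDIATE;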

Figure 11: Hang Analysis Screen

And there’s much more to this tool, including interconnect information and global block transfer information. It is a great tool for monitoring the grid. See Rich Niemiec’s Collaborate ’08 talk on Grid Control for much more information.

References:
Oracle10g Performance Tuning Tips and Techniques, Niemiec, 2007
Oracle9i Performance Tuning Tips and Techniques, Niemiec, 2003
Oracle Performance Tuning Tips and Techniques, Niemiec, 1999
RAC SIG Event Update, Rich Niemiec & Murali Vallath
Oracle Clusterware, Oracle
Using Oracle Clusterware to Protect 3rd Party Applications, Oracle
IOUG Masters Tuning Class, Rich Niemiec, 2002
www.oracleracsig.org

Author Biography
Rich Niemiec, 47, is Rolta’s President of International EICT (Enterprise Information & Communications Technology) and President of TUSC – A Rolta Company. TUSC, a Chicago-based systems integrator of Oracle-based business solutions since 1988, was the Oracle Partner of the Year in 2002, 2004, 2007 & 2008. Rolta is an international market leader in IT-based geospatial solutions, and caters to industries as diverse as infrastructure, telecom, electric, airports, defense, homeland security, urban development, town planning, and environmental protection. Rich is the past President of the International Oracle Users Group (IOUG) and the current President of the Midwest Oracle Users Group (MOUG). Rich is one of six originally honored worldwide Oracle Certified Masters. In 2007, he authored the Oracle bestseller "Oracle10g Performance Tuning Tips & Techniques," an update of his previous two Oracle bestsellers on Oracle8i and Oracle9i performance tuning. Rich was inducted into the Entrepreneurship Hall of Fame in 1998.

Rolta TUSC is an expert-level consultancy that helps companies optimize their investment in Oracle technology. We have provided integrated functional and technical solutions since 1988 in the areas of Oracle’s E-Business Suite, Business Intelligence/Data Warehousing, Custom Development, Managed Services/Remote DBA, Database Services, Training & Mentoring, and Oracle Licensing.

Please report errors in this article to [email protected]. Neither TUSC nor the author warrants that this document is error-free. TUSC © 2009. This document cannot be reproduced without expressed written consent from an officer of TUSC, except that Collaborate 2008 may make copies and make this paper available as needed for the conference and proceedings.

Thanks, Oracle: Ken Jacobs, Debbie Migliore, Penny Avril, Maria Colgan, and Linda Smith.

Oracle Disclaimer: This paper is intended to outline Oracle's general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Special Thanks to: Angelo Pruscino, Murali Vallath, Sohan DeMel, Erik Peterson, Debbie Migliore, Sudip Datta, Philip Newlan, Jagan Athreya, Anil Khilani, Bharat Paliwal, Ron Weiss, Ara Shakian, Prabhaker Gongloor (GP), Barb Lundhild, Kirk McGowan, Carol Colrain, Stefan Pommerenk, Troy Anthony, Sue Tang, Joshua Solomin, Eunhei Jang, Supratim Chowdhury, Sourav Mukherjee and Khethavath Singh (KP).

Practical Data Masking: How to Address Development and QA Teams' Seven Most Common Data Masking-related Reactions and Concerns

Ilker Taskaya and Clodine Mallinckrodt, Axis Technology, LLC

Data Masking

Easier Said Than Done

Data Masking is the act of replacing sensitive data with fictitious but realistic data in order to eliminate the risk of exposure to unauthorized parties. It’s easier said than done. Implementing a sustainable data masking program in a modern enterprise environment can be surprisingly challenging – not just technically and organizationally, but culturally. The reactions and concerns you’ll encounter from your development and QA teams are understandable. You’re not just introducing new controls over confidentiality exposures in development and testing systems. You’re shutting off developers’ and testers’ unfettered access to live customer Production data – something that’s been quite normal and convenient for them to date. Something they think they just can’t do their job without.

Any project is only as successful as its level of buy-in from all involved. Getting colleagues to “do things a little differently” than they traditionally have means getting them on board early and anticipating their concerns. This article recounts lessons learned from years of implementing data masking programs at some of the largest financial services companies.

1. Post-masking Stress Disorder

Helping Your Loved Ones Cope with the Loss of Their Real Production Data

Developers balk at the idea of losing access to Production data. A typical initial response is: “We won’t be able to test! The application won’t work. I can’t do my job like that…” First, remind them that this is an industry-wide initiative: new and evolving government and state laws deem it a necessary process change. Plus, in this economic environment, the organization simply cannot risk its reputation by flying in the face of compliance regulation. Not to mention, the costs associated with an exposure can run in the millions of dollars – would you want your team to be found responsible for that?

Next, acknowledge that the target data must be realistic and usable. (Who would want an app going live without real-world testing done first?) For example, discuss your plan to replace customer names with real names, not encrypted gibberish, blanks, X’s, etc.

Finally, offer to demonstrate that your approach to data masking will provide usable, production-like fictitious data: “Let us show you – just give us a sandbox, then check out the results before giving approval.” Make it a test, a pilot program, or whatever makes most sense for your business. Just build it into your project plan as part of the data masking development and rollout process, so that the demo folds seamlessly into the formal execution of the program. Validation of the masked data – typically done in iterations – is a natural checkpoint.

Example Extract of a Data Masking Project Plan

In this approach, you give your development teams the responsibility to evaluate and approve the masked data. They can see and experience for themselves how well an intelligent approach to data masking works. As a result, you’re not just empowering them in the process. You’re also ensuring you achieve your own goal of providing them with truly usable data.

2. You Sunk My Database!

When It Comes to Meeting Application Testing Requirements, Referential Integrity Is Just the Tip of the Iceberg

Known data interaction points can add to the fear of data masking – not to mention the (usually correct) gut feeling that there are many undocumented ones as well. Your teams may protest: “These applications need to talk to each other even after they’re masked.”

This is precisely why a holistic approach to data masking is critical to success. Plenty of CIOs, panicked to prove themselves in compliance, fire off a host of independent data masking projects instead of leading a more coordinated initiative across systems. Sadly, they think they’re saving money and hassle by letting each team govern themselves – something the teams would rather do anyway. Inevitably, everyone has to undo and then redo the work. Not only does this waste precious time and money; worse yet, credibility is lost internally and data masking becomes even more unwelcome.

Integrated Systems and Fields

Analyzing the data is one task. Analyzing the integration of environments is another. Determine which systems must be masked in sync and link them in the project plan. For masking rollout, this also means coordinating schedules between the respective teams – from planned releases and reporting schedules to vacation days. Also determine which systems only receive data; this will help you decide which systems need to be masked first and then used to feed downstream applications.

A more subtle challenge is the interaction of data elements within the same system. In some cases, a data element may need to be preserved in order for the application being tested to perform correctly.

For example, the zip code used by a statement mailing program may need to survive masking. Evaluate its context – that is, the other sensitive data elements present with it. The good news is that no data element all by itself is a risk. Given this, if you ensure that the elements related to it (those that, when present, would ‘give it away’) are masked, you can preserve its value.

Another case of interaction of data elements within the same system is interrelated fields. For example, tax accounting systems need securities to retain their original tax properties in order for the production numbers to be useful. This is where using the right kind of algorithm technique is key.

Algorithm Selection

This point cannot be stressed enough: selecting the best technique to mask each data element is context-dependent. You must give yourself options.

Types of Algorithms and Their Transformations

When evaluating a data masking algorithm, determine to what degree it:
- Preserves length and character set (as in, provides realism)
- Provides consistency across occurrences
- Maintains uniqueness and referential integrity (RI)
- Allows unmasking when needed (a reversible algorithm)
- Deters hackers (hardness verified)

As you can see, when it comes to meeting application testing requirements, referential integrity is just the tip of the iceberg. By evaluating integrated systems, identifying interrelated fields, and deciding on the right algorithm for each data element masked, applications and interactions of systems are not ‘broken’ by masking. They work seamlessly, because masked data from one system matches masked data from another, just as Production data does.
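As a concrete illustration of consistency-preserving substitution, here is a simplified SQL sketch of the general technique – not DMSuite’s or any vendor’s actual algorithm; the CUSTOMER and FAKE_NAMES tables are hypothetical, with FAKE_NAMES holding 1,000 fictitious names with IDs 1 through 1,000:

-- Deterministically replace each real name with a fictitious one: the same
-- input always maps to the same fake name, so occurrences stay consistent
-- across tables and name-based joins keep working after masking.
-- Note: a plain hash can collide, so strict uniqueness would need a stronger
-- scheme (e.g. a one-to-one lookup) in a real implementation.
UPDATE customer c
   SET c.last_name = (SELECT f.name
                        FROM fake_names f
                       WHERE f.id = ORA_HASH(c.last_name, 999) + 1);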

3. A ‘Zero Business Value’ Project…

How to Make Friends with Testing Teams and Show ROI at the Same Time

It is natural for Development and Testing teams to resist a data masking initiative. After all, who sees themselves as the kind of person who would sell customer data on the black market or accidentally leave a loaded laptop in the shopping mall? Malicious or just plain silly, mistakes in the form of data breaches happen. And they cost millions of dollars, not to mention customer confidence. With data masking, role-based access control, or straightforward lock-down, you are fundamentally changing what has been a bad habit to date – playing with customer, employee, and other company-confidential information. Your colleagues could easily say: “This is going to slow down and complicate my work.”

To encourage adoption of data masking, a two-pronged approach works best. First, place more stringent controls on Production data – that is, make it harder to get. From requiring a lengthier request-and-approval process to further limiting system access: instead of just ‘taking away’ what has been available to date, keep it but make it less and less convenient over time. Next, streamline integration of the data masking process with the current data provisioning process, so that getting good masked test data is just as fast – if not faster – than getting real, live Production data. At the same time, employ automation in the design of your data masking solution wherever possible. This is one of the very best ways to minimize the impact of the initial data masking effort on your development and testing teams, because data identification and masking-rule algorithms naturally lend themselves to automation. A further benefit of an automated approach is that the tools used for automating the data masking process are the very tools that ease the burden of periodic audits.

Automation in Data Masking Development and Maintenance

Another benefit of an automated approach is that it creates greater efficiency in other areas besides data masking. Documenting and adding automation to the data provisioning process enables development teams to understand their data and system integration points better. It gives them the opportunity to formalize their testing procedures and apply more structure to their SDLC. The results can be dramatic. For example, a developer may no longer just say “this account has a bug”; with a better understanding of how the application works, bug- and break-fixing becomes faster and easier.

4. Oh Yeah? Prove It

Showing Results to ISOs, Auditors, Regulators, and Sponsors

If your organization has attempted data masking in the past, you might hear: “We did masking last year, so we’re all set.” However, saying a masked system is still ‘clean’ and proving it are two different things. Auditors naturally want proof. To minimize the impact on your development and testing teams, employ smart tools, process, automation, and audit trails in the ongoing monitoring and periodic testing of masked environments. Developing and maintaining a Sensitive Data Inventory enables your teams to account for and address natural changes in the data. If new sensitive data elements are introduced, they are detected and rolled into the masking process. However, unmasked sensitive data can be reintroduced into a masked system in a number of ways – though none of them is ‘permitted’. For example, a tester may manually reload some favorite old test data and not even realize it contains old Production data. Or, in haste on a new integration project, a developer may manually load data from an external source. For this reason, ‘Certification’ is considered the last step in the data masking process – that is, formal testing, documentation, and sign-off that the environment is masked. It is also an ongoing step for maintenance.

Example Extract of a Data Masking Project Plan

The degree to which you ‘prove’ that a masked system contains no unmasked sensitive data depends entirely on the compliance culture of your organization. An audit check can be as simple as the DBA spot-checking data and signing off on it, or as involved as demonstrating masking in action. In any case, having the environment and its Sensitive Data Inventory checked through automated identification and monitoring tools will save countless hours and provide peace of mind about the state of the system. The automated process and tools make it easy to keep a masked environment ‘clean’ – and to show that it still is.

5. Just Not in My Backyard

Where Data Masking Fits into Your Overall Information Security Framework of Controls

Your organization likely has other data protection controls in place. You may hear: “We already don’t just give ‘anyone’ access…” However, there is no such thing as a single data protection control. For example, no one wants to limit access to the point of hindering the speed of break-fixing. Keep in mind that data masking is one of several tools in your Data Security Toolkit, and not every environment needs or should be masked. Other tools include role-based access control, secure FTP, and, in extreme cases, completely shutting off access. The use of each tool depends on how appropriate it is for the context of the environment. When the wrong tool is employed, productivity can grind to a halt as access issues arise.

Data Confidentiality Controls

Evaluate when and where data masking should be ‘on’ or ‘off.’ Note that data masking controls confidentiality exposures in the stages of the development lifecycle where flexibility is paramount – such as Dev, SIT, and QA. Knowing when and where to employ data masking versus other data confidentiality controls (RBAC, lockdown, etc.) helps your organization avoid a slowdown in productivity. Because applications evolve and data is always being refreshed, masking is not a one-time event. It may be provisioned by a DBA or run regularly with a tool; either way, it becomes part of the process.

6. Who’s Going to Clean up This Mess?

Integrating Data Masking into the Application Development Lifecycle

When developers and testers hear that data masking is to become part of their ongoing process, their knee-jerk reaction will be: “How am I supposed to get my job done if I have to mask data at every step of the way?” They’ll be expecting chaos and won’t want any part of it. The truth is, data masking can be seamlessly integrated into the normal data provisioning process. In fact, to minimize the impact on developers, this approach is critical. In many cases, the set-up of the current data provisioning process can be maintained with few, and only minor, adjustments. To achieve this, analyze each SDLC instance to determine the best potential ‘in-point’ for data masking.

Integrating Data Masking in the Data Provisioning Portion of the SDLC Process

In designing your data masking solution, be sure to:

1. Document the data provisioning portion of your SDLC process with Visio, Erwin or the like
2. Analyze your findings
3. Identify the most appropriate ‘in-points’ for data masking
4. Identify the most appropriate form for your data masking solution, such as mask-on-demand, daily data provisioning or a trusted data repository
5. Identify where automation can be applied to the solution

Note that there are a number of ways to implement data masking: from building an enterprise-wide Center of Excellence, to maintaining a constant masked copy of a Production environment, to daily provisioning of masked data, to self-service masking-on-demand. The best option depends, of course, on your organization’s size, goals, complexity, integration points, and budget. When data masking is appropriately integrated into the data provisioning portion of the SDLC process, it seamlessly becomes part of the normal activities. Data masking becomes part of the ongoing process, yet remains as behind-the-scenes as possible.

7. Can’t We All Just Get Along?

Options for a Shared Center of Excellence Architecture for Data Masking

Your testers and developers may rightly point out that different business units may elect to mask the same data in very different ways. This is a logical concern: “Sure, this’ll work for us, but our systems feed and receive data across businesses.” Even if your initiative is local, keeping an eye toward the enterprise level can have huge payoffs later. Build at least a basic-level Center of Excellence for Data Masking; you’ll be able to leverage your work not just for other business units, but for your own as it expands. Achieve economies of scale by enabling areas of your business to share knowledge and resources across:
- Process
- Technology
- Organization

Optional Components of a Data Masking Center of Excellence

A Data Masking Center of Excellence can take on a variety of shapes and sizes. It all depends on which components most logically fit your enterprise. From recommending Best Practices to sharing Centralized Services, your group can leverage its data masking experience to help inform and lead others through the process.

Data Protection Options for Data Masking and More

Axis Technology, LLC offers data protection Consulting Services. Areas of expertise include assessments, access control, database logging, data transmission encryption, process design, pilot program guidance, COEs, and more. DMSuite™ is Axis’ robust, proprietary tool with data masking and self-service provisioning functionality. Straight out of the box, this software enables you to easily protect, provision, and audit sensitive data from most known data sources (DB2, Oracle, Sybase, VSAM, MSSQL, flat files, etc.).

Data masking clients of Axis Technology, LLC include multiple businesses within Fidelity Investments and Citigroup, as well as American Student Assistance, New York Life, and Wellington Management. Contact [email protected] to learn more or to comment on this article. © 2009 Axis Technology LLC.

NYOUG 2009 Sponsors

The New York Oracle Users Group wishes to thank the following companies for their generous support.

Compuware (www.compuware.com) Confio Software (www.confio.com)

Corporate Technologies (www.cptech.com) Fadel Partners (www.fadelpartners.com) IBM Systems (www.ibm.com/systems/)

Oracle GoldenGate (www.oracle.com/goldengate/) Oracle Corporation (www.oracle.com)

Quest Software (www.quest.com) Rolta TUSC (www.rolta.com)

Sun Microsystems (www.sun.com) Texas Memory Systems (www.texmemsys.com)

XDuce/LearnDBA (www.xduce.com, learndba.com)

Contact Sean Hull and Irina Cotler for vendor information, sponsorship, and benefits.
