TIBCO® Clarity Examples
Software Release 1.3
August 2014
Two-Second Advantage®


Important Information

SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE IS SOLELY TO ENABLE THE FUNCTIONALITY (OR PROVIDE LIMITED ADD-ON FUNCTIONALITY) OF THE LICENSED TIBCO SOFTWARE. THE EMBEDDED OR BUNDLED SOFTWARE IS NOT LICENSED TO BE USED OR ACCESSED BY ANY OTHER TIBCO SOFTWARE OR FOR ANY OTHER PURPOSE.

USE OF TIBCO SOFTWARE AND THIS DOCUMENT IS SUBJECT TO THE TERMS AND CONDITIONS OF A LICENSE AGREEMENT FOUND IN EITHER A SEPARATELY EXECUTED SOFTWARE LICENSE AGREEMENT, OR, IF THERE IS NO SUCH SEPARATE AGREEMENT, THE CLICKWRAP END USER LICENSE AGREEMENT WHICH IS DISPLAYED DURING DOWNLOAD OR INSTALLATION OF THE SOFTWARE (AND WHICH IS DUPLICATED IN THE LICENSE FILE) OR IF THERE IS NO SUCH SOFTWARE LICENSE AGREEMENT OR CLICKWRAP END USER LICENSE AGREEMENT, THE LICENSE(S) LOCATED IN THE “LICENSE” FILE(S) OF THE SOFTWARE. USE OF THIS DOCUMENT IS SUBJECT TO THOSE TERMS AND CONDITIONS, AND YOUR USE HEREOF SHALL CONSTITUTE ACCEPTANCE OF AND AN AGREEMENT TO BE BOUND BY THE SAME.

This document contains confidential information that is subject to U.S. and international copyright laws and treaties. No part of this document may be reproduced in any form without the written authorization of TIBCO Software Inc.

TIBCO, Two-Second Advantage, TIBCO ActiveSpaces, TIBCO Patterns, TIBCO Cloud Marketplace, TIBCO Spotfire, TIBCO MDM, and TIBCO GeoAnalytics Builder are either registered trademarks or trademarks of TIBCO Software Inc. in the United States and/or other countries.

Enterprise Java Beans (EJB), Java Platform Enterprise Edition (Java EE), Java 2 Platform Enterprise Edition (J2EE), and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle Corporation in the U.S. and other countries.

All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only.

THIS SOFTWARE MAY BE AVAILABLE ON MULTIPLE OPERATING SYSTEMS. HOWEVER, NOT ALL OPERATING SYSTEM PLATFORMS FOR A SPECIFIC SOFTWARE VERSION ARE RELEASED AT THE SAME TIME. SEE THE README FILE FOR THE AVAILABILITY OF THIS SOFTWARE VERSION ON A SPECIFIC OPERATING SYSTEM PLATFORM.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

THIS DOCUMENT COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THIS DOCUMENT. TIBCO SOFTWARE INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS DOCUMENT AT ANY TIME.

THE CONTENTS OF THIS DOCUMENT MAY BE MODIFIED AND/OR QUALIFIED, DIRECTLY OR INDIRECTLY, BY OTHER DOCUMENTATION WHICH ACCOMPANIES THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY RELEASE NOTES AND "READ ME" FILES.

Copyright © 2013-2014 TIBCO Software Inc. ALL RIGHTS RESERVED.

TIBCO Software Inc. Confidential Information


Contents

Figures

Preface
    Related Documentation
        TIBCO Clarity Documentation
        Other TIBCO Product Documentation
        Third-Party Documentation
    Typographical Conventions
    Connecting with TIBCO Resources
        How to Join TIBCOmmunity
        How to Access TIBCO Documentation
        How to Contact TIBCO Support

Chapter 1 Example Overview
    Sample Datasets Overview

Chapter 2 Working with the Sample-customers Dataset
    Sample-customers Dataset Overview
    Creating the Dataset and Project
    Analyzing Data (Basic Tutorial)
        Row Analysis
        Column Analysis
    Validating Data
        Defining Data Types
        Validating
    Analyzing Data (Advanced Tutorial)
        Charting Data
        Dependency Analysis
        Pattern Faceting of String
    Address Cleansing
    Transforming and Deduping
        Transforming Data
        Deduping Data

Chapter 3 Working with the Sample-patients Dataset
    Sample-patients Dataset Overview
    Analyzing Data (Basic Tutorial)
        Row Analysis
        Column Analysis
    Validating Data
        Defining Data Types
        Validating
        Analyzing Validation Results
        Exporting Validation Results
    Analyzing Data (Advanced Tutorial)
        Faceting Data
        Dependency Analysis
        Charting Data
    Transforming and Cleansing Data
        Transform Column
        Converting Date Format
        Trimming Spaces
        Removing Empty Rows
        Removing and Modifying Invalid Data
    Delivering Data
        Reloading Dataset with a ZIP File
        Applying Transformation Rules
        Exporting Data

Index


Figures

Figure 1 Sample Datasets and Projects
Figure 2 Upload Data from Dropbox
Figure 3 Manual Map Data
Figure 4 Auto Map Data
Figure 5 Minimum Column Size for Project 1 of Sample-customers Dataset
Figure 6 Validation Results for Project 1 of Sample-customers Dataset
Figure 7 Bar Chart Distribution of Gender
Figure 8 Column Dependency Check
Figure 9 Pattern Facet on Column SSN
Figure 10 ZIP Column Transformation
Figure 11 Column of Copy DOB
Figure 12 Month Column for Project 1 of Sample-customers Dataset
Figure 13 Transformed Month Format
Figure 14 Row Analysis for Project 1 of Sample-patients Dataset
Figure 15 Column Analysis for Project 1 of Sample-patients Dataset
Figure 16 Defined Validation Rule for Columns
Figure 17 Validation Result
Figure 18 Column Analysis for Validation Results
Figure 19 Quartile Diagram of DBP
Figure 20 Normal Diagram of DBP
Figure 21 Validation Errors Facet
Figure 22 PATNO Facet
Figure 23 GENDER Facet
Figure 24 VISIT Facet
Figure 25 HR Facet
Figure 26 Column Dependency Check
Figure 27 Duplicate Check with Bar Chart
Figure 28 Invalid Data Check with Bar Chart
Figure 29 Pie Chart for GENDER Column
Figure 30 Transformed DX Column
Figure 31 Reloaded Patients Data
Figure 32 Cleansed Patients Data


Preface

TIBCO Clarity is a data preparation and cleansing tool. You can use it as a cloud-based service without installation, setup, or configuration. With it, you can quickly and efficiently analyze, profile, standardize, and transform raw data collated from disparate sources.

Topics

• Related Documentation, page viii

• Typographical Conventions, page xi

• Connecting with TIBCO Resources, page xiii


Related Documentation

This section lists documentation resources you may find useful.

TIBCO Clarity Documentation

The following documents form the TIBCO Clarity documentation set:

• TIBCO Clarity Users’ Guide: Read this manual to learn the main features of TIBCO Clarity. This manual describes how to subscribe to and launch TIBCO Clarity, create a dataset, create projects, profile data, define metadata and validate data, cleanse and transform data, and deliver results.

• TIBCO Clarity Examples: Read this manual to work through the examples provided with TIBCO Clarity.

• TIBCO Clarity Release Notes: Read the release notes for a list of new features. This document also contains lists of known issues for this release.

• TIBCO Clarity Desktop Edition Installation: Read this manual for instructions on site preparation and installation.

The desktop edition is available only for premium subscribers.

Other TIBCO Product Documentation

You may find it useful to read the documentation for the following TIBCO products:

Table 1 TIBCO Products

TIBCO Product Description

TIBCO ActiveSpaces®

TIBCO ActiveSpaces is a distributed peer-to-peer in-memory data grid, a form of virtual shared memory that leverages a distributed hash table with configurable replication.

TIBCO ActiveSpaces combines the features and performance of databases, caching systems, and messaging software to support large, highly volatile data sets and event-driven applications. It lets you off-load transaction-heavy systems and allows developers to concentrate on business logic rather than the complexities of developing distributed fault-tolerance.

TIBCO® Cloud Marketplace

TIBCO Cloud Marketplace offers individual users and organizations the ability to pay only for the software they use - with no upfront expenses or long-term commitments.

TIBCO® GeoAnalytics Builder

TIBCO GeoAnalytics Builder is designed for corporations that require complete freedom and flexibility to create customized applications using Maporama’s powerful mapping engines. TIBCO GeoAnalytics Builder Web Services provide software developers with all the basic functionality they need to build their own location-centric applications and/or enrich other applications with location-centric features.

TIBCO® MDM

TIBCO MDM is a high-performance master data management (MDM) platform that consolidates, cleanses, and unifies disparate data sources to create a centralized source of accurate intelligence.

Combining an extensible master data repository, real-time data synchronization, and a rules-based workflow engine, it can quickly comply with ever-changing requirements while automating existing processes that manage master data.

TIBCO® Patterns

The TIBCO Patterns family of products provides error-tolerant matching and querying of structured data. Data is loaded into engines like a DBMS system. A simple API allows for error-tolerant searching and matching of the loaded records. The data is loaded across multiple machines and queried transparently, allowing extremely large data sets and very high load levels to be handled.

TIBCO Spotfire®

TIBCO Spotfire designs, develops and distributes in-memory analytic software for next generation business intelligence.

TIBCO® Vault

TIBCO Vault is a file-sharing solution that provides the security and auditing capabilities needed to ensure the confidentiality of corporate assets. It can be integrated to provide user-to-system sharing.

Third-Party Documentation

You may also find it useful to read the documentation on the following websites:

https://www.box.com

Collaboration tools adopted by over 180,000 companies globally. Box simplifies online file storage, replaces FTP, and connects teams in online workspaces.

https://www.dropbox.com

Dropbox is a free service that lets you bring your photos, docs, and videos anywhere and share them easily. Never email yourself a file again!

https://drive.google.com

Drive. Welcome to Google Drive, the new home for Google Docs. Access everywhere; Store files safely; Collaborate with Google Docs.

https://www.mysql.com

MySQL: The world’s most popular open source database.

http://www.postgresql.org

The official site for PostgreSQL, the world’s most advanced open source database.

http://www.salesforce.com

The official site for Salesforce.


Typographical Conventions

The following typographical conventions are used in this manual.

Table 2 General Typographical Conventions

Convention Use

code font

Code font identifies commands, code examples, filenames, pathnames, and output displayed in a command window. For example: Use MyCommand to start the foo process.

bold code font

Bold code font is used in the following ways:

• In procedures, to indicate what a user types. For example: Type admin.

• In large code samples, to indicate the parts of the sample that are of particular interest.

• In command syntax, to indicate the default parameter for a command. For example, if no parameter is specified, MyCommand is enabled: MyCommand [enable | disable].

italic font

Italic font is used in the following ways:

• To indicate a document title. For example: See TIBCO Clarity Examples.

• To introduce new terms. For example: A portal page may contain several portlets. Portlets are mini-applications that run in a portal.

• To indicate a variable in a command or code syntax that you must replace. For example: MyCommand PathName.

Key combinations

Key names separated by a plus sign indicate keys pressed simultaneously. For example: Ctrl+C.

Key names separated by a comma and space indicate keys pressed one after the other. For example: Esc, Ctrl+Q.

The note icon indicates information that is of special interest or importance, for example, an additional action required only in certain circumstances.

The tip icon indicates an idea that could be useful, for example, a way to apply the information provided in the current section to achieve a specific result.

The warning icon indicates the potential for a damaging situation, for example, data loss or corruption if certain steps are taken or not taken.


Table 3 Syntax Typographical Conventions

Convention Use

[ ]

An optional item in a command or code syntax.

For example:

MyCommand [optional_parameter] required_parameter

|

A logical OR that separates multiple items of which only one may be chosen.

For example, you can select only one of the following parameters:

MyCommand param1 | param2 | param3

{ }

A logical group of items in a command. Other syntax notations may appear within each logical group.

For example, the following command requires two parameters, which can be either the pair param1 and param2, or the pair param3 and param4.

MyCommand {param1 param2} | {param3 param4}

In the next example, the command requires two parameters. The first parameter can be either param1 or param2 and the second can be either param3 or param4:

MyCommand {param1 | param2} {param3 | param4}

In the next example, the command can accept either two or three parameters. The first parameter must be param1. You can optionally include param2 as the second parameter, and the last parameter is either param3 or param4.

MyCommand param1 [param2] {param3 | param4}


Connecting with TIBCO Resources

This section describes how to connect with TIBCO Resources.

How to Join TIBCOmmunity

TIBCOmmunity is an online destination for TIBCO customers, partners, and resident experts. It is a place to share and access the collective experience of the TIBCO community. TIBCOmmunity offers forums, blogs, and access to a variety of resources. To register, go to http://www.tibcommunity.com.

After registering with TIBCOmmunity, you can go to the TIBCO Clarity web page:

https://tibbr.tibcommunity.com/tibbr/#!/subjects/88007

How to Access TIBCO Documentation

You can access TIBCO documentation here:

http://docs.tibco.com

How to Contact TIBCO Support

For comments or problems with this manual or the software it addresses, contact TIBCO Support as follows:

• For an overview of TIBCO Support, and information about getting started with TIBCO Support, visit this site:

http://www.tibco.com/services/support

• If you already have a valid maintenance or support contract, visit this site:

https://support.tibco.com

Entry to this site requires a user name and password. If you do not have a user name, you can request one.


Chapter 1 Example Overview

Three sample datasets are provided with TIBCO Clarity. Each dataset is associated with a project. This manual uses two of these projects to show how TIBCO Clarity works.

Topics

• Sample Datasets Overview, page 2


Sample Datasets Overview

After launching TIBCO Clarity, you are presented with three sample datasets. By default, three projects are created for these sample datasets. In this example, the Samples-patients dataset and the Samples-customers dataset are used to show how to profile, validate, and cleanse data using different strategies.

• Samples-customers contains a fictional set of customer data. It contains some invalid data, and various profiling and validation rules are applied. See Working with the Sample-customers Dataset on page 3 for more details.

• Samples-patients contains a fictional set of patient data. Like the Samples-customers dataset, it contains some invalid data that does not conform to the validation rules. See Working with the Sample-patients Dataset on page 25 for more details.

• Samples-students-records contains a fictional set of student data. You can try applying your own profiling and validation rules to this sample project.

You can download the sample data from http://clarity.cloud.tibco.com/console/download/Samples.zip. The downloaded ZIP file contains three CSV files, which are customers-1.csv, customers-2.csv, and patients.csv. The Samples-customers dataset contains the data in the customers-1.csv and customers-2.csv files, and the Sample-patients dataset contains the data in the patients.csv file.
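Before uploading the sample files, it can help to check what the downloaded archive contains. The following is a minimal sketch in Python, independent of TIBCO Clarity, that lists the CSV files in a ZIP archive together with their header rows. The function name csv_headers and the UTF-8 encoding are assumptions made for illustration; the manual does not state how the sample files are encoded.

```python
import csv
import io
import zipfile


def csv_headers(zip_path):
    """Return {member name: header row} for every CSV file in the archive."""
    headers = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore anything that is not a CSV file
            with archive.open(name) as raw:
                # Assumption: files are UTF-8; utf-8-sig also strips a BOM.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                # An empty file yields an empty header list.
                headers[name] = next(csv.reader(text), [])
    return headers
```

Calling csv_headers("Samples.zip") on the downloaded archive should return one entry per CSV file, which makes it easy to compare the column names of customers-1.csv and customers-2.csv before mapping them.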

Figure 1 Sample Datasets and Projects


Chapter 2 Working with the Sample-customers Dataset

The Sample-customers dataset includes one project, which contains a fictional set of customer data. This chapter applies various analysis and validation methods to the Sample-customers dataset.

Topics

• Sample-customers Dataset Overview, page 4

• Creating the Dataset and Project, page 5

• Analyzing Data (Basic Tutorial), page 10

• Validating Data, page 12

• Analyzing Data (Advanced Tutorial), page 14

• Address Cleansing, page 18

• Transforming and Deduping, page 19


Sample-customers Dataset Overview

The Sample-customers example starts from the very beginning, that is, with creating a project.

Business Scenario

TWIDGCO, Inc. is a fictional widget manufacturer that has experienced unprecedented growth over the last decade. A recent report revealed many inefficiencies and lost opportunities due to inconsistent customer data across all brands. Executive management has mandated that all brands and their respective customer data be rolled into the main Customer Master of TWIDGCO.

It is a challenge for an administrator to consolidate a massive amount of data from multiple data sources in a variety of formats. TIBCO Clarity lets users load data from multiple data sources and streamline it into a unified representation.
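To make the idea of a unified representation concrete, here is a small Python sketch of what consolidating two sources with different column names can look like. It is not how TIBCO Clarity works internally; the column names in COLUMN_MAP and the function unify are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical mapping from source-specific column names to unified names.
COLUMN_MAP = {
    "cust_name": "name",
    "Customer Name": "name",
    "zip": "postal_code",
    "ZIP Code": "postal_code",
}


def unify(sources):
    """Merge rows from several CSV texts into one list of dicts with unified keys."""
    unified = []
    for text in sources:
        for row in csv.DictReader(io.StringIO(text)):
            unified.append({
                # Unknown columns keep their original name.
                COLUMN_MAP.get(key, key): (value or "").strip()
                for key, value in row.items()
                if key is not None  # skip surplus fields without a header
            })
    return unified
```

Each source keeps its own header names, and the mapping table is the only place where the differences between sources are recorded.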

This example shows how TIBCO Clarity resolves this challenge. It includes the following sections:

• Creating the Dataset and Project, page 5 shows how to create datasets by importing data from different data sources.

• Analyzing Data (Basic Tutorial), page 10 shows how to use profiling to analyze data.

• Validating Data, page 12 shows how to validate data by data types.

• Analyzing Data (Advanced Tutorial), page 14 shows how to use faceting and charting to analyze data.

• Address Cleansing, page 18 shows how to do address cleansing.

• Transforming and Deduping, page 19 shows how to transform data with look-up tables and how to check duplicates with a swappable group.

If you are familiar with creating datasets and projects, you can skip Creating the Dataset and Project and begin profiling directly by using the original Sample-customers dataset.


Creating the Dataset and Project

This section describes how to create a dataset from two example data files.

To create a dataset, complete the following tasks:

• Task A, Upload Data from a Local Directory, page 5

• Task B, Upload Data from a Cloud Storage, page 5

• Task C, Mapping Data, page 6

• Task D, Creating the Projects, page 8

Task A Upload Data from a Local Directory

To upload the customers-1.csv file:

1. On the Home page, click Create dataset.

2. On the Get Data From page, click My computer.

3. On the File Upload page, click Choose file and select the customers-1.csv file. Click Next.

4. On the Parse File page, click Next to use the default settings.

5. Rename the dataset to Sample-customers.

Now you have created a dataset using the local example data file.

Task B Upload Data from a Cloud Storage

To upload the customers-2.csv file:

1. On the Get Data From page, click Cloud storage.

2. Click Sign In associated with Dropbox.

3. Enter your login credentials to log in to Dropbox.

4. Click the customers-2.csv file to upload it from Dropbox, as shown in Figure 2.

Ensure that you have downloaded the customers-1.csv and customers-2.csv files from http://clarity.cloud.tibco.com/console/download/Samples.zip, and uploaded the customers-2.csv file to Dropbox.


Figure 2 Upload Data from Dropbox

5. Click Next to confirm the file upload.

6. On the Parse File page, click Next to use the default settings.

Now you have created a dataset with two example data files.

Task C Mapping Data

Mapping consolidates data from multiple data sources into a unified dataset.

The following mapping methods are available:

• Mapping Data Manually, page 6

• Mapping Data Automatically, page 8

Mapping Data Manually

Manual mapping allows you to select the columns that you want to map.

The following example shows how to map the FirstName column:

1. Drag FirstName from customers-1.csv to the “Group data” panel.

2. Drag FirstName from customers-2.csv to the “Group data” panel.


You need to manually add the mapping for the other columns one by one, and then click Next.

For this example, click Clear all mapping so that you can use auto map instead.

Figure 3 Manual Map Data


Mapping Data Automatically

Auto mapping helps you map data automatically. When you click Auto map, TIBCO Clarity finds identical column titles in the dataset and groups them under the same column title.

To map data automatically, click Auto map. TIBCO Clarity sorts out the rest, as shown in Figure 4.

Figure 4 Auto Map Data
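The matching rule that auto mapping applies, grouping columns with identical titles, can be sketched in a few lines of Python. This is only an illustration of the idea, not TIBCO Clarity's implementation: the auto_map function and the column lists for the two files are assumptions made for the example.

```python
from collections import defaultdict


def auto_map(sources):
    """Group columns that share an identical title across data sources.

    `sources` maps a file name to its list of column titles.
    Returns {column title: [file names that contain it]}.
    """
    groups = defaultdict(list)
    for file_name, columns in sources.items():
        for title in columns:
            groups[title].append(file_name)
    return dict(groups)


# Hypothetical column lists for the two example files.
sources = {
    "customers-1.csv": ["FirstName", "LastName", "SSN"],
    "customers-2.csv": ["FirstName", "LastName", "Phone"],
}
mapping = auto_map(sources)

# Titles found in every source can be grouped under one unified column.
shared = [title for title, files in mapping.items() if len(files) == len(sources)]
print(shared)
```

Titles that appear in only one source remain unmatched in this sketch and would have to be mapped manually.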

Task D Creating the Projects

This section describes how to create two projects with different methods.

• Creating the First Project: project 1, page 9

• Creating the Second Project: Sample30Each, page 9


Creating the First Project: project 1

After mapping data, do the following on the "Sample data" page to pick up project data:

1. Click the customers-1.csv tab.

2. Click Load 100% of rows.

3. Click the customers-2.csv tab.

4. Click Load 100% of rows.

5. Click Done. Project 1 is created and you are brought to the Data page for this project.

Creating the Second Project: Sample30Each

You can also create a project by cloning an existing project. The Sample30Each project contains only 30 rows from each data file.

To clone a project in a dataset:

1. On the Home page, expand Sample-customers.

2. Hover your mouse over project 1, and click Clone.

3. Rename the project to Sample30Each.

4. Click the customers-1.csv tab.

a. Click Load rows from.

b. Enter 1 and 30 into the Row Number field.

c. Click the customers-2.csv tab.

d. Click Load rows from.

e. Enter 1 and 30 into the Row Number field.

5. Click Done. Sample30Each is created and you are brought to the Data page for this project.


Analyzing Data (Basic Tutorial)

Profiling allows you to analyze data based on rows and columns. The following two basic analysis methods are used:

• Row Analysis, page 10

• Column Analysis, page 10

Row Analysis

To analyze the sample data by rows:

1. On the Home page, click the project of the Sample-customers dataset.

2. Click Profile on the menu. The row analysis result is displayed.

The result shows that project 1 has 0 empty rows, a maximum row size of 192, and an average row size of 160.62. Only one record has 11 columns populated, while the other 490 records have all 16 columns populated.

Click the value 11 of Minimum column size. The minimum column size report is displayed, as shown in Figure 5.

Figure 5 Minimum Column Size for Project 1 of Sample-customers Dataset
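The row analysis metrics can be approximated outside the product with a short Python sketch. The definitions used here are assumptions, not TIBCO Clarity's documented formulas: row size is taken as the total character count of a row's values, and column size as the number of populated cells in a row. The sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample data; the real project has 16 columns.
SAMPLE = """FirstName,LastName,City
Ann,Lee,Boston
Bob,,Denver
,,
"""


def row_analysis(text):
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    empty_rows = sum(1 for row in rows if not any(cell.strip() for cell in row))
    # Assumption: row size is the total character count of a row's values.
    row_sizes = [sum(len(cell) for cell in row) for row in rows]
    # Assumption: column size is the number of populated cells in a row.
    column_sizes = [sum(1 for cell in row if cell.strip()) for row in rows]
    return {
        "empty_rows": empty_rows,
        "max_row_size": max(row_sizes),
        "avg_row_size": sum(row_sizes) / len(row_sizes),
        "min_column_size": min(column_sizes),
        "max_column_size": max(column_sizes),
    }


report = row_analysis(SAMPLE)
print(report)
```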

Column Analysis

To analyze the sample data by columns:

1. On the Home page, click the project of the Sample-customers dataset.

2. Click Profile on the menu.

3. Select Column analysis from the Analysis type list on the Profiling Analysis page.

The column analysis contains the following two aspects:


• Numeric columns analysis provides mathematical results on operations of Empties, Nulls, Uniqueness, Unique count, Min, Max, Mean, Sum, Standard deviation, and Quartile.

• String columns analysis provides column count on the subjects of Empties, Nulls, Uniqueness, Unique count, Min length, Max length, and Mean length.
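As a rough illustration of what the string column measures mean, the following Python sketch computes them for a single column. The formulas are assumptions for the purpose of the example (in particular, uniqueness is taken as the share of populated values that are distinct), and the City values are hypothetical.

```python
def string_column_analysis(values):
    """Summarize a string column; None stands for a null cell."""
    nulls = sum(1 for v in values if v is None)
    empties = sum(1 for v in values if v == "")
    populated = [v for v in values if v]  # drop nulls and empty strings
    lengths = [len(v) for v in populated]
    unique = set(populated)
    return {
        "empties": empties,
        "nulls": nulls,
        "unique_count": len(unique),
        # Assumption: uniqueness is the share of populated values that are distinct.
        "uniqueness": len(unique) / len(populated) if populated else 0.0,
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


# Hypothetical values for a City column.
stats = string_column_analysis(["Boston", "Denver", "", None, "Boston", "Reno"])
print(stats)
```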


Validating Data

Validation allows you to apply validation rules to data and filter out invalid data.

To validate the sample data, complete the following tasks:

1. Defining Data Types, page 12

2. Validating, page 12

Defining Data Types

1. On the Data page, click Validate on the menu.

2. Click Auto Suggest on the Data Types and Constraints page. Click Next to use the default settings.

3. Click Apply on the Column Data Type Detecting Result page to apply the data types suggested by TIBCO Clarity.

4. Configure the following fields for the Gender column on the Data Types and Constraints page:

a. Select Valid list from the Contains list.

b. Click Click to add/edit valid list.

c. Enter M in the Enter a valid value field. Click Add.

d. Enter F in the Enter a valid value field. Click Add.

e. Click Save.

Validating

After you define the data types, click Save changes to start validating.

The validation results are displayed on the Data page, as shown in Figure 6. The invalid rows are marked with the icon. Click the icon to view details.


Figure 6 Validation Results for Project 1 of Sample-customers Dataset


Analyzing Data (Advanced Tutorial)

In addition to the basic profiling analysis, you can also analyze the sample data by:

• Charting Data, page 14

• Dependency Analysis, page 15

• Pattern Faceting of String, page 16

Charting Data

Charting not only visualizes data but also provides a way to find duplicates and invalid data.

In this example, a bar chart presents the distribution of row counts across the values of the Gender column.

To create a bar chart for the Gender column:

1. Click Chart on the menu.

2. On the Charting page:

a. Click bar.

b. Select Gender from the X axis list.

c. Select Row Count from the Y axis list.

d. Click Create chart.

The gender distribution is displayed, as shown in Figure 7. From this bar chart, you see that there are six distinct Gender values in your records: (blank), F, FM, M, ML, and X. The values (blank) and X are invalid.


Figure 7 Bar Chart Distribution of Gender

Dependency Analysis

Dependency checks the relationships between columns.

To check the dependency:

1. Expand All, and then select Dependency on the Data page.

2. Drag FirstName, LastName, and Phone from the Column Name panel to the Key Column panel on the Column Dependency page.

3. Drag SSN from the Column Name panel to the Value panel.

Dependency analysis tests whether one key column, or a group of key columns, uniquely determines the column in the Value panel.

4. Click Analyze Dependency.

The rows that are not uniquely determined by the columns are displayed, as shown in Figure 8.


Figure 8 Column Dependency Check
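The dependency test described in this procedure, whether a group of key columns uniquely determines a value column, can be sketched in Python as follows. The dependency_violations function and the sample records are hypothetical and only illustrate the principle; they do not reproduce TIBCO Clarity's implementation.

```python
from collections import defaultdict


def dependency_violations(rows, key_columns, value_column):
    """Return key tuples that map to more than one distinct value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[name] for name in key_columns)
        seen[key].add(row[value_column])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical records; the SSN values are made up for illustration.
rows = [
    {"FirstName": "Ann", "LastName": "Lee", "Phone": "555-0100", "SSN": "000-00-0001"},
    {"FirstName": "Ann", "LastName": "Lee", "Phone": "555-0100", "SSN": "000-00-0002"},
    {"FirstName": "Bob", "LastName": "Ray", "Phone": "555-0101", "SSN": "000-00-0003"},
]
violations = dependency_violations(rows, ["FirstName", "LastName", "Phone"], "SSN")
print(violations)
```

An empty result means that the key columns determine the value column for every row in the sample.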

Pattern Faceting of String

Pattern faceting allows you to filter out inconsistent data formats that exist in your records.

The following examples show how pattern faceting works:

• Pattern Facet on SSN, page 16

• Pattern Facet on Zip, page 17

Pattern Facet on SSN

To apply pattern facet on the SSN column:

1. Expand SSN, and then select Facet > Text pattern facet on the Data page.

2. Click count.

As shown in Figure 9, most of the values in the SSN column are in the format of 999-99-9999.


Figure 9 Pattern Facet on Column SSN
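One way to picture a text pattern facet is to reduce every value to its shape and then count the shapes. The following Python sketch does this under stated assumptions: the digit placeholder 9 follows the 999-99-9999 notation used above, while the letter placeholder a, the text_pattern function, and the SSN values are hypothetical.

```python
import re
from collections import Counter


def text_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become a."""
    # Assumption: the digit placeholder is 9, as in the 999-99-9999 facet;
    # the letter placeholder "a" is hypothetical.
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "a", shape)


# Hypothetical SSN values.
ssn_values = ["123-45-6789", "987-65-4321", "123456789", "12-345-6789"]
facet = Counter(text_pattern(v) for v in ssn_values)
for pattern, count in facet.most_common():
    print(pattern, count)
```

Values whose shape differs from the dominant pattern are the candidates for cleansing.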

Pattern Facet on Zip

To apply pattern facet on the Zip column:

1. Expand ZIP, and then select Facet > Text pattern facet on the Data page.

2. Click count. As shown in the ZIP Facet panel, most values in the ZIP column are in the format of 99999.

To convert the invalid values, see Transform Data Based on Pattern Facet Result on page 19.


Address Cleansing

You can use the address cleansing functionality to check each address and select the best match for it.

To cleanse addresses:

1. Click Address on the menu.

2. In the Address cleansing panel, move the slider to change the threshold to 80%.

3. Drag the State and City columns from the Source Columns panel to the middle.

4. Drag the State and City columns from the Destination Columns panel to the middle.

5. Click Run to start checking.

Two columns, addr_city and addr_state, have been added, which display the automatically cleansed address. You can delete the original city and state columns.


Transforming and Deduping

The analysis and validation of the project for the Sample-customers dataset revealed some invalid and duplicate data. You can use the transforming and deduping functions to cleanse this data.

Transforming Data

The transforming function allows you to transform data into a specified format. The following transforming methods are used in this example:

• Transform Data Based on Pattern Facet Result, page 19

• Convert Date Format, page 20

• Remove Columns, page 21

• Split Column, page 21

• Look-up Table Transformation, page 22

Transform Data Based on Pattern Facet Result

When you use faceting to check the ZIP column, as described in Pattern Facet on Zip, you can see that 49 rows are in the format of 9999. This format results from a typo made during data input.

To fix this typo:

1. Click 9999 in the ZIP facet panel.

TIBCO Clarity lists all the data rows that have the format of 9999.

2. Expand ZIP, and then select Edit cells > Transform on the Data page.

3. Enter "0" + value into the Expression field.

4. Click OK. All the data in the 9999 format are transformed into the 99999 format, as shown in Figure 10.

These steps are based on the facet applied to the ZIP column in Pattern Facet on Zip on page 17.


Figure 10 ZIP Column Transformation
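The effect of the "0" + value expression on the faceted rows can be mirrored in Python. In the sketch below, the facet selection is replaced by an explicit four-digit check; the fix_zip function and the ZIP values are hypothetical.

```python
import re


def fix_zip(value):
    """Prefix four-digit ZIP values with "0"; leave other values untouched."""
    if re.fullmatch(r"\d{4}", value):
        return "0" + value  # same effect as the expression "0" + value
    return value


# Hypothetical ZIP values.
zips = ["2134", "90210", "6511"]
fixed = [fix_zip(z) for z in zips]
print(fixed)
```

Because the values are handled as text, the leading zero is preserved after the transformation.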

Convert Date Format

As you look at the DOB values, you can see that the date format is inconsistent.

To convert the date format:

1. Expand DOB, and then select Edit column > Transform date format on the Data page.

2. Configure the following fields on the Transform Date Format Based on the Column DOB page:

a. Enter a new column name in the New column name field. The default value is Copy_DOB.

b. Select dd/MM/yyyy (06/06/2000) from the New format list.

c. Use the default values for the rest of the options.

d. Click OK.

A new column Copy_DOB is created and all date data is in the format of dd/MM/yyyy (06/06/2000), as shown in Figure 11.


Figure 11 Column of Copy DOB
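A comparable date normalization can be sketched in Python with the standard datetime module. The list of input formats is an assumption, because the formats that actually occur in the DOB column are not listed here; the to_day_month_year function and the sample values are hypothetical.

```python
from datetime import datetime

# Assumption: the input formats that occur in the DOB column; adjust as needed.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def to_day_month_year(value):
    """Return the date as dd/MM/yyyy, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue
    return None


# Hypothetical DOB values.
for dob in ["2000-06-06", "06-Jun-2000", "not a date"]:
    print(dob, "->", to_day_month_year(dob))
```

Returning None for unparsed values keeps them visible for manual review instead of silently guessing a format.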

Remove Columns

After you complete the steps in Convert Date Format on page 20, there are two columns with DOB data. You can remove the original column with the inconsistent date format and rename the newly created Copy_DOB column.

To remove the original column and rename the new column:

1. Expand DOB, and then select Edit column > Remove this column to remove this column.

2. Expand Copy_DOB, and then select Edit column > Rename this column to rename the column.

3. Enter DOB in the Please enter a new column name field.

4. Click OK on the Rename Column page.

Split Column

To split the customer's DOB by the month:

1. Expand DOB, and then select Edit column > Split into several columns to split the DOB into three columns.

2. Enter / as separator in the Separator field. Click OK.


Two columns, DOB 2 and DOB 3, have been added that display the date and year.

3. Remove DOB 2 and DOB 3 columns:

a. Expand DOB 2, and then select Edit column > Remove this column.

b. Expand DOB 3, and then select Edit column > Remove this column.

4. Expand DOB 1, and then select Edit column > Rename this column to rename the DOB 1 column.

5. Enter Month in the Please enter a new column name field.

6. Click OK on the Rename Column page. Only a column with month is displayed, as shown in Figure 12.

Figure 12 Month Column for Project 1 of Sample-customers Dataset

Look-up Table Transformation

A look-up table also allows you to predefine a date format for transformation.

To transform date format by using look-up tables, first you need to define the look-up table as follows:

1. Select User Account > Look-up tables.

2. Hover your mouse over the title and rename it as Month.


3. Select Manual input.

4. Enter the key/value pairs of month, for example, 1, January.

5. Click Add to list.

Add all the months in key/value pairs.

6. Click Save.

7. Click Close.

You have created a look-up table for the Month column. Now you can apply this look-up table to your project to replace every numerical month value with text.

To apply your month look-up table:

1. Expand Month, and then select Edit cells > Transform.

2. Click the Lookup tab on the Custom Text Transform on the Column Month page.

3. Click hint associated with the Month table name.

Optional: Enter value.tableLookup("Month") in the Expression field.

4. Click OK.

All the months are displayed as text strings, as shown in Figure 13.

Figure 13 Transformed Month Format
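Conceptually, the look-up transformation replaces each cell value with the value stored under the same key in the table. The Python sketch below shows the idea with a dictionary; the table_lookup function is hypothetical, and leaving unmatched keys unchanged is an assumption about the behavior, not a documented rule.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Key/value pairs in the form entered in the look-up table: "1" -> "January".
MONTH_TABLE = {str(number): name for number, name in enumerate(MONTH_NAMES, start=1)}


def table_lookup(value, table):
    """Return the mapped value, or the original value when the key is missing."""
    # Assumption: unmatched keys are left unchanged.
    return table.get(value, value)


months = ["1", "6", "12", "13"]
converted = [table_lookup(m, MONTH_TABLE) for m in months]
print(converted)
```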


Deduping Data

Deduping is used to check for duplicate records within your projects. You can either create a swappable group of columns or select the columns that you want to check for duplicates.

To create a swappable group to check duplicates:

1. Click Dedup on the menu.

2. Click , and then select Create a swappable group.

3. Select the FirstName and LastName check boxes.

4. Select the SSN check box.

Make sure that the weight values for FirstNameLastName and SSN columns are 1.

5. Click Run.

The duplicate rows are marked with the icon. Three new columns dedup_isLead, dedup_group, and dedup_rowIndex have been added. See Table 4 for detailed information.

Table 4 Details of Dedup Result Column

Column Name | Data Type | Hint

dedup_isLead | Boolean | true: this row is the first found row in the group. false: this row is not the first found row in the group.

dedup_group | Integer | 0: this row is a unique row. >0: this row is in a duplicated group.

dedup_rowIndex | Integer | The value is the original row index.
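The result columns can be illustrated with a simplified Python sketch. It is not TIBCO Clarity's matching algorithm: it uses exact, case-insensitive matching instead of weighted matching, it treats a swappable group as a set of columns whose values may appear in any order, it marks unique rows as lead rows, and it uses zero-based row indexes. All of these are assumptions, and the sample rows are hypothetical.

```python
def dedup(rows, swappable, other_columns):
    """Add dedup_isLead, dedup_group, and dedup_rowIndex to each row."""
    buckets = {}
    for index, row in enumerate(rows):
        # Sorting makes the swappable values order-insensitive.
        swapped = tuple(sorted(row[name].strip().lower() for name in swappable))
        rest = tuple(row[name].strip().lower() for name in other_columns)
        buckets.setdefault((swapped, rest), []).append(index)

    # Assumption: unique rows keep group 0 and are marked as lead rows.
    result = [dict(row, dedup_isLead=True, dedup_group=0, dedup_rowIndex=i)
              for i, row in enumerate(rows)]
    group_id = 0
    for indexes in buckets.values():
        if len(indexes) < 2:
            continue
        group_id += 1
        for position, i in enumerate(indexes):
            result[i]["dedup_group"] = group_id
            result[i]["dedup_isLead"] = position == 0
    return result


# Hypothetical rows; the second row has the first and last names swapped.
rows = [
    {"FirstName": "Ann", "LastName": "Lee", "SSN": "000-00-0001"},
    {"FirstName": "Lee", "LastName": "Ann", "SSN": "000-00-0001"},
    {"FirstName": "Bob", "LastName": "Ray", "SSN": "000-00-0002"},
]
deduped = dedup(rows, swappable=["FirstName", "LastName"], other_columns=["SSN"])
for row in deduped:
    print(row["dedup_rowIndex"], row["dedup_group"], row["dedup_isLead"])
```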


Chapter 3 Working with the Sample-patients Dataset

The Sample-patients dataset includes one project, which contains a fictional set of patient data. Various analysis and validation methods are used.

Topics

• Sample-patients Dataset Overview, page 26

• Analyzing Data (Basic Tutorial), page 27

• Validating Data, page 29

• Analyzing Data (Advanced Tutorial), page 35

• Transforming and Cleansing Data, page 42

• Delivering Data, page 46


Sample-patients Dataset Overview

The Sample-patients dataset is also associated with the Project1 project. Click the Sample-patients dataset on the home page, and you are brought to the Data page of the dataset.

The following patient information is displayed on the Data page:

• Patient number is displayed in the PATNO column.

• Gender is displayed in the GENDER column.

• Visit date is displayed in the VISIT column.

• Heart rate is displayed in the HR column.

• Systolic blood pressure is displayed in the SBP column.

• Diastolic blood pressure is displayed in the DBP column.

• Diagnosis code is displayed in the DX column.

• Adverse event is displayed in the AE column.

The example for the Sample-patients dataset includes:

• Analyzing Data (Basic Tutorial), page 27

• Validating Data, page 29

• Analyzing Data (Advanced Tutorial), page 35

• Transforming and Cleansing Data, page 42

• Delivering Data, page 46

The Delivering Data section contains a complete procedure for loading a dataset, analyzing and cleansing data, and exporting data.


Analyzing Data (Basic Tutorial)

Profiling allows you to analyze data based on rows and columns. The following two basic analysis methods are used:

• Row Analysis, page 27

• Column Analysis, page 28

Row Analysis

Row analysis is the default option for profiling analysis. The row analysis report provides information about empty rows, average row size, maximum row size, minimum row size, average column size, maximum column size, and minimum column size.

To analyze the sample data by rows:

1. On the Home page, click the project of the Sample-patients dataset.

2. Click Profile on the menu. The row analysis for the project is displayed, as shown in Figure 14.

Figure 14 Row Analysis for Project 1 of Sample-patients Dataset


Column Analysis

Column analysis provides column counts for the project of the Sample-patients dataset.

To analyze the project data by columns:

1. On the Home page, click the project of the Sample-patients dataset.

2. Click Profile on the menu.

3. Select Column analysis from the Analysis type list. The column analysis is displayed, as shown in Figure 15.

Figure 15 Column Analysis for Project 1 of Sample-patients Dataset


Validating Data

Validation allows you to validate the project data by using your own validation rules or imported preconfigured metadata.

A complete validation procedure includes:

1. Defining Data Types, page 29

2. Validating, page 31

3. Analyzing Validation Results, page 32

4. Exporting Validation Results, page 34

Defining Data Types

TIBCO Clarity automatically configures data types for columns, and you can also define your own data types for validation.

To configure the data type for each column:

1. Click Validate on the menu.

2. Click Auto Suggest on the Data Types and Constraints page. Click Next.

3. Click Apply on the Data Type Result page to apply the data types to the columns.

4. On the Data Types and Constraints page, define a custom data type for the PATNO column.

a. Change the data type from Integer to String.

b. Enter the regular expression ^\d\d\d$.

c. Click Save next to the PATNO column.

d. Enter a name for this custom type in the Enter a name for the new custom type field.

e. Click Add to save the data type.

5. Define custom data types for other columns, as described in Table 5.

Figure 16 shows the configured data types.


Table 5 Validation Rule for Each Column

Variable Name | Description | Variable Type | Constraints | Clarity Constraint | Null Allowed

PATNO | Patient Number | String | Numerals | Whole: ^\d\d\d$ | Yes

GENDER | Gender | String | ’M’ or ’F’ | Valid values: M, F | Yes

VISIT | Visit Date | Date | Like 12/31/2013 | MM/dd/yyyy | Yes

HR | Heart Rate | Integer | 40 to 120 | from 40 to 120 | Yes

SBP | Systolic Blood Pressure | Integer | 80 to 200 | from 80 to 200 | Yes

DBP | Diastolic Blood Pressure | Integer | 60 to 120 | from 60 to 120 | Yes

DX | Diagnosis Code | Integer | 1 to 3 digits | Length: 3 | Yes

AE | Adverse Event | Integer | ’0’ or ’1’ | 0/1 | Yes
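To make the rules in Table 5 concrete, the following Python sketch expresses each one as a small check function. It is an illustration only: the RULES dictionary, the helper functions, and the sample row are hypothetical, empty values are accepted because Null Allowed is Yes for every column, and the DX rule follows the "1 to 3 digits" wording of the Constraints column.

```python
import re
from datetime import datetime


def in_range(low, high):
    """Build a check for whole numbers between low and high, inclusive."""
    def check(value):
        return value.isdecimal() and low <= int(value) <= high
    return check


def is_date(value):
    """Check the MM/dd/yyyy format, for example 12/31/2013."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


RULES = {
    "PATNO": lambda v: re.fullmatch(r"\d\d\d", v) is not None,
    "GENDER": lambda v: v in ("M", "F"),
    "VISIT": is_date,
    "HR": in_range(40, 120),
    "SBP": in_range(80, 200),
    "DBP": in_range(60, 120),
    "DX": lambda v: v.isdecimal() and 1 <= len(v) <= 3,
    "AE": lambda v: v in ("0", "1"),
}


def invalid_columns(row):
    """Return the columns that break their rule; empty values are allowed."""
    return [name for name, rule in RULES.items()
            if row.get(name, "") != "" and not rule(row[name])]


# Hypothetical patient row with two invalid values.
row = {"PATNO": "001", "GENDER": "X", "VISIT": "12/31/2013", "HR": "910",
       "SBP": "120", "DBP": "80", "DX": "5", "AE": "0"}
print(invalid_columns(row))
```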


Figure 16 Defined Validation Rule for Columns

Validating

After you define the data types, as described in Defining Data Types on page 29, click Save changes to start validating.

The validation results are displayed on the Data page. The invalid rows are marked with the icon, as shown in Figure 17. Click the icon to view the details.

Figure 17 Validation Result


Analyzing Validation Results

The Profiling and Validation functions have an iterative relationship. After defining data types and validation rules, you can perform:

• Column Analysis, page 32

• Diagram Analysis, page 33

Column Analysis

To analyze the validation results by columns:

1. Click Profile on the menu.

2. Select Column analysis from the Analysis type list.

The column analysis is displayed, as shown in Figure 18. A "Numeric columns" table is displayed with analysis of Minimum values, Maximum values, and Sum of total.

Figure 18 Column Analysis for Validation Results


Diagram Analysis

Based on the Column Analysis on page 32 for the validation data, TIBCO Clarity allows you to analyze a specific column by using a quartile diagram or a normal distribution diagram.

To view the quartile diagram or the normal distribution diagram of a particular column:

1. Click any value in the following columns: 1st quartile, Median, or 3rd quartile. A quartile diagram is generated, as shown in Figure 19.

2. Click Close.

Figure 19 Quartile Diagram of DBP

3. Click the Std deviation value for DBP. A normal distribution diagram is generated, as shown in Figure 20.

4. Click Close.

Figure 20 Normal Diagram of DBP
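The statistics behind the two diagrams can be reproduced with Python's standard statistics module. This is a sketch under assumptions: the DBP readings are hypothetical, and the interpolation method that statistics.quantiles uses by default may differ from the one used by TIBCO Clarity, so the quartile values need not match the product exactly.

```python
import statistics

# Hypothetical DBP readings.
dbp = [60, 70, 80, 90, 100]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(dbp, n=4)
mean = statistics.mean(dbp)
std_dev = statistics.stdev(dbp)  # sample standard deviation

print("1st quartile:", q1)
print("Median:", median)
print("3rd quartile:", q3)
print("Mean:", mean)
print("Std deviation:", round(std_dev, 2))
```

The mean and the standard deviation are the two parameters that define the normal distribution curve for the column.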


Exporting Validation Results

After validating and analyzing the sample data, you can separate valid rows from invalid rows, export valid rows, and delete the invalid rows:

1. On the Data page, expand All, and then select Facet > Facet by validation.

2. Click false in the Validation errors Facet panel.

3. Click include next to false.

Figure 21 Validation Errors Facet

4. On the Data page, expand All, and then select Export > Export to File > Excel. All the valid data is exported.

5. On the Data page, expand All, and then select Export.

6. Click true in the Validation errors field Facet panel.

7. On the Data page, expand All, and then select Edit rows > Remove all validated errors rows. All the invalid rows are removed.

Ensure that you have validated the sample data, as described in Defining Data Types and Validating.


Analyzing Data (Advanced Tutorial)

In addition to the basic profiling analysis, you can also analyze the sample data by:

• Faceting Data, page 35

• Dependency Analysis, page 38

• Charting Data, page 38

Faceting Data

Faceting is another way to analyze your data.

The following faceting methods are used to analyze the sample data:

• Facet to Check Duplications, page 35

• Facet to Check Invalidations, page 36

• Facet to Check Format, page 37

• Facet to Check Integrity, page 37

Facet to Check Duplications

Faceting allows you to check for duplicates in a single column.

To use a facet to check for duplicates in the PATNO column:

1. Expand PATNO, and then select Facet > Text facet on the Data page.

2. Click Count on the PATNO Facet panel.

3. Move the slider bar to see the word count of each PATNO.

You see that PATNO 002, 003, and 006 have duplicates, as shown in Figure 22.


Figure 22 PATNO Facet
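A text facet sorted by count is essentially a frequency table of the column values. The following Python sketch builds such a table and keeps the values that occur more than once. The PATNO values are hypothetical and are chosen so that 002, 003, and 006 repeat, as in the facet result described above.

```python
from collections import Counter

# Hypothetical PATNO values; 002, 003, and 006 occur more than once.
patno = ["001", "002", "002", "003", "003", "004", "006", "006", "006"]

counts = Counter(patno)
# most_common() orders the values by count, highest first.
duplicates = {value: n for value, n in counts.most_common() if n > 1}
print(duplicates)
```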

Facet to Check Invalidations

Faceting allows you to check for invalid values in a single column.

To use a facet to check for invalid values in the GENDER column:

1. Expand GENDER, and then select Facet > Text facet, on the Data page.

2. Click Count on the GENDER Facet panel.

You see that the GENDER column has invalid values of 1, 2, X, and (blank).

Figure 23 GENDER Facet

The checking is based on the validation rule defined in Defining Data Types on page 29.


Facet to Check Format

Faceting allows you to check data formats.

To use a facet to check the format of the VISIT column:

1. Expand VISIT, and then select Facet > Text pattern facet on the Data page.

2. Click Count on the VISIT Facet panel.

You see that most dates are in the format of MM/dd/yyyy. To modify dates that do not conform to this format, see Converting Date Format on page 43.

Figure 24 VISIT Facet

Facet to Check Integrity

Faceting allows you to check the integrity of your data.

To use a facet to check the integrity of the HR column:

1. Expand HR, and then select Facet > Numeric facet on the Data page. A bar chart of the HR column is displayed, as shown in Figure 25.

Figure 25 HR Facet

A patient’s heart rate should fall within a reasonable range. If the heart rate of a patient falls outside that range, the data is invalid.

This operation is based on the validation rule defined in Defining Data Types on page 29.


As you see, the heart rates of patients consist of numeric, non-numeric, and blank values. The numeric heart rate values range from 10 to 910.

Realistically, nobody’s heart beats 910 times in a minute. See Remove Invalid Data on page 44 and Modify Invalid Data on page 45 for more details.
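The same integrity check can be sketched in Python by sorting the raw HR values into blank, non-numeric, in-range, and out-of-range groups. The classify function and the sample values are hypothetical; the default limits of 40 and 120 are taken from the HR rule in Table 5.

```python
def classify(values, low=40, high=120):
    """Split raw HR values into blank, non-numeric, in-range, and out-of-range."""
    report = {"blank": [], "non_numeric": [], "in_range": [], "out_of_range": []}
    for raw in values:
        text = raw.strip()
        if not text:
            report["blank"].append(raw)
        elif not text.isdecimal():
            report["non_numeric"].append(raw)
        elif low <= int(text) <= high:
            report["in_range"].append(int(text))
        else:
            report["out_of_range"].append(int(text))
    return report


# Hypothetical HR values; 10 and 910 are the extremes mentioned above.
hr = ["72", "10", "910", "", "N/A", "88"]
report = classify(hr)
print(report)
```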

Dependency Analysis

Dependency checks the relationships between columns.

For project1 (Sample-patients dataset), each PATNO value should be associated with a single GENDER value.

To analyze dependency between the PATNO and GENDER columns:

1. Expand All, and then select Dependency on the Data page.

2. On the Column Dependency page:

a. Drag PATNO to the Keys panel.

b. Drag GENDER to the Value panel.

3. Click Analyze Dependency.

As shown in Figure 26, the GENDER column is not uniquely determined by the PATNO column.

Figure 26 Column Dependency Check
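The dependency check above amounts to testing whether each key maps to exactly one value. A minimal Python sketch follows, assuming rows are dictionaries; the sample rows and the function name dependency_violations are hypothetical.

```python
from collections import defaultdict


def dependency_violations(rows, key, value):
    """Return the keys that are associated with more than one distinct value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[value])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical rows (assumption, not the real sample data).
rows = [
    {"PATNO": "001", "GENDER": "M"},
    {"PATNO": "002", "GENDER": "F"},
    {"PATNO": "002", "GENDER": "M"},
    {"PATNO": "003", "GENDER": "F"},
    {"PATNO": "003", "GENDER": "F"},
]

violations = dependency_violations(rows, "PATNO", "GENDER")
print(violations)
```

An empty result would mean that GENDER is uniquely determined by PATNO; any entry lists a key together with its conflicting values.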

Charting Data

A variety of chart formats, such as line, bar, pie, line bar, and scatter charts, are available to analyze your data.

The following charts are used:

• Create Bar Chart to Check Duplication, page 39

• Create Bar Chart to Check Invalid Data, page 39

• Create Pie Chart to Visualize Data Count, page 40

Create Bar Chart to Check Duplication

You can use a bar chart to check for duplicate data.

To create a bar chart:

1. On the Data page, click Chart.

2. On the Charting page:

a. Click Bar to create a bar chart. Use the default setting of X axis: PATNO; Y axis: Row Count.

b. Click Create Chart.

A bar chart is generated, as shown in Figure 27. From the chart, you see that PATNO 002, 003, and 006 have duplicate rows.

Figure 27 Duplicate Check with Bar Chart
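The bar chart plots a row count per PATNO, so any bar taller than 1 indicates duplicates. The same count can be sketched in Python; the list of patient numbers below is hypothetical and only mirrors the result described above.

```python
from collections import Counter

# Hypothetical PATNO column (assumption, not the real sample data).
patnos = ["001", "002", "002", "003", "003", "004", "006", "006"]

# Row count per patient number, which is what the Y axis of the chart shows.
row_counts = Counter(patnos)

# Any patient number that appears more than once has duplicate rows.
duplicates = sorted(p for p, n in row_counts.items() if n > 1)
print(duplicates)
```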

Create Bar Chart to Check Invalid Data

You can also use the bar chart to check invalid data.

To create a bar chart:

1. On the Data page, click Chart.

2. On the Charting page:

a. Click Bar to create a bar chart.

b. Select GENDER from the X axis list.

c. Select Row Count from the Y axis list.

d. Select GENDER from the Group by list.

e. Click stacked.

f. Click Create Chart.

A bar chart is generated, as shown in Figure 28. The GENDER column has invalid values of (blank), 2, and X.

Figure 28 Invalid Data Check with Bar Chart
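Grouping rows by GENDER and counting them is enough to expose unexpected values. The sketch below assumes, purely for illustration, that M and F are the only valid codes; the column values are hypothetical.

```python
from collections import Counter

# Assumption for this sketch: only these codes are valid GENDER values.
VALID_GENDERS = {"M", "F"}

# Hypothetical GENDER column (assumption, not the real sample data).
genders = ["M", "F", "", "2", "X", "F", "M"]

# Empty cells are reported under the label "(blank)", as in the chart legend.
counts = Counter(g if g else "(blank)" for g in genders)

invalid = sorted(v for v in counts if v not in VALID_GENDERS)
print(counts)
print(invalid)
```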

Create Pie Chart to Visualize Data Count

You can use a pie chart to check the number of rows for every data value.

To create a pie chart:

1. On the Data page, click Chart.

2. On the Charting page:

a. Click Pie to create a pie chart.

b. Select GENDER from the Color list.

c. Select Row Count from the Size by list.

d. Click Create Chart.

A pie chart is generated, as shown in Figure 29. The GENDER column has invalid values of (blank), 2, and X.

Figure 29 Pie Chart for GENDER Column

Transforming and Cleansing Data

Transforming lets you convert data to a specified format. The following transforming methods are used:

• Transform Column, page 42

• Converting Date Format, page 43

TIBCO Clarity provides many cleansing methods. The following methods are used to cleanse the sample data:

• Trimming Spaces, page 44

• Removing Empty Rows, page 44

• Removing and Modifying Invalid Data, page 44

Transform Column

The number of digits that a column value can have is defined. By default, no symbols are used to represent a value that does not exist.

For the DX column, the defined number of digits is 3, whereas the values currently displayed in the DX column have 1 digit. You can transform the values in the DX column by using an expression.

To replace the spaces that hold no value with hyphens (-):

1. Expand DX, and then select Edit cells > Transform.

2. Enter value.replace(' ','-') in the Expression field on the Custom Text Transform on Column DX page.

3. Move the cursor in the Expression field.

4. Press Tab and select Row from the displayed menu.

5. Click OK.

As shown in Figure 30, the spaces are replaced by hyphens. One hyphen represents one digit. All values in the DX column are displayed in three digits.

Figure 30 Transformed DX Column
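The effect of the replace expression can be reproduced with an ordinary string replacement. This Python sketch is an analogy, not the Clarity expression engine, and it assumes that DX values are stored as three-character strings padded with spaces; the values are hypothetical.

```python
def fill_missing_digits(value: str) -> str:
    """Replace every space with a hyphen so each missing digit is visible."""
    return value.replace(" ", "-")


# Hypothetical DX values padded to three characters (assumption).
dx_values = ["1  ", "12 ", "123"]

transformed = [fill_missing_digits(v) for v in dx_values]
print(transformed)
```

Because each space becomes exactly one hyphen, the length of every value is unchanged.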

Converting Date Format

You can convert the date format by using the transforming function.

In Defining Data Types, the date format defined for the VISIT column is MM/dd/yyyy. After validating the data, you find that some visit dates do not conform to the validation rule.

To convert the invalid date format to the valid one (MM/dd/yyyy):

1. Expand VISIT, and then select Edit column > Transform date format.

2. Enter a name in the New column name field on the Transform Date Format Based on Column VISIT page.

3. Click Source Format.

4. Select the MM/dd/yy(1) check box.

5. Click OK.
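The conversion performed by these steps can be sketched with the Python standard library. This is an illustration under assumptions: the function name normalize_visit is hypothetical, and two-digit years follow Python's own pivot (69 to 99 map to 1969 to 1999), which may differ from the rule TIBCO Clarity applies.

```python
from datetime import datetime
from typing import Optional

TARGET_FORMAT = "%m/%d/%Y"                 # MM/dd/yyyy
SOURCE_FORMATS = ("%m/%d/%Y", "%m/%d/%y")  # already valid, then MM/dd/yy


def normalize_visit(value: str) -> Optional[str]:
    """Return the date as MM/dd/yyyy, or None when it cannot be parsed."""
    text = value.strip()
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # try the next candidate format
    return None


# Hypothetical VISIT values (assumption).
for raw in ["10/12/98", "01/05/1999", "99/99/99"]:
    print(raw, normalize_visit(raw))
```

Values that return None are not real dates in either format and need to be corrected or blanked out by hand.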

Trimming Spaces

White spaces take up space in the dataset. You can trim spaces in the data table.

To trim all the leading and trailing white spaces:

1. On the Data page, expand All, and then select Edit columns > Trim all the leading and trailing white spaces.

Removing Empty Rows

Empty rows take up space in the dataset.

To remove all empty rows in a data table:

1. On the Data page, expand All, and then select Edit rows > Remove all empty rows.
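Both cleansing operations are simple to express in code. The sketch below assumes the table is a list of rows, each row a list of strings; the function names and the sample table are hypothetical.

```python
def trim_cells(rows):
    """Strip leading and trailing white space from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def drop_empty_rows(rows):
    """Keep only rows that contain at least one non-blank cell."""
    return [row for row in rows if any(cell.strip() for cell in row)]


# Hypothetical table (assumption, not the real sample data).
table = [
    ["  001 ", "M "],
    ["", "  "],
    ["002", " F"],
]

cleaned = drop_empty_rows(trim_cells(table))
print(cleaned)
```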

Removing and Modifying Invalid Data

You can use a facet to analyze your data, as described in Faceting Data on page 35.

To remove or modify the invalid data identified in Facet to Check Integrity on page 37, choose one of the following methods:

• Remove Invalid Data, page 44

• Modify Invalid Data, page 45

Remove Invalid Data

As you see, the HR value for PATNO 321 is 900 in this example. Realistically, no patient’s heart beats 900 times in a minute.

To remove the invalid values in the HR column:

1. On the Data page, expand HR, and then select Facet > Numeric facet.

2. On the HR facet:

a. Move the slider to the range of 900.00 to 910.00.

b. Clear the Non-numeric check box.

c. Clear the Blank check box.

3. On the Data page, expand HR, and then select Edit cells > Common transforms > Blank out cells.

4. On the HR facet, click Reset. The HR value for PATNO 321 is blank.

5. Click the Flag icon next to PATNO 321.

6. Expand All, and then select Edit rows > Remove all flagged rows.

7. Click Reset. PATNO 321 is removed.
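The steps above select the rows in a numeric range, blank out the cell, flag the row, and remove flagged rows. A minimal Python sketch of that sequence follows; the row layout, the flagged field, and every row other than PATNO 321 are assumptions made for illustration.

```python
def in_range(text: str, low: float, high: float) -> bool:
    """True when the cell holds a number between low and high (inclusive)."""
    try:
        return low <= float(text) <= high
    except ValueError:
        return False  # blank or non-numeric cells are never selected


# PATNO 321 with HR 900 comes from the tutorial; the other rows are hypothetical.
rows = [
    {"PATNO": "320", "HR": "72", "flagged": False},
    {"PATNO": "321", "HR": "900", "flagged": False},
    {"PATNO": "322", "HR": "", "flagged": False},
]

# Blank out the selected cells and flag their rows.
for row in rows:
    if in_range(row["HR"], 900.0, 910.0):
        row["HR"] = ""
        row["flagged"] = True

# Remove all flagged rows.
remaining = [row for row in rows if not row["flagged"]]
print([row["PATNO"] for row in remaining])
```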

Modify Invalid Data

To modify the values of non-numeric and blank heart rates:

1. Clear the Numeric check box to show the invalid values only.

2. Hover your mouse over the value of each patient.

3. Click Edit to enter a new value for the invalid heart rate.

4. Click OK.

To modify numeric values that fall outside the valid range:

1. Move the slider to the range between 120 and 910.

2. Hover your mouse over the value of each patient.

3. Click Edit to enter a new value for the invalid heart rate.

4. Click OK.

Delivering Data

The following complete procedure shows how TIBCO Clarity works:

1. Reloading Dataset with a ZIP File, page 46

2. Applying Transformation Rules, page 47

3. Exporting Data, page 49

Reloading Dataset with a ZIP File

Because many operations have been performed on the project1 project of the Sample-patients dataset, you need to reload the sample data.

Download the sample data from http://clarity.cloud.tibco.com/console/download/Samples.zip and compress the patients data into a ZIP file named gettingStarted.zip.
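Any archive tool can create the ZIP file. If you prefer to script it, the following Python sketch shows one way; the file name patients.csv and its contents are placeholders (assumptions), because the actual name of the patients file inside Samples.zip is not stated here.

```python
import tempfile
import zipfile
from pathlib import Path


def make_zip(data_file: Path, zip_path: Path) -> Path:
    """Compress a single data file into a ZIP archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its bare name, without directory components.
        archive.write(data_file, arcname=data_file.name)
    return zip_path


with tempfile.TemporaryDirectory() as work_dir:
    # Placeholder data file standing in for the extracted patients data (assumption).
    data_file = Path(work_dir) / "patients.csv"
    data_file.write_text("PATNO,GENDER\n001,M\n", encoding="utf-8")

    zip_path = make_zip(data_file, Path(work_dir) / "gettingStarted.zip")
    with zipfile.ZipFile(zip_path) as archive:
        archive_names = archive.namelist()

print(archive_names)
```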

To reload the sample data:

1. On the Home page, click Create dataset.

2. On the Get Data From page:

a. Click My computer.

b. Click Choose file to upload the gettingStarted.zip file on the File Upload page. Click Next.

3. On the Parse File page, click Next.

4. On the Map Data page, click Next.

5. On the Sample Data page, rename the project to patients.

6. Click Done.

Figure 31 Reloaded Patients Data

Applying Transformation Rules

As described in Analyzing Data (Basic Tutorial), Validating Data, and Analyzing Data (Advanced Tutorial), some invalid data exists in the Sample-patients dataset.

To cleanse the invalid data:

1. On the Data page, expand PATNO, and then select Facet > Text facet.

2. Click Count. The number following the patient number indicates whether the patient number is duplicated. As shown, PATNO 002, 003, and 006 have duplicates. To remove the duplicate rows:

a. Click 002. Two rows of patient 002 that have identical information are displayed.

b. Click Flag on any row.

c. Expand All, and then select Edit rows > Remove all flagged rows.

Repeat Step a, Step b, and Step c for PATNO 003 and 006 to remove duplicate rows.

3. Expand GENDER, and then select Facet > Text facet. As shown, the GENDER column has an invalid value of 2. To remove the invalid value:

a. Click 2. The invalid row is displayed.

b. Expand GENDER, and then select Edit cells > Common transforms > Blank out cells to remove this value.

4. Expand VISIT, and then select Facet > Text pattern facet. As shown, some dates are not in the format of MM/dd/yyyy. To convert the invalid date format:

a. Click 99/99/99, and then update the value of VISIT to 10/12/1998.

b. Click AAAAAAAA, and then blank out the value of VISIT.

5. On the Data page, validate the HR column:

a. Expand HR, and then select Facet > Numeric facet.

b. Clear the Numeric check box to display only the non-numeric and blank data. There are 6 records in total.

c. Expand HR, and then select Edit cells > Common transforms > Blank out cells.

d. Click Flag for all blank rows.

e. Select the Numeric check box.

f. Move the slider between the ranges of 10 - 40 and 120 - 910 respectively. Ensure that all the rows that fall into these two ranges are marked with Flags.

g. Click Reset to return to the Data page.

6. On the Data page, apply the same transformation for the HR column to other numeric columns of SBP, DBP, and DX.

When you have finished these transformation steps, all invalid and incomplete patient records are flagged.

7. Expand AE, and then select Facet > Text facet.

a. Click (blank) on the AE facet. Flag the rows associated with the blank value of the AE column.

b. Click Reset to return to the Data page.

8. Expand All, and then select Edit rows > Remove all flagged rows.

9. Expand All, and then select Edit rows > Remove all validated errors rows.

Figure 32 Cleansed Patients Data
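The overall effect of these rules can be summarized in a short Python sketch: drop identical duplicate rows, then drop rows whose HR is blank, non-numeric, or out of range, and rows whose AE is blank. The valid HR range of 40 to 120 is inferred from the slider ranges above and is an assumption, as are the sample rows and function names.

```python
def hr_is_valid(text: str, low: float = 40.0, high: float = 120.0) -> bool:
    """True when HR is numeric and inside the assumed valid range."""
    try:
        return low <= float(text) <= high
    except ValueError:
        return False


def cleanse(rows):
    """Remove duplicate rows and rows that would have been flagged."""
    seen = set()
    kept = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:  # identical duplicate of an earlier row
            continue
        seen.add(fingerprint)
        if not hr_is_valid(row["HR"]) or not row["AE"].strip():
            continue  # flagged: invalid HR or blank AE
        kept.append(row)
    return kept


# Hypothetical rows (assumption, not the real sample data).
rows = [
    {"PATNO": "001", "HR": "88", "AE": "0"},
    {"PATNO": "002", "HR": "84", "AE": "0"},
    {"PATNO": "002", "HR": "84", "AE": "0"},
    {"PATNO": "004", "HR": "900", "AE": "1"},
    {"PATNO": "005", "HR": "", "AE": "0"},
    {"PATNO": "007", "HR": "72", "AE": ""},
]

cleansed = cleanse(rows)
print([row["PATNO"] for row in cleansed])
```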

Exporting Data

You can export the cleansed data to various destinations. See TIBCO Clarity User’s Guide for more details about exporting.

Here the cleansed data is exported to a database.

To export data to a database:

1. On the Data page, expand All, and then select Export > Export to database.

2. On the Export to a Database page, from the JDBC Driver list, select one of the JDBC drivers.

3. Provide the database URL, username, and password that are used to log in to the database.

4. Click TEST Connection to validate the connection. Click Next when the connection is valid.

5. On the Export to database page, click Create new table and enter a table name.

6. Click Finish.
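Conceptually, the export creates a new table and inserts the cleansed rows into it. The sketch below shows that idea with Python's built-in sqlite3 module and an in-memory database; it is a stand-in for illustration only, since TIBCO Clarity connects through a JDBC driver. The table name, columns, and rows are hypothetical.

```python
import sqlite3


def export_rows(connection, table_name, columns, rows):
    """Create a new table and insert all rows into it."""
    # Identifiers are quoted; table and column names are assumed to be trusted.
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    connection.execute(f'CREATE TABLE "{table_name}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    connection.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows
    )
    connection.commit()


# In-memory database used only for this sketch (assumption).
connection = sqlite3.connect(":memory:")
export_rows(
    connection,
    "patients_clean",
    ["PATNO", "GENDER", "HR"],
    [("001", "M", "88"), ("002", "F", "84")],
)
exported = connection.execute('SELECT COUNT(*) FROM "patients_clean"').fetchone()[0]
connection.close()
print(exported)
```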

Index

C

charting 14
creating
    a dataset and projects 5
creating dataset
    mapping data 6
    uploading data 5
creating project 8

D

deduping 24
dependency analysis 15
documentation
    third-party ix
    TIBCO products viii

E

exporting data 49

F

faceting data
    pattern faceting of string 16

I

introduction
    to customers example 4

L

look-up table 22
look-up table transformation 22

P

profiling
    advanced functionality 14
        column analysis 14
    basic functionality 10

R

reloading dataset as ZIP file 46

T

TIBCO support
    contacting xiii
TIBCO documentation xiii
TIBCOmmunity xiii
transformation of data 19
    converting data format 20
    removing columns 21
    splitting column 21

V

validation 12
