Ia Pe 2012-04-17 Inmon AboutTextualETL

Textual ETL – A Brief Description

Textual ETL is the technology that allows you to read text and place that text into a standard relational environment. Textual ETL creates context for text and places that context in a standard relational data base.

With textual ETL you can read text in any electronic form, transform the text, and load the transformed text into a standard relational data base. On the output side, the supported platforms include SQL Server, Oracle, Hadoop, Teradata and DB2/UDB. The input into textual ETL is any form of electronic text. Typical forms of input include email, documents, Hadoop, spreadsheets and any other form of electronic input. Non-electronic text can be used as input after it has been passed through OCR; textual ETL reads the output of OCR.
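
As a rough illustration of reading text and placing it in a relational table, the following Python sketch stores each word of a document as one row. It uses SQLite only because it ships with Python. The table name word_occurrence, its columns and the function load_text are assumptions made for this example; they are not part of Textual ETL, whose actual output layout is not described here.

```python
import sqlite3

def load_text(conn, doc_id, text):
    """Store each word of a document as one row: (doc_id, position, word)."""
    # Hypothetical table layout, chosen only for illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_occurrence ("
        "doc_id TEXT, position INTEGER, word TEXT)"
    )
    # Split on whitespace and lowercase each word before loading.
    rows = [(doc_id, i, w.lower()) for i, w in enumerate(text.split())]
    conn.executemany("INSERT INTO word_occurrence VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
count = load_text(conn, "email-001", "The pump failed after two days")
```

Once the rows are loaded, any tool that reads relational data can query them, for example with SELECT word, COUNT(*) FROM word_occurrence GROUP BY word.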

Once the output relational data base is created, any reporting or analytical technology that reads relational data can be used. Technologies such as Business Objects, Cognos, MicroStrategy, SAS, Crystal Reports and Tableau can all read and analyze a standard relational data base.

Input

The electronic text can be in many languages, including English, Portuguese, Spanish, French, German and Dutch. The input can come from email, tweets or other forms of social media, and it can come from a data base as well. The raw text can be well-formed language and syntax, or it can be shorthand, slang or comments. Textual ETL can handle all forms of electronic input.

Textual ETL makes use of taxonomies. Textual ETL is tightly integrated with Wand Inc. technology. As a result, Textual ETL has access to over 29,000 different taxonomies covering over 2,300,000 terms.
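
To make the idea of taxonomic resolution concrete, here is a minimal Python sketch that tags each recognized word with its broader category. The four-term taxonomy and the function resolve_taxonomy are hypothetical and written for this example only; they are not Wand data and do not represent the Textual ETL interface.

```python
# Hypothetical toy taxonomy: term -> broader category. Not Wand data.
TAXONOMY = {
    "honda": "car",
    "toyota": "car",
    "oak": "tree",
    "elm": "tree",
}

def resolve_taxonomy(text, taxonomy=TAXONOMY):
    """Return (word, category) pairs for every word found in the taxonomy."""
    pairs = []
    for raw in text.split():
        # Strip common trailing punctuation and normalize case before lookup.
        word = raw.strip(".,;:!?").lower()
        if word in taxonomy:
            pairs.append((word, taxonomy[word]))
    return pairs

result = resolve_taxonomy("She parked the Honda under an elm.")
```

With the category stored next to the word, a query can ask for every document that mentions a car without listing each make.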

Textual ETL is designed to be used and run in an iterative manner. Forest Rim has filed nine patent applications on the technology found in textual ETL.

Transformations

The raw text is passed through many transformations. Some of the many transformations that can be specified include –

- Document metadata only
- Homographic resolutions
- Taxonomic resolution
- Stop word processing
- Stemming (in English)
- Date standardization and variable identification
- Numeric transformation and variable identification
- Beginning delimiter/ending delimiter variable identification
- Custom variable formatting
- Document logical subdivision recognition
- Standard word suppression
- And so forth.

These are but a few of the transformations that can be specified by the end user.
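
Two of the transformations listed above, stop word processing and date standardization, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Textual ETL implementation: the stop word list, the accepted date formats and both function names are invented for the example.

```python
import re
from datetime import datetime

# Illustrative stop word list (an assumption, not the product's list).
STOP_WORDS = {"a", "an", "the", "of", "on", "was", "and", "to"}

# Assumed input date formats; a real system would need many more.
DATE_FORMATS = ["%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d"]

def remove_stop_words(text):
    """Return the words of text, in order, with stop words dropped."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    return [w for w in words if w.lower() not in STOP_WORDS]

def standardize_date(value):
    """Return value as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

kept = remove_stop_words("The claim was filed on the boat")
iso = standardize_date("April 17, 2012")
```

Standardizing every date to one format is what allows dates written in different styles to be compared and sorted in the output data base.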

In addition, textual ETL is scalable. The only capacity limit is the amount of hardware that you choose to use. The individual work files are created in such a way that their keys do not conflict when the work files are merged.
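
The source does not say how key conflicts between work files are prevented. One common approach, shown here purely as an assumption, is to prefix every key with an identifier that is unique to the worker that produced it, so that merging becomes a simple union. The functions build_work_file and merge_work_files are hypothetical names used for this Python sketch.

```python
def build_work_file(worker_id, documents):
    """Assign each document a key of the form '<worker_id>-<sequence>'."""
    # Keys are unique across workers as long as worker ids are unique.
    return {f"{worker_id}-{seq}": doc for seq, doc in enumerate(documents, start=1)}

def merge_work_files(work_files):
    """Merge work files into one mapping, failing loudly on any key conflict."""
    merged = {}
    for wf in work_files:
        overlap = merged.keys() & wf.keys()
        if overlap:
            raise ValueError(f"key conflict: {sorted(overlap)}")
        merged.update(wf)
    return merged

wf_a = build_work_file("A", ["contract one", "contract two"])
wf_b = build_work_file("B", ["email one"])
merged = merge_work_files([wf_a, wf_b])
```

Because each worker generates keys independently, more hardware can be added without any coordination between workers at load time.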

Other features include –

- Checkpoint start/restart
- Textual Business Intelligence
- Spreadsheet input qualification and transformation
- Transformation output file identification

Native platforms supported for the administrative work station include –

- Microsoft SQL Server, vb.net
- Teradata
- MySQL

Forest Rim offers classes and technical support.

Forest Rim is driven by the business end user. Unlike technologies that are classically IT-driven, Textual ETL is designed to be used by the business user. Its interface is oriented to the business subject matter expert (SME), not to the technician.

Forest Rim offers free proofs of concept for samples of 50 documents or fewer. For more than 50 documents, Forest Rim will do a proof of concept for a nominal fee.

Some of the many applications for which Forest Rim technology can be used include –

- Contracts management
- Oil and Gas Exploration Log analysis
- Email analysis
- Medical records analysis
- Warranty Claims automation
- Customer Sentiment analysis
- Insurance claims analysis

You can contact Forest Rim at [email protected].