Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

  • Text Analytics Workshop Applications

    Tom Reamy
    Chief Knowledge Architect
    KAPS Group
    Knowledge Architecture Professional Services
    http://www.kapsgroup.com

  • Agenda

    • Text Analytics Applications
    • Integration with Search: Faceted Navigation
    • Integration with ECM
    • Metadata
    • Auto-categorization
    • Platform for Information Applications
    • Enterprise: internal and external
    • Commercial
    • Structure for Social

  • Text Analytics and Search: Elements

    • Facet: orthogonal dimension of metadata
    • Entity / Noun Phrase: metadata value of a facet
    • Entity extraction: feeds facets, signature, ontologies
    • Taxonomy and categorization rules
    • Auto-categorization: aboutness, subject facets
    • People: tagging, evaluating tags, fine-tuning rules and taxonomy

  • Essentials of Facets

    • Facets are not categories
      • Categories are what a document is about: limited number
      • Entities are contained within a document: any number
    • Facets are orthogonal: mutually exclusive dimensions
      • An event is not a person is not a document is not a place.
    • Facets: variety of units, of structure
      • Numerical range (price), Location: big to small
      • Alphabetical, Hierarchical: taxonomic
    • Facets are designed to be used in combination
      • Wine where color = red, price = excessive, location = California, and sentiment = snotty
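
The wine example on this slide, several facets selected in combination, can be sketched as a simple filter. This is a minimal illustration, not any particular product's implementation; the catalog records and the `filter_by_facets` helper are hypothetical.

```python
# Hypothetical catalog: each record carries exactly one value per facet.
WINES = [
    {"name": "Wine A", "color": "red", "price": "excessive",
     "location": "California", "sentiment": "snotty"},
    {"name": "Wine B", "color": "red", "price": "moderate",
     "location": "California", "sentiment": "friendly"},
    {"name": "Wine C", "color": "white", "price": "excessive",
     "location": "Oregon", "sentiment": "snotty"},
]


def filter_by_facets(items, **selected):
    """Keep the items whose facet values match every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


# Facets are independent dimensions, so selections simply combine with AND.
matches = filter_by_facets(
    WINES, color="red", price="excessive",
    location="California", sentiment="snotty",
)
print([wine["name"] for wine in matches])
```

Because each facet is a separate dimension, dropping or adding one selection never requires restructuring the others.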

  • Advantages of Faceted Navigation

    • More intuitive: easy to guess what is behind each door
    • Simplicity of internal organization
      • 20 questions: we know and use
    • Dynamic selection of categories
      • Allow multiple perspectives
      • Ability to handle compound subjects
    • Systematic advantages: fewer elements
      • 4 facets of 10 nodes = 10,000 node taxonomy
    • Flexible: can be combined with other navigation elements
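
The "4 facets of 10 nodes = 10,000 node taxonomy" arithmetic can be checked directly: 40 maintained nodes yield 10 × 10 × 10 × 10 distinct combinations. The facet and value names below are placeholders invented for the check.

```python
from itertools import product

# Four hypothetical facets with ten values each.
facets = {
    f"facet_{i}": [f"value_{i}_{j}" for j in range(10)]
    for i in range(4)
}

# Nodes someone has to maintain: one per facet value.
nodes_maintained = sum(len(values) for values in facets.values())

# Distinct combinations a user can reach: one value chosen from each facet.
combinations = sum(1 for _ in product(*facets.values()))

print(nodes_maintained, combinations)
```

An equivalent single-hierarchy taxonomy would need one leaf per combination, which is where the maintenance saving comes from.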

  • Developing Facets: Tools and Techniques

    • Software Tools: Entity Extraction
      • Dictionaries: variety of entities, coverage, specialty
        • Cost of update: service or in-house
        • 50+ predefined entity types
        • 800,000 people, 700,000 locations, 400,000 organizations
      • Rules
        • Capitalization, text: Mr., Inc.
        • Advanced: proximity and frequency of actions, associations
        • Need people to continually refine the rules
    • Entities and Categorization
      • Total number and pattern of entities = a type of aboutness of the document: Bar Code, Fingerprint
      • SAS: integration of entities (concepts) and categorization
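
The simple rule cues named on this slide (capitalization plus text markers such as "Mr." and "Inc.") can be sketched with regular expressions. This is a toy illustration under stated assumptions, not how any commercial extractor works: the patterns, the title and suffix lists, and the sample sentence are all made up for the example.

```python
import re

# One or more capitalized words, e.g. "Henry Ford" or "Acme Widgets".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# A title followed by a capitalized name suggests a person.
PERSON_RE = re.compile(rf"\b(?:Mr|Mrs|Ms|Dr)\. ({NAME})")

# A capitalized name followed by a corporate suffix suggests an organization.
ORG_RE = re.compile(rf"\b({NAME}),? (?:Inc|Corp|Ltd)\.")


def extract_entities(text):
    """Return candidate people and organizations found by the two rules."""
    return {
        "person": PERSON_RE.findall(text),
        "organization": ORG_RE.findall(text),
    }


sample = "Yesterday Mr. Henry Ford met engineers from Acme Widgets Inc. in Detroit."
print(extract_entities(sample))
```

Rules this simple miss a great deal (no location is found for "Detroit", for instance), which is the slide's point about dictionaries and about people continually refining the rules.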

  • Three Environments

    • E-Commerce
      • Catalogs, small uniform collections of entities
      • Uniform behavior: buy this
    • Enterprise
      • More content, more types of content
      • Enterprise Tools: Search, ECM
      • Publishing Process: tagging, metadata standards
    • Internet
      • Wildly different amount and type of content, no taggers
      • General Purpose: Flickr, Yahoo
      • Vertical Portal: selected content, no taggers

  • Three Environments: E-Commerce

  • Enterprise Environment: When and how to add metadata

    • Enterprise Content: different world than eCommerce
      • More content, more kinds, more unstructured
      • Not a catalog to start: less metadata and structured content
      • Complexity: not just content but variety of users and activities
    • Combination of human and automatic metadata: ECM
      • Software aided: suggestions, entities, ontologies
    • Enterprise: question of balance / strategy
      • More facets = more findability (up to a point)
      • Fewer facets = lower cost to tag documents
    • Issues
      • Not enough facets
      • Wrong set of facets: business, not information
      • Ill-defined facets: too complex internal structure

  • Facets and Taxonomies

    • Enterprise Environment: Taxonomy, 7 facets
    • Taxonomy of Subjects / Disciplines:
      • Science > Marine Science > Marine microbiology > Marine toxins
    • Facets:
      • Organization > Division > Group
      • Clients > Federal > EPA
      • Instruments > Environmental Testing > Ocean Analysis > Vehicle
      • Facilities > Division > Location > Building X
      • Methods > Social > Population Study
      • Materials > Compounds > Chemicals
      • Content Type: Knowledge Asset > Proposals
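
Hierarchical facet values such as "Clients > Federal > EPA" lend themselves to prefix matching: selecting a node should also return documents tagged anywhere below it. A minimal sketch of that idea follows; the document ids and their tags are hypothetical, and only the facet paths come from the slide.

```python
def parse_path(path):
    """Split 'A > B > C' into the tuple ('A', 'B', 'C')."""
    return tuple(part.strip() for part in path.split(">"))


def is_under(tag, ancestor):
    """True when `tag` equals `ancestor` or sits below it in the hierarchy."""
    tag_parts, ancestor_parts = parse_path(tag), parse_path(ancestor)
    return tag_parts[:len(ancestor_parts)] == ancestor_parts


# Hypothetical documents, each tagged with one or more facet paths.
DOC_TAGS = {
    "proposal-17": ["Clients > Federal > EPA", "Methods > Social > Population Study"],
    "report-03": ["Clients > Federal", "Materials > Compounds > Chemicals"],
    "memo-88": ["Facilities > Division > Location > Building X"],
}


def docs_under(ancestor):
    """Ids of documents carrying at least one tag at or below `ancestor`."""
    return sorted(
        doc_id for doc_id, tags in DOC_TAGS.items()
        if any(is_under(tag, ancestor) for tag in tags)
    )


print(docs_under("Clients > Federal"))
print(docs_under("Clients > Federal > EPA"))
```

Comparing whole path segments rather than raw string prefixes avoids false matches between sibling nodes whose names merely start the same way.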

  • External Environment: Text Mining, Vertical Portals

    • Internet Content
      • Scale: impacts design and technology, speed of indexing
      • Limited control: from association of publishers, to selection of content, to none
      • Major subtypes: different rules, metadata and results
      • Complex queries and alerts
        • Terrorism taxonomy + geography + people + organizations
    • Text Mining
      • General or specific content and facets and categories
      • Dedicated tools or component of Portal: internal or external
    • Vertical Portal
      • Relatively homogeneous content and users
      • General range of questions
      • More specific targets: the document, not a web site

  • Internet Design

    • Subject Matter taxonomy: Business Topics
      • Finance > Currency > Exchange Rates
    • Facets
      • Location > Western World > United States
      • People: Alphabetical and/or Topical, Organization
      • Organization > Corporation > Car Manufacturing > Ford
      • Date: Absolute or range (1-1-01 to 1-1-08, last 30 days)
      • Publisher: Alphabetical and/or Topical, Organization
      • Content Type: list (newspapers, financial reports, etc.)
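
The Date facet on this slide offers two kinds of selection: an absolute range and a relative window such as "last 30 days". A small sketch of both tests is below. Reading "1-1-01 to 1-1-08" as January 1, 2001 through January 1, 2008 is an assumption, and "today" is pinned to a fixed date so the example is repeatable.

```python
from datetime import date, timedelta


def in_absolute_range(published, start, end):
    """True when the publication date falls inside a fixed range (inclusive)."""
    return start <= published <= end


def in_last_days(published, days, today):
    """True when the publication date falls in the `days` days up to `today`."""
    return today - timedelta(days=days) <= published <= today


today = date(2008, 1, 1)          # fixed reference date for the example
published = date(2007, 12, 15)    # hypothetical document date

print(in_absolute_range(published, date(2001, 1, 1), date(2008, 1, 1)))
print(in_last_days(published, 30, today))
print(in_last_days(date(2007, 11, 1), 30, today))
```

A relative window has to be evaluated at query time, while an absolute range can be precomputed as a fixed facet value.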

  • Integrated Facet Application: Design Issues - General

    • What is the right combination of elements?
      • Faceted navigation, metadata, browse, search, categorized search results, file plan
    • What is the right balance of elements?
      • Dominant dimension or equal facets
      • Browse topics and filter by facet
    • When to combine search, topics, and facets?
      • Search first and then filter by topics / facet
      • Browse/facet front end with a search box
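
The "search first and then filter by topics / facet" pattern can be sketched in three steps: run a keyword search, count facet values across the hits so they can be offered as filters, then narrow by the value the user picks. The documents, field names, and helper functions below are hypothetical, and the keyword match is deliberately naive.

```python
from collections import Counter

# Hypothetical indexed documents with two facets each.
DOCS = [
    {"id": 1, "text": "exchange rates for the euro",
     "content_type": "newspaper", "location": "United States"},
    {"id": 2, "text": "quarterly exchange rates report",
     "content_type": "financial report", "location": "United States"},
    {"id": 3, "text": "car manufacturing outlook",
     "content_type": "newspaper", "location": "Germany"},
]


def search(docs, query):
    """Naive keyword search: every query term must appear as a word."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d["text"].lower().split() for t in terms)]


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(d[facet] for d in results)


hits = search(DOCS, "exchange rates")                  # step 1: search
counts = facet_counts(hits, "content_type")            # step 2: offer filters
narrowed = [d for d in hits if d["content_type"] == "newspaper"]  # step 3: filter

print([d["id"] for d in hits], dict(counts), [d["id"] for d in narrowed])
```

Counting over the hits rather than the whole collection means the interface only offers facet values that would return something.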

  • Integrated Facet Application: Design Issues - General

    • Homogeneity of Audience and Content
    • Model of the Domain: broad
      • How many facets do you need?
      • More facets and let users decide
      • Allow for customization: can't define a single set
    • User Analysis: tasks, labeling, communities
      • Issue: labels that people use to describe their business and labels that they use to find information
    • Match the structure to domain and task
      • Users can understand different structures

  • Automatic Facets: Special Issues

    • Scale requires more automated solutions
      • More sophisticated rules
    • Rules to find and populate existing metadata
      • Variety of types of existing metadata: Publisher, title, date
      • Multiple implementation standards: Last Name, First / First Name, Last
    • Issue of disambiguation:
      • Same person, different name: Henry Ford, Mr. Ford, Henry X. Ford
      • Same word, different entity: Ford and Ford
    • Number of entities and thresholds per results set / document
      • Usability, audience needs
    • Relevance Ranking: number of entities, rank of facets
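
The two name problems on this slide, mixed "Last, First" / "First Last" conventions and one person appearing under several variants, can be illustrated with a small normalizer. This is a rough sketch under stated assumptions: the title list and the matching policy are invented here, and real disambiguation needs far more context than surface form.

```python
TITLES = {"mr", "mrs", "ms", "dr"}


def normalize_name(raw):
    """Return (first, last); first is None when only a surname is given."""
    raw = raw.strip()
    if "," in raw:
        # "Last, First" convention: move the surname to the end.
        last, rest = (part.strip() for part in raw.split(",", 1))
        tokens = rest.split() + [last]
    else:
        tokens = raw.split()
    # Drop honorifics such as "Mr." before picking out name parts.
    tokens = [t for t in tokens if t.rstrip(".").lower() not in TITLES]
    last = tokens[-1]
    first = tokens[0] if len(tokens) > 1 else None
    return (first, last)


def could_be_same(a, b):
    """Surface forms are compatible: same surname, and first names do not clash."""
    (first_a, last_a), (first_b, last_b) = normalize_name(a), normalize_name(b)
    return last_a == last_b and (first_a is None or first_b is None or first_a == first_b)


for variant in ["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]:
    print(variant, "->", normalize_name(variant))
```

A compatible surface form is only a candidate match. The "Ford and Ford" case (person versus company, say) cannot be resolved from the string alone and needs surrounding text or entity type.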

  • Putting it all together: Infrastructure Solution

    • Facets, Taxonomies, Software, People
    • Combine formal power with ability to support multiple user perspectives
    • Facet System: interdependent, map of domain
    • Entity extraction: feeds facets, signatures, ontologies
    • Taxonomy & Auto-categorization: aboutness, subject
    • People: tagging, evaluating tags, fine-tuning rules and taxonomy
    • The future is the combination of simple facets with rich taxonomies with complex semantics / ontologies

  • Putting it all together: Infrastructure Solution

    • Integration with ECM
      • Central Team: Metadata
        • Create dictionaries of entities
        • Develop text analytics catalogs
      • Publishing Process
        • Software suggests entities, categorization
        • Author's task is simple: answer yes or no, not think of keywords
    • Enterprise Search
      • Integrate at metadata level: build advanced presentation and refine results
      • Integrate into relevance

  • Text Analytics Platform: Multiple Applications

    • Platform for Information Applications
      • Content Aggregation
      • Duplicate Documents: save millions!
      • Text Mining: BI, CI, sentiment analysis
      • Social: Hybrid folksonomy / taxonomy / auto-metadata
      • Social: expertise, categorize tweets and blogs, reputation
      • Ontology: travel assistant, SIRI
    • Integrate with Applications
      • Text into data: predictive analytics
    • Use your Imagination!

  • New Applications in Social Media: Behavior Prediction, Telecom Customer Service

    • Problem: distinguish customers likely to cancel from mere threats
    • Analyze customer support notes
    • General issues: creative spelling, second hand reports
    • Develop categorization rules
      • First: distinguish cancellation calls (not simple)
      • Second: distinguish cancel what, one line or all
      • Third: distinguish real threats

  • New Applications in Social Media: Behavior Prediction, Telecom Customer Service

    • Basic Rule:
      (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"), (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
    • Examples (verbatim support notes, original spelling kept):
      • customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
      • cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
      • ask about the contract expiration date as she wanted to cxl teh acct
    • Combine sophisticated rules with sentiment, statistical training, Predictive Analytics, and behavior monitoring
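
The Basic Rule above can be approximated in a few lines to show how its pieces interact. The slide does not define the operators or list the concept terms, so everything here is an assumption: START_20 is read as "look only at the first 20 tokens", DIST_n as "within n tokens of each other", and the `CONCEPTS` dictionaries are invented stand-ins for the bracketed concept names.

```python
import re

# Hypothetical term lists; the slide names these concepts but not their contents.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reinstate"},
    "if": {"if"},
}


def tokenize(text):
    """Lowercase the note and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, concept_names):
    """Indexes of tokens that belong to any of the named concepts."""
    terms = set().union(*(CONCEPTS[name] for name in concept_names))
    return [i for i, token in enumerate(tokens) if token in terms]


def within(a_positions, b_positions, max_dist):
    """True when some pair of positions is at most `max_dist` tokens apart."""
    return any(abs(a - b) <= max_dist for a in a_positions for b in b_positions)


def likely_cancellation(note):
    tokens = tokenize(note)[:20]                      # assumed reading of START_20
    cancel = positions(tokens, ["cancel"])
    target = positions(tokens, ["cancel-what-cust"])
    blockers = positions(tokens, ["one-line", "restore", "if"])
    # AND(DIST_7(cancel, target), NOT(DIST_10(cancel, OR(blockers))))
    return within(cancel, target, 7) and not within(cancel, blockers, 10)


print(likely_cancellation("ask about the contract expiration date as she wanted to cxl teh acct"))
print(likely_cancellation("customer called to say he will cancell his account if the does not stop receiving a call from the ad agency."))
```

Under these assumptions the first note passes, while the second is rejected because "if" sits close to "cancell", marking a conditional threat. The original tool may well score the slide's examples differently.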

  • New Applications: Wisdom of Crowds

    • Crowd Sourcing Technical Support
    • Example: Android User Forum
    • Develop a taxonomy of products, features, problem areas
    • Develop Categorization Rules:
      • Sample post: "I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1."
      • Find product & feature: forum structure
      • Find problem areas in response, nearby text for solution
    • Automatic: simply expose lists of solutions
      • Search Based application
    • Human mediated: experts scan and clean up solutions

  • New Directions in Social Media: Text Analytics, Text Mining, and Predictive Analytics

    • Two Systems of the Brain
      • Fast, System 1, Immediate patterns (TM)
      • Slow, System 2, Conceptual, reasoning (TA)
    • Text Analytics: pre-processing for TM
      • Discover additional structure in unstructured text
      • Behavior Prediction: adding depth in individual documents
      • New variables for Predictive Analytics, Social Media Analytics
      • New dimensions: 90% of information
    • Text Mining for TA: Semi-automated taxonomy development
      • Bottom Up: terms in documents, frequency, date, clustering
      • Improve speed and quality: semi-automatic
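
The "bottom up" step, surfacing frequent terms in documents as raw material for a taxonomy, can be sketched with a document-frequency count. This is one simple reading of that bullet, not the presenter's method; the stopword list, the sample posts, and the `min_docs` threshold are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "my", "i", "do", "how", "after", "to", "and", "is"}


def candidate_terms(documents, min_docs=2):
    """Terms that appear in at least `min_docs` documents, most widespread first."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document (document frequency, not raw count).
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(terms)
    frequent = [term for term, n in doc_freq.items() if n >= min_docs]
    return sorted(frequent, key=lambda term: (-doc_freq[term], term))


posts = [
    "Battery drains fast after the update",
    "The update broke my battery widget",
    "How do I root the phone",
]
print(candidate_terms(posts))
```

The output is only a list of candidates. A person still decides which terms become taxonomy nodes, which is what makes the process semi-automatic.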

  • Questions?

    Tom Reamy
    KAPS Group
    Knowledge Architecture Professional Services
    http://www.kapsgroup.com
