Fusepool Machine Learning Framework

  • 1. Fusepool Machine Learning Framework June 25th, Brussels

  • 2. Fusepool
      o Structured Content Visualization
      o Enable personalized software
  • 3. Outline
      o Introduction to adaptive interfaces
      o Source refinement
      o Document labeling
      o Link prediction
      o Adaptive layout
      o Simple Machine Learning: Listen-Update-Predict (LUP)
      o LUP in detail for document labelling
      o Predictive Query: predictive queries
  • 4. Adaptive interfaces. Guillaume Bouchard (Xerox)
  • 5. Customization/Contextualization of interfaces
      o Known and accepted by big internet companies
      o Not easy to implement for SMEs
  • 6. Annotation tools
      o To manage large knowledge bases, there is a need for efficient interfaces for annotators
      o Web 2.0 companies are investigating these tools
      o Mixed initiative
          o A learning algorithm + human interface
      o Remark: a user can be an annotator for some time
  • 7. Supervised automation: Introduction
      o Challenge: LOD provides a huge amount of data; hard to organize
      o Goal: streamline KB cleaning and management through implicit and explicit feedback
      o Specifications
          o Easy tagging of documents
          o Near real-time prediction
  • 8. Adaptive components in Fusepool
      o Document category prediction
      o Entity labeling
      o Source refinement (re-ranking based on previous user clicks)
      o Adaptive layout
  • 9. Simple Machine Learning: Listen-Update-Predict (LUP). Guillaume Bouchard (Xerox)
  • 10. Motivation: adaptive systems
      o Many systems use machine learning algorithms as internal components
      o The interaction between raw data, annotations, algorithms and predictions is not simple:
          o Data: large and distributed (the 3 Vs: Velocity, Variety, Volume)
          o Algorithms: multiple possible algorithms for the same task, slow training/inference
          o Visualization: must carry the uncertainty about data, annotations and predictions
      o Common problems:
          o Confusion between predictions and data
          o Models not automatically updated (models are re-trained manually)
          o No simple way to test new algorithms
          o Annotations not shared across models in the same system
          o Too few annotations in specific domains (no principled way to gather new annotations)
  • 11.
    Prior art
      o Patterns (and Anti-Patterns) for Developing Machine Learning Systems. SysML 2008. https://www.usenix.org/legacy/event/sysml08/tech/rios_talk.pdf
      o The Agent Learning Pattern: Implementing ML algorithms in multiagent systems. http://www.cs.cmu.edu/~alberto/papers/LearningPatternSugarLoaf.pdf
      o Gestalt, a general-purpose integrated development environment designed for the application of machine learning. Kayur Patel (University of Washington). http://www.acm.org/uist/archive/adjunct/2010/pdf/doctoral_consortium/p355.pdf
      o Scikit-learn. Three complementary interfaces: estimator, predictor, transformer. http://hal.inria.fr/docs/00/85/65/11/PDF/paper.pdf
      o Infer.NET: probabilistic programming. Compilation of machine learning code. http://research.microsoft.com/en-us/um/people/cmbishop/downloads/bishop-mbml-2012.pdf
      o Never-Ending Language Learning (NELL). The closest to our work, but focused on language. www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf
  • 12. Never Ending Language Learning
      o Intelligent computer agent
      o Runs forever. Every day:
          1. extract, or read, information from the web
          2. learn to perform this task better
      o Carlson, Betteridge, Kisiel, Settles, Hruschka and Mitchell (2010) give the design principles for such an agent
  • 13. Machine learning process
  • 14. LUPI Module overview
      o Listen: gets notified when new annotations arrive
      o Update: processes annotations and updates the learning models
      o Predict: exposes a prediction service available to other components
      o Investigate: actively asks for new annotations
  • 15. LUP modules are monitored by the Fusepool main platform
  • 16. LUP Module Implementation
      o LUPEngine is a Java interface
      o Location: com.xerox.services.LUPEngine
          o + getGraphListener(...);
          o + graphChanged(...);
          o + updateModels(...);
          o + predict(...);
  • 17. Guillaume Bouchard (Xerox)
  • 18.
    Supervised automation: follow the LUP
      o Listen
          o Users give labels to documents in the GUI
          o Labels are stored in the annotation store
      o Update
          o Optimize the model with the latest annotations
          o Warm-start the machine learning algorithms
      o Predict
          o Real-time prediction based on the updated model
          o Visible in the GUI
  • 19. Supervised automation: Architecture
      o Components
      o Process
  • 20. Supervised automation: Xerox web services
      o Update and prediction using a REST interface
      o Scaling up prediction to huge datasets
  • 21. Listen

      private class MyListener implements GraphListener {
          public void graphChanged(List<GraphEvent> list) {
              /**
               * Listener method: called when matching modifications are detected on
               * the Annostore. This method triggers the learning process, using
               * the updateModels(HashMap params) method.
               */
              annostore = tcManager.getMGraph(ANNOTATION_GRAPH_NAME);
              for (GraphEvent e : list) {
                  log.info("New #MyKindOfAnnotation !");
                  HashMap<String, String> params = new HashMap<String, String>();
                  // 1.) Accessing the target of the annotation
                  Iterator<Triple> it = annostore.filter(e.getTriple().getSubject(),
                          new UriRef("http://www.w3.org/ns/oa#hasTarget"), null);
                  // 2.) Accessing the content as text of the target
                  // e.g. the new word to insert into the dictionary
                  Resource target = it.next().getObject();
                  it = annostore.filter((NonLiteral) target,
                          new UriRef("http://www.w3.org/2011/content#chars"), null);
                  String newWord = it.next().getObject().toString();
                  params.put("newWord", newWord);
                  updateModels(params);
              }
          }
      }

  • 22. Update

      public void updateModels(HashMap<String, String> params) {
          /**
           * This method updates the learning models.
           */
          String newWord = params.get("newWord");
          log.info("Adding " + newWord + " to dictionary");
          myDictionnary.add(newWord);
      }

  • 23. Predict

      HashMap<String, String> params = new HashMap<String, String>();
      String docURI = "";
      /**
       * We build the parameters to give to the L3.4 via the predictionHub
       */
      params.put("docURI", docURI);
      /**
       * We call the LUP34.predict(...) method via the predictionHub.predict(...)
       * method
       */
      String predictedLabels = predictionHub.predict("LUP34", params);
      /**
       * We dump the result of the prediction
       */
      log.info(predictedLabels);
      /**
       * "tissue__0.713##sodium__0.09135##English__0.016"
       */

  • 24. Supervised automation: Multi-task learning services
      o Better prediction based on a multi-task algorithm with label embedding
      o Efficient learning algorithms
          o Alternating optimization
          o Stochastic Gradient Descent
      o Efficient storage based on Cassandra
  • 25. Supervised automation: Sequence diagram
      1. The GUI inserts annotations
      2. The Listener calls the LUP3.4 Module
      3. The LUP calls the REST API
      4. Then the information flows back when doing prediction
  • 26. Supervised automation: Properly tested interface

      Corpus       20 Newsgroups          WebKB                  Cade
      Tolerance    1      2      3        1      2      3        1      2
      Rank = 20    0.152  0.074  0.05     0.15   0.055  0.035    0.348  0.222
      Rank = 50    0.16   0.072  0.052    0.2    0.085  0.04     0.386  0.266
      Rank = 100   0.256  0.166  0.126    0.335  0.18   0.11     0.134  0.072

  • 27. Predictive queries. Guillaume Bouchard (Xerox)
  • 28. Motivation for predictive queries
      o Most prediction problems can be expressed as a query on missing information.
      SELECT ?n WHERE