Phd Props Version 4


  • 7/31/2019 Phd Props Version 4


    CISUC has six research groups:

    G1: Cognitive and Media Systems
    G2: Adaptive Computation
    G3: Software and Systems Engineering
    G4: Communications and Telematics
    G5: Information Systems
    G6: Evolutionary and Complex Systems

    The list of PhD proposals is organized by group.

    Each proposal has a reference Gx.y (x is the number of the group; y is the number of the proposal within that group).


    G1: Cognitive and Media Systems

    http://cisuc.dei.uc.pt/csg/

    PhD Thesis Proposal: G1.1

    Title: Human readable ATP proofs in Euclidean Geometry

    Keywords: Automatic Theorem Proving, Axiomatic Proofs in Euclidean Geometry.

    Supervisor: Prof. Pedro Quaresma ([email protected])

    Summary: Automated theorem proving (ATP) in geometry has two major lines of research: the axiomatic proof style and the algebraic proof style (see [6], for instance, for a survey). Algebraic methods reduce geometric properties to algebraic properties expressed in terms of Cartesian coordinates. These methods are usually very efficient, but the proofs they produce do not reflect the geometric nature of the problem, and they give only a yes/no conclusion. Axiomatic methods attempt to automate traditional geometry proof methods that produce human-readable proofs. Building on top of existing work (namely GCLCprover [5, 4, 8, 9, 10], the area method [1, 2, 3, 7, 8, 11], and ATPs dealing with construction [6]), the goal is to build an ATP capable of producing human-readable proofs, with a clean connection between the geometric conjectures and their proofs.
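The reduction to Cartesian coordinates described above can be illustrated with a minimal sketch. It is not taken from GCLCprover or any other prover named here; the function names are hypothetical. It expresses the geometric property "three points are collinear" as the algebraic condition that the signed area of their triangle is zero, using exact rational arithmetic.

```python
from fractions import Fraction


def signed_area(a, b, c):
    """Signed area of triangle abc (positive when a, b, c run counter-clockwise).

    Points are (x, y) pairs with integer or Fraction coordinates, so the
    result is exact.
    """
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    return Fraction(cross, 2)


def collinear(a, b, c):
    """Geometric property 'a, b, c are collinear' as an algebraic test."""
    return signed_area(a, b, c) == 0


print(signed_area((0, 0), (4, 0), (0, 3)))   # 6
print(collinear((0, 0), (1, 1), (2, 2)))     # True
```

A check on concrete coordinates like this is only numeric evidence for one configuration; an algebraic prover works with symbolic coordinates, which is also why its output is a yes/no answer rather than a readable proof.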

    References:
    [1] Shang-Ching Chou, Xiao-Shan Gao, and Jing-Zhong Zhang. Automated production of traditional proofs for constructive geometry theorems. In Moshe Vardi, editor, Proceedings of the Eighth Annual IEEE Symposium on Logic in Computer Science LICS, pages 48-56. IEEE Computer Society Press, June 1993.
    [2] Shang-Ching Chou, Xiao-Shan Gao, and Jing-Zhong Zhang. Automated generation of readable proofs with geometric invariants, I. Multiple and shortest proof generation. Journal of Automated Reasoning, 17:325-347, 1996.
    [3] Shang-Ching Chou, Xiao-Shan Gao, and Jing-Zhong Zhang. Automated generation of readable proofs with geometric invariants, II. Theorem proving with full-angles. Journal of Automated Reasoning, 17:349-370, 1996.
    [4] Predrag Janicic and Pedro Quaresma. Automatic verification of regular constructions in dynamic geometry systems. In Proceedings of the ADG06, 2006.
    [5] Predrag Janicic and Pedro Quaresma. System description: GCLCprover + GeoThms. In Ulrich Furbach and Natarajan Shankar, editors, IJCAR 2006, LNAI. Springer-Verlag, 2006.
    [6] Noboru Matsuda and Kurt Vanlehn. Gramy: A geometry theorem prover capable of construction. Journal of Automated Reasoning, (32):3-33, 2004.
    [7] Julien Narboux. A decision procedure for geometry in Coq. In Proceedings TPHOLS 2004, volume 3223 of Lecture Notes in Computer Science. Springer, 2004.
    [8] Pedro Quaresma and Predrag Janicic. Framework for constructive geometry (based on the area method). Technical Report 2006/001, Centre for Informatics and Systems of the University of Coimbra, 2006.
    [9] Pedro Quaresma and Predrag Janicic. GeoThms - geometry framework. Technical Report 2006/002, Centre for Informatics and Systems of the University of Coimbra, 2006.
    [10] Pedro Quaresma and Predrag Janicic. Integrating dynamic geometry software, deduction systems, and theorem repositories. In J. Borwein and W. Farmer, editors, MKM 2006, LNAI. Springer-Verlag, 2006.
    [11] Jing-Zhong Zhang, Shang-Ching Chou, and Xiao-Shan Gao. Automated production of traditional proofs for theorems in Euclidean geometry I. The Hilbert intersection point theorems. Annals of Mathematics and Artificial Intelligence, 13:109-137, 1995.


    PhD Thesis Proposal: G1.2

    Title: Formal languages in the knowledge base management

    Keywords: Formal language, knowledge base, knowledge base inconsistency, knowledge base management.

    Supervisor: Prof. Maria de Fátima Gonçalves ([email protected])

    Summary: Knowledge base management is understood as the acquisition and normalization of new knowledge and the confrontation of this knowledge with existing knowledge, resolving potential conflicts and updating the knowledge base. When updating a knowledge base, several problems may arise, among them redundancy of the updated knowledge and inconsistency of the knowledge base. Formal languages can be used successfully to develop an innovative system for knowledge base management that resolves potential updating problems.

    PhD Thesis Proposal: G1.3

    Title: Image classification and retrieval based on stylistic and aesthetic criteria

    Keywords: Content Based Image Retrieval, Computational Aesthetics, Artificial Intelligence

    Supervisor: Prof. Penousal Machado ([email protected])

    Summary: The increasing volume of digital multimedia content, both online and offline, coupled with the unstructured nature of the World Wide Web (WWW), makes the need for appropriate classification and retrieval techniques more pressing than ever. As a result, there is a growing interest in Content Based Image Retrieval (CBIR), which is clearly demonstrated by the (exponentially) increasing number of research papers in this area. The automatic classification of images according to stylistic and aesthetic criteria would allow: image browsers and search engines to take into account the user's aesthetic preferences; online artwork sites to tailor their offer to match the implicit preferences revealed by the previous purchases of a specific user; online museums to reorganize their exhibitions according to user preferences, thus offering personalized virtual tours; digital cameras to make suggestions regarding photographic composition. Additionally, this type of system could be used to automatically index image databases or even, if coupled with an image generation system, to create images of a particular style or possessing certain aesthetic qualities.


    As the title indicates, the main goal of this thesis is the development of techniques for stylistic and aesthetic based image classification and retrieval, which is a relatively unexplored area. Focusing on our own research efforts, in [1] we used a subset of the features proposed in this project and an ANN classifier for author identification. To the best of our knowledge, [3] was the first computational system that dealt with aesthetic classification and/or evaluation tasks. Works such as [3,4,5] explore the use of an autonomous image classifier in the context of evolutionary art.

    This thesis will be conducted in the Cognitive Media Systems Group of CISUC in close collaboration with the RNASA Laboratory of the University of A Coruña and is an integral part of the research project TIN2008-06562/TIN.

    A one-year renewable scholarship is available.

    1. Machado, P. and Romero, J. and Ares, M. and Cardoso, A. and Manaris, B., "Adaptive Critics for Evolutionary Artists", 2nd European Workshop on Evolutionary Music and Art, Coimbra, Portugal, April 2004
    2. Machado P, Cardoso A. Computing aesthetics. In: Oliveira F, editor. XIVth Brazilian Symposium on Artificial Intelligence SBIA98. LNAI Series. Porto Alegre, Brazil: Springer; 1998. p. 219-294
    3. Machado, P. and Romero, J. and Cardoso, A. and Santos, A., "Partially Interactive Evolutionary Artists", New Generation Computing, Special Issue on Interactive Evolutionary Computation, H. Takagi, January 2005
    4. Machado, P. and Romero, J. and Santos, A. and Cardoso, A. and Pazos, A., "On the development of evolutionary artificial artists", Computers & Graphics, Vol. 31, # 6, pp. 818-826, Elsevier, December 2007
    5. Machado P, Romero J, Manaris B. Experiments in computational aesthetics. The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music. Springer, Romero, J. and Machado, P. (eds). Natural Computing Series. 2007. pp 381-415

    PhD Thesis Proposal: G1.4

    Title: Artificial Intelligence Approaches to Artistic Creativity

    Keywords: Artificial Intelligence, Computational Art

    Supervisor: Prof. Penousal Machado ([email protected])

    Summary: Artistic behavior is one of the most celebrated qualities of the human mind. Although artistic manifestations vary from culture to culture, dedication to artistic tasks is common to all. In other words, artistic behavior is a universal trait of the human species [1]. The current, Western definition of art is relatively new. However, a dedication to artistic endeavors, such as the embellishment of tools, body ornamentation, or the gathering of unusual, arguably aesthetic, objects, can be traced back to the origins of humanity. That is, art is ever-present in human history and prehistory [2]. In the words of Leonardo da Vinci, "Art is the Queen of all sciences communicating knowledge to all the generations of the world."

    We consider that creativity, emotion, the perception of beauty and artistic behavior are fundamental aspects of Intelligence and that, as such, Artificial Intelligence approaches that ignore these aspects miss an important part of what makes us human. Additionally, "If machines could understand and affect our perceptions of beauty and happiness, they could touch people's lives in fantastic new ways" (Hugo Liu).

    As the title indicates, this thesis seeks: to grasp a deeper understanding of artistic creative behavior; to study and develop models that may capture essential aspects of beauty, emotional response and creativity; and ultimately to develop intelligent agents that implement these models. To pursue this goal the candidate will be integrated in the multifaceted team of researchers of the Cognitive Media Systems Group of CISUC, which possesses vast experience in fields such as: Computational Aesthetics, Evolutionary Computation, Artificial Neural Networks, Music Information Retrieval, Creative Systems and Computational Art.

    1. Romero J and Machado P. The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music (eds). Springer. Natural Computing Series. 2007. pp 381-415
    2. Dissanayake E, Homo Aestheticus, University of Washington Press, 1995
    3. Machado P, Romero J, Manaris B. Experiments in computational aesthetics. The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music. Springer, Romero, J. and Machado, P. (eds). Natural Computing Series. 2007. pp 381-415

    PhD Thesis Proposal: G1.5

    Title: Self-Adaption and Evolution of Bio-Inspired Algorithms

    Keywords: Adaptation, Evolution, Bio-Inspired Algorithms, Complexity Science

    Supervisors: Prof. Penousal Machado ([email protected]), Dr. Jorge Tavares ([email protected])

    Summary: In spite of some performance improvements in biologically inspired techniques, such as evolutionary algorithms and swarm intelligence, biological knowledge has advanced faster than our ability to incorporate novel ideas from the life sciences into these methods. Given that nature has inspired several different kinds of optimization and learning algorithms, we consider that it is still a source of improvements and new techniques.

    Achieving competitive results usually requires the development of problem-specific operators and representations, as well as parameter fine-tuning [1,2]. As a result, much of the research practice in Bio-Inspired Algorithms focuses on these aspects.

    Following the research work done on this topic [3,4], this thesis should contribute to the study and design of nature-inspired methods that can adapt themselves to the problem they are solving [3-5]. The evolution of components such as representation, operators and parameters [3] may contribute to performance improvements, give insight into the idiosyncrasies of particular problems, alleviate the burden on researchers when designing bio-inspired algorithms, and push the frontiers of problem-solving.


    References:
    1. Eiben, A.E., and Smith, J.E., "Introduction to Evolutionary Computing", Natural Computing Series, Springer, 2007.
    2. Beyer, H.-G., and Meyer-Nieberg, S., "Self-Adaptation in Evolutionary Algorithms", in F. Lobo, C. Lima, and Z. Michalewicz, editors, Parameter Setting in Evolutionary Algorithms, 47-75, Springer, Berlin, 2007.
    3. Tavares, J. and Machado, P. and Cardoso, A. and Pereira, F. B. and Costa, E., "On the Evolution of Evolutionary Algorithms", in Proc. of EuroGP 2004, 7th European Conference on Genetic Programming, Coimbra, Portugal, April 2004.
    4. Machado, P. and Tavares, J. and Cardoso, A. and Pereira, F. B. and Costa, E., "Evolving Creativity", Computational Creativity Workshop, 7th European Conference in Case Based Reasoning, Madrid, August 2004.
    5. Oltean, M., "Evolving Evolutionary Algorithms Using Linear Genetic Programming", Evolutionary Computation Journal, MIT Press, 13, 3, 387-410, 2005.

    PhD Thesis Proposal: G1.6

    Title: Case-Based Hierarchical-Task Network Planning

    Keywords: Planning, Case-based Planning, Decision-theoretic planning

    Supervisor: Prof. Luís Macedo ([email protected])

    Summary:

    Hierarchical-Task Network (HTN) planning is a planning methodology that is more expressive than STRIPS-style planning. Given a set of tasks that need to be performed (the planning problem), the planning process decomposes them into simpler subtasks until primitive tasks, or actions that can be directly executed, are reached. Methods provided by the domain theory indicate how tasks are decomposed into subtasks. However, for many real-world domains it is hard to collect enough methods to completely model the generation of plans. For this reason, an alternative approach based on cases of methods has been used in combination with methods.

    Real-world domains are usually dynamic and uncertain. In these domains actions may have several outcomes, some of which may be more valuable than others. Planning in these domains requires special techniques for dealing with uncertainty. This has been one of the main concerns of planning research in recent years, and several decision-theoretic planning approaches have been proposed and used successfully, some based on extensions of classical planning and others on Markov Decision Processes. In these decision-theoretic planning frameworks, actions are usually probabilistic conditional actions, preferences over the outcomes of the actions are expressed in terms of a utility function, and plans are evaluated in terms of their expected utility. The main goal is to find the plan or set of plans that maximizes an expected utility function, i.e., to find the optimal plan.
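The two ideas in this summary, method-based task decomposition and plan selection by expected utility, can be sketched as follows. This is a minimal illustration under stated assumptions, not the planner proposed here: the task names, the `METHODS` table, the `Outcome` class and the example plans are all hypothetical, each compound task has exactly one method, and the utility of a plan is assumed to be the sum of the expected utilities of its actions.

```python
from dataclasses import dataclass

# Hypothetical domain theory: each compound task maps to one method,
# i.e. an ordered list of subtasks. Tasks absent from the table are primitive.
METHODS = {
    "deliver_package": ["load_truck", "drive_to_destination", "unload_truck"],
    "drive_to_destination": ["plan_route", "follow_route"],
}


def decompose(task, methods):
    """Recursively expand a task until only primitive tasks remain."""
    if task not in methods:
        return [task]
    plan = []
    for subtask in methods[task]:
        plan.extend(decompose(subtask, methods))
    return plan


@dataclass(frozen=True)
class Outcome:
    probability: float  # chance that this outcome occurs
    utility: float      # value assigned to the outcome by the utility function


def expected_utility(actions):
    """Expected utility of a plan given as a list of actions.

    Each action is a list of Outcome objects whose probabilities sum to 1.
    Assumption: utilities are additive across actions.
    """
    return sum(o.probability * o.utility for outcomes in actions for o in outcomes)


def best_plan(plans):
    """Name of the plan with the highest expected utility."""
    return max(plans, key=lambda name: expected_utility(plans[name]))


# Hypothetical candidate plans, each with a single probabilistic action.
PLANS = {
    "direct_route": [[Outcome(0.5, 10.0), Outcome(0.5, 0.0)]],
    "safe_route": [[Outcome(1.0, 6.0)]],
}

print(decompose("deliver_package", METHODS))
print(best_plan(PLANS))
```

In this toy example the risky plan has an expected utility of 5.0 and the safe plan 6.0, so the safe plan is selected even though the risky plan has the higher best-case outcome.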


    In this thesis, a planner that combines the technique of decision-theoretic planning with the methodology of HTN planning should be built in order to deal with uncertain, dynamic, large-scale real-world domains [Macedo & Cardoso, 2004]. Unlike in regular HTN planning, cases of plans should be used instead of methods for task decomposition. The planner should generate a variant of an HTN (a kind of AND/OR tree of probabilistic conditional tasks) that expresses all the possible ways to decompose an initial task network.

    References:
    Macedo, L. and A. Cardoso (2004). Case-Based, Decision-Theoretic, HTN Planning. Advances in Case-Based Reasoning: Proceedings of the 7th European Conference on Case-Based Reasoning. P. Calero and P. Funk. Berlin, Springer: 257-271.
    Macedo, L. The Exploration of Unknown Environments by Affective Agents. PhD Thesis, 2006.

    PhD Thesis Proposal: G1.7

    Title: Collaborative Multi-Agent Exploration of 3-D Dynamic Environments

    Keywords: Exploration, multi-agent systems

    Supervisor: Prof. Luís Macedo ([email protected])

    Summary:

    Exploration gathers information about the unknown. Exploration of unknown environments by artificial agents (usually mobile robots) has been an active research field [Macedo & Cardoso, 2004]. The exploration domains include planetary exploration (e.g., Mars or lunar exploration), the search for meteorites in Antarctica, volcano exploration, map-building of interiors, etc. Several exploration techniques have been proposed and tested in both simulated and real, indoor and outdoor environments, using single or multiple agents. The main advantage of multi-agent approaches is that they avoid covering the same area with two or more agents. However, there is still much to be done, especially in dynamic environments such as those mentioned above.

    Moreover, real environments consist of objects. For example, office environments contain chairs, doors, garbage cans, etc., and cities comprise several kinds of buildings (houses, offices, hospitals, churches, etc.), cars, etc. Many of these objects are non-stationary, that is, their locations may change over time. This observation motivates research on a new generation of mapping algorithms, which represent environments as collections of objects. At a minimum, such object models would enable a robot to track changes in the environment. For example, a cleaning robot entering an office at night might realize that a garbage can has moved from one location to another. It might do so without the need to learn a model of this garbage can from scratch, as would be necessary with existing robot mapping techniques.


    This thesis addresses the problem of finding multi-agent strategies for the collaborative exploration of unknown, 3-D, dynamic environments. The strategy or strategies should be tested against other exploration strategies found in the literature.

    References:

    Macedo, L. The Exploration of Unknown Environments by Affective Agents. PhD Thesis, 2006.

    Macedo, L. and A. Cardoso (2004). Exploration of Unknown Environments with Motivational Agents. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems. N. Jennings and M. Tambe. New York, IEEE Computer Society: 328-335.

    PhD Thesis Proposal: G1.8

    Title: Simulating Consciousness in Computers

    Keywords: Consciousness, Affect, Emotion

    Supervisor: Prof. Luís Macedo ([email protected])

    Summary:

    Consciousness is a characteristic of the mind generally regarded to comprise qualities such as subjectivity, self-awareness, sentience, sapience, and the ability to perceive the relationship between oneself and one's environment. Some researchers attempt to explain consciousness directly in neurophysiological or physical terms, while others offer cognitive theories of consciousness whereby conscious mental states are reduced to some kind of representational relation between mental states and the world. There are a number of such representational theories of consciousness currently on the market, including higher-order theories, which hold that what makes a mental state conscious is that the subject is aware of it in some sense.

    We generally agree that human beings are conscious, and that much simpler life forms, such as bacteria, are not. Many of us attribute consciousness to higher-order animals such as dolphins and primates. Academic research is investigating the extent to which animals are conscious. This suggests the hypothesis that consciousness has co-evolved with life, which would require it to have some sort of added value, especially survival value. People have therefore looked for specific functions and benefits of consciousness. Bernard Baars (1997), for instance, states that "consciousness is a supremely functional adaptation" and suggests a variety of functions in which consciousness plays an important, if not essential, role: prioritization of alternatives, problem solving, decision making, brain processes recruiting, action control, error detection, planning, learning, adaptation, context creation, and access to information. António Damásio (1999) regards consciousness as part of an organism's survival kit, allowing planned rather than instinctual responses. He also points out that awareness of self allows a concern for one's own survival, which increases the drive to survive, although how far consciousness is involved in behaviour is an actively debated issue.

    The possibility of machine (or robot) consciousness has intrigued philosophers and non-philosophers alike for decades. Could a machine really think or be conscious? Could a robot really subjectively experience the smelling of a rose or the feeling of pain? Several tests, such as those based on the Turing Test, have been developed which attempt to provide an operational definition of consciousness and try to determine whether computers and non-human animals can demonstrate through their behavior, by passing these tests, that they are conscious.

    The goal of this thesis is to simulate consciousness in computers based on one or more theories of human consciousness.

    PhD Thesis Proposal: G1.9

    Title: Bridging the Gap Between Web 2.0 Collaborative Environments and the Semantic Web

    Keywords: Ontologies, Semantic Web, Web 2.0, Collaborative and Social Environments

    Supervisor: Prof. Paulo Gomes ([email protected])

    Summary:

    New kinds of highly popular user-centered applications, such as blogs, folksonomies, and wikis, have come to be known as "Web 2.0". The reason for their immediate success is that no specific skills are needed to participate. These new kinds of tools not only provide data but also generate a lot of weakly structured metadata. One perfect example is tagging. Here users add tags to a resource, which can be seen as a kind of metadata. Tags are supposed to describe the resource from the user's point of view. Such metadata is easy to produce, but it lacks the kind of formal grounding used in the Semantic Web.

    On the other hand, the Semantic Web complements the described bottom-up effort of the Web 2.0 community in a top-down manner, as its central elements are a fixed vocabulary, typed relations and a stronger knowledge representation based on some kind of ontology. Such structure is typically something users have in mind when they provide their information, but for researchers it is hidden in the data and needs to be extracted. Techniques to analyze network structures or weak knowledge representations like those found in the Web 2.0 have a long tradition in other disciplines, such as social network analysis, machine learning or data mining. These kinds of automatic mechanisms are necessary to extract the hidden information and to reveal the structure in a way that the Semantic Web community can benefit from, and thus provide added value to the end user. Conversely, the established way to represent knowledge gained from the unstructured data can be beneficial for the Web 2.0, in that it provides Web 2.0 users with enhanced Semantic Web features to structure their data.

    The aim of this thesis is to bridge the gap between the Semantic Web and the Web 2.0 environments. Since both ideas have in common the improvement of search and semantics in the web, the combination of these techniques is an important step towards a more intelligent web as Tim Berners-Lee envisioned [Berners-Lee, T., J. Hendler, and O. Lassila, The Semantic Web. Scientific American, 2001. 284(5): p. 34-43]. Techniques can be, but are not limited to, social network analysis, graph analysis, machine learning, ontology learning, text mining or web mining methods [Peter Mika. Ontologies are us: A unified model of social networks and semantics. Journal of Web Semantics 5 (1), page 5-15, 2007].
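As a small illustration of how structure hidden in tagging data might be surfaced, the sketch below counts how often pairs of tags are assigned to the same resource. It is only one simple network-analysis step, not the method of this proposal; the resources, tags and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical folksonomy data: resource -> set of tags added by users.
TAGGED_RESOURCES = {
    "page1": {"python", "programming", "tutorial"},
    "page2": {"python", "programming"},
    "page3": {"cooking", "tutorial"},
}


def tag_cooccurrence(tagged_resources):
    """Count, for every pair of tags, the resources that carry both.

    Pairs are stored in alphabetical order so (a, b) and (b, a) coincide.
    """
    counts = Counter()
    for tags in tagged_resources.values():
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


counts = tag_cooccurrence(TAGGED_RESOURCES)
print(counts.most_common(1))  # [(('programming', 'python'), 2)]
```

Frequently co-occurring pairs are candidates for related concepts; turning them into typed relations of an ontology would need further steps that this sketch does not attempt.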

    PhD Thesis Proposal: G1.10

    Title: Intelligent Knowledge Management using the Semantic Web

    Keywords: Semantic Web, Knowledge Management, Artificial Intelligence, Ontologies

    Supervisor: Prof. Paulo Gomes ([email protected])

    Summary: Nowadays, companies gather and store large amounts of information in databases. This information represents potentially high-value knowledge for a company. But most of this information or data is not transformed into knowledge and remains lost in databases or document repositories. Software development is a knowledge-intensive activity involving several types of know-how and skills. Development teams usually have several members, which makes the sharing and dissemination of knowledge crucial for project success. One evolving technology that can be used to build knowledge management tools for the software development area is the semantic web. Semantics are the missing link between information/data and knowledge, and the semantic web provides the infrastructure needed to make true sharing of knowledge possible.

    The semantic web is an infrastructure providing semantics associated with words in web resources. By itself, however, it does not provide a tool for knowledge management. What is needed are tools that enable the semantic web to be used in an intelligent way, so that users can take advantage of knowledge sharing. The main problem to be dealt with in this thesis is how a team of software development engineers can be aided by a tool, or a set of tools, that enables them to reuse knowledge more efficiently, thus increasing their productivity.


    The main objective of this thesis is to develop a set of tools based on the semantic web. These tools are intended to have a set of intelligent characteristics, such as learning, proactive reasoning, semantic searching and retrieval of knowledge, representation of knowledge, knowledge acquisition, personalization, and others. Several reasoning methods have been developed in Artificial Intelligence and are ideal candidates to be used in this research work. The expected results of this research include new algorithms and methodologies for knowledge management.

    PhD Thesis Proposal: G1.11

    Title: A Markov Logic Reasoning Engine for the Semantic Web

    Keywords: Semantic Web, Markov Logics, Ontologies

    Supervisor: Prof. Paulo Gomes ([email protected])

    Summary:

    A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula, and can be viewed as a template for constructing Markov networks. From the point of view of probability, MLNs provide a compact language to specify very large Markov networks, and the ability to flexibly and modularly incorporate a wide range of domain knowledge into them. From the point of view of first-order logic, MLNs add the ability to soundly handle uncertainty, tolerate imperfect and contradictory knowledge, and reduce brittleness. Many important tasks in statistical relational learning, like collective classification, link prediction, link-based clustering, social network modeling, and object identification, are naturally formulated as instances of MLN learning and inference.
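The proposal does not write out the distribution an MLN defines. For orientation, the standard formulation from the MLN literature assigns a possible world x a probability that grows with the total weight of the formula groundings it satisfies. The symbols below are the usual ones and are not taken from this text: w_i is the weight attached to formula i, n_i(x) is the number of true groundings of formula i in world x, and Z is the normalizing constant.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_{i} w_i \, n_i(x') \Big)
```

A world that violates a formula is therefore less probable rather than impossible, which is what lets an MLN tolerate imperfect and contradictory knowledge; as a weight grows, the formula behaves more and more like a hard logical constraint.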

    The semantic web is an infrastructure providing semantics associated with words in web resources. The foundations of the semantic web are ontologies and description logics, which by themselves do not deal well with uncertainty. MLNs have the ability to deal with both worlds: logic and statistics. The main problem to be dealt with in this thesis is how to build a reasoning engine for the semantic web infrastructure using MLNs. The applications of this reasoning engine are numerous, from natural language processing to text mining and web mining, to the core of knowledge management applications.


    PhD Thesis Proposal: G1.12

    Title: Semantic Mining of Software Resources

    Keywords: Semantic Mining, Web Mining, Semantic Web, Software Reuse, Knowledge Management

    Supervisor: Prof. Paulo Gomes ([email protected])

    Summary:

    Semantic Mining (Berendt et al. 2002, Stumme et al. 2006) combines Semantic Web technologies with Web Mining in a way that both contribute to the enrichment and discovery of new knowledge. These two areas can be combined in the following ways: extracting semantics from the Web; using semantics for Web Mining; and mining the Semantic Web. Extracting semantics from the Web requires the use of Web Mining techniques to extract the semantics that are present in page content and structure, by learning, mapping and merging ontologies in the Web. Another way to extract semantics from the Web is Web Usage Mining, which explores user navigation paths and actions to infer new knowledge. Using semantics for Web Mining comprises the improvement of Web Mining results by exploiting the ontologies and other semantic structures that are present in the Semantic Web. This can be especially important for the sharing of knowledge among communities in the same scientific area, thus making a real web of knowledge. Mining the Semantic Web is also a way of gathering and finding new knowledge from an already organized structure, which can be important at a more abstract and complex level of reasoning.

    All the techniques described can be used at a local level, such as a project intranet or a particular knowledge domain. The aim of this thesis is to use Semantic Mining to explore and enhance the knowledge associated with software development, so that it can be shared within an organization that develops software. The main idea is to apply Semantic Mining techniques to software repositories, so that new knowledge is extracted, stored and indexed, to be reused by software engineers in the development of new software or the maintenance of already developed systems.

    References:
    Berendt, B., Hotho, A., Stumme, G. (2002). Towards semantic web mining. In Horrocks, I., Hendler, J.A., eds.: The Semantic Web. Proceedings of the First International Semantic Web Conference, Springer. 264-278.

    G. Stumme, A. Hotho, and B. Berendt, Semantic web mining: State of the art and future directions, Web Semantics: Science, Services and Agents on the World Wide Web, vol. 4, no. 2, pp. 124-143, June 2006.


    PhD Thesis Proposal: G1.13

    Title: Algorithms for Semantic Annotation of Positioning Information

    Keywords: locations, places, positioning systems, location based services

    Supervisors: Prof. Francisco Câmara ([email protected])

    Prof. Carlos Bento ([email protected])

    Summary: Although we find today a myriad of positioning technologies (from the common GPS to wireless, GSM cell or Ultra Wide Band positioning algorithms), interpreting what exactly a position means is still cumbersome. For example, the information that "we are at latitude 30.123N and longitude 4.234W" or "my current GSM cell ID is 1098" is poor in terms of meaning for a user. Information such as "I am in Morocco", "my current location is in Coimbra" or "I am at work" is clearly richer and useful for a wealth of applications. This is known as the "From Position to Place" problem (Hightower, 2003) and is currently a hot topic in the Ubiquitous Computing area. The primary goal of this PhD project is to study and develop methodologies that can contribute to solving the problem just described. The approach will likely take into account the user model, context and social interaction. This work is one of the central research topics of the Ubiquitous Systems Group of the AILab and has high potential for application in a range of state-of-the-art ubiquitous systems.
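The simplest form of going from a position to a place is a nearest-neighbour lookup in a table of named locations, sketched below. This is a naive baseline under stated assumptions, not the methodology this project aims to develop: the `PLACES` table, its approximate coordinates, the function names and the 25 km cut-off are all hypothetical, and the user model, context and social interaction mentioned above are ignored.

```python
import math

# Hypothetical gazetteer: place name -> approximate (latitude, longitude) in degrees.
PLACES = {
    "Coimbra": (40.2056, -8.4196),
    "Lisbon": (38.7223, -9.1393),
    "Porto": (41.1579, -8.6291),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_place(position, places, max_km=25.0):
    """Name of the closest known place, or None if none lies within max_km."""
    name = min(places, key=lambda n: haversine_km(position, places[n]))
    return name if haversine_km(position, places[name]) <= max_km else None


print(nearest_place((40.21, -8.42), PLACES))  # Coimbra
print(nearest_place((0.0, 0.0), PLACES))      # None
```

A lookup like this can only return labels such as "Coimbra"; personal places such as "at work" depend on who is asking, which is why the proposal points to user models and context.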

    PhD Thesis Proposal: G1.14

    Title: YouTrace: Collaborative Map Generation

    Keywords: Ubiquitous Computing; Map Making; Map Matching; GPS traces; Intelligent Transport Systems

    Supervisors: Francisco Cmara Pereira and Ana AlmeidaIn the YouTrace project, we propose to develop a social networking platform for sharinglocalization (GNSS) traces. Working on inspirations from well-known Web2.0applications (e.g. Wikipedia, YouTube), the vision is to provide a platform where users

    voluntarily share their localization traces in order to get services that improve theirquality of life and allow for social interaction. Such platform must be responsible foraggregating those traces into a collaborative Map of the World, providing socialnetworking services and adding intelligent data analysis tools to support decisionmaking, both at the level of the individual user as well as of the urban transport policymaker.


    This PhD thesis should focus on the development of efficient algorithms for Aggregation, Update, Filtering and Map Matching of the incoming GPS traces. In CMS, work on these algorithms has already started and the incoming PhD student will be integrated into a very motivated team.
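To make the Map Matching step concrete, here is a minimal sketch of its most basic geometric form: snapping one GPS point to the closest road segment. It assumes points have already been projected to planar coordinates (for example metres in a local grid) and that the road network is a dictionary of straight segments; both the data layout and the function names are hypothetical, and this is not the algorithm under development in the group.

```python
import math


def project_onto_segment(p, a, b):
    """Return the point on segment a-b that is closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a  # degenerate segment: both endpoints coincide
    # Position of the orthogonal projection along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def match_point(p, segments):
    """Return (segment_id, snapped_point, distance) for the segment nearest to p."""
    best = None
    for seg_id, (a, b) in segments.items():
        q = project_onto_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[2]:
            best = (seg_id, q, d)
    return best
```

Matching each point independently ignores the order of the trace; practical map matchers usually also use heading, speed and road connectivity to pick a consistent path.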

    PhD Thesis Proposal: G1.15

    Title: Individual Mobility Optimization from Trace Analysis

    Keywords: Spatial Data Analysis; GPS Traces; Artificial Intelligence; Intelligent Transport Systems

    Supervisor: Francisco Câmara Pereira

    Summary:

    The current widespread use of GPS (and other localization technology) receivers is leading to the generation of very large amounts of movement data. Due to privacy and security concerns, it is understandable that such data is scarcely available unless good added value is provided to the contributor. Starting with individual uses that guarantee privacy yet allow value-added services (for example, a person who keeps his/her own movement traces could analyze the efficiency of his/her mobility in terms of time/cost/fuel spent), this thesis should focus on the study of algorithms for the analysis of movement traces. After the algorithms focusing on individual use, the thesis should also approach the analysis of aggregated sets of traces, enabling, for example, the extraction of movement patterns in the city.

    PhD Thesis Proposal: G1.16

    Title: Data Analysis in the City

    Keywords: Spatial Data Analysis; Data Fusion; Intelligent Transport Systems

    Supervisor: Francisco Câmara Pereira

    Summary:

    Within the CityMotion project (a collaboration with MIT, IST and FEUP), a number of different kinds of data inputs from the city are expected to be available, for example from taxi fleets, cell phone usage and traffic detectors. This thesis intends to focus on the search for algorithms that extract new information out of this data, particularly information that can only be obtained by the fusion of two or more of these sources. The student will learn and work with Spatial Data Analysis algorithms, Data Fusion techniques, and any other options that seem promising.


    PhD Thesis Proposal: G1.17

    Title: Alternative specification and visualization representations in initial programming learning

    Keywords: computer science education; programming learning; alternative representations.

    Supervisor: Prof. Maria José Marcelino ([email protected])

    Summary:
    The main objective of this thesis is to study, propose and validate, on the one hand, new alternative forms of representation for algorithm and program specification and, on the other, new alternative ways of visualizing algorithms and programs to support initial programming learning, and to evaluate their impact on the quality of students' learning.

    Initial programming learning is quite hard for the majority of students. It is usually supported by one (or more) of three typical modes of algorithm/program representation: pseudocode, flowcharts and code in a specific programming language. Concerning algorithm/program visualization, several approaches have also been used: variable logs, debugging aids, and simulated algorithm/program animation. Each student has her/his own preferences about these representation and visualization metaphors.

    There are particular types of programming problems that are mandatory in initial programming learning and for which typical student solutions (good as well as erroneous) have been identified. We believe that, although final programs must be coded in one particular programming language, during the initial learning stages many programming students could benefit from the study and implementation of diverse alternative solution representations as well as visualizations, especially if they are closer to students' previous experience and context.

    In the scope of this thesis, students' preferred alternative representations, both at the level of algorithm and program specification and of results visualization, will be identified and evaluated. New forms will then be developed and proposed in order to cope with the difficulties that students most commonly encounter and that traditional approaches cannot deal with. These new forms will afterwards be the object of thorough evaluation.

    PhD Thesis Proposal: G1.18

    Title: Learning communities to support initial programming learning

    Keywords: computer science education; programming learning; learning communities.

    Supervisor: Prof. António José Mendes ([email protected])

    Summary:
    Initial programming learning is known to be a hard task for many novice students at college level, leading to high failure and dropout rates in many courses. Many reasons can be found for this scenario and several approaches have been proposed to facilitate students' learning. However, problems continue to exist and it is necessary to investigate new solutions that may help programming students and teachers.

    The concept of learning communities has existed for some time. It has been presented as a way to create rich learning contexts where teachers, students and other people, namely experts, can coexist and collaborate in the production of knowledge, consequently leading to learning enhancement.

    This thesis proposal includes first the study of representative successful cases of learning communities and their characteristics, and then the study, proposal and creation of a learning communities support platform specially adapted to the needs of students during programming learning. The platform and its use will undergo a full evaluation, in order to assess its success in promoting programming learning. The platform is expected to include innovative characteristics, for example virtual members that may interact with real members when necessary, and specially tailored features and tools that may improve the quality of programming learning.

    PhD Thesis Proposal: G1.19

    Title: Problem solving patterns and remediation strategies in programming learning

    Keywords: computer science education; programming learning; learning communities.

    Supervisors: Prof. António José Mendes ([email protected])
    Prof. Maria José Marcelino ([email protected])

    Summary:
    Initial programming learning is known to be a difficult task for many novice students at college level. In those courses it is common to use a set of typical problems to introduce students to basic programming concepts and to stimulate them to develop their first programs and programming skills. This work is essential, since it should allow beginners to develop the basic programming problem-solving skills that will be further developed and refined later. This first learning stage is therefore crucial to students' performance in all programming-related courses.

    This thesis proposal includes a study of the different ways students approach these typical basic problems, leading to the identification of common problem-solving patterns. Some of these patterns will be adequate, while others will not lead to the development of correct solutions; the latter are considered wrong or erroneous patterns that must be identified and corrected in students' strategic knowledge. Based on this information, the main objective of the thesis will be the proposal, implementation and evaluation of methods and/or tools that may identify novice students' strategies, categorize typical wrong patterns and common errors, and interact with students, giving personalized remediation feedback when necessary. The forms of this feedback must also be studied, so that it is effective not only in helping students to solve the current problem, but mainly in helping them to develop better approaches that may lead to correct solutions in later problems and learning stages.


    PhD Thesis Proposal: G1.20

    Title: Cognitive skills to programming learning

    Keywords: computer science education; cognitive skills; motivation; problem solving; programming learning

    Supervisors:
    Prof. António José Mendes ([email protected])
    Prof. Ana Cristina Almeida, FPCE ([email protected])

    Summary:

    In recent years programming courses have become more and more difficult for many students. High failure and dropout rates are evidence of those difficulties. However, it is common to find in the same course novice students with many learning difficulties side by side with others who are able to learn programming basics without too much effort. This often results in a very unbalanced situation where a good number of novices get high grades while the remainder fail. Medium grades are not as common in programming courses as they are in other courses.

    This situation can have several causes, such as different backgrounds, cognitive skills, motivations, interests or study methods. Possibly all these aspects are relevant and play a role in students' learning capacities. That is why it is interesting and relevant to compare these aspects between fast-learning and slow-learning students in the context of initial programming courses (and possibly also between expert programmers and non-programmers). Can we determine the cognitive skills most relevant to programming learning? Can we define a taxonomy that includes the most important cognitive skills necessary for programming learning? Can we evaluate or develop (technology-based) instruments that allow programming teachers to better know their students' characteristics and needs? Can we find ways (possibly technology-based) to help students develop the skills they do not have?

    The main objective of this thesis will be to provide answers to the above research questions. This means that it will include a diagnosis phase and an intervention phase. The first includes an in-depth study of students' cognitive characteristics and skills, comparing students who have programming learning difficulties with students who learn programming without major difficulties, working in the same context. Ideally this phase should end with the proposal of a taxonomy of the cognitive skills necessary to learn programming. The second phase will be based on the analysis of the results of the field study and should propose strategies and/or tools that may help students to develop the necessary skills. Ideally, this proposal should be evaluated to verify its validity.


    PhD Thesis Proposal: G1.21

    Title: Mathematical skills and programming learning: the ability to solve problems

    Keywords: computer science education; mathematics; problem solving; programming learning

    Supervisors:
    Prof. António José Mendes ([email protected])
    Prof. Ana Maria Almeida ([email protected])

    Summary:

    Initial programming learning is quite hard for many students. Although several factors may contribute to this situation, the lack of basic mathematical proficiency is probably one of the most relevant. In fact, some preliminary studies made by our research group established that programming learning difficulties are often accompanied by a deep lack of basic mathematical concepts.

    However, it is not clear which of the basic mathematical concepts and cognitive competencies are the most important for developing the needed programming skills, or even whether the development of those concepts and abilities has a direct impact on programming learning. If this is the case, how can we develop tools that enable students to rapidly acquire these competencies in the context of basic programming education?

    The main objective of this thesis will be to provide answers to the above research questions. This means that it will include both a diagnosis phase and an intervention phase. The first phase (field study) requires a comparison of mathematical knowledge and skills, contrasting results from students with programming learning difficulties with results from students who learn programming without major difficulties, working in the same context. Another interesting study would be to consider groups of programming experts and novices. The second phase will be based on the analysis of the results of the field study and should conclude with a proposal of specific teaching and learning strategies that may be applied in the context of programming courses and that may lead to an improvement in the learning results of many students. Ideally, this proposal should be instantiated and evaluated so as to ascertain its validity.


    PhD Thesis Proposal: G1.22

    Title: Experimental Web-based Mathematical Learning

    Keywords: Experimental learning, learning communities, mathematical learning, collaborative learning

    Supervisors: Prof. Ana Maria de Almeida ([email protected])
    Prof. Maria José Marcelino ([email protected])

    Summary:

    Many actors in the learning process claim that one of the major causes of students' failure in science subjects in general, and in Maths in particular, is the lack of immediate application of concepts to real situations, which, in many instances, cannot be obtained through the usual class book exercises. But how can we bring real-world problems into the classroom? There seems to be an easy answer: simply use the Web and the tools developed for the Information Age! This theme intends to devise a model, in the form of guidelines, and produce a case study of implementing experimental E-learning in schools to promote the scientific method and the successful apprehension of elementary mathematical concepts. Towards this goal, it is necessary to identify some key mathematical competences for grades 5 to 9 of Portuguese elementary schools, and activities centred on an appealing theme based on real science and intended for collaborative learning. The proposed activities should use computer-based tools and be devised so that they can be done in the classroom using the Web. They should also allow collaborative interaction with distant users (other students in different schools). After the study in the chosen schools, it should be evaluated so that the necessary conclusions and inferences can be drawn.

    PhD Thesis Proposal: G1.23

    Title: Entropy Optimization with Information Theory

    Keywords: Shannon's Entropy, Kolmogorov Complexity, Information Measure, NP-completeness

    Supervisor: Prof. Ana Maria de Almeida ([email protected])

    Summary:

    A very simple concept lies at the core of most applications of Information Theory results: Shannon's Entropy. It not only describes an optimal transmission rate or a safety assurance for encryption; its results, corollaries and uses go far beyond this. In the vast majority of applications it is fundamental to maximize an entropy function in order to derive the required results. But this optimization is rarely trivial or well known, since it involves incomplete information. In particular, this is the case for the analysis of NP-hard problem instances. This thesis proposes the study and development of a Maximum Entropy Diagnosis tool for information characterization in the presence of incomplete knowledge.
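For readers unfamiliar with the quantity being maximized, the following minimal sketch computes Shannon's entropy of a discrete distribution, H(p) = -sum_i p_i log2 p_i, in bits. The function name is chosen for this example only. With no constraints other than the probabilities summing to one, the uniform distribution attains the maximum, which is the simplest instance of the maximum entropy principle mentioned above.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities.

    Terms with zero probability contribute nothing, following the usual
    convention that 0 * log(0) is taken as 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, a fair coin has an entropy of one bit, while a biased coin with probabilities 0.9 and 0.1 has less, reflecting the lower uncertainty about its outcome.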

    PhD Thesis Proposal: G1.24

    Title: Similarity Measures and Applications

    Keywords: Kolmogorov Complexity, Normalized distances, Machine learning, Clustering, Universal Similarity Metric

    Supervisor: Prof. Ana Maria de Almeida ([email protected])

    Summary:

    A recurring problem within knowledge-based approaches is the need to identify patterns and, moreover, to apply recognition tools, which requires a similarity measure. But how can we measure similarities between two genotypes, two computer programs or two echocardiographic lines?

    This theme intends to study similarity distance measures useful for data mining, pattern recognition, learning and automatic semantic extraction. After a state-of-the-art survey, the focus of this study should be on contrasting two very different approaches: that of the Normalized Information Distance (based on Kolmogorov Complexity) versus the more common Optimization strategies, and on deriving guidelines for choosing the more adequate approach for specific applications like the ones mentioned above.
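Kolmogorov Complexity is not computable, so the Normalized Information Distance is usually approximated in practice by replacing it with the output size of a real compressor, giving the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The sketch below uses Python's standard `zlib` as the compressor C; the choice of compressor and the function names are assumptions made for this illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they have little in common.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Because the measure only needs a compressor, the same function can be applied unchanged to genotypes, program source code or sampled signals once they are serialized to bytes, although the quality of the result depends heavily on how well the compressor models the data.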

    PhD Thesis Proposal: G1.25

    Title: Melody Detection in Polyphonic Audio

    Keywords: music information retrieval, melody detection in polyphonic audio.

    Supervisor: Prof. Rui Pedro Paiva ([email protected])

    Summary:


    Melody extraction from polyphonic audio is a research area of increasing interest in Music Information Retrieval (MIR). It has a wide range of applications in various fields, including music information retrieval (particularly in query-by-humming, where the user hums a tune to search a database of musical audio), automatic melody transcription, performance and expressiveness analysis, extraction of melodic descriptors for music content metadata, and plagiarism detection, to name but a few. This area has become increasingly relevant in recent years, as digital music archives are continuously expanding. The current state of affairs presents new challenges to music librarians and service providers regarding the organization of large-scale music databases and the development of meaningful methods of interaction and retrieval.

    Several different approaches have been proposed in recent years, most of them evaluated and compared in the corresponding track of the Music Information Retrieval Evaluation eXchange (MIREX, a small competition that takes place every year).

    In [Paiva, 2006], the problem of melody detection in polyphonic audio was addressed following a multistage approach, inspired by principles from perceptual theory and musical practice. The system comprises three main modules: pitch detection, determination of musical notes (with precise temporal boundaries, pitches, and intensity levels), and identification of melodic notes.

    The main objective of this thesis is to build on the work carried out in [Paiva, 2006] to tackle several open issues in the developed system, namely: derive a more efficient pitch detector, improve note determination in the presence of complex dynamics such as strong vibrato, address the current limitation in the melody/accompaniment discrimination task, improve the reliability of melody detection in signals with a lower signal-to-noise ratio, add top-down information flow to the system (e.g., the effect of memory and expectations), add context information (e.g., piece tonality, rhythmic information), augment the song evaluation database, etc.
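To give a feel for what a pitch detection module does, the sketch below estimates the fundamental frequency of a single-voice frame by picking the lag with the highest autocorrelation. This is a textbook monophonic technique used here purely for illustration; it is not the pitch detector of [Paiva, 2006], which has to cope with several simultaneous sources. The function name and the default search range are assumptions.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a monophonic frame.

    The lag whose autocorrelation is largest, searched between the periods
    corresponding to fmax and fmin, is taken as the pitch period.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def sine_frame(freq, sample_rate, length):
    """Helper for experiments: a pure sine tone of the given frequency."""
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(length)]
```

In polyphonic audio several periodicities overlap, so a single autocorrelation peak is no longer enough, which is one reason the open issues listed above include a more efficient and robust pitch detector.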

    References:

    Rui Pedro Paiva, Melody Detection in Polyphonic Audio, PhD Thesis, Department of Informatics Engineering, University of Coimbra, Portugal, 2006.

    PhD Thesis Proposal: G1.26

    Title: Audio Fingerprinting and Music Identification

    Keywords: music information retrieval, music identification, audio fingerprinting.

    Supervisor: Prof. Rui Pedro Paiva ([email protected])

    Summary:

    Music identification systems aim to recognize songs based on their playback in moderately noisy environments. In most current platforms (e.g., Shazam, Gracenote MusicID, 411-Song), you dial the number of the service provider with your cell phone, hold your phone towards the source of the music for a few seconds (from 3 to 20, depending on the provider) and then wait for a message containing the identification of the song (artist, title, etc.). Such applications are based on audio fingerprinting techniques, where an individual signature is extracted for each song in the database and then compared with the fingerprint computed for the query sample.

    Present challenges in the area include the identification of songs in disturbed conditions (e.g., noisy environments or poor recordings) or using only a few seconds of audio for matching.

    The main objective of this thesis is to improve the state of the art in music identification by investigating and extending the current techniques and proposing new approaches to the problem (e.g., hashing and search techniques, feature extraction approaches, etc.).

    References:

    - Eugene Weinstein and Pedro Moreno (2007). Music Identification with Weighted Finite-State Transducers. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2007.

    - Jaap Haitsma and Ton Kalker (2002). A highly robust audio fingerprinting system. Proceedings of the 3rd International Conference on Music Information Retrieval.

    - Avery Wang (2003). An industrial-strength audio search algorithm. Proceedings of the 4th International Conference on Music Information Retrieval, invited talk.

    PhD Thesis Proposal: G1.27

    Title: Audio Music Mood Analysis

    Keywords: music information retrieval, music mood analysis.

    Supervisor: Prof. Rui Pedro Paiva ([email protected])

    Summary:

    Audio music mood-based classification is a research area of increasing interest in Music Information Retrieval (MIR). It has a wide range of applications in fields such as automatic music classification, playlist generation and similarity analysis.

    In fact, recent studies identify music mood/emotion as an important criterion used by people in music retrieval and organization. Moreover, music psychology and education recognize the emotion component of music as the one most strongly associated with music expressivity.

    The analysis of audio music in terms of mood/emotion is challenging by its very nature: mood is a subjective notion, techniques are still at an embryonic stage and a uniform evaluation framework is yet to be agreed upon. Nevertheless, the Music Information Retrieval Evaluation eXchange (MIREX, a small competition that takes place every year) includes in 2007, for the first time, a track on audio music mood classification, which will certainly give a strong impulse towards the improvement of techniques and the creation of evaluation standards.

    The main objective of this thesis is to analyze audio music in terms of mood content information, e.g., contentment, depression, exuberance, anxiety. This involves the study and derivation of mood-like features, and the development of mood-based classifiers and mood-based similarity metrics. This can be further applied to mood-based music recommendation systems.

    The PhD candidate will have the opportunity to work in a cutting-edge research area with several open and exciting research possibilities, with plenty of room for scientific innovation.

    References:

    - Juslin P.N., Karlsson J., Lindström E., Friberg A. and Schoonderwaldt E. (2006). Play It Again With Feeling: Computer Feedback in Musical Communication of Emotions. Journal of Experimental Psychology: Applied, Vol. 12, No. 2, pp. 79-95.

    - Lu, Liu and Zhang (2006). Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech and Language Processing, Vol. 14, No. 1, pp. 5-18.


    G2: Adaptive Computation

    http://cisuc.dei.uc.pt/acg/

    PhD Thesis Proposal: G2.1

    Title: Adaptive Mining for Detecting Trends in Evolving Data Sets

    Keywords: Web mining, machine learning, and pattern recognition.

    Supervisor: Prof. Bernardete Ribeiro ([email protected])

    Summary:

    A wide range of applications require the analysis of underlying data that is generated by a non-stationary process, i.e., a process that evolves over time. Recently, the discovery of trends as data streams in has become a major challenge. Examples of such data include Web click-streams, network traffic monitoring, trade surveillance for security fraud and money laundering, dynamic tracing of stock fluctuations, biomedical signal monitoring, climate data from satellite measurements, financial time series, etc. As most decision-making tasks rely on how up to date their supporting data is, the evolving nature of the data creates tremendous complexity for many mining algorithms. As most of the existing data mining techniques assume that the underlying data is generated by stationary processes, such techniques may not be suitable for analyzing evolving data sets. On the other hand, users are often interested in the changes embodied by the data. To this end, the goal of this research is to develop mining algorithms that are more effective and efficient in view of changing data characteristics, and that can extract patterns describing these changes. Techniques developed in this research are expected to be applied to a wide variety of applications, including Web mining, monitoring of biomedical signals and others.
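As a simple picture of what detecting change in a non-stationary stream involves, the sketch below compares the mean of the most recent samples with the mean of the samples just before them and reports a change when the two differ by more than a threshold. This two-window heuristic, its parameter values and its function name are assumptions made for illustration only; it is far simpler than the techniques this proposal aims to develop.

```python
from collections import deque
from statistics import fmean


def detect_mean_shifts(stream, window=10, threshold=2.0):
    """Yield the indices at which the stream's local mean has shifted.

    Two adjacent sliding windows are maintained: `recent` holds the newest
    `window` samples and `reference` the `window` samples before them. A shift
    is reported when their means differ by more than `threshold`; both windows
    are then reset so that the detector adapts to the new regime.
    """
    reference = deque(maxlen=window)
    recent = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(recent) == window:
            # The oldest recent sample ages into the reference window.
            reference.append(recent[0])
        recent.append(x)
        if len(reference) == window and abs(fmean(recent) - fmean(reference)) > threshold:
            yield i
            reference.clear()
            recent.clear()
```

The window length trades detection delay against robustness to noise: short windows react quickly but raise more false alarms, which is the kind of trade-off an evaluation benchmark for evolving data has to measure.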

    Proposal:
    This PhD will comprise the following initial tasks:
    (1) Study the state of the art of existing methods for temporal data mining;
    (2) Construct a benchmark of a non-stationary data set;
    (3) Develop techniques for clustering, classification and detecting frequent patterns from data;
    (4) Build accurate models for evolving data;
    (5) Develop techniques for detecting changes in evolving data;
    (6) Evaluate and determine performance measures;
    (7) Propose a general framework to detect trends in evolving data sets.


    PhD Thesis Proposal: G2.2

    Title: Learning From Heterogeneous Data Sources

    Keywords: Machine Learning, Clustering, Data Mining

    Supervisor: Prof. Bernardete Ribeiro ([email protected])

    Summary:
    In the last decade we have witnessed a dramatic growth in the availability of data from a variety of sources, along with an increasing diversity of data types. For example, bioinformatics tasks can exploit protein sequences, gene expression profiles and ontologies; image classification tasks can use data collected from different sensors; and there are countless more. These heterogeneous data sets allow a researcher to represent different characteristics of a sample, exceeding the capability of a more homogeneous data set. The questions of how to deal with this heterogeneity and how to weight the importance of different sources of data and information remain to be solved. Recent empirical experiments have shown that by using heterogeneous features it is possible to increase overall performance and to obtain significant gains in systems' detection rates.

    While these heterogeneous data sets are plentiful in a variety of machine learning applications, including biomedicine, image processing, web mining, goal detection, and business, to name a few areas, conventional machine learning algorithms may be limited by the general underlying assumptions that the available training data is drawn from a single source and that each sample is represented by a single vector of variables. Therefore, more sophisticated learning and data fusion methods are necessary to make the best possible use of heterogeneous data sets.

    The main theme of this research is learning from heterogeneous sources of data, and learning in semi-supervised settings, exploring novel learning approaches such as kernel methods, ensemble approaches and feature selection/extraction methods.

    Proposal:
    This PhD will comprise the following initial tasks:
    (1) Study the state of the art of existing methods for heterogeneous data mining;
    (2) Construct a benchmark of a heterogeneous data set;
    (3) Develop techniques for clustering, classification and detecting patterns from heterogeneous data;
    (4) Build supportive underlying assumptions and accurate models for heterogeneous data;
    (5) Evaluate and determine performance measures;
    (6) Propose a general framework for target detection in multivariate heterogeneous data sets.


    PhD Thesis Proposal: G2.3

    Title: Assigning Confidence Score in Page Ranking for Intelligent Web Search

    Keywords: Graph Mining, Machine Learning, Ranking, Text Mining

    Supervisor: Prof. Bernardete Ribeiro ([email protected])

    Summary:
    The Web has become the main centre of research around the globe. Users face an overload of data when a simple search is fed into Google or a similar web search engine. A recurrent problem is to unveil the desired information from the wealth of available search results. Ranking, which can be achieved by providing a meaningful score for each classification decision, is important in most practical settings. For instance, text retrieval systems typically produce a ranking of documents and let a user decide how far down that ranking to go. Several machine learning techniques allow the definition of scores or confidences coupled with their classification decisions. The main idea of the current proposal is to explore ranking systems based on Bayesian graph-based data mining, which has recently attracted a high level of attention due to its broad range of applications. The basic idea is to devise a ranking approach able to quantify the importance of a node as the degree to which it has direct and indirect relationships with other nodes in a graph. Moreover, classification systems can be improved by enriching information and information representation with external background information, such as ontology-related data.

    Evaluation can be done on benchmarks, but also with real users defining the goals and assessing the final results, including score changes in the final ranking. Visual examples and applications will be provided to demonstrate the effectiveness of the approaches.

    Summary:
    The Web has become the main centre of research around the globe. Users face an overload of data when a simple search is fed into Google or a similar web search engine. A recurrent problem is to unveil the desired information from the wealth of available search results. Ranking, which can be achieved by providing a meaningful score for each classification decision, is important in most practical settings. For instance, text retrieval systems typically produce a ranking of documents and let a user decide the search depth. Most of the current approaches use machine learning techniques that allow the definition of scores or confidences coupled with classification decisions. Moreover, these classification systems can be improved by enriching information (and information representation) with external background information, such as ontology-related data. Graph-based data mining is a recently emerging approach able to quantify the importance of a node as the degree to which it has direct and indirect relationships with other nodes in a graph. The most popular web ranking system is based on this method, which is still sub-optimal.
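One well-known way to score a node by its direct and indirect relationships is a PageRank-style power iteration over the link graph. The sketch below is a minimal version under stated assumptions: the graph is a dictionary from each page to the pages it links to, the damping factor is the commonly quoted 0.85, and pages without outgoing links spread their score evenly. The function name and data layout are hypothetical, and this is not the ranking approach the thesis is expected to produce.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a directed graph by power iteration.

    `links` maps each node to a list of nodes it points to. The returned
    scores are non-negative and sum to 1.
    """
    nodes = set(links) | {target for targets in links.values() for target in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its score over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

A score of this kind depends only on link structure; attaching a confidence to each ranking decision, as the title of this proposal suggests, would require combining it with content-based classifiers.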


    The main idea of the current proposal is to devise a ranking approach based on the combination of machine learning techniques and graph mining (e.g. Graph Clustering, Graph Kernels, etc.), joining the advantages of both systems. Evaluation will be done on real benchmarks and on synthetic data created by graph generators, but also with real users defining the goals and assessing the final results, including score changes in the final ranking.

    Proposal

    This PhD will comprise the following initial tasks: (1) Study the state of the art of existing methods for page ranking on the web; (2) Construct a benchmark data set, either from the web or based on graph generators; (3) Develop techniques for clustering, classification and pattern detection from data; (4) Build supportive underlying assumptions and accurate ranking models for web data; (5) Evaluate and determine performance measures; (6) Propose a general framework for ranking on web data sets.

    PhD Thesis Proposal: G2.4

    Title: Homecare Diagnosis of Pediatric Obstructive Sleep Apnea

    Keywords: homecare; obstructive sleep apnea; reduction of complexity; biosignal processing; computational intelligence; automatic diagnosis.

    Supervisor: Prof. Jorge Henriques ([email protected])

    Summary: The main goal of this work is to investigate homecare solutions that could stratify normal and apnea events for diagnostic purposes in children suspected of having obstructive sleep apnea syndrome.

    Obstructive sleep apnea syndrome (OSAS) is a condition whereby recurrent episodes of airway obstruction are associated with asphyxia and arousal from sleep. It is estimated to affect between 1 and 3% of young children, and its potential consequences include excessive daytime somnolence, behavioral disturbances and learning deficits, pulmonary and systemic hypertension, and growth impairment. The currently accepted method for diagnosis of OSAS is overnight polysomnography (PSG), done in sleep laboratories, where multiple signals are collected by means of a face mask, scalp electrodes, chest bands, etc. It monitors different activities, including brain waves (EEG), eye movement (EOG), muscle activity (EMG), heartbeat (ECG), blood oxygen levels and respiration. However, the diagnosis of OSAS from this huge collection of data is sometimes not straightforward for clinicians, since the major relations between features and consequents are most often very high dimensional, non-linear and complex. These requirements impose the necessity of innovative signal processing techniques and computational intelligence methodologies for data interpretation, such as neural networks and fuzzy systems. One of the main goals of this work is to provide clinicians with tools that can help them in their diagnosis.


    Although PSG is considered the gold standard for diagnosis of OSAS, given the relatively high medical costs associated with such tests and the insufficient number of pediatric sleep laboratories, PSG is not readily accessible to children in all geographic areas. Thus, the validity of alternative diagnostic approaches should be analysed, even assuming their accuracy is suboptimal. The second goal of this work points in this direction. It aims to investigate the feasibility of reducing the number and complexity of measurements in order to make possible the stratification of OSAS in the children's natural environment.

    PhD Thesis Proposal: G2.5

    Title: Architectures and algorithms for real-time learning in interpretable neuro-fuzzy systems

    Keywords: on-line learning; neuro-fuzzy systems; interpretability; machine learning

    Supervisor: Prof. António Dourado ([email protected])

    Summary: The development of fuzzy rules for knowledge extraction from data acquired in real time needs new recursive clustering techniques to produce well-designed fuzzy systems. For Takagi-Sugeno-Kang (TSK) systems this applies mainly to the antecedents, while for Mamdani-type systems it applies to the fuzzy sets of both the antecedents and the consequents. To increase the post-interpretability of the fuzzy rules, so that some semantics may be deduced from them, pruning techniques should be developed to allow a human-interpretable labelling of the fuzzy sets in the antecedents and consequents of the rules. For this purpose, convenient similarity measures between fuzzy sets and techniques for merging fuzzy rules should be developed and applied. The applications envisaged are in industrial processes and medical fields.
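As an illustration of what a similarity measure between fuzzy sets and a merging step might look like, the sketch below uses the fuzzy Jaccard index on a sampled universe. The Gaussian membership functions, the `threshold` of 0.8 and the parameter-averaging merge rule are assumptions chosen for illustration; the proposal leaves the actual measures and merging techniques as research questions.

```python
import math


def gaussian_membership(center, width):
    """Return a Gaussian membership function with the given center and width."""
    def membership(x):
        return math.exp(-0.5 * ((x - center) / width) ** 2)
    return membership


def fuzzy_jaccard(mu_a, mu_b, universe):
    """Fuzzy Jaccard similarity: sum of minima divided by sum of maxima."""
    intersection = sum(min(mu_a(x), mu_b(x)) for x in universe)
    union = sum(max(mu_a(x), mu_b(x)) for x in universe)
    return intersection / union if union > 0 else 0.0


def merge_if_similar(set_a, set_b, universe, threshold=0.8):
    """Merge two Gaussian fuzzy sets, given as (center, width), if they overlap enough.

    Returns the resulting list of sets and the similarity that was measured.
    """
    similarity = fuzzy_jaccard(
        gaussian_membership(*set_a), gaussian_membership(*set_b), universe
    )
    if similarity >= threshold:
        # Assumed merge rule: average the parameters of the two sets.
        merged = ((set_a[0] + set_b[0]) / 2, (set_a[1] + set_b[1]) / 2)
        return [merged], similarity
    return [set_a, set_b], similarity


# Sampled universe of discourse on [0, 10].
universe = [i / 10 for i in range(101)]
close_sets, close_similarity = merge_if_similar((5.0, 1.0), (5.1, 1.0), universe)
far_sets, far_similarity = merge_if_similar((2.0, 1.0), (8.0, 1.0), universe)
```

Two nearly identical sets collapse into one, which leaves fewer labels for a human to name, while well-separated sets are kept apart.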

    PhD Thesis Proposal: G2.6

    Title: Intelligent Monitoring of Industrial Processes with application to a Refinery

    Keywords: intelligent process monitoring; multidimensional scaling; computational intelligence; clustering

    Supervisor: Prof. António Dourado ([email protected])


    Summary: High-dimensional data in industrial complexes can be profitably used for advanced process monitoring if it is reduced to a dimension where human interpretability is easily verified. Multidimensional scaling may be used to reduce it to two or three dimensions if appropriate measures of similarity/dissimilarity are developed. The measures express the distance between attributes, the essence of the information, and a similar difference should be guaranteed in the reduced space in order to preserve the informative content of the data. Research into appropriate measures and reduction methods is needed.

    In the reduced space, classification of the actual operating point should be done through appropriate recursive clustering and pattern recognition techniques. The classification is intended to show clearly the quality level of the actual and past operating points, in such a way that the human operator finds in it a useful decision support system for the daily operation of the plant. The work will be applied to the visbreaker process at the Galp Sines Refinery.
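The requirement that distances in the reduced space stay close to the original ones can be checked numerically. The sketch below computes a normalised stress value of the kind used in multidimensional scaling; the Euclidean distance and the sample points are assumptions for illustration, since choosing the dissimilarity measure is part of the proposed research.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def normalised_stress(original, reduced):
    """Relative mismatch between original and reduced-space distances.

    0.0 means that all pairwise distances are preserved exactly.
    """
    d_high = pairwise_distances(original)
    d_low = pairwise_distances(reduced)
    n = len(original)
    mismatch = 0.0
    scale = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            mismatch += (d_high[i][j] - d_low[i][j]) ** 2
            scale += d_high[i][j] ** 2
    return math.sqrt(mismatch / scale) if scale > 0 else 0.0


# Hypothetical operating points that all lie in the plane z = 0.
original = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
good_map = [(0, 0), (1, 0), (0, 1), (1, 1)]   # drops only the constant axis
poor_map = [(0,), (1,), (0,), (1,)]           # also drops an informative axis

good_stress = normalised_stress(original, good_map)
poor_stress = normalised_stress(original, poor_map)
```

A reduction method would search for the low-dimensional configuration that minimises such a value; here the two candidate mappings are simply compared.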

    PhD Thesis Proposal: G2.7

    Title: Intelligent supervision of the colour transition in a paper machine

    Keywords: Computational intelligence methodologies, decision support system, pulp and paper industry

    Supervisors: Prof. Jorge Henriques ([email protected]) and Prof. Alberto Cardoso ([email protected])

    Summary: The main goal of this work is to develop a learning and decision support system to be applied to the colour transition process during the production of different types of paper in a paper machine. Computational intelligence methodologies will be implemented in order to acquire, understand and analyze the experimental knowledge captured by operators during the transition phase between operation regimes. This work will contribute a clear identification and characterization of the best transition approaches, as well as a unification strategy concerning the best practices to be followed. Specifically, it is expected to improve the colour transition process (whiteness and tonality) during paper production in the pulp and paper industry at the Portucel and Soporcel group, in Figueira da Foz.

    A critical situation in an industrial environment concerns the procedures to be performed during the transitions between operation regimes. Ideally, these transitions should occur while minimizing both their duration and the overshoot inherent to them. This becomes much more pertinent when significant costs are involved. Regarding the transition of paper colour in a paper machine, although the process is automatically controlled, the operator has to manually define the set-points of the colour values, as well as the instants when they should be applied. Since this procedure is complex and non-trivial, significant settling times are observed until the paper being produced achieves a new steady-state colour specification. Given that the paper machine under consideration produces 1500 meters of paper per minute, a large amount of paper is wasted, with all the associated operating costs. It is therefore imperative that those transitions occur with stability and in the minimum possible time interval. Despite the difficulties, operators are able to deal with the large diversity and complexity of information involved in the transition process, given their experience and evidence-based knowledge. However, formulating this process in a systematic and precise way is a challenging task. One of the main goals of this work is to develop and implement computational intelligence strategies (neural networks, fuzzy systems, neuro-fuzzy systems, etc.) to address this challenge. Taking into account the historical data of past transitions, as well as the operators' know-how and expertise, a solution will be developed that is able to learn and incorporate the available information and experience. The developed solution will be a valuable tool for providing the best strategies to follow regarding the colour transition process during the production of different types of paper.

    PhD Thesis Proposal: G2.8

    Title: Management system to support ideas and projects in a collaborative environment

    Keywords: Computational intelligence methodologies, decision support system, collaborative environment, pulp and paper industry

    Supervisors: Prof. Alberto Cardoso ([email protected]) and Prof. Jorge Henriques ([email protected])

    Summary: The main goal of this proposal is to investigate and develop a solution for a web-based management system that could act as a platform to support and facilitate the process of creation, development and management of ideas, projects or new products in a collaborative environment. The success and effectiveness of these processes are strongly dependent on the accurate application of Information and Communication Technologies (ICT). In this context, applications for collaborative environments will contribute to the natural establishment of synergies between the several participants, providing effective support to the processes associated with the creation, development and management of ideas, projects or new products.

    PhD Thesis Proposal: G2.9

    Title: Dynamical platform for training, simulation and decision support

    Keywords: Adaptive computation methods, decision support system, training, modelling and simulation.

    Supervisors: Prof. Alberto Cardoso ([email protected]) and Prof. Paulo Gil ([email protected])

    Summary: Simulation systems are a very important and valuable tool in the industrial context, namely to understand the dynamical behaviour of the process, to train operators and improve their knowledge, and to support decision making. The basis of the simulator for one specific petroleum refinery would be a non-linear hybrid model, obtained from the high-dimensional data and the operators' knowledge using adaptive computation methods (neural networks, fuzzy systems, neuro-fuzzy systems, etc.). The overall system should be web based, and the interfaces should be similar to the panels currently used by the operators.

    PhD Thesis Proposal: G2.10

    Title: Adaptive Intelligent Supervision in Changing Environments

    Keywords: Computational intelligence methodologies, decision support system, collaborative environment, pulp and paper industry

    Supervisors: Prof. Paulo Gil ([email protected]) and Prof. Alberto Cardoso ([email protected])

    Summary: Supervision can be regarded as the manifold process of collecting relevant information from the world (monitoring), predicting future states and acting accordingly, whenever required. When the system's nature is itself time varying, this high-level framework must be implemented in an adaptive way. This means that the supervisor's performance should be permanently assessed and its parameters adapted in real time to cope with a changing environment. Another issue of key importance in real-world supervision concerns the study and implementation of intelligent methodologies that enhance the overall system robustness in case of disturbances and fault events, including varying latency times in network communications.

    Subjects where contributions are expected:
    Non-linear modelling using artificial neural networks: number of layers and the lag window;
    Real-time adaptation of models: mechanisms for recursive parameter adjustment in noisy environments;
    Fault diagnosis: study and implementation of techniques for fault detection and isolation;
    Intelligent system behaviour conditioning: study and implementation of reconfiguration methodologies assuring an acceptable performance level in the presence of faults.
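Recursive parameter adjustment in a changing environment can be illustrated with recursive least squares and a forgetting factor. This is a minimal sketch, not the method of the proposal: the scalar model y = theta * x, the `forgetting` value of 0.9 and the synthetic data are all assumptions.

```python
def rls_track(samples, forgetting=0.9, initial_covariance=1000.0):
    """Track the gain theta of the model y = theta * x, one sample at a time.

    A forgetting factor below 1 discounts old samples, so that the
    estimate can follow a parameter that changes over time.
    """
    theta = 0.0
    covariance = initial_covariance
    estimates = []
    for x, y in samples:
        gain = covariance * x / (forgetting + x * covariance * x)
        theta += gain * (y - theta * x)          # correct by the prediction error
        covariance = (covariance - gain * x * covariance) / forgetting
        estimates.append(theta)
    return estimates


# Synthetic process whose gain jumps from 2.0 to 3.0 after 50 samples.
inputs = [1.0 + (k % 5) for k in range(100)]
samples = [(x, (2.0 if k < 50 else 3.0) * x) for k, x in enumerate(inputs)]
estimates = rls_track(samples)
```

The estimate settles near 2.0 during the first half and moves towards 3.0 after the change. A smaller forgetting factor adapts faster but is more sensitive to noise, which is the trade-off that matters in noisy environments.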


    G3: Software and Systems Engineering

    PhD Thesis Proposal: G3.1

    Title: Self-Healing Techniques for Legacy Application Servers

    Keywords: Autonomic computing, self-healing, dependability.

    Supervisor: Prof. Luís Moura e Silva ([email protected])

    Summary: One of the current big challenges of the computer industry is to deal with the complexity of systems. The Autonomic Computing initiative driven by IBM defined the following functional areas as the cornerstones of an autonomic system: self-configuration, self-healing, self-optimization and self-protection. The self-healing property refers to the automatic prediction and discovery of potential failures and their automatic correction, so as to avoid downtime of the computer system where possible. This leads to the vision of computers that heal themselves and do not depend so much on a system manager to take care of them. While there has been some interesting work on self-healing techniques for mission-critical systems, there is a long way to go to achieve that goal in commercial off-the-shelf (COTS) servers running Apache/Linux, Tomcat, JBoss or Microsoft .NET. The purpose of this PhD is to study and propose low-cost and highly effective self-healing techniques for these application servers. One of the potential causes of failures in 24x7 server systems is the occurrence of software aging. This phenomenon should be studied in detail, together with high-level techniques for application-level failure detection. Some mathematical techniques should be applied to detect software aging and to forecast the potential time of failure of the server system. When aging is detected, the server system should proactively apply a software rejuvenation technique to avoid the potential crash and to keep the service up and running. Techniques for micro-rejuvenation should be further studied to avoid downtime of the server. The final result of this PhD should be a set of software artifacts and the refinement of data analysis techniques to apply in COTS application servers, in order to predict failures and software aging in advance and to apply some corrective action to avoid a server crash.
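One simple way to forecast the potential time of failure is to fit a trend to a monitored resource and extrapolate it to its limit. The sketch below does this with an ordinary least-squares line; the linear growth model, the memory figures and the 1000 MB limit are assumptions for illustration and are not taken from the proposal.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_exhaustion(times, values, limit):
    """Estimated time left until the fitted trend reaches `limit`.

    Returns None when the resource is not growing, i.e. when no
    aging trend is visible in the samples.
    """
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    exhaustion_time = (limit - intercept) / slope
    return max(0.0, exhaustion_time - times[-1])


# Hypothetical hourly memory samples (MB) from a server leaking 10 MB per hour.
hours = list(range(10))
memory_mb = [500 + 10 * h for h in hours]
hours_left = time_to_exhaustion(hours, memory_mb, limit=1000)
```

A rejuvenation action could then be scheduled shortly before the estimated exhaustion time. Real aging data is noisy and often non-linear, which is why the proposal calls for more refined data analysis techniques.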

    Proposal

    This PhD will comprise the following initial tasks: (1) State of the art on Autonomic Systems, Self-healing, Software Aging, Software Rejuvenation, Micro-rebooting and Dependability Benchmarking; (2) Machine learning techniques to forecast failures and software aging; (4) Application-level techniques for failure prediction and early detection; (5) Micro-rejuvenation techniques for application servers; (6) Extension of the techniques to SOA-based and N-tier applications; (7) Dependability benchmarking; (8) Implementation of an experimental framework; (9) Analysis of experimental results.


    PhD Thesis Proposal: G3.2

    Title: Wired self-emerging ad hoc network

    Keywords: peer-to-peer, ad hoc, distributed hash table.

    Supervisor: Prof. Filipe Araújo ([email protected])

    Summary: In recent years computer communication has been departing from the client-server architecture and moving increasingly toward a peer-to-peer architecture. One aspect that characterizes this kind of interaction is the opportunistic participation of many of the peers: they connect to the network for only a few moments, just to discover and download (or not) what they are looking for, and then they disconnect. Interestingly, mobility and battery exhaustion can reproduce this same trend in wireless ad hoc networks, comprised of devices that use radio broadcast to communicate.

    While wired peer-to-peer and wireless ad hoc networks share a number of common features, like self-configuration and decentralized, fault-tolerant operation, they have an important difference: wired peer-to-peer networks run as overlay networks on top of the IP infrastructure. This raises the following question: can we take the paradigms from wireless networks and create IP-less self-organizing wired networks? Our goal is to plug new devices, or even entire networks, in and out of the wired infrastructure in a scalable and decentralized way, without the need for any a priori configuration. In contrast, current IP networks can only scale because they are highly hierarchical, and they require a considerable amount of human assistance. As a consequence, they are often highly congested, expensive to maintain and unreliable.

    The fundamental difference between the solution we seek and wireless ad hoc networks has to do with available bandwidth. In fact, the most important constraint that makes the collection of routing information so challenging, and that limits the pace of topology change in wireless ad hoc networks, is the (lack of) available bandwidth. Available bandwidth is a very scarce resource, because it is shared among all the nodes. This makes it theoretically impossible to create a wireless ad hoc network that scales with the number of nodes. As a consequence, algorithms for wireless ad hoc networks are often localized or have, at most, very limited information about distant regions of the network. This is very unlike the situation in wired networks: for the same pace of topological change, the supply of bandwidth is not shared and it is much larger. This paves the way for better and more powerful solutions which, we believe, are largely unexplored in the literature.

    Proposal: This PhD work encompasses the following tasks: (a) review of the state of the art; (b) design of the architecture; (c) evaluation of the scalability of the architecture (admissible number of nodes and topological changes versus available bandwidth); (d) exact and range-based lookup algorithms that leverage previous work on distributed hash tables and peer-to-peer file-sharing applications; (e) design of an interconnection infrastructure to connect islands of wired ad hoc networks with the IP network.
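The exact lookups of task (d) build on the consistent-hashing idea behind distributed hash tables, sketched below. The `HashRing` class, the SHA-1 placement and the node names are assumptions for illustration. In a real DHT the ring is spread over the nodes themselves rather than held in one process, and range-based lookups need an order-preserving placement that this sketch does not provide.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a node name or a key to a position on the hash ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class HashRing:
    """Consistent hashing: a key belongs to the first node at or after its position."""

    def __init__(self, nodes=()):
        self._positions = []   # sorted ring positions of the nodes
        self._owners = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        position = ring_position(node)
        bisect.insort(self._positions, position)
        self._owners[position] = node

    def remove_node(self, node):
        position = ring_position(node)
        self._positions.remove(position)
        del self._owners[position]

    def lookup(self, key):
        """Return the node responsible for `key` (exact-match lookup)."""
        if not self._positions:
            raise LookupError("the ring has no nodes")
        index = bisect.bisect_left(self._positions, ring_position(key))
        if index == len(self._positions):
            index = 0          # wrap around the ring
        return self._owners[self._positions[index]]


# Hypothetical network of four devices and one hundred stored keys.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = ["key-%d" % i for i in range(100)]
before = {key: ring.lookup(key) for key in keys}
ring.remove_node("node-b")     # a device is unplugged
after = {key: ring.lookup(key) for key in keys}
```

Only the keys that were stored on the unplugged node change owner. This property is what makes the scheme attractive for networks in which devices are plugged in and out without prior configuration.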


    PhD Thesis Proposal: G3.3

    Title: Fast Moving Wireless Ad Hoc Nodes

    Keywords: peer-to-peer, wireless ad hoc, wireless infrastructured, Wireless Access for the Vehicular Environment (WAVE).

    Supervisor: Prof. Filipe Araújo ([email protected])

    Summary: In recent years we have witnessed an increasing interest in wireless networks. While most current applications seem to be set for sensor networks, we can foresee many other applications for mobile ad hoc or mixed ad hoc/infrastructured networks, where nodes are mainly mobile and communication goes beyond the simple data gathering of a sensor network. For instance, applications can enhance the behavior of a crowd by providing additional services to users holding mobile wireless devices, like search for a given person who is momentarily lost, search for a person who matches some social interests, exchange of diverse information, of a product, etc. Another context that is extremely promising is that of a spontaneous network formed by cars on a road, enriched with some infrastructure that is able to provide traffic, weather and other information to drivers. By letting cars share their information, it may be possible to save significant costs in the infrastructure and still considerably improve the quality and quantity of information.

    In this PhD work we want to leverage some existing routing algorithms for wireless ad hoc networks and make them work in particular environments with specific patterns of mobility. Interestingly, in networks with a high degree of mobility it is often possible to increase the speed of the flow of information, because mobility creates more opportunities to exchange this information. In particular, we want to consider a scenario where the network is comprised of fast-moving cars equipped with IEEE 802.11p network adapters (Wireless Access for the Vehicular Environment, WAVE).

    This is a case where part of the information is created and sent to some points of the infrastructure through a chain of nodes, while at the same time cars can also introduce new information into the network, for instance by signaling their presence to cars in front, in the rear, or traveling in the opposite direction. In particular, the information shared first with cars going in the opposite direction and then with the base stations located along the road is of paramount utility, as it has the potential to propagate very accurate data about traffic jams or accidents at virtually no cost. We expect to apply similar principles to more complex but slower-moving networks comprised of people with handheld or other wireless devices walking in crowds.

    Proposal: This PhD work encompasses the following tasks: (a) review of the state of the art; (b) design of routing algorithms for environments with high mobility; (c) design of information-sharing applications for environments with high mobility; (d) simulation in real environments.


    PhD Thesis Proposal: G3.4

    Title: Detecting Software Aging in Database Servers

    Keywords: Software aging, software rejuvenation, autonomic computing, database management systems, dependability benchmarking

    Supervisors: Prof. Marco Vieira ([email protected])

    Prof. Luís Moura e Silva ([email protected])

    Summary: One of the main problems in software systems that have some complexity is software aging, a phenomenon observed in long-running applications where the execution of the software degrades over time, leading to expensive hangs and/or crash failures. Software aging is not only a problem for desktop operating systems: it has been observed in telecommunication systems, web servers, enterprise clusters, OLTP systems, spacecraft systems and safety-critical systems.

    Software aging happens due to the exhaustion of system resources, caused for example by memory leaks, unreleased locks, non-terminated threads, shared-memory pool latching, storage fragmentation, data corruption and accumulation of numerical errors. There are several commercial tools that help to identify some sources of memory leaks in the software during the development phase. However, not all faults can be avoided, and those tools cannot work on third-party software modules when there is no access to the source code. This means that existing production systems have to deal with the problem of software aging.

    The natural procedure to combat software aging is to apply the well-known technique of software rejuvenation. There are two basic rejuvenation policies: time-based and prediction-based rejuvenation. The first applies a rejuvenation action periodically, while the second makes use of predictive techniques to forecast the occurrence of software aging and applies the rejuvenation action only when necessary.
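The difference between the two policies can be made concrete with a toy simulation. Everything in this sketch is an assumption made for illustration: the constant memory leak, the 1000 MB limit, the policy parameters and the simple growth estimate are not part of the proposal.

```python
def time_based_policy(period):
    """Rejuvenate every `period` hours, regardless of the observed state."""
    def should_rejuvenate(hours_since_restart, memory_history, limit):
        return hours_since_restart >= period
    return should_rejuvenate


def prediction_based_policy(horizon):
    """Rejuvenate only when exhaustion is forecast within `horizon` hours."""
    def should_rejuvenate(hours_since_restart, memory_history, limit):
        if len(memory_history) < 2:
            return False
        # Average growth per hour since the last restart.
        growth = (memory_history[-1] - memory_history[0]) / (len(memory_history) - 1)
        if growth <= 0:
            return False
        hours_left = (limit - memory_history[-1]) / growth
        return hours_left <= horizon
    return should_rejuvenate


def simulate(policy, leak_per_hour, hours, base=100.0, limit=1000.0):
    """Count planned rejuvenations and crashes over a simulated run."""
    rejuvenations = 0
    crashes = 0
    history = [base]
    since_restart = 0
    for _ in range(hours):
        since_restart += 1
        history.append(history[-1] + leak_per_hour)
        if history[-1] >= limit:
            crashes += 1                  # resource exhausted: unplanned restart
            history, since_restart = [base], 0
        elif policy(since_restart, history, limit):
            rejuvenations += 1            # planned, controlled restart
            history, since_restart = [base], 0
    return rejuvenations, crashes


daily = simulate(time_based_policy(24), leak_per_hour=10.0, hours=240)
too_slow = simulate(time_based_policy(100), leak_per_hour=10.0, hours=240)
predictive = simulate(prediction_based_policy(5), leak_per_hour=10.0, hours=240)
```

In this run the daily policy avoids crashes at the cost of ten restarts, the 100-hour period is too long and lets the server crash, and the predictive policy avoids crashes with two restarts. A fixed period must be tuned to the leak rate, whereas a forecast adapts to it.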

    The goal of this PhD Thesis is to study the phenomenon of software aging in commercial database engines, and to devise and implement techniques to collect vital information from the engine and to forecast the occurrence of aging or potential anomalies. With this knowledge, the database engine can apply a controlled rejuvenation action to avoid a crash or a partial failure of its system. The ultimate goal is to improve the autonomic computing capabilities of a database engine, mainly when it is subjected to high workload and stress-load from the client applications.

    Proposal: The PhD work will comprise the following initial tasks: (a) overview of the state of the art on software aging, rejuvenation and dependability benchmarking; (b) development of a tool for dependability benchmarking of database engines; (c) development of a workload and stress-load tool for databases; (d) infrastructure of probes (using Ganglia) to collect vital information from a database engine; (e) development of mathematical techniques to forecast the occurrence of software aging (time-series analysis, data mining, machine learning, neural networks); (f) experimental study and analysis of results; (g) adaptation of rejuvenation techniques for database engines; (h) writing of papers.


    PhD Thesis Proposal: G3.5

    Title: Security benchmarking of COTS components

    Keywords: Software reliability, Security benchmarking, Experimental evaluation, Dependability benchmarking

    Supervisors: Prof. Henrique Madeira ([email protected])

    Prof. João Durães ([email protected])

    Summary: One of the main problems in software systems is the vulnerability to malicious attacks. Complex systems and systems that have a high degree of interaction with other systems or users are more prone to be successfully attacked. The consequences of a successful attack are potentially very severe and may include the theft o