7/31/2019 SPAS e-SciBioenergy: Program and Presentation Abstracts
Program

Time    October 22            October 23            October 24            October 25            October 26
        Monday                Tuesday               Wednesday             Thursday              Friday
8:30    Registration
9:00    Opening
9:30    Presentation FAPESP   Talk C. Ambroise      Talk T. Dunning       Talk J. E. Ferreira   Talk Y. Xu
10:30   Break                 Break                 Break                 Break                 Break
11:00   Talk M. Mattoso       Talk C. Ambroise      Talk T. Dunning       Talk S. Sansone       Talk Y. Xu
12:00                         Talk M. Mattoso       Talk T. Dunning       Talk S. Sansone
13:00   Lunch                 Lunch                 Lunch                 Lunch                 Lunch
14:30   Talk C. B. Medeiros   Talk M. Mattoso       Talk B. S. Manjunath  Talk C. B. Medeiros
15:30   Talk C. Ambroise      Talk B. S. Manjunath  Posters Students      Talk Graduate Progs
16:30   Break                 Break                 Break                 Break
17:00   Talk C. Ambroise      Talk B. S. Manjunath  Posters Students      -                     -
B. S. Manjunath, Centre for Bio-image Informatics, University of
California, Santa Barbara (UCSB), USA
Introduction to Bio-Image Informatics. Introduction to the topic;
fundamental issues in image and video segmentation and tracking, examples
drawn from recent research. (Lecture time: 2 hours)
Introduction to Bisque Cyber Infrastructure for Bio-image Informatics. A
high-level introduction to the open source Bisque image database platform for
managing, processing, indexing and searching bio-images. (Lecture time: 1
hour)
Christophe Ambroise, Laboratoire Statistique et Génome, Centre
National de la Recherche Scientifique (CNRS), France
Statistical Models for Biological Network Inference. Gaussian Graphical
Models provide a convenient framework for representing dependencies
between variables. In this framework, a set of variables is represented by an
undirected graph, where vertices correspond to variables, and an edge
connects two vertices if the corresponding pair of variables are dependent,
conditional on the remaining ones. Recently, this tool has attracted considerable
interest for the discovery of biological networks through l1-penalization of the
model likelihood. In this lecture, we introduce various ways of inferring sparse
co-expression networks from either steady-state or time-course transcriptomic
data. We will focus on inference from samples collected in different
experimental conditions and therefore not identically distributed. (Lecture
time: 2 x 2 hours)
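
For readers new to the topic, one standard way to write the l1-penalized estimator mentioned above (often called the graphical lasso) is sketched here. The symbols S, Theta and lambda are notation assumed for this note and do not come from the abstract; the lecture may well use a different variant, for example one adapted to samples that are not identically distributed.

```latex
% Assumed setting: X = (X_1, ..., X_p) ~ N(0, \Sigma), with precision matrix \Theta = \Sigma^{-1}.
% S is the empirical covariance matrix of the samples; \lambda > 0 is the penalty weight.
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda \sum_{i \neq j} \lvert \Theta_{ij} \rvert ,
\qquad
\Theta_{ij} = 0 \;\iff\; X_i \perp X_j \mid X_{\setminus \{i,j\}} .
```

The second statement is what links the estimate to the graph: an edge between vertices i and j is drawn exactly when the corresponding entry of the precision matrix is nonzero, so a larger lambda yields a sparser network.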
Cláudia Bauzer Medeiros, Institute of Computing, University of
Campinas (UNICAMP), SP, Brazil
The Era of eScience: building the ark during the data deluge. Scientists
from all domains (from the mathematical to the social sciences) are collecting
enormous amounts of data. These data are captured from a variety of devices
(from those aboard satellites to microsensors in embedded systems), but also
provided by experiments, or even social networks. This has given rise to the
so-called "data deluge", sometimes referred to as "data tsunami", in recognition
that a large amount of these data will never be seen or directly managed by
humans. eScience has emerged as a branch of science characterized by joint
research between computer scientists and scientists from other domains to
leverage and accelerate research in those domains, helping scientists to
analyze, filter, manipulate, visualize and interpret their data, while at the same
time supporting cooperative work. This talk is geared towards discussing a few
major trends in eScience research, from a data-centric perspective, with
examples from several scientific domains. (Lecture time: 1 hour)
Coping with Digital Preservation: preserving the present to help the
future. Every day we generate an enormous amount of data, for instance during
bank transactions, phone calls, credit card operations and others. Moreover,
there are countless kinds of data linked to us: X-ray images, security videos in
stores and banks, radar-triggered photos in streets, and so on. All this
information is stored, often for several years, and maintained by third
parties, given its economic and/or social value. What are we doing, however,
with other very valuable kinds of data sets - the data generated by our
research? Our work involves complex models and computational simulations
whose intermediate and final results need to be stored. We may archive the
most relevant files, but there are many more data sets that are lost, sometimes
for lack of adequate procedures, or time, or even appropriate hardware to
record the data. This phenomenon is repeated in any context that involves
experimental activities, e.g., in biology, chemistry, physics, sociology,
anthropology, and so on. Even when all data and models involved in an
experiment are recorded, there are other challenges to meet. For instance, how
can we ensure that we will be able to retrieve the desired information in the future?
And how can we share and disseminate the results of our work? These and other
issues are at the origin of digital preservation concerns. They are geared
towards investigating new methods, models, algorithms and mechanisms to
support data organization, archival and retrieval, for long term accessibility,
while at the same time considering the issues of quality, reliability and
durability. Preservation research can also be applied to corporate or business
data, but the problems involved (and their solutions) are not the same. This talk
will discuss some of the challenges faced by research on the preservation of
experimental research data. (Lecture time: 1 hour)
João Eduardo Ferreira, Computer Science Department, Institute of
Mathematics and Statistics (IME), University of São Paulo, Brazil
Transaction Processing for e-Science Applications. The management of molecular
and clinical data in e-Science applications has introduced new requirements for
database storage and transaction processing systems. Two famous statements
summarize the e-Science scenario. The first is "Science is becoming
data-intensive and collaborative", and the second is "Researchers from numerous disciplines
need to work together to attack complex problems; openly sharing data will pave the
way for researchers to communicate and collaborate more effectively". These statements
were written by Ed Seidel, acting assistant director for the NSF Mathematical and Physical
Sciences directorate. This scenario shows that we are in the age of the data deluge,
in which transaction processing for collaborative research is an
important computer science challenge. More concretely, in typical e-Science laboratory
routines, transaction processing is used in many tests that are performed concurrently
and supervised by researchers. New tests are defined frequently, so researchers have to
be guided to execute the right task at the appropriate time. Incompatibilities among
previous processes and new data requirements make the integration and analysis of
available knowledge very difficult. This problem is compounded by the process of
scientific knowledge discovery, which requires frequent process updates, collaborative
interactions among researchers, and refinement of scientific hypotheses. This e-Science
scenario requires appropriate transaction processing in order to avoid manual data
handling, which quickly becomes very expensive or often infeasible. In this talk,
we provide a historical perspective on transaction processing for e-Science
applications, together with the main recent challenges and solutions. (Lecture time: 1 hour)
Marta L. Queirós Mattoso (jointly with Jonas Dias and Kary Ocana),
Alberto Luiz Coimbra Institute for Graduate Studies and Research in
Engineering (COPPE), Federal University of Rio de Janeiro (UFRJ), Brazil
Exploring Provenance Data in High Performance Scientific Computing. Large-scale
scientific computations are often organized as a composition of many computational
tasks linked through data flow. After the completion of a computational scientific
experiment, a scientist has to analyze its outcome, for instance, by checking inputs and
outputs along computational tasks that are part of the experiment. This analysis can be
automated using provenance management systems that describe, for instance, the
production and consumption relationships between data artifacts, such as files, and
the computational tasks that compose the scientific application. Due to their exploratory
nature, large-scale experiments often involve iterations that evaluate a large space of
parameter combinations. In this case, scientists need to analyze partial results during
execution and dynamically intervene in the next steps of the simulation. Features such
as user steering of workflows, to track, evaluate and adapt the execution, need to be
designed to support iterative methods. In this course we define basic concepts of
scientific workflows and provenance data. We will show examples of scientific
workflows in the bioinformatics domain. We briefly describe how the provenance of
many-task scientific computations is specified and coordinated by current workflow
systems on large clusters and clouds. We discuss challenges in gathering, storing and
querying provenance in high performance computing environments. We also show
how provenance can enable useful runtime queries that correlate computational
resource usage, scientific parameters, and data set derivation. (Lecture time: 2 x 2
hours)
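
To make the production and consumption relationships described above concrete, here is a minimal sketch in Python. It is an illustration only, not the system presented in the course: the `ProvenanceStore` class, its method names, and the task and file names are all hypothetical, and a real provenance system would persist these records in a database rather than in memory.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy in-memory provenance store (hypothetical, for illustration only)."""

    def __init__(self):
        self.generated_by = {}        # artifact name -> task that produced it
        self.used = defaultdict(set)  # task name -> artifacts it consumed
        self.params = {}              # task name -> parameters of that execution

    def record(self, task, inputs, outputs, **params):
        """Record one task execution with its inputs, outputs and parameters."""
        self.used[task].update(inputs)
        self.params[task] = params
        for artifact in outputs:
            self.generated_by[artifact] = task

    def lineage(self, artifact):
        """Return (tasks, artifacts) that `artifact` was derived from."""
        tasks, artifacts = set(), set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            task = self.generated_by.get(current)
            if task is None or task in tasks:
                continue  # raw input, or a task that was already visited
            tasks.add(task)
            for source in self.used[task]:
                if source not in artifacts:
                    artifacts.add(source)
                    stack.append(source)
        return tasks, artifacts


# Hypothetical two-step bioinformatics workflow: alignment, then tree building.
store = ProvenanceStore()
store.record("align", inputs=["reads.fasta"], outputs=["aligned.fasta"], gap_penalty=2)
store.record("build_tree", inputs=["aligned.fasta"], outputs=["tree.nwk"], model="JTT")

tasks, artifacts = store.lineage("tree.nwk")
print(sorted(tasks))      # ['align', 'build_tree']
print(sorted(artifacts))  # ['aligned.fasta', 'reads.fasta']
```

Because the records can be written as each task finishes, the same lineage query can be run while the experiment is still executing, which is the kind of runtime analysis the abstract refers to.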
Susanna-Assunta Sansone, PhD, Principal Investigator, Team Leader,
University of Oxford, Oxford e-Research Centre, Oxford, UK
The Buzz Around Reproducible Bioscience Data: the policies, the communities
and the standards. Increased availability of the bioscience data generated is
fuelling increased consumption, and a cascade of derived datasets that
accelerate the cycle of discovery. But the successful integration of
heterogeneous data from multiple providers and scientific domains is already a
major challenge within academia and industry. Even when datasets are
publicly available, published results are often not reusable due to incomplete
description of the experimental details. In the last decade, several data
preservation, management, sharing policies, and plans have emerged in
response to increased funding for high-throughput approaches in genomics
and functional genomics bioscience [1]. A growing number of community-
based initiatives have developed minimum reporting guidelines, terminologies
and formats (referred to generally as community standards) [2] to structure and
curate datasets, enabling data annotation to varying degrees; other efforts
work to maximize the interoperability among these standards [e.g. 3, 4].
Researchers and bioinformaticians in both academic and commercial
bioscience, along with funding agencies and publishers, embrace the concept
that standards are pivotal to enriching the annotation of the entities of interest
(e.g., genes, metabolites) and the experimental steps (e.g., provenance of study
materials, technology and measurement types), to ensure that shared
investigations are comprehensible and (in principle) reproducible. But despite
all these efforts, in practice data sharing is challenging [5]. Vast swathes of
bioscience data still remain locked in esoteric formats, are described using ad
hoc or proprietary terminology [e.g. 6], or lack sufficient contextual information;
many tools do not implement standards even where these exist; and the current
wealth of domain-specific reporting standards in some areas, and their incompleteness
or absence in others, are further major challenges. My presentation will provide
a snapshot of the current situation. I will highlight a number of stories, the
social engineering side and also key challenges, drawing on my experience
over the last decade of working with a variety of stakeholders, including
bioscience researchers, bioinformaticians, developers in public and private
sectors, standards developing communities, as well as funders and publishers.
(Lecture time: 1 hour)
References
1. Field D*, Sansone SA*, Collis A, Booth T, Dukes P, Gregurick SK, Kennedy K,
Kolar P, Kolker E, Maxon M, Millard S, Mugabushaka AM, Perrin N, Remacle JE,
Remington K, Rocca-Serra P, Taylor CF, Thorley M, Tiwari B, Wilbanks J:
Megascience. 'Omics data sharing. Science 326(5950):234-236 (2009)
2. List of standards at BioSharing: www.biosharing.org
3. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ,
Eilbeck K, Ireland A, Mungall CJ; OBI Consortium, Leontis N, Rocca-Serra P,
Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The
OBO Foundry: coordinated evolution of ontologies to support biomedical data
integration. Nat Biotechnol 25(11):1251-1255 (2007)
4. Taylor CF*, Field D*, Sansone SA*, Aerts J, Apweiler R, Ashburner M, Ball CA,
Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch
EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy
NW, Hermjakob H, Julian RK Jr, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper
M, Le Novère N, et al.: Promoting coherent minimum reporting guidelines for
biological and biomedical investigations: the MIBBI project. Nat Biotechnol
26(8):889-896 (2008)
5. Sansone SA and Rocca-Serra P: On the evolving portfolio of community-
standards and data sharing policies: turning challenges into new opportunities.
GigaScience 1:10 (2012)
6. Harland L, Larminie C, Sansone SA, Popa S, Marshall MS, Braxenthaler M,
Cantor M, Filsell W, Forster MJ, Huang E, Matern A, Musen M, Saric J, Slater T,
Wilson J, Lynch N, Wise J, Dix I: Empowering industrial research with shared
biomedical vocabularies. Drug Discov Today 16(21-22):940-947 (2011)
The Reality From the Buzz: how to deliver reproducible bioscience data. In
this unsettled status quo - presented in my first talk - how can we enable
bioscience researchers to make use of existing community standards and
maximize data sharing and the subsequent reuse of richly annotated
experimental information?
A successful example is provided by the Investigation/Study/Assay (ISA) [1]
open source, metadata-tracking framework developed and supported by the
growing ISA Commons community [2]. The ISA framework includes both a
general-purpose file format and a software suite to tackle the harmonization of
the structure of bioscience experimental metadata (e.g., provenance of study
materials, technology and measurement types, sample-to-data relationships)
by enabling compliance with the community standards. This example
illustrates how the synergy between research and service groups in academia
(e.g. at Harvard [3] and at The European Bioinformatics Institute [4]) and in
industry (e.g. at The Novartis Institutes for BioMedical Research and at Janssen
Pharmaceuticals, a company of Johnson & Johnson), across a variety of life
science domains, is pivotal to building a network of data collection, curation, and
sharing solutions that progressively enable the invisible use of standards. I will
present the rationale behind the collaborative development and the evolution
of this exemplar ecosystem of data curation and sharing solutions - built on the
common ISA framework. I will also provide high-level examples of how this is
used to collect, curate and manage heterogeneous experimental metadata in
an increasingly diverse set of domains including environmental health,
environmental genomics, metabolomics, (meta)genomics, proteomics, stem
cell discovery, systems biology, transcriptomics, toxicogenomics, etc. I will also
discuss the lessons learned by my team, our collaborators and the growing
user community about the usability of the community standards, and provide an
update on the next steps to develop user-friendly visualization functionalities
and use semantic web approaches to make existing knowledge available for
linking, querying, and reasoning. (Lecture time: 1 hour)
References
1. Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D,
Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W, Sansone SA: ISA
software suite: supporting standards-compliant experimental annotation and
enabling curation at the community level. Bioinformatics 26(18):2354-2356
(2010); isa-tools.org
2. Sansone SA*, Rocca-Serra P*, Field D, Maguire E, Taylor C, Hofmann O, Fang
H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L,
Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de
Matos P, Dix I, Edmunds S, Evelo CT, Forster MJ, Gaudet P, Gilbert J, Goble C,
Griffin JL, Jacob D et al.: Toward interoperable bioscience data. Nat Genet
44(2):121-126 (2012); isacommons.org
3. Ho Sui SJ, Begley K, Reilly D, Chapman B, McGovern R, Rocca-Serra P, Maguire
E, Altschuler GM, Hansen TA, Sompallae R, Krivtsov A, Shivdasani RA, Armstrong
SA, Culhane AC, Correll M, Sansone SA, Hofmann O, Hide W: The Stem Cell
Discovery Engine: an integrated repository and analysis system for cancer stem cell
comparisons. Nucleic Acids Res 40(Database issue):D984-D991 (2012);
discovery.hsci.harvard.edu
4. Haug K, Salek R, Conesa P, Hastings J, de Matos P, Rijnbeek M, Mahendraker T,
Williams M, Neumann S, Rocca-Serra P, Maguire E, Gonzalez Beltran A, Sansone
SA, Griffin J, Steinbeck C: MetaboLights: an open-access general-purpose
repository for metabolomics studies and associated meta-data. Nucleic Acids Res
(in review); www.ebi.ac.uk/metabolights
Thom H. Dunning, Jr., National Center for Supercomputing
Applications, Institute for Advanced Computing Applications and
Technologies, and Department of Chemistry, University of Illinois at
Urbana-Champaign
Scientific Computing in Science and Engineering. Computational modeling and
simulation is among the most significant developments in the practice of scientific
inquiry in the 20th Century. Modeling and simulation are now contributors to
essentially all scientific and engineering research programs and are finding increasing
use in a broad range of industrial applications. The use of computing technology is
now spreading to the observational sciences, which are being revolutionized by the
advent of powerful new sensors that can detect and measure a wide range of physical,
chemical and biological phenomena. Massive digital detectors in a new generation of
telescopes have turned astronomy into a digital science. Sensor arrays for
characterizing ecologies and new sequencing instruments for genomics research are
revolutionizing the biological sciences. This lecture will discuss the elements of
computational modeling and simulation as well as the emerging area of data-driven
science and discuss the impact of these new approaches in a few fields, while also
drawing on the lecturer's experiences in chemistry. (Lecture time: 1 hour)
Technology Trends and Future of High Performance Computing. Computing
technologies are undergoing a dramatic transition. Because of physical limitations, the
computational power of a single microprocessor core, the basis of all computing
systems from laptops to supercomputers, has stopped increasing. Dual-core systems
were introduced in 2005, quad-core chips in 2007, and eight-core chips are now
available from many vendors. This trend will continue into the future, with the number
of cores on a chip continuing to increase. In fact, the use of innovative computing
technologies based on many-core chips, e.g., NVIDIA GPUs, is now being seriously
explored in many areas of scientific computing. This technology shift presents a
challenge for computational science and engineering: the only significant
performance increases in the future will be through the increased exploitation of
parallelism. Although these technologies promise to bring petascale computers into
researchers' institutions, and even their laboratories, computers built on these
technologies have significant implications for the design of the next generation of
science and engineering applications. This lecture will provide an overview of the
directions in computing technologies as well as describe the challenges associated
with exploiting these new technologies in computational science and engineering.
(Lecture time: 1 hour)
Blue Waters: overview of a sustained petascale computing system. A new
generation of supercomputers, petascale computers, is providing scientists and
engineers with the ability to simulate a broad range of natural and engineered systems
with unprecedented fidelity. Just as important in this increasingly data-rich world,
these new computers allow researchers to manage and analyze unprecedented
quantities of data, seeking connections, patterns and knowledge. The impact of this
new computing capability will be profound, affecting science, engineering and society.
The National Center for Supercomputing Applications at the University of Illinois at
Urbana-Champaign is deploying a computing system that can sustain one quadrillion
calculations per second on a broad range of science and engineering applications as
well as manage and analyze petabytes of data. This computer, Blue Waters, has been
configured to enable it to solve the most compute-, memory- and data-intensive
problems in science and engineering. It will have tens of thousands of chips (CPUs &
GPUs), petabytes of memory, tens of petabytes of disk storage, and hundreds of
petabytes of archival storage. The presentation will describe Blue Waters and illustrate
the role that it will play in a few representative areas of research. (Lecture time: 1
hour)
Yan Xu, Microsoft Research, USA
Open Data for Open Science. Part 1. Tools for data scientists. An introduction
to some of the most cutting-edge Microsoft technologies that help
scientists discover, access, consume, and share scientific data. Part 2. Demos
of data tools from Microsoft. Demos of how to create solutions using the tools
presented in Part 1, with real-world scenarios and data. Attendees may bring
their Windows PC to follow the demos to create data visualization samples with
their own environmental research data in WorldWide Telescope
(http://www.worldwidetelescope.org) and share the results on Layerscape
(http://www.layerscape.org).