Talk on Galaxy at Genome Informatics 2008. I was a session chair, so there was no oversight on this one. We were obsessed with authorization that year, and this talk is probably the most detailed ever on roles, groups, and dataset security in Galaxy. Another classic team slide.
Galaxy, http://galaxy-project.org. James Taylor, Emory University
Galaxy?
Galaxy goals: making large-scale computational analysis more accessible; facilitating transparent analysis; ensuring that analyses are reproducible.
What Galaxy provides: an open-source framework for integrating various computational tools and databases into a cohesive workspace; a web-based service we provide, integrating many popular tools and resources for comparative genomics; a completely self-contained application for building your own Galaxy-style sites.
So, what about all this data?
Tool suites
What is a Galaxy tool? The basic unit of analysis in Galaxy: a program, script, external web resource, or anything else, adapted to a standard structured interface of parameters, data inputs, and data outputs.
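To make the "standard structured interface" concrete, here is a minimal Python sketch of a tool as parameters plus typed inputs and outputs bound into a command-line template. Galaxy itself describes tools in XML configuration files. The classes, fields, tool id, and the `filter_reads` program below are all hypothetical illustrations, not Galaxy's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToolParameter:
    """A user-settable option exposed in the tool form (simplified, hypothetical)."""
    name: str
    type: str                          # e.g. "integer", "float", "text"
    default: Optional[object] = None


@dataclass
class Tool:
    """Wraps a command line behind a uniform interface of parameters, inputs and outputs."""
    tool_id: str
    command: str                                            # template with {placeholders}
    parameters: List[ToolParameter] = field(default_factory=list)
    inputs: Dict[str, str] = field(default_factory=dict)    # input name -> datatype
    outputs: Dict[str, str] = field(default_factory=dict)   # output name -> datatype

    def build_command(self, files: Dict[str, str], values: Dict[str, object]) -> str:
        """Fill the template from dataset paths (inputs and outputs) and parameter values."""
        missing = [n for n in list(self.inputs) + list(self.outputs) if n not in files]
        if missing:
            raise ValueError(f"unbound datasets: {missing}")
        params = {p.name: p.default for p in self.parameters}  # start from defaults
        params.update(values)                                   # user values override
        return self.command.format(**{**params, **files})


if __name__ == "__main__":
    filter_tool = Tool(
        tool_id="filter_by_quality",                            # hypothetical tool
        command="filter_reads --min-quality {min_quality} {reads} > {filtered}",
        parameters=[ToolParameter("min_quality", "integer", 20)],
        inputs={"reads": "fastqsanger"},
        outputs={"filtered": "fastqsanger"},
    )
    print(filter_tool.build_command({"reads": "in.fq", "filtered": "out.fq"}, {}))
    # filter_reads --min-quality 20 in.fq > out.fq
```

Because every tool exposes the same shape, the surrounding system can generate forms, check inputs, and run jobs without knowing anything about the wrapped program.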
Short read sequence analysis: analyzing read quality and filtering.
Genomic analysis: mapping against assembled genomes; coverage, polymorphism, ...
Metagenomic analysis: mapping against sequence databases; taxonomy analysis, visualization, ...
Statistical genetics: quality control and filtering; estimating ancestry and correction; case-control analysis, ...
Data and analysis management
The Galaxy History
Beyond the History
Beyond the History I: Workflows
Galaxy workflows: an abstract description of an analysis procedure. Essentially: what tools to run, and the flow of data between tools.
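A workflow defined as "what tools to run, and the flow of data between tools" is a directed graph. A valid run order is a topological sort of that graph. The following is a minimal sketch that uses Python's standard `graphlib` module (Python 3.9+). The step names are hypothetical, and this is not Galaxy's workflow scheduler.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# An edge (source, target) means data produced by `source` flows into `target`.
Edge = Tuple[str, str]


def execution_order(steps: List[str], edges: List[Edge]) -> List[str]:
    """Return an order that runs every step after all the steps that feed it data."""
    depends_on: Dict[str, Set[str]] = {step: set() for step in steps}
    for source, target in edges:
        depends_on[target].add(source)
    # static_order() raises graphlib.CycleError if the data flow contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    steps = ["call_snps", "map_reads", "filter_reads", "coverage"]  # hypothetical steps
    edges = [
        ("filter_reads", "map_reads"),
        ("map_reads", "call_snps"),
        ("map_reads", "coverage"),
    ]
    print(execution_order(steps, edges))
```

Here `filter_reads` always precedes `map_reads`, which precedes both downstream steps. Those two steps may come out in either order because neither depends on the other.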
Beyond the History II: Data Libraries
Galaxy Data Libraries: a mechanism for storing and organizing shared datasets in a Galaxy instance. An instance can have many libraries, each containing datasets organized using folders as well as tags. Library datasets carry full type-specific metadata, like any other dataset in Galaxy.
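One way to picture a library organized by both folders and tags is a folder tree whose datasets each carry a tag set. Lookup can then go by path or by tag. This is a minimal, hypothetical sketch; the class names and fields are illustrative assumptions, not Galaxy's data model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class LibraryDataset:
    name: str
    datatype: str                          # e.g. "bed", "fastqsanger"
    tags: Set[str] = field(default_factory=set)


@dataclass
class Folder:
    name: str
    datasets: List[LibraryDataset] = field(default_factory=list)
    subfolders: List["Folder"] = field(default_factory=list)

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, LibraryDataset]]:
        """Yield (path, dataset) pairs for this folder and, recursively, its subfolders."""
        path = f"{prefix}/{self.name}"
        for ds in self.datasets:
            yield f"{path}/{ds.name}", ds
        for sub in self.subfolders:
            yield from sub.walk(path)

    def find_by_tag(self, tag: str) -> List[str]:
        """Tags cut across the folder hierarchy: search every folder for a tag."""
        return [path for path, ds in self.walk() if tag in ds.tags]


if __name__ == "__main__":
    library = Folder(                      # the library is the root folder (assumption)
        "genotypes",
        datasets=[LibraryDataset("panel.bed", "bed", {"reference"})],
        subfolders=[Folder("run1", datasets=[LibraryDataset("reads.fq", "fastqsanger", {"raw"})])],
    )
    print(library.find_by_tag("raw"))      # ['/genotypes/run1/reads.fq']
```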
Driving use cases: large shared datasets (genotype data; sequencing reads, direct from the instrument!); data management for distributed projects.
What about protected data?
Galaxy dataset security: fine-grained access controls for Galaxy datasets. Different actions on datasets require different permissions, and users and groups are granted these permissions. These controls are enforced throughout Galaxy: e.g. a history can still be shared, but access to individual datasets in the history is controlled.
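A minimal sketch of per-action permissions granted to users and groups. It also shows how a shared history can show each viewer only the datasets they may access. The action names, the `user:`/`group:` principal strings, and the rule that a dataset is public until its access is restricted are all assumptions for illustration. This is not Galaxy's actual security model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Action(Enum):
    # Hypothetical action names; each one is checked independently.
    ACCESS = "access"   # read the dataset contents
    MODIFY = "modify"   # edit the dataset's metadata
    MANAGE = "manage"   # change who holds the other permissions


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)


@dataclass
class DatasetPermissions:
    # action -> principals granted it, written as "user:<name>" or "group:<name>"
    grants: Dict[Action, Set[str]] = field(default_factory=dict)

    def allows(self, user: User, action: Action) -> bool:
        if action is Action.ACCESS and Action.ACCESS not in self.grants:
            return True  # assumption: a dataset is public until access is restricted
        principals = {f"user:{user.name}"} | {f"group:{g}" for g in user.groups}
        return bool(principals & self.grants.get(action, set()))


def visible_datasets(history: Dict[str, DatasetPermissions], viewer: User) -> List[str]:
    """A shared history shows each viewer only the datasets they may access."""
    return [name for name, perms in history.items() if perms.allows(viewer, Action.ACCESS)]


if __name__ == "__main__":
    history = {
        "genotypes": DatasetPermissions({Action.ACCESS: {"group:lab"}}),
        "reference": DatasetPermissions(),
    }
    print(visible_datasets(history, User("alice", {"lab"})))  # ['genotypes', 'reference']
    print(visible_datasets(history, User("bob")))             # ['reference']
```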
Security customization: the authentication mechanism can be replaced, or can leverage a single sign-on mechanism (e.g. through a proxying web server); the authorization provider can be customized or replaced.
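Replaceable authentication and authorization can be sketched as two separate seams. The first trusts an identity that a single sign-on proxy puts in a request header. The second is any object that answers authorization questions through a common interface. The header name `REMOTE_USER`, the `AuthorizationProvider` interface, and both provider classes are hypothetical. Trusting such a header is only safe if the application is reachable solely through the proxy.

```python
from typing import Dict, Optional, Protocol, Set, Tuple


class AuthorizationProvider(Protocol):
    """Anything that can answer: may this user perform this action on this dataset?"""
    def is_authorized(self, user: str, action: str, dataset_id: str) -> bool: ...


class TableAuthorizer:
    """Default provider: a local table of (action, dataset_id) -> permitted users."""
    def __init__(self, table: Dict[Tuple[str, str], Set[str]]):
        self.table = table

    def is_authorized(self, user: str, action: str, dataset_id: str) -> bool:
        return user in self.table.get((action, dataset_id), set())


class GroupDirectoryAuthorizer:
    """Drop-in replacement: defer to group membership kept elsewhere (stubbed as dicts).
    This stub ignores the action and grants every action to group members."""
    def __init__(self, members: Dict[str, Set[str]], dataset_group: Dict[str, str]):
        self.members = members              # group -> users, e.g. from a directory service
        self.dataset_group = dataset_group  # dataset_id -> group allowed to use it

    def is_authorized(self, user: str, action: str, dataset_id: str) -> bool:
        group = self.dataset_group.get(dataset_id)
        return group is not None and user in self.members.get(group, set())


def user_from_proxy(headers: Dict[str, str], header: str = "REMOTE_USER") -> Optional[str]:
    """Authentication delegated to a proxying web server that sets a trusted header."""
    return headers.get(header)


def check(provider: AuthorizationProvider, headers: Dict[str, str],
          action: str, dataset_id: str) -> bool:
    user = user_from_proxy(headers)
    return user is not None and provider.is_authorized(user, action, dataset_id)
```

Swapping `TableAuthorizer` for `GroupDirectoryAuthorizer` changes where decisions come from without touching the code that calls `check`.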
Completely integrated with analysis: dataset restrictions propagate through an analysis, and analyses that combine datasets also combine their restrictions.
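Restriction propagation can be sketched by giving each dataset a set of required roles. A tool's output then inherits the union of its inputs' requirements. This assumes a user must hold every required role to access a dataset, so combining data from two restricted studies yields a result only visible to people cleared for both. The names here are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Dataset:
    name: str
    required_roles: FrozenSet[str] = frozenset()  # empty set = unrestricted


def run_tool(output_name: str, inputs: Iterable[Dataset]) -> Dataset:
    """Derive an output whose restrictions are the union of all input restrictions."""
    roles: FrozenSet[str] = frozenset()
    for ds in inputs:
        roles = roles | ds.required_roles
    return Dataset(output_name, roles)


def can_access(user_roles: FrozenSet[str], ds: Dataset) -> bool:
    """Assumption: the user must hold every role the dataset requires."""
    return ds.required_roles <= user_roles


if __name__ == "__main__":
    a = Dataset("cohort_a", frozenset({"study_a"}))
    b = Dataset("cohort_b", frozenset({"study_b"}))
    merged = run_tool("merged", [a, b])
    print(sorted(merged.required_roles))                 # ['study_a', 'study_b']
    print(can_access(frozenset({"study_a"}), merged))    # False
```

Because outputs are ordinary datasets, feeding `merged` into further tools carries the combined restriction along the rest of the analysis.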
Up next... Libraries: sequencer integration, versioning, tagging and annotation, automatic workflow triggering. Security: configurable adapters to different authorization providers (e.g. directory services).
Acknowledgements. Data and browser connections: UCSC, BioMart, GMOD, InterMine. Funding: National Science Foundation; Huck Institutes; Pennsylvania Dept. of Health.
The Galaxy Team:
Guru Ananda | Penn State
Dan Blankenberg | Penn State
Wen-Yu Chung | Penn State
Nate Coraor | Penn State
Greg Von Kuster | Penn State
Sergei Kosakovsky | UCSD
Ross Lazarus | Harvard MS
Anton Nekrutenko | Penn State
p.s. I have job openings for people who like to do cool stuff:
[email protected]