TFInfer

A Tool for Probabilistic Inference of Transcription Factor Activities

H.M. Shahzad Asif

Institute of Adaptive and Neural Computation

School of Informatics

University of Edinburgh

Scope

Introduction
Software Features
Inputs and Outputs
Software Interfaces
Software Requirements and Availability
Acknowledgements
References

Introduction

TFInfer is a novel standalone software tool for inference of transcription factor activities (TFAs). A probabilistic state space model provides the basis. In this model:

“yn(t)” is the expression level of gene “n” at time instant “t” and is the only observed variable.

“Xnm” contains a binary value indicating whether there is a link between gene “n” and transcription factor “m”.

“bnm” encodes the regulatory strength between gene “n” and transcription factor “m”.
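
Written out with these definitions, the observation equation can be sketched as below. Here c_m(t) is the activity of transcription factor m at time t (the latent variable of the model). The gene-specific baseline \mu_n and the zero-mean Gaussian noise term \epsilon_n(t) are assumptions based on the formulation in [1]; they are not defined in the text here.

```latex
y_n(t) = \sum_{m} X_{nm}\, b_{nm}\, c_m(t) + \mu_n + \epsilon_n(t)
```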

Identifiability

As regulatory strengths and TFAs appear in the likelihood only through their product, there is a sign ambiguity (de-repressing looks the same as activating).

This ambiguity can be resolved by providing additional information, e.g.:
    TF X is an activator for gene Y.
    TF X is active/inactive in condition C.
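
The sign ambiguity can be illustrated with a small numerical sketch. All values below are hypothetical toy numbers, and the function names are illustrative only (they are not part of TFInfer). The sketch uses one gene; with several genes, the strengths of the flipped TF would have to change sign for every gene it regulates.

```python
# Hypothetical toy values: one gene regulated by two TFs over three time points.
X = [1, 1]                      # binary connectivity of the gene to each TF
b = [0.8, -0.5]                 # regulatory strengths
c = [[1.0, 2.0, 0.5],           # activity of TF 0 over time
     [0.3, -1.0, 1.5]]          # activity of TF 1 over time


def predicted_expression(X, b, c):
    """Return sum_m X[m] * b[m] * c[m][t] for each time point t."""
    n_times = len(c[0])
    return [sum(X[m] * b[m] * c[m][t] for m in range(len(b)))
            for t in range(n_times)]


def flip_tf(b, c, m):
    """Flip the sign of TF m's strength and of its whole activity profile."""
    b_flipped = [-v if i == m else v for i, v in enumerate(b)]
    c_flipped = [[-v for v in row] if i == m else list(row)
                 for i, row in enumerate(c)]
    return b_flipped, c_flipped


original = predicted_expression(X, b, c)
b2, c2 = flip_tf(b, c, 1)
flipped = predicted_expression(X, b2, c2)

# The two parameter settings predict the same expression profile.
same = all(abs(u - v) < 1e-12 for u, v in zip(original, flipped))
print(same)
```

Because both settings fit the data equally well, only outside knowledge (for example, that a TF is an activator for a given gene) can select one of them.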

Introduction (cntd.)

The latent (hidden) variable “cm(t)” is used to estimate the activity of the m-th transcription factor at time instant “t”.

An efficient variational Bayesian EM algorithm is used to obtain the posteriors over the model parameters.

The model exploits the natural sparsity of the regulatory network by using connectivity information.

The approach is feasible for genome-wide applications.

The probabilistic approach makes it possible to associate confidence intervals with the results.
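
How connectivity information keeps the computation sparse can be sketched as follows. This is a minimal illustration with a hypothetical toy network, not TFInfer's implementation: only the links present in the connectivity matrix are stored and summed over, and the baseline and noise terms are left out.

```python
# Hypothetical toy network: 3 genes, 2 transcription factors, 4 time points.
X = [[1, 0],
     [1, 1],
     [0, 1]]                    # X[n][m] = 1 if TF m is linked to gene n
b = [[0.9, 0.0],
     [-0.4, 1.2],
     [0.0, 0.7]]                # regulatory strengths (used only where X is 1)
c = [[0.0, 0.5, 1.0, 0.5],
     [1.0, 0.8, 0.2, 0.0]]      # c[m][t]: activity of TF m at time t

# Keep only the links that exist, so the work grows with the number of
# links rather than with (number of genes) x (number of TFs).
links = {n: [m for m, x in enumerate(row) if x == 1]
         for n, row in enumerate(X)}


def expression_mean(n, t):
    """Noise-free expression of gene n at time t, summed over its links only."""
    return sum(b[n][m] * c[m][t] for m in links[n])


mean = [[expression_mean(n, t) for t in range(4)] for n in range(3)]
n_links = sum(len(v) for v in links.values())
print(n_links, "links out of", len(X) * len(X[0]), "possible")
```

In a genome-wide network each gene is linked to only a few transcription factors, so the saving is far larger than in this toy case.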

Software Features

Genome-wide inference.
Freeware.
Open-source.
Supported data types:
    Time-series data
    Time-independent data
    Replicates
Genome connectivity included for:
    Yeast
    E. coli
Computationally efficient.
User friendly.
No programming expertise required.
Probabilistic modelling for TFAs.
Coded in C# using the dnAnalytics and ZedGraph libraries.
Usable under Linux/Mac via Mono.
User manual available.

Input and Output Files

Inputs

The standard format is CSV (comma-separated values).

Input files contain log-transformed gene expression data. The first column holds the gene annotations, and a header row is optional.

Connectivity data is included with the software for yeast and E. coli. For yeast, the connectivity file contains the common names of genes. For E. coli, the connectivity file contains b numbers. Users can also supply their own connectivity file.

The required transcription factors can be selected using the data selection interface.
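
The expression file layout described above (gene annotation in the first column, optional header row) can be read with a few lines of Python. This is a sketch for preparing or checking input files, not part of TFInfer. The function name, the column names and the expression values are hypothetical.

```python
import csv
import io


def read_expression_csv(handle, has_header):
    """Read gene annotations (first column) and expression values (the rest)."""
    reader = csv.reader(handle)
    header = next(reader) if has_header else None
    genes, values = [], []
    for row in reader:
        if not row:             # skip blank lines
            continue
        genes.append(row[0].strip())
        values.append([float(v) for v in row[1:]])
    return header, genes, values


# Hypothetical sample: two E. coli genes identified by b numbers,
# three measurements each, with a header row.
sample = io.StringIO(
    "gene,t0,t1,t2\n"
    "b0001,0.12,-0.40,1.05\n"
    "b0002,-0.33,0.08,0.91\n"
)
header, genes, values = read_expression_csv(sample, has_header=True)
print(genes, len(values[0]))
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.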

Input and Output Files (cntd.)

Output

TFAs are provided in two formats:
    A graphical representation (with error bars) for every transcription factor selected.
    A CSV file containing the TFAs, which can be exported.

Graphs can be saved in different formats.

As the model is probabilistic, all results have confidence intervals.
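
As an illustration of how error bars relate to a probabilistic result, the sketch below turns a posterior mean and standard deviation into an interval and writes it as CSV. It assumes a Gaussian posterior and a 95% level (z = 1.96). The TF name, the numbers and the column layout are all hypothetical and do not describe TFInfer's actual output file.

```python
import csv
import io


def interval(mean, sd, z=1.96):
    """Symmetric interval around the mean, assuming a Gaussian posterior."""
    return mean - z * sd, mean + z * sd


# Hypothetical posterior summaries for one TF at three time points.
tf_name = "TF_A"
means = [0.10, 0.85, 0.40]
sds = [0.05, 0.10, 0.20]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["tf", "time_index", "mean", "lower", "upper"])
for t, (mu, sd) in enumerate(zip(means, sds)):
    lower, upper = interval(mu, sd)
    writer.writerow([tf_name, t, f"{mu:.3f}", f"{lower:.3f}", f"{upper:.3f}"])

rows = out.getvalue().splitlines()
print("\n".join(rows))
```

A wider interval at a given time point means the data constrain the activity of that transcription factor less tightly there.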

Software Interface

Three main interfaces:

Data input and initial configuration:
    Gene expression data.
    Genome connectivity.
    Time-series, time-independent, replicates.

Data selection:
    Transcription factor selection.

Result:
    Graph for each transcription factor.
    A CSV file containing the relative concentration of all transcription factors selected.

TFInfer Main Interface

The main interface provides options and buttons for:

Supplying the data file(s) containing gene expression data. For replicates, multiple files can be used; the maximum number of replicates is 5.

Indicating a header row. If the data file(s) contain a header row, this option must be selected before selecting the data file.

Specifying whether the data is time-series or time-independent.

Indicating replicates. In the case of replicates, this option must be selected; if selected, the number of replicates is shown on the right.

Supplying the connectivity file. Two connectivity files are included, for yeast and E. coli.

Resetting the state of the software.

Loading the data and connectivity files.

Starting the process.

Stopping the process.

Viewing the results once the calculations are complete.

For every data file, TFInfer shows a summary of the data. For the connectivity file, this information is also shown, followed by a window containing a list of transcription factors.

TFInfer Data Selection Interface

The user can select any number of transcription factors here.

TFInfer Results Window

The results window provides options for:

Changing the sign of the signal.

Saving the results in the form of a plot.

Saving the results in a CSV file.

Software Requirements and Availability

Microsoft .NET Framework version 2 or Mono is required. A download link is available on the TFInfer page.

The software installer and other related material are available on the TFInfer home page: http://homepages.inf.ed.ac.uk/s0976841/TFInfer/

Acknowledgements

The software is based on the model proposed in the Bioinformatics paper [1].

Thanks to Dr. Matthew Rolfe for providing connectivity information and for useful discussions.

Thanks to Dr. Guido Sanguinetti for all the support.

Thanks to UoS for DoR Devolved funding.

References

[1] G. Sanguinetti, N. Lawrence, and M. Rattray. Probabilistic inference of transcription factor concentrations and gene-specific regulatory activities. Bioinformatics, 22(22):2775, 2006.

[2] C. Harbison, D. Gordon, T. Lee, N. Rinaldi, K. Macisaac, T. Danford, N. Hannett, J. Tagne, D. Reynolds, J. Yoo, et al. Transcriptional regulatory code of a eukaryotic genome. Nature, 431:99–104, 2004.

[3] T. I. Lee, N. J. Rinaldi, F. Robert, D. T. Odom, Z. Bar-Joseph, G. K. Gerber, N. M. Hannett, C. T. Harbison, C. M. Thompson, I. Simon, J. Zeitlinger, E. G. Jennings, H. L. Murray, D. B. Gordon, B. Ren, J. J. Wyrick, J.-B. Tagne, T. L. Volkert, E. Fraenkel, D. K. Gifford, and R. A. Young. Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science, 298(5594):799–804, 2002.

[4] P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Mol. Biol. Cell, 9(12):3273–3297, 1998.

[5] http://www.zedgraph.org/

[6] Matlab C Math library.

[7] http://www.ecocyc.com/

Contact

Shahzad Asif

Shahzad.Asif@ed.ac.uk

Institute of Adaptive and Neural Computation

School of Informatics

University of Edinburgh