
Yuval Hart, Weizmann 2010©

Introduction to Matlab & Data Analysis

Final Project: That’s all, Folks!


Outline

Parsing files

Efficient programming - vectorization

Correlation coefficients

Passing extra parameters

Image plotting

Curve Fitting & Optimization

Figure handling


“Rotation in 60 minutes”


Rotation in 60 minutes:

During the past month you’ve measured promoter activity of 20 genes.

Your PI wants you to present your results at the next group meeting.


To Do List

Get the gene sequences from the GenBank and FASTA files and calculate their GC content

Display all correlation coefficients of the measured promoter activity (PA) and its relation to GC content

For the top 4 genes, find how the correlation decays with distance from the initial gene in the pathway


GenBank file format


Step 3: Attach each gene name to its DNA sequence

Build the structure with all needed fields:

% Build the structure Genes with the desired genes and their data:
% name, startPosition, endPosition, sequence, complement (1/0), GCcontent
% This is also the way to preallocate for structures:
% Genes(1,sum(indGeneList))=struct( 'name', [], 'complement', [], 'sequence',[],...
%     'StartPosition',[],'EndPosition',[],'GCcontent',1);

Genes=struct('name',geneNames(indGeneList),...
    'complement', num2cell(indComplement(indGeneList)'),...
    'StartPosition',CDSpositionStartEndCelled(indGeneList,1)',...
    'EndPosition',CDSpositionStartEndCelled(indGeneList,2)',...
    'sequence',seq,'GCcontent',GCcontent);
a=Genes;

Note: struct assigns field values element by element only when they are passed as cell arrays
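The cell-array behaviour of struct can be illustrated with a small sketch. The gene names and sequences below are hypothetical toy values, not the project data, and the GC content is computed here with a simple vectorized comparison, which may differ from how the course code does it.

```matlab
% Toy data (hypothetical): three gene names and their sequences
geneNames = {'geneA', 'geneB', 'geneC'};
seq       = {'ATGC', 'GGCC', 'ATAT'};

% Vectorized GC content: fraction of G or C characters in each sequence
GCcontent = cellfun(@(s) mean(upper(s) == 'G' | upper(s) == 'C'), seq);

% Cell-array field values produce one struct element per cell
Genes = struct('name', geneNames, 'sequence', seq, ...
               'GCcontent', num2cell(GCcontent));

% A value that is not a cell array is copied into every element instead
Tagged = struct('name', geneNames, 'organism', 'unknown');
```

Here Genes and Tagged are both 1-by-3 structure arrays. Numeric vectors such as GCcontent have to be wrapped with num2cell first, otherwise the whole vector would be stored in every element.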


Calculate and plot Correlation Matrix

Load the list of genes and measurements

% Input:
% measurement mat file contains:
% geneList - a cell array of the genes Names
% measurements - a matrix of 20 genes measurements at 1001 time points
% GenesGCcontent - a vector of the genes GCcontent values

% measurements has a row for each gene containing its measurements through
% 1001 time points and the geneList names
load measurements


Plot GC content and mean PA dependence

Plot the fit results on top of the previous graph:

Note: Smoothing the data can reduce the effect of outliers
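One possible way to overlay a fit on the GC content plot is sketched below. The data are synthetic stand-ins for measurements and GenesGCcontent, and the straight-line model fitted with polyfit is an assumption, since the slides do not say which fit was used.

```matlab
% Hypothetical stand-in data: 20 genes with a noisy linear GC/PA trend
rng(1);
GenesGCcontent = linspace(0.35, 0.65, 20);
measurements = (2 + 5*GenesGCcontent') * ones(1, 50) + 0.2*randn(20, 50);

meanPA = mean(measurements, 2)';       % mean over time, one value per gene

% Straight-line least-squares fit (an assumed model, not from the slides)
p = polyfit(GenesGCcontent, meanPA, 1);
fitPA = polyval(p, GenesGCcontent);

figure;
plot(GenesGCcontent, meanPA, 'o');
hold on;                               % keep the data when adding the fit
plot(GenesGCcontent, fitPA, '-');
hold off;
xlabel('GC content');
ylabel('Mean promoter activity');
legend('data', 'linear fit', 'Location', 'best');
```

The hold on call is what keeps the first plot in place so that the fitted line is drawn over it.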


Calculate and plot Correlation Matrix

Calculate and display the correlation matrix
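A minimal sketch of this step, assuming corrcoef for the calculation and imagesc for the display (the slides may use other functions). The measurements matrix and gene names here are synthetic and hypothetical.

```matlab
% Synthetic stand-in for the real data: 5 genes x 200 time points
rng(0);
t = linspace(0, 10, 200);
measurements = [sin(t); sin(t) + 0.1*randn(1, 200); cos(t); ...
                -sin(t); randn(1, 200)];
geneList = {'g1', 'g2', 'g3', 'g4', 'g5'};   % hypothetical names

% corrcoef treats columns as variables, so transpose to get gene-vs-gene
R = corrcoef(measurements');

% Show the matrix as an image with one tick per gene
figure;
imagesc(R, [-1 1]);
colorbar;
axis square;
set(gca, 'XTick', 1:numel(geneList), 'XTickLabel', geneList, ...
         'YTick', 1:numel(geneList), 'YTickLabel', geneList);
title('Correlation coefficients of promoter activity');
```

Fixing the colour limits to [-1 1] keeps the colour scale comparable between figures, whatever range the data happen to cover.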


Step 2: Fit correlations to the desired function

Use an anonymous function to pass extra parameters, then fit with lsqcurvefit:

function y_hat=FittingCurveExpGuess(c,x,init)
% This assumes an exponential decreasing curve
y_hat=init+c(1)*exp(c(2).*x);

initDis=-0.1;
c0=[.7 0.1]; % assigning the initial values for the fit search
paramfunc = @(c,x)FittingCurveExpGuess(c,x,initDis); % def. of the anonymous function
ExpParam=lsqcurvefit(paramfunc,c0,XdataPoints,correl,[0 -1],[1 1],options);

Arguments of lsqcurvefit, in order: function name, initial guess, X data, Y data, lower bound, upper bound (followed by the options)
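The parameter-passing pattern can be tried without the Optimization Toolbox. The sketch below swaps lsqcurvefit for fminsearch, which minimizes a sum of squared residuals without bounds, so it is not equivalent to the bounded fit on the slide. The data are synthetic and the "true" parameters are hypothetical.

```matlab
% Model with an extra, non-fitted parameter "init" (same form as the slide)
expModel = @(c, x, init) init + c(1) * exp(c(2) .* x);

initDis = -0.1;                                  % value frozen into the closure
paramfunc = @(c, x) expModel(c, x, initDis);     % now a two-argument function

% Synthetic, noise-free data from hypothetical "true" parameters
cTrue = [0.8 -0.5];
XdataPoints = 0:0.5:10;
correl = paramfunc(cTrue, XdataPoints);

% Sum of squared residuals as a function of c only
sse = @(c) sum((paramfunc(c, XdataPoints) - correl).^2);

c0 = [0.7 0.1];                                  % initial guess
opts = optimset('TolX', 1e-8, 'TolFun', 1e-10, ...
                'MaxFunEvals', 2000, 'MaxIter', 2000);
cFit = fminsearch(sse, c0, opts);                % unconstrained, unlike lsqcurvefit
```

The anonymous function captures the value of initDis at the moment it is created. Changing initDis afterwards does not affect paramfunc, so the handle has to be redefined to fit with a different offset.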


Step 3: Plot the correlation data and fit


Best of Luck in the Group Meeting!


This is the end, my friend, the end

"Louis, I think this is the beginning of a beautiful friendship."