Transport Identification Parser: Inferring Transport Reactions from Protein Data for PGDBs

Thomas J Lee, Peter Karp, AIC BRG
Ian Paulsen, consulting

Running the Transport Identification Parser

1. Run Pathway Tools.
2. Make the organism of interest the current organism.
3. [Run operon predictor].
4. Select Tools/Pathologic.
5. From Pathologic, select Refine/Transport Identification Parser.
6. Wait, and observe progress.
7. When complete, Probable Transporter Table window appears.
8. You may now review and modify the inferred transporters.

Task Description

Infer transport reactions from protein data and construct them in BioCyc KBs for a variety of organisms, automatically where possible, with human assistance where necessary.

Scope

• Run for all Tier 3 KBs (~350 KBs)
• To support both automated and user-controlled operation:
  – Distinguish high- and low-confidence inferences
  – Automated mode accepts all high-confidence inferences
  – Track evidence where possible
  – Provide accept/reject/edit options to user

Output

Construct the following for each inferred transported substrate:
– Transport-Reaction frame of correct subclass
  • Assign compartments – use simple assumptions
– Enzymatic-Reaction frame linking protein to reaction

Construct Protein-Complexes as required

Subproblems

1. Find candidate transporter proteins.
2. Filter out candidates.
3. Identify substrate(s).
4. Assign an energy coupling to transporter.
5. Identify compartment of each substrate.
6. Group subunits of transporter complexes.
7. Construct full compartmental reaction from substrate and coupling.
8. Construct enzymatic reaction linking each reaction with protein.

1. Find candidate transporter proteins

• Input: all protein frames of organism
• Output: internal data structure (PARTRANS)
• Exclude proteins with long annotations (default: 12 words)
• Tokenize the annotation
• Annotation must contain an indicator. Exs: "transport", "export", "permease", "channel"
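The candidate test above can be sketched in a few lines of Python. This is illustrative only and is not the tool's code: the function names are hypothetical, the indicator set holds just the four examples from the slide, and matching an indicator as a token prefix (so that "transporter" counts as "transport") is an assumption.

```python
import re

# Indicator words taken from the slide; the tool's full list is not shown.
INDICATORS = {"transport", "export", "permease", "channel"}
MAX_WORDS = 12  # default annotation length limit given on the slide


def tokenize(annotation):
    """Lowercase the annotation and split it into word-like tokens."""
    return re.findall(r"[a-z0-9+\-]+", annotation.lower())


def is_candidate(annotation, max_words=MAX_WORDS):
    """Return True if the annotation is short enough and has an indicator."""
    tokens = tokenize(annotation)
    if len(tokens) > max_words:
        return False
    # Assumption: an indicator may match as a prefix of a token.
    return any(tok.startswith(ind) for tok in tokens for ind in INDICATORS)
```

For example, `is_candidate("sodium channel")` is true, while an annotation longer than twelve words is rejected regardless of its content.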

2. Filter candidates

• Exclude if annotation matches a list of regular expressions of counterindicator phrases and patterns
  – Ex: "transport associated domain"
• Exclude if annotation contains counterindicator word
  – Exs: "regulator", "nuclear-export"
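A minimal Python sketch of the two exclusion rules, for illustration only. The pattern and word lists contain just the slide's examples, `is_excluded` is a hypothetical name, and allowing a hyphen in place of the space in the phrase pattern is an assumption.

```python
import re

# Examples from the slide; the real lists are longer and not reproduced here.
COUNTER_PATTERNS = [re.compile(r"transport[- ]associated domain")]
COUNTER_WORDS = {"regulator", "nuclear-export"}


def is_excluded(annotation):
    """Return True if a counterindicator phrase or word is present."""
    text = annotation.lower()
    # Rule 1: any counterindicator regular expression matches.
    if any(pattern.search(text) for pattern in COUNTER_PATTERNS):
        return True
    # Rule 2: any whole word (punctuation stripped) is a counterindicator.
    words = (word.strip(",;.()") for word in text.split())
    return any(word in COUNTER_WORDS for word in words)
```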

3. Identify substrate(s)

Search annotation for names of MetaCyc compounds. Details:

Multiple substrates indicate multiple reactions, symport/antiport pair, or both. Exs:
"cytosine/purines/uracil/thiamine/allantoin permease family protein"
"magnesium and cobalt transport protein cora, putative"
"sodium:sulfate symporter transmembrane domain protein"
"probable agcs sodium/alanine/glycine symporter"

Exclude non-substrates that look like compounds via an exception list. Exs: "as" "be" "c" "i"
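The lookup with an exception list might look like the following Python sketch. `KNOWN_COMPOUNDS` is a small hypothetical stand-in for the MetaCyc compound name index, and splitting on whitespace, "/", ":" and "," is an assumption based on the example annotations.

```python
import re

# Hypothetical stand-in for the MetaCyc compound name index.
KNOWN_COMPOUNDS = {"sodium", "sulfate", "alanine", "glycine",
                   "magnesium", "cobalt", "as", "c"}
# Exception list: the slide's examples of words that only look like compounds.
NON_SUBSTRATES = {"as", "be", "c", "i"}


def find_substrates(annotation):
    """Return compound names found in the annotation, in order, without repeats."""
    tokens = re.split(r"[\s/:,]+", annotation.lower())
    found = []
    for tok in tokens:
        if tok in KNOWN_COMPOUNDS and tok not in NON_SUBSTRATES and tok not in found:
            found.append(tok)
    return found
```

With this stand-in index, "probable agcs sodium/alanine/glycine symporter" yields three substrates, while the "c" in a name such as "cytochrome c biogenesis export protein" is ignored.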

3. Identify substrate(s) (cont.)

Name canonicalization. Ex: strip plurals.

Affixed substrates. Exs: "-transporting" "-specific"

Look up special ionic forms. Exs: "cuprous" "ferric" "hydrogen"

Two-word substrates, substrate classes. Ex: "amino acid"
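These normalization steps can be sketched in Python as below. Everything here is an illustrative assumption: the function names are hypothetical, the ion symbols the three ionic-form examples map to are chosen for the sketch, and the plural rule is deliberately naive.

```python
# Keys are the slide's examples; the mapped symbols are assumptions.
IONIC_FORMS = {"cuprous": "Cu+", "ferric": "Fe3+", "hydrogen": "H+"}
AFFIXES = ("-transporting", "-specific")
TWO_WORD = {("amino", "acid"): "amino acid"}


def canonicalize(token):
    """Strip known affixes, map ionic forms, then strip a simple plural."""
    token = token.lower()
    for affix in AFFIXES:
        if token.endswith(affix):
            token = token[: -len(affix)]
    if token in IONIC_FORMS:
        return IONIC_FORMS[token]
    # Naive plural stripping; skips endings such as "ss" and "us".
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us")):
        token = token[:-1]
    return token


def merge_two_word(tokens):
    """Join adjacent tokens that form a known two-word substrate."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in TWO_WORD:
            merged.append(TWO_WORD[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```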

4. Assign an energy coupling.

1. Search annotation for prioritized list of indicators. Exs:

("atp-binding" . ATP)
("mfs" . SECONDARY)
("pts" . PTS)
("phosphotransferase" . PTS)
("carrier" . SECONDARY)
("channel" . CHANNEL)

2. Some substrates imply a coupling. Ex: protoheme => ATP

Absence of indicator => UNKNOWN

Some more sophisticated techniques were deferred:
• BLAST vs. E. coli
• HMM family identification
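The prioritized lookup can be sketched in Python as an ordered list scanned from the top. This is an illustration under assumptions: only the six example pairs are included, indicators are matched as whole whitespace-separated words, and the indicator list is checked before the substrate-implied rule (the slide does not state how the two interact).

```python
# Ordered (indicator, coupling) pairs from the slide; earlier entries win.
COUPLING_INDICATORS = [
    ("atp-binding", "ATP"),
    ("mfs", "SECONDARY"),
    ("pts", "PTS"),
    ("phosphotransferase", "PTS"),
    ("carrier", "SECONDARY"),
    ("channel", "CHANNEL"),
]
# Substrates that imply a coupling (the slide's single example).
SUBSTRATE_IMPLIED = {"protoheme": "ATP"}


def assign_coupling(annotation, substrates=()):
    """Return the first coupling whose indicator appears, else UNKNOWN."""
    words = annotation.lower().split()
    for indicator, coupling in COUPLING_INDICATORS:
        if indicator in words:
            return coupling
    for substrate in substrates:
        if substrate in SUBSTRATE_IMPLIED:
            return SUBSTRATE_IMPLIED[substrate]
    return "UNKNOWN"
```

Because the list is ordered, an annotation containing both "atp-binding" and "carrier" would resolve to ATP in this sketch.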

5. Identify compartment of each substrate.

Use keywords to determine compartment of primary substrate (Exs: "export", "antiporter")

Otherwise assume primary substrate is transported into cell (periplasm => cytoplasm)

Deferred complex compartment analysis:
• Assume E. coli-like cellular structure
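A minimal Python sketch of the keyword rule with the inward default. Treating both example keywords as signals that the primary substrate moves outward is an assumption; the function name and the substring matching are also hypothetical.

```python
# Keywords from the slide; reading both as "primary substrate leaves the
# cell" is an assumption made for this sketch.
OUTWARD_KEYWORDS = ("export", "antiporter")


def primary_direction(annotation):
    """Return (source, destination) compartments for the primary substrate."""
    text = annotation.lower()
    if any(keyword in text for keyword in OUTWARD_KEYWORDS):
        return ("cytoplasm", "periplasm")
    # Default from the slide: transported into the cell.
    return ("periplasm", "cytoplasm")
```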

6. Group subunits of transporter complexes.

Many transporters are systems of several proteins. These are grouped into complexes.

Grouping criteria; all must be met:
– Predicted coupling is ATP or PEP
– Predicted substrates are identical
– Genes of proteins have a common operon (NOTE requirement on operon availability)

Resulting complex is added to KB under Protein-Complexes.
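The three grouping criteria amount to bucketing predictions by a shared key. The Python sketch below is illustrative: the `Prediction` record, its field names, and the protein and operon identifiers are hypothetical, and each protein is assumed to belong to at most one predicted operon.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    protein: str
    coupling: str
    substrates: frozenset
    operon: Optional[str]  # None when no operon prediction is available


def group_complexes(predictions):
    """Group proteins that share coupling (ATP or PEP), substrates, and operon."""
    groups = defaultdict(list)
    for p in predictions:
        if p.coupling in ("ATP", "PEP") and p.operon is not None:
            groups[(p.coupling, p.substrates, p.operon)].append(p.protein)
    # Only groups with more than one member form a complex.
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

Proteins without an operon assignment are never grouped, which is why the operon predictor needs to have been run beforehand.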

7. Construct full compartmental reaction from substrate and coupling.

Determine set of transported substrates for this transporter:

• For SECONDARY coupling:
  – Identify auxiliary substrate providing ion gradient (H+, Na+)
  – Remove from transported substrate list
  – Place on side of reaction indicated by symport/antiport clues
• For other couplings:
  – Determined previously in substrate analysis
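For the SECONDARY case, separating the gradient ion from the transported substrates could look like this Python sketch. It is illustrative only: the function name is hypothetical, preferring H+ over Na+ when both are present is an assumption, and so is leaving a lone substrate untouched.

```python
# Candidate gradient ions from the slide, in an assumed order of preference.
GRADIENT_IONS = ("H+", "Na+")


def split_secondary(substrates, annotation):
    """Return (transported substrates, auxiliary ion, 'symport'/'antiport'/None)."""
    text = annotation.lower()
    if "antiport" in text:
        mode = "antiport"
    elif "symport" in text:
        mode = "symport"
    else:
        mode = None  # no clue in the annotation
    aux = None
    if len(substrates) > 1:
        aux = next((ion for ion in GRADIENT_IONS if ion in substrates), None)
    transported = [s for s in substrates if s != aux]
    return transported, aux, mode
```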

7. Construct full compartmental reaction from substrate and coupling (cont).

For each transported substrate of this transporter, either import a reaction (from E. coli) or create a new one.

1. Search import KB for reaction with matching substrates (find-rxn-by-substrates)
  – Transported substrate added with indicated compartment
  – Auxiliary substrates determined by coupling. Exs:
    – CHANNEL typically has none
    – ATP has ATP/H2O => ADP/phosphate
2. If one reaction is found, import: (import-reactions trxns src-kb dst-kb …)
3. If multiple reactions found, retain all.
4. Else if reaction is not present in KB, create new rxn
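The import-or-create decision can be sketched in Python as follows. This shows only the branching: the import KB is modeled as a plain dictionary keyed by substrate sets, which is a hypothetical stand-in for the find-rxn-by-substrates search, and the reaction identifiers are invented.

```python
def resolve_reaction(substrates, import_kb):
    """Decide whether to import, retain several matches, or create a reaction.

    import_kb maps a frozenset of substrate names to a list of reaction ids
    (a stand-in for searching the import KB).
    """
    matches = import_kb.get(frozenset(substrates), [])
    if len(matches) == 1:
        return ("import", matches)
    if len(matches) > 1:
        return ("retain-all", matches)
    return ("create", [])
```

For instance, with `{frozenset({"Na+"}): ["RXN-A"]}` as the stand-in KB, a query for `["Na+"]` returns `("import", ["RXN-A"])`, and a query for an unknown substrate returns `("create", [])`.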

7. Construct full compartmental reaction from substrate and coupling (cont).

Create new reaction:
• Create reaction frame, subclass determined by coupling:
  – (create-instance-w-generated-id rxn-class)
• Add transported and auxiliary substrates to appropriate sides of reaction
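How substrates land on each side can be illustrated with a Python sketch that builds the reaction as a string rather than a KB frame. The function name, the compartment tags "p" and "c", and the auxiliary table are assumptions for illustration; the sketch does not call create-instance-w-generated-id.

```python
# Auxiliary substrates per coupling: (left side, right side).
AUXILIARY = {
    "ATP": (["ATP", "H2O"], ["ADP", "phosphate"]),
    "CHANNEL": ([], []),
}


def build_reaction(substrate, src, dst, coupling, aux_ion=None, mode=None):
    """Return a reaction string such as 'Na+[p] = Na+[c]'."""
    left = [f"{substrate}[{src}]"]
    right = [f"{substrate}[{dst}]"]
    if coupling == "SECONDARY" and aux_ion:
        # Symport: ion moves with the substrate; antiport: against it.
        ion_src, ion_dst = (src, dst) if mode == "symport" else (dst, src)
        left.append(f"{aux_ion}[{ion_src}]")
        right.append(f"{aux_ion}[{ion_dst}]")
    else:
        aux_left, aux_right = AUXILIARY.get(coupling, ([], []))
        left += aux_left
        right += aux_right
    return " + ".join(left) + " = " + " + ".join(right)
```

Calling `build_reaction("Ca+2", "c", "p", "SECONDARY", "H+", "antiport")` gives `Ca+2[c] + H+[p] = Ca+2[p] + H+[c]`.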

8. Construct enzymatic reaction linking each reaction with protein.

For each created reaction:
• (add-reactions-to-protein …)
• Added evidence code, history string arguments
• Subordinates new

[(import-reactions) handles import of enzymatic-reactions]

Running the Transport Identification Parser

1. Run Pathway Tools.
2. Make the organism of interest the current organism.
3. [Run operon predictor].
4. Select Tools/Pathologic.
5. From Pathologic, select Refine/Transport Identification Parser.
6. Wait, and observe progress.
7. When complete, Probable Transporter Table window appears.
8. You may now review and modify the inferred transporters.

GUI Overview

1. Window is titled Probable Transporter Table
2. Table of inferred transporters is organized into columns:
  – Status
  – Gene
  – Substrate
  – Coupling
  – Reaction / Function
3. Each row contains a transport reaction description:
  – Multiple reactions per transport protein are possible
  – These reactions stay contiguous in display
  – Coupling, Function common to all reactions of a protein
4. Aggregate pane shows counts by status.
5. Mousing over a reaction shows details in bottom pane.

Reviewing and Editing

• Left-click on a row
  – Dialog box appears
• May edit:
  – Function (name)
  – Energy coupling
• May invoke Reaction Editor on reaction
• May retract reaction
• May update status

Notional GUI Example

Status       Gene   Substrate  Coupling   Reaction / Annotation
Un-reviewed  T0059  Ca2+       SECONDARY  Ca+2[c] + H+[p] = Ca+2[p] + H+[c]
                                          calcium/proton antiporter
Rejected     T3669  phosphate  ATP        H2O + ATP + phosphate[p] = ADP + 2 phosphate[c]
                                          phosphate transport atp-binding protein
Accepted     T0080  Na+        CHANNEL    Na+[p] = Na+[c]
                                          sodium channel

Transporter Status

• Accepted:
  – Incorporate transporter into PGDB upon save
• Rejected:
  – Discard transporter upon save
• Deferred:
  – Indicate further analysis is required
  – Confirm upon save or exit
• Unreviewed:
  – Initial value of status
  – Confirm upon save or exit

"unresolved" = Deferred + Unreviewed

Filtering and Sorting

• Filtering excludes transporters from display:
  – Filter low- or high-confidence transporters
  – Filter by status
  – Filter by number of reactions per substrate
  – Choices are non-sticky
• Sort transporters by:
  – Gene
  – Energy Coupling
  – Substrate number/name
  – Status (e.g., Accepted, Rejected)

Saving Your Work

• The TIP has made in-memory modifications to the KB; nothing has been saved.
• Pulldown from KB menu to save (or revert)
• Pulldown from Exit menu to exit the TIP without saving
• Both KB and Exit have options to:
  – Accept all unresolved (leaves unresolved rxns in memory)
  – Reject all unresolved (retracts unresolved rxns)
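The four transporter statuses and the two bulk options for unresolved entries can be modeled with a small Python sketch. It is illustrative only: the enum, the function name, and the idea of returning the list of genes whose transporters are kept are assumptions, not the tool's data model.

```python
from enum import Enum


class Status(Enum):
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"
    DEFERRED = "Deferred"
    UNREVIEWED = "Unreviewed"


# "unresolved" = Deferred + Unreviewed
UNRESOLVED = {Status.DEFERRED, Status.UNREVIEWED}


def kept_on_save(statuses, accept_unresolved):
    """Return the genes whose transporters remain after resolving all entries."""
    kept = []
    for gene, status in statuses.items():
        if status in UNRESOLVED:
            status = Status.ACCEPTED if accept_unresolved else Status.REJECTED
        if status is Status.ACCEPTED:
            kept.append(gene)
    return kept
```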

Multisession Workflow

Lack of persistence is a problem, so…

1. Accept all desired changes
2. KB/Save KB & Exit, rejecting all unresolved predictions
  – [prompted to confirm discarding of predictions]
3. When work resumes, re-run TIP
  – Will not re-predict Accepteds
  – Will re-predict Rejecteds