Perform Gene Prediction using Taverna

Overview

Course presentation

In this section, we use the Taverna software to build and execute gene prediction workflows. All the workflows are based on the geneid gene prediction program.

The Web services we are going to use have been implemented using the BioMoby framework.
For a complete tutorial on the Moby plugin in Taverna, see this Web page: http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/taverna/guide/index.html

Taverna User manual

The genomic DNA sequence

We are going to work with the same sequence, HS307871, in FASTA format. You will need to copy and paste this sequence later on, when you execute the workflows.
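
Before pasting the sequence, you may want to check that the text is well-formed FASTA: a header line starting with '>', followed by the sequence lines. The following minimal Python sketch assumes you have the FASTA text at hand as a string or file; the function name read_fasta is hypothetical, and the code is not part of the Taverna workflows.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    The header is the text after '>'; the sequence lines that follow it
    are joined into one string. Blank lines are ignored.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))  # close previous record
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))  # close the last record
    return records


if __name__ == "__main__":
    # Toy input, not the real HS307871 sequence.
    toy = ">toy_sequence example\nACGTACGT\nTTGA\n"
    for name, seq in read_fasta(toy.splitlines()):
        print(name, len(seq))
```

Getting back exactly one record is a quick way to spot a truncated or doubled copy and paste.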

The EMBL gene annotations

In the third exercise, we are also going to work with the EMBL annotations of this sequence, HS307871, in GFF format. You will need to copy and paste these annotations later on, when you execute the third workflow.
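
GFF stores one feature per line in tab-separated columns: sequence name, source, feature type, start, end, score, strand, frame, and a final group (attribute) column; lines starting with '#' are comments. A minimal Python sketch of a reader, which you could use to inspect the annotations before pasting them, is shown here. The class and function names are hypothetical, and the sketch does not validate the version-specific syntax of the last column.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class GffFeature:
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the column holds '.'
    strand: str
    frame: str
    group: str  # last column; its syntax differs between GFF versions


def parse_gff(lines: Iterable[str]) -> List[GffFeature]:
    """Parse tab-separated GFF feature lines, skipping comments and blanks."""
    features: List[GffFeature] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or '##' directive
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated columns: {line!r}")
        group = "\t".join(cols[8:])  # empty string if the group column is absent
        score = None if cols[5] == "." else float(cols[5])
        features.append(
            GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                       score, cols[6], cols[7], group)
        )
    return features


if __name__ == "__main__":
    # Toy feature line, not taken from the real EMBL annotations.
    toy = "toy\tEMBL\tCDS\t120\t480\t.\t+\t0\tgene_a\n"
    for f in parse_gff(toy.splitlines()):
        print(f.seqname, f.feature, f.start, f.end, f.strand)
```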

Exercise 1

In this first exercise, we are going to build a workflow that just runs geneid.
Check the exercise 1 protocol diagram.

  • Start Taverna software
      1. Click on the Taverna icon on your desktop, if you haven't done so yet.
  • Build the workflow
    • Define the input
      1. In the 'Advanced model explorer' window, in the bottom left-hand corner, right click on 'Workflow inputs'
      2. In the context menu that appears, select 'Create New Input'
      3. Name the input 'sequence'
    • Define the output
      1. In the 'Advanced model explorer' window, right click on 'Workflow outputs'
      2. In the context menu that appears, select 'Create New Output'
      3. Name the output 'geneid_predictions'
    • Add the Moby input object to the workflow
      1. In the window in the top left-hand corner, search for the 'FASTA_NA' object and press Return
      2. Scroll down the 'Available Processors' tree until you find 'FASTA_NA' highlighted. You will find this object under the 'MOBY Objects' node. When you have found it, right click on it
      3. In the context menu that appears, select 'Add to model'
    • Add the processors
      1. In the window in the top left-hand corner, search for the 'runGeneIDGFF' service and press Return
      2. Scroll down the 'Available Processors' tree until you find 'runGeneIDGFF' highlighted. You will find this service under the authority 'genome.imim.es'
      3. When you have found it, right click on it
      4. In the context menu that appears, select 'Add to model'
      5. Repeat steps 1 to 4 with the service called 'fromFASTAToDNASequence', which is under the same authority, 'genome.imim.es'
    • Connect the input, the output and all the processors together
      1. In the 'Advanced model explorer' window, in the bottom left-hand corner, right click on the input called 'sequence'
      2. In the context menu that appears, select 'String', then in the menu 'Choose an input' that appears, select 'value'
      3. In the 'Advanced model explorer' window, in the bottom left-hand corner, left click on the 'FASTA_NA' object to display its list of inputs and outputs
      4. Right click on its output, called 'mobyData'
      5. In the context menu that appears, select 'fromFASTAToDNASequence', then in the menu 'Choose an input' that appears, select 'FASTA_NA(sequence)'
      6. Left click on the 'fromFASTAToDNASequence' processor to display its list of inputs and outputs
      7. Right click on its output, called 'DNASequence(sequence)'
      8. In the context menu that appears, select 'runGeneIDGFF', then in the menu 'Choose an input' that appears, select 'DNASequence(sequence)'
    • Parse the output
      1. In the 'Advanced model explorer' window, right click on the 'runGeneIDGFF' processor
      2. In the context menu that appears, select 'Add BioMOBY Parser'
      3. In the new window, click on the little triangle to get the list of outputs
      4. Right click on the output called 'GFF('geneid_predictions')', and then left click on 'Add parser for GFF('geneid_predictions') to the workflow'
      5. Left click on the 'Parse_Moby_Data_GFF' processor to display its list of inputs and outputs
      6. Right click on its output, called 'geneid_predictions_'content''
      7. In the context menu that appears, select the Workflow output called 'geneid_predictions'
    • Check that you have done it right: Exercise 1 Taverna workflow diagram
  • Run the workflow
      1. In the top left-hand corner, click on the 'File' menu, then select 'Run workflow'
      2. Click on the input called 'sequence', then click on the 'New input' button and paste the HS307871 sequence in FASTA format
      3. Execute the workflow by clicking the 'Run Workflow' button. The workflow execution status, and then the results, will be displayed
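
The links drawn in Exercise 1 are data links between the workflow input, the processors and the workflow output. Roughly speaking, a processor can run once all its inputs are available, so the workflow can be viewed as a directed acyclic graph. The Python sketch below models the Exercise 1 links and derives an order in which the steps could execute. It is an illustration only, not how Taverna stores or schedules workflows; the names are the ones used in the steps above, and the 'String' helper processor between the input and FASTA_NA is folded into a single link as a simplification.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set, Tuple

# Data links of the Exercise 1 workflow as (source, sink) pairs.
# Simplification: the 'String' helper processor between the workflow
# input and FASTA_NA is folded into one link.
EXERCISE1_LINKS: List[Tuple[str, str]] = [
    ("sequence", "FASTA_NA"),
    ("FASTA_NA", "fromFASTAToDNASequence"),
    ("fromFASTAToDNASequence", "runGeneIDGFF"),
    ("runGeneIDGFF", "Parse_Moby_Data_GFF"),
    ("Parse_Moby_Data_GFF", "geneid_predictions"),
]


def execution_order(links: List[Tuple[str, str]]) -> List[str]:
    """Return the nodes so that every node comes after all of its sources.

    Raises graphlib.CycleError if the links contain a cycle.
    """
    predecessors: Dict[str, Set[str]] = {}
    for source, sink in links:
        predecessors.setdefault(source, set())
        predecessors.setdefault(sink, set()).add(source)
    return list(TopologicalSorter(predecessors).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(EXERCISE1_LINKS)))
```

Because the Exercise 1 workflow is a single chain, there is only one valid order; adding a branch (as in Exercise 2) would allow several.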

Exercise 2

In this exercise, we will extend the workflow created in Exercise 1 to visualize the geneid predictions using the gff2ps tool.
Check the exercise 2 protocol diagram.
  • Extend the workflow from Exercise 1
    • Either carry on with the workflow you have just built, or load a ready-made one: in the top left-hand corner, click on the 'File' menu, then on 'Open workflow location', and enter the following URL: /courses/IRB-INB07/TavernaPractical/geneid_exercise1.xml
    • Define the extra output
      1. In the 'Advanced model explorer' window, right click on the 'Workflow outputs'
      2. In the context menu that appears, select 'Create New Output'
      3. Name the new output 'gff_map'.
        This output needs to be tagged as being an image. To do so:
      4. Click on the newly created output 'gff_map'
      5. In the 'Advanced model explorer' window, click on the item 'Metadata for gff_map'. Then click on the item 'MIME Types'.
      6. At the bottom of the 'MIME Types' window, add a new type, 'image/*'
      7. Go back to the 'Workflow' item
    • Add the extra processors
      1. In the window in the top left-hand corner, search for the 'runGFF2JPEG' service and press Return
      2. Scroll down the 'Available Processors' tree until you find 'runGFF2JPEG' highlighted. This service is under the authority 'genome.imim.es'. When you have found it, right click on it
      3. In the context menu that appears, select 'Add to model'
      4. Scroll back up the 'Available Processors' tree to the root node, and repeat steps 1 to 3 with the local processor called 'Decode base64 to byte[]'. Typing 'base64' in the search field is enough to find it.
    • Connect the extra output and processors together
      1. Left click on the 'runGeneIDGFF' processor to display its list of inputs and outputs.
      2. Right click on its output, called 'GFF(geneid_predictions)'
      3. In the context menu that appears, select 'runGFF2JPEG', then in the menu 'Choose an input' that appears next, select 'GFF(Collection - 'maps')'
    • Parse the extra output
      1. In the 'Advanced model explorer' window, right click on the 'runGFF2JPEG' processor
      2. In the context menu that appears, select 'Add BioMOBY Parser'
      3. In the new window, click on the little triangle to get the list of outputs
      4. Right click on the output called 'b64_encoded_jpeg('image')', and then left click on 'Add parser for b64_encoded_jpeg('image') to the workflow'
      5. Left click on the 'Parse_Moby_Data_b64_encoded_jpeg' processor to display its list of inputs and outputs
      6. Right click on its output, called 'image_'content''
      7. In the context menu that appears, select the 'Decode_base64_to_byte' processor, then in the menu 'Choose an input' that appears, select 'base64'
      8. Left click on the 'Decode_base64_to_byte' processor to display its list of inputs and outputs
      9. Right click on its output, called 'bytes'
      10. In the context menu that appears, select the workflow output called 'gff_map'
    • Check that you have done it right: Exercise 2 Taverna workflow diagram
  • Run the workflow
      1. This workflow takes the same input as the one from the previous exercise, so you can follow the same execution procedure.
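
The 'Decode base64 to byte[]' processor is needed because runGFF2JPEG returns the JPEG image as base64-encoded text (b64_encoded_jpeg); decoding it gives back the binary image, which is why the 'gff_map' output was tagged with the 'image/*' MIME type. The Python sketch below performs the same decoding outside Taverna, for example to save a result to disk. The function names, the output file name and the JPEG-signature check are our own additions, not part of the workflow.

```python
import base64
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # JPEG files begin with the Start Of Image marker


def decode_b64_image(b64_text: str) -> bytes:
    """Decode base64 text into raw bytes and check they look like a JPEG."""
    # With the default validate=False, b64decode discards characters outside
    # the base64 alphabet, such as the line breaks of wrapped output.
    data = base64.b64decode(b64_text)
    if not data.startswith(JPEG_SOI):
        raise ValueError("decoded data does not start with a JPEG marker")
    return data


def save_image(b64_text: str, path: str) -> Path:
    """Decode the image and write it to 'path' (hypothetical output file)."""
    target = Path(path)
    target.write_bytes(decode_b64_image(b64_text))
    return target


if __name__ == "__main__":
    # Toy payload standing in for the runGFF2JPEG output.
    toy = base64.b64encode(JPEG_SOI + b"not a real image").decode("ascii")
    print(len(decode_b64_image(toy)), "bytes decoded")
```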

Exercise 3

In this exercise, we will extend the workflow created in Exercise 2 to visualize the geneid predictions alongside the EMBL annotations, using the gff2ps tool.
Check the exercise 3 protocol diagram.
  • Extend the workflow from Exercise 2
    • Either carry on with the workflow you have just built, or load it from the following URL: /courses/IRB-INB07/TavernaPractical/geneid_exercise2.xml
    • Define the extra input
      1. In the 'Advanced model explorer' window, in the bottom left-hand corner, right click on 'Workflow inputs'
      2. In the context menu that appears, select 'Create New Input'
      3. Name the new input 'embl_annotations'
    • Add the extra Moby input object to the workflow
      1. In the window in the top left-hand corner, search for the 'GFF' object and press Return
      2. Scroll down the 'Available Processors' tree until you find 'GFF' highlighted. You will find this object under the 'MOBY Objects' node. When you have found it, right click on it
      3. In the context menu that appears, select 'Add to model'
    • Add an extra processor
      1. In the 'Advanced model explorer' window, left click on the 'Add Nested Workflow' button.
      2. In the menu that appears, left click on the 'Open Location' item.
      3. Enter the following URL, '/courses/IRB-INB07/TavernaPractical/combine_gff_maps.xml'
    • Add the extra connections
      Note that we have to remove the link between the runGeneIDGFF and runGFF2JPEG processors, because the two GFF objects now need to be combined into a collection before this collection is fed to runGFF2JPEG.
      1. In the 'Advanced model explorer' window, in the bottom left-hand corner, right click on the other input, called 'embl_annotations'.
      2. In the context menu that appears, select 'String1', then in the menu 'Choose an input' that appears, select 'value'.
      3. Left click on the 'GFF' object to display its list of inputs and outputs.
      4. Right click on its output, called 'mobyData'
      5. In the context menu that appears, select 'Nested_Workflow', then in the menu 'Choose an input' that appears, select 'MobyB'
      6. Left click on the service runGeneIDGFF to highlight its links in the 'Data links' subtree.
      7. Right click on the link between runGeneIDGFF output and runGFF2JPEG input and select 'Remove from model'
      8. Left click on the 'Nested_Workflow' processor to display its list of inputs and outputs.
      9. Right click on its output, called 'MobyOutput'
      10. In the context menu that appears, select 'runGFF2JPEG', then in the menu 'Choose an input' that appears, select 'GFF(Collection - 'maps')'
    • Check that you have done it right: Exercise 3 Taverna workflow diagram
  • Run the workflow
    This workflow requires two inputs:
    • the FASTA sequence, which you load from the URL HS307871.fa
    • the EMBL annotations, which you load from the file HS307871.gff
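
The nested workflow combine_gff_maps.xml is what turns the two GFF objects, the geneid predictions and the EMBL annotations, into the collection expected by the 'GFF(Collection - 'maps')' input of runGFF2JPEG. We do not reproduce its internals here; the Python sketch below only models the idea, and all names in it are hypothetical. It also includes an optional check, our own addition, that the maps share a sequence name in their first GFF column, since comparing them side by side only makes sense if they annotate the same sequence.

```python
from typing import List, Set


def seqnames(gff_text: str) -> Set[str]:
    """Sequence names (first GFF column) of the feature lines in a GFF text."""
    return {
        line.split("\t", 1)[0]
        for line in gff_text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def combine_gff_maps(*gff_texts: str) -> List[str]:
    """Wrap several GFF texts into one collection, keeping their order."""
    return list(gff_texts)


def shared_seqnames(collection: List[str]) -> Set[str]:
    """Sequence names present in every map of the collection."""
    if not collection:
        return set()
    return set.intersection(*(seqnames(text) for text in collection))


if __name__ == "__main__":
    # Toy maps standing in for the geneid predictions and EMBL annotations.
    geneid_gff = "toy\tgeneid\tFirst\t100\t250\t3.5\t+\t0\ttoy_1\n"
    embl_gff = "toy\tEMBL\tCDS\t90\t260\t.\t+\t0\tgene_a\n"
    maps = combine_gff_maps(geneid_gff, embl_gff)
    if not shared_seqnames(maps):
        print("warning: the maps do not annotate a common sequence")
    print(len(maps), "maps in the collection")
```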

A final version of the exercise 3 workflow is available.


Last modified: 11/22/2007 20:53:19