Genome BioInformatics Research Lab

Introduction to gff2aplot
 
   

Summary

The main goal of this tutorial is to introduce you to the basics of using gff2aplot. We will see how to run the program from a Unix shell and cover a few command-line switches. After that, we will introduce the customization files and the basics of customizing the program's output. Then we will look at examples of the GFF input records. Finally, we will build up a figure in three easy steps.

NOTE.- For the sake of clarity, we are going to use long names for the command-line switches. See the command-line help if you prefer short names, for those switches that have one.
Bitmaps for the examples were generated as PNGs (Portable Network Graphics). If your browser cannot display that format, you can view the PDF or PS versions by clicking on the links below each snapshot. Links to customization files, log files, GFF input files and output PostScript figures are also available on each command line shown.


Command-Line Basics


From now on, the bash command-line prompt is shown as  $> . The most basic command-line switches are those that report the program's execution parameters and its current version.

 
  $> gff2aplot.pl --help
  $> gff2aplot.pl --version
  

The output produced by those two command-line switches is available from these links: --help output and --version output.

Think of gff2aplot as a Unix filter: it can process GFF records from standard input, from GFF files, or from both. In the first case, data can arrive through a Unix pipeline ( command | gff2aplot.pl ) or through file redirection ( gff2aplot.pl < input_file.gff ). Some examples of this command-line flexibility are shown below.

 
  $> gff2aplot.pl --quiet [ ...more options... -- ] input_file.gff > output_file.ps 2> report_file.err
  

The previous command read GFF records from a file (input_file.gff). The --quiet switch suppresses every message to standard error except fatal errors, those that abort the program.

 
  $> gff2aplot.pl --verbose [ ...more options... -- ] < input_file.gff > output_file.ps 2> report_file.log+err
  

In this command, GFF records are read from standard input through file redirection. The --verbose switch sends a progress report of the program execution to standard error.

 
  $> cat input_file.gff | gff2aplot.pl [ ...more options... -- ] - > output_file.ps 2> report_file.log+err
  

The command above reads GFF records from a Unix pipeline. As no verbose switch was given, only warnings are written to standard error.

 
  $> gff2aplot.pl --logs-filename report_file.log [ ...more options... -- ] < input_file.gff > output_file.ps 2> report_file.err
  

The --logs-filename switch enables verbose mode too, but redirects the progress report to the given report file. Warnings are sent both to the report file and to standard error.
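
The separation of the three output streams can be wrapped in a small shell function. The following is only a sketch: plot_with_logs is a hypothetical helper name, and it assumes that --logs-filename can be combined with an input file given after -- , as the individual examples above suggest. It derives the .ps, .log and .err names from the input file name.

```shell
# Hypothetical wrapper (not part of gff2aplot): keep the PostScript figure,
# the progress report and the warnings in three files named after the input.
# The body runs in a subshell "( ... )" so its variables stay local.
plot_with_logs() (
    gff=$1
    shift
    base=${gff%.gff}                      # input_file.gff -> input_file
    gff2aplot.pl --logs-filename "$base.log" "$@" -- "$gff" \
        > "$base.ps" 2> "$base.err"
)

# Usage (any extra switches go after the input file name):
#   plot_with_logs input_file.gff --title 'CMD-LINE USAGE'
```

Standard output only ever carries the PostScript figure, so the redirections can be fixed once in the wrapper instead of being retyped on every command line.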

Here is a more complex command-line example:

 
  $> gff2aplot.pl                                 \
    --custom-filename file_1.rc                   \
    --custom-filename file_2.rc                   \
    --title 'CMD-LINE USAGE'                      \
    --subtitle                                    \
      'This is an example on command-line usage.' \
    --x-sequence-zoom  100..400                   \
    --y-sequence-zoom  600..900                   \
    --zoom-area                                   \
    --feature-var '*::alignment_color=darkblue'   \
    -- annotation.gff alignment.gff               \
     > output_file.ps 
  

If verbose mode is enabled, gff2aplot reports something like the following while it validates the options and file names of the previous command line:

   
  ··· 
################################################################################
###                             SETTING DEFAULTS                             ###
################################################################################
################################################################################
###                      CHECKING COMMAND-LINE OPTIONS                       ###
################################################################################
###
### Validating INPUT FILENAMES
###
###---> "annotation.gff" exists, including as Input File.
###---> "alignment.gff" exists, including as Input File.
###
### Validating CUSTOM FILENAMES
###
###---> ".gff2aplotrc" NOT FOUND, skipping!!!
###---> "file_1.rc" exists, including as Custom File.
###---> "file_2.rc" exists, including as Custom File.
###
### Checking COMMAND-LINE Options
###
..... [000005]
###
### Checking Custom Variables SET by COMMAND-LINE
###
F [000001]
################################################################################
###                       COMMAND-LINE OPTIONS CHECKED                       ###
###           0 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU)           ###
################################################################################
  ···  
  

Imagine the following scenario: you are waiting for gff2aplot to finish, but you are not providing any input records on the standard input channel. gff2aplot will keep waiting for them and will never finish, so take that into account when passing data through standard input. If you stop program execution (e.g. by pressing CTRL-C), you will be warned with a message like the following:

   
<<<--------->>>
<<< WARNING >>> gff2aplot has been stopped by user !!!
<<<--------->>>
<<< WARNING >>> ---------- Exiting NOW !!! ----------
<<<--------->>> 
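
One way to avoid the endless wait is to check, before launching the program, whether standard input is attached to a terminal. The sketch below uses the standard shell test [ -t 0 ] for that. plot_from_stdin is a hypothetical wrapper name, not something shipped with gff2aplot.

```shell
# Hypothetical wrapper: read GFF records from standard input only when it is
# a pipe or a redirected file, never an interactive terminal.
plot_from_stdin() {
    # [ -t 0 ] succeeds when file descriptor 0 (stdin) is a terminal.
    if [ -t 0 ]; then
        echo "standard input is a terminal: pipe or redirect GFF records into this command" >&2
        return 2
    fi
    gff2aplot.pl "$@"
}

# Usage:
#   plot_from_stdin --verbose < input_file.gff > output_file.ps
```

Typed at a prompt without a pipe or redirection, the wrapper returns at once with an error message instead of blocking.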
  

Customization Files


The following template summarizes a few aspects of a gff2aplot customization file:

   
# 
# Optional Header
#                           (i.e. program version, creation date, user id,
#                                         version control tags, and so on)
#  
#                                   (Comments and empty lines are skipped) 

# L #                                 (This is Block Separator for Layout) 

variable_name1=value  # blah,blah,blah… 
#              (Comments are also allowed after the value definition.
#               Note that there must be at least one blank space,
#               shown here as '  ', before the #. Otherwise, the comment
#               will be considered part of the variable value) 

  ··· 
# blah, blah, blah…    (Extra comment-lines can be added where you need) 

  ··· 
variable_namen=value 

# F # blah, blah, blah… 
#     (You can place extra comments also on section headers, after second #) 

GFF-feature_key1::feature_variable_name=value 
  ··· 
GFF-feature_keyn::feature_variable_name=value 

# G # 
GFF-group_key1::group_variable_name=value 
  ··· 
GFF-group_keyn::group_variable_name=value 

# S # 
GFF-source_key1::source_variable_name=value 
  ··· 
GFF-source_keyn::source_variable_name=value

# X #
# Examples of features available for this section
# are available at the "Layers on gff2aplot Figures" tutorial 
  

An important feature of these files is that everything about them is optional: the files themselves, each of their sections, and every variable definition within them. You can therefore start with a simple customization file containing just a few settings and extend it as you need. Because any of these customization parameters can also be set from the command line (via the --layout-var, --source-var, --group-var, --feature-var, ..., command-line switches), you can test a variable before adding it to the customization file. Several gff2aplot tutorials show examples of this.
You can also provide more than one customization file. When a variable appears more than once, the value loaded last is the one finally set (later definitions override earlier ones). Furthermore, any command-line switch overrides the corresponding variable value.
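
The "last definition wins" rule can be illustrated with a few lines of awk. This is a minimal sketch, not gff2aplot's actual parser. merge_settings is a hypothetical helper, and it takes the whole left-hand side of each definition as the key without tracking which section the definition sits in. Files are read in the order given, so a value in a later file replaces the one from an earlier file.

```shell
# Hypothetical helper: print the settings that survive after reading several
# customization files in order (later definitions replace earlier ones).
merge_settings() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }     # comments, section headers, blanks
        {
            line = $0
            sub(/[ \t]+#.*$/, "", line)       # trailing comment needs a blank before #
            sub(/[ \t]+$/, "", line)          # trailing blanks
            eq = index(line, "=")
            if (eq == 0) next                 # not a variable definition
            settings[substr(line, 1, eq - 1)] = substr(line, eq + 1)
        }
        END { for (key in settings) print key "=" settings[key] }
    ' "$@" | LC_ALL=C sort
}

# Usage:
#   merge_settings file_1.rc file_2.rc
```

A command-line switch would then be applied on top of the merged table, since it overrides whatever the files define.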
The customization files are modular. Each section accepts its own set of variables, but the sections can be placed in any order and may appear several times in a customization file. When parsing these files, the program classifies each variable to build a table with all the settings. Every section must begin with a section header, a special comment line that starts with five fixed characters following the pattern '# * #': a hash mark, a white space, the section code character, another white space and a closing hash mark. If verbose mode is on, the program outputs the following report for the custom files parsed:

   
  ··· 
################################################################################
###                           READING CUSTOM FILES                           ###
################################################################################
###
### Reading Customization Parameters from "file_1.rc"
###
.....*LLLLLLLLLLL..*FFF [000023]
###
### Reading Customization Parameters from "file_2.rc"
###
......*.LLLLL. [000014]
  ··· 
    (... as many custom files as you have passed 
         via the command-line option --custom-filename file.rc ...)
  ··· 
################################################################################
###                           CUSTOM FILES LOADED                            ###
###           0 wallclock secs ( 0.02 usr +  0.00 sys =  0.02 CPU)           ###
################################################################################
  ··· 
  

In that example, file_1.rc has 23 lines: 7 empty records or comments, 2 section headers, 11 layout variable definitions and 3 feature variable settings. The following table summarizes the symbols used to describe what is being processed from customization files.

  Record Type                  Symbol Used
  Empty Records or Comments    .
  Section Header               *
  Layout Var                   L
  Sequence Var                 Q
  Source Var                   S
  Strand Var                   T
  Group Var                    G
  Feature Var                  F
  Special Feat                 X
  Unknown                      ?
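
The symbol strings in the report can be reproduced and tallied with a short script. The sketch below is only an approximation of what gff2aplot does internally. Both function names are hypothetical, and the rule that a definition seen before any section header is marked ? is an assumption. The section header test follows the '# * #' pattern described earlier, restricted to the section codes from the table.

```shell
# Hypothetical helper: emit one progress symbol per customization-file line.
classify_custom_lines() {
    awk '
        BEGIN { section = "?" }               # assumption: no section seen yet
        /^# [LQSTGFX] #/ { section = substr($0, 3, 1); out = out "*"; next }
        /^[ \t]*#/ || /^[ \t]*$/ { out = out "."; next }
        index($0, "=") > 0 { out = out section; next }
        { out = out "?" }
        END { print out }
    ' "$@"
}

# Hypothetical helper: count how often one symbol occurs in a progress string.
count_symbol() {
    printf '%s\n' "$1" | awk -v sym="$2" '{
        n = 0
        for (i = 1; i <= length($0); i++) if (substr($0, i, 1) == sym) n++
        print n
    }'
}

# Tally the progress string reported for file_1.rc.
progress='.....*LLLLLLLLLLL..*FFF'
for symbol in '.' '*' L F; do
    printf '%s %s\n' "$symbol" "$(count_symbol "$progress" "$symbol")"
done
```

The loop prints 7 for the dots, 2 for the asterisks, 11 for L and 3 for F, which adds up to the 23 lines reported in brackets.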

GFF Records


gff2aplot can parse three basic formats: standard GFF (versions 1 and 2) and the APLOT format. APLOT was derived from GFF to produce more compact input files when processing alignment data. There are two basic kinds of data: records that contain axes annotations (for instance gene structures, exons, and so on) and records that encode the pair-wise alignment coordinates. Cases A and B in the examples below are suitable for annotations. The program assumes that cases C, D and E correspond to alignment records.

  

seqname  source  feature  start  end  score  strand  frame  [ attributes ] 
  
   
#########
### A ### GFF version 1 record: Axes annotation example
#########
NM_009315 chr5 exon 1756 1901 . + . mus_taf6
NM_005641 chr7 exon 7358 7435 . + . hsap_taf6 

#########
### B ### GFF version 2 records: Axes annotation example
#########               (same features as in the above GFF version 1 axes annotation example)
NM_009315 chr5 exon 1756 1901 . + . Gene "mmus_taf6"
NM_005641 chr7 exon 7358 7435 . + . Gene "hsap_taf6"

#########
### C ### GFF version 2 records: Similarity region example
#########
NM_009315 BLASTN hsp 1719 2099 1122 + . Target "NM_005641"; Start 7067; End 7448; Strand +; Frame .;
NM_009315 BLASTN hsp 5219 5832 1405 + . Target "NM_005641"; Start 11597; End 12218; Strand +; Frame .;

#########
### D ### GFF version 2 pseudo-records (compact): Similarity region example
#########               (same features as in the previous GFF version 2 similarity region example)
NM_009315 BLASTN hsp 1719 2099 1122 + . Target "NM_005641" 7067 7448 + .
NM_009315 BLASTN hsp 5219 5832 1405 + . Target "NM_005641" 11597 12218 + .
  
  

  
  

seqnameX:Y  source  feature  startX:Y  endX:Y  score  strandX[:Y]  frameX[:Y]  [ group[:id] [ ; ...more attributes... ] ] 
  
   
#########
### E ### APLOT record: Similarity region example
#########               (same features as in the above GFF version 2 similarity region example)
NM_009315:NM_005641 BLASTN hsp 1719:7067 2099:7448 1122 +:+ .:. 174.4:2
NM_009315:NM_005641 BLASTN hsp 5219:11597 5832:12218 1405 +:+ .:. 216.9:1 
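
To see how the paired X:Y fields of an APLOT record line up with the GFF version 2 fields, the following awk sketch rewrites case E records in the compact layout of case D. aplot_to_compact_gff is a hypothetical helper, not one of the filters distributed with gff2aplot. It drops the trailing group[:id] field, and it assumes that a strand or frame given without a :Y part applies to both sequences.

```shell
# Hypothetical helper: rewrite APLOT records (case E) as compact GFF version 2
# pseudo-records (case D). Comment lines and short lines are skipped.
aplot_to_compact_gff() {
    awk '
        /^#/ || NF < 8 { next }
        {
            split($1, seq, ":"); split($4, from, ":"); split($5, to, ":")
            # Assumption: a single strand or frame value is shared by X and Y.
            if (split($7, strand, ":") < 2) strand[2] = strand[1]
            if (split($8, frame, ":") < 2) frame[2] = frame[1]
            x = seq[1] " " $2 " " $3 " " from[1] " " to[1] " " $6 " " strand[1] " " frame[1]
            y = "Target \"" seq[2] "\" " from[2] " " to[2] " " strand[2] " " frame[2]
            print x " " y
        }
    ' "$@"
}

# Usage:
#   aplot_to_compact_gff blastn_aplot.gff
```

Applied to the two case E records above, it yields the two case D records shown earlier.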
  

If verbose mode is enabled, you will obtain a report like the following while the GFF files are parsed:

   
  ··· 
################################################################################
###                        PARSING INPUT GFF RECORDS                         ###
################################################################################
###
### Reading GFF records from "blastn_aplot.gff"
###
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO [000042]
###
### Reading GFF records from "mouse.gff"
###
XXXXXXXXXXXXXXXX [000016]
###
### Reading GFF records from "human.gff"
###
XXXXXXXXXXXXXXXXX [000017]
################################################################################
###                               DATA LOADED                                ###
###           0 wallclock secs ( 0.07 usr +  0.02 sys =  0.09 CPU)           ###
################################################################################
  ··· 
  

gff2aplot assigns each GFF record it reads to one of the five classes shown before. The symbols that can appear in that section of the report are the following:

  Record Type                  Symbol Used
  Empty Records or Comments    .
  GFF (grouped)                X
  GFF (ungrouped)              x
  GFF (alignment)              A
  GFF (vector)                 V
  APLOT (grouped)              O
  APLOT (ungrouped)            o
  Unknown                      ?

We provide several filters to convert the output of different sequence alignment programs into these GFF/APLOT formats. See the distribution README file and the specific examples in other gff2aplot tutorials.

Running gff2aplot


Let's start this three-step appetizer by letting gff2aplot assume default values for all the customization parameters. We are going to feed it blastn output (filtered with the parseblast script) together with the corresponding annotations of the genomic region of the human TAF6 gene and its mouse ortholog. As you can see, it is a very simple command line...

Default Plot
 [PNG] [PS] [PDF]
   
gff2aplot.pl --verbose             \
             -- taf6.gff           \
              > taf6_defaults.ps   \
             2> taf6_defaults.log 
  

Now we can play with several command-line switches to adapt the resulting PostScript plot to our needs. For instance, we want to highlight exons by painting them green. We also want to distinguish the untranslated regions (UTRs), so we choose a blue tone for them. Adding titles will help to identify what the plot shows. Finally, we would like to see the scores projected along the mouse sequence in the bottom percent panel.

Command-line options
 [PNG] [PS] [PDF]
   
gff2aplot.pl                                                      \
    --verbose                                                     \
    --title 'TAF6 Human/Mouse Orthologous Genes'                  \
    --subtitle                                                    \
      'Figure displays TBLASTX results for this genomic region.'  \
    --show-percent-box                                            \
    --percent-box-label "SIMILARITY"                              \
    --group-var '/.*M.musculus.*/::group_label=taf6 (M.musculus)' \
    --group-var '/.*H.sapiens.*/::group_label=taf6 (H.sapiens)'   \
    --feature-var '/[it].*exon/::feature_color=darkgreen'         \
    --feature-var 'utrexon::feature_color=skyblue'                \
    --feature-var 'initialexon::feature_shape=half_arrow_end'     \
    --feature-var 'terminalexon::feature_shape=half_arrow'        \
    --  taf6.gff  > taf6_cmdline.ps  2> taf6_cmdline.log 
  

Now we are going to move all those settings into a customization file. This lets us reuse the same customization on new data sets to produce a figure with a similar "layout". Most command-line switches have a counterpart customization-file variable; conversely, customization variables can be set from the command line with the special --*-var switches. A side effect of moving settings from command-line options into a customization file is that the command line becomes simpler...

Using customization file
 [PNG] [PS] [PDF]
   
gff2aplot.pl                                                     \
    --verbose                                                    \
    --title 'TAF6 Human/Mouse Orthologous Genes'                 \
    --subtitle                                                   \
      'Figure displays TBLASTX results for this genomic region.' \
    --custom-filename taf6.rc                                    \
    --  taf6.gff  > taf6_final.ps  2> taf6_final.log 
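
The taf6.rc file itself is only linked from the original page, so its exact contents are not reproduced here. As a rough sketch, the feature and group settings from the previous step could be moved into a file like the one below, assuming that each --feature-var and --group-var argument maps one-to-one onto a line in the '# F #' and '# G #' sections of the template shown earlier. The file name taf6_sketch.rc is made up for this example. The layout variables matching --show-percent-box and --percent-box-label are left out because their names are not given in this tutorial.

```shell
# Sketch of a customization file; this is NOT the real taf6.rc.
# Feature and group definitions are copied from the command line of step two.
cat > taf6_sketch.rc <<'EOF'
# F # feature settings
/[it].*exon/::feature_color=darkgreen
utrexon::feature_color=skyblue
initialexon::feature_shape=half_arrow_end
terminalexon::feature_shape=half_arrow

# G # group settings
/.*M.musculus.*/::group_label=taf6 (M.musculus)
/.*H.sapiens.*/::group_label=taf6 (H.sapiens)
EOF
```

Such a file would then be passed with --custom-filename taf6_sketch.rc, with the percent-box switches kept on the command line until their variable names are known.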
  

You are now ready to start a cycle of testing command-line options, moving those you like into a customization file, and so on. Perhaps you will go through that process only once, when you start defining a new figure layout. Maybe you will play with all the options or just use the defaults, or you will set up several complementary customization files, etcetera. From now on it is up to you to explore the features available in gff2aplot. As the Perl motto says, "There's More Than One Way To Do It"...

Take a look at the available tutorials; perhaps one of them suits your current plotting needs. Thanks for using gff2aplot, we hope you will enjoy it!

 