Genome Informatics Research Lab

 
Visualizing SGP2 results with gff2ps
 
 

 
SUMMARY

One of the strengths of gff2ps is comparing results from different sources, which makes it easy to see the differences between the annotation of a genomic sequence and the output of one or more gene prediction programs. Results from other programs, such as blast, can be included too, provided they are first converted to a GFF-compliant file suitable for gff2ps. Here we illustrate how we generated the PostScript plots that you can find on the Human/Mouse Gene Prediction page. If you have problems visualizing the PostScript files, take a look at the section about using ghostview.


GFF input files

As an example we use the Hsap_BTK files. To prevent both sgp predictions (for Full Orthologous -FO- and WGS -3X- sequences) from collapsing onto the same track as the geneid ones, we replaced 'geneid_v1.0' in the source field with 'SGP.3X' and 'SGP.homol' (in 'Hsap_BTK_sgp3X' and 'Hsap_BTK_sgpFO' respectively). Click on the filenames to see their contents.

  Hsap_BTK_annotation.gff   Annotation of real genes for the BTK region, converted to GFF from Dan Brown's annotation.

  Hsap_BTK_sgp3X.gff   sgp predictions using as evidence tblastx on human sequence against the WGS database (currently 3X).

  Hsap_BTK_sgpFO.gff   sgp predictions using as evidence tblastx on human sequence against mouse orthologous sequence (Hsap_BTK over Mmus_BTK).

  Hsap_BTK_geneid.gff   geneid predictions on Hsap_BTK sequence.

  Hsap_BTK_genscan.gff   genscan predictions on Hsap_BTK sequence.

  Hsap_BTK_tblastx3X.gff   tblastx similarity regions from human sequence searches against the WGS database (currently 3X).

  Hsap_BTK_tblastxFO.gff   tblastx similarity regions from human sequence searches against mouse orthologous sequence (Hsap_BTK over Mmus_BTK).

  Hsap_BTK_repeatmasker.gff   repeats found in BTK human sequence converted to GFF (filtered from the 'Hsap_BTK.out' file obtained with RepeatMasker).
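
The source renaming described above can be done with a short filter. The following is only a sketch, assuming tab-separated GFF records with the source in the second field; the function name rename_gff_source and the input file name in the usage comment are hypothetical and are not part of gff2ps.

```shell
#!/bin/bash
# rename_gff_source OLD NEW FILE
# Hypothetical helper: print FILE with the GFF source field (column 2)
# changed from OLD to NEW. Assumes tab-separated records. Comment lines
# and records from other sources pass through unchanged.
rename_gff_source() {
    awk -F'\t' -v OFS='\t' -v old="$1" -v new="$2" '
        /^#/      { print; next }   # keep comment lines untouched
        $2 == old { $2 = new }      # exact match on the source field only
                  { print }
    ' "$3"
}

# Usage (the input file name is an assumption):
#   rename_gff_source geneid_v1.0 SGP.3X   sgp3X_raw.gff > Hsap_BTK_sgp3X.gff
#   rename_gff_source geneid_v1.0 SGP.homol sgpFO_raw.gff > Hsap_BTK_sgpFO.gff
```

Matching the whole field, rather than substituting text anywhere on the line, avoids touching group or feature names that happen to contain the same string.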

gff2ps customization files

We ran gff2ps on the same GFF input files, using two slightly different customization files. The following table shows the basic differences between them; all other variables are set to the same values:

  brown.a4.rc   Page size is set to A4, there are three blocks per page and 10Kbp will appear on each block.

  brown.a3.rc   Page size is set to A3, there are four blocks per page and 30Kbp will appear on each block.

The structure of gff2ps customization files is shown in the following picture.


Customization files and the available variables are explained in chapter four of the User's Manual. From here you can download the latest version of the manual (v0.96) and of the gff2ps program (v0.97b). In the script, you must replace the paths to bash and gawk with the ones defined on your system, on the following two lines:

    #!/bin/bash

    GAWK="/usr/local/bin/gawk";
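
Editing those two lines by hand is enough, but the change can also be scripted. The sketch below assumes that the interpreter line is the first line of the script and that the gawk path is set by a line beginning with GAWK=, as shown above; the function name patch_interpreter_paths and the file names in the usage comment are hypothetical.

```shell
#!/bin/bash
# patch_interpreter_paths SCRIPT BASH_PATH GAWK_PATH
# Hypothetical helper: print SCRIPT with its "#!" line and its GAWK="..."
# assignment pointing at the given paths. Assumes the paths contain no
# '&' or backslash characters, which are special in awk replacements.
patch_interpreter_paths() {
    awk -v sh="$2" -v gawk_path="$3" '
        NR == 1 && /^#!/ { print "#!" sh; next }   # interpreter line
        /^[ \t]*GAWK=/   { sub(/GAWK=.*/, "GAWK=\"" gawk_path "\";") }
                         { print }
    ' "$1"
}

# Usage (file names are assumptions):
#   patch_interpreter_paths gff2ps "$(command -v bash)" "$(command -v gawk)" \
#       > gff2ps.local && chmod +x gff2ps.local
```

Writing to a new file keeps the downloaded script intact in case the detected paths turn out to be wrong.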

Running gff2ps

There are two basic ways to work on GFF records with gff2ps: you can merge the GFF records from all sources into a single GFF file, or you can keep the records from each source in a separate file. The second approach has the advantage that you can order the sources in your plots simply by ordering the GFF source files on the command line. Keep in mind that sources appear in the PostScript figure in the same order as they are read from the input, which is why we prefer to work with a separate file for each source.
If you keep a fixed set of sequence, source and feature names, you can define a fixed set of customization variables (as we did in the previous section) and reuse the customization files for all plots that share the same layout but use different datasets. This also makes it easy to automate the process with command-line scripts.

The following two commands use the same customization file on different files (for the sgp and tblastx sources):

gff2ps -VC brown.a3.rc -- Hsap_BTK_annotation.gff \
    Hsap_BTK_sgpFO.gff Hsap_BTK_geneid.gff \
    Hsap_BTK_genscan.gff Hsap_BTK_tblastxFO.gff \
    Hsap_BTK_repeatmasker.gff \
    > Hsap_BTK_FO_a3.ps 2> Hsap_BTK_FO_a3.log


gff2ps -VC brown.a3.rc -- Hsap_BTK_annotation.gff \
    Hsap_BTK_sgp3X.gff Hsap_BTK_geneid.gff \
    Hsap_BTK_genscan.gff Hsap_BTK_tblastx3X.gff \
    Hsap_BTK_repeatmasker.gff \
    > Hsap_BTK_3X_a3.ps 2> Hsap_BTK_3X_a3.log

Note that we preserved the ordering of the sources (annotation, sgp, geneid, genscan, tblastx and repeatmasker). A backslash ('\') at the end of a line continues the command on the next line, so each command is passed to the shell as a single line.
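
Because the two commands above differ only in the evidence tag (FO or 3X), they are a natural candidate for the kind of automation mentioned earlier. The following is a sketch rather than part of gff2ps: the function name plot_evidence and the GFF2PS variable (used to locate the program) are hypothetical, and the file naming simply follows the pattern of the Hsap_BTK files and the brown.*.rc customization files listed on this page.

```shell
#!/bin/bash
# plot_evidence SEQ EVIDENCE PAGE
# Hypothetical wrapper: run gff2ps on the files of one sequence (e.g.
# Hsap_BTK) for one evidence tag (FO or 3X) with the customization file
# brown.PAGE.rc (PAGE is a3 or a4). The source order is the one used in
# the commands above. GFF2PS is an assumed override for the program path.
plot_evidence() {
    local seq="$1" ev="$2" page="$3"
    "${GFF2PS:-gff2ps}" -VC "brown.${page}.rc" -- \
        "${seq}_annotation.gff" \
        "${seq}_sgp${ev}.gff" "${seq}_geneid.gff" \
        "${seq}_genscan.gff" "${seq}_tblastx${ev}.gff" \
        "${seq}_repeatmasker.gff" \
        > "${seq}_${ev}_${page}.ps" 2> "${seq}_${ev}_${page}.log"
}

# Usage: reproduce both A3 plots with one loop.
#   for ev in FO 3X; do plot_evidence Hsap_BTK "$ev" a3; done
```

Keeping the file order inside the function means every plot produced this way shows its sources in the same track order.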


The following two commands use different customization files on the same files:

gff2ps -VC brown.a4.rc -- Hsap_BTK_annotation.gff \
    Hsap_BTK_sgp3X.gff Hsap_BTK_tblastx3X.gff \
    Hsap_BTK_sgpFO.gff Hsap_BTK_tblastxFO.gff \
    Hsap_BTK_geneid.gff Hsap_BTK_genscan.gff \
    Hsap_BTK_repeatmasker.gff \
    > Hsap_BTK_ALL_a4.ps 2> Hsap_BTK_ALL_a4.log


gff2ps -VC brown.a3.rc -- Hsap_BTK_annotation.gff \
    Hsap_BTK_sgp3X.gff Hsap_BTK_tblastx3X.gff \
    Hsap_BTK_sgpFO.gff Hsap_BTK_tblastxFO.gff \
    Hsap_BTK_geneid.gff Hsap_BTK_genscan.gff \
    Hsap_BTK_repeatmasker.gff \
    > Hsap_BTK_ALL_a3.ps 2> Hsap_BTK_ALL_a3.log

You can see here how easy it is to add new sources to the final plots, and how easy it is to change the plot layout by using different customization files.

 