Genome BioInformatics Research Lab

SUPPLEMENTARY MATERIALS FOR

Comparison of Splice Sites in Mammals and Chicken

J. F. Abril, R. Castelo and R. Guigó *.

Genome Research, 15(1):111-119, January 3, 2005
[ Published online before print in Dec 2004 ]


* To whom correspondence should be addressed.
Email: rguigo@imim.es. Ph: +34 93 225 7567.


Summary

We have carried out an initial analysis of the dynamics of the recent evolution of splice site sequences on a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents, and within rodents, are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between the AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been rare in recent evolutionary times. Switching between GT-AG and the non-canonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.


UCSC Initial RefSeq Datasets

RefSeq Identifiers from Filtered Sets

                      1      2      3      4      5      6      7      8      9     10     11
Hsap  UCSC_200307    21744  20894  18117  15159  10757   7799  17939  15066  10316   7443  21091
Mmus  UCSC_200310mm  17988  16126  14432  13677   9765   9010  14175  13461   9078   8364  16192
Rnor  UCSC_200306rn   4798   4134   3454   3347   2201   2094   3368   3275   1947   1854   4536
Ggal  UCSC_200402     1496   1085      -      -      -      -      -      -      -      -   1367

Hsap  UCSC_20030410  19174  18337  18145  18067  10486  10408  18014  17901   9988   9875  18226
Mmus  UCSC_200302mm  13406  11161  10503  10404   7397   7298  10371  10255   6908   6792  12511
Rnor  UCSC_200301rn   4219   3372   3070   3049   2102   2081   3017   2991   1893   1867   4002

The numbered columns above correspond to the following selections (on the original page, linked numbers retrieve the corresponding selection):
 1.- Total RefSeqs
 2.- (1) without Stop codons in frame when translating from genomic
 3.- (2) + (identity(aa)>95% + gap(aa)<6) or (identity(RNA)>95% + gap(RNA)<16)
 4.- (2) + (identity(aa)>95% + gap(aa)<6)
 5.- (2) + (identity(RNA)>95% + gap(RNA)<16)
 6.- (2) + (identity(aa)>95% + gap(aa)<6) and (identity(RNA)>95% + gap(RNA)<16)
 7.- (2) + (mismatch(aa)<4 + gap(aa)<6) or (mismatch(RNA)<10 + gap(RNA)<16)
 8.- (2) + (mismatch(aa)<4 + gap(aa)<6)
 9.- (2) + (mismatch(RNA)<10 + gap(RNA)<16)
10.- (2) + (mismatch(aa)<4 + gap(aa)<6) and (mismatch(RNA)<10 + gap(RNA)<16)
11.- Unique ID
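
The selection rules above can be sketched as a small function. This is only an illustration: the `AlignStats` record and its field names are hypothetical, and the sketch assumes that `+` inside each parenthesis means "and" and that identities are percentages. Reading "or" and "and" as union and intersection is at least consistent with the counts in the table, since column 3 equals column 4 plus column 5 minus column 6 (for example, 15159 + 10757 - 7799 = 18117 for Hsap UCSC_200307). Column 11 (Unique ID) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class AlignStats:
    """Hypothetical per-RefSeq alignment summary (field names are assumptions)."""
    identity_aa: float      # percent identity at the protein level
    gaps_aa: int
    mismatches_aa: int
    identity_rna: float     # percent identity at the RNA level
    gaps_rna: int
    mismatches_rna: int
    has_inframe_stop: bool  # stop codon in frame when translating from genomic


def filter_sets(s: AlignStats) -> set:
    """Return the selection numbers (1 to 10) that a RefSeq entry would fall into."""
    passed = {1}                      # 1: every RefSeq
    if s.has_inframe_stop:
        return passed                 # fails selection 2, so it fails 3 to 10 too
    passed.add(2)
    aa_id = s.identity_aa > 95 and s.gaps_aa < 6
    rna_id = s.identity_rna > 95 and s.gaps_rna < 16
    aa_mm = s.mismatches_aa < 4 and s.gaps_aa < 6
    rna_mm = s.mismatches_rna < 10 and s.gaps_rna < 16
    rules = {
        3: aa_id or rna_id, 4: aa_id, 5: rna_id, 6: aa_id and rna_id,
        7: aa_mm or rna_mm, 8: aa_mm, 9: rna_mm, 10: aa_mm and rna_mm,
    }
    passed.update(number for number, ok in rules.items() if ok)
    return passed
```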


Sequence Files for All RefSeq Genes: Exons, Introns, CDS and Splice Sites.

                  All Exons  All Introns  All CDSs  Splice Sites  Splice Sites
                  (fasta)    (fasta)      (fasta)   EXONIC        INTRONIC

Hsap UCSC200307   19M        362M         11M       19M           17M
Mmus UCSC200310   15M        211M         8.5M      15M           14M
Rnor UCSC200306   4.0M       70M          2.6M      4.7M          4.4M
Ggal UCSC200402   1.2M       13M          772K      1.4M          1.3M

Hsap UCSC200304   16M        304M         9.3M      17M           16M
Mmus UCSC200302   10M        141M         6.4M      12M           11M
Rnor UCSC200301   3.4M       55M          2.3M      4.0M          3.7M
 
This table shows the file sizes of the gzipped files in each category.
Click on file size numbers to retrieve the corresponding file.


RefSeq U2/U12 Intron Major Classes

Summary of U2/U12 Intron Major Classes on RefSeq Filtered Set 1 (Total RefSeqs)

                         U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                   GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hsap UCSC200307  189656   1529     34   2248     128      2      9      8   12430    109     13    134     355      1    139     19  206814
Mmus UCSC200310  125587   1015     21   2407      88      0      7      9    9557     66     10    130     254      1     91     15  139258
Rnor UCSC200306   38601    289     14   1236      20      0      1      1    3038     19      4     77      69      0     20      4   43393
Ggal UCSC200402   11073     77      5    736       7      0      1      0     676      6      0     27      17      0      5      2   12632

Hsap UCSC200304  162740   1254     28   2273     115      0      9      6   10846     91     13    126     302      1    108     19  177931
Mmus UCSC200302   92487    721     16   3740      69      0      6      9    7027     46      5    192     196      1     67      9  104591
Rnor UCSC200301   32378    253     13   1589      18      0      1      2    2604     17      3     82      60      0     20      3   37043
 
Click on numbers of the column TOTAL to retrieve the table with the splice sites info.
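
The four-letter column labels in the table above appear to name an intron by its first two and last two bases. A minimal sketch of that labelling follows, assuming that `XXXX` collects every other terminal dinucleotide combination; the function name is hypothetical and the page does not describe how the counts were actually produced.

```python
KNOWN_SUBTYPES = {"GTAG", "GCAG", "ATAC"}


def terminal_subtype(intron: str) -> str:
    """Label an intron by its 5' and 3' terminal dinucleotides.

    Assumption: any combination other than GT-AG, GC-AG or AT-AC
    is reported as 'XXXX', mirroring the table's column labels.
    """
    if len(intron) < 4:
        raise ValueError("intron too short to have two terminal dinucleotides")
    key = (intron[:2] + intron[-2:]).upper()
    return key if key in KNOWN_SUBTYPES else "XXXX"
```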




Search parameters:


donor_pattern=/^ATCCT[CT]/
acceptor_max_mismatch_number=1
acceptor_pattern=/TCCTT[AG]AC/
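
One way to read these parameters is as an anchored regular expression for the donor side plus a scan for the acceptor-side motif that tolerates a bounded number of mismatches. The sketch below works under that assumption; the function names are hypothetical, and the page does not state which coordinates the patterns are applied to (for instance, whether the donor string starts after the first two intron bases), so the input strings are simply whatever region the caller chooses.

```python
import re

DONOR_RE = re.compile(r"^ATCCT[CT]")
# TCCTT[AG]AC written as one set of allowed bases per motif position.
ACCEPTOR_MOTIF = ("T", "C", "C", "T", "T", "AG", "A", "C")


def donor_matches(donor_region: str) -> bool:
    """True when the region starts with the donor pattern."""
    return DONOR_RE.match(donor_region.upper()) is not None


def count_mismatches(window: str) -> int:
    """Number of positions where the window disagrees with the motif."""
    return sum(base not in allowed for base, allowed in zip(window, ACCEPTOR_MOTIF))


def best_acceptor_hit(region: str, max_mismatch: int):
    """Return (start, mismatches) of the best motif hit, or None.

    Ties are resolved in favour of the leftmost window (an arbitrary choice).
    """
    region = region.upper()
    width = len(ACCEPTOR_MOTIF)
    best = None
    for start in range(len(region) - width + 1):
        mm = count_mismatches(region[start:start + width])
        if mm <= max_mismatch and (best is None or mm < best[1]):
            best = (start, mm)
    return best
```

Raising `max_mismatch` from 1 to 2 makes the acceptor-side test much more permissive, which fits the larger U12 Acceptor Site counts reported when `acceptor_max_mismatch_number=2` is used without further constraints.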

                         U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                   GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hsap UCSC200307  190632   1536     34   2249     128      2      9      8   11454    102     13    133     355      1    139     19  206814
Mmus UCSC200310  126409   1021     21   2408      89      0      7      9    8735     60     10    129     253      1     91     15  139258
Rnor UCSC200306   38848    289     14   1238      20      0      1      1    2791     19      4     75      69      0     20      4   43393
Ggal UCSC200402   11150     78      5    736       7      0      1      0     599      5      0     27      17      0      5      2   12632
 




Search parameters:


donor_pattern=/^ATCCT[CT]/
acceptor_max_mismatch_number=2
acceptor_pattern=/TCCTT[AG]AC/

                         U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                   GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hsap UCSC200307  108118    973     21   1571      34      0      3      3   93968    665     26    811     449      3    145     24  206814
Mmus UCSC200310   69628    567     13   1647      22      0      1      2   65516    514     18    890     320      1     97     22  139258
Rnor UCSC200306   20943    168      9    855       4      0      0      0   20696    140      9    458      85      0     21      5   43393
Ggal UCSC200402    6444     49      4    600       0      0      0      0    5305     34      1    163      24      0      6      2   12632
 




Search parameters:


donor_pattern=/^ATCCT[CT]/
acceptor_max_mismatch_number=2
acceptor_pattern=/TCCTT[AG]AC/
Extra constraints:

branchpoint_distance_from_acceptor=[ -20 .. -5 ]
branchpoint_sequence_matches_to=[ /..A.$/ || /.A..$/ ]
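
The two extra constraints can be sketched as a filter applied to a candidate motif hit. The sketch rests on assumptions the page does not confirm: the distance is taken here as the offset of the end of the matched motif from the 3' end of the intron (negative values are upstream), and the two regular expressions are applied to the matched motif itself, so that an A must be kept at the second or third position from its end. The function and parameter names are hypothetical.

```python
import re

# An A at the second-to-last or third-to-last position of the matched motif.
BRANCH_A_RE = re.compile(r"..A.$|.A..$")


def passes_extra_constraints(intron: str, hit_start: int, motif_len: int = 8,
                             min_dist: int = -20, max_dist: int = -5) -> bool:
    """Apply the distance and branch-A constraints to one motif hit.

    `hit_start` is the 0-based start of the hit within `intron`.
    """
    hit = intron[hit_start:hit_start + motif_len].upper()
    if len(hit) < motif_len:
        return False
    # Offset of the motif end relative to the intron's 3' end (assumed convention).
    distance = (hit_start + motif_len) - len(intron)
    in_range = min_dist <= distance <= max_dist
    return in_range and BRANCH_A_RE.search(hit) is not None
```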

                         U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                   GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hsap UCSC200307  182013   1471     31   2127      51      0      3      4   20073    167     16    255     432      3    145     23  206814
Mmus UCSC200310  120700    968     20   2316      32      0      1      2   14444    113     11    221     310      1     97     22  139258
Rnor UCSC200306   37208    275     14   1204       8      0      0      0    4431     33      4    109      81      0     21      5   43393
Ggal UCSC200402   10698     76      5    733       2      0      0      0    1051      7      0     30      22      0      6      2   12632
 


RefSeq Orthologs Datasets


U2/U12 Splice Sites Datasets

Summary of U2 Intron Major Classes on RefSeq Orthologous Set (Paper Table 3)

                         U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                   GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hsap UCSC200307   31425    218      3     29      27      0      0      2    2055     12      0      7       4      0      1      0   33783
Mmus UCSC200310   28168    207      2     70      23      0      0      0    2231     14      1      9       2      0      0      0   30727
Rnor UCSC200306   10019     64      4     23       5      0      0      1     835      9      0      5       0      0      0      0   10965

Hsap UCSC200304   31626    220      3     28      27      0      0      2    2068     12      0      6       2      0      0      0   33994
Mmus UCSC200302   28810    212      2     41      24      0      0      0    2270     14      0      7       3      0      0      0   31383
Rnor UCSC200301   10209     65      4      7       5      0      0      1     841      9      0      4       0      0      0      0   11145
 


Summary of U12 Intron Major Classes on RefSeq Orthologous Set

                         U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                   GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hsap UCSC200307       2      0      0      0       9      0      0      1       7      0      1      1      65      0     31      0     117
Mmus UCSC200310       1      0      0      0       2      0      2      0       7      0      2      1      71      0     27      1     114
Rnor UCSC200306       1      0      0      0       2      0      0      0       1      0      0      0      26      0      9      0      39

Hsap UCSC200304       0      0      0      0      10      0      0      1       7      0      1      1      67      0     31      0     118
Mmus UCSC200302       1      0      0      0       2      0      2      0       6      0      2      1      73      0     28      1     116
Rnor UCSC200301       0      0      0      0       2      0      0      0       1      0      0      0      27      0      9      0      39
 


Orthologous U2/U12 Splice Sites

Chicken Orthologous for Human/Mouse/Rat U12 Splice Sites

x Gg200402                                                U2 Both Sites               U12 Donor Site             U12 Acceptor Site            U12 Both Sites
                            Exonerate  Genic  CDS    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX    GTAG   GCAG   ATAC   XXXX   TOTAL

Hs200307/Mm200310/Rn200306  TBL        FA     GFF       1      2      0     27       9      0      0      0       4      2      0      5      29      0      8      2      89
Hs200304/Mm200302/Rn200301  TBL        FA     GFF       1      2      0     28       9      0      0      0       5      2      0      6      30      0      8      3      94
 

Exonerate parameters:






--query NNN.u12.exoncdspairs.fa (where NNN was hsap.gp200307mmus.gp200310rnor.gp200306 hsap.gp200304mmus.gp200302, or rnor.gp200301)
--target chromfa/chrNNN.fa (where NNN is a chicken chromosome number from this table)
--softmasktarget
--model coding2genome
--proteinsubmat blosum62
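
For reference, the options listed above could be assembled into one command line per chicken chromosome along these lines. This is a sketch only: the helper name, the example query prefix, and the chromosome labels are illustrative, and the options are reproduced exactly as listed (in particular `--softmasktarget` is given without a value, so check the exonerate manual for the form your version expects). The function builds the argument list and does not run anything.

```python
def exonerate_command(query_prefix: str, chromosome: str) -> list:
    """Build an exonerate argument list from the options listed on this page.

    `query_prefix` stands for the NNN in NNN.u12.exoncdspairs.fa and
    `chromosome` for the NNN in chromfa/chrNNN.fa (both placeholders).
    """
    return [
        "exonerate",
        "--query", f"{query_prefix}.u12.exoncdspairs.fa",
        "--target", f"chromfa/chr{chromosome}.fa",
        "--softmasktarget",
        "--model", "coding2genome",
        "--proteinsubmat", "blosum62",
    ]


if __name__ == "__main__":
    # Illustrative values only; print one command per chromosome.
    for chrom in ("1", "2", "Z"):
        print(" ".join(exonerate_command("QUERYSET", chrom)))
```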

Alignments Summaries for the Orthologous Splice Sites Comparison

Orthologous Intron IDs
Hsap/Mmus/Rnor
Hsap/Mmus
Hsap/Rnor
Mmus/Rnor
Hsap/Ggal
Mmus/Ggal
Rnor/Ggal

Orthologous Introns IDs
Hsap/Mmus/Rnor/Ggal

Orthologous Splice Sites Alignments (Raw text)
Hsap/Mmus/Rnor TBL ALN
Hsap/Mmus/Rnor/Ggal TBL ALN
Species                  Code  Alignments Summary

Hsap/Mmus                1100  PS PDF
Hsap/Rnor                1010  PS PDF
Mmus/Rnor                0110  PS PDF
Hsap/Mmus/Rnor           1110  PS PDF

Hsap/Mmus/Ggal           1101  PS PDF
Hsap/Rnor/Ggal           1011  PS PDF
Mmus/Rnor/Ggal           0111  PS PDF
Hsap/Mmus/Rnor/Ggal *    1111  PS PDF
 
*) This summary is shown in the figure below.
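
Judging from the eight rows listed, the four-digit code appears to flag which species take part in a comparison, in the order Hsap, Mmus, Rnor, Ggal. A small decoder under that assumption (the function name is hypothetical):

```python
SPECIES_ORDER = ("Hsap", "Mmus", "Rnor", "Ggal")


def decode_species_code(code: str) -> str:
    """Turn a code such as '1011' into 'Hsap/Rnor/Ggal'.

    Assumes one 0/1 digit per species, in SPECIES_ORDER.
    """
    if len(code) != len(SPECIES_ORDER) or set(code) - {"0", "1"}:
        raise ValueError(f"expected four 0/1 digits, got {code!r}")
    return "/".join(name for name, flag in zip(SPECIES_ORDER, code) if flag == "1")
```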

Orthologous Human/Mouse/Rat U12 Introns Alignments against Chicken.

Figure versions in:  JPG /  PNG /  PS /  PDF


Comparative Pictograms

Sequence Files for Comparative Analysis of Splice Sites.

                               Site Sequences
Species          Data Sets     Donors        Acceptors
                               (-3/GT/+4)    (-18/AG/+3)

H.sapiens        seq.gz        dat.gz        dat.gz
M.musculus       seq.gz        dat.gz        dat.gz
R.norvegicus     seq.gz        dat.gz        dat.gz
G.gallus         seq.gz        dat.gz        dat.gz
D.rerio          fasta.gz      dat.gz        dat.gz
D.melanogaster   fasta.gz      dat.gz        dat.gz
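
The window labels in the table can be read as "bases upstream / splice dinucleotide / bases downstream". Under that reading, which is an assumption, a donor site spans 3 + 2 + 4 = 9 bases and an acceptor site 18 + 2 + 3 = 23 bases. The sketch below cuts such windows out of a genomic string; the function names and the 0-based coordinate convention are hypothetical and not taken from the data files.

```python
def donor_window(seq: str, intron_start: int, up: int = 3, down: int = 4) -> str:
    """Donor site window: `up` exonic bases, the first two intron bases, `down` more.

    `intron_start` is the 0-based index of the first intron base.
    """
    if intron_start - up < 0:
        raise ValueError("not enough upstream sequence")
    return seq[intron_start - up:intron_start + 2 + down]


def acceptor_window(seq: str, intron_end: int, up: int = 18, down: int = 3) -> str:
    """Acceptor site window: `up` intronic bases, the last two intron bases, `down` exonic.

    `intron_end` is the 0-based index one past the last intron base.
    """
    if intron_end - 2 - up < 0:
        raise ValueError("not enough upstream sequence")
    return seq[intron_end - 2 - up:intron_end + down]
```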
 

Comparative Pictograms for Donor and Acceptor Splice Sites.

 
Species                        Donor Sites             Acceptor Sites

M.musculus / R.norvegicus      PWM: JPG / PNG / PS     PWM: JPG / PNG / PS
H.sapiens / M.musculus         PWM: JPG / PNG / PS     PWM: JPG / PNG / PS
H.sapiens / R.norvegicus       PWM: JPG / PNG / PS     PWM: JPG / PNG / PS
H.sapiens / G.gallus           PWM: JPG / PNG / PS     PWM: JPG / PNG / PS
H.sapiens / D.rerio            PWM: JPG / PNG / PS     PWM: JPG / PNG / PS
H.sapiens / D.melanogaster     PWM: JPG / PNG / PS     PWM: JPG / PNG / PS
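
A pictogram of this kind is drawn from per-position base frequencies of aligned site sequences. As a minimal sketch of that underlying table (not the software used to produce the figures on this page), the function below computes a position frequency matrix from equal-length site strings; the name `position_frequencies` is hypothetical.

```python
from collections import Counter


def position_frequencies(sites):
    """Per-position relative frequencies of A, C, G and T.

    `sites` is a non-empty list of equal-length, already aligned sequences.
    Returns one dict per position, e.g. {'A': 0.75, 'C': 0.0, 'G': 0.25, 'T': 0.0}.
    """
    if not sites:
        raise ValueError("no site sequences given")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all site sequences must have the same length")
    matrix = []
    for column in zip(*(site.upper() for site in sites)):
        counts = Counter(column)
        matrix.append({base: counts.get(base, 0) / len(sites) for base in "ACGT"})
    return matrix
```

Two species can then be compared position by position by computing one matrix per species from its donor (or acceptor) site file.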
 

Sequence Conservation

Sequence Datasets for Donor and Acceptor Orthologous Splice Sites.

                          Donor Sites                                  Acceptor Sites
Species                   Orthologous  Identity  Random  Identity     Orthologous  Identity  Random  Identity
                          Pairs        Summary   Pairs   Summary      Pairs        Summary   Pairs   Summary

M.musculus/R.norvegicus   TBL          SET       TBL     SET          TBL          SET       TBL     SET
H.sapiens/M.musculus      TBL          SET       TBL     SET          TBL          SET       TBL     SET
H.sapiens/R.norvegicus    TBL          SET       TBL     SET          TBL          SET       TBL     SET
H.sapiens/G.gallus        TBL          SET       TBL     SET          TBL          SET       TBL     SET
G.gallus/M.musculus       TBL          SET       TBL     SET          TBL          SET       TBL     SET
G.gallus/R.norvegicus     TBL          SET       TBL     SET          TBL          SET       TBL     SET
 
Human/mouse/rat/chicken orthologous introns file: TBL
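
The comparison behind these files, orthologous pairs against random pairs, can be illustrated with a per-position identity calculation. This is a sketch under assumptions: the layout of the TBL and SET files is not described on this page, so the input here is simply a list of (site_a, site_b) string pairs of equal length, and the random control is built by shuffling the partners, which may differ from how the random pairs on this page were drawn.

```python
import random


def per_position_identity(pairs):
    """Fraction of pairs with identical bases at each aligned position."""
    if not pairs:
        raise ValueError("no pairs given")
    length = len(pairs[0][0])
    if any(len(a) != length or len(b) != length for a, b in pairs):
        raise ValueError("all sequences must have the same length")
    return [sum(a[i] == b[i] for a, b in pairs) / len(pairs) for i in range(length)]


def shuffled_pairs(pairs, seed=0):
    """Re-pair each first sequence with a randomly chosen second sequence.

    Gives a background identity level to compare the orthologous pairs against.
    """
    rng = random.Random(seed)
    partners = [b for _, b in pairs]
    rng.shuffle(partners)
    return [(a, b) for (a, _), b in zip(pairs, partners)]
```

Conservation beyond background at a position would then show up as a higher identity for the orthologous pairs than for the shuffled ones.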


Conservation Plot from Identity Summaries for Orthologous Splice Sites.

Hsap/Mmus/Rnor/Ggal U2 Splice-Sites Sequence Conservation
Figure versions in:  JPG /  PNG /  PS

 