
Chlamy genome investigations

This month’s Genetics has a series of articles exploring the genome of the green alga Chlamydomonas reinhardtii (published last year in Science, where it is freely available). These manuscripts are primarily genome analyses, making for a very bioinformatics-focused issue of Genetics. Some of the highlights include:

B. dendrobatidis strain JAM81 released

The following is an announcement to the B. dendrobatidis and wider fungal community from Alan Kuo at JGI. It concerns strain JAM81, which Jess Morgan collected from a frog in the Sierra Nevada of California. The genome sequence and annotation of strain JEL423 (Joyce Longcore, collected in Panama) are available from the Broad Institute.

Please contact me if you would like to help assign functions in the annotation. We’re in the last round of analyses for some of the genome work, but if there are particular questions you would like to work on, we’re open to collaborators and can outline our work so we can see how yours might complement it.

From Alan Kuo at JGI:

The JGI Batrachochytrium annotation portal is now on the public JGI website. As it is public, no password is required.

If you have not yet registered as an annotator, go to this new link to register. As before, please choose a personal username so that other annotators can recognize it as yours; a derivative of your own name would be best.

If you are already registered, you do not need to do anything. Your old pre-release username and password are also valid on the new public portal.

As always, please direct all questions and problems to me by email or phone. Cheers, Alan.

Some information about the assembly and annotation:

This is the first annotation of the 127 scaffolds (24 Mbp) of JGI’s 8.74X assembly of the Batrachochytrium dendrobatidis JAM81 genome. We predict 8732 genes, with the following average properties:

Gene length: 1825.16 nt
Transcript length: 1407.29 nt
Protein length: 450.56 aa
Exons per gene: 4.29
Exon length: 328.37 nt
Intron length: 129.18 nt
Gene density: 359.1 genes per Mbp of scaffold
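The averages above hang together, which is a useful sanity check on any annotation summary. A minimal Python sketch multiplies them back out; the variable names are illustrative only, not fields from any JGI output, and since an average of products is not the product of averages, agreement is only expected to be approximate.

```python
# Average properties of the JAM81 gene models, transcribed from the table above.
gene_length = 1825.16        # nt, genomic span of a gene model
transcript_length = 1407.29  # nt, spliced transcript
protein_length = 450.56      # aa
exons_per_gene = 4.29
exon_length = 328.37         # nt
intron_length = 129.18       # nt
genes_per_mbp = 359.1        # genes per Mbp of scaffold
n_genes = 8732

# Exonic sequence per gene should be close to the transcript length.
exonic_nt = exons_per_gene * exon_length
# A gene with k exons has k - 1 introns, so mean introns per gene = mean exons - 1.
intronic_nt = (exons_per_gene - 1) * intron_length
# Exons plus introns should be close to the gene length.
predicted_gene_length = exonic_nt + intronic_nt
# Coding sequence implied by the protein length, plus one stop codon.
cds_nt = (protein_length + 1) * 3
# Total scaffold length implied by the gene count and gene density.
implied_mbp = n_genes / genes_per_mbp

print(f"exons x exon length:    {exonic_nt:7.1f} nt (transcript: {transcript_length} nt)")
print(f"exonic + intronic:      {predicted_gene_length:7.1f} nt (gene: {gene_length} nt)")
print(f"CDS implied by protein: {cds_nt:7.1f} nt (difference presumably UTR)")
print(f"genes / density:        {implied_mbp:7.2f} Mbp (reported: ~24 Mbp)")
```

The products agree with the reported transcript and gene lengths to within about 0.5%, and the implied scaffold total of roughly 24.3 Mbp matches the 24 Mbp quoted for the assembly.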

The genes were found by the following methods:
Total models: 8732 (100%)
Jason’s models: 3214 (37%)
cDNAs and ESTs: 518 (6%)
Similarity to nr: 1928 (22%)
ab initio: 3072 (35%)

The genes were validated by the following evidence:
Start and stop codons: 7990 (92%)
EST support: 2488 (28%)
nr hit: 6787 (78%)
Pfam hit: 4329 (50%)
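The percentages in the two tables can be recomputed directly from the counts. In this sketch the dictionaries simply transcribe the tables; the `percent` helper is illustrative, not part of the JGI pipeline.

```python
TOTAL_MODELS = 8732

# Gene models by discovery method (transcribed from the methods table).
methods = {
    "Jason's models": 3214,
    "cDNAs and ESTs": 518,
    "similarity to nr": 1928,
    "ab initio": 3072,
}

# Gene models by supporting evidence; a single model can appear in several rows.
evidence = {
    "start and stop codons": 7990,
    "EST support": 2488,
    "nr hit": 6787,
    "Pfam hit": 4329,
}


def percent(count: int, total: int = TOTAL_MODELS) -> int:
    """Share of all gene models, rounded to the nearest whole percent."""
    return round(100 * count / total)


for title, table in (("methods", methods), ("evidence", evidence)):
    print(f"{title} (sum of counts = {sum(table.values())}):")
    for label, count in table.items():
        print(f"  {label:<22} {count:5d} ({percent(count)}%)")
```

The method counts sum to exactly 8732, consistent with each model being credited to a single method, while the evidence counts sum to 21594, more than the total, because one model can have several kinds of support.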

Splicing machinery and introns

Splicing of pre-messenger RNA is necessary to remove introns and create well-formed, translatable mRNA, but the purpose of introns remains a mystery. One idea is that they play a role in an error-checking pathway, nonsense-mediated decay (NMD), by providing way-points for translation. When an intron is spliced out, a protein complex called the exon junction complex (EJC) is deposited near the new exon-exon junction, marking that a splicing event has occurred. During translation the ribosome normally displaces these EJCs as it passes; if it stops at a premature stop (termination) codon (PTC) while an EJC still remains downstream, the message is flagged as corrupted and targeted for degradation.
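The EJC way-point idea can be expressed as a simple positional rule. This sketch assumes the "50 to 55 nucleotide rule" reported for mammalian cells: a stop codon that ends more than roughly 50 nt upstream of the last exon-exon junction is treated as premature. The 50 nt threshold, measuring from the end of the stop codon, and the function names are all assumptions for illustration; other organisms, including yeast, may recognize PTCs differently.

```python
from typing import List

# Assumption: a mammalian-style "50-55 nt rule". A stop codon that ends more
# than this many nucleotides upstream of the last exon-exon junction is treated
# as premature. The exact threshold varies and is illustrative here.
EJC_RULE_NT = 50


def junction_positions(exon_lengths: List[int]) -> List[int]:
    """0-based mRNA coordinates at which each exon-exon junction falls.

    The junction after an exon is placed at the first nucleotide of the next
    exon, so exons of lengths [200, 150, 300] give junctions at [200, 350].
    """
    positions: List[int] = []
    pos = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        pos += length
        positions.append(pos)
    return positions


def triggers_nmd(exon_lengths: List[int], stop_start: int,
                 threshold: int = EJC_RULE_NT) -> bool:
    """Return True if a stop codon starting at `stop_start` (0-based position in
    the spliced mRNA) leaves an exon-exon junction more than `threshold` nt
    downstream, i.e. would look like a premature termination codon."""
    junctions = junction_positions(exon_lengths)
    if not junctions:
        return False  # unspliced message: no EJCs are deposited
    stop_end = stop_start + 3  # first nucleotide after the stop codon
    return junctions[-1] - stop_end > threshold


exons = [200, 150, 300]
print(triggers_nmd(exons, 600))       # normal stop in the last exon: False
print(triggers_nmd(exons, 100))       # PTC in the first exon: True
# A 3' UTR intron whose junction lies 197 nt past a normal stop at 300
# makes that stop look premature.
print(triggers_nmd([500, 400], 300))  # True
```

The last case shows why such a surveillance system would be expected to disfavor introns in the 3′ UTR: they make a correct stop codon look like a PTC.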


Several predictions follow from this model, including a scarcity of introns in the 3′ UTR (an intron there would leave an EJC downstream of the normal stop codon, making it look premature) and a correlation between average exon length and the window over which the proofreading mechanism can operate. These are discussed in several papers from Mike Lynch’s lab, including Lynch and Conery (2003), Lynch and Kewalramani (2003), Lynch and Richardson (2002) and, more recently, Scofield et al. (2007).

Efforts to understand the splicing machinery, particularly in S. cerevisiae, have led to the discovery of numerous genes encoding components of the spliceosome. These include small nuclear RNAs as well as protein-coding genes. The SR proteins are serine-arginine-rich proteins that regulate splicing and are found in almost all eukaryotes, including most fungi (even those with few introns, such as S. cerevisiae). SR proteins act in both splicing and nuclear export (Masuyama et al., 2004; Sanford et al., 2004), suggesting that coupling of these processes may explain why intron-containing genes tend to be more highly expressed. The evolution of the spliceosomal gene families is also interesting, because the families appear to have diversified in some eukaryotes, perhaps in those with more elaborate splicing and regulation (Barbosa-Morais et al., 2006).

There is some debate over whether splicing occurs after the pre-mRNA has been completely synthesized or while transcription is still under way. Work on this question has shown both that spliceosome assembly can occur co-transcriptionally, alongside the polymerase, and that most splicing in yeast is post-transcriptional (Tardiff et al., 2006). Some argue that the two steps are coupled to maximize efficiency and fidelity (Das et al., 2006; Moore et al., 2006), but perhaps these affinities are species-specific and have evolved to track intron density?

[Note: This post links to journal articles that are not open access. For now I am still citing them, even though not everyone can read them, because they contain data that is only available there. I will strive to focus more narrowly on papers that are available as open access, either through PubMed Central or directly from open-access journals.]

Whole genome tiling arrays

A recent paper from Ron Davis’s group at Stanford describes the discovery of 9 new introns in Saccharomyces cerevisiae using high-density tiling arrays from Affymetrix. Because the arrays are designed for both strands, they can detect transcripts from either strand. The Davis and Steinmetz labs also used the arrays to build a high-density map of transcription in yeast, and the Kruglyak lab used them for polymorphism mapping.

[Figure: yeast transcriptional map (PNAS)]

Whole-genome tiling arrays have also been employed in other fungi. For example, Anita Sil’s group at UCSF constructed a random tiling array for Histoplasma capsulatum and used it to identify genes responding to reactive nitrogen species. A similar approach, using arrays of random sequencing clones, was used in Cryptococcus neoformans to investigate temperature-regulated genes.

As the technology becomes cheaper, it may make sense to detect transcripts with a tiling array rather than with ESTs when annotating a genome. In the Histoplasma work, transcriptional units could be identified from hybridization alone. Gene-prediction algorithms will need some work to incorporate this information correctly, and the sensitivity and probe density of the array will affect how well this works. These techniques can also be part of resequencing approaches, or of fast genotyping of progeny from QTL experiments when the sequences of both parents are known (or at least enough of the polymorphisms to build the genetic map).
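Identifying transcriptional units from hybridization alone amounts to segmenting probe signal along the chromosome. A deliberately naive Python sketch, assuming probes are given as (genomic position, log2 intensity) pairs for one strand: it keeps probes above a fixed threshold and merges nearby bright probes into segments. The `threshold`, `max_gap`, and `min_probes` parameters are hypothetical; this is not the method used in the Histoplasma study, and a real pipeline would normally normalize and smooth the signal first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: int     # position of the first above-threshold probe
    end: int       # position of the last above-threshold probe
    n_probes: int  # number of above-threshold probes in the segment


def call_segments(probes: List[Tuple[int, float]], threshold: float,
                  max_gap: int = 100, min_probes: int = 3) -> List[Segment]:
    """Naive transcript calling on one strand of a tiling array.

    Probes below `threshold` are ignored; consecutive above-threshold probes
    no more than `max_gap` bp apart are merged, and segments with fewer than
    `min_probes` probes are discarded as noise.
    """
    segments: List[Segment] = []
    current: Optional[Segment] = None
    for pos, signal in sorted(probes):
        if signal < threshold:
            continue
        if current is not None and pos - current.end <= max_gap:
            current.end = pos  # extend the open segment
            current.n_probes += 1
        else:
            if current is not None and current.n_probes >= min_probes:
                segments.append(current)
            current = Segment(pos, pos, 1)  # start a new segment
    if current is not None and current.n_probes >= min_probes:
        segments.append(current)
    return segments


# Toy data: probes every 25 bp, background 2.0, two bright blocks and one
# isolated bright probe at 900.
probes = [(pos, 8.0 if 100 <= pos <= 250 or 600 <= pos <= 700 or pos == 900 else 2.0)
          for pos in range(0, 1000, 25)]
for seg in call_segments(probes, threshold=5.0):
    print(seg)
```

On the toy data this reports two segments, 100 to 250 and 600 to 700; the lone bright probe at 900 is dropped by `min_probes`. Probe density matters here: with sparser probes, `max_gap` must grow, and short exons or transcripts become harder to resolve.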

What is especially useful about the current Affymetrix yeast tiling array is its coverage of both strands, which allows transcripts to be detected on either strand. Several antisense transcripts have recently been discovered in yeast through more classical approaches, including one at the IME4 locus, but perhaps many more await discovery with high-resolution transcriptional data from whole-genome tiling arrays.
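With probes for both strands, transcribed segments can be called separately on each strand and then compared; regions transcribed on both strands are candidate sense/antisense pairs. This sketch assumes the segments on each strand are non-overlapping (start, end) intervals, such as those a strand-by-strand segmentation would produce, and finds the overlaps with a two-pointer sweep. The function name and data layout are hypothetical, and overlap alone only nominates candidates; confirming an antisense transcript needs further evidence.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed [start, end] genomic interval


def strand_overlaps(plus: List[Interval],
                    minus: List[Interval]) -> List[Tuple[Interval, Interval, Interval]]:
    """Find regions transcribed on both strands.

    Assumes the segments within each strand do not overlap one another.
    Returns (plus_segment, minus_segment, shared_region) for each overlapping pair.
    """
    plus, minus = sorted(plus), sorted(minus)
    pairs: List[Tuple[Interval, Interval, Interval]] = []
    i = j = 0
    while i < len(plus) and j < len(minus):
        p, m = plus[i], minus[j]
        lo, hi = max(p[0], m[0]), min(p[1], m[1])
        if lo <= hi:
            pairs.append((p, m, (lo, hi)))
        # The segment that ends first cannot overlap anything later on the
        # other strand, so advance past it.
        if p[1] < m[1]:
            i += 1
        else:
            j += 1
    return pairs


plus_strand = [(100, 500), (800, 900)]
minus_strand = [(450, 600), (850, 1000), (1200, 1300)]
for p, m, shared in strand_overlaps(plus_strand, minus_strand):
    print(f"+{p} overlaps -{m} over {shared}")
```

The sweep visits each segment once after sorting, so it scales to genome-wide segment lists without comparing every pair.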