Steven Salzberg (who is nominated for the Franklin award at bioinformatics.org) has an opinion piece in Genome Biology proposing wiki technology to help solve the problem of genome annotations getting out of date.
A paper* this week from the Huffnagle lab reports that although the human pathogenic fungus Cryptococcus neoformans can produce an oxylipin similar to prostaglandin, the authors were unable to identify any homologous cyclooxygenase genes in its genome. They showed through LC-MS-MS on supernatants from C. neoformans cells grown on arachidonic acid that molecules with activity similar to prostaglandin E2 are synthesized. BLAST searches of the genome could not identify any genes similar to cyclooxygenases, including the PPo genes from Aspergillus, which contain catalytic domains similar to mammalian cyclooxygenases.
So did C. neoformans evolve a new way to synthesize this molecule, which may act as a hormone and affect the host’s immune system? My cursory searches against other basidiomycete genomes did turn up homologs to these PPo genes in Ustilago and Coprinus, so perhaps the enzymes in the pathway have changed in the Cryptococcus lineage. Perhaps searches using the protein structure of cyclooxygenases could pick up functionally similar genes that have little sequence similarity to the canonical protein determined in humans; these would serve as good candidates.
* Paid access required for 6 months.
Here is an image of Neurospora crassa I took today in my first attempt at squashes. These are from strains that Dave Jacobson grew up with his constructs, so I can’t take any credit other than playing with the microscope next door. My first attempt came out badly, so this is actually Dave’s prep as well. And these got dry, so they aren’t as nice as they could be. For much nicer images, see N.B. Raju’s.
All that said, I hope these quick images give a hint at the extremely cool structures these fungi produce. These 8-chain ascospores are the result of meiosis that took place inside the perithecia (which were squeezed gently to release the rosettes [or not too gently in my case]).
(I was previously confused about the sample and had labeled this N. tetrasperma, which has 4-chained ascospores [tetra], while this sample is crassa, which has 8.)
An NPR story on former Taylor Lab postdoc and current Harvard professor Anne Pringle airs tonight. They followed her, Ben, and Frank around collecting Amanita phalloides in Point Reyes in December. Poor Anne’s voice is going as she had a cold, but as usual she does a great job expressing her unbridled passion for mycology and biology.
The NPR newscast right after the report also has two briefs on medicinal research with fungi.
Slime molds are interesting organisms that receive surprisingly little attention. Take the case of Dictyostelium discoideum, a single-celled amoeba that, when starved, will aggregate with other D. discoideum amoeba cells in the neighborhood to create a motile, multicellular structure known as a slug. Eventually the slug differentiates into a reproductive structure, with some individuals making a long stalk and others producing spores. In other words, some individuals help others reproduce but do not reproduce themselves.
But why form a slug? Why would a single-celled organism decide to cooperate with other, genetically different individuals, particularly when it may provide no direct passage of its genes? The evolutionary benefits of kin relationships aside, previous work has shown that slugs do provide multiple benefits to the population as a whole.
Your eye contains the same genetic content as your fingernail, but these two tissues look nothing alike. One significant cause of this difference is the tissue-specific regulation of the genes in the genome. In some tissues in your body, a gene may be expressed (transcribed) while that same gene may be silent in another tissue type. A great deal of modern biological research explores how expression is regulated across all the genes in a genome, whose transcripts are collectively known as the transcriptome. Such studies are, for example, aimed at understanding which genetic regulation events account for the differences between an eye and a fingernail.
However, the effectiveness of this research is predicated upon actually knowing which parts of the genome are capable of being expressed and, subsequently, regulated. Conventionally, researchers extract RNA from an organism grown in various conditions (or, as in the case of our example, various tissues from an organism) and clone and sequence the RNA to identify at least a subset of genes that are expressed (Ebbole 2004*). Such Expressed Sequence Tags (ESTs) have proven vital to our understanding of gene and gene structure annotation as they frequently provide evidence of intron splice sites. While this method has facilitated a robust understanding of gene regulation, it is expensive, time-consuming, and provides relatively low coverage of the transcriptome. If our goal is to understand everything that is expressed, then we need a superior tool.
Enter SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) [Irie 2003*, Harbers 2005*]. Both methods sequence short tags of a transcript’s 3′ end. SAGE uses conventional sequencing technology while MPSS uses Solexa, Inc.’s novel bead-based hybridization technology. One of the massive advantages of these technologies is the number of sequences they provide: large EST databases are on the order of several tens of thousands of sequences, while SAGE generally provides 100,000 to 200,000 tags and MPSS can provide over a million signatures. That being said, there are still questions about how sensitive these methods are and how deeply they cover the transcriptome. It may well be that despite a lower total sequence count, ESTs provide more information about what parts of the genome are expressed.
Fortunately, Gowda et al put all three methods to work, as well as an RNA microarray (which doesn’t provide sequence, but enables its inference through hybridization), in their recent study of the Magnaporthe grisea transcriptome [Gowda 2006]. M. grisea is the causative agent of rice blast, a devastating disease that results in tremendous crop yield loss. The researchers evaluated two tissue types: the non-pathogenic mycelium and the invasive, plant-penetrating appressorium.
Interestingly, 40% of the MPSS tags and 55% of the SAGE tags identified represent novel genes, as they had no matches in the existing M. grisea JGI EST collection. Additionally, the authors found that no one method could identify the majority of the transcripts, but that a two-way combination of array data, MPSS, or SAGE could provide over 80% of the total unique transcripts identified by all of the methods together. One additional surprise was that roughly a quarter of the genes identified also produced an antisense RNA, possibly for siRNA regulation of the gene.
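To make the "two-way combination" comparison concrete, here is a minimal sketch of how one might compute the share of all detected transcripts that each pair of methods recovers. The function name and the toy transcript IDs are hypothetical and purely illustrative; they are not data from Gowda 2006, and this is not necessarily how the authors did their tally.

```python
from itertools import combinations


def pairwise_coverage(detected):
    """For each pair of methods, return the fraction of all transcripts
    (the union over every method) that the pair detects together."""
    all_transcripts = set().union(*detected.values())
    coverage = {}
    for a, b in combinations(sorted(detected), 2):
        coverage[(a, b)] = len(detected[a] | detected[b]) / len(all_transcripts)
    return coverage


# Hypothetical toy data: transcript IDs seen by each method.
toy = {
    "array": {"t1", "t2", "t3", "t4"},
    "MPSS": {"t3", "t4", "t5", "t6", "t7"},
    "SAGE": {"t1", "t6", "t8"},
    "EST": {"t2", "t9", "t10"},
}

for pair, fraction in pairwise_coverage(toy).items():
    print(pair, f"{fraction:.0%}")
```

With real data, each set would hold the gene or transcript identifiers to which a method's tags, ESTs, or probes were mapped.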
The long story short appears to be that there is, as of yet, no magic bullet of a method. To adequately cover the transcriptome, multiple techniques are required.
*These references are, unfortunately, not located in an open access journal.
Ever wonder what goes on in a cow’s multi-chambered stomach? Probably not. I did think about it a little more after a trip to a teaching farm during grad school where we saw a cow with a fistula. This hole provides access to the cow’s stomach so that researchers can sample the community living in the gut and understand how the bovine stomach digests the recalcitrant cellulose of grasses.
Of course all kinds of lovely things live in the dark, anaerobic environment. In fact there is a delicately balanced community of species. When cows are fed corn instead of grass, this affects the rumen acid content and allows pathogenic E. coli like O157 to survive. So far I haven’t seen any JGI proposal for sequencing the gut communities of rumens, but maybe that should be proposed.
Rumen fungi are probably not on your keyword list, but these fungi are extremophiles living in a highly anaerobic environment. A paper in Microbiology details an analysis of the genome of the anaerobic fungus Orpinomyces.
Splicing of pre-messenger RNA is necessary to remove introns and create well-formed and translatable mRNA, but the purpose of introns still remains a mystery. One idea is that they play a role in the error-checking machinery, Nonsense Mediated Decay (NMD), by providing way-points during translation. A protein complex, the exon junction complex (EJC), is deposited at each exon-exon junction, indicating that a splicing event has occurred. During translation, if the ribosome encounters a premature stop (or termination) codon (PTC) and then sees one of these EJC way-points, it signals the corrupted message for degradation.
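The way-point logic can be sketched as a simple rule: a message is flagged if any exon-exon junction (and hence an EJC) sits downstream of the stop codon. This is a minimal illustration, not a model of the real machinery. The function name, the coordinate convention (0-based positions on the spliced mRNA), and the `min_gap` parameter are assumptions of this sketch; a distance threshold of roughly 50 nt is often cited for mammals, but the default here is 0 to match the simple description above.

```python
def predicts_nmd(stop_end, junctions, min_gap=0):
    """Return True if any exon-exon junction lies more than `min_gap`
    nucleotides downstream of the end of the stop codon.

    stop_end  -- position just past the stop codon on the spliced mRNA
    junctions -- positions of exon-exon junctions on the spliced mRNA
    min_gap   -- assumed minimum stop-to-junction distance (hypothetical knob)
    """
    return any(junction - stop_end > min_gap for junction in junctions)


# Hypothetical transcript with junctions at 120 and 450 nt.
print(predicts_nmd(300, [120, 450]))  # premature stop, junction downstream
print(predicts_nmd(600, [120, 450]))  # stop in the last exon
```

Under this rule an intron in the 3′ UTR would flag even a normal stop codon, which is one way to see why the model predicts that such introns should be rare.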
Several predictions come out of these models including the lack of introns in the 3′ UTR and that the average length of exons should be correlated with the window that the proofreading mechanism can operate on. These are discussed in several papers out of Mike Lynch’s lab including (Lynch and Connery 2003), (Lynch and Kewalramani, 2003), (Lynch and Richardson, 2002) and recently (Scofield et al, 2007).
Efforts to understand the splicing machinery, particularly in S. cerevisiae, have led to the discovery of numerous genes whose products make up the spliceosome, including small RNAs as well as proteins. The SR proteins are serine-arginine rich proteins that regulate splicing and are found in almost all eukaryotes including most fungi (even those with few introns, such as S. cerevisiae). SR proteins play a role in splicing and in nuclear export (Masuyama et al, 2004, Sanford et al, 2004), indicating that a coupling of these processes may explain why genes with introns tend to be more highly expressed. The evolution of the spliceosomal family of genes is also interesting because the families appear to diversify in some eukaryotes, perhaps where splicing and regulation are more elaborate (Barbosa-Morais et al, 2006).
There is some debate as to whether splicing occurs after the pre-mRNA is completely synthesized or as transcription is occurring. Work on this has shown both that spliceosomal assembly can co-occur with polymerase during transcription and that most splicing (in yeast) is post-transcriptional (Tardiff et al, 2006). It is argued that the two steps occur together to maximize efficiency and fidelity (Das et al, 2006, Moore et al, 2006), but perhaps affinities are species-specific and have evolved to correlate with intron densities?
[Note: This post has links to non-open access journal articles. At this point I am still referring to these even if they are not all readable by everyone, because they contain some data that is only available there. I will strive to focus more narrowly on papers that are available as open access through PubMed Central or directly through open-access journals.]
A recent paper describes the discovery of 9 new introns in Saccharomyces cerevisiae by Ron Davis’s group at Stanford, using high density tiling arrays from Affymetrix. The arrays are designed for both strands, allowing the detection of transcripts from either strand. The arrays were also put to work by the Davis and Steinmetz labs to create a high density map of transcription in yeast and by the Kruglyak lab for polymorphism mapping.
Whole genome tiling arrays have also been employed in other fungi. For example, Anita Sil’s group at UCSF constructed a random tiling array for Histoplasma capsulatum and used it to identify genes responding to reactive nitrogen species. A similar approach was used in Cryptococcus neoformans to investigate temperature regulated genes using random sequencing clones.
As the technology has become cheaper, it may become sensible to use a tiling array rather than ESTs to detect transcripts when attempting to annotate a genome. In the Histoplasma work, transcriptional units could be identified from hybridization alone. Some of the annotation algorithms will need work to correctly incorporate this information, and the sensitivity and density of the array will influence the results. These techniques can also be part of resequencing approaches or used for fast genotyping of progeny from QTL experiments when the sequence from both parents is known (or at least enough of the polymorphisms for the genetic map).
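As a rough illustration of how transcriptional units might be called from hybridization alone, the sketch below scans probe intensities ordered along one strand and reports runs of consecutive probes above a threshold. This is a deliberately naive run-based caller, not the method used in the Histoplasma or yeast studies; the function name, the threshold, and the minimum run length are all assumptions chosen for the example.

```python
def call_transcribed_segments(intensities, threshold, min_probes=3):
    """Return (start, end) probe-index ranges, end exclusive, where at least
    `min_probes` consecutive probes have intensity >= `threshold`."""
    segments = []
    start = None
    for i, value in enumerate(intensities):
        if value >= threshold:
            if start is None:
                start = i  # a new candidate run begins here
        elif start is not None:
            if i - start >= min_probes:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and len(intensities) - start >= min_probes:
        segments.append((start, len(intensities)))
    return segments


# Hypothetical log-scale intensities for probes tiled along one strand.
signal = [0.1, 0.2, 2.5, 3.0, 2.8, 2.9, 0.3, 2.7, 0.2, 3.1, 3.3, 3.0]
print(call_transcribed_segments(signal, threshold=2.0))
```

Denser arrays give more probes per transcript, so a longer minimum run can be required before calling a segment, which is one way array density and sensitivity feed into the annotation.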
What is superior about the current Affymetrix yeast tiling array is the inclusion of both strands. This allows detection of transcripts from both strands. Several anti-sense transcripts in yeast have been discovered recently including in the IME4 locus through more classical approaches, but perhaps many more await discovery with high resolution transcriptional data from whole genome tiling arrays.
It seems intuitive enough that the size of an organism’s genome should be related to its evolutionary complexity. As a general rule, this tends to be true. But look within a class of organisms and you’ll find a great deal of variation in genome size (also known as the C-value). A newt’s genome, for example, is ten times the size of a frog’s.
This discrepancy between genome size and evolutionary complexity is known as the C-value paradox and it has long captured the imagination of biologists. Genome sequencing and annotation have revealed that a great amount of an organism’s genome is non-coding, suggesting that a great deal of genetic content may be gained or lost without affecting the so-called “evolutionary complexity” of the organism (though whether this non-coding DNA is truly “junk” is still up for debate).
In a recent Nucleic Acids Research paper, Gregory et al introduce another toolset to aid in our understanding of genome size: the genome size databases. Three separate databases catalog genome size statistics for Plants, Animals and Fungi respectively, collectively covering >10,000 species. While different methods of estimating genome size may produce conflicting estimates (caveat emptor!), these tools should help guide analyses and experiments on genome size evolution. Specifically, by enabling comparisons of genome size across multiple phylogenetic levels, these datasets should facilitate a better understanding of where the genome size/complexity relationship falls off.
As an interesting side note, the authors mention a few particular findings in fungi. The histogram of genome size in Fungi (see the figure) tends to be tighter than in Plants and Animals, with almost all taxa having a 1C value in the range of 10-60 Mb of DNA. That said, a few species appear to exhibit considerable intraspecific variation. While this may be due to the aforementioned methodological errors, the authors consider that dikaryotic hybrids and heterokaryotes may contribute to this observation. It seems that we may only be scratching the surface of genome size variation in Fungi, and if genome size is indeed rapidly evolving in Fungi, they may serve as good models to study this evolutionary phenomenon.