The genome sampling in the Eurotiomycota clade just keeps getting better. The new J. Craig Venter Institute (formerly TIGR) deposited WGS assemblies of the human pathogens Penicillium marneffei and Talaromyces stipitatus. P. marneffei is a thermally dimorphic fungus endemic to South-East Asia and found in bamboo rats. It is studied by a number of labs, and the genome will aid many of those studies, including work on population structure through MLST.
From Genetics this week, a review asks "Why are there still 1000 Uncharacterized Yeast genes?" Poor yeast: so many of its genes still have no known function, while S. pombe has nearly 100% coverage in functional annotation. I’ll also point out that the 1000 genes refers to protein-coding genes, not ncRNA genes, which may mean that there is a lot more that is unknown.
I think this sentence from the abstract hits the nail on the head.
Notably, the uncharacterized gene set is highly enriched for genes whose only homologs are in other fungi. Achieving a full catalog of yeast gene functions may require a greater focus on the life of yeast outside the laboratory.
Lots of papers in Mycologia (subscription required) this month come from different groups analyzing the fine-scale relationships of many fungal clades, using the loads of sequences that were generated as part of the Fungal Tree of Life project.
Some highlights follow, since there are just too many papers in the issue to cover them all. As usual with more detailed molecular studies of clades, we find that morphologically defined groupings aren’t always truly monophyletic, and some species even end up being reclassified. Not that molecular sequence approaches are infallible, but for many fungi the morphological characters are not always stable and can revert (see Hibbett 2004 for a nice treatment of this in mushrooms; subscription required).
- Meredith Blackwell and others describe the Deep Hypha research coordination network that helped coordinate all the Fungal Tree of Life-rs.
- John Taylor and Mary Berbee update their previous dating work with new divergence dates for the fungi using as much of the fossil evidence as we have.
- The early diverging Chytridiomycota, Glomeromycota, and Zygomycota are each described. Tim James and others present updated Chytridiomycota relationships, some of which were only briefly introduced in the kingdom-wide analysis paper published last year.
- There is a nice overview paper of the major Agaricales clades (mushrooms for the non-initiated) from Brandon Matheny, as well as individual treatments of many of the sub-clades like the cantharelloid clade (mmm chanterelles…).
- Relationships of the Puccinia clade are also presented – we blogged about the wheat pathogen P. graminis before.
- A new Saccharomycetales phylogeny is presented by Sung-Oui Suh and others.
- The validity of the Archiascomycete group (containing the fission yeast Schizosaccharomyces pombe and the mammalian pathogen Pneumocystis) is also tested, and the authors confirm that it is basal to the two sister clades, the euascomycetes (containing Neurospora) and the hemiascomycetes (containing Saccharomyces). However, it doesn’t appear that enough species/genes have been sampled to confirm monophyly of the group. There are, or soon will be, three genome sequences of Schizosaccharomyces plus one or two Pneumocystis genomes, so it will be interesting to see how this story turns out if more species can be identified.
This was a monster effort by a lot of people, and it is really nice to see it all come together in what look like some really nice papers.
- Coccidioides has 3 strains already plus the outgroup Uncinocarpus, and conceivably one could include Histoplasma in there. This resource will grow to 14 strains (which comprise two species) of Coccidioides contributed by FGI and one from TIGR.
- Aspergillus currently has 8 species sequenced, with several more in the pipeline at Broad and TIGR.
- The Fusarium group has 3 species, including the recently released F. oxysporum.
- The Candida clade also has several sequenced genomes, and of course there are the already well studied (and, I’ll add, well utilized) genome resources for the Saccharomyces clade.
- There are 4 genomes (well 5 but JEC21 and B-3501 are nearly identical) of Cryptococcus.
All in all a very exciting time for comparative genomics and I’m particularly intrigued to see how people will begin to use the resources.
This work to consolidate the clusters of genomes will, I hope, be very powerful. However, I still feel we are not doing a good job of translating information from different related species and consolidating it into a centralized resource. Lots of money is spent on sequencing, but I don’t know that we have realized the dream of having comparative techniques illuminate the new genomes to the point that we are learning huge new things.
It seems to me that initially there is the lure of gathering low-hanging fruit from a genome analysis (which drives the first genome(s) paper), but not always the financial support for the longer term needs of the community to feed the experimental and functional work back into the genome annotation and interpretation. The cycle works really well for Saccharomyces cerevisiae because of the curators, who work with the community to ensure information is deposited and that literature is gleaned to link genomic and functional information. But this is expensive in terms of funding many curators for many different projects.
It seems that as we add more genomes there isn’t a very centralized effort for this type of curatorial information, and so we lack the gems of high-quality annotation that are only seen in a few “model” systems. At some point a better meta-database that builds bridges between resource- and literature-rich “model system” communities may help, but maybe something new will have to be created? I like thinking about this as user-driven content via a wiki which also includes dynamic (and versioned!) content from automated intelligent systems to map the straightforward things. Tools like SCI-PHY already exist that can do this and generate robust orthology groups (or Books, as the PhyloFacts database organizes them) for further analysis. The SGD wiki for yeast is a start at this, but is mostly an import of SGD data into a mediawiki framework. I wonder how this can be built upon in a more explicitly comparative environment.
The genome of Pyrenophora tritici-repentis, the fourth sequenced Dothideomycete genome, was released by the FGI at the Broad Institute this spring (March 2007). P. tritici-repentis was sequenced for its role as the cause of tan spot on wheat and as a research model for other Pyrenophora spp. that are pathogens of several grasses.
The 6X assembly contains 37.8 Mb of sequence, similar in size to the other Dothideomycetes such as Stagonospora nodorum (37.2 Mb), Alternaria brassicicola (32 Mb), and Mycosphaerella graminicola (41.8 Mb).
I read this blurb in the New Scientist about a PNAS paper (subscription required for next 6 months) on how hive beetles (Aethina tumida) are able to infest bee hives by throwing off the bees: the beetles give off isopentyl acetate, a compound that bees themselves produce and use to signal an alarm. The increased levels of the pheromone disorient the bees, allowing the beetles to continue infesting the hive. European bees appear to be susceptible to this attack, while the African bees have apparently evolved to better handle the beetle infestation. I’m not clear whether the African bees behave differently or whether they have different biochemical pathways/receptors so as not to be fooled by the cheap perfume of the invaders.
The fungus part here is that the beetles are carrying a hemiascomycete yeast, Kodamaea ohmeri in the Saccharomyces clade (see Suh and Blackwell 2005 for more details), which produces the isopentyl acetate pheromone. So it is a sort of auto-immune hive reaction where the defense mechanism is being short-circuited and harming the host.
CNN reports on a giant (25 ft tall) prehistoric fungus classified by C. Kevin Boyce and colleagues. Also see the U Chicago press release and Softpedia articles about the manuscript, entitled "Devonian landscape heterogeneity recorded by a giant fungus" and published in Geology, describing the Prototaxites fossil. It has apparently been studied for quite a long time (150 years) without anyone settling whether it was a fungus, an alga, or a lichen prior to this study.
We got word last week from the JGI that our DNA samples for Neurospora tetrasperma and N. discreta have passed QC and library QC and are on their way to being sequenced. The center also plans to do some EST sequencing to improve gene calling abilities.
Why more Neurospora genomes? The sequencing proposal discussed these species as a model system for evolutionary and ecological genetics. It will allow us and others to test several hypotheses about the molecular evolution of things like genome defense in Neurospora and to understand more about the evolutionary history of the model organism N. crassa.
A paper in PLoS One, Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes, reports a new approach to assess the performance of automated orthology detection. These authors also wrote OrthoMCL (2006 DB paper, 2003 algorithm paper), which uses MCL to build orthologous gene families. The authors discuss the trade-offs between highly sensitive and specific tree-based methods, fast but less sensitive approaches such as Best-Reciprocal-Hits from BLAST or FASTA, and some of the hybrid approaches. The authors employ Latent Class Analysis (LCA) to aid in “evaluation and optimization of a comprehensive set of orthology detection methods, providing a guide for selecting methods and appropriate parameters”. LCA is also the statistical basis for feature choice when combining gene predictions into a single set of gene calls in GLEAN, written by many of the same authors including Aaron Mackey.
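For readers who have not met the Best-Reciprocal-Hits idea mentioned above, here is a minimal sketch in Python. It is not the code from the paper or from OrthoMCL. It assumes the similarity search has already been run in both directions and reduced to (query, subject, score) tuples; the gene IDs and scores are made up for illustration, and the function names `best_hits` and `reciprocal_best_hits` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# One search hit: (query_id, subject_id, score). Higher score = better hit.
# This tuple layout is an assumption for the sketch, not a BLAST/FASTA format.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return {(a, b) for a, b in a_best.items() if b_best.get(b) == a}


# Made-up example: genome A searched against genome B, and the reverse.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 95.0)]

print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('a1', 'b1')]
```

In this toy data only a1/b1 survives. The gene a2 picks b2 as its best hit, but b2 prefers a1, so the pair is dropped. That is the kind of sensitivity loss around paralogs that the faster pairwise approaches trade for speed.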
I’ve been reading a lot of orthology and gene tree-species tree reconciliation papers lately; some are listed on Ian Holmes’s group site, and some of the software is listed on the BioPerl site. This also follows on from our Phyloinformatics hackathon work, which we are trying to formalize in some more documentation for phyloinformatics pipelines to support some of the described use cases. I’m also applying some of this to a tutorial I’m teaching at ISMB2007 this summer.