I think this sentence from the abstract hits the nail on the head.
Notably, the uncharacterized gene set is highly enriched for genes whose only homologs are in other fungi. Achieving a full catalog of yeast gene functions may require a greater focus on the life of yeast outside the laboratory.
A paper in PLoS One, Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes, reports a new approach to assessing the performance of automated orthology detection. These authors also wrote OrthoMCL (2006 DB paper, 2003 algorithm paper), which uses MCL to build orthologous gene families. The authors discuss the trade-offs between highly sensitive and specific tree-based methods and fast but less sensitive approaches such as Best-Reciprocal-Hits from BLAST or FASTA, or some of the hybrid approaches. The authors employ Latent Class Analysis (LCA) to aid in “evaluation and optimization of a comprehensive set of orthology detection methods, providing a guide for selecting methods and appropriate parameters”. LCA is also the statistical basis for feature choice when combining gene predictions into a single set of gene calls in GLEAN, written by many of the same authors, including Aaron Mackey.
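For readers who haven't met the Best-Reciprocal-Hits idea before, here is a minimal sketch of it in Python. It is not the code from the paper or from OrthoMCL. The input format (tuples of query, subject, bit score, as you might parse from BLAST tabular output), the function names, and the toy gene IDs are all assumptions made for illustration.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples; this
    input format is an assumption for the sketch. On a tied score the
    first hit seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data with hypothetical gene IDs: a2's best hit is b2, but b2's best
# hit is a1, so the a2/b2 relationship is not reported.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 95.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

The toy data hints at why this kind of approach is fast but less sensitive: anything that is not a strict one-to-one best match, such as a paralog from a lineage-specific duplication, simply drops out of the result.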
I’ve been reading a lot of orthology and gene tree-species tree reconciliation papers lately; some are listed by Ian Holmes’s group, and some of the software is listed on the BioPerl site. This also follows on from our Phyloinformatics hackathon work, which we are trying to formalize in more documentation for phyloinformatics pipelines that support some of the described use cases. I’m also applying some of this to a tutorial I’m teaching at ISMB2007 this summer.
I’ve never worked with Magnaporthe grisea, the fungus responsible for rice blast, one of the most devastating crop diseases, but I do know that its life cycle is complicated and that knocking out roughly 61% of the genes in the genome and evaluating the mutant phenotypes to infer gene function is not trivial. In their recent letter to Nature, Jeon et al. did what many of us have dreamed of doing in our fungus of interest: manipulate every gene to find those that contribute to a phenotype of interest.
In their study, the authors looked for pathogenicity genes. Interestingly, defects in appressorium formation and conidiation had the strongest correlation with defects in pathogenicity, suggesting that these two developmental stages are crucial for virulence. Ultimately, the authors identify 203 loci involved in pathogenicity, the majority of which have no homologous hits in the sequence databases and no clearly enriched GO functions. Impressively, this constitutes the largest unbiased list of pathogenicity genes identified for a single species (though some of us, I’m sure, may have a problem with the term “unbiased”).
If you’d like to play with their data, the authors have made it available in their ATMT Database.
The Saccharomyces Genome Database has deployed a wiki for gene annotation from the community. This should be an interesting experiment in how information can flow from the community into these databases.
A paper by Martin Aslett and Val Wood indicates that the fission yeast community is approaching 100% coverage, with a GO annotation for every gene in the S. pombe genome. Among the fungi, only Ashbya gossypii has a smaller genome (see a recent paper on the Ashbya annotation database), and it doesn’t yet have complete GO coverage. This is quite remarkable and a great dataset for studies in S. pombe and all fungi.
My quick gene predictions for a closely related species, S. japonicus, suggest it has more than twice as many genes as S. pombe (but this may be over-prediction by the ab initio predictors). Compared with many other fungi, S. pombe represents a streamlined and reduced genome, a reduction which probably occurred independently from the one in the Hemiascomycetes.