Category Archives: genome annotation

Yeast genome: Known knowns, and known unknowns

In Genetics this week, a review asks "Why are there still 1000 Uncharacterized Yeast genes?" Poor yeast: so many of its genes still have no known function, while S. pombe has nearly 100% coverage in functional annotation. I'll also point out that the 1000 genes refers to protein-coding genes, not ncRNA genes, which may mean that there is a lot more that is unknown.

I think this sentence from the abstract hits the nail on the head.

Notably, the uncharacterized gene set is highly enriched for genes whose only homologs are in other fungi. Achieving a full catalog of yeast gene functions may require a greater focus on the life of yeast outside the laboratory.


Orthology detection software

A paper in PLoS One, Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes, reports a new approach to assess the performance of automated orthology detection. These authors also wrote OrthoMCL (2006 DB paper, 2003 algorithm paper), which uses MCL to build orthologous gene families. The authors discuss the trade-offs between highly sensitive and specific tree-based methods and the fast but less sensitive approaches, such as Best-Reciprocal-Hits from BLAST or FASTA, or some of the hybrid approaches. The authors employ Latent Class Analysis (LCA) to aid in “evaluation and optimization of a comprehensive set of orthology detection methods, providing a guide for selecting methods and appropriate parameters”. LCA is also the statistical basis for feature choice in combining gene predictions into a single set of gene calls in GLEAN, written by many of the same authors, including Aaron Mackey.
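For readers who haven't run into the Best-Reciprocal-Hits idea, here is a minimal sketch of it in Python. This is not the code from the paper or from OrthoMCL; it only illustrates the general approach. The function names and the input format (tuples of query ID, subject ID, and bit score, as you might parse out of tabular BLAST output) are my own assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject.

    Ties are broken by keeping the first hit seen (an arbitrary choice).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Toy data: genome A searched against genome B, and the reverse search.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b2", 120.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a3", 118.0), ("b2", "a2", 95.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In the toy data, a2 and a3 both pick b2 as their best hit, but b2 picks only a3, so a2 ends up with no ortholog call. That is the kind of lost sensitivity (for example with recent duplicates) that the faster methods trade for speed.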

I’ve been reading a lot of orthology and gene tree-species tree reconciliation papers lately; some are listed by Ian Holmes’s group, and some of the software is listed on the BioPerl site. This also follows on from our Phyloinformatics hackathon work, which we are trying to formalize in some more documentation for phyloinformatics pipelines to support some of the described use cases. I’m also applying some of this to a tutorial I’m teaching at ISMB2007 this summer.

That was a lot of work

I’ve never worked with Magnaporthe grisea, the fungus responsible for rice blast, one of the most devastating crop diseases, but I do know that its life cycle is complicated and that knocking out roughly 61% of the genes in the genome and evaluating the mutant phenotypes to infer gene function is not trivial. In their recent letter to Nature, Jeon et al. did what many of us have dreamed of doing in our fungus of interest: manipulate every gene to find those that contribute to a phenotype of interest.

In their study, the authors looked for pathogenicity genes. Interestingly, the defects in appressorium formation and conidiation had the strongest correlation with defects in pathogenicity, suggesting that these two developmental stages are crucial for virulence. Ultimately, the authors identify 203 loci involved in pathogenicity, the majority of which have no homologous hits in the sequence databases and have no clear enriched GO functions. Impressively, this constitutes the largest unbiased list of pathogenicity genes identified for a single species (though some of us, I’m sure, may have a problem with the term “unbiased”).

If you’d like to play with their data, the authors have made it available in their ATMT Database.

Approaching 100% coverage for GO assignments in S. pombe

A paper by Martin Aslett and Val Wood indicates that the fission yeast community is approaching 100% coverage, with a GO annotation for every gene in the S. pombe genome. Among the fungi, only Ashbya gossypii has a smaller genome (see a recent paper on the Ashbya annotation database), and it doesn’t yet have complete GO coverage. This is quite remarkable and a great dataset for studies in S. pombe and all fungi.

S. pombe taken from Paul Young’s site

My quick gene predictions in a closely related species, S. japonicus, give more than twice as many genes as S. pombe (but this may be over-prediction by the ab initio predictors). Compared with many other fungi, S. pombe represents a streamlined and reduced genome, a reduction which probably occurred independently from the one in the Hemiascomycetes.

Wikis for genome (re)annotation

Steven Salzberg (who is nominated for the Franklin award at bioinformatics.org) has an opinion piece in Genome Biology proposing wiki technology to help solve the problem of genome annotations getting out of date.