Category Archives: database

Lest you think annotation is easy

Ewan Birney and Ensembl (the other or original genome browser, depending on whether you are a UCSC junkie) have started blogging a bit more about what is going on under the proverbial hood over there in Hinxton. There are some great nuggets about current problems. These bite-sized comments should offer a good glimpse into what is going on without drowning you in the deluge that is ensembl-dev.

A recent post covers the challenges of coordinating “manual” and “automated” gene structure annotation between groups at the same institution.

Scale that up across multiple genomes, genome centers, and prediction programs and assemblies of varying quality, and you can see why fungal genome comparisons could use a little more help. It is great to hear how the animal genome annotation groups are tackling informatics, data management and coordination challenges. I’m a big fan of doing more informatics and science in the open where it is feasible.

(re)Annotating GenBank

Tom Bruns, Martin Bidartondo and 250 others sent a letter to Science describing the current problems with fixing annotation in GenBank. There is an entertaining accompanying news article that interviews several people about the problem of updating the annotation and species names assigned to sequences in the database. The problem is particularly acute for mycologists: many fungi detected by metagenomic approaches are known only from their molecular sequences, so a sequence attributed to the wrong species can mislead studies of community composition. This problem is by no means limited to fungi, but recent reports find that as many as 20% of fungal Internal Transcribed Spacer (ITS) sequences are attributed to the wrong species.

There’s a nice quote in the news article from Steven Salzberg about the difficulty of getting sequences updated, especially those from big centers. I’m sure he is thinking of many examples, like reclassifying some Drosophila sequence traces.


Some links

I’ve been too busy to post much these last few days, but here are a few links to some papers I found interesting in my recent browsing.

Schmitt, I., Partida-Martinez, L.P., Winkler, R., Voigt, K., Einax, E., Dölz, F., Telle, S., Wöstemeyer, J., Hertweck, C. (2008). Evolution of host resistance in a toxin-producing bacterial–fungal alliance. The ISME Journal DOI: 10.1038/ismej.2008.19

Levasseur, A., et al. (2008). FOLy: an integrated database for the classification and functional annotation of fungal oxidoreductases potentially involved in the degradation of lignin and related aromatic compounds. Fungal Genetics and Biology DOI: 10.1016/j.fgb.2008.01.004

Shivaji, S., Bhadra, B., Rao, R.S., Pradhan, S. (2008). Rhodotorula himalayensis sp. nov., a novel psychrophilic yeast isolated from Roopkund Lake of the Himalayan mountain ranges, India. Extremophiles DOI: 10.1007/s00792-008-0144-z


Robin reviews a recent Nature paper by Ilan Wapinski et al. describing the orthogroups they built from multiple fungal genomes. I’ve been remiss in reviewing the paper myself, but they’ve created an important resource: the SYNERGY tool for orthology identification and a database of orthologs from a set of ascomycete fungi. I am excited to see interest in the properties of gene duplication and in how duplication may be an important aspect of adaptation and evolution.

The Cornell Mushroom blog has a nice treatment of the maize pathogen and Mexican delicacy Ustilago maydis (corn smut).

Chris and Tom took some more Coprinus pictures while I was away from the lab.

Wikis for genome (re)annotation

Steven Salzberg (who is nominated for the Franklin award) has an opinion piece in Genome Biology proposing wiki technology to help solve the problem of genome annotations going out of date.

The C is for Catalog

It seems intuitive enough that the size of an organism’s genome should be related to its evolutionary complexity. As a general rule, this tends to be true. But look within a class of organisms and you’ll find a great deal of variation in genome size, also known as the C-value. A newt’s genome, for example, is ten times the size of a frog’s.

This discrepancy between genome size and evolutionary complexity is known as the C-value paradox, and it has long captured the imagination of biologists. Genome sequencing and annotation have revealed that much of an organism’s genome is non-coding, suggesting that a great deal of genetic content may be gained or lost without affecting the so-called “evolutionary complexity” of the organism (though whether this non-coding DNA is truly “junk” is still up for debate).

In a recent Nucleic Acids Research paper, Gregory et al. introduce another toolset to aid our understanding of genome size: the genome size databases. Three separate databases catalog genome size statistics for plants, animals and fungi respectively, collectively covering more than 10,000 species. Because different methods of estimating genome size can produce conflicting estimates (caveat emptor!), these tools are best used to guide, rather than settle, analyses and experiments on genome size evolution. Specifically, by enabling comparisons of genome size across multiple phylogenetic levels, these datasets should help pin down where the relationship between genome size and complexity breaks down.

As an interesting side note, the authors mention a few particular findings in fungi. The histogram of genome size in Fungi (see the figure) tends to be tighter than in plants and animals, with almost all taxa having a 1C value (haploid nuclear DNA content) of roughly 10-60 Mb. That said, a few species appear to exhibit considerable intraspecific variation. While this may be due to the methodological discrepancies mentioned above, the authors suggest that dikaryotic hybrids and heterokaryons may contribute to this observation. It seems that we may only be scratching the surface of genome size variation in Fungi, and if genome size is indeed rapidly evolving in Fungi, they may serve as good models for studying this evolutionary phenomenon.
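To make the 1C figures concrete: genome size databases commonly report C-values in picograms of DNA, and a commonly used conversion factor is 1 pg ≈ 978 Mb. The sketch below is a minimal illustration, assuming picogram inputs. It converts a C-value to megabases, checks it against the 10-60 Mb band described above, and computes a crude max/min spread across repeated estimates for one species, the kind of discrepancy that could reflect either method differences or genuine intraspecific variation. The function names and example values are hypothetical and are not taken from the databases themselves.

```python
from typing import Sequence

# Assumption: C-values are given in picograms (pg) of DNA per haploid (1C) nucleus.
# Commonly used conversion factor: 1 pg of DNA is roughly 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C value from picograms to megabases."""
    return c_value_pg * MB_PER_PG


def in_typical_fungal_range(c_value_pg: float,
                            low_mb: float = 10.0,
                            high_mb: float = 60.0) -> bool:
    """True if the genome size falls inside the low_mb..high_mb band (inclusive)."""
    return low_mb <= pg_to_mb(c_value_pg) <= high_mb


def spread_ratio(estimates_pg: Sequence[float]) -> float:
    """Largest / smallest estimate for a single species.

    A ratio well above 1 flags either methodological disagreement or
    genuine intraspecific variation; this function cannot tell which.
    """
    if not estimates_pg or min(estimates_pg) <= 0:
        raise ValueError("need at least one positive estimate")
    return max(estimates_pg) / min(estimates_pg)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not database entries.
    print(round(pg_to_mb(0.04), 2))           # 39.12
    print(in_typical_fungal_range(0.01))      # False (about 9.8 Mb)
    print(spread_ratio([0.03, 0.045, 0.06]))  # 2.0
```

A simple max/min ratio was chosen over a variance measure because, with only a handful of estimates per species, it is easier to read as "these numbers disagree by a factor of two."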

Making the Revolution Work for You

In a recent Microbiology Mini-Review, Meriel Jones catalogs both the potential benefits and the problems that arise from fungal genome sequencing. Using the nine Aspergillus genomes that have been or are being sequenced, Jones addresses several issues tied to a single theme: if we are to unlock the potential that fungal genome sequencing holds, both academically and commercially, then a more robust infrastructure that enables comparative and functional annotation of genomes must be established.

Fortunately, like any good awareness advocate, Jones points us in the direction of e-Fungi, a UK-based virtual project aimed at setting up such an infrastructure. Anyone can navigate this database to compare the stored genomic information or to evaluate any fungus of interest in light of the e-Fungi genomic data. The data appear to be precomputed, similar to IMG from JGI, so there are inherent limits on what one can obtain. However, tools such as these put important data in the hands of expert mycologists who can turn the information into something biologically meaningful.

As Jones points out, this is just the beginning. If fungal genomes are to live up to their promise, they must engage more than just experts at reading genomes.