Tag Archives: annotation

2012 Fungal Genomes: a review of mycological genomic accomplishments

2012 was certainly a banner year for genome sequence production and publications. The cost of generating the data keeps dropping, and the automation of assembly and annotation continues to improve, making it possible for a wide range of groups to publish genomes.

I made an NCBI PubMed Collection of these: Fungal Genomes 2012.

Some notable fungal genome publications include

There were also several new insights into the evolution of wood decay fungi derived from new genomes of basidiomycete fungi. This includes

(Now I might have missed a few in my attempt to get this done before holidays overtake me – if so, please post comments or tweets and I’ll be sure to amend the list on pubmed and here.)

A new trend for fungal genome papers can now be seen in the Genome Announcements of Eukaryotic Cell, which aim to get the genome data out quickly with a citable reference. These are short descriptions, which I expect will become a more popular way to ensure that data made public can also be cited. I only counted about 5 published in 2012, but I expect to see a lot more of these in 2013, either at EC or other journals. I’m sure there will still be some tension between providers making data public as soon as possible and the sponsoring authors’ desire to have first crack at analyzing the genome(s) and publishing interpretations and comparisons. The bacterial community has been doing this with Genome Reports in the SIGS journal and the Journal of Bacteriology, so we will see what happens as these small eukaryotic genomes become even easier to produce.

I look forward to an exciting year as the 1000 Fungal Genomes project and other JGI projects start to roll out more genomes. I also predict there will be many more resequencing datasets published for functional and population genomics. It will also probably be a countdown to the last Sanger-sequenced genomes, and we will see how the many flavors of next-generation sequencing are optimized for genome generation. I am hopeful that automated annotation and comparison tools will become even easier for more people to use, and that we start to provide a shared repository of gene predictions. I’ve just launched the latter and look forward to engaging more people to contribute to it.

Neurospora annotation update (v5)

Here is a message from the Broad Institute about a gene annotation update that was made recently in response to an issue that was revealed in the June 2010 release.  This new version is called V5 and should be on its way to GenBank.

Dear Neurospora scientists,

Recently we discovered an issue with the way locus tags were assigned
to our most recent Neurospora gene set, released publicly on the Broad
website in June of 2010. Many genes in this gene set have mismatched
locus numbers compared to the same genes released in February 2010.
Adding to the confusion, both releases were labeled version 4.

To remedy this we have recalled the June locus numbers and released a
new, version 5 gene set. Genes in this set have been numbered to
preserve historical locus numbers (back to the original genbank
release) as much as possible.

Folks who call their favorite genes by their v1, v2 or v3 numbers can
search for them on our web page, which will map them to v5
automatically and accurately. The same will work for most v4 numbers.
Unfortunately, 863 genes have different locus tags in the two v4
releases. If you search for one of them, you will get two hits - the
v5 gene that the February edition mapped to, and the v5 gene that the
June edition mapped to.

Two examples to clarify:

A. Suppose you search for NCU11713.4 on our web page. This query will
retrieve two genes, NCU11688.5 and NCU11713.5. The gene which in the
February release was called NCU11713.4 is the same as NCU11688.5,
while the gene labeled NCU11713.4 in June is the same as NCU11713.5.

B. Searching for NCU11324.4 yields but one hit because that gene, like
most genes, was consistently numbered between the two releases labeled
version 4.

If you are not sure when you downloaded your genes, the following may
help. If you see any of these locus numbers in your gene set:

NCU00129.4, NCU00457.4, NCU00499.4, NCU00556.4, NCU00627.4,
NCU00685.4, NCU00768.4, NCU00856.4, NCU00986.4, NCU01064.4,
NCU01065.4, NCU01282.4, NCU01299.4, NCU01300.4, NCU01483.4,
NCU01559.4, NCU01560.4, NCU01610.4, NCU01611.4, NCU01664.4,
NCU01665.4, NCU01871.4, NCU01903.4, NCU02200.4, NCU02259.4,
NCU02666.4, NCU02758.4, NCU02837.4, NCU02998.4, NCU03047.4,
NCU03206.4, NCU03773.4, NCU04239.4, NCU04240.4, NCU04518.4,
NCU04519.4, NCU04710.4, NCU04711.4, NCU05275.4, NCU05512.4,
NCU05776.4, NCU06013.4, NCU06370.4, NCU06732.4, NCU07107.4,
NCU07259.4, NCU07260.4, NCU07301.4, NCU07405.4, NCU07856.4,
NCU07857.4, NCU08090.4, NCU08182.4, NCU08323.4, NCU08332.4,
NCU09085.4, NCU09256.4, NCU09257.4, NCU09998.4, NCU10166.4,
NCU10574.4, NCU11040.4, NCU11240.4, NCU11253.4, NCU11376.4,
NCU11390.4, NCU11393.4

then your genes are from the February 2010 gene set. However, if you see

NCU00082.4, NCU00083.4, NCU00084.4, NCU00085.4, NCU00516.4,
NCU01819.4, NCU04299.4, NCU04300.4, NCU04301.4, NCU04302.4,
NCU04303.4, NCU04304.4, NCU04305.4, NCU05000.4, NCU05111.4,
NCU05112.4, NCU05113.4, NCU05114.4, NCU05115.4, NCU05116.4,
NCU05448.4, NCU05452.4, NCU06667.4, NCU07323.4, NCU09066.4,
NCU10179.4, NCU10301.4, NCU10379.4, NCU10383.4, NCU10753.4,
NCU10866.4, NCU10914.4, NCU11068.4, NCU11182.4, NCU12157.4,
NCU12158.4, NCU12159.4, NCU12160.4, NCU12161.4, NCU12162.4,
NCU12163.4, NCU12164.4, NCU12165.4, NCU12166.4, NCU12167.4,
NCU12168.4, NCU12169.4, NCU12170.4, NCU12171.4, NCU12172.4,
NCU12173.4, NCU12174.4, NCU12175.4, NCU12176.4, NCU12177.4,
NCU12178.4, NCU12179.4, NCU12180.4, NCU12181.4, NCU12182.4,
NCU12183.4, NCU12184.4, NCU12185.4, NCU12186.4, NCU12187.4, NCU12188.4

then your genes are from the June 2010 release.

Attached please find five mapping tables which can be used to migrate
locus numbers from any of the previous releases to the latest version
5 locus tags (linked below).

We apologize for any confusion this may cause.
The Broad Institute

I’ve also uploaded the locus update files, which map between versions of the annotation.
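For anyone scripting this migration, here is a minimal Python sketch of the two steps the Broad message describes: guessing which v4 release a gene list came from, and translating old locus tags to v5. The table layout (two tab-separated columns, old tag then v5 tag), the function names, and the abbreviated marker sets are my assumptions for illustration and may not match the actual format of the mapping files. Use the full marker lists from the message in practice.

```python
import csv
import io
from collections import defaultdict

# A few marker tags copied from the two lists in the message (abbreviated here).
FEB_2010_MARKERS = {"NCU00129.4", "NCU00457.4", "NCU11393.4"}
JUNE_2010_MARKERS = {"NCU00082.4", "NCU05000.4", "NCU12188.4"}


def guess_v4_release(locus_tags):
    """Return 'February 2010', 'June 2010', 'mixed', or 'unknown'."""
    tags = set(locus_tags)
    feb = bool(tags & FEB_2010_MARKERS)
    june = bool(tags & JUNE_2010_MARKERS)
    if feb and june:
        return "mixed"
    if feb:
        return "February 2010"
    if june:
        return "June 2010"
    return "unknown"


def load_mapping(handle):
    """Read a two-column, tab-separated table (assumed layout: old tag, v5 tag)."""
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip comments and incomplete rows
        mapping[row[0]].add(row[1])
    return mapping


def to_v5(tag, *mappings):
    """Collect every v5 tag that any of the given tables assigns to `tag`."""
    hits = set()
    for mapping in mappings:
        hits |= mapping.get(tag, set())
    return sorted(hits)


# Toy tables holding only example A from the message.
feb_table = load_mapping(io.StringIO("NCU11713.4\tNCU11688.5\n"))
june_table = load_mapping(io.StringIO("NCU11713.4\tNCU11713.5\n"))

print(guess_v4_release(["NCU00129.4", "NCU11713.4"]))
print(to_v5("NCU11713.4", feb_table, june_table))
```

Querying both v4 tables at once reproduces the two-hit behavior described for the web search. If you know which release your data came from, pass only that table to get a single answer.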

Genome sequence of mushroom Schizophyllum commune

I am excited to announce the publication of another mushroom genome this week. The mushroom Schizophyllum commune is an important model system for mushroom biology and development. Its genome was sequenced as part of efforts at the Joint Genome Institute and a collection of international researchers.  The data and analyses from these efforts are presented in a publication appearing in Nature Biotechnology today.

Studies in mushrooms can have an important impact on other research areas.  They can be useful in biotechnology as protein biosynthesis factories for producing compounds or even as an edible delivery mechanism for new drugs.  What we found in the analysis of this genome includes clues, from analysis of enzyme families, to how white rot fungi degrade lignin.  We also saw evidence for extensive antisense transcription during different developmental stages, suggesting how some gene regulation could impact or control developmental progression.  Through gene expression comparison (by MPSS), a large number of transcription factors were shown to be differentially regulated during sexual development.  Knocking out two of these (fst3 and fst4) resulted in a change in the ability to form mushrooms (fst4) or in smaller mushrooms (fst3).

There are several more interesting findings in this work that I hope to add to this post when there is a little more time.

Ohm, R., de Jong, J., Lugones, L., Aerts, A., Kothe, E., Stajich, J., de Vries, R., Record, E., Levasseur, A., Baker, S., Bartholomew, K., Coutinho, P., Erdmann, S., Fowler, T., Gathman, A., Lombard, V., Henrissat, B., Knabe, N., Kües, U., Lilly, W., Lindquist, E., Lucas, S., Magnuson, J., Piumi, F., Raudaskoski, M., Salamov, A., Schmutz, J., Schwarze, F., vanKuyk, P., Horton, J., Grigoriev, I., & Wösten, H. (2010). Genome sequence of the model mushroom Schizophyllum commune Nature Biotechnology DOI: 10.1038/nbt.1643

A mushroom on the cover

I’ll indulge a bit here and happily point to the cover of this week’s PNAS, which features an image of fruiting Coprinopsis cinerea mushrooms and refers to our article on the genome sequence of this important model fungus.  You should also enjoy the commentary article from John Taylor and Chris Ellison, which summarizes some of the high points of the paper.

Coprinopsis cover

Stajich, J., Wilke, S., Ahren, D., Au, C., Birren, B., Borodovsky, M., Burns, C., Canback, B., Casselton, L., Cheng, C., Deng, J., Dietrich, F., Fargo, D., Farman, M., Gathman, A., Goldberg, J., Guigo, R., Hoegger, P., Hooker, J., Huggins, A., James, T., Kamada, T., Kilaru, S., Kodira, C., Kues, U., Kupfer, D., Kwan, H., Lomsadze, A., Li, W., Lilly, W., Ma, L., Mackey, A., Manning, G., Martin, F., Muraguchi, H., Natvig, D., Palmerini, H., Ramesh, M., Rehmeyer, C., Roe, B., Shenoy, N., Stanke, M., Ter-Hovhannisyan, V., Tunlid, A., Velagapudi, R., Vision, T., Zeng, Q., Zolan, M., & Pukkila, P. (2010). Insights into evolution of multicellular fungi from the assembled chromosomes of the mushroom Coprinopsis cinerea (Coprinus cinereus) Proceedings of the National Academy of Sciences, 107 (26), 11889-11894 DOI: 10.1073/pnas.1003391107

Coprinopsis cinerea genome annotation updated

The Broad Institute, in collaboration with many researchers in the Coprinopsis cinerea (Coprinus cinereus) community, has updated the genome annotation for C. cinerea with additional gene calls based on ESTs and improved gene callers. The annotation was made on the 13-chromosome assembly produced by the SEMO fungal biology group and collaborators across the globe, including a BAC map from H. Muraguchi.  Thanks to Jonathan Goldberg and colleagues at the Broad Institute for getting this updated annotation out the door.


This updated annotation is able to join and split several sets of genes and the gene count sits at just under 14k genes in this 36Mb genome. There are a couple of hiccups in the GTF and Genome contig/supercontig file naming that I am told will be fixed by early next week.  Additional work to annotate the “Kinome” by the Broad team provides some promising new insight to this genome annotation as well.
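If you want to check the gene count yourself once the files are fixed, a rough Python sketch follows. It assumes a standard nine-column, tab-separated GTF with a gene_id attribute in the last column. The function name and the toy records are hypothetical and not taken from the Broad release.

```python
import io
import re

# gene_id "..." is the standard GTF attribute naming the gene a feature belongs to.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def count_genes(handle):
    """Count distinct gene_id values in a GTF stream."""
    genes = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        match = GENE_ID.search(fields[8])
        if match:
            genes.add(match.group(1))
    return len(genes)


# Toy input: three features belonging to two made-up genes.
toy_gtf = (
    'chr1\tsketch\texon\t100\t250\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";\n'
    'chr1\tsketch\texon\t300\t420\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";\n'
    'chr2\tsketch\texon\t50\t900\t.\t-\t.\tgene_id "g2"; transcript_id "g2.t1";\n'
)
print(count_genes(io.StringIO(toy_gtf)))
```

Counting distinct gene_id values rather than lines matters because each gene usually spans several exon and CDS records.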

We’re using this updated genome assembly to address questions about the evolution of genome structure by studying syntenic conservation and aspects of crossing-over points during meiosis.  The C. cinerea system has long been used as a model for fungal development and morphogenesis of mushrooms, as it is straightforward to induce mushroom fruiting in the laboratory.  It is also a model for studying meiosis because of the synchronized meiosis occurring in the cells in the cap of the mushroom.

Happy genome shrooming.

Updated Cryptococcus serotype A annotation

SEM of clamp cell, yeast cells and sexual spore chains. Courtesy R. Velagapudi & J. Heitman

A new and improved annotation of Cryptococcus neoformans var. grubii strain H99 (serotype A) has been made available in GenBank and on the Broad Institute website. This update is a collaboration between several groups providing data and analyses and the genome annotation team at the Broad Institute.

Some changes noted by the Broad Institute include:

“This release of gene predictions for the serotype A isolate Cryptococcus neoformans var. grubii H99 is based on a new genomic assembly provided by Dr. Fred Dietrich at the Duke Center for Genome Technology. The new assembly consists of 14 nuclear chromosomes and a single 21 KB mitochondrial chromosome, and has resulted in a reduction of the estimated genome size from 19.5 to 18.9 Mb. Improvements in the assembly and in our annotation process have resulted in a set of 6,967 predicted protein products, 335 fewer than the previous release.”

Gene prediction without training?

A new paper in Genome Research from the Borodovsky lab at Georgia Tech describes an improved ab initio gene predictor, GeneMark.hmm ES, which builds on their earlier GeneMark program.  This application doesn’t require a training set when building models for gene prediction in fungal genomes, and it is reported to have sensitivity and specificity as good as or better than most of the commonly used ab initio programs. The authors pick up on previously described insights about fungal gene structures and introns: the lack of a necessary branch site and varying degrees of splice-site conservation in most intron-rich fungi (Schwartz et al, 2008), and the finding that intron sizes remain short across the fungi (Stajich et al. 2007).

In practice it should simplify the initial genome annotation protocols and could really streamline the procedures. It doesn’t replace the need to gather EST sequence data, which can also be used to generate a training set in an automated fashion.  EST and transcriptional evidence is still very important for identifying UTRs and alternatively spliced isoforms.

Hopefully these data from the predictions will integrate into the Cryptococcus and Coprinus genome annotations that are undergoing an update at the Broad.  We’ll see how well this performs on a couple of the Chytrid genome sequences we are working on as well.

Cochliobolus genome released

Just noticed that the JGI has released the Cochliobolus heterostrophus genome sequence at their site, predicting 9,633 protein-coding genes.  Torrey Mesa Research Institute had access to a sequence many years ago, but only now is a public version of this genome available.  Cochliobolus has been a model plant pathogen system, notably for its production of T-toxin by a PKS gene (Yang et al).

AAM Releases “The Fungal Kingdom” Report

The American Academy of Microbiology has released a report (PDF and archived on fungalgenomes.org) on the Fungal Kingdom, outlining the importance of research in the kingdom and recommending several priority areas for future research.

One recommendation that makes the top of the list is an integrated database for fungal genomes, something we’re keenly interested in seeing happen.  This sort of centralized repository of functional annotation, literature links, and genome sequences and annotation is critical given the 150+ genomes that are available or on their way.  Systematic re-annotation with consistent tools, comparative analyses and gene predictions, and linking of gene sequences by homology and ortholog predictions are all critical to fully utilizing the genomic data that has been produced for the fungi and other organisms.