Tag Archives: gene

A word about databases

Report concludes that a fungal genome database is of “the highest priority”.

This is the title as listed in PubMed for this article from Future Medicine about the AAM report on charting future needs and avenues of research on the fungal kingdom.

The report highlights a comprehensive database of information about fungi, starting at least with systematic collections of genomic and transcript data, as a major need.  Really, any sort of new database effort should strive to be more comprehensive and include genetic and population data (alleles, strains) and information like protein-protein and protein-nucleic acid interactions (as Pedro mentioned). But on top of that, it needs to be comparative so that information from systems that serve as great models can be transferred to other fungal systems that are being studied for their role as pathogens or for their interactions in the environment.

Affordable next-gen sequencing will allow us to obtain genome and transcript sequence for basically all species or strains of interest.  Researchers with no bioinformatics support in their lab will likely be able to outsource this to a company or campus core facility.  But how can they easily map the collective information about genes, proteins, and pathways onto this new data, and have it be a dynamic system that can update as new information is published and curated in other systems?

I think this has to be the future, beyond setting up an SGD, CGD, etc. for every system.  The individual databases are useful for a large enough community where there are curators (and funding), but we will have to move to a more modular system in the future (aspects of which are in GMOD) that can have both an individual focus on a specific species/clade and a more comprehensive view that is comparable across the kingdom.  There are 100+ fungal genomes, but the community size for some of them is in the dozens of labs or fewer. How can they take advantage of the new resources without an existing infrastructure of curators?  Their systems serve an important need in a research aim, but how can discoveries there make their way back into the datastream of other systems?

I see several ways one would interact with a system that provided single-genome tools as well as a framework for comparative information.  At a gene level, one might be looking for all information about a specific gene, based on sequence similarity searches, or starting with a cloned gene in one species. Something akin to Phylofacts or precomputed Orthogroups for defining a gene, but with more information about function, linked in from all sources.  So a comparative resource, but also one tapping into curated and literature-mined data.

At a genome level, one might want to do whole-genome comparisons of gene content, either by evolutionarily defined gene families (gene family size change) or at a functional level.  To start out with, each gene/protein would need a systematic functional mapping.  This could be as simple as running InterProScan on every protein, then expanded to find Orthogroups (or OrthoMCL orthologs) and transfer function from model systems, and finally, more advanced still, refined with classification tools like SIFTER.
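
As a rough illustration of the gene family size comparison described above, here is a minimal Python sketch. It assumes the family assignments (for example from an Orthogroups or OrthoMCL run) have already been reduced to simple (gene, species, family) tuples; the function names, species labels, and family identifiers are all made up for the example and are not the output format of any of those tools.

```python
from collections import Counter, defaultdict


def family_sizes(assignments):
    """Count genes per family and species.

    `assignments` is an iterable of (gene_id, species, family_id) tuples.
    This layout is an assumption for the sketch, not a real tool format.
    """
    sizes = defaultdict(Counter)
    for _gene_id, species, family_id in assignments:
        sizes[family_id][species] += 1
    return sizes


def size_changes(sizes, reference, query):
    """Return {family_id: query_count - reference_count} where counts differ."""
    changes = {}
    for family_id, counts in sizes.items():
        delta = counts[query] - counts[reference]  # missing species count as 0
        if delta != 0:
            changes[family_id] = delta
    return changes


# Hypothetical toy data: two species, three gene families.
assignments = [
    ("geneA1", "species_A", "OG0001"),
    ("geneA2", "species_A", "OG0001"),
    ("geneB1", "species_B", "OG0001"),
    ("geneA3", "species_A", "OG0002"),
    ("geneB2", "species_B", "OG0002"),
    ("geneB3", "species_B", "OG0003"),
]

sizes = family_sizes(assignments)
changes = size_changes(sizes, reference="species_A", query="species_B")
print(changes)
```

A negative number means the family is smaller in the query species than in the reference, a positive number means it is larger, and families of equal size are left out.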

Interlinked with these orthologous and paralogous gene sets would be anchors for analyses of chromosomal synteny and even comparative assembly, using tools like Mercator.  Tools for all of this certainly exist, but making them more pluggable for different sets of species would be an important additional component.

At a utility level, gene annotation and functional mapping of all this information should be possible. I would imagine a researcher could upload the sequence assembly they received from the core facility, and the system would generate multiple gene predictions, annotate the genes, and link these genes to the known orthogroups of the system (keeping these genes private if desired).  Presumably this sort of thing would be easier as a standalone in-house tool for the researcher, but web services could also be the place for this.

For fungal-sized genomes this amount of data is not too extreme.  Things like a Genome Browser, BLAST, etc. should all work out of the box based on the basic builds.

On the DIY and community annotation front, there would also need to be a layer of community-derived annotation that could be layered on all these systems.  I would imagine this to be both for gene structure annotation (genome annotation) and functional annotation (protein X does Y based on experiment Z, here is the journal reference).  I think these contributions would be visible and auditable (tracked), but maybe not blessed as official until a curator could oversee these inputs. Whether this is in a Wiki per se or just a new system that allows community input is less important to me than having it be a) structured (not a bunch of free text), b) tracked and versionable, and c) easy for researchers to input so that the knowledge is captured, even if it has to be reorganized later on.
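
To make "structured, tracked and versionable" a bit more concrete, here is a minimal Python sketch of what a community annotation record and its history could look like. Every name in it (FunctionalAnnotation, AnnotationLog, the field names, the gene and user identifiers) is hypothetical and invented for illustration; it is not how GMOD or any existing system stores annotations.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class FunctionalAnnotation:
    """One structured statement: protein X does Y based on experiment Z."""
    gene_id: str
    function: str    # what the protein does
    evidence: str    # the experiment supporting the claim
    reference: str   # journal reference for the experiment
    submitter: str
    version: int
    submitted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class AnnotationLog:
    """Append-only history: submissions are visible but unofficial until approved."""

    def __init__(self):
        self._records = []    # every submission, in order (the audit trail)
        self._approvals = {}  # (gene_id, version) -> curator who approved it

    def submit(self, gene_id, function, evidence, reference, submitter):
        version = 1 + sum(1 for r in self._records if r.gene_id == gene_id)
        record = FunctionalAnnotation(
            gene_id, function, evidence, reference, submitter, version
        )
        self._records.append(record)
        return record

    def approve(self, gene_id, version, curator):
        if not any(
            r.gene_id == gene_id and r.version == version for r in self._records
        ):
            raise KeyError(f"no version {version} for {gene_id}")
        self._approvals[(gene_id, version)] = curator

    def history(self, gene_id):
        return [r for r in self._records if r.gene_id == gene_id]

    def official(self, gene_id):
        """Latest curator-approved record for a gene, or None."""
        approved = [
            r for r in self.history(gene_id)
            if (r.gene_id, r.version) in self._approvals
        ]
        return approved[-1] if approved else None


# Hypothetical usage with placeholder values.
log = AnnotationLog()
first = log.submit("gene0001", "putative kinase", "in vitro assay",
                   "journal reference placeholder", "researcher1")
second = log.submit("gene0001", "confirmed kinase", "knockout phenotype",
                    "journal reference placeholder", "researcher2")
log.approve("gene0001", 1, "curator1")
print(log.official("gene0001").function)
```

Nothing is ever overwritten in this sketch: each submission gets a new version number, and curator approval is recorded separately, so community input is captured immediately and can be reorganized or blessed later.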

Seems like a lot of work to be done, but really many of these things already exist through what the GMOD project has built.  There are many loose ends and software that doesn’t fully meet these needs, but I think the important concept is that these are all general solutions that will be of benefit to most communities, not just the fungal ones.  One lingering question I always have when approaching genomic data that will be dynamic: what, if any, of this makes its way into GenBank?  How is this sort of thing banked so that it can be captured, and does the improved functional or gene structure annotation ever make its way into the repository databases to correct and improve what has already been submitted there?

Basidiomycete genomes galore


Just finished attending Genetics and Cell Biology of Basidiomycetes in Cape Girardeau, MO, which was an intimate gathering of basidiomycetaphiles.  I learned about systems that are used for studying fruiting body development, genetic mapping, pheromone and mating genes, kinesin dynamics, meiotic gene regulation, and a host of other topics.  I’m happy I got a chance to meet more folks in the community and learned where informatics and computational approaches are really needed to push along some of the interpretation of the more than a dozen basidiomycete genomes.  In particular it sounds like the Pleurotus, Schizophyllum, Agaricus bisporus, and Serpula genomes are all marching along to completion, with some already in 4X assembly or further.


So we’ll have further samples from key model and some less-model species to assist researchers working on many different mushroom-forming fungi, which range from brown- and white-rotting saprophytic fungi to mycorrhizal fungi that associate with plants.  I’m excited about the work to make transformation and knockouts more readily available in these systems too, to push their genetics and cellular biology even further.  The genome sequences will be another tool in these endeavors.

The last day ended with a discussion about genome annotation and future support for curating gene models.  Basically everyone is unhappy with computational predictions and wants to be able to go in and fix things. (I think people remember the ones that are gotten wrong more readily than the ones that were right, but computational prediction definitely performs poorly in some situations.)  In this Web 2.0-land we live in, this is still not something easily done with any of the freely available genome browsing tools. The JGI’s browser was lauded for its ability to handle these kinds of requests, but how do we proceed when genomes are not sequenced by that center, or when (in the not too distant future) communities are able to sequence a genome themselves using 454/Illumina-Solexa/Helicos/Pacific Biosciences approaches in their own lab?  There is still a huge lag in the kinds of tools researchers can use to annotate genomes, fix gene models, and add functions.  Hopefully projects like GMOD will continue to develop useful tools for solving these needs, but there is certainly a need for better support of distributed community annotation of genomes where there is little direct money for supporting curators from a single place.

More RIP without sex?

In followup to the Aspergillus RIP paper discussion, Jo Anne posted in the comments that her paper published in FGB about RIP in another asexual species of fungi also found evidence for the meiosis-specific process of Repeat-Induced Point mutation (RIP).


RIPing in an asexual fungus

A paper in Current Genetics describes the discovery of Repeat-Induced Point mutation (RIP) in two Eurotiales fungi.  RIP has been extensively studied in Neurospora crassa and has been identified in other Sordariomycete fungi such as Magnaporthe and Fusarium. This is not the first Aspergillus species to have RIP described, as it was demonstrated in the biotech workhorse Aspergillus oryzae.  However, I think this study is the first to describe RIP in a putatively asexual fungus.  The evidence for RIP is only found in transposon sequences in the Aspergillus and Penicillium.  A really interesting aspect of this discovery is that RIP is thought to occur only during the sexual stage, but a sexual state has never been observed for these fungi.

(re)Annotating GenBank

Tom Bruns, Martin Bidartondo and 250 others sent a letter to Science describing the current problems with fixing annotation in GenBank. There is an entertaining accompanying news article that interviews several people about the problem of updating annotation and species assigned to sequences in the database. A particular problem for mycologists is that many fungi found through metagenomic approaches are only identified through molecular sequences, and having the wrong species associated with a sequence makes studying community ecology composition difficult.  This problem is not limited to fungi by any means, but recent reports find as many as 20% of fungal Internal Transcribed Spacer (ITS) sequences are attributed to the wrong species.

There’s a nice quote in the news article from Steven Salzberg talking about the difficulties in getting sequences, especially from big centers, updated. I’m sure he is thinking of many examples, like reclassifying some Drosophila sequence traces.


Aspergillus comparative transcriptional profiling


Researchers from the Technical University of Denmark published some interesting results from comparing expression across three very distinct Aspergillus species.

Kudos also for making it Open Access. I am posting a few key figures below the fold because I can! They grew the fungi in bioreactors fermenting glucose or xylose. After calibrating the growth curves, they were able to sample the appropriate time points for comparison of gene expression across these three species. They found a set of genes commonly expressed.


B. dendrobatidis strain JAM81 released

The following is an announcement to the B. dendrobatidis and fungal community at large from Alan Kuo at JGI. This is the JAM81 strain (collected by Jess Morgan from a frog in the California Sierra Nevada). The genome sequence and annotation of the JEL423 strain (collected by Joyce Longcore in Panama) is available from the Broad Institute.

Please do contact me if you would like to contribute to assigning functions to the annotation. We’re in the last round of analyses for some of the genome work, but if there are particular questions you want to contribute to, we’re open to collaborators and can outline the basis of our work to see how other work can complement it.

From Alan Kuo at JGI:

The JGI Batrachochytrium annotation portal is now on the public JGI website. As it is public, no password is required.

For those of you who have not yet registered to be an annotator, go to this new link to register. As before, please choose a username that is personal, so that other annotators may be able to recognize it as yours. A derivative of your personal name would be best.

Those of you who are already registered, you do not need to do anything. Your old pre-release username and password are valid on the new public portal too.

As always, please direct all questions and problems to me. Use email or phone: Cheers, Alan.

Some information about the assembly and annotation:

This is the first annotation of the 127 scaffolds and 24 Mbp of JGI’s 8.74X assembly of the Batrachochytrium dendrobatidis JAM81 genome. We predict 8732 genes, with the following average properties:

Gene length: 1825.16 nt
Transcript length: 1407.29 nt
Protein length: 450.56 aa
Exon frequency: 4.29 exons/gene
Exon length: 328.37 nt
Intron length: 129.18 nt
Gene density: 359.1 genes/Mbp scaffold

The genes were found by the following methods:

Total models: 8732 (100%)
Jason’s models: 3214 (37%)
cDNAs and ESTs: 518 (6%)
Similarity to nr: 1928 (22%)
ab initio: 3072 (35%)

The genes were validated by the following evidence:

start+stop codons: 7990 (92%)
EST support: 2488 (28%)
nr hit: 6787 (78%)
Pfam hit: 4329 (50%)
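
As a quick sanity check, the percentages in the two tables above can be recomputed from the raw counts. The short Python sketch below only uses the numbers from the announcement; the variable and function names are my own, and rounding to the nearest whole percent is an assumption about how the reported values were produced.

```python
TOTAL_MODELS = 8732

# Counts copied from the announcement above.
methods = {
    "Jason's models": 3214,
    "cDNAs and ESTs": 518,
    "Similarity to nr": 1928,
    "ab initio": 3072,
}
evidence = {
    "start+stop codons": 7990,
    "EST support": 2488,
    "nr hit": 6787,
    "Pfam hit": 4329,
}


def percent_of_total(count, total=TOTAL_MODELS):
    """Share of all gene models, rounded to the nearest whole percent."""
    return round(100 * count / total)


method_pct = {name: percent_of_total(n) for name, n in methods.items()}
evidence_pct = {name: percent_of_total(n) for name, n in evidence.items()}

# Each model is attributed to exactly one prediction method, so these sum
# to the total; the evidence categories overlap and are not expected to.
methods_sum = sum(methods.values())

for name, pct in {**method_pct, **evidence_pct}.items():
    print(f"{name}: {pct}%")
```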

Phytopathogenic Fungi: what have we learned from genome sequences?

A review in Plant Cell from Darren Soanes and colleagues summarizes some of the major findings about the evolution of phytopathogenic fungi gleaned from genome sequencing, highlighting 12 fungi and 2 oomycetes. By mapping the evolution of genes identified as virulence factors, as well as genes that appear to have similar patterns of diversification, we can hope to derive some principles about how phytopathogenic fungi have evolved from saprophytic ancestors.

They infer from phylogenies we’ve published (Fitzpatrick et al, James et al) that plant pathogenic capabilities have arisen at least 5 times in the fungi and at least 7 times in the eukaryotes. In addition, they use data on gene duplication and loss in the ascomycete fungi (Wapinski et al) to infer that large numbers of losses and gains of genes have occurred in fungal lineages.


Defining “gene”

The term “gene” might be tired, perhaps because it can have many different meanings (don’t get us started on homolog!). We of course know that the one gene/one enzyme hypothesis and the central dogma fail to represent the full complexity of the RNA world, pre- and post-transcriptional gene regulation, and post-transcriptional modifications. An article in PLoS One, “Beyond the Gene”, from Evelyn Fox Keller and David Harel tackles the perhaps overly stretched definition of the gene.


Fungal Genetics 2007 details

I’m including a recap of as many of the talks as I remember. There were 6 concurrent sessions each afternoon, so you have to miss a lot of talks. The conference was bursting at the seams as it was: at least 140 people had to be turned away beyond the 750 who attended.

If there was any theme in the conference it was “Hey we are all using these genome sequences we’ve been talking about getting”. I found the overview talks that solely describe a genome a little dry compared to those more focused on particular questions. I guess my genome palate is becoming refined.
