Category Archives: phylogenomics

NSF Postdoc opportunity for Research using biological collections

Earlier this year the NSF released a postdoc opportunity for research using biological collections. In particular, these can be strain collections and stock collections. The US Culture Collection Network is a Research Coordination Network which brings together many collaborating culture collections. You can find many of the U.S. living collections there, including fungal centers like the Phaff Yeast Collection and the Fungal Genetics Stock Center. The Gilbertson Mycological Herbarium at U Arizona, under Elizabeth Arnold's leadership, has developed a rich collection of endophytic fungi and would be another excellent environment for working with these resources. Kyria Boundy-Mills, the curator of the Phaff collection, has also expressed interest in either hosting or helping to work with a postdoc on this. There is tremendous fungal biodiversity available in these and other culture collections, so this seems like a great chance to tap into it.
This would be a great opportunity to link work in the 1000 Fungal Genomes project with sampling from culture collections (not just sequencing, but growing the strains, characterizing growth and carbon source utilization, and integrating that with predictions made from genome comparisons). If this is something interesting to you, do get in touch with some of the curators at these collections, and also with my lab. I expect many other labs would be interested in hosting someone to work on questions that take advantage of these living collections of fungi.
Proposals are to be submitted by potential postdocs. The submitter must be a US citizen or US permanent resident. The next deadline is November 3, 2015. Funding total for the program is $8 million, with 40 awards anticipated, each for up to two years. Here’s some key text from the solicitation:

Competitive Area 2. Postdoctoral Research Fellowships Using Biological Collections.

Biological research collections represent the documented scientific history of life on Earth, and the U.S. museum community alone curates over a billion specimens ranging from bacteria to plants, insects and vertebrates, as well as fossils. Across the globe, collections represent critical infrastructure and support essential research activities in biology and its related fields. Scientists, government agencies, industry and citizens utilize collections to document and understand evolution and biodiversity, study global change, formulate advice on conservation planning, educate the general public, improve interactions between sciences, and devise new practical applications from science to every day life. New technologies supported by NSF in digitization, such as the Advancing Digitization of Biodiversity Collections (ADBC) program, are making collections and their associated data, whether they are physical specimens, text, images, sounds, or data tables, searchable in online databases. Despite this clear progress in improving access to physical specimens and their associated metadata, collections remain under-utilized for answering contemporary questions about fundamental aspects of biological processes. Thus, collections are poised to become a critical resource for developing transformative approaches to address key questions in biology and potentially develop applications that extend biology to physical, mathematical, engineering and social sciences. This postdoctoral track seeks transformative approaches that use biological collections in highly innovative ways to address grand challenges in biology. Priority may be given to applicants who integrate biological collections and associated resources with other types of data in an effort to forge new insight into areas traditionally funded by BIO. 
Examples of key questions in biology of interest include, but are not limited to, links between genotype and phenotype, evolutionary developmental biology, comparative approaches in functional and developmental neurobiology, and the biophysics of nanostructures. Using collections as a resource for grand challenge questions in biology is expected to present new opportunities to advance understanding of biological processes and systems, inspiring new discoveries in areas with relevance to other disciplines with overlapping interests in biological systems. Applicants must document access to the selected collection(s) in the research and training plan.

Microsporidia genomes on the way

New genomes from Microsporidia are on the way from the Broad Institute and other groups, and will be a boon to those working on these fascinating creatures. Microsporidia are obligate intracellular parasites of eukaryotic cells and many can cause serious disease in humans. Some parasitize worms and insects too. The evolutionary placement of these species in the fungi is still debated, with recent evidence placing them as derived members of the Mucormycotina based on shared synteny (conserved gene order), in particular around the mating type locus. Their highly derived characteristics and long branches still make them hard to place within the fungal kingdom. The synteny-based evidence was another way to find a phylogenetic placement for them, but it would be helpful to have further support in the form of additional shared derived characteristics that group Mucormycotina and Microsporidia. There is hope that an increased number of genome sequences, together with phylogenomic approaches, can help resolve the placement and further our understanding of the evolution of the group.

For data analysis, a new genome database for comparing these genomes, MicrosporidiaDB, is now online. This project has begun incorporating the available genomes and provides a data mining interface that extends from the EuPathDB project.

Where can I get orthologs?

There are several databases that include orthology predictions for fungi. These all have pros and cons. Some are more comprehensive and cover many more species. Some provide curated orthology and paralogy assignments, which should be fairly stable. Some are automated, so the groupings and ortholog group IDs change at each iteration.

  • A phylogenetic approach from a Saccharomyces perspective is at PhylomeDB.
  • Fungal Orthogroups is based on the Synergy algorithm from I. Wapinski, formerly of the Regev group at the Broad Institute.
  • Yeast gene order browser (YGOB) for Saccharomyces spp and CGOB for Candida spp.
  • OrthoMCL database based on whole genomes, not a ton of fungi but useful starting set.
  • Ensembl Genomes provides ortholog prediction as part of the Compara pipeline though there is a limited phylogenetic diversity in the current Ensembl Fungal genomes.
  • TreeFam has Saccharomyces cerevisiae and Schizosaccharomyces pombe as the two fungi included in the curated ortholog assignments and phylogenies.
  • SIMAP provides pre-computed similarities among all proteins in UniProt.
  • InParanoid provides a pretty comprehensive set of 100 available whole genomes, including many fungal genomes which I tried to help select.
  • JGI’s Mycocosm attempts to provide a fungal-focused paralog/gene family look at clusters of genes based on whole genomes.
  • E-Fungi is also an attempt at automated clustering with some fancy webservices logic.
  • Fungal Transcription Factor database focused just on families of transcription factors.

Some of these tools are better than others in terms of providing downloadable tables. Another problem is which identifiers are used. Many biologists use gene names or locus identifiers, not UniProt/GenPept IDs, to identify genes or proteins of interest. So tools that just cluster UniProt data aren’t as useful as those which refer to the gene or locus names. Also, providing a way to download all the data from a comparison is important for further mining and grouping of the data or for cross-referencing local datasets. Plugging in gene IDs one by one is not really an approach that respects the idea that your user wants to ask sophisticated queries.

Also, beware that some approaches produce lists of pairwise comparisons whereas others find orthologous groupings. So if you want to find the Rad59 ortholog from all fungi, it may be easier or harder depending on the source.
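To make the pairwise-versus-groups distinction concrete, here is a minimal sketch of how a downloaded table of pairwise ortholog calls could be merged into groups. It uses plain single-linkage clustering (union-find), which is an assumption for illustration and not the algorithm behind any of the databases listed above. The function name and the gene IDs are hypothetical.

```python
from collections import defaultdict


def group_pairwise_orthologs(pairs):
    """Merge pairwise ortholog calls into groups by single linkage (union-find).

    pairs: iterable of (gene_id_a, gene_id_b) tuples.
    Returns a sorted list of groups, each a sorted list of gene IDs.
    """
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # join the two groups

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise calls between three species (IDs are made up).
example_pairs = [
    ("spA|g1", "spB|g7"),
    ("spB|g7", "spC|g3"),
    ("spA|g2", "spC|g9"),
]
for group in group_pairwise_orthologs(example_pairs):
    print("\t".join(group))
```

Single linkage is deliberately naive: one spurious pairwise call is enough to fuse two unrelated families, which is part of why the dedicated group-finding resources can give different answers than a pairwise list.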

[I may make this a static page in the future to allow for more detailed updating since I know the available resources wax and wane]

Does gene function predict molecular evolutionary rate?
Gene sequences evolve at different rates due to different constraints, whether from chromosome position, functional constraint, or status as a single-copy or multi-copy gene. In a recent paper from Allen Rodrigo (the new NESCent director, by the way, congrats!), the authors hypothesize that correlated branch lengths between gene trees suggest that the genes operate in the same pathway or have a similar function. To test this they took alignments of orthologous genes from 10 bacterial species, seeded with E. coli as the target species. The alignments were used to build trees with MrBayes, and only those which recovered the known species topology were retained. The ortholog groups were assigned GO terms via similarities.

They then looked at the branch lengths of gene trees and found a correlation between GO categories and rates of gene evolution/shape of the tree. I’ll not go into more details here, but I think this is an interesting finding that is probably not so surprising when you think about it. I’d be very curious to see whether this holds up in multi-domain proteins as well, and of course taking this approach for a drive in fungal orthologs would be an interesting project for someone to try.
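As a rough illustration of the idea (not the authors' actual statistical method), here is a sketch that compares two gene trees sharing the same topology by computing the Pearson correlation of their matching branch lengths. The dictionary representation, the function names, and the branch lengths are all assumptions made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant branch lengths")
    return sxy / math.sqrt(sxx * syy)


def branch_length_correlation(tree_a, tree_b):
    """Correlate branch lengths of two gene trees with identical topology.

    Each tree is a dict mapping a branch label (tip or clade name) to its length.
    """
    if set(tree_a) != set(tree_b):
        raise ValueError("trees must have the same set of branches")
    branches = sorted(tree_a)
    return pearson([tree_a[b] for b in branches], [tree_b[b] for b in branches])


# Made-up branch lengths for two hypothetical genes on the same 4-taxon topology.
gene_1 = {"sp1": 0.10, "sp2": 0.30, "sp3": 0.05, "sp4": 0.20, "sp1+sp2": 0.08}
gene_2 = {"sp1": 0.21, "sp2": 0.58, "sp3": 0.12, "sp4": 0.41, "sp1+sp2": 0.15}
print(round(branch_length_correlation(gene_1, gene_2), 3))
```

Because Pearson correlation is insensitive to overall scale, a gene that evolves twice as fast on every branch still correlates perfectly with its slower partner, so the comparison is about the relative shape of the trees rather than absolute rate.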

Li WL, & Rodrigo AG (2009). Covariation of branch lengths in phylogenies of functionally related genes. PLoS ONE, 4 (12), e8487. PMID: 20041191. doi:10.1371/journal.pone.0008487

Monophyly of Taphrinomycotina

A recent paper in MBE presents evidence that the Taphrinomycotina (containing S. pombe and Pneumocystis) are in fact a monophyletic group. This is considered an early-branching lineage of the Ascomycota, alongside the Pezizomycotina (filamentous ascomycete fungi like Neurospora and Aspergillus) and Saccharomycotina (fungi mainly with yeast forms, including Candida and Saccharomyces). The monophyly of the Taphrinomycotina has been fairly well accepted, but a few publications report conflicting evidence in some sets of gene trees. This conflict is most likely due to Long Branch Attraction (LBA). The Philippe lab has long worked on the LBA problem, developing tools like PhyloBayes that attempt to correct for LBA with a parameter-rich model and lots of data (like whole genomes). These authors are employing phylogenomics in the sense that multiple genes are used to reconstruct the phylogeny. This use is different from the J. Eisen/Sjölander sense, which is to infer gene function from a phylogeny.

This paper presents evidence using protein sequences from 113 mitochondrial and nuclear genes and finds strong statistical support for this monophyly. They also note that it was necessary to remove fast-evolving sites from a dataset of only mitochondrial genes in order to overcome the LBA artifacts that led to a Saccharomyces and S. pombe sister relationship in previous analyses.
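For a feel of what removing fast-evolving sites means in practice, here is a minimal sketch that drops alignment columns showing more than a chosen number of distinct residues. Counting distinct residues is a crude stand-in for a real per-site rate estimate and is not the procedure used in the paper. The function name, the threshold, and the toy alignment are all hypothetical.

```python
def drop_variable_sites(alignment, max_states):
    """Remove alignment columns with more than max_states distinct residues.

    alignment: dict mapping taxon name to an aligned sequence (all equal length).
    Gap and ambiguity characters ('-', 'X', '?') are ignored when counting states.
    Returns a new dict with the same taxa and only the retained columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty alignment of equal-length sequences")
    n_sites = lengths.pop()

    ignored = {"-", "X", "?"}
    keep = []
    for i in range(n_sites):
        states = {seq[i] for seq in alignment.values()} - ignored
        if len(states) <= max_states:
            keep.append(i)  # slow or moderately variable column: retain it

    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy protein alignment (made up); the third column is the most variable.
toy = {"t1": "MKAL", "t2": "MKCL", "t3": "MRDL", "t4": "MRE-"}
filtered = drop_variable_sites(toy, max_states=2)
for name, seq in filtered.items():
    print(name, seq)
```

A real analysis would rank sites by an estimated substitution rate under a model and remove the fastest categories stepwise, checking how the topology responds as sites are dropped.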

This paper also presents work using the Pneumocystis genome sequence, which helps resolve its placement and should eventually help in understanding the evolution of this pathogen. In this tree the sister group to Pneumocystis is Schizosaccharomyces, but both lineages have very long branches. The Saitoella lineage is basal in this paper, which differs from what was found with a 4-gene (AFTOL) dataset (see Figure 2). Further work sampling more genes from these Taphrina lineages will likely help resolve the intra-clade relationships.

Y. Liu, J. W. Leigh, H. Brinkmann, M. T. Cushion, N. Rodriguez-Ezpeleta, H. Philippe, B. F. Lang (2008). Phylogenomic Analyses Support the Monophyly of Taphrinomycotina, including Schizosaccharomyces Fission Yeasts. Molecular Biology and Evolution, 26 (1), 27-34. DOI: 10.1093/molbev/msn221