Category Archives: phylogenetics

NSF Postdoc opportunity for Research using biological collections

Earlier this year the NSF released a postdoc opportunity for research using Biological Collections. In particular these can be strain collections and stock collections. The US Culture Collection Network is a Research Coordination Network which brings together many collaborating culture collections. You can find many of the U.S. living collections there, including fungal centers like the Phaff Yeast Collection and Fungal Genetics Stock Center. The Gilbertson Mycological Herbarium at U Arizona under Elizabeth Arnold’s leadership has developed a rich collection of endophytic fungi which would be another excellent environment to work with these resources. Kyria Boundy-Mills, who is the curator of the Phaff collection, has also expressed interest in either hosting or helping work with a postdoc on this. There is tremendous biodiversity of fungi available in these and other culture collections, so this seems like a great chance to tap into them.
This would be a great opportunity to link work in the 1000 Fungal genomes project and sampling from culture collections (not just sequencing, but growing and characterizing growth, carbon source utilization and integrating that with predictions made from genome comparisons). If this is interesting to you, do get in touch with some of the curators at these collections. My lab, and I expect many other labs, would also be interested in hosting someone to work on questions that take advantage of these living collections of fungi.
Proposals are to be submitted by potential postdocs. The submitter must be a US citizen or US permanent resident. The next deadline is November 3, 2015. Funding total for the program is $8 million, with 40 awards anticipated, each for up to two years. Here’s some key text from the solicitation:

Competitive Area 2. Postdoctoral Research Fellowships Using Biological Collections.

Biological research collections represent the documented scientific history of life on Earth, and the U.S. museum community alone curates over a billion specimens ranging from bacteria to plants, insects and vertebrates, as well as fossils. Across the globe, collections represent critical infrastructure and support essential research activities in biology and its related fields. Scientists, government agencies, industry and citizens utilize collections to document and understand evolution and biodiversity, study global change, formulate advice on conservation planning, educate the general public, improve interactions between sciences, and devise new practical applications from science to every day life. New technologies supported by NSF in digitization, such as the Advancing Digitization of Biodiversity Collections (ADBC) program, are making collections and their associated data, whether they are physical specimens, text, images, sounds, or data tables, searchable in online databases. Despite this clear progress in improving access to physical specimens and their associated metadata, collections remain under-utilized for answering contemporary questions about fundamental aspects of biological processes. Thus, collections are poised to become a critical resource for developing transformative approaches to address key questions in biology and potentially develop applications that extend biology to physical, mathematical, engineering and social sciences. This postdoctoral track seeks transformative approaches that use biological collections in highly innovative ways to address grand challenges in biology. Priority may be given to applicants who integrate biological collections and associated resources with other types of data in an effort to forge new insight into areas traditionally funded by BIO. 
Examples of key questions in biology of interest include, but are not limited to, links between genotype and phenotype, evolutionary developmental biology, comparative approaches in functional and developmental neurobiology, and the biophysics of nanostructures. Using collections as a resource for grand challenge questions in biology is expected to present new opportunities to advance understanding of biological processes and systems, inspiring new discoveries in areas with relevance to other disciplines with overlapping interests in biological systems. Applicants must document access to the selected collection(s) in the research and training plan.

Postdoc in Fungal Phylogenetics

Post-doctoral Position in Fungal Phylogenetics

A Post-Doctoral position in fungal phylogenetics is available in the Hibbett laboratory at Clark University. The Post-doc will participate in a large collaborative endeavor supported by the NSF AVATOL Program that is aimed at synthesizing a comprehensive tree of life from published analyses, and developing novel tools for community-driven annotation of the tree. Specific responsibilities will include (1) assembly and integration of phylogenetic datasets and trees representing all groups of Fungi; (2) coordination with a multi-laboratory team including software developers and systematists to develop and test new methods for tree integration and annotation; (3) outreach to the fungal systematics community; (4) contribution to a distributed web-based undergraduate course on assembling the tree of life, and co-instruction of a linked undergraduate course at Clark University.

The ideal candidate will be a productive researcher with interests in fungal systematics and the construction and interpretation of large-scale phylogenetic trees, will have excellent communication and interpersonal skills, and will seek a career involving both research and education. Candidates lacking background in fungal systematics, but with strong qualifications in phylogenetics, and excellent potential as educators, may be considered.

It is anticipated that the position will be available beginning May 1, 2012. Up to three years of support is possible, depending on progress. Funding is contingent on final NSF approval.

To apply, e-mail a curriculum vitae, statement of research interests and career goals, PDFs of major publications, and names and e-mail addresses for three references. Applications from women and members of underrepresented groups in science are encouraged.

Clark University is an EEO/AA Employer.

David S. Hibbett
Biology Department
Clark University
Worcester, MA 01610
(508) 793-7332

Where can I get orthologs?

There are several databases that include orthology prediction for fungi. These all have pros and cons. Some are more comprehensive and have many more species. Some are curated sets of orthologs and paralogs, which should be pretty stable. Some are automated, so groupings and ortholog group IDs change at each iteration.

  • A phylogenetic approach from a Saccharomyces perspective is at PhylomeDB.
  • Fungal Orthogroups is based on the Synergy algorithm from I. Wapinski, formerly of the Regev group at the Broad Institute.
  • Yeast gene order browser (YGOB) for Saccharomyces spp and CGOB for Candida spp.
  • OrthoMCL database based on whole genomes, not a ton of fungi but useful starting set.
  • Ensembl Genomes provides ortholog prediction as part of the Compara pipeline though there is a limited phylogenetic diversity in the current Ensembl Fungal genomes.
  • TreeFam has Saccharomyces cerevisiae and Schizosaccharomyces pombe as the two fungi included in the curated ortholog assignments and phylogenies.
  • SIMAP provides pre-computed similarities among all proteins in UniProt.
  • InParanoid provides pretty comprehensive coverage of 100 available whole genomes, including many fungal genomes which I tried to help select.
  • JGI’s Mycocosm attempts to provide a fungal-focused paralog/gene family look at clusters of genes based on whole genomes.
  • E-Fungi is also an attempt at automated clustering with some fancy webservices logic.
  • Fungal Transcription Factor database focused just on families of transcription factors.

Some of these tools are better than others in terms of providing downloadable tables. Another problem is which identifiers are used. Many biologists use gene names or locus identifiers, not UniProt/GenPept IDs, to identify genes or proteins of interest. So tools that just cluster UniProt data aren’t as useful as those which refer to the gene or locus names. Also, providing a way to download all the data from a comparison is important for further mining and grouping of the data or cross-referencing local datasets. Plugging in gene IDs one by one is not really an approach that respects the idea that your user wants to ask sophisticated queries.

Also, beware that some approaches produce pairwise comparison lists whereas others find orthologous groupings. So if you want to find the Rad59 ortholog from all fungi it may be easier or harder depending on the source.
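
To make the pairwise-versus-group distinction concrete, here is a minimal sketch of turning a downloaded table of pairwise ortholog calls into groups. It assumes simple single-linkage merging with a union-find structure, which is cruder than what any of the databases above actually do, and the gene identifiers are hypothetical placeholders rather than real locus names.

```python
from collections import defaultdict

def group_orthologs(pairs):
    """Merge pairwise ortholog calls into groups by single linkage (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return list(groups.values())

def find_group(groups, gene):
    """Return the group containing `gene`, or an empty set if it is absent."""
    for group in groups:
        if gene in group:
            return group
    return set()

# Hypothetical pairwise calls: (species|gene, species|gene)
pairs = [
    ("Scer|RAD59", "Ncra|geneA"),
    ("Ncra|geneA", "Anid|geneB"),
    ("Scer|RAD52", "Spom|geneC"),
]
groups = group_orthologs(pairs)
rad59_group = find_group(groups, "Scer|RAD59")
```

Single linkage will happily chain paralogs together if the pairwise list contains any, so a result like this is a starting point for inspection, not a substitute for a tree-based or curated grouping.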

[I may make this a static page in the future to allow for more detailed updating since I know the available resources wax and wane]

Horizontal gene transfer from Zygo to pea aphid

Another result from the analysis of the recently published genome of the pea aphid, Acyrthosiphon pisum. Nancy Moran and Tyler Jarvik present a study of the origin of the carotenoid production gene in pea aphid. Animals typically cannot make carotenoids so they sought to discover how this is possible. They find that it is derived from a horizontal gene transfer event of a fungal gene into the aphid lineage. This gene is responsible for the red-green color polymorphism in the aphid. It appears the gene is derived from a ‘zygomycete’ or relative in the early branching lineage of the fungi. One gene, a carotenoid desaturase, is encoded in a 30kb genomic region that is missing in green aphids but present in the red morphs. The region is apparently maintained in the population by frequency dependent selection since each color has an advantage or disadvantage for evading detection by predators in different environments.

Reports of eukaryotic HGT events from fungi to animals are quite rare so this finding is surprising in that sense, but the authors argue that the important ecological role of carotenoids suggests we might see even more examples if we look harder.

Moran, N., & Jarvik, T. (2010). Lateral Transfer of Genes from Fungi Underlies Carotenoid Production in Aphids. Science, 328 (5978), 624-627. DOI: 10.1126/science.1187113

Does gene function predict molecular evolutionary rate?
Gene sequences evolve at different rates due to different constraints, such as chromosome position, functional constraint, and status as a single-copy or multi-copy gene. In a recent paper from Allen Rodrigo (the new NESCent director, by the way, congrats!), the authors hypothesize that correlation in the branch lengths of gene trees suggests the genes operate in the same pathway or have a similar function. To test this they took alignments of orthologous genes from 10 bacterial species, seeded with E. coli as the target species. The alignments were used to build trees with MrBayes and only those which recovered the known species topology were retained. The ortholog groups were assigned GO terms via similarities.

They then looked at the branch lengths of gene trees and found a correlation between GO categories and rates of gene evolution/shape of the tree. I’ll not go into more details here but I think this is an interesting finding that is probably not so surprising when you think about it. I’d be very curious to see if this holds up in multi-domain proteins as well, and of course taking this approach for a drive in fungal orthologs would be an interesting project for someone to try.
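
As a rough illustration of what "correlated branch lengths" could mean, here is a small sketch that compares two gene trees sharing one topology by the Pearson correlation of their branch lengths. Scaling each tree by its total length and using Pearson correlation are my assumptions for the sketch, not necessarily the statistic used in the paper, and the branch lengths are made-up toy numbers.

```python
import math

def relative_lengths(branch_lengths):
    """Scale branch lengths by total tree length so overall rate drops out."""
    total = sum(branch_lengths)
    return [b / total for b in branch_lengths]

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy branch lengths, listed in the same branch order for every gene tree.
gene_a = [0.10, 0.20, 0.05, 0.40]
gene_b = [0.20, 0.40, 0.10, 0.80]  # twice as fast overall, same tree shape
gene_c = [0.40, 0.05, 0.20, 0.10]  # same branches, different shape

r_ab = pearson(relative_lengths(gene_a), relative_lengths(gene_b))
r_ac = pearson(relative_lengths(gene_a), relative_lengths(gene_c))
```

This only works because every retained gene tree has the same topology, so branch i in one tree is the same split as branch i in another, which is why filtering on the known species topology matters.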

Li WL, & Rodrigo AG (2009). Covariation of branch lengths in phylogenies of functionally related genes. PloS one, 4 (12) e8487. PMID: 20041191. doi:10.1371/journal.pone.0008487

Monophyly of Taphrinomycotina

A recent paper in MBE presents evidence that the Taphrinomycotina (containing S. pombe and Pneumocystis) are in fact a monophyletic group. This is considered an early branch in the Ascomycota, alongside the Pezizomycotina (filamentous ascomycete fungi like Neurospora and Aspergillus) and Saccharomycotina (fungi mainly with yeast forms including Candida and Saccharomyces). The monophyly of Taphrinomycotina fungi has been fairly well accepted, but a few publications report conflicting evidence in some sets of gene trees. This conflict is most likely due to Long Branch Attraction (LBA), and the Philippe lab has long worked on this problem, developing tools like PhyloBayes that attempt to correct for LBA with a parameter-rich model and lots of data (like whole genomes). These authors are employing phylogenomics in the sense that multiple genes are used to reconstruct the phylogeny. This use is different from the J.Eisen/Sjölander sense which is to infer gene function from a phylogeny.

This paper presents evidence using proteins of 113 mitochondrial and nuclear genes and finds strong statistical support for this monophyly. They also note that it was necessary to remove fast evolving sites from a dataset of only mitochondrial genes in order to overcome LBA artifacts that led to a Saccharomyces and S. pombe sister relationship in previous analyses.
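
To give a feel for what removing fast evolving sites involves, here is a deliberately crude sketch that scores each alignment column by the number of distinct residues it contains and drops the most variable columns. The scoring proxy is my assumption; the post does not say how the authors ranked sites, and a real analysis would more likely use model-based per-site rate estimates. The sequences are toy data.

```python
def drop_fastest_sites(alignment, n_drop):
    """Remove the n_drop most variable columns from an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Variability is scored as the count of distinct non-gap residues per column.
    """
    names = list(alignment)
    n_cols = len(alignment[names[0]])

    scores = []
    for i in range(n_cols):
        residues = {alignment[name][i] for name in names} - {"-"}
        scores.append(len(residues))

    # Most variable columns first; ties are broken by column position.
    ranked = sorted(range(n_cols), key=lambda i: (-scores[i], i))
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(n_cols) if i not in dropped]

    return {name: "".join(alignment[name][i] for i in kept) for name in names}

# Toy alignment: column 2 (0-based) varies in every sequence.
toy = {
    "taxon1": "MKTAL",
    "taxon2": "MKSAL",
    "taxon3": "MRQAL",
    "taxon4": "MKWAL",
}
filtered = drop_fastest_sites(toy, n_drop=1)
```

The trade-off is that the fastest sites are both the most likely to be saturated (and so to mislead on long branches) and a source of signal for recent splits, so how many to drop is a judgment call.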

This paper also uses the Pneumocystis genome sequence, which helps resolve its placement and should eventually aid in understanding the evolution of this pathogen. In this tree the sister group to Pneumocystis is Schizosaccharomyces but both lineages have very long branches. The Saitoella lineage is basal in this paper, which is different from what was found with a 4 gene (AFTOL) dataset (see Figure 2). Further work sampling more genes from these Taphrina lineages will likely help resolve the intra-clade relationships.

Y. Liu, J. W. Leigh, H. Brinkmann, M. T. Cushion, N. Rodriguez-Ezpeleta, H. Philippe, B. F. Lang (2008). Phylogenomic Analyses Support the Monophyly of Taphrinomycotina, including Schizosaccharomyces Fission Yeasts. Molecular Biology and Evolution, 26 (1), 27-34. DOI: 10.1093/molbev/msn221

Fun with estimating divergence times

Estimating divergence times is notoriously difficult and the field can be downright rancorous, with some being accused of reading tea leaves and chicken entrails. It makes interesting reading for the personalities as much as the different scientific approaches. There are several different approaches to estimating a divergence time among species, using calibration points usually anchored by fossil data. Molecular clock methods sometimes produce extremely old dates that are quite hotly debated. In fungi we have very few fossils (and their placement on the phylogeny is debated).

There are quite a few available methods for reconstructing divergence times including r8s and multidivtime, which start with various types of trees and use calibration time points that are typically informed by fossil dates. The simplest approaches assume a molecular clock (rates are the same across the tree) and then one only needs to calibrate the number of substitutions (or rate really) to time to determine how phylogenetic tree branch lengths map to time. The BEAST package also does phylogenetic inference and divergence time estimation across a sample of trees (and provided the necessary analysis for exoneration of the Tripoli Six). BEAST (and MrBayes) use MCMC to sample the space of parameters and tree space to identify phylogenies and evolutionary parameters that explain the data (an alignment of sequences).
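
The strict-clock arithmetic is simple enough to sketch. Assuming an ultrametric tree where every node's depth (its distance to any descendant tip) is measured in substitutions per site, one calibrated node fixes the rate, and every other node age follows by division. The numbers below are made up for illustration and the function names are my own.

```python
def clock_rate(calibrated_depth, calibrated_age):
    """Rate in substitutions/site per unit time, from one calibrated node."""
    return calibrated_depth / calibrated_age

def node_age(depth, rate):
    """Age of a node given its depth in substitutions/site and the clock rate."""
    return depth / rate

# Made-up calibration: a node 0.30 subs/site deep, dated by a fossil to 150 Myr.
rate = clock_rate(calibrated_depth=0.30, calibrated_age=150.0)

# Any other node on the same tree is then dated from its depth alone.
other_age = node_age(depth=0.12, rate=rate)  # roughly 60 Myr
```

Everything hinges on that single rate, which is why a misplaced fossil or rate variation among lineages can push the estimated dates far off, and why relaxed-clock methods exist.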

A paper from Akerborg and colleagues introduces a new approach that applies a few twists: it uses a birth-death model that doesn’t assume a molecular clock and employs a hill-climbing algorithm instead of MCMC to find parameter optima. They use a maximum a posteriori (MAP) framework which is more computationally efficient than MCMC. They couple the MAP approach with a dynamic-programming approach that separates the estimation of rates (branch length) from the estimation of times (which often requires the assumption of a molecular clock). While I can’t speak with much authority on the MAP approach or yet how well this compares on different datasets, it suggests a different method to tackle these problems. The authors point out that one drawback of their approach is that it only allows for derivation of point estimates, so statistical confidences like bootstrap support are not easily calculated. Their software, called PRIME, is available here and I will be curious to see how it performs in other people’s hands.

Akerborg, O., Sennblad, B., Lagergren, J. (2008). Birth-death prior on phylogeny and speed dating. BMC Evolutionary Biology, 8(1), 77. DOI: 10.1186/1471-2148-8-77

Cryptococcus species delineation

What delineates species boundaries in fungi? Much work has been done on biological and phylogenetic species concepts in fungi. Some concepts are reviewed in Taylor et al 2006 and in Taylor et al 2000, and applications can be seen in several pathogens such as Paracoccidioides, Coccidioides, and the model filamentous (non-pathogenic) fungus Neurospora.

A paper in Fungal Genetics and Biology on species definitions in Cryptococcus neoformans from multi-locus sequencing seeks to provide additional treatment of the observed diversity. In this large study, 117 Cryptococcus isolates were examined through multi-locus sequencing (6 loci), and the authors identified two monophyletic lineages within C. neoformans that correspond to var. neoformans and var. grubii. Within the C. gattii samples, however, they identified four monophyletic groups, consistent with deep divergences observed from whole genome trees for two strains of C. gattii, MLST, and AFLP studies. By first defining species, we can now test whether any of the species groups have different traits including prevalence in clinical settings and in nature.

Bovers, M., Hagen, F., Kuramae, E., Boekhout, T. (2007). Six monophyletic lineages identified within Cryptococcus neoformans and Cryptococcus gattii by multi-locus sequence typing. Fungal Genetics and Biology. DOI: 10.1016/j.fgb.2007.12.004

Taking into account alignment uncertainty

WrightFisher talks about a paper and the accompanying commentary in Science describing how alignment uncertainty should be taken into account when doing phylogenetic analyses on genomic datasets (some might call this phylogenomics, but Dr Eisen won’t). If the sequence alignment is treated as a random variable (and, in Bayesian approaches, given a prior based on results from an alignment program) then more accurate reconstruction is possible. Robin points out several statistical alignment approaches that do just this including TKF91 and recent work that unifies a probabilistic framework with transducers.

Willi Hennig Superstar

The Willi Hennig Society, homebase for all good cladists, has subsidized the license fee for TNT so that it is now a freely available program (although it is not open-source). TNT implements phylogenetic analysis under parsimony with a fast tree searching algorithm. I believe TNT was one of the software tools that CIPRES was targeting for optimization as well so this may reflect some of that work.

From EvolDir.