Fungal genome assembly from short-read sequences

This is a research blog, so I thought I’d post some quick numbers we are seeing for de novo assembly of the Neurospora crassa genome using Velvet. The genome of N. crassa is about 40Mb, and several flow cells were sequenced using Solexa/Illumina technology to see what kind of de novo reconstruction we’d get. I knew that this is probably insufficient for a very good assembly given what has been reported in the literature, but sometimes it is helpful to give it a try on local data. This has mostly been a SNP discovery project from the outset. I ran Velvet on an early (2FC) and a later (4FC) dataset with a hash size of 21, chosen based on some calculations and on running it with different hash sizes to see which gave the best N50. The summary contig size numbers come from the following command, using cndtools from Colin Dewey.

  faLen < contigs.fa | stats

2 flowcells (~10M reads @36bp/read or about 10X coverage of 40Mb genome)

            N = 199562
          SUM = 25463251
          MIN = 49
       MEDIAN = 107.0
          MAX = 5371
         MEAN = 127.59568956
          N50 = 130

4 flow cells  (~20M reads @36bp/read; or about 20X coverage of a 40Mb genome)

            N = 102437
          SUM = 38352075
          MIN = 41
 1ST-QUARTILE = 77.0
       MEDIAN = 153
          MAX = 7189
         MEAN = 374.396702363
          N50 = 837
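For anyone without cndtools installed, the N50 figure can be approximated with a few lines of Python. This is only a sketch under one common definition of N50 (the largest length L such that contigs of length L or more hold at least half of the assembled bases); I have not checked that it matches the `stats` tool in every edge case, and the function names `n50` and `read_fasta_lengths` are made up for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L together hold at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def read_fasta_lengths(path):
    """Yield the sequence length of each record in a FASTA file."""
    length = None  # None until the first header line is seen
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length
```

Something like `n50(read_fasta_lengths("contigs.fa"))` should then give a number comparable to the N50 lines above.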

So that’s an N50 of 837bp. For those used to seeing N50 on the order of 1.5Mb this is not great, but it comes from 4 FC worth of sequencing, which was pretty cheap. This is a reasonably repeat-limited genome, so we should get pretty good recovery if the seq coverage is high enough. Using Maq we can both scaffold the reads and recover a sufficient number of high quality SNPs for the mapping part of the project.
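The coverage figures quoted for the two datasets are simple arithmetic, sketched below. The read counts are the rough ~10M and ~20M numbers from this post, so the results are approximate. The k-mer coverage relation Ck = C * (L - k + 1) / L is the one given in the Velvet manual; applying it here with k = 21 is my own back-of-the-envelope check, not part of the original analysis.

```python
def base_coverage(n_reads, read_len, genome_size):
    """Expected per-base coverage: total sequenced bases / genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(coverage, read_len, k):
    """k-mer coverage from base coverage: Ck = C * (L - k + 1) / L."""
    return coverage * (read_len - k + 1) / read_len


GENOME_SIZE = 40_000_000  # ~40 Mb, as stated in the post
READ_LEN = 36             # bp per read
K = 21                    # Velvet hash size used here

# Read counts are the approximate figures quoted above.
for label, n_reads in [("2 flow cells", 10_000_000),
                       ("4 flow cells", 20_000_000)]:
    c = base_coverage(n_reads, READ_LEN, GENOME_SIZE)
    ck = kmer_coverage(c, READ_LEN, K)
    print(f"{label}: {c:.1f}X base coverage, {ck:.1f}X k-mer coverage")
```

With 36bp reads and a hash size of 21, the k-mer coverage comes out at less than half of the per-base coverage, which may be part of why the contigs stay short at this depth.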

To get a better assembly one would need much deeper coverage, as Daniel and Ewan explain in their Velvet paper and show in Figure 4 (sorry, not open-access for 6 mo). Full credit: These data are unpaired reads from Illumina/Solexa genomic sequencing done at the UCB/QB3 facility on libraries prepared by Charles Hall in the Glass lab.

Will you always be able to satisfy that chocolate craving?

NPR had a story this weekend on cocoa plantation collapse and the ecological aftermath of the changes the witches’ broom fungus Moniliophthora perniciosa has wreaked. The genome sequence project for this Homobasidiomycete fungus (also known as Crinipellis perniciosa, phylogenetic relationships discussed by Aime and Philips-Mora 2005) is underway at the Laboratory Genomica e Expressao at UNICAMP, Brazil. The witches’ broom (not this witches’ broom) is named for the bristly form it induces in the cacao plants.

The genome project will hopefully improve the diagnosis and treatment work that is needed.  Beyond the insatiable need for chocolate, the NPR story does talk about the impact on farmers, the economy, and the environment with the loss of these cacao plantations.

I was also browsing some articles on other fungi that inhabit cacao plants and saw a recent survey that includes fungi that produce mycotoxins.

Amphibian skin bacteria shown to fight off Batrachochytrium dendrobatidis.

A year ago researchers at James Madison University discovered that Pedobacter cryoconitis, a bacterium first found on the skin of red-backed salamanders, prevents the growth of the chytrid B. dendrobatidis, which is currently decimating frog populations.

(Mountain Yellow-Legged Frog from wikipedia)

The newest research on the subject is being presented this year at ASM by Brianna Lam who worked with other biologists from both San Francisco State University and JMU.

Lam’s research indicates that adding pedobacter to the skin of mountain yellow-legged frogs would lessen the effects of Batrachochytrium dendrobatidis (Bd), a lethal skin pathogen that is threatening remaining populations of the frogs in their native Sierra Nevada habitats.

Lam first conducted petri dish experiments that clearly showed the skin bacteria repelling the deadly fungus. She then tested pedobacter on live infected frogs, bathing some of them in a pedobacter solution. The frogs bathed in pedobacter solution lost less weight than those in a control group of infected frogs that were not inoculated.

In addition to the lab experiments, the JMU and SFSU researchers have studied the yellow-legged frogs in their natural habitats and discovered that some populations with the lethal skin disease survive while others go extinct. The populations that survived had significantly higher proportions of individuals with anti-Bd bacteria. The results strongly suggest that a threshold frequency of individuals need to have anti-Bd bacteria to allow a population to persist with Bd. (from EurekAlert)

The research above is really interesting, and I am curious as to how the bacteria are actually killing the chytrid. The only other research I can think of where chytrids were being killed was described in a BBC news article about scientists bathing frogs in chloramphenicol.

Penicillium marneffei project

We’re excited that a Penicillium marneffei grant to Mat Fisher and collaborators has been funded by the Wellcome Trust. It includes a collaboration among the Bignell Lab at Imperial College, our lab, JCVI, and Univ of Melbourne. This project will explore functional and comparative genomics approaches to studying the fungus, which primarily infects immunocompromised individuals in Southeast Asia, where it is found associated with bamboo rats.

Scientists at Imperial College London have received almost £350 000 from the Wellcome Trust, the UK’s largest medical research charity, to study Penicillium marneffei, the only Penicillium fungus to cause serious disease in humans. The researchers aim to find out what makes this particular fungus pathogenic.

Read the rest of the release.

Basidiomycete genomes galore

Just finished attending Genetics and Cell Biology of Basidiomycetes in Cape Girardeau, MO, which was an intimate gathering of basidiomycetaphiles. I learned about systems that are used for studying fruiting body development, genetic mapping, pheromone and mating genes, kinesin dynamics, meiotic gene regulation, and a host of other topics. I’m happy I got a chance to meet more folks in the community and to learn where informatics and computational approaches are really needed to push along some of the interpretation of the more than a dozen basidiomycete genomes. In particular it sounds like the Pleurotus, Schizophyllum, Agaricus bisporus, and Serpula genomes are all marching along to completion, with some already in 4X assembly or further.

GCBBVI Group Picture

So we’ll soon have more genome sequences from key model and some less-model species to assist researchers working on many different mushroom-forming fungi, ranging from brown and white-rotting saprophytic fungi to mycorrhizal fungi that associate with plants. I’m also excited about the work to make transformation and knockouts more readily achievable in these systems, to push their genetics and cellular biology even further. The genome sequences will be another tool in these endeavors.

The last day ended with a discussion about genome annotation and future support for curating gene models. Basically everyone is unhappy with computational predictions and wants to be able to go in and fix things. (I think people remember the ones that are gotten wrong more readily than the ones that were right, but computational prediction definitely performs poorly in some situations.) In this Web 2.0-land we live in, this is still not something easily done with any of the freely available genome browsing tools. The JGI’s browser was lauded for its ability to handle these kinds of requests, but how do we proceed when genomes are not sequenced by that center, or when (in the not too distant future) communities are able to sequence a genome themselves using 454/Illumina-Solexa/Helicos/Pacific Biosciences approaches in their own lab? There is still a huge lag in what kinds of tools researchers can use to annotate genomes, to fix gene models and add functions. Hopefully projects like GMOD will continue to develop useful tools for solving these needs, but there is certainly a need for better support of distributed community annotation of genomes where there is little direct money for supporting curators from a single place.