This is a research blog, so I thought I’d post some quick numbers we are seeing for de novo assembly of the Neurospora crassa genome using Velvet. The N. crassa genome is about 40Mb, and we sequenced several flow cells using Solexa/Illumina technology to see what kind of de novo reconstruction we’d get. I knew that this is probably insufficient for a very good assembly given what has been reported in the literature, but sometimes it is helpful to give it a try on local data. From the outset this has mostly been a project about SNP discovery. I ran Velvet with a hash size of 21 on an early (2FC) and a later (4FC) dataset; that value was chosen based on some calculations and on running Velvet with different hash sizes to find the best N50. Summary contig size numbers come from the following command, using cndtools from Colin Dewey.
faLen < contigs.fa | stats
2 flowcells (~10M reads @36bp/read or about 10X coverage of 40Mb genome)
N = 199562
SUM = 25463251
MIN = 49
1ST-QUARTILE = 87
MEDIAN = 107.0
3RD-QUARTILE = 146
MAX = 5371
MEAN = 127.59568956
N50 = 130
4 flow cells (~20M reads @36bp/read; or about 20X coverage of a 40Mb genome)
N = 102437
SUM = 38352075
MIN = 41
1ST-QUARTILE = 77.0
MEDIAN = 153
3RD-QUARTILE = 467
MAX = 7189
MEAN = 374.396702363
N50 = 837
So that’s an N50 of 837bp; for those used to seeing N50 on the order of 1.5Mb this is not great. But it comes from 4 FC worth of sequencing, which was pretty cheap. This is a reasonably repeat-limited genome, so we should get pretty good recovery if the sequence coverage is high enough. Using Maq we can both scaffold the reads and recover a sufficient number of high-quality SNPs for the mapping part of the project.
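For anyone who wants to reproduce this kind of summary without cndtools, here is a minimal Python sketch of the N50 calculation. It uses the common definition (the contig length at which the running total of contigs, sorted longest first, reaches half of the assembled bases); I am assuming, not asserting, that the `stats` tool uses the same definition. The function names `contig_lengths` and `n50` are my own for illustration.

```python
def contig_lengths(fasta_lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 under the common definition (an assumption about 'stats').

    Contigs are sorted longest first; the N50 is the length of the contig
    at which the running sum first reaches half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    toy = [">c1", "ACGTACGT", ">c2", "ACGT", ">c3", "AC", "GT"]
    lens = contig_lengths(toy)
    print(lens, n50(lens))
```

To run it on a real assembly, pass an open file handle for contigs.fa to `contig_lengths` instead of the toy list.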
To get a better assembly one would need much deeper coverage, as Daniel and Ewan explain in their Velvet paper and show in Figure 4 (sorry, not open-access for 6 mo). Full credit: These data are unpaired reads from Illumina/Solexa genomic sequencing done at the UCB/QB3 facility on libraries prepared by Charles Hall in the Glass lab.
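The coverage figures quoted for the two datasets are back-of-the-envelope numbers: reads times read length, divided by genome size. A small sketch of that arithmetic, plus the fraction of the genome represented in the contigs (the SUM line divided by genome size), is below. The read counts are the approximate ones given above, and the function names are mine for illustration.

```python
GENOME_SIZE = 40_000_000  # approximate N. crassa genome size in bp, from the post


def expected_coverage(n_reads, read_length, genome_size=GENOME_SIZE):
    """Average depth if reads were spread uniformly across the genome."""
    return n_reads * read_length / genome_size


def assembled_fraction(contig_sum, genome_size=GENOME_SIZE):
    """Total contig bases (the SUM line) as a fraction of the genome size."""
    return contig_sum / genome_size


if __name__ == "__main__":
    # (label, approximate read count, SUM of contig lengths reported above)
    datasets = [("2FC", 10_000_000, 25_463_251), ("4FC", 20_000_000, 38_352_075)]
    for label, n_reads, contig_sum in datasets:
        depth = expected_coverage(n_reads, 36)
        frac = assembled_fraction(contig_sum)
        print(f"{label}: {depth:.1f}X coverage, {frac:.2f} of genome in contigs")
```

With these inputs the depth comes out at 9X and 18X (rounded to "about 10X" and "about 20X" above), and the contigs sum to roughly 0.64 and 0.96 of the 40Mb genome. Contig totals can include redundant sequence, so the second number is only a rough guide to how much of the genome was recovered.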
The transcriptional landscape of yeast has been (further) defined with Solexa sequencing in a method dubbed “RNA-Seq”, but what I would call “deep EST sequencing”. This approach for transcriptional profiling by sequencing alone is sure to be used by many labs looking for lower-cost and more complete ways to describe and quantitate the full population of transcripts in an organism.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., Snyder, M. (2008). The Transcriptional Landscape of the Yeast Genome Defined by RNA Sequencing. Science DOI: 10.1126/science.1158441
Researchers from Technical University of Denmark published some interesting results from comparing expression across the very distinct Aspergillus species.
Kudos also go to the authors for making it Open Access. I am posting a few key figures below the fold because I can! They grew the fungi in bioreactors fermenting glucose or xylose. After calibrating the growth curves they were able to sample the appropriate time points for comparison of gene expression across these three species. They found a set of genes commonly expressed.
David Carter at the Sanger Centre emailed a message that new assemblies from the Saccharomyces strain resequencing project have been posted, including a new three-way alignment of S. bayanus–S. paradoxus–S. cerevisiae. This updates the Dec 2007 release.
Nature has an overview of what goes in and out of next generation sequencers with an interview with a smiling Chad Nusbaum from the Broad Institute. Most of these have been out and about for a while, but it seems that the hayride/bandwagon is starting to pick up more steam as GT’s Genome Scan has several posts about sequencing referencing J. Craig V, George Church, and the Nature news article (not free).
Note that Solexa is no longer the cool name: “Genome Analyzer” is the name for the machine that was previously called Solexa 1G. I’m holding out hope for funnier names in the future. I do feel that ABI’s choice of SOLiD is more exciting than 310/3700/3730, which are about as inspiring as HAL9000.
But I mean, if your technology is called pyrosequencing, I am hoping Roche will come up with a fiery or at least smoldering play on words if they rename 454 again (GS FLX for now).