Have a look at this post by Larry Moran on Takao Kasuga’s PLoS One paper on the phylogenetic distribution of genes in the N. crassa genome.
The interesting next step with this paper, some of which we’re exploring as part of the Neurospora tetrasperma and N. discreta genome sequencing, is to ask how many of these N. crassa genes are at least shared with other Neurospora spp., and whether they show a nucleotide conservation pattern that suggests they are protein-coding genes. We also have some RNA-Seq and microarray gene expression data to test whether these species-specific genes are expressed under any conditions. So far there isn’t much evidence to throw out many of the 10k or so genes as artifacts, but the analysis is still a work in progress.
I also think a further analysis worth doing is to cross-reference these genes with the results from the knockout project and their phenotypes. This will require downloading dumps from the Broad Institute database so that the phenotypes and annotations on their site can be mined, but I am hopeful that some progress will be made on that front in the next few months. This might help prioritize some of the uncharacterized genes which have phenotypes and are either lineage-specific or shared among all eukaryotes.
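Once the dumps are in hand, the cross-referencing itself is a simple join on gene identifier. Here is a minimal sketch, assuming the phylogenetic class, the annotation text, and the knockout phenotypes have each been parsed into a dictionary keyed by gene ID. The class labels, the "hypothetical" test for an uncharacterized gene, and the gene IDs in the example are all placeholders I made up, not the actual categories from the paper or the format of the Broad dumps.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the two classes of interest.
PRIORITY_CLASSES = {"lineage-specific", "eukaryote-core"}


def prioritize(
    gene_class: Dict[str, str],
    annotation: Dict[str, str],
    phenotypes: Dict[str, List[str]],
) -> List[Tuple[str, str, List[str]]]:
    """Return uncharacterized genes in a priority class that have a phenotype.

    A gene counts as uncharacterized here if its annotation is missing
    or contains the word "hypothetical" (an assumption for this sketch).
    """
    hits = []
    for gene, cls in gene_class.items():
        if cls not in PRIORITY_CLASSES:
            continue
        description = annotation.get(gene, "hypothetical protein")
        if "hypothetical" not in description.lower():
            continue
        observed = phenotypes.get(gene, [])
        if observed:
            hits.append((gene, cls, observed))
    return sorted(hits)


# Toy data with made-up gene IDs and phenotypes.
gene_class = {
    "geneA": "lineage-specific",
    "geneB": "eukaryote-core",
    "geneC": "fungi-specific",
    "geneD": "eukaryote-core",
    "geneE": "eukaryote-core",
}
annotation = {
    "geneA": "hypothetical protein",
    "geneB": "actin",
    "geneC": "hypothetical protein",
    "geneD": "conserved hypothetical protein",
    "geneE": "conserved hypothetical protein",
}
phenotypes = {
    "geneA": ["slow growth"],
    "geneB": ["lethal"],
    "geneC": ["no conidia"],
    "geneD": [],
    "geneE": ["reduced aerial hyphae"],
}

for gene, cls, observed in prioritize(gene_class, annotation, phenotypes):
    print(gene, cls, "; ".join(observed))
```

With the toy data only geneA and geneE come out: geneB already has a known function, geneC is in a class we aren't prioritizing, and geneD has no recorded phenotype.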
A better assignment of function to the genes that fall in the ‘shared across eukaryotes’ class but have no annotated function could also be undertaken using phylogenomic approaches. If they are really shared across multiple species, there may be some annotation that can be reliably and automatically transferred. If there are some universally shared genes with no known or well-studied phenotype, those would be some of the first I’d order from the strain collection and get cracking at phenotyping.