Defining “gene”

The term “gene” might be tired, perhaps because it can have so many different meanings (don’t get us started on homolog!). We of course know that the one gene/one enzyme hypothesis and the central dogma fail to represent the full complexity of the RNA world, pre- and post-transcriptional gene regulation, and post-transcriptional modifications. An article in PLoS One, “Beyond the Gene” by Evelyn Fox Keller and David Harel, tackles the perhaps overly stretched definition of the gene.

I find that the definition often depends on what you want to do with the end product. As the article points out, in bioinformatics this is often about describing regions of sequence, so the Sequence Ontology description suffices:

“A gene is: ‘a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions’.”
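To make the “regions of sequence” view concrete, here is a minimal sketch of how that definition might be held in code. The `Region` and `Gene` classes, their field names, and the coordinate convention are my own assumptions for illustration; they are not the Sequence Ontology's schema or any library's API.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Region:
    """A located stretch of sequence (hypothetical type, not an SO class)."""
    kind: str   # e.g. "regulatory", "transcribed", "other_functional"
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive

    def __post_init__(self) -> None:
        if not 0 <= self.start < self.end:
            raise ValueError("need 0 <= start < end")


@dataclass
class Gene:
    """A locatable gene described only by the regions associated with it."""
    name: str
    seqid: str  # chromosome or contig identifier
    regions: List[Region] = field(default_factory=list)

    def span(self) -> Tuple[int, int]:
        """Smallest interval covering every associated region."""
        if not self.regions:
            raise ValueError("a gene with no regions is not locatable")
        return (min(r.start for r in self.regions),
                max(r.end for r in self.regions))

    def kinds(self) -> Set[str]:
        """Which kinds of functional region are attached to this gene."""
        return {r.kind for r in self.regions}
```

Note what the record leaves out: nothing here says what the gene does or how it is inherited, only where its pieces sit.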

In more general genetics terms this is about inherited material, so the authors quote Susan Lindquist's description that “genetics is about the inheritance of traits” rather than solely about the DNA material. This quote is from her research summary and is in the context of prions, which provide a mechanism for inheritance outside of nucleic acids.

The article does a very nice job of taking the reader through some of the different ideas around genes, and it highlights why a protein-coding region alone is not sufficient to define a gene given miRNAs and ncRNAs, binding sites and regulatory regions, and the nuances of epigenetics. They do throw a bone back to bioinformatics and the Sequence Ontology definition, saying:

“Yet, as the effort of those bioinformatics researchers indicates, there is a common denominator to many uses of that word, and even if it may seem hopeless to fit into the straight-jacket of the old concept of the gene, we do think that common denominator needs to be respected.”

The authors go on to propose a new jacket for the concepts: a dene that captures the notion of genetic transmission, a bene that describes behavior, and a genitor that links a dene to a bene. Clear? I think they’re generalizing genotype, phenotype, and the interaction between them into more formal terms, but maybe I’m thinking too classically here.

The article describes some examples to help define the terms.

  • The whole genome can be considered a dene.
  • The polypeptide coding region is also a dene (classic protein coding gene definition).
  • In cases of alternative splicing each isoform that produces a protein-coding unit is a dene.
  • Even things that affect mutation rate, like SSRs, are considered denes.
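
One way I can picture these terms is as small predicates. The sketch below is my own toy reading, not the authors' formalism: a dene is a true/false property of a DNA string, a bene is a true/false property of some observed behavior, and a genitor pairs the two. Treating the genitor as an implication (“if the dene holds, the bene should too”) is an assumption on my part, and every name here (`contains_region`, `has_repeat`, `trait_above`, `Genitor`) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Mapping

# Assumed encodings: a genome is a plain string, behavior is a dict of measurements.
Dene = Callable[[str], bool]
Bene = Callable[[Mapping[str, float]], bool]


def equals_genome(reference: str) -> Dene:
    """The whole genome as a dene: the sequence is exactly the reference."""
    return lambda genome: genome == reference


def contains_region(region: str) -> Dene:
    """A coding region (or one spliced isoform) as a dene: it occurs in the genome."""
    return lambda genome: region in genome


def has_repeat(unit: str, min_copies: int) -> Dene:
    """An SSR-style dene: `unit` repeated in tandem at least `min_copies` times."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    return lambda genome: pattern.search(genome) is not None


def trait_above(name: str, threshold: float) -> Bene:
    """A toy bene: a measured trait exceeds a threshold."""
    return lambda observed: observed.get(name, 0.0) > threshold


@dataclass(frozen=True)
class Genitor:
    """Links a dene to a bene (read here as: dene holds implies bene holds)."""
    dene: Dene
    bene: Bene

    def consistent_with(self, genome: str, observed: Mapping[str, float]) -> bool:
        return (not self.dene(genome)) or self.bene(observed)


genome = "TTATG" + "CA" * 5 + "GG"
link = Genitor(has_repeat("CA", 5), trait_above("mutation_rate", 1.0))
print(link.consistent_with(genome, {"mutation_rate": 2.0}))
```

Written this way, the four examples in the list are all the same kind of object, which I take to be the point: the whole genome, a coding region, an isoform, and a repeat are just different properties one can test a sequence for.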

I wonder how one classifies the units in the NMD (nonsense-mediated decay) regulation of nonsense isoforms that play a role in SR gene regulation. Is the NMD pathway a bene? What are the nonsense forms considered? They are produced from the pre-mRNA until the concentration of the SR proteins is low, and then productive splicing occurs. I suppose all the isoforms are considered denes here, whether or not they actually become proteins, if this is a regulated event.

Despite the somewhat dismissive tone towards bioinformatics initially, the authors go on to use terms like “Turing-computable truth-valued functions” and “Church/Turing thesis for biology”, so they are attempting to provide formal language to make aspects of “fuzzy” biological concepts computable. That is something the ontology consortia, with their noted shortcomings, have been doing.

I admit I’m not able to digest all of this during my evening reading, but I think it is an interesting step towards better formalization of the concepts of inheritable material and of behaviors or phenotypes, and towards the idea that the interaction between the behavior and the genetic material needs to be formalized as well.