
Mind the gaps

I’m a new PhD student at the National Institute for Medical Research, fresh out of university after graduating from King’s College London this summer. I’m in the systems biology department, and my research focuses on conserved non-coding elements (CNEs). To me, this makes sense. However, a lot of people look at me like I’m crazy. Not because I’ve decided to do a PhD in research science (which in itself may result in some form of psychological breakdown), but because, once I’ve explained a bit more about what exactly CNEs are, they realise I’m ‘just looking in the gaps between genes’.

It’s one of the easiest ways to explain what regulatory elements are: bits in the gaps. But that doesn’t make them any less important than the bits on either side of the gap, the genes themselves. Someone once told me that, statistically, I would never find anything, or that the chances of finding anything were so low that I wouldn’t have any results by the time I’m 30, let alone in the 4 years I have to complete my PhD, because “pretty much all the disease mutations we’ve found so far have been in the genes”. Aside from the fact that I realised I’d be only 3 years away from 30 by the time I finish my PhD (and subsequently scared myself rigid into making a life plan, including a tight schedule for engagement, marriage and children), I also replied with what I find a glaringly obvious answer: that’s because we’ve only looked in the genes so far.

Go back to before we developed sequencing technologies, when most of the diseases we knew about were put down to poor diet, the environment and cleanliness (or lack thereof); why bother looking in the genes? Now we’ve moved on and discovered hundreds of Mendelian and non-Mendelian disease-causing genes and mutations. However, by looking only at the coding regions of the genome, we are missing around 98.5% of the human DNA sequence. When the human genome was first sequenced, the surprise that its billions of bases held only around 20,000 genes led to much of the ‘gaps’ being labelled ‘junk’. Why, then, is some of this ‘junk’ so highly conserved over millions of years of evolution?

That’s essentially the idea behind everything I’m doing. There’s a set of CNEs conserved across vertebrates, so highly that we can compare them in zebrafish, pufferfish, mice and humans and find them virtually identical. If a sequence doesn’t change over that sort of evolutionary time and distance, surely it is important? We already know there is more to ‘junk’ DNA than the name suggests, so surely discrepancies, insertions, deletions and mutations in these regions could have phenotypic effects? That said, uncovering how far these variants contribute to disorders and anomalies is trickier than showing how a single base change in a coding region causes a genetic disorder, because we have yet to crack the code, grammar and spelling of these non-coding regulatory regions (if only it were as simple as the triplet-codon-to-amino-acid code of coding regions…). The principle behind the idea is that, in a region as highly conserved as the ones we’re investigating, a single base-pair change could make a dramatic difference precisely because it is not seen in wild-type organisms (the same goes for insertions and deletions).

However, we need to prove this. We need to decode the non-coding regions. We need to find a disease-causing mutation in these CNEs. We need to prove it through a functional assay. We need a PhD student to sequence cohorts of hundreds of people with developmental disorders and anomalies and then analyse the data to find these, oh wait… When we find them (because we will: others already have, and I’m a bright-eyed, bushy-tailed new PhD student who believes I’ll have some form of answer in the next 4 years, let alone by the time I’m 30…), hopefully it will slowly start to shift the balance of research from 99% exome sequencing towards a more even split between exome and regulome searching. Our genes are crucial to who we are, but we can’t just ignore all the ‘gaps’ in between. They’re full of lots of important stuff too!
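To make the conservation argument a little more concrete, here is a toy sketch in Python. It is a hypothetical illustration, not the method used in this project: it assumes the sequences from each species have already been aligned to the same length, measures what fraction of positions are identical in every species, and flags any base in a patient’s sequence that differs at a position all the species share. The sequences and function names (`column_identity`, `percent_identity`, `variants_at_invariant_sites`) are made up for illustration; real analyses rely on proper genome alignments and conservation-scoring tools.

```python
# HYPOTHETICAL toy alignment: invented sequences, not real CNE data.
# Assumes every sequence is already aligned and has the same length.
alignment = {
    "human":      "ACGTTGCAATGC",
    "mouse":      "ACGTTGCAATGC",
    "zebrafish":  "ACGTTGCTATGC",
    "pufferfish": "ACGTTGCAATGC",
}


def column_identity(seqs):
    """Fraction of sequences sharing the most common base at each column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common = max(set(column), key=column.count)
        scores.append(column.count(most_common) / len(column))
    return scores


def percent_identity(seqs):
    """Percentage of columns where every sequence carries the same base."""
    scores = column_identity(seqs)
    return 100.0 * sum(1 for s in scores if s == 1.0) / len(scores)


def variants_at_invariant_sites(reference_seqs, patient_seq):
    """(position, conserved base, patient base) for each change at a fully
    conserved column. Positions are 0-based."""
    scores = column_identity(reference_seqs)
    return [
        (i, reference_seqs[0][i], patient_seq[i])
        for i, score in enumerate(scores)
        if score == 1.0 and patient_seq[i] != reference_seqs[0][i]
    ]


species_seqs = list(alignment.values())
print(f"{percent_identity(species_seqs):.1f}% of columns fully conserved")

# HYPOTHETICAL patient sequence with a single base change (G -> T at position 2)
patient = "ACTTTGCAATGC"
print(variants_at_invariant_sites(species_seqs, patient))
```

Here 11 of the 12 columns are identical in all four species, and the patient’s G-to-T change at position 2 is flagged because no species carries anything other than G there. A change at position 7, where zebrafish already differs, would not be flagged, which is the intuition behind treating changes at fully conserved positions as the most suspicious.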
