Who's in the Video
Dr. Francis Collins has served as the director of the National Institutes of Health since August, 2009. He is the former director of the National Human Genome Research Institute, where[…]

The former director of the National Human Genome Research Institute describes how researchers compare DNA sequences to pinpoint which genes cause which diseases.

Question: During your work on the human genome, you linked many genes to specific diseases. How are these links established?

Francis Collins: It’s too bad you can’t actually see DNA easily under a microscope, scan across a double helix and read out the sequence of bases that amounts to the information content, because then it would be easier, I think, to explain how a geneticist goes about tracking down the basis of a disease at the molecular level. Our methods are indirect; they’re very powerful, they’re really highly accurate, but they’re not as visual as you might like. We do have methods now, though, that allow you to read out with high accuracy all three billion of the letters of the DNA instruction book. Those letters are actually chemical bases. The chemical language of DNA is a simple one: there are only four letters in the alphabet, the bases that we abbreviate A, C, G and T. And we have methods for comparing the DNA sequence of people who have a disease with that of people who don’t, looking for the critical differences in order to nail down something that might be the cause.

Since, however, we all differ in our DNA sequence by about a half of one percent, you wouldn’t get very far if you simply sequenced my DNA and the DNA of somebody with Parkinson’s disease and tried to figure out what the differences were, because there would be way too many of them. But if you’re willing to do that for a large number of people, you kind of average out all the noise, and the difference that matters becomes more and more clear. That’s an overly simplified description of how a geneticist goes about zeroing in on the actual molecular cause of a complex or a simple disease. This works most readily for diseases that are highly heritable, such as cystic fibrosis and Huntington’s disease: conditions where a single mutation very reproducibly results in the disease.
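
The idea of averaging out the noise across many people can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the method Collins's lab uses: the function names, the five-letter toy sequences, and the scoring rule (the largest gap in base frequency between the two groups at each position) are all hypothetical.

```python
from collections import Counter

def base_frequencies(sequences, position):
    """Return the fraction of sequences carrying each base at one position."""
    counts = Counter(seq[position] for seq in sequences)
    total = len(sequences)
    return {base: counts[base] / total for base in "ACGT"}

def rank_positions(cases, controls):
    """Score each position by the largest case-vs-control frequency gap.

    Returns (gap, position) pairs, largest gap first.
    """
    length = len(cases[0])
    scores = []
    for pos in range(length):
        case_freq = base_frequencies(cases, pos)
        control_freq = base_frequencies(controls, pos)
        gap = max(abs(case_freq[b] - control_freq[b]) for b in "ACGT")
        scores.append((gap, pos))
    return sorted(scores, reverse=True)

# Hypothetical toy data: aligned sequences of equal length.
# Positions 0 and 4 vary between individuals but equally in both groups (noise);
# position 2 differs systematically between the groups (signal).
cases = ["ACGTA", "ACGTT", "TCGTA", "ACGTA"]
controls = ["ACCTA", "ACCTT", "TCCTA", "ACGTA"]

best_gap, best_position = rank_positions(cases, controls)[0]
print(best_position, best_gap)
```

In this toy data, comparing any single case with any single control would turn up several differences, but only position 2 stands out once the frequencies are compared across the two groups. Real studies involve far more people and positions and use statistical tests rather than a raw frequency gap.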

It’s been a lot tougher for diseases where the inheritance is muddy. If you take diabetes, for instance, which is what my lab primarily works on, or asthma or high blood pressure, those are not conditions where one gene is involved in risk. There are dozens of genes involved, and no single one of them contributes very much, but put them all together and the consequence for that individual may be to tip them over the threshold into having the illness. We’re in the throes right now of trying to sort that part out for the common diseases that we know have hereditary influences because they run in families, but that are much more complicated than, say, cystic fibrosis.
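
The picture of many small contributions adding up to cross a threshold can be sketched as a simple additive model. This is only an illustration under stated assumptions: the number of variants, the effect sizes (arbitrary integer units), the threshold, and the two genotypes are invented, and the model ignores every non-genetic influence on risk.

```python
def liability(genotype, effect_sizes):
    """Sum the small per-variant effects.

    genotype holds the number of risk alleles (0, 1, or 2) at each variant.
    """
    return sum(count * effect for count, effect in zip(genotype, effect_sizes))

def exceeds_threshold(genotype, effect_sizes, threshold):
    """Return True when the combined effect reaches the threshold."""
    return liability(genotype, effect_sizes) >= threshold

# Hypothetical values: 24 variants, each contributing one arbitrary unit of risk.
effect_sizes = [1] * 24
threshold = 20

person_a = [1] * 24      # one risk allele at every variant
person_b = [0, 1] * 12   # one risk allele at half of the variants

print(liability(person_a, effect_sizes), exceeds_threshold(person_a, effect_sizes, threshold))
print(liability(person_b, effect_sizes), exceeds_threshold(person_b, effect_sizes, threshold))
```

No single variant in this sketch comes close to the threshold on its own; only the sum does, which is part of what makes each individual contribution hard to detect.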

Question: Was there anything that totally surprised you in your research on the genome?

Francis Collins: There were a lot of surprises, a lot of times where you just marveled at what you had uncovered and felt like you must have really somehow missed it when you were making guesses about what would be there. I guess the one that startled most of us the most profoundly was how few protein-coding genes there actually are in the genome. Under the old paradigm of DNA-makes-RNA-makes-protein, where a stretch of DNA is going to make a protein, how many genes does it take to specify a human being? You would think there would be an exorbitant number. Various estimates put forward before we knew the answer were in the neighborhood of 100,000 to 150,000. Ultimately, it turns out we have only about 20,000 protein-coding genes: a breathtakingly short list of instructions for an organism as complex as Homo sapiens.

There are other genes that don’t code for protein that are turning out to be pretty important, so in a certain way we’re rescuing our sense of complexity by discovering that there are other categories of genes that don’t have to be of the protein-coding sort. But it is still astounding to think that just 20,000 of these protein-coding genes are enough to take a single cell, which we all once were, and inspire this program of elaborate, complex development into a human being, including a nervous system that is beyond our ability at the present time to even quite contemplate because of its complexity.

Recorded September 13, 2010
Interviewed by David Hirschman

