Reconstructing the Genome

One of my other major interests is a computational problem called genome assembly.  The genome, again, is this large molecule, but the way we can sequence it is through little tiny fragments.  The analogy is something like this: take the dictionary, or some very big book (actually, take many copies of that same book), and shred it into tiny fragments, fortune-cookie-size fragments.  The computational problem is then: given this large collection of short fragments of DNA sequence, how can we put them back together to form the whole genome?  This is the problem called genome assembly.
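
As a rough illustration of the shredded-book problem described above, here is a minimal Python sketch of a naive greedy overlap merge. It is a toy under strong assumptions (error-free fragments, no repeated regions, all fragments from the same strand), not the method of any real assembler or Assemblathon entrant. The function names (`overlap`, `drop_contained`, `greedy_assemble`), the `min_len` parameter, and the example sequence are all hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(frags):
    """Remove duplicates and fragments that sit entirely inside another one."""
    unique = list(dict.fromkeys(frags))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Stops when one sequence remains or no pair overlaps by `min_len`.
    """
    frags = drop_contained(fragments)
    while len(frags) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(frags, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            break  # nothing left to join; return the separate pieces
        a, b = best_pair
        merged = a + b[best_k:]
        frags = drop_contained([f for f in frags if f not in (a, b)] + [merged])
    return frags


# Toy "shredded" copies of the made-up sequence ATGGCGTACGTTAGC.
reads = ["TACGTTA", "ATGGCGT", "GTTAGC", "GCGTACG", "ATGGCGT"]
print(greedy_assemble(reads))  # → ['ATGGCGTACGTTAGC']
```

On real data this greedy strategy breaks down quickly: repeated stretches of sequence create ambiguous overlaps, and sequencing errors mean overlaps are rarely exact.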


This is one of the bedrock problems of genomics because without assembly there would be no way to study larger sequences.  And there’s been a lot of theory developed, a lot of methods developed, a lot of improvements to these ideas on how to go about assembling genomes.  But it is very much a rapidly changing, rapidly maturing discipline as new sequencing technologies are brought on board, as new computational methods are applied, as new ideas are brought in.

Two years ago, for the first time, there was a big international competition called “The Assemblathon.”  Everybody got the same set of data, and the competition was to see what the best way is to put that data back together into a reconstruction of the genome, and how that best reconstruction compares to the actual truth.

About 20 different labs around the world participated, contributing about 70 different assemblies of the same genome.  In the case of “The Assemblathon,” it was a synthetic genome made by a computer program, and because the true sequence was known, that gave us more power to measure really accurately how everyone did.  One kind of surprising outcome was, first, that none of the assemblers were perfect: none of them were able to take all this data and perfectly reconstruct the genome.  And there was also quite a lot of variation in how successful the different teams were at putting the genome back together.

Depending on your outlook, this was a little bit disconcerting or a little bit of an opportunity.  It’s disconcerting in the sense that these genome reconstructions form the foundation for many, many studies in comparative genomics, form the basis for evolutionary studies, and form the basis for many billions of dollars in research.  But none of the software for assembling genomes got it quite right; every assembler had problems in one way or another.  It’s also an opportunity, putting on my computer scientist hat, in the sense that work remains to be done to create better assemblers, better software and computational systems to put all this information together.
