
Transcript

Michael Schatz: My interest in Cloud computing relates to this data analysis, data discovery problem of being able to scan through very large volumes of DNA sequences. A lot of the technologies that were developed for Cloud computing were actually entirely invented in other disciplines. In particular, large-scale Internet companies like Google and Facebook and Twitter developed these technologies out of necessity. One of the key technologies that I look to is called MapReduce. It was invented at Google, and for a long time this was their secret sauce, if you will, for being able to do these very large studies of many trillions of web pages. Scanning through trillions of web pages is not so different from scanning through trillions of DNA sequences. A lot of the approaches that you would use for those studies are exactly the same. So I borrowed heavily from that community, the text mining and database community. And in any discipline where there tend to be large volumes of data, these technologies are rapidly gaining traction just because they are so powerful.
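[Editor's illustration, not part of the interview: a minimal single-machine sketch of the MapReduce pattern applied to DNA reads, assuming the task is counting k-mers (length-k substrings). The function names, the sample reads, and the choice of k-mer counting are assumptions made for this example; they are not taken from the speaker's software, and a real deployment would spread the map and reduce steps across many machines.]

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every length-k substring of a read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Hypothetical sample reads, chosen only to make the example concrete.
reads = ["GATTACA", "TTACAGA"]
counts = count_kmers(reads, 3)
```

[Swapping web pages for reads and words for k-mers gives the classic word-count job, which is the sense in which the two scanning problems share an approach.]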

The first main technical challenge is, if we have many thousands of genomes we want to study, how can we load all that information into the Cloud, right? The way you would want to do that is, you know, through your web browser or through your computer, but the Internet capacity is only so big, and if you have to ship, you know, this conceptual pile of two miles of DVDs, bringing that around on the Internet takes too long. There are some ways to overcome this. It’s a little bit funny to think about it, but in some ways the most practical way to ship very large data sets is to use FedEx or UPS or some sort of physical shipment of hard drives through the mail. It’s not, you know, the sexy application that you would want for an Internet company, but that’s the practical way to do it.
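[Editor's illustration, not part of the interview: a back-of-the-envelope sketch of why shipping drives can beat uploading. The dataset size, link speed, and courier time below are assumed figures picked for the arithmetic; the speaker gives no numbers.]

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_terabytes, link_megabits_per_second):
    """Days needed to push a dataset through a link running at full speed."""
    total_bits = size_terabytes * 1e12 * 8          # terabytes -> bits
    bits_per_second = link_megabits_per_second * 1e6
    return total_bits / bits_per_second / SECONDS_PER_DAY


# All three values are assumptions for illustration only.
dataset_tb = 100      # assumed size of the genome collection
link_mbps = 100       # assumed sustained upload speed
courier_days = 2      # assumed door-to-door shipping time for hard drives

upload_days = transfer_days(dataset_tb, link_mbps)
faster = "courier" if courier_days < upload_days else "network"
print(f"Upload: {upload_days:.1f} days, courier: {courier_days} days, faster: {faster}")
```

[Under these assumed figures the upload takes roughly three months while the courier takes days, and the gap only shrinks if the link gets much faster or the dataset much smaller.]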

So that’s the main technical barrier. And then storing data in the Cloud opens up a lot of other challenges. In particular, there are a lot of privacy concerns about making sure that that data is really well guarded. Your genome has a lot of information about who you are, what sort of diseases you’re susceptible to. It could say a lot about your family, about your children, about your ancestors. You know, it’s precious information that we definitely don’t want to expose without giving it some consideration. So the concern is, if all this genetic information is in the Cloud and you’re not careful about how that data is protected, it could leak out; it could, you know, accidentally be exposed to other people. And then also, if big archives are made that have collected data from many thousands of people, they could suddenly become an attractive target for attackers.

So today we’re a little bit guarded in the sense that this genetic information is decentralized in many different labs, so that if there’s a breach in one lab it’s relatively localized. If everything gets aggregated together it becomes a little bit more risky because it becomes a little bit more attractive as a target. I think these challenges can be overcome. The encryption technologies, the authentication technologies, they exist; they certainly exist. And there are companies that run with the highest level of security at Amazon Cloud or another Cloud resource. It is certainly possible to do so, but we’ve just got to be so certain that we get it right on the first try, right? We don’t want to create this big database that has all this genetic information and then accidentally leave it vulnerable. So we just have to be really careful about how that’s engineered.

Directed / Produced by
Jonathan Fowler & Elizabeth Rodd

 

DNA In the Cloud
