According to Netflix's VP of Product Development, there's a misconception about big data. It's not a treasure trove of information, as many people and their companies assume, but more like "a big mountain of garbage." The problem, as Todd Yellin sees it, is sifting through the data to find the information that will actually benefit users, and that information is few and far between.
Yellin appreciates the simplicity of the subscription model on which Netflix depends. The model makes the on-demand entertainment company entirely beholden to its customers for success, unlike Google and Facebook, which draw substantial revenue from advertisers, but it also simplifies how the company approaches big data. Ultimately it means serving one master, the customer, instead of two.
Todd Yellin: So it's funny, big data has been kind of a cliché in Silicon Valley for the last few years: big data this, big data that. Big data is really one big mountain of garbage with little gems buried in this tremendous trash heap, and you want to find those gems; you really want to find out what's going to make the experience better. So there are a lot of sophisticated machine learning algorithms that Netflix and other companies deploy to really figure out what are the gems that are going to make a better experience, and what's the rubbish that you want to separate out and push to the side? Once you find those gems, it doesn't make it a more alienated, machine experience; it actually makes it a more personal experience. It becomes much more about the individual member.
When I first got to Netflix we were looking at other companies that were doing personalization and leveraging the kinds of data that we could learn from. And one company that obviously wasn't competitive with Netflix but was also doing some interesting things was Pandora, the music company. And Netflix is in Silicon Valley and they're up in Oakland, not too far away, and we're down in the South Bay. So we went up to, we had a meeting, a little powwow, this was many years ago, with Pandora. And they were really small then and Netflix was much smaller and we were just comparing notes. What was interesting about Pandora is Pandora had the Music Genome Project, where they were tearing apart and deconstructing lots of music on all these different dimensions and trying to really understand the music. And I remember back in those days, and this was like ten years ago, they had their walls lined with CDs all over and they had a whole line of people in this cramped office with headphones on, and they were listening to music with this big spreadsheet open and tagging everything about it.
At that time at Netflix we were all about rating our titles on a one-to-five-star system, and we were very much using a lot of behavioral, a lot of algorithms around the behaviors of what users were doing and based on a lot of clustering techniques. We weren't really deconstructing the titles yet; we would get to that soon after. They weren't really deploying a lot of the collaborative filtering models that we were using in our algorithms. So we compared notes and we influenced each other, and we only met a couple of times with them and were paying attention to what other companies were doing in terms of personalization. And based on these learnings, everyone kind of evolved. It's not just Netflix; other companies are trying to leverage big data to make it much easier to find something great to watch, great to listen to, great to read, great to buy, and to figure out how to use, when to use, human-created data, a lot of metadata, a lot of deconstructing of what the material is that people are watching or listening to or reading and so forth, and how to use a lot of behavioral clustering data: what kinds of people are watching these kinds of shows and movies? What kinds of people are watching these or listening to these and so forth?
So it wasn't just Netflix, it's just we are very monomaniacally focused at Netflix and we really want to create great content and we want to get that great content to the right people at the right time. And so that's why we're using the data. Let me throw in one more thing. There are a lot of great companies deploying a lot of interesting techniques, not only in Northern California but around the world. And so right up the road from Netflix you have companies like Google and Facebook, and they can be great partners for Netflix at times and there are brilliant people working there, but I don't envy their jobs because their jobs are different than what myself and my team have to do. When we have all this data we have one purpose for this data and that's to make each individual member's experience better. There are no advertisers. We're only subscription. And in a subscription model you use everything you can and you're basically on your knees and you're saying please stay with Netflix. Stay another month. It's a great experience. There's lots of great stuff to watch. Here's another great TV show or movie.
But if you're at a Google or a Facebook you're doing really interesting work but you're serving two masters: you're serving the consumer and you're serving the advertiser. So when you're getting all that big data you're constantly in this conflict of, "do I use this big data to serve the advertiser and put the right advertising in front of the right customer to get them to buy something that they didn't sincerely come to the service for, or am I serving the customer, showing them the right thing in their newsfeed, showing them the perfect search result and so forth?" So they're very, you know, they're incredible tools in the modern age, whether you're using Google Search or using the Facebook newsfeed, but once again, they're facing an even more perplexing challenge. We just use the data to serve each individual member.