More than 80 per cent of the TV shows people watch on Netflix are discovered through the platform’s recommendation system. That means the majority of what you decide to watch on Netflix is the result of decisions made by a mysterious, black box of an algorithm.
Over the last decade, Netflix's learning algorithms and models have evolved to include multiple layers, multiple stages, and nonlinearities. Netflix now uses machine learning, including deep learning variants, to rank its large catalog of content, estimating the relevance of each title to each user and creating a personalized content strategy.
Today, online platforms like Netflix offer thousands of movies and shows, and this much choice can be overwhelming for users. With over 7,000 movies and shows in the Netflix catalog, it is nearly impossible for users to find titles they'll like on their own. A platform this large needs a recommendation engine to automate the search process for its users.
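To make the ranking idea concrete, here is a minimal sketch, not Netflix's actual method: assume each user and title is represented by a feature vector (all values below are invented), score relevance as a dot product, and sort the catalog by that score.

```python
# Toy sketch of personalized ranking: represent each user and title as a
# feature vector (values below are invented), score relevance with a dot
# product, and sort the catalog. Real systems use far richer models.

def relevance(user_vec, title_vec):
    return sum(u * t for u, t in zip(user_vec, title_vec))

def rank_catalog(user_vec, catalog):
    """catalog: dict of title -> feature vector. Returns titles by descending relevance."""
    return sorted(catalog, key=lambda t: relevance(user_vec, catalog[t]),
                  reverse=True)

user = [0.9, 0.1, 0.4]            # e.g. taste for drama, comedy, documentary
catalog = {
    "drama_series": [1.0, 0.0, 0.2],
    "comedy_movie": [0.1, 1.0, 0.0],
    "nature_doc":   [0.0, 0.1, 1.0],
}
print(rank_catalog(user, catalog))  # drama_series ranked first for this user
```

The same sort applies no matter where the relevance scores come from, which is why ranking and scoring are usually separate stages.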
Personalized movie recommendation can be summed up simply: users who watch A are likely to watch B. This is perhaps the best-known feature of Netflix.
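The "watched A, likely to watch B" intuition can be sketched with simple co-occurrence counting over watch histories. This is a toy illustration of the idea, not Netflix's algorithm:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(watch_histories):
    """Count how often pairs of titles appear in the same user's history."""
    counts = defaultdict(int)
    for history in watch_histories:
        for a, b in combinations(sorted(set(history)), 2):
            counts[(a, b)] += 1
    return counts

def recommend_after(title, counts, top_n=3):
    """Titles most often co-watched with `title`, ranked by co-watch count."""
    scores = {}
    for (a, b), c in counts.items():
        if a == title:
            scores[b] = scores.get(b, 0) + c
        elif b == title:
            scores[a] = scores.get(a, 0) + c
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

histories = [["A", "B"], ["A", "B", "C"], ["A", "B"]]
counts = cooccurrence_counts(histories)
print(recommend_after("A", counts))  # "B" first: co-watched with "A" most often
```

Real systems replace raw counts with similarity measures that correct for title popularity, but the shape of the computation is the same.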
Machine learning impacts many exciting areas throughout our company. Historically, personalization has been the most well-known area, where machine learning powers our recommendation algorithms. We’re also using machine learning to help shape our catalog of movies and TV shows by learning characteristics that make content successful. We use it to optimize the production of original movies and TV shows in Netflix’s rapidly growing studio. Machine learning also enables us to optimize video and audio encoding, adaptive bitrate selection, and our in-house Content Delivery Network that accounts for more than a third of North American internet traffic. It also powers our advertising spend, channel mix, and advertising creative so that we can find new members who will enjoy Netflix.
Using machine learning pervasively across Netflix brings many new challenges where we need to push forward the state-of-the-art. This means coming up with new ideas and testing them out, be it new models and algorithms or improvements to existing ones, better metrics or evaluation methodologies, and addressing the challenges of scale. Our research spans many different algorithmic approaches including causal modeling, bandits, reinforcement learning, ensembles, neural networks, probabilistic graphical models, and matrix factorization.
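Of the approaches listed above, matrix factorization is the easiest to show in a few lines. The following is a toy sketch trained with stochastic gradient descent on a handful of made-up ratings; the dimensions, learning rate, and regularization are arbitrary illustrative choices:

```python
import random

# Toy matrix factorization: learn low-dimensional user and item factors so
# that their dot product approximates observed ratings. Hyperparameters
# below are illustrative, not tuned values from any production system.

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):  # SGD step with L2 regularization
                U[u][f] += lr * (err * V[i][f] - reg * U[u][f])
                V[i][f] += lr * (err * U[u][f] - reg * V[i][f])
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0)]
U, V = factorize(ratings, n_users=2, n_items=2)
pred = sum(U[0][f] * V[0][f] for f in range(2))  # should approach 5.0
```

Once trained, the same dot product scores titles the user has never rated, which is what makes the factors useful for recommendation.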
Location Scouting for Movie Production (Pre-Production)
Using data to help decide where and when best to shoot a movie, given constraints of scheduling (actor and crew availability), budget (venue and flight/hotel costs), and production scene requirements (day vs. night shoots, likelihood of weather risks at a location). Note that this is more of a data-science optimization problem than a machine learning model that makes predictions from past data.
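A stripped-down version of that optimization might look like the following: pick the cheapest location that satisfies a weather-risk constraint. All names and numbers here are invented for illustration; a real scheduler would jointly optimize dates, crews, and scenes.

```python
# Hypothetical illustration of location scouting as constrained optimization:
# minimize total cost subject to a weather-risk cap. All data is made up.

locations = [
    {"name": "city_a", "venue": 50_000, "travel": 20_000, "rain_risk": 0.30},
    {"name": "city_b", "venue": 40_000, "travel": 35_000, "rain_risk": 0.10},
    {"name": "city_c", "venue": 30_000, "travel": 15_000, "rain_risk": 0.60},
]

def best_location(options, max_rain_risk=0.35):
    """Cheapest option whose weather risk is within the allowed cap."""
    feasible = [o for o in options if o["rain_risk"] <= max_rain_risk]
    return min(feasible, key=lambda o: o["venue"] + o["travel"])

print(best_location(locations)["name"])  # city_a: cheapest feasible option
```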
Network quality characterization and prediction
Network quality is difficult to characterize and predict. While the average bandwidth and round trip time supported by a network are well-known indicators of network quality, other characteristics such as stability and predictability make a big difference when it comes to video streaming. A richer characterization of network quality would prove useful for analyzing networks (for targeting/analyzing product improvements), determining initial video quality and/or adapting video quality throughout playback (more on that below).
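As a small sketch of why average bandwidth alone is not enough, the snippet below characterizes throughput samples by both their mean and their stability (via the coefficient of variation). The 0.25 threshold is an illustrative assumption, not a real product value:

```python
import statistics

# Sketch of richer network characterization than average bandwidth alone:
# two networks with similar means can differ greatly in stability, which
# matters for streaming. The stability threshold below is illustrative.

def characterize(throughput_mbps):
    mean = statistics.mean(throughput_mbps)
    stdev = statistics.pstdev(throughput_mbps)
    cv = stdev / mean if mean else float("inf")  # coefficient of variation
    return {"mean_mbps": mean,
            "stability": "stable" if cv < 0.25 else "volatile"}

steady = [9.8, 10.1, 10.0, 9.9]
bursty = [2.0, 18.0, 3.0, 17.0]
print(characterize(steady))  # similar mean to `bursty`, but stable
print(characterize(bursty))
```

Both traces average roughly 10 Mbps, yet only one of them could sustain a high-bitrate stream without rebuffering.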
Video quality adaptation during playback
Movies and shows are often encoded at different video qualities to support different network and device capabilities. Adaptive streaming algorithms are responsible for adapting which video quality is streamed throughout playback based on the current network and device conditions (see here for an example of our colleagues’ research in this area). The figure below illustrates the setup for video quality adaptation. Can we leverage data to determine the video quality that will optimize the quality of experience? The quality of experience can be measured in several ways, including the initial amount of time spent waiting for video to play, the overall video quality experienced by the user, the number of times playback paused to load more video into the buffer (“rebuffer”), and the amount of perceptible fluctuation in quality during playback.
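The flavor of such an adaptation rule can be sketched with a simple buffer-based policy: stream higher quality when the buffer is healthy, drop to a safe rate when it is nearly empty. The bitrate ladder, headroom factor, and buffer thresholds below are illustrative assumptions, not Netflix's actual algorithm:

```python
# Minimal buffer-based bitrate adaptation sketch. Ladder values, the 25%
# throughput headroom, and the buffer thresholds are invented for
# illustration; production ABR algorithms are far more sophisticated.

BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]

def choose_bitrate(buffer_seconds, throughput_kbps):
    # Never exceed what the network can sustain (with 25% headroom),
    # and be conservative when the buffer is nearly empty.
    sustainable = [b for b in BITRATE_LADDER_KBPS
                   if b <= 0.75 * throughput_kbps]
    if not sustainable or buffer_seconds < 5:
        return BITRATE_LADDER_KBPS[0]   # play it safe: lowest quality
    if buffer_seconds < 15:
        return sustainable[len(sustainable) // 2]  # middle of the ladder
    return sustainable[-1]              # healthy buffer: best sustainable

print(choose_bitrate(buffer_seconds=30, throughput_kbps=8000))  # 5800
print(choose_bitrate(buffer_seconds=3, throughput_kbps=8000))   # 235
```

Each branch trades off one facet of quality of experience against another: higher bitrate improves picture quality but raises the risk of a rebuffer.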
Another area in which statistical models can improve the streaming experience is predicting what a user will play, in order to cache part (or all) of it on the device before the user hits play, so the video can start faster and/or at a higher quality. For example, a user who has been watching a particular series is very likely to play the next unwatched episode. By combining aspects of their viewing history with recent user interactions and other contextual variables, we can formulate this as a supervised learning problem: maximize the likelihood that the model caches what the user actually ends up playing, while respecting resource constraints from the cache size and available bandwidth. We have seen substantial reductions in the time spent waiting for video to start when employing predictive caching models.
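The caching decision itself can be sketched as a budgeted selection problem: given per-title play probabilities (stand-ins here for a trained model's output) and sizes, greedily fill the cache with the highest expected value per megabyte. The titles, probabilities, and greedy heuristic are illustrative assumptions, not the production approach:

```python
# Sketch of predictive caching under a size budget: rank candidates by
# play probability per megabyte, then fill the cache greedily. The
# probabilities are stand-ins for a trained model's predictions.

def fill_cache(candidates, cache_mb):
    """candidates: list of (title, play_probability, size_mb)."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, used = [], 0.0
    for title, prob, size in ranked:
        if used + size <= cache_mb:
            chosen.append(title)
            used += size
    return chosen

candidates = [
    ("next_episode_s2e5", 0.90, 300.0),  # likely continuation of a series
    ("trending_movie",    0.20, 900.0),
    ("new_release",       0.35, 500.0),
]
print(fill_cache(candidates, cache_mb=900.0))
```

The next unwatched episode dominates the ranking, which matches the intuition in the paragraph above; the greedy rule is a cheap approximation to the underlying knapsack problem.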
And there are many such examples.
I hope you liked the article. Thank you!