Waymo's Self-driving Cars Are Trained Using Google's Search and Image Recognition Tech
Waymo is taking advantage of technology similar to what powers Google Photos and Google Image Search to identify and quickly locate almost any object found in the company's driving data logs.
Waymo uses machine learning to detect and classify different types of objects and road features. The powerful neural nets that make up Waymo's perception system learn to recognize objects and their corresponding behaviors from labeled examples of everything the Waymo Driver encounters, from joggers and cyclists to traffic light colors, temporary road signs, and even trees and shrubs. Over the past decade, Waymo has built up an enormous collection of objects captured by the company's custom-designed hardware.
Although this wealth of experience is invaluable, it also poses a challenge: how to find the most useful examples in this sea of sensor data. Trying to locate specific examples – such as when its vehicles have observed a person carrying a skateboard – can be like looking for the proverbial needle in a haystack.
However, through a collaboration with Google Research, Waymo says it has drawn on Google's expertise in web search to develop its Content Search tool.
By using technology similar to what powers Google Photos and Google Image Search, Waymo's Content Search lets the company's engineers quickly locate just about any object in its driving history and data logs — essentially turning Waymo's 20 million miles of on-road experience into a searchable catalog of billions of objects.
Everything a Waymo vehicle learns can be shared across the company's entire fleet. This ability to learn quickly and collectively is important because driving environments are constantly changing. For example, with personal mobility devices gaining popularity (especially in urban areas), the Waymo Driver regularly encounters new forms of transportation. So Waymo wants to continually train its system to ensure that it can distinguish not only between a vehicle and a cyclist, but also between a pedestrian and a person on a scooter.
In the past, to find these distinct examples in Waymo's driving logs, the company's researchers relied on heuristic methods that parsed the data based on various features, such as an object's estimated speed and height. For instance, to locate examples of people riding scooters, Waymo might have looked through its log data for objects of a certain height traveling between 0 and 20 mph. While this method yielded relevant examples, the results were often too broad, since many objects share those attributes.
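To make the limitation of that older approach concrete, here is a minimal sketch of an attribute-based heuristic filter. The data structure and thresholds are illustrative assumptions, not Waymo's actual log format; the point is simply that many different road users fall inside the same height and speed range.

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    """Hypothetical record for one tracked object in a driving log."""
    object_id: str
    height_m: float    # estimated height in meters
    speed_mph: float   # estimated speed in miles per hour

def heuristic_scooter_candidates(objects, min_height=1.2, max_height=2.0,
                                 max_speed_mph=20.0):
    """Return objects whose height and speed roughly match a person on a scooter.

    This kind of attribute filter is broad by design: pedestrians, cyclists,
    and many other objects share the same height and speed range.
    """
    return [
        obj for obj in objects
        if min_height <= obj.height_m <= max_height
        and 0.0 <= obj.speed_mph <= max_speed_mph
    ]

logs = [
    TrackedObject("obj-001", height_m=1.6, speed_mph=12.0),  # could be a scooter rider
    TrackedObject("obj-002", height_m=1.7, speed_mph=3.0),   # could just be a pedestrian
    TrackedObject("obj-003", height_m=4.1, speed_mph=55.0),  # clearly not
]
print([o.object_id for o in heuristic_scooter_candidates(logs)])
```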
With Content Search, Waymo says it can now approach this type of data mining as a search problem. The core principle underlying the new tool is "knowledge transfer": applying knowledge gained from solving one problem (such as finding all the "dogs" in your Google Photos album) to a different but related problem (such as searching through Waymo's driving logs to identify all the times the Waymo Driver has driven past a dog). By indexing its massive catalog of driving data, Waymo's engineers can find relevant data to train and improve the company's neural networks much faster.
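One common way to realize this kind of knowledge transfer is to reuse a model pretrained on ordinary web photos as a feature extractor for a new domain. The sketch below, which assumes a stock ImageNet-pretrained ResNet from torchvision rather than anything Waymo-specific, shows the general idea: drop the original classification head and keep the learned visual features so they can describe driving-log crops.

```python
import torch
import torchvision

# Load a backbone pretrained on web-style photos (ImageNet) and drop its
# classification head, keeping only the learned visual features.
weights = torchvision.models.ResNet50_Weights.DEFAULT
backbone = torchvision.models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = weights.transforms()  # resize/normalize exactly as during pretraining

@torch.no_grad()
def embed(image):
    """Map a PIL image (e.g., a cropped object from a driving log) to a feature vector."""
    batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, H, W)
    return backbone(batch).squeeze(0)       # shape: (2048,)
```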
With Content Search, Waymo's engineers can search the world captured in the sensor logs in several ways: they can conduct a "similarity search," home in on objects in ultra fine-grained categories, and search by text in the scene.
Similarity search allows Waymo to easily find items similar to a given object in its driving logs by running an image comparison query. For example, to improve models around cacti as vegetation, Waymo can start with any image of a cactus, whether it's an example the company has already found in its driving history, a photo of a cactus found online, or even a drawing of a cactus. Content Search then returns instances where Waymo's self-driving vehicles observed similar-looking objects in the real world.
This core image search model works by converting every object in Waymo's driving logs, whether a park bench, a trash can near the side of the road, or a moving object, into embeddings, compact numerical representations that make it possible to rank objects by how similar they are to each other. By creating embeddings for each object based on its attributes and deploying a process similar to Google's real-time embedding similarity matching service, Waymo says it can efficiently compare any query against the images in its driving logs and locate objects similar to the query in a matter of seconds.
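As a toy stand-in for that pipeline (not Waymo's or Google's actual matching service, and using a hypothetical index class with random vectors in place of real embeddings), the following sketch shows the basic mechanics: store one vector per logged object, then rank stored objects by cosine similarity to a query vector. A production system would use an approximate nearest-neighbor service rather than this brute-force comparison.

```python
import numpy as np

class EmbeddingIndex:
    """Minimal in-memory embedding index with cosine-similarity search."""

    def __init__(self):
        self.ids, self.vectors = [], []

    def add(self, object_id, embedding):
        # Store unit-normalized vectors so a dot product equals cosine similarity.
        v = np.asarray(embedding, dtype=np.float32)
        self.ids.append(object_id)
        self.vectors.append(v / np.linalg.norm(v))

    def search(self, query_embedding, top_k=5):
        q = np.asarray(query_embedding, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q        # cosine similarity per object
        best = np.argsort(-scores)[:top_k]
        return [(self.ids[i], float(scores[i])) for i in best]

# Usage: index embeddings of logged objects, then query with the embedding of
# any cactus image (from the logs, the web, or even a drawing).
index = EmbeddingIndex()
rng = np.random.default_rng(0)
for i in range(1000):
    index.add(f"log-object-{i}", rng.normal(size=128))
print(index.search(rng.normal(size=128), top_k=3))
```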
A single class of object can vary in shape, form, and type. For instance, road debris can be anything from a plastic bag or a tire scrap to a cardboard box or a lost pair of pants. To build robust machine learning models that can generalize and detect the different items that might appear on the road – even ones Waymo's vehicles have never seen before – engineers train Waymo's neural nets with a diverse range of examples.
To do this, Waymo utilizes ultra fine-grained search to find objects within a specific category. The backend for this search is a categorical machine learning model that helps Waymo's Content Search tool understand whether a specific object category is present in an image or not. This deep level of understanding opens up the ability to perform extraordinarily niche searches on objects that share a particular trait, such as the make and model of a car or even specific breeds of dogs.
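One simple way such a categorical model could be structured, sketched below under stated assumptions (the head architecture, embedding dimension, and category framing are hypothetical, not Waymo's design), is a small classifier that scores whether an object embedding belongs to one very specific category, such as a particular make and model of car.

```python
import torch
import torch.nn as nn

class FineGrainedHead(nn.Module):
    """Binary scorer for one ultra fine-grained category over object embeddings."""

    def __init__(self, embedding_dim=2048):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(embedding_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # single logit: "is this category present?"
        )

    def forward(self, embeddings):
        return torch.sigmoid(self.classifier(embeddings)).squeeze(-1)

# Scoring a batch of (hypothetical) object embeddings against one category head.
head = FineGrainedHead()
embeddings = torch.randn(4, 2048)  # e.g., features from a pretrained backbone
scores = head(embeddings)          # probability per object, in [0, 1]
print(scores.tolist())
```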
Lastly, many objects on the road carry text that is relevant to driving, such as road signs or the "oversized" notice on a large truck. Content Search uses an optical character recognition model to annotate Waymo's driving logs based on the text and words found in a scene, enabling the company to read the text on road signs, emergency vehicles, and other cars and trucks with signage.
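The sketch below illustrates the general pattern of text-based annotation using an off-the-shelf OCR library; pytesseract (which requires the Tesseract binary to be installed) stands in for Waymo's own OCR model, and the generated "truck crop" and keyword index are purely illustrative.

```python
from PIL import Image, ImageDraw
import pytesseract

def annotate_with_text(image, object_id, keyword_index):
    """Attach words recognized in an object crop to a simple keyword index."""
    text = pytesseract.image_to_string(image)
    words = {w.strip(".,").lower() for w in text.split() if w.strip(".,")}
    for word in words:
        keyword_index.setdefault(word, set()).add(object_id)
    return words

# Usage: build a stand-in "truck crop" with painted-on signage, annotate it,
# then look up every object whose visible text contained the word "oversized".
crop = Image.new("RGB", (400, 120), "white")
ImageDraw.Draw(crop).text((20, 40), "OVERSIZED LOAD", fill="black")
keyword_index = {}
annotate_with_text(crop, "obj-4711", keyword_index)
print(keyword_index.get("oversized", set()))
```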
With Content Search, Waymo says it is able to automatically annotate billions of objects in its driving history, which in turn has dramatically increased the speed and quality of the data the company sends for labeling. Waymo says this faster labeling has contributed to many improvements across its system, from detecting school buses with children about to step onto the sidewalk, to people riding electric scooters, to a cat or dog crossing the street.