Right now, a minivan with no one behind the wheel is driving through a suburb of Phoenix, Arizona. And although that may seem alarming, the company that built the "brain" powering the car's autonomy wants to assure you that it is totally safe. Waymo, the self-driving unit of Alphabet, is the only company in the world that has fully driverless vehicles on public roads today. This was made possible by a sophisticated set of neural networks based on machine learning, about which very little has been known until now.
For the first time, Waymo is lifting the curtain on what is arguably the most important (and most difficult to understand) piece of its technology stack. The company, which leads the driverless car race by most metrics, confidently states that its cars have the most advanced brains on the road today. This is thanks to a head start in artificial intelligence investment, some strategic acquisitions by sister company Google, and a close working relationship with the tech giant's in-house team of AI researchers.
Anyone can buy a bunch of cameras and LIDAR sensors, slap them on a car and call it autonomous. But training a driverless car to behave like a human driver or, more importantly, to drive better than a human being, is at the cutting edge of artificial intelligence research. Waymo's engineers are modeling not only how cars recognize objects on the road, but how human behavior affects how cars should behave. And they are using deep learning to interpret, predict and respond to the data accumulated from 6 million miles driven on public roads and 5 billion driven in simulation.
Anca Dragan, one of Waymo's newest employees, is at the forefront of this project. She joined the company in January after leading the InterACT Lab at the University of California, Berkeley, which focuses on interactions between humans and robots. (A photo on the Berkeley website shows Dragan grinning widely as a robot arm serves her a steaming cup of coffee.) Her role is to ensure that our interactions with Waymo's driverless cars, as pedestrians, passengers and other drivers, are wholly positive. Or to put it another way: she is our barrier against the inevitable robot revolution.
Dragan has to strike a balance. While we do not want robot overlords, we do not want timid robot drivers either. For example, if you are traveling at 65 mph on a busy highway and want to merge into the left lane, you might nudge your way over until the other drivers finally give you space. A driverless car that has been trained to follow the rules of the road may have difficulty doing so. A video recently appeared on Twitter showing one of Waymo's minivans trying to merge onto a busy road and very nearly failing.
"How can we adapt it to the drivers with whom you share the road?" Says Dragan. "How do you adapt it to be more comfortable or to drive more naturally?" Those are the subtle improvements that if you want them to work, you really need a system that works badly.
For an innovation that is supposed to save us from traffic deaths, it has been an extremely discouraging few months. In March, a 49-year-old woman crossing the street in Tempe, Arizona, was struck and killed by a self-driving Uber vehicle. A few weeks later, the owner of a Tesla Model X died in a horrific crash while using Autopilot, the automaker's semi-autonomous driver assistance system. And just last week, a self-driving Waymo minivan was T-boned by a Honda sedan that had strayed into oncoming traffic.
Meanwhile, the public is becoming increasingly skeptical. Regulators are beginning to rethink the free pass they had been considering granting companies to build and test driverless vehicles. In the midst of all this uncertainty, Waymo invited me to its headquarters in Mountain View, California, for a series of in-depth interviews with the company's top human minds in artificial intelligence.
Waymo is housed within X, Google's high-risk research and development laboratory, which is located a few miles from the main Googleplex campus. (In 2015, when Google was restructured into a conglomerate called Alphabet, X dropped the Google from its name.) A year later, Google's self-driving car project "graduated" and became an independent company called Waymo. However, the self-driving team is still housed at the mothership, alongside employees working on delivery drones and internet balloons.
The building, a former shopping center, is unremarkable by Bay Area standards. The only thing that distinguishes it is the pair of Chrysler Pacifica minivans that circle the parking lot and occasionally stop so employees can take selfies in front of them. In Googleland, the cars are the celebrities.
Waymo already has a huge lead over its competitors in the field of autonomous driving. It has driven the most miles, 6 million on public roads and 5 billion in simulation, and has accumulated vast reserves of valuable data in the process. It has partnerships with two major automakers, Fiat Chrysler and Jaguar Land Rover, with several more in the pipeline. Its test vehicles are on the road in Texas, California, Michigan, Arizona, Washington and Georgia. And it plans to launch a fully driverless commercial taxi service in Arizona later this year.
Now, the company wants its advantages in the still-growing field of AI to be more widely known. Waymo CEO John Krafcik gave a presentation at Google's annual I/O developer conference this week. And the message was clear: our cars can see farther, perceive better and make snap decisions faster than anyone else's.
"It's a really difficult problem if you work in a completely autonomous vehicle … because of the capacity requirements and the accuracy requirements," says Dmitri Dolgov, Waymo's technology director and vice president of engineering. "And the experience really matters."
Deep learning, a type of machine learning that uses many layers in a neural network to analyze data at different levels of abstraction, is the perfect tool for improving the perception and behavior of self-driving cars, says Dolgov. "And we started pretty early on … just as the revolution was happening right here, next door."
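To make "many layers" concrete, here is a minimal sketch in plain Python of data flowing through a stack of layers, each one transforming the output of the layer before it. The layer sizes, the random weights and every function name are illustrative assumptions. Nothing here reflects Waymo's actual networks, which are trained on real sensor data rather than left at random values.

```python
import random


def relu(values):
    # Non-linearity: negative values become zero.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # One fully connected layer: a weighted sum of the inputs per output unit.
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def make_layer(n_in, n_out, rng):
    # Random, untrained weights; a real network would learn these from data.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    # Keep every intermediate result: each entry is one level of abstraction.
    activations = [inputs]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations


rng = random.Random(0)
sizes = [8, 6, 6, 4, 2]  # hypothetical: 8 input features, 2 output scores
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
activations = forward([0.5] * sizes[0], layers)
```

A "shallow" network in this sketch would have one or two entries in `layers`; a deep one simply has many more, which is why the approach only became practical once hardware could handle the extra computation.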
AI specialists from the Google Brain team regularly collaborate with Dolgov and his fellow engineers at Waymo on methods to improve the accuracy of the company's driverless cars. Lately, they have been working together on some of the most talked-about areas of AI research, such as "automated machine learning," in which neural networks are used to train other neural networks. Waymo may be its own company, but when it comes to projecting an aura of invulnerability, it helps to have your older and much tougher brother behind you.
Waymo's sudden interest in burnishing its AI credentials is tied to its high-stakes effort to deploy vehicles that do not require someone in the driver's seat. To date, Waymo is the only company to take on this risk. The rest of the industry is rushing to catch up, buying small startups in an effort to jump-start their own autonomy programs. In addition, key members of Google's self-driving team have left to hang out their own shingle, lured by great possibilities and a lot of money, leaving the tech giant to contend with news stories about "attrition" and "brain drain."
Former members of Google's self-driving team and outside experts concede that Waymo seems to have a big head start in the field, but they note that its competitors are likely to catch up. After all, Waymo does not have a monopoly on machines with brains.
"As strong as Google is," says Dave Ferguson, a former senior engineer at Google's self-management team who has since left to start his own company. "The field is stronger ."
This was not always the case. In the 2000s, the field was quite weak.
Neural networks, a type of machine learning in which programmers build models that sift through large amounts of data and look for patterns, were not yet hot. A big shift was under way from neural networks that were quite shallow (two or three layers) to deep networks (double-digit numbers of layers). While the concept dates back to the 1950s and the birth of AI research, most computers were not powerful enough to process all the necessary data. All that changed with the ImageNet competition in 2009.
ImageNet began as a poster by researchers at Princeton University, displayed at a 2009 conference on computer vision and pattern recognition in Florida. (Posters are a typical way to share information at these types of machine learning conferences.) From there, it became an image data set, then a competition to see who could create an algorithm that would identify the most images with the lowest error rate. The data set was "trimmed" from around 10,000 image categories to only a thousand categories, or "classes," including plants, buildings and 90 of the 120 dog breeds. Around 2011, the error rate was approximately 25 percent, which means that one in four images was incorrectly identified by the teams' algorithms.
Help came from an unexpected place: the powerful graphics processing units (GPUs) usually found in the world of video games. "People started to realize that those devices could actually be used to do machine learning," says Vincent Vanhoucke, a former voice researcher at Google who now serves as the company's technical lead for artificial intelligence. "And they were particularly well suited to running neural networks."
The biggest breakthrough came in 2012, when AI researcher Geoffrey Hinton and his two graduate students, Ilya Sutskever and Alex Krizhevsky, showed a new way to attack the problem: a deep convolutional neural network for the ImageNet challenge that could detect images of everyday objects. Their neural network embarrassed the competition, reducing the error rate in image recognition to 16 percent, from the 25 percent that the other methods produced.
"I think it was the first time that a deep-learning approach based on neural networks beat the pants with a more standard approach," says Ferguson, the former Google engineer. "And since then, we've never looked back."
Krizhevsky takes a more circumspect view of his role in the 2012 ImageNet Challenge. "I think we were in the right place at the right time," he tells me. He attributes the team's success to his penchant for programming GPUs to run the code for its neural network, which allowed them to run experiments that would normally take months in a matter of days. And it was Sutskever who made the connection to apply the technique to the ImageNet competition, he says.
The success of Hinton and his team "triggered a snowball effect," says Vanhoucke. "A lot of innovation came from that." An immediate result was that Google acquired Hinton's DNNresearch company, which included Sutskever and Krizhevsky, for an undisclosed sum. Hinton stayed in Toronto, and Sutskever and Krizhevsky moved to Mountain View. Krizhevsky joined Vanhoucke's team in Google Brain. "And that's when we started thinking about applying those things to Waymo," says Vanhoucke.
Another Google researcher, Anelia Angelova, was the first to contact Krizhevsky about applying their work to the Google car project. Neither of them worked officially on that team, but the opportunity was too good to ignore. They created an algorithm that could teach a computer to learn what a pedestrian looks like, by analyzing thousands of street photos, and to identify the visual patterns that define a pedestrian. The method was so effective that Google began to apply the technique to other parts of the project, including prediction and planning.
Problems arose almost immediately. The new system was making too many mistakes, mislabeling cars, traffic signs and pedestrians. Nor was it fast enough to run in real time. So Vanhoucke and his team combed through the images, where they discovered that most of the errors were mistakes made by human labelers. Google had brought them in to provide a baseline, or "ground truth," against which to measure the algorithm's success rate, and instead they had added errors. The problem with autonomous cars turned out to be people.
After correcting for human error, Google still struggled to tune the system until it could recognize images instantly. Working closely with Google's self-driving car team, the AI researchers decided to combine more traditional machine learning approaches, such as decision trees and cascade classifiers, with neural networks to achieve "the best of both worlds," recalls Vanhoucke.
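The appeal of pairing a cascade with a neural network is speed: cheap checks discard most candidates so that the expensive model only runs on the few that survive. The sketch below illustrates that general idea in Python. It is not Google's pipeline; the feature names (`height_m`, `width_m`), the thresholds and the `stand_in_network` function, which simply returns a precomputed score, are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, float]
Stage = Tuple[Callable[[Candidate], float], float]  # (score function, minimum score)


def run_cascade(
    candidates: List[Candidate],
    cheap_stages: List[Stage],
    expensive_model: Callable[[Candidate], float],
    final_threshold: float,
):
    accepted = []
    expensive_calls = 0
    for candidate in candidates:
        # all() stops at the first failing stage, so most candidates exit early.
        if not all(score(candidate) >= minimum for score, minimum in cheap_stages):
            continue
        expensive_calls += 1  # only survivors reach the costly model
        if expensive_model(candidate) >= final_threshold:
            accepted.append(candidate)
    return accepted, expensive_calls


def plausible_height(c: Candidate) -> float:
    # Hypothetical cheap check: is the object roughly person-sized?
    return 1.0 if 0.8 <= c["height_m"] <= 2.2 else 0.0


def upright_shape(c: Candidate) -> float:
    # Hypothetical cheap check: is the object taller than it is wide?
    return 1.0 if c["height_m"] > c["width_m"] else 0.0


def stand_in_network(c: Candidate) -> float:
    # Placeholder for a neural network; here the score is supplied with the data.
    return c["network_score"]


candidates = [
    {"height_m": 1.7, "width_m": 0.5, "network_score": 0.93},  # pedestrian-like
    {"height_m": 1.5, "width_m": 4.2, "network_score": 0.10},  # car-like, fails shape check
    {"height_m": 6.0, "width_m": 0.4, "network_score": 0.20},  # pole-like, fails height check
    {"height_m": 1.8, "width_m": 0.6, "network_score": 0.35},  # passes cheap checks, fails network
]
stages = [(plausible_height, 1.0), (upright_shape, 1.0)]
pedestrians, expensive_calls = run_cascade(candidates, stages, stand_in_network, 0.5)
```

In this toy run the costly model is consulted for only two of the four candidates, which is the kind of saving that matters when a system has to keep up with a moving car.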
"It was a very, very exciting moment for us to really show those techniques that have been used to find images of cats and interesting things on the web," he says. "Now, they were actually being used to improve the safety of cars without a driver."
Krizhevsky left Google several years later, saying he "lost interest" in the work. "I got depressed for a while," he admits. His departure baffled his colleagues at Google, and he has since taken on a mythical status. (Ferguson called him an "AI whisperer.") Today, Krizhevsky wonders whether these early successes will be enough to give Google an insurmountable lead in the field of autonomy. Other car and technology companies have already grasped the importance of machine learning, and Waymo's data may be too specific to be extrapolated to a global scale.
"I think Tesla has the unique advantage of being able to collect data from a wide variety of environments because there are Tesla owners with automatic driving hardware around the world," he tells me. "This is very important for machine learning algorithms to generalize, so I guess at least from the data side, if not from the algorithmic side, Tesla could be ahead."
AI and machine learning are essential to self-driving cars. But some of Waymo's competitors, who include former members of Google's self-driving team, are wondering how long the company's advantages will last.
Sterling Anderson is the former director of Autopilot at Tesla and co-founder of Aurora Innovation, which he started with the former director of Google's self-driving car program, Chris Urmson. He says that a natural consequence of the improvements in artificial intelligence is that big head starts like Waymo's are "less significant than they have been." In other words, everyone working on autonomous cars in 2018 is already using deep learning and neural networks from the outset. The shine is off. And like an old piece of fruit, much of the data from the early days has become soft and inedible. A mile driven in 2010 is not a mile driven in 2018.
"The data stays on the floor after several years," says Anderson. "It becomes useful for learning and becomes useful for the evolution of architecture and the evolution of the approach, but at some point, claiming that I have X million miles or X million miles, or whatever, becomes less significant . "
Waymo's engineers agree. "For the sake of machine learning, there is a point of diminishing returns," says Sacha Arnoud, director of the company's machine learning and perception division. "Driving 10 times more will not necessarily give you much larger datasets because what matters is the uniqueness of the examples you find."
In other words, each additional mile that Waymo accumulates must be interesting to be relevant to the process of training the company's neural networks. When cars encounter edge cases or other unique scenarios, such as jaywalkers or cars that are parallel parking, those encounters are fed through Waymo's simulator and turned into thousands of variations that can be used for additional training.
Robots can also be tricked. Adversarial images, or images designed to deceive computer vision software, can be used to undermine or even crash driverless cars. Stickers can be applied to a stop sign to fool a machine vision system into thinking it is a 45 mph speed limit sign.
A neural network trained by Google to identify everyday objects was recently tricked into thinking that a 3D-printed turtle was actually a gun. Waymo's engineers say they are building redundancy into their system to address these possibilities. Add this to the long list of concerns surrounding driverless cars, which includes hacking, ransomware and privacy violations.
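Why can a few stickers flip a classifier's answer? A toy example helps. The Python sketch below uses a tiny linear classifier with made-up weights and a made-up four-number "image," then nudges every input value by a small, bounded amount in the direction that lowers the score. This is a simplified version of the sign-of-the-gradient idea used in adversarial-example research; the weights, labels and `epsilon` value are assumptions chosen for illustration, and real attacks target far larger neural networks.

```python
def score(weights, bias, features):
    # Linear classifier: a positive score means "stop sign".
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    return "stop sign" if score(weights, bias, features) > 0 else "speed limit sign"


def sign(value):
    # Returns 1, 0 or -1.
    return (value > 0) - (value < 0)


def perturb(weights, features, epsilon):
    # For a linear model the gradient of the score with respect to each input
    # is just its weight, so stepping against sign(weight) lowers the score
    # as much as possible while changing no input by more than epsilon.
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]


weights = [0.9, -0.4, 0.7, -0.6]   # hypothetical learned weights
bias = -0.2
original = [0.6, 0.3, 0.5, 0.4]    # hypothetical image features
epsilon = 0.15                     # maximum change allowed per feature

tampered = perturb(weights, original, epsilon)
before = classify(weights, bias, original)
after = classify(weights, bias, tampered)
largest_change = max(abs(a - b) for a, b in zip(original, tampered))
```

Here `before` is "stop sign" and `after` is "speed limit sign," even though no single input moved by more than 0.15. Redundancy of the kind Waymo's engineers describe, such as cross-checking several sensors, is one way to avoid trusting any single model's answer.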
"Tell me the difference between a cat and a dog".
Dolgov is sitting in one of X's conference rooms, whiteboard marker in hand, MacBook Pro open in front of him, asking me to describe the difference between Garfield and Odie.
Before I can stammer an answer, Dolgov continues: "If I give you an image and ask, 'Is it a cat or a dog?', you will know very quickly, right? But if I ask you to describe how you came to that conclusion, it would not be trivial. You think it has something to do with the size of the thing. The number of legs is the same, the number of tails is the same, usually the same number of ears. But it's not obvious."
This type of question is very well suited to deep learning algorithms, says Dolgov. It is one thing to invent a set of basic rules and parameters, such as red means stop and green means go, and to teach a computer to distinguish between different types of traffic signals. Teaching a computer by example to pick a pedestrian out of an ocean of sensor data is easier than describing that difference in words, let alone coding it by hand.
Waymo uses an automated process and human labelers to train its neural networks. After the networks have been trained, these giant data sets must also be pruned and shrunk so they can be deployed in the real world in Waymo vehicles. This process, similar to compressing a digital image, is key to building the infrastructure to scale to a global system.
If you look at the images captured by the cars' cameras and place them next to the same scene constructed from the vehicle's laser sensor data, you begin to see the enormity of the problem that Waymo is trying to address. If you have never seen a LIDAR rendering, the best way to describe it is Google Street View as a psychedelic black light poster.
These images offer a bird's-eye view of the driverless car and what it "sees" around it. Pedestrians are represented as yellow rectangles, other vehicles are purple boxes, and so on. Waymo has categories for "cat dog" and "bird squirrel," among others, that it uses for animals. (It turns out that the differences between a dog and a cat are not entirely relevant to autonomous vehicles.) But beyond that, Waymo is training its algorithms to perceive the atypical actors in the environment: a construction worker waist-deep in a manhole, someone in a horse costume, a man standing on the corner spinning an arrow-shaped sign.
To take the human driver out of the equation, the car must adapt to the strangest elements of a typical drive. "Odd events really matter," Dolgov tells me, "especially if you're talking about removing a driver."
Programming the car to respond to someone crossing the street in daylight is one thing, but getting it to notice and react to a jaywalker is something else entirely. What happens if that jaywalker stops at a median? Waymo's driverless cars will react cautiously, since pedestrians often walk to a median and wait there. What happens if there is no median? The car recognizes this as unusual behavior and slows down enough to allow the pedestrian to cross. Waymo has built models that use machine learning to recognize and react to both normal and unusual behaviors.
Neural networks need a surplus of data to train on. That means that Waymo has accumulated "hundreds of millions" of vehicle labels alone. To help put that in context, Waymo's head of perception, Arnoud, calculated that a person labeling one car per second would take 20 years to reach 100 million. Operating every hour of every day of every week and processing 10 labels per second, Waymo's machines still take four months to work through the entire data set during the training process, says Arnoud.
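Arnoud's figures can be sanity-checked with a little arithmetic. The Python sketch below does that under stated assumptions: the article does not say what working schedule lies behind the 20-year estimate, so the 8-hour day and 250-day working year used here are guesses, and "four months" is taken to be 120 days.

```python
SECONDS_PER_HOUR = 3600


def years_to_label(total_labels, labels_per_second, hours_per_day, days_per_year):
    # Time needed for one labeler working a fixed schedule.
    labels_per_year = labels_per_second * SECONDS_PER_HOUR * hours_per_day * days_per_year
    return total_labels / labels_per_year


def labels_processed(labels_per_second, days):
    # Throughput of a machine running around the clock.
    return labels_per_second * SECONDS_PER_HOUR * 24 * days


TARGET = 100_000_000

# A person labeling one car per second without ever stopping.
nonstop_years = years_to_label(TARGET, 1, 24, 365)      # about 3.17 years

# Assumed schedule: 8 hours a day, 250 days a year.
workday_years = years_to_label(TARGET, 1, 8, 250)       # about 13.9 years

# Working hours per year implied by the quoted 20-year estimate.
implied_hours_per_year = TARGET / (20 * SECONDS_PER_HOUR)  # about 1,389 hours

# A machine at 10 labels per second for an assumed 120 days.
machine_labels = labels_processed(10, 120)               # 103,680,000 labels
```

Under these assumptions, the 20-year figure corresponds to a person labeling for roughly 1,400 hours a year, and four months of round-the-clock processing at 10 labels per second covers a little over 100 million labels, in line with the scale Arnoud describes.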
It takes more than a good algorithm to break free of the geofenced test sites in the suburbs of Phoenix. If Waymo wants its driverless cars to be smart enough to operate in any environment and under any conditions, defined as Level 5 autonomy, it needs an infrastructure powerful enough to scale its autonomous driving system. Arnoud calls it the "industrialization" or "productionization" of AI.
As part of Alphabet, Waymo uses Google's data centers to train its neural networks. Specifically, it uses a high-powered cloud computing hardware system called "tensor processing units," which underpins some of the company's most ambitious and far-reaching technologies. Normally, this work is done using commercially available GPUs, often from Nvidia. But in recent years Google has opted to build some of this hardware itself and optimize it for its own software. TPUs are "orders of magnitude" faster than CPUs, says Arnoud.
The future of AI at Waymo is not sentient vehicles (sorry, Knight Rider fans). It lies in cutting-edge research such as automated machine learning, in which the process of creating machine learning models is itself automated. "Essentially, the idea that you have AI machine learning that is creating other AI models that really solve the problem you're trying to solve," says Arnoud.
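At its simplest, automating the creation of models means letting a program try candidate configurations and keep the one that performs best on held-out data. The Python sketch below shows that loop with a deliberately tiny model, a nearest-neighbor classifier, searching over its one setting, `k`. It is far simpler than neural networks designing other neural networks, and the data points, labels and candidate values are all invented for illustration.

```python
from collections import Counter


def knn_predict(train, x, k):
    # Vote among the k training points closest to x.
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, valid, k):
    hits = sum(knn_predict(train, x, k) == label for x, label in valid)
    return hits / len(valid)


def search(train, valid, candidate_ks):
    # The "automated" part: score every candidate, keep the best.
    scored = [(accuracy(train, valid, k), k) for k in candidate_ks]
    # Prefer higher accuracy; break ties in favor of the smaller k.
    best_score, best_k = max(scored, key=lambda item: (item[0], -item[1]))
    return best_k, best_score, scored


# Hypothetical training data with two deliberately mislabeled points (3.0 and 8.0).
train = [
    (0.0, "slow"), (1.0, "slow"), (2.0, "slow"), (3.0, "fast"), (4.0, "slow"),
    (6.0, "fast"), (7.0, "fast"), (8.0, "slow"), (9.0, "fast"), (10.0, "fast"),
]
valid = [
    (2.9, "slow"), (3.4, "slow"), (7.9, "fast"),
    (8.3, "fast"), (0.5, "slow"), (9.5, "fast"),
]

best_k, best_score, scored = search(train, valid, [1, 3, 5])
```

With `k = 1` the model copies the two mislabeled points and gets most validation examples wrong, while `k = 3` smooths over them, so the search settles on 3 without a human choosing it. Research-grade automated machine learning applies the same select-by-validation idea to vastly larger design spaces.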
This becomes extremely useful for driving in areas with unclear lane markings. These days, the most challenging driving environments require self-driving cars to make guidance decisions without white lines, Botts' dots or clear roadside demarcations. If Waymo can build machine learning models to train its neural networks to drive on streets with unclear markings, then Waymo's self-driving cars can put the suburbs of Phoenix in their rearview mirror and eventually hit the open road.