Jeff Dean’s Lecture for YC AI
Short Description
Jeff Dean is a Google Senior Fellow in the Research Group, where he leads the Google Brain project. He spoke to the Y...
Description
Building Intelligent Systems with Large Scale Deep Learning
Jeff Dean, Google Brain team, g.co/brain
Presenting the work of many people at Google
Google Brain Team Mission: Make Machines Intelligent. Improve People’s Lives.
How do we do this?
● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)
○ Unsupervised learning of cats, Inception, word2vec, seq2seq, DeepDream, image captioning, neural translation, Magenta, ML for robotics control, healthcare, …
● Build and open-source systems like TensorFlow (see tensorflow.org and https://github.com/tensorflow/tensorflow)
● Collaborate with others at Google and Alphabet to get our work into the hands of billions of people (e.g., RankBrain for Google Search, GMail Smart Reply, Google Photos, Google speech recognition, Google Translate, Waymo, …)
● Train new researchers through internships and the Google Brain Residency program
Main Research Areas
● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation
research.googleblog.com/2017/01/the-google-brain-team-looking-back-on.html
[Figure: accuracy vs. scale (data size, model size), neural networks vs. other approaches; panels: 1980s and 1990s, 1980s and 1990s with more compute, Now with more compute]
Growth of Deep Learning at Google
[Figure: directories containing model description files; product areas listed, and many more]
Experiment Turnaround Time and Research Productivity
● Minutes, Hours:
○ Interactive research! Instant gratification!
● 1-4 days
○ Tolerable
○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks
○ High value experiments only
○ Progress stalls
● >1 month
○ Don’t even try
Build the right tools
Google Confidential + Proprietary (permission granted to share within NIST)
Open, standard software for general machine learning
Great for Deep Learning in particular
http://tensorflow.org/ and https://github.com/tensorflow/tensorflow
First released Nov 2015
Apache 2.0 license
TensorFlow Goals
● Establish common platform for expressing machine learning ideas and systems
● Make this platform the best in the world for both research and production use
● Open source it so that it becomes a platform for everyone, not just Google
TensorFlow Scaling
Near-linear performance gains with each additional 8x NVIDIA® Tesla® K80 server added to the cluster
TensorFlow supports many platforms: CPU, GPU, iOS, Android, Raspberry Pi, 1st-gen TPU, Cloud TPU
TensorFlow supports many languages Java
ML is done in many places
TensorFlow GitHub stars by GitHub user profiles w/ public locations Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow
TensorFlow: A Vibrant Open-Source Community
● Rapid development, many outside contributors
○ 475+ non-Google contributors to TensorFlow 1.0
○ 15,000+ commits in 15 months
○ Many community created tutorials, models, translations, and projects
■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title
● Direct engagement between community and TensorFlow team
○ 5000+ Stack Overflow questions answered
○ 80+ community-submitted GitHub issues responded to weekly
● Growing use in ML classes: Toronto, Berkeley, Stanford, ...
Google Photos
[glacier]
Reuse same model for completely different problems
Same basic model structure trained on different data, useful in completely different contexts
Example: given image → predict interesting pixels
www.google.com/sunroof
We have tons of vision problems: Image search, StreetView, Satellite Imagery, Translation, Robotics, Self-driving Cars, …
Computers can now see Large implications for healthcare
MEDICAL IMAGING Using similar model for detecting diabetic retinopathy in retinal images
Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91). http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html
Computers can now see Large implications for robotics
Combining Vision with Robotics
“Deep Learning for Robots: Learning from Large-Scale Interaction”, Google Research Blog, March 2016
“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199
Self-Supervised and End-to-end Pose Estimation
TCN + Self-Supervision (No Labels!)
Scientific Applications of ML
Predicting Properties of Molecules
[Figure: a molecule (Aspirin) is fed to a Message Passing Neural Net, which predicts: Toxic? Bind with a given protein? Quantum properties.]
● Chemical space is too big, so chemists often rely on virtual screening.
● Machine Learning can help search this large space.
● Molecules are graphs, nodes=atoms and edges=bonds (and other stuff)
● Message Passing Neural Nets unify and extend many neural net models that are invariant to graph symmetries
● State of the art results predicting output of expensive quantum chemistry calculations, but ~300,000 times faster
https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html and https://arxiv.org/abs/1702.05532 and https://arxiv.org/abs/1704.01212 (latter to appear in ICML 2017)
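The graph view of molecules above can be illustrated with a minimal sketch of message passing. This is not the model from the cited papers: the sum aggregation, the averaging update, and the names `message_passing`, `node_states`, and `edges` are assumptions chosen only to show why the output does not depend on how the atoms are numbered.

```python
# Toy message passing over a molecular graph (illustrative only).
from typing import Dict, List, Tuple


def message_passing(
    node_states: Dict[int, List[float]],
    edges: List[Tuple[int, int]],
    steps: int = 2,
) -> List[float]:
    """Run `steps` rounds of sum-aggregation message passing, then a sum readout."""
    # Build an undirected adjacency list: bonds have no direction.
    neighbors: Dict[int, List[int]] = {n: [] for n in node_states}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    states = {n: list(v) for n, v in node_states.items()}
    dim = len(next(iter(states.values())))
    for _ in range(steps):
        new_states = {}
        for n, h in states.items():
            # Message: sum of neighbor states, which ignores neighbor order.
            msg = [sum(states[m][i] for m in neighbors[n]) for i in range(dim)]
            # Update: fixed average (hypothetical stand-in for a learned update).
            new_states[n] = [0.5 * h[i] + 0.5 * msg[i] for i in range(dim)]
        states = new_states
    # Readout: sum over all nodes, which ignores node order.
    return [sum(h[i] for h in states.values()) for i in range(dim)]


# Water as a graph, with hypothetical one-hot features: [is_oxygen, is_hydrogen].
water = message_passing({0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}, [(0, 1), (0, 2)])
# The same molecule with the atoms numbered differently.
relabeled = message_passing({0: [0.0, 1.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}, [(1, 0), (1, 2)])
```

Both calls describe the same molecule, so the two readouts agree. That is the invariance to graph symmetries the slide refers to.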
Measuring live cells with image to image regression “Seeing More”
Enabling technology: Image to image regression
[Figure: depth prediction on portrait data; panels: Input, True Depth, Predicted Depth]
Applications for camera effects
[Figure: panels: Input, Saturation, Defocus]
Predict cellular markers from transmission microscopy?
Human cancer cells / DIC / nuclei (blue) and cell mask (green)
Human iPSC neurons / phase contrast / nuclei (blue), dendrites (green), and axons (red)
Scaling language understanding models
Sequence-to-Sequence Model [Sutskever & Vinyals & Le NIPS 2014]
[Figure: a deep LSTM reads the input sequence A B C D and produces the target sequence X Y Z Q]
Sequence-to-Sequence Model: Machine Translation [Sutskever & Vinyals & Le NIPS 2014]
[Figure: input sentence “Quelle est votre taille?”; the target sentence is produced one word at a time: “How”, “tall”, “are”, “you?”]
Sequence-to-Sequence Model: Machine Translation [Sutskever & Vinyals & Le NIPS 2014]
At inference time: beam search to choose the most probable among possible output sequences
[Figure: input sentence “Quelle est votre taille?”]
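Beam search at inference time can be sketched as follows. This is a minimal illustration, not the production decoder: the probability table `TABLE` is a hypothetical stand-in for the decoder LSTM, its numbers are invented, and refinements such as length normalization are left out.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens so far, log-probability)


def beam_search(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_len: int,
    eos: str = "</s>",
) -> Hypothesis:
    """Return the highest log-probability hypothesis found by beam search."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            for token, p in next_probs(prefix).items():
                if p <= 0.0:
                    continue
                cand = (prefix + (token,), score + math.log(p))
                # Hypotheses ending in the end-of-sequence token stop growing.
                (finished if token == eos else candidates).append(cand)
        if not candidates:
            break
        # Keep only the `beam_width` best unfinished prefixes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return max(finished or beams, key=lambda c: c[1])


# Hypothetical next-token probabilities (made up for this example).
TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"What": 0.55, "How": 0.45},
    ("What",): {"is": 0.5, "size": 0.5},
    ("How",): {"tall": 0.9, "big": 0.1},
    ("How", "tall"): {"are": 1.0},
    ("How", "tall", "are"): {"you?": 1.0},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix missing from the table simply ends the sentence.
    return TABLE.get(prefix, {"</s>": 1.0})


greedy = beam_search(toy_model, beam_width=1, max_len=6)
wider = beam_search(toy_model, beam_width=2, max_len=6)
```

With a beam of 1 the search commits to the locally best first word, “What”, and ends on a less probable sentence. With a beam of 2 it keeps “How” alive and recovers “How tall are you?”, the most probable sequence in this toy table.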
Smart Reply (Google Research Blog - Nov 2015)
[Figure: Incoming Email → Small Feed-Forward Neural Network → Activate Smart Reply? yes/no → Deep Recurrent Neural Network → Generated Replies]
Smart Reply
● April 1, 2009: April Fool’s Day joke
● Nov 5, 2015: Launched Real Product
● Feb 1, 2016: >10% of mobile Inbox replies
Sequence to Sequence model applied to Google Translate
https://arxiv.org/abs/1609.08144
Google Neural Machine Translation Model
One model replica: one machine w/ 8 GPUs
[Figure: 8 layers of encoder LSTMs and decoder LSTMs connected by attention, with a SoftMax output; layers placed on Gpu1 through Gpu8; inputs X2, X3; outputs Y1, Y2, Y3]
Model + Data Parallelism
Parameters distributed across many parameter server machines (Params, Params, ..., Params)
Many replicas
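The parameter-server arrangement above can be sketched in miniature. This is an illustrative simulation only, assuming a sequential loop in place of real machines. The class `ParameterShard`, the linear model, and all hyperparameters are hypothetical.

```python
from typing import List, Tuple


class ParameterShard:
    """One hypothetical parameter-server machine holding a slice of the parameters."""

    def __init__(self, values: List[float]) -> None:
        self.values = list(values)

    def apply_gradient(self, grads: List[float], lr: float) -> None:
        for i, g in enumerate(grads):
            self.values[i] -= lr * g


def replica_gradient(w: float, b: float, batch: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Gradient of mean squared error for y = w*x + b on one replica's data."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train(data: List[Tuple[float, float]], num_replicas: int = 2,
          steps: int = 500, lr: float = 0.05) -> Tuple[float, float]:
    # Parameters are split across shards: w lives on one, b on another.
    shard_w, shard_b = ParameterShard([0.0]), ParameterShard([0.0])
    # Data parallelism: each replica works on a different slice of the data.
    batches = [data[i::num_replicas] for i in range(num_replicas)]
    for _ in range(steps):
        for batch in batches:
            # A replica pulls current parameters, computes gradients, pushes them back.
            w, b = shard_w.values[0], shard_b.values[0]
            gw, gb = replica_gradient(w, b, batch)
            shard_w.apply_gradient([gw], lr)
            shard_b.apply_gradient([gb], lr)
    return shard_w.values[0], shard_b.values[0]


# Toy data generated from y = 2x + 1.
w_fit, b_fit = train([(float(x), 2.0 * x + 1.0) for x in range(4)])
```

In a real deployment the replicas run concurrently on separate machines and the shards hold large tensors. The loop here only mirrors the pull, compute, push pattern.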
Neural Machine Translation
[Figure: translation quality on a 0 to 6 scale (6 = perfect translation) for human, neural (GNMT), and phrase-based (PBMT) translation; translation models: English > Spanish, English > French, English > Chinese, Spanish > English, French > English, Chinese > English]
Closes gap between old system and human-quality translation by 58% to 87%
Enables better communication across the world
research.googleblog.com/2016/09/a-neural-network-for-machine.html
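A "gap closed" percentage of this kind can be read as the share of the distance between the old system and human quality that the new system covers. The sketch below assumes that reading. The function name `gap_closed` and the scores passed to it are hypothetical and are not taken from the chart.

```python
def gap_closed(old_score: float, new_score: float, human_score: float) -> float:
    """Fraction of the old-system-to-human quality gap covered by the new system."""
    gap = human_score - old_score
    if gap <= 0:
        raise ValueError("human score must exceed the old system's score")
    return (new_score - old_score) / gap


# Hypothetical scores on the 0 to 6 scale, chosen only to show the arithmetic.
example = gap_closed(old_score=4.0, new_score=5.0, human_score=5.25)
```

With these made-up scores the new system covers 1.0 of a 1.25-point gap, that is 80%.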
BACKTRANSLATION FROM JAPANESE (en->ja->en)
Phrase-Based Machine Translation (old system):
Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.
Google Neural Machine Translation (new system):
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” God ‘s house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.
Automated machine learning (“learning to learn”)
Current: Solution = ML expertise + data + computation
Can we turn this into: Solution = data + 100X computation ???
Early encouraging signs
Trying multiple different approaches:
(1) RL-based architecture search
(2) Model architecture evolution
(3) Learn how to optimize
Idea: model-generating model trained via RL (appeared in ICLR 2017, arxiv.org/abs/1611.01578)
(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as reinforcement learning signal
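The three-step loop above can be sketched with a toy controller. This is a rough illustration, not the method of the cited paper: the search space `CHOICES`, the function `toy_loss` (standing in for hours of training), and the per-decision softmax policy updated with REINFORCE are all assumptions.

```python
import math
import random
from typing import List

CHOICES = [[1, 2, 4], [16, 32, 64]]  # hypothetical search space: depth, width


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_loss(arch: List[int]) -> float:
    """Hypothetical stand-in for training a generated model and measuring its loss."""
    depth, width = arch
    return abs(depth - 4) * 0.5 + abs(width - 32) / 32.0


def search(rounds: int = 500, models_per_round: int = 10,
           lr: float = 0.2, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    logits = [[0.0] * len(opts) for opts in CHOICES]
    for _ in range(rounds):
        probs = [softmax(l) for l in logits]
        samples = []
        for _ in range(models_per_round):
            # (1) Generate a model: sample one option per design decision.
            idx = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
            # (2) "Train" it, and (3) use the negative loss as the reward.
            reward = -toy_loss([CHOICES[d][i] for d, i in enumerate(idx)])
            samples.append((idx, reward))
        baseline = sum(r for _, r in samples) / len(samples)
        for idx, reward in samples:
            advantage = reward - baseline
            for d, i in enumerate(idx):
                for j in range(len(probs[d])):
                    # REINFORCE: gradient of the log-probability of the sampled option.
                    grad = (1.0 if j == i else 0.0) - probs[d][j]
                    logits[d][j] += lr * advantage * grad / models_per_round
    # Report the option the controller now prefers for each decision.
    return [CHOICES[d][max(range(len(l)), key=l.__getitem__)] for d, l in enumerate(logits)]


best_arch = search()
```

In this toy setting the controller comes to prefer the options that `toy_loss` scores best. In the real system the expensive part is step (2), which is why the approach needs so much computation.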
CIFAR-10 Image Recognition Task
Penn Tree Bank Language Modeling Task “Normal” LSTM cell
Cell discovered by architecture search
Learn2Learn: Learn the Optimization Update Rule
Neural Optimizer Search using Reinforcement Learning, Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017
More computational power needed Deep learning is transforming how we design computers
Special computation properties
● reduced precision ok: about 1.2 × about 0.6 = about 0.7, NOT 1.21042 × 0.61127 = 0.73989343
● handful of specific operations
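The reduced-precision point can be checked with a few lines of arithmetic. Rounding to significant decimal digits, as done here, is only a crude stand-in for hardware number formats. The helper `round_sig` is a hypothetical name.

```python
from math import floor, log10


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant decimal digits (a crude model of reduced precision)."""
    if x == 0.0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))


exact = 1.21042 * 0.61127                                   # full-precision product
approx = round_sig(1.21042, 2) * round_sig(0.61127, 1)      # "about 1.2" times "about 0.6"
relative_error = abs(approx - exact) / exact
```

The low-precision product rounds to "about 0.7" and differs from the exact product by under 3%, the kind of error the slide treats as acceptable for neural network arithmetic.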
Tensor Processing Unit v2
Revealed in May at Google I/O
Google-designed device for neural net training and inference
● 180 teraflops of computation, 64 GB of memory
TPU Pod: 64 2nd-gen TPUs, 11.5 petaflops, 4 terabytes of memory
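The pod figures follow from the per-device figures. A short check, assuming 1 petaflop = 1000 teraflops and 1 terabyte = 1024 GB (the constant names are hypothetical):

```python
TPU_TERAFLOPS = 180    # per 2nd-gen TPU device, from the slide
TPU_MEMORY_GB = 64     # per device, from the slide
DEVICES_PER_POD = 64   # from the slide

pod_petaflops = DEVICES_PER_POD * TPU_TERAFLOPS / 1000   # 11.52, quoted as 11.5
pod_memory_tb = DEVICES_PER_POD * TPU_MEMORY_GB / 1024   # 4.0
```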
Programmed via TensorFlow
Same program will run with only minor modifications on CPUs, GPUs, & TPUs
Will be available through Google Cloud
Cloud TPU: virtual machine w/ 180 TFLOPS TPUv2 device attached
Making 1000 Cloud TPUs available for free to top researchers who are committed to open machine learning research
We’re excited to see what researchers will do with much more computation! g.co/tpusignup
Machine Learning in Google Cloud
Custom ML models: TensorFlow, Machine Learning Engine
Pre-trained ML models: Vision API, Speech API, Jobs API, Natural Language API, Translation API, Video Intelligence API
Machine Learning for Higher Performance Machine Learning Models
Device Placement with Reinforcement Learning
● Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node
● Measured time per step gives RL reward signal
● +19.3% faster vs. expert human for NMT model
● +19.7% faster vs. expert human for InceptionV3
Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou, Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972
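The placement loop can be sketched on a toy graph. This is an illustration under stated assumptions, not the model from the paper: the costs in `OP_COST`, the dependencies in `EDGES`, and the function `step_time` (standing in for a measurement on real hardware) are hypothetical, and the policy is a simple per-node softmax trained with REINFORCE.

```python
import math
import random
from typing import List, Sequence, Tuple

OP_COST = [4.0, 4.0, 1.0, 1.0]   # hypothetical compute cost per graph node
EDGES = [(0, 2), (1, 3)]         # hypothetical data dependencies between nodes
NUM_DEVICES = 2
TRANSFER_COST = 3.0              # hypothetical cost of an edge that crosses devices


def step_time(placement: Sequence[int]) -> float:
    """Toy stand-in for measuring time per step with a given placement."""
    load = [0.0] * NUM_DEVICES
    for op, dev in enumerate(placement):
        load[dev] += OP_COST[op]
    transfers = sum(1 for a, b in EDGES if placement[a] != placement[b])
    return max(load) + TRANSFER_COST * transfers


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_placement(rounds: int = 100, samples: int = 8,
                       lr: float = 0.5, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    logits = [[0.0] * NUM_DEVICES for _ in OP_COST]
    best: List[int] = [0] * len(OP_COST)
    best_time = step_time(best)
    for _ in range(rounds):
        probs = [softmax(l) for l in logits]
        batch = []
        for _ in range(samples):
            # The policy outputs one device per graph node.
            placement = [rng.choices(range(NUM_DEVICES), weights=p)[0] for p in probs]
            t = step_time(placement)
            batch.append((placement, -t))  # a faster step means a higher reward
            if t < best_time:
                best, best_time = placement, t
        baseline = sum(r for _, r in batch) / samples
        for placement, reward in batch:
            for op, dev in enumerate(placement):
                for d in range(NUM_DEVICES):
                    # REINFORCE update on the log-probability of the chosen device.
                    grad = (1.0 if d == dev else 0.0) - probs[op][d]
                    logits[op][d] += lr * (reward - baseline) * grad / samples
    return best, best_time


placement, seconds = optimize_placement()
```

On this toy graph the best placements put each heavy node with its dependent node on the same device, which balances the load and avoids transfers.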
[Figure: accuracy vs. scale (data size, model size), neural networks vs. other approaches, with more compute; panels: Now, Future]
Example queries of the future
Which of these eye images shows symptoms of diabetic retinopathy?
Please fetch me a cup of tea from the kitchen
Describe this video in Spanish
Find me documents related to reinforcement learning for robotics and summarize them in German
Conclusions
Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …
If you’re not considering how to use deep neural nets to solve your problems, you almost certainly should be
More info about our work: g.co/brain
Thanks!