

Turning Unsupervised Machine Learning Problems into Supervised Learning Problems
October 23, 2015

The learning I've done in the area of machine learning has been on the supervised side. Unsupervised learning struck me as harder, so until now I haven't really looked into it.

The other day it struck me how one can take "unsupervised" data and turn it into a kind of pseudo-supervised learning problem.

The basic idea is to take some video. The pixels in one frame of the video are the inputs to your neural network, and the outputs of the network are a prediction of what the pixel values will be in the next frame. Thus, the number of inputs and the number of outputs are actually the same.

The beauty of this setup is that you do actually know what the pixel values are in the next frame, so you essentially have a supervised learning problem. You can use the true pixel values of the next frame to adjust the weights of your neural network.
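As a toy sketch of the idea (everything here is hypothetical: a linear one-layer "network", tiny flattened 8x8 frames, synthetic video), note how the next frame itself serves as the training label, so ordinary supervised weight updates apply:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 64  # e.g. an 8x8 grayscale frame, flattened

# Fake "video": each next frame is a fixed pixel shuffle of the current
# frame plus a little noise, so there is real structure to learn.
true_shift = np.roll(np.eye(n_pixels), 1, axis=1)
frames = [rng.random(n_pixels)]
for _ in range(200):
    frames.append(true_shift @ frames[-1] + 0.01 * rng.standard_normal(n_pixels))
frames = np.array(frames)

X, Y = frames[:-1], frames[1:]  # input frame -> target next frame ("free" labels)

# Gradient descent on mean squared error, exactly as in supervised learning.
W = np.zeros((n_pixels, n_pixels))
lr = 0.01
for _ in range(500):
    pred = X @ W.T
    grad = (pred - Y).T @ X / len(X)
    W -= lr * grad

initial_mse = np.mean(Y ** 2)            # loss with all-zero weights
final_mse = np.mean((X @ W.T - Y) ** 2)  # loss after training on the next frames
print(final_mse < initial_mse)
```

A real version would use a deep (likely convolutional or recurrent) network rather than one linear layer, but the supervision signal is identical: the frame that actually arrives next.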

Now, imagine you don't just feed the algorithm frames of flat video, but of stereo video, and you also feed it sensor data about how quickly the camera is rotating in terms of pitch/roll/yaw and how quickly it is accelerating in x/y/z space. You probably see where I'm going with this: the neural network will learn how camera rotation, for example, affects the relationship between the color of a pixel in one frame and the color of the corresponding pixel in the next.
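A minimal sketch of what the augmented input might look like; the shapes and sensor values here are made up purely for illustration:

```python
import numpy as np

# Hypothetical stereo pair: two 8x8 grayscale frames stacked as channels.
frame = np.random.random((8, 8, 2))
gyro = np.array([0.1, -0.02, 0.0])   # pitch/roll/yaw rotation rates
accel = np.array([0.0, 0.0, -9.8])   # x/y/z acceleration

# One flat input vector: all the pixels plus the motion-sensor channels,
# so the network can relate camera motion to pixel change.
x = np.concatenate([frame.ravel(), gyro, accel])
print(x.shape)  # 8*8*2 pixels + 3 gyro + 3 accel values
```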

Of course, you could also feed in sound, and if you were teaching a robot, you could feed in touch sensor information as well.

So what would result? Presumably, if your learning framework were set up well, and if it were powerful enough, it would start to implicitly build up models of the world so that it could better and better predict what comes next. Take image occlusion, for example: to predict what it will look like when one person walks behind another, the network needs to do "image segmentation" (to know which pixels belong to which person), sense each person's rough depth, and so on.

Perhaps I'm just re-hashing things I've read in the past that hint at this same idea, but I found this realization somewhat breathtaking: the universe comes with a practically infinite amount of supervised training data "for free", so long as your learning architecture is powerful enough to connect the dots, so to speak.

One final sub-thought in this area connects to my previous blog post: you could "help" the learning algorithm a bit by generating the video with a ray tracer, and make some of the neural net's outputs things such as the z-depth of each pixel, which the ray tracer can provide precise supervised values for. This would nudge the learning algorithm in the right direction by forcing it to learn the importance of building a distance map of what it's seeing. Likewise, you could force it to learn image segmentation by making some of the neural net's outputs pertain to segmentation, and again use the ray tracer to get a precisely labeled supervised data set for that.
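One hypothetical way the output vector might be laid out under this multi-task scheme (the sizes and ordering here are just illustrative): the next-frame prediction sits alongside auxiliary per-pixel targets that a ray tracer can label exactly.

```python
import numpy as np

n_pixels = 64  # one 8x8 frame, flattened

# One output vector with three "heads":
# [ next-frame pixels | z-depth per pixel | segment id per pixel ]
output = np.zeros(3 * n_pixels)
next_frame = output[:n_pixels]              # self-supervised by the real next frame
z_depth = output[n_pixels:2 * n_pixels]     # supervised exactly by the ray tracer
segmentation = output[2 * n_pixels:]        # supervised exactly by the scene model
print(output.shape)
```

The point of the auxiliary heads is to shape what the network learns internally; the loss on each head would be weighted and summed during training.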

Who knows: a recurrent neural net trained to predict the next frame, paired with perfect supervised data for z-depth and image segmentation, and given the cloud computing power we'll have within ten years, might learn to do very impressive visual perception.


Imagine if Supervised Machine Learning Data Sets Were Almost Free
October 23, 2015

In the world of machine learning, there are two broad strategies. One is called "supervised" machine learning, which means that you feed the computer not only example problem inputs but also the answer you're hoping it will give for each one. By contrast, "unsupervised" machine learning is when you just give the computer a bunch of data and task it with finding meaningful patterns.

So far, supervised approaches are the ones that are producing some head-turning results. But there's a catch: Having human beings label giant data sets with the right answers is expensive.

An interesting realization I had the other day is that our ability to use computer graphics to generate images (the ray tracers used for 3D animated movies, etc.) gives us an impressive shortcut. Imagine you're teaching a computer vision system to label every pixel in an image with what type of "thing" it is. A pixel might belong to a chair, a wall, the carpet, etc. Rather than taking photos of a room and labeling every pixel by hand, you could use ray tracing to generate a photo-realistic image and automatically label each pixel according to the 3D scene model. Bingo: you have a perfectly labeled image for your supervised machine learning algorithm. Want another image from another angle? Easy: just change the camera position, run the ray tracer again, and you're done.
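To make the "labels for free" point concrete, here's a toy stand-in for a ray tracer (the scene, colors, and class ids are all invented for illustration): because the image is rendered from a scene description, the per-pixel class labels fall out automatically alongside the pixels.

```python
import numpy as np

CLASSES = {0: "wall", 1: "chair"}

def render(cam_x: int, size: int = 32):
    """Return (image, labels) for a trivial scene containing one 'chair'
    rectangle, shifted horizontally by the camera offset cam_x."""
    image = np.full((size, size), 0.2)        # wall color everywhere
    labels = np.zeros((size, size), dtype=int)
    left, right = 10 + cam_x, 20 + cam_x      # chair extent in this view
    image[12:24, left:right] = 0.8            # chair color
    labels[12:24, left:right] = 1             # label taken from the scene model
    return image, labels

img, lbl = render(cam_x=0)
img2, lbl2 = render(cam_x=4)  # "another angle": just move the camera and re-render
print((lbl == 1).sum())       # every chair pixel is perfectly labeled
```

A real pipeline would render full 3D geometry with lighting, but the principle is the same: the renderer knows which object produced each pixel, so pixel-level ground truth costs nothing extra.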

Now, you'd want to produce images not from just one "scene", but from many different rooms in many different buildings. That's easy to automate as well: just create a probabilistic model of room layouts, building layouts, etc., and generate new 3D scenes from that model, each with different lighting, object arrangements, and so on. Now sit back and watch a billion perfectly tagged images be produced for your computer vision data set. Don't have the computing resources on your desk? No problem, just use the cloud. Computation a bit expensive? As noted in the blog post below, a 10x performance increase is expected next year, and presumably the next ten years will continue to see good gains.
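A sketch of what such a probabilistic scene generator might look like; the parameters and their ranges are made up, and in a real pipeline each sampled scene would be handed to the ray tracer for rendering and auto-labeling:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_scene():
    """Draw one random room description from a simple probabilistic model."""
    return {
        "room_width_m": rng.uniform(3.0, 8.0),
        "room_depth_m": rng.uniform(3.0, 8.0),
        "n_chairs": int(rng.integers(0, 5)),
        "light_intensity": rng.uniform(0.3, 1.0),
        "camera_yaw_deg": rng.uniform(0.0, 360.0),
    }

# Scale this loop up (and out, across cloud machines) to reach a billion scenes.
scenes = [sample_scene() for _ in range(1000)]
print(len(scenes))
```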

At the end of the day, I'd presume that creating a billion-image data set this way would be millions of times cheaper than hiring people to take photos and label the pixels by hand. Something I find exciting is when a powerful technology like ray tracing, given a few years, intersects another technology, like machine learning, and the two synergize in an incredibly powerful way. In a nutshell, that's the old technology story, the thing behind what we sometimes feel are exponential gains: the coming together of different threads in surprising new ways.


10x Jump in GPU Performance
October 22, 2015

I heard today (probably not new news) that Nvidia has said that next year's GPUs will have a 10x increase in neural network performance over today's hardware. That stopped me in my tracks.

The combination of being able to use massive cloud-based computation (no need for fancy hardware on your desk that sits idle most of the time) and a 10x decrease in the cost per FLOP of neural net training will be a pretty incredible thing. We've already seen some incredible demos of this technology, so imagine what folks will be able to do as the cost per FLOP falls dramatically.
