Neural networks are fantastic tools for classification and regression, but they are slow to train because they depend on gradient descent across thousands or even millions of parameters. They are in fact a relatively old idea that has recently come back into vogue, in part because of speed increases in modern CPUs and especially the large-scale parallelism available in GPUs. With open source software and commodity hardware, the cost of learning and building useful neural networks is now extremely low. This post describes how I built a dedicated rig for testing neural networks for only a few hundred dollars and had it running in less than a day. It also serves as a how-to guide for avoiding some pitfalls in configuration.
Step 1 Hardware (GTX 770)
The only game in town is CUDA (sorry, ATI), which means NVIDIA hardware. The main choice is between a dedicated GPU (e.g. Tesla K20/K40/K80) and a dual-purpose graphics card (e.g. GTX 700s/800s/900s). The main considerations are speed, memory, and price. As of this writing, the winner of the price/performance trade-off is the GTX 770, which provides the performance of the K20 dedicated GPU at a price in the $300 range.
Step 2 Operating System (Ubuntu 14.04)
Ubuntu is a highly popular, user-friendly, desktop-focused, Debian-based Linux distribution. Add-ons, libraries, and drivers are available through Ubuntu repositories and third-party personal package archives (PPAs).
Step 3 Video Driver
After a fresh install, Ubuntu does not have a specific driver for your GTX card and instead uses an open source video driver called Nouveau. It provides basic functionality but no access to any of the GPU-specific hardware.
Start by finding NVIDIA’s recommended driver for your card. The NVIDIA Linux 64-bit drivers (AMD64/EM64T) follow an incrementing version number convention (the official version is 340.65 as of this writing).
I ran into problems because my card was too new for the latest stable release, and worked through each of the steps below before finding a working solution.
Ubuntu may automatically suggest and install the right driver for you, either in a “restricted driver available” popup or under System -> Administration -> Additional Drivers.
If no driver is automatically detected, or the suggested one is older than the one recommended on NVIDIA’s website, you can download and install the driver directly. The recommended method is to pull the latest stable driver from the xorg-edgers PPA.
```
$ sudo add-apt-repository ppa:xorg-edgers/ppa -y
$ sudo apt-get update
# install the latest version
$ sudo apt-get install nvidia-current
```
If your video card is very new, the most recent stable release may not work for it (e.g. booting to a blank screen). In my case I had to install a specific beta driver:
```
# the 343 driver, which was in beta at the time of writing
$ sudo apt-get install nvidia-343
```
Step 4 Compiler (CUDA 6.5)
NVIDIA provides the CUDA Toolkit, which includes the nvcc compiler, build tools, libraries, and drivers for running instructions on the GPU. Releases are mostly backwards compatible, so I recommend getting the newest one (6.5 as of this writing).
The toolkit is available in a number of forms: as a package, as a .deb file, and as a .run file. In my case I lost a lot of time on incompatible drivers, so I recommend nailing down a working driver first and then going with the .run file, which allows you to opt out of installing the bundled driver.
Some important paths and files (which may vary for your installation):
```
/usr/local/cuda-6.5          <- this is your root folder
/usr/local/cuda-6.5/bin      <- this is your bin folder
/usr/local/cuda-6.5/bin/nvcc <- nvcc is your compiler
/usr/local/cuda-6.5/lib64    <- these are your 64-bit libraries for linking
```
This is a good time to add environment variables pointing to the CUDA paths. A great guide for how to use “$LD_LIBRARY_PATH” is available here. In this context, we’re adding CUDA’s lib64 to the library path so the compiler and runtime can find it.
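One way to do this (assuming the default CUDA 6.5 locations above) is to append `export PATH=/usr/local/cuda-6.5/bin:$PATH` and `export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH` to your ~/.bashrc. A small Python sketch to confirm a new shell actually sees them:

```python
import os

# Assumes the default CUDA 6.5 install location; adjust for your version.
cuda_root = "/usr/local/cuda-6.5"

# True once the exports above are in effect (open a new terminal first).
path_ok = any(p.startswith(cuda_root) for p in os.environ.get("PATH", "").split(":"))
lib_ok = any(p.startswith(cuda_root) for p in os.environ.get("LD_LIBRARY_PATH", "").split(":"))

print("cuda bin on PATH:", path_ok)
print("cuda lib64 on LD_LIBRARY_PATH:", lib_ok)
```

If either prints False, re-source your shell profile (or open a fresh terminal) before moving on.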
Step 5 Python Stack (Python 2.7)
The Theano installation instructions do a good job detailing all the dependencies and compatibilities with different Python bundles. The specific instructions for Ubuntu show how to download and install the dependencies on top of the bare-bones Python base that comes with a fresh install.

The main point here is that it’s easier to start from scratch and build toward the right combination of packages than to start with a preexisting scientific distribution. I initially started with the free Anaconda installation, but it insisted on using its slow, single-threaded BLAS back-end instead of the multithreaded package suggested below. More about that in the next section.
```
$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ git
```
Step 6 BLAS
Many matrix and algebraic operations are handled by highly specialized libraries. A family of both open source and proprietary libraries implements the BLAS interface (e.g. the CUDA implementation cuBLAS, or CPU implementations you might know such as ATLAS or Intel MKL).
The Theano documentation recommends OpenBLAS, which is available as the package “libopenblas-dev”. It is multicore out of the box.
```
$ sudo apt-get install libopenblas-dev
```
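To confirm which BLAS numpy ended up linked against, you can inspect its build configuration; this guarded sketch just prints it (look for openblas entries in the output):

```python
# Print the BLAS/LAPACK libraries numpy was built against.
# Guarded so the sketch degrades gracefully if numpy is missing.
try:
    import numpy as np
    np.__config__.show()
    have_numpy = True
except ImportError:
    have_numpy = False
    print("numpy is not installed")
```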
Step 7 Theano
There are several popular neural network libraries, including Caffe, Theano, and PyLearn2 (which is built on Theano). I’ll be using Theano, as it plays well with Python, exposes a lot of the underlying machinery, and is well documented. There’s a nice brief tutorial here covering the download and setting up the environment variables.
When the installation is done, start the package tests.
Then try the BLAS test on the CPU.
```
$ THEANO_FLAGS=floatX=float32,device=cpu python `python -c "import os, theano; print os.path.dirname(theano.__file__)"`/misc/check_blas.py
```
Next, enable the GPU and rerun the BLAS test.
```
$ THEANO_FLAGS=floatX=float32,device=gpu python `python -c "import os, theano; print os.path.dirname(theano.__file__)"`/misc/check_blas.py
```
Then test that you can both import Theano in Python and that it actually uses the GPU.
Step 8 Build something!
There are in-depth tutorials for getting started with Theano. Following the instructions for the Multilayer Perceptron on MNIST, I trained a [784-500-10] model in under an hour with a 1.6% error rate. Below are the final weights for each hidden node after 1000 epochs.
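For a sense of what that architecture computes, here is a hypothetical numpy sketch of the forward pass: a 784-unit input (MNIST images are 28x28 = 784 pixels), a tanh hidden layer of 500 units, and a 10-way softmax output, with random untrained weights just to exercise the shapes.

```python
import numpy as np

rng = np.random.RandomState(0)

# Random (untrained) parameters for a 784-500-10 network.
W1, b1 = rng.randn(784, 500) * 0.01, np.zeros(500)
W2, b2 = rng.randn(500, 10) * 0.01, np.zeros(10)

def forward(x):
    """Forward pass: tanh hidden layer, then softmax over 10 digit classes."""
    h = np.tanh(x.dot(W1) + b1)                       # (batch, 500)
    logits = h.dot(W2) + b2                           # (batch, 10)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)           # rows sum to 1

probs = forward(rng.randn(4, 784))  # a fake batch of four "images"
```

Each row of probs is a probability distribution over the ten digits; training with gradient descent, which the tutorial does in Theano on the GPU, is what turns the random weights into useful ones.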
Common errors

- “nvcc fatal : Value ‘sm_52’ is not defined for option ‘gpu-architecture’”
- This is caused by the toolkit not recognizing the architecture code automatically picked for your card. In most cases, updating the toolkit to the latest version should fix the problem. In my case the card was newer than the latest stable release, so I manually specified an earlier architecture as an nvcc flag in the .theanorc file.
```
[nvcc]
fastmath = True
flags = -arch=sm_30
```
- “WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available (error: Unable to get the number of gpus available: unknown error)”
- In that case, try running the Python script with sudo once; the GPU device files sometimes need root access to be initialized after boot.