Motivation

In the last year I have installed and re-installed TensorFlow at least 20 times. For reasons that I can’t understand, there isn’t a simple and straightforward guide that describes the correct procedure to install TensorFlow with NVIDIA GPU support on Windows (on Linux it’s a piece of cake). So, here’s what you need: patience and an NVIDIA GPU. Buckle up!

Introduction

First, it doesn’t matter which version of TensorFlow you want to install; the important thing is that you keep this table in mind:

Source: tensorflow

Each version of TensorFlow depends on a specific CUDA version, a specific version of the shared cuDNN libraries, and a C++ compiler. One by one we will install all these lovely things. For this tutorial we will install TensorFlow 2.3, and consequently CUDA 10.1, cuDNN 7.6 and MSVC 2019. If you wish to install another version, you can still follow the tutorial, but refer to the table to determine the matching versions.
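
Since the table is easy to lose track of, here is a minimal sketch that writes down the one row this tutorial uses, so you can check a setup against it. The REQUIREMENTS dictionary and both function names are my own invention for illustration, not part of TensorFlow. The Python range (3.5 to 3.8) is what the table lists for TensorFlow 2.3; rows for other versions would have to be copied from the table by hand.

```python
import sys

# One row of the compatibility table, written down by hand (assumption:
# this mirrors the table for TensorFlow 2.3 on Windows with GPU support).
# Add further rows yourself from the table; nothing is looked up online.
REQUIREMENTS = {
    "2.3": {
        "cuda": "10.1",
        "cudnn": "7.6",
        "msvc": "2019",
        "python": ((3, 5), (3, 8)),  # lowest and highest supported (major, minor)
    },
}


def requirements_for(tf_version):
    """Return the recorded requirement row for a TensorFlow version string."""
    try:
        return REQUIREMENTS[tf_version]
    except KeyError:
        raise ValueError(
            f"No row recorded for TensorFlow {tf_version}; check the table."
        ) from None


def python_is_supported(tf_version, python_version=None):
    """Check a (major, minor) Python version against the recorded range."""
    if python_version is None:
        python_version = sys.version_info[:2]
    lowest, highest = requirements_for(tf_version)["python"]
    return lowest <= tuple(python_version) <= highest


if __name__ == "__main__":
    row = requirements_for("2.3")
    print(f"TensorFlow 2.3 needs CUDA {row['cuda']}, "
          f"cuDNN {row['cudnn']} and MSVC {row['msvc']}.")
    print("This Python is in the supported range:", python_is_supported("2.3"))
```

Keeping the row in one place like this is only a convenience; the table itself remains the reference.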

We will follow these steps:

  1. MSVC 2019, Microsoft Visual Studio 2019
  2. CUDA 10.1
  3. cuDNN 7.6
  4. Anaconda
  5. Tensorflow 2.3

Microsoft Visual Studio 2019

Go here and download Visual Studio 2019. The Community edition (the free one) is fine! Why do I need Microsoft Visual Studio 2019? It provides the C++ compiler and runtime that the CUDA libraries (written in C++) we will install later depend on. There are clearly other ways to get this functionality, but Visual Studio 2019 is the fastest and easiest. During the installation, it will ask you to choose additional components; this is totally optional.
If you haven’t chosen any additional components, the installer will ask you:

Just click continue.
This will take a while. I warned you: patience is a requirement. It may require a reboot; just do whatever it asks and everything will be fine.

CUDA ToolKit

Now, the CUDA Toolkit! Why do I need the CUDA Toolkit? CUDA (Compute Unified Device Architecture) is a set of programming APIs for using and programming NVIDIA GPU cores (CUDA cores). Graphics Processing Units (GPUs) were originally developed purely for graphics rendering (as the name suggests). The interesting thing is that GPUs have a lot of cores (typical CPU: 4-16 cores vs. typical GPU: 128 or more), but each core is slow. So someone thought: it doesn’t matter if they are slow, we can still achieve great performance by making them all work together in parallel! So they wrote code that lets GPUs be used for general-purpose computing (GPGPU, General-Purpose computing on GPUs). That code is what you are downloading.

Referring to the table, you need to choose the CUDA version appropriate for the version of TensorFlow you want to install. In our case TensorFlow 2.3 wants version 10.1 of the CUDA Toolkit. Get the correct version here.

Once you select the version, you will see something like this. Here you will be asked to choose between a network and a local installer; the installed result is the same either way. In short, the choice is:
Network: Download a small program that then downloads the CUDA Toolkit.
Local: Download the entire CUDA Toolkit package and then install it.

It depends on how stable your connection is: if it is not stable go with network, otherwise local is fine.

At this point the installation should not be complex. I’ll leave a couple of screenshots so you don’t feel lost.

If it takes a couple of minutes, it’s totally normal.
Still everything under control.
Quick is what we want, isn’t it?

Once the installation is complete, we need to check that the paths to the CUDA libraries have been added to our environment variables.

What the heck are environment variables? Environment variables are named values that the system and programs can read. The one we care about here, PATH, is a list of folders separated by semicolons; it tells the system where to look when a program or library is called from the command line or by another process.
There are two ways to check environment variables: via the command line or via the Control Panel.

Command Line
Open a Command Prompt (cmd). You can do this by pressing WIN+R, typing cmd and pressing Enter. Alternatively, look for it in the Start menu. Type: echo %PATH%.

C:\Users\fra>echo %PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;

If your PATH includes these two folders, everything has gone right. Otherwise…

Control Panel

  1. Search for environment variables in the Windows menu.

  2. Select environment variables.

  3. Check that the two folders are listed under Path in the system variables. What you see in the images are the default paths. Yours should be the same unless you changed the install location during the installation; in that case you will know which folders to look for.

I don’t see these paths!
Simply click “New” and add the two magic folders. Just in case, check that these folders actually exist. Maybe you installed CUDA to another path and not the default one.
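
The PATH check described above can also be scripted. Below is a minimal sketch, assuming the default install folder used in this tutorial; the function name missing_cuda_dirs and the constant CUDA_ROOT are made up for illustration. It splits the PATH on semicolons (the Windows separator) and reports which of the two CUDA folders, if any, is absent.

```python
import os

# Default CUDA 10.1 install folder from this tutorial (assumption: you did
# not change it during the installation).
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1"


def _normalize(folder):
    # Windows paths are case-insensitive and may be quoted or carry a
    # trailing slash, so compare a cleaned-up, lowercase form.
    return folder.strip().strip('"').rstrip("\\/").lower()


def missing_cuda_dirs(path_value, cuda_root=CUDA_ROOT):
    """Return the CUDA folders that do not appear in a PATH string."""
    entries = {_normalize(p) for p in path_value.split(";") if p.strip()}
    expected = [cuda_root + r"\bin", cuda_root + r"\libnvvp"]
    return [folder for folder in expected if _normalize(folder) not in entries]


if __name__ == "__main__":
    missing = missing_cuda_dirs(os.environ.get("PATH", ""))
    if missing:
        print("Missing from PATH:")
        for folder in missing:
            print("  " + folder)
    else:
        print("Both CUDA folders are on PATH.")
```

Remember that a Command Prompt opened before you edited the variables still holds the old PATH, so run the check from a freshly opened one.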

cuDNN Libraries

The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. Yes, you need it!
As usual, we refer to the table: we want to install TensorFlow 2.3 and we already have CUDA 10.1, so we need cuDNN 7.6.
To download the correct version, go to the version archive, select the version you are interested in, choose the operating system, register (if you don’t already have an account) and download the .zip file. Once downloaded, extract it to any folder. You should see this folder structure:

└── cuda/     
    ├── bin/     
    │   └── cudnn64_7.dll     
    ├── include/     
    │   └── cudnn.h     
    ├── lib/     
    │   └── x64/     
    │       └── cudnn.lib     
    └── NVIDIA_SLA_cuDNN_Support.txt

Now, full attention. Find the folder on your local drive where you installed CUDA. By default it should be:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1

What you need to do now is very simple: all files in the bin folder of cuDNN must go into the bin folder of CUDA, all files in the include folder of cuDNN must go into the include folder of CUDA, and all files in the lib\x64 folder of cuDNN must go into the lib\x64 folder of CUDA.

cudnn64_7.dll -> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin
cudnn.h -> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include
cudnn.lib -> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib\x64

You can do this with a simple drag and drop. Clearly, Windows will ask you if you are sure of what you are doing. You will be sure.
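
If you would rather not drag and drop, the same copy can be sketched in a few lines of Python. This is only an illustration under the assumptions of this tutorial: the folder layout is the one shown above, and copy_cudnn is a hypothetical helper name of mine, not something shipped by NVIDIA. Writing into Program Files normally needs a prompt started as administrator.

```python
import shutil
from pathlib import Path

# Subfolders of the extracted cuDNN archive that have a matching folder
# inside the CUDA install directory (layout as shown in this tutorial).
SUBFOLDERS = ("bin", "include", "lib/x64")


def copy_cudnn(cudnn_root, cuda_root):
    """Copy every file from the cuDNN subfolders into the matching CUDA ones.

    Returns the list of destination paths that were written.
    """
    cudnn_root, cuda_root = Path(cudnn_root), Path(cuda_root)
    copied = []
    for sub in SUBFOLDERS:
        source_dir = cudnn_root / sub
        target_dir = cuda_root / sub
        if not source_dir.is_dir():
            raise FileNotFoundError(f"Expected cuDNN folder not found: {source_dir}")
        if not target_dir.is_dir():
            raise FileNotFoundError(f"Expected CUDA folder not found: {target_dir}")
        for item in sorted(source_dir.iterdir()):
            if item.is_file():
                shutil.copy2(item, target_dir / item.name)
                copied.append(target_dir / item.name)
    return copied


# Example call (the first path is hypothetical: use wherever you extracted
# the .zip file):
# copy_cudnn(r"C:\Users\fra\Downloads\cuda",
#            r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1")
```

The function refuses to run if either side is missing a folder, which catches the common mistake of pointing it one level too high or too low in the extracted archive.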

Anaconda

At this point it is good practice to install TensorFlow within a separate environment. There are many ways to manage virtual environments; my recommendation is Anaconda. If you already have Anaconda installed, skip the following part.
Anaconda allows us to comfortably manage (through a command line interface) several Python development environments, each with its own packages, dependencies and version of Python. Cool, isn’t it? Download Anaconda here.
Not sure if your machine is 32-bit or 64-bit? Open a cmd and type:

wmic os get osarchitecture

The output should be pretty obvious!
Once you have downloaded the executable, open it and follow the instructions to install Anaconda.
Just be careful not to install Anaconda in a folder whose path contains spaces, unicode characters or special characters (e.g. #, $); the default folder should be fine!

Click next!

At some point you should find this choice: “Add Anaconda3 to my PATH environment variable”. If you followed the whole tutorial you should know what this means. I’ll summarize anyway: adding Anaconda3 to your PATH means that you can invoke conda from any system shell, e.g. cmd. If you plan to use conda environments heavily, adding conda to your environment variables is a good idea. It makes it easier for other software, e.g. an integrated development environment, to access your Python environments.
In short: check this box only if you know what you are doing.

If everything went smoothly, you should find Anaconda Prompt among your programs. Open it and type:

conda --version

This will tell you the version of conda you have installed and confirm that everything went correctly.
Here and here you will find two short introductions to Anaconda.

Tensorflow, finally!

First step: let’s create a separate Python environment! Open an Anaconda Prompt; you can find it in the Windows menu. If you have added Anaconda3 to your environment variables, you can open a simple cmd instead. Type:

conda create -n tfgpu python=3.8

This command invokes conda with the create command. We pass the option -n, which stands for “name”, and specify a name, in this case “tfgpu”. You can choose any name that is easy to remember. Then we pass an argument, python=3.8, which indicates the version of Python we want to install. Refer to the table to choose the correct Python version for your version of TensorFlow.
Once the installation is complete, activate your new environment by typing:

C:\> conda activate tfgpu
(tfgpu) C:\>
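
Before running pip, it can be worth confirming that tfgpu really is the active environment, since installing into the wrong one is an easy mistake. Here is a small sketch of one way to check from Python. It relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated; the function name active_conda_env is just an illustrative helper.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if unset."""
    if environ is None:
        environ = os.environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
    # The interpreter path should point inside the environment's folder.
    print("Interpreter in use:", sys.executable)
```

If the first line does not print tfgpu, activate the environment again before installing anything.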

Install our version of tensorflow with pip by typing:

(tfgpu) C:\> pip install tensorflow==2.3

This will take some time depending on the speed of your internet connection.
Well, it looks like we’re done here. Now let’s test if everything is working properly.

Test it!

Let’s check that TensorFlow recognizes our graphics card. Open an Anaconda Prompt, activate the environment you created earlier and start Python:

C:\> conda activate tfgpu
(tfgpu) C:\> python
Python 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 22:54:47) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

This will open the python interpreter of the environment.

>>> import tensorflow as tf

When importing TensorFlow, the interpreter notifies us that the CUDA runtime library (cudart64_101.dll) has been opened correctly.

2021-02-20 09:37:45.561581: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll

Now let’s check if the graphics card of our computer is correctly detected.

>>> tf.config.experimental.list_physical_devices("GPU")

If everything went correctly you should receive output similar to this. In this case TensorFlow is telling me that it correctly detected my RTX 2060 and giving me some specs of the hardware.

2021-02-20 09:38:37.776650: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-02-20 09:38:37.808940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.83GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 312.97GiB/s
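
If you prefer a plain yes or no answer over reading the log, the same check can be wrapped in a tiny script. This is a minimal sketch: detect_gpus and summarize are names I made up, and the only TensorFlow call used is tf.config.list_physical_devices, the non-experimental spelling of the call above. The import sits inside the function so that a broken or missing installation produces a readable message instead of a traceback.

```python
def detect_gpus():
    """Return the names of the GPUs TensorFlow sees, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def summarize(gpu_names):
    """Turn the result of detect_gpus() into a one-line message."""
    if gpu_names is None:
        return "TensorFlow is not installed in this environment."
    if not gpu_names:
        return "TensorFlow imported, but no GPU was detected."
    return "Detected GPU(s): " + ", ".join(gpu_names)


if __name__ == "__main__":
    print(summarize(detect_gpus()))
```

An empty list usually means one of the earlier steps went wrong (CUDA version, cuDNN files or PATH), so that is where I would look first.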

Let’s create our first neural network and train it on our gpu.

# First, import TensorFlow and Keras.
import tensorflow as tf
from tensorflow import keras
# Load the Fashion MNIST dataset, which comes already split into train and test sets.
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
# Split the full training set into a validation set (first 5,000 images) and a training set.
# Scale the pixel intensities to the 0-1 range (as floats) by dividing by 255.
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.
# Class names, indexed by label (not needed for training, handy for inspecting predictions).
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
# Build the model: flatten each 28x28 image, then two hidden layers and a 10-class output.
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28, 28]))
model.add(keras.layers.Dense(300, activation="relu"))
model.add(keras.layers.Dense(100, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))
model.summary()
# Compile the model. The labels are integers, hence the sparse loss.
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
# Fit the model, validating on the held-out validation set after each epoch.
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))
# Evaluate the model on the test set.
model.evaluate(X_test, y_test)

Output:

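 Model: "sequential"
 _________________________________________________________________
 Layer (type)                 Output Shape              Param #   
 =================================================================
 flatten (Flatten)            (None, 784)               0         
 _________________________________________________________________
 dense (Dense)                (None, 300)               235500    
 _________________________________________________________________
 dense_1 (Dense)              (None, 100)               30100     
 _________________________________________________________________
 dense_2 (Dense)              (None, 10)                1010      
 =================================================================
 Total params: 266,610
 Trainable params: 266,610
 Non-trainable params: 0
 _________________________________________________________________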
 Epoch 1/30
 2021-02-20 10:09:11.579983: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.7357 - accuracy: 0.7597 - val_loss: 0.5125 - val_accuracy: 0.8258
 Epoch 2/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.4947 - accuracy: 0.8263 - val_loss: 0.4379 - val_accuracy: 0.8524
 Epoch 3/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.4474 - accuracy: 0.8437 - val_loss: 0.4279 - val_accuracy: 0.8528
 Epoch 4/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.4200 - accuracy: 0.8529 - val_loss: 0.4188 - val_accuracy: 0.8494
 Epoch 5/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.4008 - accuracy: 0.8576 - val_loss: 0.3922 - val_accuracy: 0.8660
 Epoch 6/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3826 - accuracy: 0.8652 - val_loss: 0.3709 - val_accuracy: 0.8726
 Epoch 7/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3702 - accuracy: 0.8684 - val_loss: 0.3711 - val_accuracy: 0.8730
 Epoch 8/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3580 - accuracy: 0.8728 - val_loss: 0.3653 - val_accuracy: 0.8710
 Epoch 9/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3480 - accuracy: 0.8756 - val_loss: 0.3861 - val_accuracy: 0.8648
 Epoch 10/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3392 - accuracy: 0.8786 - val_loss: 0.3437 - val_accuracy: 0.8780
 Epoch 11/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3299 - accuracy: 0.8809 - val_loss: 0.3416 - val_accuracy: 0.8758
 Epoch 12/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3225 - accuracy: 0.8841 - val_loss: 0.3323 - val_accuracy: 0.8778
 Epoch 13/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3156 - accuracy: 0.8870 - val_loss: 0.3462 - val_accuracy: 0.8766
 Epoch 14/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3078 - accuracy: 0.8900 - val_loss: 0.3371 - val_accuracy: 0.8792
 Epoch 15/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.3021 - accuracy: 0.8910 - val_loss: 0.3322 - val_accuracy: 0.8806
 Epoch 16/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2956 - accuracy: 0.8927 - val_loss: 0.3238 - val_accuracy: 0.8794
 Epoch 17/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2887 - accuracy: 0.8950 - val_loss: 0.3286 - val_accuracy: 0.8810
 Epoch 18/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2836 - accuracy: 0.8967 - val_loss: 0.3177 - val_accuracy: 0.8852
 Epoch 19/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2777 - accuracy: 0.8994 - val_loss: 0.3619 - val_accuracy: 0.8706
 Epoch 20/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2735 - accuracy: 0.9000 - val_loss: 0.3182 - val_accuracy: 0.8846
 Epoch 21/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2681 - accuracy: 0.9037 - val_loss: 0.2998 - val_accuracy: 0.8896
 Epoch 22/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2631 - accuracy: 0.9045 - val_loss: 0.3101 - val_accuracy: 0.8904
 Epoch 23/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2586 - accuracy: 0.9067 - val_loss: 0.3218 - val_accuracy: 0.8802
 Epoch 24/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2544 - accuracy: 0.9081 - val_loss: 0.2951 - val_accuracy: 0.8918
 Epoch 25/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2495 - accuracy: 0.9101 - val_loss: 0.3156 - val_accuracy: 0.8862
 Epoch 26/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2458 - accuracy: 0.9110 - val_loss: 0.3102 - val_accuracy: 0.8874
 Epoch 27/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2426 - accuracy: 0.9126 - val_loss: 0.3018 - val_accuracy: 0.8906
 Epoch 28/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2373 - accuracy: 0.9138 - val_loss: 0.2941 - val_accuracy: 0.8940
 Epoch 29/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2337 - accuracy: 0.9158 - val_loss: 0.3036 - val_accuracy: 0.8884
 Epoch 30/30
 1719/1719 [==============================] - 3s 2ms/step - loss: 0.2311 - accuracy: 0.9161 - val_loss: 0.2928 - val_accuracy: 0.8924
 313/313 [==============================] - 0s 1ms/step - loss: 0.3239 - accuracy: 0.8843

Thank you for reading this far. All suggestions and feedback are welcome; please write to me.

Source(s):
Post image: Ariel Davis
Table: Tensorflow