
GPU assisted Machine Learning: Benchmark

A recent project at work, involving binary classification with a Keras LSTM layer of 1000 nodes, took almost an hour to run and prompted my effort to speed up this type of problem. In my previous post, I explained the hardware and software configuration I'm using for this benchmark. Now I'm going to run the same training exercise with and without the GPU and compare the runtimes.

The Data

The dataset I'm using contains about 1,300 cases. Each case has 100 data points in the form of a time series, and based on these data points we need to make a binary classification.
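
As a rough sketch of what that data layout looks like (the array names and the single-feature-per-time-step assumption are mine, not from the original project), the LSTM input ends up as a 3D array of shape (samples, timesteps, features):

import numpy as np

# Hypothetical placeholder data: ~1300 cases, each a 100-step series with a binary label
n_cases, n_timesteps, n_features = 1300, 100, 1
X = np.random.rand(n_cases, n_timesteps, n_features)
y = np.random.randint(0, 2, size=(n_cases,))

# Keras LSTM layers expect input shaped (samples, timesteps, features),
# which is what trainX.shape[1] and trainX.shape[2] refer to in the model below
print(X.shape, y.shape)  # (1300, 100, 1) (1300,)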

The Model

For the sake of simplicity, I'm using a single LSTM layer with 1000 nodes and feeding its output to one sigmoid unit for the binary classification:

from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.optimizers import Adam

def largeLayer_model():
    # create model: one LSTM layer feeding a single sigmoid output for binary classification
    model = Sequential()
    model.add(LSTM(1000, return_sequences=False, input_shape=(trainX.shape[1], trainX.shape[2])))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])
    return model

Training is done with 100 epochs and a batch size of 100:

import time

# time the full training run
start = time.time()
model = largeLayer_model()
summary = model.fit(trainX, trainY, epochs=100, batch_size=100, verbose=1)
end = time.time()
print("Elapsed {} sec".format(end - start))

With GPU

Since my machine was already configured to use the GPU, the first run shows the execution times with GPU acceleration:

Epoch 1/100
938/938 [==============================] - 3s 3ms/step - loss: 0.6386 - acc: 0.6365
Epoch 2/100
938/938 [==============================] - 2s 2ms/step - loss: 0.6292 - acc: 0.6748
Epoch 3/100
938/938 [==============================] - 1s 2ms/step - loss: 0.6215 - acc: 0.6780
.
.
.
Epoch 98/100
938/938 [==============================] - 1s 2ms/step - loss: 0.5791 - acc: 0.7015
Epoch 99/100
938/938 [==============================] - 1s 2ms/step - loss: 0.5719 - acc: 0.6919
Epoch 100/100
938/938 [==============================] - 1s 2ms/step - loss: 0.5733 - acc: 0.6791
Elapsed 150.32436347007751 sec

CPU only

To run the same test without GPU assistance, I uninstalled tensorflow-gpu and installed the CPU-only version:

pip3 uninstall tensorflow-gpu
pip3 install tensorflow
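
Before timing a run, a quick way to confirm which TensorFlow build is actually active is to list the devices it can see (this is just a sanity check of mine, not part of the benchmark code):

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
# With tensorflow-gpu installed the list should include a GPU device;
# with the CPU-only build it will only show the CPU
print([d.name for d in device_lib.list_local_devices()])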

As expected, execution times are much higher:

Epoch 1/100
938/938 [==============================] - 25s 27ms/step - loss: 0.6370 - acc: 0.6546
Epoch 2/100
938/938 [==============================] - 25s 26ms/step - loss: 0.6141 - acc: 0.6684
Epoch 3/100
938/938 [==============================] - 24s 26ms/step - loss: 0.6063 - acc: 0.6834
.
.
.
Epoch 98/100
938/938 [==============================] - 24s 26ms/step - loss: 0.5743 - acc: 0.6940
Epoch 99/100
938/938 [==============================] - 24s 26ms/step - loss: 0.5627 - acc: 0.7058
Epoch 100/100
938/938 [==============================] - 24s 26ms/step - loss: 0.5703 - acc: 0.7004
Elapsed 2449.41867685318 sec

Conclusion

With this specific software and hardware configuration, GPU-assisted training was about 16x faster than the CPU-only option (roughly 150 seconds vs. 2,449 seconds). This GPU upgrade (a GeForce GTX 1070 Ti and a 650W power supply) cost me around $550, which was well worth it.
