For more projects visit: https://setscholars.net
# Suppress warnings in Jupyter Notebooks
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
In this notebook, we will learn how to build a movie review sentiment classification model with Keras in Python, using the IMDB dataset.
import numpy
from tensorflow.keras.datasets import imdb
# load the dataset
(X_train, y_train), (X_test, y_test) = imdb.load_data()
X = numpy.concatenate((X_train, X_test), axis=0)
y = numpy.concatenate((y_train, y_test), axis=0)
# summarize size
print(); print("Shape of data: ")
print(X.shape)
print(y.shape)
# Summarize number of classes
print(); print("Classes: ")
print(numpy.unique(y))
# Summarize number of words
print(); print("Number of words: ")
print(len(numpy.unique(numpy.hstack(X))))
# Summarize review length
print(); print("Review length: ")
result = [len(x) for x in X]
print("Mean %.2f words (%f)" % (numpy.mean(result), numpy.std(result)))
# plot review length
print()
plt.figure(figsize=(12,8))
plt.boxplot(result)
plt.show()
Shape of data:
(50000,)
(50000,)

Classes:
[0 1]

Number of words:
88585

Review length:
Mean 234.76 words (172.911495)
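Each review in the dataset is a list of integer word indices, not raw text. To inspect a review you map the indices back to words. The real mapping comes from `imdb.get_word_index()`; the sketch below uses a tiny hypothetical word index so it runs without downloading anything, but the offset logic matches how Keras encodes the dataset.

```python
# Sketch: decoding an integer-encoded review back to words.
# `toy_word_index` is a stand-in for the dict returned by imdb.get_word_index().
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Keras reserves indices 0-2 for padding/start/unknown, so the dataset's
# indices are the word-index values shifted up by 3.
INDEX_FROM = 3
id_to_word = {i + INDEX_FROM: w for w, i in toy_word_index.items()}
id_to_word[0], id_to_word[1], id_to_word[2] = "<PAD>", "<START>", "<UNK>"

def decode_review(encoded):
    # Unknown indices fall back to the <UNK> token.
    return " ".join(id_to_word.get(i, "<UNK>") for i in encoded)

print(decode_review([1, 4, 5, 6, 7]))  # <START> the movie was great
```

With the real word index, the same `decode_review` works on any element of `X_train`.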
import warnings
warnings.filterwarnings("ignore")
# MLP for the IMDB problem
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
# load the dataset but only keep the top n words, zero the rest
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)
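`pad_sequences` makes every review exactly `max_words` long. By default it pads on the left with zeros and, when a review is too long, keeps the last `maxlen` items. The pure-Python sketch below illustrates those default semantics (it is not the Keras implementation):

```python
def pad_pre(seq, maxlen, value=0):
    # Mimics Keras pad_sequences defaults: left-pad with `value`;
    # when the sequence is too long, keep the LAST `maxlen` items.
    if len(seq) >= maxlen:
        return seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

print(pad_pre([1, 2, 3], 5))           # [0, 0, 1, 2, 3]
print(pad_pre([1, 2, 3, 4, 5, 6], 5))  # [2, 3, 4, 5, 6]
```

A fixed length is required because the `Flatten`/`Dense` layers that follow the embedding expect a fixed input shape.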
# create the model
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=128, verbose=1)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print(); print("Accuracy: %.2f%%" % (scores[1]*100))
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_2 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
flatten_2 (Flatten)          (None, 16000)             0
_________________________________________________________________
dense_4 (Dense)              (None, 250)               4000250
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 251
=================================================================
Total params: 4,160,501
Trainable params: 4,160,501
Non-trainable params: 0
_________________________________________________________________
Epoch 1/20
196/196 [==============================] - 3s 15ms/step - loss: 0.6634 - accuracy: 0.5695 - val_loss: 0.3117 - val_accuracy: 0.8634
Epoch 2/20
196/196 [==============================] - 3s 15ms/step - loss: 0.1998 - accuracy: 0.9257 - val_loss: 0.3067 - val_accuracy: 0.8725
Epoch 3/20
196/196 [==============================] - 3s 15ms/step - loss: 0.0595 - accuracy: 0.9853 - val_loss: 0.3975 - val_accuracy: 0.8654
Epoch 4/20
196/196 [==============================] - 3s 15ms/step - loss: 0.0123 - accuracy: 0.9986 - val_loss: 0.4975 - val_accuracy: 0.8636
Epoch 5/20
196/196 [==============================] - 3s 15ms/step - loss: 0.0021 - accuracy: 0.9999 - val_loss: 0.5334 - val_accuracy: 0.8664
Epoch 6/20
196/196 [==============================] - 3s 15ms/step - loss: 8.3947e-04 - accuracy: 1.0000 - val_loss: 0.5613 - val_accuracy: 0.8673
Epoch 7/20
196/196 [==============================] - 3s 15ms/step - loss: 4.8891e-04 - accuracy: 1.0000 - val_loss: 0.5835 - val_accuracy: 0.8678
Epoch 8/20
196/196 [==============================] - 3s 15ms/step - loss: 3.4013e-04 - accuracy: 1.0000 - val_loss: 0.6030 - val_accuracy: 0.8684
Epoch 9/20
196/196 [==============================] - 3s 15ms/step - loss: 2.4743e-04 - accuracy: 1.0000 - val_loss: 0.6194 - val_accuracy: 0.8684
Epoch 10/20
196/196 [==============================] - 3s 15ms/step - loss: 1.8089e-04 - accuracy: 1.0000 - val_loss: 0.6346 - val_accuracy: 0.8686
Epoch 11/20
196/196 [==============================] - 3s 15ms/step - loss: 1.4279e-04 - accuracy: 1.0000 - val_loss: 0.6487 - val_accuracy: 0.8684
Epoch 12/20
196/196 [==============================] - 3s 15ms/step - loss: 1.1416e-04 - accuracy: 1.0000 - val_loss: 0.6615 - val_accuracy: 0.8692
Epoch 13/20
196/196 [==============================] - 3s 15ms/step - loss: 9.4577e-05 - accuracy: 1.0000 - val_loss: 0.6736 - val_accuracy: 0.8690
Epoch 14/20
196/196 [==============================] - 3s 15ms/step - loss: 7.5429e-05 - accuracy: 1.0000 - val_loss: 0.6852 - val_accuracy: 0.8688
Epoch 15/20
196/196 [==============================] - 3s 15ms/step - loss: 6.1079e-05 - accuracy: 1.0000 - val_loss: 0.6962 - val_accuracy: 0.8693
Epoch 16/20
196/196 [==============================] - 3s 15ms/step - loss: 5.0705e-05 - accuracy: 1.0000 - val_loss: 0.7067 - val_accuracy: 0.8690
Epoch 17/20
196/196 [==============================] - 3s 15ms/step - loss: 4.1892e-05 - accuracy: 1.0000 - val_loss: 0.7169 - val_accuracy: 0.8692
Epoch 18/20
196/196 [==============================] - 3s 15ms/step - loss: 3.7391e-05 - accuracy: 1.0000 - val_loss: 0.7266 - val_accuracy: 0.8694
Epoch 19/20
196/196 [==============================] - 3s 15ms/step - loss: 3.1609e-05 - accuracy: 1.0000 - val_loss: 0.7361 - val_accuracy: 0.8694
Epoch 20/20
196/196 [==============================] - 3s 15ms/step - loss: 2.7291e-05 - accuracy: 1.0000 - val_loss: 0.7454 - val_accuracy: 0.8688

Accuracy: 86.88%
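After training, `model.predict` on a batch of padded reviews returns a column of sigmoid probabilities, one per review; you threshold them at 0.5 to get class labels. The sketch below uses hypothetical probability values in place of a trained model's output, so the thresholding step can be shown without retraining:

```python
import numpy as np

# Hypothetical stand-in for `model.predict(X_test)`, which would return
# an (n, 1) array of sigmoid probabilities in [0, 1].
probs = np.array([[0.91], [0.08], [0.55]])

# Probability > 0.5 -> positive review (1), otherwise negative (0).
labels = (probs > 0.5).astype(int).ravel()
print(labels)  # [1 0 1]
```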
import warnings
warnings.filterwarnings("ignore")
# CNN for the IMDB problem
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Conv1D
from tensorflow.keras.layers import MaxPooling1D
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
# load the dataset but only keep the top n words, zero the rest
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
# pad dataset to a maximum review length in words
max_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_words)
X_test = sequence.pad_sequences(X_test, maxlen=max_words)
# create the model
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_words))
model.add(Conv1D(32, 3, padding='same', activation='relu'))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20, batch_size=128, verbose=1)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print(); print("Accuracy: %.2f%%" % (scores[1]*100))
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_3 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 500, 32)           3104
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 250, 32)           0
_________________________________________________________________
flatten_3 (Flatten)          (None, 8000)              0
_________________________________________________________________
dense_6 (Dense)              (None, 250)               2000250
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 251
=================================================================
Total params: 2,163,605
Trainable params: 2,163,605
Non-trainable params: 0
_________________________________________________________________
Epoch 1/20
196/196 [==============================] - 4s 20ms/step - loss: 0.5804 - accuracy: 0.6487 - val_loss: 0.2748 - val_accuracy: 0.8864
Epoch 2/20
196/196 [==============================] - 4s 20ms/step - loss: 0.2074 - accuracy: 0.9182 - val_loss: 0.2750 - val_accuracy: 0.8858
Epoch 3/20
196/196 [==============================] - 4s 20ms/step - loss: 0.1379 - accuracy: 0.9504 - val_loss: 0.3129 - val_accuracy: 0.8750
Epoch 4/20
196/196 [==============================] - 4s 20ms/step - loss: 0.0871 - accuracy: 0.9745 - val_loss: 0.3592 - val_accuracy: 0.8744
Epoch 5/20
196/196 [==============================] - 4s 19ms/step - loss: 0.0347 - accuracy: 0.9931 - val_loss: 0.4494 - val_accuracy: 0.8712
Epoch 6/20
196/196 [==============================] - 4s 20ms/step - loss: 0.0110 - accuracy: 0.9988 - val_loss: 0.5231 - val_accuracy: 0.8698
Epoch 7/20
196/196 [==============================] - 4s 20ms/step - loss: 0.0035 - accuracy: 0.9996 - val_loss: 0.6090 - val_accuracy: 0.8704
Epoch 8/20
196/196 [==============================] - 4s 19ms/step - loss: 0.0013 - accuracy: 0.9999 - val_loss: 0.6553 - val_accuracy: 0.8714
Epoch 9/20
196/196 [==============================] - 4s 20ms/step - loss: 9.1476e-04 - accuracy: 0.9999 - val_loss: 0.7078 - val_accuracy: 0.8704
Epoch 10/20
196/196 [==============================] - 4s 20ms/step - loss: 3.3205e-04 - accuracy: 1.0000 - val_loss: 0.7711 - val_accuracy: 0.8712
Epoch 11/20
196/196 [==============================] - 4s 20ms/step - loss: 1.7422e-04 - accuracy: 1.0000 - val_loss: 0.8401 - val_accuracy: 0.8711
Epoch 12/20
196/196 [==============================] - 4s 20ms/step - loss: 8.6259e-05 - accuracy: 1.0000 - val_loss: 0.8983 - val_accuracy: 0.8714
Epoch 13/20
196/196 [==============================] - 4s 20ms/step - loss: 5.0128e-05 - accuracy: 1.0000 - val_loss: 0.9536 - val_accuracy: 0.8712
Epoch 14/20
196/196 [==============================] - 4s 20ms/step - loss: 3.2545e-05 - accuracy: 1.0000 - val_loss: 1.0000 - val_accuracy: 0.8714
Epoch 15/20
196/196 [==============================] - 4s 19ms/step - loss: 2.0141e-05 - accuracy: 1.0000 - val_loss: 1.0384 - val_accuracy: 0.8711
Epoch 16/20
196/196 [==============================] - 4s 19ms/step - loss: 1.4380e-05 - accuracy: 1.0000 - val_loss: 1.0722 - val_accuracy: 0.8710
Epoch 17/20
196/196 [==============================] - 4s 20ms/step - loss: 9.9285e-06 - accuracy: 1.0000 - val_loss: 1.1026 - val_accuracy: 0.8710
Epoch 18/20
196/196 [==============================] - 4s 20ms/step - loss: 8.0898e-06 - accuracy: 1.0000 - val_loss: 1.1288 - val_accuracy: 0.8710
Epoch 19/20
196/196 [==============================] - 4s 20ms/step - loss: 6.2716e-06 - accuracy: 1.0000 - val_loss: 1.1539 - val_accuracy: 0.8711
Epoch 20/20
196/196 [==============================] - 4s 20ms/step - loss: 4.8016e-06 - accuracy: 1.0000 - val_loss: 1.1774 - val_accuracy: 0.8711

Accuracy: 87.11%
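In both training logs, validation loss bottoms out after the first epoch or two and then climbs while training accuracy reaches 1.0000, a classic sign of overfitting. In Keras this is usually addressed by passing an `EarlyStopping(monitor='val_loss', patience=...)` callback to `model.fit`. The pure-Python sketch below illustrates the patience logic that callback applies, using a hypothetical sequence of validation losses so it runs without a model:

```python
def should_stop(val_losses, patience=2):
    # Returns True once val loss has failed to improve on its best value
    # for `patience` consecutive epochs -- the idea behind Keras's
    # EarlyStopping(monitor='val_loss', patience=...) callback.
    best, wait = float("inf"), 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return True
    return False

# Hypothetical val losses shaped like the CNN run above: improvement
# stalls after epoch 2, so with patience=2 training would stop at epoch 4.
print(should_stop([0.31, 0.27, 0.31, 0.36, 0.45]))  # True
print(should_stop([0.5, 0.4, 0.3, 0.2]))            # False
```

With the callback, the model keeps the weights from around the best validation epoch instead of the badly overfit epoch 20.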
In this coding recipe, we discussed how to build movie review sentiment classification models using Keras in Python.
Specifically, we learned how to:
- load and summarize the IMDB movie review dataset;
- pad the integer-encoded reviews to a fixed length;
- build, train, and evaluate a multilayer perceptron model with a word embedding layer;
- build, train, and evaluate a one-dimensional convolutional neural network for the same task.