Yet another creative way to use the Google Quick Draw dataset in chatbot development with TensorFlow [CNN image classifier tutorial].

Google's Quick Draw dataset is a massive crowdsourced dataset. More than 15 million people have already contributed thousands of tiny sketches for each of around 345 items. That's a lot of data. Applications of this dataset reach further than we might think. I can think of these: accessibility/UI, education, games/toys/VR, communication, art, pattern recognition, and people- or region-based clustering for social research. But I believe it won't stop there. What we're going to do is create a quick demo of a chatbot-based game for kids. The chatbot will ask the user to draw something and will use machine learning to identify that item. Because it's a game, we need to reward the user. Of course, the bot will score the user's drawing.

Model selection: #

Google uses an RNN-based convolutional neural network to identify the category of an image. The RNN part is useful for learning from the additional data that is collected: the continuous drawing path, a sequence of strokes made of points separated in time. This strongly influences accuracy, since they are trying to classify objects into any of 345 classes. In this document, they report a good accuracy of 70% with this model.

In our experiment, we're only interested in a small number of classes. If we use a simpler model that is just enough for our requirement of 10 classes, we can train it on relatively little data (10,000 examples per category, which are joined, shuffled and then split into train (70%), test (20%) and validation (10%) sets) on a basic machine and still reach an accuracy of 92%. A small model also helps when we host it for prediction. The demo that we build below, including the model and the bot logic, is hosted on Heroku's free dyno (512 MB RAM with 1x CPU share).

Preparing data for our experiment: #

Google made this data available to the public in four formats. That's really great! It saves us a lot of effort. We will be using the NumPy bitmap files (.npy).

mkdir quickdraw_data; 
cd quickdraw_data; 
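For orientation, each .npy bitmap file holds the drawings of one class, with every row a flattened 28x28 grayscale image (784 values from 0 to 255). Below is a minimal sketch of turning one row back into an image. It uses a random array as a stand-in for a downloaded file, so the file name in the comment and all variable names are illustrative assumptions. Scaling to [0, 1] is a common optional step that the article itself does not show.

```python
import numpy as np

# Stand-in for np.load('cat.npy') (hypothetical file name):
# 5 fake drawings, each a row of 784 uint8 pixel values.
rng = np.random.default_rng(0)
fake_bitmaps = rng.integers(0, 256, size=(5, 784), dtype=np.uint8)

# Reshape one flattened row back into a 28x28 image.
first_image = fake_bitmaps[0].reshape(28, 28)

# Optionally scale pixel values to floats in [0, 1].
first_image_scaled = first_image.astype(np.float32) / 255.0

print(first_image.shape, first_image_scaled.dtype)
```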

Thus we can skip the data cleaning step and jump right into data formatting and splitting. Below is the Python code to set up the data for training and testing.

from os import listdir
import math
import numpy as np

data_dir = './quickdraw_data/'
data_limit_per_label = 10000
npdata = None
ind2labels = []
labels2ind = {}
nplabels = None

# fill data: one .npy file per label
for file_name in listdir(data_dir):
  # slice before copying so only the rows we keep are read into memory
  npdata_t = np.load(data_dir+file_name, mmap_mode='r')[:data_limit_per_label,:].copy()
  if npdata is None:
    npdata = npdata_t
  else:
    npdata = np.concatenate((npdata, npdata_t))
  label = file_name.split('.')[0]
  # register the label so that its index is its position in ind2labels
  ind2labels.append(label)
  labels2ind[label] = len(ind2labels)-1
  nplabel_t = np.full((npdata_t.shape[0],1), labels2ind[label])
  if nplabels is None:
    nplabels = nplabel_t
  else:
    nplabels = np.concatenate((nplabels, nplabel_t))

# shuffle data: join images and labels column-wise so each row keeps its label
data_ = np.c_[npdata.reshape(len(npdata), -1), nplabels.reshape(len(nplabels), -1)]
np.random.shuffle(data_)  # shuffles the rows in place and returns None
doodle_images = data_[:, :npdata.size//len(npdata)].reshape(npdata.shape)
doodle_labels = data_[:, npdata.size//len(npdata):].reshape(nplabels.shape)

# split data
data_len = npdata.shape[0]
train_data_len = int(math.floor(data_len*0.7))
eval_data_len = int(math.floor(data_len*0.2))
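The listing stops after computing the split sizes. Here is a minimal sketch of the slicing that could follow for the 70/20/10 split described earlier. It uses small stand-in arrays instead of the real doodles, and the names `train_images`, `eval_images`, `valid_images` and their label counterparts are assumptions, not names from the original code.

```python
import math
import numpy as np

# Stand-in for the shuffled data: 100 "images" of 784 pixels and their labels.
doodle_images = np.zeros((100, 784), dtype=np.uint8)
doodle_labels = np.arange(100).reshape(100, 1) % 10

data_len = doodle_images.shape[0]
train_data_len = int(math.floor(data_len * 0.7))
eval_data_len = int(math.floor(data_len * 0.2))

# First 70% for training, next 20% for testing, the remainder for validation.
train_end = train_data_len
eval_end = train_data_len + eval_data_len
train_images, train_labels = doodle_images[:train_end], doodle_labels[:train_end]
eval_images, eval_labels = doodle_images[train_end:eval_end], doodle_labels[train_end:eval_end]
valid_images, valid_labels = doodle_images[eval_end:], doodle_labels[eval_end:]

print(len(train_images), len(eval_images), len(valid_images))
```

Because the rows were shuffled before slicing, each split should contain a mix of all classes.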

Building the model in Tensorflow: #

We are going to build a vanilla convolutional neural network in Tensorflow. Here is the overall architecture of that model.

||=> Input layer: dim: [-1, 28, 28, 1]
||=> Conv layer: dim: [5, 5]
||=> Activation: Relu
||=> Max pooling: dim: [2, 2], strides: 2
||=> Conv layer: dim: [5, 5]
||=> Activation: Relu
||=> Max pooling: dim: [2, 2], strides: 2
||=> Dense layer: dim: [-1, 7 * 7 * 64] X 1024
||=> Activation: Relu
||=> Output layer: dim: [-1, 10]

Learning rate: 0.001  |  Dropout rate: 0.4

Here, -1 stands for the batch size of the input to that layer.
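To see where the 7 * 7 * 64 in the dense layer comes from, the spatial sizes can be traced by hand. The sketch below assumes "same" padding with stride 1 in both convolutions (which is what the 7 x 7 size implies) and 64 filters in the second convolution. The helper names are made up for this illustration.

```python
def conv_same_output(size):
    # With 'same' padding and stride 1, a convolution keeps the spatial size.
    return size

def max_pool_output(size, pool=2, stride=2):
    # Max pooling without padding: floor((size - pool) / stride) + 1.
    return (size - pool) // stride + 1

# Follow one side of the 28x28 input through both conv + pool stages.
after_pool1 = max_pool_output(conv_same_output(28))
after_pool2 = max_pool_output(conv_same_output(after_pool1))

conv2_filters = 64  # taken from the dense layer's 7 * 7 * 64 input
flat_features = after_pool2 * after_pool2 * conv2_filters

print(after_pool1, after_pool2, flat_features)  # 14 7 3136
```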

Below is the TensorFlow implementation of our model (this is a common CNN model; you can find a great explanation here):

import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.estimator)

# INPUT_TENSOR_NAME (the key of the image tensor in `features`) is not
# defined in this listing and must be set before use.

def model_fn(features, labels, mode):
  # Input Layer
  input_layer = tf.reshape(features[INPUT_TENSOR_NAME], [-1,28,28,1], name='input_layer')
  # Convolutional Layer #1
  # filters=32 is an assumption (the value used in the standard TensorFlow
  # MNIST CNN example); padding="same" keeps the 28x28 size.
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer #1
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2 and Pooling Layer #2
  # 64 filters and "same" padding give the 7 * 7 * 64 shape flattened below.
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Dense Layer
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Logits Layer
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1, name="out_classes"),
      # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
      # `logging_hook`.
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Calculate Loss (for both TRAIN and EVAL modes)
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # Configure the Training Op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Add evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

Chatbot design: #

The next important step is the design of the presentation layer. In our case it's a chatbot. Microsoft Bot Framework is a quick bot development tool that we can use. With a very basic dialog flow, we can get started by asking the user to draw some item. The item is kept in the bot's memory. The user is shown an example image to clarify which object the bot is asking for.

bot question

We can use a Messenger webview to provide a drawing interface to the user.


Whenever a user submits a drawing, the machine learning model is loaded and makes a prediction. The prediction is a probability distribution over the 10 classes that we're interested in. From that, we find the position of our item (the one the bot asked the user to draw) among the predicted classes and present a score.
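The article does not spell out the exact scoring rule, so the following is only one plausible sketch: rank the classes by predicted probability and subtract the rank of the requested item from a maximum score. The function name `score_drawing`, the label list and the probabilities are all made up for illustration.

```python
def score_drawing(probabilities, labels, target_label, max_score=10):
    """Score a drawing by the rank of the requested item among the predictions."""
    # Sort class indices from most to least probable.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    # Position 0 means the model's top guess is the requested item.
    position = ranked.index(labels.index(target_label))
    return max_score - position

# Hypothetical labels and model output for a drawing that looks like a cat.
labels = ["apple", "banana", "bicycle", "car", "cat",
          "clock", "fish", "house", "star", "tree"]
probabilities = [0.05, 0.02, 0.10, 0.03, 0.60, 0.04, 0.06, 0.03, 0.05, 0.02]

print(score_drawing(probabilities, labels, "cat"))      # 10
print(score_drawing(probabilities, labels, "bicycle"))  # 9
```

With 10 classes and `max_score=10`, the score runs from 10 for a top-ranked match down to 1 when the requested item is the least likely class.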

final score

Hosting the webapp: #

The entire webapp is written in Python with the Flask library. It's hosted on Heroku's free dyno. Yeah! It's that lightweight.

That's all. The demo bot is located here. The source code is located here. Thanks.

About Author

Jubin Jose

Chatbot Developer at Cedex Technologies LLP | Technology: C, Node JS Hobby: beginner ML & AI + creative coding + Youtuber.
