Have you ever wanted to keep track of model training metrics, for you or your organization? By using Keras’ Custom Callback functionality, you can collect data on every single model training run and glean valuable metrics from the data. Here’s how:
The AI model training process can be laborious, and as you iterate across different model configurations, layer combinations, and hyperparameters, the one perfect combination of factors you're looking for can get lost. If we log all of our model training runs to a database, we can easily put our finger on the run we want to promote to production, or at least return to for further iteration. In addition, we can compare metrics across different models, architectures, datasets, teams, and more.
One thing to keep in mind with this article is that although I am showing you how to track some obvious values like number of epochs, training duration, accuracy, and loss, you can track whatever you think is valuable to yourself or your organization. This is simply a basic end-to-end example of tracking model training metrics, and can be used as inspiration or a jumping off point.
I personally love visualizing data, so I’m going to take the opportunity to show you how to use Bokeh to visualize these metrics, but of course you can use whatever you choose to further analyze this data!
In this article I will show you how to:
- Write a custom callback to be used in any Keras model
- Write key model training metrics to a database from a callback
- Run a few different model training examples (with varying parameters) to populate the database
- Visualize and analyze the data
The Keras framework is one of the most widely used deep learning frameworks available in the open source community. Keras has a cool feature called a Custom Callback that allows you to write functionality at certain points in the model training process.
Example of the Custom Callback class from keras.io:
from tensorflow import keras

class CustomCallback(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        keys = list(logs.keys())
        print("Starting training; got log keys: {}".format(keys))

    def on_train_end(self, logs=None):
        keys = list(logs.keys())
        print("Stop training; got log keys: {}".format(keys))

    def on_epoch_begin(self, epoch, logs=None):
        keys = list(logs.keys())
        print("Start epoch {} of training; got log keys: {}".format(epoch, keys))

    def on_epoch_end(self, epoch, logs=None):
        keys = list(logs.keys())
        print("End epoch {} of training; got log keys: {}".format(epoch, keys))

    def on_test_begin(self, logs=None):
        keys = list(logs.keys())
        print("Start testing; got log keys: {}".format(keys))

    def on_test_end(self, logs=None):
        keys = list(logs.keys())
        print("Stop testing; got log keys: {}".format(keys))

    def on_predict_begin(self, logs=None):
        keys = list(logs.keys())
        print("Start predicting; got log keys: {}".format(keys))

    def on_predict_end(self, logs=None):
        keys = list(logs.keys())
        print("Stop predicting; got log keys: {}".format(keys))

    def on_train_batch_begin(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Training: start of batch {}; got log keys: {}".format(batch, keys))

    def on_train_batch_end(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Training: end of batch {}; got log keys: {}".format(batch, keys))

    def on_test_batch_begin(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Evaluating: start of batch {}; got log keys: {}".format(batch, keys))

    def on_test_batch_end(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Evaluating: end of batch {}; got log keys: {}".format(batch, keys))

    def on_predict_batch_begin(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Predicting: start of batch {}; got log keys: {}".format(batch, keys))

    def on_predict_batch_end(self, batch, logs=None):
        keys = list(logs.keys())
        print("...Predicting: end of batch {}; got log keys: {}".format(batch, keys))
Then actually calling it from a model:
model = get_model()

model.fit(
    x_train,
    y_train,
    batch_size=128,
    epochs=1,
    verbose=0,
    validation_split=0.5,
    callbacks=[CustomCallback()],
)

res = model.evaluate(
    x_test, y_test, batch_size=128, verbose=0, callbacks=[CustomCallback()]
)

res = model.predict(x_test, batch_size=128, callbacks=[CustomCallback()])
That's it! Now you can simply write whatever code you want into the methods of the custom callback, and your code will be executed when you run the applicable action on your model.
Depending on how complex your callback gets, you may need to pass values from your model into your callback, which means extra setup steps for anyone else using it if you deploy this across a larger organization.
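For example, here is a minimal sketch of that pattern: a callback that receives extra objects through its constructor. The run_label parameter is just a hypothetical illustration, not part of the metrics callback we build later.

import tensorflow.keras as keras

class LabeledCallback(keras.callbacks.Callback):
    # Hypothetical example: any extra object the callback needs (a label,
    # a database cursor, a run ID) gets passed in when it is constructed.
    def __init__(self, run_label):
        super().__init__()
        self.run_label = run_label

    def on_epoch_end(self, epoch, logs=None):
        print(f"[{self.run_label}] epoch {epoch}: {logs}")

# Anyone using the callback now has one more thing to set up:
# model.fit(x_train, y_train, callbacks=[LabeledCallback("experiment-1")])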
As the documentation says, you can write functionality into the starts and ends of actions for training, epochs, batches, fitting, evaluating, and predicting. In this example we will use the start and end of training, and the end of each epoch during training.
Let’s do this.
Before we start with the callback, we need to set up the database. Even if you already have a database, you will still need to create a table for this new data. For this example I set up a simple Postgres database on my Mac. To create the local database itself, I did the following:
Install Postgres with homebrew:
brew install postgresql@14
Create a user for myself and a database called MT_METRICS
(Please be careful with your database credentials when doing this in the real world)
psql postgres
CREATE ROLE jonah WITH LOGIN PASSWORD 'getting started';
CREATE DATABASE MT_METRICS;
GRANT ALL PRIVILEGES ON DATABASE MT_METRICS TO jonah;
\c mt_metrics jonah
Now to create the tables. In this case I created two tables: one for models (`models`) to hold some information about each model, and one for training runs (`training`) to keep track of each training run. The `training` table uses `model_id` as a foreign key.
CREATE TABLE models (
    model_id serial PRIMARY KEY,
    model_name VARCHAR ( 50 ) UNIQUE NOT NULL,
    created_date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    model_type VARCHAR ( 50 ) NOT NULL,
    model_pkg VARCHAR ( 50 ) NOT NULL
);
CREATE TABLE training (
    id serial PRIMARY KEY,
    model_id INT NOT NULL,
    training_date timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    training_duration INT NOT NULL,
    epochs INT NOT NULL,
    learning_rate DECIMAL NOT NULL,
    accuracy DECIMAL NOT NULL,
    loss DECIMAL NOT NULL,
    accuracy_list DECIMAL[],
    loss_list DECIMAL[],
    CONSTRAINT fk_model
        FOREIGN KEY(model_id)
        REFERENCES models(model_id)
        ON DELETE SET NULL
);
As you can see, the `models` table has the model name, the type (to give a bit of depth to what the model is), and the package used to create the model (ours will be Keras every time, but this could be useful later if models are built with other packages). Our `training` table has the training duration, epochs, learning rate, accuracy, and loss. We also collect accuracy and loss per epoch, which is what `accuracy_list` and `loss_list` store. If you are unfamiliar with these terms, read up on them here.
Now that we have our database, let’s fill out our callback for our metrics purposes.
We will use environment variables to collect information about the model name and type. This means that for every model training instance, as long as you or anyone in your organization sets their environment variables beforehand, they can train their model as much as they want and fill out training data for the same model in the database.
I recommend using Jupyter notebooks for this work as it makes it super easy to declare objects and see the output immediately.
Example of setting python environment variables:
import os
'MODEL_NAME'] = "Pima2"
os.environ['MODEL_TYPE'] = "Binary Classification" os.environ[
As I said earlier, for this callback we will log metrics at the start and end of training, and at the end of each epoch. This means we need to fill out the `on_train_begin`, `on_train_end`, and `on_epoch_end` methods. We also need to initialize and update a few other values to capture the duration of model training and the accuracy and loss per epoch. Here is the Metrics Callback (stored in `MetricsCallbackKeras.py`):
from tensorflow import keras
import tensorflow.keras.backend as K
from datetime import datetime
import os
from db_util import write_training_metrics
class MetricsCallback(keras.callbacks.Callback):
    def __init__(self, cursor):
        self.cursor = cursor
        self.epochs = 0
        self.start = None
        self.end = None
        self.accuracy_list = []
        self.loss_list = []
        self.duration = 0

    def on_train_begin(self, logs=None):
        self.start = datetime.now()

    def on_train_end(self, logs=None):
        self.end = datetime.now()
        self.duration = (self.end - self.start).total_seconds()
        self.write_metrics(logs)

    def on_epoch_end(self, epoch, logs=None):
        self.epochs += 1
        self.accuracy_list.append(logs['accuracy'])
        self.loss_list.append(logs['loss'])

    def write_metrics(self, logs):
        model_name = os.environ.get("MODEL_NAME")
        model_type = os.environ.get("MODEL_TYPE")
        learning_rate = K.eval(self.model.optimizer.lr)

        write_training_metrics(self.cursor,
                               model_name,
                               model_type,
                               'keras',
                               self.duration,
                               self.params.get("epochs"),
                               learning_rate,
                               logs['accuracy'],
                               logs['loss'],
                               self.accuracy_list,
                               self.loss_list)
To keep the callback clean, I put the actual function that writes to the database in a separate file (stored in `db_util.py`). The write function inserts an entry for the model into the `models` table if it doesn't already exist, and then writes the training entry to the `training` table:
def write_training_metrics(
        cursor,
        model_name,
        model_type,
        model_pkg,
        duration,
        epochs,
        lr,
        accuracy,
        loss,
        accuracy_list,
        loss_list):
    cursor.execute("""
        INSERT INTO MODELS (
            model_name,
            model_type,
            model_pkg
        ) VALUES (%s, %s, %s)
        ON CONFLICT (model_name) DO UPDATE SET model_name=EXCLUDED.model_name
        RETURNING model_id
        """, (model_name, model_type, model_pkg))
    model_id = cursor.fetchone()[0]

    cursor.execute(f"""
        INSERT INTO TRAINING (
            model_id,
            training_duration,
            epochs,
            learning_rate,
            accuracy,
            loss,
            accuracy_list,
            loss_list
        ) VALUES (
            {model_id},
            {duration},
            {epochs},
            {lr},
            {accuracy},
            {loss},
            %s,
            %s
        )
        """,
        (accuracy_list, loss_list))
If you looked through the code, you saw that we're passing a `cursor` object to do the database operations. For the Postgres connection I used `psycopg2`. Example of creating a `psycopg2` cursor:
import psycopg2
= psycopg2.connect(database="mt_metrics",
conn ="localhost",
host="jonah",
user="getting started",
password=5432)
port= True
conn.autocommit
= conn.cursor() cursor
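As noted above, hardcoding credentials is only acceptable for a local experiment. Here is a minimal sketch of a slightly safer pattern, assuming you export the connection details as environment variables first; variable names like MT_DB_PASSWORD are my own placeholders, not something the rest of this article relies on.

import os
import psycopg2

# Placeholder variable names -- adjust to whatever your team actually uses.
conn = psycopg2.connect(database=os.environ.get("MT_DB_NAME", "mt_metrics"),
                        host=os.environ.get("MT_DB_HOST", "localhost"),
                        user=os.environ["MT_DB_USER"],
                        password=os.environ["MT_DB_PASSWORD"],
                        port=int(os.environ.get("MT_DB_PORT", "5432")))
conn.autocommit = True
cursor = conn.cursor()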
Alright. Now that we have our database running, our environment variables set, our callback written, and our cursor created, we can run a model!
This is a common deep learning example: it uses binary cross-entropy to predict the onset of diabetes in the Pima Indians dataset. Download the dataset from this address and save the file as `pima-indians-diabetes.csv` in a directory called `data`.
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from pathlib import Path
from MetricsCallbackKeras import MetricsCallback
pima_indians_csv = Path('data', 'pima-indians-diabetes.csv')

# load the dataset
dataset = loadtxt(pima_indians_csv, delimiter=',')

# split into input (X) and output (y) variables
X = dataset[:, 0:8]
y = dataset[:, 8]

# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# fit the keras model on the dataset with the added callback
model.fit(X, y, epochs=15, batch_size=10, callbacks=[MetricsCallback(cursor)])
If everything went well, you should see some output with the epochs and the associated loss and accuracy. We should have some data in our database now! Let’s check it out.
If you’re using a notebook, you can reuse your cursor object from above.
Read the data into a dataframe from a query:
import pandas as pd
df = pd.read_sql_query('select t.model_id, m.model_name, m.model_type, t.training_date, t.training_duration, t.epochs, t.learning_rate, t.accuracy, t.loss, t.accuracy_list, t.loss_list from training t INNER JOIN models m ON t.model_id=m.model_id', con=conn)
df
Output:
| | model_id | model_name | model_type | training_date | training_duration | epochs | learning_rate | accuracy | loss | accuracy_list | loss_list |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | pima2 | binary classification | 2024-01-16 16:05:02.561282 | 1 | 15 | 0.001 | 0.708333 | 0.631138 | [0.3919271 0.5963542 0.5703125 0.5651042 0.61848956 0.62630206 0.6458333 0.640625 0.6666667 0.6796875 0.68098956 0.67057294 0.67057294 0.6510417 0.7083333] | [21.039516 2.0474312 1.2937006 0.95252365 0.80447465 0.7372218 0.68905145 0.7194483 0.6707931 0.62520367 0.66718054 0.6340042 0.6297135 0.6884937 0.63113797] |
You should see your single training run as well as the model name and type you provided with your environment variables.
Awesome, so now we have a way to write model training metrics into a database! From here you can do whatever you want with it! Perhaps you want to dig down into some very specific metrics and collect a buttload of data while you train out a particular model. Or maybe you want to deploy something like this in your organization that does model training so that you can collect metrics across different teams and projects to get a wide overall view of model training. For the rest of this article I am going to populate the database with some diverse model training data, and then visualize it with Bokeh.
To populate the database, I grabbed a bunch of model training examples from the examples section of the Keras website. For each model, I wrote a brief blurb with the creator, a description, and a link to the page with the model (I DID NOT CREATE ANY OF THESE MODELS), set the environment variables, and then built and trained the model using my Metrics Callback. I also looped over each model's training, varying the number of epochs per run, so the database collects a few different training runs for each model and we can compare the results; a simplified sketch of one of those loops follows below. The full code is in my `pop_db.ipynb` notebook on Github, as it's quite long.
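Here is a simplified sketch of what one of those population loops might look like, reusing `X`, `y`, and `cursor` from earlier; the `build_model()` helper and the epoch counts are placeholders rather than the exact notebook code:

import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from MetricsCallbackKeras import MetricsCallback

def build_model():
    # Rebuild the same small Pima Indians model fresh for each run.
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

os.environ['MODEL_NAME'] = "Pima2"
os.environ['MODEL_TYPE'] = "Binary Classification"

# Vary the epoch count so the training table collects several runs for this model.
for epochs in [15, 50, 100]:
    model = build_model()
    model.fit(X, y, epochs=epochs, batch_size=10,
              callbacks=[MetricsCallback(cursor)])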
If we run the above query again and then look at the table, we should see the data. Or we can simply look at the shape to see how many rows we have.
df.shape
Output:
(16, 9)
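Before building the full visualization, a quick way to compare runs is a plain pandas aggregation. This is just an illustrative sketch over the dataframe above, showing the run count, best accuracy, and average duration per model:

# Summarize training runs per model; column names on the left are my own labels.
summary = (df.groupby('model_name')
             .agg(runs=('accuracy', 'size'),
                  best_accuracy=('accuracy', 'max'),
                  mean_duration=('training_duration', 'mean'))
             .sort_values('best_accuracy', ascending=False))
print(summary)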
Now we have a database that’s populated with a range of data. We’re going to create an interactive Bokeh plot that allows us to compare training runs across multiple axes, as well as zooming in on a training run to see how the accuracy and loss change across epochs. Bokeh is a versatile open source tool that allows you to use Python to create amazing interactive visualizations.
Before we do that, let’s take a look at those lists we stored for accuracy and loss. We can graph them to see how the loss and accuracy changed throughout the training run.
First we need to convert the lists into lists of floats that Pandas will like:
df['accuracy_list'] = df['accuracy_list'].apply(pd.to_numeric, downcast='float')
df['loss_list'] = df['loss_list'].apply(pd.to_numeric, downcast='float')
Let’s visualize just one of the training runs. I’m choosing index 2 just for this example.
import matplotlib.pyplot as plt

idx = 2

x = list(range(1, df.iloc[idx]['epochs'] + 1))
y1 = df.iloc[idx]['accuracy_list']
y2 = df.iloc[idx]['loss_list']

plt.plot(x, y1, label='accuracy')
plt.plot(x, y2, label='loss')

plt.xlabel('Epochs')
plt.ylabel('Accuracy / Loss')
plt.legend()
plt.title('Accuracy & Loss Per Epoch')
plt.show()
Our Bokeh visualization will allow us to see these per epoch metrics if we select it on the larger canvas.
For the whole Bokeh plot, you can take a look at my code on Github. I highly recommend Bokeh for interactive data visualizations.
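If you want a feel for the starting point before digging into the repo, here is a minimal sketch, assuming the dataframe above, of a Bokeh scatter with one point per training run and a hover tooltip. The column and tool choices here are mine and not the exact plot from the repo or the live example.

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool

# Postgres DECIMAL columns come back as Python Decimal objects; cast to float for plotting.
source = ColumnDataSource(data=dict(
    training_duration=df['training_duration'].astype(float),
    accuracy=df['accuracy'].astype(float),
    epochs=df['epochs'],
    model_name=df['model_name'],
))

p = figure(title="Training runs: accuracy vs. duration",
           x_axis_label="Training duration (s)",
           y_axis_label="Accuracy")
p.scatter(x='training_duration', y='accuracy', size=10, source=source)

# Show the model name, epoch count, and accuracy when hovering over a point.
p.add_tools(HoverTool(tooltips=[("model", "@model_name"),
                                ("epochs", "@epochs"),
                                ("accuracy", "@accuracy")]))
show(p)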
Check out an example deployed live here.
In conclusion, this is a pretty specific example of something to build on the ML Ops side of things, but the exercise can help you learn a lot about Machine Learning and the work that goes into it. Feel free to reach out to me if you have questions, comments, or gripes with how I did any of this! Thanks for reading!