Machine Learning Model Analysis Using TensorBoard
February 15, 2021
Machine Learning is growing by leaps and bounds with new neural network models coming up regularly.
These models are trained on specific datasets and evaluated for accuracy and processing speed. Before a model is deployed, developers need to evaluate it and ensure that it meets specific threshold values and functions as expected. A lot of experimentation goes into improving model performance, and visualizing the differences between experiments becomes crucial while designing and training a model. TensorBoard helps visualize the model, which makes the analysis less complicated: debugging becomes easier when one can see what the problem is.
General practice to train ML models
The general practice is to take a pre-trained model and apply Transfer Learning to re-train it on a similar set of data. In Transfer Learning, a neural network model is first trained on a problem similar to the one being solved. One or more layers from the trained model are then reused in a new model that is trained on the problem of interest.
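The layer-reuse step can be sketched with the Keras API in TensorFlow. This is a minimal illustration, not a recipe: the small `base` network stands in for a real pre-trained model (it is built from scratch here so that nothing has to be downloaded), and the input shape and `NUM_NEW_CLASSES` are assumptions made for the example.

```python
import tensorflow as tf

# Stand-in for a pre-trained feature extractor. In a real project this would
# be a model loaded with trained weights; it is built from scratch here so the
# sketch runs without downloading anything.
base = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
    ],
    name="pretrained_base",
)

# Freeze the reused layers so their weights are not updated during re-training.
base.trainable = False

# New model: the reused layers plus a fresh classification head for the new task.
NUM_NEW_CLASSES = 10  # assumed number of classes in the new dataset
model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28, 28, 1)),
        base,
        tf.keras.layers.Dense(NUM_NEW_CLASSES, activation="softmax"),
    ],
    name="transfer_model",
)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Only the new head (kernel and bias) is trainable.
print("trainable weight tensors:", len(model.trainable_weights))
```

Freezing the base is one common choice; the reused layers can also be left trainable and fine-tuned with a low learning rate once the new head has settled.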
Most of the time, pre-trained models come in a binary format, which makes it difficult to inspect their internals and start working on them immediately. From the organization's business point of view, it makes sense to have a tool that gives insight into the model and so helps reduce project delivery timelines.
There are a couple of options for getting model information such as the number of layers and their associated parameters. Model Summary and Model Plot are the basic ones. Both take only a few lines of code and provide basic details such as the number of layers, the types of layers, and the input/output of each layer.
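As a rough sketch of these two options in Keras, the example below prints a summary and attempts a plot for a small CNN. The architecture and the output file name `model.png` are chosen only for illustration. `plot_model` depends on the optional pydot package and Graphviz, so the call is guarded in case they are not installed.

```python
import tensorflow as tf

# A small CNN used only for illustration; the layer sizes are arbitrary.
model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

# Model Summary: a text table of layers, output shapes, and parameter counts.
model.summary()
print("total parameters:", model.count_params())

# Model Plot: a diagram of the layers written to an image file.
# It needs pydot and Graphviz, so a missing dependency is reported, not fatal.
try:
    tf.keras.utils.plot_model(model, to_file="model.png", show_shapes=True)
except (ImportError, OSError) as err:
    print("skipping plot_model:", err)
```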
However, Model Summary and Model Plot are not effective for understanding every detail of a large, complex model stored as a Protocol Buffer. In such scenarios, TensorBoard, a visualization tool provided by TensorFlow, is more useful. It is quite powerful, offering visualization options such as the Model graph, Scalars and Metrics (training and validation data), Images (from the dataset), Hyperparameter tuning, etc.
Model graphs to visualize custom models
This option is especially helpful when a custom model is received in the form of a protocol buffer and must be understood before it is modified or trained. As shown in the image below, an overview of the sequential CNN is visualized on the board. Each block represents a separate layer, and selecting one opens a window in the top-right corner with its input and output information.
If further information is required about what is inside an individual block, one can simply double-click the block to expand it and see more details. Notice that a block can contain one or more blocks, which can be expanded layer by layer. Selecting a specific operation also shows more information about its associated processing parameters.
Scalars and Metrics to analyze model training and validation
The second important aspect of Machine Learning is analyzing the training and validation of the given model. Performance, in terms of both accuracy and speed, determines whether the model is suitable for real-life practical applications. In the image below, it can be seen that the accuracy of the model improves with the number of epochs/iterations. If the training and validation results are not up to the mark, something is not right. It could be a case of underfitting (accuracy is poor on both training and validation data) or overfitting (accuracy is good on training data but noticeably worse on validation data), and it can be corrected by modifying the layers/parameters, improving the dataset, or both.
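A minimal sketch of how such curves can be produced with the Keras TensorBoard callback is given below. The random data, the tiny model, and the `logs/scalars_demo` directory are all placeholders invented for the example; a real project would substitute its own dataset and model. With its default settings the callback also writes the model graph, so the same log directory feeds the graph view described earlier.

```python
import os

import numpy as np
import tensorflow as tf

# Random stand-in data; a real project would load its own dataset here.
rng = np.random.default_rng(0)
x = rng.random((64, 8)).astype("float32")
y = rng.integers(0, 2, size=64)

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ]
)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# The callback writes per-epoch loss and metrics for training and validation.
log_dir = os.path.join("logs", "scalars_demo")  # hypothetical location
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

history = model.fit(
    x,
    y,
    epochs=3,
    validation_split=0.25,  # hold out a quarter of the data for validation
    callbacks=[tensorboard_cb],
    verbose=0,
)

# The same numbers are available in code, e.g. to compare the two curves.
for epoch, (acc, val_acc) in enumerate(
    zip(history.history["accuracy"], history.history["val_accuracy"]), start=1
):
    print(f"epoch {epoch}: train={acc:.3f} validation={val_acc:.3f}")
```

The logged curves can then be opened with `tensorboard --logdir logs/scalars_demo`.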
Image Data to visualize images from dataset
As the name suggests, this option helps to visualize images. It is not limited to showing images from the dataset; it can also show the Confusion Matrix in the form of an image. This matrix indicates how accurately objects of each individual class are detected and which classes are mistaken for one another. As shown in the image below, the model confuses the coat with the pullover. To overcome this, it is recommended to improve the dataset for those specific classes so that the model is fed more distinguishable features, which leads to better learning and hence better accuracy.
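To make the idea concrete, the sketch below computes a confusion matrix in plain Python and picks out the most frequently confused pair of classes. The class names and labels are made up for illustration and are not results from any real model. Rendering the matrix as an image and logging it to TensorBoard is a separate step that is not shown here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Entry [t][p] counts samples whose true class is t and prediction is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused_pair(matrix):
    """Return (true_class, predicted_class, count) of the largest off-diagonal entry."""
    best = (None, None, 0)
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > best[2]:
                best = (t, p, count)
    return best


# Made-up labels for three classes, for illustration only.
class_names = ["pullover", "coat", "sandal"]
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 0, 0, 1, 0, 2, 2]

cm = confusion_matrix(y_true, y_pred, len(class_names))
t, p, count = most_confused_pair(cm)

for name, row in zip(class_names, cm):
    print(f"{name:>8}: {row}")
print(f"most confused: {class_names[t]} predicted as {class_names[p]} ({count} times)")
```

In this toy data the coat is predicted as a pullover twice, which is the kind of pattern that suggests collecting more distinguishable examples of those two classes.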
Hyperparameter tuning to achieve desired model accuracy
The accuracy of the model depends on the input dataset, the number of layers, and the associated parameters. In most cases, the model does not reach the expected accuracy during the initial training, and it is necessary to experiment with the number of layers, the types of layers, and the associated parameters, apart from the dataset. This process is known as Hyperparameter Tuning.
In this process, a range of values is provided for each hyperparameter, and the model is run with each combination of these values. The accuracy of each combination is logged and visualized on the board. This saves the effort and time that would otherwise be consumed by manually training the model for every possible combination of hyperparameters.
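The combination loop can be sketched in plain Python as follows. The search space and the `toy_score` function are assumptions made for the example: `toy_score` is a placeholder formula, not a training run, and in practice it would be replaced by a function that builds, trains, and evaluates the model and logs the result so that TensorBoard can compare the runs.

```python
import itertools


def grid(search_space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(search_space)
    for values in itertools.product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


def run_search(search_space, train_and_evaluate):
    """Score every combination and return (hparams, score) pairs, best first."""
    results = [(hparams, train_and_evaluate(hparams)) for hparams in grid(search_space)]
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical search space; the names and values are illustrative only.
search_space = {
    "num_units": [16, 32],
    "dropout": [0.1, 0.2],
    "optimizer": ["adam", "sgd"],
}


def toy_score(hparams):
    """Placeholder score, NOT a real training result.

    A real implementation would train the model with these settings and
    return its validation accuracy.
    """
    return round(0.5 + hparams["num_units"] / 100 - hparams["dropout"], 3)


results = run_search(search_space, toy_score)
print("combinations tried:", len(results))
print("best combination:", results[0][0])
```

An exhaustive grid grows multiplicatively with every added hyperparameter, so for larger search spaces a random sample of combinations is a common alternative.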
Profiling tool to analyze model processing speed
Apart from accuracy, processing speed is an equally important aspect of any model. It is necessary to analyze the processing time consumed by individual blocks and whether it can be reduced by making some modifications. The Profiling Tool provides a graphical representation of the time consumed by each operation across epochs. With this visualization, one can easily pinpoint the operations that consume the most time. Some of the known overheads are resizing the input, translating model code from Python, and running code on the CPU instead of the GPU. Taking care of such things helps achieve optimum performance.
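One way to collect profiling data, sketched under the assumption that training goes through Keras `fit`, is the `profile_batch` argument of the TensorBoard callback. The data, model, batch range, and `logs/profile_demo` directory below are placeholders for illustration. Viewing the captured trace in TensorBoard typically also requires the separate profiler plugin to be installed.

```python
import os

import numpy as np
import tensorflow as tf

# Random stand-in data; replace with the real input pipeline.
rng = np.random.default_rng(0)
x = rng.random((64, 8)).astype("float32")
y = rng.integers(0, 2, size=64)

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ]
)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

log_dir = os.path.join("logs", "profile_demo")  # hypothetical location

# Profile batches 2 through 4; the first batch is skipped because it
# usually includes one-time setup cost that would distort the picture.
tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    profile_batch=(2, 4),
)

# 64 samples with batch_size=8 gives 8 batches, enough to cover the range.
model.fit(x, y, batch_size=8, epochs=1, callbacks=[tensorboard_cb], verbose=0)
print("profile data written under:", log_dir)
```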
Overall, TensorBoard is a great tool that helps the development and training process. The data from Scalars and Metrics, Image Data, and Hyperparameter tuning helps to improve accuracy, while the profiling tool helps to improve processing speed. TensorBoard also reduces the debugging time involved, which would otherwise be considerable.