Introduction to conformal analysis
In conformal analysis we add a confidence level to prediction methods, even when the original method does not provide one. This is useful whenever we want to attach a number to each prediction that quantifies the uncertainty in the result. The necessary change is that we no longer predict a single class, but a set of possible classes. A neural network might then return several candidate classes when it is not certain enough for a single answer. For a given confidence level, we can be confident that the correct answer is contained in this set. For example, a confidence level of 0.95 means that in 95% of cases the correct class will be in the answer set, while a confidence level of 0.99 means it will be in the answer set in 99% of cases.
There is a trade-off between the desired confidence and the size of the answer set: higher confidence levels lead to larger answer sets.
We are showing how to add conformal prediction to an existing neural network, so the focus of this tutorial is not on how neural networks work.
Just a quick recap on classification using neural networks:
When predicting a class using a neural network, we take some input data and run it through multiple layers, transforming the input at each step. The layers differ in the functions used and the size of their input/output. The last layer usually contains a value for each label. In normal prediction, the label with the highest value is used as the final answer.
These scores are just intermediate values, not probabilities, so they cannot be used directly as a quantitative confidence value. Conformal analysis adds another step here. When we want an answer set, instead of taking the class with the highest value in the last layer, we compare every class score against a threshold and select every class that scores above it. This threshold depends on the desired confidence level and must be calculated in advance using conformal calibration.
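The thresholding step can be sketched in a few lines. This is an illustrative example, not code from the tutorial scripts; the function and variable names are made up for this sketch:

```python
import numpy as np

def prediction_set(class_scores, threshold):
    """Return the indices of all classes whose score exceeds the threshold.

    `class_scores` is a 1-D array of per-class scores from the network's
    last layer; `threshold` is the precomputed calibration threshold.
    (Hypothetical helper for illustration only.)
    """
    return [c for c, score in enumerate(class_scores) if score > threshold]

scores = np.array([0.1, 0.7, 0.65, 0.05])
print(prediction_set(scores, 0.5))  # → [1, 2]
```

Note that the answer set can contain several classes, a single class, or no class at all, depending on how many scores clear the threshold.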
This tutorial shows a simple way of turning a neural-network-based classifier into a conformal predictor with a quantitative confidence level.
The complete tutorial can be found at https://github.com/nocicadaleftbehind/ConformalAnalysisTutorial
Build neural network based prediction
This tutorial focuses on the conformal prediction part, so for training we just use the standard toy problem: MNIST. The MNIST dataset consists of grayscale images of digits, and the task is to classify each image as the single digit it shows. For this tutorial, we based the neural network implementation on a PyTorch example script (https://github.com/pytorch/examples).
Here we train a neural network to classify an image into one of 10 classes. The last layer of this network outputs a log-probability score for each class, which we use as the basis for our conformal prediction.
To adapt the existing implementation for the training part, we need just one change. Instead of splitting the dataset into just training and test sets, we add a third calibration split. In our example, we used an 80-10-10 split, so 80% of the dataset is used for training, 10% for testing and 10% for the new calibration step. The calibration split is used in the next step, so we need to save its indices for later use.
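The split can be sketched as follows. This is a minimal sketch of the idea, not the code from 01_training.py, and the file name for the saved indices is an assumption:

```python
import numpy as np

def split_indices(n, seed=0):
    """Shuffle dataset indices and split them 80-10-10 into
    training / test / calibration parts.
    (Illustrative sketch; the tutorial script may differ in details.)
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.8 * n)
    n_test = int(0.1 * n)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    calib = idx[n_train + n_test:]
    return train, test, calib

train, test, calib = split_indices(60000)  # MNIST training set size
np.save("calibration_indices.npy", calib)  # hypothetical file name
```

Shuffling before splitting matters: the calibration examples must be an independent random sample, not a systematic slice of the dataset.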
The complete script can be found as 01_training.py
Converting to confidence
The next step uses the trained neural network to calculate the thresholds needed for prediction. This step is new and specific to conformal prediction, so we need to write it ourselves on top of the existing example implementation.
We use the calibration split set aside in the first step. Since these images were not used in training, they are independent of the model and can be used for calibration. We take the trained network and run the prediction on every element of the calibration split. For each input datapoint, we are only interested in the predicted scores for every class. The output of the network is the logarithmic probability of each class, so the values range from minus infinity (lowest) to 0 (highest). While this score could be used directly, by convention we convert it to a (non-logarithmic) alpha score. Taking the exponential of the network output gives a probability between 0 (lowest) and 1 (highest); the alpha score is simply 1 minus this value, so a value of 0 is now the best possible score. After this short calculation we save both the alpha score of the correct class and the correct label.
Finally, we take all the alpha values for each label, sort them and save them in a separate file for the last step.
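The calibration computation described above can be sketched like this. It is an illustrative version of the step implemented in 02_calibration.py, with made-up function and variable names:

```python
import numpy as np

def alpha_scores(log_probs, labels):
    """Compute sorted conformal alpha scores from calibration outputs.

    `log_probs` is an (N, C) array of per-class log-probabilities from the
    network; `labels` holds the true class of each calibration example.
    alpha = 1 - exp(log p(true class)), so 0 is the best possible score.
    (Sketch for illustration; not the tutorial's exact code.)
    """
    true_log_p = log_probs[np.arange(len(labels)), labels]
    alphas = 1.0 - np.exp(true_log_p)
    return np.sort(alphas)

# Tiny example: two calibration points, three classes
log_probs = np.log(np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.8, 0.1]]))
labels = np.array([0, 1])
print(alpha_scores(log_probs, labels))  # ≈ [0.2, 0.3]
```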
This complete script can be found as 02_calibration.py
Predicting using conformal analysis
The last step uses the trained network and the calculated alpha scores to predict classes for a new input. The prediction itself works like a normal neural network forward pass. The main changes are an additional parameter (the desired confidence level) and the output, which is now a set of labels instead of a single class. The set can even be empty if no class is predicted with enough confidence for the given confidence level.
First we load the sorted scores from the previous step. Based on the desired confidence level, we index into the sorted scores to find the threshold. For a desired confidence of 95%, the threshold is the score at the 95th percentile of the sorted calibration scores, i.e. the boundary separating the highest 5% of observed scores.
For the prediction, we run the network as usual to obtain the scores for each class. Instead of choosing the single class with the highest score, we now select all labels whose alpha score is below this threshold.
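Putting the two pieces together, the conformal prediction step can be sketched as below. This is a simplified illustration of what 03_prediction.py does; the names and the quantile indexing are assumptions of this sketch:

```python
import numpy as np

def conformal_predict(log_probs, sorted_alphas, confidence=0.95):
    """Return the conformal prediction set for one input.

    `log_probs` are the network's per-class log-probabilities,
    `sorted_alphas` the sorted calibration scores from the previous step.
    (Illustrative sketch; not the tutorial's exact implementation.)
    """
    # Threshold: the calibration score at the desired confidence quantile
    k = min(int(np.ceil(confidence * len(sorted_alphas))), len(sorted_alphas) - 1)
    threshold = sorted_alphas[k]
    alphas = 1.0 - np.exp(log_probs)  # per-class alpha scores (0 is best)
    return [c for c, a in enumerate(alphas) if a <= threshold]

sorted_alphas = np.linspace(0.0, 1.0, 101)  # toy calibration scores
log_probs = np.log(np.array([0.90, 0.08, 0.02]))
print(conformal_predict(log_probs, sorted_alphas))  # → [0, 1]
```

In this toy example, two classes clear the threshold, so both end up in the answer set; with a lower confidence level the set would shrink.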
In a variant, we can also use an individual threshold for each class. For this, we group the scores from the previous step by class. The threshold calculation is the same, but applied to the scores of each class separately. Per-class thresholds are useful when the classifier's ability to predict the individual classes is unbalanced. We also need enough calibration data for each class, so that the threshold can be established reliably.
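The per-class variant can be sketched as follows. This is a hypothetical helper, not taken from the tutorial scripts:

```python
import numpy as np

def per_class_thresholds(alphas, labels, num_classes, confidence=0.95):
    """Compute one threshold per class from calibration alpha scores.

    The scores are grouped by true label, and a separate confidence
    quantile is taken for each group. Each class needs enough
    calibration examples for its quantile to be meaningful.
    (Hypothetical helper for illustration only.)
    """
    thresholds = np.zeros(num_classes)
    for c in range(num_classes):
        class_alphas = np.sort(alphas[labels == c])
        k = min(int(np.ceil(confidence * len(class_alphas))),
                len(class_alphas) - 1)
        thresholds[c] = class_alphas[k]
    return thresholds

# Toy data: class 1 is predicted more reliably (lower alpha scores)
alphas = np.concatenate([np.linspace(0.0, 1.0, 101),
                         np.linspace(0.0, 0.5, 101)])
labels = np.concatenate([np.zeros(101, dtype=int), np.ones(101, dtype=int)])
print(per_class_thresholds(alphas, labels, num_classes=2))  # ≈ [0.96, 0.48]
```

At prediction time, each class score is then compared against its own class threshold instead of the single global one.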
This complete script can be found as 03_prediction.py
The implementation contains both a global and a class-based implementation of the thresholding.