Custom Vision

Multi-Label Classification AI Vision

Performance Evaluation

Once training is complete, under the Performance tab, you will see the following results. As with multi-class classification, the model is evaluated using standard metrics including precision, recall and average precision (AP).

Figure: Performance evaluation dashboard showing precision, recall, and AP metrics for multi-label classification.

However, in the case of multi-label classification, these metrics are calculated independently for each tag, since the model assesses the presence or absence of each label as a separate binary decision. This means that a single image can contribute to the true positives, false positives or false negatives of several tags at once, depending on which labels were applied during training and which were predicted. As a result, per-tag recall often varies more widely than in multi-class models, particularly for labels that appear less frequently or are harder to detect. The performance summary also includes a breakdown per tag, allowing you to identify which tags the model handles well and which may require more training data or improved labelling consistency.
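To make the per-tag calculation concrete, here is a minimal Python sketch of how precision and recall could be computed independently for each tag. It is an illustration only, not Custom Vision's internal implementation: the function name `per_tag_metrics`, the tag names and the four example images are all hypothetical, and predictions are assumed to have already been thresholded into a set of tags per image.

```python
from typing import Dict, List, Set


def per_tag_metrics(
    true_labels: List[Set[str]],
    predicted_labels: List[Set[str]],
    tags: List[str],
) -> Dict[str, Dict[str, float]]:
    """Treat every tag as its own binary decision and score it separately."""
    results: Dict[str, Dict[str, float]] = {}
    for tag in tags:
        tp = fp = fn = 0
        for truth, predicted in zip(true_labels, predicted_labels):
            if tag in predicted and tag in truth:
                tp += 1  # tag predicted and actually present
            elif tag in predicted:
                fp += 1  # tag predicted but not present
            elif tag in truth:
                fn += 1  # tag present but missed
        # Guard against division by zero when a tag is never predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results[tag] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": precision,
            "recall": recall,
        }
    return results


# Hypothetical ground-truth and predicted tag sets for four images.
true_labels = [{"cat", "outdoor"}, {"dog"}, {"cat"}, {"dog", "outdoor"}]
predicted_labels = [{"cat"}, {"dog", "outdoor"}, {"cat", "dog"}, {"dog", "outdoor"}]
tags = ["cat", "dog", "outdoor"]

metrics = per_tag_metrics(true_labels, predicted_labels, tags)
for tag, values in metrics.items():
    print(tag, values)
```

In this made-up data, the second image counts as a true positive for `dog` and, at the same time, a false positive for `outdoor`, which is the situation described above. The less frequent `outdoor` tag also ends up with lower recall (0.5) than `cat` and `dog` (both 1.0).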