In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation (Paper Summary)

Out-of-distribution (OOD) detection for deep learning models, particularly in image classification, addresses the challenge of identifying inputs that are unrelated to the model’s training task. It aims to prevent the model from making confident but incorrect predictions on OOD inputs while still accurately classifying in-distribution (ID) inputs. By distinguishing between ID and OOD inputs, OOD detection methods make models more robust and reliable in real-world applications.

A weakness of current OOD detection evaluations in image classification, specifically for benchmarks built around ImageNet-1K (IN-1K), is that the OOD test datasets themselves contain ID objects. State-of-the-art OOD detectors tend to treat such images as ID, which is arguably the correct decision, but because the images are labeled OOD, the evaluation counts this as an error. As a result, actual OOD detection performance is underestimated, and the more effective OOD detectors are unjustly penalized.
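To see why contamination penalizes good detectors, consider the widely used FPR at 95% TPR metric: the threshold is set so that 95% of ID images are accepted, and the FPR is the fraction of OOD-labeled images that are also accepted as ID. The sketch below is a minimal, hypothetical illustration on synthetic scores (not data from the paper), assuming that higher scores mean more in-distribution; the helper name fpr_at_tpr is our own. OOD-labeled images that actually show ID objects receive ID-like scores from a good detector, so they inflate the FPR.

```python
import numpy as np

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """FPR on OOD samples at the threshold that keeps `tpr` of ID samples.
    Assumed convention: higher score = more in-distribution."""
    # Only a fraction (1 - tpr) of ID scores fall below this threshold.
    threshold = np.quantile(np.asarray(id_scores, dtype=float), 1.0 - tpr)
    return float(np.mean(np.asarray(ood_scores, dtype=float) >= threshold))

rng = np.random.default_rng(0)
# Hypothetical detector scores on synthetic data (not results from the paper).
id_scores = rng.normal(2.0, 0.5, 1000)
true_ood_scores = rng.normal(0.0, 0.5, 900)
# OOD-labeled images that actually contain ID objects: a good detector scores them like ID.
contaminated_scores = rng.normal(2.0, 0.5, 100)

clean_fpr = fpr_at_tpr(id_scores, true_ood_scores)
contaminated_fpr = fpr_at_tpr(
    id_scores, np.concatenate([true_ood_scores, contaminated_scores])
)
print(f"FPR@95%TPR, clean OOD set:        {clean_fpr:.3f}")
print(f"FPR@95%TPR, contaminated OOD set: {contaminated_fpr:.3f}")
```

The detector is the same in both cases; only the labels of the OOD set differ, so the higher FPR on the contaminated set reflects the benchmark, not the detector.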

A recent paper addresses these limitations in how OOD detection methods are evaluated. The authors introduce a new test dataset, NINCO, whose OOD samples contain no objects from the ImageNet-1K (ID) classes. They also provide synthetic “OOD unit tests” to probe weaknesses of OOD detectors. The paper evaluates various architectures and methods on NINCO, providing insights into model weaknesses and the impact of pretraining on OOD detection performance. The goal is to improve the evaluation and understanding of OOD detection methods.

NINCO stands for “No ImageNet Class Objects.” To build it, the authors carefully select base classes from existing or newly scraped datasets, choosing them so that, even under a non-permissive interpretation of the ImageNet-1K class definitions, they are not categorically part of any ID class. They then visually inspect every image in these base classes and remove samples that contain ID objects or in which no object from the OOD class is visible. This manual cleaning yields a higher-quality dataset.

NINCO consists of 64 OOD classes with a total of 5,879 samples sourced from various datasets, including SPECIES, PLACES, FOOD-101, CALTECH-101, MYNURSINGHOME, ImageNet-21k, and newly scraped from and other websites. Additionally, the authors provide cleaned versions of 2,715 OOD images from eleven test OOD datasets, which makes it possible to evaluate potential ID contamination in those benchmarks.

The authors also propose OOD unit tests: simple, synthetically generated image inputs designed to expose weaknesses of OOD detectors. They suggest evaluating an OOD detector on each unit test separately and reporting the number of failed tests (tests whose FPR exceeds a user-defined threshold) alongside the overall evaluation on a test OOD dataset such as NINCO. These unit tests reveal specific weaknesses that detectors may run into in practice. Overall, the authors propose NINCO as a high-quality dataset for evaluating OOD detection methods, complemented by OOD unit tests for additional insight into a detector’s weaknesses.
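The unit-test protocol can be sketched in a few lines. This is a minimal illustration, not the paper’s code: it assumes precomputed detector scores where higher means more in-distribution, uses FPR at 95% TPR as the per-test metric, and picks an FPR threshold of 0.1 purely for illustration. The three synthetic image types and the helper names (make_unit_test_images, count_failed_unit_tests) are hypothetical examples rather than the paper’s exact set of unit tests.

```python
import numpy as np

def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    # Assumed convention: higher score = more in-distribution.
    threshold = np.quantile(np.asarray(id_scores, dtype=float), 1.0 - tpr)
    return float(np.mean(np.asarray(ood_scores, dtype=float) >= threshold))

def make_unit_test_images(n=64, size=32, seed=0):
    """Hypothetical synthetic OOD inputs, shape (n, size, size, 3), values in [0, 1]."""
    rng = np.random.default_rng(seed)
    shape = (n, size, size, 3)
    return {
        "uniform_noise": rng.uniform(0.0, 1.0, shape),
        "gaussian_noise": np.clip(rng.normal(0.5, 0.25, shape), 0.0, 1.0),
        # One random colour per image, repeated over every pixel.
        "monochrome": np.broadcast_to(rng.uniform(0.0, 1.0, (n, 1, 1, 3)), shape).copy(),
    }

def count_failed_unit_tests(id_scores, unit_test_scores, fpr_threshold=0.1, tpr=0.95):
    """A unit test fails if its FPR at the given TPR exceeds `fpr_threshold`."""
    failed = [
        name
        for name, scores in unit_test_scores.items()
        if fpr_at_tpr(id_scores, scores, tpr) > fpr_threshold
    ]
    return len(failed), failed

images = make_unit_test_images()
# In practice each image set would be passed through the model and OOD detector;
# here the resulting scores are simulated.
rng = np.random.default_rng(1)
id_scores = rng.normal(2.0, 0.5, 1000)
unit_test_scores = {
    "uniform_noise": rng.normal(0.0, 0.5, len(images["uniform_noise"])),  # well separated
    "monochrome": rng.normal(2.0, 0.5, len(images["monochrome"])),  # scored like ID
}
n_failed, failed = count_failed_unit_tests(id_scores, unit_test_scores)
print(n_failed, failed)
```

In this toy setup the simulated detector separates uniform noise from ID data but scores monochrome images like ID inputs, so one of the two tests fails. Reporting the failure count next to the NINCO result keeps such blind spots visible even when the aggregate metric looks good.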

The paper presents detailed evaluations of OOD detection methods on NINCO and on the unit tests. The authors analyze how various architectures and OOD detection methods perform, revealing insights about model weaknesses and the impact of pretraining on OOD detection performance. On NINCO, the study assesses a range of IN-1K models obtained from the timm library together with advanced OOD detection methods. Feature-based techniques such as Maha (Mahalanobis distance), RMaha (relative Mahalanobis distance), and ViM perform better than the maximum softmax probability (MSP) baseline, and Max-Logit and Energy also improve notably over MSP. The results vary with the chosen model and OOD detection method. Pretraining proves influential: it improves ID performance and yields better feature embeddings for OOD detection.
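For readers unfamiliar with these scores, the sketch below shows common textbook formulations of MSP, Max-Logit, Energy, Maha, and RMaha. It is a minimal sketch, not the paper’s or timm’s code: it assumes logits and penultimate-layer features have already been extracted from a classifier, uses the convention that higher scores mean more in-distribution, and omits ViM. All function and class names, and the toy data at the end, are hypothetical.

```python
import numpy as np

# --- Logit-based scores (higher = more in-distribution) ---

def _logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along `axis`.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def msp_score(logits):
    # Maximum softmax probability: exp(max_k z_k - logsumexp(z)).
    return np.exp(np.max(logits, axis=-1) - _logsumexp(logits))

def max_logit_score(logits):
    return np.max(logits, axis=-1)

def energy_score(logits, temperature=1.0):
    # Negative free energy: T * logsumexp(z / T).
    return temperature * _logsumexp(logits / temperature)

# --- Feature-based scores (Maha / RMaha) ---

class MahalanobisScorer:
    """Class-conditional Gaussians with a shared covariance, fitted on ID training
    features. With relative=True, the squared distance to one class-agnostic
    Gaussian is subtracted (relative Mahalanobis)."""

    def __init__(self, relative=False):
        self.relative = relative

    def fit(self, feats, labels):
        classes = np.unique(labels)
        self.means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
        # Shared covariance of class-centred features.
        centred = feats - self.means[np.searchsorted(classes, labels)]
        self.prec = np.linalg.pinv(np.cov(centred, rowvar=False))
        # Class-agnostic "background" Gaussian used by the relative variant.
        self.global_mean = feats.mean(axis=0)
        self.global_prec = np.linalg.pinv(np.cov(feats, rowvar=False))
        return self

    @staticmethod
    def _sq_dist(x, mean, prec):
        d = x - mean
        return np.einsum("nd,de,ne->n", d, prec, d)

    def score(self, feats):
        dists = np.stack([self._sq_dist(feats, m, self.prec) for m in self.means], axis=1)
        if self.relative:
            dists = dists - self._sq_dist(feats, self.global_mean, self.global_prec)[:, None]
        # Negate the smallest (relative) distance so that higher = more ID.
        return -dists.min(axis=1)

# --- Toy usage on synthetic data ---
logits = np.array([[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
msp = msp_score(logits)
energy = energy_score(logits)

rng = np.random.default_rng(0)
train_feats = np.concatenate([rng.normal(0.0, 1.0, (500, 2)), rng.normal(5.0, 1.0, (500, 2))])
train_labels = np.repeat([0, 1], 500)
test_feats = np.array([[0.0, 0.0], [20.0, 20.0]])  # near class 0 vs. far from all classes
maha = MahalanobisScorer().fit(train_feats, train_labels).score(test_feats)
rmaha = MahalanobisScorer(relative=True).fit(train_feats, train_labels).score(test_feats)
print(msp, energy, maha, rmaha)
```

The logit-based scores only need the classifier output, while the feature-based scores require fitting class statistics on ID training features, which is one reason the quality of the feature embeddings (and hence pretraining) matters for them.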

In conclusion, the study addresses the limitations in evaluating OOD detection methods in image classification. It introduces the NINCO dataset, whose OOD samples contain no objects from the ImageNet-1K (ID) classes, and proposes OOD unit tests to assess detector weaknesses. The evaluations on NINCO compare different models and OOD detection methods, highlighting the effectiveness of feature-based techniques and the impact of pretraining. By offering a clean dataset and insight into detector weaknesses, NINCO improves the evaluation and understanding of OOD detection methods. The findings underline the importance of better OOD detection evaluations and of understanding the strengths and limitations of current methods.

The post In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation (Paper Summary) appeared first on MarkTechPost.