Project author: lukasruff

Project description:
Repository for the paper "Rethinking Assumptions in Deep Anomaly Detection"
Primary language: Python
Project URL: git://github.com/lukasruff/Classification-AD.git
Created: 2020-05-30T15:46:15Z
Project community: https://github.com/lukasruff/Classification-AD

License: MIT License


Rethinking Assumptions in Deep Anomaly Detection

This repository provides the code for the methods and experiments presented in our ICML UDL 2021 workshop paper ‘Rethinking Assumptions in Deep Anomaly Detection.’

Citation and Contact

You can find a PDF of the paper on arXiv: https://arxiv.org/abs/2006.00339.
If you find our work useful, please cite:

  @inproceedings{ruff2020rethinking,
    title     = {Rethinking Assumptions in Deep Anomaly Detection},
    author    = {Ruff, Lukas and Vandermeulen, Robert A and Franks, Billy Joe and M{\"u}ller, Klaus-Robert and Kloft, Marius},
    booktitle = {ICML 2021 Workshop on Uncertainty \& Robustness in Deep Learning},
    year      = {2021}
  }

Abstract

Though anomaly detection (AD) can be viewed as a classification problem (nominal vs. anomalous) it is usually treated in an unsupervised manner since one typically does not have access to, or it is infeasible to utilize, a dataset that sufficiently characterizes what it means to be “anomalous.” In this paper we present results demonstrating that this intuition surprisingly seems not to extend to deep AD on images. For a recent AD benchmark on ImageNet, classifiers trained to discern between normal samples and just a few (64) random natural images are able to outperform the current state of the art in deep AD. Experimentally we discover that the multiscale structure of image data makes example anomalies exceptionally informative.

Installation

This code is written in Python 3.7 and requires the packages listed in requirements.txt. To run the code, we recommend setting up a virtual environment, e.g. via virtualenv or conda, and installing the packages therein in the specified versions:

virtualenv

  # pip install virtualenv
  cd <path-to-repo>
  virtualenv myenv
  source myenv/bin/activate
  pip install -r requirements.txt

conda

  cd <path-to-repo>
  conda create --name myenv
  conda activate myenv
  while read requirement; do conda install -n myenv --yes $requirement; done < requirements.txt

Data

We present experiments using the MNIST, EMNIST, CIFAR-10, CIFAR-100, 80 Million Tiny Images, ImageNet-1K, and ImageNet-22K datasets in our paper. With the exception of ImageNet-1K and ImageNet-22K, these datasets are downloaded automatically to the ./data directory the first time an experiment is run on them. The ImageNet-1K one-vs-rest anomaly detection benchmark data can be downloaded from https://github.com/hendrycks/ss-ood, the repository of the paper that introduced the benchmark, and should be placed in the ./data/imagenet1k directory.
The ImageNet-22K dataset can be downloaded from http://www.image-net.org, which requires registration. Note that our implementation assumes the ImageNet-22K *.tar archives to be extracted into the ./data/fall11_whole_extracted directory.
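
The archives can be extracted with a short shell loop. The following is a minimal sketch; it assumes the downloaded *.tar files sit in a ./data/fall11_whole directory, which is a hypothetical download location not fixed by our code:

  # minimal sketch: extract the downloaded ImageNet-22K archives
  # ./data/fall11_whole is an assumed download location, not prescribed by the code
  mkdir -p data/fall11_whole_extracted
  for archive in data/fall11_whole/*.tar; do
      tar -xf "$archive" -C data/fall11_whole_extracted
  done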

Running experiments

All the experiments presented in our paper can be run using the main.py script. The specific method (hsc, deepSAD, bce, or focal) is set via the --objective option, e.g. --objective hsc.

The main.py script features various options and experimental parameters. Have a look at main.py for all the possible options and arguments.
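
Since main.py reads its parameters from the command line, the available options can typically also be listed from the shell. A quick sketch, assuming the script uses a standard argument parser that supports a help flag:

  cd <path-to-repo>/src
  # print the full list of options and arguments (assumes a standard --help flag)
  python main.py --help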

Below, we present two examples for the CIFAR-10 and ImageNet-1K one-vs-rest anomaly detection benchmarks. The complete bash scripts to reproduce all experimental results reported in our paper can be found in ./src/experiments.

CIFAR-10 One-vs-Rest Benchmark using 80 Million Tiny Images as OE

The following runs a Hypersphere Classifier (--objective hsc) experiment on CIFAR-10 with class 0 (airplane) considered to be the normal class, using 80 Million Tiny Images as OE (--oe_dataset_name tinyimages):

  cd <path-to-repo>
  # activate the virtual environment
  source myenv/bin/activate  # or 'conda activate myenv' for conda
  # create a folder for experimental outputs
  mkdir -p log/cifar10_test
  # change to the source directory
  cd src
  # run the experiment
  python main.py cifar10 cifar10_LeNet ../log/cifar10_test ../data --rep_dim 256 --objective hsc --outlier_exposure True --oe_dataset_name tinyimages --device cuda --seed 42 --lr 0.001 --n_epochs 200 --lr_milestone 100 --lr_milestone 150 --batch_size 128 --data_augmentation True --data_normalization True --normal_class 0;
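
The one-vs-rest protocol repeats this run once per CIFAR-10 class. Below is a minimal sketch of such a sweep, run from ./src, reusing the parameters of the example above and only varying --normal_class; the per-class log subfolders are our own naming choice, not prescribed by the code:

  # sweep over all ten CIFAR-10 classes as the normal class (sketch)
  for c in 0 1 2 3 4 5 6 7 8 9; do
      mkdir -p ../log/cifar10_test/class_$c
      python main.py cifar10 cifar10_LeNet ../log/cifar10_test/class_$c ../data \
          --rep_dim 256 --objective hsc --outlier_exposure True \
          --oe_dataset_name tinyimages --device cuda --seed 42 \
          --lr 0.001 --n_epochs 200 --lr_milestone 100 --lr_milestone 150 \
          --batch_size 128 --data_augmentation True --data_normalization True \
          --normal_class $c;
  done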

ImageNet-1K One-vs-Rest Benchmark using ImageNet-22K as OE

The following runs a Binary Cross-Entropy Classifier (--objective bce) experiment on ImageNet-1K with class 4 (banjo) considered to be the normal class, using ImageNet-22K as OE (--oe_dataset_name imagenet22k):

  cd <path-to-repo>
  # activate the virtual environment
  source myenv/bin/activate  # or 'conda activate myenv' for conda
  # create a folder for experimental outputs
  mkdir -p log/imagenet_test
  # change to the source directory
  cd src
  # run the classifier experiment
  python main.py imagenet1k imagenet_WideResNet ../log/imagenet_test ../data --rep_dim 256 --objective bce --outlier_exposure True --oe_dataset_name imagenet22k --device cuda --seed 42 --lr 0.001 --n_epochs 150 --lr_milestone 100 --lr_milestone 125 --batch_size 128 --data_augmentation True --data_normalization True --normal_class 4;

License

MIT