Project author: moabitcoin

Project description:
Scene similarity for weak object discovery & classification
Language: Python
Repository: git://github.com/moabitcoin/sisyphus.git
Created: 2020-02-24T12:52:57Z
Project page: https://github.com/moabitcoin/sisyphus

License: MIT License



:mount_fuji: Sisyphus

Scene similarity for weak object discovery & classification. Labelling images to build an object classification model is a labor-intensive task (a.k.a. rolling a ball uphill). This repository leverages image similarity to generate a weakly labeled dataset that can be cleaned up an order of magnitude faster than going through the whole image/video corpus. The expected speedup is inversely proportional to the rarity of the object of interest. We also provide a blockages/construction detection model trained on drive data from Berlin with the tools contained in this repository.


:computer: Installation

Create a self-contained, reproducible development environment and enter it:

Example for running on CPUs:

  make install dockerfile=Dockerfile.cpu dockerimage=moabitcoin/sfi-cpu
  make run dockerimage=moabitcoin/sfi-cpu

Example for running on GPUs via nvidia-docker:

  make install dockerfile=Dockerfile.gpu dockerimage=moabitcoin/sfi-gpu
  make run dockerimage=moabitcoin/sfi-gpu runtime=nvidia

The Python source code directory is mounted into the container. Changes you make on the host are immediately visible in the container, so you don't need to rebuild the image. To make data visible in the container, set the datadir variable. For example, to make your /tmp directory show up as /data inside the container, run

  make run datadir=/tmp

See the Makefile for options and more advanced targets.

:tada: Usage

All tools can be invoked via:

  ./bin/sfi --help

  usage: sficmd [-h] ...

  optional arguments:
    -h, --help            show this help message and exit

  commands:
      frames-extract      Extract video key frames w/intra frame similarity
      feature-extract     Extract image features w/ pre-trained resnet50
      feature-extract-vid
                          Extract features from videos wth 2(D+1) video model
      build-index         Builds a faiss index
      serve-index         Starts up the index http server
      query-index         Queries the index server for nearest neighbour
      model-train         Trains a classification model with a resnet50 backbone
      model-infer         Runs inference with a classification model
      model-export        Export a classification model to onnx

:camera: Frames vs. :video_camera: videos

Sisyphus works with images or videos. For an image-only corpus you can skip this step. For videos, one option (recommended) is to extract keyframes. To work directly with videos, please use the video feature extractor and skip this step. Keyframes can be extracted in either of the two ways below. Keyframe extraction removes frames with little to no motion, which vastly reduces the corpus size and the number of duplicates in retrieval results.

FFMPEG keyframes

Keyframe extraction using ffmpeg. As a sanity check, you can reconstruct a video from the keyframes.

  # Video to keyframes
  ./scripts/video-to-key-frames /path/to/video /tmp/frames/
  # Keyframes to video
  ./scripts/key-frames-to-video /tmp/result/ nearest.mp4

Image features

We also include an experimental (and slow) keyframe extractor based on intra-frame feature similarity.

  ./bin/sfi frames-extract --help
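The general idea behind a similarity-based keyframe extractor can be sketched as follows. This is a minimal illustration, not the code behind `frames-extract`. It assumes one feature vector per frame is already available (e.g. from `feature-extract`). It also uses a hypothetical cosine-similarity `threshold` to decide when the scene has changed enough to keep a new keyframe.

```python
from __future__ import annotations

import numpy as np


def select_keyframes(features: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Greedy keyframe selection on per-frame feature vectors.

    A frame is kept when its cosine similarity to the most recently kept
    frame falls below `threshold` (a hypothetical tuning parameter), so runs
    of near-identical frames (little to no motion) collapse to one keyframe.
    """
    if len(features) == 0:
        return []
    # L2-normalise each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    keep = [0]  # always keep the first frame
    for i in range(1, len(unit)):
        if float(unit[i] @ unit[keep[-1]]) < threshold:
            keep.append(i)
    return keep


# Frames 0-1 are nearly identical, frame 2 is a new scene, frame 3 repeats it.
frames = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]])
print(select_keyframes(frames))  # [0, 2]
```

The sketch compares each frame against the last *kept* frame rather than the immediately preceding one. With frame-to-frame comparison, a slow pan could drift arbitrarily far without ever triggering a new keyframe.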

:rocket: Feature extraction

  ./bin/sfi feature-extract --help

Extracts high-level MAC (maximum activations of convolutions) feature maps for all image frames from a convolutional neural network (ResNet-50 pre-trained on ILSVRC2012). The feature maps are saved as individual .npy files alongside the image frames. We recommend running this step on GPUs.
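MAC pooling itself is simple. Take the maximum of each channel of the last convolutional feature map over all spatial positions, then L2-normalise the result. The NumPy-only sketch below uses a random array as a stand-in for the (2048, 7, 7) ResNet-50 output of a 224×224 frame. The `.jpg` → `.npy` naming in the hypothetical `save_next_to_image` helper is an assumption, not necessarily how `feature-extract` names its files.

```python
from pathlib import Path

import numpy as np


def mac(feature_map: np.ndarray) -> np.ndarray:
    """Maximum Activations of Convolutions for a (C, H, W) feature map:
    global max-pool over the spatial axes, then L2-normalise the C-dim vector."""
    pooled = feature_map.max(axis=(1, 2))
    return pooled / max(float(np.linalg.norm(pooled)), 1e-12)


def save_next_to_image(image_path: Path, feature: np.ndarray) -> Path:
    """Store `feature` beside its image, e.g. frame_0001.jpg -> frame_0001.npy
    (hypothetical naming scheme)."""
    out = image_path.with_suffix(".npy")
    np.save(out, feature)
    return out


# Stand-in for a ResNet-50 conv5 output on one frame (2048 channels, 7x7 grid).
fmap = np.random.default_rng(0).random((2048, 7, 7), dtype=np.float32)
descriptor = mac(fmap)
print(descriptor.shape)  # (2048,)
```

Skipping the max-pool and keeping the (C, H, W) map "unpooled" preserves the spatial layout. The cost is storing H×W vectors per frame instead of one.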

:telescope: Feature extraction (Video)

If you prefer to work directly with videos, please use a 3D video classification model for feature extraction. Follow the instructions here. The 3D video feature extraction tools within Sisyphus are experimental.

  ./bin/sfi feature-extract-vid --help

:european_post_office: Building index

  ./bin/sfi index-build --help

Builds an index from the .npy feature maps for fast and efficient approximate nearest neighbour queries based on L2 distance. The index's quantizer needs to be trained on a small subset of the feature maps to approximate the dataset's centroids. Depending on the feature maps' spatial resolution (pooled vs. unpooled), we build and save multiple indices (one per depthwise feature map axis).
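Faiss's inverted-file indexes (such as `IndexIVFFlat`) follow a "train a quantizer on a subset, then index everything" pattern. The NumPy toy below only makes that idea concrete and is no substitute for faiss:

- k-means on a training sample gives `nlist` coarse centroids.
- Every vector is filed under its nearest centroid.
- A query is compared exactly (L2) only against vectors in the `nprobe` closest lists.

The class and method names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def _sq_dists(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Squared L2 distance between every row of x and every centroid."""
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)


class ToyIVFIndex:
    """Minimal inverted-file (IVF) index with exact L2 search inside lists."""

    def __init__(self, nlist: int, seed: int = 0) -> None:
        self.nlist = nlist
        self.rng = np.random.default_rng(seed)
        self.centroids: np.ndarray | None = None
        self.vectors: np.ndarray | None = None
        self.lists: list[np.ndarray] = []

    def train(self, sample: np.ndarray, iters: int = 10) -> None:
        # Lloyd's k-means on a small subset approximates the dataset's centroids.
        sample = np.asarray(sample, dtype=np.float64)
        start = self.rng.choice(len(sample), self.nlist, replace=False)
        centroids = sample[start].copy()
        for _ in range(iters):
            assign = _sq_dists(sample, centroids).argmin(axis=1)
            for c in range(self.nlist):
                members = sample[assign == c]
                if len(members):  # leave empty clusters where they are
                    centroids[c] = members.mean(axis=0)
        self.centroids = centroids

    def add(self, vectors: np.ndarray) -> None:
        # File every vector under its nearest centroid (all vectors in one call).
        self.vectors = np.asarray(vectors, dtype=np.float64)
        assign = _sq_dists(self.vectors, self.centroids).argmin(axis=1)
        self.lists = [np.flatnonzero(assign == c) for c in range(self.nlist)]

    def search(self, query: np.ndarray, k: int, nprobe: int = 1):
        # Visit only the nprobe lists whose centroids are closest to the query.
        probe = np.argsort(((self.centroids - query) ** 2).sum(axis=1))[:nprobe]
        candidates = np.concatenate([self.lists[c] for c in probe])
        dists = ((self.vectors[candidates] - query) ** 2).sum(axis=1)
        order = np.argsort(dists)[:k]
        return candidates[order], dists[order]


# Two well-separated clusters; the query lies next to vector 4.
data = np.array([[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]])
index = ToyIVFIndex(nlist=2)
index.train(data)
index.add(data)
ids, dists = index.search(np.array([10.09, 10.0]), k=2)
print(ids)  # [4 3]
```

In faiss the same knobs appear as the `nlist` argument of `IndexIVFFlat` and the `nprobe` attribute of the index. Raising `nprobe` visits more lists, trading query speed for recall.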

:vhs: Load index

  ./bin/sfi index-serve --help

Loads up the index (slow) and keeps it in memory to handle nearest neighbour queries (fast).
Responds to queries by searching the index, aggregating results, and re-ranking them.
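The exact aggregation and re-ranking rule is not spelled out here, so the following is a hypothetical sketch, not the server's code. It covers the case where an image contributes several vectors, for example one per index or one per spatial position of an unpooled map. Each nearest-neighbour hit votes for the image it came from. Images are then ranked by vote count, with ties broken by their best L2 distance.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Hashable, Iterable


def aggregate_and_rerank(
    hits: Iterable[tuple[Hashable, float]], top_k: int = 5
) -> list[tuple[Hashable, int, float]]:
    """Collapse (image_id, l2_distance) hits from one or more index searches
    into one entry per image, ranked by hit count (desc), then best distance (asc)."""
    votes: dict[Hashable, int] = defaultdict(int)
    best: dict[Hashable, float] = {}
    for image_id, dist in hits:
        votes[image_id] += 1
        best[image_id] = min(dist, best.get(image_id, float("inf")))
    ranked = sorted(best, key=lambda i: (-votes[i], best[i]))
    return [(i, votes[i], best[i]) for i in ranked[:top_k]]


hits = [("a.jpg", 0.5), ("b.jpg", 0.2), ("a.jpg", 0.3), ("c.jpg", 0.1)]
print(aggregate_and_rerank(hits))
# [('a.jpg', 2, 0.3), ('c.jpg', 1, 0.1), ('b.jpg', 1, 0.2)]
```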

:crystal_ball: Query index

  ./bin/sfi index-query --help

Sends nearest neighbour requests to the index server and reports the results to the user.
Queries and results are based on the .npy feature maps from which the index was built. The mapping between .npy files and images is saved in a .json file.
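Turning the row ids returned by the index back into image paths is a lookup in that mapping. The JSON layout below is an assumption for illustration: a list whose i-th entry describes the i-th vector added to the index, with hypothetical `npy` and `image` keys. Check the file the index builder actually writes. Faiss pads result rows with `-1` when fewer than k neighbours are found, so those ids are skipped.

```python
import json
from typing import List


def ids_to_images(ids: List[int], mapping_path: str) -> List[str]:
    """Resolve index row ids to image paths via a JSON mapping file
    (hypothetical schema: [{"npy": ..., "image": ...}, ...], one entry per row)."""
    with open(mapping_path, encoding="utf-8") as f:
        rows = json.load(f)
    # Skip faiss's -1 padding for missing neighbours.
    return [rows[i]["image"] for i in ids if i >= 0]
```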

:bullettrain_side: Build classifier

The query step yields a quasi-clean, weakly labeled dataset that needs manual cleaning before it is ready for training. Once you have cleaned it, you can train a classifier with the tools below. The resulting weakly supervised model can then run predictions on your image dataset. The --dataset option expects training and validation samples for each class, partitioned under path/to/dataset/, e.g.:

  tree -d -L 2 /data/experiments/blockages/construction
  ├── train
  │   ├── background
  │   └── construction
  └── val
      ├── background
      └── construction
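Before training, it can help to check that the cleaned dataset matches this layout. The helper below is a hypothetical sanity check, not part of the `sfi` CLI. It lists the class folders under one split and pairs each image with a class index assigned in sorted folder order, the same convention torchvision's `ImageFolder` uses. The accepted file extensions are an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: which file types count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_split(dataset: Path, split: str) -> tuple[list[str], list[tuple[Path, int]]]:
    """List (image_path, class_index) pairs under <dataset>/<split>/<class>/.

    Class indices follow sorted folder names, e.g. background -> 0,
    construction -> 1. Non-image files are ignored.
    """
    root = dataset / split
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(classes):
        for path in sorted((root / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, index))
    return classes, samples


# Hypothetical usage against the layout above:
# classes, train = list_split(Path("/data/experiments/blockages/construction"), "train")
# print(classes, len(train))
```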

Training/Exporting/Inference tools

  # training
  ./bin/sfi model-train --help
  # inference
  ./bin/sfi model-infer --help
  # export
  ./bin/sfi model-export --help