Project author: AnirudhMaiya

Project description: Why does Group Normalization work?
Language: Python
Repository: git://github.com/AnirudhMaiya/pytorch-Group-Normalization.git
Created: 2020-06-10T12:44:55Z
Project page: https://github.com/AnirudhMaiya/pytorch-Group-Normalization

License: MIT License

Group-Normalization

Why does Group Normalization work?

Prerequisites

  • PyTorch 1.4+

Overview

Group Normalization is a normalization technique that divides a layer's channels into groups and normalizes each group using statistics computed over the adjacent channels in that group. Group Normalization becomes Layer Normalization when all channels form a single group for computing the mean and standard deviation, and becomes Instance Normalization when each group contains only a single channel. The main claim of Group Norm is that adjacent channels are not independent, so normalizing over them jointly is meaningful.
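
These reductions can be checked directly with PyTorch's built-in nn.GroupNorm; here is a minimal sketch (the tensor shape is chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 64, 32, 32)  # (N, C, H, W)

# 32 groups of 2 adjacent channels each
gn = nn.GroupNorm(num_groups=32, num_channels=64)

# One group containing all channels -> behaves like Layer Norm
ln_like = nn.GroupNorm(num_groups=1, num_channels=64)

# One channel per group -> behaves like Instance Norm
in_like = nn.GroupNorm(num_groups=64, num_channels=64)

# With GroupNorm's default affine init (weight=1, bias=0) the outputs coincide
print(torch.allclose(in_like(x), nn.InstanceNorm2d(64)(x), atol=1e-5))  # expect: True
```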

I wanted to investigate whether Group Norm really takes advantage of adjacent channel statistics. It turns out it does.

Experiment

To verify the claim, I use a ResNet-18 trained on CIFAR-10 from this repository.
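
The linked repository's exact CIFAR-10 model is not reproduced here; as a rough sketch of the norm swap, torchvision's ResNet constructor accepts a norm_layer factory, so BatchNorm can be replaced with GroupNorm like this (the 32-group choice matches the setup below):

```python
import torch.nn as nn
from torchvision.models import resnet18

# ResNet-18 with every BatchNorm replaced by 32-group GroupNorm.
# torchvision calls norm_layer with the channel count at each site;
# ResNet-18's widths (64/128/256/512) are all divisible by 32.
model = resnet18(num_classes=10,
                 norm_layer=lambda channels: nn.GroupNorm(32, channels))
```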

[Figure: vanilla Group Norm (left) vs. Group Shuffle Norm (right)]

Compared to vanilla Group Norm (left), Group Shuffle Norm (right) picks channels that are not adjacent to each other, so each group consists of channels drawn from across the channel dimension rather than from neighbouring positions.
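
The repository's exact implementation is not shown here; the following is a minimal sketch of the idea, assuming a fixed random channel permutation applied before a standard nn.GroupNorm and inverted afterwards (the class name GroupShuffleNorm and this particular permutation scheme are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GroupShuffleNorm(nn.Module):
    """Sketch: compute group statistics over non-adjacent channels by
    shuffling channels with a fixed permutation before GroupNorm and
    restoring the original order afterwards."""

    def __init__(self, num_groups, num_channels):
        super().__init__()
        perm = torch.randperm(num_channels)   # fixed shuffle, drawn once
        inv = torch.argsort(perm)             # inverse permutation
        self.register_buffer('perm', perm)
        self.register_buffer('inv', inv)
        self.gn = nn.GroupNorm(num_groups, num_channels)

    def forward(self, x):                     # x: (N, C, H, W)
        x = x[:, self.perm]                   # scatter channels across groups
        x = self.gn(x)                        # normalize the shuffled groups
        return x[:, self.inv]                 # restore original channel order
```

Because the permutation is registered as a buffer, it is fixed at construction, moves with the module across devices, and is saved in checkpoints.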

Setup

  • Batch Size = 128
  • Learning Rate (step size) = 0.001
  • Number of Groups = 32 (for both Group Norm and Group Shuffle Norm); a configuration sketch follows below
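
A minimal configuration matching this setup, assuming the Adam optimizer and a plain ToTensor transform (neither is stated above):

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torchvision.models import resnet18

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True)

# GroupNorm ResNet-18 as in the earlier sketch
model = resnet18(num_classes=10,
                 norm_layer=lambda c: nn.GroupNorm(32, c))

# The optimizer is not stated in the README; Adam with lr=0.001 is assumed
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
```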

Results

Model                         Train Acc.   Test Acc.
Group Normalization           96.17%       85.38%
Group Shuffle Normalization   90.38%       82.36%

Hence Group Norm does take advantage of nearby channel statistics, which in turn yields better accuracy than grouping non-adjacent channels.

References