Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
This is the official code for the paper:

**Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner**
Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
To appear in ICCV 2017
In this repository we provide:

- Data preprocessing scripts for MSCOCO and CUB-200-2011
- Training code for the cross-domain captioner
- Pretrained and adaptation models
- Example captioning results across domains
If you find this code useful for your research, please cite:

```
@article{chen2017show,
  title={Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner},
  author={Chen, Tseng-Hung and Liao, Yuan-Hong and Chuang, Ching-Yao and Hsu, Wan-Ting and Fu, Jianlong and Sun, Min},
  journal={arXiv preprint arXiv:1705.00930},
  year={2017}
}
```
P.S. Please clone the repository with the `--recursive` flag:

```shell
# Make sure to clone with --recursive
git clone --recursive https://github.com/tsenghungchen/show-adapt-and-tell.git
```
## Data Preprocessing

### MSCOCO Captioning Dataset

- Download the pretrained ResNet model and put it under `data-prepro/MSCOCO_preprocess/resnet_model/`; it is loaded by `data-prepro/MSCOCO_preprocess/extract_resnet_coco.py` (a sketch of this feature-extraction step appears after this list).
- Go to `data-prepro/MSCOCO_preprocess` and run the following script: `./download_mscoco.sh` for downloading images and extracting features.
- Run `./prepro_mscoco_caption.sh` for downloading and tokenizing captions; this step gathers the raw captions into `coco_raw.json`.
- Run `python prepro_coco_annotation.py` to generate the annotation json file for testing.
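For reference, here is a minimal sketch of what the feature-extraction step does: run each image through a pretrained ResNet with the classification head removed, then pack the pooled features to disk. It substitutes torchvision's ResNet-101 and hypothetical paths for the repo's actual model and script, so treat it as an illustration rather than `extract_resnet_coco.py` itself.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ResNet with the classification head removed yields 2048-d pooled features.
resnet = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()
resnet.eval()

# Standard ImageNet preprocessing for the pretrained weights.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(image_paths):
    feats = []
    for path in image_paths:
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        feats.append(resnet(x).squeeze(0).numpy())
    return np.stack(feats)  # shape: (num_images, 2048)

# Pack the features for the captioning model (output path is hypothetical):
# np.save("coco_resnet_features.npy", extract_features(list_of_image_paths))
```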
### CUB-200-2011 with Descriptions

- Run `./download_cub.sh` to download the images in CUB-200-2011.
- Run `data-prepro/MSCOCO_preprocess/extract_resnet_coco.py` to extract and pack features for CUB-200-2011.
- Run `python get_split.py` to generate the dataset split following the ECCV16 paper "Generating Visual Explanations".
- Run `python prepro_cub_annotation.py` to generate the annotation json file for testing.
- Run `python CUB_preprocess_token.py` for tokenization (a minimal tokenization sketch follows this list).
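Here is a minimal sketch of the tokenization step, assuming the usual recipe of lowercasing, punctuation stripping, and a frequency-thresholded vocabulary; the exact rules and threshold in `CUB_preprocess_token.py` may differ.

```python
import json
import re
from collections import Counter

def tokenize(caption):
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", caption.lower())

def build_vocab(captions, min_count=5):
    counts = Counter(w for c in captions for w in tokenize(c))
    # Reserve ids for padding/unknown tokens; frequent words get the rest.
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab

captions = ["A large white and black airplane with a large beak."]
vocab = build_vocab(captions, min_count=1)
ids = [[vocab.get(w, vocab["<unk>"]) for w in tokenize(c)] for c in captions]
print(json.dumps({"vocab_size": len(vocab), "ids": ids}))
```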
## Pretrained Models

Download all pretrained and adaptation models:
### Example Results

Here are some example results with captions generated by these models:
| Example (image omitted) | MSCOCO | CUB-200-2011 | TGIF | Flickr30k |
|---|---|---|---|---|
| Airplane | A large air plane on a run way. | A large white and black airplane with a large beak. | A plane is flying over a field. | A large airplane is sitting on a runway. |
| Traffic light | A traffic light is seen in front of a large building. | A yellow traffic light with a yellow light. | A traffic light is hanging on a pole. | A street sign is lit up in the dark. |
| Dog | A black dog sitting on the ground next to a window. | A black and white dog with a black head. | A dog is looking at something in the mirror. | A black dog is looking out of the window. |
| Skateboarder | A man riding a skateboard up the side of a ramp. | A man riding a skateboard on a white ramp. | A man is doing a trick on a skateboard. | A man in a blue shirt is doing a trick on a skateboard. |
## Training

The training code is under the `show-adapt-tell/` folder. Simply run `python main.py` for the two steps of training:

1. Set the Boolean flag `G_is_pretrain` to `True` in `main.py` to start pretraining the generator.
2. After pretraining, set `G_is_pretrain` to `False` to start training the cross-domain model.
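As a reading aid, here is a minimal sketch of that two-step schedule. The function bodies and names are placeholders, not the actual structure of `main.py`; only the `G_is_pretrain` flag comes from this README.

```python
def pretrain_generator():
    # Step 1: maximum-likelihood pretraining of the caption generator
    # on the source domain (MSCOCO image-caption pairs).
    print("pretraining generator on the source domain")

def train_cross_domain():
    # Step 2: adversarial training, where critics score generated
    # captions against unpaired target-domain data (e.g. CUB-200-2011).
    print("adversarial cross-domain training")

G_is_pretrain = True  # set to False after pretraining finishes

if G_is_pretrain:
    pretrain_generator()
else:
    train_cross_domain()
```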
## License

Free for personal or research use; for commercial use, please contact me.