Project author: marumalo

Project description:
An implementation of cross-lingual language model pre-training (XLM).
Language: Python
Repository: git://github.com/marumalo/pytorch-xlm.git
Created: 2019-07-29T06:56:58Z
Project page: https://github.com/marumalo/pytorch-xlm

License:


XLM: Cross-lingual Language Model Pretraining

An implementation of Cross-lingual Language Model Pretraining (XLM) using PyTorch.
You can choose one of the following three training tasks with the --task option.

  • Causal language model (--task causal)
  • Masked language model (--task masked)
  • Translation language model (--task translation)
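
As a rough illustration of how the three objectives differ, here is a minimal sketch in plain Python. It is not taken from this repository: the function names, the `[MASK]` and `</s>` symbols, and the toy vocabulary are assumptions made for the example. The masking scheme (select 15% of tokens; replace 80% of those with a mask symbol, 10% with a random token, and keep 10% unchanged) follows the description in the XLM paper. The sketch leaves out the language embeddings and position resetting that the paper uses for the translation objective.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this example


def causal_example(tokens):
    """Causal LM: each target is the token that follows the input token."""
    return tokens[:-1], tokens[1:]


def masked_example(tokens, vocab, rng, mask_prob=0.15):
    """Masked LM: corrupt a random subset of tokens and predict the originals."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append(tok)  # this position is scored
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)                # 10%: keep unchanged
        else:
            inputs.append(tok)
            targets.append(None)  # this position is not scored
    return inputs, targets


def translation_example(src_tokens, tgt_tokens, vocab, rng,
                        sep="</s>", mask_prob=0.15):
    """Translation LM: concatenate a sentence pair, then mask as in masked LM."""
    return masked_example(src_tokens + [sep] + tgt_tokens, vocab, rng, mask_prob)
```

Because the translation objective masks tokens on both sides of a sentence pair, the model can use context in one language to predict a masked token in the other.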



Settings

This code depends on the following.

  • python==3.6.5
  • pytorch==1.1.0
  • torchtext==0.3.1

Then clone the repository and install the requirements:

  git clone https://github.com/t080/pytorch-xlm.git
  cd ./pytorch-xlm
  pip install -r requirements.txt


Usage

To train a causal or masked language model, pass a monolingual corpus (.txt) to the --train option and set --task to causal or masked.

  python train.py \
    --task causal \
    --train /path/to/train.txt \
    --savedir ./checkpoints \
    --gpu
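
If you launch training from a script, the command above can be assembled programmatically. The helper below is a hypothetical sketch, not part of this repository. It uses only the flags shown in this README and assumes that `--gpu` is a plain on/off switch and that `train.py` is in the current working directory.

```python
import sys

# Task names listed for --task in this README.
TASKS = ("causal", "masked", "translation")


def build_train_command(task, train_path, savedir="./checkpoints", gpu=False):
    """Assemble the train.py argument list from the flags shown in this README."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    cmd = [
        sys.executable, "train.py",
        "--task", task,
        "--train", train_path,
        "--savedir", savedir,
    ]
    if gpu:
        cmd.append("--gpu")  # assumed to be a boolean switch
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.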


To train a translation language model, pass a parallel corpus (.tsv) to the --train option.

  python train.py \
    --task translation \
    --train /path/to/train.tsv \
    --savedir ./checkpoints \
    --gpu
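
This README does not spell out the column layout of the .tsv file. The sketch below assumes the simplest possible layout: one sentence pair per line, with the source sentence and the target sentence separated by a single tab. The function name `write_parallel_tsv` is hypothetical, and the layout should be checked against the data loading code in the repository before use.

```python
def write_parallel_tsv(pairs, path):
    """Write (source, target) sentence pairs, one tab-separated pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for src, tgt in pairs:
            # A tab or newline inside a sentence would corrupt the column layout.
            if any(ch in text for text in (src, tgt) for ch in "\t\n"):
                raise ValueError("sentences must not contain tabs or newlines")
            f.write(f"{src}\t{tgt}\n")
```

For example, `write_parallel_tsv([("hello world", "bonjour le monde")], "train.tsv")` produces a one-line file that could then be passed to --train under the assumed layout.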


References