# Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa)

## Introduction

XLM-R (XLM-RoBERTa) is a scaled cross-lingual sentence encoder. It is trained on `2.5TB` of data across `100` languages, filtered from Common Crawl. XLM-R achieves state-of-the-art results on multiple cross-lingual benchmarks.

## Pre-trained models

Model | Description | # params | Download
---|---|---|---
`xlmr.base.v0` | XLM-R using the BERT-base architecture | 250M | [xlmr.base.v0.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.v0.tar.gz)
`xlmr.large.v0` | XLM-R using the BERT-large architecture | 560M | [xlmr.large.v0.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.v0.tar.gz)

(Note: The above models are still under training. We will update the weights once they are fully trained; the results below are based on the above checkpoints.)

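If you are unsure which checkpoint names are exposed through torch.hub, you can list the available entry points; a minimal sketch using the standard `torch.hub.list` API (the `xlmr.*` entries may not appear until the checkpoints are published):

```python
import torch

# List the model entry points exposed by the fairseq hub configuration;
# 'xlmr.base.v0' and 'xlmr.large.v0' should appear here once released.
print(torch.hub.list('pytorch/fairseq'))
```
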
## Results

**[XNLI (Conneau et al., 2018)](https://arxiv.org/abs/1809.05053)**

Model | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
`roberta.large.mnli` _(TRANSLATE-TEST)_ | 91.3 | 82.9 | 84.3 | 81.24 | 81.74 | 83.13 | 78.28 | 76.79 | 76.64 | 74.17 | 74.05 | 77.5 | 70.9 | 66.65 | 66.81
`xlmr.large.v0` _(TRANSLATE-TRAIN-ALL)_ | 88.7 | 85.2 | 85.6 | 84.6 | 83.6 | 85.5 | 82.4 | 81.6 | 80.9 | 83.4 | 80.9 | 83.3 | 79.8 | 75.9 | 74.3

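In the table, _TRANSLATE-TEST_ scores non-English test sets by machine-translating them to English and classifying them with an English-only MNLI model, while _TRANSLATE-TRAIN-ALL_ fine-tunes XLM-R on the concatenation of the translated training data of all languages. Below is a minimal sketch of the TRANSLATE-TEST style of inference, assuming the `roberta.large.mnli` torch.hub entry and its `'mnli'` classification head; the premise/hypothesis pair is illustrative only:

```python
import torch

# Load an English RoBERTa model fine-tuned on MNLI
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
roberta.eval()  # disable dropout

# A (premise, hypothesis) pair that has already been translated to English
tokens = roberta.encode(
    'XLM-R is trained on one hundred languages.',
    'XLM-R is a monolingual model.'
)

# predict() returns log-probabilities over the MNLI label set;
# argmax gives the predicted class index
prediction = roberta.predict('mnli', tokens).argmax().item()
print(prediction)
```
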
## Example usage

##### Load XLM-R from torch.hub (PyTorch >= 1.1):
```python
import torch
xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large.v0')
xlmr.eval()  # disable dropout (or leave in train mode to finetune)
```
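
The loaded object behaves like a regular `torch.nn.Module`, so it can be moved to a GPU before extracting features; a small sketch, assuming a CUDA device is available:

```python
# Move the model to GPU for faster inference (requires a CUDA device)
if torch.cuda.is_available():
    xlmr.cuda()
```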

##### Load XLM-R (for PyTorch 1.0 or custom models):
```bash
# Download xlmr.large model
wget https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.v0.tar.gz
tar -xzvf xlmr.large.v0.tar.gz
```

```python
# Load the model in fairseq
from fairseq.models.roberta import XLMRModel
xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large.v0', checkpoint_file='model.pt')
xlmr.eval()  # disable dropout (or leave in train mode to finetune)
```

##### Apply sentence-piece encoding to input text:
```python
tokens = xlmr.encode('Hello world!')
assert tokens.tolist() == [0, 35378, 8999, 38, 2]
xlmr.decode(tokens)  # 'Hello world!'
```
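
Since the same sentence-piece vocabulary covers all 100 languages, non-English text is encoded the same way; a minimal round-trip sketch (the exact token ids depend on the released vocabulary, so they are not asserted here):

```python
# Encode a non-English sentence with the shared sentence-piece vocabulary
zh_tokens = xlmr.encode('你好，世界！')
print(zh_tokens)               # 1-D tensor of token ids, including <s> and </s>

# Decoding maps the ids back to text (round-trips for text covered by the vocabulary)
print(xlmr.decode(zh_tokens))
```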

##### Extract features from XLM-R:
```python
# Extract the last layer's features
last_layer_features = xlmr.extract_features(tokens)
assert last_layer_features.size() == torch.Size([1, 5, 1024])

# Extract all layers' features (layer 0 is the embedding layer)
all_layers = xlmr.extract_features(tokens, return_all_hiddens=True)
assert len(all_layers) == 25
assert torch.all(all_layers[-1] == last_layer_features)
```
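
For downstream sentence-level tasks such as XNLI, a classification head can be attached on top of the encoder and fine-tuned. A minimal sketch, assuming XLM-R exposes the same `register_classification_head` / `predict` helpers as fairseq's RoBERTa; the head below is randomly initialized, so its outputs are meaningless until fine-tuned:

```python
# Attach a randomly initialized classification head named 'xnli' with 3 classes
xlmr.register_classification_head('xnli', num_classes=3)

# Encode a (premise, hypothesis) pair and score it with the new head;
# predict() returns log-probabilities over the 3 classes
pair_tokens = xlmr.encode('I love my dog.', "J'aime mon chien.")
logprobs = xlmr.predict('xnli', pair_tokens)
assert logprobs.shape[-1] == 3
```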

## Citation

```bibtex
@article{conneau2019unsupervised,
  title = {Unsupervised Cross-lingual Representation Learning at Scale},
  author = {Alexis Conneau and Kartikay Khandelwal and Naman Goyal
    and Vishrav Chaudhary and Guillaume Wenzek and Francisco Guzm\'an
    and Edouard Grave and Myle Ott and Luke Zettlemoyer and Veselin Stoyanov
  },
  journal = {arXiv preprint arXiv:1911.02116},
  year = {2019},
}
```