examples/roberta/README.md

https://arxiv.org/abs/1907.11692

## Introduction

RoBERTa iterates on BERT's pretraining procedure, including training the model longer, with bigger batches over more data; removing the next sentence prediction objective; training on longer sequences; and dynamically changing the masking pattern applied to the training data. See the associated paper for more details.
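
For orientation, the pretrained checkpoints can be used directly from Python. The snippet below is a minimal sketch, assuming a working PyTorch + fairseq installation; it loads the `roberta.large` checkpoint through `torch.hub` and extracts final-layer features for a sentence.

```python
import torch

# Load a pretrained RoBERTa model (the checkpoint is downloaded on first use).
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for deterministic feature extraction

# BPE-encode a sentence and extract features from the final layer.
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)
print(features.shape)  # e.g. torch.Size([1, 5, 1024]) for roberta.large
```
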
### What's New:
- August 2019: Added [tutorial for pretraining RoBERTa using your own data](README.pretraining.md).

First download the [preprocessed WMT'16 En-De data provided by Google](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8).

Then:

##### 1. Extract the WMT'16 En-De data

```bash
TEXT=wmt16_en_de_bpe32k
mkdir -p $TEXT
tar -xzvf wmt16_en_de.tar.gz -C $TEXT
```

##### 2. Preprocess the dataset with a joined dictionary

```bash
fairseq-preprocess --source-lang en --target-lang de \
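    --trainpref $TEXT/train.tok.clean.bpe.32000 \
    --validpref $TEXT/newstest2013.tok.bpe.32000 \
    --testpref $TEXT/newstest2014.tok.bpe.32000 \
    --destdir data-bin/wmt16_en_de_bpe32k \
    --joined-dictionary --workers 20
# Note: the arguments after --target-lang de are a sketch; the train/valid/test
# prefixes assume the file names shipped in the Google-preprocessed WMT'16
# archive, and the --destdir path is illustrative. Adjust them to match your
# extracted files.
```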