
Commit ec54228

Update quicktour.mdx re: Issue #1625 (#1846)
Update broken wikitext-103 and tokenizers-pipeline links
1 parent b0464b2 commit ec54228

1 file changed: 3 additions, 3 deletions

docs/source-doc-builder/quicktour.mdx

Lines changed: 3 additions & 3 deletions
```diff
@@ -7,7 +7,7 @@ is both easy to use and blazing fast.
 ## Build a tokenizer from scratch
 
 To illustrate how fast the 🤗 Tokenizers library is, let's train a new
-tokenizer on [wikitext-103](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/)
+tokenizer on [wikitext-103](https://www.salesforce.com/blog/the-wikitext-long-term-dependency-language-modeling-dataset/)
 (516M of text) in just a few seconds. First things first, you will need
 to download this dataset and unzip it with:
 
```
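The download command itself sits just past this hunk, so it is not shown here. For context, the training step this part of the quick tour builds toward looks roughly like the sketch below, assuming the wikitext-103 archive has already been downloaded and unzipped; the file paths, splits, and special tokens are illustrative, not taken from the diff.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Start from an empty BPE model and train it on the unzipped wikitext-103 text files.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
files = [f"wikitext-103-raw/wiki.{split}.raw" for split in ["train", "valid", "test"]]  # illustrative paths
tokenizer.train(files, trainer)

tokenizer.save("tokenizer-wiki.json")
```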
```diff
@@ -287,7 +287,7 @@ with the `Tokenizer.encode` method:
 
 This applied the full pipeline of the tokenizer on the text, returning
 an `Encoding` object. To learn more
-about this pipeline, and how to apply (or customize) parts of it, check out [this page](pipeline).
+about this pipeline, and how to apply (or customize) parts of it, check out [this page](https://github.com/huggingface/tokenizers/blob/main/docs/source-doc-builder/pipeline.mdx).
 
 This `Encoding` object then has all the
 attributes you need for your deep learning model (or other). The
```
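As a quick illustration of what the surrounding text describes, here is a minimal sketch of calling `Tokenizer.encode` and reading the resulting `Encoding`, assuming a trained `tokenizer` as in the sketch above; the sample sentence is arbitrary.

```python
output = tokenizer.encode("Hello, y'all! How are you doing?")

# The Encoding exposes the pieces a downstream model typically needs:
print(output.tokens)          # produced tokens as strings
print(output.ids)             # their integer ids in the vocabulary
print(output.offsets)         # (start, end) character spans in the original text
print(output.attention_mask)  # 1 for real tokens, 0 for padding
```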
````diff
@@ -835,4 +835,4 @@ as long as you have downloaded the file `bert-base-uncased-vocab.txt` with
 wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
 ```
 </python>
-</tokenizerslangcontent>
+</tokenizerslangcontent>
````
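For context, a vocabulary file like the one fetched by the wget command above is typically used to build a WordPiece tokenizer, roughly as in the sketch below; this assumes the file was saved to the working directory and is not code taken from the diff.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordPiece

# Instantiate a WordPiece model from the downloaded BERT vocabulary file.
tokenizer = Tokenizer(WordPiece.from_file("bert-base-uncased-vocab.txt", unk_token="[UNK]"))
```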
