docs/source-doc-builder/quicktour.mdx
+3 −3 (3 additions, 3 deletions)
@@ -7,7 +7,7 @@ is both easy to use and blazing fast.
 ## Build a tokenizer from scratch
 
 To illustrate how fast the 🤗 Tokenizers library is, let's train a new
-tokenizer on [wikitext-103](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/)
+tokenizer on [wikitext-103](https://www.salesforce.com/blog/the-wikitext-long-term-dependency-language-modeling-dataset/)
 (516M of text) in just a few seconds. First things first, you will need
 to download this dataset and unzip it with:
 
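The download command itself sits outside this hunk. As a rough sketch of the step the surrounding prose describes, a Python equivalent might look like the following; the archive URL is an assumption (the quicktour has historically pointed at an S3 mirror) and may have moved:

```python
import urllib.request
import zipfile

# Assumed mirror of the wikitext-103 raw archive; substitute the link from
# the blog post above if this one has moved.
URL = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip"

# Download the zip archive and unpack it into the current directory.
archive, _ = urllib.request.urlretrieve(URL, "wikitext-103-raw-v1.zip")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(".")
```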
@@ -287,7 +287,7 @@ with the `Tokenizer.encode` method:
 
 This applied the full pipeline of the tokenizer on the text, returning
 an `Encoding` object. To learn more
-about this pipeline, and how to apply (or customize) parts of it, check out [this page](pipeline).
+about this pipeline, and how to apply (or customize) parts of it, check out [this page](https://github.com/huggingface/tokenizers/blob/main/docs/source-doc-builder/pipeline.mdx).
 
 This `Encoding` object then has all the
 attributes you need for your deep learning model (or other). The
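For context, here is a minimal sketch of the `Tokenizer.encode` call this hunk documents, assuming a tokenizer saved earlier in the quicktour as `tokenizer-wiki.json` (the filename is an assumption; any JSON file produced by `Tokenizer.save` works):

```python
from tokenizers import Tokenizer

# Load a trained tokenizer from disk.
tokenizer = Tokenizer.from_file("tokenizer-wiki.json")

# encode runs the full pipeline (normalizer, pre-tokenizer, model,
# post-processor) and returns an Encoding object.
output = tokenizer.encode("Hello, y'all! How are you 😁 ?")

print(output.tokens)   # token strings
print(output.ids)      # token ids to feed a model
print(output.offsets)  # (start, end) character spans in the input
```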
@@ -835,4 +835,4 @@ as long as you have downloaded the file `bert-base-uncased-vocab.txt` with
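The body of this final hunk is not shown, but its context line mentions `bert-base-uncased-vocab.txt`. As a hedged illustration of how that vocabulary file is typically consumed, not necessarily the exact code changed here, the library ships a `BertWordPieceTokenizer` helper:

```python
from tokenizers import BertWordPieceTokenizer

# Build a WordPiece tokenizer from the downloaded BERT vocabulary file.
tokenizer = BertWordPieceTokenizer("bert-base-uncased-vocab.txt", lowercase=True)

output = tokenizer.encode("Welcome to the 🤗 Tokenizers library.")
print(output.tokens)
```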