Commit 5e50f1f

everettVT, ykdojo, and desmondcheongzx authored
docs: Add Minhash Dedupe Example (#5165)
## Changes Made

Added an end-to-end MinHash deduplication example over the Common Crawl dataset.

## Checklist

- [x] Documented in API Docs (if applicable)
- [x] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

---------

Co-authored-by: YK <[email protected]>
Co-authored-by: Desmond Cheong <[email protected]>
1 parent 778eba6 commit 5e50f1f

5 files changed: +2365 -0 lines changed

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@
 * [Usage Telemetry](telemetry.md)
 * Examples
 * [Examples](examples/index.md)
+* [Web Text Deduplication](examples/minhash-dedupe.md)
 * [Document Processing](examples/document-processing.md)
 * [Audio Transcription](examples/audio-transcription.md)
 * [Generate Text Embeddings for Turbopuffer](examples/text-embeddings.md)

docs/examples/index.md

Lines changed: 15 additions & 0 deletions
@@ -25,6 +25,21 @@
 </a>
 </div>

+<div class="example-card">
+<a href="./minhash-dedupe" class="example-image-link">
+<div class="example-image">
+<img src="../img/minhash-dedupe-cover.png" alt="MinHash Deduplication" onerror="this.style.display='none'; this.nextElementSibling.style.display='flex';">
+<div class="example-placeholder" style="display: none; background: linear-gradient(135deg, #36d1dc 0%, #5b86e5 100%);">
+<span>📃</span>
+</div>
+<div class="example-overlay">
+<h3>MinHash Deduplication on Common Crawl</h3>
+<p>Deduplicate web text at scale with MinHash, LSH, and Connected Components.</p>
+</div>
+</div>
+</a>
+</div>
+

 <div class="example-card">
 <a href="./text-embeddings" class="example-image-link">
 <div class="example-image">
