Commit History (may be incomplete; see the source repositories for full details) |
Commit | Credits | Log message |
0.19.1 21 Apr 2024 08:18:00 |
Hiroki Tagato (tagattie) |
textproc/py-tokenizers: update to 0.19.1
Changelog:
- https://github.com/huggingface/tokenizers/releases/tag/v0.19.0
- https://github.com/huggingface/tokenizers/releases/tag/v0.19.1
Reported by: Repology |
0.15.2_2 23 Mar 2024 09:41:46 |
Mikael Urankar (mikael) |
lang/rust: Bump revisions after 1.77.0
PR: 277786 |
0.15.2_1 19 Feb 2024 11:59:23 |
Mikael Urankar (mikael) |
lang/rust: Bump revisions after 1.76.0
PR: 276920 |
0.15.2 14 Feb 2024 09:17:15 |
Hiroki Tagato (tagattie) |
textproc/py-tokenizers: update to 0.15.2
While here, enable tests.
Changelog: https://github.com/huggingface/tokenizers/releases/tag/v0.15.2
Reported by: portscout |
0.15.1 12 Feb 2024 08:34:14 |
Hiroki Tagato (tagattie) |
textproc/py-tokenizers: add port: Fast state-of-the-art tokenizers optimized for
research and production
Provides an implementation of today's most used tokenizers, with a
focus on performance and versatility.
Main features:
- Train new vocabularies and tokenize, using today's most used
tokenizers.
- Extremely fast (both training and tokenization), thanks to the Rust
implementation. Takes less than 20 seconds to tokenize a GB of text
on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignment tracking, so it is always
possible to recover the part of the original sentence that
corresponds to a given token.
- Handles all the pre-processing: truncation, padding, and adding the
special tokens your model needs.
WWW: https://github.com/huggingface/tokenizers |
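The alignment tracking and pre-processing features listed in the port description can be illustrated with a small standard-library sketch. This is a toy model of the ideas only, not the tokenizers library's API or implementation: the function names `tokenize_with_offsets` and `prepare`, the regex-based splitting, and the `[CLS]`/`[SEP]`/`[PAD]` strings are all assumptions chosen for illustration.

```python
import re

# Hypothetical special tokens; real models define their own.
PAD, CLS, SEP = "[PAD]", "[CLS]", "[SEP]"


def tokenize_with_offsets(text):
    """Split into words and punctuation, lowercase each piece (the
    "normalization"), and keep its (start, end) span in the original text."""
    return [
        (m.group().lower(), (m.start(), m.end()))
        for m in re.finditer(r"\w+|[^\w\s]", text)
    ]


def prepare(text, max_length):
    """Truncate, add special tokens, and pad to exactly max_length tokens.

    Assumes max_length >= 2 so the two special tokens always fit.
    Special and padding tokens get the empty span (0, 0).
    """
    pieces = tokenize_with_offsets(text)[: max_length - 2]
    tokens = [CLS] + [tok for tok, _ in pieces] + [SEP]
    offsets = [(0, 0)] + [span for _, span in pieces] + [(0, 0)]
    pad = max_length - len(tokens)
    return tokens + [PAD] * pad, offsets + [(0, 0)] * pad


text = "Hello, World!"
tokens, offsets = prepare(text, 8)

# The normalized token "world" still maps back to the original "World".
start, end = offsets[3]
print(tokens[3], "->", text[start:end])  # world -> World
```

Keeping the spans alongside the normalized tokens is what lets a caller map any token back to the exact substring it came from, even after lowercasing, truncation, and padding.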