jptranstokenizer.subword.sentencepiece
- class jptranstokenizer.subword.sentencepiece.SentencepieceTokenizer(vocab_file: str | None = None, sp_model_kwargs: Dict[str, Any] | None = None, sp_model: Any | None = None)
Bases: object

Runs sentencepiece tokenization. This class can also be imported from the shorter path:
>>> from jptranstokenizer.subword import SentencepieceTokenizer
- Parameters:
  - vocab_file (str) – Path to the sentencepiece model file.
  - sp_model_kwargs (Dict[str, Any], optional) – Keyword arguments passed to sentencepiece.SentencePieceProcessor.
  - sp_model (sentencepiece.SentencePieceProcessor, optional) – An already trained SentencePieceProcessor model.
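The three parameters give two ways to obtain a processor: load one from `vocab_file` (configured by `sp_model_kwargs`), or pass a ready `sp_model`. The sketch below shows one plausible way they combine, modeled on the common pattern of building `SentencePieceProcessor(**kwargs)` and then calling `Load(path)`. The helper `resolve_sp_model`, its `processor_factory` hook (added only so the sketch runs without sentencepiece installed), and the rule that `sp_model` takes precedence over `vocab_file` are assumptions for illustration, not documented behavior of this library.

```python
from typing import Any, Callable, Dict, Optional


def resolve_sp_model(
    vocab_file: Optional[str] = None,
    sp_model_kwargs: Optional[Dict[str, Any]] = None,
    sp_model: Optional[Any] = None,
    processor_factory: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper: return a ready processor from the constructor options.

    Assumption: an explicit sp_model wins over vocab_file; this is a sketch,
    not the library's actual implementation.
    """
    if sp_model is not None:
        # An already trained processor is used as-is; nothing is loaded.
        return sp_model
    if vocab_file is None:
        raise ValueError("either vocab_file or sp_model must be given")
    if processor_factory is None:
        # Imported lazily so the sketch can be exercised without sentencepiece.
        import sentencepiece as spm
        processor_factory = spm.SentencePieceProcessor
    # Forward the optional keyword arguments to the processor constructor.
    processor = processor_factory(**(sp_model_kwargs or {}))
    # Load the trained model from the given file path.
    processor.Load(vocab_file)
    return processor
```

With the real class, the corresponding call would be something like `SentencepieceTokenizer(vocab_file="spm.model", sp_model_kwargs={"enable_sampling": True, "alpha": 0.1})`, where `spm.model` is a placeholder path. `enable_sampling` and `alpha` are keyword arguments that `sentencepiece.SentencePieceProcessor` itself accepts for subword regularization.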