jptranstokenizer.mainword.spacy_luw
- class jptranstokenizer.mainword.spacy_luw.SpacyluwTokenizer(do_lower_case: bool = False, normalize_text: bool = True)
Bases: MainTokenizerABC
Tokenizer that splits text into words using ja_gsdluw in spaCy. Both spaCy and ja_gsdluw are required. For installation, refer to spaCy and ja_gsdluw. You can also import this class from a shorter path:
>>> from jptranstokenizer.mainword import SpacyluwTokenizer
- Parameters:
do_lower_case (bool, optional, defaults to False) – Whether to lowercase the input when tokenizing.
normalize_text (bool, optional, defaults to True) – Whether to apply Unicode normalization to the text before tokenization.
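The effect of the two parameters above can be illustrated with a minimal standard-library sketch. It is not the library's implementation: preprocess is a hypothetical helper, the NFKC normalization form is an assumption (the documentation only says "unicode normalization"), and the order of the two steps is also assumed.

```python
import unicodedata


def preprocess(text: str, do_lower_case: bool = False, normalize_text: bool = True) -> str:
    """Hypothetical helper mimicking the two constructor flags; not part of jptranstokenizer."""
    if normalize_text:
        # Assumption: NFKC is used. It folds full-width Latin letters
        # into their ASCII equivalents, among other compatibility mappings.
        text = unicodedata.normalize("NFKC", text)
    if do_lower_case:
        # Lowercasing only affects cased scripts; kana and kanji are unchanged.
        text = text.lower()
    return text


sample = "ＳｐａＣｙで解析"  # full-width Latin letters followed by Japanese text
print(preprocess(sample))                        # defaults: normalize only
print(preprocess(sample, do_lower_case=True))    # normalize, then lowercase
print(preprocess(sample, normalize_text=False))  # text left as-is
```

With the defaults, only the normalization step runs, so full-width letters become ASCII while their case is preserved. Passing do_lower_case=True additionally folds case.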
See also
megagonlabs/UD_Japanese-GSD https://github.com/megagonlabs/UD_Japanese-GSD
ja_gsdluw https://github.com/megagonlabs/UD_Japanese-GSD/releases/tag/r2.9-NE