panic bug #77

@intfish123

Description

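Encoding the pair (塞车模拟器, REALGALLARDORAİNY赛车模拟器2018) with github.com/sugarme/tokenizer v0.3.0 (via EncodeBatch) panics inside BertNormalizer's accent stripping (stripAccents → RemoveAccents → Filter). The inputs read roughly "traffic-jam simulator" and "REALGALLARDORAİNY racing simulator 2018".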
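Among the characters in this pair, only İ (assuming it is the precomposed U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE) has a canonical decomposition: NFD turns it into U+0049 followed by the combining mark U+0307, which is the kind of rune the RemoveAccents path in the trace filters out. The CJK ideographs and ASCII letters do not decompose. That makes İ a plausible suspect, though it has not been confirmed as the trigger. The following stdlib-only sketch lists the non-ASCII runes in each input with their byte offsets and UTF-8 widths; the helper names are illustrative, not part of the library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeInfo describes one non-ASCII rune found in an input string.
type runeInfo struct {
	Offset int  // byte offset of the rune within the string
	Rune   rune // the code point itself
	Width  int  // UTF-8 width in bytes
}

// nonASCIIRunes returns every rune in s outside the ASCII range, in order.
func nonASCIIRunes(s string) []runeInfo {
	var out []runeInfo
	for i, r := range s {
		if r >= utf8.RuneSelf {
			out = append(out, runeInfo{Offset: i, Rune: r, Width: utf8.RuneLen(r)})
		}
	}
	return out
}

func main() {
	// The reported pair; İ is written as an explicit escape (assumed to be U+0130).
	pair := [2]string{"塞车模拟器", "REALGALLARDORA\u0130NY赛车模拟器2018"}
	for _, s := range pair {
		fmt.Printf("%q: %d bytes, %d runes\n", s, len(s), utf8.RuneCountInString(s))
		for _, ri := range nonASCIIRunes(s) {
			fmt.Printf("  offset %2d: %q U+%04X (%d bytes)\n", ri.Offset, ri.Rune, ri.Rune, ri.Width)
		}
	}
}
```

For the second input this reports İ at byte offset 14 (2 bytes), followed by the six-character CJK run starting at offset 18, which gives a concrete position to compare against when stepping through the normalizer.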

panic: runtime error: slice bounds out of range [:16] with capacity 14

goroutine 1757858 [running]:
github.com/sugarme/tokenizer/normalizer.(*NormalizedString).TransformRange(0xc0ee6a7e60, 0x7?, {0xc0ee6e8f00, 0xe, 0x164df60?}, 0x0)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/normalized.go:473 +0x2db8
github.com/sugarme/tokenizer/normalizer.(*NormalizedString).Transform(0xc0ee6a7e60, {0xc0ee6e8f00, 0xe, 0x10}, 0x0)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/normalized.go:859 +0x6f
github.com/sugarme/tokenizer/normalizer.(*NormalizedString).Filter(0xc0ee6a7e60, 0x1c6d8c8)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/normalized.go:1063 +0x395
github.com/sugarme/tokenizer/normalizer.(*NormalizedString).RemoveAccents(...)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/normalized.go:1144
github.com/sugarme/tokenizer/normalizer.stripAccents(...)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/bert.go:188
github.com/sugarme/tokenizer/normalizer.(*BertNormalizer).Normalize(0xc08cb3f370, 0x17ccb80?)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/normalizer/bert.go:206 +0xa5
github.com/sugarme/tokenizer.(*AddedVocabulary).ExtractAndNormalize.func2(0x0?, 0xc168c53be0?)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/added-vocabulary.go:517 +0x36
github.com/sugarme/tokenizer.(*PreTokenizedString).Split(0xc0ee6ab170, 0xc168c53c68)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/pretokenizer.go:81 +0x16f
github.com/sugarme/tokenizer.(*AddedVocabulary).ExtractAndNormalize(0xc03e9eb250, {0xc0af6f1140?, 0x48414f?}, {0x1e43f00, 0xc08cb3f370})
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/added-vocabulary.go:513 +0x85
github.com/sugarme/tokenizer.(*Tokenizer).EncodeSingleSequence.func1(0x0, 0x0, {0xc0af6f1140?, 0x0?})
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/tokenizer.go:383 +0x57
github.com/sugarme/tokenizer.(*Tokenizer).EncodeSingleSequence(0xb?, {{0xc0f186a4d0?, 0xc0f9f046af?, 0xc0bc393708?}, 0xc0bc3936f8?}, 0xc0bc393718?, 0x467f6a?)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/tokenizer.go:425 +0xb5
github.com/sugarme/tokenizer.(*Tokenizer).Encode(0xc03e9eb200, {0x1e44bc0, 0xc0a4bbd180}, 0x1)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/tokenizer.go:462 +0x1a7
github.com/sugarme/tokenizer.(*Tokenizer).EncodeBatch.func1(0x3c)
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/tokenizer.go:651 +0x90
created by github.com/sugarme/tokenizer.(*Tokenizer).EncodeBatch in goroutine 1757797
	/go/pkg/mod/github.com/sugarme/tokenizer@v0.3.0/tokenizer.go:648 +0xf5

Metadata

Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

    Issue actions