Solved a problem with garbled Russian words caused by the word list encoding #3

@G-walk

First, sorry for my bad English.

I'm a user from China and my Windows system is set to Chinese, which is very likely the cause of the issue. When I was analyzing sentiment for Russian words, every calculated value was 0. I then found that when the project loads russian.csv, every Russian word turns into gibberish, so none of the Russian words can match the word list. I solved the problem by simply adding encoding='UTF-8' to the open() call on line 59 of sentimental.py:

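def load_word_list(self, filename):
    # Read the word list as UTF-8 explicitly instead of relying on the
    # system default encoding, which is not UTF-8 on my Chinese Windows.
    with open(filename, 'r', encoding='UTF-8') as f:
        reader = csv.DictReader(f)
        word_list = {row['word']: float(row['score']) for row in reader}
    self.word_list.update(word_list)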
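
For anyone who wants to reproduce this without a Chinese Windows machine, here is a minimal sketch. It assumes the system default encoding on such a machine is GBK (cp936), which is my guess and not something I verified against the project. The standalone load_word_list function, the file name russian_sample.csv, and the two words with their scores are all made up for illustration and are not taken from the project's russian.csv.

```python
import csv
import locale
import os
import tempfile


def load_word_list(filename, encoding):
    """Hypothetical standalone variant of the method above, with the encoding as a parameter."""
    with open(filename, 'r', encoding=encoding, newline='') as f:
        reader = csv.DictReader(f)
        return {row['word']: float(row['score']) for row in reader}


# Write a tiny UTF-8 word list (made-up words and scores, for illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'russian_sample.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    f.write('word,score\nхорошо,1.0\nплохо,-1.0\n')

# The locale's preferred encoding, which open() typically falls back to
# when no encoding argument is given.
print('default encoding here:', locale.getpreferredencoding(False))

utf8_words = load_word_list(path, 'utf-8')
# Assumption: 'gbk' stands in for the default encoding on a Chinese Windows system.
gbk_words = load_word_list(path, 'gbk')

print('found with utf-8:', 'хорошо' in utf8_words)
print('found with gbk:', 'хорошо' in gbk_words)
```

With the file decoded as UTF-8 the Russian word should be found, while with GBK the same bytes decode to unrelated characters, so the lookup misses and the score ends up 0, which matches what I saw.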
