Multibyte non-UTF-8 encoded pages are decoded incorrectly #7

@arshaw

Description

Reported by blejdf...@gmail.com, Jan 30, 2010

What steps will reproduce the problem?

Scrape the <title> of http://www.sony.jp/

import scrapemark

res = scrapemark.scrape("<title>{{title}}</title>",
    url="http://www.sony.jp/")

Print the result

print(res['title'])

What is the expected output? What do you see instead?
Expected result is 'ソニー製品情報 | ソニー'
Instead, I get '\j[i | \j['
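
The garbled title looks like what is left when Shift_JIS bytes lose every non-ASCII byte. The sketch below illustrates that with plain Python 3 and no network access. It assumes the page is served as Shift_JIS, which the report does not state. The helpers `strip_high_bytes`, `decode_page`, and `extract_title` are hypothetical names made up for this illustration. They are not part of scrapemark and do not describe how scrapemark decodes pages internally.

```python
import re

EXPECTED = "ソニー製品情報 | ソニー"

# Assumption: the page is served as Shift_JIS (not stated in the report).
# Build the raw bytes locally so the example needs no network access.
raw = ("<title>" + EXPECTED + "</title>").encode("shift_jis")


def strip_high_bytes(data):
    """Hypothetical lossy decode: silently drop every byte >= 0x80."""
    return bytes(b for b in data if b < 0x80).decode("ascii")


def decode_page(data, charset):
    """Hypothetical correct path: decode with the page's declared charset."""
    return data.decode(charset)


def extract_title(html):
    """Pull the text between <title> and </title>, or None if absent."""
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1) if match else None


broken = extract_title(strip_high_bytes(raw))
fixed = extract_title(decode_page(raw, "shift_jis"))

print(broken)             # only the ASCII trail bytes survive
print(fixed == EXPECTED)  # True when the declared charset is honoured
```

In Shift_JIS, the second byte of ソ, ニ, ー and 品 falls in the ASCII range (`\`, `j`, `[` and `i`), which is why those four characters are all that remain of the title. If this is indeed the cause, decoding the response with the charset from the Content-Type header or the page's meta tag before matching should give the expected result.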

What version of the product are you using? On what operating system?
Version 0.9, tested on Mac OS X and Ubuntu Linux.
