Python has a lot of trouble handling Unicode, apparently because of backwards-compatibility issues (see https://en.wikipedia.org/wiki/History_of_Python).
One common error looks like this:
Traceback (most recent call last):
  File "D:\Software\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 777-778: character maps to <undefined>
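To see where this error comes from, here is a minimal sketch that tries to encode the same Cyrillic word used later in this post with two codecs. The helper name `try_encode` is made up for illustration. The idea is that cp1252 (the Windows default in the traceback) has no slot for these characters, while UTF-8 does.

```python
text = "\u0420\u043e\u0441\u0441\u0438\u044f"  # the word "Россия"

def try_encode(s, codec):
    """Return the encoded bytes, or the error message if the codec cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)

# cp1252 cannot represent Cyrillic, so this comes back as an error message (a str).
cp1252_result = try_encode(text, "cp1252")

# UTF-8 can represent every Unicode character, so this comes back as bytes.
utf8_result = try_encode(text, "utf-8")

# ascii() keeps the output printable even on a cp1252 console.
print(ascii(cp1252_result))
print(utf8_result)
```

The error is raised when text is converted to bytes for output, which is why it tends to show up around printing and writing rather than around the string itself.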
If this occurs while you are doing an HTTP request, do this:
import json
from urllib.request import urlopen
httpresponse = urlopen(url).read().decode('utf-8')
response = json.loads(httpresponse)
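The decode step can be tried without a network connection. This is only a sketch: `parse_json_bytes` is a hypothetical helper, and `raw_body` is a hand-made stand-in for what `urlopen(url).read()` would return, assuming the server sends UTF-8.

```python
import json

def parse_json_bytes(raw, charset="utf-8"):
    """Decode a raw HTTP body with an explicit charset, then parse it as JSON."""
    return json.loads(raw.decode(charset))

# Stand-in for urlopen(url).read(): UTF-8 bytes of a small JSON document.
raw_body = '{"country": "\u0420\u043e\u0441\u0441\u0438\u044f"}'.encode("utf-8")

data = parse_json_bytes(raw_body)

# ascii() avoids triggering the very console error this post is about.
print(ascii(data["country"]))
```

If the server might not be sending UTF-8, the charset declared in the response headers is probably a better guess than a hard-coded value.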
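If this occurs while you are reading a file in Beautiful Soup, do this ('rb' opens the file in binary mode, so Beautiful Soup receives raw bytes and detects the encoding itself):
from bs4 import BeautifulSoup
soup = BeautifulSoup(open(file, 'rb'), 'html.parser')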
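What binary mode changes can be shown with the standard library alone. The sketch below uses a made-up temporary file as sample data and leaves Beautiful Soup out so that it runs anywhere. It reads the same file once as bytes and once as text with an explicit encoding.

```python
import os
import tempfile

html = "<p>\u0420\u043e\u0441\u0441\u0438\u044f</p>"

# Write a small UTF-8 HTML file to a temporary location (sample data for this sketch).
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(html.encode("utf-8"))
    path = tmp.name

# Binary mode returns undecoded bytes, so the platform default codec is never consulted.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with an explicit encoding decodes the same way on every platform.
with open(path, encoding="utf-8") as f:
    decoded = f.read()

os.remove(path)
```

Opening the file in text mode without an `encoding` argument is the risky case, since on Windows that typically falls back to a codec such as cp1252.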
If this occurs in a print statement like the one below, you may need to log to a file instead [1].
print('\u0420\u043e\u0441\u0441\u0438\u044f')
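Logging to a file instead of printing might look like the following sketch. The logger name, the helper `make_utf8_logger`, and the temporary log path are all made up for illustration. The part that matters is passing `encoding="utf-8"` to `logging.FileHandler`, so the file is not written with the console's codec.

```python
import logging
import os
import tempfile

def make_utf8_logger(path):
    """Create a logger that writes UTF-8 text to the given file."""
    logger = logging.getLogger("unicode_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    logger.addHandler(handler)
    return logger, handler

# Hypothetical log location in a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "output.log")

logger, handler = make_utf8_logger(log_path)
logger.info("\u0420\u043e\u0441\u0438\u044f".replace("\u0441", "\u0441\u0441", 1))  # "Россия"

# Close the handler so the file is flushed before reading it back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```

The log file can then be opened in any editor that understands UTF-8, whatever the console's code page is.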
[1] http://stackoverflow.com/questions/10569438/how-to-print-unicode-character-in-python