Bug report
The text contains non-ASCII characters. The content of the text/plain part should display as shown above. When email.message_from_string is used to parse the MIME message, message.get_payload(decode=True) on the "text/plain" part returns bytes in the wrong encoding.
While debugging, I found that at https://github.com/python/cpython/blob/3.10/Lib/email/message.py#L278 get_payload returns payload.encode('raw-unicode-escape'), yet message.get_charsets() returns utf-8, which does not match the encoding actually used, so the result is wrong. The final result is below: the charset is wrong, so I cannot recover the correct message.
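The mismatch can be reproduced without an .eml file. This is a minimal sketch, assuming a single-part message with an 8bit body; the sample text is made up for illustration and is not taken from TextBased.eml.

```python
import email

# Hypothetical sample message: UTF-8 declared, body sent as 8bit (not base64/QP).
raw = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "你好\n"
)

message = email.message_from_string(raw)
payload = message.get_payload(decode=True)

# The header still advertises utf-8 ...
print(message.get_charsets())
# ... but the bytes are a raw-unicode-escape encoding of the str payload,
# so decoding them as utf-8 does not give back the original text.
print(payload)
print(payload.decode("utf-8"))
```

The str payload cannot be encoded as ASCII, so get_payload falls back to 'raw-unicode-escape' and the declared charset no longer describes the returned bytes.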
Your environment
Python 3.9.10, macOS Catalina, version 10.15.5
Test code is below.
Running command: python3 t.py TextBased.eml
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
import sys
import email


def get_all_block(message, block_type="text/plain"):
    # Recursively collect the decoded payload of every part of type block_type.
    content_type = message.get_content_type()
    main_type = message.get_content_maintype()
    if main_type == "multipart":
        if message.is_multipart():
            block = None
            for part in message.get_payload():
                result = get_all_block(part, block_type)
                if result:
                    if block is None:
                        block = result
                    else:
                        block += result
            return block
        else:
            return None
    elif content_type == block_type:
        result = message.get_payload(decode=True)
        if result is not None:
            charsets = message.get_charsets()
            print('charsets', charsets, result)
        return result
    else:
        return None


if __name__ == '__main__':
    fname = sys.argv[1]
    with open(fname, 'rb') as fp:
        mime = fp.read().decode('utf-8', errors='ignore')
    message = email.message_from_string(mime)
    text = get_all_block(message, "text/plain")
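A possible workaround, offered as a sketch rather than a confirmed fix: parse the raw bytes with email.message_from_bytes instead of decoding the file to str first. The parser then keeps the original bytes, and get_payload(decode=True) returns data that matches the declared charset. The helper name decode_text_parts, the UTF-8 fallback, and the sample message are assumptions made for illustration.

```python
import email


def decode_text_parts(raw_bytes, block_type="text/plain"):
    """Hypothetical helper: return the decoded text of every matching part."""
    message = email.message_from_bytes(raw_bytes)
    texts = []
    for part in message.walk():
        if part.get_content_type() != block_type:
            continue
        data = part.get_payload(decode=True)
        if data is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        texts.append(data.decode(charset, errors="replace"))
    return texts


# Hypothetical sample standing in for the bytes read from an .eml file.
sample = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "你好\n"
).encode("utf-8")

texts = decode_text_parts(sample)
print(texts)
```

In the test script this would mean passing fp.read() straight to email.message_from_bytes (or using email.message_from_binary_file(fp)) and dropping the .decode('utf-8', errors='ignore') step.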