message_from_string return msg with wrong encode #94600

Open
@jinlianch

Bug report

[screenshot: expected text/plain content]
The text contains non-ASCII characters, and the text/plain content should display as shown above. However, when I parse the MIME message with email.message_from_string, message.get_payload(decode=True) returns the "text/plain" part in the wrong encoding.

While debugging, I found that get_payload (https://github.com/python/cpython/blob/3.10/Lib/email/message.py#L278) returns payload.encode('raw-unicode-escape'), whereas message.get_charsets() reports utf-8. The returned bytes are therefore not encoded in the declared charset, so decoding them as utf-8 gives the wrong result. The final output is shown below; because the charset is wrong, I cannot recover the correct message.

[screenshot: incorrectly decoded output]
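
The mismatch described above can be reproduced without the attached file. This is a minimal sketch, assuming a single-part message with an 8bit UTF-8 body; the sample message `RAW` is made up for illustration and is not the attached TextBased.eml. It compares the bytes returned after parsing from `str` with those returned after parsing from `bytes`.

```python
import email

# Hypothetical sample message: a UTF-8 body sent with an 8bit transfer encoding.
BODY = "中文\n"
RAW = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
) + BODY

# Parsed from str: the payload holds real non-ASCII characters, so
# get_payload(decode=True) cannot encode it as ASCII and falls back to
# another codec (raw-unicode-escape in the 3.10 source linked above).
from_str = email.message_from_string(RAW).get_payload(decode=True)

# Parsed from bytes: the original bytes are preserved internally, so the
# returned bytes are in the charset the message declares.
msg_bytes = email.message_from_bytes(RAW.encode("utf-8"))
from_bytes = msg_bytes.get_payload(decode=True)

print("declared charsets:", msg_bytes.get_charsets())
print("from str  :", from_str)
print("from bytes:", from_bytes)
print("bytes decode to the original body:", from_bytes.decode("utf-8") == BODY)
```

If the two printed byte strings differ, the `str` path has produced bytes that no longer match the declared charset, which is the behavior reported here.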

Your environment
Python 3.9.10, macOS Catalina, version 10.15.5

Test code is below.
Running command: python3 t.py TextBased.eml

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import email

# Recursively collect the decoded payload bytes of every part whose
# content type matches block_type.
def get_all_block(message, block_type="text/plain"):
    content_type = message.get_content_type()
    main_type = message.get_content_maintype()
    if main_type == "multipart":
        if message.is_multipart():
            block = None
            for part in message.get_payload():
                result = get_all_block(part, block_type)
                if result:
                    if block is None:
                        block = result
                    else:
                        block += result
            return block
        else:
            return None
    elif content_type == block_type:
        result = message.get_payload(decode=True)
        if result is not None:
            charsets = message.get_charsets()
            print('charsets', charsets, result)
        return result
    else:
        return None

if __name__ == '__main__':
    fname = sys.argv[1]
    # Read the raw file and decode it to str before parsing.
    with open(fname, 'rb') as fp:
        mime = fp.read().decode('utf-8', errors='ignore')
    message = email.message_from_string(mime)
    text = get_all_block(message, "text/plain")
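
For comparison, here is a hedged sketch of a variant that parses the raw bytes instead of a decoded string and then decodes each matching part with its own declared charset. The helper name `collect_text` and the UTF-8 fallback for parts without a charset are assumptions made for illustration; this is a possible workaround, not a confirmed fix for the behavior reported above.

```python
import email


def collect_text(raw_bytes, block_type="text/plain"):
    """Return the concatenated text of every part matching block_type."""
    # Parse from bytes so 8bit bodies keep their original byte values.
    message = email.message_from_bytes(raw_bytes)
    chunks = []
    # walk() visits the message and all nested parts; multipart containers
    # are skipped because their content type never equals block_type.
    for part in message.walk():
        if part.get_content_type() != block_type:
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        chunks.append(payload.decode(charset, errors="replace"))
    return "".join(chunks)
```

With the test script above, this would be called as `collect_text(open(fname, 'rb').read())`, without the intermediate `.decode('utf-8', errors='ignore')` step. A part that declares a charset name Python does not recognize would raise LookupError in this sketch.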

TextBased.txt
