mailbox.mbox malformed 'From ' lines not being detected/handled #93376

Open
@Soddentrough

Description

Bug report

mailbox.mbox (class mbox) builds its table of contents in _generate_toc() by treating every line that starts with b'From ' as the start of a new message.
This is RFC compliant; however, malicious senders sometimes break the convention intentionally, which causes unexpected behavior.

I suggest this be considered a bug because:

  • In these cases it is not possible to ask the sender to fix their MTA, but we still need to parse the message
  • Well-behaved senders sometimes do this by mistake, e.g. through poor line wrapping in quoted-printable content
  • Many common end-user email programs already handle this scenario gracefully
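To make the symptom concrete, here is a minimal sketch of one way the prefix match can misfire: a single message whose body happens to contain a line starting with 'From '. The addresses, file name and the count_messages helper are made up for illustration, and the result is what I would expect from the _generate_toc() logic described above.

```python
import mailbox
import os
import tempfile

# One real message; the body contains an unescaped line starting with "From ".
RAW = (
    b"From alice@example.com Sat Jan  1 00:00:00 2022\n"
    b"Subject: hello\n"
    b"\n"
    b"Body line one\n"
    b"From here on, the body continues.\n"
    b"Last line\n"
)


def count_messages(raw: bytes) -> int:
    """Write raw bytes to a temporary mbox file and count what mbox finds."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "wb") as fh:
            fh.write(raw)
        box = mailbox.mbox(path, create=False)
        try:
            return len(box)
        finally:
            box.close()


count = count_messages(RAW)
print(count)  # the stray body line is counted as a second message
```

Only one message was written, but the body line is picked up as a second envelope line, so the caller sees two entries.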

I propose exposing the 'From ' line delimiter as a constructor argument, keeping the existing behavior as the default:

Diff of mailbox.py (lines marked '<' are the proposed version, lines marked '>' are the current one):

847c847
<     def __init__(self, path, factory=None, create=True, fromline=b'From '):
---
>     def __init__(self, path, factory=None, create=True):
850d849
<         self._fromline = fromline
865c864
<             if line.startswith(self._fromline):
---
>             if line.startswith(b'From '):
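The effect of the change can be sketched outside of mailbox.py. The function below is a simplified stand-in for the scanning loop, not the stdlib code itself; find_message_starts, the sample bytes and the stricter prefix are assumptions made for the example.

```python
from typing import List


def find_message_starts(data: bytes, fromline: bytes = b"From ") -> List[int]:
    """Return the byte offsets of all lines that begin with `fromline`."""
    starts = []
    pos = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(fromline):
            starts.append(pos)
        pos += len(line)  # keepends=True, so lengths add up to real offsets
    return starts


# Hypothetical mailbox content: one message with a stray "From " body line.
SAMPLE = (
    b"From alice@example.com Sat Jan  1 00:00:00 2022\n"
    b"Subject: hello\n"
    b"\n"
    b"From here on, the body continues.\n"
)

# Default prefix: the body line is also treated as a message start.
default_starts = find_message_starts(SAMPLE)

# Caller-supplied, stricter prefix: only the real envelope line matches.
strict_starts = find_message_starts(SAMPLE, fromline=b"From alice@example.com ")

print(default_starts, strict_starts)
```

With the default prefix nothing changes for existing callers; a caller who knows more about the mailbox can pass a narrower prefix and avoid the false split.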

More sophisticated approaches could also be explored, for example something like the is_from() function in mutt, or a regex applied to the raw bytes of each line.
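As a rough illustration of the regex idea, the sketch below accepts a line only if 'From ' is followed by a sender token and an asctime-style date. The pattern and the looks_like_from_line name are my own assumptions; this is not a port of mutt's is_from(), and real mailboxes contain date variants (time zones, missing seconds) that this pattern would reject.

```python
import re

# Hypothetical stricter check: "From ", a sender token, then an asctime-style date.
FROM_LINE = re.compile(
    rb"^From \S+ +"
    rb"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
    rb"\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def looks_like_from_line(line: bytes) -> bool:
    """Return True if `line` resembles an mbox envelope line, not just a prefix match."""
    return FROM_LINE.match(line) is not None


envelope_ok = looks_like_from_line(
    b"From alice@example.com Sat Jan  1 00:00:00 2022\n"
)
body_ok = looks_like_from_line(b"From here on, the body continues.\n")

print(envelope_ok, body_ok)
```

A check like this trades leniency for precision, so it would probably need to be opt-in rather than the default.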

Your environment

Python 3.9.13

Labels: stdlib (Python modules in the Lib dir), topic-email, type-feature (a feature request or enhancement)
