Description
Bug report
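mailbox.mbox (class mbox) builds its table of contents in _generate_toc() by treating every line that starts with b'From ' as the start of a new message.
This is RFC compliant. However, malicious emails and senders sometimes break it on purpose, for example by leaving a body line that starts with 'From ' unescaped instead of writing it as '>From '. mbox then treats that line as a message boundary and splits one message into two.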
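The following is a minimal reproduction. It assumes a one-message mbox file written by a sender that did not escape a body line beginning with 'From '. count_messages() and MBOX_TEXT are illustrative names and are not part of mailbox.

```python
import mailbox
import os
import tempfile

# One logical message. Its body contains a line starting with "From " that
# was not escaped as ">From " (hypothetical sample data).
MBOX_TEXT = (
    "From alice@example.com Mon Jan  2 10:00:00 2023\n"
    "From: alice@example.com\n"
    "Subject: hello\n"
    "\n"
    "First line of the body.\n"
    "From here on, the text is still part of the same message.\n"
)


def count_messages(text: str) -> int:
    """Write text to a temporary mbox file and count what mailbox.mbox finds."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    os.close(fd)
    try:
        with open(path, "w", newline="\n") as f:
            f.write(text)
        box = mailbox.mbox(path, create=False)
        try:
            return len(box)  # triggers _generate_toc()
        finally:
            box.close()
    finally:
        os.remove(path)


print(count_messages(MBOX_TEXT))  # prints 2, although the file holds one message
```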
I suggest this be considered a bug because:
- In these cases it is not possible to ask the sender to fix their MTA, but we still need to parse the message.
- Well-behaved senders sometimes do this by mistake, e.g. through poor line wrapping in quoted-printable content.
- Many common end-user email programs already handle this case gracefully.
I propose exposing a configurable 'From ' line delimiter, keeping the existing behavior as the default:
diff of mailbox.py ('<' lines are the proposed code, '>' lines are the current code):
847c847
<     def __init__(self, path, factory=None, create=True, fromline=b'From '):
---
>     def __init__(self, path, factory=None, create=True):
850d849
<         self._fromline = fromline
865c864
<             if line.startswith(self._fromline):
---
>             if line.startswith(b'From '):
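A subclass can carry the same change, which lets you try the proposed behavior without patching the standard library. This is a sketch under stated assumptions. DelimitedMbox and count_delimited are hypothetical names. The override is intended to mirror CPython's private mbox._generate_toc() (as of 3.9), with only the delimiter test changed. It relies on internal attributes (_file, _toc, _next_key, _file_length) that are not public API and may change between releases.

```python
import mailbox
import os
import tempfile

# Line separator computed the same way as mailbox.py's module-level `linesep`.
_LINESEP = os.linesep.encode("ascii")


class DelimitedMbox(mailbox.mbox):
    """Hypothetical mbox variant with a configurable 'From ' delimiter."""

    def __init__(self, path, factory=None, create=True, fromline=b"From "):
        self._fromline = fromline
        super().__init__(path, factory, create)

    def _generate_toc(self):
        # Assumed to match the stdlib scan, except that the delimiter test
        # uses self._fromline instead of the hard-coded b'From '.
        starts, stops = [], []
        last_was_empty = False
        self._file.seek(0)
        while True:
            line_pos = self._file.tell()
            line = self._file.readline()
            if line.startswith(self._fromline):
                # Close the previous message before opening a new one.
                if len(stops) < len(starts):
                    if last_was_empty:
                        stops.append(line_pos - len(_LINESEP))
                    else:
                        stops.append(line_pos)
                starts.append(line_pos)
                last_was_empty = False
            elif not line:
                # End of file: close the last open message.
                if last_was_empty:
                    stops.append(line_pos - len(_LINESEP))
                else:
                    stops.append(line_pos)
                break
            elif line == _LINESEP:
                last_was_empty = True
            else:
                last_was_empty = False
        self._toc = dict(enumerate(zip(starts, stops)))
        self._next_key = len(self._toc)
        self._file_length = self._file.tell()


def count_delimited(text, **kwargs):
    """Write text to a temporary file and count messages found by DelimitedMbox."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    os.close(fd)
    try:
        with open(path, "w", newline="\n") as f:
            f.write(text)
        box = DelimitedMbox(path, create=False, **kwargs)
        try:
            return len(box)
        finally:
            box.close()
    finally:
        os.remove(path)


# Hypothetical file whose envelope lines all start with b"From - ".
SAMPLE = (
    "From - Mon Jan  2 10:00:00 2023\n"
    "Subject: hello\n"
    "\n"
    "From here on, the text is still part of the same message.\n"
)

print(count_delimited(SAMPLE))                     # default delimiter: 2
print(count_delimited(SAMPLE, fromline=b"From - "))  # custom delimiter: 1
```

With the default argument, the subclass behaves like mailbox.mbox. This mirrors the proposal's goal of keeping existing behavior unless a caller opts in.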
More sophisticated methods could also be explored, for example the is_from() function in mutt, or a regex over a byte array.
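The hypothetical is_from_line() below illustrates the regex option. It accepts a line only if 'From ' is followed by a sender token and an asctime-style date such as "Mon Jan  2 10:00:00 2023". It is a simplified check and not a port of mutt's is_from(). Real mbox files contain date variants, such as trailing time zones, that it would reject unless more alternatives are added. Such a predicate could stand in for the startswith(...) test in _generate_toc().

```python
import re

# Hypothetical stricter delimiter: "From ", a sender token without spaces,
# then an asctime()-style date. Seconds are optional; trailing whitespace
# and a CRLF or LF line ending are allowed.
_FROM_LINE = re.compile(
    rb"From \S+ +"
    rb"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    rb"[ \d]\d \d{2}:\d{2}(?::\d{2})? \d{4}"
    rb"[ \t]*\r?\n?\Z"
)


def is_from_line(line: bytes) -> bool:
    """Return True if line looks like an mbox envelope 'From ' line."""
    return _FROM_LINE.match(line) is not None


print(is_from_line(b"From alice@example.com Mon Jan  2 10:00:00 2023\n"))  # True
print(is_from_line(b"From here on, the text is still part of the message.\n"))  # False
```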
Your environment
Python 3.9.13