library/re.html: Definition of the behavior of character groups [] is incorrect. #94898

**Documentation**

As a general point, regular expressions are an extremely information-dense way to define Chomsky type 3 grammars.
As such, it matters that their definition is precise.

Currently, https://docs.python.org/3/library/re.html says:

"If the first character of the set is '^', all the characters that are not in the set will be matched."

Read literally, "If the first character of the set is '^'" implies that the character '^' is itself part of the set (welcome to the tautology club).
So, according to this specification, such a character group could never match a '^'.

This of course conflicts with how everybody proficient with regexps understands character groups to work.

Commonly, the behavior below would be understood as "the universally expected correct behavior that is good to rely on":

 >>> re.match('[^a]', '^')
 <re.Match object; span=(0, 1), match='^'>

A correct statement would be:

If a character set definition starts with a '^', then it matches any character that is not matched by the character set definition obtained by stripping the leading '^' and then escaping the new leading character, should it also happen to be a '^'. So, for example, [^^] matches any character except the caret symbol.
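The proposed wording can be checked mechanically. The sketch below is only an illustration: `complement_definition` is a hypothetical helper, not part of `re`, and it assumes a simple set of the form `[^...]` with no other special cases (such as a leading `]`). It builds the positive set described above and confirms, for a few sample characters, that the negated set matches exactly when the positive set does not.

```python
import re


def complement_definition(negated_set: str) -> str:
    """Hypothetical helper: return the positive set that '[^...]' complements.

    Strips the leading '^' and escapes the new leading character
    if it is itself a '^', as in the proposed wording.
    """
    if not (negated_set.startswith("[^") and negated_set.endswith("]")):
        raise ValueError("expected a set of the form [^...]")
    body = negated_set[2:-1]
    if body.startswith("^"):
        body = "\\" + body  # '[^^]' complements '[\^]'
    return "[" + body + "]"


for pattern in ("[^a]", "[^^]", "[^^a]"):
    positive = complement_definition(pattern)
    for ch in "^ab":
        negated_hit = re.fullmatch(pattern, ch) is not None
        positive_hit = re.fullmatch(positive, ch) is not None
        # A single character matches exactly one of the two sets.
        assert negated_hit != positive_hit, (pattern, positive, ch)
```

In particular, `[^a]` matches `'^'` while `[^^]` does not, which is the behavior the current documentation sentence fails to describe.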

