Possible slowdown of regex searching in 3.11 #91404

Open
markshannon opened this issue Apr 7, 2022 · 7 comments
Labels
3.11 expert-regex performance

Comments

@markshannon commented Apr 7, 2022

BPO 47248
Nosy @markshannon, @serhiy-storchaka, @animalize, @sweeneyde

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-04-07.11:30:33.473>
labels = ['3.11', 'performance']
title = 'Possible slowdown of regex searching in 3.11'
updated_at = <Date 2022-04-08.07:22:53.130>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2022-04-08.07:22:53.130>
actor = 'malin'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2022-04-07.11:30:33.473>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 47248
keywords = ['3.11regression']
message_count = 4.0
messages = ['416923', '416928', '416959', '416961']
nosy_count = 4.0
nosy_names = ['Mark.Shannon', 'serhiy.storchaka', 'malin', 'Dennis Sweeney']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue47248'
versions = ['Python 3.11']

@markshannon (author) commented Apr 7, 2022

The three regular expression benchmarks in the pyperformance suite (regex_v8, regex_effbot and regex_dna) show slowdowns of between 3% and 10%.

Looking at the stats, nothing seems wrong with specialization or the memory optimizations.

This strongly suggests a regression in the sre module itself, but I can't say so for certain.
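One way to sanity-check a suspected sre regression outside pyperformance is to time the same search on both interpreter builds. A minimal sketch, assuming a made-up pattern and input text (these are not the inputs used by regex_v8, regex_effbot or regex_dna); run the same file under each build and compare the printed times:

```python
import re
import timeit

# Hypothetical micro-benchmark: the pattern and text are illustrative
# stand-ins, not taken from the pyperformance regex benchmarks.
PATTERN = re.compile(r"[a-z]+ing\b")
TEXT = "the quick brown fox is jumping over the lazy dog while singing " * 1000


def search_all() -> int:
    """Count all matches so the regex work is actually performed."""
    return sum(1 for _ in PATTERN.finditer(TEXT))


def best_time(repeat: int = 5, number: int = 20) -> float:
    """Return the fastest per-call time in seconds over several repeats."""
    timings = timeit.repeat(search_all, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    print(f"{best_time() * 1e3:.3f} ms per call")
```

Taking the minimum over repeats, as the `timeit` documentation suggests, reduces noise from other processes; a single micro-benchmark can still miss a regression that only shows up with the benchmarks' own patterns.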

@markshannon added the 3.11 and performance labels Apr 7, 2022

@animalize commented Apr 7, 2022

Could you give the two versions you compared? I will do a git bisect.

I tested 356997c~1 and 356997c [1] with an MSVC 2022 non-PGO release build:

### regex_dna ###
Mean +- std dev: 151 ms +- 1 ms -> 152 ms +- 1 ms: 1.01x slower
Not significant

### regex_effbot ###
Mean +- std dev: 2.47 ms +- 0.01 ms -> 2.46 ms +- 0.02 ms: 1.00x faster
Not significant

### regex_v8 ###
Mean +- std dev: 21.7 ms +- 0.1 ms -> 22.4 ms +- 0.1 ms: 1.03x slower
Significant (t=-30.82)

[1] 356997c
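The "1.01x slower" style figures above are ratios between the two mean timings. A sketch of that arithmetic, assuming the ratio is taken between the means (it reproduces the numbers quoted here, but pyperf's exact formatting rules may differ); `describe_change` is a hypothetical helper, not part of pyperf:

```python
def describe_change(before: float, after: float) -> str:
    """Express a change in mean time as an 'N.NNx slower/faster' ratio.

    Assumption: ratio of the two means, rounded to two decimals.
    """
    if after > before:
        return f"{after / before:.2f}x slower"
    if after < before:
        return f"{before / after:.2f}x faster"
    return "no change"


# Mean timings in ms from the 356997c~1 -> 356997c comparison above.
RESULTS = {
    "regex_dna": (151, 152),
    "regex_effbot": (2.47, 2.46),
    "regex_v8": (21.7, 22.4),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: {describe_change(before, after)}")
```

A 1.03x ratio is a 3% slowdown, the low end of the range reported in the opening comment.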


@sweeneyde commented Apr 8, 2022

Possibly related to the new atomic grouping support from #76163?


@animalize commented Apr 8, 2022

> Possibly related to the new atomic grouping support from #76163?

That seems unlikely.
I will run some benchmarks for this issue; more information (versions/platform) would be welcome.

@ezio-melotti transferred this issue from another repository Apr 10, 2022

@animalize commented Apr 11, 2022

I wrote a script to automatically benchmark some of the re module commits from recent months.

Platform: Windows 10, WSL2, Ubuntu 20.04, gcc-9.4.0.
Configure: --with-lto. PGO is not enabled.

The results are in the attached file results.txt. There is no 10% performance regression, and my_re_benchmark is 1.10x faster.

Comparing 08eb754~1 (2022-03-21) with b09184b (2022-04-07):

  regex_compile:   110 ms +- 1 ms      -> 110 ms +- 1 ms
  regex_dna:       149 ms +- 1 ms      -> 143 ms +- 1 ms
  regex_effbot:    2.23 ms +- 0.03 ms  -> 1.98 ms +- 0.01 ms
  regex_v8:        16.5 ms +- 0.1 ms   -> 16.9 ms +- 0.1 ms
  my_re_benchmark: 13.0 sec +- 0.0 sec -> 11.8 sec +- 0.0 sec

my_re_benchmark is a benchmark I wrote myself; it uses 16 patterns to process 100 MiB of real text data.
I ran my_re_benchmark 30 times on each commit, then saved and compared the results with pyperf.
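The "Mean +- std dev" lines and the "Significant (t=...)" verdict quoted earlier in this thread come from comparing repeated runs of each build. A rough sketch of that arithmetic, assuming Welch's two-sample t statistic (pyperf's own significance test may use a different variant); the helper names and sample values are hypothetical, not measurements from this issue:

```python
import math
import statistics


def summarize(samples: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation, as in 'Mean +- std dev'."""
    return statistics.mean(samples), statistics.stdev(samples)


def welch_t(before: list[float], after: list[float]) -> float:
    """Welch's t statistic; negative when 'after' is slower on average."""
    mean_b, sd_b = summarize(before)
    mean_a, sd_a = summarize(after)
    std_err = math.sqrt(sd_b**2 / len(before) + sd_a**2 / len(after))
    return (mean_b - mean_a) / std_err


# Hypothetical timings in seconds, standing in for repeated runs per commit.
old_runs = [10.0, 10.2, 9.8, 10.1, 9.9]
new_runs = [10.5, 10.7, 10.4, 10.6, 10.3]

mean, sd = summarize(new_runs)
print(f"Mean +- std dev: {mean:.2f} sec +- {sd:.2f} sec")
print(f"t = {welch_t(old_runs, new_runs):.2f}")
```

A large absolute t value means the difference in means is big relative to the run-to-run noise, which is why more runs per commit make small regressions easier to confirm.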

results.txt


@brandtbucher commented Apr 13, 2022

Not sure how we're supposed to be linking issues to PRs yet, but check out #91495.


@brandtbucher commented Apr 15, 2022

Leaving this open, since it may still be worth exploring where the prior 3.10-to-3.11 slowdown came from.
