[2.7] bpo-26544: Make platform.libc_ver() less slow #10868

vstinner · 2018-12-03T12:57:51Z

Coarse benchmark on Fedora 29: 1.6 sec => 0.1 sec.

Co-Authored-By: Antoine Pitrou solipsis@pitrou.net

(cherry-picked from commit ba7c226)

https://bugs.python.org/issue26544
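To reproduce a coarse timing like the one above, here is a minimal sketch. It is not the benchmark actually used for this PR, which isn't shown. It times one call of `platform.libc_ver()` with an explicit executable path, so the function scans that file the way the 2.7 version does by default. `time_libc_ver` is a hypothetical helper name, and absolute numbers will vary by machine and Python version.

```python
import platform
import sys
import timeit


def time_libc_ver(executable=sys.executable, repeat=3):
    """Return the best wall-clock time (seconds) of a single libc_ver() call.

    Passing the executable explicitly makes libc_ver() scan that file,
    which is the code path this PR speeds up.
    """
    timings = timeit.repeat(
        lambda: platform.libc_ver(executable), number=1, repeat=repeat
    )
    return min(timings)


if __name__ == "__main__":
    print("libc_ver(%r): %.3f sec" % (sys.executable, time_libc_ver()))
```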

vstinner · 2018-12-03T12:58:48Z

I decided to attach my PR to https://bugs.python.org/issue26544. The cherry-picked commit had no associated issue.

pitrou

+1

pitrou · 2018-12-03T13:17:28Z

Lib/platform.py

@@ -194,7 +194,10 @@ def libc_ver(executable=sys.executable,lib='',version='', chunksize=2048):
        binary = f.read(chunksize)
        pos = 0
        while pos < len(binary):
-            m = _libc_search.search(binary,pos)


Wow. That must have been slow.

Honestly, I'm disappointed by the poor performance of re.search(). I would expect re to be faster, since it is supposed to search for the "GLIBC" and "libc" patterns "at the same time"; for example, it could use two bloom filters at the same time. But no: the version with the two "in" tests is 16x faster. I don't get it, but I never looked into _sre.c.
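A small sketch for measuring the regex-versus-precheck difference on your own machine. The pattern is a simplified, hypothetical stand-in for `_libc_search` (not the exact regex in Lib/platform.py). The 2 KiB chunk deliberately contains no candidate string, which is assumed to be the common case when scanning an executable. No particular ratio is claimed; run it to see what your interpreter does.

```python
import re
import timeit

# Hypothetical, simplified stand-in for platform._libc_search.
PATTERN = re.compile(rb"__libc_init|GLIBC_([0-9.]+)|libc(?:_\w+)?\.so")

# 2048-byte chunk containing neither b"libc" nor b"GLIBC".
CHUNK = bytes(range(256)) * 8


def regex_only():
    """Always run the regex over the whole chunk."""
    return PATTERN.search(CHUNK) is not None


def precheck_then_regex():
    """Run the regex only if a cheap substring test says a match is possible."""
    if b"libc" in CHUNK or b"GLIBC" in CHUNK:
        return PATTERN.search(CHUNK) is not None
    return False


if __name__ == "__main__":
    for fn in (regex_only, precheck_then_regex):
        print(fn.__name__, timeit.timeit(fn, number=10000))
```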

serhiy-storchaka

This adds overhead when 'libc' occurs only at the beginning of a large chunk. I have two suggestions for solving this.

serhiy-storchaka · 2018-12-03T14:23:26Z

Lib/platform.py

@@ -194,7 +194,10 @@ def libc_ver(executable=sys.executable,lib='',version='', chunksize=2048):
        binary = f.read(chunksize)
        pos = 0
        while pos < len(binary):
-            m = _libc_search.search(binary,pos)
+            if 'libc' in binary or 'GLIBC' in binary:


Maybe use find()?

Suggested change

-            if 'libc' in binary or 'GLIBC' in binary:
+            if binary.find('libc', pos) >= 0 or binary.find('GLIBC', pos) >= 0:

What is the difference between ('libc' in binary) and (binary.find('libc', pos) >= 0)? They are supposed to be equivalent, no? Last time I looked at micro-optimizations, an operator was faster than a method call.

They are equivalent only when pos == 0. If pos != 0, it may be that 'libc' in binary is True while binary.find('libc', pos) >= 0 is False. For example if binary is 'libc' + 'x'*1000000 and pos >= 4.
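The counterexample above as a runnable Python 3 snippet (bytes literals, since the file is read in binary mode):

```python
# The only b"libc" in this chunk sits at offset 0.
binary = b"libc" + b"x" * 1000000
pos = 4  # the scan has already moved past the first match

found_anywhere = b"libc" in binary               # `in` looks at the whole chunk
found_from_pos = binary.find(b"libc", pos) >= 0  # find() starts at offset pos

print(found_anywhere, found_from_pos)  # prints: True False
```

With pos == 0 the two tests agree, which is why they only look equivalent at the start of a chunk.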

If you know that the regex will never match before offset N, maybe we could use file.seek(N)? I don't know where the string is supposed to match, so I prefer not to make any assumptions.

... By the way, parsing a binary file for a string just to extract a version number is really ugly. I would prefer that libc report its own version at runtime.
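For what it's worth, glibc does report its version at runtime via `confstr(_CS_GNU_LIBC_VERSION)`, which Python exposes as `os.confstr('CS_GNU_LIBC_VERSION')` where the platform supports it. A minimal sketch of asking for it directly instead of scanning a binary. The helper names are hypothetical, and the function returns None on non-glibc or non-Unix systems rather than guessing.

```python
import os


def split_confstr_value(value):
    """Split a confstr result such as 'glibc 2.28' into ('glibc', '2.28')."""
    parts = value.split()
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def gnu_libc_version():
    """Ask the running C library for its version instead of scanning a binary.

    Returns a tuple such as ('glibc', '2.28'), or None when the platform
    cannot answer (no os.confstr, unknown name, or undefined value).
    """
    confstr = getattr(os, "confstr", None)  # os.confstr exists only on Unix
    if confstr is None:
        return None
    try:
        value = confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError):
        # ValueError: the name is not known on this platform.
        # OSError: the name is known but not supported by the host.
        return None
    if not value:  # None means the value is not defined
        return None
    return split_confstr_value(value)


if __name__ == "__main__":
    print(gnu_libc_version())
```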

IMHO running "ldd --version" or directly "/lib64/libc.so.6" would be less ugly:

$ ldd --version
ldd (GNU libc) 2.28
...
$ /lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.28.
...
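A minimal Python 3 sketch of the `ldd --version` approach. The regex is a hypothetical pattern written against the two banners quoted above: it keys on either `) <version>` or `release version <version>`. Other libc implementations or banner formats will simply yield None. `parse_glibc_banner` and `glibc_version_from_ldd` are illustrative names, not existing APIs.

```python
import re
import subprocess

# Hypothetical pattern for banners such as:
#   "ldd (GNU libc) 2.28"
#   "GNU C Library (GNU libc) stable release version 2.28."
_GLIBC_BANNER = re.compile(r"(?:\)|release version) (\d+\.\d+(?:\.\d+)*)")


def parse_glibc_banner(text):
    """Return the version string from a glibc banner, or None if absent."""
    m = _GLIBC_BANNER.search(text)
    return m.group(1) if m else None


def glibc_version_from_ldd():
    """Run `ldd --version` and parse its output; None on any failure."""
    try:
        proc = subprocess.run(
            ["ldd", "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some ldd builds print to stderr
            universal_newlines=True,
        )
    except OSError:  # ldd missing or not executable
        return None
    return parse_glibc_banner(proc.stdout)


if __name__ == "__main__":
    print(glibc_version_from_ldd())
```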

serhiy-storchaka · 2018-12-03T14:23:26Z

Lib/platform.py

@@ -194,7 +194,10 @@ def libc_ver(executable=sys.executable,lib='',version='', chunksize=2048):
        binary = f.read(chunksize)
        pos = 0
        while pos < len(binary):
-            m = _libc_search.search(binary,pos)
+            if 'libc' in binary or 'GLIBC' in binary:


Alternate suggestion:

Suggested change

-            if 'libc' in binary or 'GLIBC' in binary:
+            if pos or 'libc' in binary or 'GLIBC' in binary:

I don't see the point of avoiding the two "in" tests if pos==0. Does it provide any speedup?

This code comes from the master branch. If you have a clever optimization, maybe write it in the master branch first, no?

This change already makes the function 16x faster; that should be enough, no?

It avoids the two "in" tests if pos != 0.

If pos == 0, we test the block that was just read. In the common case it doesn't contain 'libc', so the test pays off. If pos != 0, then a match ('libc' or 'GLIBC') was already found in this block, so the "in" condition is always true, and performing the test just wastes time.
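A simplified, self-contained sketch of the chunked scan with the `pos or ...` short-circuit applied. The regex is a hypothetical stand-in for `_libc_search`, `scan_glibc_versions` is an illustrative name, and the function only collects `GLIBC_x.y` version strings rather than reproducing libc_ver()'s return value. It loops on `while binary` so that a match ending exactly at a chunk boundary still leads to the next chunk being read. Like any chunk-by-chunk read without overlap, it can miss a token that straddles two chunks.

```python
import io
import re

# Hypothetical, simplified stand-in for platform._libc_search.
_LIBC_SEARCH = re.compile(rb"__libc_init|GLIBC_([0-9.]+)|libc(?:_\w+)?\.so")


def scan_glibc_versions(stream, chunksize=2048):
    """Collect every GLIBC_x.y version string found in a binary stream."""
    versions = []
    binary = stream.read(chunksize)
    pos = 0
    while binary:
        # pos != 0: a match was already found in this chunk, so the
        # substring pre-check would be true anyway; skip it.
        # pos == 0: fresh chunk; run the cheap pre-check first.
        if pos or b"libc" in binary or b"GLIBC" in binary:
            m = _LIBC_SEARCH.search(binary, pos)
        else:
            m = None
        if m is None:
            # Nothing (more) in this chunk: read the next one.
            binary = stream.read(chunksize)
            pos = 0
            continue
        if m.group(1):
            versions.append(m.group(1).decode("ascii"))
        pos = m.end()
    return versions


if __name__ == "__main__":
    data = b"\x00" * 5000 + b"GLIBC_2.2.5\x00" + b"x" * 3000 + b"GLIBC_2.17\x00"
    print(scan_glibc_versions(io.BytesIO(data)))  # prints ['2.2.5', '2.17']
```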

vstinner · 2018-12-03T15:29:24Z

@serhiy-storchaka: This change is for Python 2.7; it's "just" a backport of an old optimization made in the master branch in 2011 (commit ba7c226). Are you OK with me merging this change into 2.7?

As I wrote, if you want to optimize the code further, I would prefer to do it in master, rather than in the stable 2.7 branch.

serhiy-storchaka

If this is just a backport from 3.x, LGTM as is.

vstinner · 2018-12-03T16:02:28Z

I created https://bugs.python.org/issue35389 to continue the discussion :-)

bpo-26544: Make platform.libc_ver() less slow

afa3387

vstinner requested review from pitrou and serhiy-storchaka Dec 3, 2018

the-knights-who-say-ni added the CLA signed label Dec 3, 2018

bedevere-bot added the awaiting merge label Dec 3, 2018

vstinner changed the title from "bpo-26544: Make platform.libc_ver() less slow" to "[2.7] bpo-26544: Make platform.libc_ver() less slow" Dec 3, 2018

vstinner added the skip news label Dec 3, 2018

pitrou approved these changes Dec 3, 2018

serhiy-storchaka reviewed Dec 3, 2018

serhiy-storchaka approved these changes Dec 3, 2018

vstinner merged commit 8687bd8 into python:2.7 Dec 3, 2018

bedevere-bot removed the awaiting merge label Dec 3, 2018

vstinner deleted the libc_ver27 branch Dec 3, 2018
