Doc: Update references and examples of old, unsupported OSes and uarches #92791

CAM-Gerlach · 2022-05-13T23:32:54Z

As a followup to PR #92529 resolving issue #76773, I also noticed a few other references to legacy OSes that are no longer supported by Python (or by their own developers) per PEP 11, which I elided or updated. More impactfully, I rewrote several descriptions and examples (e.g. in socket and struct) that were badly out of date for modern 64-bit architectures. To keep this narrowly scoped and backportable, I avoided changes that were too broad or that could affect references that still have some modern value.
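For context, the gap between "native" C sizes and the fixed "standard" sizes that the struct examples deal with can be checked directly on the running interpreter. The following is a minimal sketch (not code from this PR) using only `struct.calcsize()`; the printed values depend on the platform.

```python
import struct

# "@" (the default) uses the platform's native C sizes and alignment;
# "=" uses native byte order but struct's fixed "standard" sizes.
native_long = struct.calcsize("@l")        # 8 on most 64-bit Unix-likes, 4 on Windows
standard_long = struct.calcsize("=l")      # always 4
standard_longlong = struct.calcsize("=q")  # always 8

# Size of a C pointer: 8 bytes on a 64-bit build, 4 on a 32-bit one.
pointer_bits = struct.calcsize("P") * 8

print(f"native long: {native_long} bytes")
print(f"standard long: {standard_long} bytes")
print(f"standard long long: {standard_longlong} bytes")
print(f"pointer width: {pointer_bits} bits")
```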

serhiy-storchaka · 2022-05-14T05:50:54Z

Doc/howto/sockets.rst

-would be two bytes, while binary is four. Of course, this doesn't fit well with
-fixed-length messages. Decisions, decisions.
+amount of the time, all those integers have the value 0, or maybe 1.
+The string ``"0"`` would be two bytes, while a full 64-bit word would be 8.


It does not fit with the functions ntohl, htonl, ntohs and htons described in the previous paragraph, which only work with 16- and 32-bit integers.

Yes, but I'm not sure it has to, since the two paragraphs describe separate topics, and each integer width suits the point being made. The standard library functions are nominally for smaller 16- and 32-bit integers, while modern machines and Python's own native integer type are 64-bit. That is where encoding small integers as ASCII/UTF-8 has the greatest potential size advantage over binary, and it also avoids having to convert endianness, particularly since the aforementioned functions are not available for 64-bit ints (though perhaps one could be added).

Is there something specific you'd like me to add/clarify here?
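To make the size argument concrete, here is a small sketch (an illustration, not the how-to's actual text) comparing a small integer encoded as ASCII text with the same value packed as 32-bit and 64-bit binary in network byte order. It uses `struct.pack()` with the `!` prefix for the 64-bit case, since the socket module only provides the 16- and 32-bit `htons`/`htonl` family.

```python
import socket
import struct

value = 0

# Text encoding: small numbers take very few bytes.
as_text = str(value).encode("ascii")  # b"0", 1 byte

# Binary encoding of the same value in network (big-endian) byte order.
as_u32 = struct.pack("!I", value)  # 4 bytes
as_u64 = struct.pack("!Q", value)  # 8 bytes

# socket.htonl() converts a 32-bit value from host to network order;
# there is no 64-bit counterpart in the socket module, hence "!Q" above.
assert socket.ntohl(socket.htonl(1)) == 1

print(len(as_text), len(as_u32), len(as_u64))  # 1 4 8
```

Any delimiter or length prefix a real protocol needs would add to the text size, so the exact byte counts depend on the framing chosen.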

It is obvious to me that this paragraph refers to the previous one. Its mention of "all those longs" is a reference to "long" in ntohl and htonl, which is always 32-bit.

Also, on most modern 64-bit platforms the standard integer type int in C is 32-bit. On Windows even long is 32-bit.

It is obvious to me that this paragraph refers to the previous one. Its mention of "all those longs" is a reference to "long" in ntohl and htonl, which is always 32-bit.

Okay, thanks for providing something specific. I replaced that wording with "most integers", and also clarified the terminology in the following sentence. If there's anything else specific you would like me to change, let me know.

Also, on most modern 64-bit platforms the standard integer type int in C is 32-bit. On Windows even long is 32-bit.

Yes, if by "standard" you mean the C type named int. Of course, this is less relevant for Python users, given that Python has only one native integer type, nominally 64 bits, and this how-to does not focus on the C API. In any case, using a native 64-bit integer is appropriate for making the point of the section: with wider binary types, representing small numbers as text may actually be more efficient.
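The C type widths discussed above can be inspected from Python via `ctypes`. This is a sketch for illustration only: on LP64 systems (64-bit Linux and macOS) `long` is 64 bits, while on 64-bit Windows (LLP64) it stays 32 bits, and `int` is 32 bits on both.

```python
import ctypes

# Widths of C integer and pointer types on the running platform, in bits.
widths = {
    "int": ctypes.sizeof(ctypes.c_int) * 8,
    "long": ctypes.sizeof(ctypes.c_long) * 8,
    "long long": ctypes.sizeof(ctypes.c_longlong) * 8,
    "void *": ctypes.sizeof(ctypes.c_void_p) * 8,
}

for name, bits in widths.items():
    print(f"{name:>9}: {bits} bits")
```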

serhiy-storchaka · 2022-05-14T05:50:54Z

Doc/howto/sockets.rst

-Motorola chip will represent a 16 bit integer with the value 1 as the two hex
-bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
+that not all machines use the same formats for binary data. For example,
+network byte order is big-endian, with the most significant byte first,


It is not clear why the network byte order is big-endian in the first place.

What big-endian platforms are still in use?

It is not clear why the network byte order is big-endian in the first place.

I'm not sure it's within the scope of this how-to doc to describe the origins and history of network byte order. To summarize: most CPU architectures at the time, except for Intel, were big-endian (Intel was the primary driver of the later rise of little-endian, due to compatibility rather than technical superiority), since it was more "natural". In particular, for network operations, transmitting all the bits "in order" was necessary or helpful for some early transmission, encoding and error correction schemes.

What big-endian platforms are still in use?

As mentioned elsewhere in these changes: IBM z/Architecture and IBM Power (Tier 2/3 Python platforms), SPARC (Solaris), many microcontrollers and various other smaller ones, while RISC-V, ARM, MIPS (Loongson, Sunway, etc.) and others are bi-endian (though mostly little-endian in actual CPU designs). But as mentioned in the section, little-endian predominates in the consumer CPU space.

I know, but it is not clear to the reader. The old text mentions a concrete big-endian processor. Maybe just replace it with a more modern example? Or at least add "for historical reasons" when mentioning the network byte order. Or, better, keep both; the user should know that the native order can differ from little-endian.

For better or for worse, there is no real modern example, at least not one that more than an infinitesimal fraction of modern Python installations run on. The only architecture Python supports, even at the Tier 3 level, that is not little-endian is IBM s390x mainframes running Linux, with deployments on the order of perhaps up to a few thousand total installs, compared with many millions to perhaps nearly a billion little-endian machines with some form of Python. And I do note that only most (not all) common processors are little-endian.

To note, it could equally be argued that the current prevalence of little-endian on the CPU side is also for historical reasons, not technical merit and despite big-endian being much more natural and logical, due to the historical dominance of Intel x86 CPUs, the resulting pressure to ease compatibility with software developed for them and the subsequent network effect that essentially required all new CPUs to be little-endian to have a chance at success in the market.

In any case, this is (supposed to be, at least; we probably need to refactor it further) a "how-to" guide, which per the Diataxis framework we've adopted for the Python docs, is concerned (naturally) with the "how" to practically accomplish a set of tasks, not the "why". The historical details behind why network byte order is big-endian are not necessary, or directly helpful to users looking for how to practically accomplish socket-related tasks in Python, and are thus well out of scope here. Users curious about such a topic are free to consult a more appropriate resource, such as Wikipedia. However, I've linked network byte order to the latter to make that easier to do.
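For readers wondering how this plays out in practice, the sketch below (an illustration, not text from the how-to) reproduces the old example of a 16-bit integer with the value 1 in both byte orders using `struct`, and shows that network order (`!`) is big-endian regardless of the host's native order reported by `sys.byteorder`.

```python
import struct
import sys

value = 1

big = struct.pack(">H", value)      # b"\x00\x01": most significant byte first
little = struct.pack("<H", value)   # b"\x01\x00": least significant byte first
network = struct.pack("!H", value)  # network byte order is big-endian
native = struct.pack("=H", value)   # whatever the host CPU uses

assert network == big
assert native == (big if sys.byteorder == "big" else little)

# int.to_bytes() can produce the same encodings without struct.
assert value.to_bytes(2, "big") == big
assert value.to_bytes(2, "little") == little

print(f"host byte order: {sys.byteorder}")
```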

serhiy-storchaka · 2022-05-14T05:50:54Z

Doc/library/platform.rst

@@ -53,7 +53,7 @@ Cross Platform

 .. function:: machine()

-   Returns the machine type, e.g. ``'i386'``. An empty string is returned if the
+   Returns the machine type, e.g. ``'AMD64'``. An empty string is returned if the


Is this what is returned on AMD processors? On my computer it is 'x86_64'.

Also update the docstring of platform.machine().

The original 'proper' name of what is also called the "x86-64" architecture is AMD64, as it was created by AMD and later adopted by Intel when Intel's IA-64 (Itanium) architecture failed in the market. What is returned depends (AFAIK) on the OS, not the CPU; Windows and many (most?) Linux distros call it AMD64 internally, while Apple and some others call it x86-64. Running a freshly built from main Python, as well as 3.9 and 3.10 release builds, platform.machine() returns AMD64 on my Windows system with a stock Intel i7-3730.
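Since the string depends on what the OS reports, code comparing architectures may want to fold the aliases together. The sketch below calls `platform.machine()` and `platform.system()`; the `normalize_machine()` helper and its alias table are hypothetical illustrations, not part of the `platform` module.

```python
import platform

# Raw value reported by the OS, e.g. "AMD64" on Windows or "x86_64" on Linux.
# May be an empty string if it cannot be determined.
machine = platform.machine()

# Hypothetical alias table: different names for the same architecture.
_ALIASES = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "x64": "x86_64",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}


def normalize_machine(name: str) -> str:
    """Return a canonical architecture name, or the lowercased input if unknown."""
    return _ALIASES.get(name.lower(), name.lower())


print(platform.system(), machine, "->", normalize_machine(machine))
```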

Also update the docstring of platform.machine().

I can, but as this PR currently only modifies the docs, I'd rather do that separately; there are other places in the codebase that should be updated too, for consistency.

Docstrings are a part of the docs. If we do not update them together with the rst files, they are left desynchronized.

This only affects the choice of one specific example, both of which are equally accurate and valid, and which will be synchronized if and when I do a similar pass through the codebase itself. Several of the other platform functions, e.g. system(), already have differing examples in each. And given that the rest of this PR scrupulously avoids touching the code, adding this one trivial change has a non-trivial cost: it triggers and requires a whole suite of extra builds/CIs, and increases the risk for backporting.

Given that the change was minor and not strictly required in the first place, and I almost didn't make it, if this is going to be a big issue I'll just drop this change instead, since it's not at all worth the cost.

serhiy-storchaka · 2022-05-14T05:50:54Z

Doc/faq/library.rst

@@ -483,8 +483,7 @@ including :func:`~shutil.copyfile`, :func:`~shutil.copytree`, and
 How do I copy a file?
 ---------------------

-The :mod:`shutil` module contains a :func:`~shutil.copyfile` function.  Note
-that on MacOS 9 it doesn't copy the resource fork and Finder info.
+The :mod:`shutil` module contains a :func:`~shutil.copyfile` function.


And on Windows it does not copy the NTFS file streams.

Indeed, though they are also rarely used for anything of much significance. In any case, I ended up expanding that description to mention both resource forks and ADSes, as well as file metadata, pointing out that shutil.copy2 will copy most (though not all) of that.
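As a rough illustration of the copyfile/copy2 difference (a sketch, not the FAQ's text), the following copies the same file both ways and checks whether the modification time survives. `shutil.copyfile()` copies only the contents, while `shutil.copy2()` also applies `shutil.copystat()`; it still does not copy every kind of metadata on every platform.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w") as f:
        f.write("hello")

    # Give the source an old, easily recognizable access/modification time.
    old_mtime = 1_000_000_000
    os.utime(src, (old_mtime, old_mtime))

    plain = shutil.copyfile(src, os.path.join(tmp, "plain.txt"))  # contents only
    with_meta = shutil.copy2(src, os.path.join(tmp, "meta.txt"))  # contents + metadata

    plain_mtime = os.stat(plain).st_mtime
    meta_mtime = os.stat(with_meta).st_mtime

print(plain_mtime == old_mtime, meta_mtime == old_mtime)  # False True
```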

CAM-Gerlach added 2 commits May 13, 2022

Doc: Update/elide mentions of EoL OSes that Python no longer supports

6032ecb

Doc: Update several old explanations/examples for modern 64-bit arches

534de62

bedevere-bot added docs awaiting review labels May 13, 2022

CAM-Gerlach added skip issue skip news needs backport to 3.9 needs backport to 3.10 needs backport to 3.11 labels May 13, 2022

CAM-Gerlach changed the title ~~Doc: Update references and examples with old, unsupported OSes and uarches~~ Doc: Update references and examples of old, unsupported OSes and uarches May 13, 2022

serhiy-storchaka reviewed May 14, 2022


Doc: Add note to FAQ about copying file ADS/forks & metadata

2a67a2a

CAM-Gerlach requested a review from serhiy-storchaka May 16, 2022

Doc: Further clarify integer/byte order details in socket howto

7fc91e7

serhiy-storchaka removed the needs backport to 3.9 label May 20, 2022

python / cpython Public
