Doc: Update references and examples of old, unsupported OSes and uarches #92791

Open
wants to merge 4 commits into
base: main

Member

@CAM-Gerlach CAM-Gerlach commented May 13, 2022

As a followup to PR #92529 resolving issue #76773, I also noticed a few other references to legacy OSes that are no longer supported by Python (or their own developers) per PEP 11, which I elided or updated. More impactfully, I rewrote several descriptions and examples (e.g. in socket and struct) that were very out of date with modern 64-bit architectures. To keep this narrowly scoped and backportable, I avoided changes that were too broad or could potentially affect references that were still of some modern value.

@bedevere-bot bedevere-bot added docs awaiting review labels May 13, 2022
@CAM-Gerlach CAM-Gerlach changed the title Doc: Update references and examples with old, unsupported OSes and uarches Doc: Update references and examples of old, unsupported OSes and uarches May 13, 2022
would be two bytes, while binary is four. Of course, this doesn't fit well with
fixed-length messages. Decisions, decisions.
amount of the time, all those integers have the value 0, or maybe 1.
The string ``"0"`` would be two bytes, while a full 64-bit word would be 8.
Member

@serhiy-storchaka serhiy-storchaka May 14, 2022

It does not fit with functions ntohl, htonl, ntohs, htons, described in previous paragraph, which only work with 16 and 32 bit integers.

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

Yes, but I'm not sure it has to, since those two paragraphs describe separate topics, and the respective integer widths are each appropriate to the point being made. The standard library functions are nominally for smaller 16- and 32-bit integers, while modern machines and Python's own native integer type are 64-bit. That is where encoding small integers as ASCII/UTF-8 has the greatest potential size advantage over binary, while also avoiding the need to convert endianness, particularly when the aforementioned functions are not available for 64-bit ints (though perhaps one could be added).

Is there something specific you'd like me to add/clarify here?

Member

@serhiy-storchaka serhiy-storchaka May 16, 2022

It is obvious to me that this paragraph refers to the previous one. Its mention of "all those longs" is a reference to "long" in ntohl and htonl, which is always 32-bit.

Also, on most modern 64-bit platforms the standard integer type int in C is 32-bit. On Windows even long is 32-bit.
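
For readers following along, a minimal sketch of the width limits discussed here, using only `socket` and `struct` from the standard library. The native sizes reported by `struct.calcsize` depend on the platform's C ABI, so no particular output is assumed for them.

```python
import socket
import struct

# htonl/ntohl operate on 32-bit unsigned integers and round-trip on any host.
value = 0x12345678
round_trips = socket.ntohl(socket.htonl(value)) == value

# A value wider than 32 bits is rejected rather than silently truncated.
try:
    socket.htonl(2**32)
    too_wide_rejected = False
except OverflowError:
    too_wide_rejected = True

# Native C type sizes for this interpreter's platform (vary by OS and ABI;
# for example, "long" is 4 bytes on Windows and 8 on most 64-bit Unix systems).
native_sizes = {
    name: struct.calcsize(code)
    for name, code in [("short", "h"), ("int", "i"), ("long", "l"), ("long long", "q")]
}

print(round_trips, too_wide_rejected, native_sizes)
```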

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

> It is obvious to me that this paragraph refers to the previous one. Its mention of "all those longs" is a reference to "long" in ntohl and htonl, which is always 32-bit.

Okay, thanks for providing something specific. I replaced that wording with "most integers", and clarified the terminology in the following sentence as well. If there's something else specific you would like me to change, let me know.

> Also, on most modern 64-bit platforms the standard integer type int in C is 32-bit. On Windows even long is 32-bit.

Yes, if by "standard" you mean the C type that has the name int. Of course, this is less relevant for Python users, given Python only has one native integer type, nominally 64 bits, and this how-to does not focus on the C API. In any case, the use of a native 64-bit integer is appropriate to making the point of the section, that with wider binary types, representing small numbers as text may actually be more efficient.
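
To make the size comparison concrete, here is a small sketch under stated assumptions: the text form is ASCII digits plus a one-byte terminator (matching the how-to's "two bytes" for `"0"`), and the binary form is a fixed 64-bit signed integer in network byte order. The helper name `encoded_sizes` is purely illustrative.

```python
import struct


def encoded_sizes(n: int) -> tuple[int, int]:
    """Return (text_size, binary_size) in bytes for one integer."""
    text = str(n).encode("ascii") + b"\0"  # digits plus a one-byte terminator
    binary = struct.pack("!q", n)          # fixed 64-bit, network byte order
    return len(text), len(binary)


print(encoded_sizes(0))              # (2, 8): text wins for small values
print(encoded_sizes(1234567890123))  # (14, 8): binary wins for large values
```

The trade-off flips once a value needs more than seven digits, which is why the choice depends on the data actually being sent.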

Motorola chip will represent a 16 bit integer with the value 1 as the two hex
bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
that not all machines use the same formats for binary data. For example,
network byte order is big-endian, with the most significant byte first,
Member

@serhiy-storchaka serhiy-storchaka May 14, 2022

It is not clear why the network byte order is big-endian in the first place.

What big-endian platforms are still in use?

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

> It is not clear why the network byte order is big-endian in the first place.

I'm not sure it's within the scope of this how-to doc to describe the origins and history of network byte order, but to summarize, most CPU arches at the time except for Intel used it (Intel being a primary driver of the later rise of little-endian, out of compatibility rather than technical superiority), since it was more "natural". In particular, for network operations, transmitting all the bits "in order" was necessary or helpful for some early forms of transmission, encoding and error correction schemes.

> What big-endian platforms are still in use?

As mentioned elsewhere in these changes: IBM z/Architecture and IBM Power (Tier 2/3 Python platforms), SPARC (Solaris), many microcontrollers and various other smaller ones. RISC-V, ARM, MIPS (Loongson, Sunway, etc.) and others are bi-endian (though mostly little-endian in actual CPU designs). But as mentioned in the section, little-endian predominates in the consumer CPU space.
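
As a quick illustration of the byte orders under discussion, the following sketch encodes the 16-bit value 1 both ways and compares the result with network order and the host's own order. It uses only `int.to_bytes`, `struct.pack` and `sys.byteorder`; what `sys.byteorder` reports depends on the machine running it.

```python
import struct
import sys

big = (1).to_bytes(2, "big")        # bytes 00 01, most significant byte first
little = (1).to_bytes(2, "little")  # bytes 01 00, least significant byte first

network = struct.pack("!H", 1)  # "!" selects network order, i.e. big-endian
native = struct.pack("=H", 1)   # "=" selects this host's order, standard size

print(sys.byteorder, big.hex(" "), little.hex(" "), network == big)
```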

Member

@serhiy-storchaka serhiy-storchaka May 16, 2022

I know, but it is not clear to the reader. The old text mentions a concrete big-endian processor. Maybe just replace it with a more modern example? Or at least add "for historical reasons" when mentioning the network byte order. Or better, leave both; the user should know that the native order can be different from little-endian.

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

For better or for worse, there is no real modern example, at least none that more than an infinitesimal fraction of modern Python installations run on. The only architecture Python supports even at Tier 3 level that is not little-endian is IBM s390x mainframes running Linux, with deployments on the order of perhaps up to a few thousand total installs, compared with many millions to perhaps nearly on the order of a billion little-endian-CPU machines with some form of Python. And I do note that only most (not all) common processors are little-endian.

To note, it could equally be argued that the current prevalence of little-endian on the CPU side is also for historical reasons, not technical merit and despite big-endian being much more natural and logical, due to the historical dominance of Intel x86 CPUs, the resulting pressure to ease compatibility with software developed for them and the subsequent network effect that essentially required all new CPUs to be little-endian to have a chance at success in the market.

In any case, this is (supposed to be, at least; we probably need to refactor it further) a "how-to" guide, which per the Diataxis framework we've adopted for the Python docs, is concerned (naturally) with the "how" to practically accomplish a set of tasks, not the "why". The historical details behind why network byte order is big-endian are not necessary, or directly helpful to users looking for how to practically accomplish socket-related tasks in Python, and are thus well out of scope here. Users curious about such a topic are free to consult a more appropriate resource, such as Wikipedia. However, I've linked network byte order to the latter to make that easier to do.

@@ -53,7 +53,7 @@ Cross Platform

.. function:: machine()

-Returns the machine type, e.g. ``'i386'``. An empty string is returned if the
+Returns the machine type, e.g. ``'AMD64'``. An empty string is returned if the
Member

@serhiy-storchaka serhiy-storchaka May 14, 2022

Is that what is returned on AMD processors? On my computer it is 'x86_64'.

Update also the docstring of platform.machine().

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

The original 'proper' name of what is also called the "x86-64" architecture is AMD64, as it was created by AMD and later adopted by Intel when Intel's IA-64 (Itanium) architecture failed in the market. What is returned depends (AFAIK) on the OS, not the CPU; Windows and many (most?) Linux distros call it AMD64 internally, while Apple and some others call it x86-64. Running a freshly built from main Python, as well as 3.9 and 3.10 release builds, platform.machine() returns AMD64 on my Windows system with a stock Intel i7-3730.

> Update also the docstring of platform.machine().

I can, but as this PR currently only modifies the docs, I'd rather do that separately; there are other places in the codebase that should be updated too, for consistency.
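
Since the reported name varies by OS, a minimal sketch of how calling code might treat the two spellings mentioned in this thread as equivalent. The alias set and the helper `is_x86_64` are hypothetical and cover only `'AMD64'` and `'x86_64'`; other platforms may report other strings, or an empty string if the value cannot be determined.

```python
import platform

# Hypothetical alias set for illustration: only the two names seen in this thread.
_X86_64_ALIASES = {"amd64", "x86_64"}


def is_x86_64(machine: str) -> bool:
    """Return True if the given platform.machine() value names x86-64."""
    return machine.lower() in _X86_64_ALIASES


machine = platform.machine()  # e.g. 'AMD64' on Windows, 'x86_64' on many Unix systems
print(repr(machine), is_x86_64(machine))
```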

Member

@serhiy-storchaka serhiy-storchaka May 16, 2022

Docstrings are a part of the docs. If we do not update them together with the rst files, they are left desynchronized.

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

This only affects the choice of one specific example, both of which are equally accurate and valid, and which will be synchronized if and when I do a similar pass through the codebase itself. Several of the other platform functions, e.g. system(), have differing examples on each. And given the rest of this PR scrupulously avoids touching the code, adding this one trivial change has non-trivial cost, of triggering and requiring a whole suite of extra builds/CIs, and increasing risk for backporting.

Given the change was minor and not strictly required in the first place, and I almost didn't make it, if this is going to be a big issue I'll just drop this change instead, since it's not at all worth the cost.

@@ -483,8 +483,7 @@ including :func:`~shutil.copyfile`, :func:`~shutil.copytree`, and
How do I copy a file?
---------------------
-The :mod:`shutil` module contains a :func:`~shutil.copyfile` function. Note
-that on MacOS 9 it doesn't copy the resource fork and Finder info.
+The :mod:`shutil` module contains a :func:`~shutil.copyfile` function.
Member

@serhiy-storchaka serhiy-storchaka May 14, 2022

And on Windows it does not copy the NTFS file streams.

Member Author

@CAM-Gerlach CAM-Gerlach May 16, 2022

Indeed, though they are also rarely used for much of significance. In any case, I ended up expanding that description to mention both resource forks and ADSes, as well as file metadata, pointing out that shutil.copy2 will copy most (though not all) of that.
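
A rough sketch of the difference described above, assuming a filesystem that stores modification times: `shutil.copyfile` copies only the file's data, while `shutil.copy2` also attempts to preserve metadata such as timestamps. The file names are placeholders, and this does not exercise resource forks or alternate data streams.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w") as f:
        f.write("hello")
    # Backdate the source so a preserved timestamp is distinguishable from "now".
    os.utime(src, (1_000_000_000, 1_000_000_000))

    plain = shutil.copyfile(src, os.path.join(tmp, "plain.txt"))  # data only
    full = shutil.copy2(src, os.path.join(tmp, "full.txt"))       # data plus metadata

    with open(plain) as f:
        same_data = f.read() == "hello"

    src_mtime = os.stat(src).st_mtime
    kept_mtime = abs(os.stat(full).st_mtime - src_mtime) < 2
    plain_mtime_differs = abs(os.stat(plain).st_mtime - src_mtime) > 2

print(same_data, kept_mtime, plain_mtime_differs)
```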
