
Multiple languages have a reference implementation of their compiler/libraries, so why doesn't C have one?
I know that GCC and glibc are used extensively and that Microsoft has its own version that it also uses, but why isn't there one main implementation, as there is for, say, Python? (Yes, I know there are other implementations of Python, but there is one main/reference implementation.)
Does it have to do with the fact that OSes like Linux and Windows implement at least part of their API in C? Thank you.

  • Ultimately, the answer comes down to "The ISO never made one", I expect. There are other ISO-standardized languages like C# and Eiffel with reference implementations, so it isn't that the ISO bylaws forbid that or something.
    – Daniel H
    Commented May 22, 2018 at 0:43
  • What would the purpose of this be?
    – M.M
    Commented May 22, 2018 at 0:56
  • @DanielH: I don't think C♯ has a reference implementation anymore. There used to be one as part of the Microsoft Rotor Project, but that project was abandoned well over a decade ago and never went past a subset of C♯ 2.0. Commented May 22, 2018 at 0:58
  • @hescobar: Can you clarify your question, please? You sometimes talk about a reference implementation, and sometimes talk about a main implementation. Those are not the same thing. In particular, a Reference Implementation is meant to be used as, well, a reference, and thus it should be written in such a way that it explains unclear areas of the spec, i.e. it should be written in the simplest, clearest way possible. This is in direct conflict with a main implementation, which should be high-performance and robust. Commented May 22, 2018 at 1:01
  • @M.M: In cases of ambiguity in interpreting the specification, if a reference interpretation is inconsistent with one of the potential interpretations, it rules that interpretation out. Commented May 22, 2018 at 1:59

1 Answer


This is really more of a historical than a technical question, I suppose. The short answer is: there was one, of a sort. It didn't take. The long answer is rather longer, depending on how much detail you want to go into.

C is old. I mean, Python is also old, but C is really old. Dennis Ritchie hacked the first versions of it together at Bell Labs in the early 1970s, and he did it so that he wouldn't have to write UNIX in assembly or B (a now-forgotten systems programming language of the time with shortcomings that C was made to address).

This is arguably a completely different approach from language design today: C was written for the purpose of writing UNIX. Nobody sat down and designed C for the sake of designing a pretty and clean systems programming language; these guys wanted to write an operating system and built a tool that allowed them to do this more easily.

However, this Bell Labs C compiler was a sort of reference implementation, in that C was essentially whatever the C team wrote into their compiler. Then came the Portable C Compiler (pcc), which was supposed to make porting C to new platforms easier, and the book titled "The C Programming Language" (aka the informal K&R C specification), and C became popular, and all was well. And then things became complicated.

C had become popular at a time when it still had...let's be charitable and say "some room for improvement." Of course it had; every language always has room for improvement. There was no real standard library, for starters. Function arguments were not checked at compile time, functions could not return void or structs, that sort of thing. It was still rough around the edges.
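
For a taste of what that looked like, here is a sketch of an old-style (K&R) function definition -- still accepted by pre-C23 compilers, though long obsolescent -- where the compiler has no prototype to check calls against:

    #include <stdio.h>

    double half(x)   /* K&R style: parameter types are declared... */
        int x;       /* ...separately, after the declarator */
    {
        return x / 2.0;
    }

    int main()
    {
        /* With no prototype, nothing checks this call: it passes a
           double where an int is expected, and an old compiler would
           accept it without a peep. The result is simply garbage. */
        printf("%f\n", half(3.0));
        return 0;
    }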

But now there was not just Bell Labs, oh no. Now there were dozens of vendors, all with their adaptations of pcc or homegrown compilers and lots of great and sometimes not-so-great ideas about the ways in which C could be improved. They, too, were less interested in designing a language than in designing a tool that would make their actual jobs simpler. So they wrote their own extensions to the language, and sometimes these didn't mesh well with the extensions that other people came up with.

Now, it's easy to facepalm at this point and ask why they didn't just coordinate the language development better, but it really wasn't that simple. The Internet didn't exist yet, so they couldn't just set up an IRC channel to discuss stuff, and also...well, programming languages weren't the only thing that was messy compared to today.

Today, most computers are fairly similar. We all represent negative integers as two's complement, bytes are very nearly always 8 bits wide, pointers are simply memory addresses. This was not the case at the time, and when you consider that when C was standardized there were still machines around that used one's complement or sign-magnitude, you get an idea why signed overflow is undefined in C. Have you ever seen the C code for an old DOS program? It had this concept of near and far pointers, because the old 16-bit x86 computers needed special segment registers to address more than 64 KB of RAM. Several compilers built C extensions for this, but believe me, you're very, very glad that C today does not include this concept. The Soviets even built a balanced ternary computer (the Setun), although I'm unsure whether it ever had C support. In short, the hardware landscape was also messy, and this is kind of a big deal for a language that's close to the metal.
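
As a small illustration of the hardware point (a sketch; the behavior noted in the comments is what typical two's-complement machines do, not what the standard promises):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int x = INT_MAX;
        /* Signed overflow is undefined behavior precisely because the
           standard could not assume two's-complement hardware. On most
           modern machines this wraps to INT_MIN, but a conforming
           compiler is free to assume it never happens and optimize
           accordingly. */
        x = x + 1;               /* undefined behavior */
        printf("x = %d\n", x);   /* often INT_MIN, but no guarantee */
        return 0;
    }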

So, everybody did what they had to and generally (though not always) the best they could, but the language necessarily diverged. A language core was eventually standardized in 1989 (when the Undertaker...hold on, wrong year) as ANSI C to bring back some semblance of order, and in the years after that compilers began to converge on it. Nevertheless, some of the old extensions will never go away, because backwards compatibility is always an issue -- consider how quickly Python 3 was adopted -- and there are some close-to-the-metal issues that need to be addressed for the language to be useful but cannot sensibly be written into the spec because they are not portable, such as calling conventions (see the sketch below).
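
For instance, here is a sketch of calling-convention extensions, using the real MSVC keywords for 32-bit x86 (GCC spells the same idea __attribute__((cdecl)) and __attribute__((stdcall))); nothing like this exists in the ISO standard:

    #ifdef _MSC_VER
    /* MSVC-specific keywords (accepted but ignored when targeting x64): */
    int __cdecl   add_cdecl(int a, int b)   { return a + b; } /* caller pops the arguments */
    int __stdcall add_stdcall(int a, int b) { return a + b; } /* callee pops the arguments */
    #endif

    int main(void) { return 0; }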

And there you have it. The reason that C has a language specification rather than a reference implementation is mostly historical and partly due to subtleties of the different machines on which it has to run.

I suppose it would be possible to develop an official reference implementation (at least for a few common platforms), but I also believe that its value would be somewhat limited. After all, the C standard has to leave a number of things undefined or implementation-defined because it cannot know the exact nature of the underlying machine, so other implementations would only be guaranteed to behave like the reference implementation for code whose behavior the standard fully pins down. And for such code, the usual C implementations (i.e. gcc, clang, MSVC) generally behave the same way anyway, so you can use any one of them.
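
To make that concrete, here is a small, well-formed program whose output nevertheless legitimately differs between conforming implementations (the sizes and signedness shown in the comments are typical values, not guarantees):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Both of these are implementation-defined, not fixed by the
           standard, so different conforming compilers may print
           different results for the same well-formed program: */
        printf("sizeof(long) = %zu\n", sizeof(long));  /* 4 on 64-bit MSVC, 8 on 64-bit gcc/clang */
        printf("char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");  /* varies by platform/ABI */
        return 0;
    }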

  • Thank you for your in-depth explanation!
    – user8396910
    Commented May 22, 2018 at 2:50
