
SoylentNews is people

posted by Fnord666 on Friday November 10 2017, @06:23PM
from the C,-C-Rust,-C-Rust-Go,-Go-Rust-Go! dept.

In which ESR pontificates on the future while reflecting on the past.

I was thinking a couple of days ago about the new wave of systems languages now challenging C for its place at the top of the systems-programming heap – Go and Rust, in particular. I reached a startling realization – I have 35 years of experience in C. I write C code pretty much every week, but I can no longer remember when I last started a new project in C!
...
I started to program just a few years before the explosive spread of C swamped assembler and pretty much every other compiled language out of mainstream existence. I'd put that transition between about 1982 and 1985. Before that, there were multiple compiled languages vying for a working programmer's attention, with no clear leader among them; after, most of the minor ones were simply wiped out. The majors (FORTRAN, Pascal, COBOL) were either confined to legacy code, retreated to single-platform fortresses, or simply ran on inertia under increasing pressure from C around the edges of their domains.

Then it stayed that way for nearly thirty years. Yes, there was motion in applications programming; Java, Perl, Python, and various less successful contenders. Early on these affected what I did very little, in large part because their runtime overhead was too high for practicality on the hardware of the time. Then, of course, there was the lock-in effect of C's success; to link to any of the vast mass of pre-existing C you had to write new code in C (several scripting languages tried to break that barrier, but only Python would have significant success at it).

One to RTFA rather than summarize. Don't worry, this isn't just ESR writing about how great ESR is.



 
  • (Score: 0) by Anonymous Coward on Friday November 10 2017, @06:58PM (#595274) (19 children)

    > Then, of course, there was the lock-in effect of C's success; to link to any of the vast mass of pre-existing C you had to write new code in C

    and now you can use C++, so why use C? C is only nice if you want to make libraries where ABI stability is paramount. Otherwise you can use other things, even JavaScript (Node.js).

    Rust? Go? Go doesn't even make dynamic linking support. So how is it going to help me leverage my OS security support when everything is linked in? And just the other year, Scala and Erlang were supposed to be the "It" languages. Oh well...

  • (Score: 1, Informative) by Anonymous Coward on Friday November 10 2017, @07:18PM (#595284)

    > Go doesn't even make dynamic linking support. So how is it going to help me leverage my OS security support when everything is linked in?

    It is not going to help there. You'll have to return to the bad old days before shared libraries where a security fix involved recompiling everything that depended on the library, and reinstalling it all.

  • (Score: 2) by pvanhoof (4638) on Friday November 10 2017, @07:27PM (#595289) (14 children)

    "C is only nice if you want to make libraries where ABI stability is paramount. "

    ABI stability is possible with C++. But it's indeed very hard. It's also hard with C, though. Having to add padding members and space to your structs so that you can add members later on, having to do the same thing in C++ structs and C++ classes, having to understand data alignment, etc.
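    A minimal C sketch of the reserved-space approach, assuming a hypothetical `widget_info` struct rather than any real library's: version 2 spends one reserved slot on a new field, so the struct's size and the offsets of the existing fields stay the same for binaries built against version 1. C11's `_Static_assert` turns an accidental layout change into a compile error.

    ```c
    #include <stddef.h>
    #include <stdio.h>

    /* Version 1 of a hypothetical public struct: spare space reserved up front. */
    struct widget_info_v1 {
        int id;
        int flags;
        int reserved[6];     /* callers must zero this; kept for future fields */
    };

    /* Version 2 spends one reserved slot on a new field. */
    struct widget_info_v2 {
        int id;
        int flags;
        int color;           /* new in version 2 */
        int reserved[5];
    };

    /* Old binaries stay compatible only if size and existing offsets are unchanged. */
    _Static_assert(sizeof(struct widget_info_v1) == sizeof(struct widget_info_v2),
                   "struct size changed: ABI break");
    _Static_assert(offsetof(struct widget_info_v1, flags) ==
                   offsetof(struct widget_info_v2, flags),
                   "field moved: ABI break");

    int main(void)
    {
        printf("v1: %zu bytes, v2: %zu bytes\n",
               sizeof(struct widget_info_v1), sizeof(struct widget_info_v2));
        return 0;
    }
    ```

    The scheme assumes callers zero the reserved members, so a newer library can read zero in `color` as "not set by an old caller".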

    Yet ABI stability is super important for popular libraries; think of GLib and Qt, among others. We have the semver.org rules, but the semver.org rules for ABI are hard to follow.

    A language, compiler, or standard (for C, for C++, or for some other language) that made this easier could displace C.

    We have D as an interesting contender. There is Vala, which generates GLib C code. One could claim that Qt, with moc, C++, and QMetaData, provides some sort of ABI.

    ...

    • (Score: 4, Interesting) by bzipitidoo (4388) on Friday November 10 2017, @09:21PM (#595353) (12 children)

      Libraries, and the way they are accessed, are the problem. The header files are written in C/C++, of course, which makes C by far the easiest way to use these library functions; without those headers, there's no way to be sure of the details of the parameters to pass. That information is not part of the library file that contains the compiled functions. It should have been.

      Over the decades, we've built up a huge code base in these C library file formats, and it'll take a lot of work to phase in a better, more informative and language independent library file format. I can't see Linux, X, and Firefox all being rewritten in some other programming language any time soon.

      • (Score: 0) by Anonymous Coward on Friday November 10 2017, @10:01PM (#595369)

        They'll just be replaced entirely.

      • (Score: 0) by Anonymous Coward on Friday November 10 2017, @10:27PM (#595380) (10 children)

        Including "details of the parameters to pass" in the compiled library is of little use.

        Suppose a struct layout changes. Now what? Are you proposing to parse something, then dynamically convert between the layout used by the library and the layout used by the program? This is a performance killer; at that point you may as well be writing in an interpreted language.

        Suppose a field goes missing in the new library, but the program depended on that field. The typical behavior for an interpreted language would be to run for a while, modifying your data files, and then suddenly crash with an exception. People expect better of compiled programs, and anyway the whole point of a compiled program is to quickly and directly access the data. Type checking is supposed to happen at compile time, not run time, both for performance and for crash avoidance.

        At best, we could ask that the program loader (ld.so runtime linker) refuse to run when there is any mismatch at all. This only requires a hash. Simply hash the data structures and function prototypes, include this hash in both the library and program, and compare the hashes for equality at startup. Mismatches prevent running the program.
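        A minimal C sketch of that hash comparison. It is not an existing ld.so feature; the choice of 32-bit FNV-1a as the hash and the normalized interface strings are assumptions made for illustration, standing in for what a build tool would generate from the real headers.

        ```c
        #include <stdint.h>
        #include <stdio.h>

        /* 32-bit FNV-1a: a small, well-known non-cryptographic hash. */
        static uint32_t fnv1a(const char *s)
        {
            uint32_t h = 2166136261u;            /* FNV offset basis */
            for (; *s != '\0'; s++) {
                h ^= (unsigned char)*s;
                h *= 16777619u;                  /* FNV prime */
            }
            return h;
        }

        /* Hypothetical normalized interface descriptions. In the proposed scheme a
           build tool would generate these from the headers and embed only the hash
           in the library and in each program linked against it. */
        static const char library_iface[] =
            "struct point{int x;int y;};int point_norm1(const struct point*);";
        static const char program_iface[] =
            "struct point{int x;int y;};int point_norm1(const struct point*);";

        /* The loader-side check: any difference at all means "refuse to run". */
        static int interface_matches(const char *program, const char *library)
        {
            return fnv1a(program) == fnv1a(library);
        }

        int main(void)
        {
            if (!interface_matches(program_iface, library_iface)) {
                fprintf(stderr, "interface mismatch: refusing to run\n");
                return 1;
            }
            puts("interface hashes match");
            return 0;
        }
        ```

        A matching hash only catches layout and prototype drift; it says nothing about the behavioral changes described next.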

        Of course, that only covers things in a crude sense. Library behavior may change without any change to the structs or function prototypes. Consider a library function that implicitly opens a database connection. In a newer version of the library, there is a separate call that must be made to do this. Consider a library with locking. In a new version, the lock ordering changes, causing usage of the old locking to create a deadlock. Consider a library that returns a pointer to a struct. In a new version, the allocation may change (static, malloc, new, mmap...) in a way that causes old library users to crash or run out of memory.

        • (Score: 4, Interesting) by Grishnakh (2831) on Friday November 10 2017, @11:04PM (#595393) (1 child)

          > The typical behavior for an interpreted language would be to run for a while, modifying your data files, and then suddenly crash with an exception. People expect better of compiled programs, and anyway the whole point of a compiled program is to quickly and directly access the data. Type checking is supposed to happen at compile time, not run time, both for performance and for crash avoidance.

          I disagree about people expecting better. Everyone these days loves Python, which exemplifies the unexpected crashing behavior you mention here, so I think people are now quite happy to have software that crashes regularly with indecipherable exception errors, and they're demonstrably happy with software that's very slow. For proof, I cite the widespread use of Electron apps. If people actually cared about performance, they wouldn't be using or developing those, or all the Python apps that are all the rage now.

          > This is a performance killer; at that point you may as well be writing in an interpreted language.

          We might as well anyway, since performance isn't a priority any more.

          • (Score: -1, Troll) by Anonymous Coward on Friday November 10 2017, @11:17PM (#595399)

            That's how you indicate that you are quoting someone else.

        • (Score: 2) by bzipitidoo (4388) on Saturday November 11 2017, @01:39AM (#595435) (6 children)

          Dynamically convert? Type checking at run time? Of course not. The info about the parameters should have been in an easier-to-parse format than C source code. That info is not that complicated. Ambiguous items, such as the order of members in a structure, should have been specified exactly. As it is, parsing those C header files takes pretty much an entire C compiler; you can't get by with even a subset of C. They love using macros in header files. It's SOP to bracket the entire header in a macro to prevent duplicate definitions in case the header is included more than once. Even though the repeated definitions would be exactly the same, because they come from the same header file, you still have to wrap it in a macro. It's a rotten way to handle that problem, but that's how the C libraries do it, and we've been stuck with that method for decades.
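          For readers who haven't seen it, this is the include-guard idiom. The sketch pastes a hypothetical `point.h` into one file twice, as if two different headers had both included it; the second copy is skipped because `POINT_H` is already defined. Many compilers also accept the non-standard `#pragma once` for the same purpose.

          ```c
          #include <stdio.h>

          /* ---- first copy of a hypothetical header "point.h" ---- */
          #ifndef POINT_H
          #define POINT_H
          struct point { int x, y; };
          #endif /* POINT_H */

          /* ---- the same header pulled in again, e.g. via another header ---- */
          #ifndef POINT_H      /* already defined, so everything up to #endif is skipped */
          #define POINT_H
          struct point { int x, y; };   /* without the guard: a redefinition error */
          #endif /* POINT_H */

          int main(void)
          {
              struct point p = { 3, 4 };
              printf("%d %d\n", p.x, p.y);
              return 0;
          }
          ```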

          Then there's the problem of duplicate function names in different libraries, which for years was handled by an unwritten rule that library writers should prefix all their function names with the name of the library. C++ namespaces address this issue. Of course, the oldest libraries, such as stdlib, stdio, and math, do not follow that custom. They were created before name collision became a serious issue.

          Another mistake was the handling of variable numbers of parameters, the so-called variadic functions. C doesn't do that, but it was wanted anyway, and the designers cobbled on an ugly extension so it could be done. They used it immediately in the venerable stdio functions printf and friends.
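          For comparison, this is how a variadic function is written with `<stdarg.h>`, which was standardized in C89 (ANSI C). `sum_ints` is a hypothetical example. Note that the callee cannot find out how many arguments it received or what their types are, so the caller must supply a count, much as printf relies on its format string.

          ```c
          #include <stdarg.h>
          #include <stdio.h>

          /* Sums 'count' int arguments. The callee trusts the caller's count;
             nothing checks that it matches what was actually passed. */
          static int sum_ints(int count, ...)
          {
              va_list ap;
              int total = 0;

              va_start(ap, count);
              for (int i = 0; i < count; i++)
                  total += va_arg(ap, int);
              va_end(ap);
              return total;
          }

          int main(void)
          {
              printf("%d\n", sum_ints(4, 1, 2, 3, 4));   /* prints 10 */
              return 0;
          }
          ```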

          Most languages dump the problem of calling C library functions on the language users. Calling between languages, such as calling a C routine from a FORTRAN or Perl program, has always been far too hard. Some compensate by trying to provide their own native libraries for everything; Java is an extreme example of that. There are also utilities to convert Pascal and FORTRAN source code to C. The last time I tried to link a Perl 5 program to a C library, I gave up on that approach. I tried the direct approach, then SWIG. SWIG is one of those code generators that spews out an astonishing amount of source code. Like, 100K of source code just to make fewer than 6 library calls?? Instead, I made a wrapper in C with the library functions compiled into it, which received parameter data and sent back results through a socket. The Perl program called on the OS to launch the C program in a separate process.

          I understand Python and Perl 6 have conceded on this: they can call C library functions easily, with the means to do so built in (ctypes in Python's standard library, NativeCall in Perl 6), relieving programmers of that burden.

          • (Score: 0) by Anonymous Coward on Saturday November 11 2017, @06:15AM (#595511) (5 children)

            Usually when people complain that a binary library does not describe the data structure layout or parameters of functions, the issue is compatibility between library versions.

            I now see that you seem to be complaining that you'd rather not parse C code at all. This is a tough issue for me to be sympathetic to, since I love C. More C please! I'll try though...

            The info you want is in fact provided within every normal compiled library.

            For platforms like Linux, using the ELF object format and a SysV ABI, the libraries can be in *.so form (shared) or in *.a form (not shared). Either way, they can carry DWARF debug information, although many distributions strip it from the installed libraries and ship it in separate debug packages. There are libraries that can parse DWARF; your favorite language will almost certainly have bindings for one. There are tools to dump out DWARF data in a human-readable form.

            On the Windows platform, normal libraries are in some sort of PE-COFF format. You get *.dll files for runtime use, *.lib files for linking both shared and static binaries, and *.pdb files for debug info. The info you want is in the *.pdb files at least; it might be elsewhere too. Again, there are libraries and tools to deal with *.pdb files.

            Your language interpreter probably ought to provide built-in functionality to handle this stuff, making it possible for you to simply call into a normal library.

            • (Score: 2) by bzipitidoo (4388) on Saturday November 11 2017, @12:50PM (#595567) (4 children)

              I had forgotten about DWARF. Checking, I read DWARF5 was released this year.

              Yes, for library info, I would prefer a small subset of C, or, since C perhaps isn't the best tool for that job, another simpler language entirely, like what this DWARF sounds like. Why should programmers using other languages also have to use C? It spoils the point of using a "better" language than C if you still have to use C. Why leave all other languages somewhat less than complete by making them unable to call library functions? Alternatively, why are library functions not more universal, easily called from any program? Language designers dropped the ball on this matter. One of the heaviest workarounds for this issue is to launch separate processes, rather like a shell script does when it hooks together a bunch of utilities with pipes.

              It's sweet that you love C, but don't let that blind you to its many shortcomings. To wit, if C is so great, why do we have Makefiles? Why are those a language of their own, rather than more C?

              A shortcoming of most programming languages is their abysmal handling of declarations of complicated data structures, giving rise to abominations such as XML and the slightly better YAML (a superset of JSON). C++ has stepped up a little on this matter, thanks to brace initialization and std::initializer_list, added in C++11, and relaxed aggregate initialization rules in C++14. But it's still crappy. It's not just C; it's most programming languages. Even Python stinks at this. I'll give an example. Suppose you want in your data the 2- and 1-letter chemical element abbreviations. You could do an array of strings: {"H", "He", "Li", "Be", "B", ... } Think that looks pretty good? It doesn't! Why couldn't it be initialized like this: "H He Li Be B"? In JavaScript, it can be done that way by appending .split(" ") to that string. Not ideal, but much better than having to flood the source code with dozens of quote marks and commas, all because programming languages don't do decent data serialization natively, with their own syntax. Nooo, they choke on their own dogfood and push programmers to do it programmatically. Got to call a function, maybe even a YAML library function, passing it a string or even a file name. Then they screw up the programmatic way by making everything a variable; it can't stay constant. Why is it that in C, something like #define SIZE 100 was preferred to const int SIZE=100 for array sizes?
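              C has no built-in equivalent of JavaScript's `.split(" ")`; the closest standard tool is `strtok()`. A minimal sketch with a hypothetical `split_symbols` helper. Because `strtok()` writes into the string it scans (and is not reentrant), the read-only literal is copied into a writable buffer first. The plain `{"H", "He", ...}` array remains the usual C approach precisely because it needs no runtime work.

              ```c
              #include <stdio.h>
              #include <string.h>

              #define MAX_SYMBOLS 16

              /* Hypothetical helper: splits a space-separated list into at most 'max'
                 tokens stored in 'out', using 'buf' as writable scratch space.
                 Returns the number of tokens found. */
              static size_t split_symbols(const char *list, char *buf, size_t bufsize,
                                          char *out[], size_t max)
              {
                  size_t n = 0;

                  snprintf(buf, bufsize, "%s", list);   /* truncates if buf is too small */
                  for (char *tok = strtok(buf, " "); tok != NULL && n < max;
                       tok = strtok(NULL, " "))
                      out[n++] = tok;
                  return n;
              }

              int main(void)
              {
                  char buf[64];
                  char *symbols[MAX_SYMBOLS];
                  size_t n = split_symbols("H He Li Be B", buf, sizeof buf,
                                           symbols, MAX_SYMBOLS);

                  for (size_t i = 0; i < n; i++)
                      printf("%s\n", symbols[i]);
                  return 0;
              }
              ```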

              C is also notorious for its awful pointer syntax. One of the big selling points of Java was dumping those asterisks, bragging that there aren't any pointers in Java. Might as well program in Perl if you have to have asterisks in front of most of your variable names. C++ added the ampersand syntax to function parameters, and that helps some. How about the thrilling syntax for function pointers? And then, function name mangling! They had to support polymorphism somehow, but ouch is that ugly. That sort of thing makes connecting to a library a nightmare.
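              To illustrate the function-pointer complaint: the two declarations below describe the same hypothetical `install` function (shaped like the standard `signal()`), once in raw C syntax and once through a typedef. Because the types are identical, the compiler accepts the second as a redeclaration of the first.

              ```c
              #include <stdio.h>

              /* Raw C syntax: 'install' takes an int and a pointer to a function
                 (int) -> void, and returns a pointer of that same type. */
              void (*install(int slot, void (*handler)(int)))(int);

              /* The same declaration through a typedef; compatible redeclaration. */
              typedef void (*handler_fn)(int);
              handler_fn install(int slot, handler_fn handler);

              static handler_fn current;          /* hypothetical single handler slot */

              handler_fn install(int slot, handler_fn handler)
              {
                  handler_fn previous = current;
                  (void)slot;                     /* unused in this sketch */
                  current = handler;
                  return previous;
              }

              static int last_value;
              static void record(int v) { last_value = v; }

              int main(void)
              {
                  handler_fn previous = install(0, record);
                  current(42);
                  printf("had previous handler: %s, last_value = %d\n",
                         previous != NULL ? "yes" : "no", last_value);
                  return 0;
              }
              ```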

              • (Score: 0) by Anonymous Coward on Saturday November 11 2017, @08:31PM (#595722) (3 children)

                People who prefer non-C being forced to do a bit of C is nothing compared to the trouble they cause for people using C and every other language. Making use of non-C code across languages is a complete disaster. Your suffering is insufficient punishment for the suffering you inflict upon others. Example: my C program might like to call code written in Java, and some other code written in Python. This is nearly impossible. Even something much less crazy, like C++, is really difficult. It is you who is causing trouble.

                The need for multiple languages is reasonable. We use "make" to avoid fussing with details, but that greatly limits control over what exactly is happening. I write processor emulators, sometimes with JIT and/or a hypervisor. I also write stuff resembling boot loaders and OS kernels. The amount of control I need to do this stuff is extreme; sometimes C is too high-level. It's hard to imagine a language that I could use for this work that wouldn't be awful for a build system.

                That said, there was a time I did write part of a build system in C. The "make" language isn't very good at handling symlinks. There are 3 timestamps on the symlink, 3 timestamps on the target, and of course the content of each. The "make" program wouldn't look at the right stuff. Builds could be inaccurate and take 8 hours, or they could be accurate and take 30 hours. Switching to C cut the build times down to a few hours.
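                The symlink problem is easy to see from C: a build tool has to choose between the POSIX functions `stat()`, which follows a link to its target, and `lstat()`, which describes the link itself. Each returns its own access, modification, and status-change times, which is where the "3 timestamps on the symlink, 3 on the target" come from. A minimal POSIX-only sketch with a hypothetical `describe()` helper:

                ```c
                #define _POSIX_C_SOURCE 200809L
                #include <stdio.h>
                #include <sys/stat.h>
                #include <unistd.h>

                /* Returns 1 if 'path' is a symlink (printing link and target mtimes),
                   0 if it is not a symlink, -1 if it cannot be examined. */
                static int describe(const char *path)
                {
                    struct stat link_st, target_st;

                    if (lstat(path, &link_st) != 0)
                        return -1;
                    if (!S_ISLNK(link_st.st_mode))
                        return 0;

                    printf("%s: link mtime %lld\n", path, (long long)link_st.st_mtime);
                    if (stat(path, &target_st) == 0)
                        printf("%s: target mtime %lld\n", path, (long long)target_st.st_mtime);
                    else
                        printf("%s: dangling link\n", path);
                    return 1;
                }

                int main(int argc, char **argv)
                {
                    for (int i = 1; i < argc; i++)
                        if (describe(argv[i]) < 0)
                            perror(argv[i]);
                    return 0;
                }
                ```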

                C99 added decent initializers for structs and arrays. That's 18 years ago, or about 5 for Visual Studio. It sounds like you want more though; you want to make up a language on the fly. That is sort of possible in LISP and Scheme (still with quotation limitations) but it isn't all that reasonable of a request. You're making code hard to parse by humans when you do this; your non-standard little language is a source of confusion. You're also creating more of that problem you complained about in C header files: you can't make a tool to parse things without implementing almost the whole language. If you really want a mini-language, you can of course have it: gperf, bison or yacc, flex or lex, midl or pidl for idl, snmp MIB stuff, corba stuff, sunrpc stuff, custom makefile hacks with sed and awk... but of course the cost is that your code is no longer trivial to parse and no longer trivial for all to understand. Your example with JavaScript's .split(" ") suffers from this problem; if I were to try to parse that in a generic way then I'd need to implement all of JavaScript and actually run the program, which hopefully would halt!

                Normally it is best to put a literal 100 in the array definition, then use the sizeof operator (total divided by first member) to get the array size where needed.
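                A sketch of that idiom. `ARRAY_LEN` is a name chosen for this sketch, not a standard macro. The usual caveat is that it only works on an actual array, not on a pointer such as an array parameter inside a function.

                ```c
                #include <stdio.h>

                /* Element count of a true array: total size divided by the first member's size. */
                #define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

                static int table[100];

                /* Inside this function 'v' is only a pointer, so ARRAY_LEN(v) would be
                   wrong here; the caller passes the length computed from the real array. */
                static long sum(const int *v, size_t n)
                {
                    long total = 0;
                    for (size_t i = 0; i < n; i++)
                        total += v[i];
                    return total;
                }

                int main(void)
                {
                    for (size_t i = 0; i < ARRAY_LEN(table); i++)
                        table[i] = (int)i;
                    printf("%zu elements, sum %ld\n", ARRAY_LEN(table),
                           sum(table, ARRAY_LEN(table)));   /* 100 elements, sum 4950 */
                    return 0;
                }
                ```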

                Putting "const" in any language is hugely problematic. Consider the strstr function. It may return something that is const, or not, depending on what it is passed. There is no limit to how complicated this can get. Imagine a function that normally returns a pointer to a string that is passed into it. If the string content is "Whew!", though, a pointer to a constant string "Yow!" is returned instead. There is just no reasonable way to express the conditions under which the function would return a const value. Oh, let's make it even worse: the string "Whew!" is actually configurable.
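                A small C illustration of both points. `strstr()` takes a `const char *` but returns a plain `char *`, so it quietly drops the qualifier; `whew()` is the hypothetical function described above, which has to return `const char *` in every case, so callers lose write access even when they passed a writable buffer. C23 makes `strstr()` and a few similar search functions preserve the qualifier of their argument, which handles the simple case but not a `whew()`-style function.

                ```c
                #include <stdio.h>
                #include <string.h>

                /* Returns its argument, unless the argument is "Whew!", in which case
                   it returns a string literal. The const return type must cover both. */
                static const char *whew(const char *s)
                {
                    return strcmp(s, "Whew!") == 0 ? "Yow!" : s;
                }

                int main(void)
                {
                    char buf[] = "hello";

                    /* Result points into buf and is plain char *, no cast needed. */
                    char *p = strstr(buf, "ll");
                    if (p != NULL)
                        *p = 'L';                  /* fine: buf itself is writable */

                    const char *q = whew(buf);     /* const, though it points into buf */
                    printf("%s %s %s\n", buf, q, whew("Whew!"));   /* heLlo heLlo Yow! */
                    return 0;
                }
                ```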

                C pointer syntax is almost good. People have trouble because the language will sometimes implicitly grant you an "&" operator. For example, the "&" is optional when taking the address of a function or an array. This screws up people's understanding of the language. It would also have been nice to have the "->" operator defined to dereference as many pointers as needed (for example 0 or 4) rather than always exactly 1. That actually looks compatible; the "->" operator could be made more capable in a future version of the language.
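                A compilable sketch of the implicit "&" cases and the one-level "->". The names `twice` and `struct point` are illustrative. Strictly speaking, an array and `&array` differ in type (pointer to first element versus pointer to the whole array) but share an address.

                ```c
                #include <stdio.h>

                struct point { int x, y; };

                static int twice(int v) { return 2 * v; }

                int main(void)
                {
                    /* A function designator converts to a pointer to the function,
                       so 'twice' and '&twice' yield the same pointer. */
                    int (*f1)(int) = twice;
                    int (*f2)(int) = &twice;

                    /* An array converts to a pointer to its first element. '&arr' has a
                       different type (pointer to an array of 4 int) but the same address. */
                    int arr[4] = { 1, 2, 3, 4 };
                    int *first = arr;                /* same as &arr[0] */
                    int (*whole)[4] = &arr;

                    /* '->' dereferences exactly one level, so a pointer to a pointer
                       needs an explicit '*' for the outer level. */
                    struct point pt = { 6, -13 };
                    struct point *pp = &pt;
                    struct point **ppp = &pp;

                    printf("%d %d %d %d\n", f1 == f2, (void *)first == (void *)whole,
                           pp->x, (*ppp)->y);        /* prints: 1 1 6 -13 */
                    return 0;
                }
                ```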

                The syntax C++ added for function parameters is awful. Maybe you save a few keystrokes, but the result is non-obvious code. It isn't obvious at all points that the variable is really implemented as a pointer. It isn't obvious at the call site that something passed as a parameter could be modified.

                They did not have to support polymorphism. That too is a disaster. It is no longer obvious what code is even being called. An important part of maintaining software is keeping things understandable, particularly if you don't want hidden slowness to creep in all over the place.

                The code I write at work is in plain C. Linux itself is in C. This works fine.

                • (Score: 2) by bzipitidoo (4388) on Sunday November 12 2017, @07:16AM (#595854) (2 children)

                  Just a quick note. I'll reply again when I have more time.

                  The JavaScript example with .split is a hackish workaround to achieve data serialization. It ought to be possible to do that natively. It takes such a tiny amount of additional syntax to support the parsing of a literal into a hierarchical data structure, it's just sad that there's such poor support for that. Many languages have a "dump" function to print out complicated data structures, why not at least an "undump" or "dedump" function to do the reverse?

                  As another example, why can't we have a class-- or, let's stick to C and talk struct instead of class-- for points in 2D (or 3D or more) and be able to assign values to those points with the following: struct Point p1=6,-13; No, it has to be p1.x=6; p1.y=-13; Or we might make a function set(&p1,6,-13), but that's ugly.

                  • (Score: 0) by Anonymous Coward on Sunday November 12 2017, @03:53PM (#595915) (1 child)

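                    // Initializing without naming the members (valid since C89):
                    struct Point p1 = {6, -13};

                    // Initializing with named members (C99 designated initializers):
                    struct Point p2 = {.x = 6, .y = -13};

                    // Assigning after the declaration uses a C99 compound literal:
                    p1 = (struct Point){.x = 6, .y = -13};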

                    Naming the members lets you skip ones that don't matter; they will be zeroed. If done everywhere, it lets you reorder the members in a header file without having to fix all the initializers. This is valuable because it lets you lay out the struct to keep members that are used often or together in the same cache lines. Performance is better when the "hot" members are in just a few of the CPU's cache lines.

                    There is similar syntax for arrays too, and it all works together recursively in two different styles. You can do stuff like [17].foo.bar[6].baz = 42 to initialize or you can repeat the cast-like part of the syntax for each level. You can also freely mix it with the no-name style, in which case you can just add curly brackets where desired.
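                    A compilable sketch of that nested designator syntax. The member names `foo`, `bar`, and `baz` come from the example above; the struct types and array sizes are assumptions made up for illustration.

                    ```c
                    #include <stdio.h>

                    struct inner { int baz; int qux; };
                    struct mid   { struct inner bar[8]; };
                    struct outer { struct mid foo; int id; };

                    /* Designators chain through array indexes and members; every
                       element or member not mentioned is zero-initialized. */
                    static struct outer table[20] = {
                        [17].foo.bar[6].baz = 42,
                        [17].id = 7,
                        [3] = { .id = 1, .foo = { .bar = { [0] = { .qux = 5 } } } },
                    };

                    int main(void)
                    {
                        printf("%d %d %d %d\n",
                               table[17].foo.bar[6].baz,   /* 42 */
                               table[17].id,               /* 7 */
                               table[3].foo.bar[0].qux,    /* 5 */
                               table[0].id);               /* 0: never mentioned */
                        return 0;
                    }
                    ```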

                    • (Score: 2) by bzipitidoo (4388) on Tuesday November 14 2017, @03:23AM (#596641)

                      > sometimes C is too high-level

                      Yeah, I don't like having to hope that code optimization makes up for C's shortcomings there. For instance, a combined multiply-and-add operation has become popular, because the add can be done for free as part of the work done to perform a multiplication. There's no way I know of to call for that in C source code. Or how about a comparison in which one of 3 branches is taken, depending on whether the result was less than, equal to, or greater than? C can't explicitly code that either. Then there's the whole world of parallel and distributed computing. Semaphores? Atomic test-and-set? Nope. But that's maybe unfair to C, as it was never meant to handle that, and it wasn't until the 486 that Intel's lame x86 architecture finally became less lame by adding that.
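                      Standard C does cover part of this list: C99's `<math.h>` declares `fma()`, which computes x*y+z with a single rounding, and C11 added `<stdatomic.h>` (optional in the standard, but provided by GCC and Clang) with an atomic test-and-set. A minimal single-threaded sketch of the latter, using a hypothetical spinlock around a counter, plus the usual branch-free three-way comparison idiom. Whether that comparison compiles to a single compare instruction is up to the optimizer.

                      ```c
                      #include <stdatomic.h>
                      #include <stdio.h>

                      /* C11 atomic_flag: guaranteed lock-free, with test-and-set and clear. */
                      static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
                      static int counter;

                      static void locked_increment(void)
                      {
                          while (atomic_flag_test_and_set(&counter_lock))
                              ;                         /* spin until the holder clears it */
                          counter++;                    /* critical section */
                          atomic_flag_clear(&counter_lock);
                      }

                      /* Three-way comparison: returns -1, 0 or 1 without an explicit
                         branch in the source; the generated code depends on the compiler. */
                      static int cmp3(int a, int b)
                      {
                          return (a > b) - (a < b);
                      }

                      int main(void)
                      {
                          locked_increment();
                          printf("counter=%d cmp3(1,2)=%d cmp3(2,2)=%d cmp3(3,2)=%d\n",
                                 counter, cmp3(1, 2), cmp3(2, 2), cmp3(3, 2));
                          return 0;
                      }
                      ```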

                      > Normally it is best to put a literal 100 in the array definition

                      No, I disagree. A literal 100 is fine for a small program, with only a few hundred lines of code in one file, no separate header file. But even that is a judgment call. For bigger projects, you should at least use a #define. The problem is that you may end up using the same value for 2 unrelated things, and the bigger the program is, the more likely that happens. If you need to change one of those literal quantities without changing the others, you have to search through the source code and check each one to determine if it is the correct one.
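                      A middle ground some C programmers use is an enum: a named constant like a #define, but handled by the compiler and obeying normal scope rules. A sketch with hypothetical limit names; it also bears on the earlier `#define SIZE 100` versus `const int SIZE=100` question.

                      ```c
                      #include <stdio.h>

                      /* Two unrelated limits that happen to share a value get separate
                         names, so either can change without auditing every literal 100. */
                      enum {
                          MAX_USERS    = 100,
                          NAME_LEN_MAX = 100
                      };

                      /* An enumeration constant is an integer constant expression, so it
                         can size a file-scope array. A 'static const int' object is not a
                         constant expression in C (unlike C++), which is one reason #define
                         and enum were preferred for array sizes. */
                      static char names[MAX_USERS][NAME_LEN_MAX];

                      int main(void)
                      {
                          printf("%zu bytes\n", sizeof names);   /* 10000 bytes */
                          return 0;
                      }
                      ```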

                      > but it isn't all that reasonable of a request. You're making code hard to parse by humans when you do this; your non-standard little language is a source of confusion. You're also creating more of that problem you complained about in C header files: you can't make a tool to parse things without implementing almost the whole language.

                      Not at all. Trees are quite easy to handle, for both people and computers. All that's needed is to reserve 2 symbols to serve as open bracket and close bracket. Probably also want a comma-- LISP shows how dense it can get with just brackets. For human readability, it would also be good to have some sort of whitespace. No need for a complete language to parse that.
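                      A sketch of how little it takes to read such a literal in C. The grammar is hypothetical (square brackets, commas, whitespace, and bare atoms), and to stay short the parser only counts leaves and nesting depth instead of building a tree.

                      ```c
                      #include <ctype.h>
                      #include <stdio.h>

                      /* Hypothetical grammar for a bracketed tree literal:
                           node := '[' [ node { ',' node } ] ']' | atom
                           atom := one or more chars other than '[', ']', ',' and whitespace */
                      struct stats { int leaves; int max_depth; };

                      static const char *skip_ws(const char *p)
                      {
                          while (isspace((unsigned char)*p))
                              p++;
                          return p;
                      }

                      /* Parses one node at 'p'; returns the position just after it,
                         or NULL on a syntax error. */
                      static const char *parse_node(const char *p, int depth, struct stats *st)
                      {
                          p = skip_ws(p);
                          if (*p == '[') {
                              if (depth + 1 > st->max_depth)
                                  st->max_depth = depth + 1;
                              p = skip_ws(p + 1);
                              if (*p == ']')
                                  return p + 1;                  /* empty list */
                              for (;;) {
                                  p = parse_node(p, depth + 1, st);
                                  if (p == NULL)
                                      return NULL;
                                  p = skip_ws(p);
                                  if (*p == ',')
                                      p++;                       /* another child follows */
                                  else if (*p == ']')
                                      return p + 1;              /* end of this list */
                                  else
                                      return NULL;               /* expected ',' or ']' */
                              }
                          }
                          const char *start = p;
                          while (*p != '\0' && *p != '[' && *p != ']' && *p != ',' &&
                                 !isspace((unsigned char)*p))
                              p++;
                          if (p == start)
                              return NULL;                       /* missing atom */
                          st->leaves++;
                          return p;
                      }

                      /* Returns 0 if 'text' is exactly one well-formed node, -1 otherwise. */
                      static int parse_tree(const char *text, struct stats *st)
                      {
                          st->leaves = 0;
                          st->max_depth = 0;
                          const char *end = parse_node(text, 0, st);
                          return (end != NULL && *skip_ws(end) == '\0') ? 0 : -1;
                      }

                      int main(void)
                      {
                          struct stats st;
                          if (parse_tree("[H, He, [Li, Be], B]", &st) == 0)
                              printf("%d leaves, depth %d\n", st.leaves, st.max_depth);
                          return 0;
                      }
                      ```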

                      > // Modern C syntax to initialize

                      Those are nice improvements. But if it was possible to easily express a tree in a literal...

        • (Score: 0) by Anonymous Coward on Saturday November 11 2017, @10:43AM (#595549)

          > Suppose a struct layout changes. Now what? Are you proposing to parse something, then dynamically convert between the layout used by the library and the layout used by the program?

          A number of Microsoft APIs use versioning of structs. Given their market share versus yours, you must reflect long and hard before presuming to know better. ;)
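          A minimal sketch of size-based struct versioning in that style: the caller stores the struct's size in its first member, and the library reads only the fields that fit inside that size. The `lib_options` struct and `lib_configure` function are hypothetical, not a real Win32 API. For simplicity, the old caller is simulated with the full struct and a smaller size value; a real version-1 binary would pass a genuinely smaller struct.

          ```c
          #include <stddef.h>
          #include <stdio.h>

          /* Hypothetical versioned struct: fields are only ever appended,
             never reordered or removed. */
          struct lib_options {
              size_t size;        /* set by the caller to sizeof its struct version */
              int    verbose;     /* present since version 1 */
              int    retries;     /* appended in version 2 */
          };

          /* Size of the version-1 layout: everything up to and including 'verbose'. */
          #define LIB_OPTIONS_V1_SIZE (offsetof(struct lib_options, verbose) + sizeof(int))

          /* Library side: reads only the fields the caller's version contains and
             fills in defaults for the rest. Returns 0 on success, -1 if too small. */
          static int lib_configure(const struct lib_options *in, struct lib_options *out)
          {
              if (in->size < LIB_OPTIONS_V1_SIZE)
                  return -1;
              out->size = sizeof *out;
              out->verbose = in->verbose;
              out->retries = 3;                              /* default for v1 callers */
              if (in->size >= offsetof(struct lib_options, retries) + sizeof in->retries)
                  out->retries = in->retries;
              return 0;
          }

          int main(void)
          {
              /* A v1 caller knows nothing about 'retries'; 99 stands in for garbage. */
              struct lib_options old_caller = { LIB_OPTIONS_V1_SIZE, 1, 99 };
              struct lib_options new_caller = { sizeof(struct lib_options), 0, 5 };
              struct lib_options a, b;

              lib_configure(&old_caller, &a);
              lib_configure(&new_caller, &b);
              printf("old caller: retries=%d, new caller: retries=%d\n",
                     a.retries, b.retries);                  /* 3 and 5 */
              return 0;
          }
          ```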

    • (Score: 0) by Anonymous Coward on Friday November 10 2017, @09:40PM (#595356)

      > ABI stability is possible with C++. But it's indeed very hard.

      It's so hard that even C++ language implementers routinely fail to achieve it in their standard libraries. I agree that it is possible, however. The best way to have binary stability with a C++ library is to give it a pure C-compatible interface and not expose any C++ features across the library boundary. Also avoid relying on any C++ features that cause binary compatibility problems in practice, such as C++ exceptions.
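      The usual shape of such a C-compatible boundary is an opaque handle: the header exposes only an incomplete struct type and functions, so the layout can change without affecting callers. A single-file sketch with a hypothetical `counter` API; in a C++ library, the function bodies would be C++ code compiled with `extern "C"` linkage, and no exceptions would cross the boundary.

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      /* ---- public header (hypothetical "counter.h"): layout is not visible ---- */
      typedef struct counter counter;
      counter *counter_new(int start);
      void     counter_add(counter *c, int delta);
      int      counter_value(const counter *c);
      void     counter_free(counter *c);

      /* ---- library implementation: private layout, free to change later ---- */
      struct counter {
          int value;
      };

      counter *counter_new(int start)
      {
          counter *c = malloc(sizeof *c);
          if (c != NULL)
              c->value = start;
          return c;
      }

      void counter_add(counter *c, int delta) { c->value += delta; }
      int  counter_value(const counter *c)    { return c->value; }
      void counter_free(counter *c)           { free(c); }

      int main(void)
      {
          counter *c = counter_new(10);
          if (c == NULL)
              return 1;
          counter_add(c, 5);
          printf("%d\n", counter_value(c));   /* prints 15 */
          counter_free(c);
          return 0;
      }
      ```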

      > It's also hard with C, though. Having to add padding members and space to your structs so that you can add members later on.

      The only "hard" parts are identifying when the ABI actually changed (automated tools can help with this), and then possibly the ongoing maintenance that may be associated with early bad design decisions. This is rarely even particularly difficult, though obviously doing it is more work than not doing it. The C language itself does not define any particular binary interface but in practice every major platform has a de facto standard interface that is documented and implemented by every major toolchain targeting that platform.

      Most library authors and users consider "ABI stability" to mean just "binary compatible with previous versions". Specifically, if I update a library to a newer version, any program that uses the library and depends only on its documented interface should keep working without being recompiled.

      Spending some time up front to reduce future maintenance (like planning your interfaces to accommodate future expansion) may add to the programmer's work in the short term but usually reduces work in the long run. Even if you don't do that and your interfaces are all horrible, you can still almost always keep them stable without too much effort; in the worst case, this might mean adding a new function instead of changing an old one.
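      A sketch of that worst case, with a hypothetical `lib_connect` API: rather than changing the signature of a function existing binaries already call, version 2 adds a new entry point and turns the old one into a thin wrapper with its old behavior. The 30000 ms default is an assumption for illustration.

      ```c
      #include <stdio.h>

      /* Shipped in version 1; existing binaries call it, so it stays as-is. */
      int lib_connect(const char *host);

      /* Added in version 2 instead of changing lib_connect()'s signature. */
      int lib_connect_timeout(const char *host, int timeout_ms);

      static int last_timeout_ms;          /* recorded only for demonstration */

      int lib_connect_timeout(const char *host, int timeout_ms)
      {
          last_timeout_ms = timeout_ms;
          printf("connecting to %s (timeout %d ms)\n", host, timeout_ms);
          return 0;                        /* stand-in for real work */
      }

      /* The old function keeps its old behavior via the new one. */
      int lib_connect(const char *host)
      {
          return lib_connect_timeout(host, 30000);   /* hypothetical old default */
      }

      int main(void)
      {
          lib_connect("example.org");
          lib_connect_timeout("example.org", 500);
          return 0;
      }
      ```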

  • (Score: 3, Interesting) by The Mighty Buzzard (18) on Friday November 10 2017, @09:07PM (#595350)

    Rust's still in its infancy, but it can already produce and make use of shared libs. It's in no way ready to produce a usable kernel (or, probably, libraries that need a stable ABI; I haven't checked), but you can produce anything else you like and end up with a much less bug-prone binary while still having fairly compact binaries/libs. The real downside is that most of its OSS libraries are even less developed than the language itself, if they exist at all.

    For a C coder who never, ever, ever writes or uses vulnerable code, Rust's pretty useless. I haven't met one of those yet though.

    --
    My rights don't end where your fear begins.
  • (Score: 2) by isj (5249) on Friday November 10 2017, @09:09PM (#595352) (1 child)

    > and now you can use C++, so why use C?

    I do that whenever I have a choice. But one of the semi-embedded projects I work on is based on a vendor framework that generates C code and wrappers. No extern "C" exists in any header file, and the framework has ownership of main(). So I have to stick with C.

    Even if I restarted that project I would use C because the framework and code generation is worth it.
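    For reference, this is the guard such vendor headers lack. Without it, a C++ caller can often still wrap the `#include` in its own `extern "C" { ... }` block, although that does nothing about the framework owning `main()`. A sketch with a hypothetical `vendor_init` function:

    ```c
    #include <stdio.h>

    /* ---- hypothetical vendor header ----
       A C++ compiler sees extern "C" and gives these declarations C linkage
       (no name mangling); a C compiler never sees those lines. */
    #ifdef __cplusplus
    extern "C" {
    #endif

    int vendor_init(int flags);

    #ifdef __cplusplus
    }
    #endif

    /* ---- hypothetical vendor implementation, compiled as C ---- */
    int vendor_init(int flags)
    {
        return flags == 0 ? 0 : -1;
    }

    int main(void)
    {
        printf("vendor_init(0) = %d\n", vendor_init(0));
        return 0;
    }
    ```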

    • (Score: -1, Redundant) by Anonymous Coward on Friday November 10 2017, @11:19PM (#595401)

      > and now you can use C++, so why use C?

      See here [soylentnews.org]