
SoylentNews is people

posted by Fnord666 on Friday November 10 2017, @06:23PM
from the C,-C-Rust,-C-Rust-Go,-Go-Rust-Go! dept.

In which ESR pontificates on the future while reflecting on the past.

I was thinking a couple of days ago about the new wave of systems languages now challenging C for its place at the top of the systems-programming heap – Go and Rust, in particular. I reached a startling realization – I have 35 years of experience in C. I write C code pretty much every week, but I can no longer remember when I last started a new project in C!
...
I started to program just a few years before the explosive spread of C swamped assembler and pretty much every other compiled language out of mainstream existence. I'd put that transition between about 1982 and 1985. Before that, there were multiple compiled languages vying for a working programmer's attention, with no clear leader among them; after, most of the minor ones were simply wiped out. The majors (FORTRAN, Pascal, COBOL) were either confined to legacy code, retreated to single-platform fortresses, or simply ran on inertia under increasing pressure from C around the edges of their domains.

Then it stayed that way for nearly thirty years. Yes, there was motion in applications programming; Java, Perl, Python, and various less successful contenders. Early on these affected what I did very little, in large part because their runtime overhead was too high for practicality on the hardware of the time. Then, of course, there was the lock-in effect of C's success; to link to any of the vast mass of pre-existing C you had to write new code in C (several scripting languages tried to break that barrier, but only Python would have significant success at it).

One to RTFA rather than summarize. Don't worry, this isn't just ESR writing about how great ESR is.



 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by bzipitidoo on Saturday November 11 2017, @01:39AM (6 children)

    by bzipitidoo (4388) on Saturday November 11 2017, @01:39AM (#595435) Journal

    Dynamically convert? Type checking at run time? Of course not. The info about the parameters should have been in an easier-to-parse format than C source code. That info is not that complicated. Ambiguous items such as the order of members in a structure should have been specified exactly. As it is, to parse those C header files, pretty much an entire C compiler is needed; you can't get by with even a subset of C. They love using macros in header files. It's SOP to bracket the entire header in a macro to prevent duplicate defines in case the header is compiled more than once. Even when the redefinitions are exactly the same, because they're from the same header file, you still have to wrap it in a macro. It's a rotten way to handle that problem, but that's how the C libraries do it and we've been stuck with that method for decades.
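
For readers who haven't met the idiom, here is a minimal sketch of the include guard described above. The header name point.h, the POINT_H macro, and the point_sum helper are assumptions made up for illustration; the header body is pasted inline twice to imitate a double #include.

```c
/* Hypothetical header "point.h", pasted inline twice to imitate a double #include. */
#ifndef POINT_H                 /* first pass: POINT_H is not defined yet */
#define POINT_H
struct Point { int x; int y; };
#endif

#ifndef POINT_H                 /* second pass: guard is defined, body is skipped */
#define POINT_H
struct Point { int x; int y; }; /* without the guard this second definition is
                                   typically rejected as a redefinition */
#endif

/* Illustration-only helper: compiles only because the struct was defined once. */
int point_sum(struct Point p) { return p.x + p.y; }
```

Every consumer of the header, human or tool, has to evaluate the preprocessor conditionals before it can even see the declarations, which is part of the complaint.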

    Then there's the problem of duplicate function names in different libraries, which for years was handled by an unwritten rule that library writers should prefix all their function names with the name of the library. The namespaces added in C++ address this issue. Of course, the oldest libraries such as stdlib, stdio, and math do not follow that custom. They were created before name collision became a serious issue.

    Another mistake was the handling of variable numbers of parameters, the so-called variadic functions. C doesn't do that, but it was wanted anyway and the designers cobbled on an ugly extension so it could be done. They used it immediately in the venerable stdio functions printf and friends.
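
As a rough sketch of what that extension looks like in use, the standard <stdarg.h> macros let a function walk its unnamed arguments. The function name sum_ints is hypothetical; the key point is that the callee has no way to discover the argument count or types on its own, so the caller must pass that information somehow (here, as the first parameter).

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
   Nothing checks that the caller really passed `count` ints. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);              /* begin after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);     /* the type is asserted, not verified */
    va_end(args);

    return total;
}
```

printf plays the same game, with the format string standing in for the count and the types.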

    Most languages dump the problem of calling C library functions on the language users. Trying to call between languages, like, calling a C routine from a FORTRAN or Perl program, has always been way too hard to do. Some compensate by trying to provide their own native libraries for everything. Java is an extreme example of that. There are also utilities to convert Pascal and FORTRAN source code to C. Last time I tried to link a Perl 5 program to a C library, I gave up on that approach. I tried the direct approach, then SWIG. SWIG is one of those code generators that spews out an astonishing amount of source code. Like, 100K of source code, just to make less than 6 library calls?? I instead made a wrapper in C with the library functions compiled into it, to receive and send parameter data and results through a socket. The Perl program called on the OS to launch the C program in a separate process.

    I understand Python and Perl 6 have conceded on this and can easily call C library functions, have the means to do so built into the language, relieving programmers of that burden.

  • (Score: 0) by Anonymous Coward on Saturday November 11 2017, @06:15AM (5 children)

    by Anonymous Coward on Saturday November 11 2017, @06:15AM (#595511)

    Usually when people complain that a binary library does not describe the data structure layout or parameters of functions, the issue is compatibility between library versions.

    I now see that you seem to be complaining that you'd rather not parse C code at all. This is a tough issue for me to be sympathetic to, since I love C. More C please! I'll try though...

    The info you want is in fact provided within every normal compiled library.

    For platforms like Linux, using the ELF object format and a SysV ABI, the libraries can be in *.so form (shared) or in *.a form (not shared). Either way, the files installed by all normal systems will contain DWARF3 debug information. There are libraries that can parse this; your favorite language will almost certainly have bindings for one. There are tools to dump out DWARF3 data in a human-readable form.

    On the Windows platform, normal libraries are in some sort of PE-COFF format. You get *.dll files for runtime use, *.lib files for linking both shared and static binaries, and *.pdb files for debug info. The info you want is in the *.pdb files at least; it might be elsewhere too. Again, there are libraries and tools to deal with *.pdb files.

    Your language interpreter probably ought to provide built-in functionality to handle this stuff, making it possible for you to simply call into a normal library.

    • (Score: 2) by bzipitidoo on Saturday November 11 2017, @12:50PM (4 children)

      by bzipitidoo (4388) on Saturday November 11 2017, @12:50PM (#595567) Journal

      I had forgotten about DWARF. Checking, I read DWARF5 was released this year.

      Yes, for library info, I would prefer a small subset of C, or, since C perhaps isn't the best tool for that job, another simpler language entirely, like what this DWARF sounds like. Why should programmers using other languages also have to use C? Spoils the point of using a "better" language than C if you still have to use C. Why make all other languages somewhat less than complete by not being able to call library functions? Alternatively, why are library functions not more universal, easily called from any program? Language designers dropped the ball on this matter. One of the heaviest workarounds of this issue is to launch separate processes, rather like a shell script does when it hooks together a bunch of utilities with pipes.

      It's sweet that you love C, but don't let that blind you to its many shortcomings. To wit, if C is so great, why do we have Makefiles? Why are those a language of their own, rather than more C?

      A shortcoming of most programming languages is their abysmal handling of declarations of complicated data structures, giving rise to abominations such as XML, and the slightly better YAML (a superset of JSON). C++ has stepped up a little on this matter, thanks to "aggregate initialization" added in C++11 and improved in C++14. But it's still crappy. It's not just C, it's most programming languages. Even Python stinks at this. I'll give an example. Suppose you want in your data the 2 and 1 letter chemical element abbreviations. You could do an array of strings: {"H", "He", "Li", "Be", "B", ... } Think that looks pretty good? It doesn't! Why couldn't it be initialized like this: "H He Li Be B" ? In JavaScript, it can be done that way by appending .split(" ") to that string. Not ideal, but much better than having to flood the source code with dozens of quote marks and commas, all because programming languages don't do decent data serialization natively, with their own syntax, nooo, they choke on their own dogfood and push programmers to do it programmatically. Got to call a function, maybe even a YAML library function, passing it a string or even a file name. Then they screw up the programmatic way by casting everything to variable status, can't let it stay constant. Why is it that in C, something like #define SIZE 100 was preferred to const int SIZE=100, for array sizes?
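
To make the complaint concrete, here is a hedged sketch of the "programmatic way" in C, roughly what JavaScript's .split(" ") does. The names split_spaces and MAX_SYMBOLS are made up for illustration. Note that strtok writes into its input, so the text has to live in a writable buffer rather than staying a constant literal.

```c
#include <string.h>

#define MAX_SYMBOLS 8   /* assumption: small fixed capacity for this sketch */

/* Splits `buf` in place on spaces and stores a pointer to each token in `out`.
   Returns the number of tokens stored (at most `max`). */
size_t split_spaces(char *buf, char *out[], size_t max)
{
    size_t n = 0;

    for (char *tok = strtok(buf, " ");
         tok != NULL && n < max;
         tok = strtok(NULL, " "))
        out[n++] = tok;

    return n;
}
```

A caller would declare char buf[] = "H He Li Be B"; and an array of MAX_SYMBOLS pointers, then call split_spaces(buf, symbols, MAX_SYMBOLS). The result only exists at run time, which is the shortcoming being described.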

      C is also notorious for its awful pointer syntax. One of the big selling points of Java was dumping those asterisks, bragging that there aren't any pointers in Java. Might as well program in Perl if you have to have asterisks in front of most of your variable names. C++ added the ampersand syntax to function parameters, and that helps some. How about the thrilling syntax for function pointers? And then, function name mangling! They had to support polymorphism somehow, but ouch is that ugly. That sort of thing makes connecting to a library a nightmare.
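
For reference, a small sketch of the function pointer syntax in question, with and without a typedef to tame it. All names here (apply, binary_op, pick, add, mul) are hypothetical.

```c
/* Raw declarator syntax: `op` is a pointer to a function taking two ints. */
int apply(int (*op)(int, int), int a, int b) { return op(a, b); }

/* A typedef hides most of the parentheses-and-asterisk noise. */
typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* Hypothetical selector: returns a function pointer chosen at run time. */
binary_op pick(char symbol) { return symbol == '*' ? mul : add; }
```

Without the typedef, the declaration of pick would read int (*pick(char symbol))(int, int), which is probably the kind of thing being complained about.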

      • (Score: 0) by Anonymous Coward on Saturday November 11 2017, @08:31PM (3 children)

        by Anonymous Coward on Saturday November 11 2017, @08:31PM (#595722)

        People who prefer non-C being forced to do a bit of C is nothing compared to the trouble they cause for people using C and every other language. Making use of non-C code across languages is a complete disaster. Your suffering is insufficient punishment for the suffering you inflict upon others. Example: my C program might like to call code written in Java, and some other code written in Python. This is nearly impossible. Even something much less crazy, like C++, is really difficult. It is you who is causing trouble.

        The need for multiple languages is reasonable. We use "make" to avoid fussing with details, but that greatly limits control over what exactly is happening. I write processor emulators, sometimes with JIT and/or a hypervisor. I also write stuff resembling boot loaders and OS kernels. The amount of control I need to do this stuff is extreme; sometimes C is too high-level. It's hard to imagine a language that I could use for this work that wouldn't be awful for a build system.

        That said, there was a time I did write part of a build system in C. The "make" language isn't very good at handling symlinks. There are 3 timestamps on the symlink, 3 timestamps on the target, and of course the content of each. The "make" program wouldn't look at the right stuff. Builds could be inaccurate and take 8 hours, or they could be accurate and take 30 hours. Switching to C cut the build times down to a few hours.

        C99 added decent initializers for structs and arrays. That's 18 years ago, or about 5 for Visual Studio. It sounds like you want more though; you want to make up a language on the fly. That is sort of possible in LISP and Scheme (still with quotation limitations) but it isn't all that reasonable of a request. You're making code hard to parse by humans when you do this; your non-standard little language is a source of confusion. You're also creating more of that problem you complained about in C header files: you can't make a tool to parse things without implementing almost the whole language. If you really want a mini-language, you can of course have it: gperf, bison or yacc, flex or lex, midl or pidl for idl, snmp MIB stuff, corba stuff, sunrpc stuff, custom makefile hacks with sed and awk... but of course the cost is that your code is no longer trivial to parse and no longer trivial for all to understand. Your example with JavaScript's .split(" ") suffers from this problem; if I were to try to parse that in a generic way then I'd need to implement all of JavaScript and actually run the program, which hopefully would halt!

        Normally it is best to put a literal 100 in the array definition, then use the sizeof operator (total divided by first member) to get the array size where needed.
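
A minimal sketch of that idiom, assuming a hypothetical samples array and an ARRAY_LEN macro (neither comes from the thread). It only works on a true array; once the array has decayed to a pointer, sizeof reports the pointer size instead.

```c
#include <stddef.h>

/* Element count of a true array: total bytes divided by bytes of the first element. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static int samples[100];    /* the literal 100 appears only here */

/* Hypothetical helper: fills every slot and returns how many were written. */
size_t fill_samples(int value)
{
    for (size_t i = 0; i < ARRAY_LEN(samples); i++)
        samples[i] = value;
    return ARRAY_LEN(samples);
}
```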

        Putting "const" in any language is hugely problematic. Consider the strstr function. It may return something that is const, or not, depending on what it is passed. There is no limit to how complicated this can get. Imagine a function that normally returns a pointer to a string that is passed into it. If the string content is "Whew!" though, a pointer to a constant string "Yow!" is instead returned. There is just no reasonable way to express the conditions upon which the function would return a const value. Oh, let's make it even worse. The string "Whew!" is actually configurable.
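
A short sketch of the strstr wrinkle, assuming the classic declaration char *strstr(const char *, const char *) from C17 and earlier. The wrapper name find_whew is made up.

```c
#include <string.h>

/* Hypothetical helper: finds "Whew" inside read-only text.
   The parameter is const, yet strstr hands back a plain char *. */
char *find_whew(const char *text)
{
    return strstr(text, "Whew");    /* the const-ness of `text` is silently dropped */
}
```

The caller now holds a writable pointer into memory it was only promised read access to, and the type system has nothing to say about it.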

        C pointer syntax is almost good. People have trouble because the language will sometimes implicitly grant you an "&" operator. For example, the "&" is optional when taking the address of a function or an array. This screws up people's understanding of the language. It would've also been nice to have the "->" operator defined to dereference as many pointers as needed, for example 0 or 4, rather than always exactly 1. That actually looks compatible; the "->" operator could be made more capable in a future version of the language.
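
Here is a small sketch of the implicit "&" behavior, using hypothetical names (twice, implicit_address_demo). One caveat it shows: for an array, the spellings with and without "&" give the same address but different pointer types.

```c
static int twice(int v) { return 2 * v; }

/* Returns 1 when the spellings with and without '&' agree as described. */
int implicit_address_demo(void)
{
    int data[4] = {1, 2, 3, 4};

    int (*f1)(int) = twice;     /* function name decays to a pointer */
    int (*f2)(int) = &twice;    /* explicit '&' yields the same pointer */

    int *first = data;          /* array name decays to &data[0] */
    int (*whole)[4] = &data;    /* '&data' is a pointer to the whole int[4] */

    return f1 == f2
        && first == &data[0]
        && (void *)whole == (void *)first   /* same address, different type */
        && f1(21) == (*f2)(21);             /* calling works with or without '*' */
}
```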

        The syntax C++ added for function parameters is awful. Maybe you save a few keystrokes, but the result is non-obvious code. It isn't obvious at all points that the variable is really implemented as a pointer. It isn't obvious at the call site that something passed as a parameter could be modified.

        They did not have to support polymorphism. That too is a disaster. It is no longer obvious what code is even being called. An important part of maintaining software is keeping things understandable, particularly if you don't want hidden slowness to creep in all over the place.

        The code I write at work is in plain C. Linux itself is in C. This works fine.

        • (Score: 2) by bzipitidoo on Sunday November 12 2017, @07:16AM (2 children)

          by bzipitidoo (4388) on Sunday November 12 2017, @07:16AM (#595854) Journal

          Just a quick note. I'll reply again when I have more time.

          The JavaScript example with .split is a hackish workaround to achieve data serialization. It ought to be possible to do that natively. It takes such a tiny amount of additional syntax to support the parsing of a literal into a hierarchical data structure, it's just sad that there's such poor support for that. Many languages have a "dump" function to print out complicated data structures, why not at least an "undump" or "dedump" function to do the reverse?

          As another example, why can't we have a class-- or, let's stick to C and talk struct instead of class-- for points in 2D (or 3D or more) and be able to assign values to those points with the following: struct Point p1=6,-13; No, it has to be p1.x=6; p1.y=-13; Or we might make a function set(&p1,6,-13), but that's ugly.

          • (Score: 0) by Anonymous Coward on Sunday November 12 2017, @03:53PM (1 child)

            by Anonymous Coward on Sunday November 12 2017, @03:53PM (#595915)

            // Modern C syntax to initialize, not naming the members:
            struct Point p1 = (struct Point){6,-13};

            // Modern C syntax to initialize, naming members:
            struct Point p1 = (struct Point){.x=6,.y=-13};

            Naming the members lets you skip ones that don't matter; they will be zeroed. If done everywhere, it lets you reorder the members in a header file without having to fix all the initializers. This is valuable because it lets you lay out the struct to keep members that are used often/together in the same cache lines. Performance is better when the "hot" members are in just a few of the CPU's cache lines.

            There is similar syntax for arrays too, and it all works together recursively in two different styles. You can do stuff like [17].foo.bar[6].baz = 42 to initialize or you can repeat the cast-like part of the syntax for each level. You can also freely mix it with the no-name style, in which case you can just add curly brackets where desired.
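
A sketch of that nested style, with made-up types (Inner, Foo, Outer) chosen only so that the designator from the paragraph above compiles. It also mixes in the per-level brace style for one element.

```c
struct Inner { int baz; };
struct Foo   { struct Inner bar[8]; };
struct Outer { struct Foo foo; int tag; };

/* Only the named elements are set; everything else is zero-initialized. */
static struct Outer table[32] = {
    [17].foo.bar[6].baz = 42,   /* one designator chain through every level */
    [3] = { .tag = 7 },         /* braces per level work too */
};

/* Illustration-only accessors. */
int table_baz(int i, int j) { return table[i].foo.bar[j].baz; }
int table_tag(int i)        { return table[i].tag; }
```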

            • (Score: 2) by bzipitidoo on Tuesday November 14 2017, @03:23AM

              by bzipitidoo (4388) on Tuesday November 14 2017, @03:23AM (#596641) Journal

              > sometimes C is too high-level

              Yeah, I don't like having to hope that code optimization makes up for C's lacks there. For instance, a combination multiply and add operation has become popular, because the add can be done for free, as part of the work done to perform a multiplication. There's no way I know of to call for that in C source code. Or, how about a comparison in which one of 3 branches is taken depending on whether the result was less than, equal, or greater than? C can't explicitly code that either. Then there's the whole world of parallel and distributed computing. Semaphores? Atomic test and set? Nope. But that's maybe unfair to C as it was never meant to handle that, and it wasn't until the 486 that Intel's lame x86 architecture finally became less lame by adding that.
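
For what it's worth, later C standards cover some of this ground: C99's <math.h> declares fma() for a fused multiply-add, and C11's <stdatomic.h> provides an atomic test-and-set. The sketch below shows the latter along with a common three-way comparison idiom. The names compare3, busy, try_acquire, and release are hypothetical, and whether any of it maps to a single machine instruction is still up to the compiler and target.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Three-way comparison: returns -1, 0, or +1. */
int compare3(int a, int b) { return (a > b) - (a < b); }

static atomic_flag busy = ATOMIC_FLAG_INIT;

/* Atomic test-and-set: returns true only for the caller that acquires the flag. */
bool try_acquire(void) { return !atomic_flag_test_and_set(&busy); }

/* Clears the flag so that a later try_acquire can succeed again. */
void release(void) { atomic_flag_clear(&busy); }
```

A caller can switch on the result of compare3 to get the three branches, though the language itself still only offers two-way tests.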

              > Normally it is best to put a literal 100 in the array definition

              No, I disagree. A literal 100 is fine for a small program, with only a few hundred lines of code in one file, no separate header file. But even that is a judgment call. For bigger projects, you should at least use a #define. The problem is that you may end up using the same value for 2 unrelated things, and the bigger the program is, the more likely that happens. If you need to change one of those literal quantities without changing the others, you have to search through the source code and check each one to determine if it is the correct one.
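
A tiny sketch of the point, with two invented constants (MAX_USERS, RETRY_LIMIT) that happen to share a value but mean unrelated things. Changing one is a one-line edit that cannot touch the other.

```c
#include <stddef.h>

#define MAX_USERS   100     /* hypothetical: capacity of the user table */
#define RETRY_LIMIT 100     /* hypothetical: unrelated, same value by coincidence */

static int user_ids[MAX_USERS];

/* Illustration-only helpers that each depend on exactly one of the constants. */
size_t user_capacity(void)    { return sizeof user_ids / sizeof user_ids[0]; }
int retries_left(int attempts) { return RETRY_LIMIT - attempts; }
```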

              > but it isn't all that reasonable of a request. You're making code hard to parse by humans when you do this; your non-standard little language is a source of confusion. You're also creating more of that problem you complained about in C header files: you can't make a tool to parse things without implementing almost the whole language.

              Not at all. Trees are quite easy to handle, for both people and computers. All that's needed is to reserve 2 symbols to serve as open bracket and close bracket. Probably also want a comma-- LISP shows how dense it can get with just brackets. For human readability, it would also be good to have some sort of whitespace. No need for a complete language to parse that.
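
As a rough illustration of how little machinery that takes, here is a sketch of a reader for an invented literal syntax: a tree is either a name or a parenthesized, comma-separated list of trees. The grammar, the TreeStats struct, and the function names are all assumptions for this example; it only counts leaves and nesting depth rather than building the tree.

```c
#include <ctype.h>
#include <stddef.h>

/* Invented grammar:  tree := name | '(' tree { ',' tree } ')'  */
struct TreeStats { int leaves; int max_depth; int ok; };

static const char *skip_ws(const char *s) { while (*s == ' ') s++; return s; }

/* Parses one tree starting at `s`; returns the position after it, or NULL on error. */
static const char *parse_tree(const char *s, int depth, struct TreeStats *st)
{
    s = skip_ws(s);
    if (*s == '(') {
        if (depth + 1 > st->max_depth)
            st->max_depth = depth + 1;
        do {
            s = parse_tree(s + 1, depth + 1, st);   /* step over '(' or ',' */
            if (s == NULL)
                return NULL;
            s = skip_ws(s);
        } while (*s == ',');
        return *s == ')' ? s + 1 : NULL;
    }
    if (!isalnum((unsigned char)*s))
        return NULL;                                /* a leaf needs one character */
    while (isalnum((unsigned char)*s))
        s++;
    st->leaves++;
    return s;
}

/* Parses the whole string; `ok` is 1 only if all of it was a single valid tree. */
struct TreeStats tree_stats(const char *text)
{
    struct TreeStats st = {0, 0, 0};
    const char *end = parse_tree(text, 0, &st);
    st.ok = (end != NULL && *skip_ws(end) == '\0');
    return st;
}
```

Calling tree_stats("(H, (He, Li), B)") walks the literal in one pass. Whether a compiler should accept such a literal natively is, of course, the actual disagreement.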

              > // Modern C syntax to initialize

              Those are nice improvements. But if it was possible to easily express a tree in a literal...