
SoylentNews is people

posted by Fnord666 on Thursday April 23 2020, @02:41PM
from the code-is-code dept.

"Good Code Documents Itself" and Other Hilarious Jokes You Shouldn't Tell Yourself:

I didn't notice this story when it appeared on Hackaday just over a year ago. I'm well aware this has all the hallmarks of devolving into an emacs vs vi battle. Yet the story does raise some interesting points about the different kinds of comments: some are worse than useless, while others can have great value. Without further ado, here's the introduction:

Code documentation — is there anything more exciting than spending your time writing extensive comments? If I had to guess, your answer is probably somewhere along the lines of "uhm, yes, everything is more exciting than that". Plus, requesting to document your code is almost like an insult to your well thought out design, this beautiful creation you implemented so carefully that it just has to be obvious what is happening. Writing about it is just redundant, the code is all you need.

As a result, no matter if it's some open source side project or professional software development, code documentation usually comes in two flavors: absent and useless. The dislike for documenting one's code seems universal among programmers of any field or language, no matter where in the world they are. And it's understandable: after all, you're in it for the coding, implementing all the fun stuff. If you wanted to tell stories, you would have chosen a different path in life.

This reluctance has even formed whole new paradigms and philosophies claiming how comments are actually harmful, and anyone trying to weasel their way out of it can now happily rehash all those claims. But, to exaggerate a bit, we're essentially villainizing information this way. While it is true that comments can be counterproductive, it's more the fundamental attitude towards them that causes the harm here.

In the end, code documentation is a lot like error handling, we are told early on how it's important and necessary, but we fail to understand why and instead grow to resent doing it again for that same old teacher, supervisor, or annoying teammate. But just like error handling, we are the ones who can actually benefit the most from it — if done right. But in order to do it right, we need to face some harsh truths and start admitting that there is no such thing as self-documenting code, and maybe we simply don't understand what we're actually doing if we can't manage to write a few words about it.

So let's burst some bubbles!

I found the rest of the story well worth the read. Highly recommended! I'll be the first to admit that the coding example has shortcomings. But it did serve as a concrete basis on which to launch the discussion.

In my experience, all too often I find myself updating code I'd written a year ago. Or 10 or 20 years ago. I've come to see the value of some of my comments. Especially those that remind me of what I was intending to accomplish in a certain code sequence and how I was accomplishing that goal. Some of my code is self-documenting. In other cases, I was so far into the weeds just trying to get it to work, that I just knew that a year or so later I'd not recall the details and would be furiously scratching my head trying to remember what I was doing and thinking. And in still other cases, I found comments that, although accurate, failed to be of any help!

By writing comments to my future, befuddled self, I try to explain things to make the next update easier. If it's all I can do at that moment to write the code and get it working, what hope do I have of ever coming back and trying to debug (or extend) it when it's no longer fresh in my mind?

When reading through others' code, I am grateful to find comments which state what the goal was and other comments explaining how that goal is being achieved. I can then look at the code and see how it supports that effort.

Anecdata: I had a professor in college who could not go down to the computer center (back in the days of mainframes and punch cards). Any appearance there and he'd be besieged by students with questions! So, he would give a handwritten copy of the program to a grad student who would go to the computer center to enter, run, and debug it for him. Almost without exception, the grad student would report that if the computer could run his *comments*, the code would basically run the first time! That such a learned and experienced programmer that I held in very high esteem would make such a plain-spoken admission of his poor coding skills and of the value of writing comments made a long-lasting impression on me!

So, fellow Soylentils, what has been your experience with code comments? What kinds of comments have been most helpful to you? As an example, think of reading a function which returned a T/F flag as to whether or not the year passed in as an argument was a leap year. Imagine debugging it, with and without comments. If it had no comments, what comments would you wish it had? How does the programming language affect your approach to comments?
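As one possible starting point for that thought experiment, here is a minimal Python sketch of such a function. The name `is_leap_year` and the choice of the Gregorian rules are assumptions, not something taken from the linked article; the comments try to record intent and the reason for each test rather than restate the syntax.

```python
def is_leap_year(year: int) -> bool:
    """Return True if `year` is a leap year in the Gregorian calendar.

    Assumption: `year` is a (proleptic) Gregorian year number; calendars
    in use before the 1582 reform are deliberately not handled here.
    """
    # Century years are the exception to the every-fourth-year rule:
    # they are leap years only when divisible by 400 (2000 yes, 1900 no).
    if year % 100 == 0:
        return year % 400 == 0

    # Every other year follows the plain rule: leap if divisible by 4.
    return year % 4 == 0
```

Calling it with 1900 and 2000 exercises the century exception, which is the case a comment-free version makes easy to get wrong when debugging.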



 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Informative) by DannyB on Thursday April 23 2020, @04:28PM (7 children)

    by DannyB (5839) on Thursday April 23 2020, @04:28PM (#986082)

    <no-sarcasm>
    Programmers in their young days think the audience for their code is the compiler. If it compiles, great! Job done.

    The primary audience is actually someone else ten years later who will be reading your code. The compiler is just part of the machinery that makes the code run.

    Write your code as if you were telling someone how it worked. Bake that idea into the very way that you write every line.

    Forty-five years ago we passed the infantile stage where we have a comment like: increment I by one. That code already IS self-documenting to anyone who knows the language. Someone who doesn't know the language is NOT your audience.

    Hopefully, we also long ago learned to use meaningful variable and function names. This is one of the hardest things. (And class and interface names if you have them.) Come up with a good name. Take a few minutes. Don't just take the first notion that comes into your mind. Does that name really describe this? If the function does too many things to be easily described, then is it too big? The name of something is one of the biggest aids to comprehensibility.

    You don't need to mention that this function is called by foo(), bar() and baz(). Your IDE's tooling can instantly tell you this precisely and accurately. So if you're thinking of changing the function signature, you can know in advance exactly what will be affected without being misled by an obsolete comment.

    Use comments to explain the bigger picture. What is this part of? When and how is it used? Not the mechanics of what the statements do -- unless it is something truly non-obvious, like an FFT routine, which perhaps needs separate documentation.

    Your comments will not be as well maintained as the code. So write comments that endure the test of time. Bigger picture comments. What this does, not the precise steps of how -- the code spells that out.
    </no-sarcasm>
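To make the contrast concrete, here is a minimal Python sketch with two hypothetical functions that behave identically: the first carries a comment that merely restates the statement, the second carries comments about purpose and use. The names and the paging scenario are assumptions made up for this illustration.

```python
def page_count_noisy(item_count, page_size):
    # Add page_size minus one to item_count and divide by page_size.
    return (item_count + page_size - 1) // page_size


def page_count(item_count, page_size):
    """Return how many pages are needed to show `item_count` items.

    Part of a (hypothetical) listing screen: a partially filled last page
    still counts as a page, so the division must round up, not down.
    """
    # Integer ceiling division; stays exact for large counts, unlike
    # going through floating point.
    return (item_count + page_size - 1) // page_size
```

The second version's comments would still be true if the arithmetic were later rewritten, which is the sense in which bigger-picture comments endure.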

    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 2) by stretch611 on Thursday April 23 2020, @05:27PM (5 children)

    by stretch611 (6199) on Thursday April 23 2020, @05:27PM (#986124)

    Hopefully, we also long ago learned to use meaningful variable and function names.

    I only wish this were the case... but sadly I see the opposite all the time.

    I remember recently debugging some website code and had to go deep into the jQuery codebase to fix it. It was full of two letter variable names. Needless to say, I was cursing under my breath every minute I was trying to fix it. (and as the saying goes... "Profanity is the one language all programmers know.")

    Even something as simple as a loop index needs to have a meaningful name. People justify using "x" and "y" by saying that the whole loop only performs 1 or 2 statements... why even bother to put the effort into something so simple? The real answer is that today it is only one or two statements, will it double or triple next year? and how about the year after that? What happens when the code changes to the point that it needs to be nested? Not to mention even with only one or two statements something like X or Y can have the readability improved significantly by using a name like DayIx instead of X. Then look how much better that is with MonthIx or YearIx once it becomes nested.

    Then they whine about how much longer it takes to type. (Yes, I have actually heard this multiple times from people too lazy to use good variable names.) Seriously, wtf? A majority of good code is cut/pasted; it doesn't take extra keystrokes to do that with long variable names.

    Needless to say, this has always been one of my pet peeves. If you want to deal with two-letter variable names, maybe you should go back to the BASIC interpreters of the '80s, one of the last times two letters was the maximum. (Technically, at least on Commodore interpreters, you could have a longer variable name if you wanted, but only the first two letters were used.)

    --
    Now with 5 covid vaccine shots/boosters altering my DNA :P
    • (Score: 3, Interesting) by martyb on Thursday April 23 2020, @06:46PM

      by martyb (76) on Thursday April 23 2020, @06:46PM (#986155)

      Even something as simple as a loop index needs to have a meaningful name. People justify using "x" and "y" by saying that the whole loop only performs 1 or 2 statements... why even bother to put the effort into something so simple? The real answer is that today it is only one or two statements, will it double or triple next year? and how about the year after that? What happens when the code changes to the point that it needs to be nested? Not to mention even with only one or two statements something like X or Y can have the readability improved significantly by using a name like DayIx instead of X. Then look how much better that is with MonthIx or YearIx once it becomes nested.

      Hear, hear!

      When choosing variable names, I strive to "syntactify the semantics". Think of trying to grep for a string.

      Over years of painful experience, I have come up with certain ways of coding things that quickly manifest exactly what something is and how it is intended to be used. Since it is so easy to read, I'll give an example in AWK:

      # Extract and output non-HTML text from input file;
      # Leading and trailing spaces are removed;
      # Sequences of spaces are reduced to a single space.
      # Fails on non-simple HTML where, e.g., attribute values could confuse parsing.
      # Also tends to fail on CSS, Javascript, json.
      BEGIN {
         # We have not, as yet, processed any records:
         rec[0] = 0;
      }  # BEGIN

      # Place each input record into an array:
      {
         rec_count = rec[0] + 1;
         rec[rec_count] = $0;
         rec[0] = rec_count;
      }

      # Crude hack to extract and output non-empty text strings from plain HTML (Fails with Javascript, CSS, etc.)
      END {
         rec_count = rec[0];
         for (rec_index=1; rec_index<=rec_count; rec_index++) {
            rec_this = rec[rec_index];

            # Remove HTML tags and reduce sequences of spaces to a single space:
            text_str = rec_this;
            gsub("<[^>]*>", " ", text_str);
            gsub("[[:space:]]+", " ", text_str);

            # Strip leading/trailing spaces:
            sub("^[[:space:]]*", "", text_str);
            sub("[[:space:]]*$", "", text_str);

            # Only output non-blank lines:
            if (text_str != "") {
               print text_str;
            }  # if
         }  # for (rec_index)
      }  # END

      Let me be the first to say this code has issues and fails to take full advantage of the AWK language! That said, using the name "foo_count" to refer to how many items are in the "foo" array, and using "foo_index" when referring to an element of that array makes abundantly clear the purpose and use of those variables.

      If I were to extend the code to introduce the concept of words on each line, then it would be a simple matter of using "word[]", "word_count", and "word_index" to again explicate the use and applicability of those variables. In short, those constructs become instances of an overriding "language" that is used to describe the task at hand. There is no guessing what "rec_index" refers to. Further, it greatly reduces the risk of accidentally coding something like "rec[word_index]". Don't laugh, I've done it countless times back when I used "i", "j", and "k" as nested indices in a loop! When (not "if", "when!") the code gets extended, and nested loops arise, very minimal additional intellectual load is required to ensure that the correct variables are being used to access the correct things.
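A minimal sketch of that extension, written in Python rather than AWK so that it can be run on its own. The function name `longest_word_per_rec` and the task it performs are hypothetical, and the explicit index loops are kept only to show the rec/word naming pattern (idiomatic Python would normally iterate over the items directly).

```python
def longest_word_per_rec(rec):
    """Return the longest word of each record ('' for a blank record)."""
    longest = []
    rec_count = len(rec)
    for rec_index in range(rec_count):
        rec_this = rec[rec_index]

        # Break this record into its words:
        word = rec_this.split()
        word_count = len(word)

        # Scan the words of this record only. A slip such as
        # "word[rec_index]" reads wrong at a glance, which is the
        # point of the naming convention.
        longest_this = ""
        for word_index in range(word_count):
            word_this = word[word_index]
            if len(word_this) > len(longest_this):
                longest_this = word_this

        longest.append(longest_this)
    return longest
```

Every container here has a matching `_count`, `_index`, and `_this`, so adding a third level of nesting would follow the same pattern without inventing new names.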

      Oh, and why introduce the variable "rec_this"? Because that way I know I am not changing the value of anything in "rec[]". Also, this is a very simple case. When more work is involved in selecting the item to be worked upon, it proves its worth. There is always a consistently-named variable whose value can be examined to ensure that the value I got was what I was intending it to be!

      In short, I expect that I will need to debug my code. I endeavor, therefore, to make it as easy as possible to follow what the code is doing without having to remember the semantics of cutely-named variables like Peter, Paul, and Mary. The compiler doesn't care what I call them, so I try to spend as much gray matter as possible on understanding the minutiae of where the code is "interesting", and as little as possible on keeping track of the "administrivia" of how I go about organizing and accessing the data I am working with.

      That all said, I must express how delighted I have been at the constructive discussion this story has generated! I have already picked up a few pointers (heh!) from the discussion. THIS is why I have spent so much time supporting this site; it is discussions such as this that make all that effort worthwhile!

      --
      Wit is intellect, dancing.
    • (Score: 0) by Anonymous Coward on Thursday April 23 2020, @06:50PM (1 child)

      by Anonymous Coward on Thursday April 23 2020, @06:50PM (#986156)

      Even something as simple as a loop index needs to have a meaningful name.

      i is a meaningful loop index.

      • (Score: 0) by Anonymous Coward on Friday April 24 2020, @02:30AM

        by Anonymous Coward on Friday April 24 2020, @02:30AM (#986362)

        unless it's a pointer to a char array, in which case pci is better

    • (Score: 2) by tangomargarine on Thursday April 23 2020, @07:33PM

      by tangomargarine (667) on Thursday April 23 2020, @07:33PM (#986173)

      Even something as simple as a loop index needs to have a meaningful name. People justify using "x" and "y" by saying that the whole loop only performs 1 or 2 statements... why even bother to put the effort into something so simple? The real answer is that today it is only one or two statements, will it double or triple next year? and how about the year after that? What happens when the code changes to the point that it needs to be nested?

      You change the index name from i to something more useful? It's even built into the IDE to do it smartly for you, but you can worry about that when it actually becomes complicated enough to matter.

      --
      "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
    • (Score: 2) by DannyB on Friday April 24 2020, @03:52PM

      by DannyB (5839) on Friday April 24 2020, @03:52PM (#986517)

      Even something as simple as a loop index needs to have a meaningful name.

      Yes, if the loop code is more than a couple lines.

      I often use a sort of reverse-Hungarian notation by suffixing index variables with "Ix": charIx, paymentIx, etc. These work well if the things being indexed have similar names without the suffix, such as char, payment, etc.

      --
      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
  • (Score: 2) by tangomargarine on Thursday April 23 2020, @07:30PM

    by tangomargarine (667) on Thursday April 23 2020, @07:30PM (#986171)

    Most of my comments these days are actually "why aren't we doing it this way?" notes. The simpler way causes some obscure issue elsewhere in the code, and I write that down so that I don't come back a year later, say "hey, we could do this a lot more simply by just doing X", and then spend the next week figuring out why it doesn't actually work.
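A hypothetical illustration of such a "why not" comment, as a minimal Python sketch. The function `dedupe_keep_order` and its scenario are invented for the example and have nothing to do with the product described in this comment.

```python
def dedupe_keep_order(items):
    """Return `items` without duplicates, keeping first-seen order."""
    # Why not just `list(set(items))`? It is shorter, but a set does not
    # preserve insertion order, and (in this hypothetical scenario) the
    # callers rely on the first-seen order of the result. Please do not
    # "simplify" this without checking them.
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

The comment says nothing about what the loop does; it only records the tempting shortcut and the reason it was rejected.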

    Yay for asynchronous JavaScript stuff :P Our product has to communicate with an external webservice that wasn't built to be synchronous with outside servers, so there's a somewhat horrifying number of places in our code where I had to add "wait 3 seconds for them to finish processing it" delayed calls so we wouldn't get back empty objects.

    --
    "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"