SoylentNews

SoylentNews is people

Compiler Jump Threading

posted by martyb on Monday November 02 2015, @04:31AM

from the next-up:-jump-jiving dept.

Phoenix666 writes:

The "jump threading" compiler optimization (aka GCC's -fthread-jumps) turns conditional branches into unconditional ones on certain paths, at the expense of code size. On hardware with branch prediction, speculative execution, and prefetching, this can greatly improve performance. However, there is no scientific publication or documentation on it at all, and the Wikipedia article is very short and incomplete.

The linked article has an illustrated treatment of common code structures and how these optimizations work.
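
As a rough illustration of the transformation (a hand-written sketch, not compiler output), consider two back-to-back tests where the outcome of the second is already decided by the first. The names classify_before, classify_after, and check_equivalent are made up for this example, and a real compiler performs the rewrite on its internal representation rather than on C source.

```c
#include <assert.h>

/* Before threading: the test of `flag` is a conditional branch even
 * though its outcome is already known on each path that reaches it. */
int classify_before(int x)
{
    int flag;
    int result;

    if (x > 10)
        flag = 1;
    else
        flag = 0;

    if (flag)               /* redundant decision */
        result = 100;
    else
        result = 1;

    return result + x;
}

/* After threading (done by hand here): each arm of the first test goes
 * straight to the code the second test would have picked. One
 * conditional branch disappears; the tail `+ x` is duplicated. */
int classify_after(int x)
{
    if (x > 10)
        return 100 + x;
    return 1 + x;
}

/* Both versions must agree for every input in [lo, hi]. */
void check_equivalent(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        assert(classify_before(x) == classify_after(x));
}
```

Calling check_equivalent(-50, 50) from a main function exercises both paths. The duplicated tail is the code-size cost the summary mentions.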

Re:hmm... (Score: 2) by LoRdTAW on Monday November 02 2015, @06:48PM

by LoRdTAW (3755) on Monday November 02 2015, @06:48PM (#257630) Journal

The only thing I can think of is to maintain a single return statement in the function to keep things simple.

Re:hmm... (Score: 3, Informative) by jdccdevel on Monday November 02 2015, @08:03PM

by jdccdevel (1329) on Monday November 02 2015, @08:03PM (#257661) Journal

A lot of C code I've seen uses this coding style for housekeeping. It might not be relevant in a function this simple, but it's easier to maintain the style for every bit of the code.
The idea is to make sure that everything that needs to be cleaned and/or set (mallocs are freed, error values set/cleared, structure members made consistent, etc) is done consistently just before the function returns.
If you have more than one return statement in a function (not too unusual), then you have to make sure that each of them is kept up to date with all required housekeeping every time you make a change to the code. Miss just once, and your function will leak memory sometimes, and you have a bug that won't be easy (in fact it could be really, really hard) to track down.
It leads to a lot of code duplication (bad!) which is tedious to maintain (bad!), very fragile (bad!) and error prone (also bad!).
In this coding style, all that bad is avoided by just jumping to the end of the function, where some housekeeping code can clean up everything the same every time (good!). That is more robust (good!) and easier to maintain (bonus!) than the alternative.
Modern, more Object Oriented languages tend to hide a lot of the relevant details behind concepts like exceptions, and use the "object" model to ensure data consistency. A language like C doesn't have those protections so many coders embrace a more defensive coding style like this.
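
A minimal sketch of the style described above, assuming a made-up function copy_and_measure that needs one temporary buffer. Every exit, success or failure, goes through the done label, so the cleanup is written exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: makes a working copy of `text` and returns its
 * length, or -1 on failure. All exits funnel through `done`. */
int copy_and_measure(const char *text)
{
    int rc = -1;            /* pessimistic default: assume failure */
    char *copy = NULL;      /* NULL until the allocation succeeds  */

    if (text == NULL)
        goto done;

    copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        goto done;

    strcpy(copy, text);
    rc = (int)strlen(copy);

done:
    free(copy);             /* free(NULL) is a no-op, so this is always safe */
    return rc;
}
```

Initializing the pointer to NULL and the return code to the failure value is what lets the early exits jump to done without any per-exit bookkeeping.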

- Re:hmm... (Score: 2) by tibman on Monday November 02 2015, @10:26PM
  
  by tibman (134) on Monday November 02 2015, @10:26PM (#257717)
  
  If you had duplicate cleanup bits in the same function (from multiple returns in this case) then wouldn't you just extract those duplicate pieces into a new function?
  
  --
  SN won't survive on lurkers alone. Write comments.
  
  - Re:hmm... (Score: 1) by Delwin on Monday November 02 2015, @10:44PM
    
    by Delwin (4554) on Monday November 02 2015, @10:44PM (#257721)
    
    Every function has overhead, so no. Even inline is only a suggestion to the compiler, not a hard requirement.
    
    - Re:hmm... (Score: 2) by tibman on Tuesday November 03 2015, @02:51AM
      
      by tibman (134) on Tuesday November 03 2015, @02:51AM (#257795)
      
      Being readable and easier to maintain is better than saving on function overhead. If the compiler finds it is best to in-line some functions, let it do that. Unless you are dealing with an extremely resource constrained system.
      
      --
      SN won't survive on lurkers alone. Write comments.
      
      - Re:hmm... (Score: 0) by Anonymous Coward on Tuesday November 03 2015, @03:59AM
        
        by Anonymous Coward on Tuesday November 03 2015, @03:59AM (#257805)
        
        "Being readable and easier to maintain is better than saving on function overhead."
        You can make readable and maintainable code with gotos. And even a system that isn't necessarily resource constrained doesn't deserve to be abused by sloppy coding. I know that more processing power means that programmers think they can waste as many resources as possible, but I prefer not to.
        
        
        Re:hmm... (Score: 2) by tibman on Tuesday November 03 2015, @02:14PM
        
        by tibman (134) on Tuesday November 03 2015, @02:14PM (#257922)
        
        Removing duplicate code by putting it into a function is not a waste of resources. That is a major reason why functions exist. Doing the same with gotos is fine as well; I never said otherwise.
        
        --
        SN won't survive on lurkers alone. Write comments.
        
    - Re:hmm... (Score: 2) by Wootery on Tuesday November 03 2015, @11:24AM
      
      by Wootery (2341) on Tuesday November 03 2015, @11:24AM (#257878)
      
      "Even inline is only a suggestion to the compiler, not a hard requirement."
      Yes, because inlining everything isn't always a good thing for performance. Code size matters for cache behaviour, and function-calls/returns are cheap on modern CPUs. The compiler probably knows better than you whether inlining makes sense or not.
      But none of that matters. Readability is generally far more important than tiny potential performance differences.
      
  - Re:hmm... (Score: 2) by jdccdevel on Tuesday November 03 2015, @04:19PM
    
    by jdccdevel (1329) on Tuesday November 03 2015, @04:19PM (#257990) Journal
    
    No, that would just create a bigger mess to maintain.
    It is horrendously bad coding practice to have a function leave a mess for another function to clean up. Each function needs to have a well known and consistent API with well understood side effects (i.e. what it's supposed to do), and nothing more or less.
    Moving that code to a second function is just asking for more problems. Now the cleanup code is in a different place from the code that made the "mess". It still has to be maintained in conjunction with the first function, has an unclear and constantly changing API (how many variables do we need to pass it today?), has to have an unambiguous name, has to always be called instead of or before returning a value, and serves no purpose without the original function.
    That's a huge amount of bad coding and horrible contortions to live with just because someone convinced you that "goto" is "always bad".
    You could use a macro. Some projects do, and that works for them. However, that macro code still has to "live" somewhere, has to have a unique name, and has to be maintained and not forgotten before every return. Many people find it more logical, simpler, and less cumbersome to have that code live at the end of their function, and jump to it when they're ready to return a value. Using a goto is also really easy to code audit, because each function should have exactly one return statement. (Sometimes they have two, one for the regular code path, and one for an error code path.)
    This coding style can also produce smaller and faster code, since the cleanup code is compiled only once and duplicates don't have to be optimized away by the compiler.
    A simple example for where this coding style shines is a function with several dynamically allocated buffers.
    Imagine that the buffers are used internally in the function, and only within the function, to do some processing. How do you make sure that you ALWAYS free the buffer, if your code can return from all over the place?
    So, before each return statement in the function, you have to:
    - check if the buffer was allocated
    - free the buffer if necessary
    - verify the free happened properly, handle the error if it didn't
    - and then return a value
    Now, suppose you have to add a second, then a third, then a fourth buffer. Also remember these changes happen over time, by multiple people, inside a somewhat complicated function. You can start to see how keeping all that code in one well-defined place, within the function itself, is much more maintainable.
    This coding style is about promoting good habits, especially in large projects with lots of people working on them.
    You just have to get over the whole "goto bad" mantra, which was probably drilled into you by some well meaning coding instructor who didn't have the time to explain when and how to use it properly.
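    A minimal sketch of the multi-buffer case, assuming a made-up function process that needs three scratch buffers (the reverse-and-invert work is invented filler). Note that in standard C free() returns no status, so there is nothing to verify after it and the sketch simply frees.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical worker: needs three scratch buffers of `n` bytes each.
 * Returns 0 on success, -1 on any failure. */
int process(const unsigned char *in, unsigned char *out, size_t n)
{
    int rc = -1;
    unsigned char *a = NULL, *b = NULL, *c = NULL;

    if (in == NULL || out == NULL || n == 0)
        goto done;

    a = malloc(n);
    if (a == NULL)
        goto done;
    b = malloc(n);
    if (b == NULL)
        goto done;
    c = malloc(n);
    if (c == NULL)
        goto done;

    memcpy(a, in, n);                 /* stage 1: copy the input      */
    for (size_t i = 0; i < n; i++)    /* stage 2: reverse into b      */
        b[i] = a[n - 1 - i];
    for (size_t i = 0; i < n; i++)    /* stage 3: invert bits into c  */
        c[i] = (unsigned char)~b[i];
    memcpy(out, c, n);
    rc = 0;

done:
    /* A fourth buffer would mean one more declaration and one more
     * free() here, not an edit at every early exit. */
    free(c);
    free(b);
    free(a);
    return rc;
}
```

    Because every pointer starts out NULL and free(NULL) does nothing, the done block does not need to track which allocations actually happened.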
    
    - Re:hmm... (Score: 2) by tibman on Tuesday November 03 2015, @07:06PM
      
      by tibman (134) on Tuesday November 03 2015, @07:06PM (#258061)
      
      "Each function needs to have a well known and consistent api..."
      An API is a collection of public methods. Internal workings should remain private. Memory allocation and freeing is a (mostly) private matter.
      "Moving that code to a second function is just asking for more problems."
      Though you sound like you know a lot about programming, it sounds like you don't know a lot about writing maintainable code. 400 line functions and duplicate code are not the best way to create maintainable code. To me it sounds like you are sticking an entire class's worth of functionality and variables inside of one single public method. If you have a function that is doing many different things where only some of those things use some variables then you indeed have hidden a class inside that function. You can extract that entire function into a class, promote the variables to privates and break it apart into several methods. Then from the original function you can new up the class and execute it. Some new engineer/dev who looks at that method will be able to better understand what it is doing than having to parse 400 lines of branching logic and only sometimes used variables.
      "That's a huge amount of bad coding and horrible contortions to live with just because someone convinced you that 'goto' is 'always bad'."
      Constructing classes that are organized by functionality and share variables is hardly horrible contortions. I have yet to say anything about gotos being bad, so I won't touch that one.
      "Imagine that the buffers are used internally in the function, and only within the function, to do some processing. How do you make sure that you ALWAYS free the buffer, if your code can return from all over the place?"
      So in this case the buffer is always allocated at the beginning of the function. There are multiple returns (from all over). Now the puzzle is how to make sure the memory is always freed and the code stays maintainable. It sounds to me like you either need to reduce the number of returns to one location or extract that logic into another function. In the extract scenario, function A does the allocation, calls B, then frees the memory. Function B only performs logic and has multiple returns. Putting custom cleanup just before each return is not a good answer. Though I talked about it in earlier posts, it was never my idea. My idea was that if you had duplicate code because of cleanups, you could extract that cleanup to a function to prevent someone changing one code bit and not another.
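      A minimal sketch of that A/B split, assuming made-up functions count_vowels (function A, which owns the buffer) and count_vowels_impl (function B, logic only). The rule that a '!' makes the input invalid is invented just to give B a second early return.

```c
#include <stdlib.h>
#include <string.h>

/* "Function B": pure logic with as many returns as it likes.
 * It borrows `scratch` and never allocates or frees anything. */
static int count_vowels_impl(const char *text, char *scratch, size_t n)
{
    int count = 0;

    if (n == 0)
        return 0;                     /* early return, nothing to clean up */

    memcpy(scratch, text, n);
    for (size_t i = 0; i < n; i++) {
        if (scratch[i] == '!')
            return -1;                /* another early return */
        if (strchr("aeiouAEIOU", scratch[i]) != NULL)
            count++;
    }
    return count;
}

/* "Function A": owns the buffer, so malloc and free sit side by side. */
int count_vowels(const char *text)
{
    size_t n;
    char *scratch;
    int result;

    if (text == NULL)
        return -1;

    n = strlen(text);
    scratch = malloc(n + 1);
    if (scratch == NULL)
        return -1;

    result = count_vowels_impl(text, scratch, n);
    free(scratch);                    /* the only free, on the only path */
    return result;
}
```

      The trade-off is that A still has its own early returns before the allocation, and every resource B needs has to be passed in as a parameter.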
      I apologize if I hurt your feelings by accusing you of not knowing about writing maintainable code. I'm not trying to insult. Just trying to point out a place where you can improve.
      
      --
      SN won't survive on lurkers alone. Write comments.
      
      - Re:hmm... (Score: 2) by jdccdevel on Wednesday November 04 2015, @01:06AM
        
        by jdccdevel (1329) on Wednesday November 04 2015, @01:06AM (#258221) Journal
        
        "To me it sounds like you are sticking an entire class's worth of functionality and variables inside of one single public method."
        This coding style is mostly useful to keep things straight when you don't have an Object-oriented language. O-O programming has a coding methodology, language features, and a lot of "syntactic sugar" which changes code maintenance dramatically.
        "It sounds to me like you either need to reduce the number of returns to one location or extract that logic into another function."
        The coding style I was originally commenting on is designed to "reduce the number of returns to one location". Coding logic doesn't always make that easy. Error checking, for example, can halt the function's logic in any number of locations where you want the function to return right away. Backing out of some logic trees and nested loops can be really complicated. Goto makes it easy.
        Remember, this is C, so there is no "exception" construct in the language. You could think of this coding style as using "goto" as a primitive form of exception handling, to make sure everything is cleaned up before the function returns. The alternative is to duplicate the cleanup code everywhere an error could occur (which is, I'm sure you'll agree, fragile. Not to mention bad coding practice due to code duplication.)
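        A minimal sketch of backing out of nested loops this way, assuming a made-up function find_negative that searches a row-major grid. The return codes (0 found, 1 not found, -1 bad arguments) are a convention chosen for this example only.

```c
#include <stddef.h>

/* Hypothetical search: find the first negative cell in a rows x cols
 * grid stored row-major. Returns 0 and fills the row and col outputs
 * if one is found, 1 if there is none, -1 on bad arguments. */
int find_negative(const int *grid, size_t rows, size_t cols,
                  size_t *row, size_t *col)
{
    int rc = 1;                       /* "not found" unless proven otherwise */

    if (grid == NULL || row == NULL || col == NULL) {
        rc = -1;
        goto done;
    }

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] < 0) {
                *row = r;
                *col = c;
                rc = 0;
                goto done;            /* leaves both loops in one step */
            }
        }
    }

done:
    /* Any cleanup would go here, shared by all three outcomes. */
    return rc;
}
```

        Without the goto, leaving the inner loop needs a flag that the outer loop also has to test, which is the kind of backing out that gets complicated as the nesting grows.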
        "My idea was that if you had duplicate code because of cleanups, you could extract that cleanup to a function to prevent someone changing one code bit and not another"
        Extracting the logic to another function also doesn't make any sense, as the sole purpose of that function would be to "clean up" for the other function. It has the same fragility as the code duplication we are trying to avoid (you have to remember to always call the "clean up" function before you return instead of just "return"), and it would be a real pain to maintain. How do you ensure that both functions are kept in sync? This is a maintenance nightmare, not least because every time you change something, the API for the "cleanup" function could change, forcing you to change every line of code where the cleanup is called (which, thanks to error handling, could be in dozens of different places).
        No, the "goto done" coding style is much simpler to follow, easier to audit, and significantly better overall for languages like C.
        Unfortunately, many people see the "goto", and assume it has to be bad because someone told them "goto is bad, and makes code unmaintainable." Which is unfortunately something that every coding instructor will say to beginners, because a beginner will almost never use goto properly.
        
Re:hmm... (Score: 0) by Anonymous Coward on Monday November 02 2015, @08:15PM

by Anonymous Coward on Monday November 02 2015, @08:15PM (#257664)

"to keep things simple."
That code? Simple?
This is your brain on BSD.

- Re:hmm... (Score: 2) by LoRdTAW on Tuesday November 03 2015, @01:14AM
  
  by LoRdTAW (3755) on Tuesday November 03 2015, @01:14AM (#257770) Journal
  
  Yeah, the GNU memcpy function is half as many LOC.
  
Re:hmm... (Score: 2) by Wootery on Tuesday November 03 2015, @11:39AM

by Wootery (2341) on Tuesday November 03 2015, @11:39AM (#257883)

This is the correct answer: having a 'single point-of-return' can be useful, especially in C where there's no RAII to handle cleanup automatically, the way you can in C++. It's also possible that single-point-of-return was required by coding guidelines that the authors were required to follow.
(It's perhaps ironic that garbage-collected languages, which generally lack RAII, also require special care with clean-up when it comes to things like closing files. Memory-management != resource-management, and no-one likes finalisers.)
I've also been told that old compilers would generate less efficient code when functions had multiple return points. (I don't know whether this is actually true, though.)
