The "jump threading" compiler optimization (aka -fthread-jumps in GCC) turns conditional branches into unconditional ones on certain paths, at the expense of code size. For hardware with branch prediction, speculative execution, and prefetching, this can greatly improve performance. However, there is little scientific publication or documentation on it; the Wikipedia article is very short and incomplete.
The linked article has an illustrated treatment of common code structures and how these optimizations work.
(Score: 2) by Gravis on Monday November 02 2015, @07:58AM
frankly, i find all this speculation and prediction to be a waste of silicon and power. yes, you can push the limits on execution speed but it's already more than enough. what we should be moving toward is something like XMOS architecture where instead of waiting or predicting the outcome of the instruction, it merely uses one core for multiple threads. the result is fully deterministic execution times which is awesome, especially for real-time problems like bit banging a protocol.
(Score: 3, Informative) by pe1rxq on Monday November 02 2015, @09:13AM
If you are bit banging on a CPU which does branch prediction you are probably already doing it wrong.
Bit banging is a nice trick if you have a deterministic CPU like an AVR or PIC with no OS to interfere.
But on just about anything else you should invest in a little bit of circuitry to implement the protocol the proper way.
I have done my share of real-time programming, but bit-banging on a general purpose CPU is not a typical problem.
(Score: 2) by Alfred on Monday November 02 2015, @07:51PM
The XMOS line of chips Gravis mentions is deterministic. Of course he misses the boat: a deterministic chip is not something you run a word processor or web browser on. It is great for processing audio streams or a constant rate of data with the same function over and over. Gravis also mentions bit banging, which XMOS has pretty much built into the hardware.
(Score: 2) by ledow on Monday November 02 2015, @01:29PM
We're doing both.
Predicting paths doesn't really have much to do with the CPU - it's about getting things into memory or the cache that you might need soon. It's about the interfaces, not the instructions themselves.
And if your memory cache is the bottleneck, adding more "CPUs" to the same silicon just makes the problem worse. Rather than adding even more places trying to access random memory at the same time, use a single cache effectively.
And the problem we have with CPUs is that they can only ever run so fast. We stopped going up in GHz for a reason: we hit physical limits. As such, using a tiny bit of logic around the silicon that keeps the processor at full speed even when it's mis-predicting is more effective than building another processor next to it and adding to the heat / signal propagation issues.
If the world were perfect and everything ran at CPU speed, we wouldn't need memory controllers, memory caches, bus timings, etc. Those things exist because the CPU is only one problem. It's like a steam engine - you have to keep it at full speed because it's your biggest investment, but that means dragging along a cart full of coal, and a guy who knows when to shovel it in.
(Score: 4, Interesting) by pixeldyne on Monday November 02 2015, @08:34AM
The article, that is, and the submission. This is the kind of content I'd like to see more of on SN.
(Score: 2) by Phoenix666 on Monday November 02 2015, @12:20PM
Do you have more sources for this kind of article? Not many like this turn up in the sources I read or the RSS log.
Washington DC delenda est.
(Score: 2) by TheRaven on Monday November 02 2015, @12:46PM
sudo mod me up
(Score: 2) by Wootery on Monday November 02 2015, @03:30PM
Also, their compiler project seems totally pointless.
Want a C compiler that generates an SSA intermediate-representation and does lots of optimisations? We have that, it's called Clang, it's awesome, and it's used everywhere such a compiler is needed.
(Score: 0) by Anonymous Coward on Thursday November 05 2015, @05:19PM
gcc generates an SSA intermediate-representation and does lots of optimisations. It's used everywhere.
(Score: 2) by ledow on Monday November 02 2015, @01:23PM
Topic yes.
Treatment of it, no.
I actually didn't find the explanation all that clear and I'm not a stranger to the way modern processors operate.
The point of it also wasn't clear - why would you need or want to do this? That didn't seem covered particularly well.
I'd like more of this, but... literally... more of this. The article was quite bare, I felt.
(Score: 3, Touché) by Phoenix666 on Monday November 02 2015, @04:04PM
If you have better sources, please submit them. Many of us would like to see them. I don't see much like this.
Washington DC delenda est.
(Score: 0) by Anonymous Coward on Monday November 02 2015, @09:59AM
GOTO not considered harmful anymore?
(Score: 2) by LoRdTAW on Monday November 02 2015, @01:56PM
It's an old meme that applied back when it was written in 1968. Dijkstra opposed goto because it was used heavily, and some languages relied on it solely to control program flow and implement loops.
Nowadays it can still be used, but the idea is to use it as a last resort or very, very sparingly. Apple's memcpy() has a goto: http://www.opensource.apple.com/source/xnu/xnu-2050.18.24/libsyscall/wrappers/memcpy.c [apple.com]
(Score: 0) by Anonymous Coward on Monday November 02 2015, @02:51PM
JMP is goto. Most compilers make liberal use of them. You just do not see it in the language.
if (x == 1) { dothis(); } else { dothat(); }
With most compilers there is at least one goto statement there (usually two if you are in debug) plus a conditional one. They are hidden. You do not see them much anymore because languages removed the need for you to type them in. In the few cases where you still need them, goto is still there.
Some of the early languages had poor branching and function constructs, so GOTO was what you used to segment things off. Things like stack management also 'hide' goto statements, such as the x86 CALL/RET, which are gotos with side effects.
The article is very similar to an idea I have kicking around: run both sides of an if condition, then toss out the results of the wrong one. You end up wasting half of your silicon's power, though. It also only works if there are no memory side effects for downstream items. It gets worse if you nest conditionals or do something recursively. I wonder how they handled that, or if they just break out the for loops. It looks like they have an interesting issue I had not considered with halting the external thread.
(Score: 0) by Anonymous Coward on Monday November 02 2015, @06:18PM
void * memcpy(void *dst0, const void *src0, size_t length)
{
    char *dst = dst0;
    const char *src = src0;
    size_t t;

    if (length == 0 || dst == src)  /* nothing to do */
        goto done;
    [ several lines omitted ]
done:
    return (dst0);
}
I might be severely retarded, but couldn't they just, you know...
if (length == 0 || dst == src)  /* nothing to do */
    return (dst0);
Crazy, I know.
(also Rehash seems to hate code snippets with empty lines in 'em. I had to <code> every individual part.)
(Score: 2) by LoRdTAW on Monday November 02 2015, @06:48PM
The only thing I can think of is to maintain a single return statement in the function to keep things simple.
(Score: 3, Informative) by jdccdevel on Monday November 02 2015, @08:03PM
A lot of C code I've seen uses this coding style for housekeeping. It might not be relevant in a function this simple, but it's easier to maintain the style for every bit of the code.
The idea is to make sure that everything that needs to be cleaned and/or set (mallocs are freed, error values set/cleared, structure members made consistent, etc) is done consistently just before the function returns.
If you have more than one return statement in a function (not too unusual), then you have to make sure that each of them is kept up to date with all required housekeeping every time you change the code. Miss just once, and your function will sometimes leak memory, and you have a bug that won't be easy (in fact, it could be really, really hard) to track down.
It leads to a lot of code duplication (bad!) which is tedious to maintain (bad!), very fragile (bad!) and error prone (also bad!).
In this coding style, all that bad is avoided by just jumping to the end of the function, where some housekeeping code can clean up everything the same every time (good!). That is more robust (good!) and easier to maintain (bonus!) than the alternative.
Modern, more Object Oriented languages tend to hide a lot of the relevant details behind concepts like exceptions, and use the "object" model to ensure data consistency. A language like C doesn't have those protections so many coders embrace a more defensive coding style like this.
(Score: 2) by tibman on Monday November 02 2015, @10:26PM
If you had duplicate cleanup bits in the same function (from multiple returns in this case) then wouldn't you just extract those duplicate pieces into a new function?
SN won't survive on lurkers alone. Write comments.
(Score: 1) by Delwin on Monday November 02 2015, @10:44PM
Every function has overhead, so no. Even inline is only a suggestion to the compiler, not a hard requirement.
(Score: 2) by tibman on Tuesday November 03 2015, @02:51AM
Being readable and easier to maintain is better than saving on function overhead. If the compiler finds it is best to in-line some functions, let it do that. Unless you are dealing with an extremely resource constrained system.
SN won't survive on lurkers alone. Write comments.
(Score: 0) by Anonymous Coward on Tuesday November 03 2015, @03:59AM
Being readable and easier to maintain is better than saving on function overhead.
You can make readable and maintainable code with gotos. And even a system that isn't necessarily resource constrained doesn't deserve to be abused by sloppy coding. I know that more processing power means that programmers think they can waste as many resources as they like, but I prefer not to.
(Score: 2) by tibman on Tuesday November 03 2015, @02:14PM
Removing duplicate code by putting it into a function is not a waste of resources. That is a major reason why functions exist. Doing the same with gotos is fine as well, i never said otherwise.
SN won't survive on lurkers alone. Write comments.
(Score: 2) by Wootery on Tuesday November 03 2015, @11:24AM
Even inline is only a suggestion to the compiler, not a hard requirement.
Yes, because inlining everything isn't always a good thing for performance. Code size matters for cache behaviour, and function-calls/returns are cheap on modern CPUs. The compiler probably knows better than you whether inlining makes sense or not.
But none of that matters. Readability is generally far more important than tiny potential performance differences.
(Score: 2) by jdccdevel on Tuesday November 03 2015, @04:19PM
No, that would just create a bigger mess to maintain.
It is horrendously bad coding practice to have a function leave a mess for another function to clean up. Each function needs to have a well known and consistent api, with well understood side effects, (i.e. what it's supposed to do) and nothing more or less.
Moving that code to a second function is just asking for more problems. Now the cleanup code is in a different place from the code that made the "mess", would still have to be maintained in conjunction with the first function, has an unclear, and very dynamically changing API (How many variables do we need to pass it today?), has to have an unambiguous name, has to always be called instead of or before returning a value, and serves no purpose without the original function.
That's a huge amount of bad coding and horrible contortions to live with just because someone convinced you that "goto" is "always bad".
You could use a macro. Some projects do, and that works for them. However, that macro code still has to "live" somewhere, has to have a unique name, and has to be maintained and not forgotten before every return. Many people find it more logical, simpler, and less cumbersome to have that code live at the end of their function, and jump to it when they're ready to return a value. Using a goto is also really easy to code audit, because each function should have exactly one return statement. (Sometimes they have two, one for the regular code path, and one for an error code path.)
This coding style also produces smaller and faster code, since all the code is only compiled once, and doesn't have to be optimized away by the compiler.
A simple example for where this coding style shines is a function with several dynamically allocated buffers.
Imagine that the buffers are used internally in the function, and only within the function, to do some processing. How do you make sure that you ALWAYS free the buffer, if your code can return from all over the place?
So, before each return statement in the function, you have to:
- check if the buffer was allocated
- free the buffer if necessary
- verify the free happened properly, handle the error if it didn't
- and then return a value
Now, suppose you have to add a second, then a third, then a fourth buffer. Also remember these changes happen over time, by multiple people, inside a somewhat complicated function. You can start to see how keeping all that code in one well defined place, within the function itself, is much more maintainable.
This coding style is about promoting good habits, especially in large projects with lots of people working on them.
You just have to get over the whole "goto bad" mantra, which was probably drilled into you by some well meaning coding instructor who didn't have the time to explain when and how to use it properly.
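A minimal sketch of the style described above (hypothetical function, not from any real codebase): two dynamically allocated buffers, with every failure path jumping to one cleanup block, so adding a third buffer later means touching exactly one place.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 on failure; all exits go through `done`. */
static int process(const char *input, size_t len) {
    int ret = -1;              /* pessimistic default */
    char *buf1 = NULL;
    char *buf2 = NULL;

    buf1 = malloc(len);
    if (buf1 == NULL)
        goto done;
    buf2 = malloc(len);
    if (buf2 == NULL)
        goto done;

    memcpy(buf1, input, len);  /* stand-in for the real processing */
    memcpy(buf2, buf1, len);
    ret = 0;                   /* success */

done:
    free(buf2);                /* free(NULL) is a no-op, so cleanup is */
    free(buf1);                /* identical no matter where we bailed out */
    return ret;
}
```

Note the pointers are initialized to NULL up front, which is what makes the single cleanup block safe on every path.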
(Score: 2) by tibman on Tuesday November 03 2015, @07:06PM
An API is a collection of public methods. Internal workings should remain private. Memory allocation and freeing is a (mostly) private matter.
Though you sound like you know a lot about programming, it sounds like you don't know a lot about writing maintainable code. 400 line functions and duplicate code are not the best way to create maintainable code. To me it sounds like you are sticking an entire class's worth of functionality and variables inside of one single public method. If you have a function that is doing many different things where only some of those things use some variables then you indeed have hidden a class inside that function. You can extract that entire function into a class, promote the variables to privates and break it apart into several methods. Then from the original function you can new up the class and execute it. Some new engineer/dev who looks at that method will be able to better understand what it is doing than having to parse 400 lines of branching logic and only sometimes used variables.
Constructing classes that are organized by functionality and share variables is hardly horrible contortions. I have yet to say anything about goto's being bad, so i won't touch that one.
So in this case the buffer is always allocated at the beginning of the function. There are multiple returns (from all over). Now the puzzle is how to make sure the memory is always freed and the code stays maintainable. It sounds to me like you either need to reduce the number of returns to one location or extract that logic into another function. In the extract scenario it would look like function A doing the allocation, calling B, then freeing the memory. Function B only performs logic and has multiple returns. Putting custom cleanup just before each return is not a good answer. Though i talk about it in earlier posts it was never my idea. My idea was that if you had duplicate code because of cleanups, you could extract that cleanup to a function to prevent someone changing one code bit and not another.
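That extract scenario might look like this (hypothetical names: A is `wrapper`, B is `logic`):

```c
#include <stdlib.h>

/* B: pure logic, multiple returns, no cleanup responsibilities. */
static int logic(char *scratch, size_t len, int mode) {
    if (len == 0) return -1;     /* early returns are fine here... */
    if (mode < 0) return -2;     /* ...since B owns no resources */
    scratch[0] = (char)mode;     /* stand-in for the real work */
    return 0;
}

/* A: owns the buffer's entire lifetime around the call to B. */
static int wrapper(size_t len, int mode) {
    char *scratch = malloc(len ? len : 1);
    if (scratch == NULL)
        return -3;
    int ret = logic(scratch, len, mode);
    free(scratch);               /* single point of cleanup */
    return ret;
}
```

The trade-off versus "goto done" is that the resource and the logic now live in two functions, which is exactly the maintenance concern raised elsewhere in this thread.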
I apologize if i hurt your feelings for accusing you of not knowing about writing maintainable code. I'm not trying to insult. Just trying to point out a place where you can improve.
SN won't survive on lurkers alone. Write comments.
(Score: 2) by jdccdevel on Wednesday November 04 2015, @01:06AM
This coding style is mostly useful to keep things straight when you don't have an Object-oriented language. O-O programming has a coding methodology, language features, and a lot of "syntactic sugar" which changes code maintenance dramatically.
The coding style I was originally commenting on is designed to "reduce the number of returns to one location". Coding logic doesn't always make that easy. Error checking, for example, can halt the function's logic in any number of locations where you want the function to return right away. Backing out of some logic trees and nested loops can be really complicated. Goto makes it easy.
Remember, this is C, so there is no "exception" construct in the language. You could think of this coding style as using "goto" as a primitive form of exception handling, to make sure everything is cleaned up before the function returns. The alternative is to duplicate the cleanup code everywhere an error could occur (which is, I'm sure you'll agree, fragile. Not to mention bad coding practice due to code duplication.)
Extracting the logic to another function also doesn't make any sense, as the sole purpose of that function would be to "clean up" for the other function. It has the same fragility as the code duplication we are trying to avoid (you have to remember to always call the "clean up" function before you return instead of just "return".) and it would be a real pain to maintain. How do you ensure that both functions are kept in sync? This is a maintenance nightmare. Not the least because every time you change something, the api for the "cleanup" function could change, necessitating you to change every line of code where the cleanup is called. (which, thanks to error handling, could be in dozens of different places.)
No, the "goto done" coding style is much simpler to follow, easier to audit, and significantly better overall for languages like C.
Unfortunately, many people see the "goto", and assume it has to be bad because someone told them "goto is bad, and makes code unmaintainable." Which is unfortunately something that every coding instructor will say to beginners, because a beginner will almost never use goto properly.
(Score: 0) by Anonymous Coward on Monday November 02 2015, @08:15PM
to keep things simple.
That code? Simple?
This is your brain on BSD.
(Score: 2) by LoRdTAW on Tuesday November 03 2015, @01:14AM
Yea, the gnu memcpy function is half as many LOC.
(Score: 2) by Wootery on Tuesday November 03 2015, @11:39AM
This is the correct answer: having a 'single point-of-return' can be useful, especially in C where there's no RAII to handle cleanup automatically, the way you can in C++. It's also possible that single-point-of-return was required by coding guidelines that the authors were required to follow.
(It's perhaps ironic that garbage-collected languages, which generally lack RAII, also require special care with clean-up when it comes to things like closing files. Memory-management != resource-management, and no-one likes finalisers.)
I've also been told that old compilers would generate less efficient code when functions had multiple return points. (I don't know whether this is actually true, though.)
(Score: 2) by gnuman on Tuesday November 03 2015, @05:18AM
Crazy, I know.
It's called code style or coding guidelines. Especially with C, you need to follow it or you get yourself in trouble.
And there is nothing wrong with this code style. Compilers easily optimize it removing the goto, but it's actually easier to understand for humans. Successful return statement in ONE place.
(Score: 2) by Wootery on Monday November 02 2015, @03:27PM
Did you miss the word compiler?
We're talking about the code generated by a C compiler, not C programming patterns. The harm of GOTO doesn't apply here.
(Score: 2) by Dr Ippy on Monday November 02 2015, @11:19AM
Very deterministic thinking. But presumably a quantum computer would be able to execute all branches in the code simultaneously, even if there's a huge number of possibilities.
This signature intentionally left blank.
(Score: 1, Interesting) by Anonymous Coward on Monday November 02 2015, @04:14PM
Seems similar to a technique used for in-order processors. Since conditional branch mispredictions are expensive (a misprediction has to discard instructions and refetch the correct branch), you execute both sides of a branch and simply "select" the correct result at the end. The "if" at the end simply forwards the correct result to the subsequent code, so no re-fetch required. At worst, you have a one cycle stall as you dump the last assignment (you should hint or structure the compare such that the processor always pre-fetches the last assignment). At best, you have a "SELECT" instruction so no stall or branching to deal with (ex. PowerPC fsel or Intel CMOV).
if (x)
    result = foo(x);
else
    result = bar(x);

turns into

altresult = foo(x);
result = bar(x);
if (x) result = altresult;

explaining why the following is sub-optimal is left to the student

result = bar(x);
if (x) result = foo(x);
(Score: 2) by Alfred on Monday November 02 2015, @08:32PM
Branch misprediction is expensive but modern branch prediction is good enough to pay for itself in cpu cycles saved.
Even the PlayStation ONE CPU had a set of branch-likely instructions. It seems to me that would be the easiest way: just tell the chip which way to prefetch, and only fail once per loop block or less than half the time per conditional branch. I thought modern chips did the same but I'm not sure; I know the MIPS R3000 in the PS1 did.
Maybe that is what you were trying to say. I think the implementation in the article could be achieved by unrolling the first pass of a loop or other opcode-bloating optimization. It is really just interesting graph theory.