Recently, I watched a fellow particle physicist talk about a calculation he had pushed to a new height of precision. His tool? A 1980s-era computer program called FORM.
Particle physicists use some of the longest equations in all of science. To look for signs of new elementary particles in collisions at the Large Hadron Collider, for example, they draw thousands of pictures called Feynman diagrams that depict possible collision outcomes, each one encoding a complicated formula that can be millions of terms long. Summing formulas like these with pen and paper is impossible; even adding them with computers is a challenge. The algebra rules we learn in school are fast enough for homework, but for particle physics they are woefully inefficient.
Programs called computer algebra systems strive to handle these tasks. And if you want to solve the biggest equations in the world, for 33 years one program has stood out: FORM.
Developed by the Dutch particle physicist Jos Vermaseren, FORM is a key part of the infrastructure of particle physics, necessary for the hardest calculations. However, as with surprisingly many essential pieces of digital infrastructure, FORM's maintenance rests largely on one person: Vermaseren himself. And at 73, he has begun to step back from FORM development. Due to the incentive structure of academia, which prizes published papers, not software tools, no successor has emerged. If the situation does not change, particle physics may be forced to slow down dramatically.
FORM got its start in the mid-1980s, when the role of computers was changing rapidly. Its predecessor, a program called Schoonschip, created by Martinus Veltman, was released as a specialized chip that you plugged into the side of an Atari computer. Vermaseren wanted to make a more accessible program that could be downloaded by universities around the world. He began to program it in the computer language FORTRAN, which stands for Formula Translation. The name FORM was a riff on that. (He later switched to a programming language called C.) Vermaseren released his software in 1989. By the early '90s, over 200 institutions around the world had downloaded it, and the number kept climbing.
Since 2000, a particle physics paper that cites FORM has been published every few days, on average. "Most of the [high-precision] results that our group obtained in the past 20 years were heavily based on FORM code," said Thomas Gehrmann, a professor at the University of Zurich.
[...] As crucial as software like FORM is for physics, the effort to develop it is often undervalued. Vermaseren was lucky in that he had a permanent position at the National Institute for Subatomic Physics in the Netherlands, and a boss who appreciated the project. But such luck is hard to come by. Stefano Laporta, an Italian physicist who developed a crucial simplification algorithm for the field, has spent most of his career without funding for students or equipment. Universities tend to track scientists' publication records, which means those who work on critical infrastructure are often passed over for hiring or tenure.
[...] While a few younger physicists like Ruijl work on FORM sporadically, for their careers' sake they need to spend most of their time on other research. This leaves much of the responsibility for developing FORM in the hands of Vermaseren, who is now mostly retired.
Without ongoing development, FORM will get less and less usable—only able to interact with older computer code, and not aligned with how today's students learn to program. Experienced users will stick with it, but younger researchers will adopt alternative computer algebra programs like Mathematica that are more user-friendly but orders of magnitude slower. In practice, many of these physicists will decide that certain problems are off-limits—too difficult to handle. So particle physics will stall, with only a few people able to work on the hardest calculations.
In April, Vermaseren will hold a summit of FORM users to plan for the future. They will discuss how to keep FORM alive: how to maintain and extend it, and how to show a new generation of students just how much it can do. With luck, hard work, and funding, they may preserve one of the most powerful tools in physics.
(Score: 4, Insightful) by Immerman on Wednesday January 04, @07:29PM (3 children)
Certainly sounds like the field needs to pull together and start funding at least a few developers - and given the sorts of budgets needed to do cutting-edge particle physics, a whole team probably doesn't even amount to a rounding error.
As for the prediction that younger researchers will drift to slower alternatives:
Sorry, but we have a name for researchers who use vastly inferior tools rather than the well-known industry standard their teachers showed them.
Unemployed.
As I understand it particle physics is a VERY competitive field, with far more skilled enthusiasts than actual decent jobs. If you're doing particle physics research and are looking to hire a new team member, are you going to choose the candidate who knows how to use the tools of the field, or the one who does the equivalent of finger-painting in Mathematica?
I'm sure quite a few would rather use finger paints, but they will learn and use the right tool for the job, or they won't get the job. I mean, I'm a programmer and I'd *rather* use one of those clever graphical programming environments popular for beginners... but there's a reason they're only popular among beginners - they just aren't up to the task of doing real work.
(Score: 2) by PiMuNu on Thursday January 05, @09:05AM (2 children)
Note CERN has a big dev team, responsible for things like ROOT (essentially a crap version of the Python/NumPy/SciPy/Matplotlib stack, invented before Python was a thing) and GEANT4 (an excellent package for modelling the passage of particles through matter). Fermilab also has a dev team that contributes to some of this stuff. The HEP community invented a big distributed-cluster, decentralised data-processing technology called the Grid before "cloud" was a thing.
(Score: 2) by maxwell demon on Thursday January 05, @02:16PM
Another invention of CERN is called the World Wide Web. You might have heard of it.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by Immerman on Thursday January 05, @04:20PM
No doubt - I didn't mean to suggest otherwise, though I can see how my comment might be interpreted that way.
I just meant it sounds like they need to fund some developers specifically for FORM (or a replacement)
(Score: 4, Insightful) by turgid on Wednesday January 04, @08:12PM (4 children)
I'd be willing to bet US$10 that it's a dog's breakfast, which is why only one guy understands it and no one else is brave enough to take it on. Working Effectively with Legacy Code by Michael C. Feathers is a good place to start.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 4, Funny) by Anonymous Coward on Wednesday January 04, @08:49PM (2 children)
You can look for yourself.... here's a random example [github.com]
(Score: 3, Funny) by turgid on Wednesday January 04, @08:53PM (1 child)
Oh. My. God.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 2, Interesting) by khallow on Thursday January 05, @06:16AM
Notice the convenience of uncommenting the huge (this is only part of it) crazy expressions as I felt I needed to evaluate them! And yes, I was able to turn this into a PhD dissertation. No, I wouldn't expect to get paid for that abomination in the real world, but it actually did evaluate the sorts of expressions I was dealing with in an intuitive and fast manner.
So to summarize, Form does a few things really well and a bit ugly. And well, my advisor was Australian.
(Score: 3, Informative) by Immerman on Thursday January 05, @04:56PM
How about this little gem from a different program:
Not so bad in comparison, but that's just the fast (approximate) inverse square root function from Quake (with original comments) - a little nothing of a function that nonetheless delivered massive performance improvements when normalizing vectors, without which Quake would likely not have been possible on the hardware of the time.
As a programmer with his roots in the days of severe memory restrictions, when you knew the clock-cycle counts of individual CPU instructions, I want to say that extremely efficient, high-performance code is practically required to be a huge ugly mess.
These days it seems like optimization usually stops at eliminating really bone-headedly bad design decisions, with maintainability (a.k.a. reducing developer time) being the primary goal.
But for cutting-edge high-performance computing that just doesn't cut it - as demonstrated by the fact that FORM manages to run hundreds of times faster than the "competition". When run-time actually *matters*, rather than just freeing up more cycles to tack on some extra bells and whistles, you need to start resorting to the kinds of ugly tricks that in recent decades have been widely damned as wastes of developer time that make code fragile and hard to understand.
Which is a fair enough position in most cases, since Moore's law gave us ridiculously powerful computers that just don't need that kind of optimization for most things. But there are still places where fast and ugly pays big dividends. I think that Quake function is now included in pretty much every 3D game in the world...
If anything, it sounds to me like maybe those in the particle physics world should try to convince Vermaseren to just go through his code and document it better so that others can pick up the torch when he's gone. Ugly code calls for extensive comments, and old-timers have a tendency to... not.
(Score: 4, Insightful) by krishnoid on Wednesday January 04, @08:21PM (1 child)
I mean he's a computer science guy, worked at CERN, and from what I hear he did something important like forty years ago and has been goofing off ever since. What has he done for us lately?
(Score: 0) by Anonymous Coward on Thursday January 05, @01:02AM
Along the same lines, I nominate Stephen Wolfram https://en.wikipedia.org/wiki/Stephen_Wolfram [wikipedia.org] Before he got sidetracked by making money with Mathematica and polishing his ego with "A New Kind of Science", he was one of *the* up and coming young physicists.
For example, he could take this old code and turn it into a function that ran inside Mathematica...and he has a pretty good dev team there to help him do it.
(Score: 3, Informative) by ShovelOperator1 on Wednesday January 04, @08:38PM (7 children)
I work with simulation software that costs a lot, and its main component (called the Solver, and let's leave it at that) is written in FORTRAN. While the "3D" version is written in a fairly sane version of the language that I can mess with quite safely, the 2D variant I previously used was written in an antique dialect that, even allowing that FORTRAN is highly procedural, used GOTOs every few lines, inside IFs so complex that it is hard to imagine how they can ever be reached. One of them was certainly added because of some compiler bug 40+ years ago.
This is academic code. It is brilliant, but its development happened during long and painful research, with lots of trial and error and many observations whose notes have since been lost. This is not database-style CRUD that can be written from a template - and that is an important fact. So the software company, or one man, starts with this code in the 1980s.
Then better computers come. Multiprocessing becomes more than two units connected with a serial (!) cable. In the software I was working with, they were thinking about multiprocessing so early that they hardcoded all output data for three-core computation: the fourth CPU does visualization and runs Unix, and who needs more? In the 1990s it turned out that someone needs more, so they installed MPI over that, splitting the computation along another domain. It is cleverly enough done that you may use a number of CPUs that is not a multiple of 3, but you can see the problem-partitioning routine crunching harder when it is not 3, 6, 9 or 12.
In 2004, someone trimmed all comments longer than 80 characters from source code database. Nobody cared about it, and it is still in this form.
The pre/post-processing tools are a big mess written in Fortran, C, C++, Python and Perl, using a GTK backend supplied in a pseudo-CAD package. The Windows version will not run without going through a few VBScripts too. There is binary-data-file processing software kept in antique C because they lost the format specification in the late '90s. The software takes A_LONG_SERIES_OF_PARAMETERS_LIKE_THIS as arguments. I have found that I can misspell them and it still works. Why? They check only the length and some significant parts. And when you exceed the maximum length, you get interesting new features, because the extra characters overflow into the adjacent data.
Then the program becomes more and more commercial. Management presses for the software to be easier for the user, so developers start to hide things. Want to use acoustic propagation models almost nobody uses? Put a minus before the material density. Want to write the mesh in a format whose nodes can be fed to some CFD code? Append another extension to the filename. So that's the thing.
Finally, the management hires some group to make a nice, user-friendly pre/post processing tool to make user not write input files by hand.
And 90% of features are gone.
The truth is that someone should carefully revise such software and convert the academic code into production-grade code. However, this scientific stuff is so complex that most companies who sell these programs do not like the idea. If I ask them about some custom routine, all I get is a big "we don't do that here!" kind of answer.
(Score: 1, Funny) by Anonymous Coward on Wednesday January 04, @08:57PM (2 children)
Don't worry! Some new grad students will train an AI to learn the inputs -> outputs and you can replace the whole shebang with a neural network.
(Score: 5, Funny) by turgid on Wednesday January 04, @09:03PM
He's done us all a public service and put it on GitHub, apparently. That'll confuse Microsoft's code scraping AI good and proper.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 2, Funny) by ShovelOperator1 on Wednesday January 04, @10:33PM
Been there, done that! And there are programs which do this too. They usually hook into the I/O of these antique FORTRAN solvers.
With current extremely flexible AI, I found it able to predict values in a periodic situation, based on a series of program runs. I run it on cycles 0-100, and it tells me what may happen at the 1,001st cycle.
Meanwhile some old 1970s engineer with a slipstick extrapolates two curves, tweaks a single calculation, and gets exactly the same results without a GPU :).
(Score: 4, Insightful) by sjames on Wednesday January 04, @09:09PM
You know it's going to be 'interesting' times when you see scientific software that still refers to the input as a 'deck'.
(Score: 1, Insightful) by Anonymous Coward on Wednesday January 04, @09:17PM (1 child)
Academic code is not meant to be "stable"; it has to be more fluid. At most you can plasticize it.
(Score: 2) by turgid on Thursday January 05, @07:48AM
Code is code, and it has to be "stable" or it's not usable. By "stable" I mean it must not crash, and it must be deterministic, producing exactly the same output each time it is run with a given set of inputs. Furthermore, it should be in some kind of test harness, so that changes can be made to the source to improve it (add new features, refactor, fix bugs) without too much risk of introducing new undetected bugs.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 3, Insightful) by hendrikboom on Wednesday January 04, @10:08PM
I wonder if an editor that provides only semantics-preserving program transformations would help.
I'm not advocating an automatic process, just an intelligent programmer using these transformations.
Note: there's a flow-control analysis algorithm used by optimizing compilers that breaks Fortran programs (yes, with all the GOTOs) into nested groups of statements. I studied it in the early 1970s. I wonder what it was called... (It doesn't always succeed, but it can be helpful to the extent that it does.) (Yes, optimizing compilers need to find structure in programs to be able to do what they do, too -- not just people.)
(Score: 2) by mcgrew on Wednesday January 04, @09:27PM
My 16-bit games are trivial, though, completely unlike particle physics, which is anything but. Anybody know of some good emulators?
And speaking of which, if the guy who wrote the original program is still alive and free of Alzheimer's, maybe he could jump in and save the day?
Carbon, The only element in the known universe to ever gain sentience
(Score: 2, Insightful) by hman on Thursday January 05, @09:01AM
So there is a clear path for young (or not-so-young) academics with different talents... you know, those who would struggle in conventionally published research but do have other talents.
Start working on the software infrastructure and PUBLISH it. If this stuff is so important, I bet they could get relevant publishers to jump on the bandwagon.
Not Nature, no.
But with time some smaller publication would become the Nature of scientific-software development, full of articles like "FORM: cleanup of the xxx routines and bug fixes in merge cluster yyy" by Some Particle Hacker Guy.
And they would get referrals: citations by later publications of the same kind, listed in prerequisites.
(Yes I know it isn't that easy)
(Score: 2) by PiMuNu on Thursday January 05, @09:07AM
> particle physics may be forced to slow down dramatically.
Yeah right.
(Score: 1) by clive_p on Thursday January 05, @11:19AM (1 child)
I think that converting it from Fortran to C was his big mistake, as C is a much more primitive language.
Converting it back to a modern version of Fortran, like Fortran 2018, would be a good move and would make it much more maintainable in the future.
(Score: 2) by maxwell demon on Thursday January 05, @02:46PM
It wasn't written in Fortran, but in FORTRAN. FORTRAN clearly was a more primitive language than C.
Now, the summary doesn't say when Vermaseren switched to C; it might have been before any good Fortran compilers were available (the first Fortran version being Fortran 90). Note also that it doesn't say he converted the old code to C; the statement could also mean that new additions to the code were written in C.
The Tao of math: The numbers you can count are not the real numbers.