Tool predicts how fast code will run on a chip:
[...] In [a] series of conference papers, the researchers describe a novel machine-learning pipeline that automates this process, making it easier, faster, and more accurate. In a paper presented at the International Conference on Machine Learning in June, the researchers presented Ithemal, a neural-network model that trains on labeled data in the form of “basic blocks” — fundamental snippets of computing instructions — to automatically predict how long it takes a given chip to execute previously unseen basic blocks. Results suggest Ithemal performs far more accurately than traditional hand-tuned models.
Then, at the November IEEE International Symposium on Workload Characterization, the researchers presented a benchmark suite of basic blocks from a variety of domains, including machine learning, compilers, cryptography, and graphics, that can be used to validate performance models. They pooled more than 300,000 of the profiled blocks into an open-source dataset called BHive. During their evaluations, Ithemal predicted how fast Intel chips would run code even better than a performance model built by Intel itself.
Ultimately, developers and compilers can use the tool to generate code that runs faster and more efficiently on an ever-growing number of diverse and “black box” chip designs. “Modern computer processors are opaque, horrendously complicated, and difficult to understand. It is also incredibly challenging to write computer code that executes as fast as possible for these processors,” says co-author on all three papers Michael Carbin, an assistant professor in the Department of Electrical Engineering and Computer Science (EECS) and a researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL). “This tool is a big step forward toward fully modeling the performance of these chips for improved efficiency.”
Most recently, in a paper presented at the NeurIPS conference in December, the team proposed a new technique to automatically generate compiler optimizations. Specifically, they automatically generate an algorithm, called Vemal, that converts certain code into vectors, which can be used for parallel computing. Vemal outperforms hand-crafted vectorization algorithms used in the LLVM compiler — a popular compiler used in the industry.
[...] “Intel’s documents are neither error-free nor complete, and Intel will omit certain things, because it’s proprietary,” says co-author on all three papers Charith Mendis, a graduate student in EECS and CSAIL. “However, when you use data, you don’t need to know the documentation. If there’s something hidden you can learn it directly from the data.”
[...] In training, the Ithemal model analyzes millions of automatically profiled basic blocks to learn exactly how different chip architectures will execute computation. Importantly, Ithemal takes raw text as input and does not require manually adding features to the input data. In testing, Ithemal can be fed previously unseen basic blocks and a given chip, and will generate a single number indicating how fast the chip will execute that code.
The researchers found Ithemal cut error rates (meaning the difference between the predicted speed and the real-world speed) by 50 percent over traditional hand-crafted models. Further, in their next paper, they showed that Ithemal’s error rate was 10 percent, while the Intel performance-prediction model’s error rate was 20 percent on a variety of basic blocks across multiple different domains.
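To make the error figures concrete, here is a minimal sketch of one way such an error rate could be computed: the mean absolute percentage error between predicted and measured timings. The choice of metric, the function name `mean_abs_pct_error`, and all the sample numbers are illustrative assumptions, not data or code from the papers.

```python
def mean_abs_pct_error(predicted, measured):
    """Average of |predicted - measured| / measured over all blocks."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    if not measured:
        raise ValueError("need at least one block")
    total = 0.0
    for p, m in zip(predicted, measured):
        if m <= 0:
            raise ValueError("measured timings must be positive")
        total += abs(p - m) / m
    return total / len(measured)


# Hypothetical cycle counts for four basic blocks (made-up numbers).
measured = [100.0, 200.0, 50.0, 400.0]
model_a = [110.0, 180.0, 55.0, 360.0]  # every prediction off by 10 percent
model_b = [120.0, 160.0, 60.0, 320.0]  # every prediction off by 20 percent

error_a = mean_abs_pct_error(model_a, measured)
error_b = mean_abs_pct_error(model_b, measured)
```

With these made-up inputs, the first model's error is half that of the second, which is the kind of relationship the quoted 10 percent versus 20 percent figures describe.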
(Score: 2) by ikanreed on Monday January 27, @02:50PM (2 children)
I'm pretty sure that one Alan Turing proved this was literally impossible almost a century ago.
There's not one mention of the halting problem in the summary or any of the papers. That seems like a bit of an oversight.
If it's purely heuristic, that's fine, but at least make a nod to the impossibility of certainty.
(Score: 1) by shrewdsheep on Monday January 27, @02:57PM
You are correct about the general problem of predicting the running time of arbitrary code. They are talking about "basic blocks", though. TL;DR: this probably means code without branches (I know the term "straight blocks" for this), for which prediction used to be very simple in the days of the M68k (look up the cycle count per instruction in your table) but is now difficult (caches, branch prediction, microcode, etc.).
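For illustration, the old table-lookup approach amounts to something like the sketch below. The cycle counts are made-up placeholders, not real M68k timings, and `CYCLES` and `estimate_cycles` are hypothetical names.

```python
# Hypothetical per-instruction cycle table.
# Placeholder values only, NOT real M68k timings.
CYCLES = {
    "move": 4,
    "add": 4,
    "mul": 70,
    "lsl": 8,
}


def estimate_cycles(block):
    """Sum the table cost of each instruction in a branch-free block."""
    total = 0
    for line in block:
        # The mnemonic is the first whitespace-separated token on the line.
        mnemonic = line.split()[0].lower()
        if mnemonic not in CYCLES:
            raise KeyError(f"no cycle count for {mnemonic!r}")
        total += CYCLES[mnemonic]
    return total


# A made-up straight-line block: no branches, so every line executes once.
block = ["move d0, d1", "add d1, d2", "mul d2, d3", "lsl d3, d3"]
estimate = estimate_cycles(block)
```

A plain sum like this assumes every instruction has a fixed cost and none overlap, which is exactly what stops being true once caches and the rest get involved.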
(Score: 2) by The Mighty Buzzard on Monday January 27, @03:00PM
Sounds like they're after "better than human" not perfection. Also, there's this [arxiv.org].
Me, I'm more interested in whether the optimizations result in the code not doing exactly what you told it to, causing bugs that are damned hard to troubleshoot.