Scientific literature often mis-names genes and boffins say Microsoft Excel is partly to blame.
"Automatic conversion of gene symbols to dates and floating-point numbers is a problematic feature of Excel software," write Mark Ziemann, Yotam Eren and Assam El-Osta of the Baker IDI Heart & Diabetes Institute in Australia in a paper titled "Gene name errors are widespread in the scientific literature".
The things Excel does to gene names include changing "SEPT2", the name of a gene thought to have a role in proper formation of cell structure, to the date "2-Sep". The "MARCH1" gene becomes "1-Mar".
The paper notes that this is a problem that's been known for over a decade, but one which remains pervasive. The trio studied 35,175 Excel tables attached to 3,597 scientific papers published between 2005 and 2015 and found errors in "987 supplementary files from 704 published articles. Of the selected journals, the proportion of published articles with Excel files containing gene lists that are affected by gene name errors is 19.6 per cent."
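As a rough illustration of how such mangled entries could be spotted in a gene list, here is a minimal Python sketch. It is not the method used in the paper; the function name `find_date_like` and the pattern are assumptions made for this example, and it only catches the day-month form ("2-Sep", "1-Mar") quoted above, not the floating-point conversions.

```python
import re

# Month abbreviations as they appear in the mangled examples ("2-Sep", "1-Mar").
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches a one- or two-digit day, a hyphen, then a month abbreviation.
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)


def find_date_like(symbols):
    """Return the entries that look like Excel-style dates, not gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]


if __name__ == "__main__":
    column = ["SEPT2", "2-Sep", "TP53", "1-Mar", "MARCH1"]
    print(find_date_like(column))  # ['2-Sep', '1-Mar']
```

A check like this only flags suspects. It cannot recover the original symbol with certainty, so a flagged entry still needs to be compared against the source data.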
It's not hard to change the default format of Excel cells to avoid changes of this sort: you can get it done in a click or three. Much of the problem in these papers is therefore between scientists' ears, rather than within Excel itself. The paper's silent on why genetic scientists, who The Register will assume are not short of intelligence, have been making Excel errors for years.
This article focuses on errors resulting from auto-correction of gene names; certainly other subject areas have suffered from similarly 'helpful' software. What hilarious and/or cringe-worthy 'corrections' have YOU seen?
(Score: 2) by looorg on Friday August 26 2016, @11:12AM
I'm always amazed at all the different things people use Excel for. It's a great program for making some simple spreadsheets and stuff but it becomes horrible after that as far as I am concerned. Pages filled with formulas and generally all-around bad programming: referring to frames/cells, doing things to them and then just spitting out the data someplace else. That said, the program clearly has functions for it. So the problem isn't so much the program but the people.
That said, having a proper statistics software package isn't exactly any guarantee of proper stats usage. The most common fault I can think of is typing/scaling your data - everything just gets put down as being ratio or interval data even tho it's really nominal data. The program doesn't mind. The program doesn't care. The program doesn't know what your data is. It will happily compute it and do what you ask. But just what is the mean of male and female? or the average month of the year ... It's not the software package that makes or breaks the data - it's the idiot behind the keyboard. Users are just horribly sloppy with all their data and the processing of said data.
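To make the "mean of male and female" point concrete, here is a small Python sketch. The coding scheme (1 = male, 2 = female), the sample values and the helper name `summarise_nominal` are all made up for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding scheme for a nominal variable.
LABELS = {1: "male", 2: "female"}


def summarise_nominal(codes, labels):
    """Count how often each category occurs; the only sensible summary here."""
    return Counter(labels[c] for c in codes)


if __name__ == "__main__":
    sex_codes = [1, 2, 2, 1, 2]

    # The software will happily compute this, but the number means nothing:
    print(mean(sex_codes))  # 1.6

    # Frequencies are what nominal data actually supports:
    print(summarise_nominal(sex_codes, LABELS))
```

The mean comes out as 1.6, which is not a category anyone in the data belongs to. The counts (3 female, 2 male) are the summary that actually says something.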