SoylentNews is people

posted by janrinok on Friday August 26 2016, @08:13AM
from the not-so-bright-scientists dept.

Scientific literature often mis-names genes and boffins say Microsoft Excel is partly to blame.

"Automatic conversion of gene symbols to dates and floating-point numbers is a problematic feature of Excel software," write Mark Ziemann, Yotam Eren and Assam El-Osta of the Baker IDI Heart & Diabetes Institute in Australia in a paper titled Gene name errors are widespread in the scientific literature.

Among the things Excel does to gene names is changing "SEPT2", the name of a gene thought to have a role in proper formation of cell structure, to the date "2-Sep". The "MARCH1" gene becomes "1-Mar".
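
As a rough illustration of the conversions described above, the following Python sketch flags gene symbols that a spreadsheet is likely to reinterpret as a date or as a number in scientific notation. The month-prefix list, the digits-E-digits pattern and the function names `looks_like_date` and `looks_like_float` are assumptions made for this example; they are not taken from the paper, and Excel's real parsing rules are broader than this.

```python
import re

# Assumed list of month-like prefixes, for illustration only.
# This is NOT Excel's actual parsing logic.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# Longest prefixes first so "MARCH1" is tried as MARCH + 1 before MAR + ...
_prefix_alternation = "|".join(sorted(MONTH_PREFIXES, key=len, reverse=True))
DATE_LIKE = re.compile(rf"^({_prefix_alternation})-?(\d{{1,2}})$", re.IGNORECASE)

# Assumed shape of a symbol that could be read as scientific notation.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol is a month-like prefix followed by a day number."""
    match = DATE_LIKE.match(symbol.strip())
    return match is not None and 1 <= int(match.group(2)) <= 31


def looks_like_float(symbol: str) -> bool:
    """Return True if the symbol has the form digits, 'E', digits."""
    return FLOAT_LIKE.match(symbol.strip()) is not None


if __name__ == "__main__":
    for symbol in ["SEPT2", "MARCH1", "TP53"]:
        print(symbol, looks_like_date(symbol) or looks_like_float(symbol))
```

Run on the two examples from the article plus an ordinary symbol, the check flags "SEPT2" and "MARCH1" and leaves "TP53" alone. A list screened this way could be reviewed before it ever goes near a spreadsheet.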

The paper notes that this is a problem that's been known for over a decade, but one which remains pervasive. The trio studied 35,175 Excel tables attached to 3,597 scientific papers published between 2005 and 2015 and found errors in "987 supplementary files from 704 published articles. Of the selected journals, the proportion of published articles with Excel files containing gene lists that are affected by gene name errors is 19.6 per cent."

It's not hard to change the default format of an Excel cell to avoid changes of this sort: you can get it done in a click or three. Much of the problem in these papers is therefore between scientists' ears, rather than within Excel itself. The paper's silent on why genetic scientists, who The Register will assume are not short of intelligence, have been making Excel errors for years.

This article focuses on errors resulting from auto-correction of gene names; certainly other subject areas have suffered from similarly 'helpful' software. What hilarious and/or cringe-worthy 'corrections' have YOU seen?


  • (Score: 5, Insightful) by pTamok (3042) on Friday August 26 2016, @09:57AM (#393419)

    It seems each generation is condemned to making the same mistakes.

    When I was doing scientific programming three decades ago, it was very, very clear that data should be separate from programs. Data was input using text files, the format of which was documented extremely precisely, and your program read that file putting variables and constants into the correct structures in the program. Room for ambiguity was ruthlessly removed. This was when the language of choice for scientific programming was FORTRAN-77, not C. You validated your programs with test data files designed to check edge cases and oddities. You also scrubbed your data, looking for typos.
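
    A minimal sketch of that kind of strict input checking, in Python rather than FORTRAN-77. The two-column, tab-separated format (gene symbol, numeric value) and the names `SYMBOL` and `parse_line` are assumptions made for illustration, not the poster's actual code or file format.

```python
import re
from typing import Tuple

# Assumed rule: a symbol starts with a letter and contains only
# letters, digits and hyphens. Real nomenclature rules may differ.
SYMBOL = re.compile(r"^[A-Za-z][A-Za-z0-9-]*$")


def parse_line(line: str, lineno: int) -> Tuple[str, float]:
    """Parse one 'symbol<TAB>value' record, rejecting anything unexpected."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        raise ValueError(
            f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
        )
    symbol, raw_value = fields
    if not SYMBOL.match(symbol):
        # A mangled name such as "2-Sep" starts with a digit and is rejected here.
        raise ValueError(f"line {lineno}: {symbol!r} is not a valid symbol")
    try:
        value = float(raw_value)
    except ValueError:
        raise ValueError(f"line {lineno}: {raw_value!r} is not a number") from None
    return symbol, value


if __name__ == "__main__":
    print(parse_line("SEPT2\t1.5\n", 1))
```

    Under these assumed rules a record such as "2-Sep" followed by a value fails loudly with its line number instead of passing silently into the analysis.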

    In my view Excel is not a suitable tool for serious data analysis. Better, more rigorous, testable tools exist. Working with Excel is like working with someone with sloppy bench habits in a lab: not cleaning up after spills, by-passing safety procedures, not keeping an accurate log-book. You can get away with that for a while, but eventually something blows up in your face - sometimes literally. If you can't be bothered to use the right tool for the job, you shouldn't be doing the job. It is a bad workman who blames his tools. The problems are not Excel's fault, but the fault of whoever chose Excel as the inadequate tool.

  • (Score: 0) by Anonymous Coward on Friday August 26 2016, @10:41AM (#393426)

    Just because the tool might not be "proper" for the job in that one case, does not mean that the tool can't have the same failure in other, more proper, jobs.