
SoylentNews is people

posted by martyb on Wednesday September 21 2016, @02:29PM
from the carefully-choose-your-buckets dept.

My job was to examine blood lead data from our local Hurley Children's Hospital in Flint for spatial patterns, or neighborhood-level clusters of elevated levels, so we could quash the doubts of state officials and confirm our concerns. Unbeknownst to me, this research project would ultimately help blow the lid off the water crisis, vindicating months of activism and outcry by dedicated Flint residents.

As I ran the addresses through a precise parcel-level geocoding process and visually inspected individual blood lead levels, I was immediately struck by the disparity in the spatial pattern. It was obvious Flint children had become far more likely than out-county children to experience elevated blood lead when compared to two years prior.

How had the state so blatantly and callously disregarded such information? To me – a geographer trained extensively in geographic information science, or computer mapping – the answer was obvious upon hearing their unit of analysis: the ZIP code.

Their ZIP code data included people who appeared to live in Flint and receive Flint water but actually didn't, making the data much less accurate than it appeared.

ZIP codes – the bane of my existence as a geographer. They confused my childhood friends into believing they lived in an entirely different city. They add cachet to parts of our communities (think 90210) while generating skepticism toward others relegated to less sexy ZIP codes.

A tale to remind the scientists and technologists among us why it's important to do our jobs well.



 
  • (Score: 5, Informative) by AthanasiusKircher on Wednesday September 21 2016, @03:08PM

    This problem shouldn't be surprising to anyone who has any training in basic stats. ZIP codes are mostly arbitrary divisions. Yes, they're often roughly organized around municipal boundaries and such, but they may not track the variables you're actually looking at. In this case, you need a division that tracks "attached to city water" vs. "not attached to city water." ZIP codes didn't meet that criterion. Superimposing arbitrary divisions onto a pool of data can mask patterns in the data, or it can make patterns appear which aren't really there. It can even make trends in the data appear to reverse (known as Simpson's paradox [wikipedia.org]).
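
    To make the reversal concrete, here is a minimal sketch with invented counts (these are not Flint data; the area names and numbers are hypothetical). The elevated rate rises within each area between the two periods, yet the pooled rate falls, because the mix of who was tested shifts toward the low-rate area.

```python
# Invented counts for illustration only: (elevated, tested) per area and period.
counts = {
    "area_x": {"before": (10, 100), "after": (6, 50)},
    "area_y": {"before": (1, 50), "after": (4, 150)},
}

def rate(elevated, tested):
    """Fraction of tested children with an elevated result."""
    return elevated / tested

# Within each area, did the rate go up from "before" to "after"?
per_area_rose = {
    area: rate(*periods["after"]) > rate(*periods["before"])
    for area, periods in counts.items()
}

def pooled(period):
    """Rate for one period after lumping all areas together."""
    elevated = sum(p[period][0] for p in counts.values())
    tested = sum(p[period][1] for p in counts.values())
    return elevated / tested

pooled_before = pooled("before")  # 11 elevated out of 150 tested
pooled_after = pooled("after")    # 10 elevated out of 200 tested

print(per_area_rose)
print(f"pooled before: {pooled_before:.3f}, pooled after: {pooled_after:.3f}")
```

    Both areas get worse, but the pooled figure looks like an improvement. Which view is "right" depends on the question being asked, which is the point about choosing divisions deliberately.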

    There's a much broader lesson here than ZIP codes. If you're analyzing data, you need to be certain the way you're grouping it is meaningful to your analysis. Moreover, you should generally check for statistical artifacts by looking at patterns with and without divisions (or with different divisions) to check for robustness in correlations, but also in case your groupings are masking a broader pattern.
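
    A rough sketch of that robustness check, again on made-up records (the ZIP labels, the `on_city_water` flag, and all counts are assumptions for illustration): group the same data two ways and compare. Here one ZIP code mixes households on city water with households that are not, so the ZIP-level rate dilutes the signal that the water-source grouping shows.

```python
from collections import defaultdict

# Hypothetical records: (zip_label, on_city_water, elevated_blood_lead).
# "ZIP_A" straddles the city boundary; "ZIP_B" is entirely outside it.
records = (
    [("ZIP_A", True, True)] * 8 + [("ZIP_A", True, False)] * 92
    + [("ZIP_A", False, True)] * 2 + [("ZIP_A", False, False)] * 98
    + [("ZIP_B", False, True)] * 2 + [("ZIP_B", False, False)] * 98
)

def elevated_rate_by(rows, key):
    """Return {group: fraction of rows with an elevated result}."""
    totals = defaultdict(int)
    elevated = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        elevated[group] += row[2]  # True counts as 1, False as 0
    return {group: elevated[group] / totals[group] for group in totals}

by_zip = elevated_rate_by(records, key=lambda row: row[0])
by_water = elevated_rate_by(records, key=lambda row: row[1])

print("by ZIP:         ", by_zip)
print("by water source:", by_water)
```

    With these numbers the ZIP grouping reports 5% vs. 2%, while grouping by water source reports 8% vs. 2%. Same records, different buckets, noticeably different picture.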
