
SoylentNews is people

posted by martyb on Wednesday September 21 2016, @02:29PM
from the carefully-choose-your-buckets dept.

My job was to examine blood lead data from our local Hurley Children's Hospital in Flint for spatial patterns, or neighborhood-level clusters of elevated levels, so we could quash the doubts of state officials and confirm our concerns. Unbeknownst to me, this research project would ultimately help blow the lid off the water crisis, vindicating months of activism and outcry by dedicated Flint residents.

As I ran the addresses through a precise parcel-level geocoding process and visually inspected individual blood lead levels, I was immediately struck by the disparity in the spatial pattern. It was obvious Flint children had become far more likely than out-county children to experience elevated blood lead when compared to two years prior.

How had the state so blatantly and callously disregarded such information? To me – a geographer trained extensively in geographic information science, or computer mapping – the answer was obvious upon hearing their unit of analysis: the ZIP code.

Their ZIP code data included people who appeared to live in Flint and receive Flint water but actually didn't, making the data much less accurate than it appeared.
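The dilution described above can be made concrete with a toy calculation. The counts below are invented for illustration and are not the hospital's data. A ZIP code that mixes children on Flint water with children who are not produces a single pooled rate that understates the rate among the children actually on Flint water.

```python
from dataclasses import dataclass

@dataclass
class Group:
    name: str
    tested: int
    elevated: int

    @property
    def rate(self) -> float:
        """Share of tested children with elevated blood lead in this group."""
        return self.elevated / self.tested

def pooled_rate(groups) -> float:
    """Rate obtained when all groups are lumped into one unit (e.g. one ZIP code)."""
    tested = sum(g.tested for g in groups)
    elevated = sum(g.elevated for g in groups)
    return elevated / tested

# Hypothetical ZIP code containing two different populations (illustrative numbers only).
zip_code = [
    Group("on city water", tested=600, elevated=60),
    Group("not on city water", tested=400, elevated=8),
]

for g in zip_code:
    print(f"{g.name}: {g.rate:.1%}")
print(f"ZIP-level: {pooled_rate(zip_code):.1%}")
```

With these made-up numbers the ZIP-level figure lands between the two group rates, so the signal from the affected group is diluted by the unaffected one.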

ZIP codes – the bane of my existence as a geographer. They confused my childhood friends into believing they lived in an entirely different city. They add cachet to parts of our communities (think 90210) while generating skepticism toward others relegated to less sexy ZIP codes.

A tale to remind the scientists and technologists among us why it's important to do our jobs well.



Related Stories

Michigan: Two Million Gallons of Untreated Sewage Spill Into Flint River

From MLive, Months after dire warnings, Flint spills 2 million gallons of raw sewage into river:

The city dumped an estimated 2 million gallons of untreated sewage into the Flint River Sunday, Aug. 18, just months after officials warned wastewater infrastructure was fast approaching a "critical point."

A partial report filed by the city with the state Department of Environment, Great Lakes and Energy on Tuesday, Aug. 20, says a "flash flood event" overflowed primary settling tanks at the city's wastewater treatment plant on Beecher Road, sending raw waste onto the ground and into a storm sewer drain that discharges directly to the river....

Earlier this year, the city sought a waiver from the Genesee County Health Department, requesting that it be allowed to skip testing river water for bacteria after sewage spills in cases in which the discharge comes from its retention basin.

  • (Score: 4, Insightful) by TheRaven on Wednesday September 21 2016, @02:43PM

    by TheRaven (270) on Wednesday September 21 2016, @02:43PM (#404790) Journal

    In the UK, we have post codes that are two letters, one or two digits, then one digit and two letters. That, in combination with a house number, is enough to identify every house in the country. The first two letters give you the nearest large city, and each subsequent set of letters or digits is a subdivision within that area. The system gives 502,673,600 different combinations (each digit can be 0-9, and the optional second digit can be 0-9 or omitted, giving 11 symbols), which is more than enough for everyone in the USA to have their own postal code. Adding an extra letter at the start would give 13,069,513,600 possible combinations, allowing for very sparse (more human-friendly) allocations.
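The combination counts in the comment can be checked directly. This sketch follows the comment's simplified model (two leading letters, a digit, an optional digit with 11 choices, and a final group of one digit and two letters; the order within that final group does not affect the count). Real UK postcodes restrict which letters may appear in some positions, so the number of valid codes is smaller.

```python
LETTERS = 26          # A-Z
DIGITS = 10           # 0-9
OPTIONAL_DIGIT = 11   # 0-9 or omitted

def combinations(leading_letters: int = 2) -> int:
    """Count codes shaped <letters><digit><optional digit> <digit><letter><letter>."""
    outward = LETTERS ** leading_letters * DIGITS * OPTIONAL_DIGIT
    inward = DIGITS * LETTERS ** 2
    return outward * inward

print(f"{combinations():,}")   # 502,673,600
print(f"{combinations(3):,}")  # 13,069,513,600
```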

    So why does the US ZIP code system suck so much?

    --
    sudo mod me up
    • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @03:00PM

      by Anonymous Coward on Wednesday September 21 2016, @03:00PM (#404795)
    • (Score: 5, Informative) by SunTzuWarmaster on Wednesday September 21 2016, @03:04PM

      by SunTzuWarmaster (3971) on Wednesday September 21 2016, @03:04PM (#404797)

      The US ZIP code system functions extremely well for its intended purpose: delivering mail to recipients. They are issued, reissued, and changed each year in order to streamline mail delivery. The problem comes when you use the Mail Delivery Code for segregating people into water zones, or "affluent neighborhoods", demographic characteristics, or other features which are _not_ mailing letters. Like every engineering tool, it has advantages, disadvantages, and limits.

      The original analyzers used postal mail codes for figuring out what kind of water people were drinking. Somewhat obviously (when you think about it), this is not an accurate representation.

      • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @05:08PM

        by Anonymous Coward on Wednesday September 21 2016, @05:08PM (#404846)

        > Code for segregating people into water zones, or "affluent neighborhoods", demographic characteristics, or other features which are _not_ mailing letters.

        As a data point, I live on top of a ridge. The neighborhood on top of the ridge is rich (houses valued at $400K to $3M) and lily white and we have our own special non-threatening olde-timey neighborhood watch signs that just look like historical markers unless you stop and read them. But 400 feet vertically below the ridge is the poorest (houses average about $30K) and blackest part of the entire city and I regularly hear gunshots fired down there. Our neighborhoods have exactly one very steep road connecting them but we share the same zip code.

      • (Score: 3, Informative) by DeathMonkey on Wednesday September 21 2016, @05:25PM

        by DeathMonkey (1380) on Wednesday September 21 2016, @05:25PM (#404851) Journal

        Yes, they should be using the proper tool for the job, which is the Public Land Survey System (PLSS) [nationalmap.gov].
         
        This is very standard practice so I'm not sure why they were using zipcodes in the first place.

        • (Score: 4, Insightful) by HiThere on Wednesday September 21 2016, @06:24PM

          by HiThere (866) Subscriber Badge on Wednesday September 21 2016, @06:24PM (#404872) Journal

          Zip codes are easy to get. Few people know, e.g., their census tract, much less their block-face. Water bills are sent out by zip code. Electric bills are sent out by zip code. Etc. Everyone knows their zip code. All the records include the zip code.

          A couple of decades or so ago I was involved with processing data collected during the 1960 and the 1980 censuses. They'd redrawn a bunch of the census tract boundaries in the interim, splitting a few, consolidating a couple, etc. The only data source with sufficient detail to reconcile the two data sets was the individual addresses, which were not available. So we did the best we could with 1960 census tracts and 1980 block faces... but much of the data wasn't available at the block-face level of detail, so we had to estimate a proportional correction. UGH! Not good. But ZIP codes split both the 1980 and the 1960 census tracts in different ways. (I don't believe they split any block faces, however... but city boundaries did.)
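One common way to make a proportional correction like the one described above is areal (or population) weighting: each old unit's count is split among the new units according to the share of it assumed to fall in each. The sketch below assumes those shares are already known; the unit names, counts, and shares are hypothetical, and whether this matches the commenter's actual procedure is not stated.

```python
from collections import defaultdict

def reallocate(source_counts, overlap_share):
    """Proportionally reallocate counts from old units to new units.

    source_counts: {old_unit: count}
    overlap_share: {(old_unit, new_unit): share of old_unit's count assigned to new_unit}
    Assumes the count is spread evenly within each old unit.
    """
    # Each old unit's shares must add up to exactly its whole count.
    totals = defaultdict(float)
    for (old, _new), share in overlap_share.items():
        totals[old] += share
    for old in source_counts:
        if abs(totals[old] - 1.0) > 1e-9:
            raise ValueError(f"shares for {old!r} sum to {totals[old]}, not 1")

    new_counts = defaultdict(float)
    for (old, new), share in overlap_share.items():
        new_counts[new] += source_counts[old] * share
    return dict(new_counts)

# Hypothetical: old tract "A" was split between new units "X" and "Y",
# and old tract "B" was folded entirely into "Y".
old = {"A": 1200, "B": 500}
shares = {("A", "X"): 0.75, ("A", "Y"): 0.25, ("B", "Y"): 1.0}
print(reallocate(old, shares))  # {'X': 900.0, 'Y': 800.0}
```

The even-spread assumption is exactly why such corrections are "not good": if the people in tract "A" cluster on one side of the new boundary, the estimate is off.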

          There are lots of different geographical units, and each is designed for a particular purpose. (Census tracts, e.g., play a part in gerrymandering, but they're supposed to contain "about" the same number of people. They don't always, but commercial areas tend to have larger census tracts, and rezoning will affect what happens the next time census boundaries are redrawn.) When you use a unit for a purpose it wasn't designed for, you should expect problems. For some reason people rarely do.

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
    • (Score: 4, Informative) by shipofgold on Wednesday September 21 2016, @03:10PM

      by shipofgold (4696) on Wednesday September 21 2016, @03:10PM (#404800)

      Zip codes do their intended purpose well. They were introduced to easily route mail through the US Postal Service to its intended destination.

      The first 3 digits of a ZIP code generally identify a large city or regional processing center. All mail with those same 3 digits goes to that center for further sorting. The last 2 digits identify the post office serving the destination, from which the postmen are dispatched.
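The digit structure described above can be sketched as a small parser that splits a ZIP code into the three-digit prefix and the last two digits, per the comment's description, and buckets codes by prefix. The function names are hypothetical, and the sketch only checks the shape of the string, not whether a code is actually assigned by the USPS.

```python
import re

# 5-digit ZIP with an optional "+4" extension, handled as text so leading zeros survive.
ZIP_RE = re.compile(r"([0-9]{3})([0-9]{2})(?:-([0-9]{4}))?")

def split_zip(zip_code: str) -> dict:
    """Split a ZIP or ZIP+4 string into prefix, last two digits, and optional +4."""
    m = ZIP_RE.fullmatch(zip_code.strip())
    if m is None:
        raise ValueError(f"not a ZIP or ZIP+4 code: {zip_code!r}")
    prefix, last_two, plus4 = m.groups()
    return {"prefix": prefix, "last_two": last_two, "plus4": plus4}

def group_by_prefix(zip_codes):
    """Bucket ZIP codes by their first three digits (the processing-center level)."""
    groups = {}
    for z in zip_codes:
        groups.setdefault(split_zip(z)["prefix"], []).append(z)
    return groups

print(split_zip("02139"))
```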

      The issue is that US post offices do not necessarily deliver based on geographic or political boundaries or any other non-postal factors. My understanding is that the postal routes served by a post office are designed so that a postman can deliver efficiently, without regard to whether he is crossing an arbitrary boundary.

      One good thing about ZIP codes is that they are not arbitrarily shaped (like many political boundaries), so they are good for sectioning a region into blocks, but if you expect any consistency in those blocks you are using a flawed methodology. In this case, they simply didn't map to the water distribution network, and I fault the researcher for not realizing this.

      All that being said, the US Postal Service can be frustrating. I live in a town with more than 100,000 people, but the USPS insists on saying I live in a town 5 miles away. Any reverse lookup of my zipcode will give the other town. My mail gets delivered regardless of the town written on the envelope as long as the zip code is correct.

      • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @10:23PM

        by Anonymous Coward on Wednesday September 21 2016, @10:23PM (#404935)

        You are close.

        There are several types of ZIP codes:
        Street service
        Mailbox service
        Mixed service
        Private service
        Hidden / special service. Sorting centers

        Examples: a high-rise in Chicago's inner Loop has a street-service ZIP code for the building, while service inside the building uses another ZIP code with plus-4 for the floors. Grainger has a private ZIP code, so if a typo produces that code, all the other information, such as the street name, is ignored. There are two ZIP codes in New York that differ only by swapping two digits; one is private and the other is street service, and if you make that mistake nothing will flag the error.

        Also, ZIP codes are NOT numbers but alphanumeric. There is a ZIP+4 with -SHOE as the last four characters. It was used for the shoe department in a New York store, and the USPS was paid for the marketing rights.
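Whatever the status of letter-based extensions, one practical consequence of ZIP codes not being numbers is that leading zeros are significant: some codes, such as 02139 mentioned elsewhere in this thread, begin with 0, and storing them in a numeric column silently drops it. A minimal repair step, assuming the stored values were originally 5-digit ZIPs, is to pad them back. The helper name is hypothetical.

```python
import re

def normalize_zip5(value) -> str:
    """Recover a 5-digit ZIP string from a value that may have been stored as a number."""
    s = str(value).strip()
    if not re.fullmatch(r"[0-9]{1,5}", s):
        raise ValueError(f"cannot interpret {value!r} as a 5-digit ZIP code")
    # Numeric storage drops leading zeros ("02139" -> 2139); pad them back.
    return s.zfill(5)

as_number = int("02139")          # what a numeric column does to the code
print(as_number)                  # 2139
print(normalize_zip5(as_number))  # 02139
```

Padding only works under the stated assumption; it cannot tell a stripped zero from a genuinely mistyped short code, so storing ZIP codes as text from the start is the safer design.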

    • (Score: 2) by FatPhil on Wednesday September 21 2016, @04:01PM

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Wednesday September 21 2016, @04:01PM (#404822) Homepage
      Do UK postal codes do a better job of identifying geographical regions, though?

      What about NW4's incursion right between HA0 and HA9 https://en.wikipedia.org/wiki/File:NW_postcode_area_map.svg
      Or N14's landgrab where EN2 or EN4 should more sensibly be? https://upload.wikimedia.org/wikipedia/commons/0/06/N_postcode_area_map.svg

      Comparing populations, assuming A-ZZ are used, and on average 25 numeric districts per letter prefix (25 is way less than L, M, S, etc. but more than a whole bunch, I admit to pulling that number out of my arse), the first half of a UK postcode covers almost an identical number of people, so these regions are comparable. Even assuming errors in the assumptions, the overlap in ranges of populations in US 5-digit or UK up-to-2-letter-up-to-2-digit regions is pretty large. So the usefulness of those codes for geographical purposes should also be comparable if what you say is correct.
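The comment's back-of-the-envelope comparison can be made explicit. Under its stated assumptions (every prefix from A to ZZ in use, and an average of 25 districts per prefix, which the commenter admits is a guess), the number of UK districts follows directly. Population figures are deliberately left as inputs to `people_per_unit` (a hypothetical helper) rather than filled in here.

```python
def letter_prefixes(max_letters: int = 2, alphabet: int = 26) -> int:
    """Number of letter prefixes from A to ZZ (one or two letters)."""
    return sum(alphabet ** k for k in range(1, max_letters + 1))

def uk_districts(avg_districts_per_prefix: int = 25) -> int:
    """Outward-code districts under the comment's assumptions."""
    return letter_prefixes() * avg_districts_per_prefix

US_ZIP5_CAPACITY = 10 ** 5  # upper bound on 5-digit codes; not all are in use

def people_per_unit(population: float, units: int) -> float:
    """Average population per code region; supply population from a real source."""
    return population / units

print(letter_prefixes(), uk_districts(), US_ZIP5_CAPACITY)  # 702 17550 100000
```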

      The fact that the UK has an extra digit and two extra letters is irrelevant - the US has "+4", namely four extra digits, theoretically too.
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
      • (Score: 2) by TheRaven on Wednesday September 21 2016, @08:00PM

        by TheRaven (270) on Wednesday September 21 2016, @08:00PM (#404900) Journal

        What about NW4's incursion right between HA0 and HA9

        The first three to four characters don't give much fine-grained identification. The leading letters typically identify a city and nearby towns and villages. The full postcode, however, typically narrows it down to a single street. The combination of house/flat number and postcode uniquely identifies a property (some large buildings have their own postcodes so that they won't need new ones allocated if they're later subdivided). Compare, for example, Cambridge and Cambridge, Massachusetts. The former is identified by the CB prefix and then has a large number of subdivisions that narrow it down to individual streets (or parts of streets, in some cases). The latter has 5 ZIP codes for the entire city. ZIP+4 apparently would give similar fidelity, but was never widely adopted.
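The nested levels described above (area, district, sector, and the full unit postcode) can be sketched with a simplified pattern. The regular expression is an assumption that accepts common shapes such as "CB2 1XY" or "SW1A 1AA" without enforcing every official rule, and the example postcode is illustrative rather than a known address.

```python
import re

# Simplified shape: area (1-2 letters), rest of district (digit + optional letter/digit),
# then a sector digit and two unit letters. Official rules are stricter than this.
POSTCODE_RE = re.compile(r"([A-Z]{1,2})([0-9][A-Z0-9]?) ?([0-9])([A-Z]{2})")

def postcode_levels(postcode: str) -> dict:
    """Break a UK postcode into its nested levels, from coarsest to finest."""
    m = POSTCODE_RE.fullmatch(postcode.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised postcode: {postcode!r}")
    area, district_rest, sector_digit, unit_letters = m.groups()
    district = area + district_rest
    return {
        "area": area,
        "district": district,
        "sector": f"{district} {sector_digit}",
        "unit": f"{district} {sector_digit}{unit_letters}",
    }

print(postcode_levels("CB2 1XY"))
```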

        --
        sudo mod me up
        • (Score: 2) by eof on Thursday September 22 2016, @05:54AM

          by eof (5559) on Thursday September 22 2016, @05:54AM (#405040)

          I don't know what you mean by zip+4 "was never widely adopted" since it is in use now. Certainly businesses use it when writing, whether or not you provide them with the information. There is a lot of variation with individuals using it, but I've always known mine.

  • (Score: 2) by EQ on Wednesday September 21 2016, @02:44PM

    by EQ (1716) on Wednesday September 21 2016, @02:44PM (#404791)

    Why not switch to a better geocoding system? The ZIP code was designed nearly half a century ago, surely we have something better by now.

    • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @03:13PM

      by Anonymous Coward on Wednesday September 21 2016, @03:13PM (#404804)

      I wonder if the USPS would (correctly) deliver a letter with only GPS coordinates.

      • (Score: 2) by bob_super on Wednesday September 21 2016, @04:49PM

        by bob_super (1357) on Wednesday September 21 2016, @04:49PM (#404839)

        I remember an old story of a letter delivered to (translated) "Bum usually near the church's left door".
        Some postal workers actually like a challenge.

        • (Score: 3, Interesting) by HiThere on Wednesday September 21 2016, @06:27PM

          by HiThere (866) Subscriber Badge on Wednesday September 21 2016, @06:27PM (#404875) Journal

          That was once true. The post office has changed a lot since then, though, and I suspect that these days there'd be not the slightest attempt to deliver...it would probably never get near the local carrier.

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
          • (Score: 2) by bob_super on Wednesday September 21 2016, @06:44PM

            by bob_super (1357) on Wednesday September 21 2016, @06:44PM (#404883)

            True. The public services are under obligation to be efficient. No wasting time with serving outliers.
            The competitors trying to kill USPS even made sure to pay for a law forcing it into being profitable and fully financing pensions, which would take care of any frivolous notions of being an equal service unifying the country.

          • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @09:46PM

            by Anonymous Coward on Wednesday September 21 2016, @09:46PM (#404925)

            That was once true. The post office has changed a lot since then, though, and I suspect that these days there'd be not the slightest attempt to deliver...it would probably never get near the local carrier.

            True, that. These days even properly addressed mail sent to a major metropolitan area will get returned as "unknown addressee" by the knuckle-scrapers at the local post office. It has happened to me at least a couple of times over the last couple of years. The only thing they seem capable of delivering without fail is unsolicited advertising. Gee, I wonder why that might be? My best guess is that it's because they know who their real customers are. Hint: it ain't yer Mom sending you a birthday card.

            • (Score: 2) by goody on Wednesday September 21 2016, @11:59PM

              by goody (2135) on Wednesday September 21 2016, @11:59PM (#404955)

              The only thing they seem capable of delivering without fail is unsolicited advertising. Gee, I wonder why that might be?

              Probably because bulk mailers validate all of their recipient addresses with the USPS address verification/correction systems prior to mailing?

              • (Score: 2) by urza9814 on Tuesday September 27 2016, @12:26AM

                by urza9814 (3954) on Tuesday September 27 2016, @12:26AM (#406788) Journal

                Probably because bulk mailers validate all of their recipient addresses with the USPS address verification/correction systems prior to mailing?

                Or because they contract with USPS to ensure they get delivered no matter what.

                Took me six months to get unsubscribed from RedPlum's ad mailers. First they were coming to me at my address, so I unsubscribed. Then they started coming to 'Resident' at my current address, so I unsubscribed again. Guess what they did next? They removed my address (so I can't even attempt to unsubscribe anymore), but still deliver them. Literally the address field is blank, there is no address anywhere on the thing at all, but it still gets delivered consistently. Bulk mailers contract with the USPS to just put one in every box in a specific area or served by specific post offices, so they don't have to get the address right, or give any address at all. If they DO give an address, it's probably just to hide how they're actually getting the things delivered. That's also why bulk mailers are the one thing that's extremely common to find delivered to the 'wrong' address (if you bother to check) -- the mail carrier knows it doesn't matter, they're just supposed to stuff one in every box regardless, so what does it matter if you get your neighbors'?

    • (Score: 5, Interesting) by AthanasiusKircher on Wednesday September 21 2016, @03:28PM

      by AthanasiusKircher (5291) on Wednesday September 21 2016, @03:28PM (#404814) Journal

      Why not switch to a better geocoding system? The ZIP code was designed nearly half a century ago, surely we have something better by now.

      ZIP codes were designed for mail delivery. They still function reasonably well for that. The problem is that they're also a convenient data point that anyone who has your address also knows, and thus they tend to be misappropriated for all sorts of uses they aren't designed for. (Another example beyond this article -- there have been instances where towns outside a major city have been assigned the same first three numbers as the ZIP code of the city itself, resulting in insurance companies charging residents higher "city rates" just because of the numerical similarity in ZIP codes.)

      But what's "better"? For statistical analysis, you need something that makes sense for your data. Different types of geographical divisions might be suitable for different applications.

      From the article:

      More useful are units such as census block groups, wards, planning districts or municipal designations for neighborhoods within a city. Each of these adhere to some temporally consistent, spatially bounded definition, and can more appropriately be used to understand how one neighborhood varies compared to another.

      Any or all of these might be appropriate depending on your particular data and application. And ZIP codes might be useful sometimes too. The problem isn't the existence of ZIP codes or their usefulness (since they are still useful), but rather the fact that everyone easily can find your ZIP code and thus use it as a proxy for some more meaningful geographical division, when another division might be more appropriate in that specific application.

      Postal addresses (with ZIP codes) are a system already in place for locating people, buildings, etc. to a specific address. Probably what we need are better and more accessible converters to make it easy for those doing data analysis to convert those postal addresses (generally easiest to obtain) into some of the other types of divisions mentioned in the article (and others as well). Presumably some stuff like this may already exist but isn't used widely enough for some reason.
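A minimal sketch of such a converter, assuming the addresses have already been geocoded to planar coordinates (the geocoding step itself is not shown), is a point-in-polygon lookup against the boundaries of whatever division fits the analysis, such as a water service area. The polygons and names below are hypothetical; real work would load boundaries from a GIS dataset and would typically use a GIS library rather than this hand-rolled even-odd ray-casting test, which does not treat points lying exactly on an edge consistently.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]

def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_unit(pt: Point, units: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the first unit whose boundary contains pt, or None."""
    for name, boundary in units.items():
        if point_in_polygon(pt, boundary):
            return name
    return None

# Hypothetical service-area boundaries in planar coordinates.
units = {
    "city water": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "well water": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(assign_unit((3, 4), units))   # city water
print(assign_unit((15, 5), units))  # well water
```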

    • (Score: 2, Informative) by Anonymous Coward on Wednesday September 21 2016, @03:33PM

      by Anonymous Coward on Wednesday September 21 2016, @03:33PM (#404816)

      ZIP codes are not a geocoding system, and were not meant to be. ZIP codes are a postal distribution coding system. The important part in designing ZIP codes is which post office the letter is delivered to. The ZIP code is sort of equivalent to the IP address of a computer.

      Both ZIP codes and IP addresses are of course related to the geographical position of the destination address, but not the same. The post office that delivers your mail need not be the one that is geographically closest to you, nor does your internet traffic necessarily go through the router closest to you.

      Asking to replace ZIP codes with a geocoding system is just as wrong as asking to replace IP addresses by a geocoding system. Both are designed so that your packets arrive at the correct destination, not to determine where you are.

    • (Score: 2) by richtopia on Wednesday September 21 2016, @05:43PM

      by richtopia (3160) on Wednesday September 21 2016, @05:43PM (#404858) Homepage Journal

      Everyone knows their zip code, and most people consider it anonymous enough for surveys of sensitive data. I haven't looked too closely at the blood lead level study but I imagine it asked the users their zip for location. Beyond using their addresses I doubt there is a better commonly used system available.

    • (Score: 1) by nitehawk214 on Wednesday September 21 2016, @08:37PM

      by nitehawk214 (1304) on Wednesday September 21 2016, @08:37PM (#404911)

      The same reason we do not switch to something better than Social Security numbers.

      There is such a colossal amount of momentum behind the existing system it is nearly impossible to change.

      And for most things, zip codes work fine.

      --
      "Don't you ever miss the days when you used to be nostalgic?" -Loiosh
  • (Score: 5, Informative) by AthanasiusKircher on Wednesday September 21 2016, @03:08PM

    by AthanasiusKircher (5291) on Wednesday September 21 2016, @03:08PM (#404798) Journal

    This problem shouldn't be surprising to anyone who has any training in basic stats. ZIP codes are mostly arbitrary divisions. Yes, they're often roughly organized around municipal divisions and such, but that may not always track with the variables you're actually looking at. In this case, you need a division that tracks "attached to city water" vs. "not attached to city water." ZIP codes didn't meet that criterion. Superimposing arbitrary divisions onto a pool of data can mask patterns in the data, or it can make patterns appear which aren't really there. Or it can even make trends in data apparently reverse (known as Simpson's paradox [wikipedia.org]).

    There's a much broader lesson here than ZIP codes. If you're analyzing data, you need to be certain the way you're grouping it is meaningful to your analysis. Moreover, you should generally check for statistical artifacts by looking at patterns with and without divisions (or with different divisions) to check for robustness in correlations, but also in case your groupings are masking a broader pattern.
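The reversal mentioned above is easy to reproduce with made-up counts, and doing so is exactly the "with and without divisions" check the comment recommends. In this sketch group A has the higher rate within each of two subgroups, yet the lower rate once the subgroups are pooled, because the two groups are distributed very differently across the subgroups. All numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical (events, total) counts for groups A and B within two subgroups.
DATA = {
    "subgroup 1": {"A": (90, 100), "B": (800, 1000)},
    "subgroup 2": {"A": (300, 1000), "B": (20, 100)},
}

def rate(events: int, total: int) -> Fraction:
    return Fraction(events, total)

def pooled_rate(data, group: str) -> Fraction:
    """Rate for one group after ignoring the subgroup division entirely."""
    events = sum(counts[group][0] for counts in data.values())
    total = sum(counts[group][1] for counts in data.values())
    return rate(events, total)

for name, counts in DATA.items():
    a, b = rate(*counts["A"]), rate(*counts["B"])
    print(f"{name}: A={float(a):.2f}  B={float(b):.2f}")
print(f"pooled: A={float(pooled_rate(DATA, 'A')):.2f}  B={float(pooled_rate(DATA, 'B')):.2f}")
```

Exact fractions are used so the comparison is not affected by floating-point rounding.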

  • (Score: 2, Insightful) by Anonymous Coward on Wednesday September 21 2016, @04:00PM

    by Anonymous Coward on Wednesday September 21 2016, @04:00PM (#404821)

    System designed to aid mail delivery fails at quantifying water delivery. Details at 11.

  • (Score: 3, Insightful) by opinionated_science on Wednesday September 21 2016, @04:56PM

    by opinionated_science (4031) on Wednesday September 21 2016, @04:56PM (#404841)

    Not trying to be obtuse, but I would think using Voronoi polyhedra to describe location would be ideal. There might need to be several layers, but it would completely map any territory and allow all sorts of neat calculations...
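Assigning a location to a Voronoi cell does not require building the cells themselves: a point belongs to the cell of its nearest seed. A minimal sketch, assuming Euclidean distance on planar coordinates and hypothetical seed points (in two dimensions the cells are polygons rather than polyhedra):

```python
import math

def voronoi_cell(pt, seeds):
    """Return the name of the seed nearest to pt, i.e. the Voronoi cell containing pt.

    Ties (points equidistant from two seeds, on a cell boundary) go to whichever
    seed comes first in the dict.
    """
    return min(seeds, key=lambda name: math.dist(pt, seeds[name]))

# Hypothetical seed points in planar coordinates.
seeds = {"north": (0.0, 10.0), "south": (0.0, 0.0), "east": (10.0, 5.0)}
print(voronoi_cell((1.0, 2.0), seeds))
```

Because every point has a nearest seed, the cells cover the whole territory without gaps or overlaps, which is the property the comment is after.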

    Please return to your normally scheduled discussion, with appropriate mathematics!

  • (Score: 0) by Anonymous Coward on Thursday September 22 2016, @12:00AM

    by Anonymous Coward on Thursday September 22 2016, @12:00AM (#404957)

    Zone Improvement Plan
    02139... it's ok to be jealous.