
SoylentNews is people

posted by n1 on Saturday June 10 2017, @11:07AM
from the i-am-spartacus dept.

Software engineers go crazy for the most ridiculous things. We like to think that we're hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy — bouncing from one person's Hacker News comment to another's blog post until, in a stupor, we float helplessly toward the brightest light and lay prone in front of it, oblivious to what we were looking for in the first place.

This is not how rational people make decisions, but it is how software engineers decide to use MapReduce.

As Joe Hellerstein sideranted to his undergrad databases class (54 min in):

The thing is there's like 5 companies in the world that run jobs that big. For everybody else... you're doing all this I/O for fault tolerance that you didn't really need. People got kinda Google mania in the 2000s: "we'll do everything the way Google does because we also run the world's largest internet data service" [tilts head sideways and waits for laughter]

Having more fault tolerance than you need might sound fine, but consider the cost: not only would you be doing much more I/O, you might be switching from a mature system—with stuff like transactions, indexes, and query optimizers—to something relatively threadbare. What a major step backwards. How many Hadoop users make these tradeoffs consciously? How many of those users make these tradeoffs wisely?

Source: https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb


  • (Score: 3, Insightful) by Nerdfest on Saturday June 10 2017, @11:13AM (8 children)

    by Nerdfest (80) on Saturday June 10 2017, @11:13AM (#523460)

    Well, over-engineering for expected growth is against the "XP" sort of idea that you should "do the simplest thing that could possibly work", but usually not designing for potential growth or future changes bites you in the ass in one way or another. It's always a call that needs to be made based on experience, but I think every time I've over-engineered something based on what I expect to happen in the future, it's paid off in a huge way. As with everything, it's cost/benefit, or at least potential cost and potential benefit.

    • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @11:25AM

      by Anonymous Coward on Saturday June 10 2017, @11:25AM (#523464)

      Maybe it's about predicting where the project will be in n years and deciding to make certain subsystems expandable while making others KISS-adherent.

    • (Score: 3, Insightful) by Bobs on Saturday June 10 2017, @12:08PM (3 children)

      by Bobs (1462) on Saturday June 10 2017, @12:08PM (#523469)

      Yeah. I have found basing things on a fundamentally scalable architecture then KISS-ing the hell out of it usually works well.
      You can usually buy time to fix and upgrade individual subsystems as long as you don't have to rip out and rebuild the whole system.

      Balanced against

      "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." - Knuth

      • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @06:23PM (2 children)

        by Anonymous Coward on Saturday June 10 2017, @06:23PM (#523541)

        I have found basing things on a fundamentally scalable architecture then KISS-ing the hell out of it usually works well.

        1) That's fine if you're basing on OSS or free stuff. Then your scaling considerations don't need to include stuff like software licensing ;). Whereas if you base on non-free stuff, you may find it starts costing a lot more to scale.

        2) It's also fine if you're not basing it on really crappy stuff. e.g. MySQL and/or PHP. Yes I know Facebook managed despite that but I'm sure it was quite painful.

        • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @07:14PM

          by Anonymous Coward on Saturday June 10 2017, @07:14PM (#523559)

          2) It's also fine if you're not basing it on really crappy stuff. e.g. MySQL and/or PHP. Yes I know Facebook managed despite that but I'm sure it was quite painful.

          Just like you are not Google, you are also not Facebook.

        • (Score: 2) by Nerdfest on Saturday June 10 2017, @09:08PM

          by Nerdfest (80) on Saturday June 10 2017, @09:08PM (#523587)

          That's pretty funny, I was going to pick Facebook as a bad example as well. They overcame it, but it took quite a lot of effort. As someone else mentioned, imagine if they used proprietary tech. I think StackOverflow did for software (and I'm not sure how much they get boned on licences), but they went cheap on hardware, which was good. I've worked for places where they "buy" lameframe processing from IBM and what would run on a few PCs costs millions.

    • (Score: 2) by Runaway1956 on Saturday June 10 2017, @02:56PM (1 child)

      by Runaway1956 (2926) Subscriber Badge on Saturday June 10 2017, @02:56PM (#523495) Journal

      "It's always a call that needs to be done based on experience,"

      There's a term for that - "judgement". Some people are qualified to make a judgement call, others are not. Personally, I've always "over engineered", but my work is in the more concrete real world. Yes, you can overdo it, but even when watching costs, I've always engineered a little above expectations. Even if I'm only building a doghouse, I want to make it sturdy enough that my dog doesn't end up in the Land of Oz. (Yeah, my dog is an Australian sheep dog, but I like her, and the Ozzies can't have her back!)

      That doesn't mean I build $10,000 dog houses. But, I take the time to drive some posts into the ground, build a frame on those posts, then close it and roof it. I've never bricked a doghouse, but that would be pretty cool . . .

      Point is, you can build to higher standards than the people around you, without breaking the bank. And, that's where judgement comes in.

      • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @04:47PM

        by Anonymous Coward on Saturday June 10 2017, @04:47PM (#523512)

        You should have built your doghouse like Google does. Less stability, more big data. I almost modded this "Offtopic", but then I saw who it was, and asked myself, "What would Google do?"

    • (Score: 2) by sjames on Saturday June 10 2017, @07:04PM

      by sjames (2882) on Saturday June 10 2017, @07:04PM (#523553) Journal

      There is design for expansion, there's prudent over-engineering, and then there's shooting a gnat with an elephant gun.

  • (Score: 3, Interesting) by goodie on Saturday June 10 2017, @11:55AM

    by goodie (1877) on Saturday June 10 2017, @11:55AM (#523466) Journal

    This is what I have been teaching my students for years. You most likely don't need it if people talk a lot about it like the "next big thing". I make them research and try these technologies and find actual case studies to show how it does apply, but only in very specific settings.

  • (Score: 4, Interesting) by The Mighty Buzzard on Saturday June 10 2017, @12:56PM (4 children)

    Technically, we're greatly over-engineered from both a code and a network perspective. It's been working out pretty well for both us and the community though.

    From a network perspective, we could probably get by on like three servers. I, however, quite enjoy the lack of downtime, separation of critical services from non-critical services, a dedicated development box, redundancy, and an off-site backup box.

    From a code perspective we're slightly slower than we could be at the moment but it's saved us I don't know how many hours of having to optimize/rewrite/etc... already and I expect it to continue doing so as growth continues. I don't know if we'll ever outgrow the codebase to the extent that it requires us moving to something like fcgi or compiled executables instead of mod_perl and I enjoy not having "scrap everything and rewrite it" pressure.

    --
    My rights don't end where your fear begins.
    • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @06:45PM (3 children)

      by Anonymous Coward on Saturday June 10 2017, @06:45PM (#523547)

      Servers are the network? Do you have a Microsoft background prior to programming? This is a serious question -- only server people generally think the server is the network.

      It's like truck drivers claiming trucks are the highway system.

      • (Score: 1) by khallow on Saturday June 10 2017, @09:15PM

        by khallow (3766) Subscriber Badge on Saturday June 10 2017, @09:15PM (#523590) Journal
        It's good that you blew that comment out of proportion. I was starting to think that maybe TMB was smart enough to actually manage a network. That way lies madness. I'd be jumping off bridges in a chicken costume next. But you pointed out the grievous flaw in TMB's post, that servers aren't networks. Whew! You saved me from the bridges. My gratitude will know no bounds.
      • (Score: 2) by The Mighty Buzzard on Saturday June 10 2017, @10:24PM (1 child)

        Only cable jockeys think the network does not include the hardware attached to it.

        --
        My rights don't end where your fear begins.
        • (Score: 0) by Anonymous Coward on Sunday June 11 2017, @12:53AM

          by Anonymous Coward on Sunday June 11 2017, @12:53AM (#523631)

          As much as TMB is a boob at times, he makes a valid point that a network is worthless without servers on it. In a way, the servers are the purpose for the network and can -by extension- be equated with it for all intents and purposes. After all, that'd be a nice network you got going there, without any servers on it.

  • (Score: 2, Interesting) by isj on Saturday June 10 2017, @03:02PM (1 child)

    by isj (5249) on Saturday June 10 2017, @03:02PM (#523497) Homepage

    But I still feel the lure of new interesting systems and technologies. I'm just suffering from the illusion that my experience makes me able to step back and think about it.

    The blog post is spot on.

    In a project I was fooled into using nosql databases (couchdb and riak in this case). It turned out to be a bad idea because most of the data was not easily sharded or would give very unbalanced shards. Because of that, and because the lack of referential integrity meant the application would have to deal with inconsistencies, I ended up scrapping it for a more traditional rdbms for that part of the data. I was able to do so because I'm the main developer and it's a small company. Some of the data stayed in riak because it was easily sharded. It wasn't fun anyway because the minimum installation requires 3 instances and sometimes it didn't recover from rebalancing. Another developer was struggling with the data retrieval and after months ended up with something that mostly worked (he wasn't as experienced as me so I don't blame him). I ended up scrapping that part too and replacing it with flat text files (for restore and later analysis) and pushing aggregates into a fancy statistics system, grafana+opentsdb, so the users can see graphs, which is really what they needed.
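
    A minimal sketch of how a shard key can produce very unbalanced shards, for readers who haven't run into it. Everything here is an assumption for illustration: the key names, the 900/100 split, the three shards, and the helper functions `shard_for` and `shard_sizes` are hypothetical, and this is plain hash partitioning rather than what couchdb or riak actually do internally.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_sizes(keys, num_shards: int) -> list:
    """Count how many records land on each shard."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return [counts.get(i, 0) for i in range(num_shards)]


# Assumed workload: one "hot" key owns 90% of the records.
keys = ["tenant-big"] * 900 + [f"tenant-{i}" for i in range(100)]
sizes = shard_sizes(keys, 3)

# Every record with the hot key hashes to the same shard, so that
# shard holds at least 900 of the 1000 records however many shards exist.
print(sizes)
```

    Adding more shards does not help in this situation, since all records that share a key always hash to the same place.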

    Yes, I know that OpenTSDB uses HBase which is essentially Hadoop. I'm fine with that because we only use one instance and I don't have to deal with the actual read+write to that.

    I think a useful mindset is: All systems are shit in some area. Choose the one that is least shitty.

    • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @05:52PM

      by Anonymous Coward on Saturday June 10 2017, @05:52PM (#523531)

      I ended up scrapping it for a more traditional rdbms

      I've been telling people this for years:
      There are two sets of people: those who think they need a NoSQL database and are wrong on one side, and on the other side those that use an RDBMS.

  • (Score: 2) by YeaWhatevs on Saturday June 10 2017, @04:55PM (1 child)

    by YeaWhatevs (5623) on Saturday June 10 2017, @04:55PM (#523514)

    I think he misstated the problem. TFA argues we should analyze to arrive at the correct fit instead of chasing trends. I think in fact the right argument is that we spend too much time chasing trends and analyzing instead of just picking something that meets our known needs and can get done quickly and inexpensively and doesn't lock us in. I don't really need Hadoop, but then I don't really need an RDBMS either most of the time. I should however pick something that meets my minimum needs and can be used quickly and with minimal investment. When I outgrow this later I won't feel so invested.

    • (Score: 0) by Anonymous Coward on Saturday June 10 2017, @07:17PM

      by Anonymous Coward on Saturday June 10 2017, @07:17PM (#523561)

      Hey, look everyone, the next JavaScript "Framework" of the day. Let's chase it!

  • (Score: 1, Informative) by Anonymous Coward on Saturday June 10 2017, @05:56PM (1 child)

    by Anonymous Coward on Saturday June 10 2017, @05:56PM (#523533)

    I had a PHB that had Hadoop on the brain. Insisted that we use Hadoop because he thought it would help the marketing.

    I tried to explain that having 1 TB of data split across thousands of files wasn't nearly enough to justify the Hadoop overhead and got shouted down. Fortunately we never started implementing before our team shifted to another org.

    Sometimes it isn't the engineers that make these silly decisions...it is the marketing/sales types who think it will drive sales.

    • (Score: 2) by YeaWhatevs on Saturday June 10 2017, @09:39PM

      by YeaWhatevs (5623) on Saturday June 10 2017, @09:39PM (#523603)

      PHB: We have to use this latest technology, our competitors are using it.
      Dilbert: Let's just say we do and then don't.
      PHB: Does that work?
      Dilbert: It almost did on us.

  • (Score: 3, Informative) by TheLink on Saturday June 10 2017, @06:13PM

    by TheLink (332) on Saturday June 10 2017, @06:13PM (#523535) Journal

    https://www.chrisstucchio.com/blog/2013/hadoop_hatred.html [chrisstucchio.com]

    They handed me a flash drive with all 600MB of their data on it (not a sample, everything). For reasons I can't understand, they were unhappy when my solution involved pandas.read_csv rather than Hadoop.
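
    For a sense of scale, here is a rough sketch of the single-machine approach the quote alludes to. The quoted post used pandas.read_csv; this version sticks to Python's standard-library csv module so it runs without extra installs. The column names, the sample rows, and the helper `total_by_column` are all made up for illustration and are not taken from the linked post.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the data on the flash drive. A real file
# would be opened with open(path, newline="") instead of io.StringIO.
SAMPLE = """region,amount
north,10.5
south,4.0
north,2.5
"""


def total_by_column(handle, key_col: str, value_col: str) -> dict:
    """Sum value_col per distinct key_col in one streaming pass."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


totals = total_by_column(io.StringIO(SAMPLE), "region", "amount")
print(totals)
```

    Because the reader streams rows one at a time, memory use depends on the number of distinct keys rather than on the file size, so a few hundred megabytes should be well within reach of one process.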
