Stories
Slash Boxes
Comments

SoylentNews is people

posted by n1 on Thursday July 28 2016, @06:30AM   Printer-friendly
from the they-forgot-about-mssql dept.

[redacted] Coward writes:

https://eng.uber.com/mysql-migration/

The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

[...] We encountered many Postgres limitations:

Inefficient architecture for writes
Inefficient data replication
Issues with table corruption
Poor replica MVCC support
Difficulty upgrading to newer releases


Original Submission

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Interesting) by Common Joe on Thursday July 28 2016, @09:45AM

    by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Thursday July 28 2016, @09:45AM (#381117) Journal

    Fully agree.

    I didn't read the whole thing, but I read portions. The only relevant complaint that jumped out at me was the 9.2 bug that corrupted... data? I don't know. He was complaining about his indexes being corrupted in the second half of that section. I didn't fully follow what he was trying to say. Corrupted indexes are very bad but easily fixable. Corrupted data is a lot worse, though.

    And I completely agree that this is an architectural problem. Some of things the author is complaining about are safety mechanisms in PostgreSQL to ensure ACID compliance. And a quick Google search says MySQL is still not ACID compliant. On this one point alone, I say PostgreSQL is superior. Yes, of course you'll take a speed hit for ACID compliance. I'm quite sure many MySQL people don't understand how important this is... the like author maybe? He's kind of bitching about the bug that corrupted data. Well, you have to expect that sometimes in a non-ACID environment. This doesn't let PostgreSQL off the hook for the bug, but it sort of nullifies his argument for using MySQL.

    In the section on write amplification, he says

    For instance, if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the ctid for the new row.

    And? The reason PostgreSQL writes a complete new record (and corresponding 12 indexes) is to ensure that the user running a long running query isn't adversely affected by another user updating the data. I seriously question the use of 12 indexes on a table with a lot of writes, though. Of course it's going to be slow for large tables. That's an architectural issue, not a PostgreSQL problem.

    Yes everything is case dependent and PostgreSQL is not perfect for everything, but I have yet to see any argument where MySQL is superior to PostgreSQL with high volume usage that requires data to be absolutely pristine. (And if anyone has anything, I'd be happy to read it.)

    I suppose on the plus side, this article will keep the PostgreSQL guys on their toes. Competition is good to keep the project healthy.

    Starting Score:    1  point
    Moderation   +4  
       Insightful=1, Interesting=2, Informative=1, Total=4
    Extra 'Interesting' Modifier   0  
    Karma-Bonus Modifier   +1  

    Total Score:   5  
  • (Score: 3, Touché) by sendafiolorkar on Thursday July 28 2016, @12:37PM

    by sendafiolorkar (6300) on Thursday July 28 2016, @12:37PM (#381152)

    Folks, this is why i visit soylentnews!

    • (Score: -1, Troll) by Anonymous Coward on Thursday July 28 2016, @01:05PM

      by Anonymous Coward on Thursday July 28 2016, @01:05PM (#381159)
      To get misinformed opinions from people who don't read stuff properly nor understand it? You can get a lot of that from Slashdot too.
  • (Score: 4, Informative) by Anonymous Coward on Thursday July 28 2016, @01:02PM

    by Anonymous Coward on Thursday July 28 2016, @01:02PM (#381158)

    I didn't read the whole thing, but I read portions.

    The reason PostgreSQL writes a complete new record (and corresponding 12 indexes) is to ensure that the user running a long running query isn't adversely affected by another user updating the data

    You should read and understand more first before shooting your mouth off. MySQL InnoDB handles the scenario you mention with MVCC (just like Postgres), but they implement it differently. Read this bit in particular:

    This design means that InnoDB is at a slight disadvantage to Postgres when doing a secondary key lookup, since two indexes must be searched with InnoDB compared to just one for Postgres. However, because the data is normalized, row updates only need to update index records that are actually changed by the row update. Additionally, InnoDB typically does row updates in place. If old transactions need to reference a row for the purposes of MVCC MySQL copies the old row into a special area called the rollback segment.

    Let’s follow what happens when we update al-Khwārizmī’s birth year. If there is space, the birth year field in the row with id 4 is updated in place (in fact, this update always happens in place, as the birth year is an integer that occupies a fixed amount of space). The birth year index is also updated in place to reflect the new date. The old row data is copied to the rollback segment. The primary key index does not need to be updated, nor does the (first, last) name index. If we have a large number of indexes on this table, we still only have to update the indexes that actually index over the birth_year field. So say we have indexes over fields like signup_date, last_login_time, etc. We don’t need to update these indexes, whereas Postgres would have to.

    Perhaps there are other scenarios where Postgresql's design would work better than MySQL InnoDB but for Uber's case MySQL works better.

    • (Score: 4, Interesting) by Common Joe on Thursday July 28 2016, @01:56PM

      by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Thursday July 28 2016, @01:56PM (#381184) Journal

      I gave you an "informative point" for clarification on your rebuttal.

      Maybe MySQL is the better choice for Uber. However, this design only buys you a certain amount of power in this particular circumstance. As mtrycz said at the top of this thread "it looks to me that Uber needed architectural advantages, not functional; namely scaling-out (because you can scale-up just so much)". They are going to bump into a scale-up issue even with MySQL. In the end, with both databases, architectural issues will have to be addressed. In my experience, every major program has weaknesses that have to be worked around; I think they'll find something equally as frustrating in MySQL. I suppose this point (heavy writing with many indexes) can be viewed as a weakness in PostgreSQL, but I still question changing the database over this one issue. But they listed several, so let's explore.

      From their list of grievances at the top: 1) Inefficient architecture for writes; I disagree. There are some pretty hefty databases that use PostgreSQL with heavy writing. 2) Inefficient data replication and poor MVCC support; I will grant you that MySQL was better than PostgreSQL for a long time. I don't know how they measure up now since PostgreSQL has made very significant strides in the past three to five years concerning replication. It's hard to make apples-to-apples comparisons with databases. MySQL may still have the edge with speed and ease here. 3) Issues with table corruption; definitely a ding against PostgreSQL, but MySQL is going to have table corruption also because of ACID compliance 4) Difficulty upgrading to newer release; This to me says they have architectural issues. On large tables, everyone would love updates to go quickly, but on a table with millions and millions of rows, that just isn't going to happen in any database. The application will have to adapt to it. According to Wikipedia [wikipedia.org], this kind of thing happens about once a year for PostgreSQL.

      My biggest problem is still the ACID stuff. They want perfect data, but they don't want to go ACID.

      Am I a DBA? No. Am I a database expert? I wouldn't consider myself such. (And so, if you find something wrong with what I say, please continue to correct me.) I just found it hard to digest many of their arguments, though. And I still find it hard to digest their argument about "write speeds".

      • (Score: 2) by Common Joe on Thursday July 28 2016, @02:09PM

        by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Thursday July 28 2016, @02:09PM (#381190) Journal

        One important point of clarification for those who find this thread interesting (which many seem to). This whole discussion about the 12 indexes and MySQL vs PostgreSQL is only valid for update statements applied to existing rows. Every insert statement by either database will face the same ugly write penalty. This is why having many indexes on write-heavy tables are generally frowned upon.

        • (Score: 1, Interesting) by Anonymous Coward on Thursday July 28 2016, @03:13PM

          by Anonymous Coward on Thursday July 28 2016, @03:13PM (#381219)

          FWIW I prefer Postgresql to MySQL. I don't need to go Uber scale anytime soon, so hopefully by the time I might encounter such an issue the Postgres bunch would have worked out a way around it.

          They seem to acknowledge there is a problem: https://www.mail-archive.com/pgsql-hackers@postgresql.org/msg290116.html [mail-archive.com]

      • (Score: 2, Funny) by Anonymous Coward on Thursday July 28 2016, @02:39PM

        by Anonymous Coward on Thursday July 28 2016, @02:39PM (#381203)

        Will you two just stop disagreeing with each other in a civil manner? This is the internet and one of you is an AC! Fight! Fight! Fight!

        • (Score: 1, Touché) by Anonymous Coward on Thursday July 28 2016, @02:50PM

          by Anonymous Coward on Thursday July 28 2016, @02:50PM (#381208)
          I wasn't very civil. In my earlier post I told him not to shoot his mouth off. Does that make you feel better?
          • (Score: 3, Touché) by Common Joe on Thursday July 28 2016, @03:14PM

            by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Thursday July 28 2016, @03:14PM (#381220) Journal

            Sounds like you're the AC who was "uncivil" to me. I'm glad you're reading my followup stuff.

            What I'm most happy about is you had a good and informative reply. (Thank you.) Yeah, you were a bit uncivil, but you're an AC, this is the Internet, and I tried to pay attention more to learning from your response.

          • (Score: 0) by Anonymous Coward on Thursday July 28 2016, @03:46PM

            by Anonymous Coward on Thursday July 28 2016, @03:46PM (#381225)

            Does that make you feel better?

            Not exactly. There was no name calling or personal attacks. Not even mention of his insensitiveness or his clodiness. Ugh, I'm going over to 4chan to balance the internet's scales.

            • (Score: 2) by bzipitidoo on Thursday July 28 2016, @04:31PM

              by bzipitidoo (4388) on Thursday July 28 2016, @04:31PM (#381244) Journal

              You want a fight? Ask these:

              Which P in LAMP is best, PHP, Python, or Perl?

              And for that matter, Apache, nginx, lighttpd, or something else?

              vi or emacs?

              Is systemd Lucifer in the flesh?

              SoylentNews or Slashdot?

              • (Score: 2) by mtrycz on Thursday July 28 2016, @07:51PM

                by mtrycz (60) on Thursday July 28 2016, @07:51PM (#381299)

                The obvious response to each end every one is:

                To each their own.

                Except Slashdot. Obviously :3

                --
                In capitalist America, ads view YOU!
      • (Score: 2, Informative) by Bronster on Friday July 29 2016, @02:52AM

        by Bronster (356) on Friday July 29 2016, @02:52AM (#381407) Homepage

        I'd be happy to correct you here, as others already have.

        InnoDB (the mysql backend engine that Uber are using) is fully ACID. Your continual harping on that incorrect data point already disqualifies your opinion.

        I've used both engines, but only mysql with replication. Cross-datacentre traffic is a real issue, even at much lower scale than Uber. We ship compressed SQL statements at FastMail, and every time I've considered switching to Postgres I would have built or required the same replication strategy - both for protection against replicating corruption, and for efficiency. So far we haven't found enough advantages to Postgres to justify the switching cost.

        We've been running mysql for over 10 years with no dataloss or crash that can be attributed to mysql. We've upgraded multiple versions with no downtime due to replication being forwards compatible. That's pretty amazing really. You can't just "shut down" something like Uber while you perform a database upgrade.

        Our time tracking system when we were part of Opera had to get upgraded. I'm pretty sure it was Oracle under the hood. They took a _1 week_ maintenance window to upgrade it. Can you imagine taking Uber offline for a week? Insanity. Upgradability without downtime or massive performance hits isn't "nice to have", it's business critical. And being able to bring up replicas with the new code first and then pivot - yep, that's what we do too - it's how you keep the lights on while making major changes.

        In summary - your points against MySQL are FUD, pure and simple. You don't actually know anything about it and you're talking shit based on hearsay. Shame on you. I read the Uber article and I learned valuable things to check in any new replication system we might consider to replace the current MySQL that we're using, or any redesign work we might be doing to the Cyrus IMAP server where I am one of the authors of the replication system.

        • (Score: 3, Interesting) by Common Joe on Friday July 29 2016, @05:54AM

          by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Friday July 29 2016, @05:54AM (#381441) Journal

          Allow me to 1) say thank you for teaching me something and 2) to correct you. It took some digging, but I know where our misunderstanding comes from.

          In summary - your points against MySQL are FUD, pure and simple. You don't actually know anything about it and you're talking shit based on hearsay. Shame on you.

          First of all, I was not spreading "Fear, Uncertainty, and Doubt". And you, as a MySQL fanboy, should understand MySQL's history because I am not talking shit nor hearsay. As a matter of fact, it is still entirely possible that MySQL is not ACID complaint with certain settings even today.

          With that said, you have taught me that MySQL can be ACID compliant. Let's get into specifics of our misunderstanding so that you may grow and correct others properly.

          First of all, I did a quick Google search "Is MySQL ACID Compliant? [google.de]" The answer it came back was no. (This is how bad MySQL's reputation is.) Looking at the answer again, it seems to be pulling an answer from 2001 which is quite unfair to MySQL.

          This article [ronaldbradford.com] seems to have the best answers out of anything I've found. Apparently, MySQL can have different database engines. The MyISAM engine is not ACID compliant while the InnoDB engine is ACID compliant. Version 5.5 came out in December 2010 [wikipedia.org] and was the first version to default to the InnoDB engine. So, by default, MySQL was not ACID compliant before December 2010. To further quote the article:

          But the damage to the ecosystem that uses MySQL, that is many thousands of open source projects, and the resources that work with MySQL has been done. Recently working on a MySQL 5.5 production system in 2016, the default engine was specifically defined in the configuration defined as MyISAM, and some (but not all tables) were defined using MyISAM.

          In a LAMP setup, who knew what you were getting? And that is where MySQL got a well deserved, bad reputation that lingers even to today. And one of a thousand reasons why the MySQL developers left to create a fork called MariaDB, which I understand was (is?) much better than MySQL. Interestingly enough, this guy [rdx.com] insisted that MySQL 5.5 was still not ACID compliant, although that was November 2010 before general release and about 5.5.6. (I don't know what engine he was using and I'm not going to dig into the particulars of his blog. I just thought it interesting.)

          This article [rackspace.com] specifically gets into the differences between MyISAM and InnoDB. What scares me is that, if I'm understanding correctly, specific tables can use one engine or the other. Holy shit. In my opinion, that is messed up and you're asking for a hell of a lot of trouble if you do anything like that.

          So, in short, you may correct me, but don't tell me I'm talking shit. I've been around the block with databases long enough to know to rightly give MySQL a wide berth. Hell, because MySQL is owned by Oracle, I would still rather use MariaDB over MySQL over that one fact alone. I mean, what can you say when Oracle doesn't care enough about MySQL to fix the reputation and confusion they've generated?

          I've used both engines, but only mysql with replication.

          You've used MyISAM? That one is not ACID compliant. I'm not saying it's the wrong choice. I just hope you understand the pros and cons.

          So far we haven't found enough advantages to Postgres to justify the switching cost.

          Fair enough. Changing databases is often expensive. And I will admit that MySQL had a lot better tools than PostgreSQL for many, many years -- especially when it came to replication.

          We've been running mysql for over 10 years with no dataloss or crash that can be attributed to mysql.

          Lucky you. Others weren't so lucky because they used the defaults from 10 years ago.

          We've upgraded multiple versions with no downtime due to replication being forwards compatible. That's pretty amazing really. You can't just "shut down" something like Uber while you perform a database upgrade.

          I'm no DBA so I don't understand what you did that would be different than PostgreSQL. MySQL upgrade documentation [mysql.com] says

          mysql_upgrade processes all tables in all databases, which might take a long time to complete. Each table is locked and therefore unavailable to other sessions while it is being processed. Check and repair operations can be time-consuming, particularly for large tables.

          So I don't know what you did different from PostgreSQL. I mean, frankly, if there is a single database running in production that needs to be up 24/7, something is wrong. With APIs and interfaces, I would imagine an application should be able to handle different versions of databases without taking out production if it were critical to keep it up and going.

          Our time tracking system when we were part of Opera had to get upgraded. I'm pretty sure it was Oracle under the hood. They took a _1 week_ maintenance window to upgrade it. Can you imagine taking Uber offline for a week? Insanity

          A time tracking system took a week to update? It sounds like there was a problem with the programmers who made the time tracking system. I worked on an Oracle system where the DBAs did the updates in stages for this reason. (The entire update took the entire weekend.) The application has since been updated to better handle very large tables better during the upgrade process.

          I don't know what makes MySQL special or faster. If you have a reason why it is faster on updates, I'd be interested to see it. Otherwise, I would be very careful that a future doesn't cost you dearly in time. You may be sitting on a time bomb.

  • (Score: 0) by Anonymous Coward on Thursday July 28 2016, @02:57PM

    by Anonymous Coward on Thursday July 28 2016, @02:57PM (#381212)

    12 indexes on 1 table shows a need for a different schema usually. With that many I would venture a guess and say it was a very wide table or badly designed indexes. It could be the redesign took care of that.

    I have seen a few systems like this in the past. It seems to be a slow moving SQL anti pattern. Basically some key table starts picking up columns. They get wider and wider. Then some perf issue shows up and the table picks up a new index.

    I have no 'skin' in the game on this one. I am a MSSQL guy. But I have seen the same sorts of decisions made before. Pretty much every time it is 'while we are doing this lets fix...'.

    With a table like that it probably needed a star topology. It really just depends on do you want to tune for reads or writes. In this case they tuned for reads. But writes were a bigger issue so they needed to tune for that. Another way to fix this would be to have a 2 stage commit where you blast the data into the DB then have another service come by and fix it up. But at that point you should start looking into a NoSQL sort of solution.

    • (Score: 2, Funny) by Anonymous Coward on Thursday July 28 2016, @04:04PM

      by Anonymous Coward on Thursday July 28 2016, @04:04PM (#381230)

      12 indexes on 1 table shows a need for a different schema usually.

      Maybe you didn't read TFS carefully, but those 12 indexes are part of an Über schema ;-)

    • (Score: 0) by Anonymous Coward on Thursday July 28 2016, @08:05PM

      by Anonymous Coward on Thursday July 28 2016, @08:05PM (#381304)

      I am a MSSQL guy

      i love it when microsoft whores act like their "choice" is legitimate. stupid @#%$

      • (Score: 0) by Anonymous Coward on Thursday July 28 2016, @09:23PM

        by Anonymous Coward on Thursday July 28 2016, @09:23PM (#381332)

        Dont know its done alright by me. By my last estimate I would say ~1.5 million in earned income.

        I did not disparage mysql or Postgres. MSSQL is a very competent SQL implementation. When the boss (who is cutting the checks) says we are using MSSQL you figure it out....

        I have used just about every SQL out there. I am most proficient in MSSQL. You wooshed my point to make fun of me. Sad. I will restate it simply for your small troll mind. My point was the redesign probably did more good than switching databases.