Stories
Slash Boxes
Comments

SoylentNews is people

Meta
posted by NCommander on Thursday June 04 2015, @11:38AM   Printer-friendly
from the input-wanted dept.

Moving on from frontend stuff, I'm getting to the point that I want to dig in deep and rewrite most of the database layer of SoylentNews/rehash. For those who are unfamiliar with our codebase, its primarily written in perl, with fairly strict adherence to the MVC model, going as far as installing system modules for code shared between the frontend and backend daemons. Slash was written with database portability in mind, and at least historically, a port to postgreSQL existed in the late 90s/early 2000s, and there was some legacy Oracle code authored by VA Linux as well. This code has bitrotted to the point of unusability, leaving the MySQL backend the only functional mode; I deleted the legacy code about a year ago from our git repo.

However, migrating from MySQL has remaining on my personal TODO list for a long time, due to unreliability, data corruption, and configuration pain. The obvious choice from where I'm sitting is postgreSQL. For those who aren't super familiar with the pro/cons with MySQL, this article by Elnur Abdurrakhimov has a pretty good summary and a list of links explaining in-depth why MySQL is not a good solution for any large site. We've hit a lot of pain in the nearly 1.5 years SN has been up due to limitations in the database layer as well, forcing us to use a clustering solution to provide any sort of real redundancy for our backend. Although I'm familiar with database programming, I'm not a DBA by trade, so I'm hoping to tap into the collective knowledge of the SoylentNews community and work out a reasonable migration plan and design.

[More after the break...]

The "Why" Factor

Beside my personal dislike of MySQL, there's a more important reason to migrate from MySQL. MySQL's support for stored procedures is incredibly poor, which means raw SQL has to be written in the application layer. rehash reduces the danger of injection by providing a set of wrapper functions such as select/insert/update which take four arguments: table, from clause, where clause, and anything extra if necessary; these parameters are assembled into a full query which is in turn properly escaped to prevent most obvious attacks from working. Extensive whitelists are used for sanitizing parameters, but by design, rehash uses a single namespace, with a single user account which has full SELECT/INSERT/UPDATE/DELETE permissions across the board. If any single point is compromised, the entire database is toast. Furthermore, because of poor support for views in MySQL, massive JOINs litter the codebase, making some queries reaching across 5-6 tables (with the most horrific example I can think of being the modbomb SELECT which reaches across almost every user and comment table in the database). This makes debugging and optimizing anything a slow and *painful experience.

The End Goal

What I want to do is remove as much code out of the application layer, and move it down the stack into the database. Each function in Slash/DB/MySQL/MySQL.pm should be replicated with a stored procedure which at a minimum executes the query, and if possible, relocate as much of query processing logic into pg/Perl modules. This should be relatively straightforward to implement, and allow high code reusability due to the fact that almost all of rehash's methods exist in perl modules, and not in individual .pl scripts. The upshot of this is that the only permission the DB account requires is EXECUTE to run the stored procedures; if possible, I'd also like to localize which perl function can call which pgSQL procedure; i.e., the getStories() function can only call procedures relating to that, vs. having access to all stored procedures in the database.

This would greatly reduce the reach of any SQL injection attacks, as well as hardening the site against possible compromise; unrestricted access to the database would require breaching one of the DB servers directly instead of getting lucky via a bug in rehash. As I've stated before, no security can be perfect, but I would love to get this site to the point that only a dedicated, targeted attack would even stand a chance of succeeding. That being said, while it sounds good on paper, I'm not 100% sure this type of design is reasonable. Very few FOSS projects seem to take advantage of stored procedures, triggers, views and other such functionality and I'm wondering if others have tried and failed to implement this level of functionality.

My Plan to Implementation

So, knowing what you want to do is good, but knowing how to do it is just as important. What I think the first step needs to be is a basic straight-port of the site from MySQL to postgreSQL, and implement a better schema upgrade system. As of right now, our upgrade "system" is writing queries in a text file, and just executing them when a site upgrade has to be done. Very low-tech, and error prone for a large number of queries. I don't have a lot of experience in managing very large schemas, so one thing on which I'd like advice is if there's a good, pre-existing framework that could be used to simplify our database upgrades. By far, the ideal scenario would be if we could run a single script which can intelligently upgrade the database from release to release.

Once both these pieces are in play, a slow but steady migration of code from the application layer to the database layer would allow us to handle the transition in a sane manner. Database auditing can be used to keep track of the most frequently used queries and function calls, and keep an eye on our total progress towards reaching an EXECUTE-only world.

That's everything in a nutshell. I want to know what you guys think, and as always, I'll be reading and replying to comments down below!

~ NCommander

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by TheLink on Thursday June 04 2015, @11:53AM

    by TheLink (332) on Thursday June 04 2015, @11:53AM (#192009) Journal

    The obvious choice from where I'm sitting is postgreSQL.

    Yes I'd prefer Postgresql to MySQL.

    What I want to do is remove as much code out of the application layer, and move it down the stack into the database

    That's nice and a good idea if you don't need to scale up/out the DB.

    A bad idea if you need to scale. PostgreSQL can be clustered but it's usually harder to scale PostgreSQL to say 100 nodes than it is to scale web/app nodes to 100 nodes.

    If you ever need to scale shoving everything into stored procs and having the DB do the all the work may not be so good as doing less at the DB level and the rest at identical app/webapp servers, with some caching stuff for performance.

    • (Score: 3, Informative) by TheLink on Thursday June 04 2015, @01:17PM

      by TheLink (332) on Thursday June 04 2015, @01:17PM (#192043) Journal
      Elaborating on at least one scenario where stored procs can be a good idea (in case some get the wrong idea from my first post) - in an environment with many different apps, users and developers using the same DB and you want to ensure consistency in the DB - so you have stored procs to do each standard/exposed task in the DB. All the apps and most people are supposed to use that those stored procs to do those stuff. That way if stuff is changed, in more cases you can just change the stored procs without affecting all the different apps.

      In contrast if you have 100 different apps and the apps did all the SQL directly, if you want to change the way things are done a bit or say extend stuff (e.g. adding a new user/item now should insert extra rows in new other tables), you need to update ALL the different apps that are affected. Now imagine the pain especially if some of the apps are no longer practically maintainable (company has gone out of business, nobody wants to dig through the code and put their necks on the line etc). This sort of thing is common in "Enterprise" environments. The alternative to stored procs is to have another layer around the DB - have an app and "API" that everything and everyone is supposed to use to do stuff to the DB, however this often takes a higher level of effort, discipline and commitment than found in most organizations ;).

      In the case of SoylentNews - you have a lot fewer apps and you can more easily update those apps if the schema or other stuff changes. Also certain sorts of dynamic queries are harder to do with stored procs (e.g. a custom report that allows the user to select which columns are wanted and how those columns should be displayed ).

      As for the SQL injection stuff, use the standard method of bind vars and prepared statements. That way if there's an exploitable bug (some unicode corner case etc) it's really the DB developer or DB lib developer's problem to fix, support and document; not yours. Whereas if you reinvent and reimplement all that stuff yourself, it's all on you when stuff happens.
      • (Score: 4, Insightful) by bryan on Thursday June 04 2015, @04:58PM

        by bryan (29) <bryan@pipedot.org> on Thursday June 04 2015, @04:58PM (#192161) Homepage Journal

        As for the SQL injection stuff, use the standard method of bind vars and prepared statements.

        Defiantly agree on the prepared statements. If you were to change anything related to the database, do this first. Relying on string escaping functions is never a good idea for security reasons.

        As for the stored procedures, I'd prefer most of the logic to stay in the code where it is version controlled and easier to upgrade. Stored procedures are also not quite as universal as standard SQL syntax. By adopting Postgres' version, you may be hurting your ability to switch to a different DBMS down the line.

        • (Score: 2) by NCommander on Thursday June 04 2015, @07:33PM

          by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday June 04 2015, @07:33PM (#192234) Homepage Journal

          Honestly, database portability is not a huge concern. We don't want to rip out the abstraction layer, so if ever becomes necessary again, it can be done, but I don't see us supporting multiple congruent database types for longer than is necessary to transition.

          --
          Still always moving
        • (Score: 4, Funny) by bryan on Thursday June 04 2015, @07:55PM

          by bryan (29) <bryan@pipedot.org> on Thursday June 04 2015, @07:55PM (#192247) Homepage Journal

          s/Defiantly/Definitely/g

          Drat! An "edit comment" button would be more useful than a "multi-tiered database cluster farm with replication and hot-failover"

  • (Score: 1) by logan on Thursday June 04 2015, @12:23PM

    by logan (3020) on Thursday June 04 2015, @12:23PM (#192017)

    rehash reduces the danger of injection by providing a set of wrapper functions such as select/insert/update which take four arguments, table, from clause, where clause, and anything extra if necessary; these parameters are assembled into a full query which is in turn properly escaped to prevent most obvious attacks from working. Extensive whitelists are used to sanitizing parameters, [...]

    Custom escaping will never be as secure as a prepared statement. I haven't read rehash's source code, though, so maybe I misinterpreted the above lines. If rehash does indeed use custom escaping, you could rewrite these wrapper functions to use prepared statements and reduce the risk of SQL injection significantly.

    I have used PostgreSQL for various projects since the the late nineties and I love it, so your migration plans get a thumbs-up from me. Moving application logic into the database layer is a different ball of wax, though - I prefer keeping the logic in the application code.

    As to schema upgrades: Django with its Migrations [djangoproject.com] (Python) and Phinx [phinx.org] (PHP) are two nice examples of frameworks that make database schema (and data) migrations (both up and down) easy and comfortable. I haven't used Perl much but I'm sure similar projects exist. Bonus points for being able not only to apply schema modifications but also implement data changes if necessary.

    • (Score: 2) by NCommander on Thursday June 04 2015, @07:35PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday June 04 2015, @07:35PM (#192237) Homepage Journal

      In a nutshell, this is how it supposed to work

      Script calls DB::*SQL method -> which has the query and any necessary processing -> escaping handled by the DB level -> Call out to DBD to poke the database.

      There's a couple of place where SQL calls are not in Slash/DB/*, but that's not hard to fix.

      --
      Still always moving
  • (Score: 5, Insightful) by aclarke on Thursday June 04 2015, @12:50PM

    by aclarke (2049) on Thursday June 04 2015, @12:50PM (#192026) Homepage

    First, I have nothing against PostgreSQL and no particular love for MySQL. Like most of us though, I've used MySQL far more than PostgreSQL. If you just hate MySQL and love PostgreSQL and you want to switch, then that's an emotional argument and not a technical one. If that's the case, that's fine, but then be honest with the reasoning behind your decision.

    As far as technical reasons go, have you done a cost/benefit analysis on this? Factor in a reasonable estimate of the effort it's going to be to re-architect, develop, debug, and deploy a PostgreSQL migration. I'm not sure in the end you'll get your value back. You've given no really good reasons as to why MySQL isn't working for you. It's used successfully by sites that are much larger than sn is ever going to be, no offence meant. You said yourself that you're not a professional DBA. A site of this size shouldn't need one, but then a site of this size is unlikely to need database clustering for performance reasons. For reliability, yes, but that's a different question.

    MySQL does support stored procedures and views. I never use them, so I'm not really aware of where they're better or worse than PostgreSQL. I stopped using these features many years ago around the time I stopped using SQL Server, and around the time I discovered and started using source code control. If you want to use these database features, then yeah maybe there's some killer feature that PosgreSQL has that MySQL lacks that I don't know about. However, as TheLink has already pointed out [soylentnews.org], there are good reasons to stay away from views and stored procedures. Other reasons include more complex change and release management protocols. Right now if you have the occasional database change, e.g. varchar(100) -> varchar(200), you already need to run a SQL script for that as part of an update. If you need to modify a bunch of SPs and views in the database, upgrading becomes much more difficult. So does switching between code branches in your source code repository. Editing becomes harder, as generally the tools for editing stored procedures aren't quite as feature-rich. Unit testing can become more problematic as well. These are all arguments in favour of just leaving the SQL code in the application layer. The other argument for this of course is that it's already there and is complicated to move it. With a proper MVC design, which you say the code already has, at least your queries are limited to certain portions of the code which makes it easier to manage.

    If you were designing a new project, then yeah I'd say have a good hard look at PostgreSQL. I know I would. However, if you were a client of mine and were paying me for my opinion, based on what you've written I'd say stick with what you have and put your efforts elsewhere.

    • (Score: 3, Insightful) by quadrox on Thursday June 04 2015, @01:00PM

      by quadrox (315) on Thursday June 04 2015, @01:00PM (#192031)

      Spot on.

      Mysql does have it's limitations, but I don't see any case where a site like SoylentNews will ever run into those.

      If it ain't broken, don't fix it.

    • (Score: 0) by Anonymous Coward on Thursday June 04 2015, @04:30PM

      by Anonymous Coward on Thursday June 04 2015, @04:30PM (#192144)

      Software quality is difficult to do a cost/benefit analysis on, but it is possible with sophisticated methods, and some are very straight forward. Most sites that use MySQL successfully either heavily patch it, do not mind a small amount of hard to quantify data going missing, or use it as a dumb key value store (a lot of NoSQL proponents were disgruntled MySQL users with little experience with more reliable database technologies). PostgreSQL does more to protect the user from technical and functional surprises, such as designs that cannot loose data in edge cases and feature behaviors that are less likely to cause developers to go WTF‽ when they are almost done with implementing them. MySQL is a soup of WTF‽ if you are paying attention.

      Regarding VARCHAR, people misuse those all the time. In PostgreSQL, VARCHAR and TEXT perform the same, so just use TEXT and put a constraint on it if you need it. There are very few reasons why you want the maximum length to be a part of the data type & interface.

    • (Score: 0) by Anonymous Coward on Thursday June 04 2015, @10:45PM

      by Anonymous Coward on Thursday June 04 2015, @10:45PM (#192291)

      I've been playing around with MariaDB 10.0 for about a year and a half in several different configurations on a small dev setup on KVM+Fedora20-22, all working on virutal machines to ensure I see problems sooner than if I had the money to purchase bare-metal servers and do the same thing.

      What I noticed was when I had the cluster up and running in a 2 way active-active setup, it would always fall down after receiving load from 3 simoultaneous Atlassian's products startup up after a reboot. And what was worse was there was no real catch-up feature to bring the out of sync master back into sync. The routine which worked for me was to nuke the out-of-sync node and recreate it as an active server.

      Views in MySQL are, from what I've seen, dynamically generated for each user session. It isn't like Microsoft's SQL views, MySQL views are in of themselves a performance hit and if they need modification, you may want to catchup on soylent and find out what ncommander did next ;)

      • (Score: 2) by NCommander on Friday June 05 2015, @03:07AM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday June 05 2015, @03:07AM (#192352) Homepage Journal

        We've had the same problem with MySQL async replication as well. Databases get out of sync with each other, and you get subtle issues because of it. MySQL completely seems to fail at the concept of ACID in my experience ...

        --
        Still always moving
    • (Score: 2) by fleg on Friday June 05 2015, @03:08AM

      by fleg (128) Subscriber Badge on Friday June 05 2015, @03:08AM (#192353)

      completely agree with aclarke. especially wrt the stored procedures.

  • (Score: 2) by wonkey_monkey on Thursday June 04 2015, @01:12PM

    by wonkey_monkey (279) on Thursday June 04 2015, @01:12PM (#192039) Homepage

    this article by Elnur Abdurrakhimov has a pretty good summary and a list of links explaining in-depth why MySQL is not a good solution for any large site

    I found it to be a pretty bad summary. It doesn't really say anything.

    One of his opening arguments seems to be that "MySQL is bad because someone's IRC bot says it is" which hardly seems like in-depth analysis to me. And one of his video links is of those two stupid talking bears.

    Personally, MySQL was my first proper exposure to SQL databases. Since then I've primarily used MSSQL, but also MySQL, and all I can give is my vague sense that MySQL just feels a little... flaky. Not in the sense that it crashes, but in the sense that some things just don't feel like they're properly defined. Many is the time I've written up a very quick query, expecting it not to work at all and need correcting, only to find that MySQL happily runs it and sometimes, but not always, returns the expected results. I had reason recently to attempt to use a TIMESTAMP column in a MySQL project, and gave up in the end and just implemented it myself because I just couldn't figure out what the hell it was trying to do.

    You know how it is; after some time spent with a system (or anything in life, really), you get to know how it's going to behave and can start anticipating it. I don't get that feeling with MySQL.

    --
    systemd is Roko's Basilisk
    • (Score: 4, Insightful) by quadrox on Thursday June 04 2015, @01:41PM

      by quadrox (315) on Thursday June 04 2015, @01:41PM (#192055)

      I fully agree.

      The entire article feels like it was written by a nerd with a grudge. it's not objective in any way whatsoever, and very sparse with actual facts/criticism.

      I am not a DB expert myself (I do have a few years of experience though), but our resident oracle database expert said after working with MySQL that there is a ton of features it doesn't have or do "properly", but it's blazingly fast and scalable as hell. For relatively simple databases - and any site like SN definitely meets this criteria - with many requests MySql is one of the best choices available.

    • (Score: 0) by Anonymous Coward on Thursday June 04 2015, @03:15PM

      by Anonymous Coward on Thursday June 04 2015, @03:15PM (#192097)

      The deficiency in MySQL which TFA specifically called out as "the last straw" - the inability to update foreign keys self-referentially embedded in the same table as the updated table - that is indeed a missing feature, which I can understand might be enough (in conjunction with other similar blocking problems) for someone to look for another database.

      But the guy spent the first few paragraphs trashing MySQL and its maintainers as basically incompetent boobs, and this is the only bug he was willing to specifically mention. As someone pointed out, it wasn't strictly a bug because the update didn't misfire - it just emitted an error message correctly noting the lack of db support. That seems out of proportion.

  • (Score: 4, Insightful) by martyb on Thursday June 04 2015, @01:37PM

    by martyb (76) Subscriber Badge on Thursday June 04 2015, @01:37PM (#192053) Journal

    I've seen some insightful comments already... yay community!

    I am not a DBA. I have years' experience testing a CASE [wikipedia.org] product, including its LDM [wikipedia.org] component. I also have experience in testing large applications at multiple companies whose products had sprawling SQL statements. That said, it's been several years since I've worked as that level, so take the following with the proverbial unit of salt.

    From the looks of things, I suspect (but am by no means certain) a renormalization and refactoring of code could get you at least some of what you are looking for in performance improvements and security considerations, independent of the underlying DB. It's been a while since I looked around, but there are tools that you can point at your DB and schema and which can automatically generate useful abstractions and visualizations of it, such as a LDM.

    For example, your database interface code all runs under one user? Can that not be added as a parameter, as well?

    Also, be mindful of what got us to this place originally... an eventual dependency on a particular DB's implementation. If rehash is ported to support stored procs in postgreSQL, what happens 5 or 10 years from now if you want to port to another backend?

    --
    Wit is intellect, dancing.
    • (Score: 2) by bootsy on Friday June 05 2015, @07:12AM

      by bootsy (3440) on Friday June 05 2015, @07:12AM (#192404)

      In answer to your last point, you rewrite the logic of the stored procedure in another DBMS's stored procedure language.

      At least the logic is isolated and kept tightly on the database. It's easier to change it if you have to and you can parameter bind to avoid SQL injection.

      Another cool thing about Postgres is you can do DDL changes in a transaction and roll them back if the upgrade doesn't work. You certainly cannot do that with SQL Server/Sybase or Oracle.

      • (Score: 2) by martyb on Saturday June 06 2015, @10:47AM

        by martyb (76) Subscriber Badge on Saturday June 06 2015, @10:47AM (#192838) Journal

        In answer to your last point, you rewrite the logic of the stored procedure in another DBMS's stored procedure language.

        At least the logic is isolated and kept tightly on the database. It's easier to change it if you have to and you can parameter bind to avoid SQL injection.

        That makes sense, much like having an abstraction to a device provided by a device driver. Call the driver with what you want done and let the driver deal with each hardware platform's idiosyncrasies. In this case, have a consistent API for talking to the DB and let the API deal with each DB's syntax.

        Another cool thing about Postgres is you can do DDL changes in a transaction and roll them back if the upgrade doesn't work. You certainly cannot do that with SQL Server/Sybase or Oracle.

        That is so obvious in its utility and amazing in that it is not supported on those other platforms... thanks so much for the info!

        --
        Wit is intellect, dancing.
  • (Score: -1, Troll) by Anonymous Coward on Thursday June 04 2015, @01:59PM

    by Anonymous Coward on Thursday June 04 2015, @01:59PM (#192062)

    I remember volunteering strenuously for this kind of work during the initial launch of SN.

    I saw a few other people volunteering for other things as well.

    In the end, just a select few got chosen.

    Now we're whining to the community about needing help. Nice.

    • (Score: 2, Insightful) by Anonymous Coward on Thursday June 04 2015, @02:43PM

      by Anonymous Coward on Thursday June 04 2015, @02:43PM (#192076)

      Chosen? What about the github repository is unclear to you? It's possible this wasn't on the table during launch. It's possible it's not a good idea now. Either way, unless you can show me the pull request that was denied, you're just blowing smoke. What you are doing, this is whining. Either get over yourself and help as you can, or (more likely suited to your abilities) do nothing and shut the hell up.

      • (Score: -1, Flamebait) by Anonymous Coward on Friday June 05 2015, @01:55AM

        by Anonymous Coward on Friday June 05 2015, @01:55AM (#192339)

        Why the fuck would I risk wasting my time making a contribution that may not be accepted?

    • (Score: 2) by janrinok on Thursday June 04 2015, @03:05PM

      by janrinok (52) Subscriber Badge on Thursday June 04 2015, @03:05PM (#192087) Journal

      Firstly, it sounds more like you are whining because you didn't get to help when it first started. But I may have read the wrong tone into what you wrote. But if you think that you have the necessary skills and expertise I suggest you offer your services on #dev.

      Slashcode/MySQL was a working combination when the site started. At that point, we had bigger fish to fry and redesigning the DB was not a high priority. Now we have - or more correctly the devs have - more experience with the codebase and it has been significantly updated to use supported versions of software - hence Rehash. But there are still some problems with slow responses from the database (which might or might not be caused by the interface with MySQL) and the remaining code structure doesn't seem to be optimally designed in every respect. NCommander has, wisely in my view, asked for any relevant advice from the community. And some have posted the requested advice. I don't see anything wrong with that. The current dev team is very small and there isn't much expertise in this particular area - but we are guessing that there is plenty in the community.

      • (Score: 0) by Anonymous Coward on Thursday June 04 2015, @03:40PM

        by Anonymous Coward on Thursday June 04 2015, @03:40PM (#192119)

        Yeah, sorry for whinging. I recognize that's what I'm doing.

        I just felt kind of left out the first time around. There was a lot going on, those were heady times, and felt like I somehow missed the chance.

        I'll stfu now.

        • (Score: 2) by mrcoolbp on Thursday June 04 2015, @05:21PM

          by mrcoolbp (68) <mrcoolbp@soylentnews.org> on Thursday June 04 2015, @05:21PM (#192176) Homepage

          Remember that the first month or so was very hectic, no one was really in charge, and what janrinok mentioned is also rather key.

          Whine if you want, but I'd ask that you NOT sftu. Speak your mind, offer your opinions whatever they are. This site is built and influenced by the community, we all contribute however we can.

          Furthermore, if you feel you could help out the dev team, by all mean reach out to them. IRC [soylentews.org] is best, but you could email dev at soylentnews dot org, or just jump on github and have a crack at the code.

          --
          (Score:1^½, Radical)
  • (Score: 2) by sudo rm -rf on Thursday June 04 2015, @02:10PM

    by sudo rm -rf (2357) on Thursday June 04 2015, @02:10PM (#192067) Journal

    I have heard a lot of good things about MariaDB [wikipedia.org], although admittedly I never read in-depth articles about it.
    Has anyone here some experience with it? The migration [mariadb.com] from MySQL seems pretty simple, too.

    • (Score: 2) by darkfeline on Thursday June 04 2015, @07:40PM

      by darkfeline (1030) on Thursday June 04 2015, @07:40PM (#192240) Homepage

      I have used MariaDB as a drop-in replacement for MySQL for a personal collection tracking app. No experience building large apps with it though.

      At any rate, it takes almost no effort to migrate from MySQL to MariaDB, and I don't think MariaDB is any worse than MySQL, so I think it's something SN could consider doing.

      --
      Join the SDF Public Access UNIX System today!
      • (Score: 2) by wonkey_monkey on Friday June 05 2015, @08:53AM

        by wonkey_monkey (279) on Friday June 05 2015, @08:53AM (#192426) Homepage

        I don't think MariaDB is any worse than MySQL

        That's their company motto, isn't it?

        --
        systemd is Roko's Basilisk
  • (Score: 4, Informative) by kbahey on Thursday June 04 2015, @02:35PM

    by kbahey (1147) on Thursday June 04 2015, @02:35PM (#192074) Homepage

    Background: I have been using MySQL for a decade and a half, mainly for web development (specifically Drupal).

    Not many users of MySQL use the NDB cluster. In fact, NDB was evaluated years ago for Drupal and found to be a poor fit.

    Instead most sites today use the InnoDB engine (ACID) with replication. Mostly, it is master/slave, but some sites do use master/master. You can make the code intelligent so that reads go to one of the slaves, and writes go to the master only.

    As others have said, stored procedures are a maintenance burden. Code in your git repository is visible, under version control, and editable. Stored procedures are hidden from normal view and can fire on data conditions, not just when called from code.

    The issue of SQL injection can be mitigated by using a database abstraction layer, and calling that layer for all database calls. That layer can support multiple database engines as well. We do this in Drupal using PHP's PDO, and support MySQL, SQLite, and PostgreSQL.

    So, I agree what others have chimed in that the reasons for moving to PostgreSQL are not that strong from a technical and practical point of view.

  • (Score: 1) by Darren on Thursday June 04 2015, @03:33PM

    by Darren (4786) on Thursday June 04 2015, @03:33PM (#192115) Homepage

    Well, part of the problem is the SQL syntax being vulnerable because it fuses command and value strings into one. The other problem is that SQL as a structure is 'tables' which means it's really all one dimensional and tied together with associations which is ridiculous and non-mnemonic.
    Solution? STORE IT AS A JSON OBJECT IN AN OBJECT DATABASE (I like nedb or mongo for this)
    Example:
    var thisCommentThread={
      comments:[{user:'bob','text':'I like linux',score:0,comments:[{name:'ted',text:'I like mac',score:1,comments:[]}]}],
      title:'what kind of os do you like'
    };
    var database=require('nedb');
    var db=new database({'filename':'/var/www/soylentnews/data/comments.db',autoload:true});
    db.insert(thisCommentThread);
    //so say this went in and the system generated _id:x12 for it, you pull it out
    db.find({_id:'x12'},function(err,comments){
      soyIO.emit('here is a comment thread',comments);
    });
    //on the client side
    soyIO.on('here is the comment thread',function(comments){
      memoryObject.thisArticle.comments=comments;
    });
    //display is fairly easy
    var nestComments=function(comments){
      var comments='

    ';
      $.each(comments,function(i,comment){
        var comment='

    '+comment.text;
        if(comment.comments.length>0){nestComment(comment.comments);}
        comment+='

    ';
        comments+=comment;
      });
      comments+='

    ';
      return comments;
    });
    //so when the user wants to view the article page it looks sorta like
    var loadArticlePage(article,comments){
      var htmArticle='

    '+article.text+'

    ';
      var htmComments=nestComments(comments);
      var page='

    '+htmArticle+htmComments+'

    ';
      $('body>.content').html(page);
    };

    --
    Web Designer - darrencaldwellwebdesign.ca
    • (Score: 2) by mtrycz on Thursday June 04 2015, @08:10PM

      by mtrycz (60) on Thursday June 04 2015, @08:10PM (#192251)

      A nosql document-based database would certainly make sense for parts of the site, like stories and comments, but unfortunately an established codebase like this will probably need a relational database, expecially since it's not doing big data. Going for two databases for the one app is a no-go.

      I stand by my opinion that all data is relational; if you think you have non relational data, you've probably not found your relations yet. Nonetheless nosql does have it's place when you need to scale beyond what's possibile on a single machine. But with the horizontal scaling you sacrifice acid/transactional guarantees of an rdbms. If it was me, I wouldn't risk it for a production app.

      --
      In capitalist America, ads view YOU!
      • (Score: 0) by Anonymous Coward on Thursday June 04 2015, @08:40PM

        by Anonymous Coward on Thursday June 04 2015, @08:40PM (#192262)

        Its the fact that you have to establish or think of relations at all. Properly the data should be nested so your calls aren't confusing long strings pulling separate 'tables' together by their relations it just comes out as one piece.

        There are caveats to that as data grows larger, but the amount of code you need to write is practically nothing compared to php with sql. This makes it more stable, understandable, maintainable, and therefore more usable.

        There is currently a wash of old databases, and they suck. There's no getting around it, they were awesome at the time, but times change and so should our data storage/retrieval. It behooves us to do the job the best we can do it, not the way that makes people playing office politics happy (I'm amazed how much power non-techs are given over tech processes, the logical fallacy and resultant unnecessary project crippling in search of marketing power words is staggering "It's a PDO connection, it's the best (that I think you want to hear), I swear! (please feed my children for another day)").

        • (Score: 2) by mtrycz on Friday June 05 2015, @07:57AM

          by mtrycz (60) on Friday June 05 2015, @07:57AM (#192412)

          The old databases don't suck, what are you talking about?

          Nosql is a tool that has it's usecases, and other usecases where it's not a good idea. There are specific applications (most of them, I'd argue), where you can't get around a relational integrity constraint.

          Also, are you arguing that sql/rdbms is hip and a buzzword for the management? Yeah, I just got trolled.

          --
          In capitalist America, ads view YOU!
          • (Score: 0) by Anonymous Coward on Friday June 05 2015, @01:02PM

            by Anonymous Coward on Friday June 05 2015, @01:02PM (#192489)

            I'd say it's definitely a buzzword, having your ideas torn apart into separate 1 dimensional tables instead of just nesting the data properly creates a spaghetti mess. At the moment sql etc are buzz words, people know them, they like them, mongodb or nedb are not buzz words, they are unknown. However in comparison these newer object oriented databases are faster, simpler, easier to use, and more mnemonic.

            for insance, if I want all the clients I'd do
            database.find({},function(err,allClientObjects){ //do something with client objects });

            that doesn't seem like much at all eh? Now try that with sql, you'll pull from the client table, the accounts table, the user table, the red white blue and purple hickery dragon table etc. Your query string eventually will be massive, and it will be prone to problems whenever you want to change it.

            Sql is a shit database, and it's relations via 1 dimensional tables and associations is utter garbage.

            • (Score: 2) by mtrycz on Friday June 05 2015, @02:19PM

              by mtrycz (60) on Friday June 05 2015, @02:19PM (#192540)

              will be prone to problems whenever you want to change it.

              Yup, that's the essence of it. Nosql is ust magically immune from this, actually it handles datamodel changes gracefully. lol

              Bye bye, troll.

              --
              In capitalist America, ads view YOU!
  • (Score: 3, Informative) by bloodnok on Thursday June 04 2015, @04:04PM

    by bloodnok (2578) on Thursday June 04 2015, @04:04PM (#192133)

    First my credentials: I currently work as a freelance Postgres Consulatant. I used to be a DBA. I've been developing with relational databases for more years than I care to count. My resume (Marc Munro) is online, though outdated, at bloodnok.com

    I would be hugely pleased to see SN move to Postgres. Partly to fuel my own technological prejudices, but mostly because I think it will provide a superior solution.

    Although Postgres clustering is seen by some as difficult, setting up an architecture with a single writable database and multiple read-only standbys is fairly straightforward using streaming replication. Such systems are highly reiliable and resilient, have very throughput, and are are easy to maintain.

    Like others who have commented I am not convinced about putting application logic into the database. I do believe though in putting all *data management* logic in there. Where the line should be drawn is difficult, but your user defined functions (technically they are not stored procedures) should be ensuring data integrity and cleanliness rather than business logic. If you were to replace all of your complex queries with views, and all of your complex updates with instead-of triggers on those views, you would have the right split. I do not suggest you do this, except as an intellectual exercise to determine what functionality truly belongs in the database.

    As for security, ideally you would base this not on the identity of the calling module, but on the identity of the user the module is working for. If you'd like to see what I mean take a look at the Veil demo: http://veil.projects.pgfoundry.org/curdocs/index.html/ [pgfoundry.org] I no longer recommend using Veil as there are now simpler ways to achieve the same thing, but the underlying approach is still valid.

    I'd be pleased to help.

    • (Score: 2) by NCommander on Friday June 05 2015, @03:12AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday June 05 2015, @03:12AM (#192354) Homepage Journal

      Veil seems close to what I want. Basically, since every SQL operation in rehash can be mapped to a perl function, which in turn could be mapped to a stored procedure, I just want a way to make it so each procedure can only call what it needs and nothing else. I'm flipping through the veil documentation, but what would you recommend in its place right now?

      Also, what IMHO is the best way to handle data from MySQL -> postgresql, given many types do not map 1:1.

      --
      Still always moving
      • (Score: 0) by Anonymous Coward on Saturday June 13 2015, @03:23AM

        by Anonymous Coward on Saturday June 13 2015, @03:23AM (#195623)

        Veil seems like a row level security like feature. Maybe not exact, but close. At this point, it seems best at bolting on an existing project and a new project would just design for security. I suggest looking at:
        http://michael.otacoo.com/postgresql-2/postgres-9-5-feature-highlight-row-level-security/ [otacoo.com]
        The row level security feature has been in progress for years. Back in 9.2, that work solved the leaky procedure / views problem that the Veil documentation talks about with LEAKPROOF/security_barrier.

        Regarding type mapping. Find the type that behaves the same way. If its range is smaller than the range in MySQL, pick the next size up. I wrote a tool that converts any table from any database product to another using Java's JDBC meta data interfaces. It was not difficult, and did not turn out to be a large set of code, with the exception of other features that I added to it.

  • (Score: 2) by arslan on Thursday June 04 2015, @11:26PM

    by arslan (3462) on Thursday June 04 2015, @11:26PM (#192301)

    Did MySQL just caused the blip in the SN home page a moment ago where every article went missing?

  • (Score: 2) by goodie on Friday June 05 2015, @05:19AM

    by goodie (1877) on Friday June 05 2015, @05:19AM (#192378) Journal

    A few disclaimers are in order before I go on with my 2 cents here:
    - I have very little experience with MySQL and PostgreSQL. My area of expertise is MSSQL (10+ years) including some migration projects between Oracle and MSSQL. All that within the context of application development (legacy, J2EE etc.). So my views here are influenced (tainted if you prefer ;) by this.
    - I did not read the linked module code nor am I familiar with the software stack for SN so some of my points may be moot.

    Couple of important points to me stand out:

    - As some have mentioned, MySQL vs PostgreSQL does not seem to be very clear here... Every DBMS has its flaws, the point is to pick one that works best and scales out best for you. My main point here is the following: beyond the clustering/failover system, some have alluded to performance issues (mainly scaling from what I gather). What exactly are those issues when taking into account the capacity that your DB server gives you? As in, is the DB server very/too busy? I mean what is the actual average load on the DB server and how is that an issue? This could just be a coding issue, whether on the DB or the app side. To me, it may be that more than anything when you refer to queries having to span 5/6 tables: views, SPs, etc. will not help with this in any way. On the plus side, 5 or 6 tables may not be an issue if it's not too frequently used or if the tables are small. As far as reliability and corruption are concerned, it could be the application code rather than the DBMS that is to blame. But MySQL offers ACID transactins like other DBMSs if you pick the right engine. After that, if you're coding in Perl, I am assuming that you have to handle COMMIT/ROLLBACK in your code rather than have a "container" type of thing do it for you.
    - Again, as some have mentioned, prepared statements should be preferred. This does not entail in any way moving the logic into SPs. I would actually not do that unless you need to package multiple statements into one self-contained SQL object, much like you would inside an app module. Prepared statements and SPs will give you cached, reusable execution plans (something that ad hoc SQL usually will not). This is on top of removing the need to sanitize the input for injection (although it does not in itself remove any possibility of injection if you take strings as parameters for example).
    - As far as the EXECUTE permission vs CRUD permission argument is concerned, I don't think it's a huge issue personally. And having restrictions for accounts based on the type of operation you are calling and the operation you are trying to perform will be rather bothersome to implement and maintain as the DB and the application code are loosely coupled. You could design a policy where you say that READ is one account and INSERT/UPDATE another while DELETE is not permitted if you implement soft deletes. That's about the extent I'd go to not to hinder the development.
    - When you want to have a multi DB server setup, is it for failover or load distribution? Depending on what you are trying to achieve, you need to consider whether you will want to implement something like a 2PC (costs more on writes but provides point in time failover) or an incremental log shipping strategy (e.g., 15 minutes lost at most) for the standby server.
    - As far as DB upgrades as concerned, a simple setup wherein every DB change is contained in one numbered script should be enough (been doing that for many years at my old job, it's a discipline issue, not a technical one). DB upgrades can be automated by running the scripts in sequence from the last one run. Add to that an automated backup before DB upgrades and you should be good. But DB upgrades are always trickier than code upgrades because you're not pushing binaries or anything like that.
    - Also, you want to keep the business logic in one place rather than have it put in multiple places without any clear rules or guidelines. One thing we did at my work was use triggers to archive data automatically as soon as the application would consider it "outside of its scope", i.e. only when reporting on it rather than use for any live operations (could be for example archival of threads that have had no activity after X days etc.).

    So my very superficial conclusion is the following: if your main objective is to sanitize the code base, keep MySQL and implement prepared statements with some SPs. If you're looking to switch, then switch and after that implement prepared statements and some SPs. Sanitization and improvements to the DB schema and application code can be tackled during slower dev periods and be done progressively. I tend to favor this over a long blitz which will inevitably unveil lots of bugs especially if you are switching platform at the same time.

    But again, this is based solely on your original post. I'd be happy to lend a hand if needed.

    • (Score: 2) by goodie on Friday June 05 2015, @05:22AM

      by goodie (1877) on Friday June 05 2015, @05:22AM (#192379) Journal

      One thing I forgot regarding triggers...

      While they may work great, you have to make sure they save you work rather than force you to reload data that has been updated by a trigger unbeknownst to the application. I've seen cases where using triggers saved 1 SQL statement but required re-loading a bunch of objects... Obviously that was a poor choice, but for maintenance, pure DB stuff that the application does not touch (like in my example above), it could be worth it.