

posted by martyb on Sunday March 27 2016, @12:27PM   Printer-friendly
from the could-this-site-run-without-both-of-them? dept.

Discussion on the advantages of TCP vs UDP (and vice versa) has a history which is almost as long as the eternal Linux-vs-Windows debate. As I have long been a supporter of the point of view that both UDP and TCP have their own niches (see, for example, [NoBugs15]), here are my two cents on this subject.

Note for those who already know the basics of IP and TCP: please skip to the 'Closing the Gap: Improving TCP Interactivity' section, as you still may be able to find a thing or two of interest.

It's a primer, or a refresher, or a skip. We have all kinds here. Enjoy, or don't.



This discussion has been archived. No new comments can be posted.
  • (Score: 2) by Common Joe on Sunday March 27 2016, @05:50PM

    by Common Joe (33) <{common.joe.0101} {at} {gmail.com}> on Sunday March 27 2016, @05:50PM (#323585) Journal

    I just went to a talk this week where one of the lecturers explained how their database system relies on UDP instead of TCP. I looked at him as if he were from Mars. Realizing I wasn't fully following, he explained something I didn't know, and I thought I'd share it here.

    His company works on a distributed system based on a heavily modified distributed PostgreSQL network; they call it a "massive parallel, shared-nothing database". One or two machines sit out front and break each query down so that the other PostgreSQL systems can parse those newly formed statements. The queries execute in parallel (but share no resources in any way), the front-end machines get the information back, reassemble the results, and send them on to the originator of the query. They claim the databases can hold hundreds of TBs or even PBs worth of information.
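The scatter-gather pattern described above can be sketched in a few lines. This is a hypothetical toy, not the company's actual implementation: the shard contents, the predicate, and all names are invented purely to illustrate the coordinator/segment split.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "segments": each holds its own slice of the data and shares nothing.
SHARDS = [range(0, 10), range(10, 20), range(20, 30)]

def run_fragment(shard, predicate):
    # Each shard scans only its local data, in parallel with the others.
    return [x for x in shard if predicate(x)]

def coordinator(predicate):
    # The front-end machine scatters the query fragments...
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(run_fragment, SHARDS, [predicate] * len(SHARDS))
    # ...then gathers and reassembles the partial results in order.
    return [row for part in partials for row in part]

result = coordinator(lambda x: x % 7 == 0)
print(result)  # [0, 7, 14, 21, 28]
```

The real system would ship SQL fragments over the network rather than lambdas between threads, but the shape is the same: split, run in isolation, merge.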

    They use UDP to do it, though. I was shocked, because I thought packet reliability would be king with this kind of database setup. He explained that when scaling to these sizes, the resources for hundreds or thousands of open TCP/IP sockets were too expensive for those machines up front. The UDP fire-and-forget method was much lighter on resources. They do, of course, have checks on those UDP packets, and if they don't get results back within a predetermined amount of time, they will resend, switch to failover systems, or do other appropriate things. Apparently, packet loss isn't a big problem since the UDP packets are sent on a closed network. He says very few packets are ever lost.
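A minimal sketch of that fire-and-forget-with-checks idea, assuming nothing about the actual system (the worker behavior, timeout, and retry policy here are all invented for illustration): the client sends a datagram, waits briefly, and resends on timeout, instead of holding an open TCP connection per worker.

```python
import socket
import threading

def worker(sock, drop_first=1):
    # Toy "worker node": drops the first request to simulate a lost
    # datagram, then answers subsequent ones.
    dropped = 0
    while True:
        data, addr = sock.recvfrom(1024)
        if dropped < drop_first:
            dropped += 1
            continue  # no reply: the client must detect this itself
        sock.sendto(b"result:" + data, addr)

def query(server_addr, payload, timeout=0.5, retries=3):
    """Fire-and-forget with checks: resend on timeout rather than
    keeping a TCP connection open. Returns (reply, attempts used)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for attempt in range(retries):
            s.sendto(payload, server_addr)
            try:
                data, _ = s.recvfrom(1024)
                return data, attempt + 1
            except socket.timeout:
                continue  # resend; a real system might fail over here
    raise TimeoutError("no reply; would switch to a failover node")

srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
threading.Thread(target=worker, args=(srv,), daemon=True).start()
result, attempts = query(srv.getsockname(), b"SELECT 1")
print(result, attempts)  # reply arrives on the resend
```

Note how little per-request state the client keeps compared with a TCP connection: one socket can talk to any number of workers, which is exactly the resource argument the lecturer made.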

    I found that thought very interesting. I never imagined using UDP that way, but then I also understand the reasons why they did it and it's not a typical database setup.

    He put his slides online, but I purposely didn't post the link here. I wasn't looking to post a Soyvertisement (I don't know anything about his company anyway), it looks like it's his personal website, and the talk he gave was not about TCP/IP vs UDP, so there is nothing about it in his slides. The slides are mainly about how they took their program and made it open source... but he had to explain his product first, which is how we got onto the TCP/IP vs UDP bit. Out of the 91 slides, only a few are dedicated to the setup of the parallel processing database. If you're just dying to see the slides, google "massive parallel, shared-nothing database" with quotes and you can't miss it. You can also email me and I'll send you the link. Due to craziness in my life, it may take me a couple of days to respond, but I'll try to do a decent turnaround.

  • (Score: 2) by devlux on Sunday March 27 2016, @06:14PM

    by devlux (6151) on Sunday March 27 2016, @06:14PM (#323589)

    I dunno about his particular implementation. However, UDP is fine as long as you don't need guarantees. Once you start needing guarantees, you begin re-implementing TCP. If you own the entire loop, i.e. it's a private network, this might be worthwhile, but as soon as you expand beyond the local side of things, TCP does become king. Turning Nagle off on TCP can help a lot if you're sending mostly smallish updates and need closer-to-realtime performance, but most of the features of TCP are there for a reason. That reason is to make sure your data arrives at its destination. If your data only needs to "mostly" arrive, then UDP becomes superior.
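Turning Nagle off is a one-line socket option (`TCP_NODELAY`). A minimal self-contained sketch, with a throwaway local echo server included only so the snippet runs on its own:

```python
import socket
import threading

def echo_once(srv):
    # Trivial peer: accept one connection and echo one message back.
    conn, _ = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=echo_once, args=(srv,), daemon=True).start()

c = socket.create_connection(srv.getsockname())
# Disable Nagle's algorithm: small writes are sent immediately instead
# of being coalesced while an ACK is outstanding. This trades extra
# small packets for lower latency on frequent tiny updates.
c.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = c.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
c.sendall(b"tick")
reply = c.recv(1024)
c.close()
print(nodelay, reply)
```

With Nagle left on, a stream of tiny writes can stall for up to one round-trip each while the stack waits to batch them; with `TCP_NODELAY` each `sendall` goes out right away, which is why it helps near-realtime update streams.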

    What I don't understand is why no one has ever implemented a packet multicasting option for UDP as a standard.

    This would make UDP a lot more useful, since a lot of people could subscribe to a feed and it wouldn't be limited to the upstream of the originator.

    (For those who don't know, the reason your video conference craps out at 4 or 5 people has little to do with your downstream bandwidth. If you have 5 people in a video chat, your computer is pumping out the exact same packet 5 different times, simply changing the destination IP. What I'm saying is this seems like a function the router would be better at; failing that, having multiple IP destinations in a single packet might be good, although I can see the IP header space quickly swamping the data portion in a large enough conferencing app.)

    • (Score: 2) by c0lo on Sunday March 27 2016, @08:52PM

      by c0lo (156) on Sunday March 27 2016, @08:52PM (#323639)

      (For those who don't know, the reason your video conference craps out at 4 or 5 people has little to do with your downstream bandwidth. If you have 5 people in a video chat, your computer is pumping out the exact same packet 5 different times, simply changing the destination IP. What I'm saying is this seems like a function the router would be better at; failing that, having multiple IP destinations in a single packet might be good, although I can see the IP header space quickly swamping the data portion in a large enough conferencing app.)

      Ah, such a simple solution, how come others haven't thought of it [wikipedia.org]?

      (grin)
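For reference, the standard IP multicast API that c0lo is alluding to does exactly this: a receiver joins a group address, and the network (not the sender) fans out copies to every subscriber. A minimal sketch, pinned to the loopback interface so it runs on a single machine; the group address and port are arbitrary choices for the example:

```python
import socket
import struct

GROUP, PORT = "224.1.1.1", 50700   # arbitrary multicast group for the sketch
LOOPBACK = "127.0.0.1"             # keep everything on this host

# Receiver: join the multicast group on the loopback interface.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton(LOOPBACK))
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
rx.settimeout(2)

# Sender: one sendto to the group address; the stack duplicates it to
# every member, so the sender's upstream cost stays constant.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
              socket.inet_aton(LOOPBACK))
tx.sendto(b"one send, every subscriber", (GROUP, PORT))

data, _ = rx.recvfrom(1024)
print(data)
```

The catch, and likely why video chat doesn't use it, is that this only works end to end when every router on the path participates; across the public internet most don't, so conferencing apps fall back to unicast fan-out or relay servers.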

      • (Score: 2) by devlux on Monday March 28 2016, @03:15PM

        by devlux (6151) on Monday March 28 2016, @03:15PM (#323918)

        Thanks. I guess what I meant was: as part of a major working standard, like WebRTC.

        • (Score: 0) by Anonymous Coward on Monday March 28 2016, @10:57PM

          by Anonymous Coward on Monday March 28 2016, @10:57PM (#324123)
          Yes, because a Web browser needs to get OS capabilities.</sarcasm>
    • (Score: 2) by Common Joe on Monday March 28 2016, @07:14AM

      by Common Joe (33) <{common.joe.0101} {at} {gmail.com}> on Monday March 28 2016, @07:14AM (#323752) Journal

      Once you start needing guarantees then you begin to start re-implementing TCP. If you own the entire loop, i.e. it's a private network this might be worthwhile but as soon as you begin to expand beyond the local side of things TCP does become king.

      My understanding is that that's the key: the network is private, so they don't lose tons of packets -- just a few. It sounds like their program can handle those few losses. And you're right, if they opened the network up beyond that, the setup sounds like it would fail. Obviously, this is not a standard database setup, but it's an interesting enough idea that it could be used for other extremely specialized programs. And for all I know, other programs may use it.

  • (Score: 2) by darkfeline on Sunday March 27 2016, @06:36PM

    by darkfeline (1030) on Sunday March 27 2016, @06:36PM (#323600) Homepage

    Or you can post it here because you thought it was interesting.

    Please do not emulate the horrible forum practice of:

    "I need help!"

    "I PMed you the fix =)"

    • (Score: 2) by Common Joe on Sunday March 27 2016, @08:01PM

      by Common Joe (33) <{common.joe.0101} {at} {gmail.com}> on Sunday March 27 2016, @08:01PM (#323620) Journal

      I thought his talk and the comments about TCP/IP and UDP in the talk were interesting. That's all. On the slides are a few diagrams of their setups. I don't need any help with anything, as I'm not affiliated with them and have no project I'm currently working on like this. If I want something from those on Soylent News, I won't be bashful or beat around the bush about asking. I quite believe these people are smarter and more experienced than me concerning networking. As for the details of what they did, he didn't get into it much more than what I wrote. I can see what they did, though. I'll try to write some further details and thoughts in a reply to devlux tomorrow, after I've had a night's sleep.

  • (Score: 0) by Anonymous Coward on Sunday March 27 2016, @08:56PM

    by Anonymous Coward on Sunday March 27 2016, @08:56PM (#323640)

    You can build reliability on top of UDP, but it's a lot of work to get it right. Obviously, that's what they signed up to do, unless they're using a third party protocol library.

    There's the risk of starting down the path of re-inventing TCP to deal with all the pitfalls the original TCP developers ran into over 20-25 years. For example, flow control: what if the master node (the one dispatching the requests) runs short of message buffer space? TCP has a solution for that; UDP doesn't.
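The flow-control gap described above can be illustrated with a credit-based (windowed) scheme, which is the same idea as TCP's receive window. This is an in-process simulation, a sketch rather than a real protocol: all class names and numbers are invented, and real code would carry the credits in packet headers.

```python
from collections import deque

class Receiver:
    """Advertises credits, like TCP's receive window: the sender may
    have at most as many outstanding messages as there are free slots."""
    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.buffer_slots = buffer_slots

    def window(self):
        return self.buffer_slots - len(self.buffer)  # free slots = credits

    def deliver(self, msg):
        assert self.window() > 0, "sender violated the advertised window"
        self.buffer.append(msg)

    def consume(self):
        # The application reads a message, freeing a slot (new credit).
        return self.buffer.popleft()

class Sender:
    def __init__(self, rx):
        self.rx = rx
        self.dropped = 0

    def send(self, msg):
        if self.rx.window() == 0:
            # Raw UDP behavior: a datagram sent into a full buffer is
            # simply lost. A windowed sender would queue it instead.
            self.dropped += 1
            return False
        self.rx.deliver(msg)
        return True

rx = Receiver(buffer_slots=2)
tx = Sender(rx)
sent = [tx.send(f"m{i}") for i in range(4)]  # burst of 4 into 2 slots
print(sent, tx.dropped)   # only the first 2 get through
rx.consume()              # the application frees a slot...
ok = tx.send("m4")        # ...so the sender may transmit again
print(ok)
```

TCP bakes this negotiation into every segment; rebuilding it correctly over UDP (plus retransmission, ordering, and congestion control) is the multi-year pitfall the parent is warning about.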

  • (Score: 4, Informative) by Geotti on Monday March 28 2016, @02:13AM

    by Geotti (1146) on Monday March 28 2016, @02:13AM (#323689) Journal

    I'll make it easier for you: it's greenplum [greenplum.org], here [scherbaum.la] are the slides, and here's their github [github.com]. ;)