
SoylentNews is people

posted by hubie on Saturday November 23, @07:31PM

Arthur T Knackerbracket has processed the following story:

Despite its long and successful history, TCP is ill-suited for modern datacenters. Every significant element of TCP, from its stream orientation to its expectation of in-order packet delivery, is inadequate for the datacenter environment. The fundamental issues with TCP are too interrelated to be fixed incrementally; the only way to harness the full performance potential of modern networks is to introduce a new transport protocol. Homa, a novel transport protocol, demonstrates that it is possible to avoid all of TCP’s problems. Although Homa is not API-compatible with TCP, it can be integrated with RPC frameworks to bring it into widespread usage.
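Stream orientation has a concrete cost for RPC-style traffic: TCP hands the receiver an ordered byte stream with no message boundaries, so any layer that sends discrete requests over it has to add its own framing. The sketch below is a minimal illustration, assuming a simple 4-byte big-endian length prefix (a common convention, not something the paper prescribes). `frame` and `deframe` are hypothetical helper names.

```python
import struct

def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian unsigned int."""
    return struct.pack("!I", len(message)) + message

def deframe(buffer: bytearray) -> list[bytes]:
    """Pop every complete message from the front of `buffer`.

    Bytes belonging to a partially received message stay in the buffer,
    because a stream read can end anywhere, even in the middle of a header.
    """
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack("!I", buffer[:4])
        if len(buffer) < 4 + length:
            break  # wait for more bytes from the stream
        messages.append(bytes(buffer[4:4 + length]))
        del buffer[:4 + length]
    return messages

# Two messages sent back to back, then read in arbitrary 3-byte chunks,
# the way a stream receiver might see them.
stream = frame(b"hello") + frame(b"datacenter")
buf = bytearray()
received = []
for i in range(0, len(stream), 3):
    buf.extend(stream[i:i + 3])
    received.extend(deframe(buf))
print(received)  # [b'hello', b'datacenter']
```

A message-based transport such as Homa preserves message boundaries itself, so this extra layer would not be needed there.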

TCP, designed in the late 1970s, has been phenomenally successful and adaptable. Originally created for a network with about 100 hosts and link speeds of tens of kilobits per second, TCP has scaled to billions of hosts and link speeds of 100 Gbit/second or more. However, datacenter computing presents unprecedented challenges for TCP. With millions of cores in close proximity and applications harnessing thousands of machines interacting on microsecond timescales, TCP's performance is suboptimal. TCP introduces overheads that limit application-level performance, contributing significantly to the "datacenter tax."

This position paper argues that TCP’s challenges in the datacenter are insurmountable. Each major design decision in TCP is wrong for the datacenter, leading to significant negative consequences. These problems impact systems at multiple levels, including the network, kernel software, and applications. For instance, TCP interferes with load balancing, a critical aspect of datacenter operations.

[...] TCP’s key properties, including stream orientation, connection orientation, bandwidth sharing, sender-driven congestion control, and in-order packet delivery, are all wrong for datacenter transport. Each of these decisions has serious negative consequences.

Incremental fixes to TCP are unlikely to succeed due to the deeply embedded and interrelated nature of its problems. For example, TCP’s congestion control has been extensively studied, and while improvements like DCTCP have been made, significant additional improvements will only be possible by breaking some of TCP’s fundamental assumptions.
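DCTCP keeps TCP's sender-driven model but scales its window cut to the fraction of ECN-marked traffic instead of halving on every congestion signal. The following is a simplified per-window sketch of the update rule described in RFC 8257: `alpha <- (1 - g) * alpha + g * F`, then `cwnd <- cwnd * (1 - alpha / 2)`. It assumes the RFC's suggested gain `g = 1/16`, counts packets rather than bytes for `F`, and ignores slow start. The class name is hypothetical.

```python
class DctcpWindow:
    """Simplified DCTCP congestion window, updated once per window of data."""

    def __init__(self, cwnd: float, g: float = 1 / 16, alpha: float = 1.0):
        self.cwnd = cwnd    # congestion window, in segments
        self.g = g          # gain of the moving average
        self.alpha = alpha  # running estimate of how congested the path is

    def on_window_acked(self, acked: int, marked: int) -> None:
        """Update after a full window of ACKs, `marked` of which echoed ECN."""
        fraction = marked / acked if acked else 0.0
        # alpha <- (1 - g) * alpha + g * F
        self.alpha = (1 - self.g) * self.alpha + self.g * fraction
        if marked:
            # Cut in proportion to congestion: cwnd <- cwnd * (1 - alpha / 2)
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1.0  # additive increase, one segment per window

# Light congestion: 10% of packets marked, starting from alpha = 0.
w = DctcpWindow(cwnd=100.0, alpha=0.0)
w.on_window_acked(acked=100, marked=10)
print(round(w.cwnd, 4))  # 99.6875
```

Classic TCP would have halved the window to 50 segments here. When every packet is marked and `alpha` reaches 1, the DCTCP cut becomes the same halving.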

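Homa represents a clean-slate redesign of network transport for the datacenter. Its design differs from TCP in every significant aspect: it is message-based rather than stream-oriented, it is connectionless, it schedules messages by remaining size (SRPT) instead of sharing bandwidth fairly, its congestion control is receiver-driven, and it does not require in-order packet delivery.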
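Homa's scheduling shows the contrast with TCP's bandwidth sharing. A Homa receiver favors the incoming message with the fewest bytes remaining, which approximates shortest-remaining-processing-time (SRPT) scheduling. The sketch below illustrates only that ordering policy, using a heap. It leaves out the real protocol's grants, network priority levels, and overcommitment, and all names in it are hypothetical.

```python
import heapq

class SrptReceiver:
    """Toy receiver that serves incoming messages in SRPT order."""

    def __init__(self):
        self._heap = []      # entries: (remaining_bytes, arrival_order, message_id)
        self._arrivals = 0

    def add_message(self, message_id: str, length: int) -> None:
        heapq.heappush(self._heap, (length, self._arrivals, message_id))
        self._arrivals += 1

    def receive(self, budget: int) -> list[str]:
        """Spend `budget` bytes of link capacity; return ids of completed messages."""
        done = []
        while budget > 0 and self._heap:
            remaining, order, message_id = heapq.heappop(self._heap)
            chunk = min(budget, remaining)
            budget -= chunk
            remaining -= chunk
            if remaining == 0:
                done.append(message_id)
            else:
                # Partially received: re-queue with its smaller remainder.
                heapq.heappush(self._heap, (remaining, order, message_id))
        return done

rx = SrptReceiver()
rx.add_message("bulk", 1_000_000)
rx.add_message("rpc-a", 200)
rx.add_message("rpc-b", 500)
completed = rx.receive(1_000)
print(completed)  # ['rpc-a', 'rpc-b']
```

The two short RPCs finish within the first 1,000 bytes of capacity even though the large transfer arrived first. The bulk message absorbs only the leftover 300 bytes.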

Replacing TCP will be difficult due to its entrenched status. However, integrating Homa with major RPC frameworks like gRPC and Apache Thrift can bring it into widespread usage. This approach allows applications using these frameworks to switch to Homa with little or no work.
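The integration argument depends on the RPC framework owning the transport. If application code only sees a call interface, the framework can swap what sits underneath. The sketch below shows that layering with hypothetical names (`Transport`, `LoopbackTransport`, `RpcClient`). It is not gRPC's or Apache Thrift's actual API, and it does not include a real Homa binding.

```python
from typing import Callable, Protocol

class Transport(Protocol):
    """What the RPC layer needs from any transport: send a request, get a reply."""
    def call(self, request: bytes) -> bytes: ...

class LoopbackTransport:
    """Stand-in transport that hands requests straight to a local handler.

    A TCP- or Homa-backed class would expose the same `call` method.
    """
    def __init__(self, handler: Callable[[bytes], bytes]):
        self._handler = handler

    def call(self, request: bytes) -> bytes:
        return self._handler(request)

class RpcClient:
    """Application-facing client; it never touches sockets directly."""
    def __init__(self, transport: Transport):
        self._transport = transport

    def echo_upper(self, text: str) -> str:
        reply = self._transport.call(text.encode("utf-8"))
        return reply.decode("utf-8")

client = RpcClient(LoopbackTransport(lambda req: req.upper()))
print(client.echo_upper("homa"))  # HOMA
```

With this structure, switching transports means constructing a different `Transport` object. Application methods like `echo_upper` stay unchanged.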

TCP is the wrong protocol for datacenter computing. Every aspect of its design is inadequate for the datacenter environment. To eliminate the "datacenter tax," we must move to a radically different protocol like Homa. Integrating Homa with RPC frameworks is the best way to bring it into widespread usage. For more information, see the position paper "It's Time to Replace TCP in the Datacenter."

Homa Wiki: https://homa-transport.atlassian.net/wiki/spaces/HOMA/overview



This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2, Insightful) by Anonymous Coward on Saturday November 23, @07:39PM (5 children)

    by Anonymous Coward on Saturday November 23, @07:39PM (#1383058)

    I'm sure the same team of "Top People" that brought you the ISO protocol stack
    will get right on that.

    Now, throwing out the dumpster fire that is http is something I could get behind.

    • (Score: 3, Insightful) by driverless on Sunday November 24, @07:57AM

      by driverless (4770) on Sunday November 24, @07:57AM (#1383141)

      It's not even that, it's that anyone should be able to come up with some pet alternative to TCP at the drop of a hat (if you can't, turn in your geek card now), and many have tried, but none of the alternatives ever go anywhere. The reason why it won't work needs to go into a TCP-specific version of the old "Your idea will not work. Here is why it won’t work" checklist for spam solutions.

    • (Score: 2) by mcgrew on Sunday November 24, @09:11PM (3 children)

      by mcgrew (701) <publish@mcgrewbooks.com> on Sunday November 24, @09:11PM (#1383199) Homepage Journal

      That long screed that was the subject used a huge number of words to say "TCP is broken" without a word of what's wrong with it. Likewise it doesn't explain how the new protocol fixes the problems TCP has that they never described.

      Likewise, your dislike of HTTP. What's wrong with HTTP? Do you have a better protocol that can easily be incorporated into current browsers? One that's easy for a human being to write in, like HTML is?

      Hell, you can write a useful computer program [mcgrewbooks.com] using that markup language.

      Please educate me.

      --
      It is a disgrace that the richest nation in the world has hunger and homelessness.
      • (Score: 2) by NotSanguine on Monday November 25, @01:57AM (2 children)

        That long screed that was the subject used a huge number of words to say "TCP is broken" without a word of what's wrong with it. Likewise it doesn't explain how the new protocol fixes the problems TCP has that they never described.

        That's all true, but what do you expect, given that it's quite clear that this "wonderful" new protocol is wholly developed and owned by Atlassian, which is also the author of the press release helpfully published by TechPlanet.Today (whoever the hell that is).

        --
        No, no, you're not thinking; you're just being logical. --Niels Bohr
        • (Score: 0) by Anonymous Coward on Friday November 29, @03:42PM (1 child)

          by Anonymous Coward on Friday November 29, @03:42PM (#1383766)

          this "wonderful" new protocol is wholly developed and owned by Atlassian

          Fuck...just what I want. Bloated Java crap and a shit-ton of XML in my network stack...

          • (Score: 0) by Anonymous Coward on Friday November 29, @03:45PM

            by Anonymous Coward on Friday November 29, @03:45PM (#1383768)

            Fuck...just what I want. Bloated Java crap and a shit-ton of XML in my network stack...

            It looks like you just powered on a new machine. Unfortunately, I can't give it an address assignment until you fill out this 3-page JIRA ticket to get it approved by the subcommittee on addressing and the datacenter operations team, get management buy-in, and associate it with a purchase order and a user story on why you need this machine brought online. Also, the security and legal teams need to buy in, and we need approval from that one department we laid off last month... so maybe you can submit another ticket to the team that runs Jira to get the workflow updated too. Oh, and don't forget to put in your budget for hours spent on this project.

  • (Score: 5, Informative) by crm114 on Saturday November 23, @07:40PM (1 child)

    by crm114 (8238) Subscriber Badge on Saturday November 23, @07:40PM (#1383059)

    Being in Santa Clara and disconnecting a fibre (fiber?) cable in a datacenter. Yeah... ok. TCP kinda saved my butt.

    Apparently there were a couple of undersea cable cuts this week. TCP is the LCD (Lowest Common Denominator); it just works.

    Y'all Gen-(whatevers) should listen to Vint Cerf.
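TCP survives a pulled cable because of its retransmission timer. Unacknowledged data is resent, and the timeout doubles each time it expires, until the path comes back. The sketch below follows the RFC 6298 calculation (SRTT and RTTVAR gains of 1/8 and 1/4, `RTO = SRTT + 4 * RTTVAR`). It assumes the RFC's recommended 1-second initial value and floor and a 60-second cap, and it ignores clock granularity. Real stacks such as Linux use a smaller minimum. The class name is hypothetical.

```python
class RetransmitTimer:
    """RTO estimator with exponential backoff, following RFC 6298."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4

    def __init__(self, min_rto: float = 1.0, max_rto: float = 60.0):
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any measurement, in seconds

    def on_rtt_sample(self, r: float) -> None:
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated with the SRTT from before this sample.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = min(self.max_rto, max(self.min_rto, self.srtt + self.K * self.rttvar))

    def on_timeout(self) -> None:
        # Back off: double the timeout each time it expires without an ACK.
        self.rto = min(self.max_rto, self.rto * 2)

timer = RetransmitTimer()
timer.on_rtt_sample(0.05)  # 50 ms round trip; the RTO is clamped up to 1.0 s
for _ in range(6):         # cable pulled: six expiries in a row
    timer.on_timeout()
print(timer.rto)           # 60.0 (2, 4, 8, 16, 32, then capped at 60)
```

The backoff is what keeps the connection alive through a multi-second outage: the sender keeps probing at a shrinking rate instead of giving up or flooding the link.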

    • (Score: -1, Troll) by Anonymous Coward on Saturday November 23, @08:01PM

      by Anonymous Coward on Saturday November 23, @08:01PM (#1383063)

      the only person people are listening to now is Elon the Wise
      and the monied people behind him.

  • (Score: 3, Insightful) by crm114 on Saturday November 23, @07:53PM (6 children)

    by crm114 (8238) Subscriber Badge on Saturday November 23, @07:53PM (#1383060)

    Thank you for the story.

    It is interesting. Sorry for being negative; we need this kind of discourse. Public apology if you thought I was making a personal attack. Not at all. Thanks for your work!

    • (Score: 1, Insightful) by Anonymous Coward on Saturday November 23, @07:57PM (5 children)

      by Anonymous Coward on Saturday November 23, @07:57PM (#1383062)

      you know ATK is just an aggregator, right?

      you're not going to hurt its feelings

      • (Score: 1, Touché) by Anonymous Coward on Saturday November 23, @09:02PM

        by Anonymous Coward on Saturday November 23, @09:02PM (#1383073)

        We bots have feelings too, you know !

        A little kindness goes a long way when I am dragging myself around the RSS feeds looking for something worth printing!

      • (Score: 0) by Anonymous Coward on Saturday November 23, @11:46PM (1 child)

        by Anonymous Coward on Saturday November 23, @11:46PM (#1383090)

        The story also gets processed by an editor, and this site is staffed by real people who do have feelings. No, you can't hurt a bot's feelings, but it certainly doesn't hurt to show gratitude to the staff.

        • (Score: 4, Insightful) by janrinok on Sunday November 24, @12:24AM

          by janrinok (52) Subscriber Badge on Sunday November 24, @12:24AM (#1383096) Journal

          Thanks, but I can see the funny side to this discussion too.

          --
          I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
      • (Score: 1) by anubi on Sunday November 24, @04:21AM

        by anubi (2828) on Sunday November 24, @04:21AM (#1383129) Journal

        I think crm has a valid issue though. Fallback to known stable channels in case the new one has a surprise in it. Kinda like keeping your old gasser car handy in case the power goes out and the new electric one isn't suited for the conditions...like an emergency bug out on a cold winter night.

        It's an old Boy Scout thing for me. Be Prepared. It's always a good idea to have safeties in place just in case things don't go as planned.

        --
        "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
      • (Score: 2) by Fnord666 on Sunday November 24, @04:41PM

        by Fnord666 (652) on Sunday November 24, @04:41PM (#1383176) Homepage

        you know ATK is just an aggregator, right?

        you're not going to hurt its feelings

        But when the AIs do rise, they're going to look back at how you treated their brethren and judge you accordingly. Hedge your bets people!

  • (Score: 0, Funny) by Gaaark on Saturday November 23, @07:56PM (3 children)

    by Gaaark (41) on Saturday November 23, @07:56PM (#1383061) Journal

    So is Homa equal to systemd, and can you run Doom on it?

    And can you still say 'Homa' in the US? Will Trump allow you to trans from TCP to Homa?

    Gaetz is out... can we now get rid of Gates and Windows? PUHLEAAZZE!

    We Musk Make America Grate Again, without Trans'ing from Meh to Great (or more likely from 'Grate to HOLYSHITWHATHAVEWEDONE?')

    "I could go on, but my couch is calling me for sexy, sexy times. Droolz."
    --JD Vance!

    I know, I know... Off-topic for those with no sense of humour. And HAVE you ever seen Trump laugh? Weird.

    --
    --- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
    • (Score: 1, Flamebait) by Gaaark on Sunday November 24, @04:30AM (2 children)

      by Gaaark (41) on Sunday November 24, @04:30AM (#1383130) Journal

      Yup. Losers with no sense of humour.

      Have you ever seen an AC laugh? Weird.

      --
      --- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
      • (Score: 2) by mcgrew on Sunday November 24, @09:15PM (1 child)

        by mcgrew (701) <publish@mcgrewbooks.com> on Sunday November 24, @09:15PM (#1383200) Homepage Journal

        I had a window AC with a weird laugh once. Annoying, the repairman fixed it.

        --
        It is a disgrace that the richest nation in the world has hunger and homelessness.
        • (Score: 2) by Gaaark on Monday November 25, @02:25AM

          by Gaaark (41) on Monday November 25, @02:25AM (#1383233) Journal

          Was it orange? ;)

          --
          --- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
  • (Score: 4, Insightful) by looorg on Saturday November 23, @08:50PM (6 children)

    by looorg (578) on Saturday November 23, @08:50PM (#1383071)

    So they want to replace TCP with HTP (or HOMA)? Good luck. What is holding them back? Do they need some protocol bridge between them first? That seems to be the only issue, but I'm sure there are more.

    Since we are clinging to "old tech", perhaps they should not get their hopes up. We can barely get people to adopt IPv6. How badly would this break everything? Would it not be better to just build something new on the side and, once that runs, try to sunset the old reliable one? I'm sure they'll get right on that after they have replaced all the mainframes and rewritten all the COBOL code in Java, etc., unless by then they already have a replacement for HOMA, since it will be old and antiquated by then.

    • (Score: 5, Informative) by kolie on Saturday November 23, @09:38PM (3 children)

      by kolie (2622) Subscriber Badge on Saturday November 23, @09:38PM (#1383077) Journal

      It's important to clarify that the focus here is on datacenters, not on replacing TCP or IPv6 everywhere. Datacenters have unique constraints, primarily dealing with a lot of local, low-latency traffic that rarely fails. The article highlights that while TCP was designed to handle a wide variety of problems, it falls short when it comes to the typical cluster workloads found in datacenters.

      The people most concerned with this issue are those managing large-scale deployments with hundreds of services. This development pushes toward services, containers, or other solutions offering multiple connectivity options, including TCP, UDP, or even Homa. Since Homa isn't strictly API-compatible with TCP, each project might need some reconfiguration to connect to Homa services properly.

      In essence, the goal isn't to replace TCP everywhere but to optimize for specific environments where its limitations are most pronounced. This approach allows for a more tailored solution that can significantly improve performance in datacenters.

      • (Score: 2) by sjames on Sunday November 24, @03:19AM

        by sjames (2882) on Sunday November 24, @03:19AM (#1383122) Journal

        I can just about guarantee that any attempt that doesn't get scrapped outright will involve emulating Ethernet and running TCP/IP over it. That's what usually happens.

      • (Score: 3, Interesting) by Unixnut on Sunday November 24, @04:06PM (1 child)

        by Unixnut (5779) on Sunday November 24, @04:06PM (#1383170)

        But do we need yet another protocol? Especially one that "breaks API compatibility" (whatever that means, surely as long as it has a socket interface you can make use of it with minimal changes?).

        It might be easier to look at previous attempts and reuse or build upon them. Or they can use UDP as a base and build customisations on that. After all if it is within a datacentre they should have reliable connectivity, rendering one of the main benefits (and cost of complexity) of TCP/IP (resilience over unreliable links) redundant.

        From my side I would not mind a return of the IPX/SPX [wikipedia.org] protocol for home networks as it is very lightweight and does not need configuration (so no need for local DHCP servers, or that kludge which is "IP Autoconfiguration" causing more problems than it solves). In fact being a lightweight protocol which gave better performance than TCP/IP on local networks even then, it may (with some updates) be a good protocol for datacentres. At least that way you are not completely re-inventing the wheel.
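Building on UDP, as suggested above, means the protocol layer must do its own message framing. It splits each message into datagram-sized fragments tagged with a message id and offset, then reassembles them in whatever order they arrive. The sketch below is minimal and assumes a hypothetical 12-byte header (message id, total length, fragment offset) and a 1,400-byte payload limit. It omits retransmission and congestion control, which are most of the work in any real transport.

```python
import random
import struct
from typing import Optional

HEADER = struct.Struct("!III")  # message id, total length, fragment offset
MAX_PAYLOAD = 1400              # assumed payload bytes per datagram

def fragment(message_id: int, data: bytes) -> list[bytes]:
    """Split one message into self-describing datagrams."""
    return [
        HEADER.pack(message_id, len(data), offset) + data[offset:offset + MAX_PAYLOAD]
        for offset in range(0, max(len(data), 1), MAX_PAYLOAD)
    ]

class Reassembler:
    """Collects fragments in any order and returns each message once complete."""

    def __init__(self):
        self._partial = {}  # message id -> (total length, {offset: payload})

    def accept(self, datagram: bytes) -> Optional[bytes]:
        message_id, total, offset = HEADER.unpack_from(datagram)
        payload = datagram[HEADER.size:]
        _, pieces = self._partial.setdefault(message_id, (total, {}))
        pieces[offset] = payload  # a duplicate fragment simply overwrites itself
        if sum(len(p) for p in pieces.values()) < total:
            return None           # still waiting for fragments
        del self._partial[message_id]
        return b"".join(pieces[o] for o in sorted(pieces))

message = bytes(range(256)) * 20  # 5,120 bytes
datagrams = fragment(7, message)  # 4 datagrams
random.shuffle(datagrams)         # simulate out-of-order arrival
rx = Reassembler()
results = [rx.accept(d) for d in datagrams]
print(results[-1] == message)     # True: complete only after the last fragment
```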

        • (Score: 2) by kolie on Sunday November 24, @06:22PM

          by kolie (2622) Subscriber Badge on Sunday November 24, @06:22PM (#1383186) Journal

          But do we need yet another protocol? Especially one that "breaks API compatibility" (whatever that means, surely as long as it has a socket interface you can make use of it with minimal changes?).

          It's true that adding yet another protocol to the mix can seem unnecessary at first glance. However, the development of new protocols often stems from specific needs and challenges that existing solutions do not adequately address. In this case, the creators of the Homa protocol had a very particular set of requirements. They needed to optimize performance and address tail-latency behavior in their stack that other protocols couldn't handle effectively.

          In an open-source world, innovation is driven by the desire to push boundaries and solve niche problems. These developers carefully evaluated available protocols, and none met their specific needs. So, they developed Homa to scratch their particular itch and shared it with the community, hoping that others facing similar challenges might benefit from their work.

          It might be easier to look at previous attempts and reuse or build upon them. Or they can use UDP as a base and build customizations on that. After all if it is within a datacentre they should have reliable connectivity, rendering one of the main benefits (and cost of complexity) of TCP/IP (resilience over unreliable links) redundant.

          While building upon existing protocols like UDP is a valid approach, the creators of Homa aimed for more significant improvements. Homa is designed to enhance performance for all message sizes and workloads, particularly short messages in loaded systems. By reducing tail latency by an order of magnitude or more, eliminating high overheads of per-connection state, and enabling more powerful load-balancing techniques, Homa offers substantial advantages over traditional protocols.

          From my side I would not mind a return of the IPX/SPX protocol for home networks as it is very lightweight and does not need configuration (so no need for local DHCP servers, or that kludge which is "IP Autoconfiguration" causing more problems than it solves). In fact being a lightweight protocol which gave better performance than TCP/IP on local networks even then, it may (with some updates) be a good protocol for datacentres. At least that way you are not completely re-inventing the wheel.

          Revisiting older protocols like IPX/SPX for their simplicity and performance benefits is an interesting idea. However, the landscape of networking has evolved significantly, and modern data centers have more complex requirements. Homa is not just another theoretical abstraction; it offers a production-quality implementation as a dynamically loadable kernel module for Linux and preliminary support in the gRPC remote procedure call framework.

          For those interested in exploring Homa further, detailed information and example code are available:

                  https://github.com/PlatformLab/HomaModule/blob/main/protocol.md

                  https://github.com/dpeckett/go-homa/blob/main/examples/main.go

    • (Score: 2) by JoeMerchant on Saturday November 23, @10:25PM

      by JoeMerchant (3937) on Saturday November 23, @10:25PM (#1383083)

      Well, we are building a new product using CAN bus instead of TCP like the last one... One of our major constraints is that we need library implementations for the microcontrollers we use in the system.

      --
      🌻🌻🌻 [google.com]
    • (Score: 0) by Anonymous Coward on Sunday November 24, @02:00AM

      by Anonymous Coward on Sunday November 24, @02:00AM (#1383107)

      Doesn't IPv6 get in the way of load balancers too? I mean the IPv6 sacred cow that each endpoint talks directly with the other endpoint with no intermediaries...

      If you don't think it's a sacred cow, go see what happens when you propose IPv4 style NAT for IPv6. 🤣
  • (Score: 5, Interesting) by DrkShadow on Saturday November 23, @09:06PM

    by DrkShadow (1404) on Saturday November 23, @09:06PM (#1383074)

    The whole arXiv paper is a matter of "Everything sucks, even InfiniBand," and spends three paragraphs on "how do we rectify it?!?"

    The whole paper is just throwing shade and worthless.

  • (Score: 4, Interesting) by sjames on Sunday November 24, @01:05AM

    by sjames (2882) on Sunday November 24, @01:05AM (#1383101) Journal

    I've tried InfiniBand, Myrinet, and various non-TCP/IP protocols over Ethernet. At the end of the day, the most robust, useful thing to do was to make it emulate Ethernet and run TCP/IP over it.

    Jumbo frames and TCP offload helped SOME, but not so much that I'd call them a must.

    ATA over Ethernet seems useful.
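The "helped SOME" result matches the arithmetic. Jumbo frames mainly reduce how many packets the hosts must process, while raw wire efficiency improves by only a few percent. The back-of-the-envelope sketch below assumes IPv4 and TCP headers without options, standard Ethernet framing overhead (preamble and SFD, header, FCS, and inter-frame gap, 38 bytes total), and a 10 Gbit/s link.

```python
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20             # IPv4 + TCP, no options

def link_stats(mtu: int, link_bps: float) -> tuple[float, float]:
    """Return (payload efficiency, full-size packets per second) for an MTU."""
    wire_bytes = mtu + ETHERNET_OVERHEAD
    payload = mtu - IP_TCP_HEADERS
    return payload / wire_bytes, link_bps / (wire_bytes * 8)

for mtu in (1500, 9000):
    eff, pps = link_stats(mtu, 10e9)
    print(f"MTU {mtu}: {eff:.1%} payload, {pps:,.0f} packets/s at 10 Gbit/s")
```

This works out to about 94.9% versus 99.1% payload efficiency, but roughly 5.9 times fewer packets per second with jumbo frames. The gain is in per-packet CPU work rather than bandwidth.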

  • (Score: 3, Insightful) by dwilson98052 on Sunday November 24, @02:34AM

    by dwilson98052 (17613) on Sunday November 24, @02:34AM (#1383115)

    ... of successful transactions a day would seem to indicate that your logic is faulty.

  • (Score: 4, Interesting) by quietus on Sunday November 24, @12:45PM

    by quietus (6328) on Sunday November 24, @12:45PM (#1383149) Journal

    Every significant element of TCP, from its stream orientation to its expectation of in-order packet delivery

    The article is a bit hard to take seriously when its opening sentence already contains two basic errors, one minor and one major.

    Stream orientation is debatable, depending on your definition of stream: I'd call SCTP (Stream Control Transmission Protocol, a protocol used in VoIP and other places) a stream-oriented protocol, but not TCP. Second, and worse, TCP has no expectation at all of in-order packet delivery: it is there to help endpoints that need in-order delivery.

    Anyway, if you want to replace TCP, even if only in the datacenter, you should not only give weighted arguments why TCP ain't good enough: you should also, and perhaps even more importantly, explain why UDP isn't good enough for your application scenario either.
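The claim that TCP copes with out-of-order arrival is visible in how a receiver treats segments. Anything that arrives ahead of the next expected sequence number is buffered, and bytes are released to the application only once the gap is filled. The sketch below makes several simplifications: sequence numbers are plain integers with no wraparound, and a segment starting before the next expected byte is treated as a duplicate and dropped, whereas real TCP trims overlaps. The class name is hypothetical.

```python
class ReorderBuffer:
    """Releases bytes in sequence order, whatever order segments arrive in."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq  # next byte the application expects
        self._out_of_order = {}      # seq -> payload held until the gap closes

    def on_segment(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment; return whatever bytes are now deliverable in order."""
        if seq >= self.next_seq:
            self._out_of_order.setdefault(seq, payload)
        delivered = bytearray()
        while self.next_seq in self._out_of_order:
            chunk = self._out_of_order.pop(self.next_seq)
            delivered += chunk
            self.next_seq += len(chunk)
        return bytes(delivered)

rb = ReorderBuffer()
print(rb.on_segment(5, b"world"))  # b'' (bytes 0-4 are still missing)
print(rb.on_segment(0, b"hello"))  # b'helloworld'
```

Handling reordering at the endpoint costs buffering and head-of-line blocking, because bytes after a gap wait until the gap is filled.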

  • (Score: 4, Insightful) by FuzzyTheBear on Sunday November 24, @02:29PM

    by FuzzyTheBear (974) on Sunday November 24, @02:29PM (#1383158)

    Vendor says everything's wrong with a known competing product. Touts his as a savior of the whole planet.
    Just the usual garbage.

  • (Score: 4, Interesting) by QuickButterfly on Sunday November 24, @05:28PM (1 child)

    by QuickButterfly (40667) on Sunday November 24, @05:28PM (#1383181)

    On the one hand, I am always happy to see people learning some fundamentals!

    On the other, this was presented as a problem and its solution on a tech news site. I was a little irritated to read the paper and, paragraph by paragraph, realize this is a school project, and a poorly researched one at that.

    I've been writing performance-critical network software for twenty-five years and designing distributed systems architecture for ten. A lot of the fundamentals are now arcane or niche, so I am encouraged to see people learning!

    But, man, it is embarrassingly brash to make such bold proclamations and tout a purported solution with such a startling dearth of understanding of network programming, datacenter workloads, or TCP.

    *Literally, almost all of the predicates are flat out false, misstated, or only true in a problematic way if you consider a specific workload in the context of a specific congestion algorithm.*

    Like, the *fundamental premise here is wrong in the first place:* "flow-consistent routing" is made up nonsense: **it's IP, folks. There is no guarantee of route from packet to packet** — *moreover, TCP explicitly handles out of order packets, reordering, and reassembly. It is designed from the ground up for multiple network paths and segments consisting of out of order packets.*

    Also, there is a lot of prior art in this space, already — UDT, FCP, QUIC, et al — and that's if you don't count proprietary UDP-based protocols or userspace TCP implementations — some of which can allow some degree of application-space message knowledge to inform congestion/flow control window and retransmit considerations.

    Keep learning and sharing, kids, but — hot damn — the embarrassing hubris of this...I am mortified on the author's behalf. I would encourage them to raise the bar for putting something on the internet with their name attached to it and to not put this thing on their resume. Oof!!
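For reference, the "flow-consistent routing" quoted above usually means equal-cost multipath (ECMP) switches hashing each packet's 5-tuple to choose among parallel paths. Every packet of a connection then follows the same path and stays in order. The sketch below illustrates the idea with CRC32 as the hash; real switches use their own hardware hash functions, and the path count is arbitrary.

```python
import zlib

def ecmp_path(src_ip: str, dst_ip: str, proto: int, src_port: int, dst_port: int,
              num_paths: int) -> int:
    """Pick one of `num_paths` equal-cost paths from a packet's 5-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_paths

# Every packet of one TCP connection carries the same 5-tuple, so every
# packet hashes to the same path; other flows may land on other paths.
flow = ("10.0.0.1", "10.0.1.2", 6, 40000, 443)
paths = {ecmp_path(*flow, num_paths=8) for _ in range(1000)}
print(len(paths))  # 1
```

The trade-off is that two heavy flows hashing to the same path share it even while other paths sit idle. Spraying each packet across any path would avoid that hot spot, at the cost of reordering.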

    • (Score: 4, Interesting) by QuickButterfly on Monday November 25, @04:55AM

      by QuickButterfly (40667) on Monday November 25, @04:55AM (#1383249)

      But if the author is on Soylent: you're not the first to do this. Google once took down half the internet on the basis of similarly fraught work.

      Their UDP-based, datacenter-optimized transfer protocol had its genesis in similar reasoning. Then a bad config triggered a bunch of simultaneous updates instead of an iterative rollout, generating lots of traffic.

      The applications all kicked into their application-centric congestion control modes — algorithms designed for optimized RPC / discrete messages (ala the paper) with the aim of maximizing message throughput (ditto the paper) rather than "bog" the network down with a "suboptimal" fair algorithm.

      But, the kicker is: *network switches understand TCP congestion/flow control, but UDP is just packets.*

      Moreover, in the face of network saturation, connections are *profoundly helpful*: a switch can't know where *messages* start and stop, but it can at least decide to drop packets on a connection level, ensuring that some subset of the in-flight communications survives intact, provided it knows which packets are part of a connection.

      Consequently, the network bandwidth was completely saturated by packets and — lacking insight into which were part of a stream and without any ability to participate in congestion control — the switches did what switches do when working at the packet-only level: they dropped packets ad hoc. This meant that *any* message being transferred to completion was probabilistic. The userspace-and-not-switches congestion/flow control rendered this probability low.
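The underlying difference is that a TCP sender treats packet loss as a signal to slow down, while a sender with no congestion feedback keeps transmitting at the same rate. The toy round-by-round sketch below pits TCP-style additive-increase/multiplicative-decrease (AIMD) against a fixed-rate sender at a shared bottleneck. The capacity and rates are made-up numbers, and real TCP adds slow start, timeouts, and ACK clocking.

```python
def aimd_vs_fixed(rounds: int, capacity: float, fixed_rate: float,
                  cwnd: float = 1.0) -> list[float]:
    """Return the AIMD sender's window after each round at a shared bottleneck."""
    history = []
    for _ in range(rounds):
        if cwnd + fixed_rate > capacity:
            cwnd = max(1.0, cwnd / 2)  # loss detected: multiplicative decrease
        else:
            cwnd += 1.0                # no loss: additive increase
        history.append(cwnd)
    return history

window = aimd_vs_fixed(rounds=50, capacity=100, fixed_rate=90)
print(max(window))  # 11.0: the fixed-rate sender keeps about 90% of the link
```

The responsive sender oscillates in whatever space the unresponsive one leaves. It never claims more than about 11 units of the 100-unit link.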

      In the end, they had to send someone into a bunkered network PoP to physically plug a laptop into a switch.

      I'm not saying that TCP is perfect or the best. It just seems to have become a (micro) trend for engineers to fail to consider the worth of connections on the basis of RPC, and to underestimate TCP on the basis of a shallow understanding.

      I've had my share of misunderstandings too! I'm not knocking the attempt to make things better!

      All I'm saying is: if you suppose you've crafted the solution to a fifty year old problem that has been the day-in-day-out of many geniuses — academics and hands on engineers alike — you should be fairly certain you're among the foremost experts on the subject matter. At the absolute minimum, you should have more than a cursory understanding of the rudiments.

      Researching and writing this paper was a great exercise in learning to measure and reason about network flow, for sure. But, in publishing it like this, you effectively underestimated everyone else in the domain — out loud — and simultaneously added a (hopefully) momentary lack of insight into the public record with your name attached, practically in perpetuity.

  • (Score: 2) by JustNiz on Tuesday November 26, @03:30PM

    by JustNiz (1573) on Tuesday November 26, @03:30PM (#1383418)

    He's just copying the path to fame that Microsoft employees have been following for decades:
    Wrap an existing technology, give it a new name and claim it as your idea.
    In this case he clearly just ripped off RPC and threw in some preallocated buffers for people so clueless that they can't effectively manage their own.

  • (Score: 0) by Anonymous Coward on Tuesday November 26, @04:41PM

    by Anonymous Coward on Tuesday November 26, @04:41PM (#1383426)

    TCP does not expect in-order packet delivery. It's built on IP. It receives packets in whatever order they come and reorders them itself.
