It’s Time To Replace TCP In The Datacenter

Accepted submission by Arthur T Knackerbracket at 2024-11-19 14:21:06
Hardware

Arthur T Knackerbracket has processed the following story [techplanet.today]:

Despite its long and successful history, TCP is ill-suited for modern datacenters. Every significant element of TCP, from its stream orientation to its expectation of in-order packet delivery, is inadequate for the datacenter environment. The fundamental issues with TCP are too interrelated to be fixed incrementally; the only way to harness the full performance potential of modern networks is to introduce a new transport protocol. Homa, a novel transport protocol, demonstrates that it is possible to avoid all of TCP’s problems. Although Homa is not API-compatible with TCP, it can be integrated with RPC frameworks to bring it into widespread usage.

TCP, designed in the late 1970s, has been phenomenally successful and adaptable. Originally created for a network with about 100 hosts and link speeds of tens of kilobits per second, TCP has scaled to billions of hosts and link speeds of 100 Gbit/second or more. However, datacenter computing presents unprecedented challenges for TCP. With millions of cores in close proximity and applications harnessing thousands of machines interacting on microsecond timescales, TCP’s performance is suboptimal. TCP introduces overheads that limit application-level performance, contributing significantly to the "datacenter tax."

This position paper argues that TCP’s challenges in the datacenter are insurmountable. Each major design decision in TCP is wrong for the datacenter, leading to significant negative consequences. These problems impact systems at multiple levels, including the network, kernel software, and applications. For instance, TCP interferes with load balancing, a critical aspect of datacenter operations.
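One way to see the load-balancing problem is flow-consistent placement. The summary above does not spell out the mechanism, so the following is a minimal sketch under an assumption: packets of one connection must stay on one path or core so that they arrive in order, which is how hash-based schemes commonly treat TCP flows. The addresses, the core count, and the function names are hypothetical.

```python
import zlib
from collections import Counter

NUM_CORES = 4  # hypothetical core count, chosen only for illustration


def core_for_flow(src, sport, dst, dport, num_cores=NUM_CORES):
    """Pick a core from a hash of the connection 4-tuple (flow-consistent)."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return zlib.crc32(key) % num_cores


def core_for_packet(packet_index, num_cores=NUM_CORES):
    """Round-robin per-packet placement; only usable if reordering is tolerated."""
    return packet_index % num_cores


# One heavy connection sending 1000 packets (made-up addresses and ports).
flow = ("10.0.0.1", 40000, "10.0.0.2", 80)
per_flow = Counter(core_for_flow(*flow) for _ in range(1000))
per_packet = Counter(core_for_packet(i) for i in range(1000))

print("flow-hashed:", dict(per_flow))
print("per-packet: ", dict(per_packet))
```

In this sketch the single heavy connection lands entirely on one core, while per-packet placement spreads the same traffic evenly; the latter is only an option for a protocol that tolerates out-of-order arrival.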

Before discussing TCP’s problems, it’s essential to understand the challenges that any transport protocol for datacenters must address:

       

  • Reliable Delivery: The protocol must ensure data is delivered reliably from one host to another, despite transient failures.
  • Low Latency: Modern networking hardware enables round-trip times of a few microseconds for short messages. The transport protocol must not add significantly to this latency.
  • High Throughput: The protocol must support high data throughput and high message throughput, essential for communication patterns like broadcast and shuffle.
  • Congestion Control: The protocol must limit the buildup of packets in network queues to provide low latency.
  • Efficient Load Balancing: With rapidly increasing network speeds, the protocol must distribute load across multiple cores to keep up with high-speed links.
  • NIC Offload: Software-based transport protocols are becoming obsolete. Future protocols must move to special-purpose NIC hardware to provide high performance at an acceptable cost.
           

TCP’s key properties, including stream orientation, connection orientation, bandwidth sharing, sender-driven congestion control, and in-order packet delivery, are all wrong for datacenter transport. Each of these decisions has serious negative consequences.
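Stream orientation is perhaps the easiest of these properties to illustrate. TCP delivers an undifferentiated byte stream, so an application that exchanges discrete messages has to add its own framing and reassemble messages from whatever chunks arrive. The sketch below simulates that in plain Python without a real socket; the 4-byte length prefix, the `Deframer` class, and the sample messages are illustrative assumptions, not something taken from the paper or from any particular RPC framework.

```python
import struct

# 4-byte big-endian length prefix: an assumed framing choice for this sketch.
HEADER = struct.Struct("!I")


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so a reader can find its end."""
    return HEADER.pack(len(message)) + message


class Deframer:
    """Rebuilds whole messages from arbitrarily sized chunks of a byte stream."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Buffer a chunk and return every message completed so far."""
        self._buf += chunk
        complete = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            complete.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return complete


messages = [b"get key=42", b"put key=7 value=hello"]
stream = b"".join(frame(m) for m in messages)

# Simulate the stream arriving in 5-byte chunks that ignore message boundaries.
deframer = Deframer()
received = []
for start in range(0, len(stream), 5):
    received.extend(deframer.feed(stream[start:start + 5]))

print(received)
```

A message-oriented transport would hand the receiver whole messages directly, so this buffering and parsing layer would not be needed in the application.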

Incremental fixes to TCP are unlikely to succeed due to the deeply embedded and interrelated nature of its problems. For example, TCP’s congestion control has been extensively studied, and while improvements like DCTCP have been made, significant additional improvements will only be possible by breaking some of TCP’s fundamental assumptions.
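For context, DCTCP's sender-side reaction to congestion can be sketched in a few lines. The sender keeps a running estimate, alpha, of the fraction of packets that switches marked with ECN, and cuts its window in proportion to that estimate rather than halving it outright. This is a simplified illustration, not a full implementation: the function name is hypothetical, the window is counted in packets, and g = 1/16 is the commonly recommended gain.

```python
def dctcp_update(cwnd, alpha, marked, total, g=1 / 16):
    """Return (new_cwnd, new_alpha) after one window's worth of ACKs.

    cwnd   -- congestion window in packets
    alpha  -- running estimate of the fraction of ECN-marked packets
    marked -- packets in the last window that carried an ECN mark
    total  -- packets acknowledged in the last window
    """
    fraction = marked / total if total else 0.0
    # Exponentially weighted moving average of the marked fraction.
    alpha = (1 - g) * alpha + g * fraction
    if marked:
        # Cut in proportion to the congestion estimate (alpha = 1 halves cwnd).
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    else:
        # No marks: additive increase, as in ordinary congestion avoidance.
        cwnd += 1.0
    return cwnd, alpha


# Light congestion trims the window slightly; persistent marking halves it.
print(dctcp_update(100.0, 0.0, marked=1, total=16))
print(dctcp_update(100.0, 1.0, marked=10, total=10))
```

The logic still runs entirely at the sender and reacts to marks only after queues have begun to build, which is the sender-driven assumption the text refers to.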

Homa represents a clean-slate redesign of network transport for the datacenter. Its design differs from TCP in every significant aspect.

Replacing TCP will be difficult due to its entrenched status. However, integrating Homa with major RPC frameworks like gRPC and Apache Thrift can bring it into widespread usage. This approach allows applications using these frameworks to switch to Homa with little or no work.

TCP is the wrong protocol for datacenter computing. Every aspect of its design is inadequate for the datacenter environment. To eliminate the "datacenter tax," we must move to a radically different protocol like Homa. Integrating Homa with RPC frameworks is the best way to bring it into widespread usage. For more information, see the whitepaper "It’s Time to Replace TCP in the Datacenter" [arxiv.org].

