posted by martyb on Wednesday December 16 2015, @06:23AM   Printer-friendly
from the it-all-adds-up dept.

Okay, maybe not everything you know about latency is wrong. But now that I have your attention, we can talk about why the tools and methodologies you use to measure and reason about latency are likely horribly flawed. In fact, they're not just flawed, they're probably lying to your face.

When I went to Strange Loop in September, I attended a workshop called "Understanding Latency and Application Responsiveness" by Gil Tene. Gil is the CTO of Azul Systems, which is most renowned for its C4 pauseless garbage collector and associated Zing Java runtime. While the workshop was four and a half hours long, Gil also gave a 40-minute talk called "How NOT to Measure Latency" which was basically an abbreviated, less interactive version of the workshop. If you ever get the opportunity to see Gil speak or attend his workshop, I recommend you do. At the very least, do yourself a favor and watch one of his recorded talks or find his slide decks online.

The remainder of this [linked] post is primarily a summary of that talk. You may not get anything out of it that you wouldn't get from the talk itself, but I think it can be helpful to absorb some of these ideas in written form. Plus, for my own benefit, writing about them helps solidify them in my head.
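As a toy illustration of the talk's central point (my own sketch, not an example from the article): a mean latency figure can look excellent while completely hiding the pause that every unlucky user actually experienced.

```python
# Hypothetical numbers for illustration: 10,000 requests at 1 ms each,
# plus a single 2-second stall (e.g. a long GC pause).
latencies_ms = [1.0] * 10_000 + [2000.0]

mean = sum(latencies_ms) / len(latencies_ms)
worst = max(latencies_ms)

print(f"mean  = {mean:.2f} ms")   # ~1.20 ms -- looks great on a dashboard
print(f"worst = {worst:.0f} ms")  # 2000 ms -- what a real user hit
```

The average is dragged up by only a fraction of a millisecond, which is exactly why percentile and max reporting matter more than means.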



 
  • (Score: 1) by ThePhilips on Wednesday December 16 2015, @05:13PM


    TFA is essentially saying that you can't throw out the worst case reading. In my experience, it's the worst case reading that you care about.

    Same experience here. Though I have another observation too: not only does the typical developer not care about the worst case, customers don't care about it either. Customers often just want a nice benchmark number to reaffirm their buying decision.

    I personally (having a lot of background in network programming) prefer to think about latencies in terms of queue depth: the deeper your queues, the longer items sit in them, and the higher the potential/worst-case latencies.
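    Little's law makes this concrete: mean items in the system L = arrival rate λ × mean time in system W, so at a fixed arrival rate, latency scales directly with queue depth. A minimal sketch (my own, with made-up numbers):

    ```python
    def mean_latency(queue_depth, arrival_rate):
        """Mean time an item spends in the system, per Little's law (W = L / lambda)."""
        return queue_depth / arrival_rate

    # At 1000 requests/s, a steady queue of 50 items implies ~50 ms mean latency;
    # let the queue grow to 5000 items and mean latency balloons to ~5 s.
    print(mean_latency(50, 1000.0))    # 0.05 (seconds)
    print(mean_latency(5000, 1000.0))  # 5.0 (seconds)
    ```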

    Generic Java GC fits this model all too well: suck up gigabytes of RAM, then spend 30-90 seconds laundering it.

    Context-switch latencies work the same way: while the task-off + task-on times are fairly short and nearly constant, the length of the queue of tasks waiting for a free CPU is the source of the variable latencies. More tasks means a longer queue means higher latencies. (Hence the old RT axiom: 1 RT task = real time; 2 or more RT tasks = not real time anymore.) (For fun, enable the "Thread Count" column in the Windows Task Manager and marvel at the counts. Decades of unfixed, crappy async APIs do that to your applications and your system.)

    Deep queues also explain the wavy latency curves: the queues swing gradually between full (high latency) and empty (low latency) states.
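    You can reproduce that wave shape with a trivial discrete-time queue simulation (my own toy model, not from TFA): arrivals oscillate around the service rate, the backlog fills during bursts and drains between them, and latency follows the backlog.

    ```python
    import math

    def simulate(steps=200, service_rate=10):
        """Discrete-time queue with sinusoidal arrivals around the service rate.
        When arrivals exceed capacity the backlog grows and latency rises;
        when they fall below it the backlog drains -- producing latency waves."""
        depth = 0
        latencies = []
        for t in range(steps):
            arrivals = int(service_rate + 5 * math.sin(t / 10.0))  # bursty load
            depth = max(0, depth + arrivals - service_rate)
            latencies.append(depth / service_rate)  # wait time, in ticks
        return latencies
    ```

    Plot the returned list and you get the gradual swing between low and high latency described above.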