Deadlocks in the wild

Deadlock is a tale as old as concurrency, and I’ve personally seen my fair share of it working with conventional relational databases. But most recently, I stumbled upon a deadlock in h2.

The issue

With a really high max concurrent stream count, a huge payload over HTTP/2, and a high number of Futures on a single connection, the hyper client quickly hung. The time it took to get stuck seemed random at first, but after tweaking some settings here and there, I got it to hang after 5 request-responses most of the time.

Number 5 seems weird. A coincidence, perhaps?
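
For the record, the workload that triggered it looked roughly like this. This is a sketch, not the actual reproduction: the hyper 0.14-style API, the request count, the payload size, and the URL are all stand-ins (the max concurrent stream count itself comes from the server’s SETTINGS frame, so it is not visible here).

    // Sketch of the workload shape (hyper 0.14-style API assumed; the URL,
    // payload size, and request count are illustrative).
    use hyper::{Body, Client, Request};

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
        // One HTTP/2 connection shared by every request future below.
        let client = Client::builder().http2_only(true).build_http::<Body>();

        // A big payload per request...
        let payload = vec![0u8; 5 * 1024 * 1024];

        // ...and a lot of request futures racing on that single connection.
        let mut handles = Vec::new();
        for _ in 0..100 {
            let client = client.clone();
            let body = payload.clone();
            handles.push(tokio::spawn(async move {
                let req = Request::post("http://127.0.0.1:3000/upload")
                    .body(Body::from(body))
                    .unwrap();
                client.request(req).await
            }));
        }

        for handle in handles {
            let _ = handle.await?;
        }
        Ok(())
    }

The exact numbers matter less than the pattern: lots of request futures pushing large bodies over one HTTP/2 connection at the same time.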

Debugging it

I decided to enable logging with tracing_subscriber and found out that it always hangs after getting into a certain state. More precisely, it happens when a stream tries to send a payload but can’t get enough connection capacity. HTTP/2 introduces flow control, which is basically a mechanism for managing connection and stream capacities on top of the TCP layer. This allows multiple streams in a single connection to stream data concurrently, in any order.
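
As a mental model, the bookkeeping looks roughly like this. This is a toy sketch, not h2’s actual types: every DATA frame spends from both the sending stream’s window and the shared connection window, and WINDOW_UPDATE frames put capacity back.

    // Toy model of HTTP/2 flow control (not h2's implementation): a DATA
    // frame of `len` bytes needs room in both the stream window and the
    // connection window, and WINDOW_UPDATE frames refill them.
    struct FlowControl {
        connection_window: i32,   // shared by every stream on the connection
        stream_windows: Vec<i32>, // one window per stream
    }

    impl FlowControl {
        fn can_send(&self, stream: usize, len: i32) -> bool {
            self.connection_window >= len && self.stream_windows[stream] >= len
        }

        fn send_data(&mut self, stream: usize, len: i32) {
            // Sending DATA consumes capacity from *both* windows.
            self.connection_window -= len;
            self.stream_windows[stream] -= len;
        }

        fn window_update(&mut self, stream: Option<usize>, increment: i32) {
            // A WINDOW_UPDATE on stream 0 refills the connection window;
            // otherwise it refills a single stream's window.
            match stream {
                None => self.connection_window += increment,
                Some(s) => self.stream_windows[s] += increment,
            }
        }
    }

The rest of this post is about how h2 hands out the connection-level half of that budget to its streams.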

With that in mind, let’s look at the log. The funny thing is that it always hangs soon after this line: stream capacity is 0.

    2023-11-18T16:49:56.726594Z TRACE Connection{peer=Client}:poll:pop_frame:popped{stream.id=StreamId(517) stream.state=State { inner: HalfClosedLocal(AwaitingHeaders) }}: h2::proto::streams::prioritize: stream capacity is 0
    2023-11-18T16:49:56.726599Z TRACE Connection{peer=Client}:poll:FramedWrite::flush: h2::codec::framed_write: flushing buffer
    2023-11-18T16:49:56.726603Z TRACE Connection{peer=Client}:poll: tokio::task::waker: op="waker.clone" task.id=5
    2023-11-18T16:49:56.726607Z TRACE Connection{peer=Client}:poll: tokio::task::waker: op="waker.drop" task.id=5

    # HANGS

Interesting. Let’s dive into the code for a bit:

    if sz > 0 && stream_capacity == 0 {
        tracing::trace!("stream capacity is 0");
        stream.pending_send.push_front(buffer, frame.into());
        continue;
    }

It just ran out of things that can be sent immediately, because we don’t have any capacity left to send with. Well, where did all the connection capacity go? Let’s look at how we assign pending capacity, then:

    // on receive WINDOW_UPDATE
    if stream.send_flow.available() < stream.requested_send_capacity as usize
        && stream.send_flow.has_unavailable()
    {
        // The stream requires additional capacity and the stream's
        // window has available capacity, but the connection window
        // does not.
        //
        // In this case, the stream needs to be queued up for when the
        // connection has more capacity.
        self.pending_capacity.push(stream);
    }

Prying into pending_capacity, there isn’t just a max_concurrent_streams’ worth of streams in there. All the Futures are in pending_capacity. Aha! Now the deadlock sensor is tingling all over the place.

OK, that’s cool. But I hadn’t really figured out why it got stuck after 5 requests, or why it got stuck at all! Yes, the capacity is poorly allocated to non-sending streams, but that’s all we have right now.

Upon further inspection, a few things stood out:

  1. The server sent back WINDOW_UPDATE frames in small increments.
  2. The payload’s max size is 5 * the max body buffer size.
  3. Streams are put back into pending_capacity in a LIFO manner.

That’s the problem. The inert Futures waiting in pending_capacity are hogging all the connection capacity, leaving us without any capacity to send data on the streams that are actually ready to send.
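
To make that concrete, here is a toy simulation of the situation. Everything in it is made up for illustration (the struct, the stream counts, the window sizes); it only mimics the LIFO hand-out of small connection WINDOW_UPDATEs to a queue that is mostly inert streams.

    // Toy simulation (not h2's code): 5 send-ready streams and 95 inert ones
    // are all parked in `pending_capacity`, and the connection window comes
    // back in small WINDOW_UPDATE increments.
    struct QueuedStream {
        send_ready: bool, // could actually write DATA frames right now
        assigned: u32,    // connection capacity handed to this stream
        requested: u32,   // capacity it still wants
    }

    fn main() {
        let mut pending_capacity: Vec<QueuedStream> = (0..100)
            .map(|id| QueuedStream { send_ready: id < 5, assigned: 0, requested: 16_384 })
            .collect();

        // The server trickles the connection window back bit by bit.
        for _ in 0..20 {
            let mut window = 8_192u32;

            // LIFO: the most recently queued (inert) streams are popped first
            // and swallow every fresh byte of connection capacity.
            while window > 0 {
                let Some(mut stream) = pending_capacity.pop() else { break };
                let give = window.min(stream.requested);
                stream.assigned += give;
                stream.requested -= give;
                window -= give;
                if stream.requested > 0 {
                    pending_capacity.push(stream); // still wants more: park it again
                }
                // An inert stream never writes, so its `assigned` capacity is
                // never released back to the connection window.
            }
        }

        let starved = pending_capacity
            .iter()
            .filter(|s| s.send_ready && s.assigned == 0)
            .count();
        println!("send-ready streams still without connection capacity: {starved}");
    }

The fresh capacity keeps landing on streams that will never write, so it is never released back to the connection, and the send-ready streams at the other end of the queue stay stuck at zero.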

The fix

A really easy fix is probably to put the streams that are ready to send at the front of the queue. And it works perfectly fine.

    // if the stream needs capacity we add:
    if stream.is_send_ready() {
        // Prioritize assigning capacity to a send-ready stream
        self.pending_capacity.push_front(stream);
    } else {
        self.pending_capacity.push(stream);
    }

But this runs into a problem when all the send-ready streams have sent some of their payload and we then receive a WINDOW_UPDATE frame. Those streams are not in pending_capacity at that point, so some of the connection capacity will go to the inert streams instead. This won’t lead to a deadlock, but the connection capacity will still be distributed poorly.

Let’s fix that again:

    // on receiving WINDOW_UPDATE
    if !stream.is_pending_open {
        self.try_assign_capacity(stream);
    }

    // somewhere in the code when we can open another stream
    if let Some(mut stream) = self.pop_pending_open(store, counts) {
        self.pending_send.push_front(&mut stream);
        self.try_assign_capacity(&mut stream);
    }

Done! That’s neat.

What now?

Well, flow control in HTTP/2 turns out to be challenging. Since h2’s implementation is eager about assigning capacity, combining it with another Semaphore leads to yet another deadlock!

But that’s for another post.