
HTTP/3: Use Cases, Envoy Support, and Google’s Rollout
With HTTP/3 supported by more than 70% of browsers (including Chrome, Firefox, and Edge), and the official specification (RFC 9114) finalized in June 2022, organizations are now beginning widespread rollouts of the protocol to gain performance and reliability benefits. As leaders in implementing the HTTP/3 spec, the Google and Envoy Proxy teams have been working on this rollout for quite some time, and they have learned many lessons.
Alyssa Wilk, Senior Staff Software Engineer at Google, recently spoke with Daniel Bryant, Head of DevRel at Ambassador Labs. In a wide-ranging discussion that covered how HTTP/3 is implemented over QUIC and UDP, the benefits and challenges of the new protocol, and Google's experience publicly rolling out support for it, a number of key themes emerged:
- HTTP/2 sped up HTTP/1 dramatically – but if you lose one packet on a connection, everything gets stalled until the packet is retransmitted.
- This is a fundamental limitation of TCP, so HTTP/3 speeds up HTTP/2 even more by implementing the protocol on top of UDP.
- The two big wins in HTTP/3 are the zero roundtrip handshake and improved congestion control. With the former, if you have already connected to the server previously you can bypass the three-way TCP handshake. With the latter, if you drop a packet, HTTP/3 will recover better and faster than HTTP/2.
- Moreover, because HTTP/3 is implemented in userspace, you get these performance benefits even if you haven’t updated (or can’t update) your operating system kernel.
- Because there’s on average 2% packet loss on the Internet, HTTP/3 benefits virtually everyone.
- End users who see even more benefit are those on lossier networks (e.g., emerging markets, mobile, IoT use cases) and those on old kernels (e.g., Windows users at large companies that don’t upgrade).
- Adding HTTP/3 support to a proxy, ingress, or API gateway is non-trivial (unlike HTTP/2), as the protocol requires very sophisticated congestion control and cryptography to be implemented.
Listen to the Entire Conversation
To dive into the details of Daniel and Alyssa’s conversation, read on.
Transcript
Daniel Bryant (00:00):
Hello and welcome to the Ambassador Labs podcast where we explore all things about cloud native platforms, developer control planes, and developer experience. I'm your host, Daniel Bryant, Head of DevRel here at Ambassador Labs, and today I had the pleasure of sitting down with Alyssa Wilk, Senior Staff Software Engineer at Google, and long-time committer in the Envoy Proxy project.
Join us for a fantastic discussion on HTTP/3, the latest version of the hypertext transfer protocol that is gaining wide adoption on the Internet. We cover topics such as what is HTTP/3 and the benefits it provides, and explore how Google rolled this out for internal and external use, including insight into some of the challenges they encountered. We’ll also cover how to use Envoy Proxy to get the benefits of HTTP/3 in your tech stack.
And remember, if you want to dive deeper into the motivation for and benefits of a cloud native developer control plane, or you are new to Kubernetes and want to learn via our free Kubernetes Learning Center, please visit getambassador.io to learn more.
Hello Alyssa, welcome to the Ambassador Labs podcast. Could you introduce yourself to the listeners please, and share a little bit about your history with Google, and the Envoy communities?
Alyssa Wilk (00:09):
Hey, my name's Alyssa, I've been at Google for 15 years now. I started out on the GFE team, working on Google's frontline proxy. I was the server-side lead for Google's launch of HTTP/2. I was the uber tech lead for our launch of HTTP/3. And then I switched over to working on Envoy, an open source proxy, which is now the forefront proxy of the cloud.
Daniel Bryant (00:28):
Super. I think you're one of the top committers in Envoy, if I remember right. Obviously Matt Klein and team started there, but Google and yourself, Alyssa, have committed a lot of code there, right?
Alyssa Wilk (00:36):
I will say I was really pleased that Matt noticed, because I had been tracking it, because I'm a big goof. But Matt noticed and tweeted out when I passed him and I'm like, yes.
Daniel Bryant (00:45):
Very cool.
Alyssa Wilk (00:46):
It's a fantastic project. I've really enjoyed working on it.
Daniel Bryant (00:48):
Yeah. I mean, I've only been part of the peripheral Envoy community, but it's amazing seeing the popularity of the product, of the tooling. And the fact that it's open source, as opposed to having commercial backing behind it, I think is ... that was obviously a hat tip to Matt and the whole community. But that's just meant that the Envoy proxy community has gone from strength to strength over the years, right?
Alyssa Wilk (01:07):
Yeah. I think Matt's desire, and he's really done a good job, was both building the community and setting it up so that anyone can add whatever code they want. If Envoy doesn't have your business logic, you can either add it upstream or we'll add an extension point, so you can add it in house if it doesn't make sense to share it with the world. So that's really made it the success that it is.
Daniel Bryant (01:25):
Perfect. And that leads nicely on, I think, to our main topic of conversation today around HTTP/3. Because I was super excited when Envoy was going to support this. And I'm guessing some of our listeners may not have heard too much about HTTP/3. For a lot of us, it's the plumbing of the interwebs; we sometimes ignore it, but it's actually really important, particularly with many of us building distributed systems being consumed by the web. Could you give a brief intro of the motivations for HTTP/3 for our listeners, please?
Alyssa Wilk (01:51):
Yeah. So to do that, I'm going to talk a tiny bit about HTTP/2. So basically when I started on the GFE team, we were implementing HTTP/1.1, which was the way that all web browsers worked. And our tech lead at the time kept complaining about the fact that the RFC, the rules for the protocol, had all sorts of holes and weird ways you could do injection attacks. It was really insecure. And we kept saying, look, this is the standard. If you want to fix it, you fix it. And he did.
Alyssa Wilk (02:15):
So a couple years later, Roberto Peon came up with the HTTP/2 spec, which, one, has a bunch of improved security features, but two, also really improves latency. Which Google cares a lot about: making the web fast.
Alyssa Wilk (02:27):
So HTTP/2 is a huge value add on HTTP/1, it sped up web search, it improved YouTube, but it still had this problem where it was putting multiple requests on one connection. And then if you lose one packet on that connection, everything gets stalled until that packet's retransmitted. And you can't fix that with TCP, which is the foundation of how the internet works.
Alyssa Wilk (02:45):
So we said, okay, well, we're going to fix that next. And so basically, HTTP/3, I like to think of it as HTTP implemented on top of UDP. We implement both the HTTP layers, like your headers and body and trailers, along with the TCP layers, that reliable transport. And while we did it, we also managed to improve a bunch of things. So, people may or may not be familiar with TCP Fast Open, which allows you to avoid that three-way TCP handshake.
Alyssa Wilk (03:09):
Rolling that out on the internet is really hard because there's all this legacy hardware that doesn't expect it. And the protocol is supposed to fail over, but sometimes when you even try it, the middleboxes are like, I don't know what you're sending, I'm going to black hole you.
Alyssa Wilk (03:21):
So it just turns out doing new things on the internet is really hard. And by bypassing TCP entirely, we can roll out a bunch of improvements to congestion control. We can roll out improvements to the handshake. And again, HTTP/3 is significantly faster than HTTP/2. And so it makes the web browsing experience much faster for your average user, and it's especially good in emerging markets, third world countries where the internet is much more lossy.
Daniel Bryant (03:45):
Perfect. There's a bunch of things I'd love to dive into. Clearly the protocol details, that'd be super interesting. But you hinted at the performance gains, even over HTTP/2. And I saw a fantastic EnvoyCon talk, I think last year, from you and Ryan Hamilton. And you mentioned that, with HTTP/3 already rolled out to some parts of the interwebs, you see things like faster page loads, and I think reduced buffering on YouTube was another example. Is that the headline?
Alyssa Wilk (04:14):
The canonical way of measuring quality on YouTube is how often you stare at what we call the spinning wheel of doom internally. How often you stare at that spinning wheel while your video's loading, and the time between rebuffers. And both of those are improved: we have fewer rebuffers, so less time watching that spinning wheel, and more time between those rebuffers. And again, faster web search. Basically, whether you're fetching fast content where you need a really quick response or you're streaming content, HTTP/3 tends to be much faster than anything over TCP, whether HTTP/2 or HTTP/1.1.
Daniel Bryant (04:43):
Super. And you mentioned emerging markets, which I thought was very interesting there. I'm guessing you're talking about lossy connections, mobile connections, that kind of thing?
Alyssa Wilk (04:49):
Yeah. So the two big wins of HTTP/3 are, one, the zero roundtrip handshake. So if you've talked to a server before, you can immediately talk to it again; you don't have to do the TCP three-way handshake. You don't have to do your TLS handshake. You can just say, I've talked to you, here are my saved credentials and here's my request right away.
Daniel Bryant (05:06):
Very nice.
Alyssa Wilk (05:07):
The other big win is, again, improved congestion control. So when you drop a packet, it recovers from it faster and better than TCP does. And then the other nice thing: even with the latest and greatest TCP, the new algorithms in the Linux kernel take a really long time to roll out. So, if you get an improvement into the Linux kernel, it may be five years, it may be a decade, it may be never before that actually makes it out to clients. We can update the servers, but you can't update all the clients.
Alyssa Wilk (05:33):
So ironically, one of the biggest wins we had for HTTP/3: when we rolled it out on desktop, it looked amazing. And then we rolled it out for mobile and it looked great. And we're saying, well, wait, why is desktop so amazing? It turns out that the improvement on desktop was because there were so many old desktop machines that had old versions of Windows that hadn't been updated forever, and had really old, outdated versions of TCP that didn't have the algorithm improvements that were done a decade ago. So, because QUIC is implemented in user space, even though those operating systems aren't updated, you can still get that faster traffic, because it's all done in user space. You can update it with your application or with your browser rollout.
Daniel Bryant (06:10):
That is super interesting because I had no idea about that. And I'm also thinking of a related technology I'm hearing a lot about at the moment, eBPF: the kernel updates take a long time, but once you've got eBPF support, you can ship stuff that runs in kernel land. But I hadn't thought about the user space thing there. As in, people running really old Windows machines see improvement because you're running QUIC in user space.
Alyssa Wilk (06:31):
Exactly. I mean, the cool thing about HTTP/3, and having the server presence and the client presence that Google has end to end, we had Chrome and we had the GFE, is we could rapidly iterate on the transport in a way that no one else ever could. And that was one of the reasons I was really excited to both work on Envoy and add HTTP/3 support to Envoy: now other people can iterate end to end.
Alyssa Wilk (06:47):
So we've talked to Envoy users who are behind, say, multiple CDNs. And so the way that you advertise HTTP/3 doesn't work. With HTTP/3, you say this host name supports HTTP/3. But if you have a CDN, maybe one CDN supports it and one doesn't. So you can't do it for the whole host name.
Alyssa Wilk (07:03):
So there are proposed improvements to the standard: maybe you can advertise it for an IP block, or maybe you can advertise it with a shorter lifetime. And having it in Envoy lets people actually experiment with this. Okay, what works on the internet? When people are switching back and forth between different ISPs and different CDNs, how well does this actually work out? And now it isn't just Google that can do it. With an end-to-end presence, people can do it with Envoy Mobile and Envoy servers, which is really exciting to me.
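For reference, the per-host-name advertisement Alyssa describes is the Alt-Svc response header from RFC 7838. As a rough, illustrative sketch (not a prescriptive config; the port and cache lifetime here are assumptions), an Envoy route on a TCP listener could advertise HTTP/3 like this:

```yaml
# Sketch: advertise HTTP/3 ("h3") on UDP port 443 to clients that
# arrived over TCP. "ma" (max-age) caps how long, in seconds, a client
# may cache this advertisement; one day here, purely illustrative.
response_headers_to_add:
- header:
    key: alt-svc
    value: 'h3=":443"; ma=86400'
```

The shorter-lifetime proposals she mentions amount to tuning that `ma` value down so clients re-probe more often when CDNs differ in their HTTP/3 support.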
Daniel Bryant (07:29):
That is super interesting. And there's a bunch of stuff I'd love to dive into there as well, as I said. Just taking a step back, you mentioned a few things on congestion control, and some of the things you mentioned earlier: is it the classic head-of-line blocking problem I hear about? It's one of those things like the offside rule in soccer. I'm not a hundred percent sure what it is, but I know it's important. And I know...
Alyssa Wilk (07:47):
Exactly. So, with HTTP/2, you have requests one, two and three on one connection. So you send out response one, response two, response three; they're all buffered in the kernel, ready to go out. They're shipped on the wire. You lose a packet for response one. And then response two and response three are held up, because TCP won't hand that data up to the client application until it has all of those bytes in order.
Alyssa Wilk (08:07):
With HTTP/3, the actual transport is aware of these different streams. So we can say, oh, I lost a packet for response one, I'll delay passing up response one's body. But I have the full body for response two and response three, so I'll hand them up as soon as I have them. It doesn't have to wait for the entire stream to be reassembled, because the transport is multiplexing-aware. So that ends up being a huge win, especially anywhere there's loss. And there's usually about 2% packet loss on average on the internet, so it's something that's encountered everywhere, not just emerging markets.
Daniel Bryant (08:34):
Super interesting. I remember when I was at college, we were taught that IP is the low layer, then TCP on top, and then, say, TLS, and then HTTP. It's a bit different with HTTP/3 in my understanding: it's IP, then UDP and then QUIC, is that correct?
Alyssa Wilk (08:48):
Yeah. So it's layered: we've implemented QUIC, and then there's a reliable layer on top of QUIC, and then there's HTTP/3 on top of that reliable layer. So there are use cases where you just want the encryption and authentication and you don't need the reliability. But for HTTP/3 you use all three layers.
Daniel Bryant (09:03):
Super interesting.
Alyssa Wilk (09:04):
So, you're guaranteed to have your packets delivered in the order that you sent them, and you're guaranteed that you'll know whether they get to the other endpoint or not. But it's on that foundation of reliable transport with better loss handling.
Daniel Bryant (09:19):
Fantastic. And something I was curious about there: is everything running over TLS, is it always an encrypted transport?
Alyssa Wilk (09:24):
Everywhere that we use it, yeah. You can do no encryption, and you shouldn't.
Daniel Bryant (09:31):
Okay. Yeah, that's interesting.
Alyssa Wilk (09:31):
I mean, one of the things with HTTP/3 is we're really trying to improve privacy on the internet. The IETF has spent a lot of time working on advanced HTTP/3 features, such as connection migration, in a way that users cannot be tracked as they go from their wifi to their cellular. You can have that connection persist, while making sure that intermediaries can't follow you across the internet, that they can't spy on you and look at what you're browsing. So there's been a lot of thought put into the security, making sure that people's data is safe. Which I think is, again, really fantastic.
Daniel Bryant (10:04):
Yeah. Plus one to that. That's very cool. Excellent. So moving on a little bit, Alyssa. How did you roll this out at Google? There are clearly benefits, as we've heard. But were there any challenges you bumped into?
Alyssa Wilk (10:14):
So, I mean, the first thing is when we started on this, we had no idea if, when we rolled out HTTP/3, we were going to, I want to say, break the internet. Obviously we wouldn't have broken the internet. But what we didn't know is this: the internet, ISPs and middleboxes and whatnot, have some amount of capacity for processing TCP, and they have some for processing UDP. They have to do UDP because you do DNS lookups and video streaming and whatnot.
Alyssa Wilk (10:40):
But we didn't know if the amount of UDP hardware processing would be overwhelmed. Like we'd start rolling out, and then we'd hit a point where we were hitting packet loss because we were overwhelming ISPs and we had to back off. So the entire project was actually a big gamble from day one.
Alyssa Wilk (10:54):
And we had to roll out really, really, really slowly and carefully to make sure that the ISPs weren't caught unawares, that we didn't slow down anything. And as it turns out, we started out with literally just the developers on our team, with a hard-coded flag saying use HTTP/3. So all of Google's servers supported it, but you only used it if you opted in. So it was just us and the team.
Alyssa Wilk (11:16):
And then we sent out plea emails to everyone we worked with saying, please run Chrome with this custom flag so it's not just us. And making sure, again, there are weird pathological corner cases on the internet, and we wanted to be the ones to find them, not our users.
Alyssa Wilk (11:29):
And then when that worked out well, we rolled it out to Chrome Canary, which is, again, people who have opted into getting the latest and greatest with the understanding it may not work well. So, they know what they're getting into. They want the best experience and they're willing to take the risk. If it doesn't work, they can fail over to regular Chrome. So these, again, are the power users who signed onto this. So we rolled it out to them.
Alyssa Wilk (11:49):
Again, we had to do these A/B experiments where you say this group is assigned to TCP, this group is assigned to HTTP/3, or QUIC. And you can't just compare the latency of TCP to QUIC, because we knew the internet wouldn't all support UDP 443, so you have to make sure that you're measuring the entire group of people assigned to HTTP/3. If their ISP or their corporation blocks that port, they fail over to TCP. You have to measure that; that adds latency for them. So you have to make sure that not only is HTTP/3 faster than TCP, but that in aggregate, even with some people trying HTTP/3 and failing over, because that's going to happen for some percentage, it's still a net win overall.
Alyssa Wilk (12:35):
So yeah, we rolled it out. And as we rolled out wider and wider, we found some interesting corner cases. There was one specific airline where ... we generally said, if UDP 443 is blocked, you send your packet, your handshake fails, you fail over to TCP. Easy peasy. It turns out that some ISPs wouldn't block the handshake. They'd say, yeah, you're allowed to send maybe a hundred packets, and then we're going to black hole you.
Alyssa Wilk (12:57):
So we'd do the handshake, we'd say, great, QUIC works, we're going to use it, it's faster. And then it stops working midway through. So we had to add more and more sophisticated, we call it black hole detection. But you can't do something naive and say, oh, if HTTP/3 doesn't work, always fail over. Because then someone closes their laptop, it shuts down their networking card first, it looks like HTTP/3 doesn't work, and then everyone would always use TCP.
Alyssa Wilk (13:18):
So trying to find the right balance, where we weren't too aggressive about failing over, but we also were aggressive enough that no one lost their internet connection, turned out to be a challenge. And again, we have great developers who are super responsive, and lots of trace logging and error diagnostics and whatnot, to make sure that we were doing well by our users while improving the internet for everyone.
Daniel Bryant (13:39):
Fantastic. Fantastic. And I'm guessing some problems are Google problems, I'm doing air quotes for the listeners, and some problems would be ones that folks might see when implementing it themselves. So say someone is adopting Envoy, maybe within one of the open source gateways out there. Would they bump into any of these problems that Google saw?
Alyssa Wilk (13:55):
So again, the hope is not. What we're doing is implementing in Envoy the same kind of black hole detection that we did at Google. And we actually have the same development team working on it. So these are people who've been doing this for five years, who know all the weird pathological corner cases on the internet.
Alyssa Wilk (14:08):
And this is why I'm so excited to have an open source HTTP/3 client and server stack. When we did HTTP/2, HTTP/2 is pretty simple: you just slap a demux on HTTP/1.1 and a different framer, and you're good to go. It's a great computer science homework project.
Alyssa Wilk (14:24):
HTTP/3 is a whole different kettle of fish. It's super complex congestion control, it's super complicated crypto, and it's really easy to get the crypto wrong with a zero round trip handshake and cached credentials. And literally, I launched this and I don't understand the details of all the crypto; that is offloaded to Google's crypto experts.
Alyssa Wilk (14:42):
So with HTTP/3, again, your average developer can't implement it. And that's actually why Google open sourced our implementation. And now there are a bunch of open source implementations by different companies that, again, have the congestion control experts and the crypto experts to do it right, and then to share it so that people who don't have that expertise can still get all the benefits.
Alyssa Wilk (15:04):
And so with the Envoy stack, it's not just about the protocol implementation, which is important; it's also about those failover mechanisms, about getting the timing correct and the black hole detection correct, so that it all works well on the internet. And so our goal is to make sure that it's just plug-in, drop-in usable by everyone.
Daniel Bryant (15:19):
That's awesome. And one thing I definitely learned from my software development days: never do your own crypto, and build upon the success of other folks. Those are the two things I definitely learned.
Alyssa Wilk (15:24):
Exactly.
Daniel Bryant (15:26):
I was curious, how easy will it be for folks to adopt? Say they've got an Envoy config already: once they upgrade to an HTTP/3-supporting version of Envoy, how easy is it going to be for them to a) implement HTTP/3 support in the config? And then b) I'm guessing they'll still have to have the fallback mechanism, so they're still going to have to have a TCP listener, that kind of thing, right?
Alyssa Wilk (15:45):
Yeah. There are two ways to do it. You can do explicit HTTP/3, or you can do, as you say, this failover; we call it auto config. So if you're in your own data center, you know UDP 443 works, you know you're not blocked, you're on your own hardware. You can just hard configure HTTP/3 and that's totally fine. And there's documentation on the Envoy website on how to do both of these, by the way.
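As a hedged illustration of that "hard configure" path, the Envoy documentation she points to covers pinning an upstream cluster to HTTP/3 via explicit protocol options. A minimal sketch, assuming Envoy's v3 API; the cluster name, address, and SNI are hypothetical:

```yaml
# Sketch: an upstream cluster pinned to HTTP/3 only, suitable when you
# control the network (your own data center) and know UDP 443 works.
clusters:
- name: backend_h3                   # hypothetical cluster name
  type: LOGICAL_DNS
  load_assignment:
    cluster_name: backend_h3
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: backend.example.com, port_value: 443 }
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http3_protocol_options: {}   # speak HTTP/3 only; no fallback
  transport_socket:                  # QUIC carries its own TLS handshake
    name: envoy.transport_sockets.quic
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.quic.v3.QuicUpstreamTransport
      upstream_tls_context:
        sni: backend.example.com
```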
Alyssa Wilk (16:03):
And then for internet-facing clients, if you're Google and trawling the web, or you're running Envoy Mobile and you have an Envoy client that's talking to servers, there's just an auto config. And with the auto config, you configure just one thing: you say, I want Envoy to pick the right protocol, here is my HTTP/3 config with my TLS credentials. And then Envoy will try to do HTTP/3. And if it doesn't work, as I said, it uses these mechanisms and fails over to TLS over TCP. And then with TLS, it does ALPN to negotiate HTTP/1 or HTTP/2. So it literally just picks the best available protocol and does it automatically. That was the goal. So, one config, you can literally copy-paste from the example configs, and you're good to go.
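The auto config behavior she describes maps to the `auto_config` variant of the same protocol options: you list the protocols Envoy may use, plus an alternate-protocols cache so it remembers which origins actually spoke HTTP/3. Again a sketch under the same assumptions, not a drop-in config; the copy-pasteable examples live in the Envoy docs:

```yaml
# Sketch: let Envoy negotiate the best available protocol per upstream.
# HTTP/3 is attempted where advertised; otherwise it falls back to TLS
# over TCP and uses ALPN to pick HTTP/2 or HTTP/1.1.
typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    auto_config:
      http3_protocol_options: {}
      http2_protocol_options: {}
      http_protocol_options: {}
      alternate_protocols_cache_options:
        name: default_alternate_protocols_cache   # hypothetical cache name
```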
Daniel Bryant (16:45):
Awesome, Alyssa. Awesome. Do you think folks are going to bump into any challenges if they're using a cloud vendor, for example? Working with Ambassador Edge Stack and Emissary-ingress, and even Envoy itself, I've had some challenges sometimes with various clouds, in the way they expose the different versions of their load balancers, shall we say. And I won't pick on any cloud vendors in particular. But do you think there are going to be any problems for folks as they're changing their stack now to support HTTP/3? You've hinted at CDNs already being a bit of a challenge; is there going to be anything strange with load balancers, do you think?
Alyssa Wilk (17:13):
I would hope not. I mean, the load balancing ... it's like any other configuration option. As long as you have your user interface pretty clean, you could have a ticky box for the use of HTTP/3 instead of a bunch of ugly YAML. I would think it would be pretty straightforward. But again, if there are problems, we all live on Slack, and there's the normal Envoy dev channel, but there's also a channel specifically for UDP help that I encourage people to go poke at if you have questions.
Daniel Bryant (17:36):
Very cool. Very cool. Wrapping up, I guess, now, I'm curious. I've been thinking about HTTP/3 a lot for user-facing websites. Is there an advantage to running it internally? Because obviously loads of folks love service mesh at the moment, Istio, take your pick, there are many out there, and Envoy obviously is baked into those service meshes, like Istio, that build on Envoy. Are there advantages internally for running HTTP/3, and maybe should you start there first and then expose it publicly?
Alyssa Wilk (18:00):
So, it really depends on where you're running, but yes, there are advantages to running it internally. Because, again, if there's loss, you recover from it faster. So at Google we have our own quality of service tagging, and our high quality traffic, AF4, basically gets zero packet loss. There are no huge wins there.
Alyssa Wilk (18:17):
But if you're on AF3, there is some loss. So, there are advantages. And if you're running on a cloud platform, generally you're not the super high priority service, just because the platform needs its control traffic to take priority. So there can be advantages there.
Alyssa Wilk (18:29):
And then there's that zero roundtrip handshake: again, if you don't have a pre-established connection, you can bring up your connection without worrying about that handshake latency. So it really does help with cold start.
Alyssa Wilk (18:40):
So, there can be advantages there. And then again, on the open internet, we know there's packet loss and we know there's old hardware and old kernels. And there may be older kernels where you're running too, especially if you don't control your own platform from the ground up. So yeah, I'd say there are definitely advantages internally and externally.
Daniel Bryant (18:56):
Awesome. And do you think it's good to perhaps experiment internally and then move externally? Where would you start? If folks are embracing HTTP/3, I'm guessing they're going to see the most advantages public-facing?
Alyssa Wilk (19:06):
Yeah, there are pros and cons. I think you get bigger wins doing it externally. But it's easier to debug things when you have access to all the logs and you can trace everything. So it depends: if you're not familiar with the technology, it may be worth rolling it out internally, just so you get familiar with it before it's out on the internet. Debugging clients, internet clients, is harder.
Daniel Bryant (19:29):
Yeah. Super interesting. Super interesting. So breaking the fourth wall for a moment. Is there anything you think I should have asked you that I haven't?
Alyssa Wilk (19:37):
Not that I can think of. No. I mean, again, I think most of this is out there. There are a lot of exciting technologies, and a lot of places to read about them on the internet. But I do think that it's always hard to get started. I love things like this that get people in touch with, okay, I'm using Envoy: is it safe? Is it ready? Where do I go for help? We really do encourage people to use the Slack channel, which is Envoy UDP QUIC dev, if you have questions. The Google QUIC team, as well as other Envoy folks, are hanging out there, happy to help out.
Daniel Bryant (20:07):
Fantastic. That was actually going to be my final question. I was going to ask, where should people contact you? But you've already answered that perfectly. Awesome. It's been amazing, Alyssa. I've learned so much; we've covered so many things in the, well, 20 to 25 minutes we've been chatting. It's been awesome. I have a bunch of notes here, and I'll link everything you mentioned in the show notes as well. But thank you so much for your time. Really appreciate it.
Alyssa Wilk (20:24):
Absolutely. Have a great one.