LOTE #5: Charles Pretzer on Service Meshes, Knative, and Interoperability and the SMI
In the fifth episode of the Ambassador Livin’ on the Edge podcast, Charles Pretzer, field engineer at Buoyant (stewards of Linkerd), discussed service meshes, the Service Mesh Interface (SMI) spec, and how to implement a function as a service (FaaS) experience with Knative, Linkerd, and Ambassador.
Be sure to check out the additional episodes of the "Livin' on the Edge" podcast.
Key takeaways from the podcast included:
- Creating an effective continuous delivery pipeline is essential for enabling fast feedback for developers.
- Adopting new pipeline technologies and tooling can add a lot of value, but it is all too easy to break the pipeline by adopting the latest shiny projects. Balancing risk versus reward is the key to success here.
- Adopting a service mesh allows applications to use a better abstraction for service discovery, particularly within a dynamic environment. A service mesh can hide unnecessary details of underlying infrastructure e.g. service-to-service routing can be done via the use of normalized names instead of IP addresses.
- The decision of if and when to adopt a service mesh is multi-faceted. Similar functionality can be provided via a combination of language-specific libraries or a collection of OS utilities. However, when the number of services deployed into production becomes larger than ~10, this can be a good time to begin experimentation with a service mesh.
- A service mesh can provide increased availability (through circuit breakers and retries), security (via mTLS), and observability (enabling the easy emitting/collection of communication metrics).
- The Service Mesh Interface (SMI) is a standard interface for service meshes on Kubernetes. It defines a basic feature set for the most common service mesh use cases, and provides a foundation and set of abstractions for the community to innovate upon.
- Buoyant’s new Dive product builds on Linkerd and provides an automated delivery platform. The platform includes a service catalogue that can help engineers in understanding the (distributed) system on which they are working. This catalogue includes details of service dependencies, recent changes, associated SLOs, and service metadata, such as the owner.
- Knative, Linkerd, and the Ambassador API gateway can be integrated to provide a Kubernetes-based function-as-a-service (FaaS) platform.
- Kubernetes provides an ideal foundation on which to build a platform that supports hybrid workloads. This can enable the use of a range of application architectures, such as FaaS, microservices, and (well-designed) monoliths.
- The open source community provides a fantastic platform to learn and to share knowledge and tooling.
This week's guest
Charles Pretzer is a field engineer at Buoyant, where he spends his time collaborating and engaging with the open source community of the CNCF service mesh, Linkerd. He also enables production-level adoption by helping companies integrate Linkerd into their Kubernetes-based applications. Charles has spoken at meetups and conferences hosted by ABN Amro, Macnica, and at the NGINX Conference. When he’s not presenting or hacking away at his computer, he’s riding a motorcycle or making a delicious mess in the kitchen.
Hello everyone. I'm Daniel Bryant, and I'd like to welcome you to the Ambassador Livin' on the Edge Podcast. The show that focuses on all things related to cloud native platforms, creating effective developer workflows, and building modern APIs. Today, I'm joined by Charles Pretzer, field engineer at Buoyant, the organization behind the Linkerd service mesh. I've bumped into Charles' work several times over the last year, and I was particularly interested in a recent blog post where he demonstrated the integration between Kubernetes, Knative, Ambassador and Linkerd, to provide a function as a service offering. I was keen to chat with Charles and explore these topics in more detail, both from a theoretical point, and also from an implementation point too.
If you like what you hear today, I would definitely encourage you to pop over to our website. That's getambassador.io, where we have a range of articles, white papers and videos that provide more information for engineers working in the Kubernetes and cloud space. You can also find links there to our latest releases, such as the Ambassador Edge Stack, our open source Ambassador API gateway, and also our CNCF-hosted Telepresence tool.
Hello, Charles, welcome to the show and thanks for joining me today.
Thank you so much for having me, it's an honor.
Could you briefly introduce yourself please, and share perhaps a bit of a journey or a recent career highlight for me?
My name is Charles Pretzer. I am a field engineer at Buoyant, which is the main sponsor for the Linkerd service mesh. And a recent career highlight, is being able to speak and present about Linkerd and service mesh concepts in general, at a lot of these conferences, just working with this team, it's been great.
So today I wanted to pick your brains around topics like serverless, service mesh, and, as you and I were talking about off-mic, Knative and Linkerd, of course. But dialing it back, just so that listeners can understand your background and where you've come from, I want to talk a little bit about developer experience in the inner dev loop. So could you share with me, perhaps without naming names to protect the innocent and the guilty here, the worst developer experience you had, from the coding to testing, to getting stuff into production?
That's a good question. There's actual companies that I've worked for that have had systems that are less than ideal, but I would say the one that continues to frustrate me on an ongoing basis, is the one that I create for myself. So I've got side projects, personal projects that I work on, and I iterate over this one process that I've been using for years. And so it gradually gets better. And then I'll try and out-clever myself and add some new piece of technology, some buzzword technology into it. And then things go back downhill. I think overall the part of the dev loop that I've liked the least is the one that goes back to my consulting days, or my J2EE days, I should say, working with monolithic application servers where it's like, okay, on Monday, we're going to start the QA tests. And if everything's good on Friday, we're going to deploy.
And now looking back at that, it just seems like that is so insane, because things have really gotten much tighter and much nicer, really. Such a good experience with some of the more modern workflows. I think without those older, longer iterative processes, you wouldn't have had somebody with the creativity or a group of people with the creativity to come and say, "You know what? We can make this better. We can do this better. Let's get creative and come up with something really interesting. That's going to enable and empower developers to get their code out there quicker."
Makes a lot of sense. In this day and age, with distributed systems, microservices, and Kubernetes, what do you think are the good things to have in a dev loop?
Yeah, well, I've done the developer side, and I also worked at a startup many years ago where I actually used to rack and stack servers in a data center. So having done a bit of the sysadmin side, which evolved into DevOps processes, I can say that my developer brain just wants to write code and not worry about it when it gets out to the server. Whether it's in a staging, QA, or production environment, I want to know that it's going to behave the same, that there isn't something locally that I've done in my environment that is different enough that the code is going to perform differently. And what I think that boils down to, and this is for me, what I love about the service mesh and the abstraction that it provides is, I can just write my code. If I need to make an external call to a service in my development environment, it may work great. But if there's a firewall in production that prevents that from going out, that's a surprise. And we don't like surprises.
Totally makes sense. You mentioned there something I'm definitely keen to pick your brains on, because I've enjoyed reading your articles about Linkerd and using service mesh. Say folks are newish to the concept, what would your pitch be for why they should look at service mesh?
Sure. I would say in a few short words, service mesh is going to transparently add that observability, security, reliability, and traffic management, that ties into that nice developer workflow that we just talked about. Let's take observability. It's the top one that we talk about for service mesh in general, but specifically for Linkerd. If you want to get metrics out of your service, you have to instrument that service with something like maybe a Prometheus library, or something that's going to write metrics out, or give you an endpoint from which an external service can collect metrics. Because the service mesh sits at that layer where it's proxying all the traffic, intercepting the traffic, it can take that away for you. So that's my pitch. Write less code.
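To make the instrumentation point concrete, here is a minimal sketch of the kind of per-service bookkeeping a developer would otherwise write by hand. It uses only the Python standard library; the `RequestMetrics` class and its methods are hypothetical names for illustration and are not part of Linkerd or of any Prometheus client library.

```python
import time
from collections import Counter
from typing import Callable, List


class RequestMetrics:
    """Hand-rolled request metrics that a service keeps for itself."""

    def __init__(self) -> None:
        self.outcomes: Counter = Counter()   # counts of "success" / "failure"
        self.latencies: List[float] = []     # seconds per handled request

    def observe(self, handler: Callable[[], str]) -> str:
        """Run one request handler, recording its outcome and latency."""
        start = time.perf_counter()
        try:
            result = handler()
            self.outcomes["success"] += 1
            return result
        except Exception:
            self.outcomes["failure"] += 1
            raise
        finally:
            # Latency is recorded for successes and failures alike.
            self.latencies.append(time.perf_counter() - start)

    def success_rate(self) -> float:
        """Fraction of observed requests that succeeded (1.0 if none seen)."""
        total = sum(self.outcomes.values())
        return self.outcomes["success"] / total if total else 1.0


if __name__ == "__main__":
    metrics = RequestMetrics()
    metrics.observe(lambda: "ok")
    print(metrics.success_rate())
```

Every service, in every language, would need its own version of this wrapper. A proxy that already sees each request and response can record the same counts and timings without any change to application code, which is the "write less code" argument.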
What do you think the trade-offs are with a service mesh? And I'm not just thinking of Linkerd here, obviously there are many service meshes out there. But what do folks need to weigh up when they're asking, should I mesh, or should I not?
That reminds me of the question that I get asked a lot, which is, do I need a service mesh? I think they're all in the same thought category. And I tell people, you can absolutely build this on your own if you want to. Where I have found ... And this is based off of conversations with folks in previous work experiences. I worked at NGINX before I was at Buoyant. And they would ask me, "When should I implement a service mesh? When should I take on that extra effort?" And for me, it's funny, I realized that I'm actually putting a number behind it. I say, when you've got like nine to 12 services, and you've got a team that ... At some point, the number of services that you're running is going to grow beyond the cognitive load that your entire team can carry.
And at that point, the team wants to do the work that they want to do. They don't want to sit there watching, waiting for something to happen. I think that's the right time to start thinking about a service mesh. But then you have to consider things like how many instances of those nine to 12 services you're running. Another thing I challenge people to think about is mean time to detection and mean time to resolution. And so with the observability and the metrics that a service mesh gives you, you can find problems faster. You can get them fixed faster. If you're doing live rollouts or canary deployments, testing in production, like all the cool kids are doing these days, you want to find those problems sooner, rather than later. And so a service mesh enables that.
Yeah, good stuff. Actually, that leads nicely into the next topic I was thinking about. Understanding how the service mesh is performing, understanding what's going on in the service mesh, understanding what services are involved, what dependencies there are. That seems super important now. And I'm guessing this is an active area of research. I know that at Buoyant, you have Dive, the new product that's coming out. I guess my question is how important do you think this user experience, this developer experience is on top of the mesh?
I'm glad you brought up Dive. It is our first commercial product. And it's based off of the metrics that Linkerd emits. And we're supporting other service meshes as well, which is an interesting-
Interesting thing, yeah. So every conversation that we have with folks who are in the beta program for Dive, the question that William always asks is, "Is this valuable to you? Is this service catalog valuable to you?" And that answer is 100% yes. Nobody's ever said, no, this is not valuable to me. Understanding the topology of a very complex distributed system is really important.
And I'm guessing there's multiple aspects to that. There is the static snapshot of how things are all connected up. There's the runtime dynamics snapshot of how they're communicating. I guess there's probably multiple views needed.
Yeah. And I think those multiple views are driven by the desires of different folks in the organization. And so the people who are on the hook to actually answer pagers these days, they're very curious to know how much error budget is left, or when code is rolling out. And so Dive in particular will tell you that. And that's based off of the metrics and the information that Linkerd collects. Another aspect of the service mesh, and this ties a bit into when we're using Ambassador as an ingress, is the security that you can get, especially with Linkerd, where mutual TLS is enabled by default, right? And so when we inject the Ambassador ingress or the Ambassador gateway with Linkerd, the traffic that is coming in is generally TLS traffic. And so Ambassador can then terminate that TLS. But then Linkerd will rewrap it in its own mutual TLS, and pass it on to the other services inside of your cluster, the ones that use east-west traffic.
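The error-budget question mentioned above comes down to simple arithmetic over request counts. The following is a minimal sketch, assuming a success-rate SLO measured over some window; the function name and the example numbers are hypothetical and are not taken from Dive or Linkerd.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget still unspent in a window.

    slo_target is the required success rate, e.g. 0.99 for "99% of requests
    succeed". A negative result means the budget has been overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 10,000 requests, 25 failures, 99% SLO.
    # About 100 failures are allowed, so roughly three quarters remain.
    print(round(error_budget_remaining(0.99, 10_000, 25), 2))
```

The inputs are exactly the success and failure counts that a mesh proxy already observes per service, which is why a tool built on mesh metrics can answer this question without extra instrumentation.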
And that becomes very important for some companies just by regulation. You have to have that. And other companies, I was talking with one company recently and they don't have to have encrypted traffic in their east-west traffic, but they want it. And the interesting thing about Kubernetes is, you hear more and more that it's a platform for building platforms. And many of the conversations that we're having with folks are, we expect to have a multi-tenant cluster. And in that case, you really want to have mTLS, because we're working on that zero trust operational model. So the integration with Ambassador has been pretty seamless. In fact, I'm working with another customer now, who's using Ambassador for their gateway, and they also wanted to initiate distributed tracing spans.
Yeah. And that's something that we added a couple of releases ago, but I haven't tried it yet with Ambassador. So that's the fun part of my job, is I get to go and try things out. And we talked a little bit off mic, about just learning and exploring and playing with technology. And this is like, I get paid to play with new technology, which is great. And when it works, that user is so happy.
Yeah, very cool. So we get asked a lot about mTLS, because it's almost table stakes these days, particularly at ingress but definitely east-west. We also get asked around the authorization model. Increasingly folks are talking about things like Open Policy Agent (OPA) for questions like, can this user making this request access these services, or what role do they have? Is that something Linkerd or service meshes will look at, do you think? In addition to the TLS, will there be things like authorization?
Those are conversations that we have a lot. Today that is not a feature of Linkerd, and that's by design. The initial design for Linkerd was user experience, getting it installed in 60 seconds. And to your point, if people are putting it into production and they realize, we're not ready for this, it's actually very easy to uninstall Linkerd as well. We don't talk about that as much... But focus on user experience is really important to us. And so having something like authorization ties into some of the philosophical notions of what a service mesh should and shouldn't do. Which is why I'm really, really excited to see that the service mesh interface specification has taken off.
And that's how we use the service mesh interface spec to implement traffic splitting. So that then allows us to do the canary deployments. As a result of that though, the service mesh interface specification also covers policy. And so I suspect when we get into implementing that, it will be through something like Gatekeeper. So you have Linkerd using the service mesh interface spec, and Gatekeeper following the service mesh interface spec. So now you have this common language that these two open source projects are speaking. And if you want to change one out later, if whatever you're changing it with is following that spec, you're in good shape.
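The traffic-splitting idea behind canary deployments can be sketched independently of any particular mesh: a set of backends, each with a weight, and every request routed in proportion to those weights. The sketch below only mimics the general shape of an SMI traffic split (weighted backends behind one service name); the service names, weights, and the `pick_backend` function are hypothetical, and a real mesh applies the split in its proxies rather than in application code.

```python
import random
from typing import Dict, Optional


def pick_backend(backends: Dict[str, int],
                 rng: Optional[random.Random] = None) -> str:
    """Choose a backend for one request, in proportion to its weight.

    Backends with a weight of zero never receive traffic.
    """
    rng = rng or random.Random()
    names = [name for name, weight in backends.items() if weight > 0]
    if not names:
        raise ValueError("at least one backend needs a positive weight")
    weights = [backends[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical canary: send about 10% of requests to the new version.
    split = {"web-v1": 90, "web-v2": 10}
    rng = random.Random(42)
    sample = [pick_backend(split, rng) for _ in range(10_000)]
    print(sample.count("web-v2") / len(sample))
```

Shifting a canary forward is then just a matter of changing the weights, for example from 90/10 to 50/50 to 0/100, while watching success rates for the new version.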
Yeah. I hear a lot about the power of open standards. I hadn't thought too much about the SMI as part of this conversation, but it's a really good topic to bring up, I think, because again, I've seen a lot of Buoyant folks talk positively about SMI. Of course, many of the folks are involved with this, the Istio folks, Consul folks, the HashiCorp folks. The SMI I think is a really powerful abstraction potentially to build tools on top of.
Agree. Totally agree. And like I said, that's why I'm personally excited about it. But I think that's the reason that the Linkerd team and Buoyant got involved with helping develop that spec. And when you look at the people who are involved with that, it's really impressive. These are technically companies that should be competitors.
They're actually getting involved and saying, "Let's make lives easier by simplifying things, and saying, this is our standard. This is what we're going to work off of." And from there, they can all go and run and develop what they think is the best product for it.
I think that what we see with the SMI, or the service mesh spec, is there's a lot of similarity between, say, traffic control east-west and at the edge. There's a definite overlap between service mesh and API gateway. Some service meshes even support the notion of ingress, these kinds of things. I wonder as an industry, if we are coming to some end point, so to speak where, like you say, we agree on common abstractions. We can all add value on top of those interchangeable standards, as you rightly said there, Charles. I think interchange, nice word. I like that. But it minimizes the risk to folks if they're trying to try things out?
Swap things out, things like that?
Yep. For sure. And that's the beauty of open source. When I first started thinking about open source and working on open source, I thought to myself ... And it was during my consultancy days, too, where I was like, why would anybody work on anything for free? But now you actually see the human value for it, you see ... Like I have derived value from this thing, please let me give back to it. And so that other people may get the same value. So I just love it. And that's one of the reasons that I love the Linkerd project so much.
Yeah. I love again with Datawire I love the open source policy. Obviously everyone's got to make money, in terms of, let's keep these things going forward. But if we can do some good in the community, if we can share our ideas and the other folks build on top of it, that's just a great ... We're super lucky, aren't we, to be able to work in these kind of projects. It's a privilege. Right?
Totally. Well, especially since my first job out of university was for an e-commerce application server company. And we were using CVS back then. And you had to have access to the repository and building. I don't even know what the build system was like back then. I'm pretty sure it took hours. So that dev loop, I don't think that would have been fun to be a part of. But like I said, I think that ties into the creativity aspect as well. So in the same way that undesirable dev loops foster creativity in building desirable dev loops, I would say open source inspires creativity. Whereas if I was just hammering away on some code, that's a private repository and it's just me architecting it and building it, maybe with one or two other people, there's a good chance that we've got this groupthink happening.
But with open source, if you've got people who are coming in from different angles, and somebody who's looking at it from a different perspective and says, "This looks great, but what if we did it this way?" And I just think you don't get that in some of the closed source world.
Yeah. I hear you. I think there's a danger, as you said, of almost overfitting the problem, isn't there? We see this a lot with some of the other tools we work on at Datawire, like Telepresence. It's easy enough to get it working on one machine, if it's a pet project for me. But anything to do with proxies, trying to get it working on more than one machine, like a Mac versus Windows, is a different ball game. And we like to rely on the community to test on, say, Windows, and give us feedback that we wouldn't otherwise have. So yeah, I'm with you, the community is amazing.
Yeah, and I'm eager to try the Telepresence project for sure. Speaking of the community, I spend probably a third to half of my day in the Linkerd Slack chatting with people, shepherding them through. And it's always fun when somebody pops in there and says, "Hey, we met at such and such a conference." It's always a lot of fun. Community is really, really important to us.
Very nice. I wanted to pick your brains on a final big topic, Charles, around Knative. That's how you and I met. I was reading your blog post around Knative, Linkerd, and Ambassador. Great stuff. So there's obviously quite a few options. You mentioned, I think in your blog post, OpenFaaS. I know Alex, awesome chap. There are several other projects out there. What made you pick Knative for your blog post?
It was an internal decision. Somebody said, "Hey, do you think we can get Knative running with Linkerd now that it doesn't rely on any other particular service mesh?" And again, that's the fun part of my job. It's like, let's go try it. I tell my wife, she goes, "How do you think we can get this done?" I said, "Well, there's only one way to find out. Just got to try it." And so that's the fun part of this job is, we just get to try stuff. So yeah. To your point, OpenFaaS and Alex, awesome project, awesome person. He's been a huge fan of Linkerd. And he's done so much writing on OpenFaaS and Linkerd, I almost felt guilty a little bit. Sorry, Alex. So just shout out to Alex, keep using OpenFaaS, keep up the great work. So the experience with Knative was, okay, let's see what we can do to make this work. And it was actually all very easy, very seamless. So again, that's a testament to the team at the Knative project, and certainly the awesome team at Datawire working on Ambassador.
And just the fact that these things are all working together almost seamlessly, is pretty impressive. The initial iteration of that blog post involved using a different ingress. And so we thought it would just be more fun to use different open source projects from different communities, and get it all working together. So, yeah, that was the first time I had worked with Knative. And so that was, again, hats off to all of the teams that are working really hard on those projects. Again, when I say all of the teams, I mean all the teams. Everybody who's contributing their time in open source. There was a tweet that went out from KubeCon Barcelona last May where somebody said, "Open source, we need to realize we're standing on the shoulders of giants." And it's true. The amount of heavy lifting that teams have done to make it easy for me to drop in some YAML, and get this up and running in like 30 minutes, is impressive.
Yeah. It takes real time, care, and attention to make the user experience as flawless as that. We see this because we're kind of spoiled with our Apple phones and our Android phones, and we just take it for granted now. And we almost expect the same thing with dev products, but yeah, that's not always the case, right?
So looking at some of the Knative stuff, do you think folks are going to be running mixed workloads? So could you see a Kubernetes cluster being spun up, with Linkerd in the mix? Could it be, say, a monolith, some microservices, some serverless, or do you think folks are going to run one use case per cluster?
I would challenge anybody to have one use case per cluster. Much of the work that I did when I was at NGINX, was working with companies specifically to migrate a monolith to microservices. And prior to NGINX, a lot of the work that I had been doing was tearing down that J2EE application, and building it out into more of a service-oriented architecture. I wouldn't call it microservices. But I think always in the back of my head, I knew there was going to be hybrid architecture. And the conversations that I'm having now with folks, there are two topics that people are really interested in with a service mesh.
One is multi-cluster, and the other is mesh expansion. For us, mesh expansion means being able to extend the service mesh to those other workloads. So you have your control plane in the cluster, and it is managing work, or it has meshed those services within the cluster. But now you've got that old legacy system, or you've torn down this monolith as far as it's going to go, or rewritten it to as small as it can be. And so you still need to get metrics and security, and you need that to be part of the service mesh as well. So it's expanding the mesh to wrap around it. Internally we call that mesh expansion. And to your point, I believe 100% we're going to see hybrid architectures. And mesh expansion to me is more important than multi-cluster.
That is very interesting, it actually leads very nicely into my final question. What do you think the future development platform experience is going to look like? A lot of folks I'm chatting to are saying it's all about functions. Maybe not 100% functions in the end, but something like that, akin to Lambda, Google functions, these things. What are your thoughts about the next five years, say? What do you think dev is going to look like in the next five years?
I think you'll still continue to see overlap between developer and operations and DevOps practices. You're asking me to make predictions...
Always hard, don't worry!
I don't want to be wrong. No, I'm kidding. So yeah, I think we'll continue to see more of that melding together, but also a clear definition. And what I mean by that is, people will realize there's no such thing as a DevOps engineer, there are engineers who practice DevOps. And so I will say in the future, I think we're still going to have as many buzzwords as we do now. They're always going to be added into our conversations. But yeah, the developer experience, again, I've been using IntelliJ for nearly 14 or 15 years. And that's because Eclipse drove me nuts, but I can see the developer experience going fully into just an IDE. And so you're writing code and you hit save, or you press a button, and it just fires that code off into the ether. And next thing you know, you're testing it in production.
So I think that will be seamless, but I'd like to see more standards around that in particular. So that way, if we have a standardized way of working within our IDEs, that results in code being pushed out, it will make everybody's jobs easier. It will make debugging things easier. So I don't know, that's a hot take on the future of development, where I'd like to see it go. And if I'm honest, that's my five year goal. In six years, I expect us to be able to think and the code just goes out there...
I'm sure Amazon are working on that as we speak. Every time I go to re:Invent, my mind is blown. So I can imagine that's being worked on there.
It wouldn't surprise me.
Awesome Charles, I really enjoyed chatting with you today, really appreciate your time. Thank you very much.
Well, thank you for having me. It's a great chat.