Developer Control Planes: A Cloud Leader's Point of View
Kelsey Hightower (@kelseyhightower), Principal Engineer and Staff Developer Advocate at Google, and a leading voice in the cloud native and Kubernetes space, talked with Ambassador Labs's Daniel Bryant about a range of subjects, including what shifting left and developer responsibility really means, promise theory and what developers really need to be successful, and developer empathy, or stopping the practice of "cart-before-horse" coding to get to the heart of what users want first.
Here are some of the key takeaways from their conversation:
- Define responsibility
The move to "shift left" has received a lot of attention, but it's not only the developer who is affected. While the developer, and the software, will benefit from the developer getting all the information needed as early as possible, dependencies unfold further downstream. While developer ownership or understanding is important, it is just as important to define who is responsible for different steps, e.g., "If you're the developer, you will have some responsibility for the 'ingredients' you add to the mix. You will be asked to understand and answer for some of the choices you make. And beyond the developer, everyone needs to be aware of their responsibility in that pipeline."
- Centralize infrastructure
Kubernetes, at the end of the day, is a last-mile technology that requires instruction to do what the developer intends it to do. As the developer takes on more responsibility, or at least needs to understand the consequences of what they code, centralizing the experience around a platform or control plane enables understanding of what's required to deploy and run code without the complexity of having to manage it completely. It's about creating a platform that, with some configuration, the developer can trust to run their code as instructed. "The control plane concept or experience is the evolution of this, where things take off for all sorts of infrastructure platforms. You now have a central place to hold state, converge it and keep it true over time. Kubernetes just happens to be what we consider a universal control plane."
- Empathetic engineering
If engineers are coding applications and features without using them or considering the user experience, the work is completely devoid of empathy. And what is the point of creating software if it isn't serving the needs of its users and helping them accomplish their goals? Engineering empathy for the user into the coding process requires developing human, or soft, skills, like communication and putting oneself into the user's shoes. "It does not work any more to create software in a vacuum. The user reaction should be, 'Wow, someone thought about how I would use this; it's intuitive, it's frictionless'." It is through empathetic engineering that developers will achieve this outcome.
- Allow time for research
The industry as a whole would benefit from giving developers more time not only to develop their soft skills but also to do their research, both to understand what has already been created in the world and to gain a better understanding of where gaps and needs are and the psychology of users' behavior.
"Code should be the final step in the software development process. If you get really good at the human side of it, I think you will end up writing much better software."
- Answer the maturity call: Security
The supply chain of software is the next great step in the maturity curve for cloud-native computing. Developers should be able to account for what is actually in the software. The importance of developers taking greater ownership of what they code and its dependencies is becoming codified in the form of software bills of materials (SBOMs). SBOMs ensure the integrity of the software supply chain and help elevate application security.
"Education, centralization and discipline will help us really focus attention on what's in our software. And knowing that we have the power, control and responsibility to make sure that only the things that are necessary to be in there are in there."
Listen to Kelsey and Daniel's full discussion below.
Hello, and welcome to the Ambassador Labs Podcast, where we explore all things about cloud native platforms, developer control planes, and developer experience. I'm your host, Daniel Bryant, head of Dev Rel here at Ambassador Labs. And today, I had the pleasure of sitting down with Kelsey Hightower, a well known technical leader within the cloud native ecosystem, an author, mentor, and all around awesome individual. Join us for a fantastic discussion covering topics such as securing the software supply chain, empathetic engineering, and the benefits and challenges of centralizing infrastructure. And remember, if you want to dive deeper into the motivations for and benefits of a cloud native developer control plane, or are new to Kubernetes and want to learn more in our free Kubernetes Developer Learning Center, please visit getambassador.io to learn more. So, welcome Kelsey. Many thanks for joining us today. I'm sure many folks will know who you are and we've done introductions I think, in past podcasts. Could you start then, by sharing perhaps, what is most exciting for you in the tech space at the moment? Or, perhaps something you are working on that's super exciting.
Yeah, I'm spending a lot of time in the security space. I think the secure supply chain is getting a lot of attention these days. So, for those unfamiliar, this is about reproducible builds, a bill of materials. So, what's inside of your software, all of the dependencies, the transitive dependencies. I think the industry is now mandating a little bit more maturity about what we're building and what we're shipping and being able to go back and prove what's inside of the software that we are building. So, that's where I'm spending a lot of time. S3C or SLSA framework is something Google put out not too long ago as a standard to try to address all of these attack vectors, starting from the source code that you bring into your projects, all the way to deploy-time guarantees and everything in between.
Very nice Kelsey, very nice. So, I'm definitely hearing, in relation to platforms, we're going to dive into a bit more in a moment, the whole shift left thing's become a big thing recently, right? It's been... well, arguably for quite some time, but is your pitch at the moment that security needs to be baked in day zero, really? And then, all the way through to day one, day two and beyond?
Yeah, I think traditionally, we try to use things like scanning tools, once the code is committed, we try to do a little due diligence there, but I think what we're starting to learn... And I think shift left is an okay way to think about it. So, for those unfamiliar, with shifting left, you try to provide more tools closer to the developer, away from the platform. So, shifting left towards the developer, but the truth is, it's going to take everyone in the pipeline. So, if you're a developer, only you know what dependencies you actually need. And in many cases, only you can resolve those dependencies, if they're proven to have some, either security issue, or there's a problem that violates your company policy, right? We're not just talking about software bugs anymore. We're talking about sovereignty issues, right? Maybe, your country-
Doesn't allow a dependency to be written by someone in a different country. So, these are things where, if you give the information to the developer earlier in the process, then the developer has an opportunity to say, "Well, I'll switch that library out for another library that does meet my company security policies."
Mm-hmm (affirmative). Fascinating, Kelsey, fascinating. And it's definitely something I'm keen to dive into a bit more in a moment, is where does the responsibility cross over? Right? Because, we've got this Dev persona, Ops persona/platform. And then, in the middle, increasingly are SRE, right? And where does the responsibility lie, do you think, for the security of dependencies, for example?
Well, if you think about food, for example. Where does the responsibility lie, right? So, if you're a baker and you bake a cake and you put some ingredients inside of the cake that make people sick, then you're going to be responsible for what you put into the cake. And now, if you ship that cake to a grocery store to be sold and it gets contaminated on the way, then the person driving the delivery truck, or the delivery company can be held accountable for allowing other ingredients to pollute, or contaminate, the cake in transport. And then, if you're the person selling the cake and you allow something to go wrong, either you allow that cake to sit longer than its expiration date-
And that makes people sick. So, everyone's accountable here. I think what we're doing now is being very clear about what you can do at every step of the way. So, if you're the software developer, well, you're going to have to have some responsibility on the ingredients that go into it. And we're going to ask you more questions around, "Do you really need that additional library that has 50 transitive dependencies? Is there a better library that doesn't have any transitive dependencies, that has only the functionality you need?" So, we just need everyone to be aware of their responsibility in the pipeline. And then, what we're going to do is we're going to make that whole process more transparent by having things like the SBOM, Software Bill of Materials.
That says, "This is what's inside of this," and if you're the developer and you look at that report and you say, "Wow, I put that in there. And that isn't the right ingredient for this particular software application."
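To make the SBOM idea concrete, here is a minimal Python sketch of the kind of report described above: it lists a component's direct dependencies, the indirect (transitive) ones they pull in, and any that violate a policy. The dependency graph, the package names, and the function names are all hypothetical, invented for illustration; real SBOM formats and tooling are considerably richer.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDENCY_GRAPH = {
    "my-app": ["web-framework", "tiny-json"],
    "web-framework": ["http-core", "template-engine"],
    "http-core": ["tls-lib"],
    "template-engine": [],
    "tls-lib": [],
    "tiny-json": [],
}


def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.add(package)
        queue.extend(graph.get(package, []))
    seen.discard(root)  # guard against cycles that lead back to the root
    return seen


def build_sbom(graph, root):
    """Summarize what is inside a component: direct and indirect ingredients."""
    direct = set(graph.get(root, []))
    everything = transitive_dependencies(graph, root)
    return {
        "component": root,
        "direct": sorted(direct),
        "transitive": sorted(everything - direct),
    }


def policy_violations(sbom, denied):
    """Return listed packages that the (hypothetical) policy does not allow."""
    listed = set(sbom["direct"]) | set(sbom["transitive"])
    return sorted(listed & set(denied))


sbom = build_sbom(DEPENDENCY_GRAPH, "my-app")
print(sbom)
print(policy_violations(sbom, denied={"tls-lib"}))
```

A report like this is what lets the developer, rather than a downstream scanner, notice that one direct import quietly brought three more packages along with it.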
Yeah. Very nice Kelsey, very nice. And that comes full circle to the reason I reached out to you, is that I put a tweet of yours from February 2019 in a DevEx Days presentation I did at KubeCon. You said the Delta between Kubernetes and a developer-friendly PaaS, is where the next level of value is and where things tend to get opinionated, a requirement for reliable end-to-end workflows. And if I'm understanding correctly, this thing would include for example, security, right? Workflows in the notion of, you're responsible for that chain of value, right? From coding, shipping, and running. Is that what I'm understanding correctly?
Yeah. I mean, it's always been the case, that none of this stuff, virtual machines, bare metal machines, Linux, Solaris, now Kubernetes, none of these things were developer oriented, these were infrastructure-
We've just been leaking them to the developers, right? We gave them SSH, we gave them Bash shells, we gave them systemd, we gave them all of these things that allow you to do pretty much anything you want. I think what that tweet was referencing is that Kubernetes gives you more infrastructure, better APIs. But, I think at the end of the day, it's still a last-mile technology. So, what a developer needs to be successful is something that they can trust, as a deployment target. So, ideally I'm building my code. I'm making sure that it's packaged correctly. All of my dependencies are in there, in order for it to run, but then how it gets deployed. I just need something that I can trust that if I give you my application and a bit of configuration to say, "I need three of these across the United States of America," your platform should be able to do that. And the fact is Kubernetes doesn't necessarily by itself give you the tools to just articulate that requirement without gluing a bunch of things on top.
100% Kelsey, 100%. And to use this off-use analogy, any insight into why we haven't got our Heroku for Kubernetes yet? And the reason I mentioned Heroku as well is because I used to love build packs, right? And we're now seeing cloud native build packs. You can do things like SBOMs, Software Bill Of Materials in the build packs. But, taking a step back, as in the bigger picture, do you think there will be a Heroku for Kubernetes? And if so, is it on the horizon?
If there was a Heroku for Kubernetes, you wouldn't want to use it anyway. Everyone keeps asking for simplicity, simplicity, simplicity, but it's the people who make this all complex. If everyone would just write in the same programming language, using the exact same frameworks, we wouldn't have to deal with any of this.
So, it's not about one individual's desire for simplicity. You have to account for the tens of millions of developers and organizations that just want to do whatever they want, in any language that they want, in any framework that they want. And so, since you have all of these permutations of what people want to do, trying to build a single system, that accounts for that is impossible. So, yes, you can have a Heroku, but I don't think it's going to be a sustainable business model because the subset of people who can use that for all of their compute needs is very small.
And so, what we've seen, and we've seen this from history. If you come up with a simple platform, it will definitely do a good job at simple use cases. The minute you have a large organization that needs to use a second platform, or a third platform, and anytime you get something like Kubernetes in the mix, we know that Kubernetes can do what Heroku can do, but we know Heroku cannot do what Kubernetes can do. So, if you're stuck using both, more than likely that organization's going to have pressure to consolidate.
So, all the stuff that was-
Running in Heroku will end up running in Kubernetes. So, I think what we need to see now, is something as flexible in terms of its ability to run a range of workloads with the simplicity or flexibility of the Kubernetes style API. I think that is on the horizon in things like Cloud Run. We saw that-
Azure just shipped their own container runtime, built around Kubernetes-
I think that's when we're going to get there. I don't think Heroku-style PaaS will be it. I think it's going to be something in this CaaS, Container as a Service platform.
Ah, very nice. That's riffing off the conversation I saw you had with Joe at the Pulumi Cloud Engineering Summit. And you talked a lot about the Kubernetes resource model, the API, the model, how the reconciliation. Do you think that's where the magic is at the moment, in terms of using those APIs, implementing those APIs, implementing the model, the workflows? Does it matter whether it's CaaS, or PaaS, or something like that? Is the Kubernetes resource model going to be at the core of all this?
So, going back to Promise Theory, Mark Burgess, CFEngine, Puppet, Chef, Ansible.
That Promise Theory is that, infrastructure is hard. You can't just run a script and assume everything's going to go right the first time you run that script. Load balancing just takes time to update. Security credentials take time to propagate. Logs take time to collect. It just takes a while. So, the only thing you can do is attempt to describe what you want. "I want this application, I want three copies of this app, with this much memory and this much CPU. And I want it to be behind a load balancer, reachable from this region." Right? That's the developer's intent. So, what is the best way to capture that? So, our industry spent 15 to 30 years scripting this out. You would run the script. And even if it worked, if a machine were to go down, or load balancer were to go down, the script would have to be run again, and it probably wouldn't work again.
Because the script assumed all of those things were up during the script's runtime. And if something goes down, well, that script isn't continuously running. So, therefore you're just down.
Promise Theory says, "Look, you need to make a promise. And then, the machinery in the background should be running 24/7, to try to keep the promise true." And so, when we talk about the KRM, the Kubernetes model, it's not about Kubernetes. I know a lot of people get stuck on, "Oh, my God. He said, Kubernetes, let's stop thinking and say, Kubernetes is not as good as Heroku." Who cares? These are implementation details. The Kubernetes resource model can be used independently of a container cluster. That's the first thing you have to reason about in order to understand what I'm going to say next. Now, Promise Theory, we need a way to make a promise. Whether you like YAML or not, it just doesn't matter.
As a human being, you need to ask for what you want the infrastructure to do. It can be XML, it can be JSON, it can be a text file. The world can sort it out. There's many spoken languages too. So, humans have to articulate their desire to the infrastructure. And today, we have the Kubernetes resource model, which is very much a REST-like interface that says, "Here is a resource type, here is a schema. And based on the schema, here's what it can do." And it has this other unique property that I think is the very first in the Promise Theory evolution of these tools, which is the status field. And so, in order to break this down, take the case where you want to describe a container being deployed.
Well, now you can say, "I want a deployment object, with three copies, and I would like it to be in this cluster." Great, you can submit that promise, or this ask. And then, what happens is all the controllers in Kubernetes will do whatever it takes to keep that true. So, if two of your five nodes go down, well, the control loops will move your containers to something that's already running to hold the promise and update the status field. So, given that, I think you can apply this logic, the Upbound and Crossplane people are doing this-
Shipa are doing this, Cloud Run is doing this. So, now I can describe any type of infrastructure, whether it's networking, SSL certificates, you name it, with the same type of model, the KRM. So, this is where I think, having something where a user can write down what they want, submit it to a control plane, which can preserve the state, and then have actuation loops behind the scenes, converge and keep it true. It's just the evolution of this whole-
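The promise-and-converge idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Kubernetes is implemented: the `Deployment`, `Cluster`, and `reconcile` names, the `ready_replicas` status key, and the least-loaded placement rule are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    name: str
    replicas: int                               # the promise: desired number of copies
    status: dict = field(default_factory=dict)  # observed state, written by the loop


@dataclass
class Cluster:
    nodes: dict  # node name -> list of app names currently running on that node

    def running(self, app):
        return sum(apps.count(app) for apps in self.nodes.values())

    def start_one(self, app):
        if not self.nodes:
            raise RuntimeError("no healthy nodes left to run on")
        # Hypothetical placement rule: pick the node with the fewest apps.
        target = min(self.nodes, key=lambda node: len(self.nodes[node]))
        self.nodes[target].append(app)

    def stop_one(self, app):
        for apps in self.nodes.values():
            if app in apps:
                apps.remove(app)
                return

    def fail_node(self, node):
        # Everything running on the failed node disappears with it.
        self.nodes.pop(node)


def reconcile(deployment, cluster):
    """One pass of the control loop: compare desired with actual, then converge."""
    while cluster.running(deployment.name) < deployment.replicas:
        cluster.start_one(deployment.name)
    while cluster.running(deployment.name) > deployment.replicas:
        cluster.stop_one(deployment.name)
    deployment.status = {"ready_replicas": cluster.running(deployment.name)}


cluster = Cluster(nodes={"node-a": [], "node-b": [], "node-c": []})
web = Deployment(name="web", replicas=3)

reconcile(web, cluster)      # initial rollout
cluster.fail_node("node-a")  # a machine goes down, taking one copy with it
reconcile(web, cluster)      # the next pass restores the promise
print(web.status, cluster.nodes)
```

The contrast with a one-shot script is that `reconcile` makes no assumption about what happened before it ran. In a real control plane it would be invoked continuously, which is what keeps the promise true after a failure.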
Well said Kelsey, well said. Big fan of Crossplane's work, and I've been playing around with that. And yeah, very interesting. You mentioned control planes there, super interesting. What do you think the control planes will look like? Because, I saw you tweeting about the Azure container apps. I know you're a big fan of Cloud Run as well. Would the control plane be like, maybe command line tools that I would use for that differ from something like Crossplane, if I was using both of them?
Yeah. So, control plane... So, you mentioned Crossplane. Crossplane is a control plane framework, almost like an SDK, if you will. So, you can design your own custom control planes based on the KRM.
And the purpose of a control plane, number one, typically is to store the state that you want. And it also hosts the actuation engines to make it true. And then, you could do other things in the control plane, like limit who can do these things. So, RBAC control, you can do things like billing, chargeback, logging. So, the control plane idea is that you can centralize a lot of these operations. So, if you have a command line tool instead of the command line tool doing all the work, like today's version of Terraform, right?
It's a control plane and the data plane all in one. So, it tries to call all of the APIs and make things happen.
But, if you turn it off, that's it. If you close your laptop, the command line tool, which is also a control plane, stops. And so, no more promises can be kept until you turn it back on. So, in modern-day control planes, like you see in the cloud, right? When you go to any cloud provider, you're typically interacting with two things, the UI and the control plane. Most people confuse the two, right? They bundle the UI with the control plane, but they're two separate things under the covers. And they're just clients, right? The UI is a client in your web browser. And then, the command line tool is a client on your command prompt. But, this control plane concept, I think, is where things take off for all infrastructure platforms. You now have a central place to hold state, converge it, and keep it true over time, and Kubernetes just happens to be what we consider a universal control plane.
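The separation between clients and the control plane can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the `ControlPlane` class, the allow-list standing in for RBAC, the audit log, and the two client functions are hypothetical and do not correspond to any particular product's API.

```python
class ControlPlane:
    """Holds desired state centrally; clients only submit requests to it."""

    def __init__(self, allowed_users):
        self.allowed_users = set(allowed_users)  # stand-in for RBAC
        self.desired_state = {}                  # resource name -> spec
        self.audit_log = []                      # who asked for what, and the outcome

    def submit(self, user, name, spec):
        if user not in self.allowed_users:
            self.audit_log.append((user, name, "denied"))
            raise PermissionError(f"{user} may not change {name}")
        self.desired_state[name] = dict(spec)
        self.audit_log.append((user, name, "accepted"))


def cli_client(control_plane, user, argv):
    """Thin command-line client: parse 'name key=value ...' and submit it."""
    name, *pairs = argv
    spec = dict(pair.split("=", 1) for pair in pairs)
    control_plane.submit(user, name, spec)


def ui_client(control_plane, user, form):
    """Thin UI client: a submitted form becomes the same kind of request."""
    fields = dict(form)
    name = fields.pop("name")
    control_plane.submit(user, name, fields)


plane = ControlPlane(allowed_users={"alice"})
cli_client(plane, "alice", ["web", "replicas=3", "region=us"])
ui_client(plane, "alice", {"name": "cache", "replicas": "1"})
try:
    cli_client(plane, "mallory", ["web", "replicas=0"])
except PermissionError as error:
    print("rejected:", error)
print(plane.desired_state)
```

Because neither client holds any state, closing the laptop that ran `cli_client` changes nothing: the desired state stays in the control plane, where the actuation loops can keep working on it.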
Hmm. Interesting Kelsey, very interesting. Do you think GitOps in that situation would almost be like a protocol? So, the control plane would hold the state, but something you mentioned, like the actuators, is that where something like GitOps would play into, as a pattern for reconciling declared state and actual state somewhere?
Yeah, I think GitOps became a vacuum that started sucking in so many concepts that, I don't know if GitOps actually means as much to most people anymore. But, I think at the very top of the GitOps concept, this idea that this state can be version controlled. That's the big advantage here, is to say, "Look, all these applications can now be represented by this artifact." Let's just call it YAML files to keep it simple. And so, now that we can model our intent, we can store it somewhere. We can review it, we can branch it, we can diff it now. This is great. And so, once you have that in play, you can now do releases, right? I can now say, "Hey, now that I have all of this infrastructure articulated and stored in version control, I can now cut a release. So, I can now tag this infrastructure repository."
It could be as big or small as you want. Some companies will attempt to model all of their environments, all their applications in one big repository, you could do that. Some teams will just model their application that they're responsible for, in their own repository, but either way, you now have the way to version control and cut a release. Once you cut a release, there are different ways of deploying that release. So, some GitOps implementations and practices, there's going to be another control loop that will simply watch these Git repositories. And if you tag something, it will then deploy that tag into the cluster. And it will just keep this control loop going. So, anytime you release something, it will pull it in and apply those changes. And this only works because Kubernetes knows how to merge these configurations and resolve itself to the new states. So, you don't have to run a bunch of scripts. So, this is why people talk about it being GitOps, because Git tends to be where things are collaborated on and eventually actuated on by the controllers running in the target control plane.
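The "watch the repository, deploy what was tagged" loop can be sketched as follows. This is a deliberately simplified Python model: `ConfigRepo` is an in-memory stand-in for a Git repository, `GitOpsController` and its `sync` method are hypothetical names, and applying a release is reduced to replacing a dictionary rather than merging configuration the way a real cluster would.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigRepo:
    """In-memory stand-in for a Git repository of tagged releases."""
    releases: list = field(default_factory=list)  # ordered (tag, manifests) pairs

    def tag(self, name, manifests):
        self.releases.append((name, dict(manifests)))

    def latest(self):
        return self.releases[-1] if self.releases else None


@dataclass
class GitOpsController:
    repo: ConfigRepo
    cluster_state: dict = field(default_factory=dict)
    deployed_tag: Optional[str] = None

    def sync(self):
        """One pass of the loop; returns True if a new release was applied."""
        latest = self.repo.latest()
        if latest is None:
            return False
        tag, manifests = latest
        if tag == self.deployed_tag:
            return False  # nothing new has been tagged
        # Simplification: replace the declared state wholesale.
        self.cluster_state = dict(manifests)
        self.deployed_tag = tag
        return True


repo = ConfigRepo()
controller = GitOpsController(repo=repo)

repo.tag("v1", {"web": {"replicas": 3}})
print(controller.sync(), controller.deployed_tag)  # first release is applied
print(controller.sync(), controller.deployed_tag)  # no new tag, nothing to do

repo.tag("v2", {"web": {"replicas": 5}})
print(controller.sync(), controller.cluster_state)
```

In practice `sync` would run on a timer or in response to a repository event, and the release it applies would itself be handed to reconciling controllers rather than written directly.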
Very interesting, Kelsey. Yeah, thanks. That's a great explanation of what I was thinking around some of the GitOps challenges there. So, I wanted to pivot a moment now, towards the final bit of the podcast and talk a little bit about empathetic engineering. I saw you talk about this again at the Pulumi Cloud Engineering Summit. Super interesting, definitely big fan of this, in general. I believe it came out of your piloting, the Google Customer Empathy Sessions. Is that right?
Yeah. It's something where I think a lot of companies have experience with what they call, a dog food program, or a hackathon, where you get together and you use your own product. I decided to call it empathetic engineering as a discipline. And the goal is to make it closer to something like SRE, right? SRE is a philosophy towards reliability. Some companies will say operational stuff, but whatever. You have this mentality around, what does it mean to have empathy during the entire life cycle of an engineering process? And so, it was born from the early days of Kubernetes, getting Google's Kubernetes Engineering Team together to use the product the way our customers were. But also, the results were not just better products. We got a lot of improvements from those things, but also a person who now approaches software design with empathy, like for the user. What would the user do with this software? And if they can't do that, then I need to change the software. And I thought the best approach was to get hands on and feel it for yourself.
Yeah, totally makes sense, totally makes sense. So, is it a human skill as much as a technical skill? Right. As I would imagine, as a backend engineer, I mainly used to be comfortable talking about code and ops and so forth. Do I have to learn new things to be able to do empathetic engineering?
Yeah. I think, in software engineering, for at least the 20 years of my career, you could get very far without actually being really good at the human stuff. You don't have to actually-
Be good at inspiring people. There's this legend of the grumpy system administrator, right? You could get very far without customer service skills. You can get very far without even using your own product. And it just doesn't work anymore, because there are companies who actually do care about their users and their customers to the point where they're also their own users and customers. And you've felt an application where it's like, "Wow, someone must have used this before."
Everything seems to be in the right place. It feels intuitive. You feel like you can accomplish most tasks without much friction. And then, there's software where it feels like there must be no one really using this thing. It's all over the place. It feels like it's torture to do anything. And so, I think what's changing now is, given that we have nice mobile experiences, we have so many delightful experiences, companies that don't have that are feeling the pressure. So, I just think it's becoming way more important. So, it is a set of human skills. And again, if you look at most job listings in IT, there is not a lot of emphasis on having great customer service, being empathetic. All of the words that we used to classify as soft skills, even though they're hard to do.
Yeah, yeah. And have you got any advice, Kelsey, for folks? Maybe, it's courses, books, just things to practice. If folks are listening and they want to become more empathetic as an engineer, how would they go about best doing that?
At some point we got to realize that engineering, writing code is the last part of the process. It's the last phase. Deciding what to build is where a lot of the work should be. And so, how do you learn what to build? And so, you might be an engineer listening to this and say, "Well, I'll just do whatever my product manager tells me." Right? Or, "Whatever the story from the agile process tells me. And I'll just code it up that way. QA will tell me if it's any good and I will go on to the next feature." I mean, you could probably still get really far with that, but I think the other way is to say, "Wow, can I be part of the UX process to see how this should actually feel? Should I walk through the paper prototype and ask questions? Should I watch what customers are currently doing and ask myself, how would I do the same thing different?"
I think investing, if you spend a bunch of time learning new programming languages, you could also spend time thinking about psychology, how people behave? How would they want to behave? Different countries have different needs, different groups have different needs. You can also study those things. And if you go and study those things, you might say, "Hmm, if we're building this software for everyone. Well, I know from my previous studies that this will not work for everyone." Right?
And we've seen there's things like, you go into the bathroom at the airport and they have these automatic dryers that kick on once you put your hand underneath.
But some people designed those things where they can only recognize certain skin tones. So, some people would say that it might be a lack of empathy because maybe they didn't have enough people with diverse skin tones to test it. Maybe, no one thought about mentioning that during the whole design process. So, that's what I mean by this. So, engineering, writing code is the last step of the process. And if you get really good at the human side of it, I think you'll end up writing much better software in the end.
Yeah, love it Kelsey, love it. So, the final question related to that, is I tweeted a quote from the cloud engineering summit, which got a fair bit of traction. And I'd love to get your insight into it. The quote was, "With building applications, or platforms, it's often not invented here. It's more not understood here. And therefore folks will build their own thing, rather than use components, or frameworks that are existing out there." Do you think this wraps up with the empathy as well, in terms of understanding is a core part of empathy, right?
Yeah. I think understanding the needs, but this discovery problem. I mean, just to be very honest, think about it. Most people do not get enough time to research-
"Go to work. We need this feature done by Friday." And to be honest, most things that we're asking people to build, they've never built before, they've never done it before-
That's a good point.
And so, we're just saying, "Look, you got a day to go and Google and find what you can. And if you can't find it in a day, then time's ticking. You just have to go off and just throw something together that seems like it's going to work." And often, within a small team, or even at a large company in a small department, you don't have enough time for even peer review, and maybe your peers-
Don't get time to research. So, what ends up happening is that you can go along for years, creating something that you think needed to be created, only to go to a conference and say, "Oh yeah, that's an existing thing. You just reinvented TCP/IP." It's like, "Oh, I didn't know that. We just needed a network protocol." It's like, "Yeah, we've needed network protocols for 40 or 50 years. You've just literally reinvented a thing." And it's not because they were malicious, they probably just didn't know. So, I think we need to have a lot more time for researching to make sure that we know what's available in the world. I'm not saying you got to go use everything that's available, but just being aware will help you make a more informed decision.
Great thinking points Kelsey, great thinking points. So, final thing. Anything you want to share with the listeners at all there? Something to check out? Something interesting, you are mulling on? Or, a final takeaway thought?
Yeah. I think honestly, the time has come where software is making another step towards maturity. And so, this secure software supply chain stuff is serious. I think what this means for you as a developer, is that just importing random libraries from GitHub is no longer going to be acceptable in our industry in a short order of time. And you know what I'm talking about here, right? You go find a thing. It has lots of stars. You add the import statement-
And you do a build. The thing is, we're going to have to get much better at saying, "Who wrote this library? Who contributes to this library? And what are its transitive dependencies?" Because, you're now the baker in this equation, pulling in random ingredients that may or may not be good for people on the other side. So, I think now is going to be the time to ask yourself, get educated on what this is all about. Get educated about what it means to have this level of software discipline, where we really start paying attention to what's in our software. And knowing that we have the power and control and responsibility to make sure that only the things that are necessary to be in there are in there.
Perfectly said Kelsey. I'll make sure to link a bunch of things for folks on the podcast that you want to check out. I know there was plenty of talk at KubeCon around the SBOM and other interesting things. But, if you've got any links, feel free to share them with me too. That's awesome. Thank you so much, Kelsey. As always, I learn a bunch chatting to you. Really appreciate your time.
Awesome, thanks for having me.