LIVIN' ON THE EDGE PODCAST

Developer Control Planes: A Platform Engineer's Point of View

Ambassador Labs · S2E08: Bo Daley on Designing Platforms Using Kubernetes, Standards, and Developer Workflows

About

In the growing cloud-native landscape, there are as many different approaches to cloud-native development as there are companies in the space. One recurring theme, though, is the adoption of a developer platform or control plane as a clear way to support developer productivity, workflows and developer experience. These developer control planes are likewise as varied as the companies using them, but to be effective, the design of each platform needs to match the business problems it aims to solve and the goals and challenges of the developers who use it.

Episode Guests

Bo Daley

Solutions Architect at AWS

In conversation with Ambassador Labs's Head of DevRel, Daniel Bryant, Zipcar's DevOps platform engineer, Bo Daley, shared his real-world experience with implementing a developer platform at Zipcar and what considerations informed its design.

Several key takeaways surfaced:

Support developer efficiency and success with an opinionated developer platform: Meeting business goals comes down to how people, and in this case developers, get their work done. At Zipcar, the platform team strives to help developers become more effective at their jobs and do this with a developer platform. "We are focused on the developer experience and on building developer understanding of how their work interacts with the other components in the system. We want to pave a seamless path to production."

For Zipcar, this has meant ensuring that a developer can get up to speed and contribute right away using the tools and processes of the platform. At the same time, the platform is designed on the principle that "a developer can come up with a new idea for a microservice any time, and in a self-service way, create everything they need to get it to production. With a couple of pull requests, they can in fact be in production within an hour". While most developers might not do that practically speaking, the idea is that the platform makes it possible.

"Automate and create" — Make sure ops removes development bottlenecks: If your organization, like Zipcar, is developer-focused, ops needs to remove roadblocks to developer productivity proactively. "Ops should clear the path, not be or create bottlenecks. Where possible, introduce automation and self-service so the operations team does not have to get involved in setting up basics like routing, load balancing, and so on. We want devs to be empowered to compose things in interesting patterns themselves without having to invent something completely new."

The combination of automating processes and creating self-service opportunities not only removes traditional bottlenecks but also frees ops teams to help developers answer bigger questions or solve more challenging issues.

Embrace flexibility to scale workloads: At Zipcar, “cloud native” meant embracing and then finding ways to reduce complexity to serve rapidly expanding workloads. "When we were setting up our development environment, it was clear it would become extremely complicated fast, going from having three or four deployables going out every week to potentially hundreds of deployables with a complicated dependency tree. Without a lot of automation and flexibility underpinned by careful planning, we could not have done this." While Zipcar's initial plan did not include migrating everything to Kubernetes, that became the way forward, and as Bo explains, it made the organization more flexible. "Now we can support things that are not just our in-house application framework, but much heavier workloads that our parent company [Avis Budget Group] has at a much more demanding scale."

Treat Kubernetes as a generic standard: Kubernetes has emerged as something of a platform standard within the cloud-native space, and at Zipcar it has been a useful, unifying general-purpose framework. "It is easy to create standards on top of the generic tools Kubernetes provides," Bo shares. "Most of us are not doing general purpose work, so we can use Kubernetes as the framework, and build our abstractions, relevant to our own business problems, on top." Bo cites the Ambassador Labs podcast interview with Cheryl Hung to explain how this approach translates into developer efficiency at Zipcar but does not lock anyone into anything. That is, it is still possible with a framework like Kubernetes to give developers the freedom to access an escape hatch or "color outside the lines". Echoing Cheryl's observations, it is possible with Kubernetes to build and promote a standard platform that is modular enough to give developers autonomy.

Consider developer empathy and workflow as part of creating a good developer experience: Empathy comes down to listening and understanding. As a concept, empathy, and humanizing the developer experience, is making its way into the work practices of many engineering organizations and cloud-native thought leaders.

Bo maintains that empathy and connecting with developers pay off: "Find ways in your organization to spend time with developers. When we had offices, you could sit with developers as they were trying to solve some problem and work through a solution together and quickly understand why the code we wrote does not work for their situation. In the absence of face-to-face time, make yourself open in other ways and be responsive and have a conversation. Any time you find one of these developers who seems like they are on the verge of 'getting it', or putting all the pieces together, invest in that developer because they will help other developers and get them across the line in understanding the core concepts. Champion the developer who is a resource for other developers."

Listen to the full conversation between Daniel and Bo below.

Transcript

Daniel Bryant (00:01): Hello and welcome to the Ambassador Labs podcast, where we explore all things cloud native platforms, developer control planes, and developer experience. I'm your host, Daniel Bryant, head of DevRel here at Ambassador Labs. Today I have the pleasure of sitting down with Bo Daley, DevOps platform engineer at Zipcar. Join us for a fantastic conversation on designing platforms based on Kubernetes, improving the lead time from idea to production and cultivating a great developer experience. Remember, if you want to dive deeper into the motivations for and the benefits of a cloud native developer control plane, or are new to Kubernetes and want to learn more in our free Kubernetes developer learning center, please visit getambassador.io to learn more. Welcome, Bo. Many thanks for joining us today. Could you briefly introduce yourself and say a little bit about your background as well, please?

Bo Daley (00:44): Sure. My name is Bo Daley. I'm a platform engineer at Zipcar. Been there around about 10 years or so. In that time I transitioned a few times myself as our platforms have transitioned. So I started out almost as a front-end developer, what they used to call a full stack developer when full stack meant you wrote JavaScript on the front end and the back end. Then over time just graduated into more and more of an interest in platform as an idea. As in how people get their work done and trying to help developers become more effective at their own jobs. In the end, I found myself in this space that we call DevOps or platform now, really focused on that developer experience and on making developers understand how their work interacts with the other components in the system and making sure that they have a seamless path to production.

Daniel Bryant (01:42): Super. Very similar backgrounds you and I've got, Bo. As you were saying that I was like, full stack, when that meant front end, back end, and a little bit of database. And now of course, full stack can mean everything from Terraform, Bash scripts, all the things. So I think you and I've trodden similar paths and I'm sure you've wrangled with the complexity that's come with it in terms of learning many things along the journey.

Bo Daley (02:04): Exactly. Yes. Yeah. The stack is a lot deeper and wider than it was when people started using the word full stack. So I really, I don't think that full stack is a useful term anymore. It seems to mean someone who knows how to code React in the front end these days, as far as I can tell.

Daniel Bryant (02:21): Nice. Go straight on the resume or the CV, right?

Bo Daley (02:24): Yeah, exactly.

Daniel Bryant (02:26): Nice. So you mentioned about designing platforms and as you know, you and I have been chatting a bit off mic, and I'm super excited to share with the listeners your insight into this, because being with the company for 10 years you've clearly understood the business problems. You've seen the evolution of the business, the technology, the actual way the company's harnessed the technology. It's a fantastic vantage point to share your thoughts, your experience, because I'm sure many listeners are going through similar things. So if we were to pull apart the designing platforms, could you share at a high level, and perhaps we can dive in a little bit, but could you share at a high level what the Zipcar architecture and corresponding platform looks like, please?

Bo Daley (03:04): Today it is a platform based on... It has a number of microservices, which are internally called cheaters. It's an in-house framework that runs on Java and Ruby. So we have a slightly heterogeneous environment. We even have some Node.js in there as well. At its core, it's a continuous delivery platform. The general idea of it is a developer should be able to come up with a new idea for a microservice anytime and in a self-service way create everything they need to get it deployed to production. With a couple of PRs, they can in fact be in production. So you could theoretically get your code into production in an hour or two of starting your project. You're probably not going to do that practically speaking, but that's the general concept. So to do that we have to... There's a lot. That basic idea makes it... Has a lot of hidden assumptions.

A lot of these hidden assumptions are conventions that we've adopted over the years. Things like your application runs in its own Git repository, which you control, and you are responsible for configuring its build process effectively. Although we have pipelines that you can effectively run a script and it's going to build you a default pipeline. That will get you all the way to a Docker build. At that point you can raise a PR against production and say, "Hey, here's my build. It's got this version number." Someone suitably qualified can say, "Yep, that looks cool," and you're in production now.
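To make the flow Bo describes a little more concrete, here is a minimal sketch of "promote by pull request": a versioned build is proposed for production and only goes live once someone has approved it. Everything in it is an assumption for illustration (the class names `Build`, `PromotionRequest` and `ProductionManifest`, the service name and the version number); it is not Zipcar's tooling, which the conversation does not detail.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Build:
    """A versioned Docker build produced by a service's default pipeline."""
    service: str
    version: str


@dataclass
class PromotionRequest:
    """Hypothetical stand-in for a pull request raised against production."""
    build: Build
    approved_by: Optional[str] = None


@dataclass
class ProductionManifest:
    """Hypothetical record of which version of each service is live."""
    live: Dict[str, str] = field(default_factory=dict)

    def merge(self, request: PromotionRequest) -> None:
        # A request only takes effect once a qualified reviewer has approved it.
        if request.approved_by is None:
            raise PermissionError("promotion request has not been approved")
        self.live[request.build.service] = request.build.version


manifest = ProductionManifest()
request = PromotionRequest(Build(service="pricing", version="1.4.2"))
request.approved_by = "reviewer"  # "Yep, that looks cool"
manifest.merge(request)
print(manifest.live)  # {'pricing': '1.4.2'}
```

The point of the sketch is that production state changes only through a reviewed, versioned request, which is what lets a developer self-serve everything up to that final approval.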

So the basic concept is it's very much inspired by the continuous delivery movement from about five or six years ago, which you remember yourself. In its first implementation, it was built on a bunch of Cloud Foundry components.

Daniel Bryant (04:55): Oh, is it?

Bo Daley (04:56): Which we played around with Cloud Foundry for quite a while. We liked it in its general concepts but a few of the things that they were really into, we disagreed with. Philosophical disagreements are probably pretty irrelevant because it was a platform that was quite mature at that time. It was probably ahead of the game really. But we did, for our own reasons, disagree with things like buildpacks, which I know that you're a fan of having.

Daniel Bryant (05:26): Yes. Indeed. We can discuss that later.

Bo Daley (05:30): We also wanted, in order to fulfill the continuous delivery idea, we wanted to have a gold binary effectively.

Daniel Bryant (05:36): Oh, nice. Nice.

Bo Daley (05:37): When you operate across multiple platforms, you can't have a gold binary which is just like a JAR file or something. You have to have something like... Docker is perfect. A Docker image is now a thing that you can version and you can deploy it in your dev environment. You can deploy it in your pre-prod environment. You can deploy it in all of your production environments. We have many, many production environments and the same application runs the same way in all of them. So Docker was perfect for that. It was really that second generation of Cloud Foundry that introduced Docker as a first class citizen. So we were very early adopters of that, which meant that we were already on the path to Kubernetes when we started building that.
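The "gold binary" idea can be sketched in a few lines: one immutable, versioned image reference is deployed everywhere, and only the per-environment settings differ. The registry path, tag, environment names and `replicas` setting below are hypothetical, chosen only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """An immutable, versioned container image reference."""
    repository: str
    tag: str

    @property
    def ref(self) -> str:
        return f"{self.repository}:{self.tag}"


def deployment_for(image: Image, environment: str, settings: dict) -> dict:
    """Describe a deployment: the image never varies, only the settings do."""
    return {
        "environment": environment,
        "image": image.ref,
        "settings": dict(settings),
    }


# Hypothetical registry path, version and per-environment settings.
image = Image("registry.example.com/pricing", "1.4.2")
environments = {
    "dev": {"replicas": 1},
    "pre-prod": {"replicas": 2},
    "prod": {"replicas": 6},
}

deployments = [deployment_for(image, env, cfg) for env, cfg in environments.items()]

# Every environment runs exactly the same artifact.
assert len({d["image"] for d in deployments}) == 1
for d in deployments:
    print(d["environment"], d["image"], d["settings"])
```

Because the artifact is built once and never rebuilt per environment, whatever was tested in dev and pre-prod is byte-for-byte what reaches production.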

Daniel Bryant (06:17): Gotcha. I understand.

Bo Daley (06:18): Yes. I guess we started with Kubernetes a few years ago when we were doing an equivalent project for our parent company, Avis Budget Group, who they wanted a... They really were interested in a Kubernetes-based platform. We used that as a learning experience for ourselves and now we've just completed backporting our in-house Zipcar platform into Kubernetes as well. So it has a... What were you calling it? A developer control plane.

Daniel Bryant (06:43): Yeah, yeah. I'm with you. Yeah.

Bo Daley (06:44): Right. We have one of those, which is effectively an online tool and an operator, which runs inside the Kubernetes cluster. It picks up state changes and makes the necessary changes inside the Kubernetes cluster on behalf of the application, and on behalf of the deployment. That allows us to replicate exactly the same deployment process in all of our environments and we can automate everything nicely. So you asked about... You're pointing at how do we make the decisions about how we build platforms like this? There's a lot of... I suppose a lot of it is arbitrary. I've heard people on this podcast argue that you probably shouldn't do what we in fact did.
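An operator that "picks up state changes and makes the necessary changes" is usually built around a reconcile loop: compare the desired state with what is actually running, and work out the actions that close the gap. The sketch below shows that comparison in plain Python under simplifying assumptions. It does not use the Kubernetes API, and the function names, application names and versions are all hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map an application name to the image version it should
    run (desired) or is currently running (actual).
    """
    actions = []
    for app, version in sorted(desired.items()):
        if app not in actual:
            actions.append(("create", app, version))
        elif actual[app] != version:
            actions.append(("update", app, version))
    # Anything running that is no longer declared gets removed.
    for app in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", app, actual[app]))
    return actions


def apply(actions: list, actual: dict) -> dict:
    """Apply the actions to a copy of the observed state and return it."""
    result = dict(actual)
    for verb, app, version in actions:
        if verb == "delete":
            result.pop(app)
        else:
            result[app] = version
    return result


desired = {"pricing": "1.4.2", "booking": "3.0.0"}
actual = {"pricing": "1.4.1", "legacy-reports": "0.9.0"}

actions = reconcile(desired, actual)
print(actions)
# [('create', 'booking', '3.0.0'), ('update', 'pricing', '1.4.2'),
#  ('delete', 'legacy-reports', '0.9.0')]

actual = apply(actions, actual)
print(reconcile(desired, actual))  # [] because nothing is left to do
```

Running the same comparison in every cluster is one way to get the identical deployment process across environments that Bo mentions: the loop is the same everywhere, and only the declared state differs.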

Daniel Bryant (07:25): You can put your... Now it's your turn to say stuff though, right?

Bo Daley (07:29): Yeah. I liked what Cheryl Hung was saying recently on this podcast, that if you take on this path of building your own internal tools to build your own abstractions on top of Kubernetes, you're into a difficult path, which is you're into a difficult world where you now have your own set of dependencies. You run the risk of, in some ways, being ahead of the community in your own internal abstractions. Then in other ways, being behind the community in terms of the way the platform, the underlying platform, is moving forward. That's absolutely true. The counter argument to that though is that you end up with a very concise set of assumptions, in which it's very easy to have a conversation with your internal development team about the different deployable components that are going out. It's easy to create standards on top of the very, very generic tools that Kubernetes provides.

Bo Daley (08:20): And so that translates into a lot of developer efficiency. You still have the possibility there. Because you're on a very open platform like Kubernetes, you still have the option of allowing developers to jump outside of those guardrails.

Daniel Bryant (08:35): Like an escape hatch.

Bo Daley (08:38): Yes. I think this is one of the more interesting things that I've discovered over many years of doing this. There's a certain category of developer that lives inside the... Lives right on the rails all the time. They're very efficient. They're very good at getting their work done. They just follow the path that's laid out for them. They're actually not interested in the underlying platform that much. They don't need to know how much memory is attached to this particular node. They don't need to. They don't really care about things like that. If they wanted to find out they could. It's all there available to them. But they have complex problems of their own to solve. That's what they work on. And that's great. So the platform works for them and there's another category of developer that hits the edge really quickly and becomes frustrated.

Daniel Bryant (09:25): Yeah. Recognize them.

Bo Daley (09:29): Sometimes that leads to a little bit of conflict inside the organization. Why does the platform... Why did you make this set of assumptions? Why did you make this set of decisions? That might be a problem in some cases, but I always really like those conversations because in the back of my mind I'm thinking, "This is a developer that I'd like to steal onto my platform team."

Daniel Bryant (09:48): Asking the right questions. Right? They're really engaged. Yeah.

Bo Daley (09:54): "Why did you make this decision?" It's like, "Well, we're about to have an interesting conversation now and we're not just... You're not just blindly accepting the world as it's been presented to you. This makes you a potential platform engineer. Let's keep talking."

Daniel Bryant (10:10): Do you recruit developers like that and onto your platform team? Or do you actually hire folks externally on your platform team?

Bo Daley (10:17): Yeah. I guess we do a bit of both because we have... Now that we're on a familiar-looking Kubernetes-type platform, we can, once again, hire people from the outside world who've used our technology before.

Daniel Bryant (10:29): Yeah. That's the danger if you got a completely bespoke platform, it's very hard to hire the skills. Good point.

Bo Daley (10:35): So we have a good number of people that came in from the general Kubernetes community. They know all the standard tools, they know Terraform, they know how to navigate their way around various cloud providers. They know Kubernetes pretty well. They know all of the monitoring stack pretty well. That's great. We haven't had that experience for a while.

Daniel Bryant (10:55): That's great. Anything, right?

Bo Daley (10:56): It is. But then for our internal tools, we do frequently recruit from inside the broader engineering group in Zipcar because that's more of a developer mindset. We write our internal tools in Go, which is a language that our internal teams don't use, but it's easy for them to pick up because it's an easy language to use. Especially for this kind of tooling, it's perfect. The kind of developer that is asking complicated questions about the platform has no trouble picking up Go and understanding how it works. So I think that... So we have a mix, I guess. One of our lead developers on our internal... I remember vividly the arguments I had with him over our... While he was struggling to use the tools that we were writing. I remember the day he raised a very extensive pull request against our code base unexpectedly. And now he's the lead developer of our internal tools. It was quite fortuitous.

Daniel Bryant (11:55): That's cool. It's kind of like when you and I are chatting off mic. It's almost embracing the conflict in a positive way. Because I think there is a tendency to shy away from that. What I'm definitely hearing from you there, Bo, is these folks have got empathy. They're used to developing. They're like, "This tool doesn't quite meet my needs as a developer on this platform. If it only did this," and you're like, "Sure thing. We can either make this happen together or you can do a PR or we can try and perhaps put it on the backlog and build it." But that's that clear empathy. As in I need this as a developer. The platform doesn't offer this.

Bo Daley (12:26): Exactly. You could tell that it could have been a disaster, like the project could have completely failed and we would've ditched our platform altogether and we'd still be running 100% in a monolith today, which we are not. It would've been... It was those kind of crucial conflicts that we resolved in, as you say, an empathetic way that allowed us to get to the end of a... Effectively a complete re-platforming, which is the project you should never do according to the internet, but which everyone at some point has to do. It allowed us to actually extend the product as we're moving forward and actually make the product more usable for the end users as well.

Daniel Bryant (13:10): So if I just paraphrase that, because this is super interesting. Putting a good platform in with a good developer experience gave you a competitive advantage. You, as an engineering team, could support the business going even faster.

Bo Daley (13:24): That's right. Yeah. We just tightened up a lot of the loops in terms of how long it takes a piece of work to get to production. And especially an innovative piece of work, anything that involved some new infrastructure, that was really a non-starter before when we were very much focused on delivering a monolith. If you wanted to do something outside of that, that was a great idea. Except there was no way our operations team, as staffed at that time, could possibly support it. Today, if you have... If you're prepared to read some standard documentation, you can pretty much figure out how to innovate some new infrastructure. We're able to support it because we have enough automation that we can give developers some attention every now and again.

Daniel Bryant (14:08): Nice. So kind of like if I was to build a case for a new data store, for example, you might consult with me and say, "Here are the trade-offs. Here's how you go about doing it." That kind of thing?

Bo Daley (14:17): Yeah, exactly. Meanwhile, we might try to recruit you to help us get rid of the other data store, which nobody's using anymore.

Daniel Bryant (14:25): Ah, I see. Yeah. I like it. I definitely hear from what you're saying, Bo. You're often thinking big picture and small picture. Is that part of your job, do you think? Whereas someone comes to you, "Hey, I've got this problem," do you take a step back and go, "In the grander scheme of things, we're optimizing for speed and safety. The platform's got to support that." I'm guessing a big part of your job is saying, "Why is that a problem? What's the impact there?" Then making recommendations on how to structure the platform, how to design the platform.

Bo Daley (14:52): Yeah. Yeah, you're right. So I guess designing a platform operates at all kinds of levels. Sometimes you're jumping backwards and forwards across levels all the time. You're starting with some big objectives. For us, the big picture objective was we need to break out of the monolith, we need to deliver new business functionality quickly because there are new businesses we need to get in on. We can't just hire a thousand people and think that we can deliver the project. We have limitations in terms of number of developers we have. So those developers need to not be blocked by our operations team anymore. All the roadblocks in their path need to be removed.

Daniel Bryant (15:30): It's a very developer-focused DevOps. That's a really interesting point you made there because definitely, back when I was working in big enterprise systems, there was a... Very much a handoff situation and silos. Do you know what I mean? And yeah, you're right. Ops. We developers, we're bottlenecks sometimes as well, but Ops were bottlenecks and you were trying to remove that to make it developer-focused, self-service I guess?

Bo Daley (15:52): Yeah, exactly. Trying to deliver what we used to call service discovery, which was effectively the operations team should not have to get involved in setting up basic stuff like routing, load balancing, things like that. That was the kind of thing that an operations team would have to get involved all the way through a new piece of infrastructure previously. Now developers have enough tools at their disposal they can compose them into interesting ways, interesting patterns themselves. They rarely have to invent something completely new. When they do, then it's interesting and we actually have time to help them. I guess we got to that point because the original core of this team, we were ourselves, those developers that were frustrated by the operations team.

Daniel Bryant (16:40): Totally. You had that ingrained empathy already, right?

Bo Daley (16:43): Yeah. We wanted to be able to set up a development environment and we recognized that the development environment was going to become extremely complicated really fast. If this project succeeded, we were going to go from having three or four deployables going out once every week or two to potentially hundreds of these things and a very complicated dependency tree that was going to rapidly get out of control. Unless we had a lot of automation that we thought about ahead of time. So there was a certain amount of thinking we had to do ahead of time. But at the same time we definitely had no preconceptions that we were going to think of every problem that would exist. We had to be able to adapt as we went.

Bo Daley (17:28): Initially, we had no plan to migrate all of our stuff to Kubernetes, but that quickly became the thing we had to do. It also makes us a lot more flexible because now we can support things that are not just our in-house application framework. They also support the much heavier workloads that people like our parent company have. They have been running a business for a lot longer than we have and their scale is much bigger than ours is. They have workloads that would not have fit into our model at all. So we now have... We can now operate on both levels there, I guess.

Daniel Bryant (18:05): Very nice, very nice. I heard you mention Cloud Foundry earlier in the conversation. I did a bunch of Cloud Foundry myself back in the day and it was the archetypal PaaS. Platform as a service. Heroku. When I was doing Ruby, I was doing Heroku as well. Because they are very opinionated. For better and worse. Have they influenced your design decisions, do you think? Did you already have that, "I like this PaaS and I'd like to layer it on top of Kubernetes." Did you have that approach?

Bo Daley (18:30): We did have an approach similar to that. I guess we'd already... It was a little bit easier for us than it was for the rest of the Cloud Foundry community because they really had to port everything across into Kubernetes. We only had to bring across the things that we were using, which was a subset of the Cloud Foundry pieces. So at this point we've retained a few of the cool things we got from Cloud Foundry. One of those was this... I don't know if you've encountered this CI/CD system called Concourse.

Daniel Bryant (18:58): Oh yeah, yeah.

Bo Daley (18:59): That's quite wonderful. That's quite wonderful. We just run that on... That's just a series of Kubernetes pods now. At least at the time it was the only system that you could rapidly iterate on itself. All of the pipelines are code, like a bunch of them are now. But at the same time you can... The pipeline can change itself in real time as well. It can keep modifying itself. Sounds horrifying.

Daniel Bryant (19:23): It's dangerous, right? It's like metaprogramming in Ruby almost ... Always danger there.

Bo Daley (19:28): But that meant that we were able to do a lot of... Given the series of abstractions we have adopted, we were able to create a lot of automation just on that basic assumption.

Daniel Bryant (19:37): So that programmable API, the SDKs, that's definitely happened to me several times. If you want to scale, you've got to automate. So you've got to choose systems that really do expose APIs well, well-defined APIs, maybe even standardized APIs, that allow you to plug and play. Is that a key thing? I mean, I don't know if you do. Do you swap out things, Bo? Have you swapped out things in the past or was that another issue?

Bo Daley (20:00): Yeah, I think that's important. Isn't it? That's always been my experience of adopting any framework. Because quickly you wonder whether you're actually using the framework at all. A year on maybe you have the core control plane and the rest of it's gone. The rest of it's replaced.

Daniel Bryant (20:16): Well, there are arguments being made even about Kubernetes. Like Kelsey Hightower said a few times, the API of Kubernetes is going to be around for years. Will Kubernetes the engine be around? Who knows? Right? I think you made a super interesting point there. Yeah.

Bo Daley (20:28): Right. The thing that we'll probably retain even beyond the Kubernetes API is this CI/CD approach, which is the code operates in an iterative fashion all the time. The developer's getting quick feedback, they have their... Their code is being deployed into a safe environment for them all the time and it's being constantly integrated in a way that they can test. It's going to an environment where end users can see it before it goes to production. It's being somewhat safely promoted to production. That is really the core idea that I think will persist. I think Kelsey Hightower is probably correct that Kubernetes' API is going to be around for a very long time. It's a very powerful set of abstractions that you can compose in interesting ways.

It's really that developer workflow that I'm most interested in. And the ability to migrate. One of the things we planned for in our first Cloud Foundry-ish platform was the ability to move to the next platform without too much work. We didn't want to get in a platform that was going to eventually disappear. We didn't know at the time that it was going to be somewhat replaced. But we did... We built into the design the ability to move to something that was coming in the future. It looked like it was probably going to be... Quickly become apparent it was going to be Kubernetes, but it could have been something else. So it's, I guess our own core API, which maybe we would maintain across different infrastructures so that we can just keep moving our stuff from platform to platform. Hopefully we never have to do that again.

Daniel Bryant (22:13): Could you share any insight as to decisions you made that perhaps were good or bad? I often share the mistakes I've made so that I want other folks to make new mistakes. Do you know what I mean? Like in terms of, I didn't spend enough time thinking about the future of this or whether we were going to perhaps change programming language. Is there any general advice, with all the experiences, all the knowledge you've got, that if I was coming up to you now and saying, "Hey, I'm in charge of a platform redesign," what would you get me first to think about?

Bo Daley (22:44): Think about what is your plan for retiring what's going to quickly become your legacy application? Which is probably a monolith. It could be something else. It could be a series of things. It's very easy to just wave your hands and say, "Oh yes, we're going to apply the strangler pattern," or something around it.

Daniel Bryant (23:00): Classic.

Bo Daley (23:00): That will get you a long way, but that last 10% of functionality, which is the oldest functionality in your system, it's probably the core of your application. It's probably the thing that pays you the money.

Daniel Bryant (23:14): Yes.

Bo Daley (23:15): That is the last thing that you're going to switch off in the old platform. And it's going to... The tail is very long, I guess is my point.

Daniel Bryant (23:24): Yeah. Oh yeah. Makes total sense.

Bo Daley (23:26): So you should have a plan for that piece. I guess we didn't really do that very well. We had a plan for the 80% and the last 10%, 20% was a little bit of a hand waving exercise. I would recommend working that out. I think you have better tools available today. Infrastructures like Kubernetes can handle very cloud unfriendly workloads as well. So you may have an opportunity to bring your monolith into Kubernetes and treat it like a special snowflake, which is what you don't really want to do in your Kubernetes cluster, but you can.

Daniel Bryant (24:02): You can do it. Yeah, yeah.

Bo Daley (24:04): That might provide a path that was not available to us at the time. That obviously comes with its own challenges, but that might be a way to resolve a lot of the problems that you're going to face in applying that strangler pattern to your legacy app.

Daniel Bryant (24:18): Yeah. That's great advice, Bo. So I did some work with my buddy, Nick Jackson on Not on the High Street. It's a UK version of Etsy and we jokingly said it was the monolith in a box. We were running [Apache] Mesos at the time. Pre-Kubernetes. We actually wanted to port the monolith earlier on, so we were trying to get ahead of ourselves. It came with a lot of baggage. I'm not going to lie. It was a really long Bash script to get it running in a container. It was a Passenger Ruby app at the time. But that monolith in a box did allow us to spin up stuff locally and test it. It could deploy into staging. But yeah, that's really interesting to hear you say that because we went the other way around. We're like, "Let's get the monolith running in the container," but you are saying, "Maybe you don't have to straight away, but definitely think about these things."

Bo Daley (25:02): Yeah. I think that the way you did it, I think I would do it today. That is not what we did. Until we finally switch off that very last piece of monolith (it's going to totally happen any day now), we're running in two different data centers and two different complete platforms, and it's not super fun to operate in a world like that.

Daniel Bryant (25:20): Yeah. I'll echo. I made a bunch of mistakes around that on a few migrations I did. Final question I'm keen to dive into, Bo, is what would you advise me, similar persona I've been projecting here, what would you advise me in terms of developer experience? Because I've heard you mention that phrase and I'm a big fan, as you know, of developer experience. What would you advise around, say you're a platform person, Ops, sysadmin, how do I build that empathy with developers? How should I engage with them? Because a lot of them, they're comfortable in their workplace, they're very good at doing what they're doing. To your point, if they're solving complex business problems, some folks are just like, "I want to code and go home. We don't want to be messing around with all the platform stuff," whereas you and I enjoy that.

Daniel Bryant (25:58): But what advice would you give to someone like us platform people that are building a platform? We've heard there's a development team in the core and maybe we haven't chatted even. I've definitely bumped into projects where the platform folks have not even chatted to the developers, as tragic as that sounds. But what advice would you give to these Ops folks that are building the platforms in terms of developer experience and developer workflow?

Bo Daley (26:19): You have to find ways that work for your organization where you can spend time with developers. That's just unavoidable. In the old days when we used to have offices and we used to go into them and so forth, you could sit with developers as they're trying to solve some problem. You could work through a solution together and you can quickly see, "Now I can see exactly the problem you're facing. I understand why this code we wrote doesn't work for your situation and now I understand the problem that you're facing." That's really valuable. So if you don't have that opportunity, make yourself open in other ways. Have a Slack channel, which is where the developers can ask questions. They can interact, they can... And try to be responsive in there. Try to actually have a conversation with the developers. Then anytime you find one of these developers who seems like they're on the verge of really getting it, just invest time in that developer because that developer is going to help all of their peers from now on.

Daniel Bryant (27:19): Yeah. Like champions. They become a champion, right?

Bo Daley (27:22): Just help them get over the line so that they understand some of the core concepts. There're going to be constantly new developers coming into your organization. These days we might not even get to meet all of the developers and so you need to have those lines of communication open one way or another.

Daniel Bryant (27:37): Great advice. Great advice. So we're coming to the end there, but is there anything you'd like to share that I haven't talked about, do you think? "Really should have asked me this question." Anything in that realm?

Bo Daley (27:48): So when we were proposing this new platform design, it was obviously quite a significant investment for the company. So we had to put in a lot of design work, which is not something I'm familiar with because coming from an Agile development background, you try to... You always think that code beats design, and that's true until you actually need some design ahead of time and there's no avoiding it. So we tried to approach that in an iterative fashion as well, which is we designed a series of components and we set ourselves up with fast feedback loops with developers as well.

Daniel Bryant (28:25): Nice.

Bo Daley (28:25): We managed to recruit a bunch of developers from across the organization to act as reviewers effectively of our designs. We sometimes brought them into sessions where we would do a card sort or lay Post-it notes down on the table or something and get people to move them around into an interesting form.

Bo Daley (28:43): The real objective there was to try not to come up with... We're never going to satisfy everyone, but at the same time we want to, and we want to come up with a solution that is not just a compromise between two competing ideas, but it's something that encapsulates them and actually solves the use case of both people. So not just a half measure. We're never going for a half measure. We're always going for a solution that can work for both people. It's a level of abstraction, which encompasses both of the ideas.

Daniel Bryant (29:14): Nice.

Bo Daley (29:16): It's nice when you can hit that. Sometimes you can't. What survived of those initial design sessions may have only just been a bootstrapping for the whole project, but it was enough to establish some of those relationships with people inside the organization as well.

Daniel Bryant (29:32): Building those connections, folks you can ask opinions of. I'm kind of curious. A couple of times you mentioned the decisions. Did you ever use anything like architecture decision records? I don't know if you've bumped into those things. Mike Nygard, fantastic architect, has talked a lot about ADRs, architecture decision records, and they can be super formal. He's got a formal definition. But I wonder, did you capture any of these architecture decisions? Like talk about the problems, constraints, maybe put it in a GitHub repo? Or did you do anything like that? I was kind of curious. The reason I ask is, as you mentioned, there's new folks coming on board all the time, so how do you pass on the knowledge? Because at a certain point you have a bunch of constraints and you do the best you can. Then it's very easy if new folks come along and go, "Why'd you make that decision?" You've talked about the conflict. What do you point to when they go, "Why?" Do you point to some kind of decision record?

Bo Daley (30:23): Yeah. We point to the Wiki page, which we've kept up-to-date mostly over the years, which was there's... It has one dimension, which is a series of decisions that were made, but it's mostly trying to be an encapsulation of what's the current thinking about this platform? What are the decisions that are active today? A new developer could take a look at that. One caveat to that is I'm yet to meet a developer that came on board and did read any of that documentation.

Daniel Bryant (30:48): Oh, really? Interesting.

Bo Daley (30:52): I think the personal connections are actually at least as important, if not more important than-

Daniel Bryant (30:58): That's interesting.

Bo Daley (31:00): Than whatever documentation you can leave behind.

Daniel Bryant (31:02): And have you got any... Other than you've already got great advice about the Slack channels and reaching out to folks, is there anything else you've done internally, Bo, to maintain those relationships or build those relationships?

Bo Daley (31:12): Oh yeah. I guess early on we ran a series of workshops and we brought in people from across the engineering organization to show them what we had for a start, secondly, to get their feedback, thirdly, to get their help on a bunch of open architectural questions, but mostly to understand how it was going to affect their day-to-day.

Daniel Bryant (31:33): Oh, excellent.

Bo Daley (31:34): To give them confidence that we were going to listen to whatever feedback that they had. And also to help just get them set up initially so that they were working in a place that's comfortable to them. They could see how their code was running. They could make some simple commits that appear in production very soon afterwards.

Daniel Bryant (31:58): That is the... As a developer, that's what you're aiming for all the time. Can I get my code, my ideas running in prod in an instrumented way? Yeah, that's fascinating.

Bo Daley (32:08): Yeah. And so I think that the Zipcar platform is very much optimized towards that. The other platform we're operating right now for Avis Budget Group is more of a traditional Kubernetes infrastructure. So I think the barrier to entry is higher on that side, but the possibilities are greater. So we may be able to transition a lot of the work that people on the Avis side do into our Zipcar way of working and that might encapsulate 80% of their work. They'll be able to be as effective as a Zipcar developer in short order. That's one of the background objectives that we have in the future.

Daniel Bryant (32:47): Nice. That's definitely quite a nebulous goal, isn't it? In terms of managing the, as you said a few times, the abstractions are the key thing there. The lining up of abstractions, and where folks have dialed into the low-level Kubernetes concepts, what can you offer them in replacement, I suppose?

Bo Daley (33:03): Yeah, exactly. So the Zipcar platform shows some of its lineage from the Cloud Foundry side as well. The abstractions there are very much Cloud Foundry-esque.

Daniel Bryant (33:15): That'd be interesting.

Bo Daley (33:15): Because Kubernetes allows you to define your own abstractions in interesting ways, we were able to create an abstraction which describes the Zipcar world view as a CRD inside Kubernetes.

Daniel Bryant (33:26): Oh, super interesting!

Bo Daley (33:28): And then our operator is able to just bring that into reality. I guess that just provides the foundation of the automation. So it's true to say that Kubernetes can certainly do all of the things that those other platforms could do previously. It's just that without some automation of your own, it's going to be harder for people to bring that into existence.
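
The idea of an operator "bringing into reality" what a custom resource declares can be sketched as a reconcile step that compares desired state with actual state. This is a simplified illustration only: it does not talk to a Kubernetes API server, and the function names, resource names, and spec fields are hypothetical rather than anything from the Zipcar operator.

```python
def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the (action, name) steps needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions, desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Apply the planned steps to a copy of `actual` and return the new state."""
    new_state = dict(actual)
    for action, name in actions:
        if action == "delete":
            new_state.pop(name)
        else:  # "create" and "update" both set the declared spec
            new_state[name] = desired[name]
    return new_state


# Hypothetical declared state (what the custom resource says) vs. what is running.
desired = {"web": {"replicas": 2}, "worker": {"replicas": 1}}
actual = {"web": {"replicas": 1}, "old-job": {"replicas": 1}}

plan = reconcile(desired, actual)
print(plan)
actual = apply_actions(plan, desired, actual)
print(reconcile(desired, actual))  # nothing left to do once the states match
```

The design point is that the developer only edits the declared state; the loop works out the steps, and running it again after convergence produces no further actions.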

Daniel Bryant (33:47): That I think at the moment, the maturity we're at in the organization, that's why we are doing a lot of work on the DCP for an example, developer control plane. Yeah, I would 100% agree with you, Bo, in that Kubernetes is a general purpose framework for building platforms, but most of us are not doing general purpose work. We have a business problem to solve and therefore, like I've definitely heard you say a few times, you are trading off some of the flexibility for some of the speed, if you like, or simplicity. You're putting your abstractions relevant to Zipcar on top of Kubernetes. There's always a trade-off when you end up putting levels of indirection in there, but it does give you some clear advantages in that you don't need to worry about that stuff as a developer. Focus on pressing that button, shipping into prod. Right?

Bo Daley (34:32): Right. We'll turn your very simplistic manifest YAML file into this sea of YAML that's required for Kubernetes.
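
Turning a small manifest into the larger set of Kubernetes resources might look like the following sketch. The input fields (`name`, `image`, `port`, `replicas`), the defaults, and the function name are assumptions made for illustration; the generated objects use the standard `apps/v1` Deployment and `v1` Service shapes. The output is printed as JSON, which Kubernetes tooling such as `kubectl apply -f` also accepts, to avoid a third-party YAML dependency.

```python
import json


def expand_manifest(app: dict) -> list[dict]:
    """Expand a small, hypothetical app manifest into a Deployment and a Service."""
    name = app["name"]
    port = app.get("port", 8080)      # assumed default, not a platform convention
    labels = {"app": name}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": app.get("replicas", 2),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": app["image"],
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {"selector": labels, "ports": [{"port": 80, "targetPort": port}]},
    }
    return [deployment, service]


# A three-field manifest becomes dozens of lines of generated configuration.
simple = {"name": "hello-service", "image": "registry.example/hello-service:1.0.0", "port": 8080}
for resource in expand_manifest(simple):
    print(json.dumps(resource, indent=2))
```

A real platform would typically generate more than these two resources; the sketch only shows how conventions can be encoded once in the generator instead of being repeated by every developer.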

Daniel Bryant (34:41): Yeah. Yeah. We did a lot of work on automatically generating YAML because you don't learn to... You don't love that YAML when you join the Kubernetes ecosystem. You soon will learn to love/hate YAML. So yeah, we are investing a lot in terms of being able to auto-generate YAML because a lot of it's quite boilerplate. But it's so key to your point about the overall definition. If you can codify everything, as an engineer, what more do you want? Do you know what I mean? There's no room for... particularly with declarative specification. I assume my application is this. It's so understandable as an engineer. Whereas I remember I used to read a lot of scripts, it's very imperative, and something you'd done at the top of the script you'd unwind later on. Whereas with a declarative stance, it's like I want this running. Much easier to understand. So I think that power in the YAML, as much as we have to generate a lot of the YAML sometimes, it is all there in terms of this is the definition of our platform or this is the definition of our application.

Bo Daley (35:36): That's right. It's very, very powerful, but it takes time to absorb all of those pieces. A new developer is just going to take some time to really grasp what all of the different available levers are that they can pull. Some much simpler abstractions, which codify what we consider to be the best practice or at least the conventions that we decided to adopt, I think are extremely helpful.

Daniel Bryant (35:58): That's been fantastic. Well said. The abstractions, I'm going to have to parse that more. I think that's super deep insight there. And those trade-offs I've heard you talk about a lot, I think is fantastic. So thank you so much, Bo. If folks want to get in touch with you, are you on Twitter, LinkedIn, email? What's the best way for folks to reach out if they've got the odd question that you might be able to help with?

Bo Daley (36:16): Yeah. I don't really use the internet too much. I think it's more of a passing fad at this point, but you can probably find me on LinkedIn, BoDaley@Zipcar.

Daniel Bryant (36:23): Super. Well, thank you very much for your time, Bo.

Bo Daley (36:24): Oh, thanks. That was an interesting discussion. Thank you.
