LIVIN' ON THE EDGE PODCAST

Developer Control Planes: A (Google) Developer's Point of View

Ambassador Labs · S2E03: Cheryl Hung on Google DevEx, Cloud Native Development, and Infrastructure Challenges

About

Startup companies in the cloud-native space continue experimenting at varying levels of maturity with the "you build it, you own it" idea. For developers coming from larger-scale environments in which they don't need to think about platform and operations considerations, there are lessons to be learned from the upsides and downsides of full code-ship-run developer ownership.

Episode Guests

Cheryl Hung

Former software developer at Google

Cheryl Hung, former software developer at Google and, more recently, VP Ecosystem at the Cloud Native Computing Foundation (CNCF), shared some of these tradeoffs when she chatted recently with Ambassador's Director of DevRel, Daniel Bryant. In addition, their wide-ranging discussion touched on the developer experience, the challenges of infrastructure, and the future of cloud-native development.

Notable themes emerged during the conversation:

The pure developer - upsides and downsides: While many developers just want to code and not worry about the challenges of infrastructure -- and companies like Google make the developer experience seamless and easy -- there is a twofold tradeoff. First, the developer never needs to learn how it works under the hood, which isn't helpful if a developer goes to work in a company that insists on developer ownership and autonomy. Second, the Google-like experience, while convenient, shields the developer from understanding the complexity of shipping and running their code. This can be positive, keeping the developer focused. At the same time, it removes the responsibility for considerations like provisioning resources, which would be valuable knowledge for full-ownership developers.

"Infrastructure is really, really hard": Infrastructure can be unreliable; it fails; it is unpredictable. "Compared to software that runs pretty much the same way every time, infrastructure is really, really hard." Containers and the cloud-native paradigm have simplified this to some extent, but there are still challenges exposed to developers.

Infrastructure and standardization are unpredictable and ever-changing: Even ten years ago, there was no consensus that a universal infrastructure would emerge in the cloud-native space. Now Kubernetes has become a "standard-ish" infrastructure layer. The fast-moving nature of software development, and the cloud-native space more generally, makes it impossible to predict what technologies will dominate, and how the de facto standard of today may be different in just a couple of years.

Providing a centralized source of truth creates a good developer experience: Tools like Backstage or other developer control planes lessen the learning curve and provide a clarity of experience for developers without limiting their ability to seek out and learn platform tools beyond that portal. That is, developers can get access to a dashboard and 95% of what they need is centralized and actionable from the UI. Modular control planes enable enough developer autonomy to do what they need to do and "break the glass" to escape and move beyond that environment if needed, learning platform tools, working from the command line, or requesting specific functionality from the platform teams, when and if needed.

No unified developer experience, but self-service is an attribute developer experience should share: There's no one-size-fits-all developer experience, as shown by the variance between developers at established banks and startups. However, a growing consensus across different businesses is that providing application developers with more self-service management improves the process. This suggests that platform and SRE teams should focus on creating the right abstraction layers to empower developers, irrespective of their level of ownership in the process. Check out the full conversation between Daniel and Cheryl below.

Transcript

Daniel (00:01): Hello and welcome to the Ambassador Labs Podcast, where we explore all things cloud native platforms, developer control planes, and developer experience. I'm your host, Daniel Bryant, director of Dev Rel here at Ambassador Labs. And today, I had the pleasure of sitting down with Cheryl Hung, whom I'm sure many of you know from her great work at the CNCF and within the tech community at large. Join us for a fantastic discussion covering topics such as the Google development process and tooling, how the cloud native developer experience and control planes are evolving, and why dealing with infrastructure is still very challenging.

And remember, if you want to dive deeper into the motivation for and the benefits of a developer control plane, or are new to Kubernetes and want to learn more in our free summer of Kubernetes series of events, please visit getambassador.io to get started.

So welcome, Cheryl, many thanks for joining us today. Could you briefly introduce yourself and give a bit of your background please?

Cheryl (00:47): Hey, Daniel, really great to be here, thank you for inviting me. So yeah, my name is Cheryl Hung, I'm the VP of ecosystem at CNCF, the Cloud Native Computing Foundation, which is the home of Kubernetes, Prometheus, Envoy, lots and lots of other open source projects. In fact, I think we've just about hit a hundred open source projects, which is something to celebrate. But yeah, my background is as a software engineer, I was at Google for quite a few years, building C++, Google Maps, backend features, and then I moved into infrastructure, developer advocacy, and that's how I found myself at CNCF.

Daniel (01:28): Awesome stuff, Cheryl. So, you and I have known each other for a few years through the London tech scene and other things, the likes of the meetup scene, CNCF of course. I remember chatting to you a few years ago now about your time at Google. And I was always super interested in the developer experience because what goes on at Google is not public to everyone. So I wonder, could you briefly explain the dev tool chain you had at the time, when you were writing code and then obviously deploying it and releasing it, what was the experience like during your time at Google?

Cheryl (01:57): Yeah, this is actually a really, really interesting topic because I started at Google in 2010, so you could say that I'd been thinking in that container, cloud native mode for about 10 years at this point. And I was there for five years, so obviously the experience also changed over the course of that five years.

And I actually remember the next gig I went to after Google, they asked me a similar question, it was a storage company. They asked me, oh, how does Google do storage? And I was a bit like, I don't know because I never had to worry about it because it was all a solved problem at Google.

So, as a developer inside Google, the experience is super, super nice to be honest. For the most part, you could actually forget about the fact that you have to run infrastructure, that there is any physical infrastructure underneath it. For instance, you don't know what machines you're running on, you know roughly what data center and what region it's in, but you don't have any further control over that.

And things like storage, things like security, things that are quite big problems at the moment still, I would say, cloud native, Google's team solved those 10 years ago. So as a developer, I was just like, oh, okay, I never have to worry about data loss or problems like that. Yeah, because it was literally a solved problem. Layers upon layers of libraries that I could use and all of those solved issues with reliability and with consistency and all of these things.

However, there's a flip side to this. It's not all roses. The negative side of this is, as a developer you don't really spend a lot of time understanding. You don't need to spend a lot of time understanding how this works. And it's very, very easy to start cargo culting bits of, well, not YAML, because it was not YAML inside Google, but equivalent. Bits of YAML around because you're like, ah, this looks like it does more or less the right thing. I will just copy it from some other directory. And as long as it does more or less the right thing, then I'm happy. I never have to touch it again.

So you did see a lot of things like people just massively over provisioning because it has no impact on the developer, whether you want ... However many gigabytes of whatever you request, it's just a number.

Daniel (04:42): You're not paying for it as a developer?

Cheryl (04:42): You're not paying for it. Yeah, you don't get any ... Exactly. The infrastructure teams are not going to come back to you and say, cut back on what you're using. And you don't have that pressure that you do ... That a developer today, if they're using a cloud provider, you would get a bill at the end of the day.

Daniel (05:02): Good point.

Cheryl (05:02): So you would have that exposed to you. But inside Google, because it was all Google infrastructure, all managed by Google, it was just like, whatever. Resources are free and infinite. And you treat them as free and infinite.

So yeah, I would say the developer experience in terms of infrastructure, that's how it played out. Very, very convenient, very easy, because there was not a lot that you had to think about.

But on the flip side of it, because you didn't have to think about it, it was easy to never really learn the details of how things worked. Not be careful about resources.

Daniel (05:42): I can imagine it's almost like, what they call it in finance, a moral hazard. In terms of, if you're not responsible for the consequences, you're like, whatever. And it's much safer to over-provision than under-provision and cause a problem.

Cheryl (05:52): Yes.

Daniel (05:53): But obviously there's a cost associated with that. So yeah, that's super interesting, Cheryl. It definitely means you can focus more on the actual problem at hand, your requirements, but it does create that strange disconnection with the platform.

Cheryl (06:07): Yes, exactly, exactly. And I think much like, there was a saying inside Google that I think applies to Kubernetes today as well, or cloud native. Which is that, it takes six months to run one of something and then minutes, no time at all to run 10,000 of it. Because the learning curve for Borg and for Kubernetes is so steep.

Daniel (06:39): Good point.

Cheryl (06:40): You do actually have to understand a lot of concepts before you can get to the point of deploying a single application. But then once you've got that single application up and running, scaling for them is ...

Daniel (06:52): Yeah, a simple line, like a YAML config these days. Replica count. Super. Very interesting.
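As a rough illustration of the point about replica counts, here is a minimal Python sketch, not anything shown in the episode. It models a Deployment-style manifest as a plain dictionary and shows that going from one instance to 10,000 touches a single field. The field names follow the Kubernetes `apps/v1` Deployment schema; the application name, the image, and the helper functions `scale` and `changed_paths` are hypothetical and exist only for this example.

```python
import copy

# Hypothetical, minimal Deployment-style manifest expressed as a Python dict.
# Field names follow the Kubernetes apps/v1 Deployment schema; the app name
# and image are made up for illustration.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-app"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "example-app"}},
        "template": {
            "metadata": {"labels": {"app": "example-app"}},
            "spec": {
                "containers": [
                    {"name": "example-app", "image": "example.com/example-app:1.0"}
                ]
            },
        },
    },
}


def scale(manifest, replicas):
    """Return a copy of the manifest with only the replica count changed."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    scaled = copy.deepcopy(manifest)
    scaled["spec"]["replicas"] = replicas
    return scaled


def changed_paths(before, after, prefix=""):
    """List the dotted paths whose values differ between two nested dicts."""
    paths = []
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}.{key}" if prefix else key
        old, new = before.get(key), after.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested mappings to find the exact field that moved.
            paths.extend(changed_paths(old, new, path))
        elif old != new:
            paths.append(path)
    return paths


scaled = scale(deployment, 10_000)
print(changed_paths(deployment, scaled))  # prints ['spec.replicas']
```

Everything else in the manifest (labels, selectors, container spec) is the part that takes time to learn; once it exists, the scaling change is the one-line diff the sketch reports.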

What was the actual developer experience like around say the coding, Cheryl? I presume you pulled the code from a Git like repo and then developing on a local machine and then CI and CD. Is that quite standard?

Cheryl (07:15): Yeah, it wasn't using Git, we were using Perforce. Well, we used Perforce for a while and then Google rebuilt everything internally to be Perforce compatible. As Google does, for sure, rebuild everything. But for better scalability basically.

The biggest thing that I remember changing about the coding experience over the five years is that at the beginning of my time there when I first joined, it was as you described. You would pull down some code locally, you would build it, you would run it, you would make changes, blah, blah, blah. By the end of my time there, so 2015, I was coding exclusively in the browser.

Daniel (08:00): Interesting. I've heard that from other friends.

Cheryl (08:02): Yeah. Yeah. And there was this project called CitC, which is coding in the cloud. And it was all custom built for Google's infrastructure, all the testing, all the CI/CD, everything was built into this one application. And of course being a cloud service, it was nice because you could go anywhere and you could move from computer to computer, and you didn't have to worry about taking your local state with you. That's something that I haven't actually seen very much of. And I don't know if you've seen ...

Daniel (08:39): Now you mention this, this was five or so years ago, I've chatted to Kelsey Hightower on the podcast last year, and he mentioned the same experience. And I was blown away by hearing about this.

Because AWS acquired I think Cloud9 or something like that. AWS have got an offering in this space. And there's an Eclipse project, the name is escaping me at the moment, but Eclipse Che have an online browser thing. But it's pretty niche from what I understand.

But when I hear the likes of yourself and Kelsey and Google in general using this and you all pretty much enjoyed using it. Why is it not so popular in the quote unquote real world? Do you know what I mean?

Cheryl (09:17): I think part of the problem is that inside Google, you still have a very similar understanding of infrastructure. Yes, there's a lot of teams, but you're still using more or less the same infrastructure and the same tools at the end of the day. Whereas externally, every company is going to have a different way of doing things. So maybe it's harder to build one browser experience that makes sense for everyone.

Daniel (09:43): That is interesting. And even like you structure your code, Cheryl. I've heard the Google and the monorepo, all that kind of stuff, the famous monorepo in Google, even that makes a bunch of decisions for you. Compared to like, I've worked at companies that had lots of repos, different structures, these kind of things.

Cheryl (10:00): Yeah, exactly. Yeah, I was so used to working in a single gigantic repo. And yeah, it means that you can make certain assumptions about how things are structured in a browser. So yeah, by the end I was a hundred percent working browser-based.

Daniel (10:20): And all the CI and CD was kicked off from that, you've committed in the browser, so to speak. And then behind the scenes it would do its thing.

Cheryl (10:25): Exactly. Exactly, yeah.

Daniel (10:27): Pretty nice. And could you easily check up on that? Was there a UI where you say, oh, this is my job, it's building, these tests have failed, that kind of thing?

Cheryl (10:35): Yeah. Yeah, yeah, yeah. Some of that was built into that particular CitC project, some of it was a separate place that you would go if you wanted more information. But that is something that, yeah, I don't feel like I've seen it very much. I've not seen it widely adopted outside of Google. But as an experience from a developer's point of view, it was brilliant.

Daniel (11:00): It's amazing, it does sound amazing.

Cheryl (11:01): Yeah, it was just like, yeah. But this is what I mean again by that trade-off between, it's so convenient, but then you never actually have to learn anything in detail, because to you it's just a button on a webpage. So yeah, pros and cons.

Daniel (11:18): That is interesting. And what's your personal thoughts around those levels of trade-offs. Because I definitely chat to engineers who just want to code business functionality. And I respect that. They're like, I know Node, I know Go, I know Java, I just want to code stuff. And I want a Heroku like experience, cloud foundry like experience, I want to just hand off my code to someone, something, and it runs. Do you know what I mean?

But then I keep hearing you talk about, there's a danger in that, you lose touch with how things are going to run. So what's your personal thoughts, as an engineer, how do you balance that personally?

Cheryl (11:52): Even though I've been working in infrastructure for a few years now, I think I'm still definitely biased towards the application developer perspective. I've got things that I need to build, as long as they work, I don't ... and as long as they keep running, I don't really care that much about things.

Cheryl (12:13): Actually I remember on a podcast that I did a few months ago, they asked me, oh, what's your controversial take on infrastructure? And my answer was, I hate infrastructure.

Daniel (12:26): Brilliant.

Cheryl (12:27): I don't literally hate it. But infrastructure is hard, it's unreliable, it fails, it's unpredictable. Compared to software that you just write in Go or whatever, your language of choice. And it always runs pretty much the same way every time. Infrastructure is really, really hard.

Daniel (12:49): To keep going.

Cheryl (12:49): And the reason that I like containers, and everybody likes the paradigm of cloud native, is it makes it easier. It allows you to pretend that you've got a cluster that can scale infinitely, that has no problems, that never goes down. And I actually think these are really, really great benefits. So my own personal take on it is that, yeah, I'm like, let the application developer focus on what they're good at, what they can do. And let the infrastructure teams focus on what they can do. And that's enough.

Daniel (13:26): I like it, Cheryl. I like it. And I definitely think, from my time at the various roles I've done, that separation of concerns is really useful at times. Knowing these are the boundaries of my role because it's like, there's a notion of full stack engineer even encompasses sometimes ops these days, you're writing Terraform code or whatever. And I imagine that can be super stressful. Do you know what I mean? Suddenly one minute you're writing Terraform, the next minute you're writing some Go code. And as an engineer, learning all that stuff, and to your point, all the things a little bit below it, is a recipe for burnout almost there.

Cheryl (14:01): Yeah. I'm also a little bit suspicious of the full stack designation because of that reason. It's like, okay, in a startup where you have two developers, they have to do everything. Fine. You're going to have to learn it.

But there is so much to learn in infrastructure. You can spend your entire career trying to figure out the best way to do monitoring or the best way to do storage or any of these topics. So I don't think it's ever realistic to say, okay, just developers and dev ops are now the same thing. And now it's just up to developers to learn everything.

Daniel (14:41): I hear you. Yeah, totally. Do you have any thoughts around the interface or the control plane or the APIs that would sit between dev and ops? Because we mentioned Heroku and Cloud Foundry. I grew up, I did a bunch of Ruby on Rails and we used Heroku and it's magical, git push heroku, job done, it was deployed.

I was curious, with your experience at Google and your experience up until now as well, have you got any preferences for the APIs and the UX of the thing that you'd use as a developer?

Cheryl (15:12): Yeah, so one of the things that I do at CNCF is I lead the end-user community, it's a group of 150 plus companies who are end users of cloud native. So, retail companies, banks, technology companies like Reddit that are going to be using cloud native, but they're not selling cloud native services.

So I've seen a lot of different approaches to this question of what is the right level of abstraction that you want to hand over between developers and dev ops or the platform. And there's two extremes to this. One is give everyone kubectl and give everyone all access to everything, and just let them do whatever they want.

The other extreme is hide all levels of abstraction, hide everything away in fully abstract, build a custom abstraction within your own organization. And never let the developers access anything outside of that. And I think the latter is problematic to be honest.

Daniel (16:29): Interesting, interesting.

Cheryl (16:31): I've seen a couple of companies go down this route and then a year or 18 months later go, it's way too much work for us to keep up with the external pace of Kubernetes and the change.

Daniel (16:42): Interesting, yeah, yeah, of course.

Cheryl (16:44): And people can't use documentation, they can't use external forums. We have to rebuild all the training materials because they can't use any of this external work. So, it's just too much to try and maintain your own complete abstractions. But then the other side of this, handing raw kubectl over, is uncomfortable--

Daniel (17:15): Danger.

Cheryl (17:20): Yeah, exactly. Exactly. Yeah, exactly. Yeah. You said it, dangerous.

Daniel (17:23): I've been there. I've literally SSH-ed into prod boxes and blown stuff up. I'm sure you've done something similar back before kubectl was a thing.

Cheryl (17:28): For sure.

Daniel (17:28): And I often think now, kubectl is analogous to SSH in some cases. You can just jump in, you can exec into pods, whatever. You can do a lot of good and you can do a lot of damage. Even with good intentions, you can do a lot of damage.

Cheryl (17:41): Exactly, exactly. And it's not easy to figure out, to put the proper guard rails on things. So, one thing that I have seen which I think is pretty exciting is Backstage, which is this developer portal from Spotify.

Daniel (18:00): Yeah, loving it, loving it, yeah.

Cheryl (18:00): And the nice thing, well, a couple of nice things, obviously it's a UI, it's a set of dashboards. So from a developer's point of view, they don't need to go and read the manual on how kubectl commands work. Probably 95% of the things that they ever need to do, they can do it from within a dashboard. So, lessens the learning curve and it makes it a much nicer experience for the developers. But then if they really need access to anything beyond that, then it's up to them to learn the platform tools and do the correct thing using command line.

Daniel (18:47): They pop the hood, so to speak. Or break glass, isn't it, as in if you're doing stuff that is the norm, it should be all via the UI. But if you really need to break the glass, you can escape and get down to the lower level.

Cheryl (19:00): Exactly, that's a really good metaphor for it.

Daniel (19:04): I've stolen it from someone, yeah, I've not made that up. But I've definitely heard the break glass metaphor before, I was like, I like that.

Cheryl (19:10): Yeah. No, that's exactly the point. Most of the time you don't need to be learning the depths of everything to do the basic stuff. For that, give someone a UI, let them click around. Yeah, you can give them graphs and nice things to look at, but if they really need it, then yeah, give them the actual raw tools so that they can do whatever they need to.

Daniel (19:34): I like that. I like that. And you're not the first person to mention Backstage. I think Kasper at Lunar, they are using Backstage. And he said it's fantastic for onboarding, all the points you made there, Cheryl, in terms of single pane of glass, place to go to when you need to look at documentation. So I definitely think the future is being probably driven by Backstage at the moment in that space. But I think there's a lot of other interesting tools, and we're definitely, at Ambassador Labs, we're looking in that space too.

And I wonder, how do you think that will alter the relationship between developers and platform? Something you hinted at earlier on. Is it going to be that the platform team create all the components of this Backstage-like thing, and then developers just consume that? Or maybe make requirement requests, the platform team saying, hey, I really need this dashboard, I really need this way of interacting with Kubernetes or something like that?

Cheryl (20:24): Yeah, I think that is a reasonable way to go about it. I will say at Google, it was extremely, extremely rare to ever have to interface with the platform team.

Daniel (20:34): Really?

Cheryl (20:34): The platform team provided a lot of these tools. Backstage, it's got a plugin architecture as well, so I could imagine platform teams building in the correct plugins that they need to use. And then saying, 99% of the time just use this. You don't need to be talking to us as individuals to request stuff.

Cheryl (20:58): About the only time that at Google I would talk to the ... Okay. So, there were two times when I really interacted most with the platform. One was when you were launching a new product.

Daniel (21:18): Makes sense.

Cheryl (21:19): And then you have to set up SRE rotations and there's more people involved. And then the second time, there was some feature that I really, really wanted in the depths of some, I can't even remember which contract it was now. But the way that I tried to do that was, I was a C++ developer, this was written in Go. So I sent them a very bad patch with basically Go written in a C++ idiomatic style and said, this is what I want, please take this and rewrite it in a proper Go-like format. Because I hadn't actually properly studied up on Go at that point. I was like, ah, this looks more or less right. And then I just handed it over to them and said, I've expressed what I want, I've given you the tests.

Daniel (22:15): Nice, nice.

Cheryl (22:15): Please go and implement this, implement this in a proper way.

Daniel (22:18): Idiomatic way. Yeah, yeah.

Cheryl (22:21): Yeah. Idiomatic and one that makes sense with the rest of the products and everything else. But yeah, every other time, I never had to interface, I never had to talk to a person to get what I wanted from the Borg infrastructure.

Daniel (22:40): That's super interesting, Cheryl. And do you think that was a sign of the evolution of the platform? Do you reckon, you're getting to that stage, some folks may have been doing that? Or was that always Google's intention, I guess, with the platform to get that abstraction level so good that you don't need to be chatting to each other?

Cheryl (23:00): Yeah. I think there's actually a lot of, I don't know if it was the intention from the beginning, but certainly over time, the more self service things you can provide from a platform to the application developers the better. It saves you time on both sides. You can be more constrained about requests and what you need. And Google of course had an army, a veritable army of developers. So we would run into every single obscure use case.

Daniel (23:33): That's a good point. Yeah, that is a good point. You very quickly learn, whereas an average organization might take years to bump into all these things.

Cheryl (23:42): Yeah, exactly. Exactly. If you are a platform person and you're getting the same kind of requests once a week, it's worth it to you to build that out.

Daniel (23:53): That's a super interesting observation. Because that comes back to your early comment around the infrastructure being somewhat homogenized at Google. Whereas if we look at the CNCF world, well, just the cloud native world let's say, there is some homogenization, Kubernetes, maybe Istio or something. But there's a whole bunch of infrastructure, Amazon versus Azure versus GCP, that I wonder, we can never get to that level of homogenization, most of us, that Google have. Is that going to be a limiting factor?

Cheryl (24:26): Limiting in what sense?

Daniel (24:29): In that all the good stuff I'm hearing you talk about in the Google world, is somewhat predicated on it being homogenized at the infrastructure level. And I'm definitely paraphrasing I think your earlier take on this, in that because most organizations are not going to have that level of homogeneity because they've got infrastructure from different eras. They've done maybe some mergers or some acquisitions, and they've brought in a new cloud or something. Do you know what I mean? It's not such a controlled environment as Google.

Cheryl (25:08): Kind of. So yes, I agree that Google was more homogenous than most companies can be. But underneath the hood, there was actually a lot of ... Google's infrastructure was also built out over tens of years as well.

Daniel (25:23): Good comment, yeah, yeah.

Cheryl (25:25): People were using different versions of things. So there was a little bit more different than it seemed. But on the whole ... I think if we had been having this conversation 10 years ago, we would have said, there's no way that one universal layer of infrastructure will emerge because-

Daniel (25:44): Good point.

Cheryl (25:44): We're using Terraform, people are using bash scripts, people are using whatever, duct tape and glue and crossed fingers. So, the emergence of Kubernetes and the-

Daniel (25:56): It's a really good point.

Cheryl (25:59): And the standard-ish layer is already quite a big leap forward. Would you agree with that?

Daniel (26:06): Yeah, 100%. That's a really good observation, Cheryl. I a hundred percent agree with that in that ... Yeah, and to that point, is there a future layer that's waiting to be discovered? I've had a few folks talk about this. I thought Knative for a while was super interesting. And there's loads of other spaces. Kubernetes is still quite raw, isn't it, in terms of, like to your point, you have to learn a lot of things, pod, services, deployments, these kind of things.

Daniel (26:30): You make a really good point. So, we could be having this conversation in five years time. And there'll be a completely different fabric that has become ... Even serverless stuff, like Knative and other offerings, they are providing a different abstraction on top of Kubernetes and other things.

Cheryl (26:43): Yeah, yeah, yeah, that's true. I think the developer experience is always going to be different from company to company. That's where you want ... A startup and a 200 year old bank, they're not going to have the same experience.

Daniel (27:00): Interesting. That's a good point. Yeah.

Cheryl (27:04): But the infrastructure below that, I feel there is something that is a little bit more opinionated maybe than Kubernetes, that could get wider adoption. At one point, I also thought Serverless might be that because it seems like a nice paradigm. But I would say Serverless has not really taken over I would say.

Daniel (27:26): It's funny, I think many folks I've chatted to, I chatted to Sam Newman a while back, he said exactly the same as you. It shows promise, but it's not quite there yet in terms of developer experience. I chatted to Gareth Rushgrove and he was like, you go from Serverless and writing lots of application programming code to writing lots of config code. Your application is pretty basic, a few lines of Node, then you're writing all this Terraform code, or all this CloudFormation code or whatever. So he was like, you're swapping complexities around, which I thought was super interesting and challenging.

Cheryl (27:57): That is interesting, that is something that ... I wonder if that is one of the reasons that Serverless hasn't taken off to a greater extent. Because as we were talking about earlier, developers often want to be like, let me just focus on the code. I don't actually want to learn all of the glue code that holds it together. Whereas yeah, Serverless requires you, you have to put more emphasis on the glue code, then it's not going to be as attractive platform for developers.

Daniel (28:29): I have wondered that, Cheryl. Often, the way Serverless runs, you're forced, maybe we're coming full circle here, but you're forced to write to a certain API, in that you're dealing with events arriving. Do you know what I mean? I remember even back in my early Java days, I was dealing with EJBs, Enterprise JavaBeans, and we had to code to very specific interfaces. And I was like, this is so restrictive. And now we've come full circle. When I fired up Lambda, obviously a year or two ago when it was newish, I was like, hang on, this interface is really restrictive. But you have to code to that because that's the way. And then I had to write all this plugin glue code, to your point, I was like glue config, I was like, hang on, this is back to the EJB days in 2002.

Cheryl (29:09): True, yeah. Yeah, no, it's an interesting thought. I think Serverless is ... I think it's a really nice paradigm, but I think, yeah, it's going to stay niche for the foreseeable.

Daniel (29:21): Interesting thought. These are great thoughts on the various experiences you've had. Before we wrap up, I'd love to get your thoughts on, we talked about infrastructure a fair bit there. The Tech Radars are super interesting. I'm curious what you've learned in the CNCF Tech Radars from chatting to the end users. Are there clear takeaways? Because you've done multi-cluster, you've done CD, I think, I forget all the different ... they're fascinating. I'll put them in the show notes, so folks can find them. But they're fascinating insights. Sometimes the tools that get recommended at Tech Radar, I'm like, really? But clearly that's what you found when chatting to all these end users. So are there any super interesting trends that jump out to you across the Tech Radars, I guess?

Cheryl (30:04): Yeah. So, the Tech Radar is an initiative that started in the middle of 2020. Yeah. Middle of pandemic time. And the goal behind the Tech Radars was never to be ... And I'm not sure you ever can be completely objective about saying-

Daniel (30:25): Agreed, agreed.

Cheryl (30:26): This is how you do infrastructure, that's it. Which is what actually I hear some people ask me all the time. Can CNCF tell us if we're doing our infrastructure right. And no, because nobody can tell you that.

Daniel (30:42): Yeah, good point.

Cheryl (30:43): So, the idea behind the Tech Radars was to give a snapshot of what is currently actually happening. What is the ground truth from the end-users. What do they recommend. What do they really think about different aspects of infrastructure.

So we've done, as you said, quite a few editions now, it comes out once a quarter. So we've looked at, yeah, multi-cluster management, database storage, secrets management, observability I think was one.

And I guess there have been some surprises in each individual one. I can't really think of any trends that happen across them so much. The topics by the way are also chosen by a team of volunteers called the Radar Team.

Daniel (31:39): Interesting.

Cheryl (31:40): So, we select five of the end users at random and ask them to pick a topic and ask them to compile the final results together. So that it's properly representative of the community.

Daniel (31:53): That's cool.

Cheryl (31:54): And then we support publishing it-

Daniel (31:57): Facilitate.

Cheryl (31:57): And facilitate it. And it's very specifically set up in that way, so that it's not CNCF's opinion, it is the real community.

Daniel (32:05): I didn't know about the topics being picked, yeah, that's cool. It makes a lot of sense as well. Because it's the voice of the community.

Cheryl (32:10): Yes, yes, a hundred percent. So yeah, I think the topics are actually quite interesting because there were some topics like observability that pretty much everybody will run into. Or they will need pretty quickly. It's a pretty standard part of setting up cloud native infrastructure. Things like secrets management, you will need, but you can get away with doing basic stuff for quite a while. And then something like multi-cluster management is much harder because-

Daniel (32:50): Yeah, interesting.

Cheryl (32:51): You probably have so many different approaches to it. And multi-cluster management, once you've got your tools set up, it's a lot of risk to move away from what you've currently got.

Daniel (33:07): Yeah, that totally makes ... you're very coupled into the solutions.

Cheryl (33:10): Exactly. Exactly. And there's not usually a big reason why you'd want to change how you're managing your clusters once you've done it for a year or two, and you've solved basic problems. So yeah, I think that's the thing that I find interesting about the topics. Some of those ... it's worth thinking about, is there something that everybody's going to need to do? Is there something that people can ignore for a while or then they run into it later?

Daniel (33:38): Yeah, that's super interesting. Because it almost goes back to our platform argument, because like you said, observability, you need feedback on an infrastructure level as much as you do on an app level. KPIs, here are my KPIs for the application. But then yeah, when I saw multi-cluster management, because I've been working with the Linkerd folks, the Buoyant folks for a while on this kind of stuff, so I'm loving the Linkerd experience. I've bumped into Katie Gamanji talking about Cluster API and all these kinds of things. So, I've bumped into it, but I haven't seen that many people actually implement it yet. Customers I'm chatting to, community members I'm chatting to, everyone's got observability to your point. Multi-cluster, maybe in the future folks are looking at it. Now that I know what you said, even the choice of topics that the community is interested in on the Tech Radars is somewhat telling itself, isn't it?

Cheryl (34:26): Exactly, exactly. The idea behind getting the community to pick the topics is, who knows what things are currently of interest. There could be thousands of things that are of interest out there. So what does the community think is currently a problem that the community itself is struggling with? So yeah, as people read the Tech Radar, which actually you can find on radar.cncf.io, you can find all our past editions. That's something that I would think about as you are reading and looking at those reports.

Daniel (35:04): Awesome. And I think that's a perfect segue to wrap up. Is there anything we haven't talked about today that you'd like to share with the audience?

Cheryl (35:11): What I've learned from my last couple of years working with the open source community, working with CNCF, fundamentally open source is not hard. In the sense that a lot of people who want to go into doing open source work tend to think, oh, I have to be a genius, I have to be able to solve everything in the world.

Daniel (35:33): The imposter syndrome.

Cheryl (35:33): Yeah. And I try and tell people, open source work is grueling, you have to constantly be there, you have to constantly show up to things. You have to do this over an extended period of time so that people get to know you. But fundamentally, what you put into it is what you're going to get out of it. The more time you put into it, the more enjoyment and satisfaction that you'll get back from what you're doing. Yeah, I guess just a little encouragement, just a little shout out to people. If you're thinking, oh, open source is this really big overwhelming thing, and I don't know where to begin. Start showing up, do that consistently over a long period of time, and yeah, you'll be great.

Daniel (36:24): That is awesome advice, Cheryl. Something like that my mentors have said to me in the past, and I started with open source Java and things like that. And when I started doing that, that was definitely an inflection point in my career. I met some amazing people, opportunities opened up. And it wasn't easy, to your point, there was a lot of running meetups and all these kinds of things. But I definitely echo that advice to all the folks I work with now. Get involved. The community is where it's at. I think at CNCF, the general cloud community just do a really good job of being welcoming to folks, bringing more folks into the community. Learning and these sorts of things.

Cheryl (36:55): Yes. Yeah, exactly. And pay it forward. I love the fact that you're telling other people and bringing in other people to do this as well, because that's how we all get better.

Daniel (37:04): A hundred percent, pay it forward, I love that, pay it forward. My mentors always said that to me, don't pay it back, pay it forward. That is great advice, Cheryl. Awesome. Well, thank you very much for your time today, Cheryl, great chatting to you.

Cheryl (37:13): Yeah, lovely chatting to you too. Thank you so much.
