LIVIN' ON THE EDGE PODCAST

Livin’ on the Edge #14: Katie Gamanji on Kubernetes Tooling DX, GitOps, and the Cluster API

About

In the fourteenth episode of the Ambassador Livin’ on the Edge podcast, Katie Gamanji, Cloud Platform Engineer at American Express and TOC member of the CNCF, discusses the building blocks of developer experience related to interacting with Kubernetes clusters. She covers the evolution of tooling from the kubectl that we all know and love, to UI-driven tooling like K9s and Octant, and ultimately to ApplicationOps and GitOps. Katie also talks about the evolving Cluster API and how this is becoming integrated with the Kubernetes developer experience. She discusses whether the tooling is currently good enough to allow engineers to treat Kubernetes clusters as resources themselves, and treat them as “cattle, not pets”.

Episode Guests

Katie Gamanji

Cloud Platform Engineer at American Express

Katie Gamanji is a Cloud Platform Engineer at American Express. She was formerly a cloud platforms engineer at Condé Nast, and contributed to the creation of a centralized, globally distributed platform, with Kubernetes as its centerpiece. In the past she has worked on maintaining and automating site delivery on OpenStack based infrastructure, which transitioned into a role that focuses on designing, deploying and evolving cloud-native infrastructure. Katie is a strong advocate for women in STEM, and a public speaker that focuses on topics that gravitate towards cloud-native tools and the Kubernetes ecosystem.

Be sure to check out the additional episodes of the "Livin' on the Edge" podcast.

Key takeaways from the podcast included:

Automation and transparency are key to creating an effective continuous delivery pipeline. Siloing based on teams or technical verticals (e.g. networking, security) leads to inefficiencies by introducing uncertainty about who is responsible, increased handoffs, and delays in verification.

Creating an application platform that is focused on self-service operation for development teams decreases friction when building, deploying, and releasing applications.

Platform and ops teams can provide expertise, support, and consultation to dev teams when required, but they should not be on the critical path for daily work.

Platform teams should recognize that developers are their customers. Platform and operations teams must empathize with and understand the needs of developers in order to build an effective developer experience.

There are typically three personas involved in building and releasing applications on Kubernetes: application developers, application operators, and infrastructure operators.

Each persona has differing requirements for interacting with a Kubernetes cluster. As the developer experience of Kubernetes cluster tooling has evolved, a series of categories and techniques have emerged: cluster CLI e.g. kubectl, including plugins and wrappers; and ApplicationOps, such as ClickOps, GitOps, and SheetOps.

Engineers are increasingly using kubectl "wrapper" tools to view and interact with a Kubernetes cluster. These tools provide a richer user interface than traditional CLI tools.

K9s provides a useful terminal-driven user interface. Octant provides a useful web-based UI onto a cluster and associated resources. Spekt8 provides an interesting view onto the networking components of a k8s cluster.

GitOps enables engineers to specify the desired state of a cluster, typically via declarative YAML configuration, and use tooling like Flux or ArgoCD to ensure that the cluster state matches this specification. All configuration is stored in version control, and this enables easy rollout and rollback, and also provides an audit trail.

The Kubernetes founding team, the community, and the CNCF have worked to define useful abstractions and APIs for all of the personas listed above. These are now becoming a standard, and moving forward the innovation will focus on techniques on top of Kubernetes.

The Cluster API is potentially allowing infrastructure operators to treat clusters as "cattle, not pets". When engineers can think of a cluster as a Kubernetes resource, this opens up new possibilities. For example, clusters can be created and managed via well-established practices like GitOps via ArgoCD or Flux.

The Cluster API can also allow engineers to easily visualise what's "underneath their clusters", such as VMs, network attached storage etc. This will help with provisioning and upgrading. For example, engineers can easily look via the Kubernetes API to see the progress of each node during a K8s update.

Transcript

Daniel Bryant (00:03):

Hello everyone. I'm Daniel Bryant and I'd like to welcome you to the Ambassador Livin' On the Edge podcast, the show that focuses on all things related to cloud-native platforms, creating effective developer workflows, and building modern APIs. Today, I'm joined by Katie Gamanji, Cloud Platform Engineer at American Express and TOC member of the CNCF. Katie has recently been writing and talking about the building blocks of developer experience for interacting with Kubernetes clusters, from the kubectl that we all know and love to other GUI driven tooling and more.

Daniel Bryant (00:31):

One of her fantastic recent blog posts discussed the use of kubectl UI wrappers, like K9s and Octant, in addition to covering cloud platform web based portal approaches using ClickOps, and also YAML driven approaches such as GitOps. I was keen to hear Katie's analysis of the strengths and weaknesses of each approach and also understand her recommendations as to what to use and when.

Daniel Bryant (00:51):

Additionally, I've learned a lot from Katie recently around the evolving cluster API and I was keen to understand how this is becoming integrated with the Kubernetes developer experience. And for example, I was keen to understand, is the tooling mature enough to allow us to treat Kubernetes clusters as cattle and not pets?

If you like what you hear today, I would definitely encourage you to pop over to our website. That's www.getambassador.io where we have a range of articles, white papers, and videos that provide more information for engineers working in the Kubernetes and cloud space. You can also find links there to our latest releases, such as the Ambassador Edge Stack, our open source Edge Stack API gateway, and also our CNCF-hosted Telepresence tool too. So hi Katie, welcome to the podcast. Thanks for joining us today.

Katie (01:31):

Hello, Daniel, happy to be here.

Daniel Bryant (01:33):

Could you briefly introduce yourself for the listeners please and share a recent career highlight?

Katie (01:37):

Yes. So my name is Katie Gamanji and I am one of the cloud platform engineers for American Express. I've joined American Express six months ago, so quite recently, and I am part of the team that aims to transform the current platform by embracing the cloud native principles and making the best use of the open source tools. In terms of my career highlights, quite recently I've been elected as one of the TOC for the CNCF. So this has been quite a new and quite grand event in my career development, I would say. Being part of the CNCF as a TOC is, I think quite a good opportunity to really influence and leverage how the CNCF landscape should be constructed. So we have the power to leverage different projects and to move them through the pipeline all the way to graduations, such as, for example, quite recently, Helm graduated or of course Kubernetes which has been graduated for quite a while now.

Daniel Bryant (02:34):

Brilliant. Brilliant. You've got so many highlights Katie, at QCon, keynoting, you were keynoting some other stuff online. You got so many to choose from, right?

Katie (02:41):

Absolutely. Yeah. I'm enjoying it though.

Daniel Bryant (02:44):

Good on you. Yeah. Brilliant. So first the traditional question in the podcast is around developer experiences and developer loops. So that capability of being able to rapidly have an idea, code, test, deploy, release, and verify. And I ask folks to share their worst developer experience, you don't have to name names, protect the guilty and the innocent, right? But can you share your worst developer experience?

Katie (03:06):

Oh, I think I've had quite a few bad experiences when it comes to deploying an application and troubleshooting an application. And I think that's why actually I'm in this industry, I want to improve all of these methods around deployment. So I think one of, I wouldn't say the worst, but quite generic was when we'd like to deploy an application through our pipelines, or a team would like to deploy their application through our pipelines. Unfortunately, we had verticals when it comes to, for example, networking and security and the CI and CD, they were all divided across different teams. It wasn't one unified manner. So what actually happens if something would, for example, fail at the networking level in the pipeline, imagine the stress of actually understanding why it failed. As an end user you don't really understand that, so you have to go through every single team. There were so many points of contact to potentially troubleshoot any bug in your application. And I think that's definitely the worst.

Katie (04:05):

Why I'm saying it's the worst, sometimes it can take days and it's very overwhelming. And I think that's definitely one of the worst DX that we can provision for any kind of engineers. And that's why, as I mentioned, I'm in this industry, in this area of technology where we can really close the gap between all of these functionalities, but at the same time, bring automation around them, bring transparency around them. So the power would be with the developers. Of course they would need to be upskilled, but they would have an understanding of how to troubleshoot and connect to the application and how to properly debug it and just reach us when they need to. So, yeah, that's my motivation, why I'm in this position. But when I started in this area, it was that, pretty much one day per team, who has the bug? What's happening? Yeah.

Daniel Bryant (04:58):

Something I've chatted to you in the past, Katie, around this notion of self service and end to end delivery. What I heard you say there is that they're really important, right? We as engineers need that. If you're doing platform and I'm a developer, I need to be able to interact enough with the platform to get my app deployed, but I also need to have the end to end responsibility. Otherwise, like you say, stuff gets lost and then you've got all these handoffs and all this communication, right?

Katie (05:20):

Yes. I completely agree with this. The focus always when we develop any infrastructure, any platform, it should always be on the developers. We've built a platform for, I think I mentioned this idea before, but we build a platform for our customers. Our customers are the developers. And we really need to tailor whatever products and tooling we introduce to their needs. It doesn't necessarily mean that we need to build it in-house. Well, sometimes it's the case, sometimes it's not, but if you get a product off the shelf, we really need to think about, what is important for our developers? Is it actually going to improve the experience throughout? Is it actually going to be beneficial, to have a better insight into how the application is deployed?

Katie (06:00):

So from that perspective, it's all about the developers. They should always be at the center when it comes to any DevOps model or topology, from my perspective. But it's definitely leveraging further rather than blocking. There should never be a situation where a developer should be blocked or will be blocked because of the new tools or the platform components. Ideally, that's pretty much the state we need to get towards.

Daniel Bryant (06:29):

Nice, Katie, nice. I think that leads nicely into what you and I were chatting about off mic, your fantastic blog post recently around the building blocks of developer experience, Kubernetes's evolution from CLI to GitOps. Yeah, I know we're only a young industry, Kubernetes has been around a few years or whatever, but you and I have both seen that evolution. It's changed rapidly. So I wanted to basically try and break down some of the ideas within the blog post. So you mentioned sort of kubectl, all the way through to plugins, wrappers, and then we can break down GitOps and stuff later on. But what do you see as the primary use cases and advantages and disadvantages of something like kubectl? Most of us, when we start with Kubernetes, the first thing we do is fire up kubectl.

Katie (07:11):

Yes. From this perspective, I feel that it is always and generally assumed that all the engineers interacting with Kubernetes are going to be fluent in operating kubectl, kubectl, I'm going to call it kubectl.

Daniel Bryant (07:22):

Call it what you like. Yeah, I know it's controversial, right?

Katie (07:30):

Let the discussion be in the comments about the right way to say it. But yeah, I think usually when you talk about Kubernetes, there are different personas we can think about. And I've identified currently maybe three groups. The first one is going to be the application developer and they focus really on the business logic. They don't necessarily need to operate with Kubernetes. However, they need to have a mindset that their application is going to be containerized. And this comes with, for example, you need to build a Docker image or some of the best practices, building the readiness and liveness check endpoints, stuff like that. They don't need to interact with Kube, but they need to understand how to build applications.

Katie (08:07):

The second persona to identify is the application operators. And these usually are the developers that interact with the cluster. And they will have an understanding of the resources in the cluster, how to push an application through our different stages, how to touch different resources. Now these two personas, the application developers and application operators, sometimes they overlap, but it's not usually the case. That's why I like to segregate them into two different areas. And the last group of users are the infrastructure operators. Now these are going to be the admins of the cluster. They understand how to deploy the cluster, how to configure it, and of course all the resources within the cluster as well.

Katie (08:47):

So all of these personas require a different level of understanding to operate with kubectl and the cluster CLI. Again, it is wrongly assumed that everyone is going to be fluent with this tool. Kubectl is actually a great tool, it's like a Swiss Army knife. You have more than 40 actions you can do with that. And this is associated with more than 70 flags. So you can really go long miles in terms of how you can build your kubectl command. However, not everyone has that level of insight and not everyone requires that level of insight. And that's why kubectl is great to start, it's a good way to explore your cluster at the beginning. However, if you would like to scale a tool such as Kubernetes to many teams or across a big organization, you might think about ways to abstract it, ways to maybe visualize it rather than doing it through a terminal. All of these areas obviously depend on the use case, but these are things to really have in mind when introducing a technology like Kube.

Daniel Bryant (09:51):

Yeah. Nice. It is. I mean, you hinted at it there, so kubectl is very easy to get started with I guess, but mastering it is quite challenging. I didn't even know it had that many flags, for example. I'm mainly like kubectl apply and kubectl get whatever. Right? And when I start debugging, it gets more in depth. How do you think folks, if they need to, how do they go about building that knowledge? Because like you said, maybe it's more the infrastructure operators persona you mentioned there. I guess, is it just a case of just playing and reading the docs and learning about all the things they can do?

Katie (10:21):

I think it's a matter of practicing and experiencing. The more advanced use cases you have for a cluster for example, the more advanced your kubectl commands are going to have to be as well. I think one of my favorite commands, so for example, in our production cluster, we wouldn't be able to be admin. So we had to impersonate an admin, for example, to modify. It's not a good example. I'm not saying you have to modify things live in the cluster, but when you're actually in an incident, for example, and you really need to get into that admin endpoint, the kubectl command would be, "Okay, you have to edit, for example, a deployment and you have this impersonation string." So you have to do it with as-group, which would be, for example, assuming into a GitHub group for admins, and then you have to assume into your user as well.

Katie (11:10):

So that itself was a very long command, just to do a kubectl edit, which would be straightforward beforehand. So things like that really make you think, there are different ways to really customize your commands for a cluster. Again, it depends really on the use case, but I think the more advanced use cases you have, the more layers, for example, if you want security, if you want just specific roles to access specific resources, then you get into things like, "Oh, we have to use impersonation for the kubectl command, how do we do that?" Things like that. This is I think a good kind of motivator to move forward and to really think if kubectl is the right way to operate all the time for all developers.
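The impersonation described here corresponds to kubectl's `--as` and `--as-group` flags. As a rough sketch only, the following Python snippet assembles such a command so its length is easy to see. The deployment name, namespace, user, and group are made-up placeholders, not values from the episode, and the helper `impersonated_edit` is hypothetical.

```python
import shlex


def impersonated_edit(resource, name, user, groups, namespace="default"):
    """Build the argument list for `kubectl edit` run under an impersonated identity."""
    argv = ["kubectl", "edit", resource, name, "--namespace", namespace, "--as", user]
    # --as-group may be repeated, once per group the impersonated user belongs to.
    for group in groups:
        argv += ["--as-group", group]
    return argv


# Placeholder values for illustration only.
cmd = impersonated_edit(
    "deployment", "web", "jane@example.com", ["platform-admins"], namespace="prod"
)
print(shlex.join(cmd))
```

Wrapping a long invocation like this in a helper, or in a kubectl plugin, is one way to keep the day-to-day command short for developers who rarely need it.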

Daniel Bryant (11:57):

Yeah. Well said, Katie, well said on that one. Because something I've played around with a bit, I think I was doing some Wireshark stuff in the cluster and playing around with sniffing network traffic. And I found this plugin I've completely forgotten the name of the plugin, I'll put it in the show notes afterwards, but the plugin was installed using something called Krew. So I did that, installed Krew and then I installed these plugins and I was like, "This is really quite cool." It kind of extended the power of kubectl. So I'm guessing, is that something you would recommend for folks looking to encapsulate some of their knowledge? Could they take some of the complicated stuff, put it in a plugin, and then expose a simpler command to the users?

Katie (12:34):

Absolutely. Yeah, absolutely. I think that's definitely something I advocate for. If you can make it simple, do it. There is no need to overcomplicate things or type, I don't know, miles of kubectl flags. So from that perspective, yes, I think plugins are quite popular within the industry and ecosystem at the moment. And some of the most widely known I think are kubectx and kubens, which allow you to change your clusters, or to change your default namespace within the cluster. As well, kubetail has been quite popular, which allows you to stream a collection of logs from a collection of pods. Usually you do get logs-

Daniel Bryant (13:12):

Oh interesting.

Katie (13:13):

Yeah. Usually you do get logs from a pod. However with tail, you're able to get logs on the deployment level. So all the pods you'll be able to see the latest output. So the plugins, they really make it easier for us to aggregate some of the commands together and to simplify the user experience. And you've mentioned Krew. Now, Krew is a plugin to install other plugins within Kubernetes. So I've mentioned these plugins and over time, there has been a few patterns discovered in the community. And there was a need to really curate an index of all of these plugins, but more importantly to distribute them in a centralized manner. And that's why Krew is actually important and that's what actually took shape. And why it's a good tool for kubectl.

Katie (14:00):

And with Krew, what it actually offers is easy install of available plugins. So currently I think there are more than 90 plugins available, with just the Krew install, for example, kubetail, bam, and you have the command. But more importantly, with Krew you don't really need to install anything else. Your plugins are going to be consumed through kubectl. So it's a way to further extend kubectl, rather than change it. It's in addition to all the actions you can do now, you can do other actions as well and you can write them. And this is where plugins are actually quite important. One thing I want to mention about the plugins as well, they can be written in any language. So whatever you want to do with Kube, write it in your favorite language and just make it a plugin and then it's going to be consumed anywhere. And if you want to distribute it to a wider community, then Krew is going to be the tool.

Daniel Bryant (14:54):

Interesting. So how does that work under the hood, Katie? Does it compose to a Docker file? Or... Because if I write in Go, if I write in Python, your laptop may not have that installed for example, how does that work?

Katie (15:06):

So with Krew itself, what it actually does, to distribute a plugin for Krew, you will require a CRD which they will actually manage. So the CRD itself, it has two areas of configuration. The first one is going to be informative. The second one is going to be on a guide how to install that plugin. So the informative part is going to have the version of the plugin, short and long descriptions, things which are going to be helpful for your user to change back to the plugin. It's quite important to mention that here you can have like help messages and so forth. If you have a more complex plugin, then you can have different flags or whatever comes in the package. And you can really give an explanation to that.

Katie (15:45):

And the second part of the CRD is the guide to install. Now, some of the plugins, of course you can write them in every single language, however they might be targeted for a specific operating system. So if you run Kube, for example, on Windows or if you run it on Linux or macOS, it doesn't matter really, but you really can choose to say, "Okay, my plugin works securely or 100% of the time on this operating system." So you can choose that. The second stage is going to be actually pointing to a zipped file of your plugin. So usually if your plugin is going to be on GitHub, it is just going to point at the release itself, which is going to be zipped. And Krew underneath the hood just does an unzip operation. It will extract the binary, you can actually point to the right binary.
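The description that is called a CRD here is, in Krew's own documentation, a YAML plugin manifest. A hedged sketch of its two parts, written as a Python dictionary rather than YAML, might look like the following. The plugin name, descriptions, and example.com URLs are invented for illustration, the helper `pick_platform` is hypothetical, and a real manifest also carries details such as a checksum for each archive that are left out here.

```python
# Hypothetical manifest, loosely shaped like a Krew plugin manifest.
MANIFEST = {
    "apiVersion": "krew.googlecontainertools.github.com/v1alpha2",
    "kind": "Plugin",
    "metadata": {"name": "hello"},
    "spec": {
        # Informative part: version and help text shown to users.
        "version": "v0.1.0",
        "shortDescription": "Print a greeting",
        "description": "A longer description, including usage hints and flags.",
        # Install part: which archive to fetch per operating system,
        # and which file inside the archive is the plugin executable.
        "platforms": [
            {
                "selector": {"matchLabels": {"os": "linux"}},
                "uri": "https://example.com/hello-linux.zip",
                "bin": "kubectl-hello",
            },
            {
                "selector": {"matchLabels": {"os": "darwin"}},
                "uri": "https://example.com/hello-darwin.zip",
                "bin": "kubectl-hello",
            },
        ],
    },
}


def pick_platform(manifest, os_name):
    """Return the first platform entry whose selector matches os_name, or None."""
    for entry in manifest["spec"]["platforms"]:
        if entry["selector"]["matchLabels"].get("os") == os_name:
            return entry
    return None


entry = pick_platform(MANIFEST, "linux")
print(entry["uri"], entry["bin"])
```

A plugin that only lists Linux and macOS entries would simply not be installable on Windows, which matches the idea of declaring where a plugin is known to work.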

Daniel Bryant (16:34):

Gotcha.

Katie (16:35):

So extract the binary. It's quite a lengthy explanation here. It's important to mention the steps. So extract the binary and the binary itself is then put on your local system, local file system. And with the plugins in kubectl, there are two requirements. The first one, the name of the file should be prefixed with kubectl-. And the second one, it should be placed in one of the binary folders within the PATH environment variable. And once it's there, kubectl by default will identify the plugin and run it. So underneath the hood, it's quite a lengthy explanation as I mentioned. I do demos about this, if that's going to be easier for the listeners to understand the process. But yeah, Krew itself is pretty much downloading your binary or executable and putting it on the file system with the right name in the right path, and kubectl is going to pick it up. That's kind of a nutshell explanation.
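The two requirements above (an executable file whose name starts with `kubectl-`, located in a directory on `PATH`) can be illustrated with a short Python sketch. This is not how kubectl itself is implemented; it only mimics the lookup, and the function name `find_kubectl_plugins` is made up for this example.

```python
import os


def find_kubectl_plugins(path_value):
    """Return {plugin name: full path} for executables named kubectl-* on a PATH string."""
    found = {}
    for directory in path_value.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        try:
            entries = sorted(os.listdir(directory))
        except OSError:
            continue  # unreadable directory, skip it
        for entry in entries:
            full = os.path.join(directory, entry)
            is_plugin = (
                entry.startswith("kubectl-")
                and os.path.isfile(full)
                and os.access(full, os.X_OK)
            )
            if is_plugin:
                # First match wins, mirroring normal PATH lookup order.
                found.setdefault(entry[len("kubectl-"):], full)
    return found


plugins = find_kubectl_plugins(os.environ.get("PATH", ""))
for name, path in sorted(plugins.items()):
    print(f"{name}: {path}")
```

Because the only contract is a file name and a location, the executable can be a compiled binary, a Python script, or a shell script.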

Daniel Bryant (17:27):

Yeah. Thanks Katie. To be honest I never thought about this. It kind of makes sense that the binary, I suddenly thought when you mentioned multiple languages, I think I've only ever seen plugins written in Go. And obviously these plugins are binary. And I was suddenly thinking, "How does Python work?" Because I've had much fun over the years with different Python versions. Right? When I was installing it. I was just curious about that.

Katie (17:46):

I'm very controversial here. My demo is actually a shell script, so I can promise controversial ideas, here we go.

Daniel Bryant (17:56):

Oh, awesome. Like bash programming, always a winner. Right?

Katie (18:00):

Let's say that.

Daniel Bryant (18:01):

Yes. Well said. Well said. I wanted to move on to wrappers next, working through your blog. When we say wrappers for interacting with Kubernetes, is it kind of this notion of wrapping kubectl? Or is it more in depth than that?

Katie (18:18):

I think there is a reason for why I call them wrappers, rather than portals and dashboards. I was really thinking because portals and dashboards, they're so different when it comes to visualizing Kubernetes resources. And all of them have different functionalities, but what it actually does underneath, I mean, even in my previous company, Condé Nast, we wanted to develop a dashboard, a point of presence for all our clusters, and we had to write it from scratch. That was our choice. But what it actually does underneath is just interacting with the API server and just gets resources. So underneath the hood, it's still a very similar way of interacting with the cluster. Usually when I'm talking about the wrappers, I refer to any tooling that provides an operational state of the cluster. And this is usually a graphical representation as well. So it's going to be a portal or it's going to be a dashboard, a terminal. There are tools which provide ways to interact with a cluster through the terminal. It's not even kubectl, it's actually a UI in the terminal. It's going very old school style, but it's quite a popular tool as well.

Katie (19:26):

So from that perspective, it's more about how can you visualize a cluster? But underneath that, what's actually happening, it's usually kubectl get commands mainly. Most of the dashboards are read only. So they just get information. And some of them go the extra mile to provision a way to delete a cluster. So a delete button or a scale button or even a port forward. Then this is something which is provisioned by Octant for example. And that's actually quite cool because through the portal you can view your application in the UI. It actually port-forwards the application, provides you with a link, and you'll be able to visualize that straight away. So instead of you actually typing kubectl port-forward and understanding what's your internal and external port, it's just the one click and you actually see your application. But that's usually the extent of operations it does. It's quite basic, but the focus is placed on a graphical representation, as mentioned before. And that's why I call them wrappers, because it's usually a wrapper around kubectl commands, a visual wrapper in a way. Yeah.

Daniel Bryant (20:34):

I like it. I definitely think for folks I work with at Datawire and the customers, certainly folks that perhaps don't want to get into the nitty gritty of Kubernetes, having a UI on top is essential, some way of interacting. Do you think the current solutions, I bumped into Octant, I was chatting to Bryan Liles a few weeks ago. And he's obviously working very hard on that tool amongst many other folks as well. Do you think tools like Octant are ready for the prime time for a typical developer?

Katie (21:04):

The answer to that actually, I think it's definitely a better place to start. I've been exploring the area quite some time now. Octant has been open sourced quite recently and I think it's got to a very stable state. It's been developed mainly to be used by the SRE team at VMware. So they have a very strong motivation to make that operational and you can actually navigate through it and so forth. But what I really like about Octant, it really provides a good way to explore a cluster. So for example, as an application developer, well application operator, I know there's Kubernetes, but I don't want to interact with Kubernetes through the CLI.

Katie (21:44):

With Octant, I have a finite amount of actions I can do in the portal. So I can go through different things. So for example, we can understand what is a deployment, what's a pod? You can understand there is a cluster and there are nodes under the cluster, and this is all because the dashboard is very well organized. Another good thing about Octant which I really like is that they show a resource and any associated resources as well. So for example, when you have a pod, it's not just the pod. Actually on top of that, you have a replica set, then you have a deployment, usually with every single pod you have a service account associated. And that is just a very basic setup. Sometimes you can have configuration maps, volumes, secrets, ingress. It can go the entire mile, but with Octant, you can see all of these in one view with arrows and everything.
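The pod, replica set, and deployment relationship mentioned here is recorded in Kubernetes through the `metadata.ownerReferences` field on each object. The Python sketch below walks that chain over a few hand-written, simplified objects. It illustrates the relationship only, makes no claim about how Octant builds its view, and all names and UIDs in it are invented.

```python
def owner_chain(obj, objects_by_uid):
    """Follow metadata.ownerReferences upward and return 'Kind/name' for each hop."""
    chain = [f"{obj['kind']}/{obj['metadata']['name']}"]
    seen = {obj["metadata"]["uid"]}
    while True:
        refs = obj["metadata"].get("ownerReferences", [])
        if not refs:
            break
        owner = objects_by_uid.get(refs[0]["uid"])
        if owner is None or owner["metadata"]["uid"] in seen:
            break  # owner not in our snapshot, or a cycle
        seen.add(owner["metadata"]["uid"])
        chain.append(f"{owner['kind']}/{owner['metadata']['name']}")
        obj = owner
    return chain


# Invented, heavily simplified objects for illustration.
deployment = {"kind": "Deployment", "metadata": {"name": "web", "uid": "d1"}}
replica_set = {
    "kind": "ReplicaSet",
    "metadata": {"name": "web-abc", "uid": "r1", "ownerReferences": [{"uid": "d1"}]},
}
pod = {
    "kind": "Pod",
    "metadata": {"name": "web-abc-123", "uid": "p1", "ownerReferences": [{"uid": "r1"}]},
}
by_uid = {o["metadata"]["uid"]: o for o in (deployment, replica_set, pod)}

print(" -> ".join(owner_chain(pod, by_uid)))
# prints: Pod/web-abc-123 -> ReplicaSet/web-abc -> Deployment/web
```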

Katie (22:35):

And it can really show... It's a very good visualizer of, "Okay, my resource is not just the pod, it's all of these components, but it's so nicely abstracted that I don't really need to think about that. It's somewhere there." Yeah. It's somewhere there, but I have the chance to customize any of them if I want to. So in terms of the tool nowadays, I think Octant is a good tool and mainly because I was talking about developer first. I really think they nailed this quite well. It's really a good tool to use by all the personas I've mentioned before. In terms of the, let's talk the cool tools. There is K9s as well, which is a terminal UI. Of course.

Katie (23:14):

Terminal UIs are amazing. What it actually leverages, it's actually a wrapper around all the kubectl commands. This is the only tool which does that at the moment, which I've found. So what it actually allows developers to do is navigate through your cluster, through different pods, by using your arrow keys. So that's a different way of interacting, quite a... I would say cool. I'm not sure if that's a good word to use, but it's a cool tool to use. And of course, there have been a lot of tools which can be installed, but which have a paid version, such as Lens for example. It's a Kubernetes IDE you would be able to install as a standalone application on any operating system.

Katie (23:54):

There have been some tools such as, for example, Spekt8. I think Spekt8 is a great tool, but it hasn't been developed for quite a while. Spekt8 usually focuses on your pod and any networking components around your pod. So it will showcase your service and your ingress. So it's a good way to understand the networking layer. However, it didn't have too much contribution in the past months. So maybe the listeners will find it useful and will decide to contribute to it. I think that's going to be a great idea.

Daniel Bryant (24:23):

Thanks Katie. There's a couple of tools I definitely haven't heard of there. So I'll bookmark them to look later on and I'll check them in the show notes. So brilliant. Switching gears a little bit onto what you called ApplicationOps and you talked about GitOps, ClickOps, and SheetOps. I'll say that very carefully. You and I were chatting about that. Could you run through what you were meaning by that? Because I actually really enjoy this bit of the blog post. And you've talked about the kubectl, you've talked about wrapping this to be more visual and so forth, which I think is really powerful. But then GitOps, ClickOps, and SheetOps, is it more around the methodologies associated with how developers interact with this config?

Katie (25:00):

Yeah, it's really about the techniques to deploy applications. Now if you look at kubectl, it's just a way to manage our resources in the cluster. When we talked about the plugins and the wrappers, it's still manual labor. You still have to type your commands or your plugins. You still have to click through. With ApplicationOps, I really refer to an area where we automate as much of this as possible. So the end goal of ApplicationOps is to automate the deployment of an application or the management of the resources in the cluster. And that's why I've identified so far three main areas, or three main techniques.

Katie (25:35):

ClickOps is something which is very widely distributed by cloud service providers. And it's just a way to deploy applications by clicking through a set of menus. It's one way, I think it's leveraged by an experience such as Heroku, for example. Just click through, your application is going to be deployed. It's a very powerful DX. As a developer, you really don't care what's happening underneath. Click through, application up and running. However, what comes with this kind of tooling is questions such as, how do you do rollbacks? How do you store your configuration? Can you redeploy any past configurations? All of these are questions which need to be thought about when using the ClickOps technique, and maybe you need to create a mechanism for any of those issues.

Katie (26:23):

And the second area, which kind of picks up some of the rollbacks and historical usage, it's a very nice transition to a technique such as GitOps. Because GitOps really thinks of the Git repository as the source of truth of how you want your application to be in the cluster. And what I really like about this technique is that the delta between my local terminal and the cluster is just one PR. So whatever I deploy within my terminal, I just PR it and then it's going to be in the cloud just straight away. It's quite a powerful model.
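The "Git as the source of truth" idea can be sketched as a comparison between what the repository declares and what is running. This is a minimal illustration only: `plan_sync`, the resource names, and the dictionary layout are hypothetical, and real GitOps tools do considerably more than this.

```python
from typing import Dict, List

Manifest = Dict[str, object]


def plan_sync(desired: Dict[str, Manifest], live: Dict[str, Manifest]) -> List[str]:
    """Return the actions needed to make the live state match the desired state.

    `desired` stands in for what is committed to Git, `live` for what is
    currently running in the cluster. Both are plain dictionaries here.
    """
    actions = []
    for name, manifest in sorted(desired.items()):
        if name not in live:
            actions.append(f"create {name}")
        elif live[name] != manifest:
            actions.append(f"update {name}")
    # Anything running that Git no longer declares gets removed.
    for name in sorted(live):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    desired = {
        "deployment/web": {"image": "web:1.1", "replicas": 3},
        "service/web": {"port": 80},
    }
    live = {
        "deployment/web": {"image": "web:1.0", "replicas": 3},
        "configmap/old": {"key": "value"},
    }
    for action in plan_sync(desired, live):
        print(action)
```

Merging a PR changes `desired`; the next sync computes and applies the difference, which is why the gap between a local change and the cluster can be a single PR.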

Katie (26:56):

But as I mentioned, some of the problems with ClickOps that we've encountered, with GitOps we have a versioned state of the cluster, and that's a very important thing. We can do rollbacks. We can do roll forwards. We always have the state of our cluster in the history of our commits. And that's really a very different way to think about how you manage the state of a cluster. Before, yes, it should be versioned, but now it's versioned in a more native way. It's coming out of the box with GitHub and GitLab. It's there. So it's a good way for us to further leverage how we do rollbacks and store our cluster state.
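Because every cluster state lives in the commit history, a rollback is just another commit that restores an earlier state. The sketch below assumes a hypothetical `Commit` record and in-memory history rather than a real Git repository.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Commit:
    sha: str                # hypothetical short identifier
    state: Dict[str, str]   # resource name -> image tag


def roll_back(history: List[Commit], target_sha: str) -> List[Commit]:
    """Return a new history whose latest commit restores the target's state.

    The old commits are kept, similar in spirit to `git revert`, so rolling
    forward again stays possible.
    """
    target = next((c for c in history if c.sha == target_sha), None)
    if target is None:
        raise ValueError(f"unknown commit: {target_sha}")
    revert = Commit(sha=f"revert-to-{target_sha}", state=dict(target.state))
    return history + [revert]


if __name__ == "__main__":
    history = [
        Commit("a1", {"deployment/web": "web:1.0"}),
        Commit("b2", {"deployment/web": "web:1.1"}),
    ]
    rolled_back = roll_back(history, "a1")
    print(rolled_back[-1].state)
```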

Katie (27:37):

Yeah. And I think the last thing that you've mentioned was SheetOps, it's actually referring to Google spreadsheets. It's a new initiative, which has been developed earlier this year. I would like to say that all the puns are intended when I'm talking about it. However, if you find it to be a tool worth moving forward, please contribute. So SheetOps is a way to control your cluster's resources through using Excel. Their mission is actually to replace YAML with Excel spreadsheets.

Katie (28:07):

So here is the main question, I've heard many developers complaining about YAML for example, it being quite bothersome and so forth. Now you have a chance to replace all the YAML with Excel. So again, they mention of course, it's a fun kind of project and it was developed around that. But what it really outlines is the fact that we can integrate any tooling with Kubernetes to manage the cluster state. And that's the underlying idea we need to see. Kubernetes is really a good way to diversify how you can and would like to deploy to your cluster. So we've seen different techniques. We've seen different abstraction layers. But what really needs to be leveraged is the configuration scheme, which is declarative in Kube. And we can really leverage that further to automate any of the processes that we want.
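The point that any tool can drive a declarative API can be illustrated by turning spreadsheet-style rows into Deployment manifests. This is a sketch under assumptions: the column names (`name`, `image`, `replicas`) and the sample rows are invented for illustration and are not how the SheetOps project itself works.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical sheet exported as CSV: one row per application.
SHEET = """name,image,replicas
web,nginx:1.25,3
api,example/api:0.1,2
"""


def row_to_deployment(row: Dict[str, str]) -> Dict[str, object]:
    """Build a minimal apps/v1 Deployment manifest from one sheet row."""
    labels = {"app": row["name"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": row["name"], "labels": dict(labels)},
        "spec": {
            "replicas": int(row["replicas"]),
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{"name": row["name"], "image": row["image"]}]
                },
            },
        },
    }


def sheet_to_manifests(text: str) -> List[Dict[str, object]]:
    """Parse the CSV text and return one manifest per row."""
    return [row_to_deployment(row) for row in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for manifest in sheet_to_manifests(SHEET):
        print(json.dumps(manifest, indent=2))
```

The sheet is only a different front end; what reaches the cluster is still the same declarative description.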

Daniel Bryant (29:02):

Yeah. Something you and I were chatting about sort of just casually before we began, Katie, you were talking a lot about abstractions, getting the abstractions right. And then putting the APIs on. I think that's something that Kubernetes has done really well and probably the CNCF as well, the whole community. I think that's probably, I'd like to get your opinion here, but do you think that's one of the driving factors of the success of Kubernetes and the CNCF?

Katie (29:22):

I think so. I think, we're talking about the abstraction layer. Introducing an abstraction layer over Kubernetes has been extremely easy and effortless over time. Of course, it wasn't like that from the beginning, but moving forward, we really could see a system where all the components would be talking to each other transparently. We would not have anything hidden underneath the hood and we could follow the request, which goes, for example to the API server, from the API server to the scheduler, back, then it goes to the controller manager. So we really can understand the full route of any request we have. And from that, because we have transparency, we can introduce abstractions because if you know what's happening underneath, you can abstract and you can pick and choose what's important for you to configure.

Daniel Bryant (30:10):

Interesting. Yes.

Katie (30:11):

So from that perspective, I think it's been quite a powerful tool throughout. And I think there have been many ideas around Kubernetes just becoming a standard when it comes to platforms. I think there was an idea, I think Kelsey Hightower mentioned it, that we're not going to think about Kubernetes anymore moving forward. Even now, we're thinking about techniques on top of Kubernetes. So which means that Kube itself, it's actually becoming something as standard as an operating system, for example. It's there, it's stable, it runs, we just need to care about the applications or the layers on top of it. So from that perspective, I think it's been quite a successful project.

Daniel Bryant (30:49):

Yeah. Good insight there, Katie. Something actually that again we were chatting about earlier on, was the evolution of the cluster API, for example. I know you've done a lot of work in this space and that sort of, you mentioned it's treating the cluster as a resource, and as we're now pulling up from the traditionally cattle versus pets was a container focus, right? One container. We now treat that as cattle, rather than pets. I'm guessing with the abstraction moving up to the cluster now, is the goal or one of the goals maybe treating those clusters as cattle effectively?

Katie (31:21):

Yes. I really hope that's going to be the case. I think now we've been treating, or even at the moment, many teams are treating their clusters as precious things. They can be incorporated in like, Lord of the Rings or something like that.

Daniel Bryant (31:33):

Yeah. Jump in there, Star Trek, right? I love my Star Trek.

Katie (31:38):

So at the moment, this is the state. When you have something which is precious, you really think about, how can you fail over? How can you make sure that the disaster recovery process is defined from scratch? All of these things are important, but the idea which I think the community is moving towards now is, how can we remove that importance of a cluster? How can we make sure that even if we lose a cluster, we're going to be up and running with minimal effort? And that's why I think there has been a lot of movement towards making the cluster deployment easier, making it more... Well, when talking about abstraction, making it possible across different cloud providers. And that is where cluster API has been quite successful as well. And that's why I'm actually a very big fan of the tool. I think it actually opens very good opportunities.

Katie (32:28):

So with cluster API, we have one interface to deploy our clusters to many cloud providers by using the same manifest. But the most important concept that it introduces is thinking of your cluster as a resource, genuinely as a resource in your... As a deployment or as a pod, it's the same kind of concept. When you have a resource in the cluster, you have control, well we have controller managers on top of it, but you can recreate it. You can roll different versions to it. You can delete it. It's all of these operations you can do on top of your cluster. So from that perspective, it's a very powerful concept. And with cluster API, I've been mentioning the GitOps model and one of the latest things I've been concentrating on, and I'm trying to enhance it and distribute it to the community as much as I can, was the integration of ClusterOps with GitOps. So all the Ops here.

Daniel Bryant (33:25):

Yeah. Nice.

Katie (33:27):

So how can you do cluster operations by leveraging a mechanism such as GitOps? And from that perspective, with cluster API, we can have our cluster defined in YAML manifests. If you have YAML manifests, you can introduce a configuration manager on top, such as Helm or Kustomize, which means you'll be able to template all of these resources. And once you have a configuration manager on top, or a templating layer, that means that you could use a technique such as GitOps very easily. And well, the templating layer is optional, of course, but it's a good way to further customize clusters for different regions. We don't want just one set of manifests to be individualized. If we can template it, why not?
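Customizing one base cluster definition per region can be pictured as a base document plus small patches. The sketch below uses a plain recursive dictionary merge as a stand-in for a templating layer; it is not how Helm or Kustomize behave internally, and the field names (`kubernetesVersion`, `workers`) and region names are hypothetical rather than the Cluster API schema.

```python
import copy
from typing import Any, Dict

# Hypothetical, simplified cluster description (illustrative fields only).
BASE_CLUSTER: Dict[str, Any] = {
    "metadata": {"name": "platform"},
    "spec": {"kubernetesVersion": "v1.19.0", "workers": 3},
}

# Per-region differences, kept as small patches over the base.
REGION_PATCHES: Dict[str, Dict[str, Any]] = {
    "eu-west-1": {"metadata": {"name": "platform-eu"}, "spec": {"workers": 5}},
    "us-east-1": {"metadata": {"name": "platform-us"}},
}


def overlay(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `patch` merged in, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_all(base: Dict[str, Any], patches: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Render one cluster definition per region."""
    return {region: overlay(base, patch) for region, patch in patches.items()}


if __name__ == "__main__":
    for region, manifest in render_all(BASE_CLUSTER, REGION_PATCHES).items():
        print(region, manifest)
```

Only the patches differ between regions, so a change to the base (for example a new Kubernetes version) reaches every cluster through one PR.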

Katie (34:13):

So we have our manifest templating, and then we can use a tool such as ArgoCD or Flux at the moment to do GitOps. So at the moment, one of the demos I'm doing is, how can we visualize our resources for the clusters in ArgoCD? It's quite a scary view, because cluster API introduces quite a couple of new resources. It's actually five of them: you have machines, which are going to be turned into instances, and you have machine sets and machine deployments, very similar to replica sets and actual deployments in Kubernetes. You have the concept of the control plane, which is going to define your master sets. And this is very powerful. I think this has been a good movement because we can do rollbacks to our control managers. So we can actually scale up and down and even change the version of Kubernetes we run in a rolling fashion. So it's been quite a powerful tooling, but yes, we can integrate further with something such as GitOps. Yeah.
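Changing the Kubernetes version across a set of machines without losing capacity can be sketched as a surge-then-remove loop. This is a toy simulation with a hypothetical `rolling_upgrade` function and a surge of one machine; it is not the Cluster API controller's actual algorithm.

```python
from typing import List


def rolling_upgrade(machines: List[str], new_version: str) -> List[List[str]]:
    """Simulate replacing machines one at a time.

    `machines` holds the Kubernetes version of each machine. Every step first
    adds a machine on the new version, then removes one old machine, so the
    number of machines never drops below the starting count.
    Returns a snapshot of the machine versions after every change.
    """
    current = list(machines)
    snapshots = [list(current)]
    while any(version != new_version for version in current):
        current.append(new_version)  # surge: bring up a replacement first
        snapshots.append(list(current))
        old_index = next(i for i, v in enumerate(current) if v != new_version)
        current.pop(old_index)       # then retire one old machine
        snapshots.append(list(current))
    return snapshots


if __name__ == "__main__":
    for step in rolling_upgrade(["v1.18.0", "v1.18.0", "v1.18.0"], "v1.19.0"):
        print(step)
```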

Daniel Bryant (35:12):

Very nice. And I guess that management, now Katie, falls in very much to the infrastructure operator persona? It's something else they have to look after now. So before I presume they're probably using something like Kind to spin up their clusters. And now the abstraction really would move to something more like cluster API. It wouldn't necessarily be for the application developers or application operators to know too much?

Katie (35:34):

Exactly. I think, when we talked about visualizing, we could visualize an application or a deployment very easily. But you could never really have a good visualization of your actual underlying clusters. You can have your nodes, that's fine, but then for example, if you roll out a new version of... If you have a cluster, for example in AWS, it doesn't matter. You have it there, you can see your instances, but if you're rolling out a new version of Kube, how do you see that? How do you actually know that that instance is up and running? How do you know that the version of Kube is the same on all machines, if you're in the migration process? All of this is difficult to visualize at the moment. With cluster API, because we have resources around your instances, you'll be able to visualize your cluster in a tool such as ArgoCD, because you can see all of your services, you'll be able to see the state of your resources as well. That is an instant win when it comes to really visualizing and having that graphical representation of the infrastructure.
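Once machines are resources with a recorded version, the question "is every machine on the same version of Kube?" becomes a simple query. The sketch below assumes a hypothetical mapping of machine names to versions, however that mapping was obtained.

```python
from collections import defaultdict
from typing import Dict, List


def machines_by_version(machines: Dict[str, str]) -> Dict[str, List[str]]:
    """Group machine names by the Kubernetes version they report."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, version in machines.items():
        grouped[version].append(name)
    return {version: sorted(names) for version, names in grouped.items()}


def rollout_complete(machines: Dict[str, str], target_version: str) -> bool:
    """True when every machine reports the target version."""
    return all(version == target_version for version in machines.values())


if __name__ == "__main__":
    # Hypothetical machine names, mid-migration.
    machines = {
        "worker-a": "v1.19.0",
        "worker-b": "v1.18.0",
        "worker-c": "v1.19.0",
    }
    print(machines_by_version(machines))
    print(rollout_complete(machines, "v1.19.0"))
```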

Daniel Bryant (36:36):

Yeah, very nice. I think the understandability all through my career when I've done dev and ops things, that being able to understand, build that mental model is critical to me doing good work. So I think this is going to make it that much easier, isn't it?

Katie (36:49):

I hope so. I'm rooting for this project to move forward.

Daniel Bryant (36:52):

Very nice. I'll definitely put the links in the show notes, so interested listeners can then follow your work. And I know you've got some demos that wouldn't translate too well to podcasts, but definitely worth looking on YouTube and so forth. So I'll put those links in. So wrapping up Katie, what are you most excited about working on in the future?

Katie (37:08):

Oh, I think my current role, I'm actually quite excited about it. Being in a fintech organization and introducing concepts such as cloud native tooling, it's challenging, but I think this is going to be quite an exciting product to work on or even project to deliver. I think currently, when it comes to the cloud native tooling, they're very widely used for example by startups, by companies which really focus on tech, but when it comes to other industries, it's still crawling, it's baby steps at the moment. But I think a good use case to actually see a successful fintech running on cloud native tooling is going to be quite exciting. So I'm actually quite excited about that overall. Yeah. Hopefully, hopefully we're going to deliver that and get it working. But I think it's going to be a very good use case to show how powerful the entire CNCF tooling landscape is.

Daniel Bryant (38:04):

Very nice Katie. This has been awesome. If folks want to follow you online, what's the best way? Twitter? LinkedIn?

Katie (38:09):

Both of them. I am available on Twitter and LinkedIn. As well I'm writing quite a few blogs around all the tooling I'm exploring or new ideas that I have and I would like to transfer them to the community. So I'm going to be on Medium as well for more insightful blog posts.

Daniel Bryant (38:25):

Brilliant. Yeah. I love the blog posts, fantastic, so I'll definitely link to a bunch of them. You can go much deeper in a blog post, can't you, than in talking sometimes?

Katie (38:31):

Yes.

Daniel Bryant (38:32):

Awesome stuff, Katie. Thanks once again for your time today. Really appreciate it. Great chat.

Katie (38:36):

Thank you, Daniel. I'm so happy to be here again. So it's been great chatting to you.
