LIVIN' ON THE EDGE PODCAST

Kelsey Hightower on Developer Experience, PaaS, and Testing in Production

About

In the seventh episode of the Ambassador Livin’ on the Edge podcast, Kelsey Hightower, technologist at Google, discusses his thoughts on cloud developer experience and modern Platform-as-a-Service (PaaS), and explores the reality that every organisation is testing in production.

Episode Guests

Kelsey Hightower

Technologist at Google

Kelsey Hightower has worn every hat possible throughout his career in tech, and enjoys leadership roles focused on making things happen and shipping software. Kelsey is a strong open source advocate focused on building simple tools that make people smile. When he is not slinging Go code, you can catch him giving technical workshops covering everything from programming to system administration.

Be sure to check out the additional episodes of the "Livin' on the Edge" podcast.

Key takeaways from the podcast included:

Custom configured and bespoke production environments, which are often combined with the manual deployment of applications, can lead to many challenges “because you're just kind of shooting in the dark and crossing your fingers at every deployment.”

Google’s internal development tooling allows engineers to automagically checkout a codebase and perform the full dev/test lifecycle via a browser-based IDE. All of the development tools and utilities are provided within this environment, and unit tests can be run automatically. When a pull request is issued there is an SLA for another engineering team to review and comment on the work.

Engineers should make informed decisions in regard to how much to learn about the important topics of the business environment they are working in and underlying platform infrastructure. It is possible to get lost in the business details, but developing knowledge of the context and constraints to the problem being worked on helps to make the best implementation decisions. Infrastructure is a complicated topic, but understanding performance and cost implications for design choices is important.

Every organization deploys onto a platform, whether or not they make the decision to buy a Platform as a Service (PaaS) or consciously build their own platform. Recognizing the needs of the customers (the developers) is key to building an effective platform.

Modern platforms and PaaS can effectively run a standard 12 factor web app. The real value in a PaaS can be seen if it can easily run other workloads, such as legacy (money making) applications.

Although there is clear value in creating open standards for infrastructure and platforms, sometimes adopting de facto or agreed upon standards is enough. For example, the iOS ecosystem is not open, but this has driven a lot of innovation and has seen the delivery and upgrade of many successful applications.

Part of the value provided by organizations like the Cloud Native Computing Foundation (CNCF) and Continuous Delivery Foundation (CDF) is that they aggregate projects and tools, and provide a distribution mechanism. They recognize the value of, and help implement, a governance model for each project. They also provide lifecycle management, potentially long after the original creators have moved on.

There is no single “best practice” architecture, such as microservices or a monolithic approach. Focus instead on your goals and constraints, and choose the practices that fit this best. Learn from other organizations, but recognize that labels (“nanoservice”, “macroservices”) do not always capture the actual underlying practice.

There is value in keeping development loops within a local context (e.g. on a developer’s laptop) for as long as possible. Use unit testing, integration testing, and mocking appropriately.

Be conscious of test harnesses that leak the underlying abstractions of a platform being worked on, as this will couple engineers to this platform over the long term.
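As a rough illustration of the two takeaways above (keeping the loop local, and not letting tests leak the platform), here is a minimal Python sketch. The names `BlobStore`, `InMemoryBlobStore`, and `save_report` are hypothetical and were not mentioned in the episode; the assumption is simply that the application owns a small storage interface and that tests run against an in-memory double rather than a cloud SDK.

```python
from dataclasses import dataclass, field
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical, platform-neutral storage interface owned by the application."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


@dataclass
class InMemoryBlobStore:
    """Test double that runs on a laptop; no cloud SDK is imported anywhere."""

    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]


def save_report(store: BlobStore, name: str, lines: list[str]) -> str:
    """Business logic under test: it depends only on the BlobStore interface."""
    key = f"reports/{name}.txt"
    store.put(key, "\n".join(lines).encode("utf-8"))
    return key


def test_save_report() -> None:
    # The test never mentions a bucket, region, or credential, so it stays
    # valid if the real storage backend is swapped out later.
    store = InMemoryBlobStore()
    key = save_report(store, "daily", ["ok", "done"])
    assert key == "reports/daily.txt"
    assert store.get(key) == b"ok\ndone"


if __name__ == "__main__":
    test_save_report()
    print("local test passed")
```

A production adapter for a specific object store would implement the same two methods, so the platform-specific code stays at the edge of the application instead of inside the tests.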

Organizations are always testing in production, whether they deliberately choose to or not. Configuration is often the biggest source of problems and outages, and it can be very challenging to test this in a non-production environment. Incremental testing in production, using techniques like canary releasing and feature flagging, is very powerful.
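One common way to implement a percentage-based feature flag or canary is to hash each user into a stable bucket and compare it with a rollout percentage. The sketch below assumes that approach; it is not something described in the episode, and the flag name `new-checkout` and the function names are made up for illustration.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, percent: int) -> bool:
    """Return True when the user falls inside the first `percent` buckets."""
    return rollout_bucket(flag, user_id) < percent


def checkout(user_id: str, percent: int) -> str:
    # "new-checkout" is a hypothetical flag name used only for this example.
    if is_enabled("new-checkout", user_id, percent):
        return "new code path"
    return "old code path"


if __name__ == "__main__":
    # Start the canary at 5% of users; raise the number as confidence grows.
    for user in ("alice", "bob", "carol"):
        print(user, checkout(user, percent=5))
```

Because the bucket is derived from a hash rather than a random draw, a given user sees the same code path on every request, and raising the percentage only ever adds users to the new path. Setting the percentage back to 0 acts as an immediate rollback.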

When starting out in a career, engineers should go super deep on the technologies used within the current organization. This provides good value to the organization, and also enables them to take the context learned here and apply this to future similar problems or tools, e.g. understanding established software-based network proxies such as NGINX and HAProxy allows an engineer to understand Envoy more easily.

In the future, innovation within the developer experience space will focus on workflows. Some platforms may specialize, e.g. by providing the best DevEx with ML-based application development.

Transcript

Daniel: [00:00:03] Hello everyone! I'm Daniel Bryant and I'd like to welcome you to the Ambassador "Livin' on the Edge" podcast, the show that focuses on all things related to cloud native platforms, creating effective workflows and building modern APIs. Today I'm joined by Kelsey Hightower, a technologist at Google.

Now, I'm guessing that Kelsey really needs no introduction to listeners of this podcast, but just in case you haven't bumped into his work, he is a keynote speaker at KubeCon, co-author with Brendan Burns and Joe Beda of the O'Reilly book "Kubernetes: Up and Running", and creator of one of my all time favorite tutorials, "Kubernetes the Hard Way".

I was keen to chat to Kelsey about all things related to developer experience, modern PaaS, and also testing in production.

If you like what you hear today, definitely encourage you to pop over to our website. That's www.getambassador.io where we have a range of articles, white papers, and videos that provide more information for engineers working in the Kubernetes and cloud space. You can also find links there to our latest releases, which are the Ambassador Edge Stack, our open source Ambassador API gateway, and also our CNCF-hosted Telepresence tool too.

Hello Kelsey, and welcome to the podcast.

Kelsey: [00:01:02] Awesome. Happy to be here.

Daniel: [00:01:04] So could you briefly introduce yourself and share a recent career highlight for the listeners please?

Kelsey: [00:01:09] Oh, wow. So I'm Kelsey Hightower. I consider myself a technologist at Google. I work in our developer relations org, but I do a lot of work with our customers and help contribute to products ranging from on-premise appliances to Kubernetes.

And I guess my most recent career highlight might be the diversity and inclusion work that I'm doing at Google. So this is about trying to make Google a very inclusive place for all backgrounds, beginners, experts, from all parts of the world.

Daniel: [00:01:39] Very cool. Very cool. So first off Kelsey, I wanted to dive into developer experience and developer loops. So I'm specifically thinking around that ability to take code, test it, deploy it, release it, verify it, that kind of thing. I wanted to dive into your past a little bit, and probably without naming names, protect the guilty and the innocent here, could you describe your worst developer experience or your worst dev loop from code to prod.

Kelsey: [00:02:02] Yeah, I mean I think it's really like in that Java ecosystem. It's, it's fairly heavy. You know, like earlier in my career, maybe about 10 years ago, a lot of the stuff in production didn't match any other environment because it was kinda hand curated. So this is a world where there's probably less than a hundred machines, but they're big machines, right?

Like, you know, 32 cores, tons of RAM. But the machines were handcrafted over years. So in order to understand what was there, you would always have to kind of take a peek and see if there were any changes since the last time you tried to work on something. And those environments are tough because you're just shooting in the dark, so you're adding stuff to your local environment.

Maybe it works in testing, but it just doesn't work in production. Add to that the configuration, which doesn't get talked about as much as source code. When the config is radically different, meaning the Oracle connection parameters or message queues are different, you just have no idea how that code is going to behave, because sometimes it changes its behavior based on that config file.

And those to me, have been the most challenging environments to work in because you're just kind of shooting in the dark and crossing your fingers at every deployment.

Daniel: [00:03:13] Could you share your best developer loop? This may be in the past or maybe something modern, but your best experience of coding, getting feedback really quick.

Kelsey: [00:03:22] Oh, by far at Google. So Google has a famous system internally called google3, and there's lots of tools built on top of it. A lot of people give credit to it being a monorepo, but I think that's more of a technical implementation detail. I think it's more about the ecosystem around it. So specifically I was adding a bit of code to the Go Cloud Functions support.

So Google Cloud has a product called Google Cloud Functions. And when you're writing functions, it's very language specific. And in that case, I was not really used to working inside of google3. All right? The nice thing about that developer workflow is, you go to google3, and you can think of it as something like GitHub internally for Google.

I can click around and browse the source tree and find out where I want to work and once I know where I want to work, I can just click on that and then an IDE opens up in my browser. It has all of the tooling Google needs, a lot of the linters, all the things that it really takes to really contribute code into the code base.

It deals with authentication, it deals with everything, right? So I can just start writing code. I remember making some changes based on a few open bugs, and when I hit save and was ready to commit, right away there's an SLA for someone in another country to review my code. And as they made their reviews, I got an email saying, "Hey, your review is in," and I could just click on the review and get right back into the IDE where we left off to address any feedback from that review. Hit save, all the unit tests are running, get a couple of "looks good to me"s, and then it was going to be rolling out into the staging environments soon after that. So that was like the best experience I've ever seen, because all of the best practices were not encoded in a doc, but encoded in the tooling.

Daniel: [00:05:08] Very cool. Out of curiosity, was the IDE running locally or were you in some kind of VM-based environment?

Kelsey: [00:05:13] It was running in the browser, so my guess is it may have been running on Borg somewhere, or maybe it was running on some Google infrastructure somewhere. I didn't have to know.

Daniel: [00:05:22] You didn't care. You were just in your zone, able to release your code as fast as possible.

Kelsey: [00:05:26] Yup. I didn't have to install Bazel. I didn't have to install any of the building tools. Right? I can just jump into my IDE, and I'm pretty sure there's some people who prefer to have more of a local experience, but in this particular case, I just needed to make some small changes in specific areas of code, and it was really nice. Since people had already written a lot of good integration testing and unit tests, I could do everything right from that browser.

Daniel: [00:05:49] That is very cool, and I'm definitely keen to dive into that a bit more later on around the Kubernetes experience in that space as well, or containers in general. Before we dive into the platform stuff, I wanted to bring something up. I'm chatting to a lot of folks who are in this moment.

One thing I hear with developers: they're expected to know a lot these days. Business savvy, better platform experience. What are your thoughts around this? How much knowledge do you think developers should cultivate around understanding the business they're working in?

Kelsey: [00:06:15] I mean, it depends, right? I mean, a lot of work we do, even if you understood the business. It may or may not matter. I know a lot of people like, oh, focus on the business, but yeah, truly, sometimes those things are decoupled, right? Sometimes you have a really successful business that just needs a little bit of technology to back it up.

For example, if you have a very popular e-commerce item, right? Some item that everyone in the world wants. It has a market fit. People love the price. They just can't get enough of it. At that point you just need to adopt some technology to sell the product. You knowing how to design a better shoe or a better microwave will probably not do anything because the product is there and people want it.

Not everything works that way, but at that point you just need some supporting technology so people can get access to the thing that they want to buy. And that's a specific use case. And there's other use cases where it helps to know the cost to run a particular business. Like, I think everyone was impressed when Lyft went IPO and they produced their S-1 and said, hey, I think the number was around 14 cents per ride is how much they pay in technology costs to do that. Right? So that's the kind of thing where knowing the business and knowing where the margins are will help you make better technology choices to make sure that you don't, you know, make those margins disappear.

Daniel: [00:07:33] Totally makes sense, Kelsey, it totally makes sense. On the sort of flip side of that, and you've kind of hinted at it there with the Lyft example, how important do you think it is for developers to become operationally aware these days? We've been talking about DevOps, of course, for many years now, but still I see some developers that just want to write code. They're good at it, delivering business value, but they're increasingly being encouraged to develop more understanding of the platform, understanding of Kubernetes, ops, many other technologies. What are your thoughts on how we as developers should react to that?

Kelsey: [00:08:02] It depends, I think, on the individual, right? So if you're on the market looking for a job, I think it's safe to say the more knowledge you have, the better. If you understand how infrastructure works, the better. I don't think anyone would complain that you know so much about the stack that you're deploying to. I don't think there's going to be any complaints there.

Now, on the business side, the question to ask is: if I find a very talented developer that can write the code safely, securely, the way I want it, but they struggle with the infrastructure components, is that a problem? And in some worlds I think that shouldn't be a problem. Sometimes you hope that there are people that are really good at infrastructure. For example, Comcast is pretty good at infrastructure, right? I got a cable modem from the store, I screwed it in and now I have Internet connectivity, and the people providing me my content don't really need to know anything about Comcast's network.

They just piggyback on it. That's how it should be, right? Should everyone at Netflix know exactly how Comcast works? Does it make a lot of sense that every movie producer in the world that's creating a movie or TV show knows how streaming works at a technical level? That doesn't make a lot of sense. So should any software developer attempting to implement some front end have to know infrastructure to be productive?

And I think the answer is no.

Daniel: [00:09:31] Very interesting. Interesting take on that. I wanted to dive into some of the infrastructure stuff a little bit more now, some of the platform stuff. You've got a couple of great tweets going on at the moment, and one was talking around, you know, one of your sysadmins gives me root access, and I install a bunch of CNCF projects, even the ones you don't need, then make their big reveal during your office offsite. I thought it was a very funny tweet, but I definitely see folks doing this. I mean, I've been there myself in the past, got a bit enamored with Docker, a bit enamored with Kubernetes, and kind of putting these things into a stack.

How do we stop folks doing this? If I could step, say, 10 years back in time when I was building these platforms and you were going to give me advice, how would you encourage me not to just go tech crazy?

Kelsey: [00:10:10] I think it's that it's so cheap to do that, right? Like if we were building a car manufacturing plant, you can't just go swap out all the robots. You can't just go swap out all the lights and electrical circuits. You have to plan because it's a physical structure. You need permits, you need experts, and it has to work and it has to be safe, and it's a huge capital investment to build a factory.

But the thing is, typically, you know what you're building!

Daniel: [00:10:40] Yeah.

Kelsey: [00:10:41] Cars, not refrigerators. So that factory is going to be retrofitted to do that as quickly as possible. And you might leave a little bit of room to turn that factory to produce a slightly different car, but I don't know if you're going to be able to take the factory and just pivot tomorrow to start creating refrigerators.

Now in the software world, we kind of started building these systems to support everything we might imagine possible. It'll support any programming language, even the ones we will never use. It will support the thing I just read online. Oh, I read that this other tool that we don't need is better than this tool that we already have.

And it's so cheap for you just to take two days, install it and claim victory. And I think that's one part of it. So the nice thing about that is we can experiment and find something slightly better and it's a little easy to bring it in and, and to incorporate it. But I think the root of the problem is what are we actually using this infrastructure for?

Once you understand that part, then you know when to stop. Hey, we write in one programming language, we deploy to one server. That pipeline should be fairly easy to construct. Now, if you spend a bunch of time on, well, what if we were to deploy to 10 million servers? That's when you start to have people, you know, spending a lot of time trying to future proof, because they may not be sure about what the actual challenges are right now, and they're very afraid of never getting the opportunity to do version two that accommodates the new needs when they show up.

Daniel: [00:12:20] Makes sense. So the advice would be to perhaps think more closely around your goals and within a reasonable timescale as well.

Kelsey: [00:12:29] I don't know. To me it's more like you'll never be finished. You're never going to be done. You're never going to understand the problem fully, and the best you can do is operate on what you know right now. Try not to paint yourself too far into a corner, but do understand that whatever you do now may be obsolete in the future.

So all you're trying to do is just be a professional, ask the right questions, take everything you know and do the best you can for now, but knowing that it's going to change, so don't get too tied to it and don't overdo it, because then it becomes a bit harder to reverse that or change to something else.

Daniel: [00:13:10] Yeah. I like that, Kelsey. That makes complete sense based on my experience. So another evergreen tweet of yours, which I've quoted many times in my talks, says: I'm convinced the majority of people managing infrastructure just want a PaaS, a platform as a service. The only requirement: it has to be built by them.

Could you unpack that one a little bit for us?

Kelsey: [00:13:30] Yeah, so no one just wants raw infrastructure for the sake of it. They're usually trying to do something with it. And credit to all those people that are managing infrastructure themselves, trying to build a PaaS. I have a lot of empathy for people trying to build a PaaS. The truth is most PaaSs today just don't work, not for everything you want to do.

Right? They work for 12-factor apps. They may work for Spring Boot apps. They may work for cloud native apps. The problem with that is not every application you have right now probably fits in one of those buckets, and this is what presents the challenge. Can you retrofit everything else to conform to that platform, or do you have to customize the platform? And now we're starting to deal with that unit of time.

Sometimes you don't have the time. It may take you five or 10 years to refactor all your apps, and that platform we're using may be obsolete by then. And I think this is where people struggle. So given that caveat, I think what most people are doing is, they're being forced to assemble these millions of pieces of infrastructure: Prometheus, Kubernetes, Fluentd, all of this stuff.

So they can hopefully at some point put a little high level UI on top that accommodates the applications and the developers at their specific company. That's all they're trying to do. So that whole tweet is about, they need the flexibility to build it themselves because most platforms have shown that they don't even cover the 80% use case.

Right. They cover more like the 50% use case.

Daniel: [00:15:09] Makes sense. Kelsey. So unpacking for a second, if I was just dealing with a standard kind of web-based workload, what do you think a modern PaaS would look like? Is it me simply coding, doing a git commit, a container being built in a Kubernetes cluster, and then creating a route to that service kind of Heroku experience?

Do you think that's what most of us are after?

Kelsey: [00:15:30] Oh yeah. Let's say you have a brand new application, no previous technology investments. If you're doing a front end, look: CDNs, Netlify, all of those kinds of "just give us your source code and we'll host it and make sure that it's everywhere" services. Maybe you mix in a little bit of Cloudflare Workers if you customize a little bit underneath that, and then if you have an API backing that, that means you do get to pick more of that API pattern of HTTP or gRPC.

And then you can host that on, like, Heroku or Cloud Run, or even AWS Lambda. So in that kind of perfect world, if you do things based on what we've learned in the past, you have so many of these easier to use platforms because you don't have to think about auto scaling and load balancers and all of these infrastructure as a service primitives.

So I would say this, even if you didn't even have a great developer loop. Just checking in your code, like just versioning your code to say, we deployed this, not sure how we got this, but we deployed this. So even in a manual workflow, right, if you were to step through it manually, right: I wrote some code, I checked it in, maybe I version it, and I deploy it based on the instructions of the platform that I'm using.

If you write that down, maybe someone comes around and says, we should automate that. Even if they didn't though, it's a pretty straightforward process to follow and it's hard to get wrong because you're on guard rails thanks to the platform.

Daniel: [00:17:06] Yeah. It makes total sense. Going back to your earlier point, how do we then scale that up to handle other workloads?

Kelsey: [00:17:11] So the truth is, I always do this mental thing and say, based on all of the kind of platform as a service products available today. What applications could I not build? And let's take speed and performance, and just put that to the side for a second. Most platforms as a service don't allow me to access something like a GPU or custom hardware. So any workload that falls in that category is going to be a hard sell. Like machine learning could be very difficult or not as performant as it should be. Now, once you take away the custom hardware aspect, then you start to get into, can I read something from a message queue and process it? Yeah. Can I take a web request?

For sure. Can I read a file? Maybe not from a POSIX file system, but you could probably read that file from like, you know, God forbid FTP or an object store like Google Cloud Storage or S3.

But you can probably do any computation that I think most people would think of that doesn't require custom hardware.

So I think we're getting close to the point where any data crunching pipeline type of applications, web serving, the majority of things people need to do, I do think now we're getting to that 90%. The problem though is, what we're really measuring now is: can you run this application I wrote 15 years ago?

This is the thing that's making me money. So the way we're judging these platforms, it's not the fact that they can, in theory and in practice, actually run most CPU required or data crunching applications you would build today for most problems that we know about. The fact that they don't run the thing that we used 15 years ago is where we start to draw the lines around platform maturity and its capabilities.

Daniel: [00:19:05] Hmm. Interesting. I was going to ask you a COBOL question later on actually, but something that you mentioned there sort of caught my attention. You mentioned things like message queues, and I was reading some stuff by Brian Liles the other day talking about interop between CD components, continuous delivery components.

In all we've been talking about in the last 20 minutes or so, do you think there's a role that open standards play here, in messaging and in continuous delivery pipelines? The integration between all these different things. Are open standards important?

Kelsey: [00:19:37] That one's tricky, right? Because does it matter? I take the iPhone, for example. Is iOS an open standard?

Is the App Store an open standard? Yet there's hundreds of millions of people creating software super successfully. I've never seen so many automated updates of software in my life. Video games, very complex applications.

Somehow we found a way to create and distribute to 300 million devices, to people we've never met, continuously, right?

We proved that you can actually do it, and it's not the fact that they're open, it's just that there's an agreed upon set of standards. So whether it's open or closed is debatable. But I think what we're saying here is that in the Apple world, things are very clear because there's typically one way to do it.

There's one app store, there's one deployment model, there's one set of configurations, and there's one target. In the open source world, what language are you talking about? What platform? What cloud providers? So that's where open standards become a little bit more critical, because there's so much choice underneath that at some point, either we write abstraction layers with if statements and case logic, or we try to have one spec or protocol, and we hope that everyone implements it. The battle between open and closed, it's more about the execution, and I think Apple has proven that even in the closed world, the execution can be flawless. And in the open world, I think we see good examples with, like, HTTP, for example.

Daniel: [00:21:18] Oh, yep, yep,

Kelsey: [00:21:19] Right? But we've seen less successful ones like CORBA where

Daniel: [00:21:23] yeah.

Kelsey: [00:21:24] if the ecosystem isn't broad enough, it just becomes yet another world of isolation that not everyone wants to integrate with.

Daniel: [00:21:30] So if we put the open and closed aside a second, do you think there's an area in our industry at the moment that is calling out for standards, that perhaps isn't getting the attention it deserves?

Kelsey: [00:21:41] Yeah. So Apple is in a unique position in that they have an ecosystem big enough to motivate people to build on top of it. And they have the money to do it. They have the testing, they have all the things necessary for an ecosystem to thrive. But what happens when you don't?

So if you're a bank and your business is bank accounts, not creating technology per se, you may not want to spend the work building the community, defining the standards, building all the validations. And then honestly, usually when standardization comes, someone has to enforce the standardization. What happens when someone isn't compliant? Most industries, most companies don't really have the extra money to spend on the full life cycle of standardization for certain things that they just want to use. So in that world, it makes a lot more sense for all the banks to come together with the regulators and the consumer protection folks and create things like open banking standards, saying: the people we're trying to serve and those trying to serve them, let's take input from all of them. And let's build something that's super reliable or consistent, and then that should also set the stage for a much broader new ecosystem to build on top of that, versus bespoke systems being built by these various companies.

Daniel: [00:23:09] Mmm. I like that a lot. That makes a lot of sense. I guess as you were talking then, I was thinking about organizations like the Cloud Native Computing Foundation and the Continuous Delivery Foundation. They're probably at a higher abstraction than you were talking about there, but do you think there's a good role for folks to play in those organizations to perhaps orchestrate some of this more lower level development?

Kelsey: [00:23:29] Yeah. So I think they're brokers, right? So to me, in some ways, I kind of look at the CNCF like Netflix. They aggregate a lot of these projects and tools, and they give a distribution mechanism for them. So even though they don't require any project to do anything a specific way, they have a way of thinking about it, like you should have a governance model.

Daniel: [00:23:53] Yup.

Kelsey: [00:23:54] They have an idea of what a healthy project looks like. And they understand how to cultivate a community, whether that's with conferences or having special interest groups, but having something or an entity that will be involved and engage with the project longer than the originators, right?

So let's say you are that bank. You may not be as interested in working on this 10 years from now,

Daniel: [00:24:20] Yeah, yeah.

Kelsey: [00:24:21] right? But if the rest of the world is. Then it should live longer than one entity. So this is why I really think those foundations have a big role to play because I'm a victim of that, right? I've started a lot of open source projects where I can't just maintain them for 50 years.

I can't have that kind of pressure on me, and usually you're doing it for free, so we need something a little bit more sustainable. So I think those foundations are more about sustainability than anything else.

Daniel: [00:24:47] I want to switch gears a little bit now and just talk about some of the dev loop stuff you mentioned earlier on, and how you think modern architecture styles have impacted the development loop. Obviously everyone's microservice crazy these days. I've seen some interesting chat on Twitter recently. Cindy Sridharan, a bunch of other interesting folks talking about macroservices. I think it was the Uber team who were coming out saying, hey, we're actually refactoring some of our microservices into what we're now sort of calling these macroservices, mini monoliths almost. Do you have any thoughts around how best to choose your architecture styles?

Kelsey: [00:25:19] No, it's funny. I've, it's a lots of customers and engineering teams and we do these whiteboard deep dives and they always ask Kelsey, what are the best practices? And always like what, what classifies something as "best". And the way I look at it now as we're just practicing. Uber is practicing, they practice with microservices for everything.

Maybe they went a little too far. So, in their practice, they learned that they should dial it back a little bit. Maybe everything didn't need to be split into a hundred pieces, maybe sometimes 20 pieces would have been better. And to me, I look at this as just a practice. So, so far in our practice, we have some experience in building applications where all of the services are deployed in a single executable. There's a practice around that, and I question if people have been practicing that well, because some people don't really think of multiple services in a single executable in the same way they think about microservices, when the services are deployed in different executables. But in practice they should be one and the same: the way you canary them, the way you deploy them. You can canary services in a monolith as well, but the tooling isn't there. And the same is true for microservices, right? When it comes time to stick those things back together, you end up with that distributed monolith, and we're just practicing. So I think the real thing here is relax a little bit.

There is no right or wrong. There is no best. There's just practice. So I think the way that influences the developer's inner loop is you've got to ask yourself, how are you practicing? Right? As you start out, you're going to practice by just learning the git commands and learning what a tag is, and you may build a pipeline or two.

That's how you're practicing. But then you may practice slightly differently. You may start to automate some of those things and wrap them in bash scripts or IDE plugins, and then that will be how you practice your software development. But it's just a practice. So I think the best way to go about it is to study the practice of others. And in many cases, ignore the names they attribute to their practice, because it can get confusing: what one company means by microservices or cloud native may not be the same as another company using the same names, where the practice is different. So examine the practice and then see, okay, I understand why they think this is microservices, because in practice they are looking at opening the door to polyglot programming,

Daniel: [00:27:59] Yeah,

Kelsey: [00:27:59] opening the door to being able to deploy a service independent of another service through an executable. That's the practice. Then it may align to one of these architectures.

Daniel: [00:28:09] I think another thing around dev loops and the notion of services, whether they're distributed or whether they're sort of packaged together or not, is the testing too, if you've got other dependencies on your service. I chatted to Matt Klein a couple of weeks ago on the podcast, and he was talking a lot about his experience.

He's actually gone out of his way to, you know, get around, say, developer loops that are imposed on him by a company in order to work locally for as long as possible. So he talks about spinning up mocks rather than having to run other services that he was dependent on. Have you got any experiences or advice around managing these dependencies when you're testing and building that local service?

Kelsey: [00:28:48] Yeah. I'm more in the Matt Klein camp. I like to work with the source tree directly as long as possible, and if the dependency is easy, like MySQL, I might run it locally, you know, just to kind of have that real database and scratched this. And I'll keep that as my dev database. Sometimes that's a little easier if I don't have the time to mock it all out.

Sometimes, especially when I first build an app, I don't really start with the mock. I tend to start with the real infrastructure around it and I try to favor things that I can actually run locally. So then as I'm getting that initial prototype or that v0.0.1 out the door, I like to make sure that it actually works. I gotta test the MySQL driver. I gotta test all these various things.

Now, once those things become a hindrance, meaning I don't want anyone to have to stand up a database so they can run a unit test. That's crazy, right? So then you start to say, let me optimize this testing loop to do things like mock things out so that they can run fast.

So now I'm looking at mocks as kind of trying to get that boost in performance, so you can run it locally and you don't have all the same infra dependencies I had. You can just kind of focus on the logic for as long as you can. So I prefer that. So no containers in my loop, no Kubernetes in my loop, none of those things in my loop. Configuration: what services do I depend on? And if I really do need a second service on my laptop, or third or fourth, I actually tend to just run those services in the background on different ports and then update my config to point to those local services. I need that to work quickly and fast. And then when we go to deploy, I look at that as integration testing, right?
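As a rough illustration of the mocking step described above, here is a minimal Python sketch that swaps a real database for a stand-in so a unit test needs no running MySQL. `UserService`, `get_user`, and `make_fake_store` are hypothetical names invented for this example; they are not from the conversation or from any particular library.

```python
from unittest import mock


class UserService:
    """Business logic that depends on a database-like store (hypothetical)."""

    def __init__(self, store):
        # The store is injected, so tests can pass a fake instead of MySQL.
        self.store = store

    def display_name(self, user_id):
        row = self.store.get_user(user_id)
        if row is None:
            return "unknown"
        return row["name"].strip().title()


def make_fake_store(rows):
    """Build a mock store that answers get_user from an in-memory dict."""
    store = mock.Mock()
    store.get_user.side_effect = lambda user_id: rows.get(user_id)
    return store


# No database is started here: the logic is exercised against the fake.
store = make_fake_store({1: {"name": "  kelsey hightower "}})
service = UserService(store)
```

Because the store is passed in rather than created inside the class, the same `UserService` can be pointed at a real local database during early prototyping and at the fake once test speed starts to matter.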

Daniel: [00:30:30] Interesting. What are your thoughts on kind of the local to remote dev loops? You've got Telepresence in the CNCF and I've seen you talking about, I think it was Ksync, or there were a couple of other tools where you were using nice domain names to access services that were actually running, I think, in a remote Kubernetes cluster? What's your thought in relation to that? You're not really going mocks or local development, but you're not really going full on developing in the cloud with your IDE in the cloud. It's kind of a halfway situation. Do you think it's good or bad?

Kelsey: [00:30:58] Yeah. I don't know that there's good or bad, it's just practice. Right? So I've worked in environments where I don't know about all the other services, like there's 30 other dependencies and someone said, oh, you have to talk to this service, and then this other one, and then this is how you send mail. And I'm like, ah, that's a lot of stuff, right?

So I don't really have access to those things and nor do I want it. So in those cases, I might just ask someone, what is the URL to the testing mail server thing? And once I have that config, I can understand why some people may say, well, all of those things live in Kubernetes. Just deploy your app into a namespace where those things live.

And I think that's okay, right? Like, look, if you have all the infrastructure, it may come to mind that you may want to have a practice where you just kind of develop in the like environment. But one thing I've noticed is that a lot of those types of apps become super brittle when we have to switch away from Kubernetes to something else, because you start leaking infrastructure details in that code. For example, people start having a flag where they just take a short name. That's not a real URL. You can't interpret that. So at that point, you're assuming that maybe I'm in a Kubernetes namespace and anyone looking at this flag setting should understand that, and it's like, uh oh, you know, we're leaking some details that shouldn't be there.

So even if you do have Kubernetes, I'm still of the mind that, okay, for all of the dependencies for staging, when I'm doing my integration testing or that initial dev loop, I just want the endpoints that are accessible. The fact that they're in Kubernetes, to me, is an implementation detail.

So I'll get all of those remote URLs, HTTP, blah, blah, blah. Even if I have to port forward to those to make them run on localhost, it doesn't matter to me. That's just the implementation detail. And then in my config file, I'm going to list those services, but it doesn't matter that they're running in Kubernetes or actually on my laptop.

That's the way I like to draw that line, but I get what Telepresence is trying to do. It's saying, look, we have access to all of this infrastructure that already has namespacing. And it deals with a lot of these things that you would have to reimplement on your local environment. But I've just been around long enough to see us going from bash scripts to Puppet to Vagrant

Daniel: [00:33:22] Yes. Same here.

Kelsey: [00:33:22] now to Telepresence. So it's like it's never going to stop, but if that's the way you work, if that's the way you practice, then maybe Telepresence is going to be a great tool to help you with that practice.
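One way to picture the "I just want the endpoints" approach from this answer is a config loader that only accepts full URLs, so the application code never needs to know whether a dependency runs on a laptop, behind a port forward, or in a cluster. This is a minimal sketch under stated assumptions: the variable names `MAIL_URL` and `USERS_URL`, the default ports, and the `load_endpoints` helper are all hypothetical.

```python
import os
from urllib.parse import urlparse

# Hypothetical defaults for services running in the background on a laptop.
DEFAULT_ENDPOINTS = {
    "MAIL_URL": "http://localhost:8025",
    "USERS_URL": "http://localhost:8081",
}


def load_endpoints(env=None):
    """Return service endpoints, letting the environment override defaults.

    Values must be full http(s) URLs. A bare short name such as "mail" is
    rejected, because it only resolves inside one specific platform.
    """
    env = os.environ if env is None else env
    endpoints = {}
    for key, default in DEFAULT_ENDPOINTS.items():
        url = env.get(key, default)
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{key} must be a full URL, got {url!r}")
        endpoints[key] = url
    return endpoints
```

Pointing the same code at staging would then be a matter of setting `MAIL_URL` to whatever remote or port-forwarded URL is reachable, without changing the application.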

Daniel: [00:33:33] Mmm. Yeah. Well said, Kelsey. Well said. A final question: I wanted to get your thoughts on testing in production. This is a very popular topic. Again, we can pick some really interesting names out of the hat here, Charity Majors, Cindy Sridharan, talking a lot about this. Some folks are a bit scared, I think, with the idea of testing in production, even if it's dark launching or canarying. I'm guessing Google's got an opinion on this, but what's your opinion in this space, Kelsey?

Kelsey: [00:33:55] You're testing in production, whether you like it or not.

Daniel: [00:33:59] Yeah.

Kelsey: [00:33:59] It's not a choice.

Daniel: [00:34:01] Yeah.

Kelsey: [00:34:01] You're always testing in production. What I think the conversation is, is should you deliberately test in production, right? So I think a lot of times people are saying, I only want to deploy to production when I have to.

And once I deploy, you know, that's when the testing starts, because that's why you're always like putting extra eyes in many cases on the metrics and the logs and the errors. No, there is no way any environment except production looks like production. Right? I've seen issues because of the project name. Right. So one project was called, you know, something, something staging. The other one is called something, something prod. And if one of those flags points to staging, aha, it doesn't work. Right? So just from a config standpoint, it can never be the same. Usually the only difference between staging and production is the configuration,

Daniel: [00:35:05] yup, yeah.

Kelsey: [00:35:06] right?

And I think we just saw a few tweets, and from the SRE books, configuration is usually the biggest source of big outages.

Daniel: [00:35:14] Hmm. Yeah. I've been there.

Kelsey: [00:35:16] So just that alone tells you that you're always testing in production. So if you want to be deliberate about it, what you say is, you know what? We know that there is no environment like production, so what we're going to do is we're going to intentionally roll out and maybe adopt a canary pattern, like 1% of our traffic will go to this new release. Now you're being very explicit about the fact that you're testing in production, and now what you're going to do is make it safe. You're going to make it safe to rewind things that don't go right. And that's why I think it should be applauded and it should be a goal of everyone to be able to, quote unquote, test in production, but I think we're really talking about the ability to canary things and then pull those canaries back if we don't like the results.
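The canary pattern mentioned here (send about 1% of traffic to the new release, then pull it back if the results look bad) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how any particular load balancer or service mesh implements it: `CanaryRouter`, `route`, and `rollback` are hypothetical names, and requests are bucketed by hashing a request ID.

```python
import hashlib


class CanaryRouter:
    """Send a fixed percentage of requests to a canary release (hypothetical)."""

    def __init__(self, canary_percent=1.0):
        self.canary_percent = canary_percent

    def route(self, request_id):
        # Hash the request ID into one of 10,000 buckets so the same ID
        # always lands on the same release while the percentage is unchanged.
        digest = hashlib.sha256(request_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 10000
        return "canary" if bucket < self.canary_percent * 100 else "stable"

    def rollback(self):
        # Pull the canary back: every request goes to the stable release.
        self.canary_percent = 0.0


router = CanaryRouter(canary_percent=1.0)
targets = [router.route(f"request-{i}") for i in range(20000)]
canary_share = targets.count("canary") / len(targets)
```

In a real rollout the decision to call something like `rollback` would be driven by the metrics, logs, and errors discussed above; here it is left as a manual step.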

Daniel: [00:36:04] Wrapping up, a few final things I want to pick your brains on. I always like to ask about sort of what the next generation should be looking at learning. If sort of junior folks are coming into the industry now, they're thinking, should I invest time in learning, you know, CRDs and Kubernetes Operators, or perhaps going back to old school basics and learning something like COBOL, which I see you talking about a lot, which I think is great. What would be your advice for a person coming into the industry now in terms of learning?

Kelsey: [00:36:29] So maybe I can just reflect on my personal career and journey. I'm not sure what other people should do, but I'll tell you what happened for me. First, I started trying to learn everything, right? Whatever the job postings wanted, I looked at all the open job reqs: oh, I've got to learn these eight technologies, nine technologies, and in many ways I was just hedging my bets to get into the door.

Then later I learned that, once you're in the door, there's a lot of value in going super deep on the technologies you have to work with, right? So if you're writing HPE, if you go deep enough to the runtime, how the byte code works, how memory management works, you can carry that over to the next programming language.

So I think a lot of times it's not about you saying, oh, I'm going to miss out if I don't learn all the languages at the same time. I think it's better to invest deeply. So go broad to get into the door. Once you're in the door, go super deep on the technology that the company is using. Number one, it will make you more valuable immediately to your current employer.

Daniel: [00:37:37] Yup.

Kelsey: [00:37:37] It's a great way to progress and level up your career there, and it's also going to set the stage and foundation for the next technology you need to learn. So maybe there's a different type of job you want. Maybe there's a different type of system you want to build. Then you go and hop the technologies because it's a means to an end. And finally, later in my career, I have this phrase, I say, learn something old every day.

And the thing that I've noticed about learning the old stuff is that you can see that a lot of what we're doing has been tried before. Maybe it was a little bit too early. So for example, when Envoy came out, it was almost like, Kelsey, how'd you ramp up on Envoy so fast?

I was like, well, I went very deep on Apache, then NGINX and then HAProxy back in the day. So my foundation was strong. When I looked at Envoy, I'm like, what is it trying to be? It's like, all right, it's a proxy. I've heard those before. And then I studied the configuration file and said, Oh, look at that, the config language is different, but the feature set is very similar.

Right? And then it had another practice on top of it. So to me, it's like if you are trying to understand Envoy, I've seen power where people say, well, let me go see what the first load balancers look like. What were the problems they were trying to solve? And you start to read those mailing lists. They look a lot like the mailing lists that we're looking at today, and sometimes they provide a lot more clarity because they're talking about the core problem, not necessarily the layers on top.

So if you go and study the Envoy world, you might get distracted a bit by Istio and Vault for certificate management, right? There's too many layers there. But if you were to wind the clock back 15 years and you look at a discussion around NGINX and mutual auth. Sometimes you get a little bit more clarity, because they're working at the core of the problem, and then once you understand it there, sometimes it's a little easier to understand it now.

Daniel: [00:39:39] Final question for you, Kelsey: what do you think, say in five years' time, the infrastructure and development experience will look like? Will it be all functions as a service? There will obviously be a mix of things, I guess. What do you think the target will be that we'll be aiming for in five years' time?

Kelsey: [00:39:55] I love the amount of competition we have right now. So the next five years are hopeful. No single provider is going to have a silver bullet across all the possible ways of doing this, but I think what's going to be available, it's not going to be accessible to everyone, but it will be available, is the ability to constrain yourself to a particular platform.

So if you're doing machine learning, you'll see a lot more machine learning based platforms that say, Hey, come over here, focus on machine learning, and we'll do the rest. I mean, HTTP? There's a ton of those available today. If you're doing front end work, we're starting to see not just the fact that you can definitely say, you know, tool push and then your front end is ready to go on a custom domain with a certificate that's done.

Now you're starting to see those tools move into the workflow space. So I think what's going to be more interesting is we've already proven for some workloads, we can handle end to end and make the infrastructure disappear. We've already done that. More of those will happen. Guarantee it. It will happen. Even custom hardware is going to happen.

So I think in the next five years it's going to be more about that workflow. As a developer, can we get more iOS like experiences for other ecosystems?

Daniel: [00:41:16] Hmm. Very interesting thoughts to part on there Kelsey, superb. Well, it's been a pleasure, as always, Kelsey, really appreciate your time today. Thank you very much.

Kelsey: [00:41:23] Thanks for having me.
