Platform Engineering in 2022: Onramps, Self-Service, and Standardization

As discussion of developer platforms reaches a fever pitch, the industry still has not converged around a single definition of "platform".

To get a variety of viewpoints, Ambassador Labs invited a small group of practitioners in the cloud-native development space to join Ambassador's Head of Developer Relations, Daniel Bryant, to discuss what developer platforms actually are, why platforms are a hot topic now and where these platforms are heading. The group was made up of Humanitec's CEO, Kaspar von Grünberg (Kaspar on LinkedIn), nesto's Director of DevOps, Mathieu Frenette, and Syntasso's Chief Operating Officer, Paula Kennedy (@PaulaLKennedy), each bringing very different experience to the discussion.

Here are some of the key takeaways from their exchange:

Developer platforms: One size does not fit all

Platform engineering, growing fast but relatively new, has not matured to the point that everyone agrees on what a "platform" is. Nevertheless, the idea of building one's own platform to create a "golden path" for developers, reducing complexity, fostering self-service options and increasing speed to production, is driving the platform discussion. With this in mind, what is a platform?

Nesto's Mathieu Frenette, coming from a DevOps point of view, defined a platform as "everything that we can provide as tools, components and foundations to help developers do their job more easily, faster, and ideally with more robustness, predictability, repeatability and confidence." Kaspar von Grünberg, CEO of Humanitec, provided a similar definition: "…a platform is the sum of all tech and tools that a platform engineering team binds together to pave a golden path or paths. Developers leverage this path to self-serve with low cognitive load. That is important to me because it drives standardization by design for the respective organization."

Syntasso COO, Paula Kennedy, cited the desire to create a curated developer experience – whatever that might mean for a given organization, as no single definition fits all companies or needs. "We want to provide just enough to give developers what they need to get the job done – enough capabilities that developers can go fast without worrying about things like infrastructure or security. Providing a good developer experience via a developer platform is about never giving a developer too many choices or adding to cognitive load."

How to start with platforms without a single definition of a platform

Where should organizations begin their platform journeys if no single definition of a platform exists? Podcast participants were keen to highlight that this is exactly the point. An organization is free to assess and examine its own needs without getting locked in or having to accept an inflexible, off-the-shelf solution. Several key elements should be kept in mind:

Design with platform-as-commodity in mind

Mathieu insisted, and other podcast guests agreed, that platform design is not about reinventing the wheel. It's about, as Paula described, assessing what is commodity and what is unique about the platform you want to create. If you know you struggle with integration testing, find an existing tool to ease the struggle. If configuration management isn't your specialty, find a solution that relieves you of this burden. Focus the majority of effort and ingenuity on delivering on your unique value.

A combination of following best practices and figuring out the 90-95% out-of-the-box solutions to balance with the unique 5-10% that is your "special sauce" will pave a golden path while providing the freedom to go off-path as needed. By drawing on existing components and tools to solve "commodity problems", you have more resources for bigger-picture or truly unique platform issues.

Think about platform engineering value

Investing in platforms, when planned, will require some justification. All three podcast guests described situations where developer or ops teams have accidentally created platforms, or in which teams didn't communicate internally and ended up with multiple platforms that never solved for their purported need. This "rogue" development frequently happens when organizations have not yet realized the need – the space is, as Kaspar explained, evolving and still immature. And when they do realize the need, there is not enough data to understand the value of the platform being built and thus approve the investment in platform-building.

Before ever getting started, Paula advocated, it is critical to understand what need(s) the platform aims to meet and define this clearly. At a high level, providing value to the business is the bottom line, but how does an organization get there? Paula described a workshop in which her organization mapped out the complete journey from application idea to production, and each stakeholder team added in their steps and dependencies to create a clear picture of what steps took the most time, what could be cut, and what steps are actually important to stakeholders. Removing assumptions by mapping the journey enabled a clear view of a platform's value before ever taking any action at all.
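The output of a journey-mapping workshop like this can be treated as a small data set. The sketch below is only an illustration of that idea, not a record of Paula's workshop: the step names, owners, durations, and the `needed_by` markings are all hypothetical. It totals the time from idea to production, orders steps from slowest to fastest, and lists the steps that no stakeholder claimed to need, which are the first candidates for cutting.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    owner: str                                    # team that performs the step
    days: float                                   # typical elapsed time
    needed_by: set = field(default_factory=set)   # stakeholders who say they need it


def summarize(journey):
    """Return total days, steps ordered slowest first, and steps nobody needs."""
    total = sum(step.days for step in journey)
    slowest_first = sorted(journey, key=lambda step: step.days, reverse=True)
    unclaimed = [step for step in journey if not step.needed_by]
    return total, slowest_first, unclaimed


# Hypothetical journey for illustration only.
JOURNEY = [
    Step("request dev environment", "ops", 5, {"developers"}),
    Step("architecture review form", "ops", 3),
    Step("security scan", "security", 1, {"security", "compliance"}),
    Step("manual change ticket", "ops", 4),
    Step("push to production", "developers", 0.5, {"developers"}),
]

if __name__ == "__main__":
    total, slowest_first, unclaimed = summarize(JOURNEY)
    print(f"Idea to production: {total} days")
    print("Slowest step:", slowest_first[0].name)
    print("Steps no stakeholder asked for:", [s.name for s in unclaimed])
```

With real workshop data in place of the placeholders, the same three numbers give a first, rough picture of where a platform could remove waiting time.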

Extend platform engineering value to ROI

Once the value and its component parts are understood, it becomes easier to justify costs and calculate a potential return on investment. Kaspar proposed writing down every activity that goes beyond a simple image update, estimating the effort and time each one involves, and using these figures to demonstrate what the platform engineering team and the platform itself contribute to the business. Some of these factors are technical and measurable, such as how often a given activity occurs per 100 deployments and how much time developers and ops teams spend on it each time.
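Kaspar's exercise can be sketched as a short calculation. The example below is a minimal illustration under stated assumptions, not the calculation Kaspar or Humanitec uses: the activity names, frequencies, minutes, and hourly rate are invented placeholders. It normalizes each non-routine activity to occurrences per 100 deployments, multiplies by the developer and ops time each occurrence takes, and force-ranks the activities by cost.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    per_100_deployments: float  # how often it occurs per 100 deployments
    dev_minutes: float          # developer time per occurrence
    ops_minutes: float          # ops time per occurrence

    def hours_per_100_deployments(self) -> float:
        total_minutes = self.dev_minutes + self.ops_minutes
        return self.per_100_deployments * total_minutes / 60


def rank_by_cost(activities, hourly_rate):
    """Return (name, hours, cost) tuples, most expensive activity first."""
    rows = [
        (a.name,
         a.hours_per_100_deployments(),
         a.hours_per_100_deployments() * hourly_rate)
        for a in activities
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical figures for illustration only.
ACTIVITIES = [
    Activity("add environment variable", 5, 30, 30),
    Activity("provision new database", 1, 60, 180),
    Activity("spin up new environment", 3, 30, 90),
]

if __name__ == "__main__":
    for name, hours, cost in rank_by_cost(ACTIVITIES, hourly_rate=80):
        print(f"{name}: {hours:.1f} h per 100 deployments, about {cost:.0f} in staff time")
```

The ranked list is the starting point for an ROI case: the activities at the top are where self-service automation would likely save the most time.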

Others, as Mathieu pointed out, are less cut and dried: "Some development activity is very complicated, and this takes a toll. That is, it takes time to run, it is difficult to set up when you want to troubleshoot, it is complicated to assemble and configure. By extension, it is extremely difficult to onboard new developers because of this complexity, and we end up losing developers because of this complicated work environment. All of these factors are costs to the business that need to be woven into the ROI equation. How much does the platform cost, but conversely, what is the cost in lost time, developer turnover, etc., if we don't invest in a platform?"

Get the big picture on developer platforms

If the overarching big picture for platforms is to deliver value to the organization by getting value into the hands of customers, how can developer platforms support this?

Delivering real value from a platform depends largely on creating a good developer experience in order to ease cognitive toil and help get software shipped faster. Yet, in developing platforms, it is too easy to get bogged down in the day-to-day. Mathieu explained, "With platforms, people are too often absorbed in their daily work and immediate problems, losing the bigger picture. One key to building and making a developer platform successful is having a platform team that can take a step back from the daily noise and figure out the value stream, bottlenecks and solve at that level, routinely bringing in a fresh perspective."

Another key to stepping away from the daily grind is taking other perspectives into account. Paula recounted experiences in which different teams failed to speak to each other, and were not only creating siloed, duplicative platforms, but were not talking to product teams, looking at user research or validating that what they were building would be used. Platforms that succeed look at the big picture, knowing that things will scale up, and the limited scope one team brings to a platform won't serve the demands of growth or cross-functional use.

Conclusion: Standardize the platform; innovate with exceptions

Golden paths, Kaspar argued, are great, as "standardization forms the lowest common tech denominators, clearing the way for individual freedom where needed. We cannot get too hung up on the idea of not getting locked in, as Gregor Hohpe has argued, because avoiding lock-in is a kind of lock-in. Instead, focus on the standardizations that reduce complexity and help developers move faster."

Paula agreed, "Standardization is how to gain economies of scale and scope, helping organizations reap many benefits."

Mathieu's experience aligned with Kaspar and Paula's arguments. He explained that golden paths should cover most use cases, but should enable exceptions as needed to prevent getting locked in golden cages. "We make exceptions but afterwards go back and figure out how to introduce that exception into the golden path, maybe make it a new standard. In our case, when we introduced Kafka as an alternate way of doing asynchronous messaging, we decided to make it the new standard. Exceptions are important - for us, they drove innovation and allowed us to move forward."


Listen to the full conversation below

Transcript

Daniel Bryant (00:01):

Hello, and welcome to the Ambassador Labs podcast, where we explore all things about Cloud-native platforms, developer control planes and developer experience.

I'm your host, Daniel Bryant, head of Dev Rel here at Ambassador Labs. And today I have the pleasure of sitting down with a group of amazing people, including Humanitec CEO, Kaspar von Grünberg; Nesto's director of DevOps, Mathieu Frenette; and Syntasso's Chief Operating Officer, Paula Kennedy.

Join us for a fantastic discussion, covering topics such as building platforms in 2022, creating on-ramps for developers and exploring how to implement self-service and integrate the correct amount of standardization in platforms.

And remember, if you want to dive deeper into the motivations for, and the benefits of a Cloud-native developer control plane, or are new to Kubernetes and want to learn more, check out our free Kubernetes learning center. Visit getambassador.io to learn more.

So welcome everyone. Thanks for joining me today. Really appreciate you doing that. Can we have a quick round of intros around the room? And Mathieu, we'll start with you, please.

Mathieu Frenette (00:53):

Yeah. Hello. I'm Mathieu Frenette. I'm director of DevOps at Nesto. I'm leading a small team of four great DevOps engineers. So what we're doing at Nesto is that we're trying to revolutionize how people obtain and renew their mortgages online.

So we are really focusing on an exceptional user journey. So we've been on for four years. But in the past year, there has been explosive growth, especially in terms of market, technology and engineering team. So we are in the process of trying to provide the same kind of very exceptional experience as our customers have, but to our own developers.

So I'm trying to inspire myself from what I've done in the past, but now I'm doing it at a much greater scale and with much larger challenges. But I'm sure that we'll get to talk about that later.

Daniel Bryant (01:55):

Awesome. Kaspar?

Kaspar von Grünberg (01:58):

Sure. So I'm Kaspar. I'm a software developer, focused on platform engineering. I've been working on that space for quite a while now. I work with mainly larger enterprises, helping them on their platforming journey in my capacity as working for Humanitec. We're a platform orchestrator that's one component in the grand scheme of platforms. And that's my little contribution here.

Daniel Bryant (02:32):

Perfect. Perfect. Paula, over to you.

Paula Kennedy (02:34):

Hi, I'm Paula Kennedy. I am chief operating officer for a company called Syntasso, which sounds like a super grand title. But mostly, it means doing all the things that aren't the actual engineering of our product. So Kratix is our product that we've been working on.

It's in the platform space. But Syntasso's very much focused on helping customers try to build their own platforms. So it's kind of a framework for composing a platform, as opposed to an off-the-shelf, all-singing, all-dancing platform as a service.

Daniel Bryant (03:11):

Nice. Very nice. And I think a few of you hinted there at platforms and what platforms are. But I'd love to get everyone's take on what is a platform. Because so many times I bump into folks, and one person's platform is a CI pipeline; another one's is, it's got to be Heroku.

Someone else is talking about a Cloud vendor. So I'd love to go quick around the room. And we'll keep it relatively brief, and then we can dive into the details. But what do you folks think is a platform? How do you define platform? And Kaspar, we'll start with you.

Kaspar von Grünberg (03:38):

Sure. I can tell you what I think is not a platform. And if there's a vendor who tells you, "Hey, we're a platform or an internal developer platform," whatever, then they're actually a PaaS. That's like Heroku. Even if you deploy that to Amazon Web Services, that's a platform as a service.

For me, a platform is the sum of all tech and tools that a platform engineering team binds into a golden path or golden paths or paved roads. And it is used by developers to self-serve with low cognitive load. And that's very important to me. It drives standardization by design for the respective organization.

But what a platform is will remain a little bit of a blurry definition because it really lies in the eye of the beholder. A financial institution will have more focus on compliance, and there is not this one thing off the shelf. It's the question, it's the result of the great work that a platform engineering team puts in.

Daniel Bryant (04:35):

Perfectly said. And all the keywords I'm thinking of, which I'm sure other folks will touch on there. Mathieu, what do you think in terms of what is a platform?

Mathieu Frenette (04:43):

Yeah. I love your definition, Kaspar. I'll be more intuitive in the way I describe it. But I'd say that it's basically everything that we can provide as tools, components and foundations to help developers do their job more easily, faster and ideally, with more robustness, predictability, repeatability and confidence.

And I like to emphasize confidence because we really understand that confidence is really key to innovation. It's kind of the prerequisite for innovation. If we want people to be at ease, trying out new things, then they need to be confident that they have a solid platform underneath them.

Daniel Bryant (05:32):

Very interesting. I think very interesting. Paula?

Paula Kennedy (05:35):

Yeah. I think your point, Daniel, about platform means different things to different people really resonates. Because I'm like you, whenever I speak to people, I hear completely different definitions. And Kaspar's point made as well about people who are selling you a platform out of the box is not necessarily what you think it is.

I quite like the Team Topologies' definition about having a platform is basically just kind of a curated experience for the developers. And they talk a lot about the Thinnest Viable Platform, about providing just enough to give the developers what they need to get the job done. And I think that's really, for me, what a platform is.

It's providing enough capabilities that developers can go fast and have fast flow through the business. That's what Team Topologies is all about. But not having developers have to compile lots of things together or worry about underlying infrastructure concerns or security concerns. Providing a good developer experience.

But as thin as possible. Not having too many choices, not having too much cognitive load, not having to worry about too many things. Just having the platform that enables and gives them a good experience, as opposed to a platform that increases cognitive load or makes it harder for people to get their jobs done.

Daniel Bryant (06:59):

Yeah. I'm sure all of us have seen those. I know when I was consulting in London, I walked into a few companies where I was like, "What? You've just recreated Amazon internally. And how does that benefit your developers?" And they're like, "What developers?" Like, "Your customers." Which is maddening, right?

Paula Kennedy (07:10):

Yeah.

Daniel Bryant (07:10):

Always chat to the customer. And that leads on to what I want to explore next, actually, is what do you think the motivations are for building a platform? And I'm front-loading this by saying, is it a dev-led thing, an ops-led thing? Both? Neither? Would love to get everyone's thoughts on those things. Feel free, anyone who wants to jump in.

Paula Kennedy (07:35):

I have strong opinions on this.

Daniel Bryant (07:38):

Go off. Perfect, Paula. Start. We can all chip in.

Paula Kennedy (07:42):

So one of the things I've been talking about a bit in conferences and that is what we talk about in Syntasso, we call it this platform gap. But what we are really talking about is the gap between infrastructure commodity and value and whose job is it to get from one to the other?

So your question, Daniel, about ops versus devs. I think what lots of people are trying to do is solve that gap. Lots of people. Ops and dev are both trying to solve that gap. Because everybody wants to get the value in the hands of customers. That's the focus. But I think what we've seen is product teams, particularly very, very strong engineering teams have ideas about, "Well, I can just build this myself." So they have lots of thoughts of how to do it. And so they'll just build it.

And then another product team and another business division of the same company has the same thoughts. So they build the same thing. And then you end up with this duplication, where everybody's building their path to production all over the place. And you've got multiple platforms and multiple teams doing the same things.

And maybe you've also got an ops team that is trying to centralize. But the challenge then is if they're not talking to any of these product teams, and they're not doing any user research and they're not meeting their customers' needs, they're building a solution that maybe isn't going to get used by anybody.

So what I've seen is a lot of internal developer platforms (https://www.getambassador.io/resources/developer-advocacy/) being built by both dev and ops, because everybody is trying to solve their own thing. And somehow, people don't think an existing solution works for them, or don't want to buy off the shelf. So they'll build something even without the intention of necessarily building a platform.

So if a dev team thinks, "Oh, I'll just do this thing and then I'll just add a bit more to it. I'll just build a CI/CD pipeline (https://www.getambassador.io/developer-control-plane/ship/continuous-delivery-within-kubernetes/). And then I'm just going to add a few more things to it, a bit of a security thing." And then suddenly you get that classic image of Clippy, the Microsoft thing saying, "It looks like you're trying to build a platform."

Daniel Bryant (09:41):

Yes.

Paula Kennedy (09:41):

Without even intending to.

Daniel Bryant (09:44):

Yeah.

Kaspar von Grünberg (09:45):

I think that's a very valid point. You cannot not build a platform. That's I think the problem. And you can do it consciously or you don't do it consciously. And then basically, the platform builds itself. You either take that seriously and you get a platform product manager, which I think is the most important thing to do.

And then you really structure that and you have your users. I think, Paula, you're absolutely spot-on. I think that the question is who is asking for that? I might have a huge bias because for some reason, 95% of the people that come to us, at least, are coming from operations. And so the most common thing that I hear is like, "We're becoming ticket ops. We're so overwhelmed. We get repetitive requests."

Then you sometimes speak to the developers in the same organization. They say, "Oh, no. I think everything is fine. If I have a problem, I Slack somebody or send a Jira ticket or ServiceNow." Now, that means for the developers, the problem might not be as pressing because if I think about when do you not need a platform? It's if everything stays constant; you just need to do a git push. I update something into something that's already there. You don't need a platform. You need a platform if you do things that go beyond the simple update of an image.

Now, if I'm a developer, 80% of times, all I'm doing is I do git push update and I'm updating my business logic. Now for me, the 20% will feel like an edge case. If I'm the operations team and I'm only dealing with these 20% cases of the developer, and dozens of developers actually shout at me, I'm completely overwhelmed.

And so I'm seeing the vast majority, Paula, actually from the operations team. But that also differs by organizational size. So we would see a very large enterprise organization, a slightly different spiel here, where the developers are confronted with extreme waiting times.

So then if you have that, that actually turns the tide. And developers would be much more demanding of saying like, "Hey, we need to have more automation, more self-service." But in those smaller ones that are pretty autonomous and small, I'm speaking below 200, 300 developers. I'm often finding that the developers, I'm not sure whether they want to have things shift left. I can see them-

Mathieu Frenette (12:28):

Interesting.

Kaspar von Grünberg (12:28):

... also not always reacting very positively.

Daniel Bryant (12:32):

Definitely a topic for later. Perfect. Mathieu, do you have any thoughts on that?

Mathieu Frenette (12:36):

Yeah. I find that very interesting what you just said, both of you. I think that I'll talk from my point of view. I haven't seen super huge companies. I work in small startups and medium-sized companies. But what I've seen is that all the pipelines and foundations, they usually emerge from the developers themselves or the ops. But we haven't had big ops department in my cases.

But what often emerges from developers, who build their own CI/CD, they figure what they need to achieve and how to do it as efficiently as possible. They know what they're doing. And that's especially true in the beginning. When things are small, you want to stay lean. And so that's just great that things are as simple as possible. But eventually, things start to scale up and the reality changes.

So teams start to be pressured to deliver features. They are less focused on all the infrastructure, the foundation, the platform. The technical debt accumulates, and even the management rarely acknowledges the cost of technical debt or the necessity to just pay it off. So it often happens that the platform gets obsolete or no longer up to the task.

So I think that in the beginning, that's true that it can emerge from people doing the things, so the developers or ops. But I'd say that in the long run, we need the platform team. Because as Einstein said, "You can't solve a problem at the same level of thinking that created the problem."

Daniel Bryant (14:42):

Interesting.

Mathieu Frenette (14:43):

So people are often too absorbed in their work and the problems they need to address and they start to lose the bigger picture. So what the platform team can do is take a step back, extricate themselves from the daily noise, look at the big picture, talk to people, listen to what they have to say, figure out the value stream, the bottlenecks and then try to address that. So I think that it can start simple by the people, but at some point we need to take a step back and have that perspective.

Daniel Bryant (15:25):

I love it.

Paula Kennedy (15:26):

I have a question on that.

Daniel Bryant (15:27):

Go for it, Paula.

Paula Kennedy (15:30):

I really agree, Mathieu, with what you're saying. And one of the challenges I see with that exact evolution of the platform is then you end up where... Enterprise customers that I have worked with on a very, very big scale. And then they've got developers getting more and more pressure to deliver features, as you just mentioned. And there's more and more pressure on the platform and the infrastructure to be able to support the speed of change that needs to happen.

But then a problem I've run into, and I just wondered if you have, is where the money sits with those product teams, and the platform team is either underfunded, underinvested in and there's more and more pressure. But somehow the business prioritizes the products and that's where the money sits.

And then I've seen weird situations where somehow, the product team is under the most pressure to deliver features, maybe they end up funding their own platform, which then diverges from the central platform. And I just wonder if you've seen that?

It is an interesting thing where platform teams, I 100% agree, really have the opportunity to step back and look at the bigger picture and build the platform that meets the business needs. But sometimes I've seen companies not investing in that platform team to give them the opportunity to do that.

Mathieu Frenette (16:45):

Yeah, definitely. Definitely right. And often, the platform team can turn into a support team with, like you said, the ticket base. And you end up just having to resolve, to put out the immediate fires. And that detracts you from the platform itself.

So I think that you're right, that it's really difficult for management to see the value of the platform, or the platform team. Because it's one level removed from the value stream, from the ROI. So they don't see the value. Or it's very difficult to prove it, to demonstrate, "Hey, here, you see how much time you gain by doing this or that," or "How could we gain some time here or there?" It's all very abstract. So I think it can indeed end up having the platform team underinvested. Yeah, I agree.

Daniel Bryant (17:54):

I think on that note, something interesting, Kaspar, I'd like to get your thoughts on metrics, as the key here. Because I think if you measure the right thing, to your point, Paula, you can influence people's behavior. How you reward folks, generally is how they behave.

And I've seen when there's no metrics on platforms, it just gets ignored. But if you incentivize folks and put metrics on platforms, like the DORA metrics, Accelerate metrics, all the good work by Nicole Forsgren. Fantastic stuff. That, in my experience, can help. But Kaspar, I'd love to get your thoughts on can you use metrics to drive a successful adoption of a platform?

Kaspar von Grünberg (18:32):

Well, I think one of the problems that we're confronted with is that the platform engineering job function is terribly immature and the whole space is very rapidly evolving. But it's really just evolving. So if you look at what you previously did at Pivotal, Paula, that was actually a front-runner, an outlier in its time really.

If you look at the job profiles of platform engineers, you can see how that is going up. All of the data that we're observing, those communities are a good example. The uptake is really rapid. I'm seeing a significant increase in the willingness of a business to put the money where their mouth is, and then actually support the platform team. So I think we're seeing that maturity, and we're seeing that these functions have to learn, frankly, like SRE or DevOps teams.

And I'm aware that nobody likes to call them that way. But call them whatever you want, operations team. They learn to actually articulate their value. And that's, I think, very important. And one of the exercises that I recommend to platform engineering teams is to take a white piece of paper and write down what are the things that go beyond the simple update of an image. And then normalize that.

How often do you do that against 100 deployments? And how much time does that now involve for developers and from operations? And then force rank. And you can build your own ROI case. Every good product manager is looking at what's my contribution to the business. If you don't do that, then you can't blame the management that much. Because they're just making decisions with the data that they're confronted with. If they're not confronted with data, then they cannot make a decision.

Daniel Bryant (20:32):

Well said, Kaspar. Anyone else got any thoughts on that?

Mathieu Frenette (20:35):

Yeah. I fully agree on what you're saying. That's one of the challenges I'm facing right now. I'm not that good with metrics and all that, and I'm learning. But I find that some of those aspects that we'd like to tackle, we really feel that there's a big need for something very specific. In our case, it's integration tests and building all that.

The CI portion is very intricate and it's heavy for developers and it has tolls on many aspects. So it takes time to run, it's really difficult to set up when you want to troubleshoot, it's super complicated to assemble everything, to configure everything. And it's complicated in the CI pipelines. It's difficult to onboard developers, newcomers.

And also, we might even end up losing people in the end if it's too complicated to work in their work environment. But all of those things are really disseminated metrics, that if we want to gather all that up together, it will be a major endeavor to try to measure the cost to the business of just that aspect of things.

Daniel Bryant (22:00):

Super interesting. Paula, have you got any experience? I like Kaspar's phrasing there in terms of Pivotal was ahead of its time. I worked with Pivotal when I was at OpenCredo. We often went hand in hand to companies. And looking back, I 100% agree with that statement. Just the way of working, the connection of people and tech, was super ahead of its time. So I'd love to get your thoughts. Did you use any metrics to prove to the business the value you were delivering and that kind of thing?

Paula Kennedy (22:25):

We did. We talked about the five S's, which was: security, scalability, speed, savings. One other one that I can't remember. So we had a lot of metrics. And actually, it's interesting when I look back. I appreciate Kaspar's point. It's interesting when I look back. I was talking about developer experience and platform as a product in 2018 before Team Topologies came out, even. It was something we talked about a lot at Pivotal.

One of the most interesting experiences I had was doing a platform inception with a big bank. I don't quite know how we managed it. But we managed to persuade them for the inception meeting to bring along... We put this big request in, and we didn't think they were going to do it. We asked them to show up with the product team that was going to be putting their first application onto the platform.

Daniel Bryant (23:19):

Love it.

Paula Kennedy (23:20):

And the whole platform team: somebody from security, somebody from compliance and somebody from networking. We sent out this list of we want people from all these different stakeholders to show up into this inception for two days. And they did. And to Kaspar's point, one of the exercises we did was we put up this huge long whiteboard, there were whiteboard Post-it things. And then we mapped out the journey from, "Developer has an idea for something that they want to put on this application. And then what are all the steps? What are all the steps that have to happen to get it to production?"

And we drew a line. And we basically asked all the teams to put all their steps up there. And then what was interesting was the platform team, or operations team, I think they were probably described at that point, put all their stickies up. And then we asked the developers to go up and say, "Which steps out of this massive journey do you actually care about? And anything you don't care about, pull them below the line."

And so the developers literally took most of those stickies down. And we're like, "Yeah. We don't care about all these things. We just want to see a dev environment, a journey to push and then go live." It was very simplified. And I think there were so many conversations that happened in those two days, between all those stakeholders.

Daniel Bryant (24:37):

Brilliant.

Paula Kennedy (24:37):

Where they've never had that. And to be able to map out that. We did a lot of value stream mapping at Pivotal as well. But to map out that journey and then look at... There were so many steps in the process: which ones take the most time? Which ones can we cut out? Which ones don't... Nobody cares about. Everyone thinks compliance cares about this step, but actually, compliance are sitting in the room and they don't care about that step.

Daniel Bryant (24:57):

Oh, fascinating. Yeah.

Paula Kennedy (24:58):

Those conversations, that was such a fantastic exercise.

Daniel Bryant (25:01):

I love the Value Stream Mapping and also User Journey Mapping. Reading about that many years or a few years ago, the book. That kind of stuff. And a few of us have said several times, think about the customer, think about the blockers and then dive in.

I think that's what I'm hearing from all of you. You've got to step back a little bit, some point in the journey to a platform. Do your analysis and go, "What's our next steps?" And that leads on to my next question actually. And let's maybe look at this, perhaps one, and then the other. If a platform is being developer-led, where do you recommend folks start in that journey? What's the most important thing? Or is that a daft question?

Kaspar von Grünberg (25:44):

I can tell you what I'm observing most. I'm very careful with recommending stuff because it's so specific to the business that you're in. And then you say things and people respond, and it has a very large impact on their daily life. So I'm a little careful. But what I'm seeing most often that I think teams should fix is configuration management.

I think this is something that's so underappreciated. Frankly, we see on average that the thing that you do most often, apart from just updating an image, is just frankly, adding an environment variable or applying a change across all environments. And if you don't do that well, you really drive change failure rate through inter-environment drift.

And so this is something that's not that complex. You just need to look at it and maybe streamline the way you work against baseline charts. And what's your strategy of actually aligning applications and infrastructure configurations? That's, for me, something that feels maybe not super intuitive, but that I'm seeing a lot. This assumes to be fair, that your CI/CD stream, your CI stream, your git push, deploy stream, that's figured out. If that's not figured out right, that's probably what you want to look at first.
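Editor's note: the inter-environment drift Kaspar describes can be illustrated with a minimal sketch. It assumes each environment's configuration is rendered from one shared baseline plus a small set of overrides, and then compared with what is actually deployed. Every name and value here (`BASELINE`, `OVERRIDES`, `render`, `find_drift`, the variables themselves) is hypothetical and not taken from any tool mentioned in the conversation.

```python
from typing import Dict, Set

# Hypothetical shared baseline that every environment starts from.
BASELINE: Dict[str, str] = {
    "LOG_LEVEL": "info",
    "DB_HOST": "db.internal",
    "FEATURE_X": "off",
}

# Hypothetical, deliberately small per-environment overrides.
OVERRIDES: Dict[str, Dict[str, str]] = {
    "dev": {"LOG_LEVEL": "debug"},
    "staging": {},
    "production": {"DB_HOST": "db.prod.internal"},
}


def render(env: str) -> Dict[str, str]:
    """Expected config for one environment: baseline plus its overrides."""
    return {**BASELINE, **OVERRIDES[env]}


def find_drift(deployed: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Per environment, list keys whose deployed value differs from the
    expected one, including keys that are missing or unexpected."""
    drift: Dict[str, Set[str]] = {}
    for env, actual in deployed.items():
        expected = render(env)
        keys = set(expected) | set(actual)
        changed = {k for k in keys if expected.get(k) != actual.get(k)}
        if changed:
            drift[env] = changed
    return drift
```

For example, if someone adds `NEW_FLAG` by hand in staging only, `find_drift` reports `{'staging': {'NEW_FLAG'}}`; the fix is to add the variable to the baseline (or to the staging overrides) rather than to the running environment.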

Daniel Bryant (27:23):

Love it. Anyone else got any thoughts?

Paula Kennedy (27:27):

I think where I come at it is probably a different place to Kaspar. Thinking about where to start. My natural instinct is always to start very, very high level. But I try to think about something like the Wardley mapping, where I think about, "What's commodity?" And then, "What's the unique part that you should focus on?"

And so I completely take Kaspar's point about there's certain building blocks that have to be in place, that you have to have the CI/CD pipeline, you have to have certain things. But where you try to analyze maybe the full set of requirements that you need, bearing in mind Thinnest Viable, but the set of requirements. And then think about build versus buy.

What is not unique? What is not special to me that I can just bring in? Whatever tool is the best tool, or whatever the team has a bit of knowledge on that is the lowest cost to onboard onto. Or what seems to be the most popular and supported thing in the community that I can use and therefore, lean on community to support? Bringing in the things, or if you've already got existing things. Again, to Kaspar's point earlier, everyone ends up with a platform. You've got a platform somewhere, even if you don't know it.

Daniel Bryant (28:48):

Whether you've got it or not. Yeah, totally.

Paula Kennedy (28:51):

But looking at what you've got and then looking at what's your differentiation? What's the bit that you really need to focus your platform engineering team on? Versus what's things that you currently have that maybe someone has built and now it's being supported by one person who's left the business maybe?

Try to analyze from a very high level what are the bits that you can just get 90% or 95% of what you need out of the box and compose those? And then what's the bit you focus on, which is your special, unique sauce? And then try to think, "Okay. This is what we've currently got. Let's swap out some of these things, standardize on these things and add extra bits where we actually have the unique needs that we actually uniquely need, and that we just don't think are unique. And actually we could just buy off the shelf." Does that make sense?

Daniel Bryant (29:38):

It does, Paula. Definitely I've seen on my journey, folks building their own Kubernetes clusters now. Most times I push back and go, "Why aren't you using GKE, AKS? Take your pick of vendor." Some very special needs, you need to Terraform it yourself. But more often not, you don't.

And I love Wardley mapping. I've thought about it a few times in this conversation, Paula. You mentioned about looking for duplication. Wardley mapping is really good at looking for duplication across businesses. But I just think that's a very, very interesting statement. Mathieu, have you got any thoughts on those?

Mathieu Frenette (30:09):

Yeah. If it emerges from developers, for example, if you have a small startup with a small team, and they have everything in their hands and they want to make sure that they don't paint themselves into a corner. What I'd say is really to just stick to best practices in the industry. There's a lot of content about that.

For example, the Twelve-Factor App. So Kaspar mentioned adding environment variables and all that. It assumes that you do control everything through environment variables in the first place. So just doing those things, sticking to one app per process and sticking to regular signaling, like Linux-based signaling or whatever on Windows. But to put everything in containers. If you do that in the first place when you start, then all the possibilities are really open to you afterwards.

Do you really need Kubernetes, or do you want to go on Heroku? All those are still possibilities if you did create your application with best practices in the first place. And I really like what Paula said about not reinventing the wheel. When you're designing your system, to really focus on value.

So what I always try to advocate for is not try to host everything yourself. With Kubernetes, it's super easy to spin up a database or whatever you need, a Kafka cluster and all that. But just try to refrain from doing that, just use managed services as much as possible, externalize all those concerns. And like you said, Kubernetes cluster, use a managed Kubernetes service. And really focus on building a good application with best practices.
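Editor's note: two of the practices Mathieu mentions, controlling configuration through environment variables and responding to standard process signals, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from nesto: the variable names `DATABASE_URL` and `PORT`, the `Settings` class, and the functions `load_settings`, `handle_sigterm` and `run` are all hypothetical.

```python
import os
import signal
import threading
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read all configuration from environment variables (names are illustrative)."""
    return Settings(
        database_url=env["DATABASE_URL"],   # required: raises KeyError if missing
        port=int(env.get("PORT", "8080")),  # optional, with a default
    )


# Set when the process has been asked to stop.
shutdown = threading.Event()


def handle_sigterm(signum, frame) -> None:
    """Request a graceful shutdown when the platform sends SIGTERM."""
    shutdown.set()


def run() -> None:
    """Load config, register the signal handler, and loop until asked to stop."""
    settings = load_settings()
    signal.signal(signal.SIGTERM, handle_sigterm)
    print(f"listening on port {settings.port}")
    while not shutdown.wait(timeout=1.0):
        pass  # placeholder for real request handling
    print("shutting down cleanly")
```

Because nothing environment-specific is baked into the code, the same container image could run on Kubernetes, Heroku, or a laptop; only the environment variables change. Calling `run()` starts the loop, and sending the process SIGTERM ends it.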

Kaspar von Grünberg (32:21):

And frankly, Jason Warner said that, the CTO of GitHub. And he was with Heroku previously. He was a little bit biased, but I'm not biased. I think I can say it. Actually, 90% of the world should run on Heroku. That's really my core belief. The vast majority of teams that I've seen, I'm always wondering, what's the material difference for you, running these ultra complex things?

You're literally taking a tractor to do your grocery shopping and it's not very productive. But what resonated with me, Mathieu, is you want to, as a development team, I think... And Paula, you said that as well, it has a lot of cultural components to it. And even more important, than fancy golden paths are frankly, to sit down and to come to an agreement about the lowest common tech denominator, if you want.

Right. And that could be the Twelve-Factor App. That could be, "Hey, if we're using Kubernetes, okay, let's do that. But we are going to standardize on Helm and EKS; that's our standard flow." And this is what the entire business is focusing on, across all units. You can sprinkle in a little Lambda here and there. That's fine.

But if you really scatter things all over the place, a little bit here, a little bit there, you're not actually getting the effects of scale. All of these fancy tools you're bringing in, they will not actually deliver their ROI. You're making life so tough for the operations team. So that's why I'm always advocating, sit down, spend too much time explaining the value of standardization. Make clear that standardization does not go at the expense of individual freedom, that the contrary is actually the case. And a setup is not better if you find the last niche technology to fit a certain case.

I'm always taking an example that engineers hate, and I'm still taking it. So imagine you're a sales team. And imagine you sell to the enterprise. And you go into Salesforce, and you completely customize Salesforce because you say I actually don't want to have my processes follow a software in Salesforce.

Now, is that a good idea? Have you ever met any high-profile sales team doing that? No. They're actually designing their processes, following the software. And Gregor Hohpe speaks about this as well a lot, "Don't get locked in avoiding lock-in." Thinking about these lowest common tech denominators is something that I find very, very important.

Daniel Bryant (35:31):

Oh, I love it. I love it. It reminded me a bit of the conversation you had, Mathieu, in your presentation around... Was it golden paths and golden cages? You were almost the opposite to what Kaspar was saying there. The 90% of the world on Heroku, I think I need to ponder that. A super interesting quote.

But Mathieu, I'd love to get your thoughts on a counterpoint to that. Because my understanding, where you were talking around the golden cages was if we're too opinionated and force folks into one way of doing things. Did I understand correctly?

Mathieu Frenette (36:04):

What I meant about that was that it's all right to have golden paths, and to be super opinionated, as you said, Kaspar. In our case, for all our databases to be all on MySQL, ideally all the same version, we try to really stick to that as much as possible.

All the asynchronous communication, we try to do it through Pub/Sub. And eventually, for some technical reason, we were forced into going to Kafka. But we're still trying to minimize the exceptions. But it's okay to have exceptions. So that's what I meant about golden cages. Golden paths should be the easiest path, should be the best one, the one that covers most of your cases. But it's almost impossible to avoid exceptions. And if developers don't have those exit routes to do some exceptions, they will be stuck on your golden path.

And it's no longer a golden path; it's a golden cage or a golden tunnel. So that's what I meant about that. And that's why I mentioned the three levels of abstraction. First one, being complete recipes, that's your best golden path. And then you have maybe a pre-paved section of paths that developers could place in the order they want.

And then even if you have that, you still need a third escape hatch. Do whatever you want. You can still do that within the same ecosystem if there's an exception. Let's just do it. And then, afterwards, we'll figure out how we can maybe try to bring that back into our golden path. Maybe make it a new standard. For example, in our case, when we introduced Kafka as an alternate way of doing asynchronous messaging, we decided to make it a new standard.

So now it's no longer an exception; it's a new way of doing asynchronous messaging. And we'll even try to phase out the Google Pub/Sub and try to do everything through Kafka eventually. So it was okay to have that exception in the beginning. It drove innovation, it allowed us to move forward. But it shouldn't stay that way. You shouldn't have exceptions that are growing everywhere, all over the place. You should try to eventually tie them up, and bring them back into a standard golden path.
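Editor's note: the three levels of abstraction Mathieu describes (complete recipes, pre-paved sections that developers arrange themselves, and an escape hatch) could be modelled roughly as below. This is a hypothetical sketch, not nesto's implementation; the recipe and block names, the `Plan` class and the `make_plan` function are invented for illustration. The one idea it tries to capture is that escape-hatch usage is recorded, so exceptions can later be reviewed and either retired or promoted to a new standard.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union

# Level 1: complete recipes, the "best golden path" (names are hypothetical).
RECIPES: Dict[str, List[str]] = {
    "web-service": ["container-build", "mysql", "pubsub", "deploy"],
}

# Level 2: pre-paved blocks that developers can place in the order they want.
BLOCKS: Set[str] = {"container-build", "mysql", "pubsub", "kafka", "deploy"}


@dataclass
class Plan:
    level: str        # "recipe", "pre-paved" or "escape-hatch"
    steps: List[str]


def make_plan(request: Union[str, List[str]], exception_log: List[str]) -> Plan:
    """Resolve a request to the most standard level that can satisfy it."""
    if isinstance(request, str):
        # A recipe name: the whole path is already decided.
        return Plan("recipe", list(RECIPES[request]))
    custom = [step for step in request if step not in BLOCKS]
    if not custom:
        # Only known blocks, arranged by the developer.
        return Plan("pre-paved", list(request))
    # Level 3: allow it, but record the exception for later review.
    exception_log.extend(custom)
    return Plan("escape-hatch", list(request))
```

In this sketch, a team that needs a step outside the known blocks is not stopped; the step simply shows up in `exception_log`, which is where a platform team would look when deciding what to standardize next.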

Daniel Bryant (39:01):

Fantastic. Fantastic. I know we're getting close to time here. Really appreciate everyone's input. I think we could talk for days. I'm sure if we meet at conferences, we probably will do. So I'm looking forward to that. Actually meeting in person would be fantastic. But just as a wrap-up, any pithy final thoughts to the conversation we've been having today?

And I'd love if each of you, in turn, just go through where folks can reach out to you as well. If they want to continue the conversation, they want to explore the work you're doing. That'd be much appreciated. So Mathieu, should we start with you? Final thoughts and where folks can reach you?

Mathieu Frenette (39:28):

Yeah. Just to say I am super excited with everything that's happening in the platform space. And all my team, we're really diving into that. And it's a brave new way of perceiving all the challenges that we are facing. We are hiring, we are growing so fast and we are hiring in the DevOps team, but also all forms of engineering roles.

So if you're interested, just visit our careers page on Nesto.ca. And otherwise, I'm available on LinkedIn and I have a few funny videos on YouTube about recruiting at Nesto.

Daniel Bryant (40:20):

Nice. Check that out. Brilliant. Kaspar, final thoughts and where folks can reach out to you?

Kaspar von Grünberg (40:26):

Sure. I think there's this one more thing that's really important to me and that's around abstraction. If I'm speaking about finding lowest common tech denominators, and I'm thinking that you should actually use Heroku. I very much agree with Mathieu. We should actually stop talking, or I want to stop talking, let's speak for myself, about abstraction at all.

I don't think abstraction is a good word. Abstraction always implies that you're taking away context. And context is incredibly important if we want to do our work well as an engineer. I think about what I love as a concept and what I'm diving into more and more is, what I call, standardization by design. How can we build flows? How can we build concepts that just by using the human behavior and psychological profile, really lead to standardization inside of an organization?

And I find myself guilty that some of the platforms that I advocated for or built a couple of years ago, now have been very UI-heavy, for instance. It's something that I'm completely getting away from. I'm a big advocate of everything as code, full focus on making sure we simplify without ever taking away context.

That's very important for me. And you can reach me at kaspar@humanitec.com. I much appreciate if people reach out and share their critique or feedback. Not many people do, unfortunately. So I'd really love to hear from you.

Daniel Bryant (42:16):

Fantastic. Thanks guys. And I think there's a whole new podcast around abstractions and I think that's super interesting. And the sociotechnical aspects there, I think are fascinating too. Paula?

Paula Kennedy (42:27):

My final thoughts are really around something that Kaspar mentioned about everybody wants a Heroku, they just want to build it themselves. I feel like that's a kind of a Kelsey Hightower quote.

Daniel Bryant (42:35):

Definitely.

Paula Kennedy (42:35):

Everybody wants the PaaS, just as long as they've built it themselves.

Daniel Bryant (42:38):

Love it.

Paula Kennedy (42:40):

It's definitely something I learnt from my time at Pivotal. Pivotal Cloud Foundry, as it was then, was similar to Heroku and perfect for Twelve-Factor apps. But to Mathieu's point, the problem we faced, time and time again, as we rolled it out to customers was it was absolutely fantastic for a certain percentage of workloads, but there were always exceptions.

And one of the things we learnt from Pivotal days was Pivotal was a very opinionated platform, very opinionated company. But we showed up with strong opinions. And it was perfect for a lot of things, but not perfect for everything.

And so this is something we've learnt. My team at Syntasso, we've learnt from our experience there. And this is where we are now very much in agreement with the customer. Standardization is the way that you can gain economies of scale and economies of scope. You have lots of benefits. But there were always edge cases. And the way Mathieu talks about taking those edge cases and then making them as standard as possible, bringing them in.

Daniel Bryant (43:41):

Love that.

Paula Kennedy (43:41):

I absolutely loved that as well. I think that's fantastic. But I think anyone who is thinking of either building or buying a platform, has to try to focus on making the 80% as standard as possible, but leave room for those edge cases.

You have to add that extra unique value on the top. There's that magic sauce that your business needs that only you can actually build. If folks want to reach out to me and give me any thoughts or feedback on that, they can reach me. So Syntasso.io is our website with the contact details. Or you can reach me on Twitter. I feel like I hang out on Twitter quite a lot, a bit too much.

Daniel Bryant (44:16):

Same here, Paula. I'm the same.

Paula Kennedy (44:17):

Way too much. So my Twitter handle is @PaulaLKennedy. So you can find me there as well.

Daniel Bryant (44:22):

Fantastic. Really appreciate everyone's time. Thanks a lot.
