LIVIN' ON THE EDGE PODCAST

Developer Control Planes: An Engineering Director's Point of View

Ambassador Labs · S2E09: Crystal Hirschorn on Socio-technical Systems, Building Platforms, and Developer Culture

About

Managing three different engineering functions (Cloud Platforms, SRE, and Developer Experience), Crystal advocates understanding the operating model of both the teams you are building and the company you are building them in. That means building teams and systems informed by how people and technology interact in the workplace. How do human factors interact with technical ones?

Episode Guests

Crystal Hirschorn

Director of Engineering - Infrastructure, SRE and Developer Experience at Snyk

At Snyk, Crystal explains, she leads one of the only teams that has to serve both internal and external customers. Not only do they build deployment options for Snyk, i.e., single-tenant SaaS, multi-tenant SaaS, and on-prem, the team also provides "paved paths", or control planes, internally for platform and observability capabilities. To do this effectively, the team needs to be built to understand the business, technical and human factors involved in delivering to external customers as well as those motivating and helping internal customers.

Crystal Hirschorn, Director of Engineering - Infrastructure, SRE and Developer Experience at Snyk, shared some time with Ambassador Labs's Head of Developer Relations, Daniel Bryant, to talk about all things platform engineering and building control planes that “pave a path” for developers.

Here are some of the key takeaways from their conversation:

Platform engineering vision: Playing the long game

Platform engineering for growing teams requires a lot of perseverance and the ability to think in the long term with a clear vision in mind. You are playing the long game when you build a platform for developers, thinking both about what kind of control plane will work today and will still work in a few years when hundreds or even thousands of developers have joined your team.

"As I've gained more experience, I see the bigger picture: you have to think about the market that the company operates in, how mature it is, what it is trying to achieve in terms of business and technical goals. And by extension, how should the team be composed to serve those goals, for both internal and external customers?"

"You have to realize that things happen on different time horizons. You have to realize that you may not see results immediately. It takes a certain kind of person to want to work with platforms as well. Someone who can start to see the outlines of what the paved road will look like miles down the road. What does the road look like where you are and where you want to get to? What does the end state look like to know what you are working backwards from? How will we get there practically?"

Ultimately a platform team may be traveling from a "minimum viable platform" to the other end of the spectrum, which might be, as Crystal describes, turnkey: a self-service platform that offers an easy way for a developer to get new instances up and running without any outside involvement, and with the ability to continuously and safely deploy across a fleet of global infrastructure. That's the vision at Snyk, and platform engineering and achieving automation goals will make it happen.

Bring people to the platform

Even if technology and technical choices underpin the building of a platform, the success of a control plane and a good platform engineering endeavor relies on meeting the needs of the humans who use it. But how? As Crystal warns, it is not as easy as building the tech and hoping that your internal users will adopt it. Developer input must go into it. You must advocate for it, and non-functionals like documentation and playbooks must be first-class citizens. You must win over early adopters internally, who will then take on wider advocacy, i.e. "championing developers who are resources for other developers."

"The best platform is designed to do its job and get out of the developer's way. We want to enable a paved path without barriers. We can build that platform, but it's useless unless you get the people it is designed for on your side. It is powerful to get the developers who use and advocate for the platform to rally adoption and enthusiasm."

Keep people using the platform: Culture, education, experimentation

The biggest change a platform introduces isn't technical; it's cultural. Human nature isn't designed to embrace change. Yet with a platform, change comes in the drive to codify good practices within the platform, which becomes more important the more production environments there are. Best practices and standards need to be embedded in the platform and the culture.

And how does culture change? Apart from the internal advocacy described above, education and the freedom to experiment are key.

Platform education

Snyk has developed a range of educational materials, including extensive documentation hosted in Backstage, which the team uses as a developer portal and ecosystem, adding capabilities and integrations to it for the rest of the organization. Training and onboarding sessions are frequent and popular, both with the rapidly growing set of new hires and with existing employees.

"Different people learn in different ways, so educating developers about a platform is also about being in touch with them and understanding their needs and having a rapid feedback loop. We want to remove barriers to entry as easily as possible. Having a developer experience team will really help us with this inward-facing effort, listening to the R&D organization and bringing that feedback in."

Platform experimentation

Many engineers may prefer the paved path of the platform. But what about when an engineer wants to do something that is not a part of the platform?

"As part of our platform migration, there was one team that needed a resource we didn't really have the capacity to provide, but we said, 'Here is a playbook you can follow,'" Crystal explains. "We want to encourage an ecosystem of components. Get your hands dirty where you're comfortable, be that with Terraform, Helm, or a higher level UI, but you can use the platform as it is just as easily."

"The goal of platform engineering is to reduce developer toil and pain, and maybe even find interesting cases where our platform does more than just reduce pain and instead brings joy. We aim for delivering a developer platform that is a pleasure to use and that makes a developer's work easier."

Transcript

Daniel Bryant (00:02):

Hello, and welcome to the Ambassador Labs podcast, where we explore all things about cloud native platforms, developer control planes, and developer experience. I'm your host, Daniel Bryant, head of DevRel here at Ambassador Labs, and today I had the pleasure of sitting down with Crystal Hirschorn, director of engineering, infrastructure, SRE, and developer experience at Snyk. Join us for a fantastic discussion covering topics such as building sociotechnical systems, playing the long game with platform engineering, and the importance of culture, education, and experimentation when creating platforms. And remember, if you want to dive deeper into the motivations for and the benefits of a cloud-native developer control plane, or are new to Kubernetes and want to learn more in our free Kubernetes learning center, please visit getambassador.io to learn more.

Daniel Bryant (00:43):

So welcome, Crystal, many thanks for joining us today. Could you briefly introduce yourself and share a little bit about your background, please?

Crystal Hirschorn (00:49):

Sure, thanks for having me. So I'm Crystal Hirschorn, I'm currently director of engineering for the infrastructure group at Snyk. I look after three teams there currently, so cloud platforms, SRE, and developer experience. Previous to this, I was a VP of engineering at a company called Condé Nast, which was a global role there. And I've been a longtime engineer, I started my career about 20 years ago.

Daniel Bryant (01:16):

Ah, similar to me. We've seen it all, right? We've been through the individual contributor roles, we've done the management roles, and everything. Oh, super interesting, I didn't realize you had the three teams, like the cloud platform, SRE, and developer experience. Now they, at first glance they're quite similar, but also quite different. So how do you find managing, leading those teams? Is it different hats, different times?

Crystal Hirschorn (01:36):

Yeah, it depends. I use a lot of inspiration from things like Team Topologies to kind of build out my organizations. I might've mentioned them really early, because I ...-

Daniel Bryant (01:44):

You got to, right

Crystal Hirschorn (01:46):

... so much, and I have, yeah, just great ... We used DevOps Topologies at my last company as well to sort out the infrastructure part of that organization. And then they wrote the book Team Topologies thereafter, but yes, I was quoted in the book.

Daniel Bryant (02:04):

Wow, that's awesome.

Crystal Hirschorn (02:06):

So are people from my team. Because I ended up doing a talk, I think it was at KubeCon, and I ended up giving them a shout-out.

Daniel Bryant (02:11):

Ah, brilliant.

Crystal Hirschorn (02:13):

Manuel was in the audience, and he was ...

Daniel Bryant (02:18):

Oh, I remember that. I remember it, yeah.

Crystal Hirschorn (02:19):

Yeah, it was amazing, he couldn't believe it, I think, because I was like-

Daniel Bryant (02:19):

Yeah.

Crystal Hirschorn (02:21):

... "This is a great thing that we use, I endorse this, please use this when you're thinking about building teams." Because I think as I've got more experience, it's about the operating model, it's like, thinking about it. We talked about it a second ago, sociotechnical systems, you have to think about like, what is the market that the company operates in? How mature is it? What is it trying to achieve, in terms of business goals and technical goals? What should the makeup of those teams look like as well to serve the customer? As that could be internal and external, for that matter.

Daniel Bryant (02:50):

Yeah, yeah.

Crystal Hirschorn (02:51):

And so for us, I'm in an interesting spot at Snyk, because it's been a fast-growing company, but my group is one of the only groups that has to serve both external customers and internal customers.

Daniel Bryant (03:02):

Interesting.

Crystal Hirschorn (03:02):

We build the deployment options for Snyk, which is single-tenant, multi-tenant, and on-prem, and we also have to provide paved roads internally for platform capabilities and observability capabilities as well. And now our most recent team, developer experience, it only got going just over a month ago.

Daniel Bryant (03:21):

Oh, interesting. Oh, okay.

Crystal Hirschorn (03:22):

Yeah, so it's like our most recent team, and we're still trying to define its charter somewhat, but have a pretty good handle on that. Some of it is about decomposing some of the scope that was already in cloud platforms, in terms of the platform that they've built, and handing them off to developer experience, particularly around things like improvements around CICD.

Daniel Bryant (03:40):

Nice.

Crystal Hirschorn (03:41):

And kind of, we've, the first thing that that team has done, actually, is build a DORA metrics dashboard.

Daniel Bryant (03:45):

Ah, excellent.

Crystal Hirschorn (03:47):

Yeah, it took a couple weeks. They started off doing a hackathon, basically, just building a DORA metrics dashboard, but that was a really cool way to kickoff a team. They had a lot of fun doing that, but it's had a lot of praise internally as well.

Daniel Bryant (03:57):

And just for those not familiar, Crystal, DORA metrics is like say lead time, delivery frequency-

Crystal Hirschorn (04:03):

Yeah

Daniel Bryant (04:03):

... mean time to failure and resolution, that kind of stuff, right? Nicole Forsgren, all the ... Gene Kim, fantastic work with Accelerate, as well. And you found that understanding those DORA metrics has really helped, I guess, get a level set of where the team's at and where you want to go.

Crystal Hirschorn (04:18):

Exactly. Like from my point of view, the reason why we did it is that we're going under a decomposition of our monoliths internally.

Daniel Bryant (04:25):

Ah.

Crystal Hirschorn (04:26):

So, Snyk still has a monolithic architecture internally.

Daniel Bryant (04:29):

Oh, interesting.

Crystal Hirschorn (04:30):

With some services that sit outside of that, but we want to go for a fully decomposed service-oriented architecture. I'm not going to say microservices necessarily, because everybody likes that buzzword, but that might be a next step that we take. It might not be, we just have to see where we go with it. But I think for us to know how we're doing, we need to know what our current state is, and we need to know ultimately where do we want to be in terms of performance and delivery metrics around our SDLC? And the only way we can do that is to have the visibility, and to kind of make those things visible.

Crystal Hirschorn (05:00):

Like I feel like the data is latent, it's there, right? It's in GitHub, it's in CircleCI, it's in Argo CD, it's in Kubernetes. And that's, I'm basically telling you part of our tech stack just by describing those things. But we were able to then hook into a lot of those APIs and get the data that we needed out of them to actually visualize those into graphs the teams can now understand and make decisions off of.

Daniel Bryant (05:24):

That is awesome. And I appreciate some of this might be sensitive, so feel free to say no, but was there any amazing insight that jumped out straightaway?

Crystal Hirschorn (05:32):

Oh, for sure. The most interesting thing for us was we know that we have a monolith, and there's kind of this perceived gut feeling around how many times builds fail versus succeed on that monolith. It fails more often than we would like. That was one insight. I think it was a bit higher than we had first anticipated. I think the, yeah, actually I'll mention one more thing as well. It also shows the kind of lead time, the way we track lead time is PR being open until the time that it's actually in front of the eyes of the customer. So for us it's like, I think what we actually choose as the end metric is that the Argo sync has been successful.

Daniel Bryant (06:17):

Got ya

Crystal Hirschorn (06:17):

Maybe it's a few milliseconds longer than that, whatever, but actually being in front of the eyes of customers, be serving traffic, whatever. But that's close enough in terms of a proxy. And we found that during that cycle time, PR review is where the bottleneck is. It's, again, this insight, right? It generates insights, it doesn't necessarily give you answers. But at least lets you know where to focus on in terms of then like, oh, systemically, how should we be looking at fixing that as engineering leaders inside the company?
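
The lead-time definition Crystal describes here (PR opened until a successful Argo CD sync) and the search for the slowest stage can be illustrated with a minimal sketch. It is not Snyk's dashboard code: the Change record, its field names, and the three-stage split are assumptions made for illustration, and the timestamps would in practice come from whatever the GitHub, CircleCI, and Argo CD APIs expose.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Change:
    """Timestamps for one change. All field names are hypothetical."""
    pr_opened: datetime
    pr_approved: datetime
    merged: datetime
    sync_succeeded: datetime  # successful Argo CD sync, used as the "live" proxy


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def lead_time_hours(change: Change) -> float:
    """Lead time as described in the episode: PR opened to successful sync."""
    return _hours(change.pr_opened, change.sync_succeeded)


def stage_hours(change: Change) -> dict[str, float]:
    """Split the lead time into assumed stages so a bottleneck can be located."""
    return {
        "review": _hours(change.pr_opened, change.pr_approved),
        "merge": _hours(change.pr_approved, change.merged),
        "deploy": _hours(change.merged, change.sync_succeeded),
    }


def slowest_stage(changes: list[Change]) -> str:
    """Return the stage with the largest median duration across changes."""
    stages = ("review", "merge", "deploy")
    medians = {s: median(stage_hours(c)[s] for c in changes) for s in stages}
    return max(medians, key=medians.get)
```

As the conversation notes, a result like "review" is an insight rather than an answer: it only says where to look next.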

Crystal Hirschorn (06:50):

But another interesting aspect was, I had a pretty good vision for it, I was like, "All right, okay, what I want to do here is," because I've obviously read Accelerate a couple of times, and I've seen people like Nicole speak at a couple of conferences and that kind of thing. And I was like, "Let's overlay the," they talk about like the four indicators that make high-performing teams, but they also put them into kind of quartiles, right? Like low performing, kind of medium performing, high performing, elite performance as well. And so I said, "Let's overlay those onto the top of the graph." And then it just looks like we're in the high to elite performance all the time.

Daniel Bryant (07:28):

Nice.

Crystal Hirschorn (07:30):

And I was like, "Wait, hold on. Actually, maybe that's not the right thing to do as a six year old cloud native company," right? We were fully cloud-native from day one, and so CICD was just something we always did, and so we deployed to production, I don't know, between sort of 20 to 50 times a day at least.

Daniel Bryant (07:49):

Oh wow, that's good, yeah.

Crystal Hirschorn (07:51):

But then I was like, "But we know we want to be better." But when you look at the DORA metrics, and the way that they do this, they canvass the whole industry.

Daniel Bryant (08:02):

Yes.

Crystal Hirschorn (08:02):

And so they're looking at companies that, where their lead time is six months to a year.

Daniel Bryant (08:06):

Yeah, I know, yes, yeah.

Crystal Hirschorn (08:08):

All the way back to teams that are doing more deployments per day than us, right? And so it's like, it's kind of weighted in a particular way. So we're like, "Maybe what we need to do is create our own internal benchmarks of what we think good looks like." So yeah, that was a learning, anyway. I kind of wanted to share, was you can kind of sometimes be thinking this is the right thing to do, and we're like, "Yeah, maybe we should look at our own benchmarks," so yeah.
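
One way to read "our own internal benchmarks" is to derive bands from the team's own history instead of industry-wide ones. The sketch below assumes that interpretation: it takes quartiles of past daily deploy counts and classifies a new day against them. The function names, band labels, and sample numbers are hypothetical and do not come from the episode.

```python
from statistics import quantiles


def internal_bands(daily_deploys: list[int]) -> list[float]:
    """Return the three quartile cut points of the team's own deploy history."""
    return quantiles(daily_deploys, n=4)


def classify(value: float, bands: list[float]) -> str:
    """Place a day's deploy count into one of four internally derived bands."""
    labels = ["bottom quartile", "below median", "above median", "top quartile"]
    # Count how many cut points the value exceeds; that count is the band index.
    index = sum(value > cut for cut in bands)
    return labels[index]


# Hypothetical history, loosely in the "20 to 50 deploys a day" range mentioned above.
history = [20, 25, 30, 35, 40, 45, 50]
bands = internal_bands(history)
print(classify(48, bands))
```

Because the bands move with the team's own data, a company that is already "elite" on industry-wide scales can still see whether it is improving or regressing against itself.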

Daniel Bryant (08:31):

That is interesting, getting a handle on where you're at and where you aspire to be. Because I've definitely tried doing, even the podcast, right, to folks that are much more conservative in their tech stack, they've been around for 20 plus years, right? But for those folks, the elite DORA metrics are something to aspire to in like, two years' time, right? But it's a really good point, as in, whatever motivates you and the team, that's the important thing, I guess.

Crystal Hirschorn (08:53):

Yeah. I think it's like, one thing we absolutely don't want to do as part of this decomposition work is lose velocity. We know that it will be variable. During the process it's going to be variable. So there are certain metrics that we're tracking right now, which I've pointed out and said, "I don't know how useful those are going to be right now, but they'll start to stabilize in four to six months' time, and then they're going to become more useful." There are things that we wanted to track right now that we want to know about like are we starting to see that our velocity actually went down? That would be a pretty bad indicator. What we want to see is actually our velocity increases, the number of deploys, because we have independent services that are more kind of isolated and smaller, so that we can actually release way more often.

Daniel Bryant (09:35):

Yeah, makes sense.

Crystal Hirschorn (09:35):

To that kind of trend upwards instead of kind of stay stable or go down, so that's what we're expecting to see. So I guess also you have to think about what are you expecting to see, not just what does good look like, but what's the expected outcome of this?

Daniel Bryant (09:49):

I like that, Crystal. I think too many times when I was consultant, I'd go in and help platform build. There was really no vision. We're building a platform, and I used to frequently get, "We want to be a competitor to AWS," and obviously at the time, this is like five, 10 years ago, and even then I was thinking, "That's a strange goal for a company that's not competing with AWS," right? But then the ones that I did chat, more at conferences, because as a consultant you always saw the bad stuff. But the folks I chatted to at conferences, kind of like yourself, right, they had a real clear idea of not only the vision, but the goals and the steps they were going to take to get there. Does that make sense? Sort of breaking it down. I think, is that something you are looking to do as well, so having almost like milestones, right?

Crystal Hirschorn (10:27):

Yeah, I think like with building platforms it's a long game, and so I do think you have to have a lot of perseverance, and you also have to realize that things happen on just different time horizons in order to realize those outcomes. I do think sometimes there's an expectation that you're just going to come in and you're going to see the results immediately. So I think it takes a certain kind of person, I guess, that wants to work in platforms as well... It's not like stream-aligned teams, right, because I've worked in that area for plenty of the years that I was an engineer as well, and you kind of get this constant kind of dopamine hit... Making releases, at least a few times a week back in those days. And then you're kind of getting that kind of good feeling from, "My thing's now in front of the eyes of the customer." Whereas-

Daniel Bryant (11:13):

Yeah, I love that.

Crystal Hirschorn (11:14):

... it takes longer, and so I do think ... But you have to, you're right, you have to kind of know what does that two year road look like, for instance? And then how do you ... I always think you have to know what the end state looks like to kind of know what you're working backwards from.

Daniel Bryant (11:29):

Oh, I like it.

Crystal Hirschorn (11:29):

You have a start state and an end state, and it's like, "Okay, this is what today looks like, this is what I want it to look like in two years. How are we going to get there? Like practically, how are we going to get there?" And some of it is stuff ... I think also you go into things knowing if you're going to do something, but you're going to have to replace it later. It's kind of being more mindful of like, "Yeah, sometimes we will have to build stuff and throw it away."

Daniel Bryant (11:52):

Oh, gotcha, yep.

Crystal Hirschorn (11:52):

But it's a step in the right direction to where we need to go, because I do think that often happens by accident rather than something that's done on purpose. But for us, yeah, it's something we've been doing at Snyk. Like when I first joined here, there wasn't an infrastructure team.

Daniel Bryant (12:06):

Oh, interesting.

Crystal Hirschorn (12:07):

Yeah, I've only been here 18 months. And then, so like before that it was, they had some devops-minded people, some...

Daniel Bryant (12:13):

Ah, classic, yeah. Sort of the real experts, right, the ...

Crystal Hirschorn (12:17):

Yeah. But it was kind of done in a very organic way, and sometimes it lacked good practices and principles around kind of cloud-native, kind of built infrastructure and platforms, and that's kind of what we've brought to bear over the last 18 months. We've managed to achieve kind of a minimum viable platform, I guess you would call it.

Daniel Bryant (12:36):

Ah, I love it, love it, yep.

Crystal Hirschorn (12:36):

Yeah, which we call Polaris, internally. It's built on top of, well, two clouds. So we have a multi-cloud kind of strategy at Snyk, we use GCP and AWS, and so we've kind of built this so that it can support both.

Daniel Bryant (12:52):

Nice.

Crystal Hirschorn (12:52):

So currently we have GCP supporting our multi-tenants, and AWS is now driving our single tenant business.

Daniel Bryant (12:59):

Oh, interesting.

Crystal Hirschorn (13:01):

Yeah, so there might be a point of rationalization in the future where we move more to one or the other, but for now that's how it works and it's this kind of abstraction that sits on top of it, fully IAC'd using lots of Terraform modules as...

Daniel Bryant (13:13):

Love it.

Crystal Hirschorn (13:14):

Yes, all the good principles, stuff that didn't exist before. But also beyond just like infrastructure as code, just like, also, we built out a reference architecture in advance, said, "Here's our reference architecture."

Daniel Bryant (13:26):

Oh, interesting.

Crystal Hirschorn (13:27):

"This is what we think our reference architecture should be," and there were lots of aspects of that, and we said, "Let's build against that."

Daniel Bryant (13:32):

Was this for the platform, Crystal, or for the apps?

Crystal Hirschorn (13:32):

Platform, for the platform.

Daniel Bryant (13:34):

Platform, mm.

Crystal Hirschorn (13:35):

For the Polaris platform, so we built out a reference architecture and said, "This is what a good platform would look like, and these are the principles that we wanted to kind of codify as well, against each one of these aspects," and we kind of built against that over about eight months-

Daniel Bryant (13:50):

Very nice.

Crystal Hirschorn (13:50):

... and then kind of said, "Okay, now we're going to get all the teams to start migrating their apps and workloads to it," and we started launching some single-tenant customers now as well.

Daniel Bryant (13:59):

Brilliant. There's a couple things to jump into there, Crystal, that I think are super interesting and I definitely think folks would like to get your insight on, is the two things about, is this the people, right? People's always the hardest bit in this kind of game. And I was kind of curious, how did you bring the devops-minded people along? Because they're obviously freelancing, doing their thing, very well-respected, you're suddenly coming in standardizing, right? How do you bring them along, and then kind of as a secondary or corollary to that is how do you then work with the rest of the organization to upsell the platform, right? I'm kind of curious, like that I find is really hard.

Crystal Hirschorn (14:32):

It is hard, actually. And again, it's about, like I said, having perseverance. Because you're going to have a mixed audience of people, right? You'll have people who are kind of zealots, you'll have people who don't really care in the middle, that's the bulk of the people, and then you have people who are detractors, who just kind of hate what you're doing. And that goes for anywhere you work, right? It's like a human nature thing. And I think when we come in and we try to introduce things like this, what we're basically telling the company is we're about to introduce huge amounts of culture change here. As I say, it's not just about new tech, it's new patterns.

Daniel Bryant (15:11):

That's a really good point.

Crystal Hirschorn (15:12):

And so for us it was like, how should we do deployments, and what's the kind of devops-correct way of doing this, especially when we have, we're looking at a future of not having one production environment, but many, many, many production environments? And how do we kind of do rolling deployments, canary deployments, how do we deploy these things in groups and to different regions? And all of that stuff has to come. Because it's going to be like a global kind of infrastructure footprint, and we're going to start building our multi-tenant EU region next month-

Daniel Bryant (15:46):

Nice.

Crystal Hirschorn (15:47):

...so that'll be our second kind of multi-tenant region. And so it's like, it's not just about how many instances, but where it runs, as well. And then there's mixed cloud. So there's a lot of things to think about there, but also it's just about how do you codify those good practices into a platform as well? And for me it's about, it's a few things. I think the tech aspect is where maybe I would start a little bit. It's about creating a platform that does the job and gets out of your way, essentially. You don't really want it to feel like a barrier, so you need to enable those what they call paved paths, paved roads

Daniel Bryant (16:21):

Yeah, a little bit.

Crystal Hirschorn (16:21):

... and that's what we try to do. But how do you do that? It's not just by, "Hey, we're going to build some tech and the people will come." No, not at all. Actually, what you've got to do is you've got to get out there and advocate for it, you've got to get your early adopters onside and get those adopters to advocate for you. That's almost more powerful than getting the platform team to advocate for itself. It's like, so what we did is we would get those people to go into our R&D-wide demos and say, "Hey, we just moved to Polaris, and it's amazing, and here's how we did it. And hey, you can follow their great playbook over here, and you can get it-"

Daniel Bryant (16:54):

Ah, playbooks.

Crystal Hirschorn (16:55):

Yeah, "You can get it done in a couple of hours, and it's really easy." I mean, clearly the platform itself was built really fast. You don't normally hear of a team building a platform in six to eight months.

Daniel Bryant (17:05):

Agreed, yeah, agreed.

Crystal Hirschorn (17:07):

So there's still some things to do, and we're looking already at our reference architecture v2 and what are the gaps that we're missing? But also trying to, what are we spending our time on this quarter? A lot of things around documentation.

Daniel Bryant (17:19):

Ah, important, right?

Crystal Hirschorn (17:20):

Yeah, super important for adoption. The documentation is really slick and really easy to work with. But it's also about automation goals too. And so for us, we spent a heavy investment upfront in automation, but we still know that we have kind of a distance to go there. And I think that way we're trying to capture as much of that in Q2 and Q3, because the further we can get to those automation goals, I think, the better, because my kind of vision for this is that it's turnkey.

Daniel Bryant (17:47):

Yeah, makes sense.

Crystal Hirschorn (17:47):

Somebody needs a new instance for whatever reason, they can literally go into a UI and push a button.

Daniel Bryant (17:53):

Self-service.

Crystal Hirschorn (17:54):

Yeah, exactly. It's up and running for them in 45 minutes. I don't, honestly, there'll be a lot of use cases that'll be there. It'll be like, "I need an instance to run as a PSE for a customer because I want to try it out," could be that I want a playground environment, it could be that I need an actual production environment, I need a dev environment, I want a chaos engineering environment. There's a lot of reasons why, and so we want to get to that. But the only way you can really, truly get there is through those automation goals as well, and so we're trying to push for that so that it's not like, "Hey, there's still stuff that isn't fully automated, say, in the application layer itself," where there's a lot of kind of dependency-mapping to do with lots of teams.

Daniel Bryant (18:30):

Ah, makes sense, because I had done one, hearing you say, sort of going back to the SRE principles, is kind of toil, right? It's the gluing things together. Whereas like, you as engineers that are fiddling around, the toil can be quite fun to start with, right? But definitely after a while the toil gets boring, particularly for folks who are later adopters. They're like, "I don't want that toil, just give the turnkey, give me the solutions." I'm guessing that's a key driver.

Crystal Hirschorn (18:55):

And yeah, trying to make the interface to your platform really easy, right? So this is something I bang on about a lot with my own teams. I said like, "Who knows in the future? Maybe the interface into your platform isn't a Helm Chart anymore." I don't know if that's what it's going to be, but right now we're asking them to have three YAML files. It already feels like too much, honestly. We need to compress that down into one. But also it's like, that's what you want, really. That, to me, is like, you pass on a handful of configuration and the thing does the thing. It's like-

Daniel Bryant (19:28):

Oh, 100%.

Crystal Hirschorn (19:30):

... it kind of bootstraps it up, and there's your running instance. Ta-da. And yeah, it feels like magic. But our future at Snyk, right, we have, when I started 18 months ago, we had 100 engineers, now we have 350.
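
The idea of passing "a handful of configuration" and letting the platform do the rest could look roughly like the sketch below. It is purely illustrative: ServiceSpec, its defaults, and render are hypothetical and say nothing about how Polaris or its Helm charts actually work. The generated dictionaries follow the standard shape of a Kubernetes Deployment and Service.

```python
from dataclasses import dataclass


@dataclass
class ServiceSpec:
    """The only input a developer supplies. Fields and defaults are hypothetical."""
    name: str
    image: str
    replicas: int = 2
    port: int = 8080


def render(spec: ServiceSpec) -> dict[str, dict]:
    """Expand one small spec into the manifests the platform would apply."""
    labels = {"app": spec.name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.name, "labels": labels},
        "spec": {
            "replicas": spec.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": spec.name,
                            "image": spec.image,
                            "ports": [{"containerPort": spec.port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": spec.name, "labels": labels},
        "spec": {
            "selector": labels,
            "ports": [{"port": spec.port, "targetPort": spec.port}],
        },
    }
    return {"deployment": deployment, "service": service}


# Hypothetical usage: two values in, full manifests out.
manifests = render(ServiceSpec(name="example-api", image="registry.example/example-api:1.0"))
```

The design choice is that good defaults live in the platform, so the developer-facing interface stays small even as the generated output grows.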

Daniel Bryant (19:42):

Wow, yeah. That's a growth rate, isn't it?

Crystal Hirschorn (19:45):

Yeah. And in two years we're going to have 1000, probably, and we'll probably have 100 production environments, at least. So we need to be thinking about that future right now, and how do we build a platform that will support that as we go, like with us now, but also how it supports us then, as well.

Daniel Bryant (20:01):

And what about the education piece? Because there was, I heard you mention about doing the R&D sessions, bringing folks along-

Crystal Hirschorn (20:06):

Education's a piece of it as well, you're right.

Daniel Bryant (20:08):

Yeah, and you mentioned documentation, right? So have you sort of started that journey already, providing, say, tutorials for folks, or boot camps, or that kind of stuff?

Crystal Hirschorn (20:16):

Yeah, so we have written a ton of documentation as we go. So it's part of the acceptance criteria on anything we do.

Daniel Bryant (20:21):

Got it, great idea.

Crystal Hirschorn (20:23):

It has to be. This is also my kind of reason for also having teams like SRE and developer experience, to kind of be that inwardly-facing team into the R&D organization, to kind of listen to them, bring that feedback in as a quick loop, kind of-

Daniel Bryant (20:38):

Yeah, very useful.

Crystal Hirschorn (20:39):

... helping bring down toil, developer pain, or even just finding really interesting cases where we can actually bring joy into using our platform, not just no pain, but actually make it a joyful experience, right? Because you don't want to be like, "Eh, it's all right." But it's lovely when you get joy out of using something, and I think that's what we want to aim for. But we started moving in-

Daniel Bryant (21:02):

Awesome.

Crystal Hirschorn (21:03):

... yeah, we started moving all of our documentation into Backstage, so there's more-

Daniel Bryant (21:06):

Ah, popular.

Crystal Hirschorn (21:07):

Yes, exactly. So we're using a hosted version on SaaS from a company called Roadie, and they're-

Daniel Bryant (21:13):

Oh, I've bumped into them, good folks. Yeah, yeah.

Crystal Hirschorn (21:15):

Yeah, they're amazing. Like yeah, totally recommend them, they're great to work with, and super responsive, and helpful, and yeah, just able to do lots of sort of helpful stuff for us. We've moved all of our documentation there. This is kind of what we were trying to use as a developer portal kind of ecosystem.

Daniel Bryant (21:31):

Ah, gotcha.

Crystal Hirschorn (21:32):

Kind of trying to add more to that, so that's one of the developer experience team's first set of goals, was actually kind of just working on Backstage and kind of bringing that developer portal to life, and kind of adding more capability into it as well, and integrations for the rest of the organization. But yeah, there's that, but then there's education, which is quite a different thing. So we have done quite a few training sessions, and most recently, in the R&D org, I said to them on Slack, I was like, "Hey, we're about to do another round of onboarding sessions, because we've got some new starters," and I said, "Normally, I just tell a few folks that I know that are kind of friendly if they want to join," I said, "But hey, we're willing to kind of open this out," and there was a response from about 50 people saying, "Yeah, we'd quite like this kind of onboarding." And I was thinking, "Wow, that was way more popular than I was expecting."

Crystal Hirschorn (22:24):

So now we're having a think about a Q2 goal again, it's like, how do we build proper onboarding sessions around this, so that we can kind of get folks up to speed on our platform? Because we're hiring all the time.

Daniel Bryant (22:37):

Yeah, of course, of course.

Crystal Hirschorn (22:38):

New people will be coming in who don't have any experience of it. I said, the thing that caught my attention was it was pretty mixed, the response. It was half people that had already worked here, and half new people. So I said, "Clearly there's still an appetite, even to kind of capture the audience that has already been here and did that migration to Polaris earlier this year." So yeah, so I do think you have to do a lot of that training. But also we're doing things like video tutorials.

Daniel Bryant (23:02):

Ah, awesome, yeah.

Crystal Hirschorn (23:04):

Yeah, it has to-

Daniel Bryant (23:06):

Self-service again, right?

Crystal Hirschorn (23:06):

Exactly, and also you have to ... different people learn in different ways.

Daniel Bryant (23:10):

Yes.

Crystal Hirschorn (23:10):

They engage in different ways.

Daniel Bryant (23:11):

Totally.

Crystal Hirschorn (23:13):

But again, it's like, it's also about trying to constantly be in touch with the engineers to kind of reduce the barriers to entry. Just make it as easy as possible, so they can get to grips with it as soon as possible, and that's where developer experience is really going to help us at Snyk, so ...

Daniel Bryant (23:25):

Really good, Crystal. I know we're getting close to time here. I just want to pick your brains finally on how do you balance the standardization? Because you already mentioned Helm Charts, for example. I've bumped into a few folks that are definitely going to suck it up, "We've got to use Helm Charts." Some folks just expose the values.yaml file for example, right, and other folks hide it behind a UI, I think like Shopify kind of hide some of that stuff behind a UI. But as a developer, I like to learn standards, because that's portable across jobs and just industry and stuff, right? But I also like the comfort sometimes of just clicking a button in a UI. So how do you as a leader balance the standards, I guess, that you choose to implement and expose?

Crystal Hirschorn (24:02):

Yeah, I mean, I don't see them as an either/or choice, maybe. For me it's about, when engineers want to get their hands dirty with a platform, why shouldn't they be allowed to? For instance, we have a finite number of engineers in my group, and we had to build a platform really quickly, and we needed to get all the teams migrated over to it so we could launch our first single-tenant customers, and as part of that migration, there was one team that needed a DocumentDB resource in AWS, and we're like, "Well, we don't really have the capacity to do that. But here's a playbook that you could follow, and we want to encourage an ecosystem of components." We don't want to build it all in infra, because otherwise we become the bottleneck.

Daniel Bryant (24:43):

Good point.

Crystal Hirschorn (24:44):

And the guy had infra skills and he did it, and then he went and advocated for us at our follow-up kind of R&D monthly conversations, like, "Hey, I followed their playbooks, and it really wasn't that hard."

Daniel Bryant (24:53):

That's pretty cool.

Crystal Hirschorn (24:54):

He's like, "I barely even needed to talk to them to build this whole Terraform module myself and add it to Polaris," the platform. And I was like, "This is amazing." So that's the thing we want to encourage, is like, get your hands dirty where you're comfortable, but also think, it's not about like, "Nobody ever needs to see Helm or YAML or..." Because most engineers can get on okay with that. I think when you have like 1000 engineers, I'm asking myself, "Is it reasonable anymore to constantly be exposed to YAML and Helm and know that the engineers are always going to know what they're meant to be doing, and that we have the right linting and error handling, basically?" We're not making mistakes, because that could be a counterargument, is that you just need stronger linting, and error checking, and testing, and these things, just to make sure that you're not introducing bugs. But I also think, why not put it behind a UI as well? Engineers can do whatever they want. And this is kind of where I see other platforms like Heroku going as well. When I worked at the BBC many years ago, this was very much in line with the kind of platforms that we built there too.
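
The "stronger linting and error checking" counterargument mentioned here can be made concrete with a small sketch. It assumes the developer's values have already been parsed into a dictionary; `RULES` and `lint` are hypothetical names for illustration, not part of Helm or of any Snyk tooling.

```python
# Hypothetical schema: key -> (expected type, whether the key is required).
RULES = {
    "name": (str, True),
    "image": (str, True),
    "replicas": (int, False),
}


def lint(values: dict) -> list[str]:
    """Return human-readable problems found in a parsed values mapping."""
    errors = []
    for key, (expected, required) in RULES.items():
        if key not in values:
            if required:
                errors.append(f"{key}: required key is missing")
            continue
        # Exact type check, so True/False is not accepted where an int is expected.
        if type(values[key]) is not expected:
            actual = type(values[key]).__name__
            errors.append(f"{key}: expected {expected.__name__}, got {actual}")
    # Unknown keys are usually typos, so flag them rather than ignore them.
    for key in values:
        if key not in RULES:
            errors.append(f"{key}: unknown key")
    return errors


# A config with a missing key, a wrong type, and a misspelled key.
problems = lint({"name": "billing", "replicas": "3", "replicsa": 1})
for problem in problems:
    print(problem)
```

A check like this would typically run before deployment, so mistakes surface as readable messages whether the values come from a hand-edited YAML file or from a UI.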

Daniel Bryant (26:05):

Oh, interesting.

Crystal Hirschorn (26:06):

You want to get your hands dirty? You can. But most engineers just chose the UI and it built stacks for them, and it would show them a console of kind of the stack traces, it's doing builds and various stuff like that, but it's all behind a UI. So they didn't really have to get their hands dirty unless they wanted to. Yeah, they could promote artifacts between different environments through that...

Daniel Bryant (26:28):

Mm-hmm.

Crystal Hirschorn (26:29):

But that was it, yeah. And so I think, yeah, I'm asking myself when you have that many engineers at a company, is there an inflection point where you think, "Hm, not sure?" But yeah, I don't know the answer. I mean, within my own team we debate it.

Daniel Bryant (26:43):

Nice. That's kinda healthy, right?

Crystal Hirschorn (26:45):

Yeah, yeah, like there's some engineers that are like, "Oh no, no, no, it's fine, Helm's fine. We'll always be with Helm, Helm will always be our API," and I'm like, "I don't know, we'll see, we'll see." I think there's still space for just, yeah, putting a UI on top of it. Why not? So I say, why not both?

Daniel Bryant (27:00):

I think that's a perfect point to end there, Crystal, it's so much wisdom being shared, thank you so, so much. If folks want to get in touch with you and they like what they heard today, LinkedIn, Twitter, where's the best way to reach out to you?

Crystal Hirschorn (27:11):

I'm on Twitter. So my handle is "cfhirschorn". So yeah, you can look me up by my last name, which is, yeah, kind of complicated. But also ...

Daniel Bryant (27:25):

Unique, right? Unique, as we say.

Crystal Hirschorn (27:27):

Yeah, so I don't know if people will know how to spell that, just after me saying it, but ...

Daniel Bryant (27:31):

I'll definitely put it in the show notes, for sure.

Crystal Hirschorn (27:33):

Yeah, put it in there, yeah. Also, I'm fine to be contacted on LinkedIn, but I'm on Twitter much more often.

Daniel Bryant (27:38):

Thank you so much for sharing all that great knowledge, Crystal, really appreciate it.

Crystal Hirschorn (27:41):

Yeah, no, it's been great to be on the show today, and thank you for having me, Daniel.

