Voiceflow Telepresence Case Study

Opportunity: From monolith to microservices

We chose Telepresence because it allowed us to run multiple services locally on our laptops and allowed us to deploy around 20 microservices within five minutes.

Xavi Portilla Edo, Head of Infrastructure at Voiceflow

Voiceflow’s architecture has become more complex over the years. They only had a monolithic application and two database engines when they started.

But as they adopted a microservice architecture, the application grew to over 20 microservices, multiple databases, and queue engines. As a result, running the entire application on developers’ laptops, as they used to, was no longer scalable or practical.

Challenge: The existing setup

Applications became too large to run locally. The problem was not only the number of services they had to run but also the external services those needed: DBs, queue engines, etc.
The development workflow was very slow. Building images and waiting for them to deploy isn't the fastest or most efficient use of a developer’s time.
Even if they had the capacity, they couldn’t replicate the interdependencies of cloud-deployed apps locally, so their local testing was not comprehensive enough.
Devs didn’t know what other devs were working on, nor should they have to. For example, a backend dev does not need to know how to run, compile, and package the frontend services.

Since their transition from a monolithic to a microservice architecture, they’ve learned that mocking and testing an entire application on a laptop causes problems such as excessive resource consumption on devs’ local machines, unrealistic environments, complex setups, and lengthy onboarding. This approach doesn’t scale as the app’s architecture becomes more complex.

Approach: Local to remote to the rescue

After exploring different inner dev loop tools, they found Telepresence to be the most useful because it allowed them to implement a hybrid development environment where all of their services would run in multi-tenant developer clusters, and the service they were developing would run locally. This led to a faster feedback loop and the ability to still use their local tools even though most of their application was running remotely.

How Telepresence stacks up against other tools

They were looking for a development tool that would allow them to have everything deployed in the cloud and have something running on their machines. They tested Telepresence, Okteto, DevSpace, and Skaffold, among others. The other tools were very complex to use compared to Telepresence. Sometimes, the fastest solution is the best one. While developing, they just needed a local service running with a connection to the remote cluster and with all the required env vars injected.

For their dev workflow, other solutions introduce more complexity to the inner loop: adding file syncing with the cluster, checking for changes in a Dockerfile, rebuilding it, and pushing it to the cluster. But in the end, what they wanted was to run their source code and execute it in dev mode with hot reloading! Telepresence fit into their existing workflow and allowed them to improve their inner dev loop.

Telepresence architecture overview

Telepresence enables you to create intercepts to a target Kubernetes workload. Once you have created an intercept, you can code and debug your associated service locally. This is how Telepresence works with the Kubernetes architecture.

As shown in the first diagram, whenever a developer meshes a service, every request to that service goes to the traffic-manager deployed on the Kubernetes cluster, which routes the request to the service running locally on the developer’s laptop.

Looking at things from a networking perspective, the second diagram shows the route that every request to a meshed service follows.

Solution: Incorporating Telepresence commands into the CLI

At Voiceflow, they have developed an internal tool for their devs called vfcli. With the vfcli, their devs can create their ephemeral environments using the power of GitOps. They have three different types of environments:

In every single environment, they deploy all the services, databases, and queue engines mentioned above. When they designed the development environment and the inner loop, simplicity was their top priority. This is why they embedded Telepresence in their CLI.

With the vfcli, you can just run the vfcli mesh connect <service> command and it will run the Telepresence intercept command under the hood. They also have vfcli mesh connect to start the Telepresence connection with the remote environment. Additional pre/post meshing hooks can also be run for service setup/teardown. These hooks are useful if they have to run any script to prepare the service (install dependencies, build, etc.) or to execute post-scripts (clean things up, revert changes, etc.).
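The wrapping described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not vfcli's actual implementation: MeshPlan, mesh_connect, mesh_disconnect, the example hook commands, and the .env.mesh file name are all hypothetical. The only real commands it assumes are telepresence connect, telepresence intercept <name> --port <port> --env-file <file>, and telepresence leave <name>.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Command = List[str]


@dataclass
class MeshPlan:
    """Hypothetical description of one service to mesh."""
    service: str
    local_port: int
    pre_hooks: List[Command] = field(default_factory=list)   # e.g. install deps, build
    post_hooks: List[Command] = field(default_factory=list)  # e.g. cleanup, revert changes


def intercept_command(service: str, local_port: int, env_file: str) -> Command:
    # Intercept the remote workload, forward its traffic to the local port,
    # and write the workload's environment variables to env_file.
    return [
        "telepresence", "intercept", service,
        "--port", str(local_port),
        "--env-file", env_file,
    ]


def mesh_connect(plan: MeshPlan, run: Callable[[Command], None],
                 env_file: str = ".env.mesh") -> None:
    """Run pre-hooks, connect to the cluster, then start the intercept."""
    for hook in plan.pre_hooks:
        run(hook)
    run(["telepresence", "connect"])
    run(intercept_command(plan.service, plan.local_port, env_file))


def mesh_disconnect(plan: MeshPlan, run: Callable[[Command], None]) -> None:
    """Stop the intercept, then run post-hooks."""
    run(["telepresence", "leave", plan.service])
    for hook in plan.post_hooks:
        run(hook)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    plan = MeshPlan("backend-api", 8080, pre_hooks=[["npm", "install"]])
    mesh_connect(plan, lambda cmd: print(" ".join(cmd)))
```

Passing the runner in as a parameter keeps the sketch testable as a dry run; a real wrapper could pass something like lambda cmd: subprocess.run(cmd, check=True) instead.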

Now that you know how their system works, this is what a typical day for a Voiceflow dev might look like using their ‘remocal’ dev environment:

A dev wants to add a new API call from the frontend to a backend service
The dev spins up a dev env to work in
They use vfcli to connect their local frontend and backend-API to their env, initializing Telepresence
With this, the developer can quickly iterate and test their changes in a prod-parity environment without needing to push any code or run all the services locally.

I finished my development. What's next?

Of course, while you are developing, you will probably run your source code in dev mode. Depending on the language, that is not the code that will run in a real environment: it has to be compiled, and the compiled code or binary has to be executed in a container in a cluster. For this, they have created a system called the Track System. A track is basically a particular branch that gets continuously updated. Using the vfcli, each developer can set which track they want to follow for each service. Their GitOps system then updates the remote environments accordingly.

The following steps are involved when you want to push a change to your branch:

The code is built.
The built code is containerized.
The container is published to a container registry.
The metadata of the track is updated.
All the remote environments that are following that track for that specific service are updated automatically by an internal reconciliation service.
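The last two steps can be illustrated with a small sketch. It assumes, hypothetically, that a track's metadata boils down to "the latest image for a given service and track" and that each environment records which track it follows per service. TrackRegistry, Environment, reconcile, and the image names are illustrative only and do not describe Voiceflow's internal reconciliation service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Environment:
    """Hypothetical remote environment."""
    name: str
    following: Dict[str, str]                                # service -> track
    deployed: Dict[str, str] = field(default_factory=dict)   # service -> image


class TrackRegistry:
    """Hypothetical track metadata store: latest image per (service, track)."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], str] = {}

    def publish(self, service: str, track: str, image: str) -> None:
        # Called after the image has been pushed to the container registry.
        self._latest[(service, track)] = image

    def latest(self, service: str, track: str) -> Optional[str]:
        return self._latest.get((service, track))


def reconcile(registry: TrackRegistry,
              environments: List[Environment]) -> List[Tuple[str, str, str]]:
    """Bring every environment in line with the tracks it follows.

    Returns the (environment, service, image) updates that were applied.
    """
    changes = []
    for env in environments:
        for service, track in env.following.items():
            image = registry.latest(service, track)
            if image is not None and env.deployed.get(service) != image:
                env.deployed[service] = image
                changes.append((env.name, service, image))
    return changes


if __name__ == "__main__":
    registry = TrackRegistry()
    envs = [
        Environment("dev-alice", {"backend-api": "feature-x"}),
        Environment("dev-bob", {"backend-api": "master"}),
    ]
    registry.publish("backend-api", "feature-x", "backend-api:abc123")
    print(reconcile(registry, envs))
```

Only dev-alice is updated in this example, because dev-bob follows a different track for the same service. Running reconcile again without a new publish applies no changes.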

They also have tracks that are not related to a branch, which the team calls “Virtual Tracks”. They are similar to the tracks above but do not correspond directly to a branch: for example, a track can be updated every XYZ commits or every XYZ minutes or hours. These special tracks can be used in their development environments when developers want updates that do not strictly follow a git tree.
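One way to picture such an update policy is sketched below. The VirtualTrack class, its field names, and the thresholds are hypothetical; the source only says that a track may update every so many commits or every so many minutes or hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class VirtualTrack:
    """Hypothetical track that updates on a commit count or a time interval."""
    name: str
    every_commits: Optional[int] = None      # update after this many commits
    every: Optional[timedelta] = None        # or after this much time
    commits_since_update: int = 0
    last_update: Optional[datetime] = None

    def record_commit(self) -> None:
        self.commits_since_update += 1

    def is_due(self, now: datetime) -> bool:
        # Due when the commit threshold has been reached...
        if self.every_commits is not None and self.commits_since_update >= self.every_commits:
            return True
        # ...or when the interval has elapsed (or the track was never updated).
        if self.every is not None:
            return self.last_update is None or now - self.last_update >= self.every
        return False

    def mark_updated(self, now: datetime) -> None:
        self.commits_since_update = 0
        self.last_update = now


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    track = VirtualTrack("every-three-commits", every_commits=3)
    for _ in range(3):
        track.record_commit()
    print(track.is_due(now))
```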

That covers their remote development environments, but what happens with a production cloud? For that, they have created a Release Channel system. A channel is a set of services and tracks: in each channel, you specify the services you want to update and which track each one follows. The GitOps system then updates their clouds with the spec defined in that channel.
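As a rough illustration, a channel can be modeled as a mapping from service to track that is resolved into the images a cloud should run. The resolve_channel function, the channel contents, and the image names below are hypothetical and only sketch the idea; they are not Voiceflow's actual spec format.

```python
from typing import Dict, Tuple


def resolve_channel(channel: Dict[str, str],
                    latest_images: Dict[Tuple[str, str], str]) -> Dict[str, str]:
    """Resolve a channel (service -> track) into desired images (service -> image).

    latest_images maps (service, track) to the most recently published image.
    """
    desired = {}
    for service, track in channel.items():
        key = (service, track)
        if key not in latest_images:
            # Refuse to deploy a channel that references an unpublished track.
            raise KeyError(f"no image published for {service} on track {track}")
        desired[service] = latest_images[key]
    return desired


if __name__ == "__main__":
    # Hypothetical channel and track metadata.
    production_channel = {"backend-api": "master", "frontend": "release"}
    latest_images = {
        ("backend-api", "master"): "backend-api:1f3c9a",
        ("frontend", "release"): "frontend:77be02",
        ("frontend", "master"): "frontend:a0d411",
    }
    print(resolve_channel(production_channel, latest_images))
```

Failing loudly on a missing track is a design choice made for this sketch: it keeps a misconfigured channel from silently skipping a service.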

Voiceflow Story

Results