Increased the speed of inner dev loop
Telepresence increased the speed and fidelity of our inner dev loop by removing the need to constantly build and deploy test images.
Improved development speed by 50%
Our development speed and onboarding both improved by over 50%, and we eliminated development setup complexity.
Removed resource burden
Using Telepresence removed both the operational and resource burden of running all of our microservices locally.
Achieved more accurate and comprehensive tests
Our tests are more accurate and comprehensive since now we can use Telepresence to “shift left” our testing cycles.
From monolith to microservices
Voiceflow’s architecture has become more complex over the years. When we started, we had only a monolithic application and two database engines.
But as we adopted a microservice architecture, the system grew to over 20 microservices, multiple databases, and queue engines. As a result, running the entire application on our developers’ laptops, as we used to, was no longer scalable or practical.
Challenges with the existing setup
- Applications became too large to run locally, not only because of the number of services we had to run but also because of the external services we needed: DBs, queue engines, etc.
- The development workflow was very slow. Building images and waiting for them to deploy isn't the fastest or most efficient use of a developer’s time.
- Even if we had the capacity, we couldn’t replicate the interdependencies of cloud-deployed apps locally, so our local testing was not comprehensive enough.
- Devs had to run services they knew nothing about. For example, a backend dev should not need to know how to run, compile, and package the frontend services.
Since our transition from a monolithic to a microservice architecture, we’ve learned that mocking and testing an entire application on a laptop causes problems: excessive resource consumption on devs’ local machines, unrealistic environments, complex setups, and lengthy onboarding. As the app’s architecture becomes more complex, this approach doesn’t scale.
Local to Remote to the Rescue
After exploring different inner dev loop tools, we found Telepresence to be the most useful because it allowed us to implement a hybrid development environment: all our services run in multi-tenant developer clusters, while the service under development runs locally. This gave us a faster feedback loop and let us keep using our local tools even though most of the application was running remotely.
How Telepresence stacks up against other tools
We were looking for a development tool that would let us keep everything deployed in the cloud while running only the service under development on our machines. We tested Telepresence, Okteto, DevSpace, Skaffold, and others. The other tools were complex to use compared to Telepresence; sometimes the simplest solution is the best one. While developing, we just needed a local service running with a connection to the remote cluster and all the required env vars injected.
For our dev workflow, the other solutions introduced more complexity into the inner loop: syncing files with the cluster, watching for changes to a Dockerfile, rebuilding the image, and pushing it to the cluster. But in the end, what we want is to run our source code in dev mode with hot reloading! Telepresence fit into our existing workflow and allowed us to improve our inner dev loop.
Telepresence architecture overview
Telepresence enables you to create an intercept for a target Kubernetes workload. Once you have created an intercept, you can code and debug the associated service locally. This is how Telepresence works with the Kubernetes architecture:
As shown in the diagram above, whenever a developer intercepts a service, every request to that service goes through the traffic-manager deployed in the Kubernetes cluster, which routes it to the service running locally on the developer’s laptop.
From a networking perspective, every intercepted request follows this route:
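For readers who haven’t used Telepresence directly, the route above maps to a handful of CLI commands. This is a minimal sketch; the service name, port mapping, and env-file name are illustrative assumptions, not our actual setup:

```shell
# Hypothetical intercept session; "my-service" and the port/file names are made up.

# Join the laptop to the cluster network:
telepresence connect

# Reroute the workload's http port to localhost:8080 and dump the remote
# pod's environment variables into a local file:
telepresence intercept my-service --port 8080:http --env-file my-service.env

# Run the service locally in dev mode with those env vars, iterate, then:
telepresence leave my-service   # stop the intercept when done
```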
Incorporating Telepresence commands into our CLI
At Voiceflow, we have developed an internal tool for our devs called vfcli. With vfcli, our devs can create their own ephemeral environments using the power of GitOps. We have three different types of environments:
- Development environments
- Review environments
- Test environments
In every environment, we deploy all the services, databases, and queue engines mentioned above. When we designed the development environment and the inner loop, simplicity was our top priority. This is why we embedded Telepresence in our CLI.
With vfcli, a dev can just run the vfcli mesh connect <service> command, which starts the Telepresence connection to the remote environment and runs the telepresence intercept command under the hood. Additional pre/post-meshing hooks can also run for service setup and teardown. These hooks are useful when we have to run a script to prepare the service (install dependencies, build, etc.) or to execute post-scripts (clean up, revert changes, etc.):
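A toy sketch of what such a wrapper with hooks could look like. The hook layout, flag values, and the fact that we only print the underlying command here are assumptions for illustration, not vfcli’s actual implementation:

```shell
#!/usr/bin/env bash
# Toy sketch of a mesh-connect wrapper with pre/post hooks (all names hypothetical).
set -euo pipefail

run_hook() {
  # Run a service's hook script if one exists (install deps, build, clean up, ...)
  local hook="./hooks/$1/$2.sh"
  if [ -x "$hook" ]; then "$hook"; fi
}

mesh_connect() {
  local service="$1"
  run_hook "$service" pre-mesh
  # The real tool would exec `telepresence intercept`; this sketch only
  # prints the command it would run:
  echo "telepresence intercept ${service} --port 8080:http --env-file .env.${service}"
}

mesh_disconnect() {
  local service="$1"
  echo "telepresence leave ${service}"
  run_hook "$service" post-mesh   # teardown runs when the dev disconnects
}
```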
Now that you know how our system works, this is what a typical day for a Voiceflow dev might look like using our ‘remocal’ dev environment:
- A dev wants to add a new API call from the frontend to a backend service
- The dev spins up a dev env to work in
- They use vfcli to connect their local frontend and backend-API to their env, initializing Telepresence
- With this, the developer can quickly iterate and test their changes in a prod-parity environment without needing to push any code or run all the services locally.
I finished my development. What's next?
Of course, while you are developing, you will probably run your source code in dev mode. Depending on the language, that is not the final code that will execute in a real environment: you need to compile it, run the compiled code or binary, and execute it in a container in a cluster. For this, we have created a system called the Track System. A track is basically a particular branch that gets continuously updated. Each developer can specify, per service, which track to follow using vfcli, and our GitOps system then updates the remote environments.
The following steps are involved when you want to push a change to your branch:
- The code is built.
- The built code is containerized.
- The container is published to a container registry.
- The metadata of the track is updated.
- All the remote environments that are following that track for that specific service are updated automatically by an internal reconciliation service.
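The steps above can be sketched as a single pipeline function. This is a toy illustration that only prints what each stage would do; the command names, registry, and service name are hypothetical, not our real tooling:

```shell
# Toy sketch of the track-update flow triggered by a push to a branch.
push_to_track() {
  local service="$1" branch="$2" sha="$3"
  echo "build ${service}@${sha}"                        # 1. build the code
  echo "docker build -t registry/${service}:${sha} ."   # 2. containerize it
  echo "docker push registry/${service}:${sha}"         # 3. publish the image
  echo "track[${branch}/${service}] -> ${sha}"          # 4. update track metadata
  echo "reconcile envs following ${branch}/${service}"  # 5. reconciler rolls it out
}
```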
We also have tracks that are not tied to a branch. We call them “Virtual Tracks”: they are similar to the above but do not correspond directly to a branch (for example, update a track every N commits, or every N minutes/hours). Our developers can use these special tracks when they want their environments to receive updates that don’t strictly follow a git tree.
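A time-based virtual track reduces to a simple policy check. As a sketch, with a hypothetical function name and made-up timestamps:

```shell
# Toy policy for a time-based virtual track: update when the configured
# interval (in seconds) has elapsed since the track's last update.
should_update_track() {
  local last_update_epoch="$1" now_epoch="$2" interval_secs="$3"
  if [ $(( now_epoch - last_update_epoch )) -ge "$interval_secs" ]; then
    echo "update"
  else
    echo "skip"
  fi
}
```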
That covers our remote development environments, but what about a production cloud? For that, we have created a Release Channel system. A channel is a set of services and tracks: in each channel, you specify the services you want to update and which track each service follows. Our GitOps system then updates our clouds with the spec defined in that channel.
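A release channel could be expressed as a small spec. The YAML below is a hypothetical illustration of the idea; the service and track names, and the format itself, are assumptions, not Voiceflow’s actual configuration:

```yaml
# Hypothetical release-channel spec: each listed service follows a track,
# and the GitOps system rolls the referenced builds out to the cloud.
channel: production
services:
  - name: realtime-api     # made-up service name
    track: main
  - name: frontend         # made-up service name
    track: stable
```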
Implementing a hybrid development environment with Telepresence allowed us to develop faster than when we were running everything on our local machines.
As more companies continue to make the monolithic to microservice transition, we want to share our experiences so others can learn from our mistakes, insights, and successes. I hope this article gives you an insider's view of what is possible and available with open source technologies and explains how having the right dev loop improves the developer experience.