Sentry is a popular error-tracking and logging platform that helps developers identify and resolve software bugs and issues in real time. However, simply using Sentry is not enough to fully benefit from its capabilities. A good Sentry process is critical for maximizing its potential and streamlining your error tracking. When developing a product consumed by users, you need insights into your product's stability, no matter the size of your team. Errors are part of every piece of software, and detecting issues sooner makes for happier customers.
This article is based on our experiences with the development of LeanIX SMP (SaaS Management Platform), a fully automated SaaS management solution that helps you optimize the costs and productivity of SaaS subscriptions. Since SMP is mainly developed in Go, we will describe how to configure Go with Sentry and what processes we use to handle the errors.
With Sentry, we wanted to detect issues sooner and fix them before (too many) customers notice them. By plugging it in, we expected to clearly see where our application is failing and which problems we should fix first. And since we practice continuous deployment, we are able to deploy fixes quickly.
If you are not using Go and you are just interested in processes around Sentry, feel free to skip the first part of the article since the processes are language-agnostic.
LeanIX SMP lives in a monorepo and most of its services are built with Go, except for the frontend, which is built in Vue.js. Since Sentry and Vue.js mostly work out of the box, this section focuses on the Go and Sentry configuration.
Although configuring Sentry and Go should be easy, there are some gotchas and ways to improve the initial configuration, especially when your logs are not the most descriptive and the error codes are not always aligned with the actual errors.
In LeanIX SMP, we are capturing and propagating errors all over the place. Therefore, if an unexpected error happens while accessing the database, it will be propagated to the frontend as a 500 Internal Server Error.
When we introduced Sentry, we were still using go-json-rest. The main issue with go-json-rest was that it did not report internal server errors (500s) to Sentry. Therefore, we needed to introduce a custom middleware (based on the go-json-rest recover middleware) that would catch internal server errors, propagate them in the same way as before, and also send a report to Sentry.
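The idea behind such a middleware can be sketched with the standard net/http package. This is not our go-json-rest middleware, only a minimal illustration of the same technique: recover from panics, keep answering with a 500 as before, and report every 5xx response. The Reporter interface, memoryReporter, and the sample handlers are hypothetical names introduced for this example; with the sentry-go SDK, a Reporter implementation could forward to sentry.CaptureException.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Reporter is a hypothetical abstraction over the error tracker.
// A real implementation could call sentry.CaptureException(err).
type Reporter interface {
	Report(err error)
}

// memoryReporter collects errors in memory so the sketch runs without a DSN.
type memoryReporter struct{ errs []error }

func (m *memoryReporter) Report(err error) { m.errs = append(m.errs, err) }

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// ReportServerErrors recovers from panics, responds with a 500,
// and reports every 5xx response. 4xx responses are never reported.
func ReportServerErrors(r Reporter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		defer func() {
			if p := recover(); p != nil {
				r.Report(fmt.Errorf("panic in %s %s: %v", req.Method, req.URL.Path, p))
				http.Error(rec, "Internal Server Error", http.StatusInternalServerError)
				return
			}
			if rec.status >= http.StatusInternalServerError {
				r.Report(fmt.Errorf("%s %s responded with %d", req.Method, req.URL.Path, rec.status))
			}
		}()
		next.ServeHTTP(rec, req)
	})
}

// Hypothetical handlers used to exercise the middleware.
func panicHandler(w http.ResponseWriter, r *http.Request) { panic("database unreachable") }

func notFoundHandler(w http.ResponseWriter, r *http.Request) {
	http.Error(w, "no such subscription", http.StatusNotFound)
}

func badGatewayHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusBadGateway)
}

// serve runs one request through the middleware and returns the
// response status code and the number of reported errors.
func serve(h http.HandlerFunc) (int, int) {
	rep := &memoryReporter{}
	w := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/subscriptions", nil)
	ReportServerErrors(rep, h).ServeHTTP(w, req)
	return w.Code, len(rep.errs)
}

func main() {
	for _, h := range []http.HandlerFunc{panicHandler, notFoundHandler, badGatewayHandler} {
		code, reported := serve(h)
		fmt.Printf("status=%d reported=%d\n", code, reported)
	}
}
```

Keeping the reporter behind a small interface also makes the middleware easy to test without sending anything to a real Sentry project.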
When we started to use Sentry, it felt noisy due to too many errors. At this point, we needed a driver for each of our services: someone who would go through all the detected issues, fix the simple ones, introduce better logs, and suppress some of the issues or report them differently. Our product went from startup mode to working as a mature product in just a few years, and if you already took that bus in the past, you probably know what it means…
Technical Debt 😱
At this point, you may be thinking, what does technical debt have to do with Sentry and error reporting? Well, let’s say we came to realize that a lot of errors were not handled properly in the first place, and to understand what I am saying, here are a few examples:
- 500 Internal Server Error was commonly used instead of 400 Bad Request, 404 Not Found, 409 Conflict, etc. Usually, you want to know when something goes wrong internally, but 4xx client errors do not belong in Sentry
- A lot of 500 Internal Server Error responses were not carrying enough information, and therefore it was hard to understand what exactly went wrong
- Various tasks were reporting critical errors, although this was more related to observability
The list can go on, and to start using Sentry as intended, we had to review and clean up the existing error handling and response status codes.
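To illustrate the kind of cleanup involved, here is a minimal sketch in Go of two of the ideas above: mapping errors to the right status code so that only genuine 5xx errors are reported, and wrapping errors with context so a 500 explains what went wrong. The sentinel errors, StatusFor, ShouldReport, and loadSubscription are hypothetical names for this example and do not come from our codebase.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors for the domain layer.
var (
	ErrInvalidInput = errors.New("invalid input")
	ErrNotFound     = errors.New("not found")
	ErrConflict     = errors.New("conflict")

	// errConnection stands in for an unexpected infrastructure failure.
	errConnection = errors.New("connection refused")
)

// StatusFor maps an error to the HTTP status the client should see.
// Anything that is not a known client error becomes a 500.
func StatusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrInvalidInput):
		return http.StatusBadRequest
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrConflict):
		return http.StatusConflict
	default:
		return http.StatusInternalServerError
	}
}

// ShouldReport says whether an error belongs in the error tracker:
// only server errors do, 4xx client errors are left out.
func ShouldReport(err error) bool {
	return StatusFor(err) >= http.StatusInternalServerError
}

// loadSubscription is a hypothetical lookup that wraps the underlying
// error with context, so the report says which operation failed.
func loadSubscription(id string, lookup func(string) error) error {
	if err := lookup(id); err != nil {
		return fmt.Errorf("load subscription %q: %w", id, err)
	}
	return nil
}

func main() {
	missing := loadSubscription("sub-42", func(string) error { return ErrNotFound })
	broken := loadSubscription("sub-42", func(string) error { return errConnection })

	fmt.Println(StatusFor(missing), ShouldReport(missing), missing)
	fmt.Println(StatusFor(broken), ShouldReport(broken), broken)
}
```

Because the wrapping uses %w, errors.Is still recognizes the sentinel error after context has been added, so handlers can add detail without losing the status mapping.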
I, for instance, took ownership of our private API and spent a couple of weeks (not full-time) cleaning up our Sentry project - this means fixing issues, delegating them to the owners, refactoring handlers, updating/adding logs, etc. At the end of this process, when the project was clean enough for everyone to understand, the guideline for tackling Sentry issues was announced internally, together with all the valuable materials and offers of mentorship. Since most of our engineers had previous experience with Sentry and since we already had an internal lightning talk on that subject, onboarding was not an issue.
Sentry is a powerful error-tracking tool, although you still need to have internal processes in place. Below, you can find what works for us.
At about the same time we started using Sentry, we also started to split ownership of our architecture and components. In our case, this is a document in Confluence with a list of all the architecture pieces and components, and the owners/teams assigned to them. Although it sounds simple, we needed some time to find clean ownership over some pieces of our architecture.
As mentioned above, our product evolved from the startup mode and started entering the maturity mode quite fast. Due to that, we also had to scale our team, hire many new engineers and form a couple more teams. We are not talking about some huge numbers, but we had to double our team in a short period, which is always a challenge.
During that growth period, it was also challenging to figure out clear ownership over the architecture. But there’s another thing that happened somewhere around the same time and helped us with that. We took the initiative of rewriting our frontend app from Angular.js to Vue.js. That took us almost half a year, but after it was done, we also had clear ownership over our frontend application. You may be wondering why the rewrite helped us, but you need to understand that a lot of parts of our application were written more than 4 years ago, and most (95%) of the current engineers were not even here back then. Therefore, people changed, knowledge was lost, and we needed to build it back again.
LeanIX SMP is structured as a monorepo, although there are multiple domains like private API, public API, back office API, Frontend, processors, connectors, etc. Some domains are completely owned by a single team, while some are owned by multiple teams.
Due to that, it was also meaningful to split those domains by creating multiple projects within Sentry. With this in place, we were able to find a driver for each project, while also reducing the noise in each of those projects.
A driver, in this case, is someone who keeps an eye on a specific Sentry project and creates, delegates, and sometimes even tackles and investigates tasks. For a driver, it is also desirable to have enough context to understand issues quickly, to be able to find the owners, or even to find the right solutions, all depending on the priorities. The driver shouldn’t tackle everything on their own, but rather properly delegate the tasks. After all, everyone working on the project is responsible for its success.
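The split into projects and drivers can also be written down as data. The following is a small sketch, not something we run in production: every domain, project name, driver, and team in it is made up for illustration. It shows one way to check that each domain has a Sentry project, a driver, and at least one owning team.

```go
package main

import (
	"fmt"
	"sort"
)

// Ownership ties one domain of the monorepo to a Sentry project,
// the driver watching it, and the owning teams.
type Ownership struct {
	SentryProject string
	Driver        string
	Teams         []string
}

// registry is a hypothetical ownership list; all values are invented.
var registry = map[string]Ownership{
	"private-api": {SentryProject: "smp-private-api", Driver: "alice", Teams: []string{"team-core"}},
	"frontend":    {SentryProject: "smp-frontend", Driver: "bob", Teams: []string{"team-web"}},
	"connectors":  {SentryProject: "smp-connectors", Driver: "carol", Teams: []string{"team-core", "team-integrations"}},
}

// Validate returns a sorted list of problems: domains that lack a
// Sentry project, a driver, or an owning team.
func Validate(reg map[string]Ownership) []string {
	var problems []string
	for domain, o := range reg {
		if o.SentryProject == "" {
			problems = append(problems, domain+": no Sentry project")
		}
		if o.Driver == "" {
			problems = append(problems, domain+": no driver")
		}
		if len(o.Teams) == 0 {
			problems = append(problems, domain+": no owning team")
		}
	}
	sort.Strings(problems)
	return problems
}

func main() {
	if problems := Validate(registry); len(problems) > 0 {
		for _, p := range problems {
			fmt.Println("ownership gap:", p)
		}
		return
	}
	fmt.Println("every domain has a project, a driver, and an owner")
}
```

A domain owned by several teams, like the hypothetical connectors entry, is exactly the case where a single named driver helps: there is always one person to fall back to.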
After having the ownership in place, the method you’ll take is completely up to you. In our case, each project has its monitoring Slack channel where Sentry is reporting each time an issue occurs. The channel tends to be reviewed by everyone working on the project, but the fallback is always to the driver mentioned above.
For this to work as intended, we are using these two integrations:
- Sentry + Slack, to receive all the alerts directly in Slack
- Sentry + Jira, to quickly create tasks from Sentry issues
Whoever examines the issue first adds an 👀 emoji to it and afterward either a ✅ or a Jira emoji, depending on whether the issue was immediately resolved or a Jira ticket was created. Sometimes, the driver will not have enough context, so the whole Slack thread may be delegated to the team that knows more about it and can examine it. Therefore, keep the process as straightforward as possible and use common sense when you need to involve multiple people/teams.
We believe issues should be examined as soon as possible, while the actual fix can be scheduled based on the issue's priority. Therefore, someone should keep an eye on the Slack channel of each project, which is truly important because we are deploying 10+ times a day, and each release can bring new surprises. Usually, errors pop up quickly after the release, and ideally, the team responsible for the release should also observe the app's behaviour closely.
Many times, we had examples where a single customer would experience an error that was recently introduced. Sometimes those customers would also report an issue, but since we had Sentry in place, we would already be tackling it and by the time there would be a support ticket, we would already have a fix in place. By quickly examining and fixing the issue, we could deploy it before other customers would even experience it.
A good Sentry process can greatly improve the efficiency of your error-tracking process. By establishing clear ownership, setting up alerts, prioritizing errors, using Sentry's context information, integrating with other tools, regularly reviewing and cleaning up errors, and tracking error resolution progress, you can ensure that your team can quickly and effectively resolve software bugs and issues. Implementing these best practices will help you get the most out of Sentry, streamline your error-tracking process, and make customers happy.