Case Study

GitLab Duo Health Check

Every misconfigured GitLab Duo setup became a support ticket. It did not have to be that way.

~30% fewer tickets · Shipped 17.3
Role: Design Lead
Timeline: FY25
Team: AI-powered DevSecOps stage

Overview

When GitLab Duo launched on self-managed instances, administrators had almost no way to know why it was not working. The setup process touched network configuration, license validation, connectivity to GitLab's cloud services, and feature flag states across multiple layers of the product. When something went wrong, and something often did, the only path forward was filing a support ticket and waiting.

I designed the GitLab Duo Health Check: an admin-facing diagnostic tool that lets teams identify and fix configuration problems themselves, without involving support.


The problem

Self-managed GitLab admins are a specific kind of user. They are technically capable, often managing GitLab alongside many other tools, and they do not have time to debug opaque failures. When GitLab Duo did not work, they needed to find out why quickly, and they needed clear next steps. What they had instead was a black box.

The support queue reflected this. Tickets requesting help with GitLab Duo configuration were high in volume and largely repetitive. The same handful of issues (network connectivity, license synchronization, feature flag misconfiguration) showed up again and again. Every one of those tickets represented an admin who had tried to figure it out themselves and could not.

The problem was not that GitLab Duo was broken. It was that there was no surface in the product for admins to understand the system's state.


Understanding the user

Before designing anything I needed to understand where people actually got stuck. I did three things.

I read through client complaints in Slack. Not summaries or categorizations, the actual messages. The language people use when they are frustrated with a product is different from the language they use in structured feedback. It showed me not just what was failing but how it felt to be an admin who could not get GitLab Duo working for their organization.

I tried to set up GitLab Duo myself. This was the most valuable thing. Documentation that looks complete can hide the moments where the product and the instructions diverge. Setting it up myself surfaced exactly those moments: the steps that were ambiguous, the error states that gave no direction, the places where I had to make a guess and hope.

I talked to the AI team about where people consistently got stuck. They had pattern-matched across support interactions and could point to the highest-frequency blockers. That list became the scope for the first version.


The design approach

A health check that tries to diagnose every possible failure state would take a long time to build and ship too late to help. The right approach was triage: identify the problems that accounted for the most tickets and build a tool that handled those first.

The design had to work for an admin who was not necessarily a GitLab Duo expert. They might know GitLab well without knowing the specifics of how the AI infrastructure connected to it. Every check needed a plain-language result. Pass or fail was not enough on its own. A failure needed to tell the admin what was wrong and what to do about it.

The UI followed a step-by-step structure: run the check, see results by category, get specific guidance for anything that failed. In GitLab 17.5 we added the ability to download a full diagnostic report, which gave admins something concrete to share with their team or with GitLab support if they needed to escalate.
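To make that structure concrete, here is a minimal sketch of how a check result and a shareable report could be modeled. It is an illustration under stated assumptions, not GitLab's implementation: the class name, field names, category labels, and sample messages are all hypothetical.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass
class CheckResult:
    """One health check outcome (hypothetical shape, not GitLab's schema)."""
    category: str        # illustrative grouping, e.g. "Network" or "License"
    name: str            # short label for the check
    passed: bool         # pass or fail alone is not enough...
    message: str         # ...so every result carries a plain-language finding
    next_step: str = ""  # what the admin should do; left empty when passed


def group_by_category(results):
    """Group results so the UI can show them category by category."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.category].append(result)
    return dict(grouped)


def build_report(results):
    """Serialize all results into JSON that an admin could download and share."""
    return json.dumps(
        {
            "summary": {
                "total": len(results),
                "failed": sum(not r.passed for r in results),
            },
            "checks": [asdict(r) for r in results],
        },
        indent=2,
    )


# Sample data, invented for illustration only.
results = [
    CheckResult("Network", "Outbound connectivity", True,
                "The instance can reach the cloud services it needs."),
    CheckResult("License", "License synchronization", False,
                "The license has not synchronized recently.",
                "Synchronize the license, then run the health check again."),
    CheckResult("License", "Seat assignment", True,
                "At least one user has a seat assigned."),
]

for category, checks in group_by_category(results).items():
    for check in checks:
        status = "PASS" if check.passed else "FAIL"
        print(f"[{category}] {status}: {check.message} {check.next_step}".rstrip())
```

The design choice this illustrates is that guidance travels with the result: a failed check always has a next step attached, so the on-screen result and the downloaded report tell the admin the same thing.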

The tool was not designed to catch everything. It was designed to catch the things that affected the most people. That constraint was a feature, not a limitation. A focused tool that ships and helps thousands of admins is better than a comprehensive one that does not exist yet.


Results

The health check cut support tickets related to GitLab Duo configuration by roughly 30% by giving admins a self-serve way to resolve the most common problems. Admins who would previously have hit a wall and opened a ticket could now run a check, see exactly what was wrong, and fix it.

The downloadable report in 17.5 added a second use case: admins escalating complex issues to support could include a structured diagnostic rather than trying to describe their configuration from memory. That made the support interactions that did happen faster to resolve.

It is not a complete solution. There are failure modes the health check does not cover. But it is progress in the right direction and a foundation to build on. For the admins it does help, it turns a frustrating dead end into a solvable problem.