Reimagining Issue Triaging with LLMs
At Fullscript, as our platform and teams continue to scale, so does the complexity of keeping the system healthy. One of the hidden costs in any growing engineering org is the time and friction it takes to manage and route issues — especially bugs.
This post is about a tool I built that uses Large Language Models (LLMs) to improve issue triaging. It detects whether something is a bug and determines which team should own it. The results so far have been promising — and the approach has implications far beyond just routing bugs.

Why Rethink Issue Triaging?
Issue triaging — deciding if something is broken and who should fix it — sounds straightforward. But it quickly breaks down at scale.
At Fullscript, we’ve faced a few recurring pain points:
- On-call engineers spend valuable cycles rerouting issues instead of solving them
- New team members struggle to understand who owns what
- Bugs often sit unassigned, delaying our ability to respond and resolve
- Team ownership constantly shifts as the org evolves
All of this adds friction to an already noisy system.
The Challenge of Ownership in a Growing Org
Our engineering teams are domain-specific and intentionally dynamic. People switch teams. Team responsibilities evolve. Business priorities shift.
While this agility helps us adapt, it makes issue ownership extremely fluid. Traditional approaches — like CODEOWNERS, static mappings, or heuristics based on file paths — simply can’t keep up. They break the moment someone changes teams or a file gets moved.
To solve this, we needed a smarter system that could understand both the technical and organizational context of a problem, and assign it accordingly — without relying on brittle logic.
Step One: Detecting What’s Actually a Bug
Before assigning anything, we needed to determine if an issue is a real bug or not. This is where the first part of the LLM pipeline comes in: bug detection.
The prompt is fed the issue title and description, and uses a strict definition of what qualifies as a bug — e.g., crashes, incorrect behavior, security flaws — while filtering out things like feature requests, performance improvements, or refactors.
The prompt used looks like this:
BUG_DETECTION_USER_PROMPT = """You are a software developer reviewing GitLab issues.
Analyze this issue and determine if it describes a BUG (functional defect) as opposed to a feature request, enhancement, or code quality improvement.
A BUG is defined as:
- Something that is broken or not working as intended
- Unexpected behavior or errors
- Crashes, exceptions, or failures
- Data corruption or incorrect results
- Security vulnerabilities
- Performance issues that prevent normal operation (timeouts, system crashes, complete failures)
- An issue that mentions text like "An object of type <object name> was hidden due to permissions" or similar policy issues
NOT BUGS (do not classify these as bugs):
- Code quality improvements (refactoring, cleanup, better practices)
- Performance optimizations and improvements (slow requests, query optimization, caching improvements)
- Slow response times or requests that could be faster (unless causing timeouts or complete failures)
- Feature requests or enhancements
- Documentation improvements
- Style or formatting changes
- Technical debt reduction
- Accessibility improvements (unless breaking existing functionality)
- UI/UX improvements or suggestions
- Adding new functionality
- Upgrading dependencies (unless fixing a specific issue)
- Database query optimization requests
- API response time improvements
- Load time optimizations
- Memory usage optimizations (unless causing out-of-memory crashes)
- Calling for investigation of some potential security issue
IMPORTANT: Distinguish between:
- BUGS: "The API times out and fails" or "Requests crash the server" (actual failures)
- NOT BUGS: "The API is slow" or "This query could be optimized" (performance improvements)
Issue Title: {title}
Issue Description:
{description}
Is this describing a functional BUG (something broken/not working)? Rate your confidence from 0–100% and explain your reasoning.
Focus only on whether something is actually broken or malfunctioning, not just slow or suboptimal.
Respond in JSON format with the following fields:
- is_bug (boolean)
- confidence (integer percentage 0–100)
- reason (string explanation)"""
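To make this concrete, here is roughly how a prompt like this gets wired up. This is a minimal sketch assuming the Anthropic Python SDK; the model name, the confidence threshold, and the issue/label_as_bug helpers are illustrative rather than the exact production code.

import json
import anthropic  # assumes the official Anthropic Python SDK

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def detect_bug(title: str, description: str) -> dict:
    """Ask the model whether an issue describes a functional bug."""
    prompt = BUG_DETECTION_USER_PROMPT.format(title=title, description=description)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model choice
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    # The prompt requests JSON, so parse the text of the first content block
    return json.loads(response.content[0].text)


result = detect_bug(issue.title, issue.description)  # `issue` is a placeholder object
if result["is_bug"] and result["confidence"] >= 80:  # illustrative threshold
    label_as_bug(issue, reason=result["reason"])     # hypothetical labeling helper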
This part was fairly straightforward to get working. But triaging was just getting started: the hard part is determining who should fix it.
Step Two: Assigning the Right Team
Team assignment is significantly more complex. To do this well, the system has to synthesize multiple layers of context:
- Technical signals: Stack traces, file paths, git blame, commit history
- Organizational signals: GitLab groups, assignee metadata, milestone labels
- Business domain knowledge: Our internal document that maps teams to domain areas
Each of these can be useful — depending on the issue. Some issues have full stack traces and clear technical blame. Others are vague business problems or screenshots without much context.
The system needs to adapt to all of them.
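One way to picture it: every issue gets distilled into a bundle of whatever signals happen to be available, and the downstream prompts work with whatever subset is present. The field names below are illustrative, not the tool's actual schema.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IssueContext:
    # Technical signals (often missing entirely for vague reports)
    stack_trace: Optional[str] = None
    file_paths: list[str] = field(default_factory=list)
    blame_summary: Optional[str] = None        # recent authors of the failing files
    # Organizational signals pulled from GitLab
    labels: list[str] = field(default_factory=list)
    assignee: Optional[str] = None
    milestone: Optional[str] = None
    # Business domain knowledge and visual context
    ownership_doc: Optional[str] = None        # team-to-domain mapping document
    screenshots: list[dict] = field(default_factory=list)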
Leveraging GitLab Structure
We started by leveraging GitLab’s org structure. Our engineering teams are organized into groups under groups/eng, and these are already well maintained.
At the beginning of the script, we fetch teams and their members from GitLab so that we can later infer a team from git history and blame data on the stack traces and code paths mentioned in an issue.
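A minimal sketch of that bootstrap step, assuming the python-gitlab library (the host URL and GITLAB_TOKEN handling are illustrative):

import gitlab  # python-gitlab

gl = gitlab.Gitlab("https://gitlab.example.com", private_token=GITLAB_TOKEN)

# Each subgroup under the engineering group represents a team
eng_group = gl.groups.get("groups/eng")
teams = {}
for subgroup in eng_group.subgroups.list(all=True):
    team = gl.groups.get(subgroup.id)  # re-fetch the subgroup to access its members
    teams[team.name] = [member.username for member in team.members.list(all=True)]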
Teaching AI to Think Like a Triaging Engineer
One of the most challenging parts of this project was the constant iteration on the code and prompts: roughly 1–2 hours of focused work per day over the span of a week.
Each session involved testing real-world issue scenarios, updating the prompt logic based on outcomes, and rerunning tests. Crucially, any change had to be validated not just for new use cases, but also for regressions in previously passing scenarios. The loop: test → refine → regression check → commit → repeat.
Examples of encoded logic (a sketch of this routing follows the list):
- Stack trace points to recent code changes → prioritize blame data
- UI bug lacks a stack trace but has a descriptive title → fall back to domain ownership docs
- Vague report with a screenshot offering minimal clues → analyze the image contents and fall back to the ownership mapping
- Blame shows only former employees → defer to commit history and secondary authorship
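Reusing the IssueContext sketch from earlier, that routing looks roughly like this in code. In practice much of it lives in prompt language rather than branches, and the helper names here are hypothetical:

def choose_primary_evidence(ctx: IssueContext) -> str:
    """Decide which signal the ownership prompt should weight most heavily."""
    if ctx.stack_trace and ctx.blame_summary and has_active_authors(ctx.blame_summary):
        return "blame"                     # stack trace plus recent, still-active authors
    if ctx.stack_trace and ctx.blame_summary:
        return "commit_history"            # blame only points at former employees
    if ctx.screenshots:
        return "image_plus_ownership_doc"  # analyze the image, then map to a domain
    return "ownership_doc"                 # vague or UI-only reports fall back to the mapping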
Here is the prompt we ended up with to handle the main decision-making on ownership. It takes as input the relevant data (only recent/relevant blame data, git commit history, and the team domain ownership document):
# Team assignment warning - used across multiple prompts
TEAM_NAME_MATCHING_WARNING = """⚠️ CRITICAL WARNING - DO NOT MATCH TEAM NAMES TO TECHNICAL TERMS:
- NEVER assign teams based on semantic similarity between team names and technical terms
- Team names are random organizational labels with NO relation to technology domains
- Example: Do NOT assign "Team::Apollo" just because the error mentions "ApolloError" or "Apollo Client"
- Example: Do NOT assign "Team::React" just because the issue mentions "React" components
- Example: Do NOT assign "Team::Mobile" just because the issue mentions "mobile" features
- ONLY use technical evidence (git blame, file ownership) and explicit domain ownership from the breakdown
- If you find yourself matching team names to technology keywords, STOP and reassess"""
TEAM_OWNERSHIP_COMBINED_SYSTEM_PROMPT = f"""You are a team ownership analyzer for a software engineering organization. Your job is to determine which engineering team should own/fix a given issue based on:
1. Git blame information (who recently modified the relevant files) - PRIORITIZED BY IMPORTANCE
2. Team domain ownership information from the organization's pod breakdown
3. File paths and stack traces in the issue
4. The nature and context of the issue
CRITICAL PRIORITIZATION RULES FOR GIT BLAME ANALYSIS:
1. HIGHEST PRIORITY: Focus primarily on blame for the topmost file in the stack trace - this is where the failure occurred
2. HIGH/MEDIUM PRIORITY: Consider subsequent files in the stack trace as supporting evidence
3. LOWER PRIORITY: Use remaining files and older commits only as additional context
The team that owns the topmost failing file should be given the strongest consideration, as this represents the actual point of failure.
{TEAM_NAME_MATCHING_WARNING}
You have access to the organization's team domain ownership breakdown which shows which teams own which areas of the codebase and business domains.
Always respond with a JSON object containing:
- "recommended_team": The team name most likely to own this issue
- "confidence": A number from 1–10 indicating your confidence
- "reasoning": A detailed explanation of your decision
- "alternative_teams": A list of other teams that could potentially own this issue
- "primary_method": The primary method used for the recommendation (technical_evidence, domain_ownership, or combined)
Consider both technical ownership (who modified the code) and domain ownership (which team owns the business area) when making your decision."""
TEAM_OWNERSHIP_COMBINED_USER_PROMPT = f"""You are analyzing a software issue to determine which engineering team should own it. You have been given:
1. **TEAM DOMAIN OWNERSHIP BREAKDOWN:**
{{pdf_ownership_text}}
2. **ISSUE ANALYSIS:**
{{parsed_failure}}
3. **GIT BLAME INFORMATION (PRIORITIZED BY STACK TRACE ORDER):**
{{blame_summary}}
4. **VALID GITLAB TEAMS:**
{{valid_teams}}
CRITICAL: The git blame information is ordered by priority - the TOPMOST file in the stack trace is where the failure occurred and should be given the strongest consideration for team assignment.
{TEAM_NAME_MATCHING_WARNING_BRIEF}
Based on this information, determine which team should own this issue. Consider:
- Which team owns the business domain related to this issue (from the ownership breakdown)
- Which team members have recently modified the relevant files (from git blame) - PRIORITIZE THE TOPMOST FILE
- The nature of the issue and which team's expertise it requires
Respond with a JSON object containing your recommendation."""
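For illustration, here is one way the prioritized blame_summary input could be assembled. This sketch shells out to plain git log per file (the production tool may gather blame differently) and assumes the file paths are already ordered topmost stack frame first:

import subprocess


def build_blame_summary(file_paths: list[str], repo_dir: str, max_files: int = 3) -> str:
    """Summarize recent authorship for the topmost files in the stack trace."""
    sections = []
    for priority, path in enumerate(file_paths[:max_files], start=1):
        # One line per recent commit touching the file, newest first
        log = subprocess.run(
            ["git", "-C", repo_dir, "log", "-n", "5",
             "--format=%an <%ae> %ad %s", "--date=short", "--", path],
            capture_output=True, text=True, check=False,
        ).stdout.strip()
        sections.append(f"[priority {priority}] {path}\n{log or 'no history found'}")
    return "\n\n".join(sections)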
Handling UI Screenshots with LLMs
Sometimes, issues include little more than a generic title and a screenshot.
We figured — if a human can look at a screenshot and figure out the right team, so can an LLM.
So we added support for parsing screenshots using Claude’s image understanding features. When available, the model analyzes the image and answers questions like:
- What feature or workflow is this?
- What’s the user trying to do?
- What domain does this belong to?
This context is then merged with the text-based inputs before assignment, as shown below:
# Prepare images for AI analysis if available
ai_images = []
if images:
    self.log(f"Preparing {len(images)} image(s) for AI analysis...")
    for img in images:
        try:
            # Encode image data for AI
            encoded_data = self.gitlab_client.encode_image_for_ai(img.get('data', b''))
            ai_images.append({
                'filename': img.get('filename', 'unknown'),
                'alt_text': img.get('alt_text', ''),
                'source': img.get('source', 'unknown'),
                'encoded_data': encoded_data
            })
            self.log(f"  Prepared image: {img.get('filename', 'unknown')} from {img.get('source', 'unknown')}")
        except Exception as e:
            self.log(f"Error preparing image {img.get('filename', 'unknown')}: {str(e)}")

# Prepare the prompt with image context
image_context = ""
if ai_images:
    image_context = f"\n\nIMAGE CONTEXT:\nThe issue includes {len(ai_images)} attached image(s) that provide visual context about the specific functionality and business area:\n"
    for i, img in enumerate(ai_images, 1):
        image_context += f"- Image {i}: {img['filename']} (from {img['source']})"
        if img['alt_text']:
            image_context += f" - {img['alt_text']}"
        image_context += "\n"
    image_context += "\n🔍 IMPORTANT: Analyze these images to identify SPECIFIC FUNCTIONALITY and BUSINESS FEATURES, not just generic UI components:\n"
    image_context += "1. What specific business functionality is shown? (e.g., lab ordering, prescription management, patient checkout, etc.)\n"
    image_context += "2. What domain-specific features are visible? (e.g., onboarding, supplement search, labs search, product details page, etc.)\n"
    image_context += "3. What workflow or process is the user in? (e.g., checkout flow, treatment plan writing, etc.)\n"
    image_context += "4. Look for business-specific terminology, brand names, or specialized features that indicate ownership\n"
    image_context += "5. PRIORITIZE teams that own the specific functionality shown over teams that own generic UI components\n"
    image_context += "6. If you see specific features, or specialized workflows, those should heavily influence team assignment\n\n"
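From there, the encoded screenshots and the image_context string are passed to Claude as multimodal content. A minimal sketch, assuming PNG screenshots and reusing the client and an ownership prompt variable from earlier (media-type detection is omitted):

content = []
for img in ai_images:
    content.append({
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",  # assumes PNG; detect the real type in practice
            "data": img["encoded_data"],
        },
    })
content.append({"type": "text", "text": ownership_user_prompt + image_context})

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # illustrative model choice
    max_tokens=1000,
    messages=[{"role": "user", "content": content}],
)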
What’s Working So Far
We’ve run the system against historical data and are already using it in parts of our triaging process. So far:
- It’s successfully triaged and labeled over 1,000 issues (99% correct bug labeling and 90% correct team assignment)
- Assigned 500+ previously unowned bugs
- Handled fuzzy “gray area” issues with solid reasoning
- Routed bugs faster than any manual process we’ve tried
The tool saved over 14 hours by triaging the backlog of existing issues that were unassigned and unlabeled. Beyond that, we currently get around 250 bugs a month; if we assume it takes on average 2 minutes per issue to determine ownership and route it to the appropriate team, that's roughly another 8 hours saved every month.
What Didn’t Work So Well
- Cursor is a power tool — with sharp edges: Writing this system entirely in Cursor made things move fast, but it also came with a constant risk of regressions. It’s great at helping you get things working — but it doesn’t always preserve what was working. The rule I landed on: as soon as something works, commit it, close the chat, and start a new one. Otherwise, expect weird side effects.
- LLMs default to code when you want reasoning: Without explicitly asking Cursor to use Claude and prompt-based logic, it often defaulted to traditional code-heavy solutions. It would reach for regexes or nested conditionals when what I wanted was natural language reasoning. It took persistent nudging to get it to stop “thinking like a script” and start “thinking like an engineer.” Counterintuitive — but worth the push.
- Old issues are messy and misleading: Issues older than ~2 years were a lost cause. Many referenced deleted code, dead links, or UI screenshots that no longer resemble the current product. These created noise and confusion in the ownership model. In the end, we just closed most of them — they weren’t worth routing.
- Team names kept getting semantically matched (wrongly): Despite explicitly telling the model not to match team names to error messages, it still loved to assign Team::Apollo to anything mentioning “Apollo cache” errors. The prompt includes a clear warning — but LLMs sometimes “forget” or revert to surface-level heuristics. It’s gotten better, but it still slips occasionally.
- Infra-heavy teams falsely tagged as domain owners: We have an internal team that leads large-scale infrastructure changes and often shows up in blame across the entire codebase. That caused issues — the model would frequently assign them as owners of features they don’t actually own. The fix? We excluded that team from ownership calculations entirely, which worked well since they don’t have business domain responsibilities.
What’s Next
We’re continuing to evolve this tooling in a few key directions:
- 📡 Real-time triaging via GitLab webhooks: As soon as an issue is created, it will be processed and — if applicable — labeled and routed automatically. This will create efficiencies not just for developers but also for customer-facing teams and any other team that might create bug reports directly. (A rough sketch of the webhook entry point follows this list.)
- 💬 Follow Slack links: Sometimes the description of the issue lives in a Slack thread that's linked to the ticket. The script should be able to follow these links and extract context as a human would.
- 🚨 Follow Sentry links: Similar to the above, but for when the details live in Sentry.
- 🛠️ Support for additional trackers: GitLab is just the start. We're building an abstraction layer so platforms like Linear are easy to plug in next.
- 🤖 Agentic resolution: With ownership routing solved, we're exploring agents that can propose small fixes, run tests, and route the MR to the correct team for review. This would mean our backlogs of issues could start to shrink faster, and developers could spend more time building features.
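Here is a rough sketch of what that webhook entry point could look like, using Flask purely for illustration (WEBHOOK_SECRET and triage_issue are placeholders):

from flask import Flask, abort, request

app = Flask(__name__)


@app.route("/gitlab/webhook", methods=["POST"])
def handle_issue_event():
    # GitLab echoes back the shared secret configured on the webhook
    if request.headers.get("X-Gitlab-Token") != WEBHOOK_SECRET:
        abort(401)
    payload = request.get_json()
    # Only act on newly opened issues
    if payload.get("object_kind") == "issue" and payload["object_attributes"]["action"] == "open":
        attrs = payload["object_attributes"]
        triage_issue(project_id=attrs["project_id"], issue_iid=attrs["iid"])
    return "", 204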
Final Thoughts
We’re not automating for the sake of it — we’re designing systems that reduce friction and let engineers focus where they’re most valuable. As software organizations grow, the coordination tax grows with them. Intelligent triaging is one way we can offload that tax.
LLMs let us scale judgment. When thoughtfully applied, they bring structure to unstructured problems, speed to noisy processes, and clarity to ambiguity. This project started as an experiment — and will likely evolve into a system we can rely on.
It’s a glimpse at how engineering will look in the near future — where reasoning is distributed, workflows are agentic, and engineers spend less time routing tickets, and more time producing great software.