How Uber Built AI Agents That Saved 21,000 Developer Hours

Key Takeaways:

  • Uber’s Developer Platform Team has deployed agentic tools that streamline tasks across a massive codebase
  • LangGraph was used to orchestrate reusable, domain-specific agents for testing, validation, and workflow assistance
  • Validator and Autocover tools have reduced manual toil and saved an estimated 21,000 developer hours
  • Uber created a reusable internal agent framework that aligns with company systems and culture
  • Strategic reuse and encapsulation have scaled agent development across engineering teams

At LangChain’s Interrupt event, Uber’s Developer Platform Team shared how it’s using LangGraph to power an expanding set of internal AI agents—many of which are already in daily use across an engineering organization supporting 5,000 developers and a codebase with hundreds of millions of lines.

The strategy, as presented by Sourabh Shirhatti and Matas Rastenis, is built on three foundational pillars:

  • Build agents that eliminate high-friction engineering tasks
  • Create reusable, cross-cutting primitives that power multiple tools
  • Support intentional tech transfer across teams by wrapping LangChain and LangGraph inside Uber-native abstractions

Meet Validator and Autocover

Uber’s first major LangGraph-based agent was Validator, a tool embedded into engineers’ IDEs. It automatically flags security vulnerabilities and best-practice violations in real time, then proposes fixes that can be accepted with a click or routed to an agentic assistant for more context-aware resolution.

Validator’s architecture is layered. A central agent coordinates multiple sub-agents, including one that queries an LLM with curated best-practice prompts, and another that invokes deterministic tools like static linters. This hybrid setup lets Uber precompute common fixes while allowing dynamic code evaluation where needed.
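The layered design described above can be illustrated with a minimal sketch in plain Python. It is not Uber's code and does not use LangGraph: the Finding type, the two sub-agent functions, the rule names, and the validate coordinator are all hypothetical, and the "LLM" sub-agent is a canned stand-in so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Finding:
    line: int         # 1-based line number in the checked code
    rule: str         # hypothetical rule identifier
    suggestion: str   # proposed fix shown to the engineer
    source: str       # "deterministic" or "llm"


SubAgent = Callable[[str], List[Finding]]


def lint_subagent(code: str) -> List[Finding]:
    """Deterministic sub-agent: a toy linter rule that flags eval() calls."""
    return [
        Finding(number, "no-eval", "Use ast.literal_eval for literals", "deterministic")
        for number, text in enumerate(code.splitlines(), start=1)
        if "eval(" in text
    ]


def llm_subagent(code: str) -> List[Finding]:
    """Stand-in for an LLM call with curated prompts (assumption: no real model is queried)."""
    return [
        Finding(number, "hardcoded-password", "Load the secret from configuration", "llm")
        for number, text in enumerate(code.splitlines(), start=1)
        if 'password = "' in text
    ]


def validate(
    code: str,
    sub_agents: Sequence[SubAgent] = (lint_subagent, llm_subagent),
) -> List[Finding]:
    """Coordinator: run every sub-agent, drop duplicate (line, rule) pairs, sort by line."""
    seen = set()
    merged: List[Finding] = []
    for agent in sub_agents:
        for finding in agent(code):
            key = (finding.line, finding.rule)
            if key not in seen:
                seen.add(key)
                merged.append(finding)
    return sorted(merged, key=lambda finding: finding.line)
```

Because the coordinator only depends on the SubAgent signature, a deterministic check and a model-backed check are interchangeable from its point of view, which is the property the hybrid setup relies on.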

The success of Validator led to the creation of Autocover, a generative test-authoring tool. Autocover automates high-quality test generation using domain-specific expert agents that scaffold, generate, execute, and mutate test cases based on heuristics encoded in a LangGraph-based structure. Validator itself is embedded in Autocover as a sub-agent to catch test design issues early.
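As a rough illustration of how such stages might hand work to one another, the sketch below chains scaffold, generate, validate, execute, and mutate steps over a shared state object. Everything in it is an assumption made for the example: the CoverState fields, the canned candidates, and the naming rule that stands in for Validator. A real system would call models and a build system where the stubs are.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CoverState:
    target: str                                       # name of the function under test
    candidates: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)    # stages visited, in order


Stage = Callable[[CoverState], CoverState]


def scaffold(state: CoverState) -> CoverState:
    state.trace.append("scaffold")  # a real stage would create the test file skeleton
    return state


def generate(state: CoverState) -> CoverState:
    state.trace.append("generate")
    # Canned candidates stand in for model output; the second one is deliberately poor.
    state.candidates = [f"test_{state.target}_valid_input", "test_1"]
    return state


def validate_design(state: CoverState) -> CoverState:
    state.trace.append("validate")
    # Hypothetical design rule standing in for Validator: names must mention the target.
    state.candidates = [name for name in state.candidates if state.target in name]
    return state


def execute(state: CoverState) -> CoverState:
    state.trace.append("execute")  # stub: a real stage would build and run each test
    return state


def mutate(state: CoverState) -> CoverState:
    state.trace.append("mutate")
    # Derive one boundary-case variant per surviving test.
    state.candidates += [f"{name}_boundary" for name in list(state.candidates)]
    return state


def run_pipeline(state: CoverState, stages: List[Stage]) -> CoverState:
    for stage in stages:
        state = stage(state)
    return state


PIPELINE: List[Stage] = [scaffold, generate, validate_design, execute, mutate]
```

Calling run_pipeline(CoverState(target="parse"), PIPELINE) walks the stages in order and leaves the surviving test names in state.candidates.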

The result is a fluid, semi-autonomous experience. Engineers trigger test generation from within their IDE, and Autocover populates a stream of context-aware, build-validated test cases in real time. For large files, Uber’s system can execute up to 100 tests concurrently, which raises throughput and, according to the team, grows test coverage two to three times faster than other AI coding tools.
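Bounded concurrency of this kind can be sketched with Python's asyncio. The limit of 100 comes from the talk; the function names and the idea of representing each candidate test as an async callable are assumptions for illustration, not a description of how Autocover actually runs tests.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

MAX_CONCURRENT = 100  # upper bound on simultaneous test executions cited in the talk

TestFn = Callable[[], Awaitable[None]]


async def run_candidate(name: str, test: TestFn, limit: asyncio.Semaphore) -> Tuple[str, bool]:
    """Run one candidate test under the shared limit; a failed assertion marks it as failing."""
    async with limit:
        try:
            await test()
            return name, True
        except AssertionError:
            return name, False


async def execute_all(candidates: Sequence[Tuple[str, TestFn]]) -> List[str]:
    """Run all candidates concurrently (at most MAX_CONCURRENT at once); return passing names."""
    limit = asyncio.Semaphore(MAX_CONCURRENT)
    results = await asyncio.gather(
        *(run_candidate(name, test, limit) for name, test in candidates)
    )
    # gather returns results in input order, so passing names keep their original order.
    return [name for name, passed in results if passed]
```

The semaphore caps how many tests are in flight at once, so a very large file cannot flood the build system, while smaller batches still run fully in parallel.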

So far, Autocover has helped increase test coverage across the Developer Platform by 10%, which Uber estimates has saved over 21,000 hours of developer time.

From Agents to Graphs to Platforms

Uber didn’t stop at standalone tools. The company also introduced a GPT-like internal chatbot builder, a security assistant, and a workflow platform called Picasso that includes conversational AI agents integrated with organizational knowledge. These share primitives with tools like Validator and Autocover, creating a layered ecosystem of composable, graph-based agent systems.

The team emphasized that encapsulation has been key to enabling collaborative development. For example, Uber’s security team was able to contribute validation rules to Validator without needing deep familiarity with agent architecture or LangGraph itself. Modular graph nodes allowed teams to encode their logic and let the system handle execution paths and state.
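One way to picture this kind of encapsulation is a small rule registry: a contributing team writes a plain function and registers it, while a separate engine decides how and when rules run. The sketch below is hypothetical. The rule decorator, the run_rules engine, and the example rule are invented for illustration and are not Uber's or LangGraph's API.

```python
from typing import Callable, Dict, List, Optional

# A rule inspects one line of code and returns a message, or None if the line is fine.
Rule = Callable[[str], Optional[str]]

_RULES: Dict[str, Rule] = {}


def rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a check under a name; contributors need nothing else."""
    def register(fn: Rule) -> Rule:
        _RULES[name] = fn
        return fn
    return register


# Example contribution from a (hypothetical) security team.
@rule("no-hardcoded-secret")
def no_hardcoded_secret(line: str) -> Optional[str]:
    return "Move secrets to a secret store" if "API_KEY =" in line else None


def run_rules(code: str) -> List[str]:
    """Engine side: apply every registered rule to every line and collect reports."""
    reports: List[str] = []
    for number, line in enumerate(code.splitlines(), start=1):
        for name, check in _RULES.items():
            message = check(line)
            if message:
                reports.append(f"{number}:{name}: {message}")
    return reports
```

The contributor only sees the Rule signature. Iteration order, reporting format, and any future scheduling changes stay inside run_rules.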

Strategic Lessons for Scaling AI Agent Development

Shirhatti and Rastenis closed with key lessons they believe other teams can apply:

  • Encapsulation enables reuse. Clear agent interfaces allow teams to extend functionality without central coordination.
  • Domain expert agents outperform generic tools. Specialized agents with deep context around source code, test strategies, and build systems performed better than general-purpose AI code tools.
  • Determinism still matters. In many cases, deterministic agents outperformed LLMs for tasks like linting and build execution, while still plugging into agentic workflows for orchestration.
  • Solving narrow problems first pays off. Several Uber agents began as tightly scoped solutions but were later reused in broader workflows. One such example: a build system agent that now supports multiple tools by handling test execution and feedback routing.

Rather than racing to deploy the flashiest AI, Uber has invested in foundational architecture and consistent tooling—resulting in real performance gains, reusable components, and increasing developer satisfaction across teams.

Learn how AI Agents can supercharge your company’s profits and productivity at TMC’s AI Agent Event on Sept 29-30, 2025, in DC.

Rich Tehrani serves as CEO of TMC and chairman of ITEXPO #TECHSUPERSHOW Feb 10-12, 2026. He is also CEO of RT Advisors and a Registered Representative (investment banker) with, and offering securities through, Four Points Capital Partners LLC (Four Points) (Member FINRA/SIPC). He handles capital/debt raises as well as M&A. RT Advisors is not owned by Four Points.

The above is not an endorsement or recommendation to buy/sell any security or sector mentioned. No companies mentioned above are current or past clients of RT Advisors.

The views and opinions expressed above are those of the participants. While believed to be reliable, the information has not been independently verified for accuracy. Any broad, general statements made herein are provided for context only and should not be construed as exhaustive or universally applicable.

Portions of this article may have been developed with the assistance of artificial intelligence, which may have contributed to ideation, content generation, factual review, or editing.


 
