Deep dive into: Cloud Coding Agents at HubSpot

This post is the second in a series about empowering product, UX, and engineering teams with AI, especially in the context of writing code. Read the first post about scaling AI adoption here!

In our last post, we shared a small part of our journey towards universal AI adoption in our engineering organization and claimed that AI has fundamentally transformed how we build software at HubSpot. That’s a pretty bold statement!

We have had a lot of success rolling out local AI coding agents to our engineering teams, but the transformative nature of AI-enabled software development became most apparent when we began integrating AI deeply at every point in the software development process.

One example of this integration is our recent rollout of cloud coding agents for engineers, which have been live for planning, implementation, and code review at HubSpot for the past six months. To date, we have merged over 7,000 fully AI-generated pull requests and code reviewed over 50,000 pull requests authored by humans. We run all of the agents on our own infrastructure and tightly integrate them with GitHub, allowing HubSpot engineers to ship changes quickly and with more confidence than ever before.

In this blog post, we will explore the technical architecture of this system, discuss how we went from zero to MVP quickly with a small team, and share what we learned about getting good results from fully autonomous coding agents.

From our past experience deploying local coding agents, we knew that giving them a tight feedback loop would be key to their success. At HubSpot, this meant giving them access to our existing developer tooling and infrastructure in order to read build logs, run integration tests, and call internal services. We have an extremely consistent and opinionated internal stack, which gave us a clear vision for the capabilities our platform would need to provide to the coding agent. On the other hand, our heavy usage of internal tools and libraries would have also made replicating our developer environment with an external provider quite challenging. For this reason, we decided that building an internal platform for executing cloud coding agents would be the quickest way to begin showing results.
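To make that feedback loop concrete, here is a minimal sketch of the kind of tool wrappers an agent harness could expose for reading build logs and running tests. The endpoint, commands, and function names are hypothetical stand-ins, not HubSpot's actual internal APIs.

```python
import subprocess
import urllib.request

# Hypothetical internal build service URL; illustrative only.
BUILD_API = "https://builds.internal.example.com/api/v1"


def fetch_build_log(build_id: str) -> str:
    """Fetch the raw log for a build so the agent can inspect failures."""
    with urllib.request.urlopen(f"{BUILD_API}/builds/{build_id}/log") as resp:
        return resp.read().decode("utf-8")


def run_integration_tests(project: str) -> tuple[int, str]:
    """Run a project's integration tests and hand the agent the exit code and output."""
    proc = subprocess.run(
        ["mvn", "-pl", project, "verify"],  # placeholder command; the real harness wraps internal tooling
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr
```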

Luckily, HubSpot has a strong culture of building internal infrastructure platforms from scratch – we already run an internal build and deploy system built on top of Kubernetes that handles over one million builds per day and hosts tens of thousands of microservices across approximately 3,000 EC2 instances. It was an easy choice to build our agent execution platform, Crucible, on top of Kubernetes as well.
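As a rough illustration of what scheduling a sandboxed agent run on Kubernetes can look like, here is a sketch using the official Python client. The image, namespace, resource sizes, and environment variables are placeholders rather than our production configuration.

```python
from kubernetes import client, config


def launch_agent_job(task_id: str, repo: str, prompt: str) -> None:
    """Schedule a single sandboxed agent run as a Kubernetes Job (illustrative only)."""
    config.load_incluster_config()  # running inside the cluster; use load_kube_config() locally

    container = client.V1Container(
        name="coding-agent",
        image="internal-registry.example.com/coding-agent:latest",  # placeholder image
        env=[
            client.V1EnvVar(name="TARGET_REPO", value=repo),
            client.V1EnvVar(name="AGENT_PROMPT", value=prompt),
        ],
        resources=client.V1ResourceRequirements(
            requests={"cpu": "2", "memory": "4Gi"},
            limits={"cpu": "4", "memory": "8Gi"},
        ),
    )

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"agent-{task_id}", labels={"app": "coding-agent"}),
        spec=client.V1JobSpec(
            backoff_limit=0,  # one attempt per task; retries are handled at a higher level
            ttl_seconds_after_finished=3600,  # clean up finished pods automatically
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container], restart_policy="Never")
            ),
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="coding-agents", body=job)
```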

Analysis & Development

This approach has several advantages: it guarantees that every agent is sandboxed, it’s extremely easy to scale, and it’s flexible enough to handle many different kinds of requests (for example, agents don’t have to make code changes – they could also be instructed to examine a branch and leave a pull request review using gh).
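For example, a review-style run never needs to push code at all. Here is a sketch of how such an agent could read a pull request's diff and post its feedback with the gh CLI; run_agent is a hypothetical wrapper around whichever coding agent is in use.

```python
import subprocess


def review_pull_request(pr_number: int, run_agent) -> None:
    """Fetch a PR diff with gh, ask an agent for feedback, and post it as a review comment."""
    diff = subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        capture_output=True, text=True, check=True,
    ).stdout

    feedback = run_agent(f"Review this diff and summarize any issues:\n\n{diff}")

    # Post the feedback as a non-blocking review comment.
    subprocess.run(
        ["gh", "pr", "review", str(pr_number), "--comment", "--body", feedback],
        check=True,
    )
```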

With a solid foundation, we could begin working on integrations for GitHub and Slack. We wanted to reduce friction for using our tools as much as possible, which meant meeting users where they are instead of directing them to a new internal platform.

Sidekick is an AI assistant we had previously created to help engineers navigate our internal documentation. We decided to extend it with several new Crucible-based tools that engineers can trigger by @-mentioning Sidekick on GitHub or Slack, covering workflows like planning, autonomous issue implementation, and pull request review.
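To give a feel for the GitHub side of this, here is a simplified sketch of routing an @-mention to an agent run. The payload fields follow GitHub's issue_comment webhook event; the bot handle and the agent_platform module are hypothetical stand-ins for our actual integration.

```python
from flask import Flask, request

from agent_platform import launch_agent_job  # hypothetical module wrapping the Job scheduler sketched above

app = Flask(__name__)
BOT_HANDLE = "@sidekick"  # hypothetical mention handle


@app.post("/webhooks/github")
def handle_issue_comment():
    """Route GitHub issue_comment events that @-mention the bot to a new agent run."""
    event = request.get_json()
    if event.get("action") != "created":
        return "", 204

    body = event["comment"]["body"]
    if BOT_HANDLE not in body.lower():
        return "", 204

    launch_agent_job(
        task_id=str(event["comment"]["id"]),
        repo=event["repository"]["full_name"],
        prompt=body,
    )
    return "", 202
```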

Of these workflows, the one that was especially challenging to get right was autonomous issue implementation. Agents would often decide they were finished despite a failing build, fail to communicate adequately with users, or even forget entirely that they were supposed to create a pull request!
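One pattern that can help here, sketched below with hypothetical helper names, is to treat the agent's claim of being finished as something to verify: the harness checks for a pull request and a green build, and re-prompts the agent with the concrete failures until a retry budget runs out.

```python
def run_until_verified(task, run_agent, max_attempts: int = 3) -> bool:
    """Re-run the agent until independent checks pass or the attempt budget runs out.

    `task` and `run_agent` are hypothetical stand-ins for the harness's real
    primitives; the point is that "done" is verified, not taken on faith.
    """
    prompt = task.initial_prompt
    for _ in range(max_attempts):
        run_agent(prompt)

        failures = []
        if not task.pull_request_exists():
            failures.append("No pull request was opened for this task.")
        if not task.build_is_green():
            failures.append("The build is failing:\n" + task.latest_build_log())

        if not failures:
            task.post_status("All checks passed; the pull request is ready for review.")
            return True

        # Feed the concrete failures back to the agent instead of accepting "done".
        prompt = "The previous attempt is not complete:\n" + "\n".join(failures)

    task.post_status("Agent could not produce a passing change; human attention is needed.")
    return False
```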

Here’s what we learned along the way to making these agents produce reliable and consistent results:

Our Engineering team at HubSpot loves working with Crucible and Sidekick. Most HubSpot engineers use them every day to accelerate their work. We believe our success comes down to a few factors:

Future Impact

We have lots planned for the future! Crucible has also served as the foundation for an internal evaluation framework for coding agents, and we are continuing to iterate on the UX for our existing workflows. GitHub Copilot offers similar features and gets a lot right with its UX, especially with the way it is able to include small indications of progress in the timeline.

One of the things we’re most excited about, though, is experimenting with coding agents beyond Claude Code. Most recently, OpenCode has been very interesting due to its robust plugin architecture, but we also want to test a new kind of coding agent built entirely in-house on our internal agent framework. We’re hoping this will give us easier control over complex workflows like pull request review and autonomous implementation.

We look forward to sharing more about all of these things in future posts, but that just about wraps it up for today!

Thanks for reading, and a special thank you to Ze’ev Klapow, Francesco Signoretti, Brian LaMattina, David Camprubí, Emily Adams, and everyone else at HubSpot who helped with this project.
