
Case Study - UncommonGoods
How UncommonGoods stopped spending 50% of their QA time on Selenium

COMPANY
UncommonGoods is an online retailer known for thoughtfully curated, design-forward products, connecting customers with unique goods from independent makers around the world.
INDUSTRY
E-Commerce ($222 ARR)
COMPANY SIZE
51–200
FOUNDED
1999
50%

Reduction in release time
after adopting Spur.
90%+

Test accuracy
achieved in weeks vs. months with Selenium.
$300K

Saved in QA costs
while optimizing the process.
The Problem
Half of every working day, lost to maintaining the infrastructure that was supposed to test the product.
UncommonGoods has been selling unique, thoughtfully curated goods from independent makers since 1999. Their e-commerce site is the product, and keeping it working reliably across checkout, browsing, and discovery flows is what QA exists to do.
But before Spur, QA at UncommonGoods wasn't really doing that. They had 150 tests built on Selenium, a DevOps dependency to keep the infrastructure running, and an offshore support arrangement just to maintain what they had. Around 50% of the QA team's time was going to maintenance, keeping the test suite alive, not running it productively.
"You're not spending 50% of your time doing maintenance anymore… you're spending maybe 1%, and boom, you run it."
The tests that did exist were brittle. Any UI change could break them. Complex releases required weeks of QA preparation and coordination just to reach a starting point. And despite all of that effort, site reliability was landing at 89–92%, below industry standards for a retailer that depends entirely on its website. Bugs were still reaching production. QA was a bottleneck without being a safety net.

The Solution
Maintenance became 1% of the job and coverage actually improved.
The shift from Selenium to Spur wasn't just a tool swap; it was a rethink of the whole testing approach. UncommonGoods consolidated 150 redundant, overlapping, brittle Selenium tests into around 30 dynamic, adaptive Spur tests. Fewer tests, covering more ground, with virtually no maintenance overhead.
What made that possible is the difference in how Spur works. Spur's agents navigate like real users and adapt to UI changes automatically rather than breaking when a selector shifts. Writing a test means describing what you want to verify in plain language, not maintaining a fragile script. The infrastructure dependency disappeared entirely.
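To make the "selector shifts" point concrete, here is a minimal toy sketch in Python. It is not Spur's implementation and does not use the Selenium API; the `Element` class, the `find_by_id` and `find_by_intent` helpers, and both page versions are hypothetical, invented only to show why a lookup tied to an internal identifier breaks after a redesign while a lookup tied to what the user sees can survive it.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a page element (not a real browser object)."""
    element_id: str
    role: str
    text: str


def find_by_id(page, element_id):
    """Selector-style lookup: depends on an exact internal identifier."""
    for el in page:
        if el.element_id == element_id:
            return el
    return None


def find_by_intent(page, role, label):
    """Intent-style lookup: matches on role and visible text, as a user would."""
    for el in page:
        if el.role == role and label.lower() in el.text.lower():
            return el
    return None


# Assumed example: the same checkout button before and after a UI redesign.
page_v1 = [Element("btn-checkout", "button", "Checkout")]
page_v2 = [Element("cta-primary-2", "button", "Proceed to Checkout")]

selector_results = [find_by_id(p, "btn-checkout") is not None for p in (page_v1, page_v2)]
intent_results = [find_by_intent(p, "button", "checkout") is not None for p in (page_v1, page_v2)]

print(selector_results)  # [True, False]: the id-based lookup breaks after the redesign
print(intent_results)    # [True, True]: the intent-based lookup still finds the button
```

In this toy, the test that hard-codes `btn-checkout` needs a manual fix after the redesign, which is the kind of maintenance the team describes; the intent-style lookup keeps working because the button still reads as a checkout button.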

Within weeks, UncommonGoods reached 90%+ test accuracy, a benchmark that took months to achieve with Selenium. Regression moved from a once-per-release event to something the team could run multiple times per week with minimal overhead.
"The more you use Spur, the smarter it gets. The smarter it gets, the faster you can write tests and find bugs."
Crucial Moment
A complex release that would have taken weeks of QA was automated in 1–2 days.
This is the number Solomon comes back to most: a specific release that previously required weeks of QA preparation was handled by Spur in one to two days. That's roughly 10 business days saved on a single release. For a retailer where time to deploy directly affects revenue, that's not an operational improvement; it's a strategic one.
"Time is money, and that's the strength of Spur."
Spur also started surfacing clusters of bugs in checkout, the highest-stakes flow on any e-commerce site, that were previously reaching production. Site reliability climbed from 89–92% to 95–98%.
"That's above industry standards… a pretty good indicator of how good Spur is."

The Shift
With maintenance gone, QA became what it was always supposed to be.
The 50% of time that used to go to Selenium maintenance didn't disappear; it got redirected. With regression running reliably and automatically, Solomon's team shifted to the work that actually requires human judgment:
- Edge case and exploratory testing, the scenarios no automated suite will think to try
- Expanding automation coverage into new areas of the product
- Evaluating internal tools for further automation opportunities
- Working toward a further 25–40% reduction in manual QA
"It's allowed employees to focus on what they're really good at instead of just busy work."
The longer-term goal is catching blockers earlier, in development, not at release. That's the shift from QA as a release gate to QA as a development accelerator.
"If we can catch blockers early… that's the whole ball game."

50%
Release time reduction

90%+
Test accuracy achieved in weeks

$300K
Saved in QA costs



Key Insights
UncommonGoods didn't just replace a tool. They replaced a way of working, one where half the job was keeping the test infrastructure alive, with one where tests run themselves and the team focuses on what actually takes judgment. 150 tests became 30. Maintenance became 1%. Site reliability crossed industry benchmarks. That's what happens when QA stops being a burden and starts being a system.