Building a testing platform with Playwright: JMO Labs architecture
Playwright isn't just for E2E tests. At JMO Labs we use it as the whole engine: 9 verification phases, a 9-strategy locator with self-healing, video recording, responsive testing with real viewports, and accessibility with axe-core.

In the previous post I explained how we use Playwright to automate screenshots in Ofusca. But there's one project where Playwright isn't a supporting tool: it's the whole engine. In JMO Labs, our web testing platform, Playwright launches browsers, runs 9 quality checks in phases, records video for every test, and even fixes itself with AI when a selector fails. In environment variables in E2E scripts I covered the other side of the same problem.
This article is a technical walkthrough of how we built a full testing system on top of the Playwright API, from the orchestrator architecture to the 9-strategy locator that makes E2E tests resilient to interface changes. I talk about this in more detail in designing APIs for AI agents.
What JMO Labs is
JMO Labs is a fullstack platform for web testing and quality analysis. It offers three modes:
- Quick Scan runs 9 automatic checks (performance, accessibility, SEO, security, responsive, links, console errors, interactivity, and Lighthouse).
- E2E Testing gives you AI-powered end-to-end tests. You write a spec in natural language and the platform generates, runs, and verifies the steps automatically.
- API Testing lets you validate REST endpoints with assertions, variable extraction, and request chaining.
Everything runs in a single Docker container with Chromium preinstalled, an Express backend, and a React frontend. And at the center of it all, Playwright.
The orchestrator: 9 phases so nothing breaks
The problem with running multiple checks against the same page is that some of them can interfere with others. A responsive check changes the viewport. An interactivity check clicks buttons. If everything runs in parallel, the results are unpredictable.
The fix is an orchestrator that runs checks in 9 phases ordered by how intrusive they are:
```javascript
// Execution phases — from read-only to DOM mutation
const phases = [
  { name: "render", parallel: false }, // Phase 1: initial capture
  { name: "console-errors", parallel: false }, // Phase 2: re-navigates to capture the console
  { group: ["performance", "security", "seo"], parallel: true }, // Phase 3: read-only
  { name: "accessibility", parallel: false }, // Phase 4: injects axe-core
  { name: "responsive", parallel: false }, // Phase 5: changes the viewport
  { name: "links", parallel: false }, // Phase 6: external HTTP requests
  { name: "interactive", parallel: false }, // Phase 7: clicks, forms
  { name: "web-vitals", parallel: false }, // Phase 8: PerformanceObserver
  { name: "lighthouse", parallel: false }, // Phase 9: full audit (CDP)
];
```

The three read-only checks (performance, security, and SEO) run in parallel as a single phase because they don't mutate the DOM. From phase 4 onward, each check runs on its own to avoid interference.
The orchestrator also handles concurrency between tests: a configurable maximum number of browsers running at once (3 by default) keeps the server's memory from getting hammered.
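That concurrency cap can be sketched as a small slot pool. This is an illustrative sketch, not JMO Labs' actual code; the class and method names are assumptions:

```javascript
// Limits how many tests hold a browser at once. When all slots are busy,
// new tests wait in a FIFO queue instead of launching another Chromium.
class BrowserPool {
  constructor(maxBrowsers = 3) {
    this.maxBrowsers = maxBrowsers;
    this.active = 0;
    this.queue = [];
  }

  async acquire() {
    if (this.active < this.maxBrowsers) {
      this.active++;
      return;
    }
    // Wait until release() hands this waiter the freed slot
    await new Promise((resolve) => this.queue.push(resolve));
  }

  release() {
    const next = this.queue.shift();
    if (next) next(); // transfer the slot directly to the next waiter
    else this.active--; // nobody waiting: free the slot
  }

  // Run a task while holding a slot, releasing it even on failure
  async runWithSlot(task) {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

Handing the slot directly to the next waiter in `release()` avoids a race where a newly arriving test could sneak past the queue.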
Smart cache
If the same user runs the same test with the same selected checks, the orchestrator returns cached results without opening a browser. The cache key combines the URL with the sorted list of selected checks, and entries expire after a configurable TTL.
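The TTL side of that lookup might look like this. A minimal sketch under assumptions: the store shape, function names, and the 10-minute default are illustrative, not the production code:

```javascript
// In-memory cache with per-entry expiry, evicted lazily on read
const store = new Map();

function putInCache(key, result, ttlMs = 10 * 60 * 1000) {
  store.set(key, { result, expiresAt: Date.now() + ttlMs });
}

function getFromCache(key) {
  const entry = store.get(key);
  if (!entry) return null;
  if (Date.now() > entry.expiresAt) {
    store.delete(key); // expired: drop it and report a miss
    return null;
  }
  return entry.result;
}
```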
```javascript
function cacheKey(url, checks) {
  const normalized = checks.slice().sort().join(",");
  return `${url}::${normalized}`;
}

// If a recent result exists, replay it over SSE without running anything
const cached = getFromCache(cacheKey(url, selectedChecks));
if (cached && !options.forceRerun) {
  emitCachedResults(cached, stream);
  return;
}
```

E2E tests with AI, from natural language to Playwright
E2E mode is where Playwright and artificial intelligence come together. The flow has three phases:
- In the planning phase, an AI model gets the natural language spec and generates a JSON step plan.
- In the execution phase, each step runs against the real page with Playwright, using a 9-strategy locator.
- In the verification phase, another model validates that each step produced the expected result, using screenshots as evidence.
For example, a spec like “Go to login, enter the email [email protected] and password 1234, click Sign in, and verify that the dashboard appears” turns into an executable plan:
```json
{
  "steps": [
    { "action": "navigate", "value": "/login" },
    { "action": "fill", "selector": "input[type=email]", "value": "[email protected]" },
    { "action": "fill", "selector": "input[type=password]", "value": "1234" },
    { "action": "click", "selector": "button:Sign in" },
    { "action": "wait", "value": 2000 },
    { "action": "screenshot_only" }
  ]
}
```

The 9-strategy locator
The biggest challenge in automated E2E tests is fragile selectors. A CSS class change, translated text, or renamed attribute can break the whole test. Our fix is a locator that tries 9 strategies in sequence before giving up on finding an element.
```javascript
async function smartLocator(page, selector) {
  let el;
  // 1. Direct CSS — in a try/catch because role:name strings like
  // "button:Sign in" are not valid CSS and count() would throw
  try {
    el = page.locator(selector);
    if (await el.count()) return el;
  } catch {}
  // 2. role:name (e.g. "button:Sign in")
  if (selector.includes(":")) {
    const [role, name] = selector.split(":");
    el = page.getByRole(role, { name });
    if (await el.count()) return el;
  }
  // 3. getByLabel (form fields)
  el = page.getByLabel(selector);
  if (await el.count()) return el;
  // 4. getByPlaceholder (inputs)
  el = page.getByPlaceholder(selector);
  if (await el.count()) return el;
  // 5. getByRole with common roles
  for (const role of ["button", "link", "textbox", "heading"]) {
    el = page.getByRole(role, { name: selector });
    if (await el.count()) return el;
  }
  // 6. getByText (visible text)
  el = page.getByText(selector);
  if (await el.count()) return el;
  // 7. partial aria-label
  el = page.locator(`[aria-label*="${selector}"]`);
  if (await el.count()) return el;
  // 8. partial data-testid
  el = page.locator(`[data-testid*="${selector}"]`);
  if (await el.count()) return el;
  // 9. partial title
  el = page.locator(`[title*="${selector}"]`);
  if (await el.count()) return el;
  return null; // all 9 strategies failed
}
```

But it doesn't stop there. If all 9 strategies fail, the system switches into self-healing mode:
- It checks a selector cache that remembers which strategy worked in previous runs for the same URL.
- If the cache doesn't help, it asks an AI model to analyze the current HTML of the page and suggest an alternative selector.
- If the alternative selector works, it saves it in cache for future runs.
The result is tests that fix themselves when the interface changes. The success rate goes up with every run because the selector cache learns what works.
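That fallback chain can be sketched with its dependencies injected. Everything here is illustrative: `locate` stands in for the 9-strategy locator, `cache` for the per-URL selector cache, and `suggestSelector` for the AI model call:

```javascript
// Self-healing selector resolution — a sketch, not the production code
async function resolveWithHealing(page, url, selector, { locate, cache, suggestSelector }) {
  // 1. The 9-strategy locator first
  let el = await locate(page, selector);
  if (el) return el;

  // 2. Selector cache: a selector that worked in previous runs for this URL
  const key = `${url}::${selector}`;
  const remembered = cache.get(key);
  if (remembered) {
    el = await locate(page, remembered);
    if (el) return el;
  }

  // 3. Ask an AI model for an alternative based on the current HTML
  const html = await page.content();
  const suggestion = await suggestSelector(html, selector);
  if (suggestion) {
    el = await locate(page, suggestion);
    if (el) {
      cache.set(key, suggestion); // 4. remember what worked for future runs
      return el;
    }
  }
  return null; // give up — the step is reported as failed
}
```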
Responsive testing with real viewports, not generic ones
The responsive check doesn't use made-up sizes. The viewports are calibrated against real 2025 devices:
```javascript
const defaultViewports = [
  { name: "mobile", width: 402, height: 874, label: "iPhone 17" },
  { name: "tablet", width: 820, height: 1180, label: "iPad Air 11\" M3" },
  { name: "desktop", width: 1440, height: 932, label: "MacBook Air 15\" M4" },
];
```

For each viewport, Playwright:
- Resizes the window with page.setViewportSize().
- Takes a screenshot.
- If a previous baseline exists, compares it pixel by pixel with pixelmatch.
- Generates a diff image if the change goes over the configured threshold.
Visual comparison is especially useful in continuous development: every CSS change gets validated automatically against the approved baseline.
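The per-viewport loop can be sketched with the browser-facing steps injected as stubs. Assumptions here: `takeShot` wraps `page.setViewportSize()` plus `page.screenshot()`, `compare` wraps pixelmatch and returns the number of differing pixels, and the 1% change threshold is illustrative:

```javascript
// Responsive check loop — resize, screenshot, compare against baseline
async function runResponsiveCheck(viewports, baselines, { takeShot, compare }, maxRatio = 0.01) {
  const results = [];
  for (const vp of viewports) {
    const shot = await takeShot(vp); // resize + screenshot
    const baseline = baselines.get(vp.name);
    if (!baseline) {
      // First run: the current screenshot becomes the baseline
      baselines.set(vp.name, shot);
      results.push({ name: vp.name, status: "baseline-created" });
      continue;
    }
    const changed = compare(baseline, shot, vp.width, vp.height);
    const ratio = changed / (vp.width * vp.height);
    results.push({ name: vp.name, status: ratio > maxRatio ? "changed" : "unchanged", ratio });
  }
  return results;
}
```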
Accessibility with axe-core injected into the page
The accessibility check takes advantage of the fact that Playwright has full control of the browser context to inject axe-core directly into the page under test:
```javascript
async function runAccessibilityCheck(page) {
  // Inject axe-core into the page under test
  const axePath = require.resolve("axe-core/axe.min.js");
  const axeScript = readFileSync(axePath, "utf-8");
  await page.evaluate(axeScript);
  // Run the analysis inside the browser
  const results = await page.evaluate(() => axe.run());
  // Summarize each violation with its impact level
  const violations = results.violations.map((v) => ({
    impact: v.impact, // critical | serious | moderate | minor
    description: v.description,
    nodes: v.nodes.length,
    help: v.helpUrl,
  }));
  return {
    status: violations.some((v) => v.impact === "critical") ? "fail" : "pass",
    violations,
    passes: results.passes.length,
  };
}
```

The rule is strict: any critical violation marks the check as failed. Moderate and minor violations are reported as warnings so the team can prioritize them.
Video and real-time screenshots
Every test in JMO Labs gets recorded as video. Playwright supports native WebM recording, which we enable when creating the browser context:
```javascript
const context = await browser.newContext({
  recordVideo: { dir: videoTmpDir, size: viewport },
  viewport,
});
const page = await context.newPage();
// ... run the checks ...
// Closing the context flushes the WebM file to disk
await context.close();
const videoPath = await page.video().path();
```

On top of that, during execution we send real-time screenshots every 500 ms over Server-Sent Events (SSE). The frontend shows them as a live view of the browser running the test. It's like watching Playwright work in real time.
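That live view can be sketched as a timer that pushes frames into an already-open SSE response. A sketch under assumptions: `res` is an Express response with SSE headers set, and the JPEG quality and event name are illustrative:

```javascript
// Push a screenshot over SSE every `intervalMs` while the test runs
function streamScreenshots(page, res, intervalMs = 500) {
  const timer = setInterval(async () => {
    try {
      const buf = await page.screenshot({ type: "jpeg", quality: 50 });
      // One SSE message per frame, payload as base64
      res.write(`event: frame\ndata: ${buf.toString("base64")}\n\n`);
    } catch {
      // The page may be navigating or already closed — skip this frame
    }
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop
}
```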
The full architecture
Everything fits into a flow that goes from the user's request to the PDF report:
- The user sends URL + mode + options via POST /api/test.
- The orchestrator checks the cache and available concurrency.
- Playwright launches Chromium headless with video enabled.
- The 9 phases run in order, emitting results over SSE.
- Results, screenshots, and video are saved to SQLite and disk.
- The user can download a PDF report generated with PDFKit.
- A periodic cleanup job deletes videos (>2 h), screenshots (>1 h), and old tests (>30 days).
The platform also keeps a history of executed tests, with filtering by URL, mode, duration, and number of checks passed.
What we learned building on top of Playwright
After building a full testing platform on top of Playwright, these are the most valuable lessons:
- Playwright isn't just for tests. Its browser control API is powerful enough to work as the engine for any tool that needs to interact with web pages: scrapers, PDF generators, performance monitors, screenshot automation.
- Phases matter. Running everything in parallel is tempting but dangerous. A phase system that respects how intrusive each operation is produces consistent, reproducible results.
- Selectors break. Plan for it. A locator with multiple fallback strategies and a cache that learns is the difference between fragile tests and tests that survive refactors.
- SSE beats WebSocket for one-way streaming. It's simpler, works through proxies and load balancers without extra configuration, and reconnects automatically.
- A single container makes everything simpler. Packaging Chromium, backend, and frontend into one Docker image gets rid of browser and Playwright version compatibility issues.
Playwright is much more than an E2E testing framework. It's a browser automation API powerful enough to build full products on top of it. JMO Labs is proof that, with the right architecture, you can turn a headless browser into a web quality analysis platform.
If you want to try JMO Labs, it's available at e2e.josemanuelortega.dev. Run a Quick Scan against any URL and you'll see Playwright in action.
This is another entry in the Playwright in depth series: it follows Automating 60 screenshots with Playwright and continues with E2E tests that fix themselves.

Jose, author of the blog
QA Engineer. I write out loud about automation, AI and software architecture. If something here helped you, write to me and tell me about it.