Self-healing E2E tests: how we built an AI-powered self-healing pipeline
E2E tests break with every UI change. At JMO Labs we built a 5-phase AI pipeline that plans, executes, repairs selectors, diagnoses failures, and verifies results on its own. The selector cache makes every run faster than the last.

E2E tests have a silent enemy: selector fragility. Change a CSS class, translate a button, or move a form field, and the whole suite breaks. At JMO Labs we decided tests should fix themselves. This article explains how we built an AI-powered self-healing pipeline that plans, executes, heals, recovers, and verifies E2E tests on its own. I go into more detail about this in designing APIs for AI agents to validate Playwright tests autonomously.
If you don't know JMO Labs yet, in the previous post I explained its overall architecture. Today I'm focusing on the most ambitious part: the self-healing system.
The full pipeline, from natural language to a verified test
The flow has five linked phases, each with its own AI model:
- In planning, a natural language spec is turned into a plan with executable steps.
- In execution, each step runs against the real page with a 9-strategy locator.
- In healing, if the selector fails, the AI looks at the screenshot and suggests alternatives.
- In recovery, if the whole step fails, the AI diagnoses the problem and proposes corrective actions.
- In verification, another model looks at the resulting screenshot and confirms whether the step succeeded.
Let's look at each phase in detail.
Phase 1: from natural language to an executable plan
The user writes something like: “Go to login, enter [email protected] and password 1234, click Sign in, and verify that the dashboard appears”. The planner turns that into a JSON object with concrete steps.
But before planning, the system extracts the real context from the page: headings, forms with their fields (label, id, placeholder), and the first 50 interactive elements with their attributes (role, aria-label, data-testid, visible text). That context is sent to the model together with the spec so selectors can be as precise as possible.
```javascript
// The prompt includes the page's real context
const userMessage = `URL: ${url}
Functional specification:
${spec}
Page context:
- Title: ${pageContext.title}
- Forms: ${JSON.stringify(pageContext.forms)}
- Interactive elements: ${JSON.stringify(pageContext.elements.slice(0, 50))}
Generate a JSON plan with this structure:
{
  "name": "Short test name",
  "steps": [
    { "id": 1, "action": "navigate", "selector": "...", "value": "..." }
  ]
}`;
```
The plan is structurally validated before execution: each step must have a valid action (navigate, click, type, select, scroll, wait, screenshot_only) and the required fields for that action. If validation fails, the planner retries once.
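As an illustration, that structural validation can be sketched as a pure function. This is not the production validator: the action names come from the article, but the required-field mapping and the helper names are assumptions for the sketch.

```javascript
// Hypothetical sketch of the structural plan validation described above.
// The REQUIRED_FIELDS mapping is an assumption, not the real rule table.
const REQUIRED_FIELDS = {
  navigate: ["value"], // destination URL
  click: ["selector"],
  type: ["selector", "value"],
  select: ["selector", "value"],
  scroll: [],
  wait: [],
  screenshot_only: [],
};

function validatePlan(plan) {
  if (!plan || typeof plan.name !== "string" || !Array.isArray(plan.steps)) {
    return { ok: false, error: "plan must have a name and a steps array" };
  }
  for (const step of plan.steps) {
    const required = REQUIRED_FIELDS[step.action];
    if (!required) {
      return { ok: false, error: `step ${step.id}: unknown action "${step.action}"` };
    }
    for (const field of required) {
      if (!step[field]) {
        return { ok: false, error: `step ${step.id}: missing "${field}" for "${step.action}"` };
      }
    }
  }
  return { ok: true };
}
```

A failing result is what would trigger the planner's single retry.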
Selector preference: resilience first
The prompt tells the model to prefer selectors in this order:
1. role:name, for example button:Enviar or link:Inicio (the most resilient)
2. aria-label, like [aria-label="Search"]
3. data-testid, like [data-testid="submit-btn"]
4. id, like #login-form
5. Visible text, meaning the exact text of the button or link
6. CSS selector, like .btn-primary (last resort, the most fragile)
This hierarchy makes generated tests naturally resistant to design changes. A CSS class change won't break a selector based on the element's role.
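To make the hierarchy concrete, here's a hypothetical helper that ranks a selector string by that resilience order (lower rank = more resilient). The function name and its pattern checks are illustrative, not the production code:

```javascript
// Illustrative ranking of a selector string by the resilience hierarchy above.
// Lower rank = more resilient. A real implementation would be stricter.
function selectorRank(selector) {
  if (/^[a-z]+:.+/i.test(selector)) return 1;       // role:name, e.g. "button:Enviar"
  if (selector.startsWith("[aria-label")) return 2; // aria-label attribute
  if (selector.startsWith("[data-testid")) return 3; // data-testid attribute
  if (selector.startsWith("#")) return 4;           // id
  if (!/[.\[#>:]/.test(selector)) return 5;         // plain visible text
  return 6;                                         // raw CSS, the most fragile
}
```

Ranking candidates this way is one possible mechanism for always preferring the role-based option when several selectors match the same element.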
Phase 2: execution with 9 cascading strategies
Each step in the plan runs by looking for the target element with 9 strategies, ordered from most to least specific. If one strategy doesn't find the element, the next one is tried. If none of them works, the whole cascade is retried up to 3 times with increasing timeouts (3s, 6s, 9s).
```javascript
async function smartLocator(page, step, attempt = 0) {
  const timeout = 3000 + attempt * 3000; // 3s → 6s → 9s
  const selector = step.selector;
  // "role:name" selectors such as "button:Enviar" split into role + accessible name
  const [role, name] = selector.includes(":") ? selector.split(":") : [null, null];
  const strategies = [
    // 1. Direct CSS
    () => page.locator(selector).first(),
    // 2. role:name (e.g. "button:Enviar")
    () => page.getByRole(role, { name }),
    // 3. getByLabel (form fields)
    () => page.getByLabel(selector, { exact: false }).first(),
    // 4. getByPlaceholder (inputs)
    () => page.getByPlaceholder(selector, { exact: false }).first(),
    // 5. getByRole with common roles
    () => page.getByRole("button", { name: selector }),
    // 6. getByText (visible text)
    () => page.getByText(selector, { exact: false }).first(),
    // 7. Partial aria-label
    () => page.locator(`[aria-label*="${selector}" i]`).first(),
    // 8. Partial data-testid
    () => page.locator(`[data-testid*="${selector}" i]`).first(),
    // 9. Partial title
    () => page.locator(`[title*="${selector}" i]`).first(),
  ];
  for (const strategy of strategies) {
    try {
      const el = strategy();
      if (await el.isVisible({ timeout })) return el;
    } catch {
      // A strategy that doesn't apply (e.g. no role to parse) just falls through
    }
  }
  return null;
}
```
Between retries, the system does two smart things:
- It dismisses obstacles like cookie banners, newsletter modals, or popups that might be covering the element.
- It scrolls, moving down 300px to reveal lazy-loaded content that might contain the target element.
The selector cache, tests that learn
Every time a selector works, the system stores it in a persistent SQLite cache tied to the URL pattern. The next time a test runs against that URL, the locator checks the cache before trying the 9 strategies.
```javascript
// Before trying the strategies, check the cache
const cached = getCachedSelector(urlPattern, step.selector);
if (cached && cached.success_count > cached.fail_count) {
  const el = page.locator(cached.working_selector).first();
  if (await el.isVisible({ timeout: 3000 })) {
    incrementSelectorSuccess(urlPattern, step.selector);
    return el; // Cache hit
  }
}
// If the cache misses, try the 9 strategies...
// On success, store the result:
upsertSelector(urlPattern, original, working, description, strategy);
```
The cache uses a success and failure counter system. A cached selector is only used if success_count > fail_count. If a cached selector stops working, its failure count goes up and eventually it gets dropped. Selectors that haven't been used in 90 days are automatically deleted.
The result: tests get faster with every run because the cache avoids going through all 9 strategies for elements it already knows.
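The trust rules fit in a few lines. This in-memory version is a simplified stand-in for the real SQLite cache (the helper names and fields are assumptions), but it captures the success_count > fail_count gate and the 90-day expiry:

```javascript
// In-memory sketch of the selector cache's trust rules. The real system
// persists this in SQLite; the names and schema here are simplified.
const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;
const cache = new Map(); // key: `${urlPattern}::${original}` → entry

function upsertSelector(urlPattern, original, working, strategy, now = Date.now()) {
  const key = `${urlPattern}::${original}`;
  const entry = cache.get(key) ?? { strategy, success_count: 0, fail_count: 0 };
  entry.working_selector = working;
  entry.last_used = now;
  entry.success_count += 1;
  cache.set(key, entry);
}

function getTrustedSelector(urlPattern, original, now = Date.now()) {
  const key = `${urlPattern}::${original}`;
  const entry = cache.get(key);
  if (!entry) return null;
  // Selectors unused for 90 days are dropped
  if (now - entry.last_used > NINETY_DAYS_MS) {
    cache.delete(key);
    return null;
  }
  // Only trust entries that have succeeded more often than they've failed
  return entry.success_count > entry.fail_count ? entry : null;
}

function recordFailure(urlPattern, original) {
  const entry = cache.get(`${urlPattern}::${original}`);
  if (entry) entry.fail_count += 1;
}
```

Once failures outnumber successes, the entry stops being served and the locator falls back to the full 9-strategy cascade.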
Phase 3: healing with computer vision
When all 9 strategies fail across the 3 attempts and the cache doesn't help, the healer kicks in. This module takes a screenshot of the current page and sends it to an AI model with vision capabilities together with the selector that failed.
```javascript
const result = await provider.askWithImage(
  HEALER_SYSTEM,     // Instructions: "Analyze the screenshot and suggest selectors"
  HEALER_USER(step), // "Selector X doesn't work for action Y"
  screenshotBase64   // Current screenshot of the page
);
// The model responds with:
// { selector: "new CSS", fallbacks: ["alt1", "alt2"], reasoning: "..." }

// Try the suggested selector
const locator = page.locator(result.selector);
if (await locator.count() > 0) {
  // It works → cache it with strategy: "ai-healed"
  upsertSelector(urlPattern, original, result.selector, description, "ai-healed");
  return locator;
}
```
The healer also provides fallback selectors that get tried if the main one doesn't work. If any of the suggested selectors finds the element, it's saved in the cache with the ai-healed strategy for future runs.
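The main-then-fallbacks loop might look like this sketch, where `probe` is a hypothetical stand-in for a real Playwright visibility check so the logic runs without a browser:

```javascript
// Hypothetical sketch: try the healer's main suggestion, then its fallbacks,
// in order. `probe(selector)` stands in for a real element-visibility check.
async function tryHealedSelectors(result, probe) {
  const candidates = [result.selector, ...(result.fallbacks ?? [])];
  for (const candidate of candidates) {
    if (await probe(candidate)) {
      return candidate; // first selector that actually finds the element wins
    }
  }
  return null; // nothing the model suggested matched the page
}
```

Whichever candidate wins is the one that gets written to the cache as ai-healed.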
Phase 4: recovery when the whole step fails
Sometimes the problem isn't the selector, it's the state of the page. A modal blocking the interface, an unexpected redirect to login, a 500 error, or content that hasn't loaded yet. For those cases, the recovery system analyzes the screenshot and the error to diagnose what happened and propose a corrective action.
```javascript
const recovery = await attemptRecovery(
  page, provider, failedStep, errorMessage, screenshotBase64
);
// The model analyzes the situation and responds with:
// {
//   diagnosis: "A cookie banner is blocking the button",
//   recoveryAction: { action: "dismiss", selector: "#cookie-accept" },
//   shouldRetryOriginal: true,
//   shouldSkipStep: false
// }

// Execute the recovery action
if (recovery.recoveryAction) {
  await executeAction(page, recovery.recoveryAction);
}
// Retry the original step if the model suggests it
if (recovery.shouldRetryOriginal) {
  await executeAction(page, failedStep);
}
```
The available recovery actions are: dismiss (close an obstacle), scroll, wait, click (click another element first), and navigate. If the model decides the state can't be recovered from (error page, login wall), it recommends skipping the step instead of failing the whole test.
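A dispatch over those five actions could be sketched like this. The `page` argument here is any object exposing a few methods (a mock, not the real Playwright API), and the method names are assumptions for the sketch:

```javascript
// Hypothetical dispatch for the recovery actions listed above. `page` is any
// stand-in exposing click/scrollBy/goto, so the sketch runs without a browser.
async function executeRecoveryAction(page, recoveryAction) {
  switch (recoveryAction.action) {
    case "dismiss": // close the obstacle (cookie banner, modal...)
    case "click":   // click another element first
      return page.click(recoveryAction.selector);
    case "scroll":
      return page.scrollBy(0, 300); // same 300px nudge the executor uses
    case "wait":
      return new Promise((resolve) => setTimeout(resolve, recoveryAction.ms ?? 1000));
    case "navigate":
      return page.goto(recoveryAction.value);
    default:
      throw new Error(`Unknown recovery action: ${recoveryAction.action}`);
  }
}
```

Keeping the action vocabulary this small is deliberate: the model picks from a closed set, so a hallucinated action fails loudly instead of doing something unexpected.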
Phase 5: visual verification
After executing each step, with or without healing/recovery, the verifier takes a screenshot and analyzes it with vision AI to confirm that the result is what we expected.
```javascript
const verification = await verifyStep(
  provider, step, screenshotBase64,
  {
    urlChanged: currentUrl !== previousUrl,
    consoleErrors: errorsThisStep,
    networkErrors: failuresThisStep,
  }
);
// Response: { status: "pass" | "fail" | "warn", explanation: "..." }
// "pass" → the expected result is visible
// "fail" → the expected result is NOT present
// "warn" → cannot be determined with certainty
```
The verifier gets extra context: whether the URL changed between steps, whether there were console errors, and whether there were network failures during execution. That lets it tell expected changes apart from unexpected ones, like an error redirect.
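One plausible way to combine the visual verdict with those signals, shown as a hypothetical sketch (the post doesn't describe the real weighting): a step that looks fine on screen but produced console or network errors is worth flagging rather than passing silently.

```javascript
// Hypothetical post-processing of the verifier's verdict. Assumption: a visual
// "pass" with console/network errors is downgraded to "warn" for human review.
function finalStatus(visionStatus, context) {
  if (visionStatus === "fail") return "fail";
  const hadErrors =
    context.consoleErrors.length > 0 || context.networkErrors.length > 0;
  if (visionStatus === "pass" && hadErrors) return "warn";
  return visionStatus;
}
```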
The obstacle dismissor
Before the planner analyzes the page, and during executor retries, a specialized module automatically dismisses cookie banners, modals, and popups. It knows 28 cookie consent patterns (OneTrust, Cookiebot, Didomi, Tarteaucitron...) and acceptance text in 7 languages.
```javascript
// Sample of the 28 known selectors (excerpt)
const COOKIE_SELECTORS = [
  "#onetrust-accept-btn-handler",
  "#CybotCookiebotDialogBodyLevelButtonLevelOptinAllowAll",
  "#didomi-notice-agree-button",
  ".cc-accept-all",
  "[data-cookie-accept]",
  // ... 23 more
];
// If no selector works, it tries acceptance text in 7 languages:
// "Aceptar", "Accept", "Accepter", "Akzeptieren", "Aceitar", "Accetta"...
```
If all of that fails, it presses Escape as a last resort. The system runs up to 3 dismissal rounds to handle stacked popups.
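The round logic reduces to a small loop. In this sketch, `tryDismissOnce` is a hypothetical stand-in for the full selector/text/Escape cascade; the real module's internals aren't shown here:

```javascript
// Sketch of the multi-round dismissal loop: up to 3 rounds to handle stacked
// popups, stopping early once a round finds nothing left to close.
// `tryDismissOnce` stands in for the selector/text/Escape cascade.
async function dismissObstacles(tryDismissOnce, maxRounds = 3) {
  let dismissed = 0;
  for (let round = 0; round < maxRounds; round++) {
    const closedSomething = await tryDismissOnce();
    if (!closedSomething) break; // page is clear, no need for more rounds
    dismissed++;
  }
  return dismissed;
}
```

Capping the rounds matters: without it, a popup that reopens itself after every dismissal would loop the executor forever.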
Real-time streaming: watching the AI work
The whole pipeline emits events through Server-Sent Events (SSE), which the frontend consumes in real time. The user sees each step execute, each healing action apply, and each verification resolve.
```javascript
// Events emitted during the run:
emit("step_start", { stepId, description, action });
emit("selector_healed", { stepId, originalSelector, usedSelector, strategy });
emit("step_recovery", { stepId, diagnosis, action, retried, skipped });
emit("step_complete", { stepId, status, explanation, screenshot });
emit("live_frame", { filename, frame, timestamp });
emit("ai_complete", { report, steps, usage });
```
On top of that, during the whole E2E run the system captures real-time screenshots every 500 ms and sends them as SSE frames. The frontend shows them like a live video feed from the headless browser. It's like having a window open into what Playwright is doing at that moment.
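Under the hood, each SSE event is just a text frame. Here's a minimal formatter assuming the standard event:/data: wire format (the helper name is hypothetical; the real `emit` implementation isn't shown in this post):

```javascript
// Minimal SSE framing helper: an `event:` line, a `data:` line with the JSON
// payload, and a blank line terminating the frame, per the SSE wire format.
function formatSSE(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

A browser `EventSource` listening on the stream would then dispatch each named event (`step_start`, `step_complete`, ...) to its own handler.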
The result: tests that improve on their own
The full pipeline creates a continuous improvement loop:
- The planner generates resilient selectors based on the page's real context.
- The executor tries 9 strategies with retries before giving up.
- The healer repairs broken selectors using computer vision.
- Recovery diagnoses and corrects unexpected page states.
- The verifier confirms each step with visual evidence.
- The selector cache accumulates knowledge across runs.
Each run leaves the cache richer. Today's tests are faster and more resilient than yesterday's because the system remembers which selectors work on each domain.
E2E test fragility isn't a technical problem, it's a design problem. If your system assumes selectors will always work, any UI change breaks it. If it assumes they'll fail and has ways to adapt, it becomes resilient. Build tests that expect chaos.
If you want to try the self-healing system in action, run an E2E test on e2e.josemanuelortega.dev with any natural language spec. You'll see the AI plan, execute, and verify each step in real time.
This is another entry in the Playwright in depth series. You're coming from Playwright as the testing engine for JMO Labs; to go back to the beginning, start with Automating 60 screenshots with Playwright.

Jose, author of the blog
QA Engineer. I write out loud about automation, AI and software architecture. If something here helped you, write to me and tell me about it.
What did you think? What would you add? Every comment sharpens the next post.