Testing my EPUB reader by letting AI handle the browser

Backend-focused full-stack developer with AWS cloud knowledge. AWS Community Builder (Serverless category). Passionate about knowledge sharing.

As a backend-focused developer who would rather not fiddle with UI elements, I find manually writing tests for UI quirks a chore. That is why an AI-powered testing framework built on Playwright was something I definitely wanted to try: specifically, to verify that the EPUB reader in my React app works as expected without having to hunt down elusive DOM elements. When testing an EPUB reader you are dealing with a lot of dynamic iframe content, where standard Playwright selectors can be a headache.

How does Passmark help?

Passmark is an open-source Playwright library for browser regression testing. It utilises intelligent caching, auto-healing, and multi-model assertion verification.

The concept behind Passmark is simple. You set up a Passmark project with one or more AI API keys (for example, an OpenRouter key that gives access to both AI models the tests need), define your test scenarios with natural-language steps and assertions (or let your AI IDE do the heavy lifting, even for the assertions), and Passmark uses the AI models to find and click the correct UI elements and verify your assertions. The basic flow is explained in the diagram below:

When Passmark can run the tests successfully, the flow is straightforward. However, this depends heavily on the clarity and precision of your assertions: as usual, the quality of the prompt determines the outcome. There are also scenarios where Passmark's limitations prevent it from running the tests, and in those situations the fallback is raw Playwright. I describe some of the scenarios I came across below.

Testing my React app with an EPUB reader

I created a Passmark project and pointed it at my React app running locally. The app has a simple landing page, a blog and an e-book built with ReactReader. I started with some baseline tests verifying things like whether buttons are present, whether lists load and whether clicking a post opens the full article. From there I moved on to EPUB-reader-specific tests covering chaotic user flows, user distraction and changing network conditions. I also wanted to test some content and localStorage edge cases, as well as browser zooming.

The 25 tests I created for this application are summarised below:

Baseline tests

With one exception, the baseline tests were all straightforward and worked with Passmark as long as the test scenario was clearly explained. Here is an example of one of the baseline tests:

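import { test, expect } from "@playwright/test";
// runSteps is imported from Passmark and BASE_URL is the locally running app's URL;
// both are set up elsewhere in the test suite (omitted here).

test("Home - external English book link leaves the site", async ({ page }) => {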
  test.setTimeout(120_000);
  await runSteps({
    page,
    userFlow: "Navigate to original English book",
    steps: [
      { description: `Navigate to ${BASE_URL}` },
      { description: "Click the link to the original English book website" },
    ],
    assertions: [
      { assertion: "The browser has navigated to an external website that is not http://localhost:5000" },
    ],
    test,
    expect,
  });
});

An added benefit is that the website under test doesn't even have to be in English, since an LLM can handle the translation itself. In this scenario, if you ask it to 'click the link to the original English book website', it can find the correct link even if the page's buttons and links are labelled in another language.

And then the exception: a seemingly simple scenario verifying that a PDF download link is clickable. The AI's limitation here is that when a file downloads, nothing changes visually on the page, so the screenshot-based assertions cannot see any evidence of the download. The error message explained the issue very clearly:

Error: The accessibility snapshot shows that the button 'Download pdf' [ref=e38] is currently 'active', which indicates it has been clicked or focused, but there is no evidence in the DOM or the screenshot that a download has initiated or that a PDF viewer has opened. The page content remains the landing page.

The workaround was to write a plain Playwright download event listener for this one test and leave Passmark to handle only the navigation.
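
A minimal sketch of such a download check is below. It relies on Playwright's `page.waitForEvent("download")`, which resolves to a `Download` whose `suggestedFilename()` returns the file name; the `DownloadLike` and `DownloadSource` interfaces and the `expectPdfDownload` helper are hypothetical names introduced so the sketch runs without a browser, and the test in the repository may be written differently. The important detail is to start waiting for the event before clicking, so a download that begins immediately is not missed.

```typescript
// Minimal structural stand-ins for the parts of Playwright's Page and Download used here.
// A real Playwright `page` exposes waitForEvent("download"), resolving to a Download
// that has suggestedFilename(); these hypothetical interfaces model only that subset.
interface DownloadLike {
  suggestedFilename(): string;
}

interface DownloadSource {
  waitForEvent(event: "download"): Promise<DownloadLike>;
}

// Hypothetical helper: asserts that triggering the click starts a PDF download.
async function expectPdfDownload(
  page: DownloadSource,
  triggerDownload: () => Promise<void>,
): Promise<string> {
  // Register the listener BEFORE clicking, so an immediately fired download is not missed.
  const downloadPromise = page.waitForEvent("download");
  await triggerDownload();
  const download = await downloadPromise;

  const filename = download.suggestedFilename();
  if (!filename.toLowerCase().endsWith(".pdf")) {
    throw new Error(`Expected a .pdf download, got "${filename}"`);
  }
  return filename;
}
```

In a real test, a Playwright `page` should satisfy `DownloadSource` structurally, so once Passmark has navigated to the landing page the check could be called as `await expectPdfDownload(page, () => page.getByRole("button", { name: "Download pdf" }).click())`, using the button label from the error message above. Unlike this sketch, Playwright's own `waitForEvent` rejects after its timeout if no download starts.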

Chaotic User Flows

The most interesting experiment was testing the EPUB book integrated into the web app. Most scenarios, such as rapid page flipping or alternating next/previous clicks, were handled without issues.

What caused problems were tests that needed many repeated actions, such as this one:

{ assertion: "The epub reader is showing the last page of the book without crashing or displaying a blank broken page" }

The issue here is that you are asking the AI to repeat the same action many times, essentially clicking through the book until it reaches the end. Each page click means another API call, and I ended up hitting the rate limit. I didn't manage to get this to work even after giving clearer instructions to start from a page near the end of the book. It was much easier to handle with plain Playwright, as it is just a loop.
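
The loop can be sketched as follows, assuming a hypothetical `PagedReader` interface whose two methods would, in a real test, wrap Playwright actions such as clicking the reader's next arrow and checking for an end-of-book marker. Because each iteration is an ordinary function call rather than an AI step, it makes no model API calls; the `maxClicks` guard (also an assumption) makes the test fail instead of spinning forever if the reader gets stuck.

```typescript
// Hypothetical reader abstraction: in a real Playwright test, clickNext() might click the
// reader's next-page arrow and isAtLastPage() might look for an end-of-book marker.
interface PagedReader {
  clickNext(): Promise<void>;
  isAtLastPage(): Promise<boolean>;
}

// Clicks "next" until the reader reports the last page. Plain function calls only,
// so no AI model is consulted per page. maxClicks stops a stuck reader from looping forever.
async function flipToLastPage(reader: PagedReader, maxClicks: number): Promise<number> {
  let clicks = 0;
  while (!(await reader.isAtLastPage())) {
    if (clicks >= maxClicks) {
      throw new Error(`Did not reach the last page within ${maxClicks} clicks`);
    }
    await reader.clickNext();
    clicks++;
  }
  return clicks; // number of page turns it took
}
```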

Another flow I tested was a user resizing the browser in the middle of reading a book. Here I hit a fundamental limitation: Passmark cannot handle browser-level commands such as resizing the window, because they are not simple UI interactions. Again, plain Playwright was needed to make this test work.
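
Resizing in plain Playwright comes down to `page.setViewportSize({ width, height })`, a real `Page` method. The sketch below steps through a few sizes mid-read and records any size at which the reader appears broken; the `ResizablePage` interface, the `resizeWhileReading` helper, the example sizes and the `readerShowsText` check are all hypothetical, and in a real test the check might inspect the reader's iframe content. Passmark could still drive the UI steps before and after the resize.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Structural subset of Playwright's Page: setViewportSize({ width, height }) is a real Page method.
interface ResizablePage {
  setViewportSize(size: Viewport): Promise<void>;
}

// Example sequence (desktop -> phone -> tablet); the sizes are illustrative assumptions.
const READING_RESIZES: Viewport[] = [
  { width: 1280, height: 800 },
  { width: 390, height: 844 },
  { width: 820, height: 1180 },
];

// Hypothetical helper: resizes mid-read and collects every size at which the
// supplied check says the reader is no longer showing text.
async function resizeWhileReading(
  page: ResizablePage,
  readerShowsText: () => Promise<boolean>,
  sizes: Viewport[] = READING_RESIZES,
): Promise<Viewport[]> {
  const failures: Viewport[] = [];
  for (const size of sizes) {
    await page.setViewportSize(size);
    if (!(await readerShowsText())) {
      failures.push(size);
    }
  }
  return failures; // an empty array means the reader survived every resize
}
```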

I also needed plain Playwright in a scenario testing whether a user who navigates elsewhere after reading the book, and later comes back to it, lands on the same saved position. Because the AI only sees a single screenshot at assertion time, I had to capture the saved position with Playwright and pass it to the assertion as context:

test("Chaos - both entry points to the reader land on the same saved position", async ({ page }) => {
  test.setTimeout(180_000);

  // Step 1: advance via the home page button and capture the saved localStorage position
  await runSteps({
    page,
    userFlow: "Advance via home page button",
    steps: [
      { description: `Navigate to ${BASE_URL}` },
      { description: "Click the read the book in browser button on the home page" },
      { description: "Click the next page arrow 4 times", waitUntil: "New page content is visible" },
    ],
    assertions: [],
    test,
    expect,
  });

  // Capture the epub location saved in localStorage
  const savedPosition = await page.evaluate(() => localStorage.getItem('epub-location'));

  // Step 2: reopen via the navigation menu and assert the position matches
  await page.goto(BASE_URL);
  await runSteps({
    page,
    userFlow: "Reopen via navigation menu",
    steps: [
      { description: "Click the e-book link in the main navigation menu" },
    ],
    assertions: [
      { assertion: `The epub reader has opened and the reading position in localStorage matches the previously saved position: ${savedPosition}` },
    ],
    test,
    expect,
  });
});

Key Takeaways

Passmark definitely took care of the heavy lifting of browser testing. Most test scenarios were surprisingly straightforward, as long as you paid close attention to the wording of your assertions and gave the AI all the context it needed. Another really helpful feature was the error explanation you get when a test fails. Because it comes from the LLM in natural language, it explains the issue directly, without any need to dig through logs to figure out what went wrong.

Of course, I also ran into some 'general AI errors', such as OpenRouter rate-limiting errors when too many parallel API calls hit the provider simultaneously. I also hit the Google AI Studio "Corrupted thought signature" 400 error a few times, where the internal reasoning chain gets corrupted during long multi-turn conversations (many tool calls back and forth). These errors make tests fail intermittently, so you need a way to manage them.
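
One way to manage these errors is to retry only failures that look transient. The sketch below is a generic exponential-backoff wrapper; the `withRetry` and `isTransientAiError` names and the message patterns it matches (a 429 status or "rate limit", and "corrupted thought signature") are assumptions based on the errors described above, not a Passmark or OpenRouter API. Playwright's built-in `retries` and `workers` config options are coarser alternatives: the former reruns a whole failed test, the latter limits how many tests call the provider in parallel.

```typescript
// Assumed classification of transient AI-provider failures: HTTP 429 / rate limiting,
// and the "Corrupted thought signature" error. Everything else is treated as a real failure.
function isTransientAiError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /\b429\b|rate[- ]?limit/i.test(message) || /corrupted thought signature/i.test(message);
}

// Hypothetical helper: retries `action` on transient errors with exponential backoff.
async function withRetry<T>(
  action: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await action();
    } catch (err) {
      // Give up on genuine failures immediately, or once attempts are exhausted.
      if (attempt >= maxAttempts || !isTransientAiError(err)) throw err;
      // Backoff doubles each time: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Wrapping a Passmark call, for example `await withRetry(() => runSteps({ ... }))`, would re-run its steps from the page's current state, so this fits best when the flow begins with a navigation step such as `Navigate to ${BASE_URL}`.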

It is also important to understand where Passmark's boundary lies: it handles finding and clicking UI elements described in plain English well, but as soon as you need to verify side effects such as file downloads, or issue browser-level commands such as resizing the window, you need raw Playwright. That is the situation at the moment, at least; I assume it is only a matter of time until the remaining test scenarios can be handled with AI as well.

--

The full test suite is available in this GitHub repository.

This article was written as part of the Hashnode Breaking Apps Hackathon, sponsored by Bug0.
#BreakingAppsHackathon