Day 6: The core engine - getting Playwright running

Day 6 of 30. Today we capture our first screenshot. Finally, actual product work.

#screenshots #performance #kotlin

We’ve spent the last five days on setup, planning, and infrastructure. That work matters, but it can feel like time we could have spent building the product, and planning alone doesn’t ship anything. So, let’s get at it!

Today we are writing the code that actually does the thing we’re building: take a URL, render it in a browser and capture a screenshot!

Why Playwright over Puppeteer?

There are several screenshotting solutions. Let’s compare them:

Browser automation comparison

[✓] Playwright (chosen)

[+] Pros

  • Better auto-waiting: automatically waits for elements to be ready
  • Multi-browser support (Chromium, Firefox, WebKit) out of the box
  • Cleaner device emulation with built-in device profiles
  • Active development: Microsoft backing, evolving fast
  • Official Java bindings that work directly from Kotlin

[-] Cons

  • Newer ecosystem, fewer tutorials
  • Larger dependency size

Verdict

Playwright feels like the future. Built-in device profiles, such as playwright.devices["iPhone 13"] in the Node.js API, are a game-changer.

[ ] Puppeteer

[+] Pros

  • Mature ecosystem
  • Lots of tutorials and examples
  • Google backing

[-] Cons

  • Chrome-focused (Firefox support experimental)
  • Requires more manual wait logic
  • More configuration for device emulation

[ ] Selenium + WebDriver

[+] Pros

  • Industry standard
  • Supports all browsers
  • Extensive tooling

[-] Cons

  • Heavier weight
  • Slower for our use case
  • More setup complexity

Our first screenshot

After getting everything wired up, we hit our local endpoint:

curl -X POST http://localhost:8080/api/v1/screenshots \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Fourteen seconds later: a screenshot. It worked! Note that there’s no authentication at all yet. We’ll get back to that.

Unfortunately, 14 seconds is a little bit too slow, so let’s dive into why that is.

What’s making it slow?

We added some timing logs:

Browser context creation: 245ms
Page creation: 12ms
Navigation: 11,847ms
Screenshot capture: 1,203ms
Context cleanup: 89ms

It’s pretty clear: navigation is the killer. WaitUntilState.NETWORKIDLE waits until there are no network requests for 500ms. On a site with analytics, chat widgets, and lazy-loaded images, that takes a very long time indeed.
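Timing logs like the ones above can come from a small wrapper around each phase. The following is a minimal sketch, not our actual logging code: the StepTimer class and its method names are hypothetical, and it uses only the Kotlin/JVM standard library.

```kotlin
// Hypothetical helper (not from the project): times named steps and reports them.
class StepTimer {
    // LinkedHashMap keeps the steps in the order they ran
    private val timings = LinkedHashMap<String, Long>()

    // Runs the block, records its duration in milliseconds, and returns its result.
    // The duration is recorded even if the block throws.
    fun <T> timed(label: String, block: () -> T): T {
        val start = System.nanoTime()
        try {
            return block()
        } finally {
            timings[label] = (System.nanoTime() - start) / 1_000_000
        }
    }

    // One "label: Nms" line per recorded step
    fun report(): String =
        timings.entries.joinToString("\n") { (label, ms) -> "$label: ${ms}ms" }

    // Label of the step that took the longest, or null if nothing was timed
    fun slowest(): String? = timings.maxByOrNull { it.value }?.key
}
```

Wrapping each phase, for example timer.timed("Navigation") { page.navigate(url) }, and then logging timer.report() gives output in the same shape as the log above.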

Making it faster

1. Smarter wait strategies

Instead of waiting for network idle, we can wait for specific conditions:

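import com.microsoft.playwright.Page
import com.microsoft.playwright.options.WaitUntilState

// Wait for the DOM to be ready, not for every resource to finish loading
page.navigate(request.url, Page.NavigateOptions()
    .setTimeout(30000.0) // give up after 30 seconds
    .setWaitUntil(WaitUntilState.DOMCONTENTLOADED)
)

// Then give client-side JavaScript a fixed 1 second to render
page.waitForTimeout(1000.0)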

A simple change like this brought us from 14 seconds to 3-4 seconds!

2. Block unnecessary resources

We don’t need analytics, ads, or tracking pixels for screenshots:

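// Domains we never need for a screenshot (analytics, ads, tracking)
val blockedPatterns = listOf(
    "google-analytics.com",
    "googletagmanager.com",
    "facebook.net",
    "hotjar.com"
)

// Intercept every request: abort the blocked ones, let the rest through
page.route("**/*") { route ->
    val url = route.request().url()
    if (blockedPatterns.any { url.contains(it) }) {
        route.abort()
    } else {
        route.resume()
    }
}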

This shaved off another 500ms on average.
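One thing to watch with substring matching: a URL like https://example.com/?ref=hotjar.com would also be blocked, because the pattern appears in the query string. A possible refinement, sketched here as an assumption rather than what we currently run, is to match on the request host instead. The isBlocked function and blockedDomains name are hypothetical.

```kotlin
import java.net.URI

// Same domains as the route handler above, kept in a set for the host check
val blockedDomains = setOf(
    "google-analytics.com",
    "googletagmanager.com",
    "facebook.net",
    "hotjar.com"
)

// Hypothetical helper: true when the URL's host is a blocked domain or one of its subdomains.
// URLs that cannot be parsed, or that have no host, are never blocked.
fun isBlocked(url: String, blocked: Set<String> = blockedDomains): Boolean {
    val host = runCatching { URI(url).host }.getOrNull()?.lowercase() ?: return false
    return blocked.any { host == it || host.endsWith(".$it") }
}
```

Inside the route handler this would replace the any { url.contains(it) } check with isBlocked(url).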

3. Browser reuse

Creating a new browser for each request is wasteful. We now reuse the browser instance and only create new contexts per request. Browser startup is ~500ms, context creation is ~50ms.
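As a rough illustration of the reuse pattern, here is a minimal sketch using the Playwright Java bindings. The ScreenshotEngine class name and its shape are assumptions for this example, not our actual service code, and it leaves out the wait strategy, resource blocking, and error handling discussed elsewhere in this post.

```kotlin
import com.microsoft.playwright.Browser
import com.microsoft.playwright.BrowserType
import com.microsoft.playwright.Playwright

// Hypothetical wrapper: one long-lived browser, one short-lived context per request.
class ScreenshotEngine : AutoCloseable {
    // Paid once at startup instead of on every request
    private val playwright: Playwright = Playwright.create()
    private val browser: Browser = playwright.chromium()
        .launch(BrowserType.LaunchOptions().setHeadless(true))

    // Each capture gets a fresh, isolated context (cookies, storage, cache)
    fun capture(url: String): ByteArray {
        browser.newContext().use { context ->
            val page = context.newPage()
            page.navigate(url)
            return page.screenshot() // image bytes (PNG by default)
        } // the context is closed here, even if navigation throws
    }

    override fun close() {
        browser.close()
        playwright.close()
    }
}
```

One caveat to design around: the Playwright Java documentation states that its objects are not thread safe and should be used from the thread that created the Playwright instance, so a shared browser in a multi-threaded server needs either a dedicated worker thread or one Playwright instance per thread.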

Results after optimization

Screenshot capture optimization

Stage               Before   After   Improvement
Navigation          11.8s    2.1s    82% faster
Screenshot capture  1.2s     400ms   67% faster
Context creation    245ms    50ms    80% faster
Total time          14.0s    2.5s    82% faster

That’s close to a 6x improvement (14.0s down to 2.5s)! Simple sites are now under 2 seconds, complex sites are 3-5 seconds.
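For anyone who wants to check the percentages, they are simply the share of time saved relative to the original duration. A small sketch of that arithmetic follows; the percentFaster function name is made up for this example.

```kotlin
import kotlin.math.roundToInt

// Share of time saved going from `before` to `after`, as a whole percentage.
// Both values must be in the same unit (seconds or milliseconds).
fun percentFaster(before: Double, after: Double): Int =
    ((before - after) / before * 100).roundToInt()

// percentFaster(11.8, 2.1)   == 82   (navigation, seconds)
// percentFaster(1.2, 0.4)    == 67   (screenshot capture, seconds)
// percentFaster(245.0, 50.0) == 80   (context creation, milliseconds)
// percentFaster(14.0, 2.5)   == 82   (total, seconds)
```

The overall speedup factor is the ratio of the totals: 14.0 / 2.5 = 5.6, which is where the rounded "6x" comes from.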

Edge cases we discovered

Sites that block headless browsers. Some sites detect and block automated browsers. They see no mouse movements, no scroll events, and they see headless Chrome signatures.

Infinite scroll pages. A “full page” screenshot on an infinite scroll page is undefined behavior. We’ll need to cap the maximum page height.

Cookie consent popups. A lot of sites show GDPR popups which end up in our screenshots.

Sites that require login. Our API won’t help here (yet).

Really slow sites. Some sites take 30+ seconds to load. Our timeout handles this, but we need good error messages.

Didn’t we mention that we thought making a screenshot would be easy? Seems like we have a few challenges to tackle after all.

What we accomplished today

  • Got Playwright working in Kotlin/Spring Boot
  • Captured our first programmatic screenshot
  • Reduced capture time from 14s to ~2.5s
  • Identified major edge cases to handle
  • Learned a lot about how the web actually behaves

This is the core of our product, and it’s important that this works well. Most other things will depend on our engine, and no doubt we’ll optimize further in the future. But it’s a start!

Tomorrow: week 1 retrospective

Day 7 marks the end of week one. We’ll step back and assess: what did we get done, what didn’t work, and are we on track for a paying customer by day 15?

Book of the day

Web Scraping with Python by Ryan Mitchell

Okay, we’re not using Python, and this book is about scraping, not screenshots. But sometimes it’s good to look beyond your own stack, and the two problems share a lot of the same principles.

Mitchell covers how websites work under the hood, how to handle JavaScript-rendered content, dealing with anti-bot measures, and the ethics of automated web access. Chapter 10 on JavaScript execution is particularly relevant, since it explains why “waiting for the page to load” is so complicated in the modern web.


Day 6 stats

Hours: 14h
Code: 450
Revenue: $0
Customers: 0
Hosting: $5.5/mo

Achievements:
[✓] First screenshot captured
[✓] Capture time ~2-3s
[✓] 5 edge cases identified
Erik

Building Allscreenshots. Writes code, takes screenshots, goes diving.
