Day 6: The core engine - getting Playwright running
Day 6 of 30. Today we capture our first screenshot. Finally, actual product work.
We’ve spent the last five days on setup, planning, and infrastructure. That work is absolutely important, but it can feel like time we could have spent building the product. And planning alone doesn’t ship anything… So, let’s get at it!
Today we are writing the code that actually does the thing we’re building: take a URL, render it in a browser and capture a screenshot!
Why Playwright over Puppeteer?
There are several screenshotting solutions. Let’s compare them:
[✓] Playwright (chosen)
[+] Pros
- + Better auto-waiting - automatically waits for elements to be ready
- + Multi-browser support (Chromium, Firefox, WebKit) out of the box
- + Cleaner device emulation with built-in device profiles
- + Active development - Microsoft backing, evolving fast
- + Official Java bindings, usable directly from Kotlin
[-] Cons
- - Newer ecosystem, fewer tutorials
- - Larger dependency size
Playwright feels like the future. Built-in device profiles, such as playwright.devices["iPhone 13"] in the Node.js API, are a game-changer.
[ ] Puppeteer
[+] Pros
- + Mature ecosystem
- + Lots of tutorials and examples
- + Google backing
[-] Cons
- - Chrome-focused (Firefox support experimental)
- - Requires more manual wait logic
- - More configuration for device emulation
[ ] Selenium + WebDriver
[+] Pros
- + Industry standard
- + Supports all browsers
- + Extensive tooling
[-] Cons
- - Heavier weight
- - Slower for our use case
- - More setup complexity
Our first screenshot
After getting everything wired up, we hit our local endpoint:
curl -X POST http://localhost:8080/api/v1/screenshots \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Fourteen seconds later: a screenshot. It worked! Note that at this moment, there’s no authentication at all yet. We’ll get back to that.
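For anyone who would rather call the endpoint from Kotlin than from curl, here is a minimal sketch using the JDK's built-in HTTP client (Java 11+). The function names are our own invention for illustration, and since we haven't described the response format yet, the sketch only reports the status code. The hand-rolled JSON escaping covers quotes and backslashes only; a real client would use a JSON library.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical helper: builds the same JSON body as the curl example.
// Escapes backslashes and double quotes so the URL stays a valid JSON string.
fun screenshotRequestBody(targetUrl: String): String {
    val escaped = targetUrl.replace("\\", "\\\\").replace("\"", "\\\"")
    return """{"url": "$escaped"}"""
}

// Hypothetical helper: builds the POST request without sending it.
fun buildScreenshotRequest(
    targetUrl: String,
    endpoint: String = "http://localhost:8080/api/v1/screenshots"
): HttpRequest =
    HttpRequest.newBuilder(URI.create(endpoint))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(screenshotRequestBody(targetUrl)))
        .build()

// Sends the request and returns the HTTP status code.
// Assumes the local server from this post is running.
fun sendScreenshotRequest(targetUrl: String): Int {
    val response = HttpClient.newHttpClient()
        .send(buildScreenshotRequest(targetUrl), HttpResponse.BodyHandlers.ofByteArray())
    return response.statusCode()
}
```

Calling sendScreenshotRequest("https://example.com") should behave like the curl command above, as long as the server is up.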
Unfortunately, 14 seconds is far too slow, so let’s dig into why.
What’s making it slow?
We added some timing logs:
Browser context creation: 245ms
Page creation: 12ms
Navigation: 11,847ms
Screenshot capture: 1,203ms
Context cleanup: 89ms
It’s pretty clear: navigation is the killer. WaitUntilState.NETWORKIDLE waits until there have been no network connections for at least 500ms. On a site with analytics, chat widgets, and lazy-loaded images, that can take a very long time indeed.
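Timing logs like these don't need a profiler. A small wrapper around each step is enough. The helper below is a sketch, not our exact code; the name timed and the log format are assumptions made for illustration.

```kotlin
// Hypothetical helper: runs a block, reports how long it took, and returns the block's result.
// The elapsed time is logged even if the block throws.
fun <T> timed(label: String, log: (String) -> Unit = { println(it) }, block: () -> T): T {
    val start = System.nanoTime()
    try {
        return block()
    } finally {
        val elapsedMs = (System.nanoTime() - start) / 1_000_000
        log("$label: ${elapsedMs}ms")
    }
}
```

Each step is then wrapped in place, for example timed("Navigation") { page.navigate(url) }, which keeps the measurement right next to the code it measures.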
Making it faster
1. Smarter wait strategies
Instead of waiting for network idle, we can wait for specific conditions:
// Wait for the DOM to be ready, not for every resource
page.navigate(request.url, Page.NavigateOptions()
    .setTimeout(30000.0)
    .setWaitUntil(WaitUntilState.DOMCONTENTLOADED)
)
// Then give client-side JavaScript a moment to render
page.waitForTimeout(1000.0)
A simple change like this brought us from 14 seconds to 3-4 seconds!
2. Block unnecessary resources
We don’t need analytics, ads, or tracking pixels for screenshots:
page.route("**/*") { route ->
    val url = route.request().url()
    val blockedPatterns = listOf(
        "google-analytics.com",
        "googletagmanager.com",
        "facebook.net",
        "hotjar.com"
    )
    if (blockedPatterns.any { url.contains(it) }) {
        // Drop requests to known trackers
        route.abort()
    } else {
        // Let everything else through
        route.resume()
    }
}
This shaved off another 500ms on average.
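One caveat with substring matching on the full URL: it also blocks a harmless request that merely mentions a tracker in its path or query string. A stricter variant compares only the host name. This is a sketch of that idea; shouldBlock and blockedHosts are names made up for this example.

```kotlin
import java.net.URI

// Same domains as in the route handler above.
val blockedHosts = listOf(
    "google-analytics.com",
    "googletagmanager.com",
    "facebook.net",
    "hotjar.com"
)

// Hypothetical helper: returns true only when the request's host is a blocked
// domain or one of its subdomains. URLs that cannot be parsed, or that have
// no host (such as data: URLs), are never blocked.
fun shouldBlock(url: String, blocked: List<String> = blockedHosts): Boolean {
    val host = runCatching { URI(url).host }.getOrNull() ?: return false
    val normalized = host.lowercase()
    return blocked.any { normalized == it || normalized.endsWith(".$it") }
}
```

Inside the route handler, the check then becomes shouldBlock(route.request().url()), with route.abort() and route.resume() used exactly as before.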
3. Browser reuse
Creating a new browser for each request is wasteful. We now reuse the browser instance and only create new contexts per request. Browser startup is ~500ms, context creation is ~50ms.
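Roughly, the reuse pattern looks like the sketch below: one Playwright instance and one browser for the lifetime of the service, and a fresh context per capture. ScreenshotEngine is a hypothetical class name, and the wait settings are simply copied from the navigation snippet earlier in this post. One assumption is worth calling out loudly: Playwright for Java is not thread-safe, so this sketch serializes captures with @Synchronized. That is safe but not fast, and a real service would need a better concurrency story.

```kotlin
import com.microsoft.playwright.Browser
import com.microsoft.playwright.Page
import com.microsoft.playwright.Playwright
import com.microsoft.playwright.options.WaitUntilState

// Hypothetical class: starts the browser once, creates a new context per request.
class ScreenshotEngine : AutoCloseable {
    private val playwright: Playwright = Playwright.create()
    private val browser: Browser = playwright.chromium().launch()

    // Captures a PNG of the given URL. The context (cookies, cache, pages)
    // is closed after every capture, so requests stay isolated from each other.
    @Synchronized
    fun capture(url: String): ByteArray =
        browser.newContext().use { context ->
            val page = context.newPage()
            page.navigate(
                url,
                Page.NavigateOptions()
                    .setTimeout(30000.0)
                    .setWaitUntil(WaitUntilState.DOMCONTENTLOADED)
            )
            page.screenshot()
        }

    // Shuts down the browser and the Playwright driver process.
    @Synchronized
    override fun close() {
        browser.close()
        playwright.close()
    }
}
```

The engine would be created once at startup and closed on shutdown, so the browser startup cost is paid once rather than on every request.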
Results after optimization
[Chart: Screenshot capture optimization]
That’s a 6x improvement! Simple sites are now under 2 seconds, complex sites are 3-5 seconds.
Edge cases we discovered
Sites that block headless browsers. Some sites detect and block automated browsers. They see no mouse movements, no scroll events, and they see headless Chrome signatures.
Infinite scroll pages. A “full page” screenshot on an infinite scroll page is undefined behavior. We’ll need to cap the maximum page height.
Cookie consent popups. A lot of sites show GDPR popups which end up in our screenshots.
Sites that require login. Our API won’t help here (yet).
Really slow sites. Some sites take 30+ seconds to load. Our timeout handles this, but we need good error messages.
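For the infinite scroll case, the cap itself is simple arithmetic. Here is a sketch under some assumptions: the 16,000px limit is a placeholder and not a number we have settled on, and ClipRegion and cappedClip are names invented for this example. The page height would be measured in the browser first (for instance by evaluating document.documentElement.scrollHeight), and the resulting region could then be passed to the screenshot call as its clip rectangle.

```kotlin
// Hypothetical value type for the rectangle we actually capture.
data class ClipRegion(val x: Double, val y: Double, val width: Double, val height: Double)

// Placeholder limit, chosen for illustration only.
const val MAX_CAPTURE_HEIGHT_PX = 16_000.0

// Returns a region anchored at the top-left corner of the page that is never
// taller than maxHeight, no matter how far the page scrolls.
fun cappedClip(
    pageWidth: Double,
    pageHeight: Double,
    maxHeight: Double = MAX_CAPTURE_HEIGHT_PX
): ClipRegion {
    require(pageWidth > 0 && pageHeight > 0) { "page dimensions must be positive" }
    require(maxHeight > 0) { "maxHeight must be positive" }
    return ClipRegion(0.0, 0.0, pageWidth, minOf(pageHeight, maxHeight))
}
```

A normal page comes back unchanged, while a 50,000px feed is cut off at the limit instead of producing an enormous image.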
Didn’t we mention that we thought making a screenshot would be easy? Seems like we have a few challenges to tackle after all.
What we accomplished today
- Got Playwright working in Kotlin/Spring Boot
- Captured our first programmatic screenshot
- Reduced capture time from 14s to ~2s
- Identified major edge cases to handle
- Learned a lot about how the web actually behaves
This is the core of our product, and it’s important that this works well. Most other things will depend on our engine, and no doubt we’ll optimize further in the future. But it’s a start!
Tomorrow: week 1 retrospective
Day 7 marks the end of week one. We’ll step back and assess: what did we get done, what didn’t work, and are we on track for a paying customer by day 15?
Book of the day
Web Scraping with Python by Ryan Mitchell
Okay, we’re not using Python, and this book is about scraping, not screenshots. But sometimes it’s good to look beyond your own stack, and the principles behind the two overlap significantly.
Mitchell covers how websites work under the hood, how to handle JavaScript-rendered content, dealing with anti-bot measures, and the ethics of automated web access. Chapter 10 on JavaScript execution is particularly relevant, since it explains why “waiting for the page to load” is so complicated in the modern web.