Day 25: Performance optimization - screenshots in under 2 seconds
Day 25 of 30. Today we make things faster.
Our customer is happy, but they mentioned: “Some screenshots take 5+ seconds. Any way to speed that up?”
And while we said we wouldn’t build new features and would focus on traction instead, improving an existing feature sits in a bit of a gray zone, right? Let’s see if we can optimize things a bit.
Where does time go?
We instrumented everything:
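import com.microsoft.playwright.BrowserContext
import com.microsoft.playwright.Page
import com.microsoft.playwright.options.LoadState
import kotlin.system.measureTimeMillis

fun captureWithTiming(request: ScreenshotRequest): TimedResult {
    val timings = mutableMapOf<String, Long>()
    // Declared up front so each timed block can assign its result
    lateinit var context: BrowserContext
    lateinit var page: Page
    lateinit var bytes: ByteArray
    lateinit var url: String
    timings["context_create"] = measureTimeMillis {
        context = browser.newContext(options)
    }
    timings["page_create"] = measureTimeMillis {
        page = context.newPage()
    }
    timings["navigation"] = measureTimeMillis {
        page.navigate(request.url)
    }
    timings["wait"] = measureTimeMillis {
        page.waitForLoadState(LoadState.NETWORKIDLE)
    }
    timings["screenshot"] = measureTimeMillis {
        bytes = page.screenshot()
    }
    timings["upload"] = measureTimeMillis {
        url = storage.upload(bytes)
    }
    return TimedResult(bytes, timings)
}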
Results for a typical site (example.com):
| Phase | Time |
|---|---|
| Context create | 180ms |
| Page create | 45ms |
| Navigation | 1,200ms |
| Wait for network idle | 2,100ms |
| Screenshot capture | 340ms |
| Upload to R2 | 280ms |
| Total | 4,145ms |
As the table shows, the wait phase is the biggest cost at roughly half of the total. Network idle only fires after 500ms with no network activity. On sites with analytics, chat widgets, and lazy loading, that quiet period can take a long time to arrive.
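To make the breakdown concrete, here is a small sketch that turns the measured timings into a share of the total per phase. The `phaseShares` helper is hypothetical and not part of our codebase; the numbers are copied from the table above.

```kotlin
// Phase timings in milliseconds, copied from the table above.
val phaseTimings = linkedMapOf(
    "context_create" to 180L,
    "page_create" to 45L,
    "navigation" to 1_200L,
    "wait" to 2_100L,
    "screenshot" to 340L,
    "upload" to 280L,
)

// Hypothetical helper: each phase's share of the total as a percentage,
// sorted so the most expensive phase comes first.
fun phaseShares(timings: Map<String, Long>): List<Pair<String, Double>> {
    val total = timings.values.sum().toDouble()
    return timings
        .map { (phase, ms) -> phase to ms * 100.0 / total }
        .sortedByDescending { it.second }
}
```

Calling `phaseShares(phaseTimings)` puts `wait` first, at just over half of the 4,145ms total, with navigation in second place.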
Optimization 1: Smarter wait strategies
Instead of always waiting for network idle, we will offer more wait options:
enum class WaitStrategy {
    LOAD,             // load event: page and its resources have loaded (~fast)
    DOMCONTENTLOADED, // DOM parsed, fires before load (~faster)
    NETWORKIDLE,      // No network activity for 500ms (~slow)
    NONE              // Don't wait at all (~fastest)
}
For most sites, LOAD + small delay is enough:
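when (request.waitStrategy) {
    WaitStrategy.LOAD -> {
        page.waitForLoadState(LoadState.LOAD)
        page.waitForTimeout(500.0) // Small buffer for JS, will be configurable
    }
    WaitStrategy.DOMCONTENTLOADED -> {
        page.waitForLoadState(LoadState.DOMCONTENTLOADED)
    }
    WaitStrategy.NETWORKIDLE -> {
        page.waitForLoadState(LoadState.NETWORKIDLE)
    }
    WaitStrategy.NONE -> {
        // Nothing extra: page.navigate() has already returned at this point
        // (by default it waits for the load event)
    }
}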
Same site with LOAD strategy:
| Phase | Time |
|---|---|
| Wait | 650ms (was 2,100ms) |
| Total | 2,695ms (was 4,145ms) |
35% faster by changing one parameter.
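For reference, the percentage is the reduction in total time relative to the original total. A minimal sketch of that arithmetic, using a hypothetical `percentFaster` helper:

```kotlin
// Hypothetical helper: percentage reduction in total time when going
// from `beforeMs` to `afterMs`.
fun percentFaster(beforeMs: Long, afterMs: Long): Double {
    require(beforeMs > 0) { "beforeMs must be positive" }
    return (beforeMs - afterMs) * 100.0 / beforeMs
}
```

With the totals above, `percentFaster(4_145, 2_695)` comes out just under 35.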
Optimization 2: Block unnecessary resources
Analytics, ads, and tracking pixels usually don’t affect how a page looks, or at least they shouldn’t. So we block some of them for now, and later we’ll offer a way to be more selective about what gets blocked:
val blockedDomains = listOf(
    "google-analytics.com",
    "googletagmanager.com",
    "facebook.net",
    "doubleclick.net",
    "hotjar.com",
    "intercom.io",
    "segment.com",
    "mixpanel.com",
    "amplitude.com"
)
page.route("**/*") { route ->
    val url = route.request().url()
    if (blockedDomains.any { url.contains(it) }) {
        route.abort()
    } else {
        route.resume()
    }
}
This saves 200-500ms on sites heavy with trackers, which frankly seems to be most sites these days.
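One caveat with the route handler above: `url.contains(it)` matches the domain anywhere in the URL, so a harmless request such as `https://example.com/?ref=hotjar.com` would be blocked too. A possible refinement is to compare against the host only. The `isBlockedHost` helper below is a hypothetical sketch, not what we currently run:

```kotlin
import java.net.URI

// Hypothetical helper: true when the request's host is one of the blocked
// domains or a subdomain of one. URLs that cannot be parsed, or that have
// no host (for example data: URLs), are never blocked.
fun isBlockedHost(url: String, blockedDomains: List<String>): Boolean {
    val host = runCatching { URI(url).host }.getOrNull()?.lowercase() ?: return false
    return blockedDomains.any { host == it || host.endsWith(".$it") }
}
```

Inside the handler, the condition would then read `isBlockedHost(route.request().url(), blockedDomains)`. Matching on a dot-prefixed suffix also avoids blocking unrelated hosts that merely end in the same letters, such as `nothotjar.com`.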
Optimization 3: Browser reuse
We were creating a new browser context for each request. Context creation takes 150-200ms, which is avoidable.
So, instead of creating a new browser context each time, we maintain a pool of warm contexts, which helps our sync API in particular:
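@Service
class BrowserPool(
    private val poolSize: Int = 3
) {
    // Playwright objects are not thread-safe, so use them from the thread
    // that created them or add your own synchronization
    private val browser = Playwright.create().chromium().launch()
    // Assumption: plain defaults; the real options are not shown in this post
    private val defaultOptions = Browser.NewContextOptions()
    private val contexts = ConcurrentLinkedQueue<BrowserContext>()

    init {
        repeat(poolSize) {
            contexts.offer(createWarmContext())
        }
    }

    fun acquire(): BrowserContext {
        // Fall back to a fresh context when the pool is empty
        return contexts.poll() ?: createWarmContext()
    }

    fun release(context: BrowserContext) {
        // Close leftover pages and clear cookies before reuse.
        // Note: this does not clear localStorage or other per-origin storage.
        context.pages().toList().forEach { it.close() }
        context.clearCookies()
        contexts.offer(context)
    }

    private fun createWarmContext(): BrowserContext {
        return browser.newContext(defaultOptions)
    }
}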
With this change, context acquisition dropped from 180ms to ~5ms. Every bit helps, and in percentage terms it’s a very significant improvement.
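One thing to watch with the pool above: `release` always puts the context back, so after a burst of traffic the pool can hold more than `poolSize` contexts. A minimal sketch of a bounded variant follows, written generically so it runs without a browser. `BoundedPool`, `create`, and `dispose` are hypothetical names; in our case `create` would wrap `browser.newContext(...)` and `dispose` would call `context.close()`.

```kotlin
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical generic pool that never keeps more than `maxIdle` idle items.
class BoundedPool<T : Any>(
    private val maxIdle: Int,
    private val create: () -> T,
    private val dispose: (T) -> Unit,
) {
    init {
        require(maxIdle > 0) { "maxIdle must be positive" }
    }

    private val idle = ArrayBlockingQueue<T>(maxIdle)

    init {
        // Pre-warm the pool up to its capacity.
        repeat(maxIdle) { idle.offer(create()) }
    }

    // Hand out a warm item if one is idle, otherwise create a fresh one.
    fun acquire(): T = idle.poll() ?: create()

    // Keep the item if there is room; otherwise dispose of it so the pool
    // stays at or below maxIdle.
    fun release(item: T) {
        if (!idle.offer(item)) dispose(item)
    }

    fun idleCount(): Int = idle.size
}
```

The trade-off is that a context created during a burst is thrown away afterwards, which costs one extra creation the next time the pool runs dry but keeps memory use predictable.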
Optimization 4: Parallel upload
We were capturing the screenshot first and only then uploading it. The upload could instead run in the background, overlapping with the next request:
// Start upload in background
val uploadFuture = CompletableFuture.supplyAsync {
    storage.upload(screenshotBytes)
}
// Return result (upload continues async if needed)
// Actually, we need the URL... so this doesn't help much for sync API
For the sync API, this doesn’t help. But when we build async mode, we can return the job ID before the upload completes, which gives faster response times.
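Async mode does not exist yet, so the following is only a sketch of how returning a job ID before the upload finishes might look. `UploadJobs`, `submit`, `urlIfReady`, and `await` are hypothetical names, and `upload` stands in for `storage.upload(bytes)`.

```kotlin
import java.util.UUID
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentHashMap

// Hypothetical job registry for a future async mode.
class UploadJobs(private val upload: (ByteArray) -> String) {
    private val jobs = ConcurrentHashMap<String, CompletableFuture<String>>()

    // Starts the upload in the background and returns a job ID right away.
    fun submit(bytes: ByteArray): String {
        val jobId = UUID.randomUUID().toString()
        jobs[jobId] = CompletableFuture.supplyAsync { upload(bytes) }
        return jobId
    }

    // Returns the URL once the upload has finished successfully, or null
    // while it is still running, if it failed, or if the ID is unknown.
    fun urlIfReady(jobId: String): String? {
        val future = jobs[jobId] ?: return null
        return if (future.isDone && !future.isCompletedExceptionally) future.getNow(null) else null
    }

    // Blocks until the upload finishes; mainly useful in tests.
    fun await(jobId: String): String = jobs.getValue(jobId).join()
}
```

A client would call `submit`, get the ID back immediately, and poll `urlIfReady` until it returns a URL. A real version would also need to expire finished jobs and report failures, which this sketch leaves out.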
Optimization 5: Image compression
We noticed that large PNGs are sometimes slow to upload, especially for full-page screenshots. To address this, we added a JPEG option with quality control:
val isJpeg = request.format == "jpeg"
val screenshotOptions = Page.ScreenshotOptions()
    .setType(if (isJpeg) ScreenshotType.JPEG else ScreenshotType.PNG)
// Quality only applies to JPEG, and setQuality takes a non-null Int
if (isJpeg) {
    screenshotOptions.setQuality(request.quality ?: 80)
}
JPEG at 80% quality is typically 60-70% smaller than PNG while still looking good. When pixel-perfect output isn’t the priority, users can now trade some quality for size, and upload time drops roughly in proportion.
Results
After all optimizations, these are our numbers when capturing the same site:
| Phase | Before | After |
|---|---|---|
| Context acquire | 180ms | 5ms |
| Page create | 45ms | 45ms |
| Navigation | 1,200ms | 1,100ms |
| Wait | 2,100ms | 650ms |
| Screenshot | 340ms | 340ms |
| Upload | 280ms | 120ms (JPEG) |
| Total | 4,145ms | 2,260ms |
45% faster. We’re now under 2.5 seconds on average for our list of benchmark sites!
Some more complex sites (SPAs, heavy JS) still take 3-4 seconds to capture, while simple sites can come in under 1.5 seconds. Part of that slowdown could be down to our own network location, or the target site may simply be less responsive. We’ll need to analyze this further.
API changes
We introduced new parameters:
{
  "url": "https://example.com",
  "wait_strategy": "load",
  "format": "jpeg",
  "quality": 80,
  "block_trackers": true
}
Documented with recommendations:
- Use wait_strategy: "load" for most sites
- Use format: "jpeg" when file size matters
- Use block_trackers: true for faster captures
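As an illustration of how these parameters could map onto typed options on the server side, here is a minimal sketch. `CaptureOptions`, `ImageFormat`, and `parseOptions` are hypothetical names, and the defaults (network idle, PNG, quality 80, no tracker blocking) are assumptions chosen to match the behavior before this change, not documented API defaults.

```kotlin
enum class WaitStrategy { LOAD, DOMCONTENTLOADED, NETWORKIDLE, NONE }
enum class ImageFormat { PNG, JPEG }

data class CaptureOptions(
    val waitStrategy: WaitStrategy,
    val format: ImageFormat,
    val quality: Int?, // only set for JPEG
    val blockTrackers: Boolean,
)

// Hypothetical mapping from raw request parameters to typed options.
// Unknown or missing values fall back to the assumed defaults.
fun parseOptions(params: Map<String, Any?>): CaptureOptions {
    val wait = (params["wait_strategy"] as? String)
        ?.let { name -> WaitStrategy.values().firstOrNull { it.name.equals(name, ignoreCase = true) } }
        ?: WaitStrategy.NETWORKIDLE
    val format =
        if ((params["format"] as? String).equals("jpeg", ignoreCase = true)) ImageFormat.JPEG
        else ImageFormat.PNG
    // Quality is clamped to 0..100 and ignored for PNG.
    val quality =
        if (format == ImageFormat.JPEG) ((params["quality"] as? Number)?.toInt() ?: 80).coerceIn(0, 100)
        else null
    return CaptureOptions(wait, format, quality, params["block_trackers"] == true)
}
```

Keeping the old behavior as the default means existing clients see no change until they opt in to the faster settings.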
Our client’s response
We pushed the update and emailed our client:
We optimized screenshot capture: it should be 40-50% faster for most sites. There are a few new parameters available: wait_strategy, format, quality, block_trackers.
Their reply:
Just tested - you’re right, noticeably faster. The JPEG option is great for our thumbnails. Thanks!
We love this kind of communication. We have a happy customer and a faster product, so everybody wins!
Tomorrow: adding more customers
Day 26. We have one customer. Let’s try to find more! Perhaps a small promotion could help?
Book of the day
High Performance Browser Networking by Ilya Grigorik
Ilya Grigorik (formerly at Google, now at Shopify) wrote the definitive guide to web performance. It covers everything from TCP/IP to browser rendering.
The chapter on “Optimizing for Mobile Networks” is particularly relevant - understanding why pages load slowly helps you optimize screenshot capture. Latency, bandwidth, connection reuse - all concepts that apply.
The book is available for free online at hpbn.co. Worth reading for anyone building web tools.