Day 18: Error handling & edge cases

Day 18 of 30. Today we make the product robust.

#edge-cases #reliability

The happy path now works. But the web is a messy place, full of traps. Some sites block bots, some URLs are malformed, servers time out or go offline, JavaScript fails silently, and some pages send us into an endless loop. A production-ready API should handle all of this gracefully.

The edge cases we’ve encountered

From testing and based on the feedback of early users, we’ve hit:

  1. Sites that block headless browsers - Cloudflare challenges, bot detection
  2. Invalid URLs - Missing protocols, typos, or non-existent domains
  3. Slow sites - Pages that take 30+ seconds to load, sometimes longer
  4. Infinite scroll - Full-page captures that never end
  5. JavaScript errors - SPAs that fail silently
  6. Login walls - Sites requiring authentication
  7. Broken SSL - Invalid certificates
  8. Redirects - Multiple hops, redirect loops
  9. Pop-ups - Cookie banners, modals, overlays
  10. Empty pages - Sites that render blank

Let’s handle each systematically.

Interactive error categorizer

[Interactive demo: simulate an error to see how it flows through the categorizer, categorizeError(exception) -> ScreenshotError, and comes out as an API response of this shape:]

{
  "error": {
    "code": "...",
    "message": "...",
    "suggestion": "..."
  }
}

Demo stats: 12 error types handled, 10+ edge cases covered, errors always user-friendly.

Sites that block headless browsers

This is the hardest problem. Some sites actively detect and block automation.

Detection methods:

  • User agent checking
  • WebDriver detection (navigator.webdriver)
  • Missing browser features
  • Behavior analysis (no mouse movement)

Our mitigations:

fun createStealthContext(): BrowserContext {
    return browser.newContext(
        Browser.NewContextOptions()
            .setUserAgent(realUserAgent)  // Not the default Headless Chrome UA
            .setExtraHTTPHeaders(mapOf(
                "Accept-Language" to "en-US,en;q=0.9"
            ))
    ).apply {
        // Override webdriver detection
        addInitScript("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
        """)
    }
}

This works for some sites. For aggressive bot detection (Cloudflare Under Attack mode), we fail fast with a clear error:

{
  "error": {
    "code": "bot_blocked",
    "message": "This site has bot protection that prevents automated screenshots",
    "details": {
      "suggestion": "Try a different URL or contact the site owner"
    }
  }
}

We detect this by checking for common Cloudflare/bot challenge signatures in the page content. We will work on improving our handling of bot detection in the near future.
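To give an idea of what signature-based detection can look like, here is a minimal sketch that checks a page's title and HTML for a few markers commonly seen on Cloudflare challenge pages. The function name looksLikeBotChallenge and both marker lists are assumptions for illustration, not our production list; in practice the inputs would come from page.title() and page.content().

```kotlin
// Illustrative markers only; challenge pages change over time,
// so a real list needs regular maintenance.
val challengeTitleMarkers = listOf("just a moment", "attention required")
val challengeHtmlMarkers = listOf(
    "/cdn-cgi/challenge-platform/",
    "cf-browser-verification",
    "cf-chl-"
)

// Returns true when the title or the HTML contains any known marker.
fun looksLikeBotChallenge(title: String, html: String): Boolean {
    val lowerTitle = title.lowercase()
    val lowerHtml = html.lowercase()
    return challengeTitleMarkers.any { it in lowerTitle } ||
        challengeHtmlMarkers.any { it in lowerHtml }
}
```

A title match alone can produce false positives (a blog post really can be called "Just a moment"), so a stricter variant might require both a title and an HTML marker before reporting bot_blocked.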

Invalid URLs

Validation happens before queuing:

fun validateUrl(url: String): ValidationResult {
    // Check basic format
    val parsed = try {
        URI(url)
    } catch (e: URISyntaxException) {
        return ValidationResult.Invalid("Malformed URL")
    }

    // Must be http or https (schemes are case-insensitive, so normalize first)
    if (parsed.scheme?.lowercase() !in listOf("http", "https")) {
        return ValidationResult.Invalid("URL must use http or https")
    }

    // Must have a host
    if (parsed.host.isNullOrBlank()) {
        return ValidationResult.Invalid("URL must have a host")
    }

    // Block private IPs (security)
    if (isPrivateIp(parsed.host)) {
        return ValidationResult.Invalid("Cannot screenshot private IP addresses")
    }

    // Block localhost
    if (parsed.host in listOf("localhost", "127.0.0.1", "0.0.0.0")) {
        return ValidationResult.Invalid("Cannot screenshot localhost")
    }

    return ValidationResult.Valid
}

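This is not a complete list of checks, but it gives an idea of the kinds of input we have to guard against. Blocking private IP ranges and localhost matters for security: it prevents server-side request forgery (SSRF), where someone uses our browser to reach services on our internal network.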
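The validation code calls isPrivateIp without showing it. As a rough sketch of what such a check can start from, the hypothetical helper below (isPrivateIpv4Literal is our name for this example, not the real implementation) recognizes IPv4 literals in the private, loopback and link-local ranges. It deliberately does no DNS lookups, so hostnames that resolve to private addresses, and IPv6 literals, would need separate handling.

```kotlin
// Hypothetical helper: covers IPv4 literals only.
fun isPrivateIpv4Literal(host: String): Boolean {
    // Anything that is not four numeric octets is not an IPv4 literal
    val octets = host.split(".").map { it.toIntOrNull() ?: return false }
    if (octets.size != 4 || octets.any { it !in 0..255 }) return false

    val (first, second) = octets
    return when {
        first == 10 -> true                       // 10.0.0.0/8
        first == 172 && second in 16..31 -> true  // 172.16.0.0/12
        first == 192 && second == 168 -> true     // 192.168.0.0/16
        first == 127 -> true                      // loopback
        first == 169 && second == 254 -> true     // link-local, incl. cloud metadata endpoints
        first == 0 -> true                        // "this network", e.g. 0.0.0.0
        else -> false
    }
}
```

For example, isPrivateIpv4Literal("192.168.1.10") is true, while isPrivateIpv4Literal("8.8.8.8") and isPrivateIpv4Literal("example.com") are both false.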

Slow sites

Our current default timeout is 30 seconds. If a site doesn’t load by then, we fail:

try {
    page.navigate(url, Page.NavigateOptions()
        .setTimeout(30000.0)
        .setWaitUntil(WaitUntilState.NETWORKIDLE)
    )
} catch (e: TimeoutError) {
    throw ScreenshotException(
        code = "timeout",
        message = "Page did not load within 30 seconds"
    )
}

We expose this error in the API:

{
  "error": {
    "code": "timeout",
    "message": "Page did not load within 30 seconds",
    "details": {
      "suggestion": "Try wait_for: 'domcontentloaded' for faster results"
    }
  }
}

In the future, we plan to make the timeout configurable, with limits that depend on the plan. Why tie it to the plan? A request with a 5-minute timeout can keep a browser busy for 5 minutes, holding resources that other sessions can't use. That's fine on a quiet server, but under load, when we handle many requests at the same time, long timeouts get expensive. That's why we're thinking of giving our higher plans more flexibility here.
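Nothing here is built yet, but a plan-dependent timeout could boil down to clamping the requested value to a per-plan maximum. The sketch below is hypothetical throughout: the plan names, the limits and the function effectiveTimeoutMs are placeholders for illustration.

```kotlin
// Hypothetical plans and limits, purely for illustration
enum class Plan(val maxTimeoutMs: Long) {
    FREE(30_000),
    PRO(120_000)
}

val defaultTimeoutMs = 30_000L
val minTimeoutMs = 1_000L

// Clamp the requested timeout to what the plan allows;
// fall back to the default when the request does not specify one.
fun effectiveTimeoutMs(plan: Plan, requestedMs: Long?): Long {
    val requested = requestedMs ?: defaultTimeoutMs
    return requested.coerceIn(minTimeoutMs, plan.maxTimeoutMs)
}
```

With these placeholder limits, a free-plan request asking for 5 minutes would still be cut off at 30 seconds, which keeps the worst-case time a single request can hold a browser predictable.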

Infinite scroll / huge pages

Full-page screenshots need limits:

val MAX_PAGE_HEIGHT = 15000  // pixels

if (request.fullPage) {
    val pageHeight = page.evaluate("document.body.scrollHeight") as Int

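    if (pageHeight > MAX_PAGE_HEIGHT) {
        // Capture up to limit, flag as truncated.
        // The capture must then use the viewport rather than fullPage,
        // otherwise Playwright still captures the whole page.
        page.setViewportSize(
            page.viewportSize().width,
            MAX_PAGE_HEIGHT
        )
        job.metadata["truncated"] = true
        job.metadata["original_height"] = pageHeight
        job.metadata["captured_height"] = MAX_PAGE_HEIGHT
    }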
}

The response indicates truncation:

{
  "status": "completed",
  "image_url": "...",
  "metadata": {
    "truncated": true,
    "original_height": 45000,
    "captured_height": 15000
  }
}
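The metadata in that response follows directly from the measured page height and the limit. A small sketch of that mapping, where the function name truncationMetadata is made up for this example:

```kotlin
// Builds the truncation metadata for a full-page capture.
// Returns an empty map when the page fits within the limit.
fun truncationMetadata(pageHeight: Int, maxHeight: Int = 15_000): Map<String, Any> =
    if (pageHeight <= maxHeight) {
        emptyMap()
    } else {
        mapOf(
            "truncated" to true,
            "original_height" to pageHeight,
            "captured_height" to maxHeight
        )
    }
```

Keeping this as a pure function makes the limit easy to unit test without starting a browser.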

JavaScript errors

SPAs sometimes fail silently, leaving a blank or error page. We check for this:

// After page load, check for content
val hasContent = page.evaluate("""
    () => {
        const body = document.body;
        if (!body) return false;
        if (body.innerText.trim().length < 10) return false;
        return true;
    }
""") as Boolean

if (!hasContent) {
    // Check if it's an error page.
    // document.body can be missing here, so guard the innerText lookup.
    val isErrorPage = page.evaluate("""
        () => document.title.toLowerCase().includes('error') ||
              (document.body?.innerText ?? '').toLowerCase().includes('something went wrong')
    """) as Boolean

    if (isErrorPage) {
        throw ScreenshotException(
            code = "page_error",
            message = "The page displayed an error"
        )
    }
}

Login walls

We can’t get past login walls at the moment, but we do detect them:

val hasLoginForm = page.evaluate("""
    () => {
        const forms = document.querySelectorAll('form');
        return Array.from(forms).some(form => {
            const inputs = form.querySelectorAll('input[type="password"]');
            return inputs.length > 0;
        });
    }
""") as Boolean

if (hasLoginForm && isLikelyLoginPage(page)) {
    // Don't fail, but warn
    job.metadata["warning"] = "Page appears to require login"
}

In the near future, we will allow users to provide credentials in a secure way, but right now, we focus on public websites.
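A password field alone is a weak signal, since sign-up forms and account settings have one too, which is why the check above is combined with isLikelyLoginPage. That helper isn't shown in this post; the sketch below is one guess at the kind of heuristic it could use, based on the URL path and the page title. The name looksLikeLoginPage and the hint lists are assumptions for illustration.

```kotlin
import java.net.URI

// Hypothetical heuristic; the hint lists are illustrative, not exhaustive
val loginPathHints = listOf("login", "signin", "sign-in")
val loginTitleHints = listOf("log in", "login", "sign in")

fun looksLikeLoginPage(url: String, title: String): Boolean {
    // Only the path is inspected, so a "login" query parameter does not count
    val path = try {
        URI(url).path.orEmpty().lowercase()
    } catch (e: Exception) {
        ""
    }
    val lowerTitle = title.lowercase()
    return loginPathHints.any { it in path } ||
        loginTitleHints.any { it in lowerTitle }
}
```

For example, looksLikeLoginPage("https://example.com/login", "Welcome") is true, while a pricing page with the title "Pricing" is not flagged.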

Broken SSL

Some sites have invalid certificates. We allow this but flag it:

val context = browser.newContext(
    Browser.NewContextOptions()
        .setIgnoreHTTPSErrors(true)  // Don't fail on bad certs
)

// But track it (issuer is a field on SecurityDetails in Playwright for Java)
if (response.securityDetails()?.issuer == null) {
    job.metadata["ssl_warning"] = "Site has SSL issues"
}

Redirect handling

We follow redirects (Playwright does this automatically) but track them:

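page.onResponse { response ->
    // onResponse fires for every request (images, scripts, ...),
    // so only look at main-frame navigations
    val isMainNavigation = response.request().isNavigationRequest() &&
        response.frame() == page.mainFrame()
    if (isMainNavigation && response.status() in 300..399) {
        job.metadata["redirected"] = true
    }
}

// After navigation, page.url() holds the URL we ended up on
val finalUrl = page.url()
if (finalUrl != request.url) {
    job.metadata["final_url"] = finalUrl
}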

Response includes redirect info:

{
  "status": "completed",
  "url": "http://example.com",
  "metadata": {
    "redirected": true,
    "final_url": "https://www.example.com/"
  }
}
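Redirect loops deserve a note. The browser gives up on its own after too many hops (Chromium reports net::ERR_TOO_MANY_REDIRECTS), but if we collect the chain of visited URLs ourselves we can report hop counts and spot loops directly. The sketch below assumes such a list is available; RedirectSummary and summarizeRedirects are hypothetical names for this example.

```kotlin
data class RedirectSummary(
    val hops: Int,
    val finalUrl: String,
    val loopDetected: Boolean
)

// `chain` is the ordered list of URLs visited, starting with the requested one.
fun summarizeRedirects(chain: List<String>): RedirectSummary {
    require(chain.isNotEmpty()) { "chain must contain at least the requested URL" }
    // A URL that appears twice means we went in a circle
    val loopDetected = chain.toSet().size < chain.size
    return RedirectSummary(
        hops = chain.size - 1,
        finalUrl = chain.last(),
        loopDetected = loopDetected
    )
}
```

A chain like http://example.com -> https://example.com -> https://www.example.com/ gives two hops and no loop, whereas a chain that visits /a, /b and then /a again is flagged.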

Pop-ups and cookie banners

The bane of screenshots. We’re adding optional auto-dismiss:

if (request.dismissCookieBanners) {
    // Common cookie consent selectors
    val consentSelectors = listOf(
        "[class*='cookie'] button[class*='accept']",
        "[class*='consent'] button[class*='accept']",
        "#onetrust-accept-btn-handler",
        ".js-cookie-consent-agree",
        "[data-testid='cookie-policy-banner-accept']"
    )

    for (selector in consentSelectors) {
        try {
            page.click(selector, Page.ClickOptions().setTimeout(1000.0))
            page.waitForTimeout(500.0)
            break
        } catch (e: Exception) {
            // Selector not found, try next
        }
    }
}

This is best-effort. Cookie banners are incredibly inconsistent, so this list will miss many of them, and we will work on much better detection.

Error categorization

All errors now go through a categorizer:

fun categorizeError(e: Exception, page: Page?): ScreenshotError {
    return when {
        e is TimeoutError -> ScreenshotError("timeout", "Page load timed out")
        e.message?.contains("net::ERR_NAME_NOT_RESOLVED") == true ->
            ScreenshotError("dns_error", "Domain not found")
        e.message?.contains("net::ERR_CONNECTION_REFUSED") == true ->
            ScreenshotError("connection_refused", "Could not connect to server")
        e.message?.contains("net::ERR_SSL") == true ->
            ScreenshotError("ssl_error", "SSL certificate error")
        isCloudflareChallenge(page) ->
            ScreenshotError("bot_blocked", "Site has bot protection")
        else ->
            ScreenshotError("internal_error", "Screenshot failed: ${e.message}")
    }
}
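The API responses earlier in this post also carry a suggestion. One way to attach it after categorization is a simple lookup by error code, sketched below. The shape of ScreenshotError with an optional suggestion field and the helper withSuggestion are assumptions for this example; the two suggestion texts are the ones shown in the responses above.

```kotlin
// Assumed shape: the categorizer above only sets code and message
data class ScreenshotError(
    val code: String,
    val message: String,
    val suggestion: String? = null
)

// Suggestions keyed by error code; codes without an entry get none
val suggestionsByCode = mapOf(
    "timeout" to "Try wait_for: 'domcontentloaded' for faster results",
    "bot_blocked" to "Try a different URL or contact the site owner"
)

// Keeps an existing suggestion, otherwise fills one in from the lookup table
fun withSuggestion(error: ScreenshotError): ScreenshotError =
    error.copy(suggestion = error.suggestion ?: suggestionsByCode[error.code])
```

Keeping the texts in one table means the categorizer stays focused on classification, and wording changes don't touch the error-handling logic.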

What we built today

  • URL validation with security checks
  • Timeout handling with suggestions
  • Page height limits for full-page captures
  • JavaScript error detection
  • Login wall detection
  • SSL error handling
  • Redirect tracking
  • Cookie banner auto-dismiss (experimental)
  • Comprehensive error categorization

The product now handles most failures gracefully. Our users get clear errors that hopefully give enough insight into the possible causes of a failure.

Tomorrow: soft launch

On day 19 we share more widely. We will post in more communities, reach out to other developers and actually try to get some users. We’ve been building for 18 days now, so it’s time to see who is really interested in this.

Book of the day

Release It! by Michael Nygard

This book is about building software that survives production. Timeouts, circuit breakers, bulkheads - patterns for resilience.

Today’s work was directly inspired by it. The chapter on stability patterns covers exactly what we implemented: failing fast, providing clear errors, not letting one bad request take down the system.

The book uses real-world war stories from production outages. It’s both educational and entertaining. Essential reading for anyone running services.


Day 18 stats

Hours: 53h
Code: 3,800
Revenue: $0
Customers: 0
Hosting: $5.5/mo

Achievements:
[✓] Error categorization complete
[✓] 10+ edge cases handled
[✓] Cookie banner detection

Erik

Building Allscreenshots. Writes code, takes screenshots, goes diving.
