Day 11: Storage strategy - where do screenshots live?
Day 11 of 30. Today we figure out where to put all these images.
Because we’re generating screenshots, they need to be stored somewhere. There are several constraints: costs, latency, security, and cleanup. Let’s work through our options.
The requirements
- Affordable. We’re bootstrapping and storage costs should be minimal until we have revenue.
- Fast delivery. When a user fetches their screenshot, it should load quickly.
- S3-compatible. Standard tooling without too much vendor lock-in.
- Secure. Screenshots might contain sensitive data. They can’t be publicly accessible forever.
- Automatic cleanup. We need a retention policy.
Evaluating options
AWS S3
For most people, this is probably the default choice. It’s reliable, well-documented, and almost every language supports it.
Costs:
- Storage: $0.023/GB/month
- GET requests: $0.0004 per 1000
- Egress: $0.09/GB (watch out here!)
A 1MB screenshot served 100 times costs $0.009 in egress. Multiply by thousands of screenshots and it adds up fast.
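That arithmetic is worth sanity-checking, since it drives the whole decision. A quick back-of-the-envelope calculation (decimal GB, as AWS bills):

```python
# Back-of-the-envelope S3 egress cost for serving screenshots.
S3_EGRESS_PER_GB = 0.09  # USD per GB of standard S3 internet egress

def egress_cost(size_mb: float, downloads: int) -> float:
    """Cost in USD of serving one object `downloads` times."""
    gigabytes = size_mb * downloads / 1000  # decimal GB, as AWS bills
    return gigabytes * S3_EGRESS_PER_GB

# One 1 MB screenshot fetched 100 times: about a cent.
print(round(egress_cost(1, 100), 4))           # 0.009
# The same pattern across 10,000 screenshots: real money.
print(round(egress_cost(1, 100) * 10_000, 2))  # 90.0
```

Less than a cent per screenshot sounds harmless until you multiply it out: at 10,000 screenshots that pattern is $90/month in egress alone.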
Backblaze B2
Much cheaper storage, and free egress through Cloudflare.
Costs:
- Storage: $0.006/GB/month
- Egress: Free with Cloudflare (and others, like Fastly, bunny.net, CacheFly, CoreWeave, Vultr, etc). See the Backblaze FAQ for more info.
The catch: you need to route through Cloudflare for free egress. That’s fine for us since we’re using Cloudflare anyway.
Cloudflare R2
S3-compatible, zero egress fees, tight Cloudflare integration.
Costs:
- Storage: $0.015/GB/month
- GET requests: $0.36 per million
- Egress: $0 (always free)
Free tier: 10GB storage, 10 million reads/month.
Our choice: Cloudflare R2
The deciding factor was experience: we've run Cloudflare in production before, but never Backblaze. R2's storage is roughly 2.5x the price of B2, but that difference only becomes meaningful at millions of screenshots.
Screenshot APIs serve lots of images, so egress costs on S3 would eat our margins, and AWS pricing has a habit of surprising you, so we're staying away from it.
Besides, the Cloudflare R2 free tier is generous enough that we won’t pay anything for months. And when we do pay, it’s a predictable cost.
Setting up R2
In the Cloudflare dashboard:
- Create an R2 bucket: screenshot-api-prod
- Generate API credentials (Access Key ID + Secret)
- Note the endpoint: https://<account-id>.r2.cloudflarestorage.com
In our config:
STORAGE CONFIG:
    endpoint   = R2_ENDPOINT from environment
    access_key = R2_ACCESS_KEY from environment
    secret_key = R2_SECRET_KEY from environment
    bucket     = "screenshot-api-prod"
    public_url = "https://screenshots.ourapi.com"
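In Python, loading that config might look like the sketch below. The variable names mirror the pseudocode; `StorageConfig` and `load_storage_config` are names we made up for illustration:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageConfig:
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str = "screenshot-api-prod"
    public_url: str = "https://screenshots.ourapi.com"

def load_storage_config() -> StorageConfig:
    """Read R2 credentials from the environment; fail fast if any are missing."""
    try:
        return StorageConfig(
            endpoint=os.environ["R2_ENDPOINT"],
            access_key=os.environ["R2_ACCESS_KEY"],
            secret_key=os.environ["R2_SECRET_KEY"],
        )
    except KeyError as missing:
        raise RuntimeError(f"Missing required env var: {missing}") from None
```

Failing fast on missing credentials beats discovering the problem on the first upload.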
The storage service
We’re using the AWS S3 SDK, which works with any S3-compatible service:
SERVICE StorageService:
    INITIALIZE with endpoint, access_key, secret_key, bucket:
        CREATE s3_client with:
            - endpoint override set to our R2 endpoint
            - credentials from access_key and secret_key
            - region set to US_EAST_1 (required but ignored by R2)

    FUNCTION upload(key, data, content_type) -> signed_url:
        CALL s3_client.put_object with:
            - bucket name
            - key (filename)
            - content type
            - the actual bytes
        RETURN generate_signed_url(key)

    FUNCTION generate_signed_url(key, expires_in = 24 hours) -> url:
        CREATE presigner with same credentials as s3_client
        BUILD presign request with:
            - signature duration = expires_in
            - bucket and key
        RETURN presigner.presign_get_object(request).url
The storage flow
Watch how a screenshot moves through our system: the worker captures the page, uploads the bytes to R2, and the API hands back a signed URL such as:
https://storage.allscreenshots.com/scr_abc123.png?X-Amz-Signature=...&X-Amz-Expires=86400
Signed URLs
We don’t make screenshots publicly accessible. Instead, we generate signed URLs that expire after 24 hours.
Benefits:
- Security. Only people with the URL can access the image
- Control. We can revoke access by deleting the object
- Analytics. We can track access if needed
A signed URL looks like:
https://bucket.r2.cloudflarestorage.com/scr_abc123.png
?X-Amz-Signature=...
&X-Amz-Expires=86400
In our API response:
{
  "id": "scr_abc123",
  "status": "completed",
  "image_url": "https://...(signed URL)...",
  "expires_at": "2024-01-16T10:30:00Z"
}
We explicitly tell users when the URL expires. If they need the image later, they can re-fetch it from our API.
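The expires_at field is just the issue time plus the URL lifetime. A small sketch of how we'd compute it (the function name is ours, for illustration):

```python
from datetime import datetime, timedelta, timezone

URL_TTL = timedelta(hours=24)  # matches X-Amz-Expires=86400

def expiry_timestamp(issued_at: datetime) -> str:
    """ISO-8601 expiry string for the API response."""
    return (issued_at + URL_TTL).strftime("%Y-%m-%dT%H:%M:%SZ")

issued = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(expiry_timestamp(issued))  # 2024-01-16T10:30:00Z
```

Always compute this in UTC; mixing server-local timestamps with the URL's signed expiry is a classic source of "why did my link die early" bugs.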
Retention and cleanup
Screenshots aren’t permanent. We keep them for 30 days, then delete.
Why 30 days?
- Long enough for users to retrieve results
- Short enough to keep storage costs manageable
- Matches our billing cycle
Implementation: a scheduled job that deletes old objects:
SCHEDULED JOB cleanup_old_screenshots (runs daily at 3 AM):
    cutoff = NOW minus 30 days
    old_jobs = QUERY jobs WHERE completed_before(cutoff)
    count = 0
    FOR EACH job IN old_jobs:
        TRY:
            CALL s3_client.delete_object(bucket, job.id + "." + job.format)
            DELETE job FROM database
            count = count + 1
            LOG "Cleaned up job {job.id}"
        CATCH error:
            LOG ERROR "Failed to clean up {job.id}: {error}"
    LOG "Cleanup complete. Deleted {count} screenshots."
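The date math in that job is the part most worth getting right. Here's the selection logic in Python; the `Job` dataclass and in-memory list stand in for our real database rows:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

@dataclass
class Job:
    id: str
    format: str
    completed_at: datetime

def jobs_to_clean(jobs: list[Job], now: datetime) -> list[Job]:
    """Jobs whose screenshots have outlived the 30-day retention window."""
    cutoff = now - RETENTION
    return [job for job in jobs if job.completed_at < cutoff]

def object_key(job: Job) -> str:
    # Flat layout: "<job id>.<format>", e.g. "scr_abc123.png"
    return f"{job.id}.{job.format}"
```

Note that we delete the object first, then the database row: if the object delete fails, the row survives and the next nightly run retries it.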
File organization
We’re keeping it simple: a flat structure with the job ID as the filename. This will most likely change in the near future, for example to organise screenshots by customer ID, but for now it works.
screenshot-api-prod/
├── scr_abc123.png
├── scr_def456.png
├── scr_ghi789.jpeg
└── ...
So for now: no folders, no date hierarchies. With signed URLs and automatic cleanup, we don’t need complex organization yet.
Monitoring storage usage
R2 provides metrics in the Cloudflare dashboard. We’re also tracking in our database:
AFTER upload:
    job.file_size = size of uploaded data
    SAVE job to database

FUNCTION get_storage_stats() -> StorageStats:
    RETURN:
        total_screenshots = COUNT jobs WHERE status = "completed"
        total_bytes = SUM of all file_size values
        average_size = AVERAGE of file_size values
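The same aggregation in Python, over an in-memory list of file sizes standing in for the database query:

```python
def storage_stats(file_sizes: list[int]) -> dict:
    """Aggregate what the dashboard shows: count, total bytes, average size."""
    total = sum(file_sizes)
    count = len(file_sizes)
    return {
        "total_screenshots": count,
        "total_bytes": total,
        "average_size": total / count if count else 0,
    }

print(storage_stats([1_000_000, 2_000_000, 3_000_000]))
# {'total_screenshots': 3, 'total_bytes': 6000000, 'average_size': 2000000.0}
```

The empty-list guard matters: on day one, before any screenshots exist, a naive average would divide by zero.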
Cost projection
| Screenshots/month | Storage (30-day) | Requests | Total R2 Cost |
|---|---|---|---|
| 1,000 | ~1 GB | ~2,000 | $0 (free) |
| 10,000 | ~10 GB | ~20,000 | $0 (free) |
| 100,000 | ~100 GB | ~200,000 | ~$1.50 |
| 1,000,000 | ~1 TB | ~2M | ~$15 |
At a million screenshots per month, storage costs us $15. That’s sustainable.
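Those rows come from simple arithmetic. A sketch that reproduces them, using R2's published prices and free tier (the function and its parameters are ours, for illustration):

```python
# Cloudflare R2 pricing assumptions (as of writing).
STORAGE_PER_GB = 0.015      # USD per GB-month
READS_PER_MILLION = 0.36    # USD per million read (Class B) operations
FREE_STORAGE_GB = 10
FREE_READS = 10_000_000

def monthly_r2_cost(screenshots: int, avg_mb: float = 1.0, reads_per_shot: int = 2) -> float:
    """Approximate monthly bill, assuming a 30-day retention window."""
    stored_gb = screenshots * avg_mb / 1000
    reads = screenshots * reads_per_shot
    storage_cost = max(0, stored_gb - FREE_STORAGE_GB) * STORAGE_PER_GB
    read_cost = max(0, reads - FREE_READS) / 1_000_000 * READS_PER_MILLION
    return storage_cost + read_cost

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(monthly_r2_cost(n), 2))  # 1,000,000 -> ~14.85
```

The table rounds slightly high because it ignores the free allowance at the upper rows, but the shape holds: cost scales linearly and stays trivial well past 100k screenshots/month.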
What we built today
- Integrated Cloudflare R2 as storage backend
- Implemented upload with signed URLs
- Set up 24-hour URL expiration
- Created cleanup job for 30-day retention
- Added storage metrics tracking
Screenshots now persist properly and the core infrastructure is complete.
Tomorrow: the landing page
On day 12 we build a public face. Time to explain what our service is and let people sign up.
Book of the day
Cloud Native Patterns by Cornelia Davis
This book covers patterns for building applications that run well in cloud environments. The chapter on external configuration and the chapter on service discovery directly apply to what we built today.
Davis emphasizes treating external services (like S3/R2) as attached resources that can be swapped without code changes. Our storage service does exactly this - switch endpoints, switch providers.
The book is practical and pattern-focused. Each chapter presents a problem, explains why it matters in cloud environments, and provides concrete solutions. Worth reading if you’re building anything that runs in containers.