Can we extend the brand-kit-extractor we shipped last night into a general-purpose screenshot service? The infrastructure already exists. The question is what we save and what it costs to build.
Backend audit found 21 urlbox callsites all funneling through one wrapper
(UrlboxGenerateSignedUrlAction.php) — migration is one file swap plus a DB backfill. The bulk of the 140k/mo
volume is 3× shots per page from GeneratePageScreenshotsAction (desktop+tablet+mobile),
not the highlight-overlay path. CSS injection IS used for the red-dashed-border highlight overlay
(confirmed at 3 callsites). No stealth, no JS injection, no webhooks. The S3 layer is already R2/Spaces-compatible.
Cloudflare Browser Rendering at our volume: $43–$130/mo all-in, plus $0–$285/mo
Webshare proxy if we need stealth fallback (we don't pay for it today). Total: $80–$415/mo vs.
urlbox Ultra's $924/mo at 140k (math: $99 + 125k × $6.60/1k overage).
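
The urlbox figure can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name is hypothetical, and the 15,000 included renders are inferred from the 140k volume minus the 125k overage quoted above.

```typescript
// Urlbox cost model as quoted above: flat base fee plus per-1k overage.
// includedRenders = 15,000 is inferred (140k volume - 125k overage), not taken from a price sheet.
function urlboxMonthlyCost(
  renders: number,
  baseFee = 99,
  includedRenders = 15_000,
  overagePer1k = 6.6,
): number {
  const overageRenders = Math.max(0, renders - includedRenders);
  return baseFee + (overageRenders / 1000) * overagePer1k;
}

console.log(urlboxMonthlyCost(140_000).toFixed(2)); // 924.00
```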
urlbox bills a flat per-render fee that bundles infrastructure, proxies, and managed ops. Self-hosting unbundles them — we pay CF for compute, Webshare for stealth bandwidth, and ourselves once for the build.
Three scenarios based on wall-clock time per screenshot (optimistic 5s, realistic 10s, pessimistic 15s). Wall time is the dominant cost driver because CF charges $0.09 per browser-hour beyond the 10 hours included each month.
| Component | Optimistic (5s) | Realistic (10s) | Pessimistic (15s) | Notes |
|---|---|---|---|---|
| Workers Paid plan (base) | $5 | $5 | $5 | Required to use Browser Rendering |
| Browser-hour overage | $17 | $34 | $52 | $0.09/hr after 10hr/mo included |
| Concurrency overage | $20 | $40 | $60 | $2/extra browser, averaged over daily peaks |
| R2 storage (optional cache) | $1 | $1 | $1 | ~28GB/mo at $0.015/GB · egress free |
| Total Cloudflare | $43 | $80 | $118 | Add 20–30% margin for safety |
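
The table rows can be rebuilt from the stated rates. The sketch below assumes 140,000 shots per month and takes the concurrency overage for each scenario straight from the table as an input; the function and field names are illustrative only.

```typescript
// Scenario inputs: wall-clock seconds per shot and the concurrency overage from the table.
interface CfScenario {
  secondsPerShot: number;
  concurrencyOverage: number;
}

// Browser-hour overage: hours used beyond the included 10, at $0.09/hr.
function browserHourOverage(
  shotsPerMonth: number,
  secondsPerShot: number,
  includedHours = 10,
  ratePerHour = 0.09,
): number {
  const hours = (shotsPerMonth * secondsPerShot) / 3600;
  return Math.max(0, hours - includedHours) * ratePerHour;
}

// Total = Workers Paid base + browser-hour overage + concurrency overage + R2 cache.
function cloudflareTotal(shotsPerMonth: number, s: CfScenario, basePlan = 5, r2 = 1): number {
  return basePlan + browserHourOverage(shotsPerMonth, s.secondsPerShot) + s.concurrencyOverage + r2;
}

const scenarios: Record<string, CfScenario> = {
  optimistic: { secondsPerShot: 5, concurrencyOverage: 20 },
  realistic: { secondsPerShot: 10, concurrencyOverage: 40 },
  pessimistic: { secondsPerShot: 15, concurrencyOverage: 60 },
};

for (const [name, s] of Object.entries(scenarios)) {
  console.log(name, Math.round(cloudflareTotal(140_000, s)));
}
```

Rounded to whole dollars, the three totals come out at 43, 80, and 118, matching the table before any safety margin.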
We don't currently use urlbox's stealth feature, but most replacements will need it sooner or later. Webshare is already wired in via brand-kit-extractor's smart-fallback path. Cost depends entirely on what percentage of traffic needs it. The estimates below assume ~3 MB per page load (HTTP Archive 2025 median is 2.65 MB).
Most shots use CF's default egress (free). Datacenter pool catches sites that flag CF IPs but don't run bot management. Residential is the last-resort fallback for Cloudflare Bot Management / Akamai / DataDome — exactly where we already invoke Webshare in brand-kit-extractor today.
| Scenario | Stealth shots/mo | Bandwidth | Webshare plan | Monthly cost |
|---|---|---|---|---|
| Optimistic | 7,000 (5%) | ~21 GB | 25 GB pack | ~$65 |
| Realistic | 21,000 (15%) | ~63 GB | 100 GB plan | ~$165 |
| Conservative | 42,000 (30%) | ~126 GB | 100 GB + overage | ~$285 |
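
The bandwidth column follows directly from the stealth share and the ~3 MB per page assumption. A small sketch, using decimal gigabytes (1 GB = 1,000 MB) and a hypothetical helper name:

```typescript
// Estimate proxied bandwidth for the share of shots that need the stealth fallback.
function stealthBandwidth(
  totalShots: number,
  stealthShare: number,
  mbPerPage = 3,
): { stealthShots: number; gigabytes: number } {
  // Round to whole shots to avoid floating-point noise from the percentage.
  const stealthShots = Math.round(totalShots * stealthShare);
  return { stealthShots, gigabytes: (stealthShots * mbPerPage) / 1000 };
}

for (const share of [0.05, 0.15, 0.3]) {
  const { stealthShots, gigabytes } = stealthBandwidth(140_000, share);
  console.log(`${share * 100}% -> ${stealthShots} shots, ~${gigabytes} GB`);
}
```

Plan prices are not modeled here because they depend on Webshare's current tiers.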
The brand-kit-extractor worker built last night already contains the hard parts: stealth fingerprinting, residential proxy
fallback, cookie banner dismissal, bot challenge detection, full-page stitching at 4000px. Extract the shared bits into a
library, build a thin new screenshot-service worker on top.
Audit of every urlbox callsite in the monorepo. Most usage is dead-simple: a signed URL with width/height. No JS injection. No element selectors. No async webhooks. CSS injection appears only in the highlight overlay (3 callsites). The replacement surface is small.
| urlbox feature | Used today? | Replacement effort | Notes |
|---|---|---|---|
| Signed screenshot URL | Yes | Trivial | HMAC pattern from brand-kit-extractor reused as-is |
| Custom viewport (width/height) | Yes — 3 sizes | Already done | Desktop 1280, tablet 768, mobile 375 in GeneratePageScreenshotsAction |
| CSS injection (highlight overlay) | Yes — 3 callsites | ~½ day | page.addStyleTag() with border: 2px dashed red on selector. SVG-path truncation hack already in PHP — port verbatim. |
| S3 upload (use_s3=true) | Yes | Trivial | Backend already uses S3-compatible endpoint (likely R2). Worker writes to same bucket. |
| scroll_to (Y position) | Yes | Trivial | page.evaluate(y => window.scrollTo(0, y), y), with y passed as the second argument so it reaches the page context |
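| delay (ms before capture) | Yes, 2000ms | Trivial | await page.waitForTimeout(delay) where available; newer Puppeteer releases removed it, so fall back to await new Promise(r => setTimeout(r, delay)) |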
| wait_until (domloaded \| requestsfinished) | Both | Already done | Maps to waitUntil in page.goto. Brand-kit already does both. |
| Cookie banner hiding | Yes | Already done | uBO + autoconsent shipped in brand-kit-extractor |
| Custom user_agent (atarim-worker) | Yes | Trivial | Must keep: sites whitelist this UA. page.setUserAgent(). |
| Custom HTTP header (Proxied-For: Atarim) | Yes | Trivial | Must keep: Atarim proxy worker matches on this header. page.setExtraHTTPHeaders(). |
| Full-page stitching | Some | Already done | Lift the 4000px stitch loop from brand-kit-extractor:396 |
| clickAll (click selector) | Yes | Trivial | page.click(selector) in a loop |
| Thumb resizing (thumb_width) | Page model only | ~½ day | CF Images transforms on R2 delivery OR post-render canvas resize |
| Quality (jpeg q=40, png lossless) | Yes | Trivial | page.screenshot({quality}) |
| img_fit=cover (crop to viewport) | Yes | Trivial | Default puppeteer behavior — no full-page flag |
| JS injection (js=) | No | Skip v1 | Not used anywhere. page.evaluate() if ever needed. |
| Stealth mode | No | Skip v1 | Not paid for, not in any URL. Webshare fallback available if needed. |
| Async webhook callbacks | No | Skip v1 | Backend uses fire-and-forget GET pattern instead |
| Multi-format (PDF/MP4) | No | Skip | Not used. PNG + JPEG only. |
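
The highlight-overlay row is the only non-trivial port, so here is a minimal sketch of it. `StyleTagPage` is a hypothetical structural type covering just the Puppeteer `page.addStyleTag({ content })` call; the helper names are illustrative, and the SVG-path truncation hack mentioned in the table is not reproduced.

```typescript
// Minimal structural type for the one Puppeteer Page method used here.
interface StyleTagPage {
  addStyleTag(options: { content: string }): Promise<unknown>;
}

// Builds the overlay rule. The selector is interpolated as-is,
// so it should only ever come from trusted backend data.
function highlightCss(selector: string): string {
  return `${selector} { border: 2px dashed red; }`;
}

// Injects the rule into the page before the screenshot is taken.
async function applyHighlight(page: StyleTagPage, selector: string): Promise<void> {
  await page.addStyleTag({ content: highlightCss(selector) });
}
```

Keeping the CSS builder separate from the page call makes it easy to compare the generated rule against whatever the PHP side sends to urlbox's css= param today.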
Backend audit complete. 21 urlbox callsites across api_wpfeedback, all funneling through
UrlboxGenerateSignedUrlAction (the single chokepoint). The dominant volume driver is
GeneratePageScreenshotsAction — fires 3 screenshots per page
(desktop 1280 + tablet 768 + mobile 375) on new pages, AI reviews, and metadata syncs.
- SiteController::createSitePage, CollectSiteMetadata listener, and AI review action.
- /site/activate, /sitedata/sync, and Rocket onboarding listener (2-min delay job).
- AiReviewViewportAction, growing with AI feature adoption.
- /generate-image: open passthrough endpoint. Volume unknown. Security smell: forwards $request->all() straight to urlbox.
Backend stores the signed urlbox URL itself in DB columns (tasks.wpf_task_screenshot,
sites.image, pages.screenshot). Async listener does a fire-and-forget GET to warm urlbox
and push to S3. The S3 URL is sometimes also stored (atarim_task_screenshot, thumbnail_s3_url).
Used: width/height/full_page/img_fit, scroll_to, css injection, format (png/jpeg),
quality, max_height, wait_until (domloaded | requestsfinished), use_s3, s3_path, hide_cookie_banners, skip_scroll,
delay, clickAll, custom user_agent, custom header (Proxied-For: Atarim).
NOT used anywhere: stealth, js injection, selector clip, block_ads, cookie= injection, multi-format (PDF/MP4), webhooks, polling. The replacement surface is small.
The backend has 21 callsites but they all go through one wrapper. Replacing urlbox = replacing one file, plus a DB backfill to rewrite persisted urlbox.com URLs to the new CDN.
- app/Actions/Support/UrlboxGenerateSignedUrlAction.php: replace $urlbox->generateSignedUrl() with an HMAC POST to the new worker. Same return contract.
- composer.json:99: remove urlbox/screenshots
- config/app.php:176: remove UrlboxProvider
- config/services.php:54-57: swap key/secret for worker URL/HMAC
- UrlboxGenerateSignedUrlDTO: rename to ScreenshotRequestDTO
- RegenerateThumbnail.php & RegenerateAutoScreenShot.php: use the wrapper instead of the urlbox facade directly
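
As a rough illustration of the "HMAC POST" contract, the sketch below signs a JSON body and verifies it in constant time. The real scheme lives in brand-kit-extractor and is not shown in this document, so everything here is an assumption: HMAC-SHA256 over the raw body, hex encoding, and the function names. Node's crypto module is used for brevity; a Worker would use the Web Crypto equivalent.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign the raw request body with a shared secret (assumed scheme: HMAC-SHA256, hex-encoded).
function signBody(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Verify a received signature without leaking timing information.
function verifySignature(body: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signBody(body, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}

const body = JSON.stringify({ url: "https://example.com", width: 1280 });
const signature = signBody(body, "shared-secret");
console.log(verifySignature(body, signature, "shared-secret")); // true
```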
DB columns currently store https://api.urlbox.com/v1/... URLs. After migration these need to point at
the new CDN (or the existing S3 URL where available).
- tasks.wpf_task_screenshot
- tasks.atarim_task_screenshot (already S3, no change)
- sites.image
- sites.favicon
- sites.thumbnail_s3_url (already S3)
- sites.tablet_screenshot_url
- sites.mobile_screenshot_url
- pages.screenshot
- pages.tablet_screenshot_url
- pages.mobile_screenshot_url
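Easiest path: write a Laravel migration that rewrites each column, conceptually UPDATE ... SET col = REPLACE(col, 'api.urlbox.com/v1/.../{token}/{fmt}', 'screenshots.atarim.io').
Because the token and format segments vary per row, a plain REPLACE won't match; the rewrite needs a token-aware regex. Keep it chunked, idempotent, and runnable in production.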
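
The rewrite rule itself can be prototyped outside the migration. This sketch assumes stored values look like `https://api.urlbox.com/v1/<key>/<token>/<format>?<options>` and that the whole prefix collapses to the new origin with the query string kept; both the stored shape and the target layout are assumptions to confirm against real rows.

```typescript
// Matches the urlbox prefix up to and including the format segment.
// Assumed shape: https://api.urlbox.com/v1/<key>/<token>/<format>?<options>
const URLBOX_PREFIX = /^https:\/\/api\.urlbox\.com\/v1\/[^/]+\/[^/]+\/[a-z0-9]+/i;

// Rewrites a stored urlbox URL to the new origin; any other value is returned unchanged,
// which makes the operation safe to run more than once.
function rewriteUrlboxUrl(stored: string, newOrigin = "https://screenshots.atarim.io"): string {
  return stored.replace(URLBOX_PREFIX, newOrigin);
}

const before = "https://api.urlbox.com/v1/KEY/TOKEN/png?url=https%3A%2F%2Fexample.com&width=1280";
const after = rewriteUrlboxUrl(before);
console.log(after);
console.log(rewriteUrlboxUrl(after) === after); // true: a second pass is a no-op
```

Rows that already hold an S3 URL fall through untouched, so the same function can run over every column in the list.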
Backend uses AWS_ENDPOINT + AWS_USE_PATH_STYLE_ENDPOINT + AWS_PUBLIC_URL env vars
(config/filesystems.php:60-90). This is an S3-compatible interface, almost certainly pointed at R2 or DO
Spaces already. The new worker can write to the same bucket — zero storage migration.
Workers Paid plan ships with 10 concurrent browsers by default. 140k/mo averages ~3 shots/min (140,000 ÷ 43,200 minutes in a 30-day month) but will spike to 30–60/min during EU/US business hours. We need 50–100 concurrent for safe headroom. The hard account ceiling is 120 (raise via support ticket). Block on confirming our quota before MVP.
Every puppeteer.launch() is multi-second and counts against the 1-launch-per-second rate limit. Without
session reuse (via Durable Object pinning or browser.disconnect() / puppeteer.connect()),
we'll cap throughput and bleed cost on cold starts. Adds ~1d to implementation but unblocks scale.
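
A minimal sketch of the reuse-before-launch logic follows. `BrowserApi` is a hypothetical wrapper modeled on the sessions / connect / launch calls that @cloudflare/puppeteer documents; the exact signatures and the `connectionId` field should be checked against the current library before relying on this.

```typescript
// Hypothetical shapes: an idle session is assumed to have no connectionId.
interface SessionInfo {
  sessionId: string;
  connectionId?: string;
}

interface BrowserApi<B> {
  sessions(): Promise<SessionInfo[]>;
  connect(sessionId: string): Promise<B>;
  launch(): Promise<B>;
}

// Prefer attaching to an idle session; launch a new browser only as a last resort.
async function acquireBrowser<B>(api: BrowserApi<B>): Promise<{ browser: B; reused: boolean }> {
  const idle = (await api.sessions()).filter((s) => !s.connectionId);
  for (const session of idle) {
    try {
      return { browser: await api.connect(session.sessionId), reused: true };
    } catch {
      // Another request may have claimed this session first; try the next one.
    }
  }
  return { browser: await api.launch(), reused: false };
}
```

After the capture, the caller would disconnect rather than close the browser so the session stays available for the next request.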
Persisted api.urlbox.com URLs
Backend persists the signed urlbox URL itself in tasks.wpf_task_screenshot, sites.image,
pages.screenshot, and 7 other columns. After cutover these URLs would 404. Need a chunked, idempotent
Laravel migration that rewrites them to the new CDN. ~1 day of work; can run in production with no downtime.
Backend's GeneratePageScreenshotsAction fires 3 screenshots per page (desktop 1280 + tablet 768 + mobile 375)
and is the single largest volume driver (~55%). An easy optimization in the new worker: do all 3 viewport captures in
one browser session by changing viewport between captures. Saves 2× session-launch overhead per page,
cuts CF browser-hours by ~30%, and reduces concurrency pressure.
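
The single-session idea can be sketched against a minimal structural page type. The widths come from GeneratePageScreenshotsAction as described above; the heights, the JPEG quality, and the helper names are assumptions, and some responsive pages may need a short settle delay or a reload after each resize.

```typescript
// Structural subset of Puppeteer's Page used by this sketch.
interface ViewportPage {
  setViewport(viewport: { width: number; height: number }): Promise<unknown>;
  goto(url: string, options?: { waitUntil?: "domcontentloaded" | "networkidle0" }): Promise<unknown>;
  screenshot(options: { type: "png" | "jpeg"; quality?: number }): Promise<Uint8Array>;
}

// Widths are from the document; heights are placeholder assumptions.
const VIEWPORTS = [
  { name: "desktop", width: 1280, height: 800 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "mobile", width: 375, height: 667 },
];

// Navigate once, then resize and capture for each viewport in the same page.
async function captureAllViewports(page: ViewportPage, url: string): Promise<Record<string, Uint8Array>> {
  await page.setViewport({ width: VIEWPORTS[0].width, height: VIEWPORTS[0].height });
  await page.goto(url, { waitUntil: "domcontentloaded" });

  const shots: Record<string, Uint8Array> = {};
  for (const v of VIEWPORTS) {
    await page.setViewport({ width: v.width, height: v.height });
    shots[v.name] = await page.screenshot({ type: "jpeg", quality: 40 });
  }
  return shots;
}
```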
Several backend callsites use the sync path (Http::timeout(300)->get($signedUrl)) — blocks the PHP-FPM
worker for up to 5 minutes per screenshot. With the new worker we can shorten this aggressively (target 5–15s p95)
and free up PHP capacity. Latent throughput win not captured in the cost numbers.
ImageController::generateImage is an open passthrough
app/Http/Controllers/ImageController.php:17-25 forwards $request->all() straight to the
urlbox SDK. Any client of /generate-image can smuggle arbitrary urlbox params (cookie=, js=, etc.).
Lock this down during migration — tightly type the DTO and reject unknown fields.
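
The same allowlist idea applies on the worker side, where it can be sketched in a few lines. The field list below is an illustrative subset of the options named in the audit, not the final DTO, and `parseScreenshotRequest` is a hypothetical name.

```typescript
// Illustrative allowlist; the real DTO would enumerate every supported option.
const ALLOWED_FIELDS: ReadonlySet<string> = new Set([
  "url", "width", "height", "format", "quality", "full_page", "scroll_to", "delay",
]);

// Rejects any payload carrying a field outside the allowlist instead of forwarding it.
function parseScreenshotRequest(input: Record<string, unknown>): Record<string, unknown> {
  const unknownFields = Object.keys(input).filter((key) => !ALLOWED_FIELDS.has(key));
  if (unknownFields.length > 0) {
    throw new Error(`Unknown fields: ${unknownFields.join(", ")}`);
  }
  if (typeof input.url !== "string") {
    throw new Error("url must be a string");
  }
  return { ...input };
}
```

Failing closed on unknown keys means a smuggled js= or cookie= parameter produces an error rather than reaching the renderer.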
R2 egress is free via Cloudflare's network. Binding the bucket to a custom domain (e.g. screenshots.atarim.io)
gives public CDN delivery without Workers invocation on read. Verify the current pricing page before launch.
urlbox auto-retries failed renders. We'd need a thin retry layer (Cloudflare Queues + DLQ) — covered in the "production hardening" tier above.
~$10K one-time build, ~$600/mo recurring savings (vs. $725 observed), 17-month TCO break-even at month 3, 5-year savings ≈ $37K. Backend audit confirmed: single chokepoint (one file), narrow feature surface (CSS injection + standard puppeteer ops, no exotic urlbox features), S3 layer already compatible. Only real unknowns left are (a) CF concurrency quota and (b) whether burst traffic warrants session-reuse architecture from day one.
Suggested next steps:
- Extract the browser-rendering lib from brand-kit-extractor and prove the highlight-overlay page.addStyleTag() path produces visually equivalent output to urlbox's css= param on 5 representative real-customer URLs.
- Cut over callsites in order (RegenerateThumbnail artisan command, then uploadTaskScreenshotViaUrlBox, then GeneratePageScreenshotsAction) → DB backfill → urlbox shutoff.