Issue 02 · Design

Paste Bin

URL shortener plus content. The moment the payload outgrows the row is the moment your architecture changes entirely.

Published 28 May 2026 · Medium · object storage · two-tier storage · expiry pipeline · CDN · NoSQL

Read / write ratio 10:1

Peak QPS (reads) 3.5K

Latency target < 50ms p99

Avg paste size 10KB

Storage / 10 yr ~365 TB (raw, no expiry)

i. Requirements

Functional

Create a paste: accepts text or code up to 10 MB, returns a short unique URL
Retrieve a paste by its short URL: serve the raw content
Optional expiration: never, 10 min, 1 hour, 1 day, 1 week, 1 month, or custom
Optional: syntax highlighting by language (client-side is fine)
Optional: password-protected pastes (read requires correct passphrase)
Optional: custom alias (vanity slug), burn-after-read mode
Optional: fork / clone an existing paste

Non-Functional

Durability — content must not be lost; pastes are the product
Availability — reads at 99.99%, writes at 99.9%
Low read latency — p99 < 50 ms (content may be large; time-to-first-byte is the target)
Scalable storage — total content grows without bound; storage cost must be proportional to actual bytes
Read-heavy at ~10:1, skewed: popular pastes are fetched thousands of times, most are fetched once

Out of Scope

Real-time collaboration (Google Docs–style concurrent editing)
Full user account system, billing, teams
Server-side syntax highlighting at render time (offload to the client)
Diff / version history beyond forking

ii. Capacity Estimates

Parameter	Value	Notes
New pastes per day	10 million	1/10th of TinyURL — content is costlier to produce
Write QPS (avg)	~115	10M / 86,400
Write QPS (peak ≈ 3×)	~350
Read QPS (10:1, avg)	~1,150
Read QPS (peak)	~3,500
Average paste size	10 KB	Mix of tiny snippets and larger files; 95th pct < 100 KB
Max paste size	10 MB	Hard cap; reject at ingress
Metadata record size	~1 KB	IDs, timestamps, TTL, content_key, owner, size
Content storage / year	10M × 365 × 10 KB ≈ 36.5 TB	Raw bytes, before replication
Content storage / 10 years	~365 TB	Upper bound: infinite retention, before replication; aggressive expiration keeps the live total well below this
Metadata storage / 10 years	~36 TB	Fits in Cassandra comfortably
Read bandwidth at peak	3,500 × 10 KB = 35 MB/s ≈ 0.28 Gbps	Without CDN; with CDN the origin sees 10–20% of this
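
The arithmetic behind the table can be replayed in a few lines. This is a minimal sketch using only the inputs stated above (10M pastes/day, 10:1 reads, 3× peak, 10 KB average, ~1 KB metadata); decimal units are an assumption, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the capacity table (decimal units: 1 KB = 1,000 B).
SECONDS_PER_DAY = 86_400

PASTES_PER_DAY = 10_000_000
READ_WRITE_RATIO = 10
PEAK_FACTOR = 3
AVG_PASTE_BYTES = 10_000       # 10 KB average paste
METADATA_BYTES = 1_000         # ~1 KB metadata record
YEARS = 10

write_qps_avg = PASTES_PER_DAY / SECONDS_PER_DAY
write_qps_peak = write_qps_avg * PEAK_FACTOR
read_qps_avg = write_qps_avg * READ_WRITE_RATIO
read_qps_peak = read_qps_avg * PEAK_FACTOR

# Raw bytes with no expiry and no replication factor applied.
content_tb_per_year = PASTES_PER_DAY * 365 * AVG_PASTE_BYTES / 1e12
content_tb_total = content_tb_per_year * YEARS
metadata_tb_total = PASTES_PER_DAY * 365 * YEARS * METADATA_BYTES / 1e12

# Origin bandwidth at peak if no CDN absorbs any reads.
peak_read_mb_s = read_qps_peak * AVG_PASTE_BYTES / 1e6
peak_read_gbps = peak_read_mb_s * 8 / 1_000

if __name__ == "__main__":
    print(f"write QPS avg/peak: {write_qps_avg:.0f} / {write_qps_peak:.0f}")
    print(f"read QPS avg/peak:  {read_qps_avg:.0f} / {read_qps_peak:.0f}")
    print(f"content: {content_tb_per_year:.1f} TB/yr, {content_tb_total:.0f} TB over {YEARS} yr")
    print(f"metadata: {metadata_tb_total:.1f} TB over {YEARS} yr")
    print(f"peak read bandwidth: {peak_read_mb_s:.1f} MB/s = {peak_read_gbps:.2f} Gbps")
```

Raw content over ten years comes out at 365 TB; replication (RF, cross-region copies) multiplies that figure, and expiration shrinks it.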

Per-tier sizing — back of the envelope

The read QPS looks modest (3,500) but the byte rate is not. CDN absorption is the single biggest lever, especially since popular pastes are fetched repeatedly.

Tier	Load at peak	Sizing	Binding resource
CDN / Edge	~80% of reads (public popular pastes)	Anycast PoPs; no origin capacity needed for hits	Egress bandwidth
Read service (app)	~700 QPS post-CDN	5–10 pods; trivially small — the bottleneck is I/O not CPU	Open file handles / network to S3
Redis cluster	~630 QPS (90% L2 hit on post-CDN traffic)	2–4 shards; small pastes (<4 KB) inline, larger just cache metadata	Memory (~10 GB/shard)
Object storage (S3)	~70 QPS (10% cache miss)	No sizing needed — S3 auto-scales; cost is the real metric	GET latency (~5–30 ms)
Metadata DB (Cassandra)	~70 QPS reads, ~350 QPS writes	3 nodes per region, RF=3	Write IOPS
Write service (app)	~350 QPS peak	3–5 pods; bottleneck is S3 PUT + DB write, not app CPU	S3 PUT latency
Expiry worker	Background; bursty on TTL boundaries	1–2 pods per region; reads expiry queue, deletes S3 + DB	S3 DELETE throughput

iii. High-Level Design

Client
  │
  ▼
[Geo-DNS / Anycast]
  │
  ▼
[CDN Edge PoP]  ──hit (public paste)──▶  response (content from edge cache)
  │ miss
  ▼
[Regional Load Balancer]
  │                          │
  ▼                          ▼
[Read Service]          [Write Service]
  │                          │
  ├─▶ [Redis L2]             ├─▶ [ID Generator (base62)]
  │    hit → return          ├─▶ [Object Storage: S3/GCS]  ──▶  content bytes
  │    miss ↓                └─▶ [Metadata DB: Cassandra]  ──▶  row (id, ttl, key, …)
  ├─▶ [Metadata DB]
  │    get content_key
  │
  └─▶ [Object Storage: S3/GCS]  ──▶  stream content bytes to client

[Expiry Worker] ──reads TTL queue──▶ deletes S3 object + DB row

The core split: metadata (paste ID, expiry, owner, content_key, size, language) lives in Cassandra; content bytes live in object storage (S3). The read service stitches them together. This split is the central design decision — everything downstream follows from it.

For small pastes (<4 KB), content is stored inline in the Cassandra row and the S3 hop is skipped entirely. The 4 KB threshold keeps rows small enough that inline blobs do not bloat partitions or slow compaction, while eliminating the object storage round-trip for roughly half of all pastes by count (code snippets, config fragments, short logs).

iv. Key Design Decisions

ID scheme

Same base62 approach as the URL shortener: a 7-character slug gives 62⁷ ≈ 3.5 trillion unique IDs, enough for any realistic horizon. ID generation uses a distributed counter with ZooKeeper-coordinated ranges: each write pod claims a range of 1 million IDs, burns through them locally (no coordination per write), then claims another. Because ranges never overlap, collisions are structurally impossible, both within a range and across pods.

For custom aliases, the write path does a Cassandra LWT (lightweight transaction) INSERT IF NOT EXISTS on the alias. Races are rare; LWT handles them correctly without distributed locks.
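
A minimal sketch of the range-claim scheme, assuming an in-memory `RangeAllocator` as a stand-in for the ZooKeeper-coordinated counter. The class names, the digit-uppercase-lowercase alphabet order, and zero-padding to 7 characters are all assumptions made for illustration.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
BASE = len(ALPHABET)
SLUG_LEN = 7
RANGE_SIZE = 1_000_000


def encode_base62(n: int, width: int = SLUG_LEN) -> str:
    """Encode n as a fixed-width base62 slug, left-padded with '0'."""
    if not 0 <= n < BASE ** width:
        raise ValueError("counter outside the slug space")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class RangeAllocator:
    """Hypothetical stand-in for the coordinated counter: hands out disjoint ranges."""

    def __init__(self) -> None:
        self._next_start = 0

    def claim(self) -> range:
        start = self._next_start
        self._next_start += RANGE_SIZE
        return range(start, start + RANGE_SIZE)


class IdGenerator:
    """Per-pod generator: coordinates once per range, not once per write."""

    def __init__(self, allocator: RangeAllocator) -> None:
        self._allocator = allocator
        self._ids = iter(())

    def next_slug(self) -> str:
        try:
            n = next(self._ids)
        except StopIteration:
            # Local range exhausted (or never claimed): claim a fresh one.
            self._ids = iter(self._allocator.claim())
            n = next(self._ids)
        return encode_base62(n)
```

Two pods sharing one allocator draw from ranges starting at 0 and 1,000,000, so their slugs cannot collide. The slugs are sequential, though, which is why the enumeration mitigation later in the document matters.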

The inline vs. object storage split

Paste size	Content storage	Read path	Rationale
< 4 KB	Cassandra row (`content_inline` blob column)	Single DB read, no S3 hop	Eliminates extra round-trip; ~50–60% of all pastes by count
4 KB – 10 MB	S3 object; Cassandra row stores `content_key`	DB read for metadata + S3 GET for bytes	Keeps rows small; S3 is cheaper and more durable per byte than DB storage
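
The placement rule in the table reduces to one branch on size. A sketch, assuming the threshold and cap are configuration values and that S3 keys look like `pastes/<paste_id>` (the key layout is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

INLINE_THRESHOLD_BYTES = 4 * 1024          # tune from the observed size distribution
MAX_PASTE_BYTES = 10 * 1024 * 1024         # hard cap, rejected at ingress


@dataclass
class Placement:
    content_type: str                      # "inline" | "s3", as in the schema
    content_inline: Optional[bytes]
    content_key: Optional[str]


def place_content(paste_id: str, content: bytes) -> Placement:
    """Decide where the bytes live; the metadata row is written either way."""
    size = len(content)
    if size > MAX_PASTE_BYTES:
        raise ValueError("paste exceeds the 10 MB cap")
    if size < INLINE_THRESHOLD_BYTES:
        return Placement("inline", content, None)
    # Hypothetical key layout; real keys would also carry a sharding prefix.
    return Placement("s3", None, f"pastes/{paste_id}")
```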

Database schema (Cassandra)

Cassandra is chosen over Postgres for the same reasons as the URL shortener: the access pattern is a pure point lookup by paste_id, writes are steady and replicated (10M/day × RF=3 = 30M physical writes per day), and the 10-year storage horizon demands horizontal scaling. A Postgres primary + replicas would work at launch, but at 10M pastes/day the table passes ~500M rows in about 50 days unless most pastes expire, so plan the migration early.

-- Primary table: all reads go here
CREATE TABLE pastes (
  paste_id      text PRIMARY KEY,       -- base62 slug, e.g. "aB3kZ9m"
  created_at    timestamp,
  expires_at    timestamp,              -- null = never expires
  owner_id      text,                   -- null for anonymous
  language      text,                   -- "python", "sql", null
  title         text,
  size_bytes    int,
  content_type  text,                   -- "inline" | "s3"
  content_key   text,                   -- S3 key if content_type = "s3"
  content_inline blob,                  -- raw bytes if content_type = "inline"
  password_hash text,                   -- bcrypt hash; null = public
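  burn_on_read  boolean
) WITH default_time_to_live = 0        -- TTL managed at app layer, not Cassandra native
  AND compaction = { 'class': 'LeveledCompactionStrategy' };

-- View counts: Cassandra counter columns cannot share a table with regular columns
CREATE TABLE paste_views (
  paste_id   text PRIMARY KEY,
  view_count counter                    -- approximate; see note
);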

-- Expiry index: drives the cleanup worker
CREATE TABLE pastes_by_expiry (
  expiry_bucket text,    -- "2026-05-28T14" (hour granularity)
  expires_at    timestamp,
  paste_id      text,
  PRIMARY KEY (expiry_bucket, expires_at, paste_id)
) WITH CLUSTERING ORDER BY (expires_at ASC, paste_id ASC);

-- Owner index: "my pastes" lookup (secondary access pattern)
CREATE TABLE pastes_by_owner (
  owner_id   text,
  created_at timestamp,
  paste_id   text,
  title      text,
  PRIMARY KEY (owner_id, created_at, paste_id)
) WITH CLUSTERING ORDER BY (created_at DESC, paste_id ASC);

Notes on schema decisions: Cassandra native TTL is not used for the main table because it is set per cell at write time. Extending an expiry means rewriting every column with a new TTL, and any cell that is missed silently disappears on the old schedule. App-layer expiry (checked at read time + async worker deletion) is more flexible. The view_count is a Cassandra counter: increments are not idempotent, so a retried write can double-count, but it is accurate enough for analytics and "popular" ranking.

Caching strategy

Layer	What is cached	TTL	Target hit rate	Latency
Browser cache	Public paste content (Cache-Control: max-age)	Matches paste expiry or 1h for permanent	High for repeated local views	0 ms
CDN edge	Public paste responses (full HTTP response)	Matches paste expiry; private/password pastes excluded via `Cache-Control: private`	~80% of public reads	~5–15 ms
In-process L1 (app pod)	Metadata for recently accessed pastes; small (<4 KB) pastes in full	30s with jitter ±5s	~10% of post-CDN traffic	<1 ms
Redis L2	Metadata + inline content; large pastes: metadata only + pre-signed S3 URL	TTL = min(paste expiry – now, 24h) with ±10% jitter	~85% of post-CDN traffic	~1–3 ms
Object storage (S3)	Source of truth for large paste content	Permanent (managed by lifecycle rules)	~5% of post-CDN traffic	~10–50 ms
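
The Redis TTL rule in the table, `min(paste expiry – now, 24h)` with ±10% jitter, can be sketched as below. One assumption goes beyond the table: the jittered value is clamped so a cache entry never outlives the paste itself. The function name and the epoch-seconds convention are illustrative.

```python
import random

MAX_CACHE_TTL_S = 24 * 3600
JITTER = 0.10


def redis_ttl_seconds(expires_at, now, rng=random):
    """Return the Redis TTL for a paste, or 0 if it must not be cached.

    expires_at and now are epoch seconds; expires_at=None means "never expires".
    """
    remaining = None if expires_at is None else expires_at - now
    if remaining is not None and remaining <= 0:
        return 0                                    # already expired
    base = MAX_CACHE_TTL_S if remaining is None else min(remaining, MAX_CACHE_TTL_S)
    ttl = base * (1 + rng.uniform(-JITTER, JITTER)) # spread expirations out
    if remaining is not None:
        ttl = min(ttl, remaining)                   # never outlive the paste
    return max(1, int(ttl))
```

The jitter keeps entries written in the same burst from expiring in the same second, which is the stampede the next paragraphs guard against.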

Negative caching: A missing paste_id (404) is cached in Redis for 60s with a sentinel value. Without this, a scan-and-fetch loop against random IDs would hammer Cassandra on every miss.

Stampede protection: The read service uses Go's singleflight in-process for concurrent identical requests (a popular paste shared via social media sees a burst of simultaneous first-loads). At Redis level, only one request holds a distributed mutex to rebuild the cache entry; others wait with a 200ms timeout, then fall through to S3 directly if the mutex holder times out.
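
The in-process half of this is request coalescing. Below is a Python sketch of the idea behind Go's singleflight, not a port of that package: the first caller for a key runs the loader, and concurrent callers for the same key wait and share its result. The class and method names are assumptions.

```python
import threading


class _Call:
    """One in-flight load that other callers can wait on."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}

    def do(self, key, fn):
        """Run fn() once per key among concurrent callers; everyone gets its outcome."""
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = fn()
            except Exception as exc:        # propagate the failure to waiters too
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]    # later callers start a fresh load
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

A production version would add a wait timeout, matching the 200 ms fall-through described above for the Redis mutex.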

Pre-signed S3 URLs: For large pastes, Redis caches a pre-signed S3 GET URL (valid 15 minutes) rather than the bytes themselves. The cache entry has to expire before the URL does, so its Redis TTL is capped below 15 minutes regardless of the paste's own expiry. The client is redirected to S3 directly, offloading bandwidth from the app tier entirely. This is the right trade-off when paste size is large enough that Redis memory cost exceeds S3 GET cost.

v. Deep Dives

Expiration pipeline

Expiry is the feature that quietly makes the system hard. A paste with expires_at = T must stop being readable at T, and its bytes must eventually be deleted from S3 to reclaim storage. There are three components:

Read-time enforcement: Every read checks expires_at against now() in the metadata returned from Cassandra or Redis. Expired pastes return 404 immediately. This is the correctness layer — it fires even if the async cleanup worker is lagging.
Expiry worker (async cleanup): A background service reads pastes_by_expiry in hour-bucket order. For each expired paste it: (a) deletes the S3 object if content_type = "s3", (b) deletes the Cassandra rows in every table that references the paste, removing the pastes_by_expiry row last so a crashed run is retried, (c) invalidates CDN cache via purge API, (d) deletes from Redis. Order matters: S3 first, then DB. If the worker crashes between S3 delete and DB delete, a subsequent read gets a 404 from the DB (expired) or a 404 from S3 (object gone), and either outcome is correct. The reverse order (DB then S3) would leave orphaned bytes on S3 indefinitely, because nothing would point at them any more.
S3 lifecycle rules as backstop: The write service tags each object at creation with a coarse expiry class (for example ttl=1d, ttl=7d, ttl=30d), and one lifecycle rule per tag value deletes matching objects a fixed number of days after creation, padded beyond the TTL. Lifecycle rules filter on tags but expire by object age, so they cannot read a per-object expires_at; pastes with a custom, extended, or no expiry are left to the worker (or re-tagged). This is a belt-and-suspenders measure: it catches anything the worker missed due to a multi-day outage.
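
The first two components share two small pieces of logic: the read-time check and the hour-bucket key the worker scans. A sketch, assuming timezone-aware UTC datetimes and a worker that remembers the last bucket it fully processed (that checkpoint is an assumption, not something the schema defines):

```python
from datetime import datetime, timedelta, timezone


def is_expired(expires_at, now):
    """Read-time enforcement: expires_at=None means the paste never expires."""
    return expires_at is not None and expires_at <= now


def expiry_bucket(expires_at: datetime) -> str:
    """Partition key for pastes_by_expiry, hour granularity, e.g. '2026-05-28T14'."""
    return expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def _hour_floor(ts: datetime) -> datetime:
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def buckets_to_scan(last_done: datetime, now: datetime) -> list:
    """Every hour bucket after the last fully processed one, up to the current hour.

    Rows in the current hour's bucket still need an is_expired() check before deletion.
    """
    cur = _hour_floor(last_done) + timedelta(hours=1)
    end = _hour_floor(now)
    buckets = []
    while cur <= end:
        buckets.append(expiry_bucket(cur))
        cur += timedelta(hours=1)
    return buckets
```

After an outage the same function yields the whole backlog in order, so catch-up can be rate-limited bucket by bucket.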

Burn-after-read

A paste with burn_on_read = true is deleted immediately after the first successful read. The implementation requires care: the read and the delete must behave as one step. The read service: (1) fetches metadata from DB, (2) checks burn flag, (3) if set, issues a conditional Cassandra delete (an LWT, DELETE ... IF EXISTS) and returns the content only if that delete was applied, (4) purges CDN and Redis. Burn-after-read responses should also be marked Cache-Control: no-store so that step 4 is only a safety net; where an entry did get cached, the window until the purge propagates is typically < 5 seconds. If two readers race, exactly one delete is applied: that reader gets the content and the other gets a 404, which is acceptable behavior, documented in the product.
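
The race outcome depends on the delete being conditional. The sketch below models that with an in-memory store whose `delete_if_exists` plays the role of a Cassandra LWT (`DELETE ... IF EXISTS`); the store class and function names are hypothetical stand-ins, not a driver API.

```python
import threading


class MetadataStore:
    """In-memory stand-in for the pastes table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, paste_id, row):
        with self._lock:
            self._rows[paste_id] = row

    def get(self, paste_id):
        with self._lock:
            return self._rows.get(paste_id)

    def delete_if_exists(self, paste_id) -> bool:
        """Models a conditional delete: True only for the caller that removed the row."""
        with self._lock:
            return self._rows.pop(paste_id, None) is not None


def read_paste(store, paste_id):
    """Return (status, content). Burn-after-read content goes to one reader only."""
    row = store.get(paste_id)
    if row is None:
        return 404, None
    if row["burn_on_read"] and not store.delete_if_exists(paste_id):
        return 404, None            # another reader won the race
    return 200, row["content"]
```

With a plain unconditional delete, both racing readers would already hold the row from step 1 and both would return the content.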

Password-protected pastes

The password hash (bcrypt, cost factor 12) is stored in the Cassandra row. On read, the client submits the passphrase in a POST body (never in the URL — URLs end up in logs). The read service bcrypt-compares and returns 401 on mismatch. Password-protected pastes are excluded from CDN and Redis caching (served from origin only). Pre-signed S3 URLs are not used for password-protected pastes — the bytes must not be directly reachable without auth.

Syntax highlighting

Done entirely client-side using a library (Prism.js or highlight.js loaded from CDN). The server stores language as a metadata field; the client uses it to trigger the right grammar. This keeps the read path simple and stateless — no server-side rendering, no compute for highlighting. The trade-off is that curl-ing a paste returns raw text, which is almost always what developers want anyway.

Forking a paste

Fork = create a new paste with the same content but a new ID and new owner. The write service copies content from the original: for inline pastes, reads the blob from the parent row and writes it inline in the new row; for S3 pastes, issues an S3 server-side copy (no data moves across the network — S3 handles it internally, billing at copy cost not data transfer). The fork then proceeds as a normal write. This is cheap and correct.

vi. Bottlenecks by Tier

Tier	Practical ceiling	Binding resource	Lift
CDN	Vendor-defined; effectively unlimited for reasonable traffic	Egress cost / contract limit	Increase edge PoP coverage; negotiate egress pricing; use Anycast correctly so traffic is routed to closest PoP
Read service (app pods)	~2,000 concurrent S3 requests / pod (bounded by the HTTP connection pool and file descriptors, not by the goroutine count)	S3 connection pool exhaustion; open file descriptors	Increase pod count; tune S3 connection pool size; keep connections to S3 alive and reuse them
Redis	~100K ops/sec/node; ~26 GB memory/node before eviction pressure	Memory (metadata + inline content fills fast)	Shard more aggressively; cache only metadata for large pastes (not the bytes); set `maxmemory-policy` to `volatile-lru` so the least recently used keys that carry a TTL are evicted first
Cassandra (reads)	~20K reads/sec per node with LOCAL_ONE consistency; drops to ~8K at LOCAL_QUORUM	CPU for deserialization; SSTable read IOPS	Add nodes to spread the read load; serve reads at LOCAL_ONE (we accept stale by seconds, not minutes); tune bloom filter FP rate to reduce disk seeks
Cassandra (writes)	~20K writes/sec per node (Cassandra is write-optimized via LSM)	Compaction I/O stealing from reads at high write QPS	Use LeveledCompactionStrategy (LCS) which trades more compaction CPU for fewer SSTables and more predictable read latency; add nodes horizontally
S3 (GETs)	~5,500 GET requests/sec/prefix by default; higher with prefix sharding	Request rate per prefix (S3 internal partitioning)	Distribute content_keys across multiple prefixes (e.g., two characters of a hash of paste_id as prefix: `<hash2>/aB3kZ9m`; the leading characters of a counter-derived ID change too slowly to spread load); S3 also repartitions hot prefixes on its own under sustained traffic, but gradually, so expect some 503s during the ramp
S3 (PUTs)	~3,500 PUT requests/sec/prefix	Same prefix partitioning	Same prefix sharding strategy; use multipart upload for pastes > 5 MB
Expiry worker	~1,000 deletes/sec sustained	S3 DELETE throughput + Cassandra write IOPS	Parallelize worker threads; use the S3 multi-object delete API (DeleteObjects, up to 1,000 keys per request) for bulk deletes
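
Prefix sharding only helps if the prefix is evenly distributed. Counter-derived slugs share their leading characters for long stretches, so this sketch derives the prefix from a hash of the paste ID instead. The use of SHA-256, two hex characters (256 prefixes), and the `prefix/paste_id` layout are assumptions for illustration.

```python
import hashlib


def content_key(paste_id: str, prefix_chars: int = 2) -> str:
    """Build an S3 key whose prefix is spread uniformly across 16**prefix_chars shards."""
    digest = hashlib.sha256(paste_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_chars]}/{paste_id}"


def prefix_spread(paste_ids) -> int:
    """Count how many distinct prefixes a batch of IDs lands on."""
    return len({content_key(pid).split("/", 1)[0] for pid in paste_ids})
```

The key is recomputable from the paste ID alone, so the metadata row does not strictly need to store the prefix; storing `content_key` anyway keeps a future change of scheme cheap.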

vii. Hot Keys / Skew / Pathological Data

The skew profile here is more extreme than in the URL shortener. A paste shared in a viral tweet or a Hacker News "Show HN" comment can go from 0 to 50,000 reads in 60 seconds. The paste ID becomes a hot key across every tier simultaneously: CDN, Redis, S3, and Cassandra all see it at once.

Mitigations

CDN as the first wall: Public pastes are aggressively cached at the CDN edge. A viral paste that reaches CDN cache is served indefinitely without touching origin. Set Cache-Control: public, max-age=3600, stale-while-revalidate=60 on public permanent pastes. The stale-while-revalidate window means the CDN serves from cache while asynchronously refreshing, so origin never sees a thundering herd on TTL expiry.
In-process LRU as the second wall: Each app pod maintains an LRU of the last 1,000 paste responses in memory (~10 MB at 10 KB average). A pod that has already seen this paste serves subsequent requests without touching Redis or S3.
Pre-signed S3 URLs redirect clients directly to S3: For large pastes, the app returns a 307 redirect to a pre-signed S3 URL. Bandwidth shifts from app pods + Redis to S3 directly, spreading the load across S3's own serving fleet. This is the key architectural advantage of object storage for large content.
Hot-paste detection and adaptive caching: A background counter (uniform sampling of 1% of requests, scaled back up to estimate the true rate) identifies paste IDs with >1,000 requests/minute. These are flagged for: (a) extended CDN TTL via purge-and-rewrite, (b) promotion to in-process L1 on all pods via a lightweight pub-sub event, (c) S3 pre-signed URL caching in Redis for maximum duration. Detection latency is ~60 seconds, short enough to catch viral spikes before they saturate the origin.
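
The detection step can be sketched as a sampled counter flushed once per window. The 1% rate and the 1,000 requests/minute threshold come from the text; the class name, the `flush` cadence, and scaling sampled counts by `1 / sample_rate` are assumptions.

```python
import random
from collections import Counter


class HotKeyDetector:
    """Counts a uniform sample of requests and estimates per-paste rates per window."""

    def __init__(self, sample_rate=0.01, threshold=1_000, rng=None):
        self.sample_rate = sample_rate
        self.threshold = threshold          # estimated requests per window
        self._rng = rng or random.Random()
        self._sampled = Counter()

    def observe(self, paste_id):
        # Called on every request; only ~1% of calls touch the counter.
        if self._rng.random() < self.sample_rate:
            self._sampled[paste_id] += 1

    def flush(self):
        """Call once per window (e.g. every 60 s); returns {paste_id: estimated count}."""
        estimates = {pid: n / self.sample_rate for pid, n in self._sampled.items()}
        self._sampled.clear()
        return {pid: est for pid, est in estimates.items() if est > self.threshold}
```

At a 1% rate a paste needs more than 10 sampled hits to cross the threshold, so a paste fetched only a handful of times cannot be flagged by chance.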

viii. Multi-Region Architecture

Routing layer

Anycast geo-DNS routes readers to the nearest region. Pastebin reads are entirely region-local: the CDN serves from edge; cache misses hit the regional read service and regional Cassandra replica. No cross-region hop on the read path.

Writes are slightly different. Anonymous pastes are created in the nearest region. Authenticated user pastes are created in the user's "home region" (determined at account creation) to simplify the pastes_by_owner lookup. If a user in Singapore creates an account, their pastes are written to the AP region's Cassandra and replicate asynchronously to the others. A US read that misses cache before replication has caught up falls back to a cross-region hop to the home region, which is acceptable for a write-once-read-many workload where the vast majority of reads are already CDN hits.

What is region-local vs. globally coordinated

Concern	Region-local	Globally coordinated
Paste content reads	Yes — CDN edge or regional read service	No
Paste content bytes (S3)	Cross-region replication via S3 CRR; reads serve from nearest region bucket	Write lands in origin region; replication async
Metadata (Cassandra)	Regional RF=3 cluster with async cross-region replication	Only custom alias creation, which uses an LWT in the alias arbiter region (next row)
Custom alias uniqueness	No — must be globally unique	Yes — LWT in a designated "alias arbiter" region; rare operation, latency acceptable
View count	Incremented regionally	Aggregated asynchronously into a global counter; eventual consistency is fine for view counts
Burn-after-read	Attempted in originating region	Cross-region invalidation (CDN purge is global by default)

RPO / RTO matrix

Failure scope	Impact	RPO	RTO
Single AZ down (within region)	Reduced capacity; Cassandra RF=3 survives 1 AZ loss	0 (no data loss)	< 30s (LB removes unhealthy pods)
Full region down	Traffic re-routed to nearest healthy region via geo-DNS TTL	< 5 min for async-replicated metadata; up to ~15 min for S3 content still waiting on CRR	~2–5 min (geo-DNS TTL)
S3 region outage	Cache miss reads fail; S3 CRR allows failover to replica bucket	~15 min replication lag (typical CRR lag)	< 10 min (update endpoint to replica bucket)
Metadata DB corruption (logical)	Paste metadata unreadable	Cassandra snapshots every 6h; max 6h data loss unless commit logs are archived	Hours: Cassandra restore is slow; prioritize replaying archived commit logs
Expiry worker outage	Expired pastes stay in storage but are not readable (the read-time check still returns 404); no data integrity loss	0 (read-time enforcement still works)	Worker restarts automatically; backlog processed within hours

ix. Failure Modes & Mitigations

Infrastructure failures

Failure	Blast radius / symptom	Mitigation
App pod crash	In-flight requests fail; LB health-check removes pod within 5s	Minimum 3 pods per region; LB health-check at 2s interval; circuit breaker in LB
Redis shard down	Cache misses for that shard's key space; latency spikes as requests fall through to S3/DB	Read service degrades gracefully to DB+S3; Redis cluster auto-promotes replica in < 30s; size DB tier to absorb full load
Redis cluster full (OOM)	Evictions begin; hit rate drops; DB and S3 absorb the load increase	`volatile-lru` evicts the least recently used keys among those with a TTL (here, every cache entry); alert at 75% memory used; add shard before hitting 90%
Cassandra node down	RF=3 with LOCAL_QUORUM writes: 1 node loss is transparent; 2 node loss: writes fail (quorum unavailable)	3-node minimum per region; RF=3; alert on node down immediately; replace within 4h (SLA)
S3 throttling (503 Slow Down)	Large paste reads and writes fail; app retries with exponential backoff + jitter	Prefix sharding to spread request rate; retry with jitter (1s base, 2× up to 30s, 3 attempts); circuit breaker after 5 consecutive failures
S3 region outage	Cache misses can't resolve for large pastes	S3 Cross-Region Replication (CRR); update read service endpoint to replica bucket via config flag; test failover quarterly
CDN PoP outage	Traffic falls through to origin; origin must handle full CDN-bypass load	Multi-CDN via geo-DNS health checks (Fastly + CloudFront failover); size origin for CDN-down case
Region-wide outage	All traffic for that region must reroute	Geo-DNS with low TTL (60s); cross-region Cassandra replication; S3 CRR; runbook for failover activation
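
The S3 throttling row calls for retries with exponential backoff and jitter (1s base, 2× growth, 30s cap, 3 attempts). A sketch using the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential ceiling. The choice of variant, the function names, and the `is_retryable` callback are assumptions.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=3, rng=random):
    """Delays to sleep between attempts: uniform in [0, min(cap, base * factor**i)]."""
    return [rng.uniform(0, min(cap, base * factor ** i)) for i in range(attempts - 1)]


def call_with_retry(fn, is_retryable, sleep=time.sleep, **backoff_kwargs):
    """Call fn(), retrying retryable failures; the last failure is re-raised."""
    delays = backoff_delays(**backoff_kwargs)
    for delay in delays + [None]:           # None marks the final attempt
        try:
            return fn()
        except Exception as exc:
            if delay is None or not is_retryable(exc):
                raise
            sleep(delay)
```

With 3 attempts the 30s cap is never reached (the ceilings are 1s and 2s); it only matters if the attempt count is raised. On the read path the total retry time also has to fit inside the latency budget, so a circuit breaker should trip well before the retries pile up.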

Data-path pathologies

Failure	Blast radius / symptom	Mitigation
Cache stampede (viral paste)	CDN TTL expires on popular paste; thousands of simultaneous cache misses hit origin	`stale-while-revalidate` at CDN; singleflight in-process; distributed mutex at Redis; adaptive CDN TTL extension for hot keys
Hot key on Redis shard	Single shard CPU-bound; latency for all keys on that shard degrades	In-process L1 LRU absorbs top-N keys; pre-signed S3 URL redirect bypasses Redis for large pastes; key splitting not needed (paste IDs are globally unique — no synonyms to fan-out)
Hot partition on Cassandra	Node serving that paste_id's partition sees disproportionate read load	99%+ of reads hit CDN or Redis; Cassandra should almost never see hot-paste traffic directly
Replication lag (Cassandra cross-region)	Reader in EU reads older content than writer in US; typical lag < 500ms	Acceptable for pastebin; document in SLO as "eventual consistency across regions, <1s typical"; if correctness is critical, write to the reader's region via home-region routing
Poison paste (write succeeds, S3 object corrupted)	Readers get garbled bytes; the corruption stays silent unless the reader (or SDK) verifies a checksum on download	Upload with an S3 additional checksum (SHA-256) and keep the digest in metadata; read service validates it before serving; serve 500 and alert on mismatch
Partial write (DB row written, S3 PUT failed)	Paste exists in metadata but content is absent; read returns 500 or empty body	Write service: S3 PUT first, then DB write. If S3 PUT fails, return error — no DB row created. If DB write fails after S3 PUT, a background reconciler finds DB-absent S3 objects and either retries the DB write or deletes the orphaned S3 object.
Expiry worker falling behind	Expired pastes remain in S3 / DB longer than expected; read-time check still enforces correctness	Monitor expiry queue depth; auto-scale worker pods on queue depth; S3 lifecycle rules as backstop
Clock skew between nodes	`expires_at` enforcement inconsistent across nodes; a paste may be readable on one pod and expired on another	Use NTP with <100ms tolerance; treat `expires_at` as a soft boundary with 5s grace on the read side; burn-after-read uses Cassandra LWT (Paxos) not wall clock
Cassandra compaction storm	Sustained high write QPS triggers compaction; read latency spikes as compaction steals I/O	LeveledCompactionStrategy minimizes read amplification; throttle compaction throughput via `compaction_throughput_mb_per_sec`; add nodes to spread compaction load
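
The partial-write row comes down to write ordering: bytes first, metadata second, with a reconciler for the one failure that leaves residue. A sketch with in-memory stand-ins for S3 and Cassandra; every class, the `pastes/<id>` key layout, and the orphan list are hypothetical.

```python
class ObjectStore:
    """In-memory stand-in for S3."""

    def __init__(self, fail=False):
        self.objects, self.fail = {}, fail

    def put(self, key, body):
        if self.fail:
            raise RuntimeError("simulated S3 PUT failure")
        self.objects[key] = body


class MetadataDB:
    """In-memory stand-in for the pastes table."""

    def __init__(self, fail=False):
        self.rows, self.fail = {}, fail

    def insert(self, paste_id, row):
        if self.fail:
            raise RuntimeError("simulated DB write failure")
        self.rows[paste_id] = row


def create_paste(paste_id, content, s3, db, orphans):
    key = f"pastes/{paste_id}"
    s3.put(key, content)        # 1. bytes first: a failure here leaves no trace
    try:
        db.insert(paste_id, {"content_type": "s3", "content_key": key,
                             "size_bytes": len(content)})   # 2. row makes it visible
    except Exception:
        orphans.append(key)     # 3. stray object: hand it to the reconciler
        raise
    return key


def reconcile(orphans, s3, db):
    """Delete logged objects that no metadata row references."""
    referenced = {row["content_key"] for row in db.rows.values()}
    for key in list(orphans):
        if key not in referenced:
            s3.objects.pop(key, None)
        orphans.remove(key)
```

A real reconciler would also list the bucket, since a pod that crashes between steps 1 and 2 never gets to log the orphan.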

Operational / deployment failures

Failure	Blast radius / symptom	Mitigation
Bad deploy (read service)	New pods return 5xx on paste fetch	Canary deploy at 5% traffic; auto-rollback on error rate > 1% sustained 2 min; blue-green for schema-breaking changes
Cassandra schema migration	Adding a column is safe (Cassandra handles it); renaming or changing type is not	Always additive: new column alongside old; dual-write during migration; remove old column only after all readers on new schema; never rename in place
S3 bucket misconfiguration (public access)	All paste content publicly readable without auth	Block public access at bucket level; all access via pre-signed URLs or app-layer auth; quarterly S3 policy audit; AWS Macie for sensitive data detection
Expiry worker runaway (deletes live pastes)	Active pastes deleted prematurely	Worker reads expiry only from `pastes_by_expiry`; cross-checks `expires_at` in main table before deleting; 5-second delay between DB read and S3 delete as a sanity window
ID generator pod restart mid-range	Unused IDs in the claimed range are wasted — not a correctness problem, but a small key space leak	Ranges are 1M IDs; losing a partial range wastes at most ~1M IDs out of 3.5T — negligible; log range claim/return at pod startup/shutdown for audit

x. Security & Abuse

Threat model

Attack vector	Risk	Mitigation
Malware / phishing content hosting	High — short URLs and anonymous pastes are ideal phishing vehicles	Async content scan at write time (ClamAV for binaries; heuristic URL scanner for embedded links); Google Safe Browsing API check on URLs in paste content; flag for human review above a suspicion threshold; don't block writes synchronously — scan in background and tombstone within seconds of a positive hit
CSAM / illegal content	Critical — legal liability; platform must not host	PhotoDNA hash check for image content (text pastes: keyword signal + human review pipeline); hard delete within 60s of detection; preserve evidence hash for law enforcement in append-only audit log; legal hold prevents S3 lifecycle deletion
ID enumeration	Medium: attacker iterates IDs to harvest private pastes	IDs are base62 7-char (3.5T space), but ~36.5B pastes over 10 years would fill about 1% of it, so roughly 1 in 100 random guesses lands on an issued ID; rate-limit 404s aggressively (10 per minute per IP / token); use non-sequential ID generation (shuffle counter range before encoding) to make sequential enumeration worthless; private pastes still require knowing the ID, which raises the cost of enumeration but is obscurity, not security
Credential / secret dumping	High — pastes are a common accidental leak surface for API keys, passwords, private keys	Regex scan on write for known secret patterns (AWS keys, GitHub tokens, private key headers); flag and notify owner if authenticated; for anonymous pastes, tombstone and alert security team; integrate with GitHub secret scanning partner program
DDoS on write endpoint	Medium — 350 QPS peak write is modest; DDoS can overwhelm it easily	Rate-limit writes per IP (10/min anonymous, 100/min authenticated); CAPTCHA at write for anonymous users on abuse signals; Cloudflare / Akamai DDoS protection at edge; WAF rule for request size > 10 MB
DDoS on single paste (read amplification)	High — attacker shares URL broadly; origin gets slammed	CDN absorbs read DDoS for public pastes; rate-limit reads per IP at edge (1,000/min); auto-block IP on anomalous read rate; for private pastes behind auth, the auth wall is the rate limit
Redirect loop / open redirect via paste content	Low — pastebin doesn't redirect on content; content is served raw	N/A — no redirect functionality for content; only the short URL → paste page is a redirect, which is the intended behavior
GDPR / right to erasure	Medium: EU users can request deletion of their content	Deletion pipeline: tombstone DB row within 30s, S3 delete within 5 min (sync), including noncurrent versions where bucket versioning is on, CDN purge within 5 min; audit log records deletion event but not content; document data residency (content stored in which regions) in privacy policy
Audit / forensics	Regulatory — law enforcement requests for content	Append-only audit log (CloudWatch Logs + S3 + WORM bucket policy) records: create event, read events (sampled), delete events, abuse flags, legal holds; content itself is in S3 which has object-level versioning; legal hold flag in DB prevents expiry worker from deleting; legal team owns key for WORM bucket
Password brute-force	Medium — password-protected pastes are individually targetable	Rate-limit wrong password attempts: 5 per paste_id per 15 minutes per IP; after 3 failures, serve a CAPTCHA; bcrypt cost factor 12 (300ms on server) makes brute force expensive; lock paste_id after 10 failures from different IPs
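
For the ID enumeration row, one way to make counter-derived IDs non-sequential is a keyed permutation of the ID space. The sketch below swaps in a small Feistel network with cycle-walking in place of the "shuffle counter range" idea from the table. It is an illustration, not reviewed cryptography, and the key handling, round count, and function names are all assumptions.

```python
import hashlib
import hmac

SPACE = 62 ** 7                  # number of 7-char base62 slugs
HALF_BITS = 21                   # two 21-bit halves: 2**42 > SPACE
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 4


def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 21 bits."""
    msg = bytes([rnd]) + half.to_bytes(3, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big") & HALF_MASK


def _feistel(key: bytes, value: int) -> int:
    """A permutation of the 42-bit integers (balanced Feistel network)."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def scramble(key: bytes, counter: int) -> int:
    """Map a counter to a scattered value in [0, SPACE) without collisions."""
    if not 0 <= counter < SPACE:
        raise ValueError("counter outside the slug space")
    value = _feistel(key, counter)
    while value >= SPACE:        # cycle-walk until we land back inside the space
        value = _feistel(key, value)
    return value
```

The scrambled value is what gets base62-encoded. Because the mapping is a permutation, the collision-free property of the range allocator is preserved.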

Takedown propagation flow

Detection → flag in DB (status = 'tombstoned') → read service serves 451 (legal / abuse takedown) → purge CDN edge cache via API (propagates to all PoPs in < 10 seconds) → delete Redis cache → S3 delete or legal hold (depending on reason) → audit log entry. Total time from detection to edge cache cleared: < 60 seconds.

xi. Observability & SLOs

SLI targets (28-day rolling window)

SLI	Target	Error budget / month
Paste read availability (HTTP 2xx/3xx ratio, excluding 404/410)	99.99%	~4.3 minutes
Read p50 latency (time-to-first-byte, server-side, post-CDN)	< 20 ms	—
Read p99 latency (TTFB, server-side)	< 50 ms	—
Read p99.9 latency (TTFB)	< 200 ms	—
Paste write availability	99.9%	~43 minutes — writes are less critical than reads
Write p99 latency (create paste, end-to-end)	< 500 ms	— (includes S3 PUT; 95th pct < 200ms)
Expiry correctness (expired paste serves 404 within 5s of TTL)	99.9%	Read-time check is the enforcement; 0.1% failure budget for clock skew edge cases
Content durability	99.999999999% (S3 eleven-nines)	Inherited from S3; Cassandra RF=3 provides independent metadata durability

Golden signals — per service

Traffic: RPS at each tier (CDN, read service, Redis, Cassandra, S3); break out by HTTP status and by content_type (inline vs. S3 pastes)
Errors: 5xx rate (reliability metric); 404 rate (product metric — indicates abuse/enumeration if spike); 451 rate (abuse takedown metric)
Latency: TTFB at CDN edge (p50, p99); TTFB at read service (p50, p99, p99.9); S3 GET latency (p99 — this is the tail that controls large-paste p99); Redis GET latency
Saturation: Redis memory used %; Cassandra pending compactions; S3 throttle errors (503 Slow Down rate); app pod CPU and open connection count

Key alerts (burn-rate rules)

Page: read availability burn rate > 2% of monthly budget in 1 hour (a burn rate of ~14.4×; at 99.99% that is only ~5 seconds of full outage within the hour)
Page: read p99 TTFB > 200 ms for 5 consecutive minutes (SLO miss territory)
Page: Cassandra node down (RF=3 means one more failure away from quorum loss)
Page: S3 error rate > 1% sustained 5 minutes (affects all large-paste reads)
Ticket: Redis memory > 75% on any shard
Ticket: any single paste_id > 500 QPS (hot key candidate — engage adaptive caching playbook)
Ticket: expiry worker queue depth > 24 hours of backlog (worker falling behind)
Ticket: write latency p99 > 1s sustained (S3 PUT or DB write degradation)
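
The budget and burn-rate figures follow from two one-line formulas. A sketch assuming a 30-day month, which is what the ~4.3 and ~43 minute budgets imply; the function names are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full outage the SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(budget_fraction: float, alert_window_hours: float,
              window_days: int = 30) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    return budget_fraction * (window_days * 24) / alert_window_hours


if __name__ == "__main__":
    read_budget = error_budget_minutes(0.9999)
    print(f"read budget:  {read_budget:.2f} min / 30 days")
    print(f"write budget: {error_budget_minutes(0.999):.1f} min / 30 days")
    print(f"2% in 1 h:    burn rate {burn_rate(0.02, 1):.1f}x, "
          f"{0.02 * read_budget * 60:.1f} s of full outage")
```

Over the 28-day rolling window named in the SLI table the budgets are slightly smaller (about 4.0 and 40 minutes).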

Tracing & debugging

Trace ID propagated from the CDN (as a request header, echoed in the response) → LB → read service → Redis → Cassandra → S3. A slow paste read can be unwound to the exact tier that added the latency.
Sampled tracing: 1% baseline; 100% for all 5xx, all requests > 200 ms, and all pastes > 1 MB
Cache hit/miss as span attributes: cache.l1_hit, cache.redis_hit, cache.s3_fetch — the trace shows exactly which tiers fired
S3 request ID in trace: every S3 GET/PUT includes AWS request ID as a span attribute; essential for opening S3 support tickets against a specific failed request

xii. Key Takeaways

The moment the payload outgrows the row, object storage becomes the architecture.

Split metadata from content. Metadata (IDs, expiry, owner, size) belongs in a fast, indexed database. Content bytes belong in object storage. Mixing them — putting 10 MB blobs in Cassandra rows — collapses both tiers simultaneously under load.
The inline threshold is a first-class decision. Storing small pastes (<4 KB) inline in the DB row eliminates an entire network hop for the majority of pastes. Tune this threshold based on observed size distribution, DB row limits, and S3 GET latency. It is a config value, not a hard-coded constant.
Write order matters for consistency. S3 PUT before DB write. A failed S3 PUT leaves no trace — the paste simply doesn't exist. A DB write before a failed S3 PUT leaves a dangling metadata row with no content, which is harder to clean up and confusing to read-path code.
Expiry is a pipeline, not a flag. Read-time enforcement is the correctness guarantee. The async worker is the storage reclamation mechanism. S3 lifecycle rules are the backstop. All three must exist; each compensates for the others' failure modes.
Pre-signed S3 URLs are the bandwidth escape hatch. For large pastes, redirect the client directly to S3. The app tier serves a 307, not bytes. This keeps app pods stateless and thin, offloads bandwidth entirely to S3's own infrastructure, and is cheap to implement (S3 request and egress charges still apply).
Object storage has throughput ceilings per prefix. S3 allows ~3,500 PUT/s and ~5,500 GET/s per prefix. At low QPS this is invisible. At scale, prefix sharding (a short, evenly distributed prefix on the key, such as a hash fragment) multiplies this ceiling by the number of prefixes. Design it in early; retrofitting prefix sharding onto an existing key scheme is painful.
Burn-after-read is eventually consistent by design. A race between two simultaneous readers means one gets the content and one gets a 404. Document this, set user expectations, and don't try to make it atomic with distributed locking — the cure is worse than the disease.
Abuse is load-bearing, not optional. Pastebin is one of the most abused infrastructure primitives on the internet. The content scanning pipeline, rate limiting, and takedown flow must be built before launch, not after the first incident.
The read path is shaped by bytes, not requests. 3,500 QPS sounds modest until you multiply by 10 KB average and realize you need ~280 Mbps sustained at peak without a CDN. Architect everything around the byte rate, not the request rate. CDN is not optional; it is load-bearing infrastructure.
Observability at the content_type boundary. Split all metrics by whether the paste is inline or S3. A latency regression that affects only S3 pastes (large content) looks like a p99 spike but leaves p50 fine — you will miss it without the split. This is the trace attribute that pays for itself on the first incident.

xiii. Go Deeper

How would you implement real-time collaborative editing (Google Docs–style) on top of this architecture? What changes first — the storage model, the protocol, or the consistency model?
Design the content abuse pipeline end-to-end: ingestion → async scan → result routing → takedown → appeal. What's your false positive handling? How do you prevent legitimate pastes from being tombstoned?
Pastebin pastes can embed URLs (phishing). Design a URL scanner that processes pastes without adding latency to the write path. How do you handle pastes with 1,000 embedded links?
How would you implement a "paste diff" feature — showing the differences between two paste versions — at scale? Where does the diff computation happen, and how do you cache it?
The expiry worker processes 10M pastes/day at peak. How would you design it to survive a 3-day worker outage and then catch up without overloading Cassandra or S3 on recovery?
A user claims GDPR right-to-erasure. Their account has 50,000 pastes. Design the deletion pipeline. What's the SLA? What do you do with pastes that have been forked by other users?
How would you implement paste encryption at rest (client-side, not server-side) so that even the platform cannot read paste content? What does this do to abuse detection?

❦