Paste Bin
URL shortener plus content. The moment the payload outgrows the row is the moment your architecture changes entirely.
i. Requirements
Functional
- Create a paste: accepts text or code up to 10 MB, returns a short unique URL
- Retrieve a paste by its short URL: serve the raw content
- Optional expiration: never, 10 min, 1 hour, 1 day, 1 week, 1 month, or custom
- Optional: syntax highlighting by language (client-side is fine)
- Optional: password-protected pastes (read requires correct passphrase)
- Optional: custom alias (vanity slug), burn-after-read mode
- Optional: fork / clone an existing paste
Non-Functional
- Durability — content must not be lost; pastes are the product
- Availability — reads at 99.99%, writes at 99.9%
- Low read latency — p99 < 50 ms (content may be large; time-to-first-byte is the target)
- Scalable storage — total content grows without bound; storage cost must be proportional to actual bytes
- Read-heavy at ~10:1, skewed: popular pastes are fetched thousands of times, most are fetched once
Out of Scope
- Real-time collaboration (Google Docs–style concurrent editing)
- Full user account system, billing, teams
- Server-side syntax highlighting at render time (offload to the client)
- Diff / version history beyond forking
ii. Capacity Estimates
| Parameter | Value | Notes |
|---|---|---|
| New pastes per day | 10 million | 1/10th of TinyURL — content is costlier to produce |
| Write QPS (avg) | ~115 | 10M / 86,400 |
| Write QPS (peak ≈ 3×) | ~350 | |
| Read QPS (10:1, avg) | ~1,150 | |
| Read QPS (peak) | ~3,500 | |
| Average paste size | 10 KB | Mix of tiny snippets and larger files; 95th pct < 100 KB |
| Max paste size | 10 MB | Hard cap; reject at ingress |
| Metadata record size | ~1 KB | IDs, timestamps, TTL, content_key, owner, size |
| Content storage / year | 10M × 365 × 10KB = ~36.5 TB | |
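| Content storage / 10 years | up to ~365 TB | Upper bound: infinite retention at a flat 10M pastes/day; expiration lowers it, traffic growth raises it |
| Metadata storage / 10 years | ~36.5 TB (~110 TB at RF=3) | 10M × 3,650 days × 1 KB; excludes the inline content of small pastes, which adds to it; fine for a horizontally scaled Cassandra cluster |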
| Read bandwidth at peak | 3,500 × 10 KB = 35 MB/s ≈ 0.28 Gbps | Without CDN; with CDN the origin sees 10–20% of this |
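The table's arithmetic fits in a few lines. Writing it out makes the assumptions explicit: decimal units (10 KB = 10,000 bytes), a flat 10M pastes/day, a 3× peak factor, and no expiration. This is a minimal Python sketch, and the variable names are illustrative.

```python
# Back-of-envelope capacity figures from the table above.
# Assumptions: decimal units (1 KB = 1,000 bytes), flat traffic, 3x peak, no expiry.
SECONDS_PER_DAY = 86_400
PASTES_PER_DAY = 10_000_000
PEAK_FACTOR = 3
READ_WRITE_RATIO = 10
AVG_PASTE_BYTES = 10_000        # 10 KB average paste
META_BYTES = 1_000              # ~1 KB metadata record
YEARS = 10

write_qps = PASTES_PER_DAY / SECONDS_PER_DAY            # ~115.7
peak_write_qps = write_qps * PEAK_FACTOR                # ~347
read_qps = write_qps * READ_WRITE_RATIO                 # ~1,157
peak_read_qps = read_qps * PEAK_FACTOR                  # ~3,472

content_tb_per_year = PASTES_PER_DAY * 365 * AVG_PASTE_BYTES / 1e12   # 36.5
content_tb_10y = content_tb_per_year * YEARS                          # 365 with infinite retention
meta_tb_10y = PASTES_PER_DAY * 365 * YEARS * META_BYTES / 1e12        # 36.5 logical
meta_tb_10y_rf3 = meta_tb_10y * 3                                     # 109.5 physical at RF=3

# Origin egress at peak if nothing is cached, using the table's rounded 3,500 QPS.
peak_read_gbps = 3_500 * AVG_PASTE_BYTES * 8 / 1e9                    # 0.28

if __name__ == "__main__":
    print(f"writes: {write_qps:.1f}/s avg, {peak_write_qps:.0f}/s peak")
    print(f"reads:  {read_qps:.0f}/s avg, {peak_read_qps:.0f}/s peak")
    print(f"content: {content_tb_per_year} TB/yr, {content_tb_10y} TB over {YEARS} years")
    print(f"metadata: {meta_tb_10y} TB logical, {meta_tb_10y_rf3} TB at RF=3")
    print(f"peak read egress without CDN: {peak_read_gbps} Gbps")
```

The 10-year content figure is the no-expiry ceiling. Any expiration policy only lowers it, and only growth in daily paste volume can push it higher.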
Per-tier sizing — back of the envelope
The read QPS looks modest (3,500) but the byte rate is not. CDN absorption is the single biggest lever, especially since popular pastes are fetched repeatedly.
| Tier | Load at peak | Sizing | Binding resource |
|---|---|---|---|
| CDN / Edge | ~80% of reads (public popular pastes) | Anycast PoPs; no origin capacity needed for hits | Egress bandwidth |
| Read service (app) | ~700 QPS post-CDN | 5–10 pods; trivially small — the bottleneck is I/O not CPU | Open file handles / network to S3 |
| Redis cluster | ~630 QPS (90% L2 hit on post-CDN traffic) | 2–4 shards; small pastes (<4 KB) inline, larger just cache metadata | Memory (~10 GB/shard) |
| Object storage (S3) | ~70 QPS (10% cache miss) | No sizing needed — S3 auto-scales; cost is the real metric | GET latency (~5–30 ms) |
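| Metadata DB (Cassandra) | ~70 QPS reads, ~350 QPS writes | 3 nodes per region, RF=3, to start; node count then grows with stored data, not QPS | Write IOPS early; disk capacity over the 10-year horizon |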
| Write service (app) | ~350 QPS peak | 3–5 pods; bottleneck is S3 PUT + DB write, not app CPU | S3 PUT latency |
| Expiry worker | Background; bursty on TTL boundaries | 1–2 pods per region; reads expiry queue, deletes S3 + DB | S3 DELETE throughput |
iii. High-Level Design
```
Client
  │
  ▼
[Geo-DNS / Anycast]
  │
  ▼
[CDN Edge PoP] ──hit (public paste)──▶ response (content from edge cache)
  │ miss
  ▼
[Regional Load Balancer]
  │                                 │
  ▼                                 ▼
[Read Service]                    [Write Service]
  │                                 │
  ├─▶ [Redis L2]                    ├─▶ [ID Generator (base62)]
  │     hit → return                ├─▶ [Object Storage: S3/GCS] ──▶ content bytes
  │     miss ↓                      └─▶ [Metadata DB: Cassandra] ──▶ row (id, ttl, key, …)
  ├─▶ [Metadata DB]
  │     get content_key
  │
  └─▶ [Object Storage: S3/GCS] ──▶ stream content bytes to client

[Expiry Worker] ──reads TTL queue──▶ deletes S3 object + DB row
```
The core split: metadata (paste ID, expiry, owner, content_key, size, language) lives in Cassandra; content bytes live in object storage (S3). The read service stitches them together. This split is the central design decision — everything downstream follows from it.
For small pastes (<4 KB), content is stored inline in the Cassandra row and the S3 hop is skipped entirely. The 4 KB threshold keeps rows small, far below the roughly 1 MB practical ceiling DataStax documents for a blob value. Inline content therefore adds little compaction or read I/O, while the object storage round-trip disappears for the majority of pastes (code snippets, config fragments, short logs).
iv. Key Design Decisions
ID scheme
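Same base62 approach as the URL shortener: a 7-character slug gives 62⁷ ≈ 3.5 trillion unique IDs. That is enough for any realistic horizon, since 10 years at 10M/day uses about 36.5 billion. ID generation uses a distributed counter with ZooKeeper-coordinated ranges. Each write pod claims a range of 1 million IDs, burns through them locally (no coordination per write), then claims another. Ranges are disjoint, so no two pods can mint the same ID.
For custom aliases, the write path runs a Cassandra LWT (lightweight transaction), INSERT ... IF NOT EXISTS, on the alias. Races are rare, and the LWT handles them correctly without distributed locks. Aliases share the paste_id keyspace with generated IDs, so there are two options. Either aliases use a shape the counter can never produce (for example, any length other than 7), or generated IDs are also inserted with IF NOT EXISTS.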
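The range-claim scheme fits in a short sketch. It also includes the counter shuffle that the abuse section recommends against enumeration. Everything here is hypothetical:

- An in-memory RangeAllocator stands in for the ZooKeeper-coordinated range service.
- The alphabet order is arbitrary.
- The shuffle multiplies by a constant coprime with 62⁷. That permutes the ID space but is not cryptographic; a keyed Feistel network would be the stronger choice.

```python
import math
import threading

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
SLUG_LEN = 7
SPACE = 62 ** SLUG_LEN                 # 3,521,614,606,208 possible slugs

# Hypothetical multiplier. Any value coprime with 62**7 (odd and not a
# multiple of 31) makes counter -> counter * MULT mod SPACE a bijection.
MULT = 1_580_030_173
assert math.gcd(MULT, SPACE) == 1


def encode_base62(n: int) -> str:
    """Fixed-width base62 encoding of n, which must lie in [0, SPACE)."""
    chars = []
    for _ in range(SLUG_LEN):
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def decode_base62(slug: str) -> int:
    n = 0
    for c in slug:
        n = n * 62 + ALPHABET.index(c)
    return n


class RangeAllocator:
    """In-memory stand-in for the ZooKeeper range service: hands out
    disjoint [start, start + size) blocks of the counter space."""

    def __init__(self, size: int = 1_000_000):
        self.size = size
        self._next = 0
        self._lock = threading.Lock()

    def claim(self) -> range:
        with self._lock:
            start = self._next
            if start + self.size > SPACE:
                raise RuntimeError("ID space exhausted")
            self._next += self.size
            return range(start, start + self.size)


class IdMinter:
    """Per-pod minter (one instance per pod): consumes its claimed range locally."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._block = iter(())

    def next_slug(self) -> str:
        counter = next(self._block, None)
        if counter is None:                      # range used up: claim a new one
            self._block = iter(self._allocator.claim())
            counter = next(self._block)
        # Shuffle before encoding so consecutive pastes get unrelated slugs.
        return encode_base62(counter * MULT % SPACE)
```

The multiplier is invertible modulo 62⁷, so each counter maps to exactly one slug. Disjoint ranges therefore still guarantee unique IDs, while consecutive pastes no longer receive adjacent slugs.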
The inline vs. object storage split
| Paste size | Content storage | Read path | Rationale |
|---|---|---|---|
| < 4 KB | Cassandra row (content_inline blob column) | Single DB read, no S3 hop | Eliminates extra round-trip; ~50–60% of all pastes by count |
| 4 KB – 10 MB | S3 object; Cassandra row stores content_key | DB read for metadata + S3 GET for bytes | Keeps rows small; S3 is cheaper and more durable per byte than DB storage |
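The inline-versus-S3 decision and the S3 key layout can be expressed as one small function. This Python sketch rests on two assumptions. First, the 4 KB threshold is a config value. Second, the content_key prefix is two hex characters of a SHA-256 of the paste_id, giving 256 prefixes. Hashing is a choice made here so that prefixes stay evenly spread even when IDs come from a counter. The names Placement and place are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

INLINE_THRESHOLD = 4 * 1024          # bytes; tune from the observed size distribution
MAX_PASTE_BYTES = 10 * 1024 * 1024   # hard cap enforced at ingress


@dataclass
class Placement:
    content_type: str                # "inline" or "s3"
    content_inline: Optional[bytes]  # set only for inline pastes
    content_key: Optional[str]       # set only for S3 pastes


def content_key_for(paste_id: str) -> str:
    """S3 key of the form '<2 hex chars>/<paste_id>' (256 possible prefixes)."""
    prefix = hashlib.sha256(paste_id.encode("utf-8")).hexdigest()[:2]
    return f"{prefix}/{paste_id}"


def place(paste_id: str, body: bytes) -> Placement:
    """Decide where a paste's bytes live."""
    if len(body) > MAX_PASTE_BYTES:
        raise ValueError("paste exceeds the 10 MB cap")
    if len(body) < INLINE_THRESHOLD:
        return Placement("inline", body, None)
    return Placement("s3", None, content_key_for(paste_id))
```

With 256 prefixes, the per-prefix S3 limits add up to roughly 1.4M GET/s and 900K PUT/s in aggregate, far above this system's needs.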
Database schema (Cassandra)
Cassandra is chosen over Postgres for the same reasons as the URL shortener. The access pattern is a pure point lookup by paste_id. Writes are heavy (10M/day × RF=3 = 30M physical writes, before index tables). The 10-year storage horizon demands horizontal scaling. A Postgres primary with replicas would handle the QPS easily, but the row count forces the move. At 10M pastes/day, the ~500M-row migration point arrives in about 50 days (before expiry deletions), so Postgres is at best a launch-phase choice.
```sql
-- Primary table: all reads go here
CREATE TABLE pastes (
  paste_id       text PRIMARY KEY,  -- base62 slug, e.g. "aB3kZ9m"
  created_at     timestamp,
  expires_at     timestamp,         -- null = never expires
  owner_id       text,              -- null for anonymous
  language       text,              -- "python", "sql", null
  title          text,
  size_bytes     int,
  content_type   text,              -- "inline" | "s3"
  content_key    text,              -- S3 key if content_type = "s3"
  content_inline blob,              -- raw bytes if content_type = "inline"
  password_hash  text,              -- bcrypt hash; null = no passphrase
  burn_on_read   boolean
) WITH default_time_to_live = 0     -- TTL managed at app layer, not Cassandra native
  AND compaction = { 'class': 'LeveledCompactionStrategy' };

-- View counts: a counter column cannot share a table with regular
-- (non-counter) columns, so it lives in its own table
CREATE TABLE paste_view_counts (
  paste_id   text PRIMARY KEY,
  view_count counter                -- approximate; see note
);

-- Expiry index: drives the cleanup worker
CREATE TABLE pastes_by_expiry (
  expiry_bucket text,               -- "2026-05-28T14" (hour granularity)
  expires_at    timestamp,
  paste_id      text,
  PRIMARY KEY (expiry_bucket, expires_at, paste_id)
) WITH CLUSTERING ORDER BY (expires_at ASC, paste_id ASC);

-- Owner index: "my pastes" lookup (secondary access pattern)
CREATE TABLE pastes_by_owner (
  owner_id   text,
  created_at timestamp,
  paste_id   text,
  title      text,
  PRIMARY KEY (owner_id, created_at, paste_id)
) WITH CLUSTERING ORDER BY (created_at DESC, paste_id ASC);
```
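Notes on schema decisions: Cassandra native TTL is not used for the main table, for two reasons.

- Native TTL is set per cell at write time. Extending a paste's expiry therefore means rewriting every cell (including an inline blob) with the new TTL.
- A native expiry silently drops the row without deleting the S3 object, purging the CDN, or cleaning up the index tables.

App-layer expiry (checked at read time, plus async worker deletion) is more flexible. view_count is a Cassandra counter, and counter increments are not idempotent: a timed-out increment that the client retries can be applied twice. Treat the count as approximate. It is accurate enough for analytics and "popular" ranking.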
Caching strategy
| Layer | What is cached | TTL | Target hit rate | Latency |
|---|---|---|---|---|
| Browser cache | Public paste content (Cache-Control: max-age) | Matches paste expiry or 1h for permanent | High for repeated local views | 0 ms |
| CDN edge | Public paste responses (full HTTP response) | Matches paste expiry; private/password pastes excluded via Cache-Control: private | ~80% of public reads | ~5–15 ms |
| In-process L1 (app pod) | Metadata for recently accessed pastes; small (<4 KB) pastes in full | 30s with jitter ±5s | ~10% of post-CDN traffic | <1 ms |
| Redis L2 | Metadata + inline content; large pastes: metadata only + pre-signed S3 URL | TTL = min(paste expiry - now, 24h), jittered downward by up to 10% so an entry never outlives its paste | ~85% of post-CDN traffic | ~1–3 ms |
| Object storage (S3) | Source of truth for large paste content | Permanent (managed by lifecycle rules) | ~5% of post-CDN traffic | ~10–50 ms |
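The header and TTL rules in the table can be sketched as two pure Python functions. The sketch makes three assumptions. The flag names (is_public, has_password, burn_on_read) are hypothetical. Burn-after-read pastes are treated like private ones. Jitter is applied only downward, so a Redis entry never outlives its paste.

```python
import random
from typing import Optional

DAY = 86_400  # seconds


def cache_control(is_public: bool, has_password: bool, burn_on_read: bool,
                  seconds_to_expiry: Optional[int]) -> str:
    """Cache-Control for a paste response; seconds_to_expiry None = never expires."""
    if not is_public or has_password or burn_on_read:
        return "private, no-store"       # keep it out of browser, CDN, and shared caches
    if seconds_to_expiry is None:
        # Permanent public paste: 1h freshness plus a short stale-while-revalidate window.
        return "public, max-age=3600, stale-while-revalidate=60"
    # Expiring paste: no stale serving, so no cache can hold it past its expiry.
    return f"public, max-age={max(0, seconds_to_expiry)}"


def redis_ttl(seconds_to_expiry: Optional[int], rng: random.Random) -> int:
    """min(expiry - now, 24h), reduced by up to 10% so keys don't expire in lockstep."""
    base = DAY if seconds_to_expiry is None else min(seconds_to_expiry, DAY)
    return max(1, int(base * (1 - 0.10 * rng.random())))
```

Expiring pastes deliberately omit stale-while-revalidate. A stale copy served after expiry would break the read-time expiry guarantee.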
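Negative caching: A missing paste_id (404) is cached in Redis for 60s with a sentinel value. This absorbs repeated fetches of the same dead link, such as an expired or deleted paste whose URL is still circulating. Without the sentinel, every one of those fetches would reach Cassandra. Negative caching does little against scans of random IDs, because each probe is a fresh key. The 404 rate limit in the abuse section bounds those scans, and Cassandra's per-SSTable bloom filters keep each miss cheap. Two guards prevent false 404s. The write path deletes any sentinel for the new ID, which matters for custom aliases. A miss read at LOCAL_ONE is confirmed at LOCAL_QUORUM before the sentinel is written, so a just-created paste is not reported missing for 60s.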
Stampede protection: The read service uses Go's singleflight in-process for concurrent identical requests (a popular paste shared via social media sees a burst of simultaneous first-loads). At Redis level, only one request holds a distributed mutex to rebuild the cache entry; others wait with a 200ms timeout, then fall through to S3 directly if the mutex holder times out.
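Go's golang.org/x/sync/singleflight coalesces concurrent calls that share a key. The Python class below is an analog of that idea, not a port of its API. The first caller for a key runs the loader. Every concurrent caller for the same key waits for that result and shares it. Class and method names are hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class _Call:
    """One in-flight load, shared by every caller of the same key."""

    def __init__(self):
        self.done = threading.Event()
        self.value: Any = None
        self.error: BaseException = None


class SingleFlight:
    """Coalesces concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:                      # first caller: register the flight
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.value = fn()
            except BaseException as exc:    # followers re-raise the same error
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]    # later callers start a fresh load
                call.done.set()
        else:
            call.done.wait()                # follower: wait for the leader's result
        if call.error is not None:
            raise call.error
        return call.value
```

The class caches nothing once the flight lands. It only collapses a burst of simultaneous first-loads into one Redis, Cassandra, or S3 fetch.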
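Pre-signed S3 URLs: For large pastes, Redis caches a pre-signed S3 GET URL (valid 15 minutes) rather than the bytes themselves. The Redis TTL for that entry is kept shorter than the URL's validity (10 minutes, say), so no client is handed a URL that is about to expire. The client is redirected to S3 directly, which removes the content bandwidth from the app tier entirely. This is the right trade-off once a paste is large enough that holding its bytes in Redis memory costs more than an S3 GET.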
v. Deep Dives
Expiration pipeline
Expiry is the feature that quietly makes the system hard. A paste with expires_at = T must stop being readable at T, and its bytes must eventually be deleted from S3 to reclaim storage. There are three components:
- Read-time enforcement: Every read checks expires_at against now() in the metadata returned from Cassandra or Redis. Expired pastes return 404 immediately. This is the correctness layer: it fires even if the async cleanup worker is lagging.
- Expiry worker (async cleanup): A background service reads pastes_by_expiry in hour-bucket order. For each expired paste it (a) deletes the S3 object if content_type = "s3", (b) deletes the paste's rows from every Cassandra table, (c) invalidates the CDN cache via the purge API, and (d) deletes the Redis entry. Order matters: S3 first, then DB. Suppose the worker crashes between the S3 delete and the DB delete. A subsequent read then gets a 404 either from the read-time check (expired) or from S3 (object gone), and the surviving pastes_by_expiry entry makes the next pass retry. The reverse order (DB then S3) would remove the index entry that points at the object and leave orphaned bytes on S3 indefinitely.
- S3 lifecycle rules as backstop: Lifecycle rules cannot read a per-object expiry date. They expire objects a fixed number of days after creation, optionally filtered by tag. The write service therefore tags each object with its expiry class (for example 1d, 7d, 30d), and each class has one lifecycle rule that deletes its objects a day or so after that class's lifetime. Custom expiries round up to the next class. Lifecycle expiration works at day granularity, which is fine for a belt-and-suspenders measure: it catches anything the worker missed, for example during a multi-day outage.
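The ordering rules show up clearly in one pass of the cleanup worker, run against in-memory stand-ins. The pass does four things in sequence:

- It walks hour buckets in order.
- It cross-checks expires_at in the main table, so an extended paste survives.
- It deletes S3 objects in batches of 1,000 keys, the DeleteObjects limit.
- Only then does it handle the DB, CDN, and Redis steps.

The Python sketch records those side effects in a list rather than performing them. The data layout and function names are hypothetical.

```python
from datetime import datetime, timezone


def expiry_bucket(ts: datetime) -> str:
    """Hour-granularity partition key for pastes_by_expiry, e.g. '2026-05-28T14' (UTC)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


def run_expiry_pass(index, main, s3, events, now, batch_size=1000):
    """One cleanup pass over in-memory stand-ins (all structures hypothetical).

    index:  {bucket: [(expires_at, paste_id), ...]}   ~ pastes_by_expiry
    main:   {paste_id: row}                            ~ pastes
    s3:     set of object keys                         ~ the S3 bucket
    events: list that records side effects in the order they happen
    """
    doomed, stale = [], []
    for bucket in sorted(index):
        if bucket > expiry_bucket(now):
            break                                    # later hours: nothing due yet
        for expires_at, paste_id in sorted(index[bucket]):
            if expires_at > now:
                break                                # rest of this hour not due yet
            row = main.get(paste_id)
            # Cross-check the main table: the paste may already be gone, or its
            # expiry may have been extended after this index entry was written.
            if row is None or row["expires_at"] is None or row["expires_at"] > now:
                stale.append((bucket, expires_at, paste_id))
            else:
                doomed.append((bucket, expires_at, paste_id, row))

    for bucket, expires_at, paste_id in stale:       # drop outdated index entries only
        index[bucket].remove((expires_at, paste_id))

    # 1) S3 first, batched: DeleteObjects accepts up to 1,000 keys per request.
    keys = [row["content_key"] for (_, _, _, row) in doomed if row["content_type"] == "s3"]
    for i in range(0, len(keys), batch_size):
        chunk = keys[i:i + batch_size]
        s3.difference_update(chunk)
        events.append(("s3_delete_batch", len(chunk)))

    # 2) Then the DB rows (index entry last), 3) CDN purge, 4) Redis delete.
    for bucket, expires_at, paste_id, _ in doomed:
        main.pop(paste_id, None)
        index[bucket].remove((expires_at, paste_id))
        events.extend([("db_delete", paste_id), ("cdn_purge", paste_id), ("redis_del", paste_id)])
    return len(doomed)
```

Every step is idempotent. A pass that crashes midway can simply be rerun, and the index entries that survived drive the retry.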
Burn-after-read
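A paste with burn_on_read = true is deleted immediately after the first successful read. Reading and deleting must be atomic with respect to other readers, and a plain DELETE is not enough. Two readers can both fetch the row before either delete lands, and both would then serve the content. The read service therefore:

1. fetches metadata from the DB;
2. checks the burn flag;
3. if the flag is set, issues a conditional delete (a Cassandra LWT, DELETE FROM pastes WHERE paste_id = ? IF EXISTS) and serves the content only if that delete was applied;
4. purges Redis. The S3 object, if any, is deleted asynchronously.

Burn-after-read responses carry Cache-Control: private, no-store, so the CDN never stores them. The purge is defense in depth, not the mechanism. For the same reason, large burn-after-read pastes are streamed through the read service rather than redirected to a pre-signed S3 URL, which anyone holding it could replay until it expires. If two readers race, exactly one wins and the other gets a 404. That is acceptable behavior and is documented in the product.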
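The burn-after-read race reduces to one rule: serve the content only if your conditional delete was applied. In Cassandra that is an LWT such as DELETE ... IF EXISTS, which reports whether it applied. In the Python sketch below, a lock stands in for Paxos on a single partition. The class and function names are hypothetical.

```python
import threading
from typing import Optional


class LwtTableStandIn:
    """In-memory stand-in for a Cassandra table with conditional deletes.
    The lock plays the role of Paxos: check-and-delete is atomic per call."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, paste_id: str, row: dict) -> None:
        with self._lock:
            self._rows[paste_id] = row

    def get(self, paste_id: str) -> Optional[dict]:
        with self._lock:
            return self._rows.get(paste_id)

    def delete_if_exists(self, paste_id: str) -> bool:
        """Like DELETE ... IF EXISTS: True only if this call removed the row."""
        with self._lock:
            return self._rows.pop(paste_id, None) is not None


def read_paste(table: LwtTableStandIn, paste_id: str) -> Optional[bytes]:
    """Returns the content, or None for a 404."""
    row = table.get(paste_id)
    if row is None:
        return None
    if row["burn_on_read"] and not table.delete_if_exists(paste_id):
        return None          # another reader's delete applied first: this one lost
    return row["content"]
```

Any number of readers may fetch the row concurrently, but only one conditional delete can succeed. Exactly one reader sees the content.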
Password-protected pastes
The password hash (bcrypt, cost factor 12) is stored in the Cassandra row. On read, the client submits the passphrase in a POST body (never in the URL — URLs end up in logs). The read service bcrypt-compares and returns 401 on mismatch. Password-protected pastes are excluded from CDN and Redis caching (served from origin only). Pre-signed S3 URLs are not used for password-protected pastes — the bytes must not be directly reachable without auth.
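The protected-read flow has four properties: the passphrase arrives in a POST body, the comparison is constant-time, a mismatch returns 401, and the response is never cacheable. A Python sketch follows. Python's standard-library PBKDF2 stands in for bcrypt, which the design specifies, only so the sketch runs without third-party packages. The hash-string format, the iteration count, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 200_000   # illustrative work factor for the PBKDF2 stand-in
NO_STORE = {"Cache-Control": "private, no-store"}


def hash_passphrase(passphrase: str, salt: Optional[bytes] = None) -> str:
    """Salted hash string: scheme$iterations$salt_hex$digest_hex."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_passphrase(passphrase: str, stored: str) -> bool:
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                                    bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(candidate.hex(), digest_hex)   # constant-time compare


def read_protected(row: dict, form: dict):
    """Handle a passphrase submitted in a POST form body (never in the URL).
    Returns (status, body, headers); both outcomes are marked uncacheable."""
    supplied = form.get("passphrase")
    if supplied is None or not verify_passphrase(supplied, row["password_hash"]):
        return 401, b"", dict(NO_STORE)
    return 200, row["content"], dict(NO_STORE)
```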
Syntax highlighting
Done entirely client-side using a library (Prism.js or highlight.js loaded from CDN). The server stores language as a metadata field; the client uses it to trigger the right grammar. This keeps the read path simple and stateless — no server-side rendering, no compute for highlighting. The trade-off is that curl-ing a paste returns raw text, which is almost always what developers want anyway.
Forking a paste
Fork = create a new paste with the same content but a new ID and new owner. The write service copies content from the original: for inline pastes, reads the blob from the parent row and writes it inline in the new row; for S3 pastes, issues an S3 server-side copy (no data moves across the network — S3 handles it internally, billing at copy cost not data transfer). The fork then proceeds as a normal write. This is cheap and correct.
vi. Bottlenecks by Tier
| Tier | Practical ceiling | Binding resource | Lift |
|---|---|---|---|
| CDN | Vendor-defined; effectively unlimited for reasonable traffic | Egress cost / contract limit | Increase edge PoP coverage; negotiate egress pricing; use Anycast correctly so traffic is routed to closest PoP |
| Read service (app pods) | ~2,000 concurrent S3 requests / pod (the HTTP connection pool, not goroutines, runs out first) | S3 connection pool exhaustion; open file descriptors | Increase pod count; tune S3 connection pool size and keep-alive reuse (e.g., MaxIdleConnsPerHost on Go's http.Transport) |
| Redis | ~100K ops/sec/node; ~26 GB memory/node before eviction pressure | Memory (metadata + inline content fills fast) | Shard more aggressively; cache only metadata for large pastes (not the bytes); set maxmemory-policy to volatile-lru, which evicts the least recently used keys among those with a TTL (every cache entry here has one), or to volatile-ttl to evict soonest-expiring entries first |
| Cassandra (reads) | ~20K reads/sec per node with LOCAL_ONE consistency; drops to ~8K at LOCAL_QUORUM | CPU for deserialization; SSTable read IOPS | Add nodes (Cassandra has no separate read-replica role; every replica serves reads); serve reads at LOCAL_ONE (we accept stale by seconds, not minutes); tune bloom filter FP rate to reduce disk seeks |
| Cassandra (writes) | ~20K writes/sec per node (Cassandra is write-optimized via LSM) | Compaction I/O stealing from reads at high write QPS | Use LeveledCompactionStrategy (LCS), which trades more compaction I/O (write amplification) for fewer SSTables per read and more predictable read latency; add nodes horizontally |
| S3 (GETs) | ~5,500 GET requests/sec/prefix by default; higher with prefix sharding | Request rate per prefix (S3 internal partitioning) | Distribute content_keys across many prefixes using a short hash of paste_id (e.g., 3f/aB3kZ9m), not its leading characters: counter-based IDs change slowly at the front, so recent pastes would share one prefix. S3 also repartitions hot prefixes on its own, but gradually, and may return 503 Slow Down while it does |
| S3 (PUTs) | ~3,500 PUT requests/sec/prefix | Same prefix partitioning | Same prefix sharding strategy; use multipart upload for pastes > 5 MB |
| Expiry worker | ~1,000 deletes/sec sustained | S3 DELETE throughput + Cassandra write IOPS | Parallelize worker threads; use the S3 Multi-Object Delete API (DeleteObjects, up to 1,000 keys per request) for bulk deletes |
vii. Hot Keys / Skew / Pathological Data
The skew profile here is more extreme than in the URL shortener. A paste shared in a viral tweet or a Hacker News "Show HN" comment can go from 0 to 50,000 reads in 60 seconds. The paste ID becomes a hot key across every tier simultaneously: CDN, Redis, S3, and Cassandra all see it at once.
Mitigations
- CDN as the first wall: Public pastes are aggressively cached at the CDN edge. Once a viral paste is in cache, the edge serves it without touching origin. Set Cache-Control: public, max-age=3600, stale-while-revalidate=60 on public permanent pastes. The stale-while-revalidate window lets the CDN keep serving the cached copy while it refreshes in the background. On TTL expiry, origin therefore sees roughly one revalidation per PoP instead of a thundering herd.
- In-process LRU as the second wall: Each app pod maintains an LRU of the last 1,000 paste responses in memory (~10 MB at 10 KB average). A pod that has already seen this paste serves subsequent requests without touching Redis or S3.
- Pre-signed S3 URLs redirect clients directly to S3: For large pastes, the app returns a 307 redirect to a pre-signed S3 URL. Bandwidth shifts from app pods and Redis to S3, which spreads the load across its own request-serving fleet. This is the key architectural advantage of object storage for large content.
- Hot-paste detection and adaptive caching: A background counter fed by a 1% Bernoulli sample of requests (each request counted with probability 0.01) identifies paste IDs with >1,000 requests/minute, which is about 10 sampled hits per minute. Flagged pastes get three treatments:
  - (a) extended CDN TTL via purge-and-rewrite;
  - (b) promotion to in-process L1 on all pods via a lightweight pub-sub event;
  - (c) S3 pre-signed URL caching in Redis for the maximum duration.

  Detection latency is ~60 seconds, short enough to catch viral spikes before they saturate the origin.
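The second wall and the hot-key detector are both small data structures. In this Python sketch, the LRU is a plain OrderedDict. The detector counts a 1% Bernoulli sample per fixed one-minute window, so the 1,000 requests/minute threshold becomes 10 sampled hits. A production detector might use a count-min sketch instead. Window handling and all names here are hypothetical.

```python
import random
from collections import OrderedDict, defaultdict


class LRU:
    """Bounded in-process cache: the per-pod 'second wall'."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)            # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)     # evict the least recently used


class HotKeyDetector:
    """Flags paste IDs whose sampled request count crosses a per-minute threshold."""

    def __init__(self, sample_rate=0.01, threshold_per_min=1000, rng=None):
        self.sample_rate = sample_rate
        self.sampled_threshold = threshold_per_min * sample_rate   # 10 sampled hits
        self.rng = rng or random.Random()
        self.window = None
        self.counts = defaultdict(int)
        self.flagged = set()

    def observe(self, paste_id: str, ts: float) -> bool:
        """Returns True once per window, when paste_id first turns hot."""
        window = int(ts // 60)
        if window != self.window:                  # new minute: start fresh counts
            self.window, self.counts, self.flagged = window, defaultdict(int), set()
        if self.rng.random() >= self.sample_rate:  # request not sampled
            return False
        self.counts[paste_id] += 1
        if self.counts[paste_id] >= self.sampled_threshold and paste_id not in self.flagged:
            self.flagged.add(paste_id)
            return True
        return False
```

Sampling keeps the detector's per-request cost to one random draw. The price is noise near the threshold, which is harmless here because flagging only extends caching.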
viii. Multi-Region Architecture
Routing layer
Anycast geo-DNS routes readers to the nearest region. Pastebin reads are entirely region-local: the CDN serves from edge; cache misses hit the regional read service and regional Cassandra replica. No cross-region hop on the read path.
Writes are slightly different. Anonymous pastes are created in the nearest region. Authenticated user pastes are created in the user's "home region" (determined at account creation) to simplify the pastes_by_owner lookup. If a user in Singapore creates an account, their pastes are written to the AP region's Cassandra. Metadata then replicates asynchronously to the other regions, and content follows via S3 CRR, so a US reader normally hits US replicas. Only a cache miss that arrives before replication completes needs a cross-region fallback. That is acceptable for a write-once-read-many workload where the vast majority of reads are already CDN hits.
What is region-local vs. globally coordinated
| Concern | Region-local | Globally coordinated |
|---|---|---|
| Paste content reads | Yes — CDN edge or regional read service | No |
| Paste content bytes (S3) | Cross-region replication via S3 CRR; reads serve from nearest region bucket | Write lands in origin region; replication async |
| Metadata (Cassandra) | Regional RF=3 cluster with async cross-region replication | Only custom alias creation (LWT in the alias arbiter region, per the next row) |
| Custom alias uniqueness | No — must be globally unique | Yes — LWT in a designated "alias arbiter" region; rare operation, latency acceptable |
| View count | Incremented regionally | Aggregated asynchronously into a global counter; eventual consistency is fine for view counts |
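| Burn-after-read | Attempted in originating region | Conditional delete (LWT) at SERIAL rather than LOCAL_SERIAL, so readers in two regions cannot both win; cross-region invalidation via CDN purge (global by default) |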
RPO / RTO matrix
| Failure scope | Impact | RPO | RTO |
|---|---|---|---|
| Single AZ down (within region) | Reduced capacity; Cassandra RF=3 survives 1 AZ loss | 0 (no data loss) | < 30s (LB removes unhealthy pods) |
| Full region down | Traffic re-routed to nearest healthy region via geo-DNS TTL | Seconds for metadata (Cassandra cross-region lag); up to ~15 min for content not yet copied by S3 CRR | ~2–5 min (geo-DNS TTL) |
| S3 region outage | Cache miss reads fail; S3 CRR allows failover to replica bucket | Up to ~15 min of recent objects (most replicate sooner; S3 Replication Time Control targets 15 min) | < 10 min (update endpoint to replica bucket) |
| Metadata DB corruption (logical) | Paste metadata unreadable | Point-in-time Cassandra snapshots every 6h; up to 6h of data loss unless archived commit logs are replayed on top of the snapshot | Hours; Cassandra restore is slow, so replay archived commit logs up to just before the corruption (restore_point_in_time) to narrow the gap |
| Expiry worker outage | Expired pastes remain readable until read-time check catches them; no data integrity loss | 0 (read-time enforcement still works) | Worker restarts automatically; backlog processed within hours |
ix. Failure Modes & Mitigations
Infrastructure failures
| Failure | Blast radius / symptom | Mitigation |
|---|---|---|
| App pod crash | In-flight requests fail; LB health-check removes pod within 5s | Minimum 3 pods per region; LB health-check at 2s interval; circuit breaker in LB |
| Redis shard down | Cache misses for that shard's key space; latency spikes as requests fall through to S3/DB | Read service degrades gracefully to DB+S3; Redis cluster auto-promotes replica in < 30s; size DB tier to absorb full load |
| Redis cluster full (OOM) | Evictions begin; hit rate drops; DB and S3 absorb the load increase | volatile-lru eviction policy (LRU among keys with a TTL, which is every cache entry here); alert at 75% memory used; add shard before hitting 90% |
| Cassandra node down | RF=3 with LOCAL_QUORUM writes: 1 node loss is transparent; 2 node loss: writes fail (quorum unavailable) | 3-node minimum per region; RF=3; alert on node down immediately; replace within 4h (SLA) |
| S3 throttling (503 Slow Down) | Large paste reads and writes fail; app retries with exponential backoff + jitter | Prefix sharding to spread request rate; retry with jitter (1s base, 2× up to 30s, 3 attempts); circuit breaker after 5 consecutive failures |
| S3 region outage | Cache misses can't resolve for large pastes | S3 Cross-Region Replication (CRR); update read service endpoint to replica bucket via config flag; test failover quarterly |
| CDN PoP outage | Traffic falls through to origin; origin must handle full CDN-bypass load | Multi-CDN via geo-DNS health checks (Fastly + CloudFront failover); size origin for CDN-down case |
| Region-wide outage | All traffic for that region must reroute | Geo-DNS with low TTL (60s); cross-region Cassandra replication; S3 CRR; runbook for failover activation |
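The S3 retry policy from the table can be sketched in Python: 1 s base, doubling, 30 s cap, 3 attempts, and a breaker after 5 consecutive failures. The sketch assumes the following:

- "3 attempts" means three tries in total.
- Jitter is AWS-style "full jitter", a uniform delay between zero and the backoff ceiling.
- SlowDown stands in for the 503 Slow Down error.
- The breaker's 10 s cooldown is a hypothetical value.

Clock and sleep are injected so the logic can be tested without waiting.

```python
import random


class SlowDown(Exception):
    """Stand-in for S3's 503 Slow Down response."""


def backoff_delays(retries: int, base=1.0, factor=2.0, cap=30.0, rng=None):
    """'Full jitter': each delay is uniform in [0, min(cap, base * factor**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * factor ** i)) for i in range(retries)]


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a probe after `cooldown` s."""

    def __init__(self, threshold=5, cooldown=10.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self, now: float) -> bool:
        return self.opened_at is None or now - self.opened_at >= self.cooldown

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures, self.opened_at = 0, None     # success closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now                     # (re)open on each failure past threshold


def call_with_retry(op, breaker, clock, sleep, attempts=3, rng=None):
    """Run op() up to `attempts` times, backing off between tries on SlowDown."""
    delays = backoff_delays(attempts - 1, rng=rng)
    for i in range(attempts):
        if not breaker.allow(clock()):
            raise RuntimeError("circuit open: failing fast")
        try:
            result = op()
        except SlowDown:
            breaker.record(False, clock())
            if i == attempts - 1:
                raise
            sleep(delays[i])
        else:
            breaker.record(True, clock())
            return result
```

With only three attempts, the 30 s cap never binds: the ceilings are 1 s and 2 s. The cap matters only if the attempt count is raised.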
Data-path pathologies
| Failure | Blast radius / symptom | Mitigation |
|---|---|---|
| Cache stampede (viral paste) | CDN TTL expires on popular paste; thousands of simultaneous cache misses hit origin | stale-while-revalidate at CDN; singleflight in-process; distributed mutex at Redis; adaptive CDN TTL extension for hot keys |
| Hot key on Redis shard | Single shard CPU-bound; latency for all keys on that shard degrades | In-process L1 LRU absorbs top-N keys; pre-signed S3 URL redirect keeps large-paste bytes out of Redis; key replication (copies under paste_id#1..#N on different shards) is the fallback, rarely needed because L1 and the CDN absorb the hot key before Redis sees it |
| Hot partition on Cassandra | Node serving that paste_id's partition sees disproportionate read load | 99%+ of reads hit CDN or Redis; Cassandra should almost never see hot-paste traffic directly |
| Replication lag (Cassandra cross-region) | Reader in EU reads older content than writer in US; typical lag < 500ms | Acceptable for pastebin; document in SLO as "eventual consistency across regions, <1s typical"; if correctness is critical, write to the reader's region via home-region routing |
| Poison paste (write succeeds, S3 object corrupted) | Readers would get garbled bytes; with checksums, corruption is detected instead of served silently | Store a SHA-256 checksum with the object (S3 additional checksums) and in the metadata row; read service validates the bytes against it before serving; serve 500 and alert on mismatch |
| Partial write (DB row written, S3 PUT failed) | Paste exists in metadata but content is absent; read returns 500 or empty body | Write service: S3 PUT first, then DB write. If S3 PUT fails, return error — no DB row created. If DB write fails after S3 PUT, a background reconciler finds DB-absent S3 objects and either retries the DB write or deletes the orphaned S3 object. |
| Expiry worker falling behind | Expired pastes remain in S3 / DB longer than expected; read-time check still enforces correctness | Monitor expiry queue depth; auto-scale worker pods on queue depth; S3 lifecycle rules as backstop |
| Clock skew between nodes | expires_at enforcement inconsistent across nodes; a paste may be readable on one pod and expired on another | Use NTP with <100ms tolerance; treat expires_at as a soft boundary with 5s grace on the read side; burn-after-read uses Cassandra LWT (Paxos) not wall clock |
| Cassandra compaction storm | Sustained high write QPS triggers compaction; read latency spikes as compaction steals I/O | LeveledCompactionStrategy minimizes read amplification; throttle compaction throughput via compaction_throughput_mb_per_sec; add nodes to spread compaction load |
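Two rows of the table rely on simple rules. The first is "S3 PUT before the metadata write", with a reconciler for objects whose DB write failed. The second is a small grace period on the read-side expiry check. In this Python sketch, dicts stand in for S3 and Cassandra. The key format (prefix/paste_id), the one-hour reconciler grace period, and the function names are assumptions. This reconciler deletes orphans rather than retrying the DB write, because the object alone does not carry the paste's metadata.

```python
from typing import Callable, Dict, List, Tuple

READ_GRACE_S = 5.0          # soft expiry boundary on the read side
ORPHAN_GRACE_S = 3600.0     # hypothetical: give in-flight DB writes time to finish


def create_paste(s3: Dict[str, Tuple[bytes, float]], db: dict, paste_id: str,
                 key: str, body: bytes, now: float, db_write: Callable) -> str:
    s3[key] = (body, now)   # 1) S3 PUT; if this raises, nothing exists anywhere
    db_write(db, paste_id, {"content_key": key, "created_at": now})  # 2) then metadata
    return paste_id


def reconcile(s3: Dict[str, Tuple[bytes, float]], db: dict, now: float) -> List[str]:
    """Delete S3 objects older than the grace period whose metadata row is missing.
    Keys are '<prefix>/<paste_id>', so each check is a point lookup. (This sketch
    walks every object; a real reconciler would list only recently written keys.)"""
    orphans = [key for key, (_, put_at) in s3.items()
               if now - put_at > ORPHAN_GRACE_S and key.rsplit("/", 1)[1] not in db]
    for key in orphans:
        del s3[key]
    return orphans


def is_expired(expires_at, now: float) -> bool:
    """expires_at None = never. Soft boundary: readable for up to 5 s past expiry."""
    return expires_at is not None and now >= expires_at + READ_GRACE_S
```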
Operational / deployment failures
| Failure | Blast radius / symptom | Mitigation |
|---|---|---|
| Bad deploy (read service) | New pods return 5xx on paste fetch | Canary deploy at 5% traffic; auto-rollback on error rate > 1% sustained 2 min; blue-green for schema-breaking changes |
| Cassandra schema migration | Adding a column is safe (Cassandra handles it); renaming or changing type is not | Always additive: new column alongside old; dual-write during migration; remove old column only after all readers on new schema; never rename in place |
| S3 bucket misconfiguration (public access) | All paste content publicly readable without auth | Block public access at bucket level; all access via pre-signed URLs or app-layer auth; quarterly S3 policy audit; AWS Macie for sensitive data detection |
| Expiry worker runaway (deletes live pastes) | Active pastes deleted prematurely | Worker reads expiry only from pastes_by_expiry; cross-checks expires_at in main table before deleting; 5-second delay between DB read and S3 delete as a sanity window |
| ID generator pod restart mid-range | Unused IDs in the claimed range are wasted — not a correctness problem, but a small key space leak | Ranges are 1M IDs; losing a partial range wastes at most ~1M IDs out of 3.5T — negligible; log range claim/return at pod startup/shutdown for audit |
x. Security & Abuse
Threat model
| Attack vector | Risk | Mitigation |
|---|---|---|
| Malware / phishing content hosting | High — short URLs and anonymous pastes are ideal phishing vehicles | Async content scan at write time (ClamAV for binaries; heuristic URL scanner for embedded links); Google Safe Browsing API check on URLs in paste content; flag for human review above a suspicion threshold; don't block writes synchronously — scan in background and tombstone within seconds of a positive hit |
| CSAM / illegal content | Critical — legal liability; platform must not host | PhotoDNA hash check for image content (text pastes: keyword signal + human review pipeline); hard delete within 60s of detection; preserve evidence hash for law enforcement in append-only audit log; legal hold prevents S3 lifecycle deletion |
| ID enumeration | Medium — attacker iterates IDs to harvest private pastes | IDs are base62 7-char (3.5T space); rate-limit 404s aggressively (10 per minute per IP / token); use non-sequential ID generation (shuffle counter range before encoding) to make sequential enumeration worthless; private pastes still require knowing the ID — obscurity is not security, but it raises the cost of enumeration significantly |
| Credential / secret dumping | High — pastes are a common accidental leak surface for API keys, passwords, private keys | Regex scan on write for known secret patterns (AWS keys, GitHub tokens, private key headers); flag and notify owner if authenticated; for anonymous pastes, tombstone and alert security team; integrate with GitHub secret scanning partner program |
| DDoS on write endpoint | Medium — 350 QPS peak write is modest; DDoS can overwhelm it easily | Rate-limit writes per IP (10/min anonymous, 100/min authenticated); CAPTCHA at write for anonymous users on abuse signals; Cloudflare / Akamai DDoS protection at edge; WAF rule for request size > 10 MB |
| DDoS on single paste (read amplification) | High — attacker shares URL broadly; origin gets slammed | CDN absorbs read DDoS for public pastes; rate-limit reads per IP at edge (1,000/min); auto-block IP on anomalous read rate; for private pastes behind auth, the auth wall is the rate limit |
| Redirect loop / open redirect via paste content | Low — pastebin doesn't redirect on content; content is served raw | N/A — no redirect functionality for content; only the short URL → paste page is a redirect, which is the intended behavior |
| GDPR / right to erasure | Medium — EU users can request deletion of their content | Deletion pipeline: tombstone DB row within 30s, S3 delete within 5 min (sync), CDN purge within 5 min; audit log records deletion event but not content; document data residency (content stored in which regions) in privacy policy |
| Audit / forensics | Regulatory — law enforcement requests for content | Append-only audit log (CloudWatch Logs + S3 + WORM bucket policy) records: create event, read events (sampled), delete events, abuse flags, legal holds; content itself is in S3 which has object-level versioning; legal hold flag in DB prevents expiry worker from deleting; legal team owns key for WORM bucket |
| Password brute-force | Medium — password-protected pastes are individually targetable | Rate-limit wrong password attempts: 5 per paste_id per 15 minutes per IP; after 3 failures, serve a CAPTCHA; bcrypt cost factor 12 (300ms on server) makes brute force expensive; lock paste_id after 10 failures from different IPs |
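The per-key limits in the table can share one sliding-window log limiter:

- 10 404s per minute per IP;
- 5 wrong passwords per paste_id per IP per 15 minutes;
- a CAPTCHA after 3 wrong passwords.

This is a minimal Python sketch assuming an in-process store. A real deployment would keep the counters in Redis so every pod sees the same limits. Class, function, and key names are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """Per-key log of event timestamps within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit, self.window_s = limit, window_s
        self._events = defaultdict(deque)

    def _trim(self, key, now):
        q = self._events[key]
        while q and now - q[0] >= self.window_s:   # drop events older than the window
            q.popleft()
        return q

    def count(self, key, now) -> int:
        return len(self._trim(key, now))

    def blocked(self, key, now) -> bool:
        return self.count(key, now) >= self.limit

    def record(self, key, now) -> None:
        self._trim(key, now).append(now)


# Limits from the threat table; keys are tuples of the dimensions being limited.
not_found_per_ip = SlidingWindowLog(limit=10, window_s=60)       # key: (ip,)
wrong_password = SlidingWindowLog(limit=5, window_s=15 * 60)     # key: (paste_id, ip)
CAPTCHA_AFTER = 3


def password_attempt_gate(paste_id: str, ip: str, now: float) -> str:
    """Before checking a passphrase: 'block', 'captcha', or 'allow'.
    The caller records each mismatch with wrong_password.record((paste_id, ip), now)."""
    key = (paste_id, ip)
    if wrong_password.blocked(key, now):
        return "block"
    return "captcha" if wrong_password.count(key, now) >= CAPTCHA_AFTER else "allow"
```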
Takedown propagation flow
Detection → flag in DB (status = 'tombstoned') → read service serves 451 (legal / abuse takedown) → purge CDN edge cache via API (propagates to all PoPs in < 10 seconds) → delete Redis cache → S3 delete or legal hold (depending on reason) → audit log entry. Total time from detection to edge cache cleared: < 60 seconds.
xi. Observability & SLOs
SLI targets (28-day rolling window)
| SLI | Target | Error budget (28 days) |
|---|---|---|
| Paste read availability (HTTP 2xx/3xx ratio, excluding 404/410) | 99.99% | ~4 minutes |
| Read p50 latency (time-to-first-byte, server-side, post-CDN) | < 20 ms | — |
| Read p99 latency (TTFB, server-side) | < 50 ms | — |
| Read p99.9 latency (TTFB) | < 200 ms | — |
| Paste write availability | 99.9% | ~40 minutes; writes are less critical than reads |
| Write p99 latency (create paste, end-to-end) | < 500 ms | — (includes S3 PUT; 95th pct < 200ms) |
| Expiry correctness (expired paste serves 404 within 5s of TTL) | 99.9% | Read-time check is the enforcement; 0.1% failure budget for clock skew edge cases |
| Content durability | 99.999999999% for S3-stored content (S3's eleven-nines design target) | Inherited from S3 for large pastes; inline pastes (<4 KB) live only in Cassandra, so their durability rests on RF=3 plus snapshots |
Golden signals — per service
- Traffic: RPS at each tier (CDN, read service, Redis, Cassandra, S3); break out by HTTP status and by content_type (inline vs. S3 pastes)
- Errors: 5xx rate (reliability metric); 404 rate (product metric — indicates abuse/enumeration if spike); 451 rate (abuse takedown metric)
- Latency: TTFB at CDN edge (p50, p99); TTFB at read service (p50, p99, p99.9); S3 GET latency (p99 — this is the tail that controls large-paste p99); Redis GET latency
- Saturation: Redis memory used %; Cassandra pending compactions; S3 throttle errors (503 Slow Down rate); app pod CPU and open connection count
Key alerts (burn-rate rules)
- Page: read availability burns > 2% of the 28-day error budget in 1 hour (a burn rate of ~13.4×; sustained, it would exhaust the whole budget in about 2 days)
- Page: read p99 TTFB > 200 ms for 5 consecutive minutes (SLO miss territory)
- Page: Cassandra node down (RF=3 means one more failure away from quorum loss)
- Page: S3 error rate > 1% sustained 5 minutes (affects all large-paste reads)
- Ticket: Redis memory > 75% on any shard
- Ticket: any single paste_id > 500 QPS (hot key candidate — engage adaptive caching playbook)
- Ticket: expiry worker queue depth > 24 hours of backlog (worker falling behind)
- Ticket: write latency p99 > 1s sustained (S3 PUT or DB write degradation)
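The error budgets and the paging threshold follow directly from the SLO and the window length. This Python sketch works the arithmetic on the 28-day window. The function names are hypothetical.

```python
WINDOW_DAYS = 28


def error_budget_minutes(slo: float, window_days: int = WINDOW_DAYS) -> float:
    """Minutes of full unavailability the SLO tolerates per window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(budget_fraction: float, alert_window_h: float,
              window_days: int = WINDOW_DAYS) -> float:
    """Burn rate implied by spending `budget_fraction` of the budget in
    `alert_window_h` hours; a rate of 1 spends exactly the budget over the window."""
    return budget_fraction * (window_days * 24) / alert_window_h


def days_to_exhaust(rate: float, window_days: int = WINDOW_DAYS) -> float:
    return window_days / rate


if __name__ == "__main__":
    print(f"read budget:  {error_budget_minutes(0.9999):.2f} min")   # 4.03
    print(f"write budget: {error_budget_minutes(0.999):.1f} min")    # 40.3
    rate = burn_rate(0.02, 1)
    print(f"2%/1h page: {rate:.2f}x, budget gone in {days_to_exhaust(rate):.1f} days")  # 13.44x, 2.1
```

On a 30-day window the same 2%-in-1-hour rule corresponds to a 14.4× burn rate, the figure often quoted for fast-burn pages.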
Tracing & debugging
- Trace ID propagated through CDN (via response header) → LB → read service → Redis → Cassandra → S3. A slow paste read can be unwound to the exact tier that added the latency.
- Sampled tracing: 1% baseline; 100% for all 5xx, all requests > 200 ms, and all pastes > 1 MB
- Cache hit/miss as span attributes: cache.l1_hit, cache.redis_hit, cache.s3_fetch. The trace shows exactly which tiers fired.
- S3 request ID in trace: every S3 GET/PUT includes the AWS request ID as a span attribute; essential for opening S3 support tickets against a specific failed request.
xii. Key Takeaways
- Split metadata from content. Metadata (IDs, expiry, owner, size) belongs in a fast, indexed database. Content bytes belong in object storage. Mixing them — putting 10 MB blobs in Cassandra rows — collapses both tiers simultaneously under load.
- The inline threshold is a first-class decision. Storing small pastes (<4 KB) inline in the DB row eliminates an entire network hop for the majority of pastes. Tune this threshold based on observed size distribution, DB row limits, and S3 GET latency. It is a config value, not hard code.
- Write order matters for consistency. S3 PUT before DB write. A failed S3 PUT leaves no trace — the paste simply doesn't exist. A DB write before a failed S3 PUT leaves a dangling metadata row with no content, which is harder to clean up and confusing to read-path code.
- Expiry is a pipeline, not a flag. Read-time enforcement is the correctness guarantee. The async worker is the storage reclamation mechanism. S3 lifecycle rules are the backstop. All three must exist; each compensates for the others' failure modes.
- Pre-signed S3 URLs are the bandwidth escape hatch. For large pastes, redirect the client directly to S3. The app tier serves a 307 — not bytes. This keeps app pods stateless and thin, offloads bandwidth entirely to S3's own infrastructure, and is free to implement.
- Object storage has throughput ceilings per prefix. S3 allows ~3,500 PUT/s and ~5,500 GET/s per prefix. At low QPS this is invisible. At scale, prefix sharding (a short hash of the paste ID as the key prefix) multiplies this ceiling by the number of prefixes. Design it in early: retrofitting prefix sharding onto an existing key scheme is painful.
- Burn-after-read is one conditional delete, not a lock. A race between two simultaneous readers means one gets the content and one gets a 404. A Cassandra LWT (DELETE ... IF EXISTS) decides the winner; only the cache purges are eventually consistent. Document this, set user expectations, and don't try to make it atomic with distributed locking: the cure is worse than the disease.
- Abuse is load-bearing, not optional. Pastebin is one of the most abused infrastructure primitives on the internet. The content scanning pipeline, rate limiting, and takedown flow must be built before launch, not after the first incident.
- The read path is shaped by bytes, not requests. 3,500 QPS sounds modest until you multiply by the 10 KB average and realize you need ~280 Mbps at peak without a CDN. Size everything around the byte rate, not the request rate. CDN is not optional; it is load-bearing infrastructure.
- Observability at the content_type boundary. Split all metrics by whether the paste is inline or S3. A latency regression that affects only S3 pastes (large content) looks like a p99 spike but leaves p50 fine — you will miss it without the split. This is the trace attribute that pays for itself on the first incident.
xiii. Go Deeper
- How would you implement real-time collaborative editing (Google Docs–style) on top of this architecture? What changes first — the storage model, the protocol, or the consistency model?
- Design the content abuse pipeline end-to-end: ingestion → async scan → result routing → takedown → appeal. What's your false positive handling? How do you prevent legitimate pastes from being tombstoned?
- Pastebin pastes can embed URLs (phishing). Design a URL scanner that processes pastes without adding latency to the write path. How do you handle pastes with 1,000 embedded links?
- How would you implement a "paste diff" feature — showing the differences between two paste versions — at scale? Where does the diff computation happen, and how do you cache it?
- The expiry worker processes 10M pastes/day at peak. How would you design it to survive a 3-day worker outage and then catch up without overloading Cassandra or S3 on recovery?
- A user claims GDPR right-to-erasure. Their account has 50,000 pastes. Design the deletion pipeline. What's the SLA? What do you do with pastes that have been forked by other users?
- How would you implement paste encryption at rest (client-side, not server-side) so that even the platform cannot read paste content? What does this do to abuse detection?