URL Shortener
A read-heavy redirect at planet scale, designed around a single elegant constraint: never make the user wait.
i. Requirements
Functional
- Given a long URL, generate a short URL (alias)
- Given a short URL, redirect to the original long URL
- Short URLs should be unique and not collide
- Optional: custom aliases, expiration dates, analytics
Non-Functional
- High availability — redirects must always work
- Low latency — redirect < 10 ms p99
- Durability — short URLs should not disappear
- Read-heavy: ~100:1 read/write ratio
Out of Scope
- User authentication
- Full analytics dashboard
- Link preview / safety scanning
ii. Capacity Estimates
| Parameter | Value |
|---|---|
| New URLs per day | 100 million |
| New URLs per second (write QPS, avg) | ~1,200 |
| New URLs per second (write QPS, peak ≈ 3×) | ~3,500 |
| Redirect QPS (100× reads, avg) | ~120,000 |
| Redirect QPS (peak) | ~360,000 |
| URL record size | ~500 bytes (long URL + metadata) |
| Storage for 10 years | 100M × 365 × 10 × 500B ≈ 180 TB |
| Cache (top 20% → 80% traffic) | hot set fits in Redis (~1 TB/day active) |
| Read bandwidth at peak (360K × ~600B request+response) | ~200 MB/s ≈ 1.6 Gbps egress per redirect tier |
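The table's figures can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the table's own inputs; the variable names are illustrative. The unrounded results come out slightly below the table's values because the table rounds up at each step (1,157 → ~1,200 → ~120,000 → ~360,000).

```python
# Back-of-envelope capacity math. Every input below is an assumption
# taken from the estimates table, not a measurement.
SECONDS_PER_DAY = 86_400

new_urls_per_day = 100_000_000
read_write_ratio = 100
peak_factor = 3
record_bytes = 500
retention_years = 10
bytes_per_redirect = 600  # request + response

write_qps_avg = new_urls_per_day / SECONDS_PER_DAY
write_qps_peak = write_qps_avg * peak_factor
read_qps_avg = write_qps_avg * read_write_ratio
read_qps_peak = read_qps_avg * peak_factor

# Decimal terabytes (1 TB = 1e12 bytes).
storage_tb = new_urls_per_day * 365 * retention_years * record_bytes / 1e12
egress_mb_s = read_qps_peak * bytes_per_redirect / 1e6
egress_gbps = egress_mb_s * 8 / 1000

print(f"write QPS avg / peak: {write_qps_avg:,.0f} / {write_qps_peak:,.0f}")
print(f"read QPS avg / peak:  {read_qps_avg:,.0f} / {read_qps_peak:,.0f}")
print(f"10-year storage:      {storage_tb:,.1f} TB")
print(f"peak egress:          {egress_mb_s:,.0f} MB/s ({egress_gbps:.2f} Gbps)")
```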
Per-tier sizing — back of the envelope
At staff level, the interesting math is not the headline QPS — it is how that QPS distributes across tiers once cache hits absorb the bulk.
| Tier | Load at peak | Sizing |
|---|---|---|
| Edge / CDN | ~70% of redirects (popular links) | Anycast PoPs; no origin capacity needed for hits |
| Redirect service (app) | ~110K QPS post-CDN | ~40–60 pods at ~2K QPS each (Go/Rust); CPU-bound on TLS + JSON |
| Redis cluster | ~99K QPS (90% L2 hit) | 6–10 shards × replica, ~50K ops/sec/node headroom |
| DB read (post-cache) | ~11K QPS (10% miss × 110K) | 3–6 Cassandra nodes per region @ RF=3, or Postgres + 5–10 read replicas |
| Write service | ~3.5K QPS peak | 10–20 pods; bottleneck is DB write, not app |
| Token range broker (ZK/etcd) | ~1 claim per 1M IDs / server | 3 or 5-node ensemble; load is trivial — only sized for HA quorum |
iii. High-Level Design
Client
│
▼
[Geo-DNS / Anycast] ──▶ [CDN Edge PoP] ──hit──▶ 302
│ miss
▼
[Regional Load Balancer]
│ │
▼ ▼
[Write Service] [Redirect Service] ──▶ [In-process LRU (L1)]
│ │ │ miss
│ ▼ ▼
[ZK/etcd range] [Redis Cluster (L2)] ──miss──▶ [DB Read Replica]
│ ▲ │
▼ │ async fill / invalidate │
[DB Primary] ──────────┴──────────────────────────────────────┘
│
▼ async
[Kafka] ──▶ [Analytics OLAP] | [Replication → other regions]
Four caching layers sit between a redirect request and the database: browser cache (when using 301), CDN edge, in-process LRU on the redirect pod, then Redis. By the time a request reaches the database, the three server-side caches (CDN, LRU, Redis) have all missed, which is itself a useful signal: the URL is either cold or non-existent.
Write path
- Client POSTs long URL
- Write service calls token generator → gets a unique short code
- Writes {short_code → long_url, created_at, expiry} to DB primary
- Optionally pre-warms cache
- Returns short URL to client
Read path (redirect)
- Client GETs tinyurl.com/abc123
- Redirect service checks Redis cache first
- Cache hit → 301/302 redirect immediately
- Cache miss → query DB read replica → cache the result → redirect
Cache is not a database with worse durability. It is a different shape of correctness.
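The read path is a cache-aside lookup. The sketch below uses plain dictionaries as stand-ins for the cache and the DB read replica; the class and method names are hypothetical, and a real service would use a Redis client and a database driver instead.

```python
from typing import Dict, Optional, Tuple


class RedirectService:
    """Cache-aside redirect lookup over in-memory stand-ins (illustrative only)."""

    def __init__(self, cache: Dict[str, str], db: Dict[str, str]):
        self.cache = cache      # stand-in for the Redis cache
        self.db = db            # stand-in for a DB read replica
        self.db_reads = 0       # counts how often the cache was bypassed

    def resolve(self, short_code: str) -> Optional[str]:
        long_url = self.cache.get(short_code)
        if long_url is not None:
            return long_url                      # cache hit
        self.db_reads += 1
        long_url = self.db.get(short_code)       # cache miss: read the replica
        if long_url is not None:
            self.cache[short_code] = long_url    # fill the cache for next time
        return long_url

    def handle_get(self, short_code: str) -> Tuple[int, Dict[str, str]]:
        long_url = self.resolve(short_code)
        if long_url is None:
            return 404, {}
        return 302, {"Location": long_url}
```

The second request for the same code is served from the cache, so `db_reads` stays at 1 after two calls to `handle_get("abc123")`.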
iv. Key Design Decisions
Short Code Length
- Alphabet: base62 = [a-z A-Z 0-9] = 62 characters
- 7-character code: 62⁷ ≈ 3.5 trillion unique codes
- At 100M/day: covers ~95 years before exhaustion
- 6 characters = 56 billion → only ~1.5 years. Use 7.
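A minimal base62 sketch, assuming the alphabet order a-z, A-Z, 0-9 and fixed-width codes padded with the first symbol; the function names are illustrative. The last helper reproduces the exhaustion estimates: about 1.5 years for 6 characters and on the order of 95 years for 7.

```python
import string

# Alphabet order is an assumption; any fixed ordering of the 62 symbols works.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int, width: int = 7) -> str:
    """Encode a counter value as a fixed-width base62 code."""
    if not 0 <= n < BASE ** width:
        raise ValueError(f"{n} does not fit in {width} base62 characters")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n


def years_until_exhaustion(width: int, urls_per_day: int = 100_000_000) -> float:
    """How long a code space of the given width lasts at a steady write rate."""
    return BASE ** width / urls_per_day / 365
```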
Token Generation Strategy
See patterns/token-generation for all approaches. For URL shortener:
Chosen: Zookeeper-coordinated counter ranges
- Each write server claims a range of counter values from Zookeeper (e.g., server A gets 1–1M)
- Convert counter value to base62 → short code
- Servers generate tokens locally within their range — zero DB coordination overhead
- When range exhausted, claim new range from Zookeeper
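The range-claim scheme can be sketched as follows. `RangeBroker` is a hypothetical in-memory stand-in for the ZooKeeper counter; in production the claim would be an atomic increment against the coordination service. The point is that `next_id` only contacts the broker once per range.

```python
import threading
from typing import Tuple


class RangeBroker:
    """In-memory stand-in for the ZooKeeper/etcd counter (hypothetical)."""

    def __init__(self, block_size: int = 1_000_000):
        self._next = 0
        self._block = block_size
        self._lock = threading.Lock()

    def claim(self) -> Tuple[int, int]:
        """Atomically hand out the next half-open range [start, end)."""
        with self._lock:
            start = self._next
            self._next += self._block
        return start, start + self._block


class TokenGenerator:
    """Issues counter values locally; talks to the broker only on exhaustion."""

    def __init__(self, broker: RangeBroker):
        self._broker = broker
        self._lock = threading.Lock()
        self._cur, self._end = broker.claim()

    def next_id(self) -> int:
        with self._lock:
            if self._cur >= self._end:
                # Range exhausted: claim a fresh one from the broker.
                self._cur, self._end = self._broker.claim()
            value = self._cur
            self._cur += 1
            return value
```

With a block size of 3 and two generators, the first generator yields 0, 1, 2 and then jumps to 6, because the second generator already owns 3 to 5. Codes are unique but not globally ordered.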
Why not MD5/SHA hash?
Hash of URL → take first 7 chars → collision risk + not human-stable. Same URL hashed twice = same code (good for dedup but complicates custom aliases).
Why not UUID?
128 bits → need to shorten anyway; introduces randomness we don't need.
301 vs 302 Redirect
| | 301 Permanent | 302 Temporary |
|---|---|---|
| Browser caches? | Yes | No |
| Analytics possible? | No — browser skips server | Yes — every redirect hits server |
| Server load | Lower | Higher |
Database Choice
See patterns/database-selection for the full decision framework. The access pattern here is almost the textbook case for a wide-column store: a single equality lookup on a high-cardinality key, no joins, no range scans on the hot path, and a write that never updates an existing row.
Schema — Cassandra
CREATE TABLE urls (
short_code text PRIMARY KEY,
long_url text,
user_id bigint,
created_at timestamp,
expires_at timestamp,
is_custom boolean,
is_disabled boolean -- soft-delete tombstone for abuse takedowns
) WITH compaction = {'class':'LeveledCompactionStrategy'}
AND default_time_to_live = 0
AND gc_grace_seconds = 864000;
short_code is the partition key — every redirect is a single-partition read, the cheapest operation Cassandra offers. Leveled compaction is chosen over Size-Tiered because reads are random and we want bounded read amplification (≤ ~10 SSTables touched per query in steady state).
Replication & consistency
- Replication factor: RF = 3 per region, deployed across three availability zones
- Write consistency: LOCAL_QUORUM (2 of 3), which survives a single-AZ outage without losing write availability
- Read consistency: LOCAL_ONE on the redirect path for minimal latency; eventual consistency is acceptable for an idempotent redirect
- Read consistency for custom alias availability check: LOCAL_SERIAL via lightweight transaction (LWT, Paxos-backed), which prevents two users grabbing the same custom alias under a race
- Cross-region: async multi-DC replication; do not require EACH_QUORUM on the hot path or you will inherit cross-region RTT (~150 ms)
Secondary access — queries by user_id
Cassandra's golden rule: one table per query pattern. Do not use a secondary index on user_id at scale — it scatter-gathers across nodes and degrades non-linearly. Instead, denormalize:
CREATE TABLE urls_by_user (
user_id bigint,
created_at timestamp,
short_code text,
long_url text,
PRIMARY KEY ((user_id), created_at, short_code)
) WITH CLUSTERING ORDER BY (created_at DESC);
Both tables are written in the write path. The cost is ~2× write amplification, which is fine — writes are 1% of traffic. The benefit is that "list my links" becomes a single-partition range read.
Postgres alternative — when it's enough
- Single-region, < ~500M rows, write QPS < ~5K sustained
- Index short_code with a hash index (Postgres 10+) or BTREE; BTREE wins if you ever want range scans
- Partition the urls table by created_at month for cheap expiration cleanup (detach the expired partition and DROP TABLE it instead of running DELETE)
- Streaming replication to 5–10 read replicas; route reads with a connection-pool-level read/write split (pgbouncer + a router)
Caching
See patterns/caching for full strategies. The right framing for this system is not "add a cache" but "build a four-layer cache hierarchy where each layer absorbs an order of magnitude more traffic than the next."
| Layer | Where | Target hit rate | Latency |
|---|---|---|---|
| L0 — Browser | Client (via 301 + Cache-Control) | Variable — depends on user behavior | 0 ms (no network) |
| L1 — CDN edge | CloudFront / Fastly / Cloudflare PoP | 60–80% of redirects | 5–20 ms |
| L2 — In-process LRU | Redirect pod heap (Caffeine / Ristretto) | ~50% of post-CDN traffic | < 0.01 ms |
| L3 — Redis cluster | Regional Redis | ~90% of post-LRU traffic | ~0.5 ms |
| Origin — DB | Cassandra / Postgres | ≤ 1% of original request volume | 1–5 ms |
- Strategy: Cache-aside on read path. On miss, the pod (not Redis) reads the DB and writes to L3; L2 is populated naturally by the same pod over its next requests
- Cache key: url:{short_code} → long_url. Keep values small (< 2 KB); store metadata as a separate meta:{short_code} key only if expiry/disabled checks are needed
- TTL: min(24h, expires_at - now) with ±10% jitter to avoid synchronized expiry storms
- Eviction: Redis allkeys-lru (approximate LRU); never run Redis at 100% memory; set maxmemory with ~20% headroom
- Negative caching: 404s are cached too: url:{code} = "__MISS__" with a 60 s TTL. Without this, enumeration attacks (or a single bad QR scan) can repeatedly punch through to the DB
- Bloom filter: A short-code-existence bloom filter (sized for 1B keys with 1% FPR → ~1.2 GB) sits in front of the DB. Membership check is ~150 ns. Most non-existent codes never reach Redis or DB at all
- Stampede protection: Per-key in-process singleflight (Go-style request coalescing) so 10K concurrent misses on the same key become one DB read. Stack a Redis-side mutex (SET with NX and EX 5) when the in-process layer can't help (cross-pod misses)
- Stale-while-revalidate: On the CDN layer, serve the stale URL while async-refetching the origin. Acceptable because URL→long_url is functionally immutable for the life of a code
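Three of the rules above reduce to small formulas: the jittered TTL, the negative-cache entry, and the Bloom filter size. The sketch below is one possible reading, with hypothetical function names. It applies the jitter to the 24 h cap and then clamps to the link's remaining lifetime, so a cached entry never outlives the link; that clamping is a design choice of this sketch, not something the rules require. The Bloom sizing uses the standard formulas m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions.

```python
import math
import random

MAX_TTL_S = 24 * 3600
NEGATIVE_TTL_S = 60
MISS_SENTINEL = "__MISS__"


def cache_ttl_seconds(expires_at, now, jitter=0.10, rng=random):
    """Jittered TTL in seconds; expires_at is a Unix timestamp or None."""
    ttl = int(MAX_TTL_S * (1 + rng.uniform(-jitter, jitter)))
    if expires_at is not None:
        # Never cache past the link's own expiry.
        ttl = min(ttl, int(expires_at - now))
    return max(ttl, 0)


def cache_entry(long_url, expires_at, now):
    """Return the (value, ttl) pair to store, including negative entries."""
    if long_url is None:
        return MISS_SENTINEL, NEGATIVE_TTL_S
    return long_url, cache_ttl_seconds(expires_at, now)


def bloom_size(n_keys: int, fpr: float):
    """Return (bits, hash_count) for a Bloom filter with the target FPR."""
    bits = math.ceil(-n_keys * math.log(fpr) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_keys * math.log(2)))
    return bits, hashes
```

For 1B keys at 1% FPR, `bloom_size` gives roughly 9.6 billion bits (about 1.2 GB) and 7 hash functions, matching the estimate in the list.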
v. Deep Dives
Custom Aliases
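- User specifies tinyurl.com/my-brand
- Write service checks availability first (SELECT on short_code); the reservation itself should be a conditional insert (INSERT ... IF NOT EXISTS, the LWT described under Replication & consistency) so two concurrent requests cannot both succeed
- Reserve these in same table; mark is_custom = true
- Risk: squatting → rate-limit custom alias creation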
URL Expiration
- Store expires_at in DB
- Redirect service checks expiry before returning URL
- Background job (cron) deletes/tombstones expired records
- Cache TTL should be min(24h, expires_at - now)
Analytics (if in scope)
- Write click events asynchronously to Kafka
- Downstream consumer aggregates into ClickHouse or similar OLAP store
- Never block the redirect on analytics writes
vi. Bottlenecks by Tier
Every system has a binding constraint. The interesting question at staff level is not "is this fast?" but "which tier saturates first, and what does it take to lift that ceiling?" Below is the redirect path ceiling at each layer with realistic single-instance numbers, and what you do when you hit them.
| Tier | Practical ceiling | Binding resource | Lift |
|---|---|---|---|
| DNS / GSLB | essentially unbounded | none (delegated) | Use anycast + edge resolvers; never homegrown |
| L7 load balancer (single) | ~100K RPS, ~10 Gbps | CPU on TLS termination, conntrack | Horizontal scale; offload TLS to dedicated tier; use HTTP/2 multiplexing |
| Redirect pod | ~2–5K RPS (Go/Rust), ~500 RPS (Node/Python) | CPU; allocations on hot path | Add pods; zero-allocation request path; HPA on CPU + p99 latency |
| In-process LRU (L2) | millions of ops/sec/pod | none — heap-bound | Size to ~100K hottest keys; Caffeine/Ristretto, never map[string]string |
| Redis node | ~80–100K ops/sec | Single-threaded command loop | Shard by short_code hash; add replicas for reads; pipeline batched lookups |
| Cassandra node | ~10–30K reads/sec; ~30–50K writes/sec | Disk IOPS on reads; compaction on writes | Add nodes (linear); LCS for read-heavy; SSDs/NVMe always |
| Postgres primary (single) | ~5–10K writes/sec, ~20–40K reads/sec | WAL fsync; lock contention | Read replicas; logical sharding by short_code prefix; eventually migrate |
| ZK / etcd range broker | ~1 RPS (claims are rare) | none | 3 or 5-node ensemble; pre-claim ranges on pod startup |
| Cross-region link | RTT, not bandwidth | ~150 ms RTT | Region-local reads; async replication; never call across regions on hot path |
vii. Hot Keys & Viral URLs
A Super Bowl ad goes live with tinyurl.com/sb-ad. Within 30 seconds that one short code accounts for 10M hits/min — 90% of total system traffic on a single key. This is the classic hot key problem, and it is the single failure mode most likely to bring a URL shortener down in production.
Why it hurts
- The Redis shard owning that key sees all of the L3 traffic for that key on one CPU core (Redis is single-threaded)
- If the L2 in-process cache misses on a fresh pod, the new pod hammers Redis for the same key during the warm-up window
- The DB shard owning that key takes the same disproportionate load on any cache miss
Mitigations — layered
- CDN absorbs the brunt. A correctly configured CDN sees one origin pull per PoP per TTL window. At ~300 PoPs and a 5-minute TTL, the origin sees ~3,600 pulls/hour for a hot URL regardless of how viral it gets
- In-process LRU with long TTL. A 100K-entry LRU per pod means a viral key sits permanently in memory after the first request. Use Caffeine (Java), Ristretto (Go), or moka (Rust) — admission-based caches resist scan pollution
- Key splitting (hot key sharding). Maintain N replicas of the hot key in Redis (url:abc:0 … url:abc:9) and route requests by request hash. Spreads single-key load across N shards. Implement only after detection; most keys don't need this
- Read from any replica. In Cassandra, LOCAL_ONE reads can land on any of the RF=3 replicas, so three nodes serve the single hot partition concurrently. Add replicas (RF=5) for known-viral campaigns
- Hot key detection. Sample 1% of requests, count by short_code in a sliding window (count-min sketch in Redis or in-memory). When a code exceeds a threshold (e.g., 1K QPS to origin), auto-promote it to in-process LRU with infinite TTL and increase CDN TTL via API
- Pre-warming. For known campaigns (Super Bowl, product launch), the customer can request pre-warming — the system pushes the entry into every pod's L2 and Redis before the campaign starts
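Key splitting and hot-key detection from the list above can be sketched together. Assumptions to note: an exact `Counter` stands in for the count-min sketch, the caller decides which requests are sampled and rolls the window, and `request_id` is any per-request string (a trace ID, for example). All names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List


def split_key(short_code: str, request_id: str, fanout: int = 10) -> str:
    """Pick one of `fanout` replicas of a hot key by hashing the request."""
    digest = hashlib.sha256(request_id.encode()).digest()
    shard = int.from_bytes(digest[:8], "big") % fanout
    return f"url:{short_code}:{shard}"


class HotKeyDetector:
    """Estimates per-code QPS from sampled requests within one window."""

    def __init__(self, threshold_qps: float, window_s: float,
                 sample_rate: float = 0.01):
        self.threshold_qps = threshold_qps
        self.window_s = window_s
        self.sample_rate = sample_rate
        self.counts: Counter = Counter()

    def observe_sampled(self, short_code: str) -> None:
        """Record one request that the caller chose to sample."""
        self.counts[short_code] += 1

    def hot_keys(self) -> List[str]:
        # Scale sampled counts back up to an estimated full-traffic QPS.
        scale = 1.0 / (self.sample_rate * self.window_s)
        return sorted(code for code, n in self.counts.items()
                      if n * scale >= self.threshold_qps)

    def roll_window(self) -> None:
        """Start a new window; call this every window_s seconds."""
        self.counts.clear()
```

With a 1% sample over a 10 s window, 100 sampled hits on one code correspond to an estimated 1,000 QPS, which is the promotion threshold used as an example in the list.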
viii. Multi-Region Architecture
A global URL shortener cannot serve every redirect from one region — a request from Tokyo to a US-East origin pays ~150 ms RTT before the first byte. The interesting design question is what is regional-local vs globally coordinated.
Routing
- Geo-DNS or anycast routes the user to the nearest regional cluster. AWS Global Accelerator / CloudFront, GCP Cloud Load Balancing, or DNS-based (Route53 latency routing)
- Redirect path is region-local. A US user resolves to the US region, hits the US CDN, US Redis, US Cassandra replica. Zero cross-region traffic on the hot path
Write replication
- Cassandra multi-DC with NetworkTopologyStrategy: writes commit at LOCAL_QUORUM in the originating region; async replication to other DCs with typical propagation < 1 s
- Consequence: a user in Mumbai creating a short URL and immediately scanning the QR from Tokyo may briefly hit "not found" on Tokyo's replica. Acceptable for > 99.99% of cases; rare edge case is handled below
- Read-your-writes on the same region is guaranteed because the user's session is sticky to one region; cross-region read-your-writes is not promised
Token range coordination
- Global counter, regional ranges. A single global ZK/etcd ensemble (3 or 5 nodes, geographically diverse) is the source of truth for the next available counter block. Each region claims ranges of 1M IDs and assigns sub-ranges to local pods
- Region-prefixed ranges as an alternative: bake the region ID into high bits of the counter. Removes global coordination at the cost of slightly less compact codes
- Custom alias coordination is genuinely hard: two users in different regions could request /launch at the same time. Solve with a globally-consistent registration step: a single primary region for alias reservations, or a CRDT-friendly "first-write-wins by timestamp + region tiebreak" rule. Document the chosen semantics
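The region-prefixed alternative can be illustrated with a simple bit layout. The widths below are assumptions chosen for the example: 4 bits of region ID and 37 bits of per-region counter, 41 bits in total, which still fits in a 7-character base62 code because 2⁴¹ < 62⁷.

```python
from typing import Tuple

REGION_BITS = 4     # hypothetical: up to 16 regions
COUNTER_BITS = 37   # hypothetical: about 137 billion IDs per region


def make_id(region_id: int, counter: int) -> int:
    """Place the region ID in the high bits and the local counter in the low bits."""
    if not 0 <= region_id < (1 << REGION_BITS):
        raise ValueError("region_id out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    return (region_id << COUNTER_BITS) | counter


def split_id(value: int) -> Tuple[int, int]:
    """Recover (region_id, counter) from a combined ID."""
    return value >> COUNTER_BITS, value & ((1 << COUNTER_BITS) - 1)
```

Each region increments its own counter with no global coordination, and IDs from different regions cannot collide. The cost is that a region which exhausts its 37-bit space cannot borrow unused space from another region.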
RPO & RTO
| Failure scenario | RPO | RTO |
|---|---|---|
| AZ failure within region | 0 (LOCAL_QUORUM survives) | ~30 s (LB health check + retry) |
| Full region failure | ~1 s (async replication lag) | ~5 min (DNS failover) or instant (anycast) |
| Cassandra cluster corruption (logical) | up to backup interval | hours (restore + replay) |
ix. Failure Modes & Mitigations
The four-row table from a junior design is replaced here with a systematic walk through what breaks in production, why, and what the mitigation looks like. Group by where the failure originates.
Infrastructure failures
| Failure | Blast radius | Mitigation |
|---|---|---|
| Single redirect pod OOM / crash | One pod's in-flight requests fail | Multiple replicas behind LB; readiness probes; circuit breakers in clients; pod disruption budget > 1 |
| Single Redis shard down | ~1/N of cached keys evaporate; reads fall through to DB | Redis Cluster with replicas (1 primary + 1 replica per shard); auto-failover via Sentinel/Cluster gossip; DB must absorb the surge — see "cache miss storm" below |
| Entire Redis cluster down | All L3 traffic falls through to DB (~10× expected DB load) | L2 in-process LRU continues to absorb the hottest keys; DB has 2× headroom; concurrency limiter on app→DB connections drops excess requests with 503 rather than queuing |
| DB primary down (Postgres) | Writes fail; reads on replicas still work | Patroni / Stolon orchestrated failover; promote replica in ~30 s; client-side connection retries with exponential backoff |
| Cassandra node down | One of three replicas unavailable for that partition | QUORUM still satisfied with 2/3; restore or replace the node before the hint window expires (3 h by default; 24 h only if max_hint_window is raised); repair on rejoin |
| Full AZ outage | 1/3 of capacity offline | Multi-AZ deployment with 50% spare capacity per zone; cross-AZ LB routing |
| Full region outage | All requests in that region | Anycast / DNS failover to another region; region must be sized for ~1.5× steady state to absorb the spillover |
| ZK / etcd quorum loss | New token range claims fail | Each pod pre-claims a 1M-ID range at startup → a full range lasts ~14 minutes even if a single pod absorbed the entire 1.2K writes/s, and roughly 2–5 hours at the expected 60–120 writes/s per pod (10–20 pods); ZK quorum is restored long before that |
| CDN partial outage | Cache miss surge to origin (3–10× normal) | Multi-CDN strategy (Fastly + CloudFront active-active by DNS); origin shield layer between CDN and origin to dedupe surges |
Data-path pathologies
| Failure | Symptom | Mitigation |
|---|---|---|
| Cache stampede on TTL expiry | 10K simultaneous misses on the same key → 10K DB reads | Singleflight per pod; Redis-side mutex via SET NX; stale-while-revalidate at CDN; jittered TTL |
| Hot key saturating one Redis shard | One CPU core at 100%, p99 spikes on that shard | L2 in-process LRU; key splitting (replicate hot key N ways); auto-detect via sampled counters |
| Cache miss storm post-deploy | Fresh pods have empty L2 → temporary surge to L3 and DB | Rolling deploy with low surge (10% at a time); request shadowing to warm new pods before they take traffic; readiness probe gates on cache warm |
| Replication lag spike (Postgres) | Read-replica serves stale data; "I just created it but get 404" | Route the immediate post-write read to the primary for N seconds (read-your-writes); monitor replication lag and remove replica from LB pool when lag > threshold |
| Cassandra read amplification on SSTable buildup | p99 read latency climbs with compaction backlog | Monitor pending compactions; throttle writes during incidents; increase compaction throughput; size disk for 2× working set |
| Token range exhaustion at one pod | Pod blocks new writes mid-range | Claim next range proactively at 80% consumed, not 100% — overlap the claim with continued writes |
| Clock skew between pods (if Snowflake used) | Out-of-order or colliding IDs | NTP enforcement; refuse to issue IDs when clock drift > 50 ms; counter-range strategy avoids this entirely (it's deterministic) |
| Hash collision (hash-based generation only) | Wrong long URL returned — silent correctness bug | Don't use truncated hashes for primary generation; if dedup is wanted, store and check, never trust the hash alone |
| Poison row (single corrupted record) | Repeated retries amplify load | Per-key circuit breaker; quarantine the row; emit alert with short_code so operations can investigate |
| Disk full on Cassandra node | Writes fail; node may enter read-only or crash | Capacity planning at 50% disk utilization steady state; alerts at 70%; archival of expired URLs to cold storage |
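The per-pod singleflight named in the cache stampede row can be sketched with a lock and a table of in-flight futures. This is a thread-based illustration with hypothetical names, not a port of any particular library: the first caller for a key runs the loader, and concurrent callers for the same key wait for that result instead of issuing their own DB read.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict


class SingleFlight:
    """Coalesces concurrent loads of the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight: Dict[str, Future] = {}

    def do(self, key: str, loader: Callable[[str], str]) -> str:
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if leader:
            try:
                fut.set_result(loader(key))
            except Exception as exc:
                # Followers see the same failure as the leader.
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._inflight[key]
        # Leader returns immediately; followers block until the result is set.
        return fut.result()
```

Callers that arrive after the leader has finished start a new load, so this only protects against simultaneous misses. It does not replace the cache itself.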
Operational & deployment failures
| Failure | Symptom | Mitigation |
|---|---|---|
| Bad deploy crashes redirect pods | Error rate spike, latency spike | Canary deploy (1% → 10% → 100%); automatic rollback on SLO violation; deploy frozen during high-traffic windows |
| Schema migration on Cassandra | Inconsistent schema across nodes | Migrations gated through a coordinator; schema agreement check before considering migration complete |
| Dependency upgrade (Redis 7→8) misbehavior | Subtle correctness issues | Shadow cluster running new version; mirror 1% of traffic; compare responses before cutover |
| Backup restore wipes recent writes | Data loss | Point-in-time recovery (continuous WAL archiving); commit-log replay for Cassandra; pre-restore snapshot before any destructive operation |
| Runaway analytics consumer falls behind | Kafka backlog grows, eventual storage pressure | Click events use dedicated topic with separate retention; analytics is async and never blocks the redirect |
x. Security & Abuse
A URL shortener is a redirect-as-a-service that anyone on the internet can write to. That makes it a magnet for abuse, and the abuse pathway is usually more damaging to the business than the technical failure modes. Staff-level system design accounts for this from day one.
Threat model
| Threat | Risk | Mitigation |
|---|---|---|
| Phishing / malware redirect | Brand damage; potential CSAM / regulatory exposure | Scan target URL at write time against Google Safe Browsing, PhishTank, and an internal blocklist; async re-scan periodically; is_disabled tombstone for takedowns (instant, no DB delete) |
| Open-redirect abuse for phishing | Attacker uses your domain to mask a malicious URL in an email | Interstitial warning page for URLs with low age + low click count; reputation scoring |
| Enumeration of sequential short codes | Scraping all URLs exposes private links | Counter-based IDs leak business metrics (total URL count) and are enumerable. Mitigations: (a) randomize within the assigned range, (b) sparse base62 with a permutation step, or (c) use UUIDv7 prefix + truncate for the short code. Trade-off: harder to debug |
| DDoS on a single short code | Hot-key amplification used as an attack vector | Per-IP and per-short-code rate limiting at the edge; CAPTCHA challenge above threshold; the CDN must absorb — never let the attack reach origin |
| DDoS on write endpoint | Range exhaustion, DB write saturation | Hard per-IP write rate limit (e.g., 60/min anonymous, higher with API key); CAPTCHA for anonymous writes; bot detection (header/timing heuristics) |
| Custom alias squatting | Reserving common words / brand names | Reserved-word blocklist (legal + product); rate-limit custom aliases (1 per minute per user); paid tier for premium aliases |
| Long URL points back to the shortener | Redirect loop, DoS amplification | Resolve target URL at write time; reject self-referential or loop-prone targets; cap redirect chain depth at the client level |
| GDPR / right-to-erasure | User requests deletion of their links and click history | Per-user deletion API; is_disabled = true on URLs; tombstone propagates to caches via invalidation; analytics anonymized at ingestion (no raw IPs after N days) |
| Audit / forensics | Law enforcement requests, takedown traceability | Append-only audit log of writes and takedowns; retention policy aligned with jurisdiction; signed log entries |
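For the enumeration row, one way to realize the "permutation step" is an affine map over the code space: multiply the counter by a constant coprime to 62⁷, add an offset, and reduce modulo 62⁷. The map is a bijection, so uniqueness from the counter is preserved, while consecutive counters no longer produce consecutive codes. The constants below are arbitrary illustrative values. This is obfuscation rather than cryptography: an attacker who learns a few (counter, code) pairs can recover the constants, so a keyed construction would be needed for real unpredictability.

```python
import math

SPACE = 62 ** 7                 # size of the 7-character base62 code space
MULTIPLIER = 2_654_435_761      # hypothetical constant; must be coprime to SPACE
OFFSET = 1_234_567_891          # hypothetical constant

# 62**7 = 2**7 * 31**7, so any multiplier not divisible by 2 or 31 works.
assert math.gcd(MULTIPLIER, SPACE) == 1
INVERSE = pow(MULTIPLIER, -1, SPACE)  # modular inverse (Python 3.8+)


def scramble(counter: int) -> int:
    """Map a sequential counter to a scattered value in the same space."""
    return (counter * MULTIPLIER + OFFSET) % SPACE


def unscramble(value: int) -> int:
    """Inverse of scramble, useful for debugging and range audits."""
    return ((value - OFFSET) * INVERSE) % SPACE
```

The scrambled value is then base62-encoded as usual. Keeping `unscramble` available offsets some of the "harder to debug" trade-off mentioned in the table.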
Takedown propagation
When abuse is detected, the URL must stop redirecting fast. The flow:
- Set is_disabled = true in DB (single write, immediately consistent within region)
- Publish invalidate:url:{code} to a Redis pub/sub channel; all redirect pods drop their L2 entry
- DEL url:{code} in Redis cluster
- Issue a CDN purge for the redirect URL via the CDN API
- The whole flow completes in < 5 s globally; downstream redirects return a takedown page (HTTP 410 Gone, not 404)
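The ordering of the takedown steps matters: the durable flag is set first so that any cache refill racing with the invalidation already sees the disabled row. A minimal sketch with in-memory stand-ins follows; the class, the function, and the row shape are hypothetical, and real code would call the database, the Redis pub/sub channel, and the CDN purge API.

```python
from typing import Dict, List


class TakedownCoordinator:
    """Runs the takedown steps in order against in-memory stand-ins."""

    def __init__(self, db: Dict[str, dict], pod_caches: List[dict],
                 redis: Dict[str, str], cdn_purges: List[str]):
        self.db = db
        self.pod_caches = pod_caches
        self.redis = redis
        self.cdn_purges = cdn_purges

    def disable(self, code: str) -> None:
        # 1. Durable flag first, so later cache refills see the takedown.
        self.db[code]["is_disabled"] = True
        # 2. Stand-in for the pub/sub fan-out: each pod drops its local entry.
        for cache in self.pod_caches:
            cache.pop(code, None)
        # 3. Stand-in for DEL url:{code} in the Redis cluster.
        self.redis.pop(f"url:{code}", None)
        # 4. Stand-in for the CDN purge API call.
        self.cdn_purges.append(f"/{code}")


def redirect_status(db: Dict[str, dict], code: str) -> int:
    """Status the redirect service returns once caches are cold."""
    row = db.get(code)
    if row is None:
        return 404
    return 410 if row["is_disabled"] else 302
```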
xi. Observability & SLOs
Service-level objectives
| SLI | Target (30-day window) | Error budget |
|---|---|---|
| Redirect availability (HTTP 2xx/3xx ratio) | 99.99% | ~4.3 minutes / month |
| Redirect p50 latency (server-side) | < 5 ms | — |
| Redirect p99 latency | < 10 ms | — |
| Redirect p99.9 latency | < 50 ms | — |
| Write availability | 99.9% | ~43 minutes / month — lower tier than reads |
| Write p99 latency | < 100 ms | — |
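The error-budget column follows directly from the availability target and the window length. A small sketch, with an illustrative function name: the per-month figures in the table correspond to a 30-day period, and a 28-day window gives slightly less.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    return (1 - slo) * window_days * 24 * 60


redirect_budget = error_budget_minutes(0.9999)            # about 4.3 minutes
write_budget = error_budget_minutes(0.999)                # about 43 minutes
redirect_budget_28d = error_budget_minutes(0.9999, 28)    # about 4.0 minutes
```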
Golden signals — per service
- Traffic: RPS at each tier (CDN, LB, app, Redis, DB); break out by region and by status code
- Errors: 4xx vs 5xx separated; 404 rate is a product metric (indicates broken or attacked codes), 5xx is a reliability metric
- Latency: p50, p99, p99.9 at each tier; latency budget allocation per tier sums to the SLO
- Saturation: CPU %, memory %, connection pool utilization, Redis memory used, Cassandra pending compactions
Key alerts
- Page on: redirect availability burn rate > 2% of monthly budget in 1 hour
- Page on: p99 latency > 50 ms for 5 minutes
- Page on: Cassandra node down (with RF=3, a second replica failure for the same token range breaks LOCAL_QUORUM; data loss requires losing all three replicas)
- Ticket on: any single short_code > 1K QPS (hot key candidate)
- Ticket on: Redis memory > 80% used
- Ticket on: token range claim rate > expected (indicates a pod issue or unexpected write surge)
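The first paging rule can be turned into a concrete error-rate threshold. Consuming 2% of a 30-day budget in 1 hour corresponds to a burn rate of 0.02 × 720 = 14.4, meaning errors arrive 14.4 times faster than the SLO allows. For a 99.99% SLO that is an error rate above roughly 0.144% over the hour. The sketch below assumes a 30-day SLO window, and the function names are illustrative.

```python
def burn_rate_threshold(budget_fraction: float, alert_window_h: float,
                        slo_window_days: int = 30) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent in the alert window."""
    return budget_fraction * (slo_window_days * 24) / alert_window_h


def should_page(errors: int, requests: int, slo: float,
                budget_fraction: float = 0.02, alert_window_h: float = 1.0) -> bool:
    """True when the observed error rate exceeds the burn-rate threshold."""
    if requests == 0:
        return False
    observed_error_rate = errors / requests
    allowed_error_rate = 1 - slo
    threshold = burn_rate_threshold(budget_fraction, alert_window_h)
    return observed_error_rate > threshold * allowed_error_rate
```

With 100,000 requests in the hour, 200 errors (0.2%) would page and 100 errors (0.1%) would not.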
Tracing & debugging
- Trace ID propagated through CDN → LB → app → Redis → DB so a slow request can be unwound across the stack
- Sampled tracing: 1% baseline, 100% for all 5xx and for any request > 100 ms
- Cache hit/miss as a span attribute at every layer — the trace tells you exactly where time was spent
xii. Key Takeaways
- Read-heavy → build a four-tier cache hierarchy. Browser → CDN → in-process LRU → Redis, with the DB as origin. Each tier absorbs an order of magnitude. The CDN does the heaviest lifting.
- Base62 + counter beats hashing for uniqueness guarantees at scale. Counter ranges via ZK/etcd eliminate per-write coordination.
- Cassandra with short_code as partition key is the textbook fit. RF=3, LOCAL_QUORUM writes, LOCAL_ONE reads, LWT only for custom alias races.
- Cache is an optimization, never a dependency. Size the DB tier to absorb the cache-down case, or you've built a system that secretly requires every component to be healthy.
- Hot keys are the asymmetric risk. 99.9% of keys behave; the 0.1% that go viral can saturate one Redis shard, one Cassandra partition, one anything. Layer mitigations: CDN, in-process LRU, key splitting, replica fan-out.
- The redirect path is region-local. Anycast routing, region-local replicas, async cross-region replication. Never call across regions on the hot path.
- Abuse is the real failure mode. Phishing scans at write time, takedown propagation in seconds, enumeration-resistant ID schemes, rate limiting at the edge.
- Separate write and redirect services — they have completely different scaling profiles, SLOs, and failure tolerances. Don't deploy them together.
- 301 vs 302 is a real trade-off. 301 minimizes server load and enables browser caching; 302 keeps analytics flowing. The right answer depends on your product.
- Observability buys you the budget to ship boldly. SLOs with error budgets, golden signals at every tier, traces that span CDN to DB, and alerts that fire on burn rate — not on threshold crossings.
xiii. Go Deeper
- How would you implement URL preview (unfurl og:image) without slowing down the write path?
- Design the analytics system — how do you count unique visitors using HyperLogLog while staying under storage and privacy constraints?
- How would you migrate from Postgres to Cassandra without downtime — what's the dual-write / shadow-read plan?
- How would you implement A/B-tested redirects (one short URL → different targets per user segment) without breaking caching?
- If the system has to support 1M write QPS, what changes first — token generation, write service, or DB?
- How would you build a privacy-preserving click-analytics pipeline that still gives the URL owner useful aggregates?