Vol. I  ·  Pattern §3 ← back to cover MMXXVI
Pattern · Reference

Database Selection

Every design hinges on a storage decision — start with access patterns, not database popularity.

i. The Decision Framework

Ask these questions in order:

  1. What is the access pattern? Key-value lookup? Range scan? Full-text search? Graph traversal?
  2. What are the consistency requirements? Strong consistency or eventual OK?
  3. What is the scale? Single node? Horizontal? Multi-region?
  4. What is the read/write ratio? Read-heavy → indexes + replicas. Write-heavy → LSM, partitioning.
  5. Do you need transactions? Multi-row ACID? Cross-shard?

ii. Postgres (SQL / Relational)

Best for: Structured data with relationships, ACID transactions, complex queries, moderate scale.

StrengthDetail
ACID transactionsSafe for financial data, inventory, anything requiring atomicity
Flexible queryingJOINs, aggregations, subqueries — no schema pre-planning needed
Mature ecosystempgvector, PostGIS, JSON support, rich extension library
Read replicasStreaming replication for read scale-out

Weaknesses: Horizontal write scaling requires sharding (adds complexity). Schema migrations at scale are painful.

iii. Cassandra (Wide-Column / NoSQL)

Best for: Write-heavy, globally distributed, time-series, pure key-value at massive scale.

StrengthDetail
Horizontal scalingAdd nodes linearly; no primary bottleneck
Write throughputLSM-tree writes to memory (MemTable) first → very fast
Multi-regionBuilt-in multi-master replication across DCs
Tunable consistencyPer-query: ONE, QUORUM, ALL
Time-series nativeWide rows model fits (user_id, timestamp) patterns

Weaknesses: No JOINs — must denormalize; one table per query pattern. No multi-row transactions. Data model design is query-driven — harder to evolve. Eventual consistency by default.

When to choose Cassandra: Write QPS > 10K/s sustained. Data fits partition key → clustering key → value. Multi-region is a hard requirement. You can live with eventual consistency.

iv. Redis (In-Memory / Cache + Data Structure Store)

Best for: Caching, sessions, pub/sub, leaderboards, counters, queues.

StrengthDetail
Sub-millisecond latencyAll operations in-memory
Rich data structuresSorted sets, streams, hyperloglogs, bloom filters
Atomic operationsINCR, LPUSH, etc. — race-condition-free counters
TTL supportNative expiry on any key

Weaknesses: Memory-limited. Persistence optional — not a true DB replacement. Single-threaded command execution.

Use as: Cache layer in front of any DB, session store, rate-limit counter, distributed lock (Redlock), real-time leaderboard.

v. DynamoDB (Managed Key-Value / Document)

Best for: Serverless, AWS-native, unpredictable traffic (auto-scaling), simple access patterns.

StrengthDetail
Fully managedNo ops burden
Auto-scalingHandles traffic spikes automatically
DynamoDB StreamsCDC for downstream consumers
Global TablesMulti-region replication

Weaknesses: Expensive at high sustained throughput. Query flexibility limited to partition + sort key. Vendor lock-in.

vi. Search & OLAP Stores

Elasticsearch / OpenSearch

Best for: Full-text search, log analytics, faceted search.

ClickHouse / OLAP Stores

Best for: Analytics queries over billions of rows (aggregations, GROUP BY, time-series analytics).

vii. Sharding Strategies

Range Sharding

Shard 1: user_id 0–1M
Shard 2: user_id 1M–2M
...

Hash Sharding

shard = hash(user_id) % num_shards

Consistent Hashing

Directory-Based Sharding

viii. Replication Patterns

PatternHowUse case
Leader-FollowerWrites to leader, replicas sync asyncMost common; reads scale-out; leader is write bottleneck
Multi-LeaderMultiple write nodesMulti-region writes; conflict resolution complexity
Leaderless (Quorum)Write to W nodes, read from R nodes; W+R > NCassandra, DynamoDB; no SPOF; tunable consistency

ix. Quick Decision Table

ScenarioPick
Financial transactions, complex queriesPostgres
High-write, time-series, multi-regionCassandra
Caching, sessions, real-time countersRedis
Full-text searchElasticsearch
Analytics over billions of rowsClickHouse
Serverless / AWS / variable trafficDynamoDB
Graph relationshipsNeo4j

x. Key Points