Ledger System Design

A Senior Staff Engineer's Interview Guide for Stripe

Part 1: Financial Foundations

10-15 minutes

Before diving into distributed systems complexities, you need to speak the language of finance. Interviewers at Stripe expect you to understand why accounting systems work the way they do, not just how to build them.

Why Double-Entry Accounting Exists

Double-entry bookkeeping was formalized in 1494 by Luca Pacioli, but its brilliance is timeless: every transaction affects at least two accounts, and the sum of all debits must equal the sum of all credits. This isn't bureaucratic overhead—it's an error-detection mechanism built into the data model itself.

The Core Insight

Double-entry is essentially a checksum for money. If debits ≠ credits, something went wrong. You don't need to find the bug—the system tells you one exists.

Consider what happens without double-entry: if you simply track account balances, a bug that adds $100 to one account without removing it from another is invisible until external reconciliation catches it—potentially months later.
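The checksum idea can be shown in a few lines. A minimal sketch (illustrative names; amounts as signed integers in cents, debits positive, credits negative):

```python
# Minimal sketch: the double-entry invariant as a checksum.
# Amounts are integers in cents; debits positive, credits negative.
def is_balanced(entries):
    """A transaction is valid only if its signed amounts sum to zero."""
    return sum(amount for _, amount in entries) == 0

payment = [("cash", 10_000), ("customer_funds", -10_000)]  # balanced
buggy   = [("cash", 10_000)]                               # money from thin air

assert is_balanced(payment)
assert not is_balanced(buggy)  # the invariant catches the bug immediately
```

The buggy single-entry write is rejected at write time, not months later at reconciliation.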

Debits vs Credits: The Mental Model

The debit/credit system confuses many engineers because the words seem backwards. Here's the model that actually makes sense:

The Balance Sheet Equation

Assets = Liabilities + Equity

Rearranged: Assets - Liabilities - Equity = 0

Account Type                         Debit (DR)      Credit (CR)     Normal Balance
Assets (Cash, Receivables)           Increases (+)   Decreases (-)   Debit
Liabilities (Payables, Deposits)     Decreases (-)   Increases (+)   Credit
Equity (Capital, Retained Earnings)  Decreases (-)   Increases (+)   Credit
Revenue                              Decreases (-)   Increases (+)   Credit
Expenses                             Increases (+)   Decreases (-)   Debit
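The table reduces to one rule: debits increase debit-normal accounts (assets, expenses) and credits increase credit-normal accounts (liabilities, equity, revenue). A sketch, with illustrative names:

```python
# Sketch: mapping (account type, entry side) to balance effect,
# mirroring the debit/credit table above.
DEBIT_NORMAL = {"asset", "expense"}

def balance_effect(account_type, entry_type, amount):
    """Signed change to the account's normal balance."""
    increases = (entry_type == "debit") == (account_type in DEBIT_NORMAL)
    return amount if increases else -amount

assert balance_effect("asset", "debit", 100) == 100        # DR increases assets
assert balance_effect("liability", "credit", 100) == 100   # CR increases liabilities
assert balance_effect("revenue", "debit", 100) == -100     # DR decreases revenue
```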

Practical example at Stripe: When a customer pays $100 for a merchant's product:

Journal Entry:
    DR Cash (Asset)              $100   (debit increases assets)
    CR Customer Funds (Liability) $100  (credit increases liabilities)
Sum: DR = CR ✓

Figure 1.1: T-accounts showing a $100 payment. Every transaction has balanced debits and credits.

Journal → Ledger → Trial Balance

Financial systems process transactions through a specific flow:

Journal (source of truth)
  • Chronological record
  • Each entry is complete
  • Immutable (append-only)
    ↓ post
Ledger (queryable state)
  • Grouped by account
  • Running balances
  • Derived from journal
    ↓ summarize
Trial Balance (validation)
  • All account totals
  • Verifies DR = CR
  • Point-in-time snapshot

The journal is the source of truth. Everything else is a derived view.

Figure 1.2: The flow from journal entries to ledger accounts to trial balance.

Balance Computation: Stored vs Computed

This is your first major architectural decision. How do you answer "What's the balance of account X?"

Option A: Stored Balances

SELECT balance
FROM accounts
WHERE id = 'acct_123';

Pros:

  • O(1) read performance
  • Simple queries

Cons:

  • Must update on every transaction
  • Risk of drift from journal
  • Historical balances require snapshots

Option B: Computed Balances

SELECT SUM(CASE
    WHEN type='debit' THEN amount
    ELSE -amount END)
FROM entries
WHERE account_id = 'acct_123';

Pros:

  • Always accurate (no drift)
  • Natural point-in-time queries
  • Simpler write path

Cons:

  • O(n) read performance
  • Expensive for active accounts
The Hybrid Approach (What Stripe Does)

In practice, most production systems use both: compute from journal entries as the source of truth, but maintain cached/materialized balances for performance. The challenge is keeping them in sync—more on this in Part 3.
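A hedged sketch of the hybrid read path, assuming the cache records the last journal entry it includes (names are illustrative): the cached balance is trusted up to that point, and only newer entries are summed.

```python
# Hybrid read path: trust the journal, use the cache as an optimization.
def hybrid_balance(cached_balance, cached_seq, entries):
    """entries: list of (seq, signed_amount) in journal order."""
    delta = sum(amt for seq, amt in entries if seq > cached_seq)
    return cached_balance + delta

entries = [(1, 10_000), (2, -3_000), (3, 500)]
# Cache was computed through entry 2 (balance 7_000); only entry 3 is summed.
assert hybrid_balance(7_000, 2, entries) == 7_500
```

Reads stay close to O(1) while the journal remains the arbiter of truth.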

Why Immutability is Non-Negotiable

In financial systems, you never delete or modify records. This isn't just good practice—it's often legally required.

Original Entry #1001: DR Cash $100, CR Revenue $100

WRONG:   UPDATE entry SET ...  (mutates history)
CORRECT: Entry #1002 (reversal):  DR Revenue $100, CR Cash $100
         Entry #1003 (corrected): DR Cash $80, CR Revenue $80

Errors are corrected by adding reversing entries, preserving history.

Figure 1.3: Correcting errors with reversing entries, not mutations.

Deep Dive: Types of Reversing Entries

Full Reversal: Completely undo the original entry (same accounts, opposite direction).

Partial Reversal: Undo part of an entry (e.g., $20 of a $100 charge).

Adjustment Entry: Add a separate entry that adjusts the net effect without explicitly reversing.

The reversal entry typically references the original entry ID for traceability:

INSERT INTO entries (account_id, amount, type, reverses_entry_id)
VALUES ('acct_revenue', 100, 'debit', 1001);

Part 2: Core Data Model Design

10-15 minutes

With financial concepts understood, let's design the schema. The data model you propose in an interview reveals whether you truly understand the domain.

Account Hierarchies and Chart of Accounts

A Chart of Accounts (COA) is the complete listing of all accounts in a ledger, organized hierarchically. At Stripe-scale, this hierarchy serves multiple purposes:

Root
├── Assets (category)
│   ├── Cash
│   └── Receivables
├── Liabilities (category)
│   ├── Merchant Payable
│   │   ├── merchant_001 (leaf, transactable)
│   │   └── merchant_002 (leaf, transactable)
│   └── Customer Deposits
└── Equity (category)

Figure 2.1: Account hierarchy for a payment processor. Only leaf nodes hold actual balances.

Transaction vs Entry: A Critical Distinction

This distinction trips up many engineers. In accounting:

A transaction always contains two or more entries. The entries must balance (sum to zero).

Why This Matters for Schema Design

Many engineers mistakenly create a single "transactions" table with from_account and to_account columns. This works for simple transfers but breaks down for:

  • Fee splits (1 debit, 3+ credits)
  • Currency conversion (multiple currencies)
  • Compound transactions (payment + fee + tax)
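The fix is to model a transaction as owning N entries rather than a from/to pair. A sketch with illustrative account names, showing a payment with a fee split (one debit, three credits) passing the balance check:

```python
# Sketch of the transaction/entry split: a transaction owns N entries,
# so fee splits and compound transactions balance naturally.
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    account_id: str
    entry_type: str   # 'debit' or 'credit'
    amount: int       # smallest currency unit, always positive

def validate(entries):
    """Debits must equal credits; raises if the transaction is unbalanced."""
    net = sum(e.amount if e.entry_type == "debit" else -e.amount for e in entries)
    if net != 0:
        raise ValueError(f"unbalanced transaction: net {net}")

# Payment + fee split: one debit, three credits.
validate([
    Entry("cash", "debit", 10_000),
    Entry("merchant_payable", "credit", 9_400),
    Entry("stripe_revenue", "credit", 300),
    Entry("tax_payable", "credit", 300),
])
```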

Schema Design with PostgreSQL

-- Core tables for a production ledger

CREATE TABLE accounts (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    parent_id       UUID REFERENCES accounts(id),
    code            VARCHAR(50) UNIQUE NOT NULL,  -- e.g., 'assets.cash.stripe_balance'
    name            VARCHAR(255) NOT NULL,
    type            VARCHAR(20) NOT NULL CHECK (type IN ('asset', 'liability', 'equity', 'revenue', 'expense')),
    currency        CHAR(3) NOT NULL DEFAULT 'USD',
    normal_balance  VARCHAR(6) NOT NULL CHECK (normal_balance IN ('debit', 'credit')),
    is_leaf         BOOLEAN NOT NULL DEFAULT true,  -- Only leaf accounts can hold entries
    metadata        JSONB DEFAULT '{}',
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),

    -- Leaf accounts must attach to the hierarchy; roots are non-leaf with no parent
    CONSTRAINT leaf_has_parent CHECK (NOT is_leaf OR parent_id IS NOT NULL)
);

CREATE TABLE transactions (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    idempotency_key VARCHAR(255) UNIQUE,  -- Prevents duplicate processing
    description     TEXT,
    metadata        JSONB DEFAULT '{}',

    -- Temporal fields (see explanation below)
    event_time      TIMESTAMPTZ NOT NULL,      -- When the business event occurred
    processed_at    TIMESTAMPTZ NOT NULL DEFAULT now(),  -- When we recorded it
    effective_date  DATE NOT NULL,             -- When it affects balances

    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE entries (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    transaction_id  UUID NOT NULL REFERENCES transactions(id),
    account_id      UUID NOT NULL REFERENCES accounts(id),
    entry_type      VARCHAR(6) NOT NULL CHECK (entry_type IN ('debit', 'credit')),
    amount          BIGINT NOT NULL CHECK (amount > 0),  -- Always positive, in smallest unit
    currency        CHAR(3) NOT NULL,

    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Ensure transactions balance. The check must be deferred to commit time,
-- since entries are inserted one row at a time and would fail mid-insert.
CREATE OR REPLACE FUNCTION check_transaction_balance()
RETURNS TRIGGER AS $$
BEGIN
    IF (
        SELECT SUM(CASE WHEN entry_type = 'debit' THEN amount ELSE -amount END)
        FROM entries
        WHERE transaction_id = NEW.transaction_id
    ) != 0 THEN
        RAISE EXCEPTION 'Transaction % does not balance', NEW.transaction_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER entries_must_balance
    AFTER INSERT ON entries
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE FUNCTION check_transaction_balance();

-- Indexes for common queries
CREATE INDEX idx_entries_account_id ON entries(account_id);
CREATE INDEX idx_entries_transaction_id ON entries(transaction_id);
CREATE INDEX idx_transactions_effective_date ON transactions(effective_date);
CREATE INDEX idx_transactions_idempotency ON transactions(idempotency_key) WHERE idempotency_key IS NOT NULL;
Design Decision: Amount Storage

Store amounts as BIGINT in the smallest currency unit (cents for USD, yen for JPY). Never use floating-point for money. The schema above uses BIGINT which handles up to ~$92 quadrillion in cents—sufficient for any practical use case.
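The float prohibition is easy to demonstrate: binary floating point cannot represent most decimal fractions exactly, while integer cents are exact. A small sketch (the parsing helper is illustrative):

```python
from decimal import Decimal

# Why floats are forbidden for money: decimal fractions are inexact in binary.
assert 0.1 + 0.2 != 0.3      # classic floating-point surprise
assert 10 + 20 == 30          # integer cents are exact

def to_cents(dollars_str):
    """Parse a decimal string into integer cents without float rounding."""
    return int(Decimal(dollars_str) * 100)

assert to_cents("19.99") == 1999
```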

Multi-Currency Considerations

Two approaches exist for handling multiple currencies:

Option 1: Currency per Account

Each account holds one currency. Cross-currency transfers require conversion entries.

Converting $1,000 USD at a rate of 1.10 yields €909. Transaction entries:
    CR USD Cash      $1,000
    DR EUR Cash      €909
    DR FX Gain/Loss  $91*

Figure 2.2: Currency conversion with separate currency accounts. *FX gain/loss tracks conversion impact.

Option 2: Multi-denomination Accounts

Single account holds multiple currency balances. Simpler for some use cases, but complicates aggregation.

-- Alternative: store currency with each entry, allow mixed currencies per account
SELECT currency, SUM(CASE WHEN entry_type = 'debit' THEN amount ELSE -amount END) as balance
FROM entries
WHERE account_id = 'acct_123'
GROUP BY currency;

The Three Timestamps

Financial systems need three distinct temporal concepts. Getting this wrong causes subtle, hard-to-debug issues.

Timestamp        Definition                                     Example                                  Mutable?
Event Time       When the business event occurred in the world  Customer swiped card at 3:42pm           No
Processing Time  When your system recorded the event            Network latency: recorded at 3:42:03pm   No
Effective Date   When the entry affects balances for reporting  Settlement date: tomorrow                Sometimes*

*Effective date may be set at creation and adjusted for settlements, but the adjustment itself is recorded.

Why Effective Date Matters

A payment authorized on Dec 31 but settled on Jan 2 should appear in January's books for accounting purposes, even though the event occurred in December. This is critical for financial reporting and reconciliation.
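A sketch of the point above, assuming reporting periods are derived from effective_date rather than event_time (field names illustrative):

```python
from datetime import date

# Reporting buckets are keyed by effective_date, not event_time:
# a Dec 31 authorization that settles Jan 2 lands in January's books.
def reporting_month(effective_date: date) -> str:
    return effective_date.strftime("%Y-%m")

txn = {"event_time": date(2023, 12, 31), "effective_date": date(2024, 1, 2)}
assert reporting_month(txn["effective_date"]) == "2024-01"
```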

Part 3: Consistency & Correctness

15-20 minutes

This is where financial systems get hard. Anyone can build a ledger that works when everything goes right. The challenge is ensuring correctness when things go wrong—and in distributed systems, things always go wrong.

The "Balance Can Never Go Negative" Constraint

For many account types (e.g., a user's spending account), you must guarantee the balance never drops below zero. This seemingly simple constraint has profound architectural implications.

Why This Is Hard

With concurrent transactions, two requests might both read a balance of $100, both decide to spend $80, and both succeed—leaving the account at -$60. Without proper concurrency control, you've created money out of thin air.

Approach 1: Pessimistic Locking

Lock the account before reading, hold lock through write.

BEGIN;

-- Acquire exclusive lock on the account row
SELECT * FROM accounts
WHERE id = 'acct_123'
FOR UPDATE;

-- Compute current balance (or read cached balance)
SELECT COALESCE(SUM(
    CASE WHEN entry_type = 'debit' THEN amount ELSE -amount END
), 0) AS balance
FROM entries
WHERE account_id = 'acct_123';

-- If balance >= requested amount, insert entries
INSERT INTO entries (...) VALUES (...);

COMMIT;
Pros                            Cons
Simple to reason about          Serializes all operations on that account
Strong consistency guarantee    Deadlock potential with multi-account txns
Works with any database         Doesn't scale for hot accounts

Approach 2: Optimistic Locking with Version Check

Read balance, attempt write with version check, retry on conflict.

-- Add version column to accounts table
ALTER TABLE accounts ADD COLUMN version BIGINT DEFAULT 0;
ALTER TABLE accounts ADD COLUMN cached_balance BIGINT DEFAULT 0;

-- Optimistic update pattern
UPDATE accounts
SET
    cached_balance = cached_balance - 8000,  -- $80.00 in cents
    version = version + 1
WHERE id = 'acct_123'
    AND version = 42  -- Expected version from read
    AND cached_balance >= 8000;  -- Balance check

-- Check rows affected: 0 = conflict, retry; 1 = success
Pros                                   Cons
Higher throughput for low contention   Retry storms under high contention
No lock waiting                        Requires version tracking
Better for read-heavy workloads        Cached balance can drift (needs reconciliation)
Concurrent transaction conflict under optimistic locking:
  t1: A reads bal=$100, v=1; B reads bal=$100, v=1
  t2: A writes $80 WHERE v=1
  t3: A commits (v=2, bal=$20)
  t4: B writes $80 WHERE v=1
  t5: 0 rows affected, version mismatch → B must retry

Figure 3.1: Optimistic locking detects conflicts through version mismatch, forcing retry.
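The retry loop around the conditional UPDATE can be sketched as follows, with an in-memory dict standing in for the database row (names illustrative):

```python
# Hedged sketch of the optimistic-locking retry loop.
account = {"balance": 10_000, "version": 1}

def try_debit(acct, amount, expected_version):
    """Mimics: UPDATE ... WHERE version = ? AND balance >= ?; returns rows affected."""
    if acct["version"] != expected_version or acct["balance"] < amount:
        return 0
    acct["balance"] -= amount
    acct["version"] += 1
    return 1

def debit_with_retry(acct, amount, max_attempts=3):
    for _ in range(max_attempts):
        snapshot_version = acct["version"]          # read phase
        if try_debit(acct, amount, snapshot_version):
            return True                              # write succeeded
    return False                                     # conflict or insufficient funds

assert debit_with_retry(account, 8_000)
assert account["balance"] == 2_000
assert not debit_with_retry(account, 8_000)   # insufficient funds now
```

A production version would also bound retries with backoff to avoid retry storms on hot accounts.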

Idempotency Keys: Design and Implementation

Network failures mean clients may retry requests. Without idempotency, retries can cause duplicate transactions. This is critical for financial systems.

CREATE TABLE idempotency_keys (
    key             VARCHAR(255) PRIMARY KEY,
    request_hash    BYTEA NOT NULL,      -- Hash of request parameters
    response        JSONB,                 -- Cached response
    transaction_id  UUID REFERENCES transactions(id),
    status          VARCHAR(20) NOT NULL DEFAULT 'processing',
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    expires_at      TIMESTAMPTZ NOT NULL   -- Typically 24-72 hours
);

CREATE INDEX idx_idempotency_expires ON idempotency_keys(expires_at);

The Idempotency Flow

1. Request arrives.
2. Key exists? (SELECT FOR UPDATE)
   • No → INSERT key (processing) → process transaction → UPDATE key (completed) → return response.
   • Yes, status = completed → return cached response.
   • Yes, status = processing → 409 Conflict / wait.

Figure 3.2: Idempotency key flow. The SELECT FOR UPDATE prevents race conditions between concurrent retries.

Request Hash Validation

Always hash and store the request parameters with the idempotency key. If a retry comes with the same key but different parameters, that's a bug in the client—return an error, don't process it.
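A minimal sketch of that rule, using an in-memory dict in place of the idempotency_keys table (names illustrative):

```python
import hashlib
import json

store = {}  # key -> (request_hash, response); stands in for the table above

def handle(key, payload, process):
    """Same key + same payload: cached replay. Same key + different payload: reject."""
    digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in store:
        stored_hash, response = store[key]
        if stored_hash != digest:
            raise ValueError("idempotency key reused with different parameters")
        return response                # safe replay: cached response
    response = process(payload)
    store[key] = (digest, response)
    return response

calls = []
handle("k1", {"amount": 100}, lambda p: calls.append(p) or "ok")
assert handle("k1", {"amount": 100}, lambda p: "never") == "ok"  # replayed, not reprocessed
assert len(calls) == 1
```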

Distributed Transactions: Patterns for Multi-Account Operations

What happens when a transaction touches accounts on different database shards? Or when you need to coordinate with external systems?

Pattern 1: Two-Phase Commit (2PC)

Phase 1 (Prepare): the coordinator sends PREPARE to Shard A (debits), Shard B (credits), and the external system; each participant votes YES or NO.
Phase 2: COMMIT if all voted YES, ABORT if any voted NO.

Figure 3.3: Two-Phase Commit coordinates atomic commits across multiple participants.

2PC Pros                     2PC Cons
Strong atomicity guarantee   Blocking: participants hold locks during entire protocol
Well-understood protocol     Coordinator failure can leave participants stuck
Built into some databases    Latency: requires multiple round-trips

Pattern 2: Saga with Compensating Transactions

For long-running or cross-service transactions, Sagas break the operation into steps, each with a compensating action.

T1: Debit source account    → compensation C1: credit it back
T2: Reserve inventory       → compensation C2: release inventory
T3: Credit merchant         → compensation C3: debit it back
T4: Notify — FAILS!

Execute C3 → C2 → C1 (compensate in reverse order).

Figure 3.4: Saga pattern. If step 4 fails, compensating transactions undo steps 3, 2, 1 in reverse.

Saga Considerations
  • Compensating actions must be idempotent
  • Intermediate states are visible (eventual consistency)
  • Some operations can't be compensated (sending an email)
  • Need to handle compensation failures too
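The mechanics above can be sketched as a small saga runner (step and compensation names are illustrative):

```python
# Hedged sketch of a saga runner: on failure, run the compensations of
# completed steps in reverse order.
def run_saga(steps):
    """steps: list of (action, compensation) callables."""
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):  # C3 → C2 → C1
            compensation()
        return False
    return True

log = []
def failing_step():
    raise RuntimeError("T3 fails")

steps = [
    (lambda: log.append("T1"), lambda: log.append("C1")),
    (lambda: log.append("T2"), lambda: log.append("C2")),
    (failing_step,             lambda: log.append("C3")),
]
assert run_saga(steps) is False
assert log == ["T1", "T2", "C2", "C1"]   # compensations ran in reverse order
```

Note the sketch ignores compensation failures and persistence of saga state, both of which a real implementation must handle.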

Pattern 3: Transactional Outbox

The outbox pattern ensures reliable event publishing alongside database changes.

-- Within the same database transaction:
BEGIN;

-- 1. Write to ledger
INSERT INTO entries (...) VALUES (...);

-- 2. Write event to outbox table
INSERT INTO outbox (event_type, payload, created_at)
VALUES ('TransactionCompleted', '{"txn_id": "..."}', now());

COMMIT;

-- Separate process polls outbox and publishes to message queue
-- Deletes from outbox after confirmed delivery

Handling Partial Failures Gracefully

In distributed systems, you must handle scenarios where some operations succeed and others fail.

The Stripe Philosophy

Stripe's public engineering content emphasizes designing for recovery, not just for success: it is better to fail cleanly, with proper cleanup, than to leave money in limbo.

Failure Mode Catalog
Failure                              Detection                           Recovery
Network timeout during commit        Query transaction status            Idempotent retry or confirm existing
Database crash after partial write   Recovery log shows uncommitted      Auto-rollback by DB
Application crash mid-saga           Saga state table shows incomplete   Resume or compensate based on state
External system unavailable          Timeout/error response              Queue for retry, alert if prolonged

Part 4: Scale & Performance

20-25 minutes

Critical Section

This is the section interviewers care most about. The hardest question in ledger design: How do you scale writes while maintaining strong consistency guarantees?

4a: Single-Primary Optimization

The Hot Account Problem

Not all accounts are equal. A merchant settlement account at Stripe might see thousands of transactions per second, while most user accounts see one per day. This creates hot spots.


Figure 4.1: Transaction volume follows a power law. A few accounts generate enormous load.

Vertical Scaling Limits

Before distributing, exhaust single-node optimizations. A well-tuned PostgreSQL primary on modern hardware can sustain tens of thousands of simple writes per second, so measure your actual ceiling before reaching for shards.

Read Replicas and Consistency Windows

# Route balance checks to a replica (eventual consistency is OK for display).
# postgresql.conf on the replica: hot_standby_feedback = on

# Application routing logic:
def get_connection(query):
    if query.is_read_only and not query.requires_strong_consistency:
        return replica_pool.get_connection()
    return primary_pool.get_connection()
Replication Lag Dangers

If a user checks balance on a replica, then immediately spends based on that balance with a write to primary, the replica's stale read could show $100 when the primary has $0. Always check balance constraints on the primary.

4b: Beyond Single Primary - Distributed Consistency

This is what Stripe interviews probe deeply

When a single PostgreSQL primary can't handle your write volume, you have three main options:

Option 1: Application-Level Sharding

Partition accounts across multiple database instances. Each shard is a fully independent PostgreSQL cluster.

Application + shard router
  Shard 1: accounts A-H (primary + replicas)
  Shard 2: accounts I-P (primary + replicas)
  Shard 3: accounts Q-Z (primary + replicas)

Problem: transfer from account "Alice" (Shard 1) to "Quinn" (Shard 3)?

Figure 4.2: Application-level sharding by account. Cross-shard transactions are the hard part.

Cross-shard transaction strategies:

  1. Saga with compensating transactions: Accept eventual consistency between shards
  2. Two-phase commit: Coordinate across shards (latency cost)
  3. Avoid cross-shard transactions: Design shard keys to keep related accounts together
# Shard key design: use merchant_id as the shard key so that
# all accounts for a merchant live on the same shard.
def get_shard(account_id):
    merchant_id = account_to_merchant[account_id]
    return hash(merchant_id) % NUM_SHARDS

# Payment: customer → merchant (same merchant = same shard = local txn)
# Settlement: merchant → Stripe bank (cross-shard, can be async)

Option 2: Distributed SQL (Spanner, CockroachDB, TiDB)

These databases provide the abstraction of a single logical database while distributing data across nodes.

Distributed SQL architecture (Spanner-like):
  • TrueTime / hybrid logical clocks: global ordering without coordination
  • Node 1 (us-east): range accounts[a-m], Raft leader for the range
  • Node 2 (us-west): range accounts[n-z], Raft leader for the range
  • Node 3 (eu-west): Raft follower holding replicas of both ranges

Key insight: cross-range transactions use 2PC internally, but the database handles it; you write normal SQL.

Figure 4.3: Distributed SQL systems handle sharding and cross-shard transactions automatically.

System          Consistency                       Latency                 Use Case
Google Spanner  External consistency (strongest)  Higher (TrueTime wait)  Google-scale, strictest requirements
CockroachDB     Serializable                      Moderate                Spanner-like, open source
TiDB            Snapshot isolation (default)      Lower                   MySQL-compatible, good OLTP
YugabyteDB      Serializable / Snapshot           Moderate                PostgreSQL-compatible
When Banks Use Distributed SQL

Major banks and fintech companies use Spanner/CockroachDB for specific workloads: high-volume payment processing, real-time fraud scoring, and global balance tracking. Core general ledgers often remain on traditional databases with application-level sharding.

Option 3: Partition by Account with Careful Boundary Design

The key insight: most transactions don't need to be atomic across accounts. Design your system so that cross-partition transactions are rare or can be eventually consistent.

-- Account "ownership" model
-- Each transaction type has a natural owner

-- Payment: owned by merchant (merchant's shard handles atomically)
INSERT INTO entries VALUES
    ('customer_funds_held', -100),  -- Stripe's liability to customer
    ('merchant_receivable', +97),   -- What we owe merchant
    ('stripe_revenue', +3);         -- Our fee

-- The customer's balance update happens on customer's shard
-- via async message, with idempotency guarantee

The Stripe Approach (Public Knowledge)

Stripe has discussed parts of its ledger architecture in public talks and blog posts; the settlement-account techniques below are drawn from that material.

Deep Dive: How Stripe Handles Settlement Accounts

Settlement accounts (where all merchant payouts originate) would be impossibly hot if naively implemented. Stripe's approach (from public talks):

  1. Virtual sub-accounts: Each merchant has a virtual position, aggregated for settlement
  2. Batch processing: Settlements are batched, not real-time
  3. Pre-funding: Settlement accounts are pre-funded, avoiding negative balance checks
  4. Separate write path: Settlement transactions bypass the hot account entirely until batch commit

4c: Caching and Async Processing

Balance Caching: When It's Safe

Safe to Cache

  • Display balances (UI)
  • Analytics/reporting
  • Fraud screening (approximate is OK)
  • Notification triggers

Never Cache

  • Spend authorization
  • Withdrawal limits
  • Overdraft protection
  • Any debit decision
-- Materialized balance with staleness tracking
CREATE TABLE balance_cache (
    account_id      UUID PRIMARY KEY,
    balance         BIGINT NOT NULL,
    last_entry_id   UUID NOT NULL,  -- Last entry included in balance
    computed_at     TIMESTAMPTZ NOT NULL
);

-- Note: a CHECK constraint comparing computed_at to now() cannot enforce
-- freshness (it is only evaluated on writes). Check staleness at read time:
--   WHERE computed_at > now() - INTERVAL '5 minutes'

-- Async worker keeps cache updated
-- On cache miss or staleness, compute from entries

What Can Be Eventual vs Synchronous

Operation              Consistency          Rationale
Payment capture        Synchronous          Customer expects immediate confirmation
Balance display        Eventual (seconds)   Slight staleness acceptable for UX
Merchant payout        Eventual (hours)     Batch processing is expected
Spend authorization    Synchronous          Must check real balance
Statement generation   Eventual (daily)     Periodic batch job
Refund processing      Eventual (minutes)   Queue-based, retry on failure

Part 5: Auditability & Compliance

10-15 minutes

Financial systems face regulatory scrutiny that most software doesn't. Understanding these requirements shows interview maturity.

Immutable Audit Trails

Every change must be traceable. The append-only design of ledgers naturally supports this, but you need additional metadata.

CREATE TABLE audit_log (
    id              BIGSERIAL PRIMARY KEY,
    event_type      VARCHAR(50) NOT NULL,
    entity_type     VARCHAR(50) NOT NULL,
    entity_id       UUID NOT NULL,

    -- Who/what made the change
    actor_type      VARCHAR(20) NOT NULL,  -- 'user', 'system', 'api_key'
    actor_id        VARCHAR(255) NOT NULL,

    -- What changed
    old_value       JSONB,
    new_value       JSONB,

    -- Context
    ip_address      INET,
    user_agent      TEXT,
    request_id      UUID,

    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Write-once: no updates or deletes allowed
CREATE RULE audit_log_no_update AS ON UPDATE TO audit_log DO INSTEAD NOTHING;
CREATE RULE audit_log_no_delete AS ON DELETE TO audit_log DO INSTEAD NOTHING;

Reconciliation Systems

Reconciliation verifies that your internal records match reality. There are two types:

Internal Reconciliation

Verify internal consistency: Do cached balances match computed balances? Do all transactions balance?

-- Daily internal reconciliation job

-- 1. Verify all transactions balance (debits = credits)
SELECT t.id,
       SUM(CASE WHEN e.entry_type = 'debit' THEN e.amount ELSE 0 END) as total_debits,
       SUM(CASE WHEN e.entry_type = 'credit' THEN e.amount ELSE 0 END) as total_credits
FROM transactions t
JOIN entries e ON e.transaction_id = t.id
GROUP BY t.id
HAVING SUM(CASE WHEN e.entry_type = 'debit' THEN e.amount ELSE -e.amount END) != 0;
-- Should return 0 rows

-- 2. Verify cached balances match computed
SELECT bc.account_id,
       bc.balance as cached,
       SUM(CASE WHEN e.entry_type = 'debit' THEN e.amount ELSE -e.amount END) as computed
FROM balance_cache bc
JOIN entries e ON e.account_id = bc.account_id
GROUP BY bc.account_id, bc.balance
HAVING bc.balance != SUM(CASE WHEN e.entry_type = 'debit' THEN e.amount ELSE -e.amount END);

External Reconciliation

Match your records against external systems (banks, card networks, payment processors).

Internal ledger: $1,234,567.89 as of 2024-01-15
Bank statement:  $1,234,567.89 as of 2024-01-15   ✓ Matched
(Match by reference; flag discrepancies.)

Figure 5.1: External reconciliation matches internal records against bank statements and partner reports.

Point-in-Time Balance Queries

Regulators and auditors need to know account balances at any historical point. This is called "time travel" querying.

-- Balance as of a specific date/time
CREATE FUNCTION balance_at(
    p_account_id UUID,
    p_as_of TIMESTAMPTZ
) RETURNS BIGINT AS $$
    SELECT COALESCE(SUM(
        CASE WHEN e.entry_type = 'debit' THEN e.amount ELSE -e.amount END
    ), 0)
    FROM entries e
    JOIN transactions t ON t.id = e.transaction_id
    WHERE e.account_id = p_account_id
      AND t.effective_date <= p_as_of::date;
$$ LANGUAGE SQL STABLE;

-- Usage: What was merchant_123's balance on Dec 31, 2023?
SELECT balance_at('merchant_123', '2023-12-31 23:59:59');

Regulatory Requirements Overview

Regulation            Applies To             Key Ledger Requirements
SOX (Sarbanes-Oxley)  Public companies       Audit trails, access controls, change tracking
PCI-DSS               Card payment handlers  Encryption, access logging, data retention limits
GDPR                  EU personal data       Data minimization, right to erasure (conflicts with immutability!)
AML/KYC               Money transmission     Transaction monitoring, suspicious activity reporting
GDPR vs Immutability

GDPR's "right to erasure" conflicts with ledger immutability. The solution: separate PII from financial records. The ledger references an anonymized account ID; personal details live in a separate system that can be purged. Financial records remain intact but de-identified.

Soft Deletes and Corrections

Nothing is ever truly deleted. "Deletions" are reversing entries.

-- "Voiding" an incorrect transaction
INSERT INTO transactions (idempotency_key, description, event_time, effective_date)
VALUES (
    'void_txn_12345',
    'Void: Original transaction 12345 entered in error',
    now(),
    CURRENT_DATE
);

-- Insert reversing entries (opposite direction of original)
INSERT INTO entries (transaction_id, account_id, entry_type, amount, currency)
SELECT
    'new_void_txn_id',
    account_id,
    CASE WHEN entry_type = 'debit' THEN 'credit' ELSE 'debit' END,
    amount,
    currency
FROM entries
WHERE transaction_id = 'original_txn_12345';

Part 6: Advanced Architectural Patterns

10-15 minutes

These patterns come up in senior interviews. Know when to use them and when they're overkill.

Event Sourcing for Ledgers

Event sourcing stores state as a sequence of events rather than current values. Ledgers are naturally event-sourced—the journal IS the event log.

Event Store (Journal):
  AccountOpened   {id: 123, ...}
  MoneyDeposited  {amt: 100, ...}
  MoneyWithdrawn  {amt: 30, ...}
    ↓ project
  Balance View:   account_123: $70
  Statement View: monthly summary...

Key insight: events are facts; views are interpretations. Any view can be rebuilt from the events.

Figure 6.1: Event sourcing naturally fits ledger design. The journal is the source of truth.

Event Sourcing Pros              Event Sourcing Cons
Perfect audit trail              Eventual consistency for read models
Rebuild any view from events     Complex to query across aggregates
Temporal queries trivial         Event schema evolution is hard
Easy debugging (replay events)   Storage grows unboundedly

CQRS: Separating Read and Write Models

Command Query Responsibility Segregation uses different models for writes (commands) and reads (queries).

Commands → Write Model: validates business rules, writes to the event store, optimized for consistency.
Queries  → Read Model: denormalized views, pre-computed aggregates, optimized for queries.
The read model is kept up to date via async projection / event handlers.

Figure 6.2: CQRS separates the write path (optimized for consistency) from read path (optimized for queries).

When to Use CQRS
  • Read and write patterns differ significantly
  • Read scale >> write scale
  • Complex queries that don't fit the write model
  • Different consistency requirements for reads vs writes

When not to use: Simple CRUD apps, low scale, when added complexity isn't justified.

Multi-Region Deployments

For global availability and latency, you may need ledgers in multiple regions. This introduces consistency challenges.

Strategies:

  1. Single primary, global replicas: Writes go to one region, reads can be local. Simplest, but write latency for distant users.
  2. Active-passive failover: Secondary region takes over on primary failure. Potential data loss during failover.
  3. Active-active with partitioning: Each region owns certain accounts. Cross-region transactions need coordination.
  4. Distributed SQL (Spanner): Automatic multi-region with strong consistency. Highest latency, simplest programming model.
Multi-Region Financial Systems

Most financial institutions choose single primary with synchronous replication to a nearby region for durability. Global active-active is rare for core ledgers due to consistency challenges. Instead, they use regional ledgers with async reconciliation.

Testing Strategies for Financial Correctness

# Property-based testing: every transaction must balance
def test_transactions_always_balance(transaction):
    entries = create_transaction(transaction)
    debits = sum(e.amount for e in entries if e.type == 'debit')
    credits = sum(e.amount for e in entries if e.type == 'credit')
    assert debits == credits, f"Unbalanced: {debits} != {credits}"

# Invariant checking: balance never negative (for certain accounts)
def test_balance_never_negative():
    for account in get_non_credit_accounts():
        balance = compute_balance(account)
        assert balance >= 0, f"{account} has negative balance: {balance}"

# Reconciliation test: cached = computed
def test_cache_consistency():
    for account in sample_accounts(1000):
        cached = get_cached_balance(account)
        computed = compute_balance_from_entries(account)
        assert cached == computed, f"Cache drift: {cached} vs {computed}"


Part 7: Interview Execution Strategy

10 minutes

Knowing the material isn't enough. You need to present it effectively under interview pressure.

Structuring Your 45-60 Minute Answer

Phase                  Time        Focus
1. Requirements        5-8 min     Clarify scope, constraints, scale expectations
2. High-Level Design   10-12 min   Core data model, main components, API sketch
3. Deep Dive #1        12-15 min   Consistency/correctness (usually interviewer-directed)
4. Deep Dive #2        10-12 min   Scale or another area of interviewer interest
5. Edge Cases          5-8 min     Failure modes, operational concerns, monitoring

The "Stripe Framework"

A mental model for structuring financial system design:

  1. Requirements: What operations? What scale? What consistency guarantees?
  2. Data Model: Accounts, transactions, entries. Schema design.
  3. Correctness: How do you ensure money is never created/destroyed? Concurrency control?
  4. Scale: What's the bottleneck? How do you scale past it?
  5. Operations: How do you detect problems? Recover from failures?

What L6 Interviewers Look For (vs L5)

L5 (Senior) Expectations

  • Solid fundamentals
  • Can design a working system
  • Handles common edge cases
  • Makes reasonable tradeoffs

L6 (Staff/Senior Staff) Expectations

  • Deep expertise in specific areas
  • Proactively identifies non-obvious problems
  • Discusses production operational concerns
  • References real-world systems and their tradeoffs
  • Challenges assumptions in the problem
  • Thinks about organizational/process implications

Red Flags and Common Mistakes

Avoid These
  • Using floats for money: Instant credibility loss
  • Ignoring concurrent access: "Just use a database" without discussing locking
  • No idempotency: Retries will corrupt data
  • Mutable records: Suggesting UPDATE instead of append-only
  • Overcomplicating early: Jumping to distributed systems before understanding single-node limits
  • Handwaving consistency: "We'll use a message queue" without discussing failure modes
  • Ignoring the domain: Not asking about financial constraints, regulatory requirements
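The first red flag is easy to demonstrate concretely. A quick illustration of why amounts belong in integer minor units rather than floats:

```python
# Binary floats cannot represent most decimal cent values exactly,
# so arithmetic on them drifts.
subtotal = 0.1 + 0.2
print(subtotal == 0.3)      # False: actually 0.30000000000000004

# Summing $0.10 a million times accumulates visible error:
total = sum(0.1 for _ in range(1_000_000))
print(total)                # not exactly 100000.0

# Store amounts as integers in the smallest currency unit instead:
subtotal_cents = 10 + 20    # $0.30, exactly
total_cents = sum(10 for _ in range(1_000_000))
print(total_cents == 10_000_000)   # True: exact
```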

Sample Interview Dialogue


Interviewer: "Design a ledger system for a payment processor."

You: "Before diving in, I'd like to understand the scope. Are we designing for a specific scale? What types of transactions—just simple A-to-B transfers, or complex multi-party settlements? Are there specific regulatory requirements I should consider?"

Interviewer: "Let's say 10,000 transactions per second, mostly simple payments but some multi-party. SOX compliance is required."

You: "Got it. Let me start with the core data model. I'll use double-entry bookkeeping since it gives us built-in error detection—every transaction must balance..."

[Draw schema, explain accounts/transactions/entries distinction]

You: "...For consistency, the key challenge is preventing overdrafts under concurrent access. I'd use optimistic locking with a version check for most accounts, falling back to pessimistic locking for high-contention accounts. Let me draw the flow..."

Interviewer: "What happens if the database crashes mid-transaction?"

You: "Good question. There are a few scenarios to consider..."

[Discuss transaction atomicity, idempotency keys, failure modes]

Interviewer: "How would you scale this to handle Stripe-level traffic?"

You: "At 10K TPS we're likely fine with a single primary PostgreSQL with read replicas. But let's say we need 100K+ TPS. The first thing I'd examine is whether we have hot accounts..."
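The optimistic-locking flow from the dialogue can be sketched in miniature. This in-memory model (the Account and try_debit names are illustrative, not from any real system) shows the version check letting exactly one of two competing full-balance spends through; a real implementation would express the same check as a single conditional UPDATE:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        self.version = 0

def try_debit(account, amount, expected_version):
    """Optimistic debit: succeeds only if no one else wrote first and
    the balance covers the amount. In SQL, roughly:
    UPDATE accounts SET balance = balance - :amt, version = version + 1
    WHERE id = :id AND version = :expected AND balance >= :amt"""
    if account.version != expected_version:
        return False          # lost the race: caller re-reads and retries
    if account.balance < amount:
        return False          # would overdraft
    account.balance -= amount
    account.version += 1
    return True

# Two transactions each try to spend the full balance concurrently:
acct = Account(balance=100)
v = acct.version              # both readers observe version 0
first = try_debit(acct, 100, v)
second = try_debit(acct, 100, v)    # stale version: rejected
print(first, second, acct.balance)  # True False 0
```

The loser re-reads the account, sees the new balance, and fails cleanly with an insufficient-funds error instead of double-spending.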

Appendices

Appendix A: Glossary of Financial Terms

Term | Definition
Double-entry bookkeeping | Accounting system where every transaction affects at least two accounts, with debits equaling credits
Debit (DR) | An entry that increases assets/expenses or decreases liabilities/equity/revenue
Credit (CR) | An entry that decreases assets/expenses or increases liabilities/equity/revenue
Journal | Chronological record of all transactions (source of truth)
Ledger | Account-organized view of transactions, derived from journal
Trial Balance | Summary of all account balances to verify debits equal credits
Chart of Accounts | Complete list of all accounts in a ledger system
Reconciliation | Process of verifying internal records match external sources
Settlement | Final transfer of funds between parties
Idempotency | Property where repeated operations produce the same result

Appendix B: Quick Reference Schema

-- Minimal production ledger schema

CREATE TABLE accounts (
    id              UUID PRIMARY KEY,
    code            VARCHAR(50) UNIQUE NOT NULL,
    type            VARCHAR(20) NOT NULL,  -- asset/liability/equity/revenue/expense
    currency        CHAR(3) NOT NULL,
    normal_balance  VARCHAR(6) NOT NULL   -- debit/credit
);

CREATE TABLE transactions (
    id              UUID PRIMARY KEY,
    idempotency_key VARCHAR(255) UNIQUE,
    event_time      TIMESTAMPTZ NOT NULL,
    effective_date  DATE NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE entries (
    id              UUID PRIMARY KEY,
    transaction_id  UUID NOT NULL REFERENCES transactions(id),
    account_id      UUID NOT NULL REFERENCES accounts(id),
    entry_type      VARCHAR(6) NOT NULL,   -- debit/credit
    amount          BIGINT NOT NULL,        -- smallest currency unit
    currency        CHAR(3) NOT NULL
);

CREATE INDEX idx_entries_account ON entries(account_id);
CREATE INDEX idx_entries_txn ON entries(transaction_id);
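The schema can be exercised end to end with SQLite (UUIDs stored as TEXT, columns trimmed to the essentials); a minimal sketch recording the $100 payment from Part 1 and running a trial balance:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    id   TEXT PRIMARY KEY,
    code TEXT UNIQUE NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE transactions (
    id              TEXT PRIMARY KEY,
    idempotency_key TEXT UNIQUE
);
CREATE TABLE entries (
    id             TEXT PRIMARY KEY,
    transaction_id TEXT NOT NULL REFERENCES transactions(id),
    account_id     TEXT NOT NULL REFERENCES accounts(id),
    entry_type     TEXT NOT NULL,
    amount         INTEGER NOT NULL   -- smallest currency unit
);
""")

cash, funds = str(uuid.uuid4()), str(uuid.uuid4())
conn.execute("INSERT INTO accounts VALUES (?, 'cash', 'asset')", (cash,))
conn.execute("INSERT INTO accounts VALUES (?, 'customer_funds', 'liability')",
             (funds,))

# One balanced transaction: DR Cash $100, CR Customer Funds $100.
txn = str(uuid.uuid4())
conn.execute("INSERT INTO transactions VALUES (?, 'pay-001')", (txn,))
conn.execute("INSERT INTO entries VALUES (?, ?, ?, 'debit', 10000)",
             (str(uuid.uuid4()), txn, cash))
conn.execute("INSERT INTO entries VALUES (?, ?, ?, 'credit', 10000)",
             (str(uuid.uuid4()), txn, funds))

# Trial balance: total debits must equal total credits.
debits, credits = conn.execute("""
    SELECT SUM(CASE WHEN entry_type = 'debit'  THEN amount ELSE 0 END),
           SUM(CASE WHEN entry_type = 'credit' THEN amount ELSE 0 END)
    FROM entries
""").fetchone()
print(debits == credits)   # True
```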

Appendix C: Stripe-Specific Context

Based on publicly available information (engineering blog, conference talks):


Appendix D: Practice Questions

  1. Basic: Design a schema for a simple double-entry ledger. Walk through a payment transaction.
  2. Intermediate: How do you prevent overdrafts with concurrent transactions? Compare pessimistic vs optimistic approaches.
  3. Advanced: A merchant's settlement account is receiving 50,000 transactions/second. How do you handle this hot spot?
  4. Systems: Design idempotency handling for a payment API. What happens on network timeout?
  5. Scale: Your ledger needs to support 1M TPS globally. Walk through your architecture.
  6. Edge case: Two transactions are submitted simultaneously, each spending the full balance. How do you ensure only one succeeds?
  7. Operations: How do you detect and recover from ledger inconsistencies?
  8. Compliance: How do you design for GDPR's right-to-erasure while maintaining an immutable audit trail?
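For question 4, a minimal sketch of idempotency-key handling (in-memory, with illustrative names; a real service would persist the table, as with the idempotency_key column in Appendix B, and expire old keys):

```python
# The first call with a key executes the side effect and stores its
# result; a retry with the same key returns the stored result instead
# of re-running the side effect, so a timeout-and-retry never double-charges.
results = {}                # idempotency_key -> stored response
balances = {"merchant": 0}

def pay(idempotency_key, amount):
    if idempotency_key in results:
        return results[idempotency_key]   # replay: no second charge
    balances["merchant"] += amount        # the side effect, run once
    response = {"status": "ok", "amount": amount}
    results[idempotency_key] = response
    return response

first = pay("key-123", 500)
retry = pay("key-123", 500)   # client timed out and retried
print(retry == first, balances["merchant"])   # True 500
```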

Good Luck!

Remember: The goal isn't to have a perfect answer—it's to demonstrate structured thinking, domain knowledge, and the ability to navigate tradeoffs. Ask clarifying questions, state your assumptions, and explain your reasoning.