Beyond the Blueprint

This is a personal blog about the human side of engineering: learning, growing, and sharing experiences along the way. Authored by Yuriy Polyulya.

A standalone thinking framework for distributed engineers. Perfect systems do not exist — not because engineers fail to build them, but because impossibility is formally provable. This series turns that formal result into a practical instrument: the achievable region that defines what is possible, the Pareto frontier where genuine trade-offs live, and a decision framework for choosing your operating point deliberately.
Part 1

The Impossibility Tax — How Formal Proofs Clear the Design Space Before You Start

CAP, FLP, SNOW, and HAT are not engineering constraints — they are proofs. Each one clears a corner of the design space before the first line of code is written: operating points that no implementation effort can reach, trade-offs that no optimization can dissolve. What the proofs leave behind is the achievable region — the set of positions that actually exist — and its Pareto frontier, where every real engineering decision lives. This post builds those objects, names the tax each theorem extracts, and maps the three movements available from any position: toward the frontier, along it, or expanding it.

Part 2

The Physics Tax — The Coherency Bill Your Hardware Runs Before the Protocol Speaks

Hardware runs a coherency bill on every distributed system before any protocol is chosen. Cache invalidation, NIC saturation, and memory bus contention impose a coherency cost that grows quadratically with node count under the Universal Scalability Law, and that cost sets a throughput ceiling no software optimization can move. Tail latency fans out geometrically through every microservice hop, invisible to average-latency dashboards. Both are irreducible. The Pareto Ledger (fitted coherency coefficients kappa+beta, measured N_max, coordinated-omission-free P99) converts these pre-protocol costs into documented numbers before any architecture decision is made.
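
As a rough illustration of the two effects named above, here is a minimal Python sketch of the standard Universal Scalability Law and of tail-latency fan-out. The function names and the parameter names `contention` and `coherency` are assumptions for this example and may not match the post's own kappa/beta notation. The fan-out function assumes independent backend calls.

```python
import math


def usl_throughput(n, contention, coherency, single_node_rate=1.0):
    """Throughput at n nodes under the Universal Scalability Law.

    contention: serialization penalty, linear in (n - 1)
    coherency:  crosstalk penalty, quadratic in n (the n * (n - 1) term)
    """
    return single_node_rate * n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def peak_node_count(contention, coherency):
    """Real-valued node count at which USL throughput peaks."""
    return math.sqrt((1 - contention) / coherency)


def fanout_slow_probability(per_call_slow, fanout):
    """Chance that a request touching `fanout` backends sees at least one
    slow call, assuming the calls are independent."""
    return 1 - (1 - per_call_slow) ** fanout


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the shape of the curve.
    for nodes in (1, 8, 31, 60):
        print(nodes, round(usl_throughput(nodes, 0.05, 0.001), 2))
    print("peak near", round(peak_node_count(0.05, 0.001), 1), "nodes")
    print("P(slow) at fan-out 100:", round(fanout_slow_probability(0.01, 100), 3))
```

Past the peak, adding nodes lowers throughput because the quadratic coherency term dominates the denominator.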

Part 3

The Logical Tax — Consistency is a Loan You Repay in Round Trips

Every consistency guarantee is a loan taken against latency: you borrow ordering and pay back in round trips. The consistency spectrum from strict serializability to eventual consistency is a price list — every level has a denominated RTT cost. Every consensus protocol sets a coherency coefficient beta that determines where N_max sits. Right-sizing the loan means choosing the minimum guarantee the application requires, implemented with the protocol that delivers it at the lowest beta the team can operate. This post prices each level, compares the protocols, and adds the read-path merge tax that conflict-free merge structures defer from writes to reads.

Part 4

The Stochastic Tax — AI Doesn't Escape the Frontier — It Just Navigates It Differently

AI expands the achievable region on new axes — accuracy, explainability, privacy — and automates navigation along them. It does not escape the frontier. Compression moves along the accuracy/latency trade-off; it does not dissolve it. A multi-objective RL navigator learns to find Pareto-optimal operating points; it does not create them. The stochastic tax prices what learning costs: fidelity gap between model and explanation, exploration budget spent acquiring policy knowledge, privacy budget that degrades accuracy under formal data-use constraints. All three stack on top of the physics and logical taxes already owed.

Part 5

The Reality Tax — Survival in a Non-Deterministic World

The Pareto frontier is not a line - it is a ribbon. Its width is dictated by environmental taxes exacted on every production system. Measurement interference shifts the coherency coefficient the moment observability is enabled. Cloud multi-tenancy injects stochastic jitter, transforming crisp hardware limits into probability clouds. State accumulation - LSM compaction debt, table bloat, heap fragmentation - degrades the operating point over time without any configuration changes. This post formalizes these forces as the Reality Tax: the systematic error term of distributed architecture.

Part 6

The Governance Tax — Four Gates Between Your Trade-off and Your Next Production Incident

Every architectural compromise already has a position in the trade-off space. The question is whether that position was chosen or accumulated. Four gates stand between an undocumented compromise and the incident that exposes it: measure the frontier, verify hard constraints, price the meta-trade-offs, enforce the safety boundary. For most decisions, two gates and four ADR fields are enough. The full procedure exists for AI-navigated systems and cross-team migrations where the stakes justify the overhead. An undocumented operating point is not a neutral default — it is a debt that compounds until production calls it in.

Edge systems can't treat disconnection as an exceptional error — it's the default condition. This series builds the formal foundations for systems that self-measure, self-heal, and improve under stress without human intervention, grounded in control theory, Markov models, and CRDT state reconciliation. Every quantitative claim comes with an explicit assumption set.
Part 1

Why Edge Is Not Cloud Minus Bandwidth

At the edge, a radio transmission costs 100x more energy than a local computation, and the network may be unreachable for hours. This article builds the formal foundation: how to model contested connectivity with Markov chains, when local autonomy mathematically beats cloud control, and what keeps autonomous control loops stable when they can't phone home.
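
One simple way to picture the Markov-chain connectivity model is a two-state chain (connected, disconnected). The sketch below is an assumption for illustration and not necessarily the model the article builds. `p_drop` and `p_reconnect` are hypothetical per-step transition probabilities.

```python
def stationary_connected(p_drop, p_reconnect):
    """Long-run fraction of time spent connected in a two-state Markov chain.

    p_drop:      probability of connected -> disconnected in one step
    p_reconnect: probability of disconnected -> connected in one step
    """
    return p_reconnect / (p_drop + p_reconnect)


def mean_outage_steps(p_reconnect):
    """Expected outage length in steps (outage duration is geometric)."""
    return 1.0 / p_reconnect


if __name__ == "__main__":
    # Hypothetical link: drops often, recovers slowly.
    print("availability:", stationary_connected(p_drop=0.1, p_reconnect=0.3))
    print("mean outage (steps):", mean_outage_steps(p_reconnect=0.3))
```

Even in this toy form, the chain separates two questions a single availability figure hides: how often the link is up, and how long a node must act on its own once the link is down.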

Part 2

Self-Measurement Without Central Observability

When the monitoring service is unreachable, anomaly detection has to run on the node being monitored. This article covers on-device detection, gossip health propagation with bounded staleness, Byzantine-tolerant aggregation, and a proxy-observer pattern for legacy hardware — along with a frank note on what happens when you miscalibrate your priors.

Part 3

Self-Healing Without Connectivity

Detection is the easy part — acting without making things worse is harder. This article works through the MAPE-K autonomic loop adapted for edge conditions: stability conditions, confidence-gated action thresholds, dependency-ordered recovery to prevent cascades, and a self-throttling law that keeps the loop from consuming the very resources it's trying to protect.

Part 4

Fleet Coherence Under Partition

When two clusters reconnect after hours apart, merging their state means choosing between information loss and accepting Byzantine-injected garbage — neither is acceptable. This article covers CRDT merge with HLC timestamps, a reputation-gated admission filter for Byzantine state, and a burst-process divergence model that's more realistic than the usual Poisson assumption.

Part 5

Anti-Fragile Decision-Making at the Edge

Resilience returns you to baseline; anti-fragility means coming out better than you went in. This article formalizes that distinction, shows why anti-fragile policies win under fleet-wide policy competition, and builds the bandit and Bayesian update machinery that makes improvement possible — with a caveat: the math only works if you defined success before the failure happened.

Part 6

The Constraint Sequence and the Handover Boundary

The right build order prevents sophisticated capabilities from collapsing before their foundations exist. This article derives the prerequisite graph, constraint migration, and phase gate framework for sequencing autonomic edge capabilities — then formalizes five handover constructs: predictive triggering for cognitive inertia, asymmetric trust dynamics, Merkle-gated command validation, semantic compression against alert fatigue, and the L0 physical interlock that no autonomic loop can override.

In distributed systems, solving the right problem at the wrong time is just an expensive way to die. We've all been to the optimization buffet - tuning whatever looks tasty until things feel 'good enough.' But here's the trap: your system will fail in a specific order, and each constraint gives you a limited window to act. The ideal system reveals its own bottleneck; if yours doesn't, that's your first constraint to solve. Your optimization workflow itself is part of the system under optimization.
Part 1

Why Latency Kills Demand When You Have Supply

Users abandon before experiencing content quality. No amount of supply-side optimization matters. Latency kills demand and gates every downstream constraint. Analysis based on Duolingo's business model and scale trajectory.

Part 2

Why Protocol Choice Locks Physics For Years

Once latency is validated as the demand constraint, protocol choice determines the physics floor. This is the second constraint - and it's a one-time decision with 3-year lock-in.

Part 3

Why GPU Quotas Kill Creators Before Content Flows

While demand-side latency is being solved, supply infrastructure must be prepared. Fast delivery of nothing is still nothing. GPU quotas - not GPU speed - determine whether creators wait 30 seconds or 3 hours. This is the third constraint in the sequence - invest in it now so it doesn't become a bottleneck when protocol migration completes.

Part 4

Why Cold Start Caps Growth Before Users Return

New users arrive with zero history. Algorithms default to what's popular - which on educational platforms means beginner content. An expert sees elementary material three times and leaves. The personalization that retains power users actively repels newcomers. This is the fourth constraint in the sequence.

Part 5

Why Consistency Bugs Destroy Trust Faster Than Latency

Users tolerate slow loads. They don't tolerate lost progress. A 16-day streak reset at midnight costs more than 300ms of latency ever could. At 3M DAU, eventual consistency creates 10.7M user-incidents per year, putting $6.5M in annual revenue at risk through the Loss Aversion Multiplier. Client-side resilience with 25x ROI prevents trust destruction that no support ticket can repair. This is the fifth constraint in the sequence.

Part 6

The Constraint Sequence Framework

A synthesis of Theory of Constraints, causal inference, reliability engineering, and second-order cybernetics into a unified methodology for engineering systems under resource constraints. The framework provides formal constraint identification, causal validation protocols, investment thresholds, dependency ordering, and explicit stopping criteria. Unlike existing methodologies, it includes the meta-constraint: the optimization workflow itself competes for the same resources as the system being optimized.

A comprehensive series exploring the design and architecture of real-time advertising platforms. From system foundations and ML inference pipelines to auction mechanisms and production operations, we dive deep into building systems that handle 1M+ QPS while maintaining sub-150ms latency at P99.
Part 1

Real-Time Ads Platform: System Foundation & Latency Engineering

Building the architectural foundation for ad platforms serving 1M+ QPS with 150ms P95 latency. Deep dive into requirements analysis, latency budgeting across critical paths, resilience through graceful degradation, and P99 tail latency defense using low-pause GC technology.

Part 2

Dual-Source Revenue Engine: OpenRTB & ML Inference Pipeline

Implementing the dual-source architecture that generates 30-48% more revenue by parallelizing internal ML-scored inventory (65ms) with external RTB auctions (100ms). Deep dive into OpenRTB protocol implementation, GBDT-based CTR prediction, feature engineering, and timeout handling strategies at 1M+ QPS.

Part 3

Caching, Auctions & Budget Control: Revenue Optimization at Scale

Building the data layer that enables 1M+ QPS with sub-10ms reads through L1/L2 cache hierarchy achieving 85% hit rate. Deep dive into eCPM-based auction mechanisms for fair price comparison across CPM/CPC/CPA models, and distributed budget pacing using Redis atomic counters with proven ≤1% overspend guarantee.
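
To make the eCPM comparison concrete, here is a minimal sketch of the common normalization of CPM, CPC, and CPA bids to expected revenue per thousand impressions. The function name, the parameters, and the use of predicted click and conversion probabilities are assumptions for illustration; the article's own mechanism may differ.

```python
def ecpm(pricing_model, bid, p_click=None, p_conversion=None):
    """Normalize a bid to expected revenue per 1000 impressions.

    CPM: bid is already a price per 1000 impressions.
    CPC: bid is paid per click, so scale by the predicted click probability.
    CPA: bid is paid per action, so scale by click and conversion probability.
    """
    if pricing_model == "CPM":
        return bid
    if pricing_model == "CPC":
        return bid * p_click * 1000
    if pricing_model == "CPA":
        return bid * p_click * p_conversion * 1000
    raise ValueError(f"unknown pricing model: {pricing_model}")


if __name__ == "__main__":
    # Hypothetical bids competing for the same impression.
    candidates = {
        "cpm_ad": ecpm("CPM", 4.0),
        "cpc_ad": ecpm("CPC", 0.5, p_click=0.02),
        "cpa_ad": ecpm("CPA", 20.0, p_click=0.02, p_conversion=0.05),
    }
    winner = max(candidates, key=candidates.get)
    print(candidates, "winner:", winner)
```

Putting every bid on the same per-impression scale lets the auction rank ads sold under different pricing models against each other.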

Part 4

Production Operations: Fraud, Multi-Region & Operational Excellence

Taking ad platforms from design to production at scale. Deep dive into pattern-based fraud detection (20-30% bot filtering), active-active multi-region deployment with 2-5min failover, zero-downtime schema evolution, clock synchronization for financial ledgers, observability with error budgets, zero-trust security, and chaos engineering validation.

Part 5

Complete Implementation Blueprint: Technology Stack & Architecture Guide

Series capstone: the complete technology stack with decision rationale. Why each choice matters (Java 21 + ZGC for GC pauses, CockroachDB for cost efficiency, Linkerd for latency). Includes cluster sizing, configuration patterns, system integration, and an implementation roadmap. Validates that all requirements are met. A reference architecture for 1M+ QPS real-time ads platforms.