
Fleet Coherence Under Partition


Prerequisites

The three preceding parts addressed capabilities that live within a single node or a cluster that can communicate internally. Why Edge Is Not Cloud Minus Bandwidth established what the system faces: connectivity regimes where partition is the default, and capability levels that define what must be preserved. Self-Measurement Without Central Observability established how the system knows its own state: local anomaly detection, gossip-based health propagation with bounded staleness, and Byzantine-tolerant aggregation. Self-Healing Without Connectivity established what the system does about failures: the MAPE-K autonomic loop, confidence-gated healing actions, recovery ordering by dependency, and cascade prevention.

Each of those capabilities assumed a cluster’s internal state was knowable. The remaining problem is what happens between clusters.

When CONVOY’s vehicles split at a mountain pass, each group continues operating independently. Each heals its own members, updates its own state, makes its own authority-delegated decisions — all drawing on the contested-connectivity baseline, self-measurement, and self-healing frameworks. When the groups reunite, their states have diverged. Neither group made an error. But they are inconsistent, and some of their decisions may conflict.

Fleet coherence is the problem of managing this divergence: bounding how far state can drift during partition, and resolving conflicts efficiently at reconnection. The CAP theorem [1, 2] makes the trade-off explicit: no distributed system can simultaneously guarantee consistency, availability, and partition tolerance. The contested-connectivity, self-measurement, and self-healing articles committed this series to availability — systems keep operating during partition. This article addresses what that commitment costs and how to pay it: the reconciliation protocols, CRDT semantics [3], and authority structures that make eventual consistency [5] tractable in physical edge deployments.


Overview

Fleet coherence maintains consistent state when the network prevents communication. Each concept integrates theory with design consequence:

| Concept | Formal Contribution | Design Consequence |
|---|---|---|
| State Divergence | \(D(\tau) = 1 - e^{-\lambda\tau}\) (Poisson lower bound) | Use burst-process Proposition 41b for sizing; Poisson gives conservative floor |
| CRDTs | Merge \(\sqcup\) forms join-semilattice | Choose CRDT type matching state semantics |
| Authority Tiers | | Delegate bounded authority during partition |
| Merkle Reconciliation | \(O(k \log(n/k) + k)\) messages for \(k\) divergent items | Efficient sync for large state with sparse changes |
| Entity Resolution | Confidence update | Merge multi-observer data probabilistically |

This extends eventual consistency [5] and CRDTs [3] for physical edge deployments.


Opening Narrative: CONVOY Split

CONVOY: 12 vehicles traverse a mountain pass. At km 47, terrain creates a radio shadow.

Forward group (vehicles 1-5) receives SATCOM: bridge at km 78 destroyed, reroute via Route B. They adjust course.

Rear group (vehicles 6-12) receives ground relay minutes later: Route B blocked by landslide, continue to bridge. They maintain course.

When both groups emerge from the radio shadow with full connectivity, their route plans disagree: the forward group is committed to Route B, the rear group to the original bridge route.

The coherence challenge: physical positions cannot be reconciled, but fleet state — route plan, decisions, threat assessments — must converge to a consistent view.


The Coherence Challenge

Local autonomy is required during partition — but it causes state divergence. Each cluster makes correct decisions given its own information. When they reconnect, their states are inconsistent, and some decisions may conflict. Neither cluster made an error; the inconsistency is a structural consequence of operating without communication.

This structural problem is addressed by bounding divergence with a formal model (Proposition 41), choosing data structures with merge-safe semantics (CRDTs), and resolving conflicts using authority tiers that determine who wins when clusters disagree. More predetermined coordination rules enable more coherence but reduce adaptability — a fully pre-specified decision policy achieves perfect coherence but cannot adapt to novel situations. The right balance gives critical decisions deterministic resolution while leaving operational decisions flexible.

Local Autonomy vs Fleet Coordination

The contested-connectivity, self-measurement, and self-healing articles developed local autonomy — essential, since without it partition means failure. But local autonomy creates coordination problems. Independent actions may complement each other (Node A handles zone X, Node B handles zone Y — a good outcome), duplicate effort (both handle zone X, wasting resources), or conflict (incompatible actions leading to mission failure).

The table below contrasts the two operating modes across four dimensions to make the tension concrete; the correct design point lies between these extremes.

| Dimension | Local Autonomy | Fleet Coordination |
|---|---|---|
| Decision speed | Fast (local) | Slow (consensus) |
| Information used | Local sensors only | Fleet-wide picture |
| Failure mode | Suboptimal but functional | Complete if quorum lost |
| Partition behavior | Continues operating | Blocks waiting for consensus |

Coordination without communication is only possible through predetermined rules. If every node follows the same rules and starts with the same information, they will make the same decisions. But partition means information diverges - different nodes observe different events.

The tradeoff: more predetermined rules enable more coherence, but reduce adaptability. A fleet that pre-specifies every possible decision achieves perfect coherence but cannot adapt to novel situations. A fleet with maximum adaptability achieves minimum coherence - each node does its own thing.

Edge architecture must find the balance: enough rules for critical coherence, enough flexibility for operational adaptation.

State Divergence Sources

Definition 57 (State Divergence). For system states \(\Sigma_A\) and \(\Sigma_B\) represented as sets of key-value pairs (in this article \(\Sigma\) denotes the key-value CRDT state store — distinct from \(\Sigma\) in Self-Healing Without Connectivity, where it denotes the health classification space), the divergence \(D(\Sigma_A, \Sigma_B)\) is the normalized symmetric difference:

\[
D(\Sigma_A, \Sigma_B) = \frac{|\Sigma_A \,\triangle\, \Sigma_B|}{|\Sigma_A \cup \Sigma_B|}
\]

where \(D \in [0, 1]\), with \(D = 0\) indicating identical states and \(D = 1\) indicating completely disjoint states.

Physical translation: the numerator counts the key-value pairs that differ between the two states — entries present in one but not the other, or present in both but with different values. Dividing by the total combined key space normalizes to \([0,1]\). At \(D = 0\) the replicas are byte-for-byte identical; at \(D = 1\) they share no information at all. In practice, \(D\) between 0.05 (illustrative value) and 0.15 (illustrative value) after a short partition is common for tactical edge systems.

A single fleet-wide \(D\) value hides safety-critical divergence: \(D\) must be tracked separately per object class (\(D_{\max} \approx 0.1\) for safety-critical objects; \(\approx 0.3\) for operational data stores), since a low aggregate can mask full divergence on a critical subset.

Analogy: Two navigators using dead reckoning from the same starting point — divergence is how far apart their estimated positions drift after \(T\) minutes without comparing notes. The longer they go without communicating, the more their maps disagree.

Logic: Under a Poisson update model, \(D(\tau) = 1 - e^{-\lambda\tau}\) gives the expected fraction of keys that diverge after partition duration \(\tau\). This is a lower bound; burst processes produce higher divergence at burst onset (Proposition 41b).
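The divergence metric and its Poisson expectation can be sketched in a few lines of Python. This is an illustrative sketch, not code from the series: the function names are invented, and `divergence` counts differing keys over the combined key space, per the physical translation above.

```python
import math

def divergence(sigma_a: dict, sigma_b: dict) -> float:
    """Definition 57 sketch: fraction of the combined key space on which
    the two key-value stores disagree (missing key counts as a difference)."""
    keys = set(sigma_a) | set(sigma_b)
    if not keys:
        return 0.0  # two empty stores are byte-for-byte identical
    missing = object()  # sentinel so an absent key never equals a stored value
    differing = sum(1 for k in keys
                    if sigma_a.get(k, missing) != sigma_b.get(k, missing))
    return differing / len(keys)

def expected_divergence(lam: float, tau: float) -> float:
    """Poisson model: E[D(tau)] = 1 - exp(-lambda * tau)."""
    return 1.0 - math.exp(-lam * tau)

# CONVOY-flavored example: the two groups' stores disagree on every key.
forward = {"route": "B", "bridge_km78": "destroyed"}
rear = {"route": "bridge", "route_B": "landslide"}
print(divergence(forward, rear))                  # all 3 keys disagree -> 1.0
print(round(expected_divergence(0.01, 100), 2))   # 0.63 at tau = 1/lambda
```

The second print reproduces the \(1/\lambda\) time-constant claim: at \(\tau = 100\) s with \(\lambda = 0.01\)/s, expected divergence crosses 63%.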

During partition, state diverges through multiple mechanisms:

First, environmental inputs differ: each cluster observes different events. Cluster A sees threat T1 approach from the west; Cluster B, on the other side of the partition, sees nothing. Their threat models diverge.

Second, decisions are made independently. Self-healing requires local decisions. Cluster A decides to redistribute workload after node failure. Cluster B, unaware of the failure, continues assuming the failed node is operational. Their understanding of fleet configuration diverges.

Third, time drift occurs without network time synchronization. After 6 hours (illustrative value) of partition at 100ppm (illustrative value) drift, clocks differ by 2 seconds (illustrative value). Timestamps become unreliable for ordering events.

Fourth, message loss during the establishment of the partition affects propagation: some gossip messages [6] reach some nodes before the partition fully severs. The partial propagation creates uneven knowledge — Node A heard about event E, Node B did not. Their histories diverge.

Real edge update streams alternate between violent high-rate burst windows and near-silent quiet windows; the burst process model replaces the uniform-rate Poisson assumption with a two-phase alternating renewal process that captures this temporal clustering.

Definition 57b (Burst Process). State-changing events follow an alternating renewal process with two phases. The burst phase lasts \(\tau_{\text{burst}}\) at event rate \(F \cdot \lambda\); the quiet phase lasts \(T_{\text{quiet}}\) at the much lower rate \(0.1 \cdot \lambda\).

Typical parameters for tactical edge:

| System | \(\tau_{\text{burst}}\) | \(T_{\text{quiet}}\) | \(F\) |
|---|---|---|---|
| RAVEN | 5 s | 300 s | 8 |
| CONVOY | 10 s | 600 s | 12 |

Fano factor \(F\) measured from operational logs.

The burst and quiet phase durations in this table are illustrative values calibrated from CONVOY and RAVEN exercise logs and should be remeasured for any new deployment.

Why Poisson fails for tactical edge: Operational update streams alternate between burst and quiet phases ( Definition 57b ). The Fano factor for tactical edge systems is 3–79 (illustrative value) (measured on CONVOY exercises), not \(F = 1.0\) as Poisson assumes. Burst events cluster temporally — contact events, terrain transitions, and threat detections arrive in correlated waves, followed by extended quiet periods. A uniform Poisson rate collapses this structure: it underestimates divergence at burst onset and overestimates it during quiet, making it unreliable for buffer sizing in either regime. The Poisson result below ( Proposition 41 ) serves as a lower bound; use Proposition 41b for all design calculations.

Watch out for: the burst phase parameters \(\tau_{\text{burst}}\), \(T_{\text{quiet}}\), and \(F\) must be measured from operational logs of the specific fleet and mission type; \(F\) ranges from 3 to 79 across CONVOY exercises depending on mission phase, and using a mismatched \(F\) from a different regime reverses the direction of the buffer-sizing error — underestimating by as much as a factor of \(F\) at burst onset.

Proposition 41 (Divergence Growth Rate — Poisson Lower Bound). If state-changing events arrive according to a Poisson process with rate \(\lambda\), the expected divergence after partition duration \(\tau\) is:

\[
E[D(\tau)] = 1 - e^{-\lambda\tau}
\]

The longer two clusters are apart, the more of their shared state will have diverged — this formula gives the lower bound on how much.

Physical translation: \(e^{-\lambda\tau}\) is the probability that a given key has received zero updates during the partition window — i.e., the probability it is still synchronized. The complement \(1 - e^{-\lambda\tau}\) is the probability it has diverged. At \(\tau = 0\): zero divergence. As \(\tau \to \infty\): divergence approaches 1.0. The \(1/\lambda\) time constant marks when expected divergence crosses 63% (theoretical bound): a system with \(\lambda = 0.01\) (illustrative value) updates/s reaches 63% (theoretical bound) divergence after 100 seconds (illustrative value) of partition.

This establishes the Poisson baseline — a lower bound on expected divergence valid for theoretical comparison only. For buffer sizing, the burst-process result in Proposition 41b provides a tighter, higher bound by accounting for the burstiness of real edge workloads.

Proof sketch: Model state as a binary indicator per key: identical (0) or divergent (1). Under independent Poisson arrivals with rate \(\lambda\), the probability a given key remains synchronized is \(e^{-\lambda\tau}\). The expected fraction of divergent keys follows the complementary probability \(1 - e^{-\lambda\tau}\). For sparse state changes, this provides a tight lower bound (lower bound under burst processes: actual burst-driven divergence exceeds this Poisson baseline — see Proposition 41b for the burst-process correction).

Empirical status: The Poisson rate \(\lambda\) must be measured at the 95th-percentile burst rate from operational logs; using mean rate understates divergence by a factor of \(F\) (Fano factor 3–79 for tactical edge).

This gives the minimum expected divergence under the least bursty possible workload. Burst-driven workloads exceed this bound — see Proposition 41b below.

Watch out for: the \(\lambda\) in this formula must be the worst-case burst rate, not the mean rate; using mean rate produces a bound that understates divergence by a factor of \(F\) at burst onset, which causes reconciliation buffers sized from this result to be undersized by the same factor — see Proposition 41b for the burst-corrected bound.

Primary design model: For most edge deployments, Proposition 41b (burst process) is the design-relevant result. Proposition 41 establishes the Poisson baseline for comparison. Use Proposition 41b for all buffer sizing and reconciliation planning.

Proposition 41b [BOUND] (Burst-Averaged Divergence). For an alternating burst/quiet process (Definition 57b), expected divergence after partition duration \(\tau\) is:

\[
E[D(\tau)] = 1 - \left[ p_B \, e^{-F\lambda \min(\tau,\, \tau_{\text{burst}})} \, e^{-0.1\lambda (\tau - \tau_{\text{burst}})^{+}} + (1 - p_B) \, e^{-0.1\lambda\tau} \right]
\]

In CONVOY and RAVEN workloads, updates arrive in violent bursts; this proposition gives the design-relevant divergence that accounts for that clustering rather than assuming steady-rate updates.

where \(p_B = \tau_{\text{burst}} / (\tau_{\text{burst}} + T_{\text{quiet}})\) is the probability that partition onset falls within a burst phase, \(\lambda\) is the mean update rate, \(F\) is the Fano factor, and \((x)^{+} = \max(0, x)\).

For worst-case buffer sizing, condition on partition onset coinciding with burst start:

\[
D_{\text{worst}}(\tau) = 1 - e^{-F\lambda \min(\tau,\, \tau_{\text{burst}})} \, e^{-0.1\lambda (\tau - \tau_{\text{burst}})^{+}}
\]

For \(\tau \gg \tau_{\text{burst}}\), the burst exponent saturates and the quiet term dominates growth:

\[
D_{\text{worst}}(\tau) \approx 1 - e^{-F\lambda \tau_{\text{burst}}} \, e^{-0.1\lambda (\tau - \tau_{\text{burst}})}
\]

Proof sketch: Model each key as a binary indicator (synchronized / diverged). During the burst epoch of duration \(\min(\tau, \tau_{\text{burst}})\), events arrive at rate \(F \cdot \lambda\); the probability a given key survives synchronized is \(e^{-F\lambda \min(\tau,\, \tau_{\text{burst}})}\). During the subsequent quiet epoch, the surviving fraction faces event rate \(0.1 \cdot \lambda\); the joint survival probability is the product of the two exponentials.

The worst-case analysis conditions on partition onset at burst start, removing the mixture weight and accumulating burst divergence first — this is the relevant bound for buffer sizing since it maximises the fraction of keys diverged per unit partition time. For \(\tau \le \tau_{\text{burst}}\), the formula reduces to the Poisson model at elevated rate \(F \cdot \lambda\), recovering the original Proposition 41 result.

Empirical status: Fano factors \(F = 8\) (illustrative value) (RAVEN) and \(F = 12\) (illustrative value) (CONVOY) were measured on exercise logs; real-deployment burst factors depend on operational tempo and should be validated from field data before using these values for buffer sizing.

Physical translation: Burst arrivals — short periods of many concurrent writes, like a STOCKSYNC warehouse pair reconciling a full inventory after a 4-hour partition — inflate the apparent divergence bound temporarily. This proposition bounds that peak, showing that the time-averaged divergence remains finite even under bursty workloads, provided the burst arrival rate stays below the gossip dissipation rate.

In other words, divergence grows quickly at first and then saturates: a long partition does not drive divergence much higher than a medium one, because most keys diverge within the first few event intervals.

Watch out for: this result assumes that burst and quiet epochs alternate independently of partition events; if operational tempo is correlated with the communication environment — high-activity periods both generate more writes and cause more link outages — then \(F\) and \(\lambda\) are not independently measurable, and the two-phase model underestimates effective divergence.
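The two bounds can be compared numerically. This is a sketch under stated assumptions: the quiet rate is fixed at \(0.1\lambda\) per Definition 57b, \(\lambda = 0.01\) events/s/key is an assumed illustrative rate, and \(F = 12\), \(\tau_{\text{burst}} = 10\) s are the CONVOY values from the parameter table.

```python
import math

def d_poisson(tau: float, lam: float) -> float:
    """Proposition 41: uniform-rate Poisson divergence."""
    return 1.0 - math.exp(-lam * tau)

def d_worst(tau: float, lam: float, fano: float, tau_burst: float) -> float:
    """Proposition 41b worst case: partition onset coincides with burst start.
    Burst epoch runs at F*lambda, the remaining quiet epoch at 0.1*lambda."""
    burst = min(tau, tau_burst)
    quiet = max(0.0, tau - tau_burst)
    return 1.0 - math.exp(-fano * lam * burst) * math.exp(-0.1 * lam * quiet)

# CONVOY-style parameters: F = 12, tau_burst = 10 s; lam = 0.01 is assumed.
for tau in (5, 60, 600):
    print(tau, round(d_poisson(tau, 0.01), 3), round(d_worst(tau, 0.01, 12, 10.0), 3))
```

The output illustrates the claim above: the mean-rate Poisson figure understates divergence during the burst (at \(\tau = 5\) s it reports about 0.05 versus 0.45 for the burst model) and overstates it deep into the quiet phase (at \(\tau = 600\) s, about 0.998 versus 0.833).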

Corollary 5. Reconciliation cost is linear in divergence: \(C_{\text{recon}} = c \cdot D \cdot |S|\), where \(c\) is the per-item sync cost, \(D\) is the divergence fraction, and \(|S|\) is the total number of state items.

In other words, the total work at reconnection scales with how many key-value pairs diverged multiplied by the constant cost to transfer and merge one item; sizing the reconciliation budget requires estimating both the divergence fraction and the total state size.

Watch out for: the linearity holds only when items can be merged independently at constant cost \(c\); if items have cross-key ordering constraints — for example, inventory records that must be reconciled in arrival order — then conflict resolution cascades and actual reconciliation cost grows super-linearly with divergence.

Corollary 5b (Reconciliation Buffer Sizing). Buffer size should accommodate worst-case divergence over the maximum expected partition duration \(\tau_{\max}\):

\[
B_{\min} = D_{\text{worst}}(\tau_{\max}) \cdot |S| \cdot b_{\text{item}}
\]

This gives the minimum on-device buffer in bytes needed to absorb worst-case post-partition divergence. \(D_{\text{worst}}\) comes from the burst-corrected result of Proposition 41b; \(b_{\text{item}}\) ranges from 64 (illustrative value) to 256 (illustrative value) bytes per object depending on state encoding.

Here \(|S|\) is the total number of state items and \(b_{\text{item}}\) is bytes per item. For RAVEN (\(|S| = 500\) (illustrative value), \(\lambda = 2\) (illustrative value) events/s, \(F = 8\) (illustrative value), \(\tau_{\text{burst}} = 5\) s (illustrative value), \(T_{\text{quiet}} = 300\) s (illustrative value)):

Both exponents are large — the burst phase alone (\(F\lambda\tau_{\text{burst}} = 80\) (theoretical bound under illustrative parameters)) saturates divergence near 1.0 within seconds. Buffer bytes \(\approx 16\) KB (theoretical bound under illustrative parameters) (size for full state copy). For short partitions (\(\tau \le \tau_{\text{burst}}\)), the formula reduces to Poisson at elevated rate \(F \cdot \lambda\).

Practical implications: For short partitions (\(\tau \le \tau_{\text{burst}}\)), the burst assumption is correct and buffers should be sized using \(1 - e^{-F\lambda\tau}\). For long partitions (\(\tau \gg T_{\text{quiet}}\)), divergence saturates near 100% and a full state copy should be buffered. For medium partitions (\(\tau_{\text{burst}} < \tau \lesssim T_{\text{quiet}}\)), the full formula from Proposition 41b applies.

For RAVEN (\(\lambda = 2\), \(F = 8\)), the large \(F \cdot \lambda = 16\) means divergence saturates within seconds of any burst, so buffer sizing defaults to full state copy for all but the briefest partitions.

For CONVOY (\(\tau_{\text{burst}} = 10\) s, \(T_{\text{quiet}} = 600\) s, \(F = 12\)), medium partitions (10–600 s) are common during terrain-induced shadows. These require the full two-phase formula to avoid over-allocating buffer for the extended quiet period.

Watch out for: the worst-case formula conditions on partition onset coinciding exactly with burst start; if operational data shows that partitions are disproportionately triggered by the same high-tempo events that drive elevated update rates — communication outages caused by the same interference producing the burst — the burst-onset coincidence is not a conservative edge case but the typical case, and \(D_{\text{worst}}\) should be treated as the expected divergence, not the worst-case.
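The buffer-sizing corollary reduces to a one-function calculation. This sketch uses the illustrative RAVEN parameters above; \(b_{\text{item}} = 64\) bytes is an assumed value from the stated 64–256 range, and the function name is invented.

```python
import math

def reconciliation_buffer_bytes(tau_max, lam, fano, tau_burst,
                                n_items, bytes_per_item):
    """Corollary 5b sketch: B_min = D_worst(tau_max) * |S| * b_item, rounded up.
    D_worst conditions on partition onset at burst start (Proposition 41b)."""
    burst = min(tau_max, tau_burst)
    quiet = max(0.0, tau_max - tau_burst)
    d_worst = 1.0 - math.exp(-fano * lam * burst) * math.exp(-0.1 * lam * quiet)
    return math.ceil(d_worst * n_items * bytes_per_item)

# RAVEN: F*lambda = 16 saturates divergence within seconds, so even a 60 s
# partition demands essentially a full state copy.
print(reconciliation_buffer_bytes(tau_max=60, lam=2, fano=8, tau_burst=5,
                                  n_items=500, bytes_per_item=64))  # 32000
```

With these parameters \(D_{\text{worst}} \approx 1.0\), so the result is simply \(|S| \cdot b_{\text{item}}\): the full-state-copy regime described above.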

Cognitive Map: State divergence is not a failure — it is the mathematically inevitable consequence of operating without communication. Proposition 41 quantifies it: divergence grows as \(1 - e^{-\lambda\tau}\), saturating as most keys diverge within the first burst epoch. The burst-aware extension (Proposition 41b) is essential for sizing reconciliation buffers — Poisson underestimates divergence at burst onset and overestimates it during the quiet phase. The key design output from this section is the minimum reconciliation buffer: \(B_{\min} = D_{\text{worst}}(\tau_{\max}) \cdot |S| \cdot b_{\text{item}}\). For RAVEN, burst saturation means buffering a full state copy regardless of partition duration. Next: choosing data structures with provably safe merge semantics eliminates the “who wins” conflict at reconciliation time.


Conflict-Free Data Structures

When two clusters reconnect after partition, their states have diverged. A naive merge — last-writer-wins by wall-clock timestamp — is incorrect under clock drift, and arbitrary merge logic can produce inconsistent state that violates application invariants. Choosing data structures (CRDTs) [3] whose merge function is provably commutative, associative, and idempotent guarantees that any two replicas receiving the same set of updates converge to the same final state regardless of merge order or network delay. The right CRDT type depends on the state semantics — LWW-Register is simplest but strategically manipulable (faster clocks always win), intersection is safest for authorization but produces false negatives, and the merge function embeds a fairness decision that must be made explicit to prevent silent design errors.

CRDTs at the Edge

Definition 58 (Conflict-Free Replicated Data Type). A state-based CRDT is a tuple \((S, s^0, q, u, m)\) where \(S\) is the state space, \(s^0\) is the initial state, \(q\) is the query function, \(u\) is the update function, and \(m: S \times S \rightarrow S\) is a merge function satisfying:

- Commutativity: \(m(s_1, s_2) = m(s_2, s_1)\)
- Associativity: \(m(m(s_1, s_2), s_3) = m(s_1, m(s_2, s_3))\)
- Idempotency: \(m(s, s) = s\)

These properties make \((S, m)\) a join-semilattice, guaranteeing convergence regardless of merge order.

Analogy: Two doctors independently updating the same patient record while the hospital network was down — CRDTs let you combine both records without losing either doctor’s updates, because every field has a deterministic winner rule. You don’t need to call a meeting; the rules decide.

Logic: The three semilattice properties — commutativity \(m(s_1, s_2) = m(s_2, s_1)\), associativity, and idempotency \(m(s, s) = s\) — guarantee that any merge order produces the same final state once all updates are exchanged.

    
    sequenceDiagram
    participant N1 as Node 1
    participant N2 as Node 2
    participant M as Merge Engine
    Note over N1,N2: Partition — both write independently
    N1->>N1: write(key=A, val=X, ts=10)
    N2->>N2: write(key=A, val=Y, ts=12)
    Note over N1,N2: Reconnection
    N1->>M: send delta(A=X, ts=10)
    N2->>M: send delta(A=Y, ts=12)
    M->>M: LWW rule: max(ts=10,ts=12) keeps Y
    M->>N1: sync(A=Y)
    M->>N2: sync confirmed
    Note over N1,N2: Converged: both see A=Y
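The LWW rule in the sequence above can be sketched as a minimal state-based register. The `(logical_time, node_id)` timestamp tuple here is an assumed stand-in for the HLC timestamps of Definition 61; plain wall-clock floats would reintroduce the clock-drift problem discussed below.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    ts: tuple     # (logical_time, node_id): totally ordered, deterministic tiebreak
    value: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the write with the larger timestamp; node id breaks ties.
        return self if self.ts >= other.ts else other

n1 = LWWRegister(ts=(10, "n1"), value="X")   # Node 1 writes during partition
n2 = LWWRegister(ts=(12, "n2"), value="Y")   # Node 2 writes concurrently

merged = n1.merge(n2)
print(merged.value)                  # "Y": the later write wins, as in the diagram
assert n1.merge(n2) == n2.merge(n1)  # commutative: merge order is irrelevant
assert n1.merge(n1) == n1            # idempotent: re-merging changes nothing
```

The two assertions are exactly the semilattice properties of Definition 58; associativity follows the same pattern with a third replica.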

State schema composition: the concrete state type \(S\) for fleet autonomic operation includes three logical layers: (1) mission state fields (coordinates, task assignments, resource levels) using standard CRDT types; (2) health state fields from Synthetic Health Metric and observation regime counters, using Last-Write-Wins semantics within a single node’s entries; and (3) trust and reputation scores from the Peer-Validation Layer ( Definition 64 ), using monotonically-increasing counters. Layer 2 weighting rules from Self-Healing Without Connectivity apply unchanged within the CRDT merge function.

Definition 58b (Reputation-Weighted Merge). For a fleet of \(n\) nodes with reputation weights \(w_i \in [0,1]\) (\(\sum_i w_i = 1\)) and a global trust threshold \(\theta\), define the reputation-weighted join \(\sqcup_W\) as a two-stage operation:

Stage 1 — Byzantine admission filter: admit an update only if the collective weight of the nodes attesting to it meets the trust threshold,

\[
A(U) = \left\{ u \in U : \sum_{i \,\in\, \text{attest}(u)} w_i \ge \theta \right\}
\]

Stage 2 — standard semilattice merge on admitted updates:

\[
s' = \bigsqcup_{u \in A(U)} u
\]

where \(\sqcup\) is the standard CRDT join from Definition 58. Reputation weights are initialized to \(w_i = 1/n\) and updated from Phase 0 attestation results and historical accuracy. Default threshold: \(\theta = 2/3\) (threshold — requires BFT quorum of 2f+1 with f < n/3), matching the BFT quorum of Proposition 48.

Semilattice properties under \(\sqcup_W\): the underlying \(\sqcup\) in Stage 2 retains commutativity, associativity, and idempotency for any fixed admitted set. \(\sqcup_W\) itself is a quorum-admission gate rather than a classical semilattice operator — it does not satisfy commutativity over all inputs by design, because Byzantine tolerance requires distinguishing honest from dishonest contributors.

When the number of Byzantine nodes satisfies \(f < n/3\), the honest quorum dominates Stage 1 and Stage 2 convergence is guaranteed. A compromised node with attestation weight \(w_i < \theta\) cannot alone drive a state update — its contribution is discarded in Stage 1 regardless of the CRDT value it presents.

Corollary (Poison Resistance): A “signed but compromised” node cannot poison swarm state unless it controls a coalition with collective weight \(> 1 - \theta\). For \(\theta = 2/3\) (threshold — requires BFT quorum of 2f+1 with f < n/3) and uniform weights, this requires \(f > n/3\) Byzantine nodes — exactly the BFT threshold.

Connection to semantic convergence ( Definition 5b ): Definition 58b guarantees data convergence — every admitted update is data-consistent after Stage 2 join. It does not guarantee \(\gamma = 1\): merged state may still violate system policy even when no Byzantine node contributed (e.g., two clusters both committed the same exclusive resource independently). For the stability condition and its effect on reconciliation cost, see Definition 5b .

Note: \(\sqcup_W\) is a gated union operator, not a classical semilattice. For any fixed admitted set (all quorum checks pass), it satisfies the commutativity, associativity, and idempotency of Definition 58 . The admission gate is monotone: once a node’s trust score exceeds \(\theta\), it remains admitted absent explicit revocation — so the CRDT convergence guarantee ( Proposition 42 ) holds within the admitted partition.

Semilattice preservation: the standard CRDT convergence proof ( Proposition 42 ) requires the merge function to be associative, commutative, and idempotent. The Byzantine admission gate prepended in this definition must also satisfy these three properties. The gate is designed to be idempotent (applying it twice to the same inputs yields the same result) because it uses deterministic reputation scores. Commutativity and associativity follow because the gate either admits or rejects entries independently of merge order. If a deployment modifies the gate logic to use state-dependent admission (e.g., a threshold that changes during merge), these properties must be re-verified.

In other words, any two replicas that have received the same set of updates will reach the same final state, regardless of the order in which they applied those updates or exchanged state with each other.
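The two-stage shape of Definition 58b can be sketched as follows. This is an illustrative sketch under stated assumptions: each update carries the ids of its attesting nodes, Stage 2 is a G-Set join (set union), and the `attesters` field and `gated_merge` name are invented for this example.

```python
def gated_merge(updates, weights, theta=2/3):
    """Stage 1: admit an update only if its attesters' collective weight
    meets the trust threshold theta. Stage 2: semilattice join (set union)."""
    admitted = [u for u in updates
                if sum(weights.get(n, 0.0) for n in u["attesters"]) >= theta]
    state = set()
    for u in admitted:
        state |= u["payload"]   # G-Set join: commutative, associative, idempotent
    return state

weights = {f"n{i}": 0.25 for i in range(4)}   # uniform weights, n = 4
quorum_update = {"attesters": ["n0", "n1", "n2"], "payload": {"zone_A_clear"}}
lone_update = {"attesters": ["n3"], "payload": {"forged_entry"}}  # weight 0.25 < 2/3

print(gated_merge([quorum_update, lone_update], weights))  # {'zone_A_clear'}
```

The lone update is discarded in Stage 1, illustrating the poison-resistance corollary: a single signed-but-compromised node carries weight \(1/4 < \theta\), so its payload never reaches the join.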

Compute Profile: CPU: \(O(n \log n)\) per merge — LWW-register and G-Set merges are \(O(n)\), but semantic commit ordering requires a causal sort adding the log factor. Memory: \(O(n)\) — one entry per CRDT key in the state store. Prefer delta-mutator merges over full-state merges when the state store grows beyond 1,000 entries (illustrative value).

Six standard CRDT types cover the majority of edge state patterns; selecting the right one depends on whether state grows only, shrinks too, or requires last-writer semantics.

| CRDT Type | Operation | Edge Application |
|---|---|---|
| G-Counter | Increment only | Message counts, observation counts |
| PN-Counter | Increment and decrement | Resource tracking (\(\pm\)) |
| G-Set | Add only | Surveyed zones, detected threats |
| 2P-Set | Add and remove (once) | Active targets, current alerts |
| LWW-Register | Last-writer-wins value | Device configuration values and node status, where the latest write is authoritative (*) |
| MV-Register | Multi-value (preserve conflicts) | Fields where concurrent updates from separate clusters must both be preserved for later review |

(*) Requires HLC timestamps (Definition 61, defined below in the Hybrid Logical Clocks section of this post) for correctness under clock drift — plain wall-clock LWW-Register does not satisfy semilattice idempotency in contested environments where clocks diverge.

MV-Register vs. LWW-Register decision criterion: The choice is primarily driven by write semantics, with cost analysis as confirmation. LWW-Register applies when writes are superseding — each new write represents the authoritative current state, so older concurrent values are irrelevant. MV-Register applies when writes are independent observations — concurrent writes from separate partitions each contribute information that LWW would silently discard.

The causal history \(H(v)\) of an MV-Register value \(v\) must remain bounded to prevent unbounded memory growth on constrained edge nodes:

\[
|H(v)| \le k_{\max}
\]

where \(k_{\max}\) is the maximum number of concurrent write versions to retain. Values exceeding \(k_{\max}\) are resolved by applying LWW with HLC ordering to the oldest conflicting pair, progressively reducing the conflict set.

RAVEN example: Drone position updates are superseding writes — Drone 7’s position at \(t = 12\) makes its position at \(t = 9\) irrelevant regardless of which cluster observed it. LWW-Register with HLC ordering applies directly. Threat assessments from separate clusters are independent observations — Cluster A’s classification of a contact as hostile and Cluster B’s simultaneous classification as unknown must both reach human analysts; discarding either is a tactical intelligence loss. MV-Register applies, with a bounded number of retained concurrent versions.
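MV-Register semantics can be sketched under the assumption that concurrency is detected with per-node version vectors; the representation and function names here are illustrative, not from the series' codebase.

```python
def dominates(vv_a: dict, vv_b: dict) -> bool:
    """Version vector a causally dominates b: >= on every component, not equal."""
    keys = set(vv_a) | set(vv_b)
    return (all(vv_a.get(k, 0) >= vv_b.get(k, 0) for k in keys)
            and vv_a != vv_b)

def mv_merge(versions_a: list, versions_b: list) -> list:
    """Keep every (version_vector, value) pair not causally dominated by
    another: concurrent observations survive side by side instead of one
    silently winning, which is the point of an MV-Register."""
    pool = versions_a + versions_b
    return [(vv, val) for vv, val in pool
            if not any(dominates(other, vv) for other, _ in pool)]

# Clusters A and B classify the same contact concurrently during partition:
merged = mv_merge([({"A": 1}, "hostile")], [({"B": 1}, "unknown")])
print(sorted(val for _, val in merged))   # ['hostile', 'unknown'] -- both kept
```

A causally later write still supersedes: if one version's vector dominates another's, only the later value survives, so the register behaves like LWW for sequential writes and preserves conflicts only for true concurrency. A production version would additionally enforce the \(k_{\max}\) history bound described above.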

G-Set example: RAVEN surveillance coverage

Each drone maintains a local set of surveyed grid cells. When drones reconnect, the merged coverage is simply the union of all cells observed by either cluster — no coordination or conflict resolution is needed because adding a new cell never invalidates any other cell.
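The coverage merge above, as code: a G-Set is a grow-only set whose join is plain union, so the semilattice properties hold for free (the grid-cell coordinates are illustrative).

```python
# Each cluster's surveyed grid cells form a G-Set; merge is set union.
cluster_a = {(4, 7), (4, 8), (5, 7)}   # cells surveyed by cluster A
cluster_b = {(5, 7), (5, 8)}           # cluster B overlaps A on (5, 7)

merged = cluster_a | cluster_b
assert merged == cluster_b | cluster_a       # commutative: merge order irrelevant
assert (merged | merged) == merged           # idempotent: re-merging is safe
print(len(merged))                           # 4 distinct cells covered
```

Adding a cell never invalidates another, which is exactly why no conflict-resolution logic is needed for this state pattern.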

Proposition 42 (CRDT Convergence). If all updates eventually propagate to all nodes (eventual delivery), and the merge function satisfies commutativity, associativity, and idempotency, then all replicas converge to the same state.

Any two RAVEN clusters that exchanged any updates will reach identical CRDT state once communication resumes — order of merging does not matter.

In other words, as long as no update is permanently lost and the merge function obeys the three semilattice rules, two clusters that were partitioned for any finite duration will always reach identical state once they exchange updates.

Proof sketch: Eventual delivery ensures all nodes receive all updates. The semilattice properties ensure merge order doesn’t matter.

Liveness, not safety: Proposition 42 is a liveness property — Strong Eventual Consistency (SEC) [5] : replicas that receive the same update set will eventually agree on state. It is not a safety property and makes no claim about whether concurrent actions on that consistent state are conflict-free. Two nodes that both read drone_C.battery_low = true from a fully converged CRDT and both carry a Healer role will both dispatch to drone C’s coordinate — the CRDT records both dispatches correctly, but convergence does not prevent the duplicate commitment or the physical conflict. Safety — “at most one node commits to an exclusive resource per task” — is a separate property that requires a coordination layer operating before commitment, not after. Under connectivity, Definition 66 (Logical Quorum) enforces hard mutual exclusion for high-stakes decisions. Under partition, hard mutual exclusion for exclusive resources is impossible without communication — a direct consequence of the CAP theorem’s partition branch: no leaderless protocol can prevent two isolated nodes from both deciding to act on the same target without some form of coordination signal. The achievable bound under partition is probabilistic: Definition 77 (Conflict-Aware Claim Probability) lowers each node’s unilateral action probability so that the expected number of redundant commits is bounded by Proposition 55 . CRDTs handle what happened (state consistency); the arbitration layer governs whether to act (action coordination). Both are required for fleet coherence — convergence alone is not enough.

Physical translation: The fleet’s shared state works like a notarized document: two nodes can only merge their versions if a reputation check confirms neither is corrupted. Once merged, the result is final — no future operation can un-merge it. This is what “convergence” means in practice: every node eventually sees the same state regardless of what order updates arrived.

Watch out for: convergence requires eventual delivery of all updates; a node that suffers permanent hardware failure after accumulating partition state — without ever reconnecting — leaves an irrecoverable gap in fleet history, and the tombstone pruning safety bound ( Proposition 50 ) must be modified to treat that node as having permanently acknowledged all prior tombstones rather than waiting indefinitely for its acknowledgement.

Edge suitability: CRDTs require no coordination during partition. Updates are local. Merge is deterministic. This matches edge constraints perfectly.

The diagram below shows how two clusters independently add items to a G-Set during partition and arrive at identical merged state upon reconnection, with no coordination required.

    
    graph TD
    subgraph During_Partition["During Partition (independent updates)"]
        A1["Cluster A
    State: {1,2,3}"] -->|"adds item 4"| A2["Cluster A
    State: {1,2,3,4}"]
        B1["Cluster B
    State: {1,2,3}"] -->|"adds item 5"| B2["Cluster B
    State: {1,2,3,5}"]
    end
    subgraph After_Reconnection["After Reconnection"]
        M["CRDT Merge
    (set union)"]
        R["Merged State
    {1,2,3,4,5}"]
    end
    A2 --> M
    B2 --> M
    M --> R
    style M fill:#c8e6c9,stroke:#388e3c
    style R fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style During_Partition fill:#fff3e0
    style After_Reconnection fill:#e8f5e9

The merge operation is automatic and deterministic - no conflict resolution logic needed. Both clusters’ contributions are preserved.

Limitations: CRDTs impose semantic constraints. A counter that only increments cannot represent a value that should decrease. A set that only adds cannot represent removal. Application data must be structured to fit available CRDT semantics.

Choosing the right CRDT: The choice depends on application semantics. The mapping from requirements to type is a function of four inputs — the permitted operations, the desired conflict resolution policy, the available memory budget, and the relative cost of discarding a concurrent write versus preserving it:

G-Set is the simplest option with lowest overhead, but supports no removal. 2P-Set supports removal, but an element cannot be re-added once removed. OR-Set provides full add/remove semantics at higher overhead (unique tags per add). LWW-Element-Set must use HLC timestamps (Definition 61), never wall-clock time — wall-clock LWW-Element-Set is incorrect at the edge because a fast-clock node’s stale elements permanently overwrite slow-clock nodes’ current elements; with HLC, concurrent additions resolve via add-wins (OR-Set semantics) and the HLC comparator replaces the physical-time comparator for causal ordering.
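To make the add-wins semantics concrete, here is a minimal sketch of an OR-Set. The class and method names (`ORSet`, `add`, `remove`, `merge`, `contains`) are illustrative, not a library API: each add attaches a fresh unique tag, and a remove deletes only the tags the removing node has observed, so a concurrent re-add survives the merge.

```python
# Hypothetical OR-Set sketch: unique tags per add, observed-tag removal,
# tag-union merge. Not a production CRDT library.
import uuid

class ORSet:
    def __init__(self):
        self.adds = {}      # element -> set of unique add tags
        self.removes = {}   # element -> set of tags observed at removal

    def add(self, element):
        # Each add gets a globally unique tag, so a re-add after a
        # concurrent remove is distinguishable from the removed instance.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the tags this replica has observed.
        observed = self.adds.get(element, set())
        self.removes.setdefault(element, set()).update(observed)

    def contains(self, element):
        live = self.adds.get(element, set()) - self.removes.get(element, set())
        return bool(live)

    def merge(self, other):
        # Join is element-wise tag union: commutative, associative, idempotent.
        for e, tags in other.adds.items():
            self.adds.setdefault(e, set()).update(tags)
        for e, tags in other.removes.items():
            self.removes.setdefault(e, set()).update(tags)
```

A concurrent remove-versus-re-add resolves in favor of the add: the re-add carries a tag the remover never observed, so it survives the merge on every replica.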

Bounded-Memory Tactical CRDT Variants

Standard CRDTs assume unbounded state growth - problematic for edge nodes with constrained memory. We introduce bounded-memory variants tailored for tactical operations.

Sliding-Window G-Counter:

The bounded counter sums only the counts from active time windows, discarding history older than the retention interval \(T_{\text{retain}}\); here \(c_w\) is the count accumulated during window \(w\), and \(W(t)\) is the set of windows that fall within the retention interval ending at time \(t\):

\[ \mathrm{value}(t) = \sum_{w \in W(t)} c_w \]

where \(W(t) = \{\, w : t - T_{\text{retain}} \le w \le t \,\}\) is the active window set. Memory: \(O(|W(t)|)\) instead of unbounded.

RAVEN application: Track observation counts per sector for the last hour. Older counts archived to fusion node when connectivity permits, then pruned locally.
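A minimal sketch of the windowed counter, under the assumption of fixed-length windows and the RAVEN-style one-hour retention; the names (`WindowedGCounter`, `WINDOW_S`, `RETAIN_S`) are hypothetical:

```python
# Sliding-window G-Counter sketch: counts bucketed by window index,
# windows older than the retention interval discarded.
WINDOW_S = 60          # window length in seconds (illustrative)
RETAIN_S = 3600        # keep one hour of history (RAVEN example)

class WindowedGCounter:
    def __init__(self):
        self.windows = {}  # window index -> accumulated count

    def increment(self, now, n=1):
        w = int(now // WINDOW_S)
        self.windows[w] = self.windows.get(w, 0) + n

    def prune(self, now):
        # Discard windows that fell out of the retention interval.
        oldest = int((now - RETAIN_S) // WINDOW_S)
        self.windows = {w: c for w, c in self.windows.items() if w >= oldest}

    def value(self, now):
        oldest = int((now - RETAIN_S) // WINDOW_S)
        return sum(c for w, c in self.windows.items() if w >= oldest)

    def merge(self, other):
        # Per-window max is the grow-only join for one node's replica;
        # a multi-node counter keeps one window map per node ID.
        for w, c in other.windows.items():
            self.windows[w] = max(self.windows.get(w, 0), c)
```

Memory stays proportional to the number of live windows (here 60), regardless of how long the node operates.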

Bounded OR-Set with Eviction:

The Add operation inserts element \(e\) into set \(S\) directly when capacity allows, and otherwise evicts the lowest-priority existing element before inserting; \(K\) is the fixed capacity bound:

\[ \mathrm{Add}(e, S) = \begin{cases} S \cup \{(e, \mathrm{tag})\} & \text{if } |S| < K \\ \big(S \setminus \{e_{\min}\}\big) \cup \{(e, \mathrm{tag})\} & \text{otherwise} \end{cases} \]

where \(e_{\min} = \arg\min_{x \in S} p(x)\) for priority function \(p\). The eviction maintains CRDT properties:

Eviction commutativity proof sketch: Define \(\mathrm{evict}(S) = S \setminus \{\arg\min_{x \in S} p(x)\}\). For a deterministic priority function \(p\), evict-then-merge and merge-then-evict converge to the same set when both replicas exceed \(K\).

Prerequisite: This maintains CRDT properties only when the priority function is deterministic and identical across all nodes given the same inputs. If priority depends on observation time or local state, evictions at different nodes may produce non-deterministic merge outcomes. STOCKSYNC uses a stable, content-derived priority; deployments with node-specific priority functions must use a tie-breaking rule (e.g., node ID) to preserve idempotency.

Priority functions for tactical state:

CONVOY application: Track at most 50 active threats. When capacity exceeded, evict lowest-priority (low-threat, stale) entities. Memory: fixed regardless of operation duration.
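The eviction rule for the CONVOY example can be sketched as follows; the capacity constant, the priority tuple, and the helper names are illustrative assumptions, not the deployed code. The priority is deliberately content-derived so every node evicts identically:

```python
# Bounded-set add with deterministic lowest-priority eviction.
K = 50  # capacity bound (CONVOY example: 50 active threats)

def priority(entry):
    # Deterministic, content-derived priority tuple:
    # (threat_level, last_seen, entity_id). The entity ID is the final
    # tie-breaker so all nodes agree on the eviction victim.
    threat_level, last_seen, entity_id = entry
    return (threat_level, last_seen, entity_id)

def bounded_add(s, entry):
    """Add entry to set s, evicting the lowest-priority element if |s| > K."""
    s = set(s)
    s.add(entry)
    while len(s) > K:
        s.discard(min(s, key=priority))
    return s
```

Because `priority` depends only on entry content, merge-then-evict at one node and evict-then-merge at another select the same survivors.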

Compressed Delta-CRDT:

Standard delta-CRDTs transmit state changes. We compress deltas using domain-specific encoding. The compressed delta size equals \(H(\Delta)\), the information-theoretic entropy of the delta (the minimum achievable size), plus a logarithmic overhead term from the encoding scheme itself:

\[ |\Delta_{\text{compressed}}| = H(\Delta) + O(\log |\Delta|) \]

where \(H(\Delta)\) is the entropy of the delta. For state with predictable patterns (low-entropy deltas), compression can achieve significant reduction; the ratio depends on the specific entropy characteristics of the application.

Compression techniques: Spatial encoding represents position updates as offsets from the predicted trajectory. Temporal batching merges multiple updates to the same entity before transmission. Dictionary encoding maps common values (status codes, threat types) to compact indices.

OUTPOST application: Sensor health updates compressed to 2-3 bytes per sensor versus 32 bytes uncompressed. 127-sensor mesh health fits in single packet.

Hierarchical State Pruning:

Tactical systems naturally have hierarchical state importance. The table below defines four retention levels ordered by operational criticality, together with the retention duration and the condition that triggers pruning at each level.

| Level | Retention | Pruning Trigger |
|---|---|---|
| Critical (threats, failures) | Indefinite | Never auto-prune |
| Operational (positions, status) | 1 hour | Time-based |
| Diagnostic (detailed health) | 10 minutes | Memory pressure |
| Debug (raw sensor data) | 1 minute | Aggressive |

State automatically demotes under memory pressure. The level of state item \(s\) at time \(t\) drops by one tier per pressure event but never falls below the minimum level defined for that state type:

\[ \mathrm{level}(s, t+1) = \max\big(\mathrm{level}(s, t) - 1,\ \mathrm{level}_{\min}(s)\big) \]

where \(\mathrm{level}_{\min}(s)\) is the minimum level for state type \(s\).
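The demotion rule is a one-line clamp; a minimal sketch, with illustrative level numbers and an assumed per-type floor table:

```python
# One-tier demotion per memory-pressure event, floored at the state
# type's minimum level. Level numbering and MIN_LEVEL are illustrative.
LEVELS = {"critical": 3, "operational": 2, "diagnostic": 1, "debug": 0}
MIN_LEVEL = {"threat": 3, "position": 1, "health_detail": 0, "raw_sensor": 0}

def demote(current_level, state_type):
    """level(s, t+1) = max(level(s, t) - 1, level_min(s))."""
    return max(current_level - 1, MIN_LEVEL[state_type])
```

Critical state is pinned by its floor: a threat record at level 3 with a floor of 3 never demotes, matching the "never auto-prune" row above.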

Memory budget enforcement:

Each CRDT type has a memory budget \(B_i\). The constraint requires that the sum of memory consumed by all CRDT instances \(M_i\) stays within total available memory minus the reserved headroom needed for runtime overhead:

\[ \sum_i M_i \le M_{\text{total}} - M_{\text{reserved}}, \qquad M_i \le B_i \]

When approaching limit, the system:

  1. Prunes diagnostic/debug state
  2. Compresses operational state
  3. Evicts low-priority entries from bounded sets
  4. Archives to persistent storage if available
  5. Drops new low-priority updates as last resort

RAVEN memory profile: \(50 \times 2\)KB state budget = 100KB CRDT state. Bounded OR-Set for 200 threats (4KB), sliding-window counters for 100 sectors (2KB), health registers for 50 nodes (1.6KB). Total: ~8KB active CRDT state, well within budget.

Commercial Application: STOCKSYNC Multi-Warehouse Inventory

STOCKSYNC manages inventory across multiple distribution centers. Each center must continue during outages - receiving, fulfilling, counting - while maintaining eventual consistency with central systems and peers.

The inventory coherence challenge: Traditional inventory systems use centralized databases with strong consistency. When connectivity fails, warehouses either stop operations (unacceptable) or operate blind (leads to overselling, stockouts, allocation conflicts). STOCKSYNC uses CRDTs to enable continuous operation with guaranteed convergence.

CRDT selection for inventory operations:

Each inventory operation maps to a CRDT type chosen to match the operation’s semantics — the key observation is that receiving events can only increment, while holds require add-and-remove capability.

| Inventory Operation | CRDT Type | Why This Type | Merge Behavior |
|---|---|---|---|
| Receiving | G-Counter per warehouse | Shipments only add inventory | Sum across warehouses |
| Shipping/Sales | PN-Counter (or G-Counter for deductions) | Orders remove inventory | Sum of additions minus deductions |
| Location transfers | 2P-Set of (item, from, to, qty) | Transfers are atomic events | Union; dedup by transfer ID |
| Cycle count adjustments | LWW-Register | Latest count is authoritative | Most recent timestamp wins |
| Inventory holds | OR-Set of (item, order, qty) | Holds can be added/removed | Add-wins semantics |

Inventory quantity as CRDT: The total available quantity for SKU S at warehouse W is computed:

\[ \mathrm{Available}(S, W) = \mathrm{Received} - \mathrm{Shipped} + \mathrm{TransferIn} - \mathrm{TransferOut} + \mathrm{Adjustment} \]

Each term is tracked separately: Received uses a G-Counter incremented on each receiving event; Shipped uses a G-Counter incremented on each shipment; TransferIn/Out is derived from the transfer events set; Adjustment uses an LWW-Register updated from the latest cycle count.
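A sketch of the derived computation, assuming per-warehouse G-Counters whose join is the component-wise maximum; the function names are illustrative:

```python
# Available quantity is derived from independently merged CRDT terms,
# never stored directly.
def available(received, shipped, transfer_in, transfer_out, adjustment):
    """Available = Received - Shipped + TransferIn - TransferOut + Adjustment."""
    return received - shipped + transfer_in - transfer_out + adjustment

def merged_g_counter(counts_a, counts_b):
    # G-Counter join: per-warehouse max, then sum.
    # Both replicas' increments survive the merge.
    keys = set(counts_a) | set(counts_b)
    return sum(max(counts_a.get(k, 0), counts_b.get(k, 0)) for k in keys)
```

Running the partition scenario with warehouse A receiving 100 and shipping 30 while warehouse B receives 50 and ships 20 reproduces the merged totals: 150 received, 50 shipped, 100 available.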

The diagram below shows how two warehouses independently receive and ship inventory during partition, then merge into a single consistent quantity via CRDT union — with no coordination step required.

    
    graph TD
    subgraph "Warehouse A (during partition)"
        A_RCV["Receive: +100 units SKU-123"]
        A_SHIP["Ship: -30 units SKU-123"]
        A_STATE["Local state:
    Received: 100
    Shipped: 30
    Available: 70"]
    end
    subgraph "Warehouse B (during partition)"
        B_RCV["Receive: +50 units SKU-123"]
        B_SHIP["Ship: -20 units SKU-123"]
        B_STATE["Local state:
    Received: 50
    Shipped: 20
    Available: 30"]
    end
    subgraph "After Reconnection"
        MERGE["CRDT Merge"]
        FINAL["Combined state:
    Received: 150 (100+50)
    Shipped: 50 (30+20)
    Total Available: 100"]
    end
    A_RCV --> A_STATE
    A_SHIP --> A_STATE
    B_RCV --> B_STATE
    B_SHIP --> B_STATE
    A_STATE --> MERGE
    B_STATE --> MERGE
    MERGE --> FINAL
    style MERGE fill:#c8e6c9,stroke:#388e3c
    style FINAL fill:#e8f5e9,stroke:#388e3c

Handling overselling during partition: The primary risk of disconnected operation is overselling - both warehouses selling the same inventory. STOCKSYNC mitigates through local reservation quotas: the sellable quantity for SKU \(S\) at warehouse \(W\) is capped at whichever is smaller — the local on-hand quantity or the pre-assigned partition quota — so that combined sales across all warehouses cannot exceed total available stock.

During normal operation, quotas are set generously (typically 80–90% (illustrative value) of on-hand). When partition begins, each warehouse can only sell its quota - preventing combined sales from exceeding total inventory.

Scope: This bound assumes warehouse inventories are physically disjoint (no shared stock pools). If multiple warehouses hold claims on the same physical inventory, the safety factor \(f_s\) must cover cross-claim risk in addition to measurement error.

Quota calculation: The quota for SKU \(S\) at warehouse \(W\) is its proportional share of total inventory, weighted by local sales velocity relative to the fleet-wide total and reduced by a safety factor to absorb uncertainty during partition:

\[ \mathrm{Quota}(S, W) = \mathrm{Total}(S) \times \frac{\mathrm{Velocity}(W)}{\sum_{W'} \mathrm{Velocity}(W')} \times \mathrm{SafetyFactor} \]

where SafetyFactor \(\approx 0.85\) (illustrative value) provides margin for uncertainty.
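The quota arithmetic is a one-liner; this sketch uses the article's illustrative 0.85 safety factor, and the function name is hypothetical:

```python
# Partition quota: proportional share weighted by sales velocity,
# scaled by a safety factor.
SAFETY_FACTOR = 0.85  # illustrative value from the text

def partition_quota(total_inventory, local_velocity, fleet_velocity):
    """Quota(S, W) = Total(S) * (v_W / sum_v) * SafetyFactor."""
    share = local_velocity / fleet_velocity
    return round(total_inventory * share * SAFETY_FACTOR)
```

Because the velocity shares sum to 1 and the safety factor is below 1, the quotas summed across all warehouses stay under total inventory, which is exactly the no-oversell invariant.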

Reconciliation protocol: When connectivity restores between warehouses, STOCKSYNC performs a three-phase reconciliation:

Phase 1 - Summary Exchange (2-5 seconds):

Phase 2 - Divergent State Sync (10-60 seconds):

Phase 3 - Operational Reconciliation (background):

Correctness analysis:

Assumption Set: Bounded counters initialized to actual inventory, quota sum \(\leq\) total inventory, CRDT merge semantics.

Oversell probability bound: Under these assumptions, oversells occur only when:

  1. Quota was set incorrectly (human error, rate \(\epsilon_h\))
  2. Race condition in local quota enforcement (system bug, rate \(\epsilon_s\))

The bound states that the oversell probability under quotas is at most \(\epsilon_h + \epsilon_s\), which is far smaller than the baseline oversell probability without quotas (the product of partition probability and concurrent-sale probability).

Reconciliation time: Dominated by Merkle tree traversal \(O(k \log(n/k) + k)\) where \(k\) is the number of divergent items. For sparse divergence (\(k \ll n\)), this is effectively \(O(k \log n)\).

Data integrity: CRDT merge guarantees no data loss under the stated assumption set. Convergence follows from semilattice properties.

Hierarchical authority for allocation conflicts: When two warehouses simultaneously commit the last unit of inventory to different orders, the winning warehouse is the one whose commit timestamp is earlier: \(\mathrm{winner} = \arg\min_{w} h_w\), where \(h_w\) is the Hybrid Logical Clock timestamp ( Definition 61 ) at which warehouse \(w\) recorded its commitment.

Commit timestamps must use HLC ordering ( Definition 61 ) rather than wall-clock time to preserve correctness under clock drift during partition; see Proposition 49 for the complete NTP-Free Split-Brain Resolution.

The warehouse with the earlier commit time fulfills its order. The losing warehouse must either source from another location or backorder. This creates occasional customer friction but maintains system integrity.

The authority hierarchy assigns override scope by role: warehouse managers can override quotas locally (L1 authority), regional directors can reallocate inventory between warehouses (L2 authority), and central operations can globally rebalance inventory across the entire network (L3 authority).

Last-Writer-Wins vs Application Semantics

Last-Writer-Wins (LWW) is a common conflict resolution strategy: when values conflict, the most recent timestamp wins. The correct form for edge deployments uses HLC timestamps \(h = (l, c, n)\) via the ordering of Definition 62, not raw wall-clock time \(t\):

(Warning: the wall-clock form is incorrect at the edge. At 0.5 s of clock drift — reachable in under 83 minutes on a 100 ppm crystal oscillator under normal conditions, and in under 42 minutes under thermal stress at 200 ppm — a fast-clock node’s causally older update permanently overwrites a slow-clock node’s causally newer update. Replacing \(t\) with \(h\) is not an optimization; it is a correctness requirement. See the PPM drift analysis below.)

LWW works for:

LWW fails for:

Oscillator PPM analysis — drift budget: The 0.5 s drift budget is not a hypothetical; it is the regime that breaks plain LWW in a single operational session.

| Clock source | Drift rate \(\delta\) | Time to divergence | LWW status |
|---|---|---|---|
| GPS-disciplined | 1 ppm = \(10^{-6}\) s/s | 138.9 h | Safe for all missions |
| TCXO (no GPS) | 2 ppm = \(2 \times 10^{-6}\) s/s | 69.4 h | Safe for most deployments |
| Crystal (\(20^\circ\text{C}\)) | 50 ppm = \(5 \times 10^{-5}\) s/s | 2.8 h | Exceeded within a single watch |
| Crystal (thermal stress, \(70^\circ\text{C}\)) | 200 ppm = \(2 \times 10^{-4}\) s/s | 42 min | Exceeded mid-sortie |
| RC oscillator (L0 beacon) | 10,000 ppm = \(10^{-2}\) s/s | 50 s | Exceeded immediately |

At 100 ppm (a conservative OUTPOST sensor specification), \(\delta = 10^{-4}\) s/s, so the 0.5 s budget exhausts in \(5{,}000\) s \(\approx 83\) minutes. Any crystal-oscillator edge deployment with partitions exceeding 83 minutes will experience LWW inversions without HLC correction — this includes every OUTPOST mission scenario and most CONVOY urban-canyon transit periods. The HLC pivot in Definition 59 (Clock Trust Window) fires before this threshold; the Drift-Quarantine Re-sync Protocol ( Definition 63 ) repairs clocks on reconnection.
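The table's time-to-divergence column is pure arithmetic: budget divided by fractional drift rate. A sketch, assuming the 0.5 s budget discussed above:

```python
# Time to exhaust a fixed LWW drift budget at a given fractional drift rate.
BUDGET_S = 0.5  # drift budget (seconds) before LWW inversions become possible

def time_to_divergence_s(drift_ppm):
    """Seconds until accumulated drift reaches the budget: budget / (ppm * 1e-6)."""
    return BUDGET_S / (drift_ppm * 1e-6)
```

Plugging in the table's oscillator classes reproduces its numbers: 83 minutes at 100 ppm, 42 minutes at 200 ppm, 69.4 hours for a TCXO, 50 seconds for an RC oscillator.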

Edge complication: Wall-clock LWW assumes reliable timestamps. The oscillator analysis above shows “latest” becomes meaningless at timescales under 2 hours for thermal-stressed crystals. The structural fix is not a heuristic timeout but a causal ordering system (HLC + vector clocks) that is correct regardless of oscillator quality.

Semantic resolution — choosing the right merge logic by data type:

When HLC detects \(h_1 \parallel h_2\) (concurrent writes, neither causally dominates), the CRDT join \(s_1 \sqcup s_2\) is the formally correct result — but “CRDT join” is an abstraction that maps to different concrete policies depending on the data type. Selecting the wrong policy here is where most LWW bugs lurk:

| Data type | Concurrent-update policy | Rationale |
|---|---|---|
| Observed zone set | Add-wins (OR-Set join: \(S_1 \cup S_2\)) | Discarding an observed zone is always a tactical loss — OR-Set is correct by definition |
| Inventory count | Max-wins (\(\max(v_1, v_2)\)) | Over-reporting is recoverable; under-reporting causes oversell |
| Node health score | Min-wins (\(\min(v_1, v_2)\)) | Fail-safe: take the most pessimistic concurrent health reading |
| Drone position | Confidence-wins (higher fix confidence) | GPS fix beats dead-reckoning regardless of timestamp; ordering is second tiebreaker |
| Mission parameters | Authority-wins (higher authority level from Def 14) | L0 commander override beats field node regardless of clock order |
| Configuration values | Causal-LWW (HLC ordering) | Superseding writes; HLC ordering is sufficient when clocks are within the trust window |
| Log entries / audit records | All-wins (MV-Register / RGA append) | No entry is ever discarded; use RGA (Definition 69) for ordered log semantics |

The pattern: LWW (time-wins) is correct only for superseding writes within the clock trust window. Outside that window, or for non-superseding semantics, one of the seven policies above applies. The Semantic Commit Order ( Definition 67 ) formalizes the authority-wins and causal-LWW paths; this table extends it to the full semantic space.
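The policy table reduces naturally to a per-type dispatch. This sketch uses simplified stand-ins for each merge rule (the real OR-Set and RGA joins are richer than a set union or list append); the table and function names are illustrative:

```python
# Concurrent-update resolution dispatched by data type, not by clock.
POLICIES = {
    "observed_zones":  lambda a, b: a | b,   # add-wins set union
    "inventory_count": max,                  # max-wins
    "health_score":    min,                  # min-wins (fail-safe pessimism)
    "log_entries":     lambda a, b: a + b,   # all-wins append
}

def merge_concurrent(data_type, v1, v2):
    """Resolve h1 parallel h2 (concurrent writes) by data-type policy."""
    return POLICIES[data_type](v1, v2)
```

The design point: the merge function is chosen once per data type at schema-design time, so no runtime code path ever has to decide between "latest" and "safest" under pressure.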

Conflict Resolution Logic: LWW-to-Causal Pivot

The clock drift problem cannot be patched within LWW — it requires a structural pivot to causal ordering when physical time becomes untrusted. The trigger is the partition accumulator \(T_{\mathrm{acc}}\) from Definition 15 : once \(T_{\mathrm{acc}}\) exceeds the maximum duration for which hardware clock drift remains within the HLC skew bound \(\varepsilon\), the physical component \(l\) of every HLC timestamp is no longer reliable.

Definition 59 (Clock Trust Window). Given hardware clock drift rate \(\delta_\text{max}\) (fractional drift rate, dimensionless; e.g., 1 ppm = \(10^{-6}\) — a constant from the hardware manufacturer’s datasheet, not a tunable system parameter) and HLC skew bound \(\varepsilon\) from Proposition 45, the Clock Trust Window is:

\[ T_\text{trust} = \frac{\varepsilon}{\delta_\text{max}} \]

T_trust vs T_acc: \(T_{\mathrm{trust}}\) accumulates across partition epochs; it is not reset at reconnection, unlike the per-epoch \(T_{\mathrm{acc}}\) accumulator in the autonomic control layer (see Self-Healing Without Connectivity). \(T_{\mathrm{acc}}\) measures contiguous time in the disconnected regime and resets when a partition ends; \(T_{\mathrm{trust}}\) is a fixed threshold derived from hardware drift rate and is evaluated against \(T_{\mathrm{acc}}\) at each MAPE-K tick.

Note: \(\delta_\text{max}\) is dimensionless (fractional drift rate, e.g., \(10^{-6}\) for a GPS-disciplined oscillator at 1 ppm). The units of \(T_\text{trust} = \varepsilon / \delta_\text{max}\) are seconds / dimensionless = seconds.

Derivation. A clock drifting at fractional rate \(\delta_\text{max}\) accumulates an error of \(\delta_\text{max} \cdot T\) seconds over a partition of duration \(T\). The HLC watermark remains physically trustworthy while this accumulated drift stays below the initial HLC skew bound \(\varepsilon\):

\[ \delta_\text{max} \cdot T \le \varepsilon \iff T \le T_\text{trust} = \frac{\varepsilon}{\delta_\text{max}} \]

Beyond \(T_\text{trust}\), the physical component of the HLC timestamp is no longer reliable; the system pivots to logical-only ordering (causal order preserved, but wall-clock alignment lost). Example: crystal oscillator (\(\delta_\text{max} = 10^{-4}\)), \(\varepsilon = 1\,\text{s}\): \(T_\text{trust} = 10^{4}\,\text{s} \approx 2.8\,\text{h}\). GPS-disciplined oscillator (\(\delta_\text{max} = 10^{-6}\)): \(T_\text{trust} = 10^{6}\,\text{s} \approx 11.6\,\text{days}\).

During \(T_{\mathrm{acc}} \le T_{\mathrm{trust}}\), the HLC physical watermark remains within \(\varepsilon\) of true time; full HLC ordering ( Definition 61 ) is used. When \(T_{\mathrm{acc}} > T_{\mathrm{trust}}\), the system pivots to pure logical ordering \(\prec_{\text{logic}}\), comparing only the counter and node-ID components \((c, n)\) and ignoring \(l\).

| Node class | Clock source (Prop 41) | \(\varepsilon\) | \(T_{\mathrm{trust}}\) | Implication |
|---|---|---|---|---|
| RAVEN drone | GPS-disciplined (1 ppm) | 1 s | \(\approx\) 11.6 days | GPS partition safe through typical mission cycles |
| CONVOY ECU | TCXO + GPS fallback (2 ppm) | 3 s | \(\approx\) 17 days | Safe through most operational deployments |
| OUTPOST sensor | Crystal oscillator (100 ppm) | 1 s | \(\approx\) 2.8 h | Physical time untrusted after 3 hours of partition |
| Ultra-L0 beacon | RC oscillator (10,000 ppm) | 1 s | \(\approx\) 100 s | Physical time untrusted within 2 minutes |

For 30-day partitions (\(T_{\mathrm{acc}} \approx 2.6 \times 10^{6}\,\text{s}\)), only GPS-disciplined nodes with \(\varepsilon \ge 2.6\,\text{s}\) remain within \(T_{\mathrm{trust}}\). All crystal-oscillator nodes pivot to \(\prec_{\text{logic}}\) automatically.

The trust window is the maximum partition duration for which the HLC physical watermark remains trustworthy. The drift tolerance \(\delta_{\max}\) comes from hardware datasheets (GPS: 1 ppm; TCXO: 2–24 ppm; crystal: 50–200 ppm; RC oscillator: 1%–12%), and \(\varepsilon\) is calibrated via Proposition 45 . Without the pivot, a crystal-oscillator node drifting 259 seconds over 30 days silently overwrites 4 minutes of peer updates on reconnect with timestamps that are physically “newer” but logically older.

Physical translation: How long can you trust a wall-clock timestamp for LWW (Last-Write-Wins) ordering before accumulated drift invalidates it? T_trust is that window. After T_trust seconds without an NTP sync, all timestamps are ambiguous within the drift envelope — the HLC ( Definition 61 ) takes over as the ordering authority.

The complete conflict resolution decision tree, combining Definition 59 with Definition 62 ’s merge rules:

    
    flowchart TD
    S["Two versions arrive: v₁,h₁ and v₂,h₂"] --> P{"T_acc > T_trust?
    Def 59: physical clock untrusted"}
    P -->|"No — T_acc ≤ T_trust
    full HLC trusted"| F["Compare h₁ ≺ h₂
    Def 61 HLC ordering"]
    P -->|"Yes — T_acc > T_trust
    drop l component"| G["Compare (c₁,n₁) vs (c₂,n₂)
    ≺_logic: logical-only ordering"]
    F --> D{"Causally ordered?"}
    G --> D
    D -->|"h₁ ≺ h₂ — v₁ caused v₂"| W1["v₂ wins"]
    D -->|"h₂ ≺ h₁ — v₂ caused v₁"| W2["v₁ wins"]
    D -->|"h₁ ∥ h₂ — concurrent"| M["CRDT join: s₁ ⊔ s₂
    Def 62 HLC-Aware CRDT Merge"]
    W1 --> Z(["Deliver result"])
    W2 --> Z
    M --> Z

Proposition 43 (LWW-to-Causal Pivot Correctness).

Switching from wall-clock to logical-only ordering after long OUTPOST partitions never discards valid updates — it only treats more pairs as concurrent.

The pivot from HLC ordering to logical ordering at \(T_{\mathrm{acc}} > T_{\mathrm{trust}}\) ( Definition 59 ) satisfies:

Monotonicity holds: no previously committed value is overridden, and \(\prec_{\text{logic}}\) reclassifies some causal pairs as concurrent but never the reverse. Completeness holds: every update pair is classified as causally ordered (one dominates) or concurrent (requiring CRDT join) with no update silently discarded. Conservative correctness follows: \(\prec_{\text{logic}}\) over-approximates concurrency — it may classify causally ordered updates as concurrent (triggering a safe CRDT join) but never classifies concurrent updates as causally ordered, which would silently discard one. Post-repair convergence also holds: after Definition 63 Drift-Quarantine repair, updates classified as concurrent under \(\prec_{\text{logic}}\) converge to the same final state as if they had been correctly classified, since the CRDT join is idempotent and commutative.

Proof. Property 1: \(\prec_{\text{logic}}\) uses \((c, n)\) — sub-components of the full HLC ordering ( Definition 62 ). Any pair causally ordered under the full ordering is also comparable under \(\prec_{\text{logic}}\) — logical dominance is a necessary condition for causal dominance. Incorrectly concurrent pairs resolve via CRDT join, which is monotone by Definition 58 (semilattice).

Property 2: \(\prec_{\text{logic}}\) is a total preorder on \((c, n)\) pairs; every pair is comparable or equal-then-tied. Property 3 follows from Property 1.

Property 4: after Definition 63 Phase 2 (HLC repair), the \(l\) component is corrected; Phase 3 (causality audit) reclassifies previously-concurrent pairs; Definition 58 idempotency guarantees re-merge produces the same join result. \(\square\)

This establishes formal correctness guarantees for the physical-to-logical clock pivot under long partitions. The proof depends on \(\varepsilon\) and \(T_{\mathrm{trust}}\) from Definition 59 and on all deployed CRDT types satisfying the Definition 58 semilattice properties (idempotency, commutativity, associativity). Properties 1–4 prove the pivot only adds CRDT merges, never removes causally ordered updates — eliminating the pivot-induced data loss risk.

Physical translation: The fleet switches from “last writer wins” to full causal tracking only when conflicts become frequent enough to matter. Under light load — most of the time — the simpler rule is used. The pivot is triggered by evidence of actual contention, not preemptively, which avoids paying the overhead cost of causal tracking during calm operations.

The rolling window length \(N\) defaults to \(N = 2\,T_{\text{gossip}}/T_{\text{tick}}\) (two gossip periods expressed in ticks), giving a window of approximately 60 seconds for typical deployments. Operators may configure \(N\) in the range \([10, 200]\); values below 10 produce a pivot that is too sensitive to transient conflict bursts; values above 200 cause the system to tolerate sustained conflict rates that indicate structural fleet divergence.

Watch out for: Property 4 (post-repair convergence) depends on all deployed CRDT types being semilattices — an application-defined merge function that violates associativity will produce different final states depending on which node’s Phase 3 audit runs first, silently breaking the convergence guarantee without triggering any protocol-level error.

Vector Clocks for Causality

Before examining hybrid approaches, consider pure vector clocks [7] . Each node \(i\) maintains a vector \(V_i[1..n]\) where \(V_i[j]\) represents node \(i\)’s knowledge of node \(j\)’s logical time.

Definition 60 (Vector Clock). A vector clock \(V\) is a function from node identifiers to non-negative integers. The vector clock ordering \(\leq\) is defined as:

\[ V_A \leq V_B \iff \forall i: V_A[i] \leq V_B[i] \]

Events are causally related iff their vector clocks are comparable; concurrent events have incomparable vectors.

In other words, \(V_A \leq V_B\) means node A’s clock can only have happened before node B’s clock; if neither \(V_A \leq V_B\) nor \(V_B \leq V_A\) holds, the two events occurred concurrently with no causal dependency between them.

Analogy: Postmarks on letters — even without synchronized clocks, you can tell which letter was written in response to which by the sequence numbers. A reply can’t reference a letter it hasn’t seen. The vector clock is the postmark, one counter per correspondent.

Logic: The ordering \(V_A \leq V_B \iff \forall i: V_A[i] \leq V_B[i]\) encodes the happens-before relation. When neither vector dominates, the events are concurrent — both carry valid information, and the system must merge rather than discard.

Proposition 44 (Vector Clock Causality). For events \(e_1\) and \(e_2\) with vector timestamps \(V_1\) and \(V_2\):

\[ e_1 \to e_2 \iff V_1 < V_2, \qquad e_1 \parallel e_2 \iff V_1 \not\leq V_2 \wedge V_2 \not\leq V_1 \]

Two CONVOY vehicles can determine whether one event caused another — or whether both happened independently — by comparing integer vectors alone, with no clock synchronisation.

In other words, you can determine the causal relationship between any two events purely by comparing their vector timestamps: strict component-wise ordering means one caused the other, while incomparable vectors mean the events happened independently and neither influenced the other.

The update rules are: on a local event, increment \(V_i[i] \gets V_i[i] + 1\); on sending a message, attach the current \(V_i\); on receiving a message with \(V_m\), apply \(V_i[j] \gets \max(V_i[j], V_m[j])\) for all \(j\), then increment \(V_i[i] \gets V_i[i] + 1\).
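The three update rules plus the comparison fit in a few lines. This sketch uses a dict keyed by node ID rather than a fixed array; the helper names are illustrative:

```python
# Vector clock sketch: local event, receive-and-merge, and the
# happens-before comparison from Definition 60.
def vc_event(vc, node):
    """Local event: increment own component."""
    vc = dict(vc)
    vc[node] = vc.get(node, 0) + 1
    return vc

def vc_receive(vc, node, vm):
    """Receive: component-wise max with the message clock, then increment."""
    merged = {k: max(vc.get(k, 0), vm.get(k, 0)) for k in set(vc) | set(vm)}
    return vc_event(merged, node)

def vc_leq(va, vb):
    """V_A <= V_B iff every component of V_A is <= the matching one of V_B."""
    return all(va.get(k, 0) <= vb.get(k, 0) for k in set(va) | set(vb))

def concurrent(va, vb):
    """Incomparable vectors mean the events were causally independent."""
    return not vc_leq(va, vb) and not vc_leq(vb, va)
```

Two nodes that each perform a local event without communicating produce incomparable vectors; once one receives the other's message, its subsequent events dominate.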

Edge limitation: Vector clocks grow linearly with node count. For a 50-drone swarm, each message carries 50 integers. For CONVOY with 12 vehicles, overhead is acceptable. For larger fleets, compressed representations or hierarchical clocks are needed.

Bandwidth budget verification: CONVOY (12 vehicles, 4-byte integers, 100 state updates/minute, 9.6 kbps half-duplex link at 60% duty cycle = 5,760 bits/s available): vector clock overhead per update = \(12 \times 4 = 48\) bytes; at 100/min: \(48 \times 100/60 = 80\) bytes/s = 640 bits/s. Overhead fraction: \(640/5760 \approx 11\%\) — acceptable.

RAVEN (47 drones, 200 updates/minute per node, same link class): \(47 \times 4 = 188\) bytes/update; at 200/min: \(188 \times 200/60 \approx 627\) bytes/s = 5,016 bits/s \(\approx 52\%\) overhead — marginal and breaks under burst traffic.

Mitigation for RAVEN : switch to interval tree clocks (Mukund & Kulkarni 2016) or dotted version vectors [10] — these encode \(O(k)\) integers for \(k\) clusters instead of \(O(n)\) per-node entries, reducing per-update overhead to ~48 bytes (8 cluster IDs × 4 bytes + 8 sequence numbers × 4 bytes), dropping link overhead from 52% to ~10%. Causality guarantees are preserved at cluster granularity rather than node granularity.

DVV scaling and the large-fleet MTU problem. A full Dotted Version Vector requires one counter per node: at 2 bytes per entry, a 47-node RAVEN swarm uses 94 bytes, and a 500-node IoT mesh uses 1 KB — exceeding the LoRaWAN maximum payload (51–222 bytes) and saturating a single BLE advertisement packet. For fleets above ~100 nodes, three strategies reduce wire cost while preserving causal ordering:

One option replaces the DVV with a fixed-size Bloom filter (~32 bytes) — the Bloom-clock — encoding causal history with a tunable false-positive rate; at 32 bytes this yields approximately 0.3% probability of accepting a causally-violating message, which is acceptable for health gossip but not for authoritative state commits. A second option, epoch-scoped DVV, has each sub-fleet maintain a DVV over its local \(K\) members only (\(K \leq 20\) typically) during partition, reconstructing the full fleet DVV at reconnect by merging epoch records; this provides exact causal ordering within each partition at \(O(K)\) wire cost (see Definition 70 , Delta-Sync). A third option, hierarchical DVV, has authority-tier leaders ( Definition 68 ) maintain a compressed aggregate DVV for their sub-tier while members carry only their tier’s DVV, reducing wire cost to \(O(M)\) where \(M\) is the tier size.

The causal ordering guarantees in Proposition 44 (Vector Clock Causality) apply to whichever DVV variant is chosen, provided the variant preserves the happens-before relation. Bloom-clock trades a bounded false-positive rate for constant wire cost; evaluate this trade-off explicitly before deploying at scale.

Warning: Full per-node DVVs exceed LoRaWAN MTU above ~100 nodes. Choose Bloom-clock, epoch-scoped DVV, or hierarchical DVV based on the required causal guarantee strength and the acceptable false-acceptance rate.

Physical translation: Vector clocks are the fleet’s substitute for a shared wall clock. If event A’s clock is strictly less than event B’s clock in every position, then A provably happened before B. If the clocks are incomparable (neither is strictly less), the events were concurrent — no causal relationship can be inferred and the system must treat both as potentially valid.

Watch out for: if a replacement node reuses the identifier of a destroyed node — even briefly, to reclaim a network slot — its vector clock starts at zero, causing the fleet to misclassify all post-replacement events as causally prior to pre-destruction events from that identifier and silently corrupting causal ordering for the replacement node’s entire subsequent operation.

Mitigation: Hybrid Logical Clocks add a monotonic counter to wall time to handle NTP skew; see Kulkarni et al. (2014) [11] for the foundational analysis. For CONVOY , LWW on route decisions is unreliable because a vehicle with a fast clock always wins regardless of information freshness — application semantics require considering intel recency, not just decision timestamp. Def 40–63 below formalize the HLC structure, the causality-aware merge function, and the re-sync protocol for massively drifted nodes.

Definition 61 ( Hybrid Logical Clock ). A Hybrid Logical Clock on node \(j\) is a tuple \(h_j = (l_j, c_j)\) where \(l_j\) is the logical watermark — the maximum physical timestamp ever observed by node \(j\) from its own clock or received messages — and \(c_j\) is a counter that increments when consecutive events share the same watermark. HLC tuples are ordered lexicographically:

\[ (l, c) \prec (l', c') \iff l < l' \vee (l = l' \wedge c < c') \]

On a local send or write event at node \(j\), letting \(l_j'\) denote the watermark before the event and \(pt_j\) the current physical clock reading:

\[ l_j \gets \max(l_j', pt_j), \qquad c_j \gets \begin{cases} c_j + 1 & \text{if } l_j = l_j' \\ 0 & \text{otherwise} \end{cases} \]

On receiving a message with HLC \((l_m, c_m)\), letting \(l_j'\) denote the watermark before the event and \(pt_j\) the current physical clock reading:

\[ l_j \gets \max(l_j', l_m, pt_j), \qquad c_j \gets \begin{cases} \max(c_j, c_m) + 1 & \text{if } l_j = l_j' = l_m \\ c_j + 1 & \text{if } l_j = l_j' \neq l_m \\ c_m + 1 & \text{if } l_j = l_m \neq l_j' \\ 0 & \text{otherwise} \end{cases} \]

Each CRDT operation carries metadata \((n_j, h_j)\) where \(n_j\) is the node identifier. The pair \((n_j, h_j)\) globally and uniquely identifies the operation.

Compute Profile: CPU: \(O(1)\) per message event with HLC — three scalar comparisons and one conditional increment, versus \(O(n)\) for a full vector clock over a fleet of \(n\) nodes. Memory: \(O(1)\) per node — one 64-bit HLC timestamp, versus \(O(n)\) for a full vector clock. HLC is the preferred choice for all but the smallest fleets on this basis.
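The send and receive rules of Definition 61 can be sketched directly; `now_pt` stands in for the physical clock reading, and the node-ID tiebreak from the \((n_j, h_j)\) pair is omitted for brevity:

```python
# HLC update rules: (l, c) tuples ordered lexicographically.
def hlc_send(l, c, now_pt):
    """Local/send event: advance watermark to max(l, pt); bump c on a tie."""
    l_new = max(l, now_pt)
    c_new = c + 1 if l_new == l else 0
    return l_new, c_new

def hlc_receive(l, c, l_m, c_m, now_pt):
    """Receive: merge watermarks, pick counter so the result exceeds both."""
    l_new = max(l, l_m, now_pt)
    if l_new == l and l_new == l_m:
        c_new = max(c, c_m) + 1
    elif l_new == l:
        c_new = c + 1
    elif l_new == l_m:
        c_new = c_m + 1
    else:
        c_new = 0
    return l_new, c_new
```

Python's tuple comparison is already lexicographic, so `(l, c)` tuples compare correctly under the ordering of Definition 61; a real deployment appends the node ID as the final tiebreaker.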

Definition 61b (Message Delay Bound). Let \(\tau_{\max}(C)\) be the maximum expected one-way message delivery time under normal network conditions at connectivity level \(C\), measured in physical time units:

\[ \tau_{\max}(C) = P_{99}\big(\text{one-way delivery time} \mid \text{connectivity level } C\big) \]

\(\tau_{\max}(C)\) bounds the 99th-percentile one-way delivery time under normal (non-adversarial) conditions and is calibrated from operational measurements. It does not bound adversarially injected delay. The anomaly detection threshold in Proposition 45 depends on \(\tau_{\max}(C)\) at the current connectivity level.

Notation: \(\tau_{\max}\) here denotes maximum one-way network message delivery time (milliseconds to seconds). This is distinct from the staleness time constant in Self-Healing Without Connectivity, which measures the observation expiry window (minutes to hours). Full series notation registry: Notation Registry.

Proposition 45 ( HLC Causal Ordering Properties). For any two events \(e\), \(f\) with HLC timestamps \(h_e\), \(h_f\) on a fleet with maximum physical clock skew \(\epsilon\):

An HLC watermark never falls behind the real clock and stays within the fleet’s clock-skew envelope, so RAVEN drones can detect anomalous timestamps without NTP.

  1. \(l_j \geq \lfloor \mathrm{pt}_j \rfloor\) at all times — the HLC watermark never lags physical time.
  2. If \(e \to f\) (e causally precedes f ), then \(h_e \prec h_f\).
  3. \(l_j - \lfloor \mathrm{pt}_j \rfloor \leq \epsilon\) — the watermark exceeds physical time by at most the fleet-wide clock skew.
  4. If a received message has \(l_m - \lfloor \mathrm{pt}_j \rfloor > \epsilon + \tau_{\max}(C)\) (Def 61b at current connectivity), then either (a) the sender’s clock is anomalous — the watermark violates property 3 — or (b) message delivery exceeded \(\tau_{\max}(C)\) (a network anomaly). Single-message violations are ambiguous; if multiple consecutive messages from the same sender satisfy this condition, conclude (a).

Proof sketch: Properties 1–3 follow from the update rules: \(l\) always advances by at least the current physical timestamp, and a receive event produces \(h_j\) strictly greater than \(h_m\) (by incrementing \(c\) when watermarks coincide), preserving causal monotonicity. Property 3 holds when fleet clocks are bounded by a common authority (GPS PPS or NTP stratum 1) with skew \(\leq \epsilon\).

Property 4: Let \(t_s\) be the physical send time and \(d\) the one-way delivery delay. Property 3 at the sender gives \(l_m \leq \lfloor t_s \rfloor + \epsilon\). At the receiver, the condition is evaluated at receive time \(t_s + d\); if the receiver’s clock lags the sender’s by the delivery interval, the watermark leads receiver physical time by up to \(\epsilon + d\). Therefore under normal delivery (\(d \leq \tau_{\max}(C)\)):

\[l_m - \lfloor \mathrm{pt}_j \rfloor \;\leq\; \epsilon + \tau_{\max}(C)\]

If the observed difference exceeds \(\epsilon + \tau_{\max}(C)\), either the sender violated property 3 (clock anomaly) or \(d > \tau_{\max}(C)\) (network anomaly). \(\square\)

Calibration: Set \(\tau_{\max}(C)\) to the 99th percentile of measured one-way delivery times during normal operation.

For RAVEN on a local mesh with GPS-disciplined clocks (\(\epsilon \approx 1\) s), \(\tau_{\max}\) is measured from normal-operation delivery times, giving an anomaly threshold of \(\epsilon + \tau_{\max}\).

For CONVOY in mountain terrain, \(\epsilon + \tau_{\max}\) gives a threshold of 6 s. Both thresholds are evaluated against the current connectivity regime (Def 61b); \(\tau_{\max}(C)\) updates when regime transitions are detected.

Watch out for: Property 3 (the watermark skew bound \(l_j - \lfloor \mathrm{pt}_j \rfloor \leq \epsilon\)) requires fleet clocks to share a common disciplining authority (GPS PPS or NTP stratum 1) with aggregate skew ≤ ε; when GPS is denied and clocks free-run, the bound no longer holds and the anomaly detection threshold in Property 4 becomes unreliable, silently passing clock-drift anomalies as legitimate late-delivery events.

Physical translation: The Hybrid Logical Clock combines the precision of a physical timestamp with the causal guarantees of a vector clock. When two events have the same physical millisecond, the logical counter breaks the tie causally. A node returning from isolation can join the fleet’s causal ordering without a synchronized wall clock, as long as its logical counter is updated on first contact with a connected peer.

Definition 62 ( HLC -Aware CRDT Merge Function). Let \(s_1, s_2\) be CRDT states with HLC timestamps \(h_1, h_2\) and node identifiers \(n_1, n_2\). The HLC -Aware Merge function replaces the physical-timestamp LWW comparator \(t_1 > t_2\):

\[\mathrm{merge}(s_1, s_2) = \begin{cases} s_1 & \text{if } h_2 \prec h_1 \\ s_2 & \text{if } h_1 \prec h_2 \\ s_1 \sqcup s_2 & \text{if } h_1 \parallel h_2 \end{cases}\]

where \(h_1 \parallel h_2\) denotes causal concurrency (neither precedes the other) and \(\sqcup\) is the CRDT join (least upper bound in the semilattice). When \(h_1 \parallel h_2\), no write is discarded — the join resolves the conflict without relying on clock ordering.

Per-type instantiation of the concurrent case:

CRDT type | Causal case | Concurrent case (\(s_1 \sqcup s_2\))
LWW-Register | Higher \(h\) wins | Deterministic tiebreaker: higher \(n\) wins
G-Counter | Not applicable | Componentwise max per node
OR-Set | Not applicable | Union of (element, tag) pairs
RGA sequence | Insert ordered by \(h\) | Concurrent inserts ordered by \((h, n)\)

The node-ID tiebreaker for concurrent LWW-Register writes gives every node a deterministic, agreed-upon total order extension — no coordination required. A clock that drifts 10 minutes fast no longer silently wins all concurrent decisions; it simply dominates the \(l\) field, which after Phase 2 repair ( Definition 63 ) is corrected before merging.
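The concurrent-case rows above can be sketched as plain functions. This is an illustrative sketch; the record layout `(value, l, c, node_id)` for LWW registers is an assumption of this example:

```python
def lww_merge(rec1, rec2):
    """LWW-Register concurrent case: the (h, n) lexicographic extension,
    comparing (l, c, node_id), gives a coordination-free total order."""
    (_, l1, c1, n1), (_, l2, c2, n2) = rec1, rec2
    return rec1 if (l1, c1, n1) > (l2, c2, n2) else rec2

def gcounter_join(a: dict, b: dict) -> dict:
    """G-Counter concurrent case: componentwise max of per-node counts."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def orset_join(a: set, b: set) -> set:
    """OR-Set concurrent case: union of (element, tag) pairs."""
    return a | b
```

Note that in `lww_merge` a higher watermark still wins outright, but when watermark and counter coincide, the globally unique node ID decides, so every replica picks the same winner without communication.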

Formal extension: The tiebreaker in the table above is the lexicographic extension of HLC ordering with node identifier as a third level:

\[(h_1, n_1) \prec (h_2, n_2) \iff h_1 \prec h_2 \;\lor\; (h_1 = h_2 \land n_1 < n_2)\]

The LWW-Register merge is then \(s_1\) if \((h_2, n_2) \prec (h_1, n_1)\), \(s_2\) otherwise — a total order on \((h, n)\) pairs requiring no coordination. The three-level comparison subsumes the general \(h_1 \parallel h_2\) concurrent case from Definition 62 .

Clock drift bound ( Proposition 45 , Property 3): Let \(\delta_{\text{ppm}}\) denote the fractional clock drift rate — distinct from the compute-to-transmit energy ratio \(\rho = T_d/T_s\) in Why Edge Is Not Cloud Minus Bandwidth — with typical values of tens of ppm for crystal oscillators, and higher for the GPS-denied tactical edge. The maximum divergence between two free-running clocks drifting in opposite directions after partition duration \(T\) is:

\[\Delta_{\max} = 2\,\delta_{\text{ppm}} \cdot T\]

For a 2-hour GPS-denied partition (\(T = 7200\) s), \(\Delta_{\max}\) remains well under a second at crystal-oscillator drift rates. Since HLC watermarks track the maximum observed physical time rather than raw wall clocks, logical counters absorb any remaining tie-breaking load without requiring NTP.

Definition 63 (Drift-Quarantine Re-sync Protocol). When partitioned node \(j\) rejoins with signed drift \(\Delta_j\) (positive: clock ran fast; negative: clock ran slow), execute the following phases (Phase 0 through Phase 4):

Phase 0 — Detection: Compute \(\Delta_j\) on first peer message exchange. Let \(\tau_{\max}(C)\) be the current connectivity-regime delay bound (Def 61b).

Phase 1 — Quarantine: Node \(j\) enters read-only mode — no new local writes are accepted; only incoming gossip is processed.

Phase 2 — HLC Repair: Reset the watermark and advance the counter past all observed peers \(m\):

\[l_j \leftarrow \max\Big(\max_{m} l_m,\; \lfloor \mathrm{pt}_j \rfloor\Big), \qquad c_j \leftarrow \max_{m} c_m + 1\]
In Phase 3 — Causality Audit — each operation \(o\) generated during the partition falls into one of three cases. If concurrent (\(h_o \parallel h_p\) for all conflicting peer operations \(p\) at that key), it is a legitimate concurrent write and the CRDT join \(\sqcup\) is applied via the merge function (Def 62), discarding no data. If fast-clock (\(\Delta_j > 0\)), \(h_o\) may dominate peer HLCs at causally prior positions: reissue \(o\) with a repaired HLC whose counter offset \(k\) preserves causal order among local operations, so the operation joins the fleet as concurrent, not as a “future winner.” If slow-clock (\(\Delta_j < 0\)) and \(o\) is causally prior: a peer write overrides \(o\) only if it is its causal successor; concurrent peer writes are joined via \(\sqcup\), not silently discarded.

Phase 4 — Exit Quarantine: When all partition operations are audited and CRDT state has converged with peers, exit read-only mode and resume normal HLC operation.
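The phase sequence can be sketched as a small routine. This is a sketch under stated assumptions: the peer-state plumbing and the use of \(\tau_{\max}\) as the detection threshold are illustrative, and the Phase 3 audit body is elided since the text runs it asynchronously:

```python
from dataclasses import dataclass

@dataclass
class Node:
    hlc_l: int            # logical watermark
    hlc_c: int = 0        # logical counter
    read_only: bool = False

def resync(node: Node, peer_hlcs: list, pt: int, tau_max: int) -> bool:
    """Phases 0-2 and 4 of the Drift-Quarantine Protocol (Def 63 sketch).
    peer_hlcs: list of (l, c) tuples observed from peers.
    Returns True if quarantine was entered."""
    net_l = max(l for l, _ in peer_hlcs)            # Phase 0: detect drift
    if abs(node.hlc_l - net_l) <= tau_max:
        return False                                # within bound: no action
    node.read_only = True                           # Phase 1: quarantine, no writes
    node.hlc_l = max(net_l, pt)                     # Phase 2: reset watermark...
    node.hlc_c = max(c for _, c in peer_hlcs) + 1   # ...and jump the counter
    # Phase 3 (causality audit) runs asynchronously in the background here.
    node.read_only = False                          # Phase 4: exit quarantine
    return True
```

The key invariant is that between Phase 1 and Phase 4 the node accepts gossip but issues no writes, so no operation can carry the un-repaired watermark into shared state.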

Quarantine timeout: if quorum confirmation is not received within the quarantine timeout (computed from the local Weibull estimate), the node takes one of two escalating actions: (1) if the measured divergence \(\Delta_\text{acc}\) is below the soft-ejection threshold \(\delta_\text{soft}\), the node self-certifies, appends an audit log entry flagging the unconfirmed re-sync, and resumes read-write operation with reduced reputation weight; (2) if \(\Delta_\text{acc} \geq \delta_\text{soft}\), the node escalates to a read-only safe posture equivalent to Terminal Safety State until a human operator or a fully connected quorum clears it.

Quorum-based quarantine clearance requires the node to be in BOM (Beacon-Only Mode) or better — it must be capable of receiving and processing a QUARANTINE_CLEAR message authenticated by a k-of-n quorum signature. A node in PLM (Passive Listening Mode) cannot process clearance messages and requires direct hardware operator intervention. A node in HSS (Hardware Shutdown State) cannot be cleared remotely under any circumstances.

Permanent isolation: If a quarantined node’s partition duration accumulator exceeds the Proposition 37 Weibull circuit-breaker threshold from Self-Healing Without Connectivity without re-sync completing, the node self-transitions to Terminal Safety State ( Definition 53 ): ceases all write operations, freezes its CRDT state, and broadcasts a QUARANTINE_PERMANENT flag to any reachable peers for audit logging. This prevents a permanently isolated node’s stale state from contaminating fleet state on a surprise reconnect after an operationally long isolation.

Proposition 46 (Re-sync Correctness). The Drift-Quarantine Protocol (Definition 63) guarantees:

A RAVEN drone whose clock drifted 12 minutes fast loses no valid updates on re-join — the quarantine protocol reclassifies its writes as concurrent rather than silently letting them overwrite peers.

  1. Convergence: after re-sync, all nodes agree on the same CRDT state in \(O(|P| \cdot n)\) gossip rounds, where \(|P|\) is the number of partition operations and \(n\) the fleet size.
  2. Completeness: no partition operation is silently discarded — every operation is classified as causally ordered or concurrent and handled by Def 41.
  3. Fast-clock neutralization: no reissued operation retains an HLC watermark above the repaired network watermark. The “future writes win” failure mode is eliminated.
  4. Slow-clock protection: operations with low HLC watermarks are not silently overwritten; they survive as concurrent when no peer operation is their causal successor.

Proof sketch: Property 1 follows from CRDT convergence ( Proposition 42 ): the merged state is the join of all operations, converging regardless of evaluation order. Property 2 follows from Phase 3: every operation is explicitly classified. Properties 3–4 follow from Phase 2 HLC repair: after repair, the node’s watermark no longer exceeds the repaired network watermark, neutralizing the fast-clock advantage; concurrent operations survive via \(\sqcup\). \(\square\)

Concrete scenario: Drone 23 partitioned for 47 minutes; its RTC drifted 12 minutes fast (\(\Delta_j = +720\) s). Without HLC : all 847 local writes carry timestamps up to 720 s in the future, silently overwriting 12 minutes of correct fleet state on rejoin — the fleet loses cohesion. With Definition 63 : Drone 23 enters quarantine, Phase 2 repairs \(l_j\), Phase 3 classifies all 847 writes as causally concurrent with peer writes in the same time window, and they are merged via OR-Set and RGA join operations — not silently winning. Convergence completes in 3 gossip rounds (consistent with Prop 4 on the RAVEN 47-drone topology).

Phase timeline clarification — 3 gossip rounds vs. \(O(|P| \times n)\) audit rounds: The “3 gossip rounds” figure refers to CRDT state propagation only (Phase 4 of Definition 63 : disseminating the merged CRDT join to all peers).

The Phase 3 Causality Audit — classifying all 847 partition operations as causally ordered or concurrent — runs asynchronously and takes \(O(|P| \cdot n)\) gossip rounds, where \(|P| = 847\) (partition operations) and \(n = 47\) (fleet size). At the RAVEN gossip rate of \(\lambda = 1\) Hz and ~10 seconds per round, the full audit requires \(847 \times 47 \approx 39{,}809\) audit-exchanges at 1 Hz, completing in approximately 11 hours under continuous connectivity.

In practice, the audit parallelizes across nodes (each node classifies its own \(|P_j|\) operations independently), reducing wall-clock time to seconds (18 operations per drone, diameter 8).

The causality audit is not on the critical path for mission continuation — the node exits quarantine and resumes operations after state propagation completes in 3 rounds; the audit runs as a background verification process. The distinction matters operationally: “re-sync complete in 3 gossip rounds” means the fleet has a consistent CRDT state, not that causality has been fully audited.

Watch out for: Phase 2 HLC repair requires the quarantined node to receive the current network watermark from a peer; if the node is in BOM (Beacon-Only Mode) with receive-only links, it can hear the watermark but cannot complete the causality audit handshake, blocking the audit indefinitely — in this state the node should be treated as requiring manual operator intervention rather than automatic re-sync clearance.

Logical Validation: Peer Corroboration and Byzantine-Resilient Quorum

The CRDTs and vector clocks in the previous sections solve consistency — ensuring all nodes converge to the same state after partition. They say nothing about correctness. A node that merges consistently can still inject false sensor readings into shared state, and those readings propagate with the same convergence guarantees as honest ones. Secure Boot attestation proves a node runs verified firmware. It does not prove its sensors report truthful physical state. A physically compromised node with valid attestation can inject false position, sensor readings, or health data — passing all cryptographic checks while corrupting the fleet’s shared state model. The following three definitions address logical validation: truth as what the fleet independently verifies from physics, not what a node asserts. They build in order — first detecting false claims at a single node (Def 64), then weighting voters by track record (Def 65), then requiring a supermajority of high-trust nodes before any irreversible decision (Def 66).

Definition 64 (Peer-Validation Layer). For claim \(c = (\tau_c, v, h, \sigma_j)\) of type \(\tau_c\) with value \(v\), HLC timestamp \(h\), and signature \(\sigma_j\) from node \(j\), define the physical plausibility predicate at neighbor \(i\) as \(\phi(c, i) \in \{0, 1\}\). Per claim type:

Claim type | Plausibility condition (\(\phi(c,i) = 1\) when)
Position | \(i\)’s LIDAR/radar detects an object within sensing tolerance of the claimed coordinates
Sensor reading | reading deviation from neighbor is bounded by physical gradient times separation distance
Battery state | reported level is within energy-model margin of the predicted value

For sensor readings, the condition is \(\lvert v - v_i \rvert \leq \nabla_{\max} \cdot d(i, j)\), where \(v_i\) is neighbor \(i\)’s own reading, \(d(i, j)\) the separation distance, and \(\nabla_{\max}\) the maximum physical field gradient (e.g., a bounded temperature gradient). The corroboration count over \(k\) nearest neighbors \(N_k(j)\) is:

\[\mathrm{corr}(c) = \sum_{i \in N_k(j)} \phi(c, i)\]

Claim \(c\) from node \(j\) is accepted into shared CRDT state only if \(\mathrm{corr}(c) \geq k_{\mathrm{accept}}\). For RAVEN : \(k = 6\) nearest neighbors, \(k_{\mathrm{accept}} = 4\) (two-thirds majority of neighbors).

In practice, this means no node’s self-reported sensor data enters fleet state without independent corroboration from its physical neighbors — attestation proves the software is unmodified, but only neighbor sensors can prove the physical world matches what the node claims.

Physical translation: Drone 23 reports it is at grid position X. Before that claim enters the shared CRDT state, at least 4 of its 6 nearest neighbors must independently confirm — using their own LIDAR, radar, or optical sensors — that there is an object at position X. A compromised drone fabricating a false position cannot pass this check unless it also controls at least 4 surrounding drones’ sensor readings.
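The corroboration gate can be sketched under the RAVEN parameters (k = 6, 4-of-6 acceptance). The Euclidean position predicate and its tolerance value are illustrative assumptions, not fixed by the text:

```python
def phi_position(claimed_xy, sensed_xy, tol=2.0):
    """Plausibility predicate for a position claim: 1 if the neighbor's own
    sensor sees an object within `tol` (assumed units) of the claim."""
    dx, dy = claimed_xy[0] - sensed_xy[0], claimed_xy[1] - sensed_xy[1]
    return 1 if (dx * dx + dy * dy) ** 0.5 <= tol else 0

def accept_claim(phi_votes, k_accept=4):
    """Def 64 gate sketch: the claim enters shared CRDT state only when at
    least k_accept of the k nearest neighbors corroborate it."""
    return sum(phi_votes) >= k_accept
```

A compromised node fabricating a position must therefore fool at least `k_accept` physically independent neighbor sensors, which is the quantity Proposition 47 bounds.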

Proposition 47 (Peer-Validation False-Acceptance Bound). Let \(p_{\mathrm{fool}}\) be the probability that a single neighbor sensor is independently fooled into corroborating a false claim. Under independent sensor compromise, the probability a false claim is accepted is the binomial tail:

\[P_{\text{accept}} = \sum_{m = k_{\mathrm{accept}}}^{k} \binom{k}{m}\, p_{\mathrm{fool}}^{\,m}\, (1 - p_{\mathrm{fool}})^{k - m}\]
Requiring four of six RAVEN neighbors to independently confirm a position report drops a compromised node’s false-acceptance rate to about one-in-a-thousand.

This gives the false-acceptance probability as a function of quorum size \(k\) and per-validator corroboration error rate \(p_{\mathrm{fool}}\). At the illustrative value \(p_{\mathrm{fool}} = 0.1\): \(k = 1\) (single validator) is trivially compromised; \(k = 2\) admits 11% false acceptance; \(k \geq 3\) reduces false acceptance below 3%; \(k \geq 5\) reduces it below 0.4% (all theoretical bounds under these illustrative parameters). The quorum must be sized to bring false-acceptance probability below mission risk tolerance.

For RAVEN (illustrative values \(k = 6\), \(k_{\mathrm{accept}} = 4\), \(p_{\mathrm{fool}} = 0.1\)):

\[P_{\text{accept}} \approx 1.3 \times 10^{-3}\]
Empirical status: The \(p_{\mathrm{fool}} = 0.10\) per-validator false-corroboration rate is a conservative baseline; use \(p_{\mathrm{fool}} = 0.30\) in contested environments and calibrate from red-team exercises for the specific sensor modalities deployed.
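The bound is a binomial tail and is easy to check numerically; the parameter values below are the text's illustrative ones:

```python
from math import comb

def p_false_accept(k: int, k_accept: int, p_fool: float) -> float:
    """Prop 47 bound: probability that k_accept or more of k independently
    fooled neighbors corroborate a false claim (binomial tail)."""
    return sum(comb(k, m) * p_fool**m * (1 - p_fool)**(k - m)
               for m in range(k_accept, k + 1))

# RAVEN illustrative parameters: k = 6, k_accept = 4, p_fool = 0.1
p = p_false_accept(6, 4, 0.1)   # ≈ 1.3e-3, the one-in-a-thousand figure
```

Recomputing with the contested-environment value \(p_{\mathrm{fool}} = 0.3\) shows why calibration matters: the same 4-of-6 quorum admits roughly two orders of magnitude more false acceptances.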

Correlated attack caveat: Independence fails under coordinated GPS spoofing or cluster-level physical compromise. Countermeasure: require corroboration from sensors of distinct physical modalities (LIDAR, radar, optical, magnetometer) — spatial correlation of spoofing across modalities is far lower than within a single modality. An adversary simultaneously fooling independent sensing principles has achieved a level of compromise that is qualitatively outside the Byzantine fault model of Def 7.

Watch out for: the bound assumes independent sensor failure probability \(p_{\mathrm{fool}}\), which holds under random hardware faults but fails under any adversary with knowledge of the validation topology — a correlated-spoofing attack requires at most \(k_{\mathrm{accept}}\) physically proximate compromises rather than independent \(k\)-fold coincidence.

Watch out for: the Peer-Validation filter combines two independent Byzantine-detection mechanisms — physical-plausibility corroboration (above) and HLC watermark anomaly detection (Property 4 of the HLC Causal Ordering Properties) — but when GPS is denied and fleet clocks free-run, the HLC component can no longer distinguish Byzantine clock injection from legitimate drift; a Byzantine node claiming an inflated physical timestamp is indistinguishable from a legitimately drifted peer, so admission filtering degrades to content-corroboration-only, leaving a newly encountered Byzantine node that has not yet contradicted any neighbor’s physical readings undetectable until it makes an inconsistent claim.

Topology-aware adversary attack (defeats Prop 47 with probability 1): The independence assumption in Proposition 47 fails not only under correlated sensor spoofing but also under formation-aware positioning. An adversary with knowledge of the drone formation map can place a compromised node at a position where its \(k\) nearest neighbors are all within the same sensor shadow — a terrain feature, building, or signal reflector that makes all neighbor readings consistent with the false claim. Example: Drone 23 reports a false position 50 m north of its actual location. If the adversary knows that Drones 14, 18, 22, 27, 31, and 36 are all north of Drone 23 (in the same direction as the false displacement), their LIDAR returns from Drone 23’s general direction will be consistent with the false position within sensing tolerance — all 6 neighbors independently corroborate a false claim deterministically, not probabilistically. The multi-modal countermeasure partially addresses this: magnetometer readings are invariant to LIDAR-visible terrain shadows. Required supplemental countermeasure: cross-modal corroboration must include at least one modality with independent failure geometry (e.g., RF time-of-flight ranging, which requires adversary control of radio propagation rather than optical/terrain knowledge). Additionally, require that corroborating neighbors span at least two distinct angular sectors from the claimant’s perspective — a corroboration quorum from a single sector is vulnerable to formation-aware adversaries regardless of how many nodes it contains.

Definition 65 (Reputation-Weighted Fleet Coherence Vote). Each node \(i\) maintains a reputation vector \((r_1, \ldots, r_n)\) with \(r_j \in [0, 1]\), updated by EWMA on corroboration outcomes:

\[r_j \leftarrow \begin{cases} r_j + \alpha_r (1 - r_j) & \text{on corroboration pass} \\ (1 - \beta_r)\, r_j & \text{on corroboration failure, with } \beta_r \geq 5\alpha_r \end{cases}\]

The update rule exponentially smooths reputation with additive recovery on corroboration passes and multiplicative decay on failures. The asymmetry is intentional — fast down, slow up — requiring \(\alpha_r \leq 1 - \exp(-T_{\text{epoch}} / \tau_{\text{rep}})\) to prevent reputation recovery faster than one partition epoch. Equal gain/loss rates would allow a node to recover in a single success after a dangerous failure; an asymmetry of at least 5:1 (illustrative value) is required to prevent Byzantine nodes from accumulating trust through easy low-stakes interactions.

Here \(\alpha_r \ll 1\) (slow adaptation — prevents Byzantine manipulation of the reputation update itself). The weighted vote of node \(i\) on claim \(c\) from node \(j\) is \(w_i = r_i \cdot \phi(c, i)\). Claim \(c\) is accepted under reputation weighting if:

\[\sum_{i \in N_k(j)} r_i\, \phi(c, i) \;\geq\; \theta_r \sum_{i \in N_k(j)} r_i\]

for a fixed acceptance fraction \(\theta_r\).

The effective Byzantine fraction after \(T\) gossip rounds, as Byzantine nodes consistently fail corroboration and receive EWMA weight 0 each round:

\[f_{\text{eff}}(T) = f_0\, (1 - \alpha_r)^T\]

where \(f_0\) is the Byzantine fraction at reconnection. Byzantine nodes decay toward \(r_j = 0\) without removal — their vote weight becomes negligible.

Physical translation: A drone that repeatedly submits claims its neighbors cannot corroborate — fabricated positions, impossible battery levels — accumulates a poor reputation score. After enough failed validations, its votes on future claims are weighted near zero, so it can no longer influence fleet decisions even if it has not been formally expelled. This matters during partition, when expulsion requires quorum: reputation weighting silences a bad actor immediately, without requiring coordination.

Ejection floor: a node whose reputation score falls below \(r_\text{eject} = 0.05\) is formally ejected from the merge pool. Below this threshold, the node’s weight contribution is negligible (less than 5% of a neutral-weight node’s) but its continued participation consumes protocol overhead. Ejected nodes are logged in the Post-Partition Audit Record and may seek re-admission only after a clean attestation round via the Trust-Root Anchor mechanism.
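The asymmetric update can be sketched as follows. The specific alpha/beta values are assumptions chosen to honor the at-least-5:1 fail/recover asymmetry the text calls for, not values it fixes:

```python
def update_reputation(r: float, passed: bool,
                      alpha: float = 0.02, beta: float = 0.10) -> float:
    """Def 65 sketch: additive recovery on corroboration passes,
    multiplicative decay on failures (beta >= 5 * alpha)."""
    if passed:
        return min(1.0, r + alpha * (1.0 - r))   # slow up
    return (1.0 - beta) * r                      # fast down
```

Under these assumed rates, a single pass does not undo a single failure, and a node failing every round decays geometrically past the \(r_\text{eject} = 0.05\) ejection floor within a few dozen rounds.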

Connection to Prop 6: Prop 6 ( Byzantine Tolerance Bound) requires \(f < n/3\) under uniform trust weights. Def 65 relaxes this: once \(f_{\text{eff}}(T) < 1/3\), the weighted scheme satisfies the tolerance bound even if the physical count satisfies \(f \geq n/3\) — provided the reputation system has accumulated sufficient rounds. The minimum round count is \(T_{\text{excl}}\), the time for \(f_{\text{eff}}(T)\) to fall below \(1/3\).

Definition 66 (Logical Quorum for High-Stakes Decisions). For capability level L3 and above transitions (Collaborative Planning, Full Integration) or commanded terminal safety state triggers (Def 124), a logical quorum \(Q \subseteq V\) must satisfy all five conditions simultaneously:

The five conditions are: (1) size — \(|Q| > 2n/3\); (2) reputation — \(\sum_{i \in Q} r_i \geq 0.75 \sum_{i \in V} r_i\); (3) corroboration currency — every \(i \in Q\) has passed peer validation (Def 64) within a bounded recency window (Prop 79); (4) causal consistency — all HLC vote timestamps \(h_i\) (Def 61) lie in the same causal window; and (5) spatial diversity — no single communication cluster contributes a majority of the votes, so the quorum spans at least two physically separated clusters.

Decision \(D\) is logically validated if a valid logical quorum \(Q\) exists and:

\[\sum_{i \in Q} r_i\, v_i(D) \;\geq\; \frac{2}{3} \sum_{i \in Q} r_i\]

where \(v_i(D) \in \{0, 1\}\) is node \(i\)’s vote on \(D\).
In practice, this means irreversible fleet-wide decisions require not just a head count but a trust-weighted, spatially distributed, causally consistent vote — a single compromised cluster that controls many nodes still cannot unilaterally pass the quorum gate.

A 2/3 supermajority of reputation-filtered voters is required before any irreversible high-stakes decision. The reputation filter on \(Q\) carries as much weight as the 2/3 ratio: a quorum of low-trust nodes provides no Byzantine safety, since a 34% Byzantine coalition can break a simple majority at 50%+1 but cannot overcome a reputation-gated 2/3 threshold.

RAVEN “Change Mission” parameters (illustrative values \(n = 47\), \(f \leq 15\)): quorum size \(|Q| \geq 33\); combined quorum reputation at least 75% of the fleet total; every voter corroborated within the last 11 s; votes spanning at least two physically separated clusters.
HSS trigger asymmetry (Def 124, Prop 93): Autonomous local HSS trigger (battery below and threat conditions met) requires no quorum — it is a unilateral safety action that must remain available in the Denied regime (C=0) where quorum is unreachable. Remotely commanded HSS requires a logical quorum plus pre-enrolled cryptographic command authority. In the Denied regime, commanded HSS must fall back to pre-authorized standing orders in L0 firmware, established at deployment enrollment; real-time quorum formation is impossible when partition is complete.

Physical translation: Before the RAVEN swarm changes its mission assignment — an irreversible act that disperses 47 drones to new grid sectors — at least 33 drones must vote YES, the voters must collectively hold 75% of the fleet’s total reputation weight, every voter must have passed peer validation within the last 11 seconds, and their votes must span at least two physically separated communication clusters. This five-condition gate ensures that neither a single compromised cluster nor a coalition of reputation-degraded nodes can redirect the swarm.
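The five-condition gate can be sketched as a single predicate. This is illustrative: the vote-tuple layout is an assumption, and conditions 3–4 (corroboration currency and the HLC causal window) are assumed to be checked upstream before votes reach this function:

```python
from collections import Counter

def validate_decision(votes, n, fleet_rep_total):
    """Def 66 gate sketch. votes: (node_id, reputation, cluster, yes) tuples
    for voters already current and causally consistent. Returns True only
    if size, reputation mass, spatial diversity, and the 2/3
    reputation-weighted supermajority all hold."""
    if len(votes) * 3 <= 2 * n:                        # condition 1: |Q| > 2n/3
        return False
    total_rep = sum(r for _, r, _, _ in votes)
    if total_rep < 0.75 * fleet_rep_total:             # condition 2: reputation mass
        return False
    if len(Counter(c for _, _, c, _ in votes)) < 2:    # condition 5: >= 2 clusters
        return False
    yes_rep = sum(r for _, r, _, yes in votes if yes)
    return yes_rep * 3 >= 2 * total_rep                # 2/3 supermajority
```

Integer arithmetic (`* 3 <= 2 * n`) avoids floating-point edge cases at the quorum boundary, which matters when every node must compute the same verdict independently.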

Proposition 48 (Logical Quorum BFT Resistance).

Even with valid Secure Boot, up to 15 compromised RAVEN drones cannot collectively redirect the swarm — any two quorums of 33 drones must share at least one honest node.

Under Def 7 with at most \(f < n/3\) Byzantine nodes [12, 13] — including nodes with valid Secure Boot attestation:

  1. Safety: no two contradictory decisions \(D\) and \(\lnot D\) can both be logically validated — any two valid logical quorums intersect in at least one honest node.
  2. Liveness: if \(n - f\) honest nodes are connected and currently corroborated, a valid logical quorum exists.
  3. Reputation convergence: after \(T_{\text{excl}}\) rounds, Byzantine nodes’ combined weight falls below the level needed to influence condition 2, excluding them from quorum formation.
  4. Anti-Sybil via spatial diversity: condition 5 prevents a single compromised cluster from unilaterally forming a quorum even if every node in that cluster holds valid attestation.

Proof sketch:

Property 1 — By condition 1, \(|Q_1|, |Q_2| > 2n/3\). By inclusion-exclusion, \(|Q_1 \cap Q_2| \geq |Q_1| + |Q_2| - n > n/3 > f\). At least one node in \(Q_1 \cap Q_2\) is honest; an honest node votes consistently, so \(Q_1\) and \(Q_2\) cannot simultaneously validate \(D\) and \(\lnot D\).

Property 2 — With \(n - f > 2n/3\) honest corroborated nodes, condition 1 is satisfiable; conditions 2–5 are satisfiable because honest nodes maintain high and growing reputation, and the fleet topology ensures multi-cluster connectivity.

Property 3 — Byzantine nodes accumulate EWMA weight 0 each round; their combined weight decays geometrically at rate \((1-\alpha_r)\).

Property 4 — Any single cluster \(C\) with \(\vert C \vert < 2n/3\) cannot satisfy condition 1 alone; condition 5 forces a second cluster to contribute, and that cluster contains honest nodes whose votes are not controlled by the compromised cluster. \(\square\)

Physical translation: The inclusion-exclusion argument says that any two quorums of 33+ drones in a 47-drone fleet must share at least 16 drones — more than the 15-drone Byzantine fault tolerance limit — so at least one of those overlapping drones is honest and will refuse to vote YES for both a decision and its contradiction. The geometry of the quorum size requirement is what makes contradictory decisions structurally impossible, not just unlikely.

Watch out for: the \(f < n/3\) safety threshold must be evaluated using the current reputation-weighted effective fault count \(f_{\mathrm{eff}}(T)\), not the physical node count; a fleet whose reputation filter has not accumulated sufficient rounds to exclude Byzantine nodes from the merge pool can exceed the effective \(n/3\) threshold while the physical \(f\) remains safely below it — the convergence round count \(T_{\mathrm{excl}}\) from Property 3 must elapse before the quorum BFT resistance holds under reputation weighting.

Combined Byzantine + fast-clock threat: a Byzantine node that also runs a fast clock can simultaneously inject false claims (exploiting the Byzantine tolerance gap) and win LWW contests (exploiting the HLC drift window) before the Drift-Quarantine Protocol detects the drift. Defense: the Peer-Validation Layer ( Definition 64 ) and Drift-Quarantine Protocol ( Definition 63 ) must both clear a node before it can write to shared state. A node in Drift-Quarantine is automatically read-only — its LWW wins are discarded even if the timestamps are locally valid. This two-gate design means the combined attack requires compromising both the trust-based admission gate and the clock-drift detection simultaneously.

Proposition 48b (Compound Guarantee Uniqueness). No system in the class {Classical CRDTs , Byzantine CRDT variants, BFT consensus, HLC -based causal consistency, Byzantine gossip} satisfies all four of the following conditions simultaneously: (C1) Byzantine fault tolerance via trust-weighted admission; (C2) eventual convergence without consensus; (C3) causal ordering under clock drift without NTP; (C4) correct local operation under asynchronous partitions.

System | C1: Trust-weighted BFT | C2: Convergence w/o consensus | C3: Causal ordering under drift | C4: Async partitions
Classical CRDTs | Fail | Pass | Fail | Pass
Byzantine CRDT variants | Fail | Fail | Fail | Partial
BFT consensus | Fail | Fail | Fail | Fail
HLC causal consistency | Fail | Pass | Pass | Pass
Byzantine gossip | Fail | Fail | Fail | Fail

Classical CRDTs fail C1 — the join absorbs Byzantine operations without authentication — and fail C3 because LWW timestamps are vulnerable to clock drift under GPS jamming. Byzantine CRDT variants fail C1 (authentication-based filtering is not behavioral-reputation weighting) and C2 (key distribution requires coordination). BFT consensus fails C2 by construction and C4 (quorum is unreachable during partition). HLC -based causal consistency satisfies C2 , C3 , and C4 but fails C1 — the max-selection rule \(\max(\text{physical}, \text{causal}+1)\) is exploitable by a Byzantine node claiming an inflated physical timestamp, with no behavioral-reputation gate to detect it. Byzantine gossip fails C2 (threshold counting is implicit coordination) and C4 (convergence requires ongoing delivery).

C1 is the condition that separates the compound design from all five comparison systems — every system satisfying C2 , C3 , and C4 fails C1 alone. The trust-weighted admission layer ( Definition 64 , Definition 65 , Definition 66 ) is therefore the structurally novel component.

Watch out for: HLC causal consistency satisfies C2 , C3 , and C4 and is the closest prior system to the compound design — the C1 gap holds only after \(T_{\mathrm{excl}}\) rounds of reputation accumulation (from Proposition 48, Property 3); before that convergence window closes, the compound design and a pure HLC causal system are operationally indistinguishable, and any deployment that does not allow sufficient warm-up time before high-stakes decisions effectively reduces to the weaker prior system.

Causal Commit Ordering: NTP-Free Split-Brain Resolution

The CONVOY mitigation above surfaces a deeper structural flaw: wall-clock time defines an ordering that is neither causal nor semantic. It correlates with “which node last touched this value,” not “which value is operationally correct.” A vehicle whose clock is 3 seconds fast wins every concurrent route decision — regardless of information quality — and this bias is exploitable under active GPS jamming. The fix replaces the \(t_1 > t_2\) comparison with a three-level lexicographic ordering that needs no synchronized clocks.

Definition 67 (Semantic Commit Order). For two concurrent update records \(u_1 = (V_1, p_1, a_1, n_1)\) and \(u_2 = (V_2, p_2, a_2, n_2)\) — where \(V_i\) is the vector clock (Definition 60), \(p_i\) is the application-assigned semantic priority, \(a_i\) is the authority-tier identity (Definition 68), and the node identifier \(n_i\) is fixed at manufacturing — the Semantic Commit Order \(\prec\) is determined by applying the following rule in sequence until a winner is found:

  1. Causal dominance (no wall clock): if \(V_1 < V_2\), then \(u_2 \succ u_1\); if \(V_2 < V_1\), then \(u_1 \succ u_2\). If incomparable (concurrent), proceed to step 2.
  2. Semantic priority: the higher-\(p_i\) record wins when \(p_1 \neq p_2\). If equal, proceed to step 3.
  3. Authority tie-breaker: lower tier number wins (L0 authority outranks L1; Definition 68 ). Among equal tiers, higher \(n_i\) wins (globally unique; assigned at manufacture).

The order is total: because \(n_i\) values are globally unique, step 3 never produces a tie.

In practice: semantic priority is set by the application developer at write time, encoding the operational importance of each update. A route closure (priority 90) wins over a logistics estimate (priority 20) regardless of which node sent it or which arrived first — content urgency determines commit order, not node seniority or timestamp.

Priority assignment and tie-breaking: priority values are application-defined integers, with higher values reserved for safety-critical updates (the CONVOY deployment below uses a 0–100 scale). Priority values are not required to be unique across concurrent operations. Ties are broken deterministically by the (authority tier, node ID) tuple in lexicographic order, where node IDs are assigned at commissioning and are globally unique within the fleet. This ordering ensures that all nodes resolve the same tie identically without communication.
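The three-level rule can be sketched as follows (a minimal Python sketch; the record field names `vc`, `priority`, `tier`, and `node_id` are illustrative, not from a protocol spec):

```python
from typing import Optional

def vc_compare(v1: dict, v2: dict) -> Optional[int]:
    """Compare vector clocks: -1 if v1 < v2, 1 if v1 > v2, 0 if equal,
    None if incomparable (the two updates are concurrent)."""
    keys = set(v1) | set(v2)
    le = all(v1.get(k, 0) <= v2.get(k, 0) for k in keys)
    ge = all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)
    if le and ge:
        return 0
    if le:
        return -1
    if ge:
        return 1
    return None

def semantic_commit_winner(u1: dict, u2: dict) -> dict:
    """Definition 67 in sequence: causal dominance, then semantic
    priority, then the authority tie-breaker."""
    c = vc_compare(u1["vc"], u2["vc"])
    if c == -1:
        return u2                      # u1 causally precedes u2
    if c == 1:
        return u1
    if u1["priority"] != u2["priority"]:
        return u1 if u1["priority"] > u2["priority"] else u2
    if u1["tier"] != u2["tier"]:       # lower tier number outranks (step 3)
        return u1 if u1["tier"] < u2["tier"] else u2
    return u1 if u1["node_id"] > u2["node_id"] else u2  # unique IDs: total order
```

No wall-clock timestamp appears anywhere in the comparison, which is the point of the construction.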

Semantic priority assignment for CONVOY :

| Update type | Priority \(p\) | Rationale |
| --- | --- | --- |
| Threat sighting (confirmed) | 100 | Overrides all other route data |
| Route closure (kinetic) | 90 | Safety-critical path change |
| Checkpoint status | 50 | Operational but non-urgent |
| Fuel and logistics estimate | 20 | Informational; tolerates staleness |
| Maintenance and comfort report | 5 | Yields to all operational data |

Proposition 49 (NTP-Free Split-Brain Resolution). The Semantic Commit Order (Definition 67) satisfies all four properties required for correct split-brain resolution:

Two CONVOY vehicles that diverged for 45 minutes resolve conflicting route updates the instant they reconnect, with no clock sync required — authority tier breaks all ties deterministically.

  1. Totality: for any two distinct records \(u_1 \neq u_2\), exactly one of \(u_1 \succ u_2\) or \(u_2 \succ u_1\) holds.
  2. Causal consistency: if \(u_1 \to u_2\) (\(u_1\) causally precedes \(u_2\)), then \(u_2 \succ u_1\).
  3. Clock independence: steps 2 and 3 compare fields fixed at write and manufacture time; no wall-clock timestamp appears.
  4. Determinism: every node computes the same winner from the same two records, regardless of arrival order or local clock.

Proof: (1) Step 3 compares globally unique \(a_i\) values — ties are impossible. (2) \(u_1 \to u_2\) iff \(V_1 < V_2\) by Proposition 44; step 1 resolves this before steps 2–3 are reached. (3) \(p_i\) and \(a_i\) are integer constants assigned at write and manufacture time; no clock participates. (4) The rule is a deterministic function of \((V_i, p_i, a_i)\). \(\square\)

Connection to LWW: When \(p_1 = p_2\) and updates are concurrent, the authority-tier tie-breaker acts as a deterministic LWW with a “clock” that cannot drift — the \(a_i\) is fixed at manufacture and invariant under GPS jamming, NTP failure, and deliberate clock manipulation. This is strictly stronger than wall-clock LWW: consistent under arbitrary clock skew and adversarial time sources.

Physical translation: Authority tier breaks symmetry — the higher-authority node’s version wins, with no need for a neutral majority. Two fleet vehicles that diverged during a 45-minute blackout resolve their conflict the moment they reconnect, without waiting for a third vehicle to cast a deciding vote.

Watch out for: the semantic priority field \(p_i\) is set by the application developer at write time and is not validated by the protocol — a misconfigured application that assigns priority 100 to a low-stakes update will cause that update to win every causal tie regardless of operational relevance, permanently displacing higher-importance concurrent writes with no protocol-level indication that the ordering is wrong.

Custom Merge Functions

When standard CRDTs don’t fit, define custom merge functions satisfying the same semilattice requirements as Definition 58: commutativity, associativity, and idempotency.

Example: Surveillance priority list

Each cluster maintains a list of priority targets. During partition, both clusters may add or reorder targets.

Merge function:

  1. Targets: the union of both clusters’ target sets
  2. Priority: the maximum priority assigned by either cluster
  3. Conflicts: flag targets where the clusters assigned significantly different priorities

The merged priority of target \(t\) is the higher of the two clusters’ individual priority scores, so that important targets identified by either cluster are never downgraded during reconciliation.

This is commutative and associative. Conflicts are flagged for human review rather than silently resolved.
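A sketch of this merge (the dict layout and the 30-point flag threshold are illustrative choices, not from the text):

```python
def merge_target_priorities(a: dict, b: dict, flag_threshold: int = 30):
    """Merge two clusters' target->priority maps: union of targets,
    maximum priority per target, and a flag list for large disagreements
    (flagged targets go to human review rather than silent resolution)."""
    merged, flagged = {}, []
    for target in set(a) | set(b):
        pa, pb = a.get(target), b.get(target)
        if pa is not None and pb is not None:
            merged[target] = max(pa, pb)        # never downgrade a target
            if abs(pa - pb) >= flag_threshold:
                flagged.append(target)
        else:
            merged[target] = pa if pa is not None else pb
    return merged, sorted(flagged)
```

Because `max` and set union are commutative, associative, and idempotent, the merged map is independent of which cluster initiates reconciliation.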

Example: Engagement authorization

Critical: a target should only be engaged if both clusters agree.

The merged authorization set contains only targets that both clusters independently authorized — using intersection rather than union ensures that a single cluster’s unilateral authorization is insufficient to commit the fleet to an engagement.

If Cluster A authorized target T but Cluster B did not, the merged state does not authorize T. Conservative resolution for high-stakes decisions.
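In code, the conservative rule is a one-line set intersection (a sketch; the set-of-target-IDs representation is assumed):

```python
def merge_engagement_auth(auth_a: set, auth_b: set) -> set:
    """Unanimity merge: a target is authorized only if both clusters
    independently authorized it. A single cluster's unilateral grant
    never survives reconciliation."""
    return auth_a & auth_b
```

Intersection is trivially commutative, associative, and idempotent, so it satisfies the semilattice requirements with no further proof obligation.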

Verification: Custom merge functions must be proven correct. For each function, verify:

  1. Commutativity: formal proof or exhaustive testing
  2. Associativity: formal proof or exhaustive testing
  3. Idempotency: formal proof or exhaustive testing
  4. Safety: merged state satisfies application invariants
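A randomized harness for the three laws might look like this (a cheap regression guard, not a formal proof; `sample` is any generator of random states):

```python
import random

def check_semilattice_laws(merge, sample, trials: int = 500, seed: int = 0):
    """Randomized check of the Definition 58 requirements:
    commutativity, associativity, idempotency."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = sample(rng), sample(rng), sample(rng)
        assert merge(a, b) == merge(b, a), "commutativity violated"
        assert merge(merge(a, b), c) == merge(a, merge(b, c)), "associativity violated"
        assert merge(a, a) == a, "idempotency violated"
    return True
```

Exhaustive testing over small state spaces, or a proof assistant, is the stronger option for safety-critical merges; the harness catches the common implementation slips.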

Game-Theoretic Extension: Merge Semantics as Fair Division

Each CRDT merge function implicitly makes a fairness decision about whose divergent state takes precedence. The game-theoretic framing makes these choices explicit.

Fair division framework: When clusters A and B hold divergent states \(s_A \neq s_B\), the merge function \(m(s_A, s_B)\) allocates value. Under the Nash bargaining solution, the fair merge maximizes the product of utility gains over each cluster’s fallback:

\[ m^* = \arg\max_{m}\; \big(u_A(m) - u_A^{0}\big)\big(u_B(m) - u_B^{0}\big) \]

A merge function fairness audit classifies the five standard choices by their semantic properties. LWW-Register (last-write-wins) favors the cluster with faster clocks or a lower node-ID tie-breaker — asymmetric and strategically manipulable by controlling commit timing. Intersection (engagement authorization) uses a unanimity rule that minimizes false authorizations but maximizes false negatives; correct when a false authorization is far costlier than a missed one. Maximum (surveillance priority) is appropriate when priorities are independent across targets but over-allocates when two clusters assign effort to the same target (substitutable priorities). Union (G-Set coverage) is correct for additive capabilities with no conflict. Proportional quota ( STOCKSYNC ) satisfies Nash bargaining axioms when fallback utilities are zero.

Practical implication: For each CRDT in the system, document the merge function against the fairness criterion it satisfies. For resource allocation (quotas, task assignments), use proportional or Nash bargaining allocation. For irreversible decisions (engagement authorization), use intersection. For additive state (coverage maps, sensor readings), use union or maximum as appropriate to whether the underlying quantities are complementary or substitutable.

Cognitive Map: CRDTs solve the “who wins” problem structurally, not procedurally — the merge function determines the outcome at design time, not at reconciliation time. The three semilattice properties (commutativity, associativity, idempotency) are what make this work: any replica that has seen the same updates reaches the same state. The Reputation-Weighted extension (Definition 58b) adds a Byzantine admission gate before the semilattice merge — a compromised node cannot poison fleet state unless it controls a coalition above the Byzantine tolerance threshold. The six standard CRDT types (G-Counter, PN-Counter, G-Set, 2P-Set, LWW-Register, MV-Register) cover the majority of edge state patterns; the selection criterion is the semantic intent of concurrent writes, not implementation convenience. Next: when CRDT semantics cannot resolve a conflict (two clusters both committed the same exclusive resource), authority tiers determine precedence.


Hierarchical Decision Authority

CRDTs handle data-level convergence but cannot resolve decisions. When two clusters independently commit an exclusive resource — a target engagement authorization, a shared processing slot — no merge function can make both assignments valid simultaneously. Classifying decisions by scope (node, cluster, fleet, command) and pre-delegating authority for the appropriate tier before partition occurs gives each node clear knowledge of which decisions it may make autonomously and which require escalation. Broader pre-delegated authority enables faster autonomous operation during partition but increases the risk of conflicting decisions, so the optimal delegation scope minimizes expected reconciliation cost — bounded delegation is always better than either full autonomy or full lockout.

Decision Scope Classification

Definition 68 (Authority Tier). The authority tier \(\mathcal{Q}_j\) classifies decisions by the scope of affected nodes. The scope of a decision \(d\) is the set of nodes whose state is affected by \(d\).

In other words, a decision belongs to tier \(\mathcal{Q}_j\) based solely on how many nodes its outcome touches: a self-repair action on one drone is tier 0, a formation change within a cluster is tier 1, a mission-wide route update is tier 2, and anything requiring external command approval is tier 3.

| Tier | Name | Scope | Example |
| --- | --- | --- | --- |
| \(\mathcal{Q}_0\) | Node | Single node | Local healing action |
| \(\mathcal{Q}_1\) | Cluster | Local cluster | Formation adjustment |
| \(\mathcal{Q}_2\) | Fleet | All nodes | Mission parameter change |
| \(\mathcal{Q}_3\) | Command | Beyond fleet | Rules of engagement |

Notation: \(\mathcal{Q}_j\) (higher index = higher authority) denotes a tier level (a compile-time constant per node class). \(\mathcal{Q}_{\mathrm{eff}}(t)\) is the runtime effective tier at time \(t\) — it may be downgraded from the design tier during partition. \(\mathcal{Q}_{\mathrm{del}}(\tau)\) is the tier granted by a higher-authority node for a partition of duration \(\tau\). All three share the codomain \(\{\mathcal{Q}_0, \mathcal{Q}_1, \mathcal{Q}_2, \mathcal{Q}_3\}\). (See also the Constraint Sequence, which formalizes the prerequisite ordering under which authority tiers must be satisfied before capability levels advance.)

Authority Tiers vs. Capability Levels. These are two orthogonal hierarchies that both use the term “level” but govern different aspects of node behaviour:

| Concept | Symbol | Determined by | Governs |
| --- | --- | --- | --- |
| Capability Level (L0–L4) | \(q\) | Remaining battery / resource budget | Energy fidelity, MAPE-K tick rate, measurement stack activation |
| Authority Tier | \(\mathcal{Q}_j\) | Partition duration + pre-delegation rules | Decision scope, write authority, which actions a node may take autonomously |

A node at capability level L1 (low battery) may simultaneously hold authority tier Q3 (high autonomy) if it was pre-delegated before the partition. The two hierarchies are independent. Capability level describes what functional service a node delivers; authority tier describes what decisions a node may make autonomously. When other sections say “escalate to L0,” context determines which hierarchy is meant: in the healing framework, “L0” always means capability level (functional degradation); in conflict-resolution rules in this article, “authority tier” always means decision scope.

Disambiguation: Capability level (\(q\)) governs what the node can do given its resources; authority tier (\(\mathcal{Q}_j\)) governs what decisions the node is permitted to make autonomously. A low-battery L0 node can still hold a high \(\mathcal{Q}\)-tier delegation.

Gateway nodes and tier assignment: an Autonomic Gateway wraps legacy hardware that has no native MAPE-K APIs. Gateway nodes are assigned \(\mathcal{Q}_1\) (cluster-scope) authority by default. Healing actions issued through a gateway inherit the gateway’s tier ceiling. For actions that would require \(\mathcal{Q}_2\) authority, the gateway must escalate the request to a \(\mathcal{Q}_2\)-capable node rather than forwarding directly.

Not all decisions have the same scope. The authority tier hierarchy is defined in Definition 68 above. During partition: \(\mathcal{Q}_0\) and \(\mathcal{Q}_1\) decisions continue normally; \(\mathcal{Q}_2\) decisions become problematic, since fleet-wide coordination is impossible; \(\mathcal{Q}_3\) decisions cannot be made, and the system must operate within pre-authorized bounds.

The diagram contrasts the full four-tier hierarchy under normal connectivity with the collapsed two-tier structure after partition, highlighting how the cluster lead absorbs elevated authority in the absence of higher tiers.

    
    graph TD
    subgraph Connected["Connected State (full hierarchy)"]
        A3C["A3: Command<br/>(strategic decisions)"] --> A2C["A2: Fleet<br/>(fleet-wide coordination)"]
        A2C --> A1C["A1: Cluster<br/>(local coordination)"]
        A1C --> A0C["A0: Node<br/>(self-management)"]
    end
    subgraph Partitioned["Partitioned State (delegated authority)"]
        A1P["A1: Cluster Lead<br/>(elevated to A2 authority)"] --> A0P["A0: Node<br/>(autonomous operation)"]
    end
    A1C -.->|"partition<br/>event"| A1P
    style A3C fill:#ffcdd2,stroke:#c62828
    style A2C fill:#fff9c4,stroke:#f9a825
    style A1C fill:#c8e6c9,stroke:#388e3c
    style A0C fill:#e8f5e9,stroke:#388e3c
    style A1P fill:#fff9c4,stroke:#f9a825
    style A0P fill:#e8f5e9,stroke:#388e3c

Read the diagram: Left side (Connected State): the full four-tier hierarchy, red at top (Command, strategic decisions) flowing down through Fleet \(\to\) Cluster \(\to\) Node. Right side (Partitioned State): only two tiers remain — the Cluster Lead (yellow, elevated to A2 authority) and the individual Node (green). The dotted arrow from A1-Connected to A1-Partitioned marks the partition event trigger. The elevation is explicit delegation, not automatic assumption: the cluster lead received pre-authorized A2 authority before the partition began.

Authority elevation during partition: When connectivity is lost, authority must be explicitly delegated downward. The system cannot simply assume lower levels can make higher-level decisions.

Interaction with healing admission: The Authority Tier check precedes the Healing Admission Condition (HAC) in Self-Healing Without Connectivity. A node whose effective tier is below the tier a healing action requires does not evaluate HAC — the action is rejected before the Lyapunov gate is reached. This ordering ensures partitioned nodes with a temporarily elevated effective tier cannot issue healing actions they lack the authority to perform.

Authority Delegation Under Partition

Formal Decision Problem:

Objective Function:

\[ \mathcal{Q}^{*}_{\mathrm{del}} = \arg\max_{\mathcal{Q}} \; \mathbb{E}\big[V_{\mathrm{mission}}(\mathcal{Q})\big] - \mathbb{E}\big[C_{\mathrm{reconcile}}(\mathcal{Q})\big] \]

The objective selects the authority tier that maximizes expected mission value minus the cost of reconciling any decisions made at that authority tier when connectivity resumes.

Higher authority enables more effective local action but increases reconciliation cost upon reconnection.

Constraint Set:

Four constraints bound the delegation: authority cannot exceed what was pre-authorized, must be warranted by partition duration, is capped below command-tier, and must remain within an approved decision scope.

State Transition Model:

\[ \mathcal{Q}_{\mathrm{del}}(\tau) = \min\!\left( \mathcal{Q}_{\mathrm{base}} + \left\lfloor \tau / \Delta_\tau \right\rfloor,\; \mathcal{Q}_2 \right) \]

The delegated authority tier grows by one level per \(\Delta_\tau\) time units of continuous partition, saturating at \(\mathcal{Q}_2\) so that command-tier authority is never granted automatically. For CONVOY, \(\Delta_\tau = 15\) minutes (illustrative value): after 15 minutes of partition, the cluster lead’s authority escalates by one tier.

The effective authority at the next time step depends on the current connectivity regime \(\Xi(t)\): when the fleet is connected or degraded, authority is restored to the cluster’s pre-configured ceiling; when the fleet is isolated or in nominal partition, authority falls back to the time-escalated delegated level.
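The escalate-and-restore logic can be sketched as follows (tier integers 0–3 mirror \(\mathcal{Q}_0\)–\(\mathcal{Q}_3\); the 15-minute step and regime names are the illustrative values from the text):

```python
def delegated_tier(base_tier: int, partition_s: float,
                   delta_tau_s: float = 900.0, cap: int = 2) -> int:
    """One tier per delta_tau_s of continuous partition, saturating at
    Q2 so command-tier (Q3) authority is never granted automatically."""
    return min(base_tier + int(partition_s // delta_tau_s), cap)

def effective_tier(regime: str, base_tier: int, ceiling: int,
                   partition_s: float) -> int:
    """Restore the pre-configured ceiling when connected or degraded;
    otherwise fall back to the time-escalated delegated level."""
    if regime in ("connected", "degraded"):
        return ceiling
    return delegated_tier(base_tier, partition_s)
```

A tier check like this would sit in front of the Healing Admission Condition: an action requiring a tier above `effective_tier(...)` is rejected before any Lyapunov gate is evaluated.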

Pre-delegated authority rules:

Bounded delegation constraints:

Authority vacuum scenario: When delegated authority expires and higher tiers remain unreachable, the system enters a constrained operating mode. The cluster continues operations (local decisions only) but cannot make fleet-affecting choices. This may result in suboptimal fleet behavior but prevents unauthorized scope expansion. Escape hatch: Pre-configured “mission abort” or “rally point” behavior activates after extended authority vacuum (e.g., 8+ hours), returning assets to safe configuration without requiring higher authority.

Mission-phase dependent modulation:

During critical phases, the phase rule takes precedence over partition-duration escalation: a 30-minute partition does not elevate authority to \(\mathcal{Q}_2\) while a critical phase is active.

Risk: Parallel partitions may both claim authority. Cluster A and Cluster B both believe they are the senior cluster and both make \(\mathcal{Q}_2\) decisions. On reconnection, they hold conflicting fleet-wide decisions.

Mitigation: Tie-breaking rules defined in advance.

Game-Theoretic Extension: Incentive-Compatible Delegation

The delegation optimization addresses what authority to grant but not whether the agent will exercise it aligned with the principal’s interests. During partition, a cluster lead faces situations where mission-aligned actions conflict with asset preservation — the moral hazard risk is that a disconnected cluster lead with pre-committed authority may exercise that authority in ways that serve local asset preservation over mission objectives, and the principal (command tier) cannot observe or correct this during partition.

Actionable design: Pre-commit authority grants based on threat level, and require a post-reconnection accountability audit scaled to the authority exercised. A cluster lead that claims high threat to access elevated authority must submit a complete decision log and debrief at reconnection. This creates a cost for false threat inflation that exceeds the decision-making benefit unless the threat is genuine, making truthful reporting the dominant strategy without requiring complex mechanism design.

Practical implication: Extend pre-delegation rules with a self-reporting component: cluster leads submit a threat assessment at the start of isolation that determines their authority scope. Non-standard decisions are logged against this assessment and reviewed at reconnection, creating reputational accountability.

Conflict Detection at Reconciliation

Reconciliation Strategy Selection: Formal Problem

When clusters reconnect, multiple reconciliation strategies are available, differing in bandwidth use, latency, and residual divergence. The objective is to pick the strategy that minimizes total reconciliation cost while staying within coherence and bandwidth constraints.

Objective Function: The optimal strategy \(r^*\) minimizes the weighted sum of reconciliation time and residual divergence. Weight \(w\) reflects the relative cost of leaving divergence unresolved.

Minimize reconciliation time subject to bounded residual divergence.

Constraint Set: Three constraints bound the feasible strategies. Residual divergence must stay below a configured threshold. Bandwidth consumed must not exceed what the current connectivity state \(C(t)\) provides. Any conflict must be resolvable deterministically via the CRDT semilattice join.

State Transition Model: After all partition clusters merge, the unified state is the CRDT join of every cluster’s local state — the join operator \(\bigsqcup\) applies the same commutative, associative, idempotent merge across all partitions simultaneously:

\[ s_{\mathrm{merged}} = \bigsqcup_{i=1}^{k} s_i \]

where \(\sqcup\) is the CRDT join (commutative, associative, idempotent).

Game-Theoretic Extension: VCG-Based Conflict Resolution

First-commit-wins and LWW-based conflict resolution create commitment races: both clusters have incentives to commit decisions early (to win the race), discarding information that arrives after commitment. The mechanism design solution eliminates this race.

The commitment race: Under LWW, if cluster A prefers decision \(d_A\) and knows cluster B will commit at \(\hat{t}_B\), cluster A commits at \(t_A < \hat{t}_B\) regardless of information quality. Nash equilibrium: both commit immediately, discarding all post-commitment information.

Second-price rule: Resolve conflicts in favor of the cluster whose commitment carries the highest declared value, with the winner paying the second-highest value to the loser (as compensation for opportunity cost). This is strategy-proof: no cluster benefits from misrepresenting its decision value.

STOCKSYNC application: Replace “first commit fulfills the order” with a sealed-bid allocation: each warehouse submits the order’s declared value alongside the commitment. The highest-value commitment fulfills the order; the losing warehouse receives a credit equal to the declared value of its lost opportunity. This eliminates the commitment race and allocates inventory to the highest-value use.
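A minimal sketch of the sealed-bid rule (warehouse names and declared values are illustrative):

```python
def second_price_resolve(declared: dict):
    """Vickrey-style conflict resolution: the highest declared value wins;
    the winner owes the second-highest value to the loser as compensation
    for its lost opportunity."""
    ranked = sorted(declared.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    compensation = ranked[1][1] if len(ranked) > 1 else 0
    return winner, compensation
```

Because the winner pays the second-highest value rather than its own bid, inflating a declared value only risks overpaying — the standard Vickrey argument for truthful reporting.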

MULTIWRITE application: For field service documentation conflicts, the cluster whose documentation covers the higher-priority task resolution wins, with the other cluster’s additions merged as supplementary notes. The priority ordering is determined by the task severity taxonomy, not commit timing.

Value-weighted conflict resolution with compensation transfers eliminates the commitment race for resources where both clusters may hold legitimate divergent values — the higher-value commitment wins, and the losing cluster receives a credit equal to its declared opportunity cost, removing the incentive to strategically inflate declared values.

When clusters reconnect, compare decision logs:

Detection: Identify overlapping authority claims. The conflict set collects all pairs of decisions \((d_A, d_B)\) where both the scope of \(d_A\) and the scope of \(d_B\) cover at least one common node, and the two decisions differ in content.

Two decisions conflict if they affect overlapping scope and differ.
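The detection rule translates directly into a pairwise scan over the two decision logs (a sketch; the `scope`/`content` record layout is assumed):

```python
def detect_conflicts(log_a: list, log_b: list) -> list:
    """Pairs of decisions whose scopes share at least one node and whose
    contents differ: the overlapping-authority-claim condition."""
    return [
        (da, db)
        for da in log_a
        for db in log_b
        if set(da["scope"]) & set(db["scope"]) and da["content"] != db["content"]
    ]
```

The scan is \(O(|A| \cdot |B|)\); indexing decisions by affected node reduces it when logs are long.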

Classification: Reversible vs irreversible. Reversible actions include route decisions before execution, target prioritization, and resource allocation. Irreversible actions are those where physical work has been done, resources consumed, or information disclosed.

Resolution for reversible: Apply hierarchy.

If Cluster A made decision \(d_A\) and Cluster B made decision \(d_B\):

  1. If \(d_A\) was made at a higher authority tier than \(d_B\): \(d_A\) wins
  2. If the tiers are equal: apply the tie-breaker
  3. Update both clusters to the winning decision

Resolution for irreversible: Flag for human review.

Cannot undo physical actions. Log the conflict, document both decisions and outcomes, present to command for analysis. Learn from the conflict to improve future protocols.

Commercial Application: MULTIWRITE Field Service Documentation

MULTIWRITE enables field technicians to edit work orders and documentation in locations with intermittent connectivity: industrial facilities, remote infrastructure, underground installations. Technicians collaborate on a shared system that must remain available regardless of connectivity.

The collaboration coherence challenge: Traditional document systems require online access. Field technicians working in basements, tunnels, or remote sites lose access precisely when they need documentation most. MULTIWRITE uses CRDTs to enable offline editing with automatic merge on reconnection.

Document structure as CRDT composition:

The diagram shows how a single work-order document is decomposed into a hierarchy of CRDT types, with the appropriate merge semantics chosen for each element — note how text sections use RGA while attachments use the simpler G-Set.

    
    graph TD
    subgraph "Document CRDT Structure"
        DOC["Document<br/>LWW-Register (metadata)"]
        SECTIONS["Sections<br/>OR-Set (add/remove sections)"]
        SEC1["Section 1<br/>RGA (text sequence)"]
        SEC2["Section 2<br/>RGA (text sequence)"]
        MEDIA["Media<br/>G-Set (attachments)"]
        COMMENTS["Comments<br/>OR-Set (threaded)"]
    end
    DOC --> SECTIONS
    SECTIONS --> SEC1
    SECTIONS --> SEC2
    DOC --> MEDIA
    DOC --> COMMENTS
    style DOC fill:#e3f2fd,stroke:#1976d2
    style SEC1 fill:#e8f5e9,stroke:#388e3c
    style SEC2 fill:#e8f5e9,stroke:#388e3c

CRDT selection by document element:

| Element | CRDT Type | Concurrent Edit Behavior |
| --- | --- | --- |
| Document title | LWW-Register | Latest edit wins |
| Section list | OR-Set | Both additions preserved |
| Section text | RGA (Replicated Growable Array) | Character-level merge |
| Checklist items | OR-Set with LWW status | Items merged; status by timestamp |
| Photos/attachments | G-Set with metadata | All attachments preserved |
| Signatures | LWW-Register per signatory | Latest signature wins |

RGA for text editing: The Replicated Growable Array preserves insertion order across concurrent edits. Each character has a unique identifier based on (site_id, sequence_number), enabling deterministic ordering.

When technicians A and B both insert text at position P during partition, both insertions are retained, ordered deterministically by their identifiers rather than by arrival time.

This may create awkward text (“valvepump replaced”) but never loses edits. Technicians resolve semantic conflicts on review.
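A minimal RGA sketch under the (site_id, sequence_number) identifier scheme (the field layout and sibling tie-order are illustrative; a production RGA also carries the tombstones discussed below):

```python
class RGA:
    """Minimal Replicated Growable Array: every character carries a unique
    (site_id, seq) identifier; concurrent inserts under the same parent
    are ordered by identifier, so all replicas converge to the same text."""
    def __init__(self, site: str):
        self.site, self.seq = site, 0
        self.children = {None: []}   # parent id -> ordered child records

    def local_insert(self, parent_id, char):
        self.seq += 1
        rec = ((self.site, self.seq), parent_id, char)
        self.apply(rec)
        return rec

    def apply(self, rec):
        """Idempotent: applying the same record twice changes nothing."""
        rec_id, parent_id, _ = rec
        siblings = self.children.setdefault(parent_id, [])
        if rec not in siblings:
            siblings.append(rec)
            siblings.sort(key=lambda r: r[0], reverse=True)  # deterministic tie order
        self.children.setdefault(rec_id, [])

    def text(self):
        out = []
        def walk(parent_id):
            for rec in self.children.get(parent_id, []):
                out.append(rec[2])
                walk(rec[0])
        walk(None)
        return "".join(out)
```

Exchanging records in either order yields the same document on both replicas, which is the convergence property the table above relies on.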

Real-world example: Two technicians simultaneously edit an equipment inspection report:

| Technician | Location | Edit | Timestamp |
| --- | --- | --- | --- |
| Alice | Basement (offline) | Adds “Motor bearing wear detected” | 14:32 |
| Bob | Control room (online) | Changes status to “Requires immediate attention” | 14:35 |
| Alice | Returns to lobby | Adds photo of bearing damage | 14:41 |

When Alice’s tablet syncs at 14:42:

  1. Her text insertion merges into document (RGA preserves position)
  2. Bob’s status change applies (LWW, Bob’s edit was later)
  3. Her photo adds to media set (G-Set union)
  4. Document now contains both contributions with correct status

RGA tombstone accumulation: Every deletion in RGA creates a tombstone — the deleted character record is marked but retained, not hard-removed, because any peer that has not yet received the deletion message would re-insert the character on merge. In a 256KB microcontroller running 30-plus minutes of offline MULTIWRITE edits, unacknowledged tombstones accumulate without bound.

Definition 69 (RGA Tombstone Pruning Strategy). An RGA causal sequence is a set of records \(R\), each of the form \(r = (c, \mathit{id}, \mathit{parent}, \mathit{tomb}, K)\), where \(c\) is the character value (\(\perp\) for a tombstone), \(\mathit{id}\) is a globally unique causal identifier, \(\mathit{parent}\) is the causal predecessor’s \(\mathit{id}\), \(\mathit{tomb}\) is the tombstone flag, and \(K \subseteq V\) is the set of nodes that have acknowledged this record.

The acknowledgement vector \(A_i[j]\) at node \(i\) stores the highest sequence number from site \(j\) that node \(i\) has received. Writing \(\mathit{id} = (\mathrm{site}, \mathrm{seq})\), a tombstone \(r\) satisfies the global acknowledgement condition — and is safe to remove — when:

\[ \forall j \in V:\; A_j[\mathrm{site}(r)] \geq \mathrm{seq}(r) \]
The pruning operation removes all such records from \(R\) and re-points parent references in surviving records to the nearest non-pruned ancestor, preserving the causal chain for all live characters. Pruning is triggered when RGA memory consumption exceeds \(B_{\mathrm{RGA}}/2\) — the half-budget threshold from the memory budget enforcement above.

Permanent-failure case: If a node’s acknowledgment vector has not advanced for \(T_{\mathrm{ack}}\) seconds (recommended default: twice the Weibull P95 partition duration, Definition 13), treat that node as having acknowledged all prior tombstones: advance its ack to the global maximum before evaluating the pruning condition. This prevents RGA memory deadlock when a node suffers a hardware casualty rather than a temporary partition. \(T_{\mathrm{ack}}\) must be set conservatively larger than the longest anticipated recoverable partition.

Interaction with Byzantine ejection: A soft-ejected node ( Definition 34 ) remains in the RGA acknowledgment set — ejection removes it from merge decisions (\(w_j \leftarrow 0\)) but does not remove it from gossip participation. A live ejected node continues to send ack confirmations normally; no tombstone deadlock arises from ejection alone. The T_ack-stale rule applies when an ejected node is also partitioned or suffers a hardware casualty — in that case, both conditions are handled identically by the staleness timeout above.
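The global-acknowledgement test, including the permanent-failure escape, might be sketched as follows (the ack-vector layout and the 2-hour staleness default are illustrative assumptions):

```python
def tombstone_prunable(tomb_id, ack_vectors, now_s, t_ack_stale_s=7200.0):
    """tomb_id = (site, seq). ack_vectors maps node -> (acks, last_seen_s),
    where acks[site] is the highest seq that node has received from site.
    Nodes silent longer than t_ack_stale_s are treated as fully caught up
    (the hardware-casualty escape that prevents memory deadlock)."""
    site, seq = tomb_id
    for node, (acks, last_seen_s) in ack_vectors.items():
        if now_s - last_seen_s > t_ack_stale_s:
            continue  # permanent-failure case: assume acknowledged
        if acks.get(site, 0) < seq:
            return False  # some live node has not yet seen the deletion
    return True
```

Pruning only when this returns true is what prevents a reconnecting node from resurrecting deleted characters.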

Proposition 50 (Tombstone Pruning Safety Bound).

MULTIWRITE can safely compact deleted-character records once every live node has confirmed receipt — this bound tells you how long to wait before garbage-collecting tombstones.

In an \(n\)-node fleet with gossip rate \(\lambda\), mesh diameter \(D\), and per-message loss probability \(p_{\mathrm{loss}}\), the expected time until a tombstone satisfies the global acknowledgement condition is bounded by the convergence time \(T_{\mathrm{conv}}\) from Proposition 13. At steady state, the unpruned tombstone count and memory footprint satisfy:

\[ N_{\mathrm{tomb}} \leq r_{\mathrm{del}} \cdot T_{\mathrm{conv}}, \qquad M_{\mathrm{tomb}} \leq r_{\mathrm{del}} \cdot T_{\mathrm{conv}} \cdot B_r \]

where \(r_{\mathrm{del}}\) is the fleet-wide deletion rate (deletions per second) and \(B_r\) is the per-record byte cost (site + seq + parent pointer + flags). Proof: by Proposition 13, acknowledgement propagates to all nodes within \(T_{\mathrm{conv}}\) seconds with probability \(\geq 1 - 1/n\). Tombstones younger than \(T_{\mathrm{conv}}\) are unprunable; older ones satisfy the safe condition. The steady-state count follows from Little’s Law (\(N = r \cdot T\)). \(\square\)

MULTIWRITE calibration uses \(n = 12\), \(D = 3\), \(p_{\mathrm{loss}} = 0.1\), and \(r_{\mathrm{del}} = 2\) deletions per second.

Empirical status: The MULTIWRITE calibration (\(n=12\), \(D=3\), \(p_{\mathrm{loss}}=0.1\)) is illustrative; tombstone accumulation is highly sensitive to deletion rate \(r_{\mathrm{del}}\) and network diameter — profile these from production logs before setting the \(B_{\mathrm{RGA}}/2\) pruning threshold.

Negligible under normal operation. The risk materialises during extended partitions: a 30-minute offline period at \(r_{\mathrm{del}} = 2\) deletions/s accumulates \(2 \times 1800 = 3600\) tombstones, approximately 86KB — crossing the pruning threshold and triggering the archival step (step 4 of the memory budget enforcement above).
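The partition-window arithmetic, with an assumed per-record cost of 24 bytes (consistent with 3600 tombstones being roughly 86KB):

```python
def tombstone_footprint(r_del_per_s: float, offline_s: float,
                        bytes_per_record: int = 24):
    """Tombstones accumulated during an offline window and their memory
    cost; bytes_per_record is an assumed value, not a spec constant."""
    count = int(r_del_per_s * offline_s)
    return count, count * bytes_per_record
```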

Physical translation: A deletion record (tombstone) is safe to compact only after every live node has confirmed receipt. Pruning too early — before an offline node catches up — would make the deleted item “reappear” when that node reconnects. The safety bound defines the minimum observation window before garbage collection.

Watch out for: the safety bound uses Little’s Law with the gossip convergence time \(T_{\mathrm{conv}}\) from Proposition 13; if the mesh diameter \(D\) has been underestimated — common when failed nodes create routing gaps — the actual convergence time is longer than \(T_{\mathrm{conv}}\), and pruning before the corrected value causes deleted items to reappear at nodes that have not yet acknowledged.

Hierarchical authority for documentation: The three documentation roles form a strict authority containment chain — every action a technician can take is also permitted to a supervisor, and every supervisor action is also permitted to an engineer, but not vice versa.

Decision scope for documentation assigns actions by role: technicians (L0) create observations, add photos, and draft findings; supervisors (L1) approve work orders, modify assignments, and override findings; engineers (L2) certify inspections and approve safety-critical changes; compliance officers (L3) lock documents and submit to regulatory authorities.

During partition, technicians can create and edit at L0. Supervisor actions queue for sync. If urgency requires an L1 decision offline, the supervisor can invoke elevated offline authority with mandatory post-sync review.

Conflict resolution for regulatory documents: Some fields require single authoritative value (serial numbers, measurement readings). MULTIWRITE handles these specially: the merge outcome is last-writer-wins when the two recorded values are compatible (differ by less than tolerance \(\epsilon\)), and an explicit conflict flag requiring human resolution when they exceed that tolerance.

When two technicians record different measurement values for the same reading:

  1. If difference < measurement tolerance: Average values
  2. If difference > tolerance: Flag as conflict, require re-measurement
  3. Maintain audit trail of both original values
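The three-step rule can be sketched as follows (the result-dict layout is illustrative):

```python
def merge_measurement(v1: float, v2: float, tolerance: float) -> dict:
    """Average compatible readings; flag incompatible ones for
    re-measurement. Both originals are always retained for the audit
    trail, so neither technician's reading is lost."""
    if abs(v1 - v2) < tolerance:
        return {"value": (v1 + v2) / 2, "conflict": False, "originals": (v1, v2)}
    return {"value": None, "conflict": True, "originals": (v1, v2)}
```

Swapping the argument order yields the same value and the same flag, so the function is safe to run from either side of the reconnection.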

Sync prioritization for field operations: When bandwidth is limited, MULTIWRITE transmits data in the order shown below — priority 1 goes first because safety observations may trigger immediate field action, while lower-priority data such as photos and notes can safely wait for a more stable connection.

| Priority | Data Type | Rationale |
| --- | --- | --- |
| 1 | Safety observations | May trigger immediate action |
| 2 | Work order status | Affects scheduling and dispatch |
| 3 | Equipment readings | Time-sensitive data |
| 4 | Photos and attachments | Large but deferrable |
| 5 | Comments and notes | Contextual, can wait |

Field tablets opportunistically sync whenever connectivity permits, prioritizing safety-critical data even over brief connections.

Correctness analysis:

Assumption Set: RGA for text, G-Counter for quantities, LWW for metadata, semantic conflict detection for measurements.

Automatic merge rate bound: Conflicts occur when:

  1. The same text position is edited concurrently (probability \(p_{\mathrm{pos}}\))
  2. A measurement discrepancy exceeds tolerance (probability \(p_{\mathrm{meas}}\))

The automatic merge succeeds when neither collision event occurs; the union bound gives the conservative lower bound \(P(\text{auto-merge}) \geq 1 - p_{\mathrm{pos}} - p_{\mathrm{meas}}\), which holds regardless of whether the two failure modes are independent.

For disjoint work regions (\(p_{\mathrm{pos}} \approx 0\)) and consistent measurement technique (\(p_{\mathrm{meas}} \approx 0\)): \(P(\text{auto-merge}) \approx 1\).
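The bound reduces to one line (the probability names mirror the two collision events above):

```python
def auto_merge_lower_bound(p_pos: float, p_meas: float) -> float:
    """Union bound on the automatic merge rate:
    P(auto-merge) >= 1 - p_pos - p_meas, clamped at zero."""
    return max(0.0, 1.0 - (p_pos + p_meas))
```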

Data loss bound: Under CRDT semantics, data loss is zero by construction: all operations merge via semilattice join.

Utility improvement: \(\Delta U \propto T_{\text{wait}}\), where \(T_{\text{wait}}\) is the eliminated waiting time for connectivity.

Cognitive Map: Authority tiers convert the “who wins” decision from an ad-hoc judgment into a pre-agreed structure. The four tiers (node, cluster, fleet, command) map directly to the scope of each decision’s effects. The key insight from Proposition 51 and the MULTIWRITE scenario: authority delegation is bounded in both scope and time — the cluster lead receives authority for partition duration \(\tau\), with the delegated authority expiring automatically on reconnection. The VCG mechanism and Nash bargaining extension make the delegation incentive-compatible: nodes cannot improve their outcome by misreporting their decision scope. Next: when reconnection occurs, the reconciliation protocol must efficiently identify and merge the divergent state accumulated during partition.


Reconnection Protocols

When two clusters reconnect after partition, they may have hours of diverged state. Exchanging the full state is too expensive on bandwidth-constrained links, so the protocol must identify only the divergent items, transfer them efficiently, and handle re-partition mid-sync gracefully. Merkle tree reconciliation identifies the \(k\) divergent items in \(O(k \log(n/k) + k)\) messages rather than \(O(n)\); Delta-Sync transfers only the changes; and HLC timestamps provide causal ordering without NTP, enabling correct LWW resolution under clock drift. Merkle reconciliation is efficient for sparse divergence (small \(k\)), but when \(k\) approaches \(n\) — full divergence after a long burst partition — a full state copy is cheaper than Merkle traversal. The burst-corrected divergence model ( Proposition 41b ) determines which regime applies.

State Reconciliation Sequence

When partition heals, clusters must reconcile state efficiently. Bandwidth may be limited during reconnection window. Protocol must be robust to partial completion if partition recurs.

Phase 1: State Summary Exchange

Each cluster computes a compact summary of its state using Merkle trees [14] :

\[ h_{\text{leaf},i} = H(s_i), \qquad h_{\text{parent}} = H(h_{\text{left}} \,\|\, h_{\text{right}}) \]

where \(H\) is a hash function and \(s_i\) are state elements; the root hash is a fingerprint of the entire state.

Exchange roots. If roots match, states are identical - no further sync needed.

Phase 2: Divergence Identification

If roots differ, descend Merkle tree to identify divergent subtrees. Exchange hashes at each level until divergent leaves are found.

Proposition 51 (Reconciliation Complexity). For \(n\)-item state with \(k\) divergent items, Merkle-based reconciliation requires \(O(k \log(n/k) + k)\) messages to identify and transfer divergences. When divergent items are spatially concentrated in the tree, this reduces to \(O(\log n + k)\).

A STOCKSYNC warehouse pair with 10,000 (illustrative value) inventory records but only 200 (illustrative value) divergent SKUs needs around 1,400 (theoretical bound under illustrative parameters) messages — not 10,000.

In other words, when only a small fraction of state actually diverged, Merkle-based sync is far cheaper than exchanging the full state: instead of \(O(n)\) messages you need only \(O(k \log n)\) in the general case and \(O(\log n + k)\) when divergent keys are clustered together.

Physical translation: Doubling the fleet roughly doubles the sync cost for small divergences (\(O(k \log n)\) scales slowly in \(n\)), but doubling the divergence itself doubles the cost directly (\(O(k)\) factor). At fleet scale this means a 100-vehicle fleet (illustrative value) with 5 (illustrative value) divergent items needs ~35 messages (theoretical bound); a 200-vehicle fleet (illustrative value) with the same 5 (illustrative value) divergent items needs only ~38 messages (theoretical bound) — nearly identical. But 100 (illustrative value) divergent items in a 100-vehicle fleet (illustrative value) needs ~700 messages (theoretical bound). The practical implication: keep partition durations short to bound \(k\), not fleet size. For OUTPOST with 127 (illustrative value) sensors and 5 (illustrative value) differing records after a 2-hour (illustrative value) partition, reconciliation requires approximately \(5 \log_2(127/5) + 5 \approx 28\) (theoretical bound under illustrative parameters) message hashes — a trivially small overhead compared to transmitting all 127 records in full.

Proof: The Merkle tree has height \(O(\log n)\). In each traversal round, parties exchange hashes for differing subtrees. At depth \(i\), at most \(\min(k, 2^i)\) subtrees differ. Traversal terminates after \(O(\log(n/k))\) levels, when each subtree contains at most one divergent item, yielding \(O(k \log(n/k))\) hash comparisons. Adding \(O(k)\) data transfers gives total message complexity \(O(k \log(n/k) + k)\). When all \(k\) divergences fall within a single subtree (spatially concentrated case — common when related state keys are grouped), the traversal depth is \(O(\log n)\) regardless of \(k\), reducing total complexity to \(O(\log n + k)\). For sparse divergence (\(k \ll n\)), \(O(k \log(n/k) + k)\) provides the general upper bound.

Watch out for: the \(O(k \log(n/k))\) bound assumes the Merkle tree partitions state keys roughly uniformly; if the application uses a time-ordered key scheme (e.g., HLC timestamps as primary keys), all \(k\) divergent items from a recent burst will cluster in the rightmost subtree, reducing traversal to the \(O(\log n + k)\) case — but if the scheme distributes keys uniformly across leaves (e.g., hash-based partitioning), the \(O(k \log(n/k))\) bound is tight and cannot be improved without restructuring the tree.
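The top-down descent of Phases 1–2 can be sketched in a few lines. This is a local illustration only, assuming both replicas hold the same number of keys; a real deployment hashes serialized state and exchanges each tree level over the network, and the names `build_tree` and `diverging_leaves` are hypothetical.

```python
# Sketch of Merkle divergence identification: compare roots, descend only into
# differing subtrees, count hash comparisons (the message-bearing operations).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return tree levels: leaf hashes first, root last."""
    levels = [leaves[:]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) if i + 1 < len(prev) else h(prev[i])
                       for i in range(0, len(prev), 2)])
    return levels

def diverging_leaves(state_a, state_b):
    """Descend both trees top-down; return (divergent leaf indices, comparisons)."""
    la = build_tree([h(v) for v in state_a])
    lb = build_tree([h(v) for v in state_b])
    frontier, comparisons, divergent = [(len(la) - 1, 0)], 0, []
    while frontier:
        level, idx = frontier.pop()
        comparisons += 1
        if la[level][idx] == lb[level][idx]:
            continue  # identical subtree: prune here (the fast-path gate)
        if level == 0:
            divergent.append(idx)  # reached a divergent leaf
        else:
            for child in (2 * idx, 2 * idx + 1):
                if child < len(la[level - 1]):
                    frontier.append((level - 1, child))
    return sorted(divergent), comparisons
```

With 4 leaves and one divergent entry, only 5 hash comparisons are needed; identical states terminate after the single root comparison.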

Phase 3: Divergent Data Exchange

Transfer the actual divergent key-value pairs. Prioritize by importance (see Priority Ordering for Sync below).

Phase 4: Merge Execution

Apply CRDT merge or custom merge functions to divergent items. Compute unified state.

Phase 5: Consistency Verification

Recompute Merkle roots. Exchange and verify they now match. If mismatch, identify remaining divergences and repeat from Phase 3.

Phase 6: Coordinated Operation Resumption

With consistent state, resume fleet-wide coordination. Notify all nodes that coherence is restored.

The diagram shows the complete six-phase reconciliation protocol as a loop: the key pattern is that root comparison acts as a fast-path gate — only divergent states proceed through the Merkle traversal and merge stages.

    
    graph TD
        A["Partition Heals (connectivity restored)"] --> B["Exchange Merkle Roots (state fingerprints)"]
        B --> C{"Roots Match?"}
        C -->|"Yes"| G["Resume Coordination (fleet coherent)"]
        C -->|"No"| D["Identify Divergences (traverse Merkle tree)"]
        D --> E["Exchange Divergent Data (priority-ordered)"]
        E --> F["Merge States (CRDT merge)"]
        F --> B
        style A fill:#c8e6c9,stroke:#388e3c
        style G fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
        style C fill:#fff9c4,stroke:#f9a825
        style D fill:#bbdefb
        style E fill:#bbdefb
        style F fill:#bbdefb

Loop termination: the reconnection merge loop executes at most \(N_\text{merge,max} = \lceil |\Delta_\text{total}| / \delta_\text{batch} \rceil\) iterations, where \(|\Delta_\text{total}|\) is the outstanding delta volume at reconnection and \(\delta_\text{batch}\) is the maximum delta-set size per round. Under Byzantine re-injection, a node that generates more than \(N_\text{merge,max}\) delta batches in a single reconnection window is flagged as suspicious and its reputation is reduced. The merge loop exits and logs an audit entry after \(N_\text{merge,max}\) iterations regardless of outstanding deltas.

Priority Ordering for Sync

Limited bandwidth during reconnection requires prioritization.

Priority 1: Safety-critical state

Priority 2: Mission-critical state

Priority 3: Operational state

Priority 4: Audit and logging

Sync Priority 1 first. If partition recurs, at least safety-critical state is consistent. Lower priorities can wait for more stable connectivity.

Optimization: Within each priority level, order sync items by expected information value: the value of syncing state item \(s\) is \(v(s) = \text{impact}(s) \cdot \text{staleness}(s)\), the product of its operational impact (how much mission outcome depends on this state) and its staleness (how long it has been out of date).

High-impact, stale items should sync first. Low-impact, fresh items can wait.
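A minimal sketch of this ordering; the item fields `impact` and `last_synced` are assumed names for illustration:

```python
# Order sync items within a tier by information value = impact x staleness,
# so high-impact, stale state transmits first.
def sync_order(items, now):
    """Sort sync items in descending expected information value."""
    def value(item):
        staleness = now - item["last_synced"]  # seconds out of date
        return item["impact"] * staleness      # expected information value
    return sorted(items, key=value, reverse=True)

items = [
    {"key": "threat_map", "impact": 0.9, "last_synced": 0},     # stale, high impact
    {"key": "fuel_level", "impact": 0.9, "last_synced": 3500},  # high impact, fresh
    {"key": "debug_log",  "impact": 0.1, "last_synced": 0},     # stale, low impact
]
```

At `now = 3600`, the stale threat map (value 3240) outranks the stale-but-unimportant log (360), which outranks the important-but-fresh fuel level (90).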

Delta-Sync Protocol: Bounded Partial Synchronization

The priority ordering above specifies what to sync first but not how to bound the sync to a short connectivity window. For CONVOY vehicles that reconnect for 5-second windows before returning to dead zones, a protocol that assumes unlimited connectivity will stall mid-sync when the window closes, leaving partial state applied. The Delta-Sync Protocol formalizes the sync as a sequence of atomic, self-contained steps that are safe to interrupt at any point.

Definition 70 (Delta-Sync Protocol). Given reconnected nodes \(i\) and \(j\) with connectivity window \(T_W\) seconds and bandwidth \(B\) bits per second, the Delta-Sync Protocol operates in three phases:

Phase 1 — Fingerprint exchange (fixed overhead \(C_F\) bytes regardless of state size): Node \(i\) transmits a compact fingerprint \(F_i = (\vec{V}_i, \vec{h}_i)\), where \(\vec{V}_i\) is \(i\)’s current vector clock and \(\vec{h}_i\) is the vector of Merkle hash roots for each priority tier. Node \(j\) reciprocates. Total fingerprint cost: \(2C_F\) bytes, completing in \(2C_F / B\) seconds.

Phase 2 — Priority-ordered delta generation: For each priority tier \(k \in \{1, 2, 3, 4\}\), node \(i\) computes the tier-\(k\) delta — the tier-\(k\) items whose causal version at \(i\) is not yet covered by \(j\)'s vector clock:

\[ \Delta_k^{i \to j} = \{\, s \in S_k : \vec{V}_i(s) \not\leq \vec{V}_j \,\} \]

Items are serialized in tier order — all \(\Delta_1\) items first, then \(\Delta_2\), then \(\Delta_3\), then \(\Delta_4\). Each item is self-contained, carrying its own causal identifier (from Definition 69) and its full CRDT value.

Phase 3 — Windowed transmission: Items from the serialized delta are transmitted in priority order. Transmission halts cleanly at the end of the connectivity window. Because each item is self-contained, a partial sync leaves the recipient in a consistent state — no item is ever half-applied.

Physical translation: Only the state differences are synced — not the full state snapshot. If two CONVOY vehicles have 10,000 shared state entries but only 500 changed during a 6-hour partition, only those 500 entries (plus the 64-byte fingerprint) cross the link. The inverse of the changed-to-total ratio is the bandwidth reduction factor: a 5% update fraction yields \(20\times\) less data transferred versus a full-state exchange.

Compute Profile: CPU: \(O(n)\) per reconnection — Phase 1 Merkle fingerprint over tiers \(O(n)\) hashes; Phase 2 delta generation vector clock scan \(O(n)\); Phase 3 priority-tier sort \(O(k \log k)\) over the \(k\) delta items. Memory: \(O(k)\) — delta buffer grows with partition duration; switch to chunked streaming when \(k\) exceeds the buffer capacity in entries to avoid linear memory pressure at reconnection.

Proposition 52 (Delta-Sync Coverage Bound). Given window \(T_W\), bandwidth \(B\), fingerprint overhead \(C_F\), and per-item byte cost \(b_k\) at tier \(k\), the number of tier-1 items transmitted in a single window is:

\[ n_1^{\max} = \left\lfloor \frac{B \cdot T_W - 2C_F}{b_1} \right\rfloor \]

A 5-second CONVOY reconnection window at 250 kbps can push all four priority tiers — the minimum window for full coverage is directly computable from this formula.

Priority-2 items begin only after all priority-1 items are transmitted, and so on. The minimum window for full-tier coverage is:

\[ T_W^* = \frac{2C_F + \sum_{k=1}^{4} S_k}{B} \]

where \(S_k\) is the byte size of \(\Delta_k\). Proof: Phase 1 consumes \(2C_F\) bytes. The remaining \(B \cdot T_W - 2C_F\) bytes are allocated to delta items in priority order. \(n_1^{\max}\) follows from integer division. \(T_W^*\) is the window at which all tiers fit. \(\square\)
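The integer-division bound and the full-coverage window can be checked numerically. The helper names below are illustrative, and the constants are the document's CONVOY values (250 kbps, 5 s window, 64 B fingerprint):

```python
# Coverage arithmetic for the delta-sync window: items-per-window by floor
# division, and the smallest window in which every tier fits.
def tier1_coverage(T_W, B_bps, C_F, b_1):
    """Max tier-1 items in one window: floor((B*T_W - 2*C_F) / b_1), bytes-based."""
    budget = B_bps / 8 * T_W - 2 * C_F   # bytes left after both fingerprints
    return int(budget // b_1)

def min_full_window(tier_sizes, B_bps, C_F):
    """Smallest window T_W* (seconds) covering fingerprints plus all tier deltas."""
    total_bytes = 2 * C_F + sum(tier_sizes)
    return total_bytes / (B_bps / 8)

# CONVOY: 16 B tier-1 items easily fit -- thousands per 5 s window.
n1 = tier1_coverage(5, 250_000, 64, 16)
# Tier sizes 2/12/40/80 KB give a full-coverage window of ~4.4 s at 250 kbps.
t_star = min_full_window([2048, 12288, 40960, 81920], 250_000, 64)
```

The ~4.4 s result matches the calibration table's 4.3 s figure to within KB-vs-KiB rounding.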

CONVOY calibration (\(T_W = 5\,\text{s}\), \(B = 250\,\text{kbps}\), \(C_F = 64\,\text{B}\)):

Available for state sync: \(B \cdot T_W - 2C_F = 156{,}250\,\text{B} - 128\,\text{B} \approx 156\,\text{KB}\).

| Tier | Content | Size | Synced in 5 s? |
|---|---|---|---|
| 1 — Safety-critical | Threat locations, node liveness | 2 KB | Yes — first 0.06 s |
| 2 — Mission-critical | Objective status, positions | 12 KB | Yes — first 0.4 s |
| 3 — Operational | Sensor readings, health metrics | 40 KB | Yes — first 1.7 s |
| 4 — Audit and logging | Decision logs, timestamps | 80 KB | Yes — first 4.3 s |

At 250 kbps, CONVOY achieves full-state sync in 4.3 seconds — inside the 5-second window. Under partial jamming at 50 kbps: \(\approx 30\,\text{KB}\) available, covering tiers 1 and 2 with 16 KB to spare. Safety-critical and mission-critical state converges even under heavy jamming; audit logs queue for the next window.

Empirical status: The CONVOY calibration (250 kbps, 5 s window) reflects nominal link capacity; margin-link conditions (50 kbps under partial jamming) are included in the table and remain the more realistic design point for contested environments.

Why self-contained items matter: If the window closes mid-transmission, untransmitted items remain in the tier deltas \(\Delta_k\) for the next window. Transmitted items are applied atomically on receipt. The receiver’s state is always the union of fully-applied CRDT records — never a partial merge — because each delta item is a complete causal record, not a raw byte fragment.

Watch out for: the coverage formula computes item count using the fixed bandwidth estimate \(\hat{B}\) measured once by the pre-sync probe; if the link degrades mid-window — as is common on MANET links under mobility or multipath fading — actual items transmitted may be fewer than \(n_1^{\max}\), and Proposition 54’s first backoff interrupts transmission after tier-1 items are safely delivered rather than indicating protocol failure.

Physical translation: Only changed items are transmitted on reconnection — but coverage is guaranteed. A node that missed 1,000 updates during a partition receives exactly those 1,000 delta-mutants, not a full state dump. Sync overhead grows with edit count, not total state size.

Causality Header: Zero-Overhead HLC Integration

The Delta-Sync fingerprint from Definition 70 already travels in Phase 1. Embedding the HLC state and clock-trust signal into this fingerprint costs 10 additional bytes — absorbed entirely within the constant \(C_F\) — leaving per-item and per-packet structure unchanged.

Definition 71 (Causal Packet Header). The extended fingerprint augments Definition 70’s \(F_i\) with a 10-byte Causality Header carrying the triple \((l_i, c_i, T_{\mathrm{acc},i})\):

On receiving the Causality Header from a peer, the local node immediately determines — with no additional round-trips:

| Field | Size | Source | Receiver action |
|---|---|---|---|
| \(l_i\) — HLC watermark | 4 B | Definition 61 send rule | Apply receive rule of Def 40; check anomaly condition of Prop 41 |
| \(c_i\) — HLC counter | 2 B | Definition 61 send rule | Merge into local HLC counter |
| \(T_{\mathrm{acc},i}\) — partition age | 4 B | \(T_{\mathrm{acc}}\) accumulator (Def 68) | Select \(\prec_{\mathrm{phys}}\) vs \(\prec_{\mathrm{logic}}\) (Def 59); decide quarantine (Def 63) |
| Total | 10 B | — | Absorbed into \(C_F\); zero per-item overhead |

The Causality Header embeds three signals — HLC state, clock trust, and quarantine decision — into the Phase 1 fingerprint at a total cost of 10 B added to \(C_F\), with no per-delta-item overhead. The three fields are \(l_i\) and \(c_i\) from the Definition 61 send rule, and \(T_{\mathrm{acc},i}\) from the Definition 15 accumulator, all updated at each MAPE-K tick. Without \(T_{\mathrm{acc}}\) in the fingerprint, a receiver cannot determine whether incoming HLC timestamps are trustworthy before applying accumulated deltas from a long partition.
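A plausible wire encoding of the 10-byte header. Big-endian layout is an assumption here; the document fixes the field sizes (4 B watermark, 2 B counter, 4 B partition age) but not a byte order.

```python
# Hypothetical packing of the Definition 71 Causality Header:
# uint32 HLC watermark + uint16 HLC counter + uint32 partition age = 10 bytes.
import struct

HEADER_FMT = ">IHI"  # big-endian, no implicit padding

def pack_causality_header(l: int, c: int, t_acc: int) -> bytes:
    return struct.pack(HEADER_FMT, l, c, t_acc)

def unpack_causality_header(buf: bytes) -> dict:
    l, c, t_acc = struct.unpack(HEADER_FMT, buf)
    return {"l": l, "c": c, "T_acc": t_acc}
```

Because the format uses fixed-width standard sizes, `struct.calcsize(HEADER_FMT)` is exactly 10 — the header never inflates the fingerprint beyond the stated budget.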

Proposition 53 (Causality Header Overhead Bound). Adding the 10-byte Causality Header to the Definition 70 fingerprint increases \(C_F\) by 10 bytes. The coverage loss relative to Proposition 52 is:

\[ \Delta n_1 = \left\lfloor \frac{B\,T_W - 2C_F}{b_1} \right\rfloor - \left\lfloor \frac{B\,T_W - 2(C_F + 10)}{b_1} \right\rfloor \]

Adding full clock-trust signalling to every reconnection handshake costs at most one delta item per window — the overhead is negligible across all three deployment scenarios.

For any \(b_1 \geq 20\) B (minimum self-contained causal record), the coverage loss is zero items. For \(b_1 = 16\) B (minimum viable record: site_id 2 B + seq 4 B + HLC 6 B + CRDT value 4 B), the loss is at most 1 item per window.

Proof. The window budget changes from \(B\,T_W - 2C_F\) to \(B\,T_W - 2(C_F + 10)\). Coverage loss is exactly the difference of the two floor terms. At \(b_1 = 16\): loss is 1 item. \(\square\)

| Scenario | \(C_F\) before | \(C_F\) after | \(B \cdot T_W - 2C_F\) | Coverage loss | Impact |
|---|---|---|---|---|---|
| CONVOY (250 kbps, 5 s) | 64 B | 74 B | 156 KB | 1 item (16 B) | < 0.01% |
| OUTPOST (9.6 kbps, 30 s) | 64 B | 74 B | 36 KB | 1 item (16 B) | < 0.05% |
| RAVEN (1 Mbps, 2 s) | 64 B | 74 B | 250 KB | 1 item (16 B) | < 0.01% |

This computes the exact item-count coverage loss from adding the 10-byte Causality Header to the Phase 1 fingerprint, parameterized by \(b_1\) (minimum 16 B for a viable causal record) and the deployment constants \(C_F\), \(B\), \(T_W\). The intuition that header bytes cost throughput in proportion to their size is incorrect here: the impact is bounded at one item per reconnection window across all deployment scenarios.

Watch out for: the zero-item coverage loss proof requires \(b_1 \geq 20\) bytes; below that threshold the 20-byte header addition costs exactly one item per window — an application that packs delta items below 20 bytes (e.g., 4-byte counters without causal identifiers) is sacrificing causal ordering for compression, and the coverage loss shifts from negligible to one item per reconnection window.

Unified Autonomic Header: Synthesis of Three Fixes

Definitions 59 and 61 (Clock Fix), the Nonlinear Safety Guardrail from Self-Healing Without Connectivity (Stability Fix), and the Zero-Tax Hash Chain from The Constraint Sequence and the Handover Boundary (Resource Fix) each add fields to the gossip fingerprint. Rather than three independent headers, they compose into a single 20-byte Unified Autonomic Header (UAH) — the concrete answer to the three structural design constraints introduced in Why Edge Is Not Cloud Minus Bandwidth.

Definition 72 (Unified Autonomic Header). The UAH is a 20-byte field appended to the Phase 1 fingerprint of Definition 71, structured as follows:

| Offset | Field | Size | Source | Receiver action |
|---|---|---|---|---|
| 0–3 | \(l_i\) — HLC watermark | 4 B | Definition 61 send rule | Apply Def 40 receive rule; check Prop 41 anomaly condition |
| 4–5 | \(c_i\) — HLC counter | 2 B | Definition 61 send rule | Merge into local HLC counter |
| 6–9 | \(T_{\mathrm{acc},i}\) — partition age | 4 B | Definition 15 accumulator | Select \(\prec_{\mathrm{phys}}\) vs \(\prec_{\mathrm{logic}}\) (Def 59); decide quarantine (Def 63) |
| 10 | \(\rho_{q,i}\) — CBF margin | 1 B | Prop 80, Q0.8 encoding | Check NSG veto; \(\rho_{q,i} = 0\) means outside safe set |
| 11 | \(q_i\) — mode index | 1 B | Capability level L0–L4 | Skip high-severity healing requests if sender is in OBSERVE state |
| 12–15 | \(h_{\mathrm{sfx},i}\) — hash suffix | 4 B | Definition 29 chain suffix | Compare to local suffix; divergence within one tick = metric corruption |
| 16 | flag byte | 1 B | Computed each tick | See flag table below |
| 17–19 | padding | 3 B | — | 4-byte boundary alignment |

Physical translation: The 20-byte Unified Autonomic Header is the fleet’s common language for the operational context of any message. Every packet carrying fleet state or healing intent includes this header, allowing any receiving node to immediately understand the sender’s connectivity tier, authority level, and reliability requirements without additional handshake overhead.

Flag byte (offset 16):

| Bits | Name | Meaning |
|---|---|---|
| 0 | trust_flag | 1 when \(T_{\mathrm{acc},i} \leq T_{\mathrm{trust}}\) — physical clock trusted; use full HLC |
| 2:1 | zt_state | 00 = OBSERVE, 01 = WAKEUP, 10 = ACTIVE (Definition 128 Zero-Tax state) |
| 3 | nsg_veto | 1 when \(\rho_{q,i} = 0\) — sender outside CBF safe set; veto enforced |
| 7:4 | reserved | Must be zero |

The three fixes are mutually reinforcing: a receiver seeing zt_state = 00 (OBSERVE) knows simultaneously that the sender’s vector clock is frozen (no delta items expected), the hash suffix is the sole integrity signal, and the autonomic loop is dormant on the sender — no healing requests should be directed to it. A receiver seeing nsg_veto = 1 suppresses high-severity healing requests regardless of the local anomaly score. A receiver seeing trust_flag = 0 applies \(\prec_{\mathrm{logic}}\) ordering to all sender deltas before merging.

The UAH synthesizes the Clock Fix (HLC + \(T_{\mathrm{acc}}\)), Stability Fix (CBF margin + mode index), and Resource Fix (hash suffix + Zero-Tax state) into a single 20-byte field updated at each MAPE-K tick. Encoding conventions: \(\rho_{q,i}\) as unsigned byte (0 = margin 0.0, 255 = margin 1.0); \(h_{\mathrm{sfx},i}\) = last 4 bytes of the Definition 29 chain; \(q_i \in \{0,1,2,3,4\}\) matching L0–L4. Three independent fixes would grow the fingerprint by 16 B with redundant alignment; the UAH absorbs all three into 20 B with a single 3-byte pad.
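Under those encoding conventions, the 20-byte packing can be sketched as follows. Big-endian byte order and the helper name `pack_uah` are assumptions; the offsets, sizes, Q0.8 margin encoding, and flag-bit positions follow the tables above.

```python
# Sketch of the 20-byte Unified Autonomic Header (Definition 72):
# uint32 l_i | uint16 c_i | uint32 T_acc | Q0.8 margin | mode | 4-byte hash
# suffix | flag byte | 3 pad bytes.
import struct

UAH_FMT = ">IHIBB4sB3x"  # struct.calcsize == 20

def pack_uah(l, c, t_acc, cbf_margin, mode, hash_sfx,
             trust=0, zt_state=0, nsg_veto=0):
    # Q0.8 encoding: margin 0.0 -> 0, margin 1.0 -> 255, clamped.
    rho = max(0, min(255, round(cbf_margin * 255)))
    # Flag byte: bit 0 trust_flag, bits 2:1 zt_state, bit 3 nsg_veto.
    flags = (trust & 1) | ((zt_state & 0b11) << 1) | ((nsg_veto & 1) << 3)
    return struct.pack(UAH_FMT, l, c, t_acc, rho, mode, hash_sfx, flags)
```

A receiver can unpack with the same format string; the three trailing pad bytes keep the header on a 4-byte boundary as specified.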

UAH scope: The UAH is not a compressed vector clock. The \(c_i\) field (2 B) is the HLC sub-second disambiguator: it increments only when two events share the same physical second and resets to zero at the next clock tick. Under typical MAPE-K operation (5 s/tick), \(c_i \in \{0, 1\}\) at almost every tick — 2 bytes is not a constraint in practice. The trust_flag and \(T_{\mathrm{acc}}\) fields signal which ordering to use; the vector clock \(\vec{V}_i\) is the ordering mechanism itself. Both are required: the UAH alone provides only within-second causal disambiguation, insufficient for post-partition reconciliation of multi-day state.

The counter-wrap risk lives in \(\vec{V}_i\), not in the UAH. Under \(\prec_{\mathrm{logic}}\) (physical clock untrusted, \(T_{\mathrm{acc}} > T_{\mathrm{trust}}\)), the system orders events by \((c_i, n_i)\). Because \(c_i\) resets every second, this has no cross-second resolution — day-1 and day-29 events can both carry \(c_i = 0\). Day-level causal ordering is provided by the vector clock \(\vec{V}_i\) — the variable-size Phase 1 fingerprint component, not the 20-byte UAH.

For RAVEN (47 drones), \(\vec{V}_i\) using dotted version vectors at 8 clusters \(\times\) (4-byte node id + 4-byte counter) = 64 bytes. At 200 updates/min from ~6 drones per cluster, the per-cluster sequence number accumulates events over 30 days — well within the 4.29-billion capacity of a 4-byte counter. No wrap for a 30-day mission. If 1-byte counters were erroneously substituted, the first wrap would occur in 255/200 min \(\approx\) 77 seconds, producing false causal links on reconnection.

The Day-497 bug (OUTPOST). RAVEN is mission-bound to 30 days; OUTPOST is a persistent installation. A hot sensor node at 100 Hz generates 6,000 events/min. Its per-cluster sequence number wraps at \(\approx\) 497 days. On Day 498, the counter wraps to low values and the fusion node — without wrap-detection — treats all post-wrap events as causally prior, potentially overwriting 497 days of current state with stale data.

Stale Vector Counter (SVC) quarantine. During Phase 1 fingerprint exchange, apply sequence-number arithmetic (RFC 1323 Sec. 3 [15] — the same technique TCP uses) to each cluster entry:

\[ \text{wrap suspected for entry } k \iff V_{\text{new}}[k] < V_{\text{old}}[k] \;\wedge\; V_{\text{old}}[k] - V_{\text{new}}[k] > 2^{31} \]

A counter that appears to have decreased by more than \(2^{31}\) is flagged as a suspected wrap. Response mirrors Definition 63 : (1) enter read-only mode for node \(k\)’s state; (2) broadcast a counter-epoch request — any peer whose last-known sequence for \(k\) is near \(2^{32}\) (within \(2^{29}\)) confirms the wrap; (3) if quorum confirms, increment epoch counter \(e_k\) and re-order events as \((e_k, V[k])\) tuples; (4) exit quarantine when all entries are epoch-consistent with the quorum.
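The comparison rule reduces to plain integer arithmetic; `suspected_wrap` is an illustrative name for the per-entry check:

```python
# SVC wrap detection: a 32-bit counter that appears to have decreased by more
# than 2^31 is a suspected wrap, not a genuine regression.
HALF = 2 ** 31
WRAP = 2 ** 32

def suspected_wrap(last_seen: int, incoming: int) -> bool:
    """True when `incoming` sits more than 2^31 below `last_seen`."""
    return incoming < last_seen and (last_seen - incoming) > HALF
```

A small backward step (stale gossip reordering) is not flagged; only the near-full-range apparent decrease that a wrap produces triggers the quarantine path.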

Deployment decision: for OUTPOST, either (a) use 8-byte sequence counters — wraps at years, problem eliminated; or (b) deploy SVC quarantine with 4-byte counters and accept the minor coordination cost on Day 497. For RAVEN, 4-byte counters are sufficient at all mission-relevant timescales.

Reconnection Storm Mitigation

Definition 70 assumes bandwidth \(B\) is known a priori and the link remains stable throughout the sync window. In practice, the first opportunistic uplink after a long Weibull partition is marginal — MANET RF at the edge of coverage, or a satellite bounce during a brief atmospheric window [16] . Transmitting the full CRDT state before measuring the link saturates the channel before any critical state is exchanged.

Four extensions close this gap in sequence:

  1. A formal delta-state CRDT ( Definition 73 ) reduces payload size.
  2. A pre-sync bandwidth probe ( Definition 74 ) measures \(\hat{B}\) before Phase 1 of Definition 70 .
  3. A QoS byte budget ( Definition 75 ) translates \(\hat{B}\) into guaranteed tier allocations.
  4. An exponential backoff condition ( Proposition 54 ) detects link saturation mid-sync and pauses before destabilizing the uplink.

Mass Reconnection Protocol

When the fraction of fleet nodes reconnecting within a single \(T_\text{gossip}\) window exceeds \(f_\text{mass} = 0.30\) (illustrative value), the individual quarantine protocol is suspended. Instead: (1) the highest-reputation node that remained continuously connected throughout the partition serves as the reference-state node; (2) all returning nodes execute a single coordinated reconciliation round against the reference state; (3) individual quarantine applies only to nodes whose divergence exceeds \(\delta_\text{max}\) after the coordinated round. If no node remained continuously connected, the fleet executes a coordinated round using the node with the most recent non-partitioned timestamp as the reference.

Reference-node staleness validation: the reference node’s CRDT state may itself be stale relative to what fleet consensus would have been during the partition — it received no updates from the partitioned group. After the coordinated reconciliation round, any returning node whose merged state differs from the reference node’s state by more than \(\delta_\text{sanity}\) must flag this discrepancy in the Post-Partition Audit Record. If a majority of returning nodes flag the reference node as divergent, the reference node’s reputation score is reduced by \(\Delta r_\text{ref} = 0.1\) (illustrative value) and a fresh quorum confirmation round is initiated.

Definition 73 (Delta-State CRDT Mutant). For a join-semilattice \((S, \sqcup)\) (Definition 58), the delta mutant for operation \(m\) applied at state \(x \in S\) is the minimal sub-state satisfying:

\[ x \sqcup \delta_m^x = m(x) \]

with \(\delta_m^x\) minimal in the lattice order among sub-states with this property.

The delta mutant extracts the minimal lattice sub-state produced by one mutation, enabling transmission of only the changed portion rather than full state. Delta groups accumulated since the last sync epoch prevent full-state retransmission regardless of how few fields changed: for CONVOY at \(f \approx 5\%\) (illustrative value) update fraction, this yields a \(20\times\) bandwidth reduction per sync cycle. Note: \(\delta_m^x\) here denotes the delta-state structure; \(f\) is the sparse-update fraction, distinct from the state divergence \(D\) in Definition 11. Individual deltas still carry full message header overhead — batching multiple mutations per transmission is required for the reduction to pay. In the dot-kernel representation of delta-state CRDTs, each lattice element carries a dot — a pair \((i, e)\) where \(i\) is the node identifier and \(e\) is a per-node event counter. The delta mutant \(\delta_m^x\) produced by mutation \(m\) at node \(i\) corresponds to generating exactly one new dot \((i, e)\): the minimal sub-state containing that dot and no others. The delta group is then the dot store — the join of all dots generated at node \(i\) since epoch \(e\). This correspondence makes causal context explicit in every delta transmission without requiring a full vector clock per message: the receiver can reconstruct which dots it is missing from the delta group alone, without exchanging an \(|N|\)-dimensional clock vector.

Physical translation: Only the changed portion of the CRDT state is transmitted, not the full state. A vehicle that updated 500 of its 10,000 position-history entries during a partition transmits those 500 entries as a delta group — not all 10,000. The semilattice join guarantees the peer can merge the delta group into its own state correctly without receiving the unchanged 9,500 entries.

Notation note. \(\delta_m^x\) denotes the delta-state CRDT structure. This is distinct from the staleness decay function \(\delta(t_{\text{stale}})\) defined in Self-Healing Without Connectivity; the shared \(\delta\) symbol is disambiguated by subscript.

The delta group accumulated since sync epoch \(e\) is the join of all delta mutants produced by mutations \(m_1, m_2, \ldots\) applied at states \(x_1, x_2, \ldots\) since \(e\):

\[ \Delta^e = \delta_{m_1}^{x_1} \sqcup \delta_{m_2}^{x_2} \sqcup \cdots \]

Three properties follow from the semilattice structure: correctness (merging \(\Delta^e\) into the epoch-\(e\) state reproduces the current state: \(x_e \sqcup \Delta^e = x_{\text{current}}\)), a size bound (\(|\Delta^e| \leq |x_{\text{current}}|\) always), and sparse reduction (for updates covering fraction \(f\) of \(S\), \(|\Delta^e| \approx f \cdot |x_{\text{current}}|\)).

Definition 70 Phase 2 implicitly constructs \(\Delta^e\) via vector-clock comparison; Definition 73 makes the delta-state structure explicit, enabling MCU-local delta storage without retaining the full join history.
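A delta-state G-Counter makes the construction concrete. This is a minimal sketch of the semilattice mechanics, not the MULTIWRITE implementation:

```python
# Delta-state G-Counter: the delta mutant for an increment at node i is the
# single-entry map {i: new_count} -- the minimal sub-state whose join with the
# old state equals the mutated state.
def join(a: dict, b: dict) -> dict:
    """Semilattice join: pointwise max over per-node counters."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def inc_delta(state: dict, node: str) -> dict:
    """Increment at `node`; return only the delta mutant, not the full state."""
    return {node: state.get(node, 0) + 1}
```

A peer that receives only the one-entry delta still converges: joining `{"a": 4}` into any state dominates the stale `"a"` entry and leaves every other entry untouched.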

CONVOY calibration: 10,000-entry CRDT state (64 B per entry, 640 KB total); 500 entries updated during a 6.7 hr partition (\(f = 0.05\)). Definition 73 reduces Phase 2 payload from 640 KB to 32 KB — a \(20\times\) reduction, shifting from a 20-second sync at 250 kbps to about one second.

Definition 74 (Link Bandwidth Probe). A bandwidth probe fires at link detection, completing before Definition 70 Phase 1. The probe consists of a single request-response pair: transmit a probe packet of \(P_{\text{probe}}\) bytes (default: 512 bytes), record the round-trip time \(\mathrm{RTT}_{\text{probe}}\). The instantaneous bandwidth estimate is:

\[ \hat{B}_{\text{inst}} = \frac{2\,P_{\text{probe}}}{\mathrm{RTT}_{\text{probe}}} \]

This estimates current bandwidth from a 512-byte (illustrative value) probe with \(\alpha = 0.25\) (illustrative value) EMA smoothing before each reconnection sync session, preventing link saturation from transmitting a full Merkle fingerprint over a marginal link. If RTT exceeds 2 s (illustrative value) or \(\hat{B}\) falls below \(B_{\min} = 9.6\) kbps (illustrative value), the fallback sets \(\hat{B} = B_{\min}\) and restricts transmission to Service Level \(\mathcal{L}_1\) items only.

Updated via \(\alpha\)-exponential moving average across successive probe windows:

\[ \hat{B} \leftarrow \alpha\,\hat{B}_{\text{inst}} + (1 - \alpha)\,\hat{B} \]

If \(\mathrm{RTT}_{\text{probe}} > \mathrm{RTT}_{\max}\) (default: 2 s) or \(\hat{B} < B_{\min}\) (default: 9.6 kbps): set \(\hat{B} = B_{\min}\) and flag as marginal link — only Service Level \(\mathcal{L}_1\) items ( Definition 75 ) are transmitted. Probe cost: \(P_{\text{probe}} = 512\) bytes, 1 RTT — negligible against any usable link.
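A sketch of the probe update under the stated defaults. The probe-and-response round-trip model for \(\hat{B}_{\text{inst}}\) (probe traverses the link twice) is an assumption of this sketch:

```python
# Bandwidth probe with EMA smoothing and marginal-link fallback
# (defaults: 512 B probe, alpha = 0.25, RTT cap 2 s, floor 9.6 kbps).
B_MIN = 9_600    # bps floor for marginal links
RTT_MAX = 2.0    # seconds
ALPHA = 0.25     # EMA smoothing factor

def probe_update(b_hat, probe_bytes, rtt):
    """Return (new_b_hat_bps, marginal_flag) after one probe round-trip."""
    if rtt > RTT_MAX:
        return B_MIN, True                    # dead-slow link: L1-only mode
    b_inst = 8 * 2 * probe_bytes / rtt        # bits/s; probe crosses link twice
    b_new = ALPHA * b_inst + (1 - ALPHA) * b_hat
    if b_new < B_MIN:
        return B_MIN, True
    return b_new, False
```

The EMA keeps one RTT spike from collapsing the estimate, while the floor guarantees the marginal-link path engages before the sync protocol can saturate a dying channel.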

Definition 75 (QoS Byte Budget). Note: ‘service level’ here refers to capability-level tiers (\(\mathcal{L}_k\) from Self-Healing Without Connectivity), not the authority-tier concept (\(Q_j\) from Definition 68). Given probed bandwidth \(\hat{B}\) and connectivity window \(T_W\), the total byte budget (fingerprint overhead pre-deducted per Definition 70 Phase 1) is:

\[ \Omega = \hat{B} \cdot T_W - 2C_F \]

The usable byte budget for the sync window is computed after subtracting two-way framing overhead and partitioned by service-level reservation: \(\alpha_1 = 0.50\) (illustrative value), \(\alpha_2 = 0.25\) (illustrative value), \(\alpha_3 = 0.15\) (illustrative value), \(\alpha_4 = 0.10\) (illustrative value). Unused budget from service level \(k\) cascades to service level \(k+1\), ensuring Service Level \(\mathcal{L}_1\) critical state cannot be crowded out by large lower-priority deltas.


The budget is partitioned into service-level-reserved allocations:

| Service Level | Content | Reservation \(\alpha_k\) | Guaranteed bytes |
|---|---|---|---|
| \(\mathcal{L}_1\) (safety/control) | Threat vectors, node liveness | \(\alpha_1 = 0.50\) | \(0.50\,\Omega\) |
| \(\mathcal{L}_2\) (mission) | Position, objectives | \(\alpha_2 = 0.25\) | \(0.25\,\Omega\) |
| \(\mathcal{L}_3\) (operational) | Sensor readings, health | \(\alpha_3 = 0.15\) | \(0.15\,\Omega\) |
| \(\mathcal{L}_4\) (telemetry) | Logs, timestamps | \(\alpha_4 = 0.10\) | \(0.10\,\Omega\) |

Unused Service Level \(\mathcal{L}_k\) allocation cascades to \(\mathcal{L}_{k+1}\): if the \(\mathcal{L}_k\) delta requires fewer than \(\alpha_k \Omega\) bytes, the surplus is added to the Service Level \(\mathcal{L}_{k+1}\) budget.

Critical state receives its floor regardless of link quality. At \(\hat{B} = B_{\min} = 9.6\) kbps (illustrative value) with \(T_W = 5\,\text{s}\) (illustrative value), \(\Omega \approx 6\) KB (theoretical bound under illustrative parameters) and \(0.50\,\Omega \approx 3\) KB (theoretical bound under illustrative parameters) is reserved for threat vectors and liveness — sufficient for CONVOY ’s 2 KB (illustrative value) Service Level \(\mathcal{L}_1\) delta.

Physical translation: Safety-critical messages always get their 50% slice of the byte budget, even when bandwidth collapses to the minimum 9.6 kbps link. At that floor, a 5-second window yields only 6 KB total — but 3 KB is hard-reserved for threat vectors and node liveness. Audit logs and telemetry queue for the next window. The service-level tiers work like lanes on a highway: Tier 1 owns the fast lane and can never be squeezed out by a bulk reconciliation burst from OUTPOST or a routine telemetry flush from CONVOY.
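The reservation-plus-cascade rule reduces to a single pass over the tiers; `allocate` is an illustrative helper using the document's reservation fractions:

```python
# QoS byte budget with cascading surplus: each tier is granted min(need, slice),
# and whatever tier k leaves unused rolls into tier k+1's slice.
ALPHAS = [0.50, 0.25, 0.15, 0.10]  # L1..L4 reservations (illustrative values)

def allocate(omega: float, delta_sizes: list) -> list:
    """Bytes granted per tier, given total budget and per-tier delta sizes."""
    grants, carry = [], 0.0
    for alpha, need in zip(ALPHAS, delta_sizes):
        budget = alpha * omega + carry
        used = min(need, budget)
        grants.append(used)
        carry = budget - used   # unused reservation cascades downward
    return grants

# 6 KB marginal-link budget: L1's 2 KB need fits inside its 3 KB slice,
# so the 1 KB surplus cascades into L2's allocation.
grants = allocate(6144, [2048, 12288, 40960, 81920])
```

Because the cascade only flows downward, a bulk \(\mathcal{L}_3\) or \(\mathcal{L}_4\) delta can never eat into the \(\mathcal{L}_1\) reservation, matching the fast-lane guarantee above.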

Self-throttling interaction: Nodes already in self-throttle mode (see Self-Healing Without Connectivity) operate within a reduced quota tier; the QoS byte budget is applied to the already-throttled resource budget, not the nominal capacity. Concretely, a node in self-throttle mode has its effective \(\hat{B}\) capped at the self-throttle transmit ceiling before the service-level allocation is computed — Service Level \(\mathcal{L}_1\) still receives its \(\alpha_1\) fraction, but of the reduced budget.

Budget starvation: When the total byte budget falls below the Service Level \(\mathcal{L}_1\) minimum reservation, the node enters communication blackout: only L0 hardware-level heartbeats (\(\leq 8\) bytes/s) are permitted. All autonomic sync is suspended until the budget recovers above that floor for two consecutive probe intervals.

Proposition 54 (Sync Stability Bound). During Definition 70 Phase 3, monitor ACK round-trip time continuously.

Exponential backoff prevents CONVOY vehicles from flooding a marginal link at reconnection — safety-critical state already committed before the first backoff fires.

A saturation event is declared when \(\mathrm{RTT}(t) > \beta \cdot \mathrm{RTT}_{\text{probe}}\), with \(\beta = 2.0\) (threshold — requires moderate-variance MANET link; calibrate from channel measurements):

On saturation event \(n\) (\(n = 0, 1, 2, \ldots\)), pause transmission for \(T_0 \cdot 2^n\), then resume from the first un-ACKed item (safe because Definition 70 items are self-contained). The total sync time satisfies:

\[T_{\text{sync}} \;\leq\; T_{\text{data}} \;+\; T_0\,\bigl(2^{N_{\text{sat}}} - 1\bigr)\]

where \(T_{\text{data}}\) is the transfer time of the delta itself.

This bounds total sync duration including exponential backoff overhead from link saturation events, where \(T_0 = 100\) ms is the base backoff interval and \(N_{\text{sat}}\) counts saturation events per session. The bound must fit within the available reconnection window; persistent link degradation is distinguished from transient saturation by the number of consecutive saturation events rather than by a fixed timeout.

where \(T_0\) is the base backoff interval and \(N_{\text{sat}}\) is the number of saturation events. \(\mathcal{L}_1\) safety invariant: \(\mathcal{L}_1\) items transmit in the first \(T_1\) seconds; since \(T_1 < T_0\) for all CONVOY-class links (\(T_1 = 0.06\,\text{s}\), \(T_0 = 0.1\,\text{s}\)), \(\mathcal{L}_1\) completes before the first backoff period can fire. The first backoff delays only \(\mathcal{L}_2\) and below.

Empirical status: The saturation threshold \(\beta = 2.0\) and base backoff \(T_0 = 100\) ms are tunable; the \(\beta\) value should be calibrated from channel measurements — wireless MANET links can exhibit RTT spikes from multipath that are not true saturation events.

Proof sketch. Each backoff period \(T_0 \cdot 2^n\) is bounded. The geometric sum \(\sum_{n=0}^{N_{\text{sat}}-1} T_0\,2^n = T_0\,(2^{N_{\text{sat}}} - 1)\) gives the total backoff overhead. The \(\mathcal{L}_1\) safety invariant follows from the ordering guarantee of Definition 75 (\(\mathcal{L}_1\) transmitted first) and \(T_1 < T_0\), so \(\mathcal{L}_1\) finishes well before the first backoff fires at \(t \geq T_0\). \(\square\)

CONVOY calibration (\(T_0 = 100\,\text{ms}\) (illustrative value), \(T_W = 5\,\text{s}\) (illustrative value), 50 kbps (illustrative value) marginal link): the maximum tolerable number of saturation events before the window expires is \(N_{\text{sat}} = 5\) (theoretical bound under illustrative parameters). After 5 (illustrative value) saturation events, total backoff overhead is \(T_0\,(2^5 - 1) = 3.1\,\text{s}\) (theoretical bound under illustrative parameters). \(\mathcal{L}_1\) and \(\mathcal{L}_2\) state (16 KB (illustrative value) at 50 kbps (illustrative value) \(= 2.6\,\text{s}\) (theoretical bound)) is already committed; audit logs queue for the next window.

Watch out for: the saturation threshold \(\beta = 2.0\) is calibrated for MANET links with moderate multipath variance; wireless links in indoor or urban environments exhibit RTT spikes from transient reflections that are not true saturation events — at RAVEN’s update rate, 5 false backoffs waste 3.1 seconds of a 5-second window, effectively preventing all but tier-1 data from transferring even though the link is not actually saturated.

Physical translation: The exponential backoff \(T_0 \cdot 2^n\) prevents sync storms during partition recovery. Without backoff, a congested link would cause both vehicles to retransmit simultaneously, collide, retransmit again, and lock the channel. With backoff: after the first saturation event the sender pauses 100 ms, after the second 200 ms, after the third 400 ms. The total penalty for 5 saturation events is only 3.1 seconds — well inside the 5-second window, and Service Level critical state was already committed in the first 60 ms before any backoff could fire.
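The saturation test and backoff schedule above can be sketched directly; \(T_0\) and \(\beta\) are the illustrative values from the text, and the function names are assumptions.

```python
# Sketch of the saturation-triggered exponential backoff from Proposition 54.
# T0 and BETA are the illustrative values from the text; names are assumptions.

T0 = 0.100   # base backoff interval, seconds
BETA = 2.0   # saturation threshold: RTT > BETA * probe RTT

def is_saturated(rtt_s: float, probe_rtt_s: float) -> bool:
    """Declare a saturation event when RTT exceeds BETA times the probe RTT."""
    return rtt_s > BETA * probe_rtt_s

def backoff_s(n: int) -> float:
    """Pause length after saturation event n (n = 0, 1, 2, ...)."""
    return T0 * 2 ** n

def total_backoff_s(n_sat: int) -> float:
    """Geometric sum of all pauses: T0 * (2**n_sat - 1)."""
    return sum(backoff_s(n) for n in range(n_sat))

# Five saturation events cost 0.1 + 0.2 + 0.4 + 0.8 + 1.6 = 3.1 s of the window.
```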

Cognitive Map: Reconnection protocols address three overlapping challenges: identifying what diverged (Merkle trees in \(O(k \log n)\)); transferring only the delta (Delta-Sync with QoS byte budget); and ordering ambiguous concurrent updates without NTP (Hybrid Logical Clocks). HLC is the linchpin: it provides causal ordering that survives partition and re-sync by combining physical and logical timestamps — LWW-Register becomes correct only with HLC, not wall-clock. The Drift-Quarantine Re-sync Protocol handles the case where clocks have drifted too far for HLC to bridge; Proposition 46 bounds the correctness condition. Delta-Sync’s QoS service levels ensure safety-critical state (\(\mathcal{L}_1\)) completes before the first backoff event can fire. Next: even with correct reconciliation, some partition actions conflict at the semantic level and require arbitration rather than merge.


Handling Actions Taken During Partition

Physical actions cannot be “merged” logically. If Cluster A drove north and Cluster B drove south, they cannot merge to “drove north and south simultaneously.”

Classification of partition actions:

Complementary actions: Both clusters did useful, non-overlapping work.

Redundant actions: Both clusters did the same work.

Conflicting actions: Actions are mutually incompatible.

The table below maps each action type to its detection signature and the appropriate resolution action at reconnection.

Type | Detection | Resolution
Complementary | Non-overlapping scope | Accept both; update state
Redundant | Identical scope and action | Deduplicate; note inefficiency
Conflicting | Overlapping scope, different action | Flag for review; assess damage

Audit trail: All partition decisions must be logged with enough context to reconstruct why each cluster acted as it did.

Post-mission review uses the audit trail to classify every partition action against the table above and to feed the arbitration layer's policy update ( Definition 78 ).


Conflict-Aware Arbitration Layer

CRDTs guarantee that state converges after reconnection — but convergence describes what happened, not whether what happened was efficient. Two clusters that each expend the only EOD-capable platform solve the same problem twice and leave the fleet with nothing in reserve. The action-classification table above identifies redundant outcomes but does not prevent them. Prevention requires a coordination mechanism that operates before commitment — during the partition itself, before both clusters commit the same finite resource. The constraint is severe: no leader exists (leadership is an emergent property of connectivity that may be absent), and available bandwidth may be zero. Each node computes a claim probability from local health and divergence, samples once, and either emits an intent token or abstains — no leader required. The mechanism is probabilistic and reduces but does not eliminate redundancy and omission; under Denied connectivity the intent token cannot propagate and nodes fall back to uncoordinated baseline. Fleet size is the primary lever: doubling the cluster size halves per-node claim probability at the same fleet-level target.

The Double-Spend Problem

Return to CONVOY at the mountain pass. Before the two clusters lost contact, both knew that Sector 7 needed EOD clearance and that the convoy has exactly one EOD-capable platform (Vehicle 4). During the 47-minute partition, Cluster Alpha (Vehicles 1–6) assessed Sector 7 as the primary threat, dispatched Vehicle 4 north, and Vehicle 4 expended its full kit in 8 controlled detonations — a kit that cannot be restocked until the base resupply depot is reached. Cluster Bravo (Vehicles 7–12) independently reached the same threat assessment, attempted to dispatch Vehicle 4, found it absent from the local mesh, concluded it had self-tasked, and proceeded to Sector 7 via an alternate route deploying their own improvised countermeasures at \(3\times\) the time cost.

At reconnection, the CRDT merge completes correctly: both clusters’ observations are reconciled, Vehicle 4’s kit-expended state is propagated, and the sector-cleared event is logged. State is consistent. But the resource is gone, and the improvised countermeasures cost 47 additional minutes.

This is the double-spend problem. CRDTs guarantee convergence of what happened — they cannot prevent what was done from being wasteful. A coordination layer is needed that operates before commitment.

Quantifying the Risk: Redundant-Omission Loss Function

Definition 76 (Redundant-Omission Loss Function). Let task \(\tau_k\) have mission value \(v_k > 0\) and finite resource cost \(c_k > 0\), and let \(p\) be the per-cluster claim probability. Define the per-task loss:

\[\ell_k(p) \;=\; \alpha\,c_k\,p^2 \;+\; \beta\,v_k\,(1-p)^2\]

Here \(p^2\) is the probability that both of two partitioned clusters act (redundancy) and \((1-p)^2\) is the probability that neither acts (omission).

Notation: \(\alpha\) here denotes the waste coefficient (marginal cost of redundant resource expenditure); \(\beta\) denotes the omission penalty (marginal cost of failing to complete the task). These are distinct from \(\alpha\) as a utility loss-cost slope in Why Edge Is Not Cloud Minus Bandwidth, as an EMA smoothing weight in Self-Measurement Without Central Observability, and as a Pareto tail index in Self-Healing Without Connectivity. Full series notation registry: Notation Registry.

where \(\alpha > 0\) is the waste coefficient (marginal cost of expending a resource unnecessarily) and \(\beta > 0\) is the opportunity coefficient (marginal cost of failing to complete the task). The coordination target \(p^* \in (0, 1)\) minimises expected fleet-wide loss per task:

\[p^* \;=\; \arg\min_{p}\, \ell_k(p) \;=\; \frac{\beta\,v_k}{\alpha\,c_k + \beta\,v_k}\]

This gives the per-node claim probability minimizing total fleet-wide cost by balancing redundancy waste (\(\alpha\)) against omission loss (\(\beta\)) per task tier. Uniform-probability allocation treats critical telemetry identically to low-priority logs; the tiered formula corrects this by scaling \(p^*\) with the per-tier cost ratio \(\beta v_k / (\alpha c_k + \beta v_k)\). In the MULTIWRITE scenario, tiered \(p^*\) reduced redundant writes by 40% (illustrative value) while keeping critical-tier omission probability below 0.1% (illustrative value).

Physical translation: \(p^*\) is the mission-value-weighted fraction of total expected cost attributable to omission. If omitting the task costs five times more than wasting the resource (\(\beta v_k = 5 \alpha c_k\)), then \(p^* = 5/6 \approx 0.83\) — act eagerly, redundancy is cheap. If the resource is irreplaceable and the task can be mitigated (\(\alpha c_k = 9 \beta v_k\)), then \(p^* = 1/10\) — be conservative, only claim when confident. The formula converts a policy judgment (how bad is waste vs. omission?) into a dimensionless probability that every node can compute independently from its local cost estimates.
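The two worked ratios above can be checked numerically; a minimal sketch in which the function name and argument names are assumptions:

```python
# Worked check of the claim-probability target p* = beta*v / (alpha*c + beta*v),
# using the two examples from the text. Names are illustrative.

def p_star(alpha_c: float, beta_v: float) -> float:
    """Optimal claim probability balancing waste (alpha*c) against omission (beta*v)."""
    return beta_v / (alpha_c + beta_v)

eager = p_star(alpha_c=1.0, beta_v=5.0)         # omission 5x worse than waste -> 5/6
conservative = p_star(alpha_c=9.0, beta_v=1.0)  # irreplaceable resource -> 1/10
```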

Interpretation. When \(\beta v_k \gg \alpha c_k\) — high-value task, cheap or abundant resource — \(p^* \to 1\): both clusters should attempt the task and accept the possibility of redundancy. When \(\alpha c_k \gg \beta v_k\) — expensive or irreplaceable resource, low-value task — \(p^* \to 0\): be conservative and risk omission rather than waste. For CONVOY ’s EOD platform, \(\alpha c_k > \beta v_k\): the kit is irreplaceable mid-mission, and Sector 7 can be partially mitigated by other means. The correct \(p^*\) is low — closer to 0.3 than 0.9.

The coefficients \(\alpha\) and \(\beta\) are fleet-wide policy parameters calibrated from post-partition audit history ( Definition 78 ). During partition, each node uses its cached \((\alpha, \beta)\) pair — stale estimates are safe to use because \(p^*\) is bounded in \((0, 1)\) regardless of coefficient drift.

Probabilistic Reservation: Conflict-Aware Claim Probability

Definition 77 (Conflict-Aware Claim Probability). Let node \(i\) have health score \(H_i(t) \in [0,1]\) (from the gossip health vector of Definition 24) and local divergence estimate \(D_i(t) \in [0,1]\) (a per-node scalar approximation of Definition 57’s pairwise metric, computed against the node’s last-known fleet consensus snapshot). The claim probability for task \(\tau_k\) is:

\[P(a_i \mid \tau_k) \;=\; p^* \cdot H_i(t) \cdot \bigl(1 - D_i(t)\bigr)\]

Physical translation: \(P(a_i \mid \tau_k)\) multiplies three independent gates. \(p^*\) sets the fleet-wide target — how often, across all nodes, should this task be claimed. \(H_i(t)\) scales that target down for unhealthy nodes — a drone at 30% health should not be committing scarce resources. \((1 - D_i(t))\) scales it down further for nodes with stale state — a cluster that has been partitioned for three hours knows less about current fleet posture than one partitioned for five minutes. All three factors must be non-negligible for a node to claim.

Claim mechanics. Before committing resource \(c_k\) to task \(\tau_k\), node \(i\):

  1. Computes \(P(a_i \mid \tau_k)\) from local state.
  2. Samples \(u \sim \mathrm{Uniform}(0, 1)\).
  3. If \(u > P(a_i \mid \tau_k)\): abstains. Task may be claimed by a healthier or lower-divergence node.
  4. If \(u \leq P(a_i \mid \tau_k)\): emits an intent token via the gossip protocol.
  5. Any node receiving a prior intent token for \(\tau_k\) before committing resource \(c_k\) cancels its pending claim and does not commit. Under connected conditions, this window corresponds to the gossip propagation bound ( Proposition 12 ); under Denied regime the token may never arrive, in which case the node proceeds at its own \(P(a_i \mid \tau_k)\) risk.
  6. Node \(i\) commits resource \(c_k\) only after the gossip propagation window expires without receiving a counter-token.
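The local part of these claim mechanics can be sketched as follows. The gossip transport, intent-token format, and the source of \(p^*\) are outside the sketch; all names are assumptions.

```python
# Minimal sketch of the Definition 77 claim decision (steps 1-4 above).
# Gossip transport and token handling are omitted; names are assumptions.
import random

def claim_probability(p_star: float, health: float, divergence: float) -> float:
    """P(a_i | tau_k) = p* * H_i(t) * (1 - D_i(t)) -- three multiplicative gates."""
    return p_star * health * (1.0 - divergence)

def decide_claim(p_star: float, health: float, divergence: float,
                 rng: random.Random) -> bool:
    """Sample u ~ Uniform(0,1); claim iff u <= P(a_i | tau_k)."""
    p = claim_probability(p_star, health, divergence)
    return rng.random() <= p

# A degraded node (health 0.3) with stale state (divergence 0.8) almost never
# claims: 0.30 * 0.3 * 0.2 = 0.018.
p = claim_probability(0.30, 0.3, 0.8)
```

Self-suppression falls out of the product form: either gate going to zero drives the claim probability to zero, with no coordinator involved.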

No-leader guarantee. Each node decides independently using only local state \((H_i(t),\, D_i(t))\) and the broadcast gossip record. No coordinator is required: the probability function itself is the coordination mechanism. During Denied connectivity ( Definition 6 ), gossip does not propagate and the intent token cannot reach peer clusters — the protocol degrades gracefully to the uncoordinated baseline, which the loss function already accounts for.

Byzantine resistance. Intent tokens are gossip-propagated and subject to the reputation-weighted admission filter of Definition 58b (Stage 1). A compromised node cannot reliably suppress a legitimate intent token: doing so requires controlling a quorum of the claimant’s \(k\) nearest gossip neighbors — the same threshold as Proposition 47 .

Self-suppression under degradation. As \(H_i(t) \to 0\) (node is failing) or \(D_i(t) \to 1\) (node has severely divergent state), \(P(a_i \mid \tau_k) \to 0\). Degraded nodes automatically yield resource commitments to healthier peers — the same gradient that governs healing action gating ( Proposition 29 ) now governs resource stewardship.

Proposition 55 (Claim Collision Bound). For \(n\) independent nodes each claiming task \(\tau_k\) with probability \(P(a)\), the probability of omission (no node claims) and the expected number of redundant claims (claims beyond the first) are:

\[P(\text{omission}) = \bigl(1 - P(a)\bigr)^n, \qquad \mathbb{E}[\text{redundant}] = n\,P(a) - 1 + \bigl(1 - P(a)\bigr)^n\]

Setting each CONVOY node’s claim probability to \(p^*/n\) leaves the EOD platform deliberately unclaimed within a cluster 74% of the time — intentionally conservative for an irreplaceable resource.

This computes the probability all \(n\) nodes independently skip a task, which must be below mission tolerance for each tier. Fleet size \(n\) is the dominant lever: at fixed claim probability \(P(a) = 0.5\) (illustrative value), a fleet of \(n = 12\) (illustrative value) achieves omission probability \(\approx 0.02\%\) (illustrative value); adding 2 nodes (illustrative value) to a 5-node fleet (illustrative value) reduces omission probability by 75% (theoretical bound under illustrative parameters) without changing \(P(a)\).

The fleet-optimal claim probability — distributing \(p^*\) across \(n\) peers — sets \(P(a) = 1 - (1 - p^*)^{1/n}\), which for small \(p^*\) approximates \(P(a) \approx p^* / n\). At this threshold, the probability that at least one node claims \(\tau_k\) equals \(p^*\), achieving the inter-cluster coordination target under the independent-node assumption.

Proof sketch. Model claims as independent Bernoulli trials. \(P(\text{omission}) = (1 - P(a))^n\) by independence. \(\mathbb{E}[\text{redundant}] = n\,P(a) - 1 + (1 - P(a))^n\) by substituting \(\mathbb{E}[\text{claims}] = n\,P(a)\) and \(P(\text{at least one claim}) = 1 - (1 - P(a))^n\). The fleet-optimal \(P(a)\) follows from a target-probability argument: setting \(P(\text{at least one claim}) = p^*\) gives \((1-P(a))^n = 1-p^*\), hence \(P(a) = 1 - (1 - p^*)^{1/n}\). This distributes the inter-cluster action target \(p^*\) across \(n\) symmetric peers. The small-\(p^*\) approximation \(P(a) \approx p^*/n\) follows from the first-order Taylor expansion of \((1 - p^*)^{1/n}\). \(\square\)

Empirical status: The CONVOY calibration uses \(p^* = 0.30\) derived from the mission value ratio \(\alpha c_k / \beta v_k\) for the EOD platform scenario; this coefficient ratio must be calibrated per resource class and updated via post-partition audit records ( Definition 78 ) after field exercises.

CONVOY calibration. With \(n = 6\) (illustrative value) peers in each cluster and \(p^* = 0.30\) (illustrative value) for the EOD platform, the fleet-optimal per-node claim probability is \(P(a) \approx 0.30 / 6 = 0.05\) (theoretical bound under illustrative parameters). Expected redundant claims: \(6 \times 0.05 - 1 + 0.95^6 \approx 0.035\) (theoretical bound under illustrative parameters). Expected omission probability: \(0.95^6 \approx 0.74\) (theoretical bound under illustrative parameters). At \(p^* = 0.30\) (illustrative value), omission is the intended outcome \(74\%\) (theoretical bound under illustrative parameters) of the time — only one vehicle in the cluster should claim the EOD platform, not both clusters simultaneously.
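The calibration numbers follow directly from the collision bound; a minimal numerical check (function names are assumptions):

```python
# Numerical check of the Proposition 55 bounds at the CONVOY calibration
# (n = 6, per-node P(a) = 0.05). Formulas follow the proof sketch.

def omission_probability(n: int, p: float) -> float:
    """Probability that none of n independent nodes claims the task."""
    return (1 - p) ** n

def expected_redundant(n: int, p: float) -> float:
    """Expected claims beyond the first: n*p - (1 - (1-p)**n)."""
    return n * p - (1 - (1 - p) ** n)

n, p = 6, 0.05
omit = omission_probability(n, p)   # 0.95**6, about 0.74
extra = expected_redundant(n, p)    # about 0.035 redundant claims per task
```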

Watch out for: the independence assumption fails when nodes share a common trigger — a simultaneous contact event that all nodes in a cluster observe will cause correlated claiming decisions if \(P(a)\) is set as a function of shared fleet state; at \(P(a) = 1.0\) (every node claims on contact), the collision bound returns \(n - 1\) redundant claims, not the \(n \cdot P(a) - 1\) result from the independent-node formula.

Post-Partition Auditing and Fleet Policy Update

Definition 78 (Post-Partition Audit Record). Upon reconnection, each cluster \(i\) emits, for every task \(\tau_k\) touched during the partition, an audit record:

\[\bigl(\tau_k,\; a_i,\; r_i,\; t_k\bigr)\]

where \(\tau_k\) is the task identifier, \(a_i \in \{0, 1\}\) indicates whether cluster \(i\) committed the resource, \(r_i\) is the resource consumed, and \(t_k\) is the HLC-stamped commitment time ( Definition 61 ). Audit records from all reconnecting clusters are merged via a CRDT G-Set (grow-only set — commutative, associative, idempotent), converging at the first reconnection regardless of merge order.

Trigger: Audit record generation is automatic — triggered by the first successful gossip exchange after partition end, before any CRDT merge operations begin. Operator review is required within 24 hours for any audit flagging claim collisions above the Proposition 55 bound.

Outcome classification. The merged audit identifies, for each task, three mutually exclusive outcomes:

Outcome | Condition | \(\hat{\alpha}\) update | \(\hat{\beta}\) update
Clean | Exactly one cluster acted | None | None
Redundant | Both clusters acted | Increment | None
Omission | Neither cluster acted | None | Increment

Coefficient update. After each audit cycle the fleet updates its policy pair \((\hat{\alpha}, \hat{\beta})\) using exponential smoothing with learning rate \(\eta \in (0, 1)\) (\(\hat{\alpha}\) here is the learned redundancy fraction — a policy coefficient distinct from all uses of \(\alpha\) in Self-Healing Without Connectivity: the EMA smoothing factor, CBF scheduling parameters, QoS tier reservations, and anti-fragility convexity coefficient all use \(\alpha\) with different subscripts; \(\hat{\alpha}\) with a hat denotes the fleet-coherence learned state here):

\[\hat{\alpha} \leftarrow (1-\eta)\,\hat{\alpha} + \eta\,\frac{r_{\text{redundant}}}{r_{\text{total}}}, \qquad \hat{\beta} \leftarrow (1-\eta)\,\hat{\beta} + \eta\,\frac{v_{\text{omitted}}}{v_{\text{total}}}\]

where \(r_{\text{redundant}}/r_{\text{total}}\) is the fraction of this partition’s resource spend that was duplicated and \(v_{\text{omitted}}/v_{\text{total}}\) is the fraction of task value left unserved.

Physical translation: Each formula is an exponential moving average over audit outcomes. \(\hat{\alpha}\) tracks the fraction of the resource budget consumed redundantly — if 30% of this partition’s resource spend was duplicated work, \(\hat{\alpha}\) drifts toward 0.30. \(\hat{\beta}\) tracks the fraction of task value that went unserved — if 15% of mission value was left uncaptured, \(\hat{\beta}\) drifts toward 0.15. The learning rate \(\eta\) controls how fast historical patterns are forgotten: small \(\eta\) (0.05 (illustrative value)) gives stable policy but slow adaptation; large \(\eta\) (0.3 (illustrative value)) reacts quickly to regime changes but oscillates under noise.
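One audit cycle of the coefficient update can be sketched as a pair of EMAs; the function name, argument names, and the \(\eta = 0.05\) default are assumptions drawn from the illustrative values above.

```python
# Sketch of the audit-driven policy update: exponential moving averages over
# the redundancy and omission fractions of one partition. Names are assumptions.

def update_policy(alpha_hat: float, beta_hat: float,
                  redundant_fraction: float, omitted_fraction: float,
                  eta: float = 0.05) -> tuple:
    """One audit cycle: drift each coefficient toward this partition's outcome."""
    alpha_hat = (1 - eta) * alpha_hat + eta * redundant_fraction
    beta_hat = (1 - eta) * beta_hat + eta * omitted_fraction
    return alpha_hat, beta_hat

# 30% of resource spend duplicated, 15% of task value unserved this partition:
a, b = update_policy(0.10, 0.10, redundant_fraction=0.30, omitted_fraction=0.15)
```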

The updated pair \((\hat{\alpha}, \hat{\beta})\) is propagated to all nodes as an LWW-Register CRDT , arbitrated by HLC timestamp ( Definition 62 , Proposition 45 ). This write requires L4 capability (full fleet integration): the policy update is a fleet-wide commitment that should not be issued by a single disconnected cluster. Clusters reconnecting below L4 (Degraded or Intermittent regime) queue the pending policy write locally; it executes when L4 capability is re-established.

No-leader guarantee. The G-Set audit merge is commutative, associative, and idempotent — any reconnecting node can initiate it, in any order, with any subset of peers. The LWW-Register policy update is arbitrated by HLC, not by a designated writer. Simultaneous policy updates from two reconnecting clusters resolve deterministically via Proposition 45 without a coordinator. Neither the audit merge nor the policy update requires a leader at any step.

Connection to the CONVOY protocol. The CONVOY Coherence Protocol in the next section demonstrates the overall reconciliation sequence. The arbitration layer operates before that sequence: it governs which resources were committed during partition. The audit record is an input to post-partition reconciliation — it feeds the “Handling Actions Taken During Partition” classification table above, replacing the informal “note inefficiency” entry for redundant actions with a structured feedback loop that updates \(p^*\) for the next partition.

Cognitive Map: The arbitration layer solves a coordination problem without a coordinator. The loss function ( Definition 76 ) converts policy values into a claim probability \(p^*\). The claim probability ( Definition 77 ) gates each node’s commitment attempt using local health and divergence. The collision bound ( Proposition 55 ) proves the fleet-level omission and redundancy rates that result. The audit record ( Definition 78 ) closes the loop — post-partition outcomes update \(\hat{\alpha}\) and \(\hat{\beta}\), which shift \(p^*\) for the next partition. The system learns to be less redundant when waste is costly and less conservative when omissions hurt.


CONVOY Coherence Protocol

Two vehicle groups execute separate missions for 45 minutes with no shared communication; each makes locally valid decisions — different routes, different threat assessments. When they reconnect, their states conflict on facts that can no longer be undone. The reconnection protocol proceeds in six phases: exchange Merkle roots, identify divergences, share divergent data, merge states, verify consistency, and resume coordinated operation. Irreconcilable physical decisions (position, route already traveled) are accepted as fait accompli; divergent knowledge is merged. The protocol recovers information coherence but cannot undo resource expenditure or physical commitment. Pre-agreed routing rules for partition scenarios reduce the probability of irreconcilable route conflicts in future partitions.

Return to the CONVOY partition at the mountain pass.

State During Partition

Forward group (vehicles 1-5): diverted onto Route B and completed the traverse successfully.

Rear group (vehicles 6-12): continued toward the bridge under local lead authority and confirmed visually that it was intact.

State divergence: route plan, position, and threat intel all differ between the groups at reconnection.

Reconnection at Mountain Base

Radio contact restored as both groups clear the mountain pass. Vehicle 1 and Vehicle 6 exchange Merkle roots — mismatch detected immediately. Targeted comparison identifies three divergences: route plan, position, and intel. Forward group shares Route B’s successful traverse; rear group reveals the bridge is actually intact; both exchange full threat intel received during partition.

Merge: Intel reconciliation marks bridge status UNCERTAIN (conflicting regional command reports), then updates to INTACT from rear group visual confirmation. Route B marked UNCERTAIN from initial report, then updated to PASSABLE from forward group traverse. Route decisions are physically irreversible — both groups made valid L2 calls. Resolution: accept current positions and converge at km 95 junction.

Both groups confirm unified intel and acknowledge routes are fait accompli. Forward group continues on Route B; rear group continues to the bridge; convoy reunifies at km 95.

Lessons Learned

Regional command and the forward group gave conflicting information — neither fully accurate — demonstrating the need to assign confidence weights to intel sources. Route decisions that execute cannot be undone, making it essential to pre-agree routing rules for known partition corridors. Km 47–105 is now a confirmed radio shadow and should be flagged for future transits. Finally, the partition confirmed that Vehicles 7–12 operated effectively for 45 minutes under local lead authority, validating the L2 delegation model.

The fleet emerges from partition with richer knowledge than either cluster held independently — the protocol’s goal is knowledge coherence, not history revision.


RAVEN Coherence Protocol

Three drone clusters operate independently under terrain and jamming-induced partition. Each cluster observes different zones, detects different threats, and loses one drone to a collision; when they reconnect, their observation sets may overlap — two clusters may have independently detected the same physical threat. Coverage and threats merge as G-Sets (union, no conflicts possible for distinct observations); health merges as LWW-Registers (latest wins). For potentially duplicate observations, entity resolution uses confidence-weighted position averaging and the complementary-probability confidence update. Entity resolution requires an explicit similarity threshold \(\theta\): set it too high and genuine matches are missed (duplicate threat entries); set it too low and distinct threats are collapsed (false identity merge). The independence assumption in confidence fusion fails when sensor models share correlated errors.

The RAVEN swarm of 47 drones experiences partition due to terrain and jamming, splitting into three clusters.

State During Partition

Cluster A (20 drones, led by Drone 1):

Cluster B (18 drones, led by Drone 21):

Cluster C (9 drones, led by Drone 40):

Reconnection as Swarm Reforms

Clusters gradually reconnect as jamming subsides.

Coverage merge (G-Set): The swarm’s total surveyed coverage after reconnection is the set union of all zones covered by each cluster independently during partition.

Simple union. No conflicts possible.

Threat merge: The swarm’s unified threat picture is the union of threats detected by each cluster; since T1 and T2 were observed at distinct positions, no deduplication is needed.

Union of detected threats. No conflict - different threats at different positions.

Health merge:

Each drone’s health is LWW-Register. Latest observation wins.

Coherence challenge: What if Cluster A and B both detected threats near zone W boundary?

Entity resolution: Compare threat attributes.

Attribute | Cluster A (T1) | Cluster B (T3)
Position | (34.5102, -118.2205) | (34.5114, -118.2193)
Time offset | First observation | +2.5 minutes
Signature | Vehicle, moving NE | Vehicle, moving NE

Position difference: 170 meters. Time difference: roughly 2.5 minutes. Same signature. Likely same entity observed from different angles at different times.

Resolution: Merge into single threat T1 with combined observations:

The merged position is the confidence-weighted average of the two observed positions, where \(c_A\) and \(c_B\) are each cluster’s observation confidence scores and \(p_A\), \(p_B\) are the corresponding position vectors:

\[p_{\text{merged}} \;=\; \frac{c_A\,p_A + c_B\,p_B}{c_A + c_B}\]

Physical translation: The merged position is a weighted average that pulls toward whichever cluster’s observation was more confident. If Cluster A had confidence 0.9 and Cluster B had confidence 0.3, the merged position is \(0.9/(0.9+0.3) = 75\%\) of the way toward Cluster A’s reading — not a simple midpoint. High-confidence observations dominate; low-confidence observations contribute proportionally.

Where \(c\) is confidence and \(p\) is position.

Entity Resolution Formalization

For distributed observation systems, entity resolution is critical. Multiple observers may detect the same entity and assign different identifiers.

Observation tuple: \((id, pos, time, sig, observer)\)

Match probability: Given two observations \(o_1\) and \(o_2\), the probability they describe the same physical entity is a function \(f\) of the distance between their positions, the gap between their timestamps, and the similarity between their sensor signatures:

\[P_{\text{match}}(o_1, o_2) \;=\; f\bigl(\lVert pos_1 - pos_2 \rVert,\; \lvert time_1 - time_2 \rvert,\; sim(sig_1, sig_2)\bigr)\]

where \(sim(\cdot, \cdot)\) is the signature similarity function.

Merge criteria: If \(P_{\text{match}}(o_1, o_2) > \theta\), merge observations. Otherwise, keep as separate entities.

Confidence update: Merging two independent observations of the same entity raises the combined confidence using the complementary-probability rule: \(c_{\text{merged}} = 1 - (1 - c_A)(1 - c_B)\) is 1 minus the probability that both observers were simultaneously wrong.

Physical translation: \(c_{\text{merged}}\) is the probability that at least one of the two observers was correct — the complement of both being wrong simultaneously. If each observer has a 20% error rate (confidence 0.8), the merged confidence is \(1 - (0.2)(0.2) = 0.96\) — two independent eyes are much harder to fool than one. The formula assumes errors are uncorrelated; the note below explains when that assumption breaks.

Note: This assumes observation errors are independent across nodes. If both observers share the same sensor model or environmental bias (e.g., both use identical LIDAR firmware with the same calibration error), confidence-weighted averaging amplifies the shared error rather than averaging it out. Use cross-sensor validation (Self-Measurement Without Central Observability, Proposition 16 ) to detect and correct correlated errors before entity resolution.
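The two merge rules can be sketched together. Threshold handling (\(\theta\)) and the observation fields are outside this sketch; function names are assumptions.

```python
# Sketch of the RAVEN entity-resolution merge: confidence-weighted position
# average plus the complementary-probability confidence update. Assumes the
# match decision (P_match > theta) has already been made; names are assumptions.

def merge_positions(p_a, p_b, c_a: float, c_b: float):
    """Weighted average: pulls toward the more confident observation."""
    w = c_a / (c_a + c_b)
    return tuple(w * a + (1 - w) * b for a, b in zip(p_a, p_b))

def merge_confidence(c_a: float, c_b: float) -> float:
    """1 - P(both observers wrong); assumes independent observation errors."""
    return 1 - (1 - c_a) * (1 - c_b)

# Cluster A (confidence 0.9) dominates Cluster B (confidence 0.3): 75/25 split.
pos = merge_positions((34.5102, -118.2205), (34.5114, -118.2193), c_a=0.9, c_b=0.3)
conf = merge_confidence(0.8, 0.8)   # two 80%-confident observers -> 0.96
```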

Cognitive Map: RAVEN’s reconvergence demonstrates CRDT semantics in practice. Coverage and threat sets merge as G-Sets — union is always correct because coverage is monotonically additive. Health merges as LWW-Registers — latest timestamp wins per drone. Entity resolution handles the hard case: same physical threat, two local identifiers. Confidence-weighted averaging fuses the position observations; the complementary-probability rule fuses the confidence scores. The correlated-error warning is the critical constraint: independent sensors are powerful; sensors with shared firmware or calibration flaws are not independent.


OUTPOST Coherence Protocol

A 127-sensor perimeter mesh may be partitioned for days. Each sensor makes local detection decisions without access to the fusion node; when bandwidth-constrained reconnection arrives, transmitting full state is infeasible and detection events from the partition window are unreconciled. The approach stratifies state by priority and assigns CRDT semantics per tier: G-Set union for detection events, LWW-Register for sensor health, confidence-weighted coverage maps. Delta encoding reduces bandwidth by transmitting only state changes since the last sync; Merkle roots verify completeness without full transmission. Proposition 56 shows divergence grows with both event rate and partition duration — deploying more fusion nodes (increasing \(k\)) shrinks the sensor-to-fusion ratio and bounds divergence, but each fusion node adds hardware cost and maintenance burden.

The OUTPOST sensor mesh faces distinct coherence challenges: ultra-low bandwidth, extended partition durations (days to weeks), and hierarchical fusion architecture.

State Classification for Mesh Coherence

OUTPOST state partitions into categories with different reconciliation priorities:

State Type | Update Frequency | Reconciliation Strategy | Priority
Detection events | Per-event | Union with deduplication | Highest
Sensor health | Per-minute | Latest-timestamp-wins | High
Coverage map | Per-hour | Merge with confidence weighting | Medium
Configuration | Per-day | Version-based with rollback | Low

Multi-Fusion Coordination

When multiple fusion nodes operate, they must coordinate coverage and avoid duplicate alerts:

The diagram highlights the overlap zone where Sensor 6 reports to both fusion nodes simultaneously — that shared coverage is where deduplication coordination between nodes is required.

    
    graph TD
    subgraph Zone_A["Zone A (Fusion A responsibility)"]
    S1[Sensor 1]
    S2[Sensor 2]
    S3[Sensor 3]
    end
    subgraph Zone_B["Zone B (Fusion B responsibility)"]
    S4[Sensor 4]
    S5[Sensor 5]
    end
    subgraph Overlap["Overlap Zone (shared responsibility)"]
    S6["Sensor 6<br/>(reports to both)"]
    end
    subgraph Fusion_Layer["Fusion Layer"]
    F1[Fusion A]
    F2[Fusion B]
    end
    S1 --> F1
    S2 --> F1
    S3 --> F1
    S4 --> F2
    S5 --> F2
    S6 --> F1
    S6 --> F2
    F1 <-.->|"deduplication<br/>coordination"| F2
    style Overlap fill:#fff3e0,stroke:#f57c00
    style Zone_A fill:#e3f2fd
    style Zone_B fill:#e8f5e9

Overlapping coverage reconciliation: When sensors report to multiple fusion nodes:

Resolution rules: when the same event arrives with the same timestamp from both fusion nodes, deduplicate by event ID; when the same event arrives with different timestamps, use the earliest detection time; when assessments conflict, combine confidence scores and flag for review.

Long-Duration Partition Handling

OUTPOST may operate for days without fusion node contact. Special handling for extended autonomy:

Local decision authority: Each sensor can make detection decisions locally. Decisions are logged for later reconciliation.

Detection event structure for eventual consistency:
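A minimal sketch of what such a record might look like — field names are illustrative assumptions, not OUTPOST's schema:

```python
from dataclasses import dataclass, field, replace
import uuid

@dataclass(frozen=True)
class DetectionEvent:
    """Locally created detection record. Field names are illustrative."""
    sensor_id: str
    detected_at: float          # local clock reading at detection time
    confidence: float           # local classifier confidence
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    reconciled: bool = False    # flips to True once a fusion node confirms

# Frozen record: confirmation produces a new value rather than a mutation,
# which keeps the reconciliation log append-only.
def confirm(event: DetectionEvent) -> DetectionEvent:
    return replace(event, reconciled=True)
```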

The reconciliation flag tracks whether the event has been confirmed by a fusion node. Unreconciled events are treated with lower confidence.

Bandwidth-efficient reconciliation: Given ultra-low bandwidth (often < 1 Kbps), OUTPOST uses compact delta encoding. The delta is the set difference between the current state and the state at the last successful sync point — only this incremental change is transmitted rather than the full state.

Only changed state transmits. Merkle tree roots validate completeness without transmitting full state.
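A toy sketch of the delta-plus-root handshake. For brevity it uses a flat order-independent hash over the item set where a real deployment would use a full Merkle tree (so that subtree mismatches can be localized):

```python
import hashlib

def state_root(items: set) -> str:
    # Deterministic digest over the item set (stand-in for a Merkle root).
    h = hashlib.sha256()
    for item in sorted(items):
        h.update(hashlib.sha256(repr(item).encode()).digest())
    return h.hexdigest()

def make_delta(current: set, last_synced: set) -> set:
    # Transmit only state added since the last successful sync point.
    return current - last_synced

def apply_delta(fusion_state: set, delta: set, claimed_root: str) -> set:
    merged = fusion_state | delta
    # Root comparison verifies completeness without full-state transfer.
    if state_root(merged) != claimed_root:
        raise ValueError("sync incomplete: request full reconciliation")
    return merged
```

The sensor sends only `make_delta(...)` plus the root of its full state; the fusion node verifies it reconstructed everything without ever receiving the full state.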

Sensor-Fusion Authority Hierarchy

The three OUTPOST tiers form a strict authority containment chain, with each higher tier’s permitted actions fully including those of the tier below — authority over the outer network and policy does not flow down to individual sensors, and sensor-level detections do not directly authorize responses without passing through the fusion and uplink tiers.

Decision scopes are assigned by tier: sensor authority covers detection reporting, self-health assessment, and local alerts; fusion authority covers alert correlation, threat classification, and response recommendations; uplink authority covers response authorization, policy updates, and threat escalation.
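The containment chain can be expressed as nested sets, which makes the "higher tier fully includes lower tier" property directly checkable. The action names paraphrase the scopes above and are illustrative:

```python
# Tier scopes; each higher tier strictly contains the one below.
SENSOR = {"detection_reporting", "self_health", "local_alerts"}
FUSION = SENSOR | {"alert_correlation", "threat_classification",
                   "response_recommendation"}
UPLINK = FUSION | {"response_authorization", "policy_update",
                   "threat_escalation"}

def authorized(tier: set, action: str) -> bool:
    return action in tier

# Strict containment chain: sensor < fusion < uplink.
assert SENSOR < FUSION < UPLINK
```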

During partition, each tier continues to exercise only its pre-delegated scope: sensors keep detecting and logging, fusion nodes keep correlating and recommending, and decisions that require uplink authority are deferred until reconnection.

Proposition 56 (OUTPOST Coherence Bound).

Deploying more fusion nodes is the primary lever for bounding OUTPOST divergence during multi-day partitions — doubling fusion node count halves expected unreconciled events.

For an OUTPOST mesh with \(n\) sensors, \(k\) fusion nodes, and partition duration \(T_p\), the expected state divergence is bounded by:

\[D_{\text{expected}} \leq \lambda \cdot T_p \cdot \frac{n-k}{k}\]

where \(\lambda\) is the event arrival rate and the factor \((n-k)/k\) reflects the sensor-to-fusion ratio.

Note: \(D_{\text{expected}}\) here represents the expected number of unreconciled state events (not the normalized divergence \(D \in [0,1]\) from Definition 57). For the normalized interpretation, apply the exponential form from Proposition 41, which bounds the fraction of nodes that diverge rather than the event count.

Empirical status: The linear divergence bound assumes uniform event rate \(\lambda\) across sensors; real OUTPOST deployments exhibit spatially clustered detection events (perimeter sectors vary in threat density), making the homogeneous \(\lambda\) assumption conservative in low-activity sectors.

In other words, divergence grows proportionally with both the event rate and the partition duration, and scales with how many sensors share each fusion node — deploying more fusion nodes (increasing \(k\)) shrinks the ratio and keeps divergence bounded even during long partitions.

Physical translation: \(\lambda \cdot T_p\) is the expected number of events that arrive at one sensor during the partition — the raw volume of unreconciled data. The factor \((n-k)/k\) is the sensor-to-fusion ratio minus 1: with 127 sensors and 3 fusion nodes, that factor is \(124/3 \approx 41\). A 1-hour partition at 1 event/minute produces \(60 \times 124/3 \approx 2480\) expected divergence events. Deploying 6 fusion nodes instead of 3 roughly cuts the bound in half; halving the event rate (coarser detection thresholds) cuts it in half again. Divergence has two independent control levers.

Watch out for: the linear bound \(\lambda \cdot T_p \cdot (n-k)/k\) assumes uniform event rate \(\lambda\) across all sensors; in perimeter deployments where event density varies by sector, a sector with 10× the mean event rate contributes 10× the expected divergence, and the bound should be computed per sensor class separately using the sector-specific rate rather than the fleet-wide mean.
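The bound and its two levers can be checked numerically. This is a direct transcription of the Proposition 56 formula, not deployment code (note the prose rounds \((n-k)/k\) to 41; the exact factor \(124/3\) gives 2480):

```python
def divergence_bound(rate_per_min: float, partition_minutes: float,
                     n_sensors: int, k_fusion: int) -> float:
    """Proposition 56: D_expected <= lambda * T_p * (n - k) / k."""
    return rate_per_min * partition_minutes * (n_sensors - k_fusion) / k_fusion

# Worked example: 127 sensors, 3 fusion nodes, 1-hour partition at
# 1 event/minute -> 60 * 124/3 = 2480 expected divergence events.
baseline = divergence_bound(1.0, 60.0, 127, 3)

# Lever 1: more fusion nodes (roughly halves the bound).
with_six = divergence_bound(1.0, 60.0, 127, 6)
# Lever 2: coarser detection thresholds (exactly halves the bound).
half_rate = divergence_bound(0.5, 60.0, 127, 3)
```

For sectored perimeters, apply the same function per sector with the sector-specific rate and sum the results, as the caution above recommends.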

Cognitive Map: OUTPOST’s coherence strategy is tiered by urgency. Detection events get highest priority and G-Set semantics — no event is ever lost, just deduplicated. Health state uses LWW-Register — latest wins because stale health reports are useless. Configuration changes are deferred until uplink authority reconnects. The bandwidth constraint makes delta encoding mandatory: the Merkle tree confirms completeness without transmitting what you already have. Proposition 56 gives the bound that tells you whether your fusion-node deployment is sufficient for your expected partition duration and event rate.


The Limits of Coherence

The coherence mechanisms described so far — CRDTs, Merkle reconciliation, authority tiers — assume conflicts can be resolved through merge functions. Some cannot: physical facts may be contradictory, resources may be permanently consumed, and nodes may have been destroyed before syncing. The right response is to accept that perfect coherence is unachievable. Stratify state by coherence requirement — safety-critical state demands high coherence, logging state tolerates eventual consistency. Apply Byzantine detection and isolation for adversarial cases; for irreconcilable conflicts, flag for human verification. The coherence-autonomy inverse relationship is a fundamental constraint, not an engineering failure: requiring consensus before action blocks the system during partition, and the architect’s task is choosing the minimum coherence needed per state tier, not maximizing coherence everywhere.

Irreconcilable Conflicts

Some conflicts cannot be resolved through merge functions or hierarchy.

Physical impossibilities: Cluster A reports target destroyed. Cluster B reports target escaped. Both cannot be true. The merge function cannot determine which is correct from state alone.

Resolution: Flag for external verification. Use sensor data from both clusters. Accept uncertainty if verification impossible.

Resource allocation conflicts: Cluster A allocated sensor drones to zone X. Cluster B allocated same drones to zone Y. The drones are physically in one place - but which?

Resolution: Trust current position reports. Update state to reflect actual positions. Flag allocation discrepancy for review.

Byzantine Actors

A compromised node may deliberately create conflicts:

Detection: Byzantine behavior often creates recognizable patterns, such as equivocation (reporting different state to different peers) or state claims that violate known invariants.

Isolation: Nodes detected as potentially Byzantine are handled in escalating steps:

  1. Reduce trust weight in aggregation
  2. Quarantine from decision-making
  3. Flag for human review

Byzantine-tolerant CRDTs exist but are expensive. Recent work by Kleppmann et al. addresses making CRDTs Byzantine fault-tolerant, but the overhead is significant. Edge systems often use lightweight detection plus isolation rather than full Byzantine tolerance.

Stale-Forever State

Some state may never reconcile:

Acceptance: Perfect consistency is impossible in distributed systems under partition and failure. The fleet must operate with incomplete history.

Mitigation: Redundant observation. If multiple nodes observe the same event, loss of one doesn’t lose the observation.

The Coherence-Autonomy Tradeoff

Perfect coherence requires consensus before action. Consensus requires communication. Communication may be impossible.

Maximum coherence means no action without agreement - the system blocks during partition. Maximum autonomy means action without coordination - coherence is minimal.

Edge architecture accepts imperfect coherence in exchange for operational autonomy. The question is not “how to achieve perfect coherence” but “how to achieve sufficient coherence for mission success.”

Sufficient coherence: The minimum consistency needed for the mission to succeed.

Engineering Judgment

When should the system accept incoherence as the lesser evil?

This is engineering judgment, not algorithmic decision. The architect must define coherence requirements per state type and accept that perfect coherence is unachievable.

Cognitive Map: The limits section reframes the problem. Irreconcilable conflicts are not architecture failures — they are physical facts (a target cannot be both destroyed and escaped). Byzantine actors require lightweight detection and isolation, not full BFT CRDT overhead. Stale-forever state is an acceptance criterion, not a bug. The coherence-autonomy tradeoff formalizes what every partition scenario demonstrated: maximum coherence blocks the system; maximum autonomy loses state. The architect’s only choice is where to stand on the inverse curve, per state tier.


Model Scope and Failure Envelope

Every coherence mechanism in this post relies on assumptions that may be violated in the field — CRDTs require eventual delivery, Merkle reconciliation requires sparse divergence, authority resolution requires available tie-breakers. Knowing the mechanism is not enough. Knowing when it breaks is equally important. For each mechanism, this section enumerates the validity domain and failure envelope: for each assumption, the failure mode, how it is detected, and what mitigates it. Treat the summary claim-assumption-failure table as a deployment checklist. Failure envelopes do not disappear — they expand or shift with mitigations. Cryptographic hashes reduce collision probability but add compute; Byzantine CRDTs add correctness guarantees but add overhead. The architect’s goal is to push the failure envelope outside the expected operating range, not to eliminate it.

Each mechanism has bounded validity. When assumptions fail, so does the mechanism.

CRDT Eventual Consistency

Validity Domain:

CRDT convergence is guaranteed only when updates reach all nodes eventually and the merge function is a true semilattice join; failures in either condition break convergence regardless of CRDT type.


Failure Envelope:

| Assumption Violation | Failure Mode | Detection | Mitigation |
|---|---|---|---|
| Permanent partition | Clusters diverge forever | Partition duration > mission | Accept divergence; human reconciliation |
| Merge function violation | Non-deterministic convergence | Same inputs yield different outputs | Formal verification of merge |
| Byzantine state corruption | Invalid state propagates | State invariant violations | Authenticated updates; Byzantine CRDT |

Counter-scenario: Node destroyed before synchronizing updates. Its updates are permanently lost. The fleet converges to state that excludes its contributions - potentially missing critical observations. Detection: membership protocol shows node loss before sync. Mitigation: critical updates require acknowledgment from multiple nodes.

Merkle Reconciliation

Validity Domain:

The \(O(k \log(n/k))\) complexity advantage over full-state transfer holds only when divergences are sparse relative to total state size; dense divergence collapses the advantage to \(O(n)\).


Complexity Bound: \(O(k \log(n/k) + k)\) general case; \(O(\log n + k)\) when divergences are spatially concentrated (Proposition 51).

Failure Envelope:

| Assumption Violation | Failure Mode | Detection | Mitigation |
|---|---|---|---|
| Dense divergence (\(k \approx n\)) | \(O(n)\) complexity | Compare \(k\) to \(n\) | Full sync; skip Merkle |
| Unbalanced tree | Worst-case \(O(n)\) | Tree depth exceeds \(2\log n\) | Rebalance triggers |
| Hash collision (unlikely) | Missed divergence | Independent verification fails | Cryptographic hash |

Uncertainty bound: “Sparse divergence” means \(k/n < 0.1\). Above this threshold, Merkle advantage diminishes. For extended partitions with high update rates, expect \(k/n > 0.1\) and plan for linear reconciliation time.
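A sketch of the resulting decision rule, with the 0.1 sparsity threshold as a tunable parameter and the Proposition 51 message bound alongside for comparison:

```python
import math

def sync_strategy(k_divergent: int, n_items: int,
                  sparse_threshold: float = 0.1) -> str:
    """Choose reconciliation mode per the k/n < 0.1 sparsity rule."""
    if n_items == 0 or k_divergent / n_items >= sparse_threshold:
        return "full"    # dense divergence: Merkle advantage collapses to O(n)
    return "merkle"      # sparse: expect O(k log(n/k) + k) traversal

def merkle_cost(k: int, n: int) -> float:
    # Units touched in the general sparse case (Proposition 51 bound).
    return k * math.log2(n / k) + k
```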

Hierarchical Authority Resolution

Validity Domain:

Deterministic authority resolution requires that tie-breaker inputs exist, rules are identical on all nodes, and no node strategically reports different priorities to different peers.


Failure Envelope:

| Assumption Violation | Failure Mode | Detection | Mitigation |
|---|---|---|---|
| Tie-breaker unavailable | Cannot resolve authority | Missing required input | Fallback rule; multiple tie-breakers |
| Rule inconsistency | Different nodes reach different conclusions | Authority claim conflicts | Version-controlled rules |
| Equivocation | Split-brain persists | Same node, different claims | Equivocation detection; isolation |

Counter-scenario: GPS-denied environment, where timestamp is the tie-breaker. Clock drift during partition makes timestamps unreliable, so nodes may hold conflicting "most recent" claims. Detection: clock skew exceeds threshold. Mitigation: use relative ordering (vector clocks) instead of absolute time.
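The vector-clock mitigation reduces to a partial-order comparison; a minimal sketch, where a `"concurrent"` result means neither "most recent" claim is trustworthy and merge rules must decide:

```python
def vc_compare(a: dict, b: dict) -> str:
    """Compare two vector clocks (node id -> counter).

    Returns 'before', 'after', 'equal', or 'concurrent'.
    """
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened-before b
    if b_le_a:
        return "after"       # b happened-before a
    return "concurrent"      # causally independent: no winner by time
```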

Summary: Claim-Assumption-Failure Table

| Claim | Key Assumptions | Valid When | Fails When |
|---|---|---|---|
| CRDTs guarantee convergence | Eventual delivery, no Byzantine | Temporary partition | Permanent partition; Byzantine |
| Merkle sync is \(O(k \log(n/k) + k)\) | Sparse divergence, balanced tree | Brief partition | Extended partition; skewed updates |
| Authority resolution is deterministic | Tie-breaker available, rules consistent | GPS/time available | GPS denied; rule version mismatch |
| Conflict resolution is correct | Conflict rules capture semantics | Well-defined conflicts | Semantic ambiguity; novel conflicts |

Cognitive Map: The model scope section is a deployment-readiness checklist. CRDT validity requires three properties: eventual delivery, semilattice merge, and no Byzantine corruption. Merkle validity requires three: collision resistance, balanced tree, and sparse divergence. Authority resolution validity requires three: available tie-breakers, consistent rules, and no equivocation. The table at the end maps assumption violations directly to failure modes and mitigations — consult it before deploying any mechanism in a new environment or adversarial scenario.


Irreducible Trade-offs

Fleet coherence mechanisms solve specific problems within specific validity envelopes, but some tensions cannot be resolved — they are fundamental Pareto fronts where improving one objective unavoidably degrades another. Four irreducible trade-offs recur throughout: CAP (consistency vs. availability), speed vs. bandwidth, authority granularity vs. conflict rate, and state completeness vs. sync cost. For each, the multi-objective formulation enables principled Pareto choices rather than ad hoc compromises. The shadow price table converts these abstract trade-offs into concrete resource comparisons: conflict resolution (5.00 c.u./conflict) costs \(250\times\) more than state storage (0.02 c.u./KB-hr) — this ordering should drive architectural choices about which conflicts to prevent vs. accept.

No design eliminates these tensions. The architect selects a point on each Pareto front.

Trade-off 1: Consistency vs. Availability (CAP Theorem)

Multi-objective formulation:

The CAP constraint makes this a true Pareto problem: any design choice \(a\) trades off consistency utility against availability utility, and the partition-possible constraint prohibits any point that achieves full consistency and full availability simultaneously.

Under partition, cannot simultaneously maximize both.

Pareto front: The three rows below span the design space from maximum consistency (CP mode, which blocks under partition) through maximum availability (AP mode, which accepts all writes and merges later) to tunable systems that choose per-operation.

| System Mode | Consistency | Availability | Partition Behavior |
|---|---|---|---|
| CP (strong) | High | Low | Block writes; wait for quorum |
| AP (eventual) | Low | High | Accept writes; merge later |
| Tunable | Varies | Varies | Per-operation choice |

Edge architecture chooses AP (availability): operations always succeed, but concurrent operations may conflict. The cost is eventual consistency - temporary divergence that must be reconciled.

Trade-off 2: Reconciliation Speed vs. Bandwidth

Multi-objective formulation:

Merkle tree depth \(d\) parameterizes the trade-off: deeper trees transmit more hash data per round to pinpoint divergences faster, but this bandwidth cost rises with depth while latency falls.

where \(d\) is Merkle tree depth.

Pareto front: The three depth settings below illustrate how increasing tree depth shifts bandwidth cost upward while driving total sync latency downward — each row is a distinct operating point on the trade-off curve.

| Tree Depth | Rounds to Sync | Bandwidth per Round | Total Latency |
|---|---|---|---|
| 4 | Many | Low | High |
| 8 | Moderate | Medium | Medium |
| 16 | Few | High | Low |

Deeper trees transmit more hash data per round, pinpointing divergences in fewer rounds and at lower total latency; the cost is higher per-round bandwidth. Optimal depth depends on divergence fraction \(k/n\).

Trade-off 3: Authority Granularity vs. Conflict Probability

Multi-objective formulation:

More authority tiers \(k\) increase operational flexibility but raise both conflict probability (multiple active tiers may issue contradictory decisions during partition) and coordination overhead — the objective makes all three tensions explicit.

where \(k\) is number of authority tiers.

Pareto front: All three metrics move together as \(k\) increases — finer authority granularity uniformly raises flexibility, conflict risk, and coordination cost, so the architect must choose how much conflict exposure the mission can tolerate.

| Authority Tiers | Flexibility | Conflict Risk | Coordination Cost |
|---|---|---|---|
| 2 (binary) | Low | Low | Low |
| 4 (standard) | Medium | Medium | Medium |
| 8 (fine-grained) | High | High | High |

More authority tiers enable nuanced delegation but increase conflict probability when multiple tiers activate simultaneously during partition.

Trade-off 4: State Completeness vs. Sync Cost

Multi-objective formulation:

Retaining more history depth \(s\) improves conflict resolution quality by providing more context, but both storage cost and synchronization cost grow with \(s\) — the objective balances all three across the available history-depth choices.

where \(s\) is retained state history depth.

The following table shows how completeness, storage, and synchronization costs scale across four representative history-depth choices.

| History Depth | Completeness | Storage Cost | Sync Cost |
|---|---|---|---|
| Current only | Low | Low | Low |
| 1 hour | Medium | Medium | Medium |
| 24 hours | High | High | High |
| Full history | Complete | Very High | Very High |

Longer history enables better conflict resolution (more context) but increases storage and synchronization costs.

Cost Surface: Coherence Under Partition

The total cost of maintaining coherence decomposes into three additive terms — divergence that accumulates during partition, reconciliation work on reconnection, and residual conflict resolution — each driven by partition duration \(\tau_p\):

\[C_{\text{total}}(\tau_p) = C_{\text{div}}(\tau_p) + C_{\text{rec}}(k) + C_{\text{res}}(k)\]

where \(\tau_p\) is partition duration.

Physical translation: the total cost has three additive terms that operate on different timescales. Divergence cost grows during partition — every operation that executes on one side but not the other adds to the bill. Reconciliation cost is paid at reconnection — it is bounded by divergence count, but the \(k^2\) conflict term can dominate for high-update-rate partitions. Conflict resolution cost is paid after reconciliation — it scales with the fraction of conflicts that escape automated merge rules and require human judgment. Short partitions minimize all three; long partitions inflate divergence and may trigger \(k^2\) reconciliation explosion.

Divergence cost: Divergence accumulates linearly in partition duration, \(C_{\text{div}} = \alpha \cdot \mu \cdot \tau_p\), where \(\alpha\) is the per-operation divergence weight and \(\mu\) is the expected operation rate.

Reconciliation cost (for \(k\) divergent items): The cost has two terms — Merkle traversal proportional to \(k \log(n/k)\) and pairwise conflict handling proportional to \(k^2\): \(C_{\text{rec}} = \beta \left(k \log(n/k) + p_c\, k^2\right)\), where \(\beta\) is the per-item reconciliation weight and \(p_c\) is the probability that two divergent items conflict.

Conflict resolution cost: Each detected conflict incurs a per-conflict resolution cost \(c_{\text{res}}\), scaled by \(\gamma_{\mathrm{cf}}\) (conflict fraction, distinct from the series' semantic convergence factor \(\gamma\)), the fraction of conflicts that require intervention beyond automatic merge rules: \(C_{\text{res}} = \gamma_{\mathrm{cf}} \cdot N_{\text{cf}} \cdot c_{\text{res}}\), where \(N_{\text{cf}}\) is the number of detected conflicts.
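One way to compose the three terms into a single function. The weights and the pairwise conflict-count model are illustrative assumptions, not values from the text:

```python
import math

def coherence_cost(op_rate: float, tau_p: float, n: int, k: int,
                   p_conflict: float, alpha: float = 1.0, beta: float = 0.1,
                   gamma_cf: float = 0.2, c_res: float = 5.0) -> float:
    """Sum the three additive cost terms described above.

    alpha: per-operation divergence weight
    beta: per-item reconciliation weight
    gamma_cf: fraction of conflicts needing manual intervention
    c_res: per-conflict resolution cost (c.u.) -- all illustrative.
    """
    divergence = alpha * op_rate * tau_p                    # during partition
    pairs = k * (k - 1) / 2                                 # pairwise conflict model
    reconcile = beta * (k * math.log2(max(n / k, 2)) + p_conflict * pairs)
    manual = gamma_cf * p_conflict * pairs * c_res          # residual human work
    return divergence + reconcile + manual
```

Plotting this over \(\tau_p\) (which also inflates \(k\)) makes the quadratic reconciliation blow-up for long partitions visible directly.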

Resource Shadow Prices

Each shadow price quantifies how much the total coherence cost changes per unit increase in the corresponding resource constraint, enabling direct comparison across heterogeneous resources.

| Resource | Shadow Price \(\lambda_{\mathrm{sp}}\) (c.u.) | Interpretation |
|---|---|---|
| Sync bandwidth | 1.50/KB | Value of faster reconciliation |
| State storage | 0.02/KB-hr | Cost of history retention |
| Conflict resolution | 5.00/conflict | Cost of manual intervention |
| Consistency | 0.80/%-deviation | Value of tighter consistency |

(Shadow prices in normalized cost units (c.u.) — illustrative relative values; ratios convey coherence resource scarcity ordering. State storage (0.02 c.u./KB-hr) is the reference unit. Calibrate to platform-specific costs.)

Irreducible Trade-off Summary

The following table consolidates the four Pareto conflicts: in each row, the third column names the physical or logical constraint that prevents both objectives from being fully satisfied at once.

| Trade-off | Objectives in Tension | Cannot Simultaneously Achieve |
|---|---|---|
| Consistency-Availability | Strong consistency vs. always writable | Both under partition (CAP) |
| Speed-Bandwidth | Fast reconciliation vs. low network cost | Both with large divergence |
| Granularity-Conflicts | Fine authority vs. low conflict rate | Both with concurrent partitions |
| Completeness-Cost | Full history vs. low storage/sync | Both with limited resources |

Cognitive Map: The four trade-offs share a common structure: a multi-objective formulation with a “cannot simultaneously achieve” constraint. CAP is the foundational one — it bounds what any distributed system can guarantee under partition. The cost surface decomposes total coherence cost into three time-ordered terms (during partition, at reconnection, after resolution), making clear that the \(k^2\) reconciliation term is the explosive risk for high-update workloads. The shadow price table (using \(\lambda_{\mathrm{sp}}\) for shadow price per unit) gives the relative cost ratios that should drive architecture: avoid manual conflict resolution (5.00 c.u.) by pre-designing authority rules; accept storage cost (0.02 c.u.) over sync cost (1.50 c.u.) when bandwidth is the constraint.


Closing: What Fleet Coherence Establishes

The toolkit assembled across four posts — CRDTs, Merkle trees, authority tiers, probabilistic reservation, audit feedback loops — enables something at the fleet level beyond individual mechanisms: information gain. When two clusters with divergent beliefs merge, the joint state contains knowledge that neither cluster held alone. The \(\Delta I > 0\) property formalizes this: partition + reconciliation strictly increases fleet knowledge when conflicts exist and are resolved. The three design-time choices that determine achievable coherence — CRDT type per state variable, authority delegation rules, Merkle tree depth — cannot be adjusted at runtime under Denied connectivity. They must be calibrated before deployment against the mission’s coherence requirements.

Four articles — contested-connectivity foundations, self-measurement, self-healing, and fleet coherence — developed the foundational capabilities for autonomic edge systems: connectivity-regime awareness, local self-measurement, autonomous self-healing, and fleet coherence under partition. STOCKSYNC warehouse inventory converges via CRDT merge without central coordination; MULTIWRITE field documentation auto-merges character-level edits from offline technicians; CONVOY recovers from a route split with unified intel and a convergence plan.

CONVOY at the mountain pass learned concrete facts unavailable before partition:

Quantified information gain: Let \(I_{\text{before}}\) and \(I_{\text{after}}\) denote fleet knowledge (measured as entropy reduction in route/threat models) before and after partition:

\[\Delta I = I_{\text{after}} - I_{\text{before}} > 0\]

Physical translation: \(\Delta I > 0\) means that after reconciliation, the fleet knows more than any individual cluster knew before. This is not trivially true — it requires that the conflict was resolved, not just flagged. CONVOY learned the bridge was intact (Cluster Bravo’s visual confirmation) and that Route B was passable (Cluster Alpha’s successful traverse). Neither cluster held both facts before partition. The entropy of the fleet’s route model dropped after reconciliation because two independent observations, each reducing uncertainty from a different direction, combined into a sharper joint belief.

The condition \(\Delta I > 0\) is a verifiable structural property of the CRDT reconciliation architecture: divergent beliefs that no single node could hold simultaneously are resolved into a joint state with strictly lower entropy. Fleet coherence does not merely eliminate divergence - it converts the divergence itself into fleet-wide knowledge.

Three design decisions determine how much coherence is achievable for a given deployment: the CRDT types selected for each state variable, the authority delegation rules pre-distributed before partition, and the Merkle tree depth configured for the expected divergence fraction. All three can be calibrated at design time against the mission’s coherence requirements, leaving no residual ambiguity at runtime.


Distributed consistency and CRDTs. The formal impossibility of simultaneous consistency, availability, and partition tolerance was conjectured by Brewer [1] and proved by Gilbert and Lynch [2]. Vogels codified the operational consequence — eventual consistency — as an engineering model for partition-tolerant systems [5]. Conflict-Free Replicated Data Types were introduced by Shapiro et al. in two complementary papers [3, 4]: the conference version [3] establishes the semilattice formulation and the join-operation convergence guarantee; the technical report [4] provides a comprehensive taxonomy of state-based and operation-based CRDTs including G-Counter, PN-Counter, G-Set, 2P-Set, LWW-Register, and MV-Register. The definitions and propositions in this post (Definitions 58, 12b; Propositions 42, 15) extend those results to bounded-memory tactical variants, reputation-weighted admission filters, and burst-process divergence models.

Logical clocks and causal ordering. Lamport’s foundational work on logical clocks established the happens-before relation and scalar timestamps for totally ordering events in distributed systems [7] . Fidge [8] and Mattern [9] independently generalized this to vector timestamps, enabling detection of causal concurrency — the \(h_1 \parallel h_2\) condition that drives CRDT join in Definition 62 . Hybrid Logical Clocks [11] combine the physical-time proximity of wall clocks with the causal-ordering guarantees of vector clocks; Proposition 45 in this post formalizes the anomaly-detection condition as a consequence of HLC’s skew bound. The Drift-Quarantine Re-sync Protocol ( Definition 63 ) and the Clock Trust Window ( Definition 59 ) address the long-partition regime where even HLC physical watermarks become unreliable. Preguiça et al. introduced Dotted Version Vectors [10] as a bandwidth-efficient alternative to full vector clocks, reducing per-message overhead from \(O(\text{nodes})\) to \(O(\text{clusters})\).

Gossip protocols and epidemic dissemination. The epidemic algorithm model for database maintenance was introduced by Demers et al. [6] , establishing that anti-entropy gossip achieves fleet-wide convergence in \(O(D \ln n / \lambda)\) rounds under bounded loss. Propositions 12, 26, and 31 in this series build on that convergence bound to derive tombstone pruning safety intervals and delta-sync coverage bounds for constrained-bandwidth edge deployments. The reputation-weighted merge in Definition 58b extends gossip-based dissemination to adversarial settings where Byzantine nodes must be filtered before their state contributions enter the shared semilattice.

Partition-tolerant networking and Byzantine fault tolerance. Fall’s Delay-Tolerant Network architecture [16] established the store-carry-forward model for challenged internets where end-to-end connectivity cannot be assumed — the conceptual basis for the opportunistic Delta-Sync windows in Definition 70 and Proposition 52 . Lamport, Shostak, and Pease proved the Byzantine Generals lower bound [12] , establishing that \(f < n/3\) is a necessary condition for agreement in the presence of arbitrary faults; Castro and Liskov’s PBFT [13] provided the first practical protocol achieving that bound. Proposition 48 (Logical Quorum BFT Resistance) applies the \(f < n/3\) threshold to reputation-weighted quorum formation, extending the classical result to the degraded-connectivity regime where full quorum membership cannot be verified in real time. Merkle’s hash-tree construction [14] underpins the reconciliation efficiency bound in Proposition 51 ; the SVC quarantine in the Reconnection Protocols section applies RFC 1323 sequence-number arithmetic [15] to detect vector-clock counter wrap in long-running persistent deployments.


References

[1] Brewer, E.A. (2000). “Towards Robust Distributed Systems.” Proc. PODC. ACM. [acm]

[2] Gilbert, S., Lynch, N. (2002). “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services.” ACM SIGACT News, 33(2), 51–59. [doi]

[3] Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. (2011). “Conflict-Free Replicated Data Types.” Proc. SSS, LNCS 6976, 386–400. Springer. [doi]

[4] Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. (2011). “A Comprehensive Study of Convergent and Commutative Replicated Data Types.” INRIA Research Report RR-7506. [hal]

[5] Vogels, W. (2009). “Eventually Consistent.” CACM, 52(1), 40–44. [doi]

[6] Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., Terry, D. (1987). “Epidemic Algorithms for Replicated Database Maintenance.” Proc. PODC, 1–12. ACM. [doi]

[7] Lamport, L. (1978). “Time, Clocks, and the Ordering of Events in a Distributed System.” CACM, 21(7), 558–565. [doi]

[8] Fidge, C.J. (1988). “Timestamps in Message-Passing Systems That Preserve the Partial Ordering.” Proc. 11th Australian Computer Science Conference, 56–66.

[9] Mattern, F. (1989). “Virtual Time and Global States of Distributed Systems.” Parallel and Distributed Algorithms, 215–226.

[10] Preguiça, N., Baquero, C., Almeida, P.S., Silva, D., Fonte, V. (2010). “Dotted Version Vectors: Efficient Causality Tracking for Distributed Key-Value Stores.” Proc. DAIS, LNCS 6115, 1–79. [doi]

[11] Kulkarni, S.S., Demirbas, M., Madeppa, D., Avva, B., Leone, M. (2014). “Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases.” Proc. OPODIS, 17–32. [doi]

[12] Lamport, L., Shostak, R., Pease, M. (1982). “The Byzantine Generals Problem.” ACM Trans. Programming Languages and Systems, 4(3), 382–401. [doi]

[13] Castro, M., Liskov, B. (1999). “Practical Byzantine Fault Tolerance.” Proc. OSDI, 173–186. [pdf]

[14] Merkle, R.C. (1988). “A Digital Signature Based on a Conventional Encryption Function.” Proc. CRYPTO, LNCS 293, 369–378. [doi]

[15] Jacobson, V., Braden, R., Borman, D. (1992). “TCP Extensions for High Performance.” RFC 1323, IETF. [rfc]

[16] Fall, K. (2003). “A Delay-Tolerant Network Architecture for Challenged Internets.” Proc. SIGCOMM, 27–34. ACM. [doi]

