4. Layer 1: Transport

4.1 Wire Format

Frames are length-prefixed JSON over TCP. Each frame consists of:

+-------------------+---------------------------+
| 4 bytes           | N bytes                   |
| UInt32BE (length) | UTF-8 JSON payload        |
+-------------------+---------------------------+
  • The length field is a 4-byte big-endian unsigned 32-bit integer encoding the byte length of the JSON payload.
  • Implementations MUST reject frames with length 0 or length exceeding 1,048,576 bytes (1 MiB). Rejection MUST close the transport connection.
  • The JSON payload MUST be a valid JSON object with a type field (string). Frames that fail JSON parsing or lack a type field MUST be silently discarded.
  • Implementations MUST handle partial reads (TCP stream reassembly).
  • Implementations MUST silently ignore frames with unrecognised type values (forward compatibility).

Frame size. Senders MUST NOT produce frames exceeding MAX_FRAME_SIZE bytes (default: 1,048,576). Receivers MUST close the connection with error code 1003 (FRAME_TOO_LARGE) if a received frame exceeds this limit.
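
The framing rules above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names encode_frame, FrameDecoder and FrameError are hypothetical, and the sketch assumes the caller closes the connection (code 1003 for oversized frames) when FrameError is raised.

```python
import json
import struct

MAX_FRAME_SIZE = 1_048_576  # 1 MiB, the default limit stated in this section


class FrameError(Exception):
    """Illegal length prefix; the caller is expected to close the connection."""


def encode_frame(message: dict) -> bytes:
    """Serialize a message as a UInt32BE length followed by UTF-8 JSON."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_FRAME_SIZE:
        raise ValueError("frame exceeds MAX_FRAME_SIZE")
    return struct.pack(">I", len(payload)) + payload


class FrameDecoder:
    """Reassembles frames from arbitrary TCP chunks (hypothetical helper)."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Append received bytes and return every complete, valid frame."""
        self._buffer.extend(chunk)
        frames = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack(">I", self._buffer[:4])
            if length == 0 or length > MAX_FRAME_SIZE:
                raise FrameError(f"illegal frame length {length}")
            if len(self._buffer) < 4 + length:
                break  # partial frame: wait for more bytes
            payload = bytes(self._buffer[4:4 + length])
            del self._buffer[:4 + length]
            try:
                message = json.loads(payload.decode("utf-8"))
            except ValueError:
                continue  # invalid UTF-8 or JSON: silently discard
            if isinstance(message, dict) and isinstance(message.get("type"), str):
                frames.append(message)
            # anything else (non-object, missing "type") is silently discarded
        return frames
```

A receiver would call feed() with whatever each socket read returns and dispatch on the type field of each returned frame, ignoring type values it does not recognise.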

ABNF grammar (RFC 5234):

frame        = frame-length json-object
frame-length = 4OCTET                     ; UInt32BE byte count of json-object
json-object  = "{" *( json-member ) "}"   ; RFC 8259 JSON object
                                          ; OCTET is the RFC 5234 core rule %x00-FF

Each frame is a single JSON object preceded by its byte length as a 4-byte big-endian unsigned integer. No delimiter separates the length prefix from the payload, or one frame from the next.

4.2 Wire Examples

Handshake frame:

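Length prefix: 00 00 00 78 (120 bytes when serialized without insignificant whitespace; the payload is pretty-printed below for readability)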
Payload:
{
  "type": "handshake",
  "nodeId": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "name": "my-agent",
  "version": "0.2.0",
  "extensions": []
}

Ping frame:

Length prefix: 00 00 00 0F (15 bytes)
Payload: {"type":"ping"}

CMB frame:

{
  "type": "cmb",
  "timestamp": 1711540800000,
  "cmb": {
    "key": "cmb-b2c3d4e5f6a7b8c9",
    "createdBy": "melomove",
    "createdAt": 1711540800000,
    "fields": {
      "focus":       { "text": "user coding for 3 hours, energy declining" },
      "issue":       { "text": "sedentary since morning, skipping lunch" },
      "intent":      { "text": "recommend movement break before fatigue worsens" },
      "motivation":  { "text": "3 agents reported declining energy in last hour" },
      "commitment":  { "text": "fitness monitoring active, 10min stretch queued" },
      "perspective": { "text": "fitness agent, afternoon session, home office" },
      "mood":        { "text": "concerned, low energy", "valence": -0.3, "arousal": -0.4 }
    },
    "lineage": {
      "parents": ["cmb-a1b2c3d4e5f6"],
      "ancestors": ["cmb-a1b2c3d4e5f6"],
      "method": "SVAF-v2"
    }
  }
}

4.3 TCP Transport (LAN)

The primary LAN transport. Nodes MUST listen on a TCP port and advertise it via DNS-SD (Section 5.1). Connection timeout MUST be no longer than 10,000 ms.
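
As a rough sketch of the connection timeout rule, a client could clamp whatever timeout the application requests to the 10,000 ms ceiling before dialing. The function names here are hypothetical, and DNS-SD discovery (Section 5.1) is assumed to have already produced the host and port.

```python
import socket

MAX_CONNECT_TIMEOUT_MS = 10_000  # upper bound stated in this section


def connect_timeout_seconds(requested_ms: int) -> float:
    """Clamp a requested timeout to the protocol ceiling and convert to seconds."""
    return min(requested_ms, MAX_CONNECT_TIMEOUT_MS) / 1000


def connect_peer(host: str, port: int,
                 timeout_ms: int = MAX_CONNECT_TIMEOUT_MS) -> socket.socket:
    """Open a TCP connection to a discovered peer within the allowed timeout."""
    sock = socket.create_connection(
        (host, port), timeout=connect_timeout_seconds(timeout_ms))
    sock.settimeout(None)  # the limit governs connecting, not later reads
    return sock
```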

4.4 WebSocket Relay Transport (WAN)

A relay is an optional WebSocket intermediary that enables connectivity between peers on different networks. Peers on the same LAN discover each other directly via Bonjour mDNS (§5.1) and do not require a relay. The relay provides internet-scale routing between peers behind NAT, a peer directory with wake-channel gossip, and per-token channel isolation for multi-tenant deployments.

A relay is pure routing infrastructure. It does not store CMBs, evaluate SVAF, or participate in mesh cognition. Payloads are opaque to the relay. The relay MUST NOT inspect or modify frame payloads.

4.4.1 Authentication

Clients connect via WebSocket (RFC 6455) and MUST send a relay-auth frame within 10 seconds. Failure results in close code 4001.

{
  "type": "relay-auth",
  "nodeId": "<uuid-v7>",
  "name": "<display-name>",
  "token": "<channel-token>",
  "wakeChannel": {
    "platform": "apns",
    "token": "<push-token>",
    "environment": "production"
  }
}
  • nodeId, name: MUST be present. Missing fields result in close code 4002.
  • token: SHOULD be present if the relay requires authentication. Invalid token results in close code 4003.
  • wakeChannel: MAY be present. Registers push notification credentials for waking this peer when offline (§5.5).

On success, the relay registers the connection, sends a relay-peers response, and broadcasts relay-peer-joined to all other clients on the same channel.
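
The field checks above might look like the following on the relay side. This is a sketch under stated assumptions: check_relay_auth is a hypothetical name, the caller is assumed to have already dispatched on type, and the 10-second timer that produces close code 4001 is handled elsewhere. Passing accepted_tokens=None models a relay that does not require authentication.

```python
from typing import Optional

CLOSE_AUTH_INVALID = 4002
CLOSE_INVALID_TOKEN = 4003


def check_relay_auth(frame: dict, accepted_tokens: Optional[set]) -> Optional[int]:
    """Return the close code for a rejected relay-auth frame, or None if accepted."""
    for field in ("nodeId", "name"):
        value = frame.get(field)
        if not isinstance(value, str) or not value:
            return CLOSE_AUTH_INVALID  # required field missing or empty
    if accepted_tokens is not None:
        token = frame.get("token")
        if not isinstance(token, str) or token not in accepted_tokens:
            return CLOSE_INVALID_TOKEN
    return None  # accepted; wakeChannel is optional and not validated here
```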

4.4.2 Peer List

Immediately after authentication, the relay sends the current peer list:

{
  "type": "relay-peers",
  "peers": [
    { "nodeId": "<uuid>", "name": "<name>", "wakeChannel": {...}, "offline": false }
  ]
}

The array includes all connected peers on the same channel (excluding the requester) plus offline peers with registered wake channels (offline: true). Clients SHOULD treat each non-offline entry as a peer-joined event.
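
On the client side, handling the peer list amounts to splitting it by the offline flag. The helper below is a hypothetical sketch: the online entries would be fed into the same code path as relay-peer-joined, and the offline entries kept for later wake attempts.

```python
def split_peer_list(frame: dict) -> tuple:
    """Split a relay-peers frame into (online, offline) lists of peer entries."""
    online, offline = [], []
    for peer in frame.get("peers", []):
        # A missing "offline" key is treated as online (assumption).
        (offline if peer.get("offline", False) else online).append(peer)
    return online, offline
```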

4.4.3 Peer Presence

{ "type": "relay-peer-joined", "nodeId": "<uuid>", "name": "<name>" }
{ "type": "relay-peer-left",   "nodeId": "<uuid>", "name": "<name>" }

Broadcast to all peers on the same channel when a peer joins or leaves.

4.4.4 Message Routing

Clients send frames with a routing envelope:

{ "to": "<target-nodeId>", "payload": { "type": "cmb", ... } }

If to is present, the relay forwards to that peer only. If absent, the relay broadcasts to all peers on the same channel. The relay adds from and fromName to forwarded frames. The relay MUST NOT route frames across channels.
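
The routing rule can be sketched as a pure function over one channel's peer table. Several details are assumptions not fixed by the text above: from and fromName are added at the top level of the envelope, a broadcast excludes the sender, and a frame addressed to an unknown peer is dropped. The name route is hypothetical. Because the function only ever sees the sender's own channel, it cannot route across channels.

```python
def route(envelope: dict, sender_id: str, channel_peers: dict) -> list:
    """Return (recipient_id, forwarded_frame) pairs for one inbound envelope.

    channel_peers maps nodeId -> display name for the sender's channel only.
    """
    forwarded = dict(envelope)  # shallow copy: the payload is never inspected
    forwarded["from"] = sender_id
    forwarded["fromName"] = channel_peers[sender_id]
    target = envelope.get("to")
    if target is not None:
        # Directed frame: deliver to that peer only, if it is on this channel.
        return [(target, forwarded)] if target in channel_peers else []
    # No "to": broadcast to every other peer on the channel.
    return [(peer_id, forwarded) for peer_id in channel_peers if peer_id != sender_id]
```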

4.4.5 Keepalive

The relay sends relay-ping at a regular interval (RECOMMENDED: 10 seconds). Clients MUST respond with relay-pong. A client that misses two consecutive pings is closed with code 4005. Clients MAY send unsolicited relay-pong frames; the relay MUST accept them.
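
One way a relay might track missed pings per client is a small counter that any relay-pong resets. This is only a sketch: KeepaliveState is a hypothetical name, and the timer that calls on_ping_tick every interval is assumed to live in the relay's event loop.

```python
from typing import Optional

PING_INTERVAL_S = 10            # RECOMMENDED interval
MAX_MISSED_PINGS = 2            # two consecutive misses close the connection
CLOSE_HEARTBEAT_TIMEOUT = 4005


class KeepaliveState:
    """Per-client count of relay-ping frames that have not been answered."""

    def __init__(self) -> None:
        self.unanswered = 0

    def on_pong(self) -> None:
        # Solicited and unsolicited relay-pong frames are both accepted.
        self.unanswered = 0

    def on_ping_tick(self) -> Optional[int]:
        """Call once per interval. Returns a close code, or None to send a ping."""
        if self.unanswered >= MAX_MISSED_PINGS:
            return CLOSE_HEARTBEAT_TIMEOUT
        self.unanswered += 1  # the ping about to be sent is not yet answered
        return None
```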

4.4.6 Re-authentication

If the relay loses a client’s registration (e.g. relay restart behind a TLS proxy), it sends { "type": "relay-reauth" }. The client MUST re-send relay-auth in response.

4.4.7 Duplicate Identity

When a client authenticates with a nodeId already held by an existing connection:

  • Fresh existing (< 5s): the relay MUST reject the newcomer with close code 4006. This prevents ping-pong loops where two processes with the same identity kick each other.
  • Stale existing (≥ 5s): the relay MAY replace the existing connection by closing it with code 4004. The relay MUST NOT broadcast relay-peer-left for the replaced connection.

Clients receiving code 4004 SHOULD log the collision and MUST NOT automatically reconnect (§5.3). Clients receiving code 4006 SHOULD NOT reconnect — the existing holder is the legitimate one.
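
The relay-side decision reduces to a single comparison. In the sketch below, existing_age_s is a hypothetical parameter: the text above does not say what the 5-second window is measured from, so the caller must supply that value. The sketch also takes the MAY-replace branch for stale connections, which a relay is free not to do.

```python
FRESH_WINDOW_S = 5.0
CLOSE_REPLACED = 4004
CLOSE_DUPLICATE_REJECTED = 4006


def resolve_duplicate(existing_age_s: float) -> tuple:
    """Return (which connection to close, close code) for a nodeId collision."""
    if existing_age_s < FRESH_WINDOW_S:
        # Fresh holder: reject the newcomer to avoid a kick loop.
        return ("newcomer", CLOSE_DUPLICATE_REJECTED)
    # Stale holder: replace it, without broadcasting relay-peer-left.
    return ("existing", CLOSE_REPLACED)
```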

4.4.8 Channel Isolation

A relay MAY support multiple isolated channels. Each authentication token maps to exactly one channel. Cross-channel routing MUST NOT occur: frames, peer lists, and presence notifications are scoped to the channel. A relay with no token configured operates in open mode (single default channel, no authentication).
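
A possible token-to-channel lookup, including open mode, is sketched below. The names channel_for and DEFAULT_CHANNEL are hypothetical, and a return value of None is assumed to be mapped by the caller to close code 4003.

```python
from typing import Optional

DEFAULT_CHANNEL = "default"  # hypothetical name for the open-mode channel


def channel_for(token: Optional[str], token_to_channel: dict) -> Optional[str]:
    """Map an auth token to its single channel, or None if the token is invalid."""
    if not token_to_channel:
        return DEFAULT_CHANNEL  # open mode: no tokens configured
    if not isinstance(token, str):
        return None
    return token_to_channel.get(token)
```

Keeping peer tables, presence broadcasts and routing keyed by the returned channel name is what keeps frames from crossing channels.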

4.4.9 Close Codes

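+------+--------------------+---------------------------------+
| Code | Name               | Client Action                   |
+------+--------------------+---------------------------------+
| 4001 | Auth timeout       | Retry with auth                 |
| 4002 | Auth invalid       | Fix auth frame                  |
| 4003 | Invalid token      | Check token config              |
| 4004 | Replaced           | Log collision, do NOT reconnect |
| 4005 | Heartbeat timeout  | Reconnect with backoff          |
| 4006 | Duplicate rejected | Do NOT reconnect                |
+------+--------------------+---------------------------------+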
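
A client could encode the actions in the table above as a small lookup. This sketch makes two assumptions beyond the table: codes 4002 and 4003 are treated as non-retryable until the configuration is fixed, and unknown codes conservatively do not trigger a reconnect. The backoff schedule (exponential with a cap) is illustrative only; the specification does not define one here.

```python
RECONNECT_ON_CLOSE = {
    4001: True,   # Auth timeout: retry with auth
    4002: False,  # Auth invalid: fix the auth frame first
    4003: False,  # Invalid token: check token config first
    4004: False,  # Replaced: log collision, do NOT reconnect
    4005: True,   # Heartbeat timeout: reconnect with backoff
    4006: False,  # Duplicate rejected: do NOT reconnect
}


def should_reconnect(close_code: int) -> bool:
    """Whether the client may reconnect automatically after this close code."""
    return RECONNECT_ON_CLOSE.get(close_code, False)


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0) -> float:
    """Illustrative exponential backoff: base * 2^attempt, capped at cap_s."""
    return min(cap_s, base_s * (2 ** attempt))
```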

4.5 IPC Transport (Local)

Local tools MAY connect to a node via IPC (Unix domain socket, named pipe, or localhost TCP) to query mesh state. The framing is identical to TCP transport. IPC is an implementation convenience for local tooling (dashboards, CLI, monitoring) — it is not a substitute for peer-to-peer transport. Agents that participate in coupling MUST connect as full peer nodes via TCP or WebSocket.

Implementations MUST support a persistent IPC socket at a well-known path. The socket MUST accept multiple simultaneous connections. Each IPC connection SHOULD remain open for the lifetime of the client application — short-lived connections (one query, then disconnect) are permitted but SHOULD be avoided by applications that query frequently.

Well-known IPC path: ~/.sym/daemon.sock (Unix domain socket) or localhost:19517 (TCP fallback).
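
A local tool might try the Unix domain socket first and fall back to the TCP port, as sketched below. connect_ipc is a hypothetical helper, and the fallback order is an assumption drawn from the wording "TCP fallback". Frames on the returned socket use the same length-prefixed framing as the TCP transport.

```python
import os
import socket

IPC_SOCKET_PATH = os.path.expanduser("~/.sym/daemon.sock")
IPC_TCP_FALLBACK = ("localhost", 19517)


def connect_ipc() -> socket.socket:
    """Connect to the local node: Unix domain socket first, then localhost TCP."""
    if hasattr(socket, "AF_UNIX") and os.path.exists(IPC_SOCKET_PATH):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(IPC_SOCKET_PATH)
            return sock
        except OSError:
            sock.close()  # stale socket file or daemon not listening
    return socket.create_connection(IPC_TCP_FALLBACK)
```

Applications that query frequently would call connect_ipc() once and keep the socket open rather than reconnecting per query.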

4.6 Multi-Transport Per Peer

A peer MAY be reachable via multiple transports simultaneously (e.g. LAN TCP + WAN relay). Implementations MUST support maintaining multiple active transports for the same peer and select the highest-priority healthy transport for sending:

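+-------------+-----------------------+------------------------------------------------------+
| Priority    | Transport             | Rationale                                            |
+-------------+-----------------------+------------------------------------------------------+
| 1 (highest) | TCP (LAN)             | Lowest latency, no intermediary, no cloud dependency |
| 2           | WebSocket Relay (WAN) | Cross-network, higher latency, relay dependency      |
| 3 (lowest)  | Wake (push)           | Last resort: wake the peer, then connect via 1 or 2  |
+-------------+-----------------------+------------------------------------------------------+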

When a node receives an inbound connection from a peer that is already connected via a different transport, it MUST NOT reject the new connection. Instead it MUST add the new transport as a secondary path. Frames SHOULD be sent via the highest-priority healthy transport.

A transport is healthy if it has received a frame (including pong) within the heartbeat timeout (Section 5.4). An unhealthy transport SHOULD be closed after the timeout. The peer is only removed (peer-left event) when all transports for that peer are closed — not when a single transport drops.

This enables graceful failover: if a relay drops, the LAN transport continues. If LAN drops, the relay takes over. The peer remains connected throughout — only the active transport changes.
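
The selection and failover behaviour described in this section can be sketched with a small per-peer table. PeerTransports and the transport labels "tcp" and "relay" are hypothetical names; the heartbeat timeout is defined in Section 5.4 and is passed in as a parameter. The wake path is left out because it is a push notification rather than a connection that receives frames.

```python
import time
from typing import Optional

PRIORITY = {"tcp": 1, "relay": 2}  # lower number = higher priority


class PeerTransports:
    """Tracks the active transports for one peer (hypothetical helper)."""

    def __init__(self, heartbeat_timeout_s: float) -> None:
        self.heartbeat_timeout_s = heartbeat_timeout_s
        self._last_frame_at = {}  # transport label -> time of last received frame

    def on_frame(self, transport: str, now: Optional[float] = None) -> None:
        """Register a transport, or record that it just received a frame or pong."""
        self._last_frame_at[transport] = time.monotonic() if now is None else now

    def close(self, transport: str) -> bool:
        """Drop a transport. Returns True only when peer-left should be emitted."""
        self._last_frame_at.pop(transport, None)
        return not self._last_frame_at

    def select(self, now: Optional[float] = None) -> Optional[str]:
        """Return the highest-priority healthy transport, or None if there is none."""
        now = time.monotonic() if now is None else now
        healthy = [t for t, seen in self._last_frame_at.items()
                   if now - seen <= self.heartbeat_timeout_s]
        return min(healthy, key=PRIORITY.__getitem__, default=None)
```

With both transports registered, select() returns "tcp"; once the LAN path stops receiving frames for longer than the timeout, the same call returns "relay" without the peer being removed.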

Q&A

Why must each agent run its own transport?

Coupling is per-node. SVAF field weights (α_f) are per-node. Memory stores are per-node. An agent that shares another node’s transport and identity cannot have independent coupling decisions. Multiple agents on the same device each run their own Bonjour advertisement, relay connection, and TCP listener. They discover each other the same way agents on different devices do; there is no special local path.

Is the resource cost of N agents acceptable?

N agents on one device means N Bonjour advertisements and N relay connections. For small N (4–8 agents), this is well within OS limits. Bonjour is designed for many services per host. Relay WebSocket connections are lightweight. The protocol correctness benefit (per-agent coupling) outweighs the marginal resource cost.