5. Layer 2: Connection

5.1 Discovery

Nodes MUST advertise via DNS-SD with service type _sym._tcp in the local. domain. The instance name MUST be the node’s nodeId.

TXT record fields:

Key          Required   Value
node-id      MUST       Node UUID
node-name    MUST       Human-readable name
public-key   MUST       Ed25519 public key (base64url, RFC 4648 Section 5)
hostname     SHOULD     Machine hostname
group        MAY        Mesh group identifier (Section 5.8). Default "default" if absent.
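
As an illustration of the TXT record fields above, the following sketch validates a record that has already been parsed into a dictionary. The function names parse_txt_record and decode_public_key are hypothetical, and the sketch assumes the base64url value may arrive with or without padding, which the section does not specify. The 32-byte length check reflects the size of an Ed25519 public key.

```python
import base64

REQUIRED_KEYS = ("node-id", "node-name", "public-key")


def decode_public_key(value: str) -> bytes:
    """Decode a base64url Ed25519 public key (hypothetical helper)."""
    # Re-add padding in case the advertiser stripped it (assumption).
    padded = value + "=" * (-len(value) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 32:  # Ed25519 public keys are 32 bytes long
        raise ValueError("public-key is not a 32-byte Ed25519 key")
    return raw


def parse_txt_record(txt: dict) -> dict:
    """Check the MUST keys and apply the default group."""
    missing = [key for key in REQUIRED_KEYS if not txt.get(key)]
    if missing:
        raise ValueError("missing required TXT keys: " + ", ".join(missing))
    return {
        "node_id": txt["node-id"],
        "node_name": txt["node-name"],
        "public_key": decode_public_key(txt["public-key"]),
        "hostname": txt.get("hostname"),          # SHOULD, may be absent
        "group": txt.get("group", "default"),     # MAY, defaults to "default"
    }
```

A record without a group key comes back with group set to "default", matching the table.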

To prevent duplicate connections, the node with the lexicographically smaller nodeId MUST initiate the outbound TCP connection. The other node MUST NOT initiate.
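
The initiator rule can be sketched as a single comparison. The function name should_initiate is hypothetical. The sketch compares the nodeId strings exactly as received, by code point; whether UUIDs should be case-normalised first is not stated in this section and is left as an open assumption.

```python
def should_initiate(local_node_id: str, remote_node_id: str) -> bool:
    """Return True if this node is the one that opens the TCP connection."""
    # Plain string comparison is lexicographic by code point in Python.
    return local_node_id < remote_node_id
```

Because both peers evaluate the same comparison with the arguments swapped, exactly one of them returns True for any pair of distinct nodeIds.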

Relay-based discovery. On platforms where mDNS is unavailable (cloud VMs, Windows without the Bonjour SDK), nodes SHOULD use the relay’s relay-peers response as the discovery mechanism. Implementations SHOULD support both: DNS-SD for LAN, relay-peers for WAN.

5.2 Handshake

Upon connection, both sides MUST exchange the following frames in order:

1. handshake    { type: "handshake", nodeId: "<uuid>", name: "<name>",
                  publicKey: "<base64url>", version: "0.2.0", extensions: [],
                  group: "<group-id>" }                       [optional, default "default"]
2. state-sync   { type: "state-sync", h1: [...], h2: [...], confidence: 0.8 }
3. peer-info    { type: "peer-info", peers: [...] }           [if known]
4. wake-channel { type: "wake-channel", platform, token, env } [if configured]
  • The version field MUST be the MMP specification version the node implements (e.g., "0.2.0"). Nodes SHOULD accept peers with the same major version. Nodes MAY reject peers with incompatible versions.
  • The extensions field SHOULD list supported protocol extensions (e.g., ["mesh-group-v0.1"]). Nodes MUST ignore unrecognised extensions.
  • The group field is OPTIONAL and identifies the mesh group the node wishes to join (Section 5.8). A handshake without group MUST be treated as group = "default". When two nodes handshake and discover that their declared groups differ, the receiver MUST close the connection.
  • The inbound node MUST wait for a handshake frame as the first frame. If any other frame type arrives first, or no handshake arrives within 10,000 ms, the connection MUST be closed.
  • If a node receives a handshake with a nodeId that is already connected via the same transport type, the new connection MUST be closed (duplicate guard). If the existing connection uses a different transport type (e.g. peer connected via relay, new connection via LAN TCP), the new connection MUST be accepted as a secondary transport per Section 4.6.
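
The handshake rules above can be sketched as one decision function. The names evaluate_handshake and major, the transport labels, and the shape of the connected map are assumptions made for illustration. Rejecting a differing major version is one permitted policy (nodes MAY reject incompatible versions), not a requirement.

```python
def major(version: str) -> int:
    """Extract the major component of a dotted version string."""
    return int(version.split(".")[0])


def evaluate_handshake(frame, transport, local_version="0.2.0",
                       local_group="default", connected=None):
    """Return (decision, reason) for the first frame on a new connection.

    connected maps nodeId -> set of transport types already in use
    (hypothetical bookkeeping structure).
    """
    connected = connected or {}
    if frame.get("type") != "handshake":
        return ("close", "first frame was not a handshake")
    try:
        peer_major = major(frame["version"])
    except (KeyError, ValueError, AttributeError):
        return ("close", "missing or malformed version")
    if peer_major != major(local_version):
        return ("close", "incompatible major version")
    # A handshake without group is treated as group "default".
    if frame.get("group", "default") != local_group:
        return ("close", "group mismatch")
    existing = connected.get(frame.get("nodeId"), set())
    if transport in existing:
        return ("close", "duplicate connection")  # duplicate guard
    if existing:
        return ("accept-secondary", "additional transport for known peer")
    return ("accept", "new peer")
```

The 10,000 ms handshake timer is not modelled here; it would sit in whatever I/O loop calls this function.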

lifecycleRole. Senders MUST include a lifecycleRole field in the handshake frame, with value observer (default), validator, or anchor. Receiving nodes use this to apply validator-origin anchor weight (Section 6.4) and identify feedback CMBs (Section 11). Receivers MUST treat an absent field as observer, for backward compatibility with older nodes that do not send it.
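
A minimal sketch of reading lifecycleRole on the receiving side follows. The function name lifecycle_role is hypothetical, and raising an error for an unrecognised value is an assumption: this section does not say how such a value should be handled.

```python
LIFECYCLE_ROLES = {"observer", "validator", "anchor"}


def lifecycle_role(handshake: dict) -> str:
    """Return the peer's lifecycle role, defaulting to observer."""
    role = handshake.get("lifecycleRole", "observer")
    if role not in LIFECYCLE_ROLES:
        # Assumption: unknown values are rejected rather than coerced.
        raise ValueError("unknown lifecycleRole: " + repr(role))
    return role
```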

5.3 Connection State Machine

[Figure: connection state diagram. DISCONNECTED (initial state) → AWAITING_HANDSHAKE on TCP connect / accept (10 s timeout) → CONNECTED on valid handshake received (peer registered, frames routed) → DISCONNECTED on timeout / close (peer removed, re-discover).]

From                 To                   Trigger
DISCONNECTED         AWAITING_HANDSHAKE   TCP connect or accept
AWAITING_HANDSHAKE   CONNECTED            Valid handshake within 10,000 ms
AWAITING_HANDSHAKE   DISCONNECTED         Timeout, invalid frame, or duplicate nodeId
CONNECTED            DISCONNECTED         Heartbeat timeout, TCP close, or error

Implementations MUST NOT process cognitive frames (cmb, state-sync, xmesh-insight) in the AWAITING_HANDSHAKE state.
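
The transitions above map naturally onto a lookup table. In this sketch the event names (tcp_connected, handshake_ok, handshake_failed, connection_lost) are hypothetical labels for the triggers listed in this section, not identifiers defined by the protocol.

```python
from enum import Enum


class State(Enum):
    DISCONNECTED = "DISCONNECTED"
    AWAITING_HANDSHAKE = "AWAITING_HANDSHAKE"
    CONNECTED = "CONNECTED"


# (current state, event) -> next state
TRANSITIONS = {
    (State.DISCONNECTED, "tcp_connected"): State.AWAITING_HANDSHAKE,
    (State.AWAITING_HANDSHAKE, "handshake_ok"): State.CONNECTED,
    # timeout, invalid frame, or duplicate nodeId
    (State.AWAITING_HANDSHAKE, "handshake_failed"): State.DISCONNECTED,
    # heartbeat timeout, TCP close, or error
    (State.CONNECTED, "connection_lost"): State.DISCONNECTED,
}

COGNITIVE_FRAMES = {"cmb", "state-sync", "xmesh-insight"}


def next_state(state: State, event: str) -> State:
    """Apply one event; transitions not in the table are rejected."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event}")


def is_blocked(state: State, frame_type: str) -> bool:
    """True if a cognitive frame arrives before the peer is CONNECTED."""
    return frame_type in COGNITIVE_FRAMES and state is not State.CONNECTED
```

Keeping the table as data makes it easy to confirm that no transition leads from DISCONNECTED directly to CONNECTED.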

5.4 Heartbeat

Nodes MUST send a ping frame to each peer if no frame has been received from that peer within the heartbeat interval (default: 5,000 ms). Upon receiving ping, a node MUST respond with pong. If no frame is received from a peer within the heartbeat timeout (default: 15,000 ms), the connection MUST be closed.
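
The heartbeat rule can be sketched as a small per-peer tracker driven by a periodic tick. The class name HeartbeatTracker and its methods are hypothetical. Two details are assumptions not stated above: silence is compared with >= at the boundaries, and at most one ping is sent per heartbeat interval so that a frequent tick does not flood the peer.

```python
class HeartbeatTracker:
    """Tracks inbound silence for one peer; all times are in milliseconds."""

    def __init__(self, now_ms, interval_ms=5_000, timeout_ms=15_000):
        self.interval_ms = interval_ms
        self.timeout_ms = timeout_ms
        self.last_received_ms = now_ms
        self.last_ping_ms = None

    def on_frame(self, now_ms):
        """Call for every frame received from the peer, including pong."""
        self.last_received_ms = now_ms

    def tick(self, now_ms):
        """Return "close", "ping", or None for this point in time."""
        silent_for = now_ms - self.last_received_ms
        if silent_for >= self.timeout_ms:
            return "close"
        if silent_for >= self.interval_ms:
            due = (self.last_ping_ms is None
                   or now_ms - self.last_ping_ms >= self.interval_ms)
            if due:
                self.last_ping_ms = now_ms
                return "ping"
        return None
```

With the defaults, a silent peer is pinged at 5 s and 10 s and closed at 15 s, while any received frame resets the silence timer.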

5.5 Connection Loss and Transport Failover

When a transport connection closes unexpectedly (TCP reset, timeout, OS-level close), the node MUST check whether other transports for the same peer are still active (see Section 4.6 Multi-Transport Per Peer).

  • If other transports remain healthy: the node MUST switch sending to the next highest-priority transport. The peer MUST NOT be removed. No peer-left event is emitted. The node SHOULD log the transport switch.
  • If no transports remain: the node MUST remove the peer from its coupling engine, discard buffered frames, and emit a peer-left event. The node SHOULD attempt re-discovery via DNS-SD.

Unexpected disconnection of a single transport MUST be treated as a transport-level event, not a peer-level event. The peer is only unreachable when all transport paths are exhausted.
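
The failover behaviour can be sketched as follows. The class name PeerTransports and the transport labels are hypothetical, and the sketch assumes a larger number means a higher priority; the actual priority ordering is defined in Section 4.6, not here.

```python
from typing import Dict, Optional


class PeerTransports:
    """Active transports for a single peer (illustrative only)."""

    def __init__(self) -> None:
        self._priority: Dict[str, int] = {}

    def add(self, name: str, priority: int) -> None:
        self._priority[name] = priority

    def on_closed(self, name: str) -> Optional[str]:
        """Handle an unexpected close of one transport.

        Returns the transport to switch to, or None when no transport
        remains and the caller should remove the peer, discard buffered
        frames, and emit peer-left.
        """
        self._priority.pop(name, None)
        if not self._priority:
            return None
        # Assumption: larger number = higher priority.
        return max(self._priority, key=lambda n: self._priority[n])
```

For example, a peer reachable over both LAN TCP and a relay stays registered when the LAN connection drops, and is removed only when the relay path closes as well.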

5.6 Peer Gossip

After handshake, nodes SHOULD exchange peer-info frames containing known peer metadata (nodeId, name, wake channels, last-seen timestamps). This enables transitive peer discovery — a node that has never been online simultaneously with a sleeping peer can learn its wake channel through gossip from a relay node.

5.7 Wake

Nodes MAY register a wake channel (APNs, FCM, or other push mechanism) via the wake-channel frame. Peers MAY use this channel to wake a sleeping node when they have a signal to deliver. Wake requests SHOULD be rate-limited (default cooldown: 300,000 ms per peer).
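
A per-peer cooldown of this kind might be sketched as below. The class name WakeLimiter and the method try_wake are hypothetical; sending the actual push notification is left to the caller.

```python
class WakeLimiter:
    """Allows at most one wake request per peer per cooldown window."""

    def __init__(self, cooldown_ms: int = 300_000):
        self.cooldown_ms = cooldown_ms
        self._last_wake_ms = {}  # peer nodeId -> time of last wake

    def try_wake(self, peer_id: str, now_ms: int) -> bool:
        """Return True and record the wake if the peer is off cooldown."""
        last = self._last_wake_ms.get(peer_id)
        if last is not None and now_ms - last < self.cooldown_ms:
            return False
        self._last_wake_ms[peer_id] = now_ms
        return True
```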

5.8 Mesh Groups

A SYM node MAY declare membership in a mesh group at handshake time via the optional group field (Section 5.2). A mesh group is a named cohort of nodes that exchange application-layer frames only with each other. Mesh groups give an operator a way to host multiple mutually-isolated meshes on the same relay or LAN segment without per-agent application changes, and they give an application a way to constrain its peers to instances of itself rather than every node on the wire.

Group identifier syntax. A group identifier is a string of [a-z0-9-_.]+, max 64 characters, case-sensitive. The literal string "default" is reserved as the implicit group of every node that does not declare a group; this preserves backward compatibility with nodes that predate this section.
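
The syntax check can be sketched with a regular expression. This reads the character class above as the literal characters a-z, 0-9, hyphen, underscore, and dot, which is an interpretation of the notation; the function names are hypothetical.

```python
import re

# Hyphen is placed last in the class so it is unambiguously literal.
GROUP_ID = re.compile(r"[a-z0-9_.-]{1,64}")


def is_valid_group_id(group: str) -> bool:
    """Check a group identifier against the character set and length limit."""
    return GROUP_ID.fullmatch(group) is not None


def effective_group(declared=None) -> str:
    """Return the group a node belongs to, applying the reserved default."""
    if declared is None:
        return "default"
    if not is_valid_group_id(declared):
        raise ValueError("invalid group identifier: " + repr(declared))
    return declared
```

Because matching is case-sensitive and the class contains only lowercase letters, an identifier such as "Melotune.Prod" is rejected rather than folded to lowercase.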

Protocol guarantee. A node in group G_A MUST NOT exchange any application-layer MMP frames (handshake fields beyond identity and version, cmb, mood, peer-info, xmesh-insight) with a node in group G_B when G_A ≠ G_B. Transport-layer connection establishment and ping/pong heartbeats are out of scope and MAY remain active across groups.

Layer placement. A mesh group is a Layer 2 (Connection) concept. The application layer SHOULD declare its group at SDK initialisation; the relay (Section 4.4) MUST enforce group isolation across relay-mediated peers; nodes participating in LAN Bonjour discovery SHOULD enforce group isolation by checking the peer’s declared group at handshake and closing the connection on mismatch. The connection-level error frame for a group mismatch is described in Section 7.2.

Recommended naming convention (non-normative). The protocol does not parse group identifiers beyond the character set and length checks above. Operators of meshes with more than a handful of groups SHOULD adopt a hierarchical dotted-path convention <app>[.<environment>][.<cohort>], e.g. melotune.prod, melotune.dev, claude-code.default, research.lab. The dots are convention only; tooling MAY use them for prefix-based grouping but the protocol does not require this.

SVAF and group filtering. SVAF (Layer 4, Section 9) per-field evaluation runs after group filtering: cmb frames from peers in different groups never reach the SVAF evaluator.

The full design rationale, the prefix-based group claims relay enhancement, and the operational migration record are documented in MMP-MESH-GROUPS-DESIGN.md in the symbot-website repository.