The Clusterfun Communications Protocol

A writeup of the communications protocol used on Clusterfun.TV, a website for hosting ad-hoc party games in the play-with-devices style of the Jackbox series.

By: TheHans255

3/10/2023

I'm one of the creators of Clusterfun.TV, a website that hosts collaborative party games! The style is similar to games such as the Jackbox Party Pack series and Use Your Words, in which players gather around the game on a TV and use their mobile devices to interact with the game, making the games easy to pick up and play and also easy to share with a streaming audience. Clusterfun.TV's focus is on game design and ease of use - the presenter can be any Web-enabled device just like the clients can, and players can leave and join a game at any time.

Artist's rendition of the full Clusterfun concept

At the heart of this system is our communications protocol - the presenter device exchanges messages with each client, giving updates on the game state and receiving commands from the players. I am the primary author of this communications protocol, and today I'll be sharing how it works - how the devices are connected together over the Internet, how messages are formatted and relayed, and how presenters and clients arrange their messages into requests, responses, and commands.

Creating and Managing Rooms

Like the other mobile-device party games listed above, Clusterfun operates on the concept of rooms, or small groups of players in which games are hosted. Each room maintains a list of players, some metadata about the game being played, and a four-character room ID that is shared with other players to allow them to join the room.

When a visitor begins a game on Clusterfun.TV, the server will create a new room for them, designating them as the "presenter" and giving them the room ID. The browser UI then shows a game lobby screen, prominently displaying the room ID for others to join.

Later, when another player enters their name and a room code on Clusterfun.TV, the server will approve their entry and register them as a client in the room. The server will also inform the player's device of what game is being played as well as the device ID of the presenter, allowing the device to run the appropriate game code and start communicating with the presenter.
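The room bookkeeping described above could be sketched roughly as follows. This is a minimal illustration, not the actual Clusterfun server code: the `RoomRegistry` class, its method names, and the choice of four uppercase letters for room IDs are all assumptions made for the example.

```typescript
// Hypothetical sketch of room bookkeeping; not the real Clusterfun server code.
interface Room {
    roomId: string;        // four-character code shared with players
    gameName: string;      // metadata about the game being played
    presenterId: string;   // device ID of the presenter
    clientIds: string[];   // device IDs of the joined clients
}

interface JoinInfo {
    gameName: string;
    presenterId: string;
}

class RoomRegistry {
    private readonly rooms = new Map<string, Room>();

    // Assumption: room IDs are four uppercase letters.
    private generateRoomId(): string {
        const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        let id = "";
        do {
            id = "";
            for (let i = 0; i < 4; i++) {
                id += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
            }
        } while (this.rooms.has(id)); // retry on the rare collision
        return id;
    }

    // Creates a room and designates the caller as its presenter
    createRoom(presenterId: string, gameName: string): Room {
        const room: Room = { roomId: this.generateRoomId(), gameName, presenterId, clientIds: [] };
        this.rooms.set(room.roomId, room);
        return room;
    }

    // Registers a client and tells it which game to run and who the presenter is
    joinRoom(roomId: string, clientId: string): JoinInfo | undefined {
        const room = this.rooms.get(roomId.toUpperCase());
        if (!room) return undefined;
        if (room.clientIds.indexOf(clientId) === -1) room.clientIds.push(clientId);
        return { gameName: room.gameName, presenterId: room.presenterId };
    }
}
```

A presenter would call `createRoom()` once, and each joining player would call `joinRoom()` with the code shown on the lobby screen; an unknown code yields `undefined`.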

An illustration of the Clusterfun network diagram

Examples of Messages

Before we begin discussing how messages are routed, constructed, and interpreted, let's briefly show some examples of messages that Clusterfun sends:

Join Game

This message is sent by the client to the presenter to indicate that they have joined a game.

{"r":"presenter_id","s":"client0_id","id":"3961825726-request"}^
{"requestId":"3961825726","route":"/basic/handshake/join","role":"request"}^
{"playerName":"Marta"}

Tap Action

In one of our test games, this message is sent by the client to tell the presenter that they have tapped the screen somewhere.

{"r":"presenter_id","s":"client0_id","id":"3961825738-request"}^
{"requestId":"3961825738","route":"/games/testato/actions/tap","role":"request"}^
{"point":{"x":0.581,"y":1.27}}

Board Update

In Lexible, this message is sent from the presenter to the clients to inform them of a newly submitted word.

{"r":"client0_id","s":"presenter_id","id":"380868519-request"}^
{"requestId":"380868519","route":"/games/lexible/gameplay/update-board","role":"request"}^
{"letters":[
    {"letter":"G","coordinates":{"x":23,"y":10}},
    {"letter":"R","coordinates":{"x":22,"y":10}},
    {"letter":"A","coordinates":{"x":21,"y":9}},
    {"letter":"T","coordinates":{"x":20,"y":10}},
    {"letter":"E","coordinates":{"x":19,"y":10}}
    ],"score":5,"scoringPlayerId":"client1_id","scoringTeam":"B"}

Ping Request/Response

All clients exchange messages like these with the presenter to keep track of their relative ping times.

Request:

{"r":"presenter_id","s":"client0_id","id":"3961825727-request"}^
{"requestId":"3961825727","route":"/basic/ping","role":"request"}^
{"pingTime":1677540154120}

Response:

{"r":"client0_id","s":"presenter_id","id":"3961825730-request-response"}^
{"requestId":"3961825730","role":"response","route":"/basic/ping"}^
{"pingTime":1677540232618,"localTime":1677540232639}

The Message Format

As we can see here, the general message format is arranged as three JSON objects, separated by circumflex characters (^):

A device routing header, containing an "r" field (the recipient), an "s" field (the sender), and a unique "id" to differentiate this message from others.
A topic routing header, containing a "requestId" field (uniquely identifying a request/response exchange), a "role" field (indicating what role the message is playing in the exchange), and a "route" field indicating the type of transaction taking place.
A payload, which can be any JSON object permitted by the route being used.

If we restrict the two routing headers so that they are both objects and can't contain circumflex characters, then the following regex parses the parts quite nicely:

^(?<device>{[^^]*})\^(?<topic>{[^^]*})\^(?<payload>.*)$

Explore this regex with the join example here.
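As a rough sketch of how a device might apply this pattern, the function below splits a raw message into its three parts and parses each as JSON. The `parseMessage` name and the interface names are assumptions for illustration. The pattern has the same structure as the regex above, but uses numbered groups and `[\s\S]*` so that payloads containing line breaks also match.

```typescript
// Hypothetical shapes for the two headers, based on the examples above.
interface DeviceHeader { r: string; s: string; id: string; }
interface TopicHeader { requestId: string; route: string; role: string; }

interface ParsedMessage {
    device: DeviceHeader;
    topic: TopicHeader;
    payload: unknown;
}

// Neither header may contain a circumflex; the payload may contain anything.
const MESSAGE_PATTERN = /^(\{[^^]*\})\^(\{[^^]*\})\^([\s\S]*)$/;

function parseMessage(raw: string): ParsedMessage | undefined {
    const match = MESSAGE_PATTERN.exec(raw);
    if (!match) return undefined;
    try {
        return {
            device: JSON.parse(match[1] ?? "") as DeviceHeader,
            topic: JSON.parse(match[2] ?? "") as TopicHeader,
            payload: JSON.parse(match[3] ?? "") as unknown
        };
    } catch {
        return undefined; // one of the three parts was not valid JSON
    }
}
```

Because only the headers are restricted, a payload such as `{"text":"2^3"}` still parses correctly: the first two circumflexes delimit the headers, and everything after the second one is treated as payload.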

An illustration of the Clusterfun message format, showing the device, topic, and payload sections

Device Routing on the Relay Server

Once a device creates or joins a room, it begins actual message communication by setting up a WebSocket connection with the server. A WebSocket is a bidirectional, TCP-based connection that allows either the server or the client to independently send messages to one another - rather than having to explicitly ask the server if there are any messages waiting, the client can simply wait around, and then it will get an event when a message arrives. It is useful in implementing chat apps and Web-based collaboration spaces, and here, we use it to facilitate communication between presenters and clients.

Rather than letting devices communicate with each other directly, the server acts as an intermediary relay for all messages, routing them in much the same way an Internet router or network switch would. The messages are formatted so that when the relay server receives a message from a WebSocket, it can quickly figure out which device it's for and send it to that device's WebSocket, spending as little time as possible on each message.

To accomplish this, the server only reads the device routing header. It does this by looking for the first circumflex in the message (giving up after a set number of characters, to avoid parsing headers that are too long) and parsing the content before it as JSON. It then reads the "s" field, verifies that it's the same as the device ID of the WebSocket that sent the message, and then sends the message wholesale to the device indicated by the "r" field. All of this is done with very little parsing and allocation, allowing the server routing to scale well with lots of messages.
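The routing steps above can be sketched as follows. This is an illustration under stated assumptions rather than the actual relay code: `DeviceConnection` is a hypothetical stand-in for a registered WebSocket, and the 256-character header limit is an invented value, since the real limit isn't given here.

```typescript
// Hypothetical stand-in for a WebSocket connection registered to one device.
interface DeviceConnection {
    deviceId: string;
    send(message: string): void;
}

// Assumption: the actual limit on header length is not stated in the text.
const MAX_DEVICE_HEADER_LENGTH = 256;

// Finds the first circumflex, giving up after the header length limit
function findHeaderEnd(raw: string): number {
    const limit = Math.min(raw.length, MAX_DEVICE_HEADER_LENGTH);
    for (let i = 0; i < limit; i++) {
        if (raw.charAt(i) === "^") return i;
    }
    return -1;
}

// Returns true if the message was forwarded, false if it was dropped
function relayMessage(
    raw: string,
    from: DeviceConnection,
    connections: Map<string, DeviceConnection>
): boolean {
    const headerEnd = findHeaderEnd(raw);
    if (headerEnd < 0) return false;

    // Only the device routing header is parsed; the rest is left untouched
    let parsed: unknown;
    try {
        parsed = JSON.parse(raw.slice(0, headerEnd));
    } catch {
        return false;
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const header = parsed as { r?: unknown; s?: unknown };

    // The claimed sender must match the connection the message arrived on
    if (header.s !== from.deviceId) return false;
    if (typeof header.r !== "string") return false;

    const recipient = connections.get(header.r);
    if (!recipient) return false;
    recipient.send(raw); // forward the whole message unchanged
    return true;
}
```

Note that the topic header and payload are never inspected here, so a message with a malformed payload would still be forwarded; validating those parts is left to the receiving device.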

An illustration of the Clusterfun device routing process

Topic Routing on the Presenter and Client

Once a message arrives on a device, the device must interpret the message and trigger an event for interested parties. It does this by looking at the topic routing header and comparing the "route" field against the registered listeners:

The game API (as discussed later) allows games to listen for requests on specific routes - if a message with the role "request" arrives for that route, the device sends the payload and the sender to the game logic for processing, where it can return a reply that will then be sent back.
The requests themselves are also made through the API, as one-off events. A request will include an ID and the route, and when a message arrives back with the same "requestId" and a role of either "response" or "error", the response event will trigger and give the response or error data back to the game logic.
Any messages for a route that hasn't been requested, or responses to requests that have already been resolved, are simply ignored.

The end effect is that messages are arranged neatly into request/response pairs, and a client can send a request message multiple times in the event of a spotty or lost connection.
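A minimal sketch of this dispatch logic might look like the following. The `TopicRouter` class and its method names are hypothetical and much simpler than a real implementation, which would also need to build and send the reply message.

```typescript
// Hypothetical sketch of device-side topic routing; names are illustrative.
type TopicRole = "request" | "response" | "error";
interface TopicRoutingHeader { requestId: string; route: string; role: TopicRole; }

type RouteHandler = (sender: string, payload: unknown) => unknown;
interface PendingExchange {
    resolve(payload: unknown): void;
    reject(payload: unknown): void;
}

class TopicRouter {
    private readonly listeners = new Map<string, RouteHandler>();
    private readonly pending = new Map<string, PendingExchange>();

    // Registers game logic for requests arriving on a route
    listen(route: string, handler: RouteHandler): void {
        this.listeners.set(route, handler);
    }

    // Registers interest in the response to an outgoing request
    expectResponse(requestId: string, exchange: PendingExchange): void {
        this.pending.set(requestId, exchange);
    }

    // Returns the reply payload for a handled request, or undefined otherwise
    dispatch(sender: string, topic: TopicRoutingHeader, payload: unknown): unknown {
        if (topic.role === "request") {
            const handler = this.listeners.get(topic.route);
            return handler ? handler(sender, payload) : undefined; // unknown routes are ignored
        }
        const exchange = this.pending.get(topic.requestId);
        if (!exchange) return undefined; // already resolved, or never requested
        this.pending.delete(topic.requestId);
        if (topic.role === "response") {
            exchange.resolve(payload);
        } else {
            exchange.reject(payload);
        }
        return undefined;
    }
}
```

Deleting the pending entry on the first response is what makes resends safe in this sketch: if a retried request produces two responses, the second one finds no pending exchange and is dropped.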

An illustration of the Clusterfun topic routing process

The Request/Response API

The final topic is how the game logic sends and receives messages. Our game models (the classes that actually run the game logic and hold values to be displayed by the UI) hold onto a SessionHelper object that mediates the connection to the WebSocket. The SessionHelper has two methods, listen() and request(), that allow the game logic to receive and send messages:

export class SessionHelper {
    // ...
    /**
     * Listens to a route, calling the provided function when a request arrives
     * @param endpoint An object describing the route to listen to
     * @param apiCallback The function to call when a request arrives
     * @returns A handle to the listener to allow later disposal
     */
    listen<REQUEST, RESPONSE>(
        endpoint: MessageEndpoint<REQUEST, RESPONSE>, 
        apiCallback: (sender: string, value: REQUEST) => RESPONSE | PromiseLike<RESPONSE>
        ): ClusterfunListener<REQUEST, RESPONSE> {
        // ...
    }

    /**
     * Makes a request to a given receiver, automatically
     * retrying and timing out as needed
     * @param endpoint An object describing the route to request on
     * @param receiverId The ID of the receiver to send to
     * @param request The request data to send
     * @returns A Promise-like object resolving to the response
     *          created by the recipient.
     */
    request<REQUEST, RESPONSE>(
        endpoint: MessageEndpoint<REQUEST, RESPONSE>, 
        receiverId: string, 
        request: REQUEST
        ): ClusterfunRequest<REQUEST, RESPONSE> {
        // ...
    }

    // ...
}

(There is also a listenPresenter() variant, which allows a client to only get messages from a presenter, and a requestPresenter() variant, which fills in the presenter's device ID automatically.)

As we can see, these functions are designed to let TypeScript help us with the API design as much as possible, by playing into TypeScript's generics for the request and response types. The MessageEndpoint type is used to both encode this use of generics and provide the route to use. Some examples corresponding to the message examples above follow:

/**
 * Endpoint for joining a game from the client
 */
export const JoinEndpoint: MessageEndpoint<
    { playerName: string }, 
    { isRejoin: boolean, didJoin: boolean, joinError?: string }
    > = {
    route: "/basic/handshake/join",
    responseRequired: true,
    suggestedRetryIntervalMs: 1000,
    suggestedTotalLifetimeMs: 10000
}

/**
 * Endpoint for tapping the screen in the Testato test game.
 * As can be seen below, a response is not required and no
 * retry information is given.
 */
export const TestatoTapActionEndpoint: MessageEndpoint<{ point: Vector2 }, unknown> = {
    route: "/games/testato/actions/tap",
    responseRequired: false
}

/**
 * Data type for board update notifications
 */
export interface LexibleBoardUpdateNotification
{
    letters: { letter: string, coordinates: { x: number, y: number }}[]
    scoringPlayerId: string,
    scoringTeam: string,
    score: number
}

/**
 * Endpoint for the presenter to send Lexible board updates
 * to the clients.
 */
export const LexibleBoardUpdateEndpoint: MessageEndpoint<LexibleBoardUpdateNotification, unknown> = {
    route: "/games/lexible/gameplay/update-board",
    responseRequired: false
}

/**
 * Endpoint for any participant to ping another participant.
 * The retry interval is set to positive infinity in order
 * to indicate that a retry should not happen
 */
export const PingEndpoint: MessageEndpoint<
    { pingTime: number }, 
    { pingTime: number, localTime: number }
    > = {
    route: "/basic/ping",
    responseRequired: true,
    suggestedRetryIntervalMs: Number.POSITIVE_INFINITY,
    suggestedTotalLifetimeMs: 5000
}

These MessageEndpoint objects also make it easy to find everywhere the endpoint is used in an IDE, simply by right-clicking it and selecting "Find All References"/"Find All Usages".
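For reference, a MessageEndpoint definition compatible with the examples above might look roughly like the sketch below. It is inferred only from the fields those examples use; the optional markers, the phantom type fields, the `MessageEndpointSketch` name, and the `EchoEndpoint` example are all assumptions, not the actual Clusterfun source.

```typescript
// Sketch only: inferred from the endpoint examples, not the actual Clusterfun type.
interface MessageEndpointSketch<REQUEST, RESPONSE> {
    route: string;
    responseRequired: boolean;
    suggestedRetryIntervalMs?: number;   // assumed optional
    suggestedTotalLifetimeMs?: number;   // assumed optional
    // Phantom fields (assumption): never set at runtime; they only tie the
    // REQUEST and RESPONSE types to the endpoint for the type checker.
    readonly __request?: REQUEST;
    readonly __response?: RESPONSE;
}

// A hypothetical endpoint, declared the same way as the examples above
const EchoEndpoint: MessageEndpointSketch<{ text: string }, { text: string }> = {
    route: "/sketch/echo",
    responseRequired: true,
    suggestedRetryIntervalMs: 1000,
    suggestedTotalLifetimeMs: 10000
};

// Helpers can accept any endpoint generically, whatever its payload types
function describeEndpoint<REQUEST, RESPONSE>(
    endpoint: MessageEndpointSketch<REQUEST, RESPONSE>
): string {
    const retry = endpoint.suggestedRetryIntervalMs ?? Number.POSITIVE_INFINITY;
    const required = endpoint.responseRequired ? "required" : "optional";
    return `${endpoint.route} (response ${required}, retry every ${retry} ms)`;
}
```

The point of carrying REQUEST and RESPONSE on the endpoint object is that a call such as `request(EchoEndpoint, receiverId, data)` can check the shape of `data` and infer the type of the awaited response without any annotations at the call site.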

The ClusterfunListener and ClusterfunRequest objects have some of their own tricks as well. The ClusterfunListener object exposes an unsubscribe() handle so that the game logic can close the listener when it's no longer needed:

const listener = sessionHelper.listen(myEndpoint, (sender, request) => { ... });
listener.unsubscribe();

The ClusterfunRequest object, in addition to being awaitable, also has a resend() method to allow the game logic to resend a message immediately, and a forget() method that allows the game to drop any interest in a response, turning the request into a fire-and-forget message:

// Send a request with default retry logic, await its result
const oneRequest = sessionHelper.request(myEndpoint, clients[0], data);
const response = await oneRequest;

// Send another request and respond using .then()
const anotherRequest = sessionHelper.request(myEndpoint, clients[1], data);
let shouldResend = true;
anotherRequest.then(response => {
    shouldResend = false;
    // do stuff with response
})
// ... wait some time, then ...
if (shouldResend) {
    anotherRequest.resend();
}

// Send a third request and immediately forget it
sessionHelper.request(myEndpoint, clients[2], data).forget();

An illustration of the request/response flow
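To make the awaitable-with-extras idea concrete, here is a small sketch of an object that can be awaited while also exposing resend() and forget(). It is not the real ClusterfunRequest: the `RequestHandleSketch` name, the injected `sendFn` callback, the `resolve()` hook, and the choice to leave a forgotten request's promise pending are all assumptions, and timers for automatic retry and timeout are left out entirely.

```typescript
// Hypothetical sketch of an awaitable request handle; not the real ClusterfunRequest.
class RequestHandleSketch<RESPONSE> implements PromiseLike<RESPONSE> {
    private readonly promise: Promise<RESPONSE>;
    private readonly sendFn: () => void;
    private resolveFn: (value: RESPONSE) => void = () => {};
    private finished = false;
    sendCount = 0;

    constructor(sendFn: () => void) {
        this.sendFn = sendFn;
        this.promise = new Promise<RESPONSE>((resolve) => {
            this.resolveFn = resolve;
        });
        this.resend(); // the first send happens immediately
    }

    // Sends the message again unless the exchange has already finished
    resend(): void {
        if (this.finished) return;
        this.sendCount++;
        this.sendFn();
    }

    // Called by the routing layer when a matching response arrives
    resolve(response: RESPONSE): void {
        if (this.finished) return;
        this.finished = true;
        this.resolveFn(response);
    }

    // Drops interest in the response: later responses and resends are ignored
    forget(): void {
        this.finished = true;
    }

    // Implementing then() is what makes the handle usable with await
    then<T1 = RESPONSE, T2 = never>(
        onfulfilled?: ((value: RESPONSE) => T1 | PromiseLike<T1>) | null,
        onrejected?: ((reason: any) => T2 | PromiseLike<T2>) | null
    ): PromiseLike<T1 | T2> {
        return this.promise.then(onfulfilled, onrejected);
    }
}
```

Because the class implements PromiseLike, `await handle` and `handle.then(...)` both work, while the extra methods stay available on the same object. In this sketch, awaiting a forgotten handle would never complete, so a caller should only call `forget()` on requests it does not intend to await.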

(EDIT: I ended up replacing the forget() method with a sendMessage() API that doesn't set up the request endpoint at all - see this post for details.)

Some Examples

Let's conclude with some examples!

Joining a Game

First, here's an excerpt from the logic for joining a game, found in the base class for all game models:

Client logic:

constructor() {
    // ... other code ...
    this.session.requestPresenter(JoinEndpoint, { playerName: this._playerName }).then(ack => {
        // Update the UI to display any errors
        this.handleJoinAck(ack);

        // Start a follow-up request for the game state
        this._stateIsInvalid = true;
        this.requestGameStateFromPresenter().then(() => this._stateIsInvalid = false);
    });
}

// A game-specific method that fetches the entire need-to-know game state
// from the presenter. This is called on every refresh to ensure that the client
// is up to date no matter how long they've been gone, and it is also called
// when the game suspects its state is desynced
abstract requestGameStateFromPresenter(): Promise<void>;

Presenter logic:

constructor() {
    // ... other code ...
    // The listenToEndpoint() method calls this.session.listen(),
    // and registers the listener to be unsubscribed when the game ends.
    this.listenToEndpoint(JoinEndpoint, this.handleJoinMessage)
}

handleJoinMessage = async (sender: string, message: { playerName: string }
    ): Promise<{ isRejoin: boolean, didJoin: boolean, joinError?: string }> => {
        if (/* player has already joined */) {
            return { didJoin: true, isRejoin: true }
        }
        if (/* player left, but came back */) {
            // ... put the player back into the game ...
            return { didJoin: true, isRejoin: true }
        }
        // player is new at this point
        if (/* players can join in the current game state */) {
            if (/* there is room for new players */) {
                if (/* a player with that name is already in the room */) {
                    return { 
                        didJoin: false, 
                        isRejoin: false, 
                        joinError: `That name is taken`
                    };
                } else {
                    // ... create and register new player ...
                    return { didJoin: true, isRejoin: false }
                }
            } else {
                return {
                    didJoin: false,
                    isRejoin: false,
                    joinError: "The room is full"
                }
            }
        }
        else {
            return { didJoin: false, isRejoin: false, joinError: "The room is currently closed" }
        }
    }

Pinging

Next, here is the ping logic, used by all clients to ping the presenter:

Client logic:

constructor() {
    // ... other code ...
    this.keepAlive();
}

keepAlive = () => {
    if(!this.session) {
        Logger.info(`No session on ${this.playerName}`)
        return;
    }
    
    if(this.gameState !== GeneralGameState.Destroyed) {
        this.session.requestPresenter(PingEndpoint, { pingTime: Date.now() }).then(undefined, (err) => {
            Logger.warn("Ping message was not received:", err);
        })
        setTimeout(this.keepAlive, this.KEEPALIVE_INTERVAL_MS)
    }
    else {
        Logger.info(`Game appears to be over (${this.playerId})`)
    }
}

Presenter logic:

constructor() {
    // ... other code ...
    this.listenToEndpoint(PingEndpoint, this.handlePing)
}

handlePing = (sender: string, message: { pingTime: number }): { pingTime: number, localTime: number } => {
    return { pingTime: message.pingTime, localTime: Date.now() };
}
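The excerpt above doesn't show what the client does with the ping response. One plausible use, sketched here as an assumption rather than as the actual Clusterfun logic, is to estimate the round-trip time and the offset between the two devices' clocks. The function and interface names are hypothetical.

```typescript
// Shape of the ping response, matching the PingEndpoint response type
interface PingResponseSketch { pingTime: number; localTime: number; }

// Round trip: time from the client sending the ping to receiving the reply,
// both measured on the client's own clock.
function roundTripMs(response: PingResponseSketch, receivedAt: number): number {
    return receivedAt - response.pingTime;
}

// Rough estimate of how far the presenter's clock is ahead of the client's,
// assuming the reply was generated halfway through the round trip.
function clockOffsetMs(response: PingResponseSketch, receivedAt: number): number {
    const midpoint = response.pingTime + roundTripMs(response, receivedAt) / 2;
    return response.localTime - midpoint;
}
```

Taking the response from the ping example earlier (pingTime 1677540232618, localTime 1677540232639) and a hypothetical receipt time of 1677540232660 on the client, the round trip comes out to 42 ms and the estimated clock offset to 0 ms. The halfway assumption is only an approximation, since the two directions of travel are not guaranteed to take equal time.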

Word Submission

Finally, here is an example from Lexible - the all-important action of submitting a word:

Client logic:

constructor() {
    // ... other code ...
    this.listenToEndpointFromPresenter(LexibleBoardUpdateEndpoint, this.handleBoardUpdateMessage);
}

async submitWord() {
    // prepare letter submission data
    const submissionData: LexibleWordSubmissionRequest = {
        letters: this.letterChain.map(l => ({letter: l.letter, coordinates: l.coordinates}))
    }

    const response = await this.session.requestPresenter(LexibleSubmitWordEndpoint, submissionData)
    if (response.success) {
        // ... react to success ...
    } else {
        // ... indicate failure to submit the word ...
    }

    // clear the player's current selection
    this.letterChain[0].selectForPlayer(this.playerId, false);
}

protected handleBoardUpdateMessage = (message: LexibleBoardUpdateNotification) => {
    message.letters.forEach(l => {
        // ... update each letter block with the new player and score ...
    })
    // ... update other information about the board state ...
}

Presenter logic:

constructor() {
    // ... other code ...
    this.listenToEndpoint(LexibleSubmitWordEndpoint, this.handleSubmitWordMessage);
}

handleSubmitWordMessage = (sender: string, request: LexibleWordSubmissionRequest): LexibleWordSubmissionResponse => {
    const player = this.players.find(p => p.playerId === sender);
    if (!player) throw Error("Unknown player attempted to submit a word");

    if(/* the word's coordinates are valid and it is in the word list */) {
        this.placeSuccessfulWord(request, player);
        return {
            success: true,
            letters: request.letters
        }
    }
    else {
        // ... Indicate failure ...
        return {
            success: false,
            letters: request.letters
        };
    }
}

placeSuccessfulWord(data: LexibleWordSubmissionRequest, player: LexiblePlayer) {
    const placedLetters: LetterChain = /* figure out which letters to update,
                                          update them on the grid, and store
                                          them here */

    // Send a board update message to all clients
    this.requestEveryoneAndForget(LexibleBoardUpdateEndpoint, (p, isExited) => {
        return {
            letters: placedLetters,
            score: data.letters.length,
            scoringPlayerId: player.playerId,
            scoringTeam: player.teamName
        }
    });

    // ... check for win conditions and the like ...
}

Summary

Overall, Clusterfun.TV's messaging system resembles the multi-layered system of the Internet itself. Across the Internet, there are five different layers of communication - physical (electrical/fiber/radio), link (Ethernet/802.11), network (IPv4/IPv6), transport (TCP/UDP), and application (HTTP). Each layer contains the data from the layer above it and some sort of envelope - a header, transportation medium, or other mechanism - that assists in the routing and transit of that data. Systems across the Internet tune themselves to specific layers and make no attempt to read data above them or construct data below them - HTTP clients just write HTTP, Ethernet switches forward Ethernet frames using Ethernet frame headers, and IP routers read IP headers to route packets according to their tables.

In Clusterfun.TV, we have two more layers built on top of the WebSocket application protocol: the device routing layer, which gets messages to the correct devices; and the topic routing layer, which gets messages to the right internal endpoints. Each of these layers has its own header, and the part of the system responsible for routing at each layer pays no attention to the data carried for the layers above it. The end result is that our message data is efficiently routed, and our game logic can process that data intuitively in a request/response format.

There are definitely improvements and tradeoffs I could make to this system in the future. In particular:

Instead of a separate forget() method on requests, there should likely be a different API on SessionHelper for sending fire-and-forget messages. The request()-then-forget() pattern causes a response listener to be set up and then torn down unnecessarily. (EDIT: I did end up doing this - see this post for details!)
Lifecycle management on ClusterfunListener objects could be better - right now, we use a separate API on the game logic class to hold onto listeners and dispose of them, and calling SessionHelper.listen() without doing this causes the game logic class to be held in memory through the callback.
Each of these headers is a JSON object to allow for better extensibility. If we decided to lock in the fields for these headers, we could assign them to fixed regions of memory within the message and use a DataView to read them, allowing the server to avoid having to duplicate any part of the message at all.
Routing everything through the server gives us some data integrity guarantees, but it also increases our server load and operating costs. If we set up WebRTC peer connections between the clients and the presenter, we can use those to send our messages and skip the relay server entirely. (It is likely that we would need to operate in both modes, so that clients that can't get reliable peer-to-peer connections can still talk through the relay server.)
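To illustrate the fixed-layout header idea from the list above, here is a rough sketch using DataView. The layout is invented for the example: it assumes device and message IDs could be represented as unsigned 32-bit numbers, which is not how the current string-based IDs work, and the function names are hypothetical.

```typescript
// Hypothetical fixed-layout header: three unsigned 32-bit fields, 12 bytes total.
// Assumes device and message IDs are numeric, unlike the string IDs used today.
const FIXED_HEADER_BYTES = 12;

interface FixedHeader { recipient: number; sender: number; messageId: number; }

// Writes the header into the first 12 bytes and copies the body after it
function writeFixedHeader(header: FixedHeader, body: Uint8Array): ArrayBuffer {
    const buffer = new ArrayBuffer(FIXED_HEADER_BYTES + body.length);
    const view = new DataView(buffer);
    view.setUint32(0, header.recipient);
    view.setUint32(4, header.sender);
    view.setUint32(8, header.messageId);
    new Uint8Array(buffer, FIXED_HEADER_BYTES).set(body);
    return buffer;
}

// Reads the header in place; the body is never copied or parsed
function readFixedHeader(buffer: ArrayBuffer): FixedHeader | undefined {
    if (buffer.byteLength < FIXED_HEADER_BYTES) return undefined;
    const view = new DataView(buffer);
    return {
        recipient: view.getUint32(0),
        sender: view.getUint32(4),
        messageId: view.getUint32(8)
    };
}
```

With a layout like this, the relay would only need three integer reads at known offsets to route a message, at the cost of the extensibility that JSON headers provide.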

Overall, though, I'm quite proud of it, and all of us who work on Clusterfun find it a pleasure to use. We've been able to implement plenty of strong features with its help, including the all-important rejoin-after-refresh that makes games far easier to keep going when connectivity glitches happen.

Clusterfun.TV's premier game, Lexible, is currently in open alpha - play it for free now!