The Clusterfun Communications Protocol
A writeup of the communications protocol used on Clusterfun.TV, a website for hosting ad-hoc party games in the play-with-devices style of the Jackbox series.
By: TheHans255
3/10/2023
I'm one of the creators of Clusterfun.TV, a website that hosts collaborative party games! The style is similar to games such as the Jackbox Party Pack series and Use Your Words, in which players gather around the game on a TV and use their mobile devices to interact with the game, making the games easy to pick up and play and also easy to share with a streaming audience. Clusterfun.TV's focus is on game design and ease of use - the presenter can be any Web-enabled device just like the clients can, and players can leave and join a game at any time.
At the heart of this system is our communications protocol - the presenter device exchanges messages with each client, giving updates on the game state and receiving commands from the players. I am the primary author of this communications protocol, and today I'll be sharing how it works - how the devices are connected together over the Internet, how messages are formatted and relayed, and how presenters and clients arrange their messages into requests, responses, and commands.
Creating and Managing Rooms
Like the other mobile-device party games listed above, Clusterfun operates on the concept of rooms, or small groups of players in which games are hosted. Each room maintains a list of players, some metadata about the game being played, and a four-character room ID that is shared with other players to allow them to join the room.
When a visitor begins a game on Clusterfun.TV, the server will create a new room for them, designating them as the "presenter" and giving them the room ID. The browser UI then shows a game lobby screen, prominently displaying the room ID for others to join.
Later, when another player enters their name and a room code on Clusterfun.TV, the server will approve their entry and register them as a client in the room. The server will also inform the player of what game is being played as well as the device ID of the presenter, allowing it to run the appropriate game code and start communicating with the presenter.
Examples of Messages
Before we begin discussing how messages are routed, constructed, and interpreted, let's briefly show some examples of messages that Clusterfun sends:
Join Game
This message is sent by the client to the presenter to indicate that they have joined a game.
{"r":"presenter_id","s":"client0_id","id":"3961825726-request"}^
{"requestId":"3961825726","route":"/basic/handshake/join","role":"request"}^
{"playerName":"Marta"}
Tap Action
In one of our test games, this message is sent by the client to tell the presenter that they have tapped the screen somewhere.
{"r":"presenter_id","s":"client0_id","id":"3961825738-request"}^
{"requestId":"3961825738","route":"/games/testato/actions/tap","role":"request"}^
{"point":{"x":0.581,"y":1.27}}
Board Update
In Lexible, this message is sent from the presenter to the clients to inform them of a newly submitted word.
{"r":"client0_id","s":"presenter_id","id":"380868519-request"}^
{"requestId":"380868519","route":"/games/lexible/gameplay/update-board","role":"request"}^
{"letters":[
{"letter":"G","coordinates":{"x":23,"y":10}},
{"letter":"R","coordinates":{"x":22,"y":10}},
{"letter":"A","coordinates":{"x":21,"y":9}},
{"letter":"T","coordinates":{"x":20,"y":10}},
{"letter":"E","coordinates":{"x":19,"y":10}}
],"score":5,"scoringPlayerId":"client1_id","scoringTeam":"B"}
Ping Request/Response
All clients exchange messages like these with the presenter to keep track of their relative ping times.
Request:
{"r":"presenter_id","s":"client0_id","id":"3961825727-request"}^
{"requestId":"3961825727","route":"/basic/ping","role":"request"}^
{"pingTime":1677540154120}
Response:
{"r":"client0_id","s":"presenter_id","id":"3961825730-request-response"}^
{"requestId":"3961825730","role":"response","route":"/basic/ping"}^
{"pingTime":1677540232618,"localTime":1677540232639}
The Message Format
As we can see here, the general message format is arranged as 3 JSON objects,
separated by a circumflex (^
):
- A device routing header, containing an
"r"
field (the recipient), an"s"
field (the sender), and a unique"id"
to differentiate this message from others. - A topic routing header, containing a
"request-id"
field (uniquely identifying a request/response exchange), a"role"
field (indicating what role the message is playing in the exchange), and a"route"
indicating the type of transaction taking place. - A payload, which can be any JSON object permitted by the
route
being used.
If we restrict the two routing headers so that they are both objects and can't contain circumflex characters, then the following Regex parses the parts quite nicely:
^(?<device>{[^^]*})\^(?<topic>{[^^]*})\^(?<payload>.*)$
Explore this regex with the join example here.
Device Routing on the Relay Server
Once a device creates or joins a room, it begins actual message communication by setting up a WebSocket connection with the server. A WebSocket is a bidirectional, TCP-based connection that allows either the server or the client to independently send messages to one another - rather than having to explicitly ask the server if there are any messages waiting, the client can simply wait around, and then it will get an event when a message arrives. It is useful in implementing chat apps and Web-based collaboration spaces, and here, we use it to facilitate communication between presenters and clients.
Rather than let other devices communicate directly, the server acts as an intermediary relay for all messages, routing them in much the same way an Internet router or switchbox would. The messages are formatted so that when the relay server receives a message from a WebSocket, it can quickly figure out which device it's for and send it to that device's WebSocket, spending as little time as possible on each message.
To accomplish this, the server only reads the device routing header. It does this by
looking for the first circumflex in the message (giving up after a set number of characters,
to avoid parsing headers that are too long) and parsing the content before it as
JSON. It then reads the "s"
field, verifies that it's the same as the device ID of
the Websocket that sent the message, and then sends the message wholesale to the device
indicated by the "r"
field. All of this is done with very little parsing and allocation,
allowing the server routing to scale well with lots of messages.
Topic Routing on the Presenter and Client
Once a message arrives on a device, it must interpret the message and trigger an event
for interested parties. It does this by looking at the topic message header and comparing
the "route"
field to interested registrants:
- The game API (as discussed later) allows games to listen for requests on specific
routes - if a message with the role
"request"
arrives for that route, the device sends the payload and the sender to the game logic for processing, where it can return a reply that will then be sent back. - The requests themselves are also made through the API, as one-off events. A request
will include an ID and the route, and when a message arrives back with the same
"request-id"
and a role of either"response"
or"error"
, the response event will trigger and give the request or error data back to the game logic. - Any messages for a route that hasn't been requested, or responses to requests that have already been resolved, are simply ignored.
The end effect of this are that messages are arranged neatly into request/response pairs, and a client can send a request message multiple times in the event of a spotty or lost connection.
The Request/Response API
The final topic is how the game logic sends and receives messages. Our game models
(the classes that actually run the game logic and hold values to be displayed by
the UI) hold onto a SessionHelper
object that mediates the connection to the WebSocket.
The SessionHelper
has two methods, listen()
and request()
, that allow the
game logic to send messages:
export class SessionHelper {
...
/**
* Listens to a route, calling the provided function when a request arrives
* @param endpoint An object describing the route to listen to
* @param apiCallback The function to call when a request arrives
* @returns A handle to the listener to allow later disposal
*/
listen<REQUEST, RESPONSE>(
endpoint: MessageEndpoint<REQUEST, RESPONSE>,
apiCallback: (sender: string, value: REQUEST) => RESPONSE | PromiseLike<RESPONSE>
): ClusterfunListener<REQUEST, RESPONSE> {
// ...
}
/**
* Makes a request to a given receiver, automatically
* retrying and timing out as needed
* @param endpoint An object describing the route to request on
* @param receiverId The ID of the receiver to send to
* @param request The request data to send
* @returns A Promise-like object resolving to the response
created by the recipient.
*/
request<REQUEST, RESPONSE>(
endpoint: MessageEndpoint<REQUEST, RESPONSE>,
receiverId: string,
request: REQUEST
): ClusterfunRequest<REQUEST, RESPONSE> {
// ...
}
(There is also a listenPresenter()
variant, which allows a client to only
get messages from a presenter, and a requestPresenter()
variant, which
fills in the presenter's device ID automatically.)
As we can see, these functions are designed to let TypeScript help us
with the API design as much as possible, by playing into TypeScript's
generics for the request and response types. The MessageEndpoint
type
is used to both encode this use of generics and provide the route
to use. Some examples corresponding to the message examples above follow:
/**
* Endpoint for joining a game from the client
*/
export const JoinEndpoint: MessageEndpoint<
{ playerName: string },
{ isRejoin: boolean, didJoin: boolean, joinError?: string }
> = {
route: "/basic/handshake/join",
responseRequired: true,
suggestedRetryIntervalMs: 1000,
suggestedTotalLifetimeMs: 10000
}
/**
* Endpoint for tapping the screen in the Testato test game.
* As can be seen below, a response is not required and no
* retry information is given.
*/
export const TestatoTapActionEndpoint: MessageEndpoint<{ point: Vector2 }, unknown> = {
route: "/games/testato/actions/tap",
responseRequired: false
}
/**
* Data type for board update notifications
*/
export interface LexibleBoardUpdateNotification
{
letters: { letter: string, coordinates: { x: number, y: number }}[]
scoringPlayerId: string,
scoringTeam: string,
score: number
}
/**
* Endpoint for the presenter to send Lexible board updates
* to the clients.
*/
export const LexibleBoardUpdateEndpoint: MessageEndpoint<LexibleBoardUpdateNotification, unknown> = {
route: "/games/lexible/gameplay/update-board",
responseRequired: false
}
/**
* Endpoint for any participant to ping another participant.
* The retry interval is set to positive infinity in order
* to indicate that a retry should not happen
*/
export const PingEndpoint: MessageEndpoint<
{ pingTime: number },
{ pingTime: number, localTime: number }
> = {
route: "/basic/ping",
responseRequired: true,
suggestedRetryIntervalMs: Number.POSITIVE_INFINITY,
suggestedTotalLifetimeMs: 5000
}
These MessageEndpoint
objects also make it easy to find everywhere
the endpoint is used in an IDE, simply by right-clicking it and
selecting "Find All References"/"Find All Usages".
The ClusterfunListener
and ClusterfunRequest
objects have
some of their own tricks as well. The ClusterfunListener
object
exposes an unsubscribe()
handle so that the game logic can
close the listener when it's no longer needed:
const listener = sessionHelper.listen(myEndpoint, (request) => { ... });
listener.unsubscribe();
The ClusterfunRequest
object, in addition to being await
able,
also has a resend()
method to allow the game logic to resend a
message immediately, and a forget()
method that allows the game to
drop any interest in a response, turing the request into a fire-and-forget
message:
// Send a request with default retry logic, await its result
const oneRequest = sessionHelper.request(myEndpoint, clients[0], data);
const response = await oneRequest;
// Send another request and respond using .then()
const anotherRequest = sessionHelper.request(myEndpoint, clients[1], data);
let shouldResend = true;
anotherRequest.then(response => {
shouldResend = false;
// do stuff with response
})
// ... wait some time, then ...
if (shouldResend) {
anotherRequest.resend();
}
// Send a third request and immediately forget it
sessionHelper.request(myEndpoint, clients[2], data).forget();
Some Examples
Let's conclude with some examples!
Joining a Game
First, here's an excerpt from the logic for joining a game, found in the base class for all game models:
Client logic:
constructor() {
// ... other code ...
this.session.requestPresenter(JoinEndpoint, { playerName: this._playerName }).then(ack => {
// Update the UI to display any errors
this.handleJoinAck(ack);
// Start a follow-up request for the game state
this._stateIsInvalid = true;
this.requestGameStateFromPresenter().then(() => this._stateIsInvalid = false);
});
}
// A game-specific method that fetches the entire need-to-know game state
// from the presenter. This is called on every refresh to ensure that the client
// is up to date no matter how long they've been gone, and it is also called
// when the game suspects its state is desynced
abstract requestGameStateFromPresenter(): Promise<void>;
Presenter logic:
constructor() {
// ... other code ...
// The listenToEndpoint() method calls this.session.listen(),
// and registers the listener to be unsubscribed when the game ends.
this.listenToEndpoint(JoinEndpoint, this.handleJoinMessage)
}
handleJoinMessage = async (sender: string, message: { playerName: string }
): Promise<{ isRejoin: boolean, didJoin: boolean, joinError?: string }> => {
if (/* player has already joined */) {
return { didJoin: true, isRejoin: true }
}
if (/* player left, but came back */) {
// ... put the player back into the game ...
return { didJoin: true, isRejoin: true }
}
// player is new at this point
if (/* players can join in the current game state */) {
if (/* there is room for new players */) {
if (/* a player with that name is already in the room */) {
return {
didJoin: false,
isRejoin: false,
joinError: `That name is taken`
};
} else {
// ... create and register new player ...
return { didJoin: true, isRejoin: false }
}
} else {
return {
didJoin: false,
isRejoin: false,
joinError: "The room is full"
}
}
}
else {
return { didJoin: false, isRejoin: false, joinError: "The room is currently closed" }
}
}
Pinging
Next, here is the ping logic, used by all clients to ping the presenter:
Client logic:
constructor() {
// ... other code ...
this.keepAlive();
}
keepAlive = () => {
if(!this.session) {
Logger.info(`No session on ${this.playerName}`)
return;
}
if(this.gameState !== GeneralGameState.Destroyed) {
this.session.requestPresenter(PingEndpoint, { pingTime: Date.now() }).then(undefined, (err) => {
Logger.warn("Ping message was not received:", err);
})
setTimeout(this.keepAlive, this.KEEPALIVE_INTERVAL_MS)
}
else {
Logger.info(`Game appears to be over (${this.playerId})`)
}
}
Presenter logic:
constructor() {
// ... other code ...
this.listenToEndpoint(PingEndpoint, this.handlePing)
}
handlePing = (sender: string, message: { pingTime: number }): { pingTime: number, localTime: number } => {
return { pingTime: message.pingTime, localTime: Date.now() };
}
Word Submission
Finally, here is an example from Lexible - the all important action of submitting a word:
Client logic:
constructor() {
// ... other code ...
this.listenToEndpointFromPresenter(LexibleBoardUpdateEndpoint, this.handleBoardUpdateMessage);
}
async submitWord() {
// prepare letter submission data
const submissionData: LexibleWordSubmissionRequest = {
letters: this.letterChain.map(l => ({letter: l.letter, coordinates: l.coordinates}))
}
const response = await this.session.requestPresenter(LexibleSubmitWordEndpoint, submissionData)
if (response.success) {
// ... react to success ...
} else {
// ... indicate failure to submit the word ...
}
// clear the player's current selection
this.letterChain[0].selectForPlayer(this.playerId, false);
}
protected handleBoardUpdateMessage = (message: LexibleBoardUpdateNotification) => {
message.letters.forEach(l => {
// ... update each letter block with the new player and score ...
})
// ... update other information about the board state ...
}
Presenter logic:
constructor() {
// ... other code ...
this.listenToEndpoint(LexibleSubmitWordEndpoint, this.handleSubmitWordMessage);
}
handleSubmitWordMessage = (sender: string, request: LexibleWordSubmissionRequest): LexibleWordSubmissionResponse => {
const player = this.players.find(p => p.playerId === sender);
if (!player) throw Error("Unknown player attempted to submit a word");
if(/* the word's coordinates are valid and it is in the word list */) {
this.placeSuccessfulWord(request, player);
return {
success: true,
letters: request.letters
}
}
else {
// ... Indicate failure ...
return {
success: false,
letters: request.letters
};
}
}
placeSuccessfulWord(data: LexibleWordSubmissionRequest, player: LexiblePlayer) {
const placedLetters: LetterChain = /* figure out which letters to update,
update them on the grid, and store
them here */
// Send a board update message to all clients
this.requestEveryoneAndForget(LexibleBoardUpdateEndpoint, (p, isExited) => {
return {
letters: placedLetters,
score: word.length,
scoringPlayerId: player.playerId,
scoringTeam: player.teamName
}
});
// ... check for win conditions and the like ...
}
Summary
Overall, Clusterfun.TV's messaging system resembles the multi-layered system for the Internet itself. Across the Internet, there are five different layers of communication - physical (electrical/fiber/radio), link (Ethernet/802.11), IP (IPv4, IPv6), transport (TCP/UDP), and application (HTTP). Each layer contains the data from the layer above it and some sort of envelope - a header, transportation medium, or other mechanism - that assists in the routing and transit of that data. Systems across the Internet tune themselves to specific layers and make no attempt to read data above them or construct data below them - HTTP clients just write HTTP, Ethernet switches send Ethernet signals using Ethernet packet headers, and IP routers read IP headers to route messages according to their tables.
In Clusterfun.TV, we have two more layers built on top of the Websocket application protocol: the device routing layer, which gets messages to the correct devices; and the topic routing layer, which gets messages to the right internal endpoints. Each of these layers has their own headers, and the parts of the system responsible for routing each pay no attention to any of the data below them. The end result is that our message data is efficiently routed, and our game logic can process that data intuitively in a request/response format.
There are definitely improvements and tradeoffs I could make to this system in the future. In particular:
- Instead of a separate
forget()
method on requests, there should likely be a different API onSessionHelper
for sending fire and forget messages. Therequest()
-then-forget()
pattern causes a response listener to be set up and then torn down unnecessarily. - Lifecycle management on
ClusterfunListener
objects could be better - right now, we use a separate API on the game logic class to hold onto listeners and dispose of them, and callingSessionHelper.listen()
without doing this causes the game logic class to be held in memory through the callback. - Each of these headers is a JSON object to allow for better extensibility. If we
decided to lock in the fields for these headers, we could assign them to fixed regions
of memory within the message and use a
DataView
to read them, allowing the server to avoid having to duplicate any part of the message at all. - Routing everything through the server gives us some data integrity guarantees, but it also increases our maximum load and operating costs. If we set up WebRTC peer connections between the clients and the presenter, we can use those to send our messages and skip the relay server entirely. (It is likely that we would need to operate in both modes, so that clients that can't get reliable peer-to-peer connections can still talk through the relay server)
Overall, though, I'm quite proud of it, and all of us who work on Clusterfun find it a pleasure to use. We've been able to implement plenty of strong features with its help, including the all-important rejoin-after-refresh that makes games far easier to keep going when connectivity glitches happen.
Clusterfun.TV's premier game, Lexible, is currently in open alpha - play it for free now!