Setting Character Limits

In which the author discusses how to set character limits in various web frameworks.

By: TheHans255

3/1/2024

Today's post is inspired by this image, taken from a recent incident documented on TikTok:

pov: someone put the entire shrek 1 script in the special instructions section of their order

This is hilarious, of course. All the same, if it were something boring like Lorem Ipsum or "All work and no play makes Jack a dull boy" over and over again, or if it happened to you so often that you could no longer make fun social media posts about it, it would be decidedly less hilarious. So, you may ask: how would I set a max limit on stuff like this on my own website, so that this sort of thing doesn't happen?

Setting Character Limits: The Basics

First, why do we set character limits? Character limits on requests and fields are a general line of defense that websites and apps set for themselves, in which they simply discard requests that are too large to make sense. For instance, international telephone numbers are limited to 15 digits (besides the international access code), so if a phone number field contains more than, say, 20 characters, you can immediately reject the request because you know that the field isn't going to have an actual valid telephone number in it. Character limits also let us discard requests that might make sense, but would be too large to process efficiently - for instance, with the special instructions section of the order, you may want to limit it to one paragraph (roughly 150 words, or about 1000 characters in English) so as not to overwhelm your cook and wait staff.

Character limits can be set both on the frontend, when the data is being entered, and on the backend, when the data is being received and stored. We'll talk about how you would do it on each layer, in multiple frameworks, and how you might choose what your limits are.

Setting Character Limits On The Frontend

Right at the start, to protect the client itself from having to process too much data, you can typically set limits on the text boxes themselves:

<!-- Setting an HTML text field to 100 characters - also works for other kinds of text boxes such as "password" -->
<input id="mytextfield" type="text" maxlength="100">

<!-- Setting a XAML (Xamarin, WPF, Windows Universal App) text box to 100 characters -->
<TextBox Name="MyTextBox" MaxLength="100"></TextBox>

<!-- Setting an Android EditText widget to 100 characters -->
<EditText android:id="@+id/my_text_input" android:inputType="text" android:maxLength="100" />

More importantly, however, we would want to check character limits right before we submit our input. If we've already packed all of our data into some serializable object for submission to our web API, we can just check the fields on that:

// XAML example in our data model class - assumes we have data-bound variables
// and a Submit button at the bottom
private async void SubmitButton_Click(object sender, RoutedEventArgs e) {
   if (this.PhoneNumber.Length > 15) {
       // fail the request. You may also want to highlight the faulty text box here
       e.Handled = true;
       return;
   }

   // ... do other checks ...

   // ... submit the request as normal (HttpClient's POST method is PostAsync) ...
   await this._httpClient.PostAsync( /* ... */ );
}

If your frontend framework does that object packing for you (e.g. because you're using the plain HTML form system to submit your data), you'll want to check the text boxes themselves:

// this is the "onsubmit" event for the form
async function onFormSubmit(event) {
    if (document.getElementById("mytextfield").value.length > 15) {
        // fail the request. You may also want to highlight the text box here
        event.preventDefault();
        return;
    }
    
    // ... do other checks ...
    
    // ... submit the request as normal ...
    await fetch( /* ... */ );
}

There is an important caveat, though: do not rely solely on the client to set character limits. While setting character limits on the client can help keep things smooth there and give your users a better experience, know that a user can change their local client to bypass any limits you set, or even dispense with a client entirely and use a program like curl to send requests to your server that are as big as they dang well please.

Setting Character Limits on the Backend

With that in mind, the most important place to set character limits is on your backend, right as you're accepting data from your users. There are two layers for this - one is right at the beginning, where you limit the size of the entire request, and the other is once your data has been parsed and you want to validate individual fields.

Total Request Size Limits

Most web frameworks set a default on the maximum size of an HTTP request that they are willing to support, and you can change that limit according to the data that you actually process. The web framework will automatically throw out requests that are too large before your handler/parsing code even has a chance to touch them, allowing you to avoid wasting valuable server time on trying to make sense of some prankster uploading an actual MP4 file of Shrek 1 (or a base64 encoding of that file in your special instructions field).

In PHP, either in the core php.ini file or in .htaccess files, you can set the post_max_size variable, which defaults to 8M (8 megabytes):

; php.ini
post_max_size = 1M

# .htaccess (Apache with mod_php)
php_value post_max_size 1M

In Kestrel, ASP.NET Core's Web server, the limits can be changed in the MaxRequestBodySize property of KestrelServerLimits when configuring Kestrel, which defaults to 30_000_000 (about 28.6 MB):

.UseKestrel(options =>
{
    options.Limits.MaxRequestBodySize = 1_000_000;
    // ... more options here ...
})

ASP.NET Core also supports changing this on individual endpoints with the RequestSizeLimit attribute:

[HttpPost]
[RequestSizeLimit(1_000_000)]
public IActionResult SubmitOrder([FromBody] RequestFormData data) {
   // ... handler code here ...
}

In Express.js, you can set this limit as part of the body-parser middleware:

const express = require("express");
const bodyParser = require("body-parser");

const app = express();
app.use(bodyParser.json({
    limit: "1mb",
    // ... more options here ...
})); // note that the property also exists on the other bodyParser types, including bodyParser.raw()
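
If you read the request body by hand rather than through a framework, the same idea can be sketched as a byte counter that gives up as soon as the running total passes the limit. This is only a minimal sketch of the technique, not how Express, PHP, or Kestrel implement it; `createBodyCollector` and `PayloadTooLargeError` are hypothetical names made up for this example.

```javascript
// Hypothetical error type carrying the HTTP status we would answer with.
class PayloadTooLargeError extends Error {
    constructor(maxBytes) {
        super(`Request body exceeds ${maxBytes} bytes`);
        this.name = "PayloadTooLargeError";
        this.statusCode = 413;
    }
}

// Hypothetical helper: collects body chunks and refuses to go past maxBytes.
function createBodyCollector(maxBytes) {
    const chunks = [];
    let received = 0;
    return {
        // Call once per incoming Buffer chunk (e.g. from a stream's "data" event).
        push(chunk) {
            received += chunk.length; // Buffer length is measured in bytes
            if (received > maxBytes) {
                chunks.length = 0; // drop whatever was buffered so far
                throw new PayloadTooLargeError(maxBytes);
            }
            chunks.push(chunk);
        },
        // Call when the stream ends to get the whole body as a string.
        finish() {
            return Buffer.concat(chunks).toString("utf8");
        },
    };
}
```

In a real handler, you would call `push` for each chunk as it arrives and turn the thrown error into a 413 response, so that an oversized upload is abandoned partway through instead of being buffered in full.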

Individual Field Size Limits

Once you've accepted a reasonably sized request and parsed the body, it's time to start checking limits on individual fields. Fortunately, this is probably the easiest limit to set - since you're almost always going to be parsing the body into some sort of object, you just need to check the fields of that object:

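app.post("/api/submitorder", (req, res) => {
    const body = req.body;
    // a missing or non-string field fails the check instead of throwing on .length
    if (typeof body.phoneNumber !== "string" || body.phoneNumber.length > 15) {
        res.status(400).end("Invalid phone number");
        return;
    }
    if (typeof body.specialInstructions !== "string" || body.specialInstructions.length > 1000) {
        res.status(400).end("Special instructions missing or too long");
        return;
    }
    // ... make other checks ...

    // ... request is all good, start processing it ...
});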
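
Once there are more than a couple of fields, it may be easier to keep the limits in one table and loop over it. Here is a minimal sketch of that approach; `ORDER_FIELD_LIMITS` and `checkFieldLimits` are hypothetical names, and the sketch assumes every listed field is required and must be a string.

```javascript
// Hypothetical limits table: field name -> maximum length in characters.
const ORDER_FIELD_LIMITS = {
    phoneNumber: 15,
    specialInstructions: 1000,
};

// Returns a list of error messages; an empty list means every field passed.
function checkFieldLimits(body, limits) {
    if (body === null || typeof body !== "object") {
        return ["body must be an object"];
    }
    const errors = [];
    for (const [field, maxLength] of Object.entries(limits)) {
        const value = body[field];
        if (typeof value !== "string") {
            errors.push(`${field} must be a string`);
        } else if (value.length > maxLength) {
            errors.push(`${field} must be at most ${maxLength} characters`);
        }
    }
    return errors;
}
```

A handler could then respond with a 400 whenever the returned list is non-empty, and the same table could be shared with the frontend so that both sides agree on the numbers.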

If you have a microservices architecture, with many individual services making requests to each other at once, consider repeating these checks on every call between services, in order to contain any service that goes rogue or otherwise misbehaves. This might extend all the way to your database, where you might set these limits on individual columns:

-- In SQL, the CHAR and VARCHAR types specify a max length (though the TEXT type does not)
CREATE TABLE orders (
    phoneNumber VARCHAR(20),
    specialInstructions VARCHAR(1000)
    -- etc.
);

Setting Good Character Limits

Now that we've gone over how to set character limits, how should we decide what they are?

First, if the field you're limiting is something like a phone number, email, or US Social Security number that follows certain standards, and those standards include a maximum length, then you have a character limit already - check your fields against that length before parsing them. Here are some examples of such standards:

Phone numbers: 19 digits (15 per E.164, plus up to 4 digits for the international access code)
Email: 320 characters (64 for the local part, 255 for the domain, 1 for the @ symbol), though the 256-character path limit in RFC 5321 means an address longer than 254 characters can't actually be used for delivery
US SSN: 11 characters (3 digits, then a hyphen, then 2 digits, then a hyphen, then 4 digits)
UUID/GUID: 128 bits/32 hex characters, or 36 with hyphens and 38 with braces as well (https://datatracker.ietf.org/doc/html/rfc4122)
SHA-256 hash (e.g. for passwords): 256 bits/64 hex characters (https://web.archive.org/web/20130526224224/https://csrc.nist.gov/groups/STM/cavp/documents/shs/sha256-384-512.pdf)
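
As a rough illustration of checking the length before parsing, here is a minimal sketch using the SSN entry from the list. `parseSsn` is a hypothetical helper, and it only checks the shape of the input, not whether the number was ever issued.

```javascript
const SSN_MAX_LENGTH = 11; // 3 digits + hyphen + 2 digits + hyphen + 4 digits
const SSN_PATTERN = /^\d{3}-\d{2}-\d{4}$/;

// Hypothetical helper: returns the nine digits of a well-formed SSN, or null otherwise.
function parseSsn(input) {
    // Cheap length gate first: oversized input never reaches the regex.
    if (typeof input !== "string" || input.length > SSN_MAX_LENGTH) {
        return null;
    }
    if (!SSN_PATTERN.test(input)) {
        return null;
    }
    return input.replace(/-/g, "");
}
```

The length check costs almost nothing, so a megabyte of junk in the SSN field gets rejected without the pattern ever being run against it.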

If your field does not have a standard, though (and I should especially note that human names do not have standards), you'll need to pick a character limit yourself. In that case, my advice would be to select a limit that is as high as possible while still being able to provide reasonable service to your customers, considering all the systems where your data will be handled.

For instance, when handling passwords, you can usually set very high limits for what you will accept on your backend, since you will only be storing a constant-size hash of that password in the database instead of the password itself. If someone generates a password that comes only from the ten-hundred words people use the most often (evenly distributed), each word adds log2(1000), or about 10, bits of entropy, so you can accept up to about 26 words (256 / 10, rounded up) before you don't really get any additional entropy from the SHA-256 algorithm. Allowing a generous 10 characters per word, that means that if your password system can efficiently hash 260 characters, your limit on password length should be at least that high. (Of course, you might want to set it lower for the sake of network transfer limits, such as to fit your entire login request within the MTU of an Ethernet packet).

With our original point-of-sale receipt printing example, there are many places where the special instructions (as well as the other fields) are processed, all of which will affect the character limit in some way:

The special instructions are displayed on the frontend as they are being typed (how much space do we want to have on the UI for this field?)
The special instructions are sent from the ordering app to the website (how long do we want it to take for the order to reach the website?)
The special instructions are processed by the website (how much processing power do we have for that, and how quickly do we want the order to reach the restaurant?)
The special instructions are stored in the database so that both the customer and the restaurant can consult them later in their clients (how much data do we want to store per order?)
The special instructions are sent to the restaurant (how long do we want that to take?)
The special instructions are printed out on the restaurant's thermal printer (how much does the thermal paper cost, and how long do we want to wait per order for it to print out?)
The special instructions have to be picked up, read, and understood by the cook staff (to what degree do we expect them to follow customer demands?)

Here, the limiting factors would likely be the last two - all the computer systems were fine accepting the Shrek 1 script, but not only did printing that script tie up the printer for any additional orders behind it, it's unlikely that text that long would amount to actual special instructions for an order (and even if it was, it would be unlikely to be worth the time and money of anyone working there).
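
One way to turn that reasoning into a number is to write down a budget for each stage and take the smallest one. The sketch below does that; every figure in `stageLimits` is made up purely for illustration, and `effectiveLimit` is a hypothetical helper.

```javascript
// Hypothetical per-stage budgets, in characters. None of these are measured values.
const stageLimits = {
    frontendTextBox: 2000,
    requestBody: 1000000,
    databaseColumn: 1000,
    thermalPrinter: 32 * 12, // assuming 32 characters per line and 12 lines per order
    kitchenStaff: 300,
};

// The usable limit is the tightest budget across all stages.
function effectiveLimit(limits) {
    const entries = Object.entries(limits);
    if (entries.length === 0) {
        throw new Error("no stages given");
    }
    let [stage, limit] = entries[0];
    for (const [name, value] of entries) {
        if (value < limit) {
            stage = name;
            limit = value;
        }
    }
    return { stage, limit };
}

const result = effectiveLimit(stageLimits);
console.log(`${result.stage} sets the limit at ${result.limit} characters`);
```

With these made-up numbers, the kitchen staff's budget is the binding one, which matches the point above: the computers could take far more than the people and the printer can.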

Character limits really only scratch the surface of client- and server-side data validation. In the future, I may continue this series with topics like rate limiting, using streaming and paging to better handle large inputs, structuring data to more efficiently parse and limit results, and more.