An Objective Definition for Strong vs. Weak Typing

by: TheHans255

May 5, 2024

This post is adapted from a post I originally wrote on the Language Dev Stack Exchange.

A common debate in the world of programming languages is whether a programming language is "strongly" or "weakly" typed - that is, how strictly the language enforces its typing rules. Unfortunately, there is not a whole lot of consensus as to what these terms mean - some might describe C, for instance, as being strongly typed because it requires you to declare the types of your variables, while some might describe C as weakly typed because it lets you cast pointers willy-nilly or lets you store past the end of an array. One blog entry goes as far as to give a pretty snarky definition of these terms so as to show how useless they are:

I don't claim to have the authoritative answer on these terms (much less the programming language debate), but I do remember learning a definition for them in my time at the University of Washington that I find reasonable. While it is not the end-all answer for this debate, it is objective, and I believe it should be an essential ingredient in any discussion of the desirability of a programming language's type system.

Essentially, a "weakly typed" programming language is one in which there exist one or more ways, within the language's spec, to circumvent its type system and cause undefined behavior. A "strongly typed" programming language is simply defined as having no such holes - the type system, no matter what it is, follows its stated rules.

I should first explain what a "type system" even is. Basically, a type system is a set of rules in a programming language that ensures that your program will behave a certain way, and acts by assigning a "type" to each entity in your program, including code, variables, inputs, and outputs. The "rules" themselves can be anything you want, and type systems have been built to variably enforce a great number of rules over the years:

Not all type systems have all of these rules, even in new, production-level languages. Different languages enforce these rules differently as well. The main point is that the rules exist, and provide some assurance for how each part of your program is going to behave.

Within this context, if a type system is "strong", the rules are ironclad. This does mean that some valid, interesting programs cannot be written in the language, but it also means you can be assured that the rules the type system is trying to enforce are actually being applied to your program. A type system is "weak", on the other hand, if there exist ways to circumvent the rules. (This definition is closely related to the idea of a type system being "sound" or "unsound", that is, whether the guarantees the type system claims to make actually hold for every program it accepts.)

Note that this is distinct from "static typing" vs. "dynamic typing", which are better defined, and have more to do with how the rules of the type system are enforced. A statically typed language is one in which the types of its values can be determined at compile time, usually accompanied by an explicit requirement to declare the types of local/global variables, function arguments, and return values. In a dynamically typed language, the type information is carried by the values themselves at runtime, with the language not concerning itself with a value's type until it's necessary to do so.
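To make the distinction concrete, here is a minimal sketch in Rust (my choice of language for illustration, with hypothetical function names). The first function is checked entirely at compile time. The second imitates the dynamic style by passing values that carry their type with them and only inspecting that type at the moment the values are used. It is only meant to show where the check happens, not how any real dynamic language is implemented.

```rust
use std::any::Any;

// Static typing: the compiler knows both arguments are i32 before the
// program runs, so a call like add_static("a", 1) is rejected at compile time.
fn add_static(x: i32, y: i32) -> i32 {
    x + y
}

// Dynamic-style typing: each value carries its type with it at runtime.
// Nothing is checked until the values are actually used in the addition.
fn add_dynamic(x: &dyn Any, y: &dyn Any) -> Result<i32, String> {
    match (x.downcast_ref::<i32>(), y.downcast_ref::<i32>()) {
        (Some(a), Some(b)) => Ok(a + b),
        _ => Err(String::from("type error: both operands must be i32")),
    }
}

fn main() {
    let two = 2_i32;
    let three = String::from("three");

    println!("{}", add_static(2, 3));
    // Both operands turn out to be i32 at runtime, so this succeeds.
    println!("{:?}", add_dynamic(&two, &two));
    // The mismatch is only discovered when the addition is attempted.
    println!("{:?}", add_dynamic(&two, &three));
}
```

Note that both versions have a defined outcome for every input, so neither says anything about strength or weakness on its own.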

JavaScript, for instance, is dynamically but strongly typed. Variable types are kept internally and there's no way to declare which ones are what (and you can end up with variables with a type you don't want at runtime), but the type system has well defined rules for all eventualities - as long as no FFI code is involved, you can follow the code from the beginning, where each and every variable is instantiated, and use the language rules to know exactly what that code is going to do. Yes, you get an error if you try to call a method that doesn't exist on an object, but because the JavaScript runtime has exact rules on what happens when you look up a property that doesn't exist (getting the special value undefined) and what happens when you run the call operator on a value (such as undefined) that doesn't support it (you get a TypeError), the type system is still strong.
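As a rough illustration of what "well defined rules for all eventualities" means, the two JavaScript rules mentioned above can be modeled in a few lines of Rust. This is a toy model under my own assumptions, not how any JavaScript engine works: the names Value, Object, TypeError, get_property, and call are all hypothetical, and objects are reduced to a plain map from property names to values.

```rust
use std::collections::HashMap;

// A tiny stand-in for JavaScript values; every value carries its own tag.
#[derive(Debug, Clone)]
enum Value {
    Undefined,
    Number(f64),
    Function(fn() -> Value),
}

// Hypothetical error type standing in for JavaScript's TypeError.
#[derive(Debug)]
struct TypeError {
    message: String,
}

type Object = HashMap<String, Value>;

// Rule 1: looking up a property that doesn't exist yields undefined.
fn get_property(obj: &Object, name: &str) -> Value {
    obj.get(name).cloned().unwrap_or(Value::Undefined)
}

// Rule 2: calling a value that doesn't support calls is a TypeError.
fn call(callee: &Value) -> Result<Value, TypeError> {
    match callee {
        Value::Function(f) => Ok(f()),
        other => Err(TypeError {
            message: format!("{:?} is not a function", other),
        }),
    }
}

fn forty_two() -> Value {
    Value::Number(42.0)
}

fn main() {
    let mut obj = Object::new();
    obj.insert(String::from("answer"), Value::Function(forty_two));

    // The property exists and is callable, so the call succeeds.
    println!("{:?}", call(&get_property(&obj, "answer")));

    // The property is missing: rule 1 gives undefined, rule 2 gives an error.
    match call(&get_property(&obj, "missing")) {
        Ok(value) => println!("{:?}", value),
        Err(error) => println!("TypeError: {}", error.message),
    }
}
```

The point is that there is no input for which the model has nothing to say: the failure case is an ordinary, specified result rather than undefined behavior.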

Contrast that with C, which is statically but weakly typed. All variables have a specific type associated with them, which you have to declare (unless you're using a variant that lets you use auto to declare a variable if its type can be exactly inferred from the incoming expression). Most expressions have specific rules for what types can be used together and when type casts occur (including implicit type casts). However, because dereferencing an invalid pointer value causes undefined behavior, and arbitrary integers can be bit-cast to pointers (including invalid ones), it is possible to circumvent C's type system - hence, C is weakly typed.

Rust is an interesting case - safe Rust (barring any design flaws) is strongly and statically typed, while unsafe Rust is weakly and statically typed. In both cases, the type system has well-defined rules (including rules with lifetimes, which are part of the value types for each variable and expression), but in unsafe Rust, some actions that break the type system and cause undefined behavior are possible, and it is up to the programmer to ensure that they are keeping the well-defined rules within the confines of the unsafe blocks. (Rust's type system also has some other interesting features - in addition to the operations that are allowed on a value, the type of a value in Rust also defines how long that variable lives, and whether the code holding onto it has the right to change it and/or throw it away.)
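Here is a small sketch of that split, using hypothetical function names. The safe function has a defined result for every index, while the unsafe one skips the bounds check and relies on the caller to uphold the rule. The example only ever calls it with a valid index, since calling it with an invalid one would be undefined behavior.

```rust
// Safe Rust: an out-of-range index has a defined outcome (None).
fn read_safe(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

// Unsafe Rust: no bounds check is performed. Calling this with an
// out-of-range index is undefined behavior, so the function is marked
// unsafe and the caller takes over the type system's job.
unsafe fn read_unchecked(values: &[i32], index: usize) -> i32 {
    // SAFETY: the caller must guarantee that index < values.len().
    unsafe { *values.as_ptr().add(index) }
}

fn main() {
    let values = [10, 20, 30];

    println!("{:?}", read_safe(&values, 1));
    println!("{:?}", read_safe(&values, 7));

    // SAFETY: 1 is less than values.len(), so the read stays in bounds.
    let second = unsafe { read_unchecked(&values, 1) };
    println!("{}", second);
}
```

The static types are identical in both functions. What changes is who is responsible for making sure the rules actually hold.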

And for a few other edge cases of type systems:

Now, with these things said, there are several features in a language that I would certainly consider bad/unpleasant to work with in most circumstances, but would not necessarily consider to be features of a "weak" type system if the rules for when these things happen are well-defined:

If "strong" or "weak" under this definition, then, does not keep out these "undesirable" features of a programming language, what does it do for us? While it says nothing about what job the type system is going to do for us, knowing that a type system is strong at least assures us that the job is going to be done correctly, and that any failures to do so are going to be in its implementation. In particular, it is much easier for a program written in a strongly typed language to reliably undergo a security audit, since each of its parts, with their well-established and strongly maintained types, can be examined individually to ensure that the program is specified correctly (and if it is, but is still behaving improperly, to know that it's the compiler or runtime's fault). A program written in a weakly typed language cannot reliably undergo such an audit, since any other part of the program may have broken the rules; it must therefore be audited at a lower level with fewer but more strongly maintained guarantees (such as the assembly code below it).

Beyond that, though, "strength" and "weakness" don't really have anything to do with what the programmer might want out of a type system. I would definitely argue that implicit casts, reckless reinterpret casting, silent failure values, and arbitrary GOTOs are bad, but that opinion is ultimately subjective - some other programmer, at some other time, wanted those features in their language and defined rules for how they should work. The fault in this case is not in the objective ability for the type system to do its job, but in the job the type system is being asked to do.

And we can still call that "strong" or "weak" if we want - indeed, the definitions I have shared for "strong" and "weak" can probably just be called "sound" or "unsound" to make them more specific. We just can't be objective if that's what we decide to do, since everything in the wonderful world of type systems, at least at the time it was invented, had a reason for existing as it does.

Copyright © 2022-2024, TheHans255. This work is licensed under the CC BY 4.0 license - permission is granted to share and adapt this content for personal and commercial use as long as credit is given to the creator.