Thoughts on "Markets Are Efficient if and only if P=NP", Part 1

View Part 2 here

This blog post is a response to the 2010 paper "Markets are efficient if and only if P = NP" by Philip Maymin. The paper's synopsis is as follows:

I prove that if markets are weak-form efficient, meaning current prices fully reflect all information available in past prices, then P = NP, meaning every computational problem whose solution can be verified in polynomial time can also be solved in polynomial time. I also prove the converse by showing how we can "program" the market to solve NP-complete problems. Since P probably does not equal NP, markets are probably not efficient. Specifically, markets become increasingly inefficient as the time series lengthens or becomes more frequent. An illustration by way of partitioning the excess returns to momentum strategies based on data availability confirms this prediction.

The paper has never been published to my knowledge, but is available on Arxiv here:

ALSO: A DISCLAIMER: I AM NOT A FINANCIAL ADVISER AND CLAIM NO SPECIAL KNOWLEDGE OF THE FINANCIAL SECTOR. Most of my understanding of finances comes from self-study of asset classes and experience with my own funds, and it is quite possible that this understanding is partially or completely wrong. While I aim to provide correct information for the best of my ability, I do not intend to encourage or discourage any specific financial decisions in this post, and do not take responsibility for any financial decisions thus made.

I first learned about this paper in 2019 from a coworker. I did not have the chance to read it fully at the time, but I was intrigued by its claim that it could connect the free market, a prevalent concept in American economy and politics, to the P = NP problem, a notoriously difficult computer science problem that has incredible implications in computation and cryptography. In particular, I was impressed by its claim that the two were apparently pitted against each other:

The majority of financial academics believe in weak form efficiency and the majority of computer scientists believe that P ≠ NP. The result of this paper is that they cannot both be right: either P = NP and the markets are weakly efficient, or P ≠ NP and the markets are not weakly efficient. (Maymin 4)

Earlier this year, I took the time to read the paper in full and analyze its methods. While I would say the connection is still interesting, I don't really believe the result is valid, which would explain why the paper had never been published. In this post, I will discuss my analysis, while in Part 2, I will discuss some of my own ideas for how this relationship could be tested.


Of the 30 or so double-spaced pages of the paper, the first 10 or so are spent providing background for the efficient market hypothesis and the P = NP problem, including definitions, literature review, and expert polling. I will paraphrase the definitions below.

The efficient market hypothesis claims that "all information relevant to future prices is immediately reflected in the current prices of assets". In other words, you cannot consistently make money by analyzing and exploiting patterns found in publicly available information. This hypothesis has multiple forms, and this essay focuses on the weakest form, which claims that future prices cannot be predicted by analyzing prices from the past (though other information is fair game).

The mechanism where this is specifically explored is the ability to find and exploit patterns. Essentially, if being able to find a consistent strategy for making money is difficult and inefficient, then a single actor in the market would find it and continually use it to make money. However, if more people are able to find the strategy, the strategy itself is affected and eventually collapses, ruining the effectiveness of technical analysis.

The P versus NP problem concerns itself with two "computational complexity classes" of problems:

The P versus NP problem asks if these two sets are equal, rather than P just being a subset of NP. There are a great many problems that we know to be in NP (such as the traveling salesman problem, knapsack problem, and graph coloring problem) that we do not know whether they are in P - learning that P = NP would mean that they could all be solved easily. Indeed, there is a set of problems in NP, known as NP-Complete problems, that can all be efficiently restated in terms of each other - if a polynomial-time solution to even one of them is found, then P = NP.

Analysis of the Paper

The paper argues its claim from two directions. It first argues that being able to efficiently find patterns in the market is an NP-Complete problem, meaning that the markets are efficient if P = NP. It then argues that if the market is efficient, then the market could be "programmed" with carefully placed orders to solve an arbitrarily large problem in 3-SAT, a known NP-Complete problem.

Analyzing the "Market Programming" Argument

We will start by discussing the second argument because, frankly, it is laughably bad. It does a fair enough job of setting up its mechanism: each variable of the 3-SAT expression is represented as a stock, and each clause is encoded by placing an "order-cancels-order" order, which groups together three buy or sell orders and requires that only one of them be filled. For instance, a clause of (a OR NOT b OR c) might be encoded as "buy 100 shares of KO, sell 100 shares of PEP, or buy 100 shares of KDP, order-cancels-order".

It places one of these orders for each clause, setting the price at the midpoint and setting the volume to the minimum possible in order to minimize stock disruption, proposes waiting a constant amount of time in order to give the market time to process the orders - and then gives a punchline that fails to expound on any of the market's actual mechanisms and instead reeks of smarm:

So what should the market do? If it is truly efficient, and there exists some way to execute all of those separate OCO orders in such a way that an overall profit is guaranteed, including taking into account the larger transactions costs from executing more orders, then the market, by its assumed efficiency, ought to be able to find a way to do so. (Maymin 25)

By itself, this suggests that the purpose of the paper is to defeat the efficient market hypothesis, not to simply provide an equivalence between the two as originally stated. More importantly, however, this response hides any of the pitfalls that might arise when giving the problem to solve to the market like this - in particular, the fact that since the market consists of multiple independent entities, the most rational decision for the market to make would be to execute orders in an inconsistent manner (i.e. taking one clause's buy order while taking a different clause's sell order for the same security).

Analyzing the "Efficient Analysis" Argument

The first argument, by contrast, is a little better thought-out. It focuses on a more specific aspect of the efficient market hypothesis, namely that if there is a particular pattern that can be found in the price data, it can be found "immediately". For instance, if it can be calculated that a set of three UP days is followed by another UP day far too often (and thus money can be made by buying on day 3), this pattern will be found quickly. (This is paired with the "no free lunch" assertion, which states that if the pattern becomes public knowledge, people will trade on it earlier and earlier and thus destroy its advantage).

To perform this prediction, we can first generalize a run of price change data (that is, the prices of a security at regular intervals along a given time scale) as simple UP or DOWN movements (e.g. a sequence of UP, UP, DOWN, UP, DOWN, UP, DOWN, etc.). We can then feed this sequence into a computer and ask it whether we should buy or sell (i.e. go long on the security or get out of it). If the sequence is kept to a fixed time interval t, then there are 2t possible inputs, meaning that there are 2t+1 total possible buy/sell recommendation programs you could write (since each one will map each of those price sequences to either going long or going out). We will call these programs "strategies".

We then take a critical profit value K (the amount of money that a strategy should make for it to be statistically significant, which would depend on the market's equilibrium), and a budget constraint B (the amount of capital you have available to spend), and ask the question: is there a strategy that, given a collection of n price series each of length t, makes at least K on each of them without ever exceeding the budget constraint B?

This is shown (mostly) to be equivalent to the knapsack problem, which is NP-Complete - each object is one of our time-series, whose size is the price of the security involved and whose value is the security's return, and the set of objects we put in our knapsack is the set of time-series that result in a "go long" decision. Hence, according to the paper, if P = NP, then we can efficiently find strategies even as the time series lengthen out and we consider more and more of them.

This argument kind of works. However, there are some issues with the argument that significantly weaken it.

That's it for Part 1! In Part 2, I will go further into a possible avenue for how to analyze the relationship between these two concepts, one involving simulation of agents in an actual market and the computational complexity in reacting to a piece of information that changes the value of a security.

Copyright © 2022-2023, TheHans255. All rights reserved.