Planning complexity for model economies

In this post I will present two model economies that should be useful for anyone who wants to get into planning. The economies are formulated as linear programs, and the goal is to find optimal solutions to these programs. I will also provide a sketch for an interior point algorithm, mostly lifted from existing literature. Finally I have a few words on how to solve LPs of this type in a distributed manner.

The models

The model economies presented here exploit a few notions:

  • in real economies, firms only depend on a finite set of other firms much smaller than the economy as a whole
  • the average number of firms any one firm depends on is bounded by some constant $q$ regardless of the size of the economy
  • the economy has a small set of core industries and a larger "long tail" of less crucial industries
  • core industries have lots of industries that depend on them, whereas the rest do not

There are three components to the models:

  • technical coefficients
  • a number of extra "basket" constraints
  • balance equations to enable optimization

The difference between the models is the structure of the technical coefficient matrices. Five parameters determine the size and density of the resulting system matrix $S$:

  • $v$ the number of industries
  • $q$ the number of inputs to each industry, as described earlier
  • $w$ the number of basket constraints
  • $r$ the density of each basket constraint
  • $o$ the number of balance equations to optimize over

The technical coefficients are of the same sort as those used by Leontief. They are arranged so that columns contain the inputs of each firm, and rows correspond to outputs. The diagonal is the identity matrix. Off-diagonal elements are negative. The coefficient matrix must satisfy the Hawkins–Simon (HS) condition. Both matrices are generated through a preferential attachment process, and the process used is the difference between the two models. More on that later.

The right-hand side for the technical coefficients is a given list of demands. Leontief's formulation $(I - A)x = d$ is relaxed to $(I - A)x \geq d$.

The "baskets" are an additional set of constraints. They allow having multiple technologies available for producing a single good, or a class of goods. They take the form $B x \geq f$, where both $B$ and $f$ are non-negative. We can imagine any number of these extra constraints. A concrete example would be the nutrition constraints in the last post.

The name "basket" comes from something I read about how Gosplan worked. Due to limited computational resources Gosplan could not plan things using disaggregated data. Instead they had to group goods into so-called baskets of goods, and issue aggregate plan targets. This is frankly a very shitty way of planning things, and it should not be construed that I am in favor of the ad-hoc planning system used in the USSR. I am merely borrowing the terminology. There are more ways in which Gosplan failed to produce rational plans, but such criticism I leave for a potential future post.

The purpose of adding extra constraints like this is so that we can evaluate what happens if we decide to relax some set of demands (entries in $d$) to free up resources for other things. The baskets then act as a set of "sanity checks" so that there is no catastrophic shortfall in essential goods. In other words $d$ can be viewed as a set of wants while $f$ is a set of needs. In liberal economics, wants and needs are considered demands of equal importance, but the scheme presented here gives us more "wiggle room". Another use for these constraints is to be able to say "produce at least this many MWh of electricity", not caring particularly how that electricity is produced. Then one can check what happens if coal fired power plants are successively shut down and replaced by nuclear and renewable power.

Finally there is a set of balance equations meant for the optimization to work on. This can be things like labour time spent, CO2 released, capital stocks consumed and so on. The balance equations can be sparse, but in these experiments I have chosen to make them dense.

For the balance equations the right-hand side is all zeroes. Using non-zero values is possible, but they are easily subtracted away due to the presence of the identity matrix. Using zeroes increases the sparsity of the system and hence makes a solution easier to compute.

In addition, all variables must be non-negative ($x \geq 0$).

In summary the system looks like this:

$$
S x =
\left[
\begin{matrix}
I - A & 0 \\
B     & 0 \\
-O    & I \\
\end{matrix}
\right] x \geq
\left[
\begin{matrix}
d \\
f \\
0 \\
\end{matrix}
\right] = b
$$

$$
A \in \mathbb{R}^{v \times v}, ~
B \in \mathbb{R}^{w \times v}, ~
O \in \mathbb{R}^{o \times v}
$$

$$
S \in \mathbb{R}^{(v+w+o) \times (v+o)}
$$

The goal of the solver is to find the optimal solution $x_\ast$ defined as follows:

$$
x_\ast = \operatorname*{arg\,min}_{\substack{S x \geq b \\ x \geq 0}} \sum_{i = v+1}^{v+o} x_i
=
\operatorname*{arg\,min}_{\substack{S x \geq b \\ x \geq 0}} c^T x
$$

$c$ is a vector of $v$ zeroes followed by $o$ ones.
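To make the block structure concrete, here is a minimal sketch that assembles the system matrix, the right-hand side and the objective vector from their blocks. It uses plain dense Python lists to stay dependency-free; a real implementation would use sparse matrices. The function name `assemble_system` and the toy numbers are made up for illustration.

```python
def assemble_system(A, B, O, d, f):
    """Build S, b and c from the blocks A (v x v), B (w x v), O (o x v)."""
    v, w, o = len(A), len(B), len(O)
    S = []
    # Technical coefficients: (I - A) x_v >= d
    for i in range(v):
        S.append([(1.0 if i == j else 0.0) - A[i][j] for j in range(v)] + [0.0] * o)
    # Baskets: B x_v >= f
    for i in range(w):
        S.append(list(B[i]) + [0.0] * o)
    # Balance equations: -O x_v + x_o >= 0
    for i in range(o):
        S.append([-O[i][j] for j in range(v)] + [1.0 if i == k else 0.0 for k in range(o)])
    b = list(d) + list(f) + [0.0] * o
    c = [0.0] * v + [1.0] * o   # v zeroes followed by o ones
    return S, b, c

# Toy economy: v = 2 industries, w = 1 basket, o = 1 balance equation.
A = [[0.0, 0.2],
     [0.1, 0.0]]
B = [[1.0, 1.0]]
O = [[0.5, 0.3]]
S, b, c = assemble_system(A, B, O, d=[1.0, 2.0], f=[4.0])
for row in S:
    print(row)
```

The resulting matrix has v + w + o rows and v + o columns, matching the dimensions given above.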

The structure described above makes computing an initial strictly feasible solution straightforward:

$$
x_v = (I - A)^{-1} d
$$

$$
u = \max_i \frac{f_i}{B_i x_v}
$$

$$
x_o = O x_v
$$

$$
x_{(0)} = 1.1 \left[
\begin{matrix}
u x_v \\
x_o \\
\end{matrix}
\right]
$$

In other words, first a Leontief solution $x_v$ is computed for the coefficient matrix. Then a value $u$ is computed which makes $u x_v$ satisfy the basket constraints. Then $x_o$ is computed so that the balance equations are satisfied. The final $x_{(0)}$ is assembled from $x_v$, $x_o$ and $u$ and scaled up by 10% so that it forms a solution that is strictly inside the feasible set.
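A minimal sketch of this initialization on a toy economy follows. Two details are my own assumptions rather than the formulas above: the basket scale factor is clamped to at least 1 so that scaling never pushes production below the Leontief solution, and the balance variables are computed from the scaled production with their own 10% margin so that the balance rows hold strictly. The helper names and the fixed-point Leontief solve are also illustrative choices.

```python
def dot(row, x):
    return sum(a * b for a, b in zip(row, x))

def leontief_solve(A, d, iters=200):
    """Solve (I - A) x = d with the fixed-point iteration x <- d + A x."""
    x = list(d)
    for _ in range(iters):
        x = [d_i + dot(A_i, x) for A_i, d_i in zip(A, d)]
    return x

def initial_point(A, B, O, d, f, margin=1.1):
    x_v = leontief_solve(A, d)
    # Assumption: clamp u at 1 so production never drops below the Leontief solution.
    u = max(1.0, max(f_i / dot(B_i, x_v) for B_i, f_i in zip(B, f)))
    x_v0 = [margin * u * x_i for x_i in x_v]
    # Assumption: the balance variables get their own margin over O x_v0.
    x_o0 = [margin * dot(O_i, x_v0) for O_i in O]
    return x_v0 + x_o0

# Toy economy: 2 industries, 1 basket, 1 balance equation (made-up numbers).
A = [[0.0, 0.2],
     [0.1, 0.0]]
B = [[1.0, 1.0]]
O = [[0.5, 0.3]]
d = [1.0, 2.0]
f = [4.0]
x0 = initial_point(A, B, O, d, f)
print(x0)
```

With these adjustments every constraint row and every variable has positive slack, which is what an interior point method needs from its starting point.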

On to the technical coefficients:

Price's model

Since industries pop up over time it seems reasonable to pick a model that is designed to produce power-law distributed graphs. One such model is Price's model, named after Derek J. de Solla Price. This model is intended to model citation networks, which grow over time. More famous papers tend to be cited more often, and the citations form a directed acyclic graph (DAG). Under Price's model the distribution of citations follows a power law.

One drawback of Price's model as applied to economics is that real world economies do not look like a DAG. Recycling especially breaks Price's model. Nevertheless this model is useful because the distribution of row and column ranks is known. Since the resulting incidence matrix is triangular, computing $x_v$ is especially easy for this model, requiring only $O(\operatorname{nnz}(A))$ operations.
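As a rough illustration, the sketch below grows a small Price-style coefficient matrix and solves the triangular Leontief system by back substitution in one pass over the non-zeroes. The attachment weight (number of existing users plus one), the equal coefficients summing to 0.5 per column and the function names are assumptions made for this example, not details taken from the experiments.

```python
import random

def price_model(v, q, seed=0):
    """Columns of A as dicts {input industry: coefficient}.

    Industry j only uses industries i < j, so A is strictly upper triangular.
    """
    rng = random.Random(seed)
    uses = [0] * v                      # how many industries already depend on i
    cols = []
    for j in range(v):
        k = min(q, j)
        chosen = set()
        while len(chosen) < k:
            # Preferential attachment: weight = current number of users + 1.
            weights = [uses[t] + 1 for t in range(j)]
            chosen.add(rng.choices(range(j), weights=weights)[0])
        for i in chosen:
            uses[i] += 1
        cols.append({i: 0.5 / k for i in chosen})
    return cols

def solve_triangular(cols, d):
    """Solve (I - A) x = d by back substitution, touching each non-zero once."""
    x = list(d)
    for j in reversed(range(len(cols))):
        # x[j] is final here: every industry that uses j has a larger index.
        for i, a in cols[j].items():
            x[i] += a * x[j]
    return x

cols = price_model(v=50, q=3, seed=1)
x_v = solve_triangular(cols, [1.0] * 50)
print(max(x_v), min(x_v))
```

Because the matrix is unit triangular after subtracting from the identity, the Hawkins–Simon condition holds for any choice of coefficients in this sketch.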

The sparsity pattern of the Price model looks like this:

Price sparsity pattern

The triangular structure is apparent, as are the coefficients for the baskets and balance equations. To me this model seems to result in an overemphasis of very few core industries. It is also naïve in that it assumes old established industries do not make use of new technology. This is also not the model I started my experiments with.

Interdependent model

The initial model used in my experiments is one where industries grow interdependent over time. It differs from Price's model in that after each new node added, $q$ new links are made equiprobably between any pair of nodes added so far. This produces a matrix where non-zeroes are clustered in the upper left corner, growing more sparse toward the right and the bottom:

Interdependent sparsity pattern

Here the economy is quite explicitly not a DAG. Computing $x_v$ is also more expensive, but still quite cheap using iterative methods. We could also choose to compute the inverse of the upper left block of core industries explicitly, which would be useful as a preconditioner.
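A minimal sketch of this growth process and of an iterative Leontief solve is shown below. Scaling every column to sum to 0.5 is my own assumption, chosen because column sums below 1 are sufficient for the Hawkins–Simon condition and make the simple fixed-point iteration converge; the function names are illustrative.

```python
import random

def interdependent_model(v, q, seed=0, col_sum=0.5):
    """Columns of A as dicts {input industry: coefficient}."""
    rng = random.Random(seed)
    links = set()
    for n in range(2, v + 1):               # n = number of industries added so far
        for _ in range(q):
            i, j = rng.sample(range(n), 2)  # uniformly chosen distinct pair
            links.add((i, j))               # industry j uses output of industry i
    cols = [dict() for _ in range(v)]
    for i, j in links:
        cols[j][i] = 1.0
    for col in cols:
        for i in col:
            col[i] = col_sum / len(col)     # assumption: each column sums to col_sum
    return cols

def solve_iterative(cols, d, tol=1e-12, max_iter=1000):
    """Solve (I - A) x = d with the iteration x <- d + A x."""
    x = list(d)
    for _ in range(max_iter):
        new = list(d)
        for j, col in enumerate(cols):
            for i, a in col.items():
                new[i] += a * x[j]
        if max(abs(p - s) for p, s in zip(new, x)) < tol:
            return new
        x = new
    return x

cols = interdependent_model(v=40, q=3, seed=0)
x_v = solve_iterative(cols, [1.0] * 40)
half = 20
upper_left = sum(1 for j, col in enumerate(cols) for i in col if i < half and j < half)
lower_right = sum(1 for j, col in enumerate(cols) for i in col if i >= half and j >= half)
print(upper_left, lower_right)
```

Early industries take part in every round of link creation, which is why the upper left block ends up much denser than the lower right one.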

Optimality and the duality gap

When we're optimizing it is useful to know whether we are close enough to the optimal solution. We'd like some sense of how much "room" there is for further optimization, preferably some lower bound. If we know that we are within say 1% of this lower bound then we can stop. Such lower bounds are given by feasible solutions to the dual of our starting program.

Instead of seeking the minimal solution to the primal:

$$
x_\ast = \operatorname*{arg\,min}_{\substack{S x \geq b \\ x \geq 0}} c^T x
$$

we instead seek to maximize its dual:

$$
\lambda_\ast = \operatorname*{arg\,max}_{\substack{\lambda^T S \leq c^T \\ \lambda \geq 0}} b^T \lambda
$$

By the strong duality theorem we know that if we have $\lambda_\ast$ then we also have $x_\ast$ and $b^T \lambda_\ast = c^T x_\ast$.

All that remains is to compute an initial feasible $\lambda_{(0)}$. I will leave this out for now because my current solution is honestly too much of a hack.
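Given any feasible primal and dual pair, the gap check itself is simple. The sketch below verifies feasibility of both points and returns the gap relative to the dual lower bound; measuring the gap relative to the dual value is my assumption about what "within 1% of the lower bound" means, and the tiny LP is made up for illustration.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def duality_gap(S, b, c, x, lam, eps=1e-9):
    """Relative gap (c^T x - b^T lam) / (b^T lam) for a feasible pair."""
    cols = list(zip(*S))
    primal_ok = all(dot(row, x) >= b_i - eps for row, b_i in zip(S, b)) and min(x) >= -eps
    dual_ok = all(dot(col, lam) <= c_j + eps for col, c_j in zip(cols, c)) and min(lam) >= -eps
    if not (primal_ok and dual_ok):
        raise ValueError("x or lam is not feasible")
    lower_bound = dot(b, lam)           # weak duality: b^T lam <= c^T x
    return (dot(c, x) - lower_bound) / lower_bound

# Tiny LP: minimize x1 + 2 x2 subject to x1 + x2 >= 2, x >= 0.
S = [[1.0, 1.0]]
b = [2.0]
c = [1.0, 2.0]
x = [2.2, 0.1]      # feasible but not optimal
lam = [0.9]         # feasible dual point
print(duality_gap(S, b, c, x, lam))
```

The optimum of this toy problem has value 2 for both programs, so the gap shrinks to zero as the two points approach it.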

Central path methods

The solver used here is in the class of central path methods. Such solvers date back to James Renegar's 1988 paper "A polynomial-time algorithm, based on Newton's method, for linear programming". The idea is to define a measure of "centrality" such that there is a single point that is "equidistant" from all constraints plus the constraint $c^T x \leq k_{(i)}$ defined by the objective function. $x_{(i)}$ is near this central point at the start of every iteration in the algorithm. $k_{(i)}$ is then updated to $k_{(i+1)} = \delta c^T x_{(i)} + (1 - \delta) k_{(i)}$, and then a re-centering step is taken to produce $x_{(i+1)}$.

Renegar shows that if $\delta = 1/(13 \sqrt{m})$ where $m$ is the number of rows in the system matrix and $n$ the number of columns ($m > n$), then the distance to $x_\ast$ is halved every $O(\sqrt{m})$ steps. Each step amounts to a single Newton step, which takes $O(m n^{\omega - 1})$ operations where $\omega$ is the matrix multiplication exponent. Achieving $L$ bits of accuracy therefore takes $O(m^{1.5} n^{\omega - 1} L)$ operations.

For the kind of linear programs we're talking about here, we can do quite a bit better than this. Renegar's result merely says how much time it takes at most.
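To get a feel for how conservative the short-step update is, the sketch below applies the update rule for the objective bound repeatedly while holding the objective value fixed. Holding it fixed is a simplification of mine, since the real algorithm re-centers and the objective value moves too, so this only illustrates the square-root scaling and is not Renegar's analysis.

```python
import math

def renegar_delta(m):
    """Short-step parameter delta = 1 / (13 sqrt(m))."""
    return 1.0 / (13.0 * math.sqrt(m))

def update_k(k, cx, delta):
    """k_(i+1) = delta * c^T x_(i) + (1 - delta) * k_(i)."""
    return delta * cx + (1.0 - delta) * k

def updates_to_halve(m):
    """Updates needed to halve k - c^T x if c^T x stayed fixed (illustration only)."""
    delta = renegar_delta(m)
    k, cx, count = 1.0, 0.0, 0
    while k - cx > 0.5:
        k = update_k(k, cx, delta)
        count += 1
    return count

for m in (320, 1280, 5120):
    print(m, updates_to_halve(m))
```

Quadrupling the number of rows roughly doubles the number of updates, in line with the square-root dependence on the number of rows.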

Predictor-corrector methods

It turns out that there is a limit to how much the central path bends. This suggests an optimization: compute the tangent of the central path and use that as a predictor. Move $x_{(i)}$ some distance along the tangent, then perform a correction step to get back near the central path. At the same time update $k_{(i)}$.

Assuming the central path doesn't bend too much, this allows us to take steps much larger than $\delta$. In my experiments, if steps 75% of the way to the nearest constraint are taken, in the direction of the tangent, then I typically get 1-2 bits of accuracy per step rather than the $1 / \sqrt{m}$ bits per step with Renegar. The following two pictures hopefully explain the idea:

First step in algorithm

The dashed line is the tangent of the central path at $x_{(i)}$. This can be computed either numerically or analytically. The current algorithm does it numerically.

Second step in algorithm

An intermediate step is taken along the tangent, then $k_{(i+1)}$ is computed. Currently I update $k$ so that the distance is half the distance of the next closest constraint in the system. Finally the intermediate point is re-centered, producing $x_{(i+1)}$.
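The "75% of the way to the nearest constraint" rule can be sketched as a standard ratio test. The function below takes the constraints in the generic form M x >= rhs together with a direction t, which stands in for the tangent; how the tangent itself is computed is left out, and all names and numbers are illustrative assumptions.

```python
def step_to_boundary(M, rhs, x, t, fraction=0.75):
    """Move from x along t, stopping at `fraction` of the way to the nearest constraint."""
    slack = [sum(a * xi for a, xi in zip(row, x)) - r for row, r in zip(M, rhs)]
    dslack = [sum(a * ti for a, ti in zip(row, t)) for row in M]
    # Only constraints whose slack shrinks along t can block the step.
    limits = [s / -ds for s, ds in zip(slack, dslack) if ds < 0]
    if not limits:
        raise ValueError("no constraint blocks this direction")
    alpha = fraction * min(limits)
    return [xi + alpha * ti for xi, ti in zip(x, t)], alpha

# Constraints x1 >= 0, x2 >= 0 and x1 + x2 <= 4, written as M x >= rhs.
M = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
rhs = [0.0, 0.0, -4.0]
x_new, alpha = step_to_boundary(M, rhs, x=[1.0, 1.0], t=[1.0, 0.5])
print(x_new, alpha)
```

After the step the blocking constraint keeps a quarter of its original slack, so the intermediate point stays strictly feasible before re-centering.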

The downside of this method is that re-centering requires more work. Interestingly it never takes more than a few Newton steps, certainly much fewer than $\sqrt{m}$. The work required for centering is also very parallelizable. It amounts to solving for $h$ in the following system:

$$
\Lambda_{(i)} = \left[
\begin{matrix}
\operatorname*{diag}(S x_{(i)} - b) & & \\
 & \operatorname*{diag}(x_{(i)}) & \\
 & & k_{(i)} - c^T x_{(i)} \\
\end{matrix}
\right]
$$

$$
\left[
\begin{matrix}
S^T & I & -c
\end{matrix}
\right]
\Lambda_{(i)}^{-2}
\left[
\begin{matrix}
S \\
I \\
-c^T \\
\end{matrix}
\right] h = -\left[
\begin{matrix}
S^T & I & -c
\end{matrix}
\right] \Lambda_{(i)}^{-1} \boldsymbol{1}
$$

$\boldsymbol{1}$ is a vector of all ones. The left-hand matrix in the second equation is symmetric and positive definite (SPD). This means that the conjugate gradient method (CG) can be used. For CG the left-hand matrix does not need to be formed explicitly. Therefore each CG iteration only needs to perform $O(\operatorname{nnz}(S))$ work.
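Here is a minimal matrix-free sketch of that solve. The left-hand matrix is never formed; it is only applied to a vector through two multiplications with the stacked constraint matrix. The code uses dense Python lists for brevity, so it does not achieve the sparse operation count mentioned above, and the function names and the tiny test problem are assumptions for illustration. It only solves the system as written and does not perform the subsequent update of the iterate.

```python
def matvec(M, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def rmatvec(M, y):
    out = [0.0] * len(M[0])
    for row, yi in zip(M, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out

def centering_system(S, b, c, k, x):
    """Return (apply_H, g) for the system H h = g given above."""
    n = len(x)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [list(row) for row in S] + identity + [[-ci for ci in c]]
    rhs = list(b) + [0.0] * n + [-k]
    s = [mi - ri for mi, ri in zip(matvec(M, x), rhs)]    # diagonal of Lambda
    def apply_H(p):
        return rmatvec(M, [t / si ** 2 for t, si in zip(matvec(M, p), s)])
    g = [-t for t in rmatvec(M, [1.0 / si for si in s])]
    return apply_H, g

def conjugate_gradient(apply_H, g, tol=1e-12, max_iter=100):
    h = [0.0] * len(g)
    r = list(g)                     # residual g - H h for h = 0
    p = list(r)
    rr = sum(t * t for t in r)
    for _ in range(max_iter):
        if rr < tol:
            break
        Hp = apply_H(p)
        alpha = rr / sum(a * q for a, q in zip(p, Hp))
        h = [hi + alpha * pi for hi, pi in zip(h, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Hp)]
        rr_new = sum(t * t for t in r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return h

# Tiny problem: one constraint x1 + x2 >= 2, objective c = (1, 2), bound k = 3.
apply_H, g = centering_system(S=[[1.0, 1.0]], b=[2.0], c=[1.0, 2.0], k=3.0, x=[2.2, 0.1])
h = conjugate_gradient(apply_H, g)
print(h)
```

Since each CG iteration consists of matrix-vector products and vector updates, the work splits naturally across rows of the system matrix.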

A similar process is done for the dual problem, except all matrices are transposed, some signs change and $c$ and $b$ swap roles.

Results

Tests were run on an HP Compaq 8200 Elite SFF PC (XL510AV) with an i7-2600 @ 3.40GHz and 8 GiB 1333 MHz DDR3 running Debian GNU/Linux 10 (buster). The implementation is written in GNU Octave, which parallelizes some of the computations but far from all of them.

$v$ is swept over the numbers 300, 1000, 3000, 10000, 30000, 100000, 300000. The other parameters are $q = 160$, $w = 10$, $r = 160$ and $o = 10$. Wall time, the number of CG iterations and the final duality gap are measured for each run. Runs are made for both models. Stopping conditions are either of:

  • the duality gap is less than 1%
  • less than 1 ppm improvement was made to the gap in the last iteration
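The two stopping conditions can be sketched as a small helper. Reading "1 ppm improvement" as a relative improvement of the gap between consecutive iterations is my assumption, as are the function name and return values.

```python
def should_stop(gap, previous_gap, gap_tol=0.01, stall_tol=1e-6):
    """Return "converged", "stalled" or None for the current duality gap."""
    if gap < gap_tol:
        return "converged"              # duality gap below 1%
    if previous_gap is not None and previous_gap - gap < stall_tol * previous_gap:
        return "stalled"                # less than 1 ppm improvement
    return None

print(should_stop(0.005, 0.02))
print(should_stop(0.05, 0.1))
```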

The latter condition is necessary because of a bug in the code that I have not yet tracked down. One run failed to find even a decent solution, $v = 100000$ for the Price model. I have omitted that run from the results. Similarly $v = 1000000$ fails to find a reasonable solution for both models, which is the reason for the 300000 stopping point.

The following graph plots the wall times of each run against $\operatorname{nnz}(S)$ and shows power laws fitted to the results:

Wall time vs nnz(S) with fitted power laws

The nearly linear fit is very promising. Keep in mind that this is for this specific class of problems, and almost surely does not apply to solving LPs in general.
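For reference, a power law can be fitted with ordinary least squares on the logarithms of both axes. The sketch below does this on synthetic data generated from a known exponent; the numbers are not measurements from these experiments.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**p by least squares in log-log space; returns (a, p)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    p = sum((u - mx) * (w - my) for u, w in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - p * mx)
    return a, p

# Synthetic data with exponent 1.1 (made up, not taken from the runs above).
xs = [1e4, 1e5, 1e6, 1e7]
ys = [2e-6 * x ** 1.1 for x in xs]
a, p = fit_power_law(xs, ys)
print(a, p)
```

An exponent close to 1 in such a fit is what "nearly linear" refers to.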

Some of the runs don't quite make it below the 1% mark:

Final duality gap for each run

This suggests that there is more work to be done. A similarly shaped graph comes out when plotting the total number of conjugate gradient steps vs the number of industries:

Total number of CG steps vs number of industries

Interestingly the number of steps is between 1000 and 2000 for the runs that do find a 1% solution.

Parallelism

The central operation in Renegar's algorithm and its descendants is the centering Newton step. Recent work in the field has focused on speeding this up by maintaining an approximate inverse of the left-hand matrix. See for example the work of Lee and Sidford. These efforts are serial, and not as useful for dealing with sufficiently large systems.

Another way to deal with this is to parallelize the Newton solver. If we let the number of nodes scale with the number of non-zeroes in $S$ then the time for each step becomes constant, plus some communication overhead. If we use Renegar's more conservative result for the number of steps, then the total time to solve the LP is $O(\sqrt{m}L)$. From the experiment presented here we know that a single computer with 8 GiB of RAM can deal with a linear program corresponding to an economy with several hundred thousand industries. It therefore does not seem unreasonable that the entirety of the world economy, a system with billions of industries, could be planned using a relatively modest computer cluster.