Mixed integer planning

In the first post in this series I made the point that while linear programming gets us quite far in planning, it cannot deal with the non-linearities which exist in real economies. In order to get a feeling for how difficult this actually is, I've been dipping my toes into mixed integer programming (MIP).

A program in MIP is like a normal linear program with the additional constraint that some variables must be integral. This pops up whenever we can't make fractional amounts of something. For example it makes no sense to build half a car. Many combinatorial optimization problems can be formulated as MIPs, like the knapsack problem and the travelling salesman problem. A problem that is very relevant to planning is the facility location problem, which covers things like when and where to build new production facilities.
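To get a feel for why integrality matters, here is a small Python sketch (the item values and weights are made up for illustration) comparing the 0/1 knapsack optimum with its fractional relaxation:

```python
from itertools import product

# hypothetical items: (value, weight)
items = [(60, 10), (100, 20), (120, 30)]
capacity = 50

# 0/1 (integer) optimum by brute force over all subsets
best = 0
for choice in product([0, 1], repeat=len(items)):
    weight = sum(c * w for c, (v, w) in zip(choice, items))
    value = sum(c * v for c, (v, w) in zip(choice, items))
    if weight <= capacity:
        best = max(best, value)

# LP relaxation: fill greedily by value density, fractions allowed
remaining = capacity
relaxed = 0.0
for v, w in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
    take = min(1.0, remaining / w)
    relaxed += take * v
    remaining -= take * w

print(best)     # 220
print(relaxed)  # about 240, higher than any integral solution
```

The relaxation is always at least as good as the integer optimum, which is exactly why MIP solvers use it for bounding.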

I will be using the computer game Workers & Resources: Soviet Republic as a model economy to illustrate this post. I have written a program that parses the data files in the game, which I plan on writing about in a separate post.

When MIP is useful

MIP is useful whenever we need to plan when and where to make capital investments. Setting up a cement plant must be done in full before it can be brought into use. If one plant can make 2.7 tons of cement per hour then two plants can make 5.4, but 1½ plants can still only make 2.7. Two incomplete plants can obviously make zero cement. For simplicity I assume that plants are constructed sequentially. The following mixed integer program illustrates the concept:

max: cement;

// resources on hand
concrete = 1500;
gravel = 150;

cement = 2.7 plants;
739 plants <= concrete;
70 plants <= gravel;
int plants;


Feed this into lp_solve and it will spit out the solution plants = 2, cement = 5.4. But if concrete is decreased only slightly to 1400 then the optimal solution is plants = 1, cement = 2.7. Remove the integer constraint while keeping concrete = 1400 and the solution becomes plants = 1.89445, cement = 5.11502.
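We can sanity-check these numbers without a solver; the following Python sketch simply enumerates integer plant counts:

```python
def best_plants(concrete, gravel):
    """Largest integer plant count satisfying both resource constraints."""
    plants = 0
    while 739 * (plants + 1) <= concrete and 70 * (plants + 1) <= gravel:
        plants += 1
    return plants, 2.7 * plants

print(best_plants(1500, 150))  # (2, 5.4)
print(best_plants(1400, 150))  # (1, 2.7)
# LP relaxation with concrete = 1400: plants = 1400/739
print(round(1400 / 739, 5), round(2.7 * 1400 / 739, 5))  # 1.89445 5.11502
```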

But this example is boring. We have no notion of time. Let's add one.

Discretized time

One way to introduce time into a MIP is through discretization. The grid can be uniform (equidistant) for all variables over a given time interval, or non-uniform, or even a separate non-uniform grid for each variable. For simplicity, let's use the same uniform grid for all variables.

The value of each variable at time t depends on its value and the plan at time t - 1. The plan here means the activation level of each plant. In Workers & Resources this means how many workers should work at each workplace in each planning period. We typically don't want workers doing jobs that are unnecessary at the moment; in real life this could mean purposefully giving people time off.

The following MIP covers three planning periods/four time instants on a uniform grid:

max: cement_3;

// initial resources
cement_0 = 0;
concrete_0 = 1500;
gravel_0 = 200;
coal_0 = 10;

// size of the working population
workers = 6000;

// initial plants
plants_0 = 0;

// 0 -> 1
workers_cement_0 <= 30 plants_0;
cement_prod_0 = 0.09 workers_cement_0;
coal_cement_0 = 0.025 workers_cement_0;
gravel_cement_0 = 0.233 workers_cement_0;
5676 new_plants_0 = workers_construct_0;
739 new_plants_0 = concrete_construct_0;
70 new_plants_0 = gravel_construct_0;
workers_cement_0 + workers_construct_0 <= workers;
plants_1 = plants_0 + new_plants_0;
cement_1 = cement_0 + cement_prod_0;
coal_1 = coal_0 - coal_cement_0;
gravel_1 = gravel_0 - gravel_cement_0 - gravel_construct_0;
concrete_1 = concrete_0 - concrete_construct_0;

// 1 -> 2
workers_cement_1 <= 30 plants_1;
cement_prod_1 = 0.09 workers_cement_1;
coal_cement_1 = 0.025 workers_cement_1;
gravel_cement_1 = 0.233 workers_cement_1;
5676 new_plants_1 = workers_construct_1;
739 new_plants_1 = concrete_construct_1;
70 new_plants_1 = gravel_construct_1;
workers_cement_1 + workers_construct_1 <= workers;
plants_2 = plants_1 + new_plants_1;
cement_2 = cement_1 + cement_prod_1;
coal_2 = coal_1 - coal_cement_1;
gravel_2 = gravel_1 - gravel_cement_1 - gravel_construct_1;
concrete_2 = concrete_1 - concrete_construct_1;

// 2 -> 3
workers_cement_2 <= 30 plants_2;
cement_prod_2 = 0.09 workers_cement_2;
coal_cement_2 = 0.025 workers_cement_2;
gravel_cement_2 = 0.233 workers_cement_2;
cement_3 = cement_2 + cement_prod_2;
coal_3 = coal_2 - coal_cement_2;
gravel_3 = gravel_2 - gravel_cement_2;

int new_plants_0;
int new_plants_1;


This may look intimidating, but the three bigger blocks of equations are mostly copies of each other. The last block is simplified. Let's look at the first block more closely:

workers_cement_0 <= 30 plants_0;
cement_prod_0 = 0.09 workers_cement_0;
coal_cement_0 = 0.025 workers_cement_0;
gravel_cement_0 = 0.233 workers_cement_0;


This part covers production in the cement plants. Each plant can employ up to 30 workers, and each worker produces 0.09 tons of cement per hour given 0.025 tons of coal and 0.233 tons of gravel.

5676 new_plants_0 = workers_construct_0;
739 new_plants_0 = concrete_construct_0;
70 new_plants_0 = gravel_construct_0;


This part covers the construction of new plants. It is similar to what we had in the previous section, except with an added constraint for construction workers. For simplicity, each plant is built in full within one planning period (one hour!).

workers_cement_0 + workers_construct_0 <= workers;


This limits where workers may be employed: either in construction or in cement production.

plants_1 = plants_0 + new_plants_0;
cement_1 = cement_0 + cement_prod_0;
coal_1 = coal_0 - coal_cement_0;
gravel_1 = gravel_0 - gravel_cement_0 - gravel_construct_0;
concrete_1 = concrete_0 - concrete_construct_0;


This describes how the different means of production change from one time instant to the next. The solution looks like this, after sorting the variables so we can see how they change over time:

Value of objective function: 8.10000000
cement_0                        0
cement_1                        0
cement_2                      2.7
cement_3                      8.1
cement_prod_0                   0
cement_prod_1                 2.7
cement_prod_2                 5.4
coal_0                         10
coal_1                         10
coal_2                       9.25
coal_3                       7.75
coal_cement_0                   0
coal_cement_1                0.75
coal_cement_2                 1.5
concrete_0                   1500
concrete_1                    761
concrete_2                     22
concrete_construct_0          739
concrete_construct_1          739
gravel_0                      200
gravel_1                      130
gravel_2                    53.01
gravel_3                    39.03
gravel_cement_0                 0
gravel_cement_1              6.99
gravel_cement_2             13.98
gravel_construct_0             70
gravel_construct_1             70
new_plants_0                    1
new_plants_1                    1
plants_0                        0
plants_1                        1
plants_2                        2
workers                      6000
workers_cement_0                0
workers_cement_1               30
workers_cement_2               60
workers_construct_0          5676
workers_construct_1          5676


Since we're maximizing cement production we see the amount of cement rising over time. Gravel is spent both on construction and in cement production. I encourage you, dear reader, to experiment a bit with this program by answering the following questions.

• What happens if coal_0 is limited to 2 tons? 1 ton?
• What happens if gravel_0 is limited to 150?
• What happens if the number of workers is doubled? Limited to 5676? 5675?
• What happens if plants_0 is set to 1?
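For this small instance we can also check the solver against a direct simulation. The sketch below brute-forces the integer construction schedule and greedily assigns the remaining workers to cement production (greedy is adequate here because output per worker is constant and the resources are ample):

```python
from itertools import product

WORKERS = 6000

def simulate(schedule, coal=10.0, gravel=200.0, concrete=1500.0):
    """Total cement over three periods for a given (new_plants_0,
    new_plants_1) schedule, or None if the schedule is infeasible."""
    plants, cement = 0, 0.0
    for t in range(3):
        build = schedule[t] if t < len(schedule) else 0
        if 5676 * build > WORKERS:
            return None
        if 739 * build > concrete or 70 * build > gravel:
            return None
        concrete -= 739 * build
        gravel -= 70 * build
        # greedy cement production with whatever is left over
        w = min(30 * plants, WORKERS - 5676 * build,
                coal / 0.025, gravel / 0.233)
        cement += 0.09 * w
        coal -= 0.025 * w
        gravel -= 0.233 * w
        plants += build
    return cement

results = {s: simulate(s) for s in product(range(3), repeat=2)}
best = max((c, s) for s, c in results.items() if c is not None)
print(best)  # builds one plant in each of the first two periods
```

This reproduces the solver's objective of 8.1 tons and confirms that building two plants at once is infeasible with 6000 workers.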

The following python3 script generalizes the MIP given above to any number of periods:

import sys
n = int(sys.argv[1])

print(f"""max: cement_{n:06d};

// initial resources
cement_000000 = 0;
concrete_000000 = 1500;
gravel_000000 = 200;
coal_000000 = 10;

// size of the working population
workers = 6000;

// initial plants
plants_000000 = 0;""")

for i in range(n-1):
    print(f"""// {i:06d} -> {i+1:06d}
workers_cement_{i:06d} <= 30 plants_{i:06d};
cement_prod_{i:06d} = 0.09 workers_cement_{i:06d};
coal_cement_{i:06d} = 0.025 workers_cement_{i:06d};
gravel_cement_{i:06d} = 0.233 workers_cement_{i:06d};
5676 new_plants_{i:06d} = workers_construct_{i:06d};
739 new_plants_{i:06d} = concrete_construct_{i:06d};
70 new_plants_{i:06d} = gravel_construct_{i:06d};
workers_cement_{i:06d} + workers_construct_{i:06d} <= workers;
plants_{i+1:06d} = plants_{i:06d} + new_plants_{i:06d};
cement_{i+1:06d} = cement_{i:06d} + cement_prod_{i:06d};
coal_{i+1:06d} = coal_{i:06d} - coal_cement_{i:06d};
gravel_{i+1:06d} = gravel_{i:06d} - gravel_cement_{i:06d} - gravel_construct_{i:06d};
concrete_{i+1:06d} = concrete_{i:06d} - concrete_construct_{i:06d};""")

print(f"""// {n-1:06d} -> {n:06d}
workers_cement_{n-1:06d} <= 30 plants_{n-1:06d};
cement_prod_{n-1:06d} = 0.09 workers_cement_{n-1:06d};
coal_cement_{n-1:06d} = 0.025 workers_cement_{n-1:06d};
gravel_cement_{n-1:06d} = 0.233 workers_cement_{n-1:06d};
cement_{n:06d} = cement_{n-1:06d} + cement_prod_{n-1:06d};
coal_{n:06d} = coal_{n-1:06d} - coal_cement_{n-1:06d};
gravel_{n:06d} = gravel_{n-1:06d} - gravel_cement_{n-1:06d};""")

for i in range(n-1):
    print(f"int new_plants_{i:06d};")


The script above takes a single positional argument for the number of plan periods. To generate and solve a program for N periods, do the following:

$ python generate.py N | lp_solve


lp_solve version 5.5.2.5-2 happens to solve this class of programs in O(N²) time, even when the integer constraints are removed. This doesn't demonstrate NP-hardness as I'd like. Oh well. Update: I also ran these problems through glpsol version 5.0-1, part of the glpk package, and it solves them in O(N) time.

Bringing more industries into the program listed above is left as an exercise for the reader. The resulting class of programs is block diagonal.

Non-convex planning

One limitation of the scheme above is that it assumes productivity is entirely linear, or possibly "lumpy" if capital investments are to be optimized. There is no way of dealing with economies of scale, which is a major weakness of Kantorovich's proposal to use linear programming for planning. More recently this weakness of LP for planning was pointed out in the 2012 blog post In Soviet Union, Optimization Problem Solves You by Cosma Shalizi.

Finding optimal solutions to non-convex programs is in general NP-hard. We know that finding approximate solutions to linear programs is in P, and that a solution within 1% of optimum is likely good enough for planning. For any non-convex program with a non-empty feasible region we can inscribe a linear program, which in turn can be solved approximately in polynomial time. The following figure illustrates the concept:

We are minimizing labour. The non-convex program P has a global optimum x*. Imagine that we initially find an inscribed linear program P1 through some process. It has the optimal solution x1, an approximation of which we can find quickly. We can imagine that we then find a new inscribed linear program P2 by using x1 as a hint.

Clearly this process would eventually lead to the local optimum xL. And this is not dissimilar to how capitalism tends to find local optima. But we can apply ideas from non-linear optimization to escape local optima, for example simulated annealing or random restarts. We can use different inscribed linear programs - there are an infinite number of them to choose from.
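As a toy illustration (not from the planning model itself), here is a random-restart hill climb on a one-dimensional double-well function; a single descent can get stuck in the shallower well, while restarts find the deeper one:

```python
import random

def f(x):
    # double well: local minimum near x = +1, global minimum near x = -1
    return (x * x - 1) ** 2 + 0.3 * x

def hill_climb(x, step=0.1, iters=200):
    """Greedy descent with a slowly shrinking step size."""
    for _ in range(iters):
        for cand in (x - step, x + step):
            if f(cand) < f(x):
                x = cand
        step *= 0.97
    return x

random.seed(0)
starts = [random.uniform(-3, 3) for _ in range(20)]
best = min((hill_climb(x) for x in starts), key=f)
print(round(best, 2))  # near -1, the deeper well
```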

But how does this apply to planning?

Piecewise linearization

In MIP there is something called special ordered sets that can be used to solve non-convex programs via piecewise linearization. lp_solve can do this via the SOS command. For these linearizations we can also find linear functions that result in inscribed linear programs as discussed in the previous section. We can use the current feasible solution to "tighten" the inscribed linear program by bringing these linear functions closer to the linearizations. The following figure illustrates the idea:

s is the amount of something to make and f is how much of some resource is necessary. The figure illustrates economies of scale: as s increases, the marginal demand for f goes down (d²f/ds² < 0). At each iteration we have some s(i) that we can use to choose the linearization, so long as it doesn't underestimate f. We can also add bounds to s so that we can be more aggressive with our choice of linearization. This is useful if d²f/ds² changes sign, like below:

Here the linearization of f is valid for sL ≤ s ≤ sU.
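To make this concrete, here is a sketch with an assumed concave demand function f(s) = √s: any tangent lies on or above a concave function, so it never underestimates the demand and is a safe linearization:

```python
import math

def f(s):
    # assumed resource demand with economies of scale (concave)
    return math.sqrt(s)

def tangent_at(s0):
    """Linearization of f around s0: f(s0) + f'(s0) * (s - s0)."""
    slope = 1 / (2 * math.sqrt(s0))
    return lambda s: f(s0) + slope * (s - s0)

lin = tangent_at(4.0)
# the tangent matches f at s0 and overestimates it everywhere else
assert all(lin(s) >= f(s) - 1e-12 for s in [0.5 * k for k in range(1, 41)])
print(lin(4.0), f(4.0))  # 2.0 2.0
```

If d²f/ds² changes sign, the same tangent is only safe on an interval around s0, which is where the bounds sL and sU come in.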

Continuous functions

So far we've only looked at piecewise linearizations. One reason is that we can solve these exactly using any MIP solver. Another is that we can ask people at each workplace how much stuff they think they need to effect large changes in output: "How much do you need now? How much to make twice as much? Half as much?" Such questions, if they can be answered, yield discrete points. An obvious choice for interpolation is a piecewise linear function. But we can equally well choose some continuous function, and the linearization strategy still works!
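A sketch of that interpolation, with hypothetical survey answers as breakpoints:

```python
def piecewise_linear(points):
    """Interpolate between sorted (s, f) breakpoints, e.g. answers to
    'how much do you need at half/current/double output?'"""
    def interp(s):
        for (s0, f0), (s1, f1) in zip(points, points[1:]):
            if s0 <= s <= s1:
                return f0 + (f1 - f0) * (s - s0) / (s1 - s0)
        raise ValueError("s outside surveyed range")
    return interp

# hypothetical answers: resource demand at half, current and double output
demand = piecewise_linear([(50, 30), (100, 55), (200, 100)])
print(demand(150))  # 77.5
```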

But most of the time it might not be possible for people to answer questions about large-scale changes in production. The only things we know for certain are historical values for s and f. If we're lucky we might get a decent estimate of df/ds, the answer to the question "how much more/less do you need to change output by a small amount?" This looks something like the following:

I owe this notion to a recent discussion with Spyridon Samothrakis and Dave Zachariah. So a better view of the situation might be something like this:

The linear program is smaller because we're less sure what the functions that make up our constraints even are. I therefore call the uncertain program that we're trying to solve P?.

Conclusions

One feature of the formalization discussed here is that it improves on "Leontief-y" proposals for non-convex planning. By Leontief here I mean square systems based on IO tables. It turns out we need rectangular systems in planning or else we can't weigh multiple methods of achieving the same goal. IO tables cannot help us choose between burning anthracite, wood or uranium when generating electricity.

The uncertainties in the functions that make up our constraints imply limits on the optimizations we can actually implement. We can only move close to constraints which we are certain are not optimistic. We can account for this uncertainty in the solver by assigning appropriate "forces" to each constraint, meaning different constants before each ci(x) term in the barrier function used in the solver (see interior-point methods). This will tend to push the solutions toward more certain constraints.
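The effect of those weights can be seen in a one-dimensional sketch. Minimizing labour x subject to x ≥ 1 with a log-barrier term, the minimizer sits at x = 1 + μw, so a larger weight w keeps the solution further from the constraint (μ, w and the toy objective here are assumptions for illustration):

```python
import math

def barrier_min(mu, w, lo=1.0, hi=10.0, iters=100):
    """Minimize x - mu*w*log(x - lo) over (lo, hi] by ternary search."""
    g = lambda x: x - mu * w * math.log(x - lo)
    a, b = lo + 1e-9, hi
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if g(m1) < g(m2):
            b = m2
        else:
            a = m1
    return (a + b) / 2

# analytic minimizer is x = 1 + mu*w: larger w pushes further from x = 1
for w in (0.1, 1.0):
    print(round(barrier_min(mu=0.5, w=w), 3))  # 1.05, then 1.5
```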

Physics can be used to verify aspects of some constraints; thermodynamics is one example. Similarly we can apply mass and energy conservation, demanding that the flow of these be accounted for. If a sawmill is fed with logs then we expect the mass of boards, sawdust and bark that comes out to be no more than the mass of logs supplied. We also expect it to be not too much less than what was put in, minus moisture lost in drying.
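Such a conservation check is easy to state in code; the bounds below (15% moisture loss, 5% unaccounted loss) are assumptions for illustration:

```python
def sawmill_balance_ok(logs_in, boards, sawdust, bark,
                       max_moisture_loss=0.15, max_unaccounted=0.05):
    """Check that output mass neither exceeds input mass nor falls short
    of it by more than moisture loss plus a small tolerance."""
    out = boards + sawdust + bark
    lower = logs_in * (1 - max_moisture_loss - max_unaccounted)
    return lower <= out <= logs_in

print(sawmill_balance_ok(100.0, 55.0, 25.0, 8.0))   # True  (88 t out of 100 t)
print(sawmill_balance_ok(100.0, 55.0, 25.0, 30.0))  # False (110 t out of 100 t)
```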

If we violate a constraint that exists in reality but which we have failed to model then we have an infeasible plan. Say that we only today discovered that atmospheric CO2 affects the energy balance of the Earth. Such new knowledge about reality would immediately make our current plan infeasible, and we would have to change the plan accordingly. The "stiffer" our plan is the harder it will likely be to reorient the economy.

I suspect that the most important short-term goal may not be optimality but feasibility. We can only do things like reduce working hours if we are certain nothing will break from doing so. We certainly need to bring the amount of greenhouse gases in the atmosphere under control before we can hope to optimize on labour.

That's it for this post. As always, if you like these posts then I recommend subscribing via RSS/Atom using either of the links below. You can use a news reader like QuiteRSS to do so.