Computer Model That Locked Down The World Turns Out To Be Sh*tcode


I predicted it wouldn't be a big deal and I was right. Why shouldn't I have these programmers' jobs?

You've been green-pilled. There's a reason so many "climate deniers" start out noticing the terribly coded climate models, which are so bad at predicting that they spawned the field of chaos theory itself.

...wait, they did the same fuck-up in epidemic modeling? For fuck's sake, science, get your shit together.

If you haven't already read it, you'd like The Skeptical Environmentalist by Bjorn Lomborg; I'm just about to finish it. Lomborg doesn't act like an ideologue - he presents himself as a scientist who's just trying to be the moderate next to the "HUMAN EXTINCTION IN 20 YEARS" crowd - but he still lays out a lot of evidence that environmental science is so corrupted (both intentionally and through incompetence) that it's absolutely worthless. Sometimes the models themselves are completely fried. Often the models are fine, but the people interpreting them for policymakers and journalists are so dishonest that they twist them into saying the exact opposite of what they show. The general takeaway is that it's a completely lost cause.

Do you know that the entire program being a single 15,000-line C file is a bad idea? 'Cause if so, you know enough to criticise this.

What the fuck? A single file?

I have had ONE programming class in my life. It was introductory C++, and it ended around object-oriented programming and classes, but I don't remember any of that. The most complex things I actually remember and use are stuff like creating save files you can import, chains of arrays generated and filled in with random numbers, and splitting functions across several source files. Basically, other than a few little things, all I have is a random world generator I'm building as a hobby (like what you'd see for a 4X strategy game, except there's no actual game to go with it, and only a text-based interface since I don't know anything about graphics).

So I - a lone hobbyist with one class under my belt, who doesn't even use everything from that introductory class (just the simpler parts I understood and need), working on a single hobby program - have better code structure than they do, because I at least know how to tell the computer to use multiple files instead of dumping the whole damn thing into a single one, which would have gotten me a failing grade if I'd turned it in.

Fuck them, the scientists need to be put to death.
 
Last edited:

Kosher Dill

Pumpkin Chips
True & Honest Fan
kiwifarms.net
Here's a new bit of clownery: a helper function was added to the code.
https://github.com/mrc-ide/covid-si...cb#diff-581d9af70cfd8752c0fc8d70b1e3fa65R5651

C:
/* helper function to set icdf arrays */
void setICDF(double* icdf, int startValue)
{
    int i;
    icdf[CDF_RES] = startValue;
    for (i = 0; i < CDF_RES; i++)
        icdf[i] = -log(1 - ((double)i) / CDF_RES);
}

So we take in an int value and immediately write it into an array of doubles.
But it gets worse: this isn't just a case of needless conversion back and forth. Look at the very first call to this function:

C:
setICDF(P.latent_icdf, 1e10);
A number that is too big to fit into an int! So this would in fact result in garbage.

This was fixed 3 days later, but it's obvious that nobody is testing the code at all before checking it in. Or if they are, they have no way of recognizing when a bug has been introduced.
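For reference, a minimal sketch of how the helper could avoid the truncation is to take the start value as a double, so 1e10 survives the call intact. Whether this is what the later fix actually did, I haven't checked; CDF_RES = 20 is assumed here for illustration.

C:
#include <math.h>

#define CDF_RES 20   /* assumed value, for illustration only */

/* helper function to set icdf arrays -- start value passed as a double */
void setICDF(double* icdf, double startValue)
{
    icdf[CDF_RES] = startValue;   /* the sentinel slot keeps its full value */
    for (int i = 0; i < CDF_RES; i++)
        icdf[i] = -log(1.0 - ((double)i) / CDF_RES);
}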
 

Yotsubaaa

And I will love you 💗
True & Honest Fan
kiwifarms.net
Here's a new bit of clownery: a helper function was added to the code.
https://github.com/mrc-ide/covid-si...cb#diff-581d9af70cfd8752c0fc8d70b1e3fa65R5651

C:
/* helper function to set icdf arrays */
void setICDF(double* icdf, int startValue)
{
    int i;
    icdf[CDF_RES] = startValue;
    for (i = 0; i < CDF_RES; i++)
        icdf[i] = -log(1 - ((double)i) / CDF_RES);
}

So we take in an int value and immediately write it into an array of doubles.
But it gets worse: this isn't just a case of needless conversion back and forth. Look at the very first call to this function:

C:
setICDF(P.latent_icdf, 1e10);
A number that is too big to fit into an int! So this would in fact result in garbage.

This was fixed 3 days later, but it's obvious that nobody is testing the code at all before checking it in. Or if they are, they have no way of recognizing when a bug has been introduced.
The clownery aside, I guess the refactoring is a step in the right direction? ...Maybe? I wonder why they couldn't get rid of the surrounding "if-else and for loop" combo that's repeated for almost 50 lines.
[Attachment: refactoring.png]
Geezus, it seems like there's a stupefying amount of duplication no matter where you look in this code. Hope the poor postgrad didn't make any errors when they were Ctrl+C, Ctrl+V'ing all of that.
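Just to show what I mean, here's a rough sketch of the sort of helper that could fold that repeated combo into one place: either copy an ICDF supplied in the parameter file or generate the default one, then do the negate-and-exponentiate pass once. GetInverseCdf and the "supplied" argument are names I made up, not anything in the repo, and CDF_RES = 20 is assumed.

C:
#include <math.h>
#include <string.h>

#define CDF_RES 20   /* assumed value, for illustration */

void GetInverseCdf(double* icdf, const double* supplied /* NULL if not given in the parameter file */)
{
    if (supplied)
        memcpy(icdf, supplied, (CDF_RES + 1) * sizeof(double));
    else
    {
        /* default: -log quantiles of an exponential, plus a huge sentinel */
        for (int i = 0; i < CDF_RES; i++)
            icdf[i] = -log(1.0 - (double)i / CDF_RES);
        icdf[CDF_RES] = 1e10;
    }

    /* the negate-and-exponentiate pass that currently follows every copy */
    for (int i = 0; i <= CDF_RES; i++)
        icdf[i] = exp(-icdf[i]);
}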
 

Kosher Dill

Pumpkin Chips
True & Honest Fan
kiwifarms.net
I wonder why they couldn't get rid of the surrounding "if-else and for loop" combo that's repeated for almost 50 lines.
I wonder why they have 50 arrays that are all exactly the same except the last element.

EDIT: and actually, hoooooold on a second. They fill all these arrays with the negative log of [1.0, 0.95, 0.9...]
And then step through again, negate, and exponentiate. So they're going to all this trouble to poorly calculate 50 identical tables of multiples of 5%.
Also, exp(-1e10) is just going to be zero. I wonder if that was intended.
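If you want to see it for yourself, here's a quick standalone check of that round trip (CDF_RES = 20 assumed for illustration): exp(-(-log(1 - i/CDF_RES))) just gives you back 1 - i/CDF_RES, so every one of those tables boils down to multiples of 5%, and the sentinel underflows to exactly zero.

C:
#include <math.h>
#include <stdio.h>

#define CDF_RES 20   /* assumed value, for illustration */

int main(void)
{
    for (int i = 0; i < CDF_RES; i++)
    {
        double stored    = -log(1.0 - (double)i / CDF_RES); /* what setICDF writes */
        double recovered = exp(-stored);                    /* what the later pass computes */
        printf("i=%2d  stored=%8.5f  recovered=%.2f\n", i, stored, recovered);
    }
    printf("exp(-1e10) = %g\n", exp(-1e10));  /* the sentinel slot: underflows to 0 */
    return 0;
}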
 
Last edited:

Kosher Dill

Pumpkin Chips
True & Honest Fan
kiwifarms.net
I was taking another look through this code since Neil Ferguson was in the news again today, and I noticed this.


Their regression tests are just based on checking the hashes of the bitmap and CSV files it outputs!
Don't worry about sanity-checking the results, if they look the same it's fine. :stress:
 

Yotsubaaa

And I will love you 💗
True & Honest Fan
kiwifarms.net
I was taking another look through this code since Neil Ferguson was in the news again today, and I noticed this.


Their regression tests are just based on checking the hashes of the bitmap and CSV files it outputs!
Don't worry about sanity-checking the results, if they look the same it's fine. :stress:
I'm probably having a brainlet moment (in my defense it's still early morning over here) but what's the problem with their hash-based regression testing? I mean don't get me wrong, they definitely want to be sanity-checking their outputs, but wouldn't that be better as a separate test suite? In terms of "did this fix change anything (it shouldn't have)" regression testing, surely just comparing with previously computed hashes suffices? Sorry if I'm missing anything super obvious.
 

Kosher Dill

Pumpkin Chips
True & Honest Fan
kiwifarms.net
I'm probably having a brainlet moment (in my defense it's still early morning over here) but what's the problem with their hash-based regression testing? I mean don't get me wrong, they definitely want to be sanity-checking their outputs, but wouldn't that be better as a separate test suite? In terms of "did this fix change anything (it shouldn't have)" regression testing, surely just comparing with previously computed hashes suffices? Sorry if I'm missing anything super obvious.
Well, it's kind of a bizarre thing to test since this is not primarily a picture making program. You're going to end up with an incredibly fragile test which is really testing someone else's code's output based on your output - I think the issue here would be false positives rather than false negatives. Take this example.



What if the machine you're generating this on doesn't have "Generic Sans Serif Font #4" installed (or even the same font with different antialiasing or kerning settings) and uses some other font? Regression fails. What if the image library (R's copy of libpng in this case) gets upgraded and they decide to emit a few more PNG file headers? Regression fails. What if different colors get selected for the bars, or something decides to render numbers with commas?

I'd make similar arguments for the Excel files since the XLS(X) format is a chaotic nightmare, but apparently they're just emitting CSV files and naming them as Excel files. Even there you could have line ending differences (\n vs \r\n)

Not to mention: how do you debug a regression failure? You're going to have someone pulling up two pictures and flipping back and forth to see if one of those bars is one pixel off or something :lol:

EDIT: oh, one other thing. Since the resolution of this image is naturally limited, this regression can't catch changes smaller than a certain amount. On the vertical axis, 200000 cases is 90 pixels long, so one pixel in height is about 2000 cases. If one of my sub-bars is off by like 1000, it probably won't change the output picture at all.
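For contrast, here's a minimal sketch of what a numeric regression check on the CSV outputs could look like: parse both files and compare value against value with a relative tolerance. File names, buffer sizes, and the tolerance are arbitrary choices for the example; non-numeric fields like header text are just skipped, and a real version would also check that the files have the same number of lines.

C:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* returns 1 if every numeric field in 'actual' is within rel_tol of 'expected' */
static int csv_matches(const char* expected_path, const char* actual_path, double rel_tol)
{
    FILE* fe = fopen(expected_path, "r");
    FILE* fa = fopen(actual_path, "r");
    if (!fe || !fa) { if (fe) fclose(fe); if (fa) fclose(fa); return 0; }

    char le[8192], la[8192];
    int line_no = 0, ok = 1;
    while (fgets(le, sizeof le, fe) && fgets(la, sizeof la, fa))
    {
        line_no++;
        char *pe = le, *pa = la;
        for (;;)
        {
            char *ee, *ea;
            double ve = strtod(pe, &ee);
            double va = strtod(pa, &ea);
            if (ee == pe || ea == pa) break;          /* non-numeric field or end of line */
            if (fabs(ve - va) > rel_tol * fmax(fabs(ve), 1.0))
            {
                fprintf(stderr, "line %d: expected %g, got %g\n", line_no, ve, va);
                ok = 0;
            }
            pe = (*ee == ',') ? ee + 1 : ee;
            pa = (*ea == ',') ? ea + 1 : ea;
        }
    }
    fclose(fe); fclose(fa);
    return ok;
}

A test harness would call csv_matches("baseline/results.csv", "output/results.csv", 1e-6) and fail the run when it returns 0. Then a genuine change of 1000 cases fails the test, a font change doesn't, and a failure tells you which line is off and by how much.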
 
Last edited:

Overly Serious

kiwifarms.net
Well, it's kind of a bizarre thing to test since this is not primarily a picture making program. You're going to end up with an incredibly fragile test which is really testing someone else's code's output based on your output - I think the issue here would be false positives rather than false negatives. Take this example.



What if the machine you're generating this on doesn't have "Generic Sans Serif Font #4" installed (or even the same font with different antialiasing or kerning settings) and uses some other font? Regression fails. What if the image library (R's copy of libpng in this case) gets upgraded and they decide to emit a few more PNG file headers? Regression fails. What if different colors get selected for the bars, or something decides to render numbers with commas?

I'd make similar arguments for the Excel files since the XLS(X) format is a chaotic nightmare, but apparently they're just emitting CSV files and naming them as Excel files. Even there you could have line ending differences (\n vs \r\n)

Not to mention: how do you debug a regression failure? You're going to have someone pulling up two pictures and flipping back and forth to see if one of those bars is one pixel off or something :lol:

EDIT: oh, one other thing. Since the resolution of this image is naturally limited, this regression can't catch changes smaller than a certain amount. On the vertical axis, 200000 cases is 90 pixels long, so one pixel in height is about 2000 cases. If one of my sub-bars is off by like 1000, it probably won't change the output picture at all.

Wow. I'm rarely speechless but wow.
 

Kosher Dill

Pumpkin Chips
True & Honest Fan
kiwifarms.net
Because the glorious mission of this site is to single out autistic dead horses and beat them till they're glue, here's another sample for your derision.

Notice an ominous warning in Model.h:

NOTE: This struct must contain only doubles (and arrays of doubles) for the TSMean averaging code to work.

Say what? Let's dig a little deeper.

C:
struct Results
{
    // Initial values should not be touched by mean/var calculation
    double t;
    double ** prevInf_age_adunit, ** incInf_age_adunit, ** cumInf_age_adunit; // prevalence, incidence, and cumulative incidence of infection by age and admin unit.

    // The following values must all be doubles or inline arrays of doubles
    // The first variable must be S.  If that changes change the definition of
    // ResultsDoubleOffsetStart below.
    double S, L, I, R, D, incC, incTC, incFC, incI, incR, incD, incDC, meanTG, meanSI ;

/* more double fields */
};

// The offset (in number of doubles) of the first double field in Results.
const std::size_t ResultsDoubleOffsetStart = offsetof(Results, S) / sizeof(double);

What's this? Why do we have a hardcoded constant for the offset of "the first double field"? Well, come with me on a magical trip to CovidSim.cpp in the RecordInfTypes function!

C:
// Extraneous code snipped throughout
nf = sizeof(Results) / sizeof(double);

res = (double*)&TimeSeries[n + lc];
res_av = (double*)&TSMean[n];
res_var = (double*)&TSVar[n];
for (std::size_t i = ResultsDoubleOffsetStart /* skip over initial fields */; i < nf; i++)
{
    res_av[i] += res[i];
    res_var[i] += res[i] * res[i];
}

To be clear, in this code TimeSeries, TSMean, and TSVar are all arrays of Results. So we're forcibly casting all three of these to arrays of doubles, snipping off the fields at the "front" that are not actually doubles (or not meant to be aggregated), and then aggregating results "slot-by-slot" into the destination structures while completely ignoring the actual, you know, structure. This is insanity.

Also, your spider-senses should have been screaming when you saw them dividing by sizeof(double) on a structure that is not, in fact, entirely made up of doubles. This code assumes that a pointer is the same size as a double. Since they're compiling for x64, they happened to luck out: both are 8 bytes wide. On 32-bit, where pointers are only 4 bytes, this code would have broken. And given the code's demonstrated longevity, it'll probably still be getting recycled when we're running 128-bit computers with 16-byte pointers on our intergalactic starships, and it'll break again then.
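If nothing else, a one-line compile-time guard would at least make that layout assumption explicit and break the build on any platform where it doesn't hold. This is just an illustration (C11 syntax; in C++ it'd be static_assert), not something the repo contains:

C:
/* make the hidden layout assumption explicit: fail the build where it breaks */
_Static_assert(sizeof(double*) == sizeof(double),
    "Results aggregation assumes pointers and doubles are the same size");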

The RecordInfTypes snippet above, by the way, is already the fixed code - older revisions just treated the entire structure as doubles, trashed the pointers, and crashed.
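For the curious, here's a sketch of one way the aggregation could work without the "doubles only" restriction: list the averaged fields explicitly by offset instead of walking the whole struct as a double array. The struct here is cut down to a handful of the fields shown above, and kAveragedOffsets / accumulate are names I've invented for the example.

C:
#include <stddef.h>

struct Results
{
    double t;
    double** prevInf_age_adunit;     /* pointer fields no longer have to sit "in front" */
    double S, L, I, R, D, incC, incI, incR, incD;
};

static const size_t kAveragedOffsets[] = {
    offsetof(struct Results, S),    offsetof(struct Results, L),
    offsetof(struct Results, I),    offsetof(struct Results, R),
    offsetof(struct Results, D),    offsetof(struct Results, incC),
    offsetof(struct Results, incI), offsetof(struct Results, incR),
    offsetof(struct Results, incD),
};

static void accumulate(const struct Results* sample, struct Results* mean, struct Results* var)
{
    for (size_t k = 0; k < sizeof kAveragedOffsets / sizeof kAveragedOffsets[0]; k++)
    {
        double v = *(const double*)((const char*)sample + kAveragedOffsets[k]);
        *(double*)((char*)mean + kAveragedOffsets[k]) += v;
        *(double*)((char*)var  + kAveragedOffsets[k]) += v * v;
    }
}

Forgetting to add a new field to the list just means it doesn't get averaged - it can't silently corrupt a pointer, and nothing depends on pointers and doubles being the same size.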
 

Kosher Dill

Pumpkin Chips
True & Honest Fan
kiwifarms.net
Here's another episode that I hope people find entertaining. Let's talk about serialization!

If you look at binio.cpp, you see read and write functions that serialize a block of records by just writing the raw bytes out to disk.

The code will work OK... as long as you regenerate the files every time the data structures change, and never share them with someone running on a platform with a different endianness or int size. Now let me show you a truly astonishing use of it. Here's the writing side (snipped to the relevant parts):

C:
fwrite_big((void*) & (State.CellMemberArray), sizeof(int*), 1, dat);
fwrite_big((void*) & (State.CellSuscMemberArray), sizeof(int*), 1, dat);

zfwrite_big((void*)State.CellMemberArray, sizeof(int), (size_t)P.PopSize, dat);
zfwrite_big((void*)State.CellSuscMemberArray, sizeof(int), (size_t)P.PopSize, dat);

Got that? We're writing out the addresses (!!!) of a couple of arrays, and also their contents. So how do we read this?

C:
fread_big((void*)& CellMemberArray, sizeof(int*), 1, dat);
fread_big((void*)& CellSuscMemberArray, sizeof(int*), 1, dat);

CM_offset = State.CellMemberArray - CellMemberArray;
CSM_offset = State.CellSuscMemberArray - CellSuscMemberArray;

zfread_big((void*)State.CellMemberArray, sizeof(int), (size_t)P.PopSize, dat);
zfread_big((void*)State.CellSuscMemberArray, sizeof(int), (size_t)P.PopSize, dat);

for (i = 0; i < P.NC; i++)
{
    if (Cells[i].n > 0)
    {
        Cells[i].members += CM_offset;
        Cells[i].susceptible += CSM_offset;
        Cells[i].latent += CSM_offset;
        Cells[i].infected += CSM_offset;
    }
    for (j = 0; j < MAX_INTERVENTION_TYPES; j++) Cells[i].CurInterv[j] = -1; // turn interventions off in loaded image
}

In other words, we read the contents of the array into newly allocated memory. And then every pointer into the old array (those pointer values were serialized too) gets adjusted by the difference between the old array's address and the new one!

This is a truly disgusting hack. It works (again, assuming the platform and every tiny detail of the data remains the same), but they're playing with fire to save a tiny bit of setup/teardown time.
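For comparison, the less hair-raising approach is to save each pointer's position within its backing array as an index, and rebuild the pointers from the freshly allocated arrays on load. The function names and the int64_t choice here are illustrative, not from the repo:

C:
#include <stdint.h>
#include <stdio.h>

/* on save: store where the pointer points *within* the array, not its address */
static void write_member_index(FILE* dat, const int* member_ptr, const int* array_base)
{
    int64_t index = member_ptr - array_base;   /* stable across runs and platforms */
    fwrite(&index, sizeof index, 1, dat);
}

/* on load: rebuild the pointer from the new array plus the saved index */
static int* read_member_index(FILE* dat, int* array_base)
{
    int64_t index = 0;
    if (fread(&index, sizeof index, 1, dat) != 1) return NULL;
    return array_base + index;
}

No offset arithmetic on reload, no dependence on where malloc happened to put things, and the file stays valid even if the arrays move.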
 