Computer Model That Locked Down The World Turns Out To Be Sh*tcode

Yotsubaaa

Tenkobest
True & Honest Fan
kiwifarms.net
Yeah, there's nothing wrong with that per se. The issue seems to be that the team writing this code doesn't know the difference between good randomness (the kind you intentionally build in) and bad randomness (the kind that comes from your garbage code being buggy and completely insane).
This. The reason that intentional randomness is good (particularly for the statisticians and people that are using these models to make quantitative predictions in this case specifically) is that you know the probability distribution, so you know its biases (if any), and you can therefore say sensible things about the accuracy of your predictions.

I guess this highly paid government pro coder has never heard of switch-case syntax. Talk about adding unnecessary redundancy. I haven't seen the whole doc, but if this is truly the syntax, then that's just bad. I code in C quite a bit, and I have to say this is just poor taste.
Your guess is correct. Look at this shit:
wtf.png

Doesn't know about switches or enums, apparently.
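For the non-coders, the difference is roughly this. A made-up sketch with invented names (the actual CovidSim identifiers are in the screenshot, not here):

Code:
#include <cstdio>

// Invented names for illustration; the real CovidSim code differs.
enum class SetupMode { Random, Households, FromFile };

// With an enum and a switch, the magic numbers get names and the compiler
// can warn you when a case is missing - no if/else ladder required.
void DescribeSetup(SetupMode mode) {
    switch (mode) {
        case SetupMode::Random:     std::puts("seed infections at random");    break;
        case SetupMode::Households: std::puts("seed infections by household"); break;
        case SetupMode::FromFile:   std::puts("seed infections from a file");  break;
    }
}

int main() {
    DescribeSetup(SetupMode::Households);
}

Compare that with a hundred lines of "if (mode == 1) ... else if (mode == 2) ..." and you can see why people are laughing.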
 

MrBlueSocks

kiwifarms.net
On the site linked on the first page, where the ex-Google softie did the code review, most commenters understand and agree with the criticism. But a couple are still either deliberately or ignorantly misunderstanding or minimising the problems.
James
richard
In an epidemiological model, randomness is a feature, not a bug. The disease follows vectors probabilistically, not deterministically. This isn’t an email program or a database application, if the model always returned the same output for the same input that would be a bug. Prescribing deterministic behavior may prevent discovery of non-linear disease effects.
Way to miss the fucking point. This is such a weaselly politician answer that I suspect malice rather than ignorance. He's clearly not stupid, but he's trying to muddy the waters like a lawyer. "Oh, you're just complaining about the randomness, ...".


The original code is so appallingly bad that I don't know how to explain it to a lay person. They think you're just being a gatekeeper, or nitpicky. I end up either relying on argument from authority ("trust me"), or on repeated emphasis: "really really really shit. Really shit". It's not even teenage bedroom code. That has different characteristics, and isn't as deeply flawed.
I want to be able to explain to people just how staggeringly, fraudulently incompetent this is.
The results of this were believed by the government with nobody ever checking, even slightly, how they were arrived at. And the cost is measured in hundreds of billions. Yet on a normal procurement project they argue the toss for months about minutiae. Government in this country is run by lawyer types, not engineer types. They're all just blagging it.

archive of lockdown sceptic code review
 
Last edited:

Overly Serious

kiwifarms.net
Thanks for the replies. I haven't gone through the code, but if it's using randomness and repeated runs to get its predictions, is it fair to say that the program is some giant variation on:

for (int day = 0; day < numberOfDaysToSimulate; day++) {
    // each simulated day: add new infections from contacts, subtract deaths
    numberOfPeopleHaveCo19 += (numberOfPeopleHaveCo19 * AverageNumberOfContactsPerDay * chanceOfCatching)
                            - (numberOfPeopleHaveCo19 * chanceOfDeath);
}

I mean, obviously with a tonne more variables, but basically setting up starting conditions and then looping over and over, rolling the dice to see how many people get infected?
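Something like this, I mean, with the dice-rolling made explicit (completely made-up numbers and names, just the shape of the thing, not the real model):

Code:
#include <cstdio>
#include <random>

int main() {
    // Toy parameters, invented for illustration only.
    const int daysToSimulate = 30;
    const int contactsPerDay = 10;
    const double chanceOfCatching = 0.02, chanceOfDeath = 0.01;

    std::mt19937 rng(729101);  // fixed seed: rerunning should give identical output
    std::uniform_real_distribution<double> roll(0.0, 1.0);

    long infected = 100;
    for (int day = 0; day < daysToSimulate; day++) {
        long newCases = 0, deaths = 0;
        for (long i = 0; i < infected; i++) {
            for (int c = 0; c < contactsPerDay; c++)   // roll once per contact
                if (roll(rng) < chanceOfCatching) newCases++;
            if (roll(rng) < chanceOfDeath) deaths++;   // and once for death
        }
        infected += newCases - deaths;
    }
    std::printf("infected after %d days: %ld\n", daysToSimulate, infected);
}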

EDIT: Anyone actually tried running this themselves? I cloned the repo and managed to compile it and run it, but failed to get past working out seeds and parameter files, etc. Would be fun to actually run this through a few times and see just how much variation there is from one run to another.

SECOND EDIT: So I've now read the follow-up article, where it turns out Imperial College were lying about there being no significant changes between the version they released and the version they used. It had this interesting paragraph, which answered my question from earlier about randomness:

The Article said:
Imagine you want to explore the effects of some policy, like compulsory mask wearing. You change the code and rerun the model with the same seed as before. The number of projected deaths goes up rather than down. Is that because:


  • The simulation is telling you something important?
  • You made a coding error?
  • The operating system decided to check for updates at some critical moment, changing the thread scheduling, the consequent ordering of floating point additions and thus changing the results?

You have absolutely no idea what happened.
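The floating point bit sounds exotic but it's dead simple to demonstrate: addition on doubles isn't associative, so if threads combine partial sums in a different order you get a different answer. Trivial example, nothing to do with the model itself:

Code:
#include <cstdio>

int main() {
    // The same three numbers, grouped differently, give different doubles.
    // A thread scheduler that reorders partial sums does this at scale.
    double x = (0.1 + 0.2) + 0.3;
    double y = 0.1 + (0.2 + 0.3);
    std::printf("%.17g\n%.17g\n", x, y);  // 0.60000000000000009
                                          // 0.59999999999999998
}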
 
Last edited:

nw16613

kiwifarms.net
One of the more alarming things about the recent rise of machine learning is that models produced by machine learning are generally stochastic. For example, I evaluated an AI module (using TensorFlow) that would automate spine segmentation (i.e. identifying where the vertebrae are in spine MRIs automatically). You would train the model by sending in DICOM MRI images of spines paired with images with the vertebrae annotated. After training the model, you would then send in spine images without the annotations and see if the model could annotate the image automatically (e.g. drawing red over the vertebrae in the output images).

This actually worked pretty well - I would make a model, train it, and then send in novel spine images (e.g. from other spines, or by rotating/flipping other images) and see if the output image had appropriate annotations. This generally worked.

The big gotchas were:
a) Training the model and the output were stochastic. It would "generally" get it right, but starting from scratch with the same data, training would produce a different model every time.
b) Taking a trained model and moving it to a different computer would produce different behavior. Similarly, the same model would behave differently on a different GPU.
c) Even trivial differences like cuda versions would produce different results.

Now granted, the model in the article isn't a machine learning model - it looks like a bunch of very shitty code written by exceptional scientists under pressure. Machine learning produces output that has similar flaws, but because we can understand the scientists' code, we know it's shit, whereas if the model were produced by machine learning techniques, all the same criticisms about stochastic output and hardware-dependent models could be waved away entirely with "that's how machine learning works, dummy."

This should scare you because at least now, we know the models and logic are shit, but in the future, when machine learning is used for decision making, the computers will tell us to hide in our houses and no one will be able to understand why or validate the results.
Apparently Ferguson wrote the code 15 years ago or something. He said so himself on Twitter. So he can't use pressure as an excuse.
 

Pampered Degenerate

Smol but fierce
kiwifarms.net
How much of an issue is it that the results are "stochastic"? Monte Carlo simulations are valid and stochastic, aren't they? I understand (possibly incorrectly) that the idea is that you carry out sufficient iterations that you get a reliable average with some sort of confidence interval. Now, the fact that the results vary by CPU type is pretty worrying, but in principle isn't this approach valid?

The code itself looks really bad, though. And this is the cleaned-up version.
Just repeating what others have said, but to be clear, computer programs should NEVER produce random output. Monte Carlo techniques use (pseudo-)random number generators to sample from known probability distribution functions, but if you give the code the same set of "random" numbers (or alternatively, use the same seed to the same RNG), it should produce IDENTICAL results. The averaging in MC calculations is done over multiple DETERMINISTIC runs of the code, but starting from different initial configurations and/or different seeds, and the average values should be checked for convergence with respect to the number of simulations. Even then, the results are not absolutely guaranteed to be correct, even within the limits of the model, due to e.g. non-ergodicity.
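A trivial sketch of what that means in practice (standard-library RNG, nothing to do with CovidSim's own generator):

Code:
#include <cstdio>
#include <random>

// Sample mean of n draws from N(0, 1) using an explicitly seeded PRNG.
double MeanOfSamples(unsigned seed, int n) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += dist(rng);
    return sum / n;
}

int main() {
    // Same seed, same build: bit-for-bit identical results, every run.
    std::printf("%f %f\n", MeanOfSamples(42, 100000), MeanOfSamples(42, 100000));

    // The Monte Carlo average comes from many DETERMINISTIC runs with
    // different seeds, not from a program that is itself unpredictable.
    double avg = 0.0;
    for (unsigned seed = 0; seed < 10; seed++) avg += MeanOfSamples(seed, 100000);
    std::printf("average over seeds: %f\n", avg / 10);
}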

In short, producing predictions which are then used to decide whether or not to lock down millions of people should not be left to some self-serving academic useful idiot, but should be the result of the efforts of a team of statisticians, computer scientists and epidemiologists. On the other hand, I don't blame the government too much, because most people have an outdated view of academic integrity and don't know how far short of it modern academia falls. I can only hope that this shit show eventually leads to a greater degree of scrutiny, but I don't hold out much hope.
 
Last edited:

figbatdigger

kiwifarms.net
This. The reason that intentional randomness is good (particularly for the statisticians and people that are using these models to make quantitative predictions in this case specifically) is that you know the probability distribution, so you know its biases (if any), and you can therefore say sensible things about the accuracy of your predictions.


Your guess is correct. Look at this shit:
wtf.png
Doesn't know about switches or enums, apparently.
From what I'm reading, that function is initializing a global function pointer... Eww 🤮
They don't seem to know classes and virtual dispatch either.
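If anyone wants to see what "classes and virtual dispatch" would look like here, it's something like this (made-up names, a sketch of the idiom rather than a patch for their code):

Code:
#include <cstdio>
#include <memory>

// The idiomatic C++ replacement for "an if/else ladder that assigns a
// mutable global function pointer": pick a strategy object once, pass it around.
struct SeedingStrategy {
    virtual ~SeedingStrategy() = default;
    virtual void Seed() const = 0;
};

struct RandomSeeding : SeedingStrategy {
    void Seed() const override { std::puts("seed infections at random"); }
};

struct HouseholdSeeding : SeedingStrategy {
    void Seed() const override { std::puts("seed infections by household"); }
};

int main() {
    std::unique_ptr<SeedingStrategy> strategy = std::make_unique<HouseholdSeeding>();
    strategy->Seed();  // dispatched through the vtable, no global state involved
}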
 

Overly Serious

kiwifarms.net
Just repeating what others have said, but to be clear, computer programs should NEVER produce random output. Monte Carlo techniques use (pseudo-)random number generators to sample from known probability distribution functions, but if you give the code the same set of "random" numbers (or alternatively, use the same seed to the same RNG), it should produce IDENTICAL results. The averaging in MC calculations is done over multiple DETERMINISTIC runs of the code, but starting from different initial configurations and/or different seeds, and the average values should be checked for convergence with respect to the number of simulations. Even then, the results are not absolutely guaranteed to be correct, even within the limits of the model, due to e.g. non-ergodicity.

In short, producing predictions which are then used to decide whether or not to lock down millions of people should not be left to some self-serving academic useful idiot, but should be the result of the efforts of a team of statisticians, computer scientists and epidemiologists. On the other hand, I don't blame the government too much, because most people have an outdated view of academic integrity and don't know how far short of it modern academia falls. I can only hope that this shit show eventually leads to a greater degree of scrutiny, but I don't hold out much hope.
I'm still trying to get this thing to run. Admittedly I've only spent about half an hour on it. I've got it to compile. The .exe runs and I am passing in the command-line parameters I think it wants, but it just keeps spitting out a usage message at me. I'm trying to get it to run in such a way that I can step through the code and see why, but no luck as yet. I don't really have any experience with C++ on Windows. I could probably get it running on Linux, but I don't have a box with the requisite RAM right now. I'm sure it's some trivial mistake.

Anyway, what I was posting to say is that the program does take in seed numbers. If I can get it working I can see if the results really do vary between runs for the same seeds. But I don't have time to figure this out in depth. Anyone else trying to run it?
 

evrae

& KNUCKLES
True & Honest Fan
kiwifarms.net
The majority of scientists can't do statistics properly, either.
This is something I can confirm, the institution I work at has dedicated statisticians purely for this reason (who at the introductory meeting with them literally said "this is our job, we are trained exactly for this, take your data gathering to us to make sure you're handling it correctly.").

It's completely horrifying to see something with such political influence have such terrible code; it's beyond fraudulent. The fact that it took so long to make this public is beyond me, and it's why I have the mindset of including everything in publications. Programs are an extension of your methods: people need them to replicate your work and judge whether it's scientifically competent. People should have access to your programs and your raw data, just so they can be confident you're not a goddamn hack, but so many people seem so adamantly against it.
Then these people wonder why the public is losing faith in science.
 

Yotsubaaa

Tenkobest
True & Honest Fan
kiwifarms.net
Anyone else trying to run it?
Yeah I cloned the repo just now and started having a play. It's doing something, I guess? (I picked Sweden for the test data country for laughs.)
iguess.png

I dunno, we'll see. I'm not running this on a particularly high-spec machine.

EDIT: S'done. They had some R scripts that helpfully plotted the results:
WeeklyIncidenceDeathsbyAgeGroup.png WeeklyIncidenceCasesbyAgeGroup.png Death_Incidence.png Death_Cumulative Incidence.png

🤷‍♀️


This is something I can confirm, the institution I work at has dedicated statisticians purely for this reason (who at the introductory meeting with them literally said "this is our job, we are trained exactly for this, take your data gathering to us to make sure you're handling it correctly.").

It's completely horrifying to see something with such political influence have such terrible code; it's beyond fraudulent. The fact that it took so long to make this public is beyond me, and it's why I have the mindset of including everything in publications.

Programs are an extension of your methods: people need them to replicate your work and judge whether it's scientifically competent. People should have access to your programs and your raw data, just so they can be confident you're not a goddamn hack, but so many people seem so adamantly against it.
Then these people wonder why the public is losing faith in science.
I second all of your points about statistics and coding with regard to science/academia (I can confirm them for much the same reasons you can). I did want to hammer on one thing:
It's completely horrifying to see something with such political influence have such terrible code; it's beyond fraudulent. The fact that it took so long to make this public is beyond me, and it's why I have the mindset of including everything in publications.
It's worth reminding everyone that we're not even seeing the code that they ran. They still have yet to reveal that (if ever). What we're seeing is the cleaned-up version. Yes: the code we've been making fun of all thread is the result of people actually going through the original code and 'fixing' it so it 'didn't look like trash'.
 
Last edited:

Pampered Degenerate

Smol but fierce
kiwifarms.net
This is something I can confirm, the institution I work at has dedicated statisticians purely for this reason (who at the introductory meeting with them literally said "this is our job, we are trained exactly for this, take your data gathering to us to make sure you're handling it correctly.").

It's completely horrifying to see something with such political influence have such terrible code; it's beyond fraudulent. The fact that it took so long to make this public is beyond me, and it's why I have the mindset of including everything in publications. Programs are an extension of your methods: people need them to replicate your work and judge whether it's scientifically competent. People should have access to your programs and your raw data, just so they can be confident you're not a goddamn hack, but so many people seem so adamantly against it.
Then these people wonder why the public is losing faith in science.
The problem is that often your code is effectively your IP, that you've spent years developing and then want to be able to exploit, like an inventor. I can see both sides of the argument, but don't really know what the solution would be.
 

evrae

& KNUCKLES
True & Honest Fan
kiwifarms.net
It's worth reminding everyone that we're not even seeing the code that they ran. They still have yet to reveal that (if ever). What we're seeing is the cleaned-up version. Yes: the code we've been making fun of all thread is the result of people actually going through the original code and 'fixing' it so it 'didn't look like trash'.
Jesus Christ, I dread to think what the actual version looks like. If there are bugs and spaghetti being found in this code, thinking about what degree of spaghetti the original has is horrifying.
Why did I go into academia again? I should've gone into medicine; at least then I'd have the money to afford an alcohol addiction.

The problem is that often your code is effectively your IP, that you've spent years developing and then want to be able to exploit, like an inventor. I can see both sides of the argument, but don't really know what the solution would be.
I suppose so, but I guess you could still publish the source under a restrictive licence - that way people would at least be able to discover your bullshit.
(Or you could always just get it audited, I guess.)
 

Gustav Schuchardt

Trans exclusionary radical feminazi.
True & Honest Fan
kiwifarms.net
I was vaguely surprised that this is allowed - since when can you allocate that much on the stack?
If you're running on Windows the default stack is 1 MB, though you can increase it with a linker switch (/STACK in MSVC's linker). TBH I've worked on worse code than the code you quote. I've written it occasionally too, though most of it didn't survive into production. But then nothing I've ever written was used as the sole justification for economic changes that shut down the US and UK economies.
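To make the stack point concrete, a toy example (nothing from the model itself):

Code:
#include <vector>

int main() {
    // double big[2000000];             // ~16 MB on the stack: blows straight
                                        // through the default 1 MB Windows stack
    std::vector<double> big(2000000);   // the same 16 MB on the heap: fine
    big[0] = 1.0;
    return static_cast<int>(big[0]);
}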

A wise man once said that correct code is 'fractally correct', i.e. the design, the comments and the implementation are all good. The inverse holds too: when you see code like this, implemented in a shoddy way, you have to wonder whether that shoddiness extends into other, more important areas, like the assumptions they fed into the model to get the results. What's insane is how much money has been lost due to this particular piece of hack code, and the total lack of criticism and peer review it went through before that happened.

And if you look at Ferguson's previous predictions, which wildly overestimated the death tolls of previous pandemics, it certainly seems likely that is the case.

It's not just the hack code, it's much worse than that.
 

evrae

& KNUCKLES
True & Honest Fan
kiwifarms.net
If you're running on Windows the default stack is 1 MB, though you can increase it with a linker switch (/STACK in MSVC's linker). TBH I've worked on worse code than the code you quote. I've written it occasionally too, though most of it didn't survive into production. But then nothing I've ever written was used as the sole justification for economic changes that shut down the US and UK economies.

A wise man once said that correct code is 'fractally correct', i.e. the design, the comments and the implementation are all good. The inverse holds too: when you see code like this, implemented in a shoddy way, you have to wonder whether that shoddiness extends into other, more important areas, like the assumptions they fed into the model to get the results. What's insane is how much money has been lost due to this particular piece of hack code, and the total lack of criticism and peer review it went through before that happened.

And if you look at Ferguson's previous predictions, which wildly overestimated the death tolls of previous pandemics, it certainly seems likely that is the case.

It's not just the hack code, it's much worse than that.
Wait, this guy has made predictions for previous pandemics too?
I'm assuming that'd be for swine flu, Ebola, and SARS among others, considering this code is apparently about 15 years old?
And they were all also wildly overestimated?
Why did anyone listen to this moron? He sounds like someone who tries to big up shit for the sake of getting funding. What a fucking hack.
 

MrBlueSocks

kiwifarms.net
SECOND EDIT: So I've now read the follow-up article, where it turns out Imperial College were lying about there being no significant changes between the version they released and the version they used. It had this interesting paragraph, which answered my question from earlier about randomness:
In case anyone else missed this (like I did), there is a second analysis article by the same author (archive). Worth reading.
Imperial clearly lied about the changes.
Glimmer of hope: Conservative MPs have read this analysis of the code, and apparently one MP actually knows what he's looking at.

Political attention. I was glad to see the analysis was read by members of Parliament. In particular, via David Davis MP the work was seen by Steve Baker – one of the few British MPs who has been a working software engineer. Baker’s assessment was similar to that of most programmers: “David Davis is right. As a software engineer, I am appalled. Read this now”. Hopefully at some point the right questions will be asked in Parliament. They should focus on reforming how code is used in academia in general, as the issue is structural incentives rather than a single team. The next paragraph will demonstrate that.
Twitter of Steve Baker MP
ETA:
archive of Twitter. The follow-up tweets are broadly supportive, except for a few #FBPE spastics screeching "NO, YOU'RE APPALLING!"


This is something I can confirm, the institution I work at has dedicated statisticians purely for this reason (who at the introductory meeting with them literally said "this is our job, we are trained exactly for this, take your data gathering to us to make sure you're handling it correctly.").
Statistics can seem obvious / common sense etc., but it's full of gotchas and thinking errors that will catch out a non-expert.
It's like hearing "We designed and implemented our own cryptography algorithms, for extra security".
 
Last edited:

Overly Serious

kiwifarms.net
I dunno, we'll see. I'm not running this on a particularly high-spec machine.
I have a fairly high-spec machine and would like to try it. It's compiled, it executes, but it just spits out a usage message at me. Can you help me out, maybe just copy in the command line you're using to run this? I don't have a lot of experience with cmake or Visual Studio, and what experience I do have is all small C# applications. The below is the sort of thing I think it should be, but it gets me nowhere.

Code:
.\CovidSim.exe /PP:preUK_R0-2.0.txt /A:United_Kingdom_admin.txt /c:8 /P:p_NoInt.txt /O:testrun /D:wpop_eur.txt  98798150 729101 Run 17389101 4797132
I mostly worked out the above by looking at the Python test script that was included. (And which I 90% guarantee was added by one of the after-the-fact outsiders who helped them clean it up for release.)

The problem is that often your code is effectively your IP, that you've spent years developing and then want to be able to exploit, like an inventor. I can see both sides of the argument, but don't really know what the solution would be.
That may be true as a general point, and where to draw the line is hard to decide. But in this case it's not like there's a clearly separate model we've seen, plus something that might be a proprietary implementation containing IP. It appears to just be a simple script which might actually be the model.
 

Yotsubaaa

Tenkobest
True & Honest Fan
kiwifarms.net
Ah, I just used the Python scripts they provided to do everything. Apparently during the build the executable needs to be configured for the specific country you're running it on?

Anyway, this is literally all I typed at the command line to get it to work (from the /data directory):
./run_sample.py Sweden

I haven't tried messing around with RNG seeds just yet, but I imagine I want to just make a parameter file or whatever it seems to want? Actually I suppose I can just modify these magic numbers from the Python script (everywhere they appear):
seed.png


EDIT: @Overly Serious why have you got that "Run" bit in your command? Isn't it just the four numbers that you need?

EDIT2: Yeah @Overly Serious I got the Python script to just spit out the command it was using (without actually running anything) and here they are:

No intervention: Sweden NoInt 3.0
./CovidSim /c:8 /A:/home/#####/git/covid-sim/data/admin_units/Sweden_admin.txt /PP:/home/#####/git/covid-sim/data/param_files/preUK_R0=2.0.txt /P:/home/#####/git/covid-sim/data/param_files/p_NoInt.txt /O:/home/#####/git/covid-sim/data/Sweden_NoInt_R0=3.0 /D:/home/#####/git/covid-sim/data/wpop_eur.txt /M:/home/#####/git/covid-sim/data/Sweden_pop_density.bin /S:/home/#####/git/covid-sim/data/Network_Sweden_T8_R3.0.bin /R:1.5 98798150 729101 17389101 4797132

Intervention: Sweden PC7_CI_HQ_SD 3.0
./CovidSim /c:8 /A:/home/#####/git/covid-sim/data/admin_units/Sweden_admin.txt /PP:/home/#####/git/covid-sim/data/param_files/preUK_R0=2.0.txt /P:/home/#####/git/covid-sim/data/param_files/p_PC7_CI_HQ_SD.txt /O:/home/#####/git/covid-sim/data/Sweden_PC7_CI_HQ_SD_R0=3.0 /D:/home/#####/git/covid-sim/data/Sweden_pop_density.bin /L:/home/#####/git/covid-sim/data/Network_Sweden_T8_R3.0.bin /R:1.5 98798150 729101 17389101 4797132

Maybe that's helpful to you? e.g. you don't need the "Run", but you do need to fully specify where the parameter/etc. files are.
 
Last edited:

Pampered Degenerate

Smol but fierce
kiwifarms.net
That may be true as a general point and where to draw the line hard to decide. But in this case it's not like there's a clear separate model we've seen and then something that might be a proprietary implementation containing IP. It appears to just be a simple script which might actually be the model.
I didn't mean IP in a literal / strictly legal sense. At least in principle, everything necessary to produce an implementation and reproduce the results should already be available in the literature, although more often than not it isn't. I meant that if academics start being compelled to give away their code "for free", then the innovators will lose out to competitors who can just use it to publish lots of papers (i.e. academic output) rapidly. It's basically a similar situation to first-world companies developing tech which they hand to Chinese factories, who then pump it out at a rate and cost with which the developers can't compete.
 