A push to calculate a 'genetic income score' using giant DNA databases raises a raft of ethical questions.
The UK Biobank is the single largest public genetic repository in the world, with samples of the genetic blueprints of half a million Brits standing by for scientific study. But when David Hill, a statistical geneticist at the University of Edinburgh, went poring over that data, he wasn’t looking for a cure for cancer or deeper insights into the biology of aging. Nothing like that. He was trying to figure out why some people make more money than others.
Along with a team of European collaborators, Hill sifted through the UK Biobank data to find about 286,000 participants who had answered a survey question about household income. Using that information, they conducted something called a Genome-Wide Association Study, in which they looked at 18 million places in the genome to see which ones matched up with higher paychecks. They uncovered about 30, which account for 7.4 percent of household income variation across the United Kingdom. (For some context, another way of viewing the results is to say that 92.6 percent of the variation in income is explained by factors other than genetics.) Based on some of his prior work, Hill noticed that many of the genetic differences overlapped with areas known to be associated with intelligence, and when he mapped them out they were largely expressed in the brain.
His team then used these regions to compute a polygenic score, a genetic calculation that predicts a person’s odds of reaching a certain outcome—of, say, developing diabetes or earning six figures. It didn’t perform particularly well, correctly forecasting only 2.5 percent of the differences in income in an independent sample of Scots. “Your DNA will not print you money,” says Hill. But he’s relieved to have found some small effect. “If you’re born with a predisposition for certain traits or abilities, and none of them counted in any way, shape, or form towards your income, then you’d have a profoundly unfair society, in my opinion,” he says.
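For readers curious about the arithmetic, here is a minimal sketch of what a polygenic score and "percent of differences explained" mean in practice. It is not the Hill team's pipeline: the data are simulated, the function names `polygenic_score` and `variance_explained` are hypothetical, and the effect sizes are random numbers chosen only so that the score explains a small share of the outcome.

```python
import numpy as np


def polygenic_score(genotypes, effect_sizes):
    """Weighted sum of allele counts (0, 1, or 2 copies per variant)."""
    return genotypes @ effect_sizes


def variance_explained(predicted, observed):
    """Squared Pearson correlation between a score and an outcome."""
    r = np.corrcoef(predicted, observed)[0, 1]
    return r ** 2


# Simulated cohort: every number below is an assumption for illustration.
rng = np.random.default_rng(0)
n_people, n_variants = 5000, 30

# Each person carries 0, 1, or 2 copies of each variant.
genotypes = rng.binomial(2, 0.3, size=(n_people, n_variants)).astype(float)

# Tiny per-variant effects, as a GWAS would estimate them.
effect_sizes = rng.normal(0.0, 0.05, size=n_variants)

scores = polygenic_score(genotypes, effect_sizes)

# The outcome is the score plus a much larger dose of everything else.
outcome = scores + rng.normal(0.0, 1.0, size=n_people)

r2 = variance_explained(scores, outcome)
print(f"Share of outcome variation explained by the score: {r2:.1%}")
```

With effects this small, the score captures only a sliver of the spread in the simulated outcome, which is why a figure like 2.5 percent says little about any one person.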
Hill and like-minded colleagues are working on a science they call sociogenomics. And bolstered by a global boom in biobanking, they have more data than ever before to probe connections between people’s DNA and their socioeconomic circumstances. A “genetic income score” could allow economists and epidemiologists to more precisely investigate fundamental questions about inequality. Policymakers might incorporate this information to better evaluate the social programs intended to pull people out of cycles of poverty. In some places, it could be spun as a powerful argument for radical resource redistribution.
Then there are the dystopian outcomes. Prospective employers could ask you to submit your genetic income score as part of a job application. Health and life insurers could use it to calculate your premiums. Social programs might use it as a disqualifying criterion for receiving benefits. Apps like the ones that prevent you from accidentally dating a relative could help you pair up with those genetically inclined toward prosperity. IVF clinics could incorporate it into their genetic screening procedures so parents can choose the highest-earning embryos in addition to the healthiest ones. For every opening to use such information to create a more fair and just society, there exist in equal measure opportunities to weaponize it to exacerbate existing inequalities or create new ones.
Hill’s unpublished research, posted to the preprint server bioRxiv in mid-March and currently under review, is not yet the stuff of financial fortune-telling. But other, bigger efforts to increase the accuracy of genetic income scores are already underway.
“We’ve been shying away from looking at income for a very long time, for a number of reasons,” says Philipp Koellinger, an economist at the Vrije Universiteit in Amsterdam, where he studies the genetics of behavior. Looking at the molecular architecture of money-making has a lot of potential to be misinterpreted or abused, he says, especially by fringe groups who might latch on to sociogenomic research as support for racist notions of a hierarchy of human worth. Despite its new name and new software packages, the emerging field of sociogenomics will forever be entangled with the long, dark history of the statistical tools that serve as its foundation, tools invented by some of the grandfathers of American eugenics. (For more on this, I’d suggest Carl Zimmer’s excellent book on the science of inheritance.)
But, says Koellinger, the main reason researchers haven’t done those studies has been the inadequacy of the data sets. Cheap DNA sequencing and a burst of precision health mega-projects like the UK Biobank, BGI’s Million Chinese Genomes project, and All of Us in the US are changing that. “Those ethical problems are still large and looming out there,” says Koellinger, who worries most that companies will commercialize his results into genetic soothsayers, which he describes as “very misguided.” “But we’re at a point where so much data is available that someone will do it eventually, so we might as well be the ones to do it first with all the experience we’ve collected over the years.”
Back in 2011, Koellinger founded the Social Science Genetic Association Consortium, along with Daniel Benjamin and David Cesarini, to convince researchers from around the world to pool their data so they’d have enough to run a Genome-Wide Association Study, or GWAS. The hope was to tease apart genetic effects from environmental ones for particular traits. The first thing the SSGAC looked into was how long people stay in school. Now, Koellinger is tapping the same network to assemble the largest-ever study linking DNA to dollar signs.
So far, he’s recruited 22 cohorts with more than 800,000 individuals. Some of them include biobanks in Scandinavian countries that also keep public registries of their citizens’ tax returns. Others are less granular. The UK Biobank, for example, includes only household incomes, and it lumps them into five buckets. To get around that, Koellinger’s team developed an algorithm that estimates a person’s income using his or her occupation, age, sex, and housing type. Using this method, and a bigger sample, Koellinger hopes to eliminate some of the noisiness of the Hill team's analysis, to arrive at a more powerful genetic predictor of income. The project is still in its early phases but could have initial results as soon as later this year.
Of course, limitations still abound. The world’s genetic repositories overwhelmingly contain data from people of European descent. So any polygenic scores derived from them are likely to have less utility in nonwhite populations. Still, Koellinger imagines that the genetic income scores they’ll generate will be very valuable to economists trying to uncover how DNA influences the trajectories of people’s lives—the degrees they get, the jobs they work, the money they spend (or save). But other researchers warn that results from such methods should be interpreted cautiously. “It’s tricky, because right now we’re much better at identifying genetic variations and building polygenic scores than we are at understanding the causal underpinnings,” says Graham Coop, an evolutionary biologist at UC Davis. “So while you can use them to control for something about genetics, the issue is you don’t know exactly what you’re controlling for.”
Unlike other fields of science that involve designing experiments to collect data and test hypotheses, sociogenomics doesn’t bother with either. Instead, using data that’s been collected elsewhere, behavioral geneticists unleash statistical algorithms to devour it and spit out correlations—not causation—between minute variations in the DNA and any trait that might be of interest.
And when you have no visibility into whether those correlations implicate innate biology (neurons that fire faster, say) or social stratification (discrimination based on race, sex, religion, etc.), you can make mistakes. Coop knows this from first-hand experience. Earlier this year he and a group of collaborators set out to replicate an interesting finding in his field—that DNA is the reason height varies so drastically across Europe, from the warm southern Mediterranean latitudes up to the sub-Arctic Nordic ones. Starting in 2012, a series of papers used polygenic scores to show for the first time natural selection in action for a complex trait like height. But when Coop’s team looked for the same signal in the UK Biobank (a much bigger sample), the effect disappeared. “We thought it was a solid case, we’d just tick the box on replication, and instead it came crashing down,” says Coop. “The lesson was that there are all these subtle biases that can creep in when you calculate polygenic scores, and those biases compound over thousands of calculations.” If that can happen with a trait like height, which is about 80 percent heritable and has been studied for well over half a century, who knows what pitfalls await attempts to do the same thing for income?
Koellinger is taking these concerns, along with the ethical ones, seriously. He plans to invite his most vocal critics to collaborate with him on the new GWAS project, in the hope that by involving them in communicating the results he can head off any misinformed or malicious interpretations. “We have a scientific responsibility to not just let that happen but to actively try to steer the process,” he says. But he admits that without some form of regulation, writing FAQs and giving interviews can only go so far. It doesn’t take long for companies to turn polygenic scores for behavioral traits into products, like DNA tests for academic achievement or genetic embryo scans for low intelligence. “As soon as our results are in the public domain, we have very little control over what people are going to do with it.”
Hill thinks the answer isn’t regulation so much as education, so consumers know enough not to be taken in by polygenic-score-powered products promising dubious individual predictions. But if that’s the case, his own FAQ could use a little work. Nowhere does it mention that a genetic income score that explains only 2.5 percent of variation is not remotely accurate at predicting how much money an individual will make. When I asked him about the notable absence, he said, “I don’t believe there’s anyone out there today who honestly thinks a polygenic score is a good predictor on an individual level at this stage”—but he admitted that it might be good to say so explicitly.
Like other sociogenomics researchers, Hill believes that his work raises the prospect of personalized social policy. In the same way that personalized medicine aims to identify individuals with a predisposition for a disease, and then give them tools to prevent it from ever manifesting, you could do the same thing with social interventions, he says. If you could compute a genetic income score or genetic education score for kids at a young age, you could change their environment before they start struggling.
That might sound good at first, but if the goal is a more just society, policymakers are supposed to work without knowledge of things like socioeconomic status, sex, race, personality, talents, and especially genes. The idea is that having such knowledge automatically leads to discrimination. Sure, DNA sequencing might not have been around when the influential political philosopher John Rawls articulated this idea in his “veil of ignorance” thought experiment, but a person’s genetic source code certainly would have qualified as information better left in the dark. As researchers like Hill and Koellinger move forward with mining the world’s DNA deposits, policymakers will soon have to decide on which side of the veil polygenic scores for things like income, education, and intelligence belong.