The last 2 decades have seen the collection of increasingly rich bodies of genetic data, opening up a multitude of new research avenues. These data have made undeniable contributions to health research, but have they equally contributed to the population sciences? This article critically surveys the assumptions underlying sociogenomics, a new field that explores the links between genetics and human behaviour. While sociogenomics as such is new, these links have long been debated, as seen for example in the writings of Alfred Jacquard and Daniel Courgeau.
1 In the past decade and a half, sociologists and demographers have sought to integrate genetic data into their empirical analyses. To do so, they have drawn on recently developed high-throughput sequencing and genotyping technologies*, [1] which allow the entire genome to be mapped. They also follow in the line of a research specialty, behavioural genetics, which rose to prominence in the 1970s. This area, which focused notably on the genetic determination of intelligence, attracted severe criticisms, including among demographers (Jacquard, 1978; Vetta and Courgeau, 2003; Courgeau, 2017). However, these criticisms do not always seem to have been heard, and the emergence of new data and techniques has given rise to new problems, as indicated by calls for caution from learned societies in human genetics (Société française de génétique humaine, 2010; Risch, 2016; Société française de génétique humaine, 2018).
2 This article extends this critical work. It begins with a look back at the development of sociogenomics, understood here as the combination of sociology/demography and genetics (Section I). Section II examines the limitations of the theoretical concepts and models used in sociogenomics, while Section III focuses on the limitations of the data and methods of analysis used in the field. Section IV presents a provisional assessment of sociogenomic research to date and its contributions to sociology and demography. Finally, the conclusion examines the factors that have favoured the development of this research specialty, despite its considerable scientific weaknesses (Section V).
I – From the origins of genetics to sociogenomics
3 The birth of genetics is often associated with the work of Francis Galton and Gregor Mendel in the 19th century. Later, the article ‘The Correlation Between Relatives on the Supposition of Mendelian Inheritance’, published by Ronald Fisher in 1918, marked a decisive step in the development of quantitative genetics. [2] In it, Fisher laid down the foundations of the ‘additive polygenic’ model, according to which variations in a phenotype* are the result of the sum of the effects of a large number of genes—each of which has an infinitesimal effect—and of environmental factors (Section II.2). In mathematical terms, Fisher proposed the decomposition of variance in the phenotype into the sum of variance due to genes and variance due to the environment. [3] He thereby opened up the possibility of calculating the ‘heritability’ of a phenotype, defined as the proportion of phenotypic variance attributable to variance in genes (Section II.1).
4 Understanding how genes contribute to complex traits has been the focus of a great deal of research since, and remains a major question today. The applications of quantitative genetic methods to human beings expanded rapidly beginning in the 1970s, in medicine but also through a new research specialty, behaviour(al) genetics (Panofsky, 2011, 2014), whose aim is to study the role of individuals’ genetic heritage in social behaviour. Many psychologists took an interest in behavioural genetics, which seeks to quantify the heritability of characteristics like personality traits, social attitudes, and mental illnesses such as schizophrenia, but also intelligence (via the ‘intelligence quotient’, or IQ). More recently, other social science disciplines have taken up the study of the heritability of behavioural traits*. Research currents have been developed in criminology (biosocial criminology; Larregue, 2016, 2017, 2018a), political science (genopolitics; Larregue, 2018b), and economics (genoeconomics; Benjamin et al., 2012), with the study of traits as diverse as delinquency, electoral behaviour, and income.
5 The 2000s saw the rapid development of high-throughput sequencing and genotyping technologies. This made it possible to simultaneously study a large number of genetic markers* in a large number of individuals at a quickly decreasing cost, leading to the development of new data sources and analytical techniques. It was in this context of rapid technological and statistical progress that the combination of genetics and sociology (or demography) emerged and spread, beginning in the late 2000s. When the term sociogenomics was first used, it referred to the study of ‘social life in molecular terms’, as applied to both humans and animals (Robinson et al., 2005). In her book Social by Nature: The Promise and Peril of Sociogenomics (2018), Catherine Bliss used this term (or, more often, social genomics) to refer to the application of genomic methods to social science research in order to identify the genetic causes of social phenomena (excepting illnesses). This article adopts the narrower definition proposed by Mills and Tropf (2020), which restricts the disciplinary perimeter to sociology. [4] A few dozen researchers present their work under the banner of sociogenomics, mainly in the United States and the United Kingdom; among them, around 15 are particularly active. Their empirical research is currently focused on a relatively small number of topics: principally level of education, [5] social mobility, and fertility, along with some high-risk behaviours (smoking, alcoholism, multiple sexual partners).
6 Only a handful of research articles in the area were published in the 2000s, but the number grew considerably over the following decade, probably due to the development of new tools such as polygenic risk scores (Section III.4). Many found an outlet in genetics and/or biology journals. Some, however, were published in major journals in sociology—with a special issue in the American Journal of Sociology in 2008, occasional single articles in the AJS as well as in the American Sociological Review, Social Forces, the Journal of Marriage & Family, Social Science and Medicine, and Sociological Science—and demography (Demography and Population Studies). Sessions were also held on the subject at the annual meeting of the Population Association of America. Sociogenomicists have also sometimes been awarded substantial funding, such as the Sociogenome project [6] led by Melinda Mills at the University of Oxford, which has received large European research grants (BSA, 2017). While few sociogenomicists occupy central positions in sociology, there are a few exceptions, such as Dalton Conley (Princeton), Melinda Mills (Oxford), Michael Shanahan (Zurich), and Jeremy Freese (Stanford). And the specialty is attracting growing numbers of young researchers.
7 Taken together, the arguments of these ‘genetics entrepreneurs’ (Shostak and Beckfield, 2015, p. 98) in defence of the integration of genetics and sociology/demography set out a kind of research programme, whose structure can be described in terms of a set of axes (Section IV). First, on a very general level, sociogenomics must deepen our understanding of social phenomena. ‘That still adds an incredibly large piece of the puzzle’ (Mills in BSA, 2017, p. 17), which will allow us ‘to learn something deeper about the social structures we live in and the mechanisms that give rise to those structures’ (Bearman, 2008, p. x). This implies breaking away from the Durkheimian conception of social facts (explaining the social by the social), considered an obstacle (Udry, 2000), in order to ‘re-own’ biology (Fuller in BSA, 2017, p. 20). More concretely, one of the avenues most explored by sociogenomicists is the question of gene–environment interactions, i.e. the ways environmental effects are mediated by genetic predispositions, and vice versa. Moreover, they argue that the addition of variables representing individuals’ genetic characteristics to statistical regression analyses could produce more robust, less biased econometric models. Finally, they suggest that knowledge of the genetic predispositions of individuals could, in certain cases, guide public policies.
II – The limitations of concepts and theoretical models in sociogenomics
8 Sociogenomics research very often draws on a concept—heritability—and is systematically based on a theoretical framework—the additive polygenic model. Both are widely disputed and subject to several criticisms.
1 – Heritability
9 The concept of heritability is ubiquitous in sociogenomics. Typically, either the heritability of a social phenomenon is directly measured, or measurements of heritability in previous studies serve as an argument for including genetic data in analyses.
10 The term’s origin is difficult to retrace. It is often attributed to Lush and his book Animal Breeding Plans (1937). But the mathematical formulation of heritability was produced earlier, by Ronald Fisher in 1918. Fisher proposed that the variance of a phenotype (P) could be decomposed into the sum of the variance due to the genes (G) and the variance due to the environment (E). He emphasized the importance of the ratio var(G)/var(P), i.e. the portion of phenotypic variability attributable to genetic variability. [7] This is precisely what we now call heritability.
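As a minimal numerical sketch of this ratio, the following Python snippet builds a phenotype as P = G + E and computes var(G)/var(P). The values are invented for illustration and chosen so that G and E are exactly uncorrelated; the function name `heritability_ratio` is hypothetical.

```python
import statistics

def heritability_ratio(genetic, environmental):
    """Return var(G) / var(P), where each phenotype is P = G + E."""
    phenotype = [g + e for g, e in zip(genetic, environmental)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)

# Invented toy values, chosen so that G and E are exactly uncorrelated.
G = [-1, 1, -1, 1]    # var(G) = 1
E = [-2, -2, 2, 2]    # var(E) = 4
h2 = heritability_ratio(G, E)   # var(P) = 1 + 4 = 5, so the ratio is 0.2
```

The decomposition var(P) = var(G) + var(E) only holds here because the covariance between G and E is zero by construction, which is precisely one of the assumptions discussed in Section II.2.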
11 Fisher’s model was first applied to measurable human physical traits, such as height or weight, but also to another quantitative trait: IQ. The heritability of IQ became a matter of fierce debate beginning in the late 1960s, especially after the publication of the work of Jensen (1969). Using IQ data collected in various studies, Jensen estimated that IQ was around 80% heritable. He concluded that the origin of differences in intelligence between social groups is largely genetic and that educational policies aimed at reducing inequalities would thus be ineffective. This type of reasoning was notoriously taken up later in Herrnstein and Murray’s The Bell Curve (1994) and in the research of the eminent psychologist Robert Plomin (2018). On the other hand, research on the heritability of IQ has been subject to numerous theoretical, methodological, moral, and political criticisms (Jacquard, 1978; Kempthorne, 1978; Lewontin et al., 1984).
12 The literature on the heritability of cognitive, cultural, and social traits has ballooned since the 1970s, notably due to the development of behavioural genetics. There has also been considerable interest in estimating the heritability of various common human diseases. For example, many studies have sought to understand the contribution of genes to diabetes (Génin and Clerget-Darpoux, 2015b).
13 Sociogenomicists also quantify the heritability of traits. For example, using data on twin girls in the United Kingdom, Tropf et al. (2015) concluded that 26% of the variation in age at first childbirth is explained by genetic predispositions, 14% by siblings’ shared environment, and 60% by the non-shared environment or measurement errors. In another example, Baier and Van Winkle (2020) found that the heritability of school performance is lower among children whose parents are separated, and concluded that educational policies could specifically target such children to help them fulfil their genetic potential.
14 But regardless of the discipline, studies using measures of heritability often suffer from misuses and erroneous interpretations.
Population versus individual
15 First, the concept of heritability concerns a population, but it is often confused with the individual concept of heredity, i.e. the degree to which a trait of a given individual is caused by genetic factors. The variability of a trait in a population is not equivalent to its determination. One focuses on variations in values, the other on the values themselves. In one case, the measurement concerns a population, and in the other an individual. Consequently, a trait can be hereditary and yet have zero heritability. For example, mammals have two ears, a hereditary trait because it is controlled by genes and transmitted over generations. But the origin of the very low variability in the number of ears between individuals is accidental, and thus environmental, not genetic: its heritability is zero (de Vienne, 2019).
Local versus universal measurement
16 Another important point is that heritability as a measurement is not universal, but specific to the studied population. It is local in both space and time. It depends on the frequency of genetic markers, the variability of the environment, and the variability of the phenotype. These three elements can vary from one population to another. Consequently, the heritability of a given trait in the same group of individuals, with the same genetic heritage, may differ depending on whether the environment is constant or variable.
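This dependence on the environment can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than as a model of any real population. The same simulated genetic values are combined with a nearly uniform environment and then with a highly variable one; the standard deviations, the seed, and the function name `heritability_ratio` are all invented for the example.

```python
import random
import statistics

def heritability_ratio(genetic, environmental):
    """Return var(G) / var(P), where each phenotype is P = G + E."""
    phenotype = [g + e for g, e in zip(genetic, environmental)]
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)

rng = random.Random(1)
n = 20_000
# The same genetic values are used in both scenarios (assumed variance 1).
G = [rng.gauss(0, 1.0) for _ in range(n)]
E_stable = [rng.gauss(0, 0.5) for _ in range(n)]     # nearly uniform environment
E_variable = [rng.gauss(0, 2.0) for _ in range(n)]   # highly variable environment

h2_stable = heritability_ratio(G, E_stable)       # roughly 1 / (1 + 0.25)
h2_variable = heritability_ratio(G, E_variable)   # roughly 1 / (1 + 4)
```

Nothing about the simulated genes changes between the two scenarios; only the spread of the environment does, and the computed ratio moves accordingly.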
Variance versus causes
17 Above all, heritability tells us nothing about the causes, mechanisms, or origins of the differences between populations. Analysis of variance and analysis of causes are disconnected (Lewontin, 1974), [8] as the example of divorce illustrates:
Divorce is heritable, but do we really expect that twin studies of marital processes will lead us to a genetic explanation of divorce? … The point is not that they are environmental as opposed to genetic; indeed, as we cannot emphasize enough, marriage, divorce, and whatever may cause them are just as heritable as anything else. But this heritability does not mean that either is a biological process awaiting genetic analysis … they do not have a specific genetic aetiology.
The biological error
19 Measuring heritability implies separating the contributions of genes and the environment. It is used routinely and effectively in plant and animal breeding (to predict the effects of selection). But this occurs in the context of experimental conditions where the environment can be carefully controlled, which is impossible in nature and in the case of human beings. Our understanding of the environmental factors that contribute to the development of human traits is very partial, and we can never be sure that we have identified them all. We thus also cannot know whether key environmental factors are stable or highly variable. Consequently, heritability is likely to mainly reflect environmental variability (Moore, 2006).
20 More fundamentally, contemporary biology has demonstrated that traits result from interactions between genetic and non-genetic factors at every stage of development (Moore and Shenk, 2017). Genes are part of a ‘developmental system of influences’ (Gottlieb, 2001, p. 6126). Genetic factors, proteins, cells, organs, organisms, populations of individuals, cultural factors, and other aspects of the environment all interact to produce the traits of living things over the course of their development (Moore, 2013).
21 Finally, if we start from the observation that any phenotype has some genetic content (Rutter, 2002), then the question whether variation in a human behaviour is influenced by genes is a purely rhetorical one (Courgeau, 2017). Moreover, heritability does not tell us what causes an individual to have a given trait. It does not tell us whether genetic factors contribute to this trait or what the relative influences of genes and environment are. Nor does it provide information on the transmission of a trait from parents to children. [9]
22 Numerous authors have been emphasizing the blind alley that heritability represents for human genetics since at least the 1970s. Lewontin suggested as early as 1974 ‘that we stop the endless search for better methods of estimating useless quantities’ (Lewontin, 1974, p. 525), while Jacquard (1978) complained that ‘it is not possible to make sense of an absurd and meaningless question merely by dint of the complexity of the mathematics used to answer it’ (p. 241). More recently, Moore (2006) noted that ‘rather than spending countless man hours analysing how sources of variation … account for variations in outcomes, our time and energy would be better spent exploring what causes those outcomes in the first place’ (pp. 350–351).
2 – The additive polygenic model
23 The concept of heritability is based on the idea that phenotypes can be described as the sum of a genetic component and a non-genetic (or environmental) component, and that the genetic component involves many genes, each of which makes an infinitesimal contribution to variation in the phenotype, and whose effects sum together. The origins of this model, called the additive polygenic model (or infinitesimal model), lie in the observations of Galton (1877), their analysis by Pearson (1898), and their interpretation by Fisher (1918). It underlies many approaches in human genetics and forms the foundation of all the tools presented in Section III. It relies on a number of assumptions. These assumptions are not always clearly stated, and even less often verified. Courgeau (2017) identified five principal assumptions on which heritability calculations rely:
- A1. Genes act additively (i.e. their effects sum together).
- A2. Genes segregate* independently.
- A3. The environment is random and independent of the genes.
- A4. The population is in Hardy–Weinberg equilibrium, i.e. there are no related individuals, migration, mutation, or selection.
- A5. The number of genes is infinite (which simplifies the calculations).
24 However:
- Dominance effects (interactions between alleles*) and epistasis (interactions between genes) occur, so gene effects are not exclusively additive (A1).
- Genes do not segregate independently, in particular when they are located on the same chromosome. Courgeau (2017) notes that in 1918, Fisher did not know genes were distributed across 23 pairs of chromosomes (A2).
- In the context of experiments on animals or plants, environmental effects (or at least some of them) can be controlled. For human populations, however, this is not possible, so environmental exposure is not random. Furthermore, genetic and environmental factors interact and are not transmitted independently, notably due to epigenetic* phenomena—with genes subject to imprinting*, methylation*, etc. (Génin and Clerget-Darpoux, 2015b) (A3).
- Given assortative mating (see below), A4 does not hold (Courgeau, 2017), especially as Fisher’s proposed equations for the correlations between related individuals are incorrect (Vetta, 1976).
- The number of genes is finite: the number of protein-coding genes is generally estimated to be around 20,000 (A5).
25 Génin and Clerget-Darpoux (2015b) also point out that the additive polygenic model assumes that numerous genetic and environmental factors exist and that the contribution of each is small; in other words, no single genetic or environmental factor makes a major contribution. And yet many environmental factors are known to have sizeable effects, such as that of time in education on age at first childbirth, or those of diet and physical activity on type 2 diabetes (Génin and Clerget-Darpoux, 2015b).
26 In the end, then, none of the assumptions underlying the additive polygenic model holds in reality. Increasing numbers of scholars are thus calling for this model to be left behind and for a profound paradigm shift in human genetics that brings us ‘closer to the true biological system’ (Nelson et al., 2013, p. 261). The omnigenic model of Boyle et al. (2017) is one such proposal. They noted that in genome-wide association studies (GWASs), statistical associations between genetic markers and diseases identify a large number of genes scattered throughout the genome, including many genes with no evident link with the disease. This runs counter to the expectation that causal markers will be grouped together in major pathways related to the disease. Boyle et al. suggest that gene regulatory networks are so interconnected that all genes are likely to influence the functions of the genes that are central to a given illness. They thus draw a distinction between peripheral and core genes. Most importantly, according to the omnigenic model, most heritability is explained by the effect of genes located outside the core pathways.
III – Limitations of data and methods of analysis
27 The methods used by sociogenomicists, as well as researchers in behavioural genetics and in other related disciplines, are based on a number of conceptual and statistical assumptions, whose validity should be examined.
1 – Twin studies
28 The first estimates of heritability were derived from empirical data on phenotypic correlations between relatives, and various approaches have been proposed to compare these correlations across different types of relations (Tenesa and Haley, 2013). Among the different types of relatives, the most readily applicable to human genetics is the comparison of identical and fraternal twins. In 1876, Galton proposed that twins be used to distinguish between genetic and environmental factors in the expression of a trait. But it was not until the early 20th century that the idea that there were two types of twins emerged: monozygotes (MZ, who share 100% of their DNA) and dizygotes (DZ, who share only 50% of their genotype). In 1924, Siemens published the first study comparing the similarity of MZ and DZ twins. In 1960, Falconer showed that heritability can be straightforwardly estimated from the difference between concordance rates, i.e. the rate of similarity of a given characteristic in MZ and DZ twins. Twin studies quickly came to be widely used to study human traits (Polderman et al., 2015). The study on the heritability of age at first childbirth mentioned in the previous section, for example, was based on twin data (Tropf et al., 2015).
29 In these twin studies, the resemblance between pairs of monozygotic twins and pairs of dizygotic twins on a given trait is measured. If monozygotic twins resemble each other more than dizygotic twins, the conclusion is drawn that the trait in question is—at least partly—genetically determined. Statistical models are used to quantify the share of the variability in the trait that is attributed to heritability.
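The simplest such model can be sketched in a few lines. The form usually given for Falconer's formula works with the twin correlations r_MZ and r_DZ and, under the assumptions listed below, attributes 2(r_MZ − r_DZ) of the variance to additive genes, 2·r_DZ − r_MZ to the shared environment, and 1 − r_MZ to the non-shared environment and measurement error. The correlations used here are hypothetical and not taken from any study, and the function name `falconer_decomposition` is an invention for this example.

```python
def falconer_decomposition(r_mz, r_dz):
    """Classical twin-design shares computed from MZ and DZ correlations."""
    additive_genetic = 2 * (r_mz - r_dz)   # the 'heritability' estimate
    shared_environment = 2 * r_dz - r_mz
    non_shared = 1 - r_mz                  # includes measurement error
    return additive_genetic, shared_environment, non_shared

# Hypothetical correlations, for illustration only.
a2, c2, e2 = falconer_decomposition(r_mz=0.6, r_dz=0.4)
# a2 is about 0.4, c2 about 0.2, e2 about 0.4; the three shares sum to 1.
```

The arithmetic is trivial; the entire weight of the conclusion rests on the assumptions that license reading these differences of correlations as genetic and environmental shares.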
30 Twin studies rely on a set of assumptions that are not always mentioned. In particular, these include the following (Joseph, 2013; Burt and Simons, 2014):
- A1. Researchers can reliably and accurately determine whether twins are monozygotic (MZ) or dizygotic (DZ).
- A2. The genes of MZ twins are 100% identical, those of DZ twins 50% identical.
- A3. Twins share the same percentage of genes throughout their lives.
- A4. Phenotypic variation can be broken down into a genetic component, an environmental component shared by both twins, and an environmental component not shared by the two twins.
- A5. The effects of the relevant genes are additive.
- A6. Parents pair randomly with respect to the studied trait; in other words, there is no homogamy (random mating, i.e. no assortative mating).
- A7. The environments of MZ twins are no more similar than those of DZ twins (known as the equal environment assumption [EEA]).
31 However, at least some of these assumptions are invalid. In the previous section, we saw that genetic effects cannot be assumed to be strictly additive (A5). The possibility of decomposing phenotypic variation into a genetic component and an environmental component implies the absence of interactions between genes and environment, another assumption whose invalidity is noted above (A4). Homogamy is a well-established pattern, both in human genetics (individuals tend to genetically resemble their partner more than an individual randomly selected from the population: see Conley and Fletcher, 2017) and in the social sciences (A6). Social homogamy is associated with genetic homogamy, which increases the genetic resemblance between children. In the context of twin studies, homogamy increases the similarity of DZ twins relative to MZ twins, thereby biasing the estimate of heritability downward.
32 Moreover, recent research in genetics shows that the genotypes of monozygotic twins are not 100% identical (A2) and that the genetic overlap of twins is not constant across the lifespan (A3). These discrepancies suffice to make estimates of heritability unreliable (Charney, 2012).
33 The EEA (A7) is undoubtedly the one that has been most intensely debated. Since the 1960s, empirical evidence has been accumulating that monozygotic twins experience more similar social environments than dizygotic twins. For example, they are more likely to be treated the same way by their parents, to have the same friends, be in the same class, spend time together, be more attached to each other, etc. (Joseph, 2013; Burt and Simons, 2014).
34 Monozygotic and dizygotic twins also experience different prenatal (intrauterine) environments. Those of MZ twins (who often share the same placenta) are more similar than those of DZ twins (who never share the same placenta). The prenatal environment is known to have a major impact on many aspects of development (Charney, 2017).
35 Most advocates of twin studies agree that the environments of MZ twins are more similar than those of DZ twins. Some reformulate the EEA to make it less restrictive. They suggest that it is enough for the environments to be ‘equal’ in terms of characteristics directly connected to the trait studied (trait-relevant EEA). They thereby seek to place the burden of proof on critics, who, they suggest, must demonstrate that this revised hypothesis is not valid.
36 Many studies have attempted to show that violations of the EEA do not bias heritability estimates. The most rigorous and convincing are undoubtedly those using data on ‘misclassified’ twins [10] (Conley et al., 2013). However, the scope of these studies is not general, insofar as they analyse particular data and traits, and implicitly rely on the validity of all the other assumptions underlying twin studies. Guo (1999) shows, conversely, that heritability estimates can be relatively high even in the complete absence of genetic factors when the environments of MZ twins are more similar than those of DZ twins.
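The logic of this last point can be reproduced with a toy simulation. This is a sketch in the spirit of the argument, not Guo's own model: the trait is generated from environmental components only, the environments of MZ pairs are assumed to be more alike than those of DZ pairs (similarities of 0.8 and 0.5 are invented values), and the twin-correlation formula 2(r_MZ − r_DZ) is then applied. The helper names `pearson` and `simulate_pairs` are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_pairs(rng, n_pairs, env_similarity):
    """Twin pairs whose trait is purely environmental (no genetic term at all)."""
    w_shared = math.sqrt(env_similarity)
    w_own = math.sqrt(1 - env_similarity)
    first, second = [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)   # environment common to both twins
        first.append(w_shared * shared + w_own * rng.gauss(0, 1))
        second.append(w_shared * shared + w_own * rng.gauss(0, 1))
    return first, second

rng = random.Random(2)
mz = simulate_pairs(rng, 20_000, env_similarity=0.8)   # assumed: more similar environments
dz = simulate_pairs(rng, 20_000, env_similarity=0.5)
r_mz, r_dz = pearson(*mz), pearson(*dz)
h2_apparent = 2 * (r_mz - r_dz)   # around 0.6, although no gene affects the trait
```

In this construction the apparent heritability is entirely an artefact of the unequal environments, which is why the validity of the EEA cannot be treated as a secondary detail.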
2 – Genome-wide association studies
37 The 2000s saw the rise of high-throughput sequencing and genotyping technologies. These technologies have made it possible to study large numbers of genetic markers, in many individuals, at a rapidly decreasing cost. [11] This technological progress enabled the emergence of genome-wide association studies (GWASs). The principle of these studies is to map (or ‘genotype’) a very large number [12] of single nucleotide polymorphisms (SNPs)* in each individual of a sample. A statistical association test is then carried out for each polymorphism to detect any ‘hits’: polymorphisms with a significant statistical association with the studied trait, i.e. those significantly more likely to be present in individuals with the trait.
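The scanning procedure can be sketched on simulated data. The example below is a deliberately simplified illustration, not the pipeline of any published GWAS: it assumes a quantitative trait, genotypes coded as minor-allele counts (0, 1, or 2), a single causal SNP with an invented effect, a simple linear association test with a large-sample normal approximation, and a Bonferroni-type threshold. All names and values are hypothetical.

```python
import math
import random

def association_p_value(genotypes, phenotypes):
    """Two-sided p-value for a linear association (large-sample normal approximation)."""
    n = len(genotypes)
    mg, mp = sum(genotypes) / n, sum(phenotypes) / n
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    if sgg == 0:   # monomorphic SNP: no test is possible
        return 1.0
    r = sgp / math.sqrt(sgg * spp)
    z = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(3)
n_people, n_snps, causal = 2_000, 200, 17   # invented sizes; SNP 17 is the only causal one
maf = 0.3                                   # assumed minor-allele frequency
genotypes = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_people)]
             for _ in range(n_snps)]
phenotype = [0.5 * genotypes[causal][i] + rng.gauss(0, 1) for i in range(n_people)]

threshold = 0.05 / n_snps   # one test per SNP, corrected for the number of tests
hits = [j for j in range(n_snps)
        if association_p_value(genotypes[j], phenotype) < threshold]
```

In this idealized setting the causal SNP is recovered because the simulation satisfies, by construction, the very assumptions examined in the rest of this section.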
38 In sociogenomics, for example, using a genome-wide association study, Barban et al. (2016) identified 12 loci* that influence reproductive behaviour, ‘thereby increasing understanding of these complex traits’ (p. 1).
Assumptions
39 GWASs are based on the additive polygenic model, but also on a few assumptions of their own, including the following (Charney, 2012):
- A1. The DNA in all tissues and cells in an individual’s body is identical (except germ cells and some immune cells).
- A2. The presence of a particular gene (polymorphism or mutation) implies that it is activated, i.e. that it can be transcribed in a way that is associated with this polymorphism or mutation. Consequently, in two different individuals, the same polymorphism will be transcribed equivalently (it will be activated in both).
40 However, as Charney (2012) explains, the DNA in all cells in an individual’s body is not identical due to the widespread phenomenon of mosaicism* (A1); and the presence of a particular allele does not imply that it can be transcribed in the manner associated with that allele, as it can be epigenetically silenced (A2).
41 Furthermore, since the statistical association between SNPs and a phenotype is measured using regression models, the assumptions underlying these models are also relied upon, but some are not valid in the case of GWASs (Angers et al., 2019). In particular, the effect of an SNP is not linear with respect to the number of minor alleles (linearity assumption) if one allele is dominant or if the causal allele is recessive. The data used are rarely samples drawn randomly from a population (random sampling assumption). If an SNP is correlated with unobserved environmental factors, the estimated parameter will be biased so as to erroneously attribute an environmental cause to the genotype [13] (exogenous error assumption). The same assumption will be violated if an SNP is not causal, but is correlated with a causal SNP (Angers et al., 2019).
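The violation of the exogenous error assumption can be made concrete with a small simulation, given here as a sketch under invented assumptions. Two subpopulations differ both in the frequency of an allele and in an unobserved environmental factor that shifts the trait; the SNP itself has no effect at all. The function names, allele frequencies, and environmental means are hypothetical.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

def simulate_group(rng, n, maf, env_mean):
    """One subpopulation: the SNP has NO effect; the trait depends on environment only."""
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    phenotypes = [env_mean + rng.gauss(0, 1) for _ in range(n)]
    return genotypes, phenotypes

rng = random.Random(4)
g_a, p_a = simulate_group(rng, 5_000, maf=0.2, env_mean=0.0)   # invented values
g_b, p_b = simulate_group(rng, 5_000, maf=0.6, env_mean=1.0)   # invented values

naive = slope(g_a + g_b, p_a + p_b)   # pooled regression, environment unobserved
within_a = slope(g_a, p_a)            # close to zero inside each subpopulation
within_b = slope(g_b, p_b)
```

The pooled regression yields a clearly positive coefficient for a SNP that does nothing, while the within-group coefficients are close to zero. In real data the relevant environmental factors are rarely known, so this kind of stratified check is not generally available.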
False positives and replicability
42 Given the vast number of associations tested in GWASs, the risk of ‘false positives’ (associations that are statistically significant by chance) is high, even if it is limited by the use of the Bonferroni correction, i.e. the adoption of a very low significance threshold to correct for multiple testing. [14]
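The arithmetic behind this correction is simple: the conventional significance level is divided by the number of tests. The sketch below assumes, purely for illustration, a conventional level of 0.05 and one million tested SNPs; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

alpha = 0.05          # conventional significance level
n_snps = 1_000_000    # assumed number of tests, for illustration only

threshold = bonferroni_threshold(alpha, n_snps)   # on the order of 5e-08
# Without any correction, and if no SNP had any effect, the number of
# associations expected to pass the 0.05 level by chance alone would be:
expected_false_hits_uncorrected = alpha * n_snps  # about 50,000
```

The correction controls the probability of at least one chance finding across all tests, but it does not address the other sources of spurious association discussed in this section.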
43 Replicability can also be an issue. Different studies on a given phenotype sometimes fail to find the same significant associations. For example, collating the results of 17 GWASs on intelligence, Smith (2019) found that more than 87% of the 2,335 SNPs identified were only found in one of the studies.
Data
44 GWASs tend to rely on data with low diversity in demographic, geographical, and ancestral terms. They are heavily biased towards White individuals of European origin. [15] Samples are also often unrepresentative, with disproportionate numbers of women, older people, and individuals with high socio-economic status (Mills and Rahal, 2019). These diversity and sampling biases can have problematic consequences for the reliability of the results (Mills and Tropf, 2020).
45 Furthermore, because of the need for large numbers of observations for statistical reasons, researchers tend to combine databases of various origins (epidemiological surveys, DNA banks, data from private genetic testing companies, etc.). This can compromise the quality of the data (Barton et al., 2019).
46 Additionally, little of the genetic data was collected for sociological or demographic purposes, and the available indicators to measure social phenomena, such as educational attainment, are often of limited precision and relevance (Mills and Tropf, 2020). This issue is amplified by the fact that such indicators must be present in all the combined datasets. [16]
47 Finally, GWASs focus on a single type of genetic polymorphism, SNPs. These genetic markers are used because they are the most common. But there are many others, such as copy number variations, multiple copies of segments of genes, whole genes, and whole chromosomes, which are likely to impact phenotypes (Charney, 2013). The basis of the choice to focus on SNPs in GWASs is pragmatic (ease of data collection) and not scientific, and ultimately the information they yield on the associations between genes and phenotypes is very limited.
3 – The genome-based restricted maximum likelihood method
48 Over the last 15 years, the number of GWASs has exploded. This has led to the development of new analytical methods, in particular for estimating heritability from the nucleotide polymorphisms surveyed in these studies, and thus without relying on twin studies (Speed et al., 2020). Among these methods, genome-based restricted maximum likelihood (GREML*) is no doubt the most widely used. It offers a way to quantify genetic similarity between unrelated individuals (Yang et al., 2011).
49 However, from the perspective of population genetics, relatedness is not reducible to family ties: GREML must deal with population stratification bias (see Section III.4). The techniques implemented to correct this bias are not suitable for the task (Browning and Browning, 2011, 2013; Janss et al., 2012).
50 GREML is also subject to sampling error and to errors in phenotype measurement; its estimates are biased and their standard deviations inaccurate. It does not produce reliable and stable estimates of heritability (Charney, 2013; Kumar et al., 2016a, 2016b).
51 Furthermore, the measures of heritability obtained from twin studies are generally high, whereas those obtained from GWASs are much lower. For example, twin studies have estimated the heritability of age at first childbirth to be around 30% (Briley et al., 2017), while studies using GREML have estimated it at 15% (Tropf et al., 2015b), or even 0.9% based on a polygenic risk score (see Section III.4; Mills et al., 2018). The discrepancies between these results have given rise to debates about the origin of this ‘missing heritability’ (Manolio et al., 2009). It is often interpreted as an artefact of the temporary imperfection of the available data and the statistical methods used.
4 – Polygenic risk scores
52 In 2007, Wray et al. proposed a new predictive tool for clinicians: polygenic risk scores* (PRSs). A PRS is a quantitative variable that is supposed to summarize an individual’s genetic predisposition towards a given trait. It is calculated as a linear combination of the SNPs present in the individual, weighted by the size of the effects of the SNPs measured in the GWAS (Dudbridge, 2013). The underlying hypothesis is that each individual has some degree of genetic predisposition towards the trait, resulting from the small contributions of many genetic markers.
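The linear combination described above can be sketched in a few lines. The SNP names, effect sizes, and genotypes below are invented for illustration, and the function name is hypothetical; actual PRS software involves many further steps (quality control, SNP selection, normalization) that are not shown here.

```python
# Hypothetical effect sizes (per copy of the counted allele) taken from a GWAS.
gwas_effects = {"snp_a": 0.02, "snp_b": -0.01, "snp_c": 0.03}

# Hypothetical genotypes: number of copies (0, 1, or 2) of the counted allele.
genotypes = {
    "person_1": {"snp_a": 2, "snp_b": 0, "snp_c": 1},
    "person_2": {"snp_a": 0, "snp_b": 2, "snp_c": 0},
}

def polygenic_score(allele_counts, effects):
    """Weighted sum of allele counts; SNPs without an effect estimate are ignored."""
    return sum(
        effects[snp] * count
        for snp, count in allele_counts.items()
        if snp in effects
    )

scores = {person: polygenic_score(counts, gwas_effects)
          for person, counts in genotypes.items()}
```

The score is thus only as informative as the GWAS weights that enter it: any bias in the estimated effects is carried over, term by term, into each individual's score.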
53 Software has been developed to calculate individual PRSs for many diseases, with the intention of supporting clinical decision-making. The number of articles touting the benefits of using PRSs for different complex diseases has grown exponentially over the past decade. [17] PRSs are also often used by sociogenomicists. [18] For example, Mills et al. (2018) found that women who are genetically predisposed to have their first child later also show a temporal shift in their entire reproductive period, with later onset of menstruation and menopause. Domingue et al. (2015) found that on average, individuals with a genetic predisposition towards educational attainment complete more years of schooling, even after controlling for the effects of social background. They concluded that this genetic predisposition does indeed have a causal effect.
54 Sociogenomicists also use PRSs to study gene–environment interactions. For example, Schmitz and Conley (2016) analysed a PRS for smoking among Vietnam veterans. They found that, among veterans with a high genetic predisposition to smoke, the rate of tobacco use was significantly lower among those who completed some education after the war. Conley et al. (2015) examined the role of genes in the relationship between the levels of education of parents and their children. They concluded that one-sixth of the correlation between parents’ education and that of their children is attributable to genetic transmission and five-sixths to social inheritance; that parents’ PRSs do not significantly affect their children’s educational attainment after controlling for the children’s own risk score; and that children’s genetic predispositions are not moderated by their parents’ sociodemographic characteristics. They concluded that two distinct inheritance systems coexist, one genetic and the other social.
55 PRSs rely on the results of GWASs and inherit their limitations. But they also give rise to other problems.
Sensitivity
56 First, PRSs depend heavily on various decisions made when they are calculated: to use all SNPs or only those most strongly associated with the phenotype, the choice of a significance threshold, linkage disequilibrium pruning (LD pruning*), etc. (Ware et al., 2017).
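To make this sensitivity concrete, the following sketch recomputes the score of two fictitious individuals under different p-value thresholds for SNP inclusion. All effect sizes, p values, and genotypes are invented, and LD pruning is deliberately left out; the only point is that the ranking of individuals can change with an analytical choice.

```python
# Hypothetical GWAS summary statistics: SNP -> (effect size, p value).
summary_stats = {
    "snp_a": (0.030, 1e-9),
    "snp_b": (-0.020, 1e-4),
    "snp_c": (0.010, 0.03),
    "snp_d": (-0.015, 0.40),
}

# Hypothetical allele counts (0, 1, or 2) for two individuals.
person_1 = {"snp_a": 0, "snp_b": 0, "snp_c": 2, "snp_d": 0}
person_2 = {"snp_a": 1, "snp_b": 2, "snp_c": 0, "snp_d": 2}

def score(allele_counts, p_threshold):
    """Weighted sum restricted to SNPs whose p value is at or below the threshold."""
    return sum(
        beta * allele_counts[snp]
        for snp, (beta, p_value) in summary_stats.items()
        if p_value <= p_threshold
    )

for threshold in (5e-8, 1e-3, 0.05, 1.0):
    print(threshold, score(person_1, threshold), score(person_2, threshold))
```

With these invented numbers, the second individual has the higher score under the strictest threshold and the lower score once less strongly associated SNPs are admitted.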
Eurocentrism
57 Most PRSs are calculated on data from populations of European origin, reflecting the sampling bias of GWASs. For other populations, their predictions are much less accurate, or even highly inaccurate, as is the case for example with populations of African descent (Martin et al., 2019). Using them thus risks increasing health and other inequalities between populations based on whether they are included in GWASs (Martin et al., 2019).
Population stratification
58 Population stratification can be defined as the presence of a systematic difference in allele frequencies between subpopulations due to differences in ancestry. This phenomenon, omnipresent in the human species, can bias the results: observed associations between genes and phenotypes may be (at least partly) spurious because they also reflect differences in genetic structures between groups of individuals. In other words, an environmental factor associated with a given phenotype may differ between subpopulations. As this factor is associated both with the phenotype and with genetic variations in the population, it becomes a confounding factor.
59 A classical example of this bias is the ability to eat with chopsticks (Lander and Schork, 1994). There could be genetic variations that affect this ability. But at the scale of the global population, most of the variation in this trait is due to differences in cultural background—and thus to environmental, not genetic, differences. A GWAS would identify markers associated with the ability to eat with chopsticks. But these associations would be spurious because they actually reflect genetic differences between people from East Asia and those from the rest of the world.
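The chopsticks example can be reproduced with a small simulation. In the sketch below, all parameters are invented: a marker is common in one group and rare in the other, and the trait depends only on group membership, never on the marker. The function names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_group(rng, n, allele_freq, trait_prob):
    """Allele counts and a binary trait, drawn independently of each other."""
    counts = [(rng.random() < allele_freq) + (rng.random() < allele_freq)
              for _ in range(n)]
    trait = [1 if rng.random() < trait_prob else 0 for _ in range(n)]
    return counts, trait

rng = random.Random(0)
# Group A: marker common, trait frequent. Group B: marker rare, trait infrequent.
counts_a, trait_a = simulate_group(rng, 5000, allele_freq=0.8, trait_prob=0.9)
counts_b, trait_b = simulate_group(rng, 5000, allele_freq=0.2, trait_prob=0.1)

within_a = pearson(counts_a, trait_a)
within_b = pearson(counts_b, trait_b)
pooled = pearson(counts_a + counts_b, trait_a + trait_b)
```

Within each group the correlation between marker and trait is close to zero, as it should be by construction, whereas in the pooled sample it is substantial. The marker merely tags group membership.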
60 Population stratification bias can be small at the level of an individual locus but very significant when thousands of loci are aggregated, as happens in the calculation of a PRS (Barton et al., 2019).
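A back-of-the-envelope calculation shows why aggregation matters. The sketch below rests on strong simplifying assumptions made purely for illustration: independent loci, identical weights, and a small allele-frequency difference between two groups that points the same way at every locus. Under these assumptions the gap in mean score grows in proportion to the number of loci, while the within-group standard deviation grows only with its square root.

```python
def standardized_gap(n_loci, weight=0.01, freq_a=0.51, freq_b=0.49):
    """Gap in mean score between two groups, in within-group standard deviations.

    Illustrative assumptions: independent loci, identical weights, and the same
    small frequency difference (freq_a - freq_b) at every locus.
    """
    mean_gap = n_loci * weight * 2 * (freq_a - freq_b)
    var_a = 2 * freq_a * (1 - freq_a)  # variance of an allele count in group A
    var_b = 2 * freq_b * (1 - freq_b)  # variance of an allele count in group B
    within_sd = (n_loci * weight ** 2 * (var_a + var_b) / 2) ** 0.5
    return mean_gap / within_sd

gaps = {n_loci: standardized_gap(n_loci) for n_loci in (1, 100, 10_000)}
```

With these invented values, a gap that is negligible at a single locus exceeds five within-group standard deviations once 10,000 loci are summed.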
61 Methods such as principal component analysis can be used to attempt to control for this bias, but they are not enough to eliminate it (Dandine-Roulland et al., 2016; Curtis, 2018; Berg et al., 2019; Haworth et al., 2019; Sohail et al., 2019), even with relatively homogeneous populations. [19] And even within a subpopulation with shared ancestry, compositional effects exist, depending on place of birth (Haworth et al., 2019), gender and age (Mostafavi et al., 2020), ethnicity (Freese et al., 2019), or social class (Abdellaoui et al., 2019; Richardson and Jones, 2019).
62 Finally, no natural population is without structure (i.e. homogeneous, genetically random); and in humans, the environment cannot be controlled experimentally as it can in studies on plants or animals (Barton et al., 2019). No statistical technique seems able to effectively neutralize the complexity of population stratification. Consequently, the observed correlations between PRSs and traits are inevitably partly spurious (Richardson, 2017).
5 – Candidate gene studies
63 To study gene–environment interactions, sociogenomicists use not only PRSs but also an older approach: candidate gene studies. These studies target particular genetic markers with a known physiological function, examining the hypothesis that they have an effect on a given trait. If the trait is found significantly more often in individuals with the chosen genetic markers, the hypothesis is considered validated.
64 Guo et al. (2008), for example, studied number of sexual partners as a risk behaviour among young White men. They found that the 9R/9R genotype has a protective effect, but that this effect tends to disappear in schools where a high proportion of students start to have sexual relations early. Pescosolido et al. (2008) found that the GABRA2 gene has little effect on the risk of alcohol dependence in women, and that this genetic influence is attenuated still further by family support but accentuated by childhood deprivation.
65 Candidate gene studies suffer from a major problem: replicability. To date, most of the associations apparently uncovered between genes and phenotypes have not been replicated with new data. This non-replicability is often explained by a lack of statistical power, understood as a source of ‘false positive’ associations (Chabris et al., 2012).
66 Another weakness of this approach is the problem of ‘overabundance’, i.e. many diverse phenotypes have been associated with the same genes. Charney and English (2012) showed that four genes in particular (MAOA, 5-HTT, DRD2, and DRD4) were at the origin of a large number of studies. They presented a 15-page table of phenotypes ‘explained’ by one of these genes, ranging from age at first sexual intercourse, voting behaviour, openness, creativity, and gang membership to alcoholism, colorectal cancer, Tourette’s syndrome, and premature ejaculation. The extreme diversity of phenotypes associated with these four genes shows that in the end they carry little information on biological mechanisms that may help explain those phenotypes.
67 Furthermore, candidate gene studies most often amount to no more than simple investigations of statistical associations, even though more advanced methods for testing genetic models exist (Clerget-Darpoux et al., 1988).
IV – Assessment and perspectives for sociogenomic research
68 Beyond the conceptual and statistical limitations of the approaches used to date, what conclusions can be drawn on the contributions of sociogenomics to sociology or demography?
1 – From statistical association to mechanisms
69 First, the gap between a ‘hit’ and its biological interpretation—i.e. between identifying a SNP associated with a trait and understanding the mechanisms that produce the trait—is immense. This is true in medicine, and a fortiori in sociology and demography, where the pathways from genes to observed phenomena are undoubtedly longer and less direct. To make even modest progress on filling these gaps would require using and combining various data and methods (Bourgain et al., 2007).
70 In some cases, the genes identified by sociogenomicists already have known biological functions. For example, certain genes known to be biologically associated with fertility-related processes, such as ovarian growth, oestrogen production, or hormonal stimulation, are also statistically associated with fertility behaviour (Barban et al., 2016). But not only are such results rare in sociogenomics and somewhat tautological, they are arguably likely to contribute more to advancements in biology or medicine than in sociology and demography. Moreover, they represent only a tiny step towards an understanding of the mechanisms that link genes, other biological components, and sociodemographic factors with the targeted processes (fertility, educational attainment, etc.)—an understanding that seems largely out of reach at the current time.
2 – Purification of effects
71 According to sociogenomicists, controlling for genetic factors in econometric models can help to avoid genetic confounding, obtaining causal effects of sociodemographic variables that are ‘purer’, less biased, with lower standard deviations. This, they argue, will improve the ‘predictive power’ of models used in studies of populations, whether in sociology or demography (Guo et al., 2008; Conley et al., 2014; Conley, 2016; Freese, 2018).
72 From a strictly predictive point of view, drawing on genomic data is legitimate. But PRSs have not proven to have much predictive power. Their added value is thus limited. Some sociogenomicists respond to this criticism with the observation that the predictive performance of sociodemographic factors is also weak. This argument is unconvincing for several reasons. First, certain sociodemographic factors do in fact exert a major effect. For example, the length of women’s education has a very strong influence on the age at which they have their first child: the correlation in the United States is 54% (Marini, 1984), whereas a PRS explained less than 1% of variation in age at first childbirth (Mills et al., 2018). Social origin also strongly influences academic success. In OECD countries, parents’ number of years of education accounts for about 30% of variation in their children’s number of years of education (OECD, 2019) compared with 2%–3% for a PRS (Conley et al., 2015). Moreover, the status of these two categories of factors is not the same. While PRSs, as a measure of a genetic predisposition, are supposed to capture the overall effect of genetics, or even of nature as opposed to nurture, no sociodemographic factor is intended to measure the effect of ‘the social’ in general, or even of one of its dimensions. Such factors are always understood only as partial, imperfect indicators. [20]
73 From an explanatory perspective, integrating a PRS into an econometric model is problematic. The idea of using a PRS to eliminate genetic confounders and ‘purifying’ the effect of the main explanatory variable implies knowing precisely what it is that is ‘controlled for’ by the PRS. In reality, this is not the case, particularly because of population stratification bias, which means that in controlling ‘for’ a genetic predisposition, we also partly control for sociodemographic factors, while lacking information on their nature and on the scope of this problem.
74 Furthermore, from a methodological perspective, the econometric techniques used in this context have many limitations, which are all the more significant when they are applied to genomic data. This is true, for example, of the most widespread of all such methods, Mendelian randomization, which adapts the econometric approach based on instrumental variables using a genetic variable as an instrument. These studies struggle to satisfy the statistical assumptions of the underlying model [21] (Davey Smith and Ebrahim, 2003; Nitsch et al., 2006; Mills and Tropf, 2020).
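The instrumental-variable logic behind Mendelian randomization, and its dependence on the exclusion restriction, can be sketched with simulated data. Everything below is invented for illustration: the effect sizes, the single-variant instrument, and the simple ratio estimator (covariance of genotype and outcome divided by covariance of genotype and exposure), which stands in for the more elaborate estimators used in practice.

```python
import random

def covariance(xs, ys):
    """Sample covariance (divided by n) of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def simulate(direct_effect, n=20000, seed=1):
    """Return (naive slope, ratio estimate) when the true effect of x on y is 0.3."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.5) + (rng.random() < 0.5)  # allele count (instrument)
        u = rng.gauss(0, 1)                               # unobserved confounder
        xi = 0.5 * gi + u + rng.gauss(0, 1)               # exposure
        # direct_effect != 0 means the genotype affects y other than through x.
        yi = 0.3 * xi + u + direct_effect * gi + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    naive = covariance(x, y) / covariance(x, x)
    ratio = covariance(g, y) / covariance(g, x)
    return naive, ratio

naive, iv_valid = simulate(direct_effect=0.0)
_, iv_violated = simulate(direct_effect=0.2)
```

In this simulation the ratio estimate recovers a value close to the true effect only when the genotype influences the outcome exclusively through the exposure. Once a modest direct pathway is introduced, the estimate is roughly doubled, and nothing in the observed data signals the problem.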
75 Moreover, the logic of these uses is restricted to the narrow framework of the ‘general linear reality’ described and criticized by Andrew Abbott (1988), and of an ‘instrumental positivism’ [22] which holds that a given factor has a ‘real’ effect that can be precisely measured by using modelling to disentangle it from the effects of all other factors. But determinants do not simply add together: they combine and interact, forming networks of factors. Seeking to isolate them from each other, whether in the social world (Bourdieu, 1979; Ragin, 2006) or the biological domain (Moore, 2013), is a futile endeavour. Moreover, there is no way to actually verify an effect’s purity—to check that all the relevant factors have been successfully controlled for.
76 Finally, it should be noted that here, the linkage of genetics and sociology/demography is limited to an instrumentalization of genetic data to measure sociodemographic effects.
3 – Gene–environment interactions
77 Sociogenomicists often emphasize the considerable interest of studying gene–environment interactions (G × E effects) to analyse how social effects are mediated by genetic predispositions (Guo et al., 2008; Conley, 2016; Freese, 2018). Some sociogenomicists have sought to theorize these interactions by conceptualizing them in terms of a set of ideal types (Shanahan and Hofer, 2005; Boardman et al., 2013). In the contextual triggering, or diathesis–stress model, a genetic predisposition remains latent until it is activated by an (often negative) environmental factor. According to the differential susceptibility model, some individuals are more sensitive than others to certain environmental factors, both positive and negative. The bioecological model (also known as the social compensation or social distinction model) holds that genetic effects are greatest in stable environments, often those of the upper classes, which allow individuals to fulfil their genetic potential. According to the social control (or social push) model, genetic influences are filtered or mediated by environmental factors such as social norms or structural constraints.
78 One might wonder what effects the use of the term environment without epistemological scrutiny has on the construction of sociological objects. In the additive polygenic model, the environment is anything that is not genetic. In the context of epidemiology, the environment refers mainly to risk factors (diet, pollution, etc.). For sociogenomicists, the environment seems synonymous with the social. The use of the term without conceptual work thus leads to the grouping of highly diverse processes under the same term.
79 Moreover, in practice, sociogenomicists more often study how the environment modifies the effect of genes [23] (i.e. using the social to better understand the genetic mechanisms) than the opposite. Strictly speaking, these studies do not seek to put genetics at the service of a better understanding of social phenomena.
80 Finally, gene–environment interactions are generally operationalized in a highly simplistic way.
- The genes under study consist of either a single (candidate) gene or an aggregate of many genes (PRS).
- The environment is represented by one (sometimes a few) variable(s), taken to summarize a complex reality (at-risk environment, family environment, etc.).
- In practice, the interaction is defined through an interaction term in a regression model.
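The operationalization listed above can be sketched as follows, assuming a standardized genetic score g, a binary environment indicator e, and an invented data-generating process. With a binary e, the coefficient on the product term in a regression of the outcome on g, e, and g × e equals the difference between the slopes of the outcome on g estimated separately in each environment, which is what the code computes.

```python
import random

def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(2)
# Invented process: outcome = 1.0 + 0.5*g + 0.3*e + 0.2*g*e + noise.
data = {0: ([], []), 1: ([], [])}
for _ in range(4000):
    g = rng.gauss(0, 1)    # standardized genetic score
    e = rng.randrange(2)   # binary environment indicator
    outcome = 1.0 + 0.5 * g + 0.3 * e + 0.2 * g * e + rng.gauss(0, 0.5)
    data[e][0].append(g)
    data[e][1].append(outcome)

slope_e0 = slope(*data[0])
slope_e1 = slope(*data[1])
# Equivalent to the coefficient on g*e in the model outcome ~ g + e + g*e.
interaction = slope_e1 - slope_e0
```

The whole of the 'interaction' is thus a single number summarizing how one slope differs across two categories, which gives a sense of how much is left out when a complex environment is reduced to one variable.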
81 From a strictly statistical point of view, the demonstration of gene–environment interactions suffers from many limitations, such as low statistical power, sensitivity to the choice of scales to measure the environment, and confounding factors (Domingue et al., 2020; Mills and Tropf, 2020). But most importantly, complex (higher-order, dynamic, feedback, etc.) interactions are never modelled. The presence or absence of statistical gene–environment interactions tells us nothing about the real, ‘physical’ interactions between genes and their contexts (Moore, 2018), which are ubiquitous and increasingly widely studied, in particular in epigenetics (Gottlieb, 2003). Variation partitioning and mechanism elucidation approaches are totally distinct, which has provoked much debate and misunderstanding since the beginnings of quantitative genetics (Tabery, 2015).
82 Finally, although the question of interconnections between the biological and the social sits at the very heart of sociogenomics, the field reduces these interactions to the abstraction of their econometric definition, ultimately reproducing a simplistic dualism of nature and culture.
4 – Decision support and public policy
83 Sociogenomicists sometimes raise the possibility that genetic information could provide decision support for public policy (targeting at-risk populations, etc.), on the model of ‘personalized’ medicine (Conley et al., 2014; Baier and Van Winkle, 2020). [24]
84 Conley and Fletcher (2017) promote theories holding that heritability constitutes an indicator of social justice in a given society. For example, low heritability of academic achievement would point to a context where the population’s genetic potential is not being adequately realized because social factors are responsible for differences in achievement. As we have seen above, however, the use of heritability in human genetics faces insurmountable problems. One might add that the social sciences do not suffer from any shortage of measures of inequality, which do not require the use of genetic data.
85 But above all, Conley and Fletcher (2017) defend the idea that the study of gene–environment interactions can be used to identify differences in the impact of public policies depending on the genetic predispositions of the beneficiaries, improving the targeting of these policies. Progress in personalized medicine has been much slower than expected, largely due to the complexity of the genetic architecture of most diseases. PRSs are sometimes used to try to target at-risk individuals. In this context, they are used as a complement to other information. [25] Some authors recommend great caution in the interpretation and use of the results [26] as well as the balancing of risks and benefits in the case of clinical predictions (Rosenberg et al., 2018; Barton et al., 2019; Baverstock, 2019). The contribution of GWASs, especially given the massive investment they have attracted, is highly debated and sometimes considered approximately nil (Paneth and Vermund, 2018). [27] It seems likely that the impacts of GWASs for the study of social behaviours, whose links with genes are undoubtedly looser and more complex, will not be much more positive. Today, in any case, the desire to use PRSs to guide the targeting of educational policies by identifying target populations is totally unrealistic because the predictive performance of PRSs with respect to education is far too low (Angers et al., 2019).
V – Development at a distance from biology
86 The limitations of the linkage of genetics with sociology and demography, and the problems it raises, are numerous and profound. What consequences have sociogenomicists drawn from these facts?
87 Some sociogenomics researchers highlight the strong limitations on their results and the ways they hope to overcome them. The predictive power of a PRS is low? Soon larger bodies of data will be collected, improving predictions. We don’t know where the missing heritability went? Data on rare genetic markers will soon be collected, filling a part of the gap. Establishing the causal role of genetic effects is difficult? New statistical methods will soon solve the problem. Sociogenomicists thus rely on faith in the advancement of sociogenomics through technical progress, without questioning the biological model on which their entire enterprise is based.
88 The origin of this gap probably lies in the ‘social distance’ (Collins, 2010, see pp. 8–12) between sociogenomicists and biologists, which explains the former’s reliance on an outdated genetic model and an epistemology shown to be obsolete by contemporary debates in biology (Meloni, 2014), whereas the latter are ‘more cautious in their conclusions and less certain about the current state of knowledge’ (Larregue, 2018a, p. 297). We now know that genes are a group of cellular resources among others, and not ‘codes’ for development—and even that genes as such exert no causal effect (Baverstock, 2019). ‘Scientific’ genetic determinism is linked to the idea that both the genotype and the external environment affect the phenotype through various interactions by way of developmental processes in the internal environment of an organism. It is opposed to a ‘hard’ genetic determinism, which holds that genes alone determine some or many of the individual traits of organisms, including human beings, grossly exaggerating their role (Aivelo and Uitto, 2015). Sociogenomics, like other specialties that have followed in the line of behavioural genetics, often tends to err on the side of hard determinism and therefore wastes time and resources on ‘chasing ghosts’ (Charney, 2013).
89 Several factors are involved in creating this social distance between sociogenomicists and biologists. First, the cost of entry into the practice of sociogenomics may seem high. But it is not as high as all that. A minimal level of skills in genetics and molecular biology is enough to make use of the data and statistical methods used by sociogenomicists. [28] The cost of entry is further lowered by the availability of textbooks (Mills et al., 2020), numerous computer and statistical packages, and ‘turnkey’ data.
90 This last point is important, as the development of sociogenomics has piggybacked on the development of data sources. The most widely used genomic databases, such as the UK Biobank, include social indicators (school success, fertility, etc.), putting sociologists and demographers in the role of data consumers (Larregue, 2018a) who ‘tinker’ with these data (Larregue, 2017, p. 176). At the same time, some social science surveys now incorporate genomic information, [29] or even ready-to-use polygenic risk scores. Three US surveys in particular are often used by sociogenomicists: the Wisconsin Longitudinal Study, the University of Michigan Health and Retirement Study, and, above all, the National Longitudinal Study of Adolescent to Adult Health, which includes data on twins as well as genomic data. [30]
91 Additionally, sociogenomics is based on procedures that are already well established in closely related specialties, not only behavioural genetics but also political science, economics, and criminology. Empirically, it draws on approaches that fit readily into the mould of ‘general linear reality’ (Abbott, 1988): genomic data can be summarized in one variable, a PRS, which complements the more traditional variables in econometric models. The resulting impact on habits of research and the construction of research objects is thus minimal.
92 One final factor favours the development of sociogenomics ‘at a distance’ from biology. When such studies are submitted to social science journals, the peer review process facilitates their acceptance because the technical nature of the methods used limits the set of potential reviewers to the circle of behavioural geneticists and sociogenomicists (Larregue, 2019).
Conclusion
93 Research often advances on the basis of approximations of reality. But how approximative can a model be about reality before it becomes false and counterproductive? This very general question recalls other debates across the ‘quantitative’ social sciences. Does it make sense to continue to infer causes from regression models even though we know that to do so is strictly impossible (Freedman, 1991, 1997; Clogg and Haritou, 1997; Berk, 2004)? Should significance tests and p values continue to hold a central place when the problems they pose have been documented for nearly 8 decades (Berkson, 1942; Poitevineau, 2004) and are regularly pointed out by learned societies of statisticians, and when alternatives exist (Wasserstein and Lazar, 2016)?
94 In the natural sciences, the additive polygenic model undoubtedly contributed for some time to progress in genetics. But we have now known for several decades that it cannot account for the complexity of the relevant phenomena—a fortiori when they are social phenomena. The rise of genomic data momentarily led to the belief that new tools could correct these problems, but they are not reducible to technical issues. Other pathways are possible, drawing on new theoretical models and combinations of approaches and tools (Bourgain et al., 2007; Génin and Clerget-Darpoux, 2015a). They imply moving beyond the mere inclusion of a genetic predisposition variable in a regression model.
95 In proposing to set aside the opposition between nature and culture, sociogenomics breaks with the dominant sociological paradigms. [31] The dead end that it (perhaps temporarily) represents for sociology and demography might at least have the merit of recalling an elementary principle of scientific wisdom: in a given state of knowledge, theories, and tools, it is simply impossible to answer certain questions (Lieberson, 1987). However, this situation may not represent an impasse for genetics and biology: since sociologists and demographers are the key specialists in measuring contexts and institutions, it is doubtless they who are most capable of helping the natural sciences model the environment (Mills in BSA, 2017).
Appendix
Glossary
96 allele: one of the possible versions of the same gene.
97 DNA methylation: an epigenetic mechanism used in cells to regulate gene expression.
98 dominance: interaction between alleles of the same gene.
99 epigenetics: the study of the molecular mechanisms that modify the expression of genes without changing the nucleotide sequence (DNA). These changes are reversible, transmissible (during cell division), and adaptive (they vary depending on the context).
100 epistasis: interaction between genes.
101 gene segregation: separation of homologous chromosomes, of paternal and maternal origin, at the time of meiosis, i.e. cell division leading to the production of sex cells, for reproduction.
102 genetic marker: segment of DNA whose physical location (locus) is identifiable on a chromosome.
103 genome-based restricted maximum likelihood (GREML) method: a method for estimating heritability from nucleotide polymorphisms identified in genome-wide association studies.
104 genome-wide association studies (GWASs): studies that map a very large number of nucleotide polymorphisms for each individual studied.
105 high-throughput sequencing and genotyping: Sequencing is a technique for determining the sequence of nucleotides in DNA. Genotyping aims to determine the identity of a genetic variation, at a specific position on all or part of the genome, for a given individual or group of individuals. High-throughput techniques allow the rapid analysis of thousands or even millions of DNA molecules simultaneously.
106 linkage disequilibrium (LD) pruning: a method for filtering the nucleotide polymorphisms used to calculate a polygenic risk score.
107 locus (plural: loci): the precise location of a gene on a chromosome.
108 phenotype: In genetics, this term most often refers to an observable characteristic of an organism, such as height, eye colour, etc.
109 polygenic risk score (PRS): a quantitative variable that summarizes an individual’s genetic predisposition to show a given trait, calculated from the results of a genome-wide association study.
110 single nucleotide polymorphism (SNP): the most frequent form of genetic variation (i.e. differences between individuals) in the human genome (they represent approximately 90% of all human genetic variation); a type of DNA variation in which a given segment of two chromosomes differs by a single base pair.
111 somatic mosaicism: the coexistence of two or more cell populations with different genotypes in the same individual.
112 subject to parental imprinting: A gene is said to be subject to parental imprinting when the copies inherited from the mother and the father are expressed in different ways.
113 trait: In genetics, a trait is a specific characteristic of an individual. Studies typically seek to identify possible genetic predispositions that play a role in producing a given trait. The term is also commonly used in psychology to refer to an enduring aspect of personality.
Notes
-
[1]
Terms followed by an asterisk are defined in the Appendix.
-
[2]
See the papers from the symposium ‘Un siècle de Fisher’ [A century of Fisher], held in Paris on 12 and 13 September 2019 (https://1siecledefisher.sciencesconf.org/). For recordings of the presentations, see https://sfg.igh.cnrs.fr/1-siecle-de-fisher.html.
-
[3]
It was in this paper that Fisher introduced the term variance, which refers to the square of the standard deviation.
-
[4]
Demography is not as clearly separated from sociology in the English-speaking world as it is in France.
-
[5]
Most often measured through the number of years of education. One might wonder whether this object is not simply an extension of work in behavioural genetics on intelligence quotients, which, after numerous criticisms, is currently discredited.
- [6]
-
[7]
This formulation corresponds to ‘broad-sense’ heritability. However, genetic variability G can still be decomposed into its additive, dominant, and epistatic* components. The relationship between additive genetic variability (the addition of the average effects of the two alleles* at each genetic locus*) and variability in the phenotype is known as ‘narrow-sense’ heritability. It is commonly used in animal and plant breeding.
-
[8]
Analysis of variance (ANOVA) is generally not an appropriate statistical tool for measuring causal efficacy. It measures relative and not absolute efficacy, and an effect on variations in a trait and not on its level (Northcott, 2008).
-
[9]
The philosopher of science Evelyn Fox Keller (2010) hypothesized that the pervasiveness and persistence of misuses and misinterpretations, even among the most competent and careful authors, is partly due to the polysemy of the terms used and, in particular, to the inevitable semantic shifts between the common-sense definition of heritability (as that which can be inherited, i.e. transmitted from one generation to another) and its scientific definition (the relationship between genetic variability and phenotype variability).
-
[10]
Monozygotic (MZ) twins misclassified as dizygotic (DZ) twins, and vice versa.
-
[11]
The cost of sequencing a human genome fell from USD 10 million to around USD 1,000 between 2007 and 2015 (https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost).
-
[12]
Several hundred thousand, or even more than a million.
-
[13]
This is seen in the problem of population stratification, discussed below.
-
[14]
The commonly used threshold is p < 5 × 10⁻⁸.
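This threshold is usually motivated as a Bonferroni correction of a 5% error rate across roughly one million independent common variants. The sketch below reproduces that arithmetic. The figure of one million tests is a conventional approximation rather than something stated in this article, and the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
genome_wide = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(genome_wide)
```

Under this assumption, the per-test threshold works out to the conventional genome-wide value of 5 in 100 million.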
-
[15]
Who represented 88% of individuals in GWASs in 2017, according to Mills and Rahal (2019).
-
[16]
This no doubt explains why all sociogenomics research on school success relies on a rudimentary indicator such as the number of years of study.
-
[17]
A search on PubMed for polygenic risk score(s) yielded 2,339 publications on 20 October 2021. The first dates from 2010; more than two-thirds have been published in the last 3 years; the average annual growth rate in the number of publications is 75%.
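The note does not say how the average annual growth rate was computed. One common definition is the compound annual growth rate, sketched below. The function name `average_annual_growth` and the publication counts are hypothetical illustrations, not the PubMed figures behind the 75% reported here.

```python
def average_annual_growth(first_count, last_count, n_years):
    """Compound annual growth rate between two yearly counts.

    Solves first_count * (1 + r) ** n_years == last_count for r.
    """
    if first_count <= 0 or n_years <= 0:
        raise ValueError("first_count and n_years must be positive")
    return (last_count / first_count) ** (1 / n_years) - 1


# Hypothetical counts: 10 publications in the first year, 160 four years later.
rate = average_annual_growth(first_count=10, last_count=160, n_years=4)
print(f"{rate:.0%}")
```

With these made-up counts the yearly total doubles each year, so the rate is about 100%. An arithmetic mean of year-on-year rates would generally give a different figure, which is why the definition matters when comparing growth rates.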
-
[18]
It may even be reasonable to think that the development of PRSs marked a turning point in sociologists’ and demographers’ use of genomic data.
-
[19]
See Kerminen et al. (2019) on the example of Finland.
-
[20]
For example, socio-occupational category does not fully capture what social origin represents in a theoretical model.
-
[21]
Such as the exclusion restriction assumption or the absence of confounding in the associations between the genotype, the trait under study, and the intermediate phenotype.
-
[22]
According to Bryant (1985), ‘“Instrumental positivism”… is instrumental insofar as it confines social research to only such questions as the limitation of current research instruments allow, and it is positivist insofar as this self-imposed constraint is indicative of a determination on the part of sociologists to submit to rigours comparable to those they attribute to natural sciences’ (p. 133).
-
[23]
As in the example cited above, in which the link between a variant of the GABRA2 gene and the risk of alcohol dependence can be modified by the social context (support, deprivation, etc.) (Pescosolido et al., 2008).
-
[24]
Albeit less forcefully than in other research specialties, such as behavioural genetics, particularly with regard to educational policies (Asbury and Plomin, 2013; Plomin, 2018).
-
[25]
Fachal and Dunning (2015) and Mavaddat et al. (2019) thus advocate the use of PRSs to better identify people at risk of cancer and refer them to prevention programmes. A major international research programme (https://www.fondation-arc.org/mypebs) is exploring this question for the case of breast cancer screening.
-
[26]
Due to the limitations of GWASs and PRSs described in the present article.
-
[27]
For positive reviews of the usefulness of GWASs in medicine, see for example Hirschhorn (2009), Visscher et al. (2012), and Visscher et al. (2017). For much more critical views, see Goldstein (2009), Jordan (2010), and Bourgain (2014).
-
[28]
The statistical skills required are relatively sophisticated but quite common in the social sciences in the English-speaking world, where they are even considered the price of entry into the mainstream of the field.
-
[29]
Due to the falling cost of collecting this information, which is now less expensive than administering questionnaires.
-
[30]
The concentration of sociogenomics research on a small number of data sources is not without its consequences for the quality of the results (sampling problems, standardized indicators, etc.).
-
[31]
Although, at this degree of generality, most sociologists would endorse this proposition, and the history of sociology, from Durkheim to Bourdieu via Elias, is marked by reflection on the articulation of nature and culture.