1 The study of homogamy, i.e. couple formation by two people from the same social group, requires a multidimensional approach. In societies without any legal or religious prohibition against the formation of heterogamous couples, we might suppose that couple formation is random and does not, a priori, favour the formation of either homogamous or heterogamous couples. This is not the case: Alain Girard’s work of 1964 marked the beginning in France of an extensive literature on the subject, recently updated by Bouchet-Valat (2014, 2015). It shows that an individual’s spouse is more likely to belong to the same social group (profession and socio-occupational category, level of education, etc.) than if it were left to chance. Where membership of different social groups reflects inequalities, we might think that this tendency for homogamy contributes to the persistence of socioeconomic inequalities within the society. In particular, the propensity for marriage between individuals with similar levels of education (educational homogamy) is likely to reinforce, at the household level, the salary inequalities that exist within the labour market between the least and most qualified individuals.
2 Trends in educational homogamy are well documented and vary widely by country and over time (for a review of the literature, see Blossfeld, 2009). [1] In France, for example, Bouchet-Valat (2014) demonstrated an overall decline in educational homogamy between 1969 and 2011. Other studies conducted in different countries have tried to measure the impact of educational homogamy on the distribution of income between households and its contribution to the general increase in inequalities. They generally converge on the idea that the effect of educational homogamy is relatively small (Breen and Andersen, 2012; Eika et al., 2014; Greenwood et al., 2014 [2]; Harmenberg, 2014; Boertien and Permanyer, 2019). In France, the available research has focused mainly on the effect of homogamy on inequalities between households based on wage income alone (Frémeaux and Lefranc, 2015; Courtioux and Lignon, 2015a) and reveals, like the international research, limited impact.
3 Estimates in the literature of the effects of educational homogamy are generally based on samples that exclude the youngest and oldest individuals to limit generational bias (Eika et al., 2014; Greenwood et al., 2014; Boertien and Permanyer, 2019). The method generally used in constructing the counterfactual scenario ‘without educational homogamy’ involves considering that individuals from older generations may find themselves ‘at random’ in a couple with individuals from younger generations. Restricting the field therefore limits this bias, but the results are not directly comparable with the usual indicators of living standards inequalities calculated in the general population, i.e. among all individuals, irrespective of age.
4 This research note aims to calculate the proportion of living standards inequalities that is attributable to educational homogamy by proposing an original method that allows for analysis of the whole population. It contributes to the analysis of the situation in France by exploring three dimensions largely overlooked in the literature but necessary, it seems, to determine the effect of educational homogamy on income inequalities.
5 First, the specific repercussions of educational homogamy on differences in standard of living have not been explored in depth using French data. Using standard of living encompasses all the factors that contribute to the formation of household income (employed or self-employed income, replacement income, taxes, and social benefits) and allows us to take household composition into account by applying a correction based on consumption units that include the economies of scale that result from living with others. It enables us to consider, simultaneously, two factors that are not neutral in terms of inequalities: the redistributive role of the tax and social security system, and the pooling of employment income within couples. To date, only Boertien and Permanyer (2019) have analysed the French situation within the context of an international comparison. They showed that, looking only at employment income, the estimated impact of educational homogamy on inequalities is less than the impact on standard of living. However, their indicator relates only to the population aged 30–64, and their definition of educational homogamy lacks detail, with only four different levels of education identified.
6 Secondly, research measuring the effect of educational homogamy on living standards has compared the observed situation with a counterfactual scenario in which spousal education is determined at random, irrespective of the individual’s birth cohort (Greenwood et al., 2014). This approach does not enable us to fully comprehend the role played by the development of higher education and the progression of educational levels. The probability of being in a couple with an individual of a particular level of education, even when randomly assigned, depends on the qualifications of the available spouses, which tend to evolve over generations. To better reflect these changing patterns of education levels, some works have sought to make the counterfactual simulation conditional on the reference individual’s age by distinguishing two age categories (Boertien and Permanyer, 2019). This paper proposes an alternative method, based on multinomial logistic regression analysis, that can more closely explore the generational effects [3] associated with levels of education. In the case of France, this precise control is important given that changes in homogamy patterns are not consistent across educational levels. While, overall, educational homogamy has tended to decline in France over time, Bouchet-Valat (2014) noted an increase among graduates of the elite grandes écoles.
7 Thirdly, a complete analysis of the effect of educational homogamy needs to document the ‘margins’ of the distribution. Its impact for grandes écoles graduates (Bouchet-Valat, 2014) suggests that the income concentration effect is potentially greater at the top of the distribution. Yet the indicators generally used in the literature, such as the Gini index, do not incorporate this dimension particularly well, especially given that the measurement of homogamy is not based on any distinction between graduates of the grandes écoles and those of other higher education institutions. This paper produces an original analysis seeking to verify the existence of ‘margin effects’ of the impact of educational homogamy on inequalities. To do this, we combine a detailed approach to educational level (using nine categories, more than the number of levels of education generally used in the literature [4]) with the calculation of an indicator that evaluates the concentration of inequalities at the top and bottom of the distribution. Three standard indicators are used in addition to the Gini: the proportion of wealth held by the richest 10% of households, the proportion of affluent or ‘well-off’ [5] households, and the proportion of households at risk of income poverty.
8 To deepen our analysis of the repercussions of educational homogamy on inequalities in the general population based on these three approaches (examination of standard of living by precisely defined educational level, consideration of generational effects, and analysis of the margins), this research note uses the French Tax Income Surveys (Enquêtes revenus fiscaux [ERF]) and Tax and Social Income Surveys (Enquêtes revenus fiscaux et sociaux [ERFS]) available for the years 2003 to 2013. In Section I, we present and discuss the methodology used to measure the specific effect of educational homogamy on inequalities. The second section sets out the results obtained and demonstrates that the effect of educational homogamy on inequalities is influenced by a generational effect that is more obvious at the top of the income distribution.
I – Constructing a counterfactual population to measure the extent of educational homogamy
1 – The challenges of controlling for generational effects
9 To measure the specific effect of educational homogamy on inequalities, we need to construct a counterfactual population in which unions would be randomized in terms of educational qualifications. When analysis is focused on one dimension of homogamy—the educational dimension—the method generally used is imputation randomization. [6] This involves constructing the population of counterfactual couples based on observed couples, by applying a correction coefficient c to their weight within the initial population (notated P). From a technical point of view, this requires a qualitative control variable for the homogamy structure (for example, qualification E) provided for each of the individuals forming the couple (E_{i} and E_{k}). Two elements need to be identified to construct a counterfactual population: (a) the theoretical probability of random matching (probability of observing the couple E_{i} and E_{k}), notated p_{t}, under the hypothesis of absence of educational homogamy (i.e. the probability for a man of having a wife of educational level E_{k}, irrespective of his own level of education E_{i}); (b) the matching probability observed with educational homogamy, notated p_{o}. The imputation method involves correcting the weights by a coefficient c equal to the ratio between p_{t} and p_{o} [7]. In the literature (Greenwood et al., 2014; Harmenberg, 2014; Boertien and Permanyer, 2019), estimating the coefficient c is based on the construction of two contingency tables: the first giving the frequencies of matching between educational levels in the observed scenario (to estimate p_{o}); and the second, the frequencies where there is independence of E_{k} and E_{i} (to estimate p_{t}).
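The contingency-table version of this reweighting can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation used in the literature cited: the function names `correction_coefficients` and `reweight`, the two-level education coding, and the toy weights are all hypothetical. Here p_{o} is taken as the observed joint share of each (E_{i}, E_{k}) cell and p_{t} as the product of the two marginal shares, i.e. the share expected if E_{i} and E_{k} were independent.

```python
from collections import defaultdict


def correction_coefficients(couples):
    """Return {(e_i, e_k): c} with c = p_t / p_o for each observed cell.

    couples: list of (e_i, e_k, weight) tuples, one per observed couple.
    """
    total = sum(w for _, _, w in couples)
    joint = defaultdict(float)   # weighted counts per (e_i, e_k) cell
    men = defaultdict(float)     # marginal weighted counts for E_i
    women = defaultdict(float)   # marginal weighted counts for E_k
    for e_i, e_k, w in couples:
        joint[(e_i, e_k)] += w
        men[e_i] += w
        women[e_k] += w
    coeffs = {}
    for (e_i, e_k), w in joint.items():
        p_o = w / total                                   # observed matching
        p_t = (men[e_i] / total) * (women[e_k] / total)   # independence
        coeffs[(e_i, e_k)] = p_t / p_o
    return coeffs


def reweight(couples, coeffs):
    """Apply the correction coefficient to each couple's weight."""
    return [(e_i, e_k, w * coeffs[(e_i, e_k)]) for e_i, e_k, w in couples]


# Toy population (hypothetical): strong homogamy on a two-level variable.
example = [
    ("low", "low", 40.0),
    ("low", "high", 10.0),
    ("high", "low", 10.0),
    ("high", "high", 40.0),
]
example_coeffs = correction_coefficients(example)
example_counterfactual = reweight(example, example_coeffs)
```

In this toy case, homogamous cells receive a coefficient below 1 and heterogamous cells a coefficient above 1, while the total weight of the population is unchanged.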
10 This habitually used method has at least two limitations. First, it takes no account of changing trends in the distribution of the educational levels of potential spouses over generations. For example, Greenwood et al. (2014) calculated coefficient c without making p_{t} and p_{o} conditional on birth cohort, while Boertien and Permanyer (2019) differentiated only two age categories. The second limitation concerns the highly aggregate levels of the educational variables used for the analysis. This lack of precision is justified considering the method and data used, particularly in order to retain sufficient subpopulation numbers when calculating frequencies of educational matches (see below). However, it does impact the extent of educational homogamy identified. In France, for example, at equivalent educational level, graduates of professional master’s degree courses at universities and graduates of business schools do not have the same propensity for homogamy. Business school graduates are more homogamous and have better salaries on average, all else equal (Courtioux and Lignon, 2015a). To construct the counterfactual population, it is the distance between the degree of homogamy and a randomized matching that determines the correction coefficient c. Where the construction of the counterfactual scenario is based on a categorization that differentiates business schools from other master’s courses, the correction coefficient applied to couples formed of business school graduates will be closer to 0 (and further from 0 for couples formed of university graduates with professional master’s degrees) compared with a classification in which no distinction is made between these qualifications. A detailed approach to educational level therefore drastically reduces the proportion of business school graduate couples formed ‘by chance’ in the counterfactual population on which an indicator of inequality without educational homogamy is based. 
In the case of France, using a more precise level of education to measure educational homogamy—and one that differentiates between graduates of the grandes écoles and other graduates—is likely to reveal a greater potential influence of educational homogamy on inequalities.
11 To overcome these limitations, we need a method for constructing a counterfactual population that enables the detailed correction of educational homogamy while taking into account changes in educational levels over time.
2 – Controlling closely for generation in French data
12 To evaluate the effect of homogamy on inequalities, an imputation method is applied to the ERF and ERFS survey data for each year in the 2003–2013 period. These data match households from the Labour Force Survey (enquête Emploi) with the tax returns of the individuals within these households. The data thus provide reliable information on income (being established on the basis of administrative data and a significant number of observations [around 50,000] per year). They also provide information on the living standards of households, the composition of their resources, and their characteristics. On the latter point, the ERFS enables close analysis of educational homogamy by identifying nine levels of qualifications: (1) no qualifications; (2) CAP/BEP/Brevet de technicien (lower secondary vocational); (3) Bac (upper secondary); (4) Bac + 2 (BTS, IUT; 2 years post-secondary); (5) Bac + 3 / Bac + 4 (3 or 4 years post-secondary); (6) Bac + 5 – University; (7) Bac + 5 – grandes écoles; (8) non-medical doctorate; (9) medical doctorate. This represents 81 matching possibilities and categories for the couples.
13 All households composed of at least one different-sex couple are included. [8] Taking into account both the heterogeneity of educational levels and how they change over generations raises the issue of subpopulation size for certain matching categories in the contingency table used to calculate the coefficient c. Rather than opting for the aggregation of educational categories, this article proposes an original method for estimating the coefficient c (and therefore the theoretical (p_{t}) and observed (p_{o}) probabilities) based on multinomial logistic regression. These models can be used in a descriptive approach to estimating probabilities, without an underlying causal assumption (Afsa-Essafi, 2003).
14 To construct the counterfactual population while controlling for generational effects (notated P^{c}), we use the pooled ERF and ERFS surveys to estimate two multinomial logistic models that determine, for a male individual in a couple, [9] the probabilities that his wife will have a particular level of education. Model 1 (Equation 1) is designed to estimate p_{o}, i.e. matching probability in the presence of educational homogamy, and takes the following form:
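Equation 1 is not reproduced in this version of the text. A form consistent with the symbols defined in the next paragraph (E_{i}, g_{i}, g^{2}_{i}, the coefficient vectors α_{k}, β_{k}, δ_{k}, γ_{k}, and the function f_{o}) would be the following. This is a hedged reconstruction, not necessarily the authors' exact notation; in particular, β_{k} E_{i} is shorthand for a set of indicator variables, since E_{i} is categorical, and f_{o} is assumed to be the multinomial logistic link.

```latex
P\left(E_{k} = e \mid E_{i}, g_{i}\right)
  = f_{o}\!\left(\alpha_{k} + \beta_{k} E_{i} + \delta_{k} g_{i} + \gamma_{k} g_{i}^{2}\right)
  \tag{1}
```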
16 Where the probability that the spouse k of individual i has an educational qualification of E depends on E_{i} (the qualification of individual i) and g_{i}, a generational trend [10] enabling the identification of changes in spousal availability in terms of qualifications. This generational effect is therefore specific to each level of education. To identify potential effects of ‘slowing’ or ‘amplification’ of this trend, we also introduce the square of this generation variable (g^{2}_{i}). This choice allows for the inclusion of individuals from the youngest generations studied who are still students. The estimators corresponding to the vectors α_{k}, β_{k}, δ_{k} and γ_{k} are reported in the Appendix Table A.1 (Panel A).
17 Model 2 (Equation 2) is designed to determine p_{t}, the theoretical matching probability in the absence of homogamy (Appendix Table A.1, Panel B). It is based on a specification that uses only the generational trend as an explanatory variable and therefore takes the following form:
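Equation 2 is likewise not reproduced in this version of the text. Given that the specification keeps only the generational trend, a form consistent with the surrounding definitions (g_{i}, g^{2}_{i}, the coefficient vectors α_{k}, δ_{k}, γ_{k}, and the function f_{t}) would be the following, offered as a hedged reconstruction rather than the authors' exact notation:

```latex
P\left(E_{k} = e \mid g_{i}\right)
  = f_{t}\!\left(\alpha_{k} + \delta_{k} g_{i} + \gamma_{k} g_{i}^{2}\right)
  \tag{2}
```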
19 Based on estimators of functions f_{o} and f_{t}, it is possible to calculate the probabilities [11] p_{o} and p_{t} from which the coefficient c can be deduced and to construct the counterfactual population P^{c} by applying this correction coefficient to the weights of households in population P. In the following section, P^{c} is the population in which we can calculate what are called ‘inequality indicators with random educational matching controlling for generation’.
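The step from estimated coefficients to the correction coefficient c can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' code: the multinomial logistic form (a softmax over one linear score per spouse level), the function names, and the toy coefficient values are all hypothetical. The only elements taken from the text are the ratio c = p_{t} / p_{o}, the quadratic generational trend, and the definition of g as year of birth minus 1970.

```python
import math


def softmax(scores):
    """Convert linear scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spouse_probs(params, e_i, g, use_education=True):
    """P(E_k = level) for every spouse level.

    params: one (alpha, beta, delta, gamma) tuple per spouse level, where
    beta is a list indexed by the man's own level e_i (hypothetical layout).
    """
    scores = []
    for alpha, beta, delta, gamma in params:
        score = alpha + delta * g + gamma * g * g
        if use_education:
            score += beta[e_i]
        scores.append(score)
    return softmax(scores)


def correction(params_o, params_t, e_i, e_k, birth_year):
    """c = p_t / p_o for a man of level e_i whose wife has level e_k."""
    g = birth_year - 1970  # generational trend, as defined in the text
    p_o = spouse_probs(params_o, e_i, g)[e_k]
    p_t = spouse_probs(params_t, e_i, g, use_education=False)[e_k]
    return p_t / p_o


# Toy coefficients for two education levels (0 and 1); values are invented.
toy_o = [(0.0, [0.0, 0.0], 0.0, 0.0), (0.0, [-1.0, 1.0], 0.02, 0.0)]
toy_t = [(0.0, [0.0, 0.0], 0.0, 0.0), (0.0, [0.0, 0.0], 0.02, 0.0)]
c_homogamous = correction(toy_o, toy_t, e_i=1, e_k=1, birth_year=1970)
c_heterogamous = correction(toy_o, toy_t, e_i=1, e_k=0, birth_year=1970)
```

With these invented values, the homogamous couple is down-weighted (c below 1) and the heterogamous couple is up-weighted (c above 1), which is the behaviour the reweighting is meant to produce.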
20 To identify more clearly the contribution made by introducing a generational control (i.e. the coefficient vectors δ_{k} and γ_{k}), we also constructed a counterfactual population P^{d} without trying to control for generational effects in calculating the effects of educational homogamy. This method is based on the comparison of estimates in Model 3 (Equation 3, with educational level, observed matching) and Model 4 (Equation 4, without educational level, theoretical random matching), the specifications of which do not include generational trends and can therefore be formulated as follows:
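Equations 3 and 4 are not reproduced in this version of the text. Since the paragraph above states that these specifications drop the generational trend (the vectors δ_{k} and γ_{k}), forms consistent with that description would be the following. This is a hedged reconstruction; reusing the function names f_{o} and f_{t} for Models 3 and 4 is an assumption.

```latex
P\left(E_{k} = e \mid E_{i}\right) = f_{o}\!\left(\alpha_{k} + \beta_{k} E_{i}\right) \tag{3}

P\left(E_{k} = e\right) = f_{t}\!\left(\alpha_{k}\right) \tag{4}
```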
22 In the following sections, the results produced from Equations 3 and 4 are called ‘inequality indicators with random educational matching without controlling for generation’. The results are very similar to those produced by the usual method used in the literature using the contingency tables presented in Section II.
3 – Inequality indices used
23 From populations P, P^{c}, and P^{d}, it is possible to calculate various inequality indicators. Works on educational homogamy have generally used the Gini coefficient. To go beyond this general understanding of inequalities, other indicators, permitting better analysis of the margins of the distribution, are used alongside the Gini coefficient. Two additional indicators are used to take these margins into account: (a) the share of the well-off, an indicator used in recent debates on the shrinkage of the ‘middle class’ and defined as the ratio between the number of individuals with a living standard at least twice the median level and the total number of individuals in the population (Atkinson and Brandolini, 2011; Courtioux et al., 2020); (b) the traditional concentration of wealth indicator, which is the income share (measured by living standard) held by the richest 10% of households. [12] These indicators are supplemented by the income poverty rate, which allows us to consider the effects of educational homogamy at the bottom of the income distribution.
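The four indicators can be computed on any weighted population, which is what makes the comparison between P, P^{c}, and P^{d} possible: only the weights change. The sketch below is illustrative and rests on assumptions not stated in the text: the function names are hypothetical, the unit of observation is left generic, and the poverty line is set at 60% of the median living standard, a common convention that the text does not specify.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches share q of the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]


def gini(values, weights):
    """Weighted Gini coefficient from the Lorenz curve (trapezoid rule)."""
    pairs = sorted(zip(values, weights))
    total_w = sum(w for _, w in pairs)
    total_y = sum(v * w for v, w in pairs)
    cum_y = 0.0
    area = 0.0
    for v, w in pairs:
        prev_y = cum_y
        cum_y += v * w
        area += (w / total_w) * (prev_y + cum_y) / total_y
    return 1.0 - area


def top_share(values, weights, top=0.10):
    """Share of total income held by the richest `top` fraction of units."""
    pairs = sorted(zip(values, weights), reverse=True)
    total_y = sum(v * w for v, w in pairs)
    remaining = top * sum(weights)
    held = 0.0
    for v, w in pairs:
        take = min(w, remaining)  # split the unit that straddles the cutoff
        held += v * take
        remaining -= take
        if remaining <= 0:
            break
    return held / total_y


def well_off_share(values, weights):
    """Weighted share of units at or above twice the median."""
    med = weighted_quantile(values, weights, 0.5)
    rich = sum(w for v, w in zip(values, weights) if v >= 2 * med)
    return rich / sum(weights)


def poverty_rate(values, weights, threshold=0.6):
    """Weighted share of units below threshold * median (60% is assumed)."""
    med = weighted_quantile(values, weights, 0.5)
    poor = sum(w for v, w in zip(values, weights) if v < threshold * med)
    return poor / sum(weights)
```

Passing the original weights gives the observed indicator; passing the weights multiplied by the correction coefficient c gives its counterfactual counterpart.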
24 We used bootstrapping to test the statistical significance of differences in these indicators between observed and counterfactual populations. By design, the populations P^{c} and P^{d} are produced from population P. Based on their weight in P, we randomly sampled, with replacement, the households present in order to obtain 1,000 samples of population P (P_{1}, P_{2}, … P_{1000}) of the same size as P. Based on results previously obtained, we can deduce for each of the samples of P the corresponding samples of P^{c} and P^{d} (P^{c}_{1}, P^{c}_{2}, … P^{c}_{1000}; P^{d}_{1}, P^{d}_{2}, … P^{d}_{1000}). By calculating the inequality indices on these samples, it is then possible to assess the statistical significance of the results presented in the following section.
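The resampling procedure can be sketched as follows. This is one possible reading of the description above, not the authors' code: households are drawn with replacement with probability proportional to their weight in P, each draw then counts once in the resampled P, and counts with its correction coefficient c in the corresponding resampled counterfactual population. The function names and the `(income, weight, c)` tuple layout are hypothetical, and a compact Gini is included only to keep the example self-contained.

```python
import random
import statistics


def gini(values, weights):
    """Weighted Gini coefficient from the Lorenz curve (trapezoid rule)."""
    pairs = sorted(zip(values, weights))
    total_w = sum(w for _, w in pairs)
    total_y = sum(v * w for v, w in pairs)
    cum_y = 0.0
    area = 0.0
    for v, w in pairs:
        prev_y = cum_y
        cum_y += v * w
        area += (w / total_w) * (prev_y + cum_y) / total_y
    return 1.0 - area


def bootstrap_difference(households, indicator, n_boot=1000, seed=0):
    """Mean and bootstrap standard error of I(P) - I(P^c).

    households: list of (income, weight_in_P, c) tuples, where c is the
    correction coefficient that turns the household's weight in P into
    its weight in the counterfactual population.
    """
    rng = random.Random(seed)
    sampling_weights = [w for _, w, _ in households]
    diffs = []
    for _ in range(n_boot):
        # Draw a sample of the same size as P, with replacement.
        sample = rng.choices(households, weights=sampling_weights,
                             k=len(households))
        incomes = [y for y, _, _ in sample]
        observed = [1.0] * len(sample)          # resampled P
        counterfactual = [c for _, _, c in sample]  # matching resampled P^c
        diffs.append(indicator(incomes, observed)
                     - indicator(incomes, counterfactual))
    return statistics.mean(diffs), statistics.stdev(diffs)
```

A difference can then be judged significant when its bootstrap mean is large relative to its bootstrap standard error; the exact test used in Appendix Table A.3 is not specified beyond being a bootstrap difference-of-means test.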
II – Results
25 The first important result concerns the estimation of the multinomial logistic models corresponding to Equations 1–4. For the four models estimated, all coefficients are statistically significant at the 1% level (Appendix Tables A.1 and A.2). Using these estimators to correct the weights and obtain the two counterfactual populations P^{c} and P^{d} (see above) produces a robust analysis. Likewise, when we look at the inequality indicators calculated on these counterfactual populations (Figure 1), the issue is to determine if it is possible to interpret these differences directly or if they are too small to be statistically significant. The result of difference of means tests performed using bootstrap methods indicates that the differences between these indicators are statistically significant for all years in the period analysed here (Appendix Table A.3).
26 The results presented here demonstrate the importance of controlling for the effects of generation in estimating the impact of educational homogamy on inequalities in living standards when producing indicators in the general population.
27 First, Figure 1 clearly shows that the effect of educational homogamy on inequalities in living standards is greater within generations than between generations: the grey curve is systematically higher than the black curve. Technically, random educational matching, which corrects educational homogamy, reduces inequalities more when this matching takes place within one generation than when it involves all the generations. This difference is in the order of 0.3–0.7 percentage points over the period for the Gini coefficient, 0.2–0.6 for the income share of richest 10%, 0.1–0.4 for the poverty rate, and 0.2–0.5 for the share of the well-off. One explanation for this result is that qualifications were less common among older generations and have less impact on the income scale. Calculating an indicator of inequalities without controlling for generational effects thereby equates to ‘coupling up’ individuals from older, less qualified generations who are in a relatively favourable position on the labour market due to their experience, with younger, more qualified individuals for whom qualifications are a more important factor in labour market entry and differences in income.
Figure 1. Effect of educational homogamy on four indicators of inequalities in living standards
28 In addition to the clear distinction of these two effects from a demographic point of view, the counterfactual population that ‘makes sense’ is one that takes into account the fact that partnerships are generally formed between individuals from similar generations. The distance between the grey and black curves, as noted above (Figure 1), illustrates the extent of the error likely to occur within the general population when using a method that does not control for the impact of generation. As such, the indicators obtained with random educational matching controlling for generation are the ones that should be compared with the usual indicators calculated in the general population (respectively, the black and dotted curves in Figure 1).
29 From this point of view, Figure 1 reveals an important finding. The increase in inequalities caused by educational homogamy mainly affects the top of the income distribution and, in this respect, validates the hypothesis previously formulated on the basis of the results obtained by Bouchet-Valat (2014). Figure 1 shows that for the three indicators that take the top of the distribution into account (Gini, income share of richest 10%, and the share of the well-off), lack of educational homogamy reduces the inequalities observed, [13] whereas this finding does not clearly emerge when we do not control for generation. Conversely, if we look at the poverty rate, an indicator that describes the bottom of the distribution, with random educational matching controlling for generation it differs very little from the observed poverty rate, meaning that it is difficult to interpret any obvious effect of homogamy.
30 The advantage of the proposed method is that it allows these results to be compared more directly with the usual indicators of inequalities in the general population. Unsurprisingly, they confirm the results found in the literature, namely that the effects of this phenomenon are minor. The extent of the measured effects is reflected in the impact of educational homogamy on the Gini—an impact far lower than, for example, that associated with the redistribution effected through statutory contributions and welfare benefits. Blasco and Picard (2019) showed that the latter, overall, reduced the Gini by around 20% in 2016. However, the influence of educational homogamy on inequalities remains more significant than the effects of successive reforms of the welfare and tax system over recent years. Research by André et al. (2017) highlighted that the measures implemented in 2013, 2014, and 2015 reduced the Gini by around 0.002 points per year, whereas random educational matching reduces the Gini by 0.004 points on average; as such, over 3 years, the reforms have largely offset the effect of educational homogamy on inequalities.
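The comparison in the last sentence rests on simple arithmetic, using only the two figures quoted above (an annual reduction of around 0.002 Gini points from the reforms, and a mean reduction of 0.004 points from random educational matching):

```latex
3 \times 0.002 = 0.006 \;>\; 0.004
```

Over the three years, the cumulative effect of the reforms is therefore about 1.5 times the mean effect attributed to educational homogamy.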
31 Regarding the scale of the effects measured for the other inequality indicators in Figure 1, no comparable research is available as yet. Nonetheless, in the upper income distribution, based on the increase in the share of the well-off that followed the 2008 financial crisis (Courtioux et al., 2020), we can observe that the mean impact of educational homogamy on the share of the well-off (0.2 percentage points) remains less than the ‘crisis effect’, which was 0.5 percentage points between 2008 and 2011.
Conclusion
32 This research note presents an original method for estimating the impact of educational homogamy on living standards in the general population, applied to French data for the 2003–2013 period. The results show that educational homogamy produces a very slight but statistically significant increase in inequalities of living standards, primarily through a generational effect, in the upper portion of the income distribution.
33 It is important that we properly distinguish the influence of educational homogamy from other effects, particularly by controlling for changes in education levels across generations. The results invite further analysis on generational effects and on the consequences of educational homogamy on high income, particularly from a comparative European perspective. In this respect, France is characterized by an overall decline in educational homogamy since the 1970s, except for an elite group of individuals within which it has increased (Bouchet-Valat, 2014). A comparative analysis would enable us to identify the diversity of the impact of educational homogamy within varied institutional contexts, looking not only at degree of similarity between spouses but also at changes in the educational system and at the architecture of the tax and welfare system.
Appendices
Table A.1. Matching estimators with generational trend (multinomial regression)
(a) The variable corresponding to this estimator is calculated by taking year of birth and subtracting 1970.
Note: Standard deviations of the estimators are in parentheses.
Significance: *** significant at the 1% level; ** significant at the 5% level; * significant at the 10% level.
Coverage: Men in a couple with a woman.
Table A.3. Bootstrap mean differences in inequality indicators in the different populations
Notes: I is the inequality indicator analysed; P represents the population produced from ERF and ERFS; P^{c} the counterfactual population with random educational matching controlling for generation, and P^{d} the population with random educational and generation matching. The columns show the mean differences between the inequality indicators for the various populations P (observed population), P^{c} (counterfactual population controlling for generational effects) and P^{d} (counterfactual population without controlling for generational effects), and whether they are significantly different from 0. Bootstrap standard errors are in parentheses.
Significance: * signifies that the hypothesis of no difference can be rejected with a type 1 error risk of less than 10%; ** respectively with a risk of less than 5%; and *** with a risk of less than 1%.
Notes

[1]
By way of example, educational homogamy has been increasing in the United States since the 1960s (particularly for the highest levels of education; Chiappori et al., 2020), while in Britain it has been decreasing since the 1970s (Halpin and Chan, 2003). A fall in homogamy among the least educated individuals and an increase among the most educated was observed in Spain between 1920 and 1970 (Esteve and Cortina, 2006).

[2]
This article formed the subject of a corrigendum published on 18 June 2015 (Greenwood et al., 2015), as the initial version of the research greatly overestimated the impact of homogamy on inequalities in living standards.

[3]
The term generational effect is preferred to age effect, given the method and data used (see below). In these models, each individual’s year of birth is applied, which, unlike age, has the advantage of remaining constant irrespective of the reference year.

[4]
For example, Greenwood et al. (2014) used five categories, Cornelson and Siow (2016) used three, and Boertien and Permanyer (2019) used four.

[5]
As defined by Atkinson and Brandolini (2011), referring to households whose standard of living is at least twice the median standard.

[6]
See in particular Harmenberg (2014) and Courtioux and Lignon (2015b) for a comparison of the imputation method with other methods.

[7]
For educational levels associated with small populations but strong homogamy (such as the grandes écoles), the coefficient c is much lower than 1 in that the theoretical probability (p_{t}) is much lower than the observed probability (p_{o}).

[8]
Unlike some articles (Greenwood et al., 2014), we decided not to exclude, a priori, complex households containing multiple couples, which represent 5% of households in the ERFS surveys from 2003–2013. On the other hand, same-sex couples are not taken into account. It is, unfortunately, impossible to study them on the basis of the ERFS given the small numbers involved (0.4% of the database) and the potential errors in the reporting of sex, which can have significant consequences given the small number of same-sex cohabiting couples.

[9]
The choice of sex is standard. Given that this work is only being conducted in established couples, there is no reason to think that this choice influences the results.

[10]
This variable, expressed in number of years, corresponds to the difference between the individual’s year of birth and the year 1970, used here as a standard reference point.

[11]
For a more detailed description of the method, see Afsa-Essafi (2003).

[12]
The study of indicators of living standards inequalities is conducted here using households as the base unit.

[13]
Excepting the year 2006 for the Gini and income share of richest 10% (Figure 1). Further analysis shows that this year is particularly sensitive to the weighting corrections applied to medical doctorate graduates. The mean effect observed for the other years suggests that the year 2006 is not representative of the mean impact of educational homogamy on inequalities.