1The legal recognition of same-sex unions in many countries has led to strong demand for statistics on these unions. Goldani et al. (2013) have counted 25 countries that produce data on the number of same-sex couples. Whether drawn from a census or a major survey, these data are always collected via questionnaires designed to study de facto couples (Banens and Le Penven, 2013). Since 2004, the French census has asked the question “Do you live with a partner?” which, in theory, provides a means to identify same-sex couples in the same way as heterosexual ones. In practice, this question is problematic.
2The first problem is that of non-response due to the social stigma associated with homosexuality (Black et al., 2000; Goldani et al., 2013). Non-response is a deliberate act, and therefore difficult to eradicate. For their most recent censuses, Brazil and Uruguay ran advertising campaigns to encourage same-sex couples to declare themselves (Goldani et al., 2013), but the effect of these campaigns and the scale of non-response remain unknown (Cortina and Festy, 2014). To date, no reliable measures are available for estimating the share of unreported same-sex unions, and no country has proposed any estimation or correction methods.
3The second problem is the existence of “false” same-sex couples, i.e. heterosexual couples which are counted as same-sex because one of the partners is sex-miscoded. This error concerns only a tiny minority of heterosexual couples. However, given that the number of “real” same-sex couples is very small, these “false” same-sex couples represent a large share of the total. In France, “real” same-sex couples are estimated to represent 0.6% of total couples (Buisson and Lapinte, 2013). The share of “false” same-sex couples is unknown, but estimates from several countries give a range of between 0.2% and 0.6% (Table 1), a similar proportion to that of “real” same-sex couples.
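The base-rate effect described above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that the share of “false” couples and the share of “real” couples are both expressed relative to the same total number of couples, and the function name `false_share_among_recorded` is ours rather than part of any cited method.

```python
def false_share_among_recorded(false_share: float, real_share: float) -> float:
    """Share of "false" couples among all couples recorded as same-sex.

    Both arguments are shares of all couples (assumption: same denominator).
    """
    return false_share / (false_share + real_share)


REAL_SHARE = 0.006  # 0.6% of all couples (Buisson and Lapinte, 2013)

for false_share in (0.002, 0.006):  # bounds of the range quoted in the text
    share = false_share_among_recorded(false_share, REAL_SHARE)
    print(f"false share {false_share:.1%} -> {share:.0%} of recorded same-sex couples")
```

Under these assumptions, between a quarter and a half of the couples recorded as same-sex would be “false”.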
Table 1. Share of “false” same-sex couples among all couples, and among all couples recorded as same-sex
4Non-response and sex errors differ in many respects. The first underestimates the number of same-sex couples while the second overestimates it; the first is deliberate, the second is accidental; the first restricts the study to “real” self-reported same-sex couples while the second “pollutes” the study of same-sex couples by including large numbers of heterosexual couples; the first cannot be identified or corrected while the second can. Correcting the sex error is a priority for many researchers and statistical offices (Black et al., 2000; Cortina and Festy, 2014; Statistics Canada, 2001; Turcotte et al., 2003; Festy, 2007). Among other things, it improves the accuracy of statistics on same-sex couples.
5To identify sex miscoding, an additional control question must be included in the census or survey. This is the method used by Statistics Canada since 2001: same-sex couples are identified twice, via an explicit statement that the respondent lives with a same-sex partner, and via the sex of each partner declared elsewhere in the questionnaire. There is a risk of error for each answer: 1% for the type of couple, 0.57% for the sex of one of the partners (Statistics Canada, 2001), but sex errors can be eliminated with near certainty by cross-matching the responses to the two independent questions.
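A rough calculation suggests why the double question is so effective. The sketch below takes the two error risks quoted above and assumes, as the text does, that the two answers are independent. It further assumes that a heterosexual couple slips through the cross-check only when both answers are wrong at once. The variable names are illustrative.

```python
P_COUPLE_TYPE_ERROR = 0.01  # error risk on the type-of-couple question
P_SEX_ERROR = 0.0057        # error risk on the sex of one of the partners

# Under independence, both errors must coincide for a "false" same-sex
# couple to survive the cross-check between the two questions.
p_both = P_COUPLE_TYPE_ERROR * P_SEX_ERROR

print(f"single check: {P_SEX_ERROR:.2%} of couples")
print(f"double check: {p_both:.4%} of couples")
```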
6New Zealand, Brazil and Uruguay followed the example of Canada for their censuses. Other countries, such as the United States, chose to make ex-post corrections using the couple’s first names (Hogan et al., 2012). Again in the United States, the biannual American Community Survey is currently testing new methods based on the Canadian example.
7No European country has followed suit (Cortina and Festy, 2014), although some major surveys, such as the French Family and Housing Survey (Enquête Famille et Logements, EFL) have included control questions (INSEE, 2011; see below). Thanks to record linkage with the census, the data from this survey were checked and corrected, making it possible to study cohabiting same-sex couples in France in 2011. But the survey also provides an opportunity to study sex errors in the census, and this is the purpose of our study. We begin by measuring and analysing census sex miscoding among persons who also took part in the EFL survey. We then measure the sex miscoding of the respondent’s partner as a secondary means to verify the accuracy of responses. Last, using the observed frequency as an indicator of probability, we show that not all couples recorded as same-sex have the same probability of being “false”.
I – The Famille et Logements survey
8The EFL survey is conducted in conjunction with the census and is designed to obtain information on the diversity of family structures. As the sample is very large (359,770 respondents), the survey data can also be used to study same-sex couples. The same self-reporting method is used for both the survey and the census.  The census enumerator gives out the survey and census questionnaires at the same time, and returns to pick them up a few days later. Each respondent thus fills out two questionnaires, providing two sources of information.
9The respondent’s sex is not reported in the same way in both questionnaires. For the census, the respondent ticks the box homme (male) or femme (female). In doing so, he or she may make a mistake, but errors may also be introduced when the data are captured.  In the EFL survey, respondents are not asked to report their sex, but the questionnaire is completed either by women only or by men only: in each household, only persons of a given sex – decided according to place of residence – are surveyed. If the enumerator gives out the wrong questionnaire, it will be rejected at the data capture stage. However, if a man fills out a questionnaire intended for his female partner, or vice-versa, the error will not be detected during data capture.
10The two questionnaires also use different techniques to identify same-sex couples. The survey detects them directly by asking respondents if they have a partner, if the partner lives in the same dwelling, and if he or she is male or female. The census, on the other hand, only asks respondents if they have a partner. Same-sex couples are detected by identifying households where only two persons of the same sex report having a partner. This method introduces two errors: first, some households include more than two persons with a partner, so the couple cannot be identified; second, the two respondents may each “have a partner” who does not live in the household. These errors can be measured by comparing the census and survey questionnaires.
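The census detection rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually used by INSEE: the `Person` record, its field names and the function `classify_households` are hypothetical, and the sample data are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Person(NamedTuple):
    household: int
    sex: str           # "M" or "F", as ticked on the census form
    has_partner: bool  # answer to "Do you live with a partner?"


def classify_households(persons):
    """Return (same_sex, unidentifiable) household ids under the census rule."""
    partnered = defaultdict(list)
    for p in persons:
        if p.has_partner:
            partnered[p.household].append(p.sex)

    same_sex, unidentifiable = set(), set()
    for household, sexes in partnered.items():
        if len(sexes) > 2:
            # First error: more than two partnered persons, couple cannot be paired
            unidentifiable.add(household)
        elif len(sexes) == 2 and sexes[0] == sexes[1]:
            # Second error remains possible: each may have a partner living elsewhere
            same_sex.add(household)
    return same_sex, unidentifiable


sample = [
    Person(1, "F", True), Person(1, "F", True),
    Person(2, "M", True), Person(2, "F", True),
    Person(3, "M", True), Person(3, "F", True), Person(3, "M", True),
]
same_sex, unidentifiable = classify_households(sample)
print(same_sex, unidentifiable)
```

In this invented sample, household 1 is recorded as a same-sex couple and household 3 cannot be classified. A single miscoded sex in household 2 would be enough to move it into the same-sex group.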
11A total of 359,770 respondents completed both survey and census questionnaires. In addition, the database contains the census data of the other household members, i.e. a further 471,192 people. The uncorrected raw declarations are also available.
II – Sex miscoding of respondents with a cohabiting partner
12The sex of 810 EFL survey respondents (out of 359,770) was different from that declared in the census, so a mistake had been made somewhere. Before correction, there were a total of 1,236 “ambiguous” respondents (INSEE, 2014), but 426 were removed because the mistake probably originated in the survey. To deduce the origin of the error, the adjustment procedure used information on the sex of the partners of “ambiguous” respondents who had a partner (977). Assuming that sex miscoding is independent of the type of couple (homosexual or heterosexual), the proportion of these 977 individuals in a same-sex couple should be below 1%. Yet if the survey information was accurate, this would mean that 426 (44%) had a same-sex partner. This percentage was deemed unrealistic, and it was concluded that in these cases the error occurred in the survey. After sex correction in the survey, they were considered to be “out of scope” – they were no longer of the right sex to complete the survey in their area of residence – and were eliminated from the database.
13The remaining 810 individuals were kept in the sample, with the sex reported in the survey, although we cannot be sure that it is correct. More than one-third (259 out of 810) did not report a partner, and so were not checked using the method described above. Measurement of the error is therefore limited to persons with a partner (551). According to the survey, all live with an opposite-sex partner, i.e. none were eliminated by the first check. However, in a non-negligible number of cases (109), the partner’s sex also proves to be different from that reported in the census. For example, a man aged 26 reports in the survey that he lives with a woman aged 28, while according to the census, the 26-year-old is a woman and the 28-year-old partner is a man. So we still have the two individuals, but their genders are reversed. This reversal may have occurred during the survey-census linkage procedure because of an error or omission in the questionnaire, in which case it is not a census sex error. Another possibility is that both partners reported the wrong sex in the individual census forms. The first explanation seems more likely than the second, given that it only involves a single error, even though two people are involved, while the second involves two errors. Analysis of the couples concerned confirms this hypothesis. According to the survey, contrary to other heterosexual couples, the women in these couples are generally older than their partner, and more often work in production, processing and construction than in business, secretarial functions and logistics. We therefore conclude that the problem arises from a reversal during record linkage rather than from double errors in census form completion.
14Of the 442 remaining individuals, 16 do not live with their partner and 31 others live in a household where the couple could not be identified in the census. Our measures are therefore limited to persons reported in both sources as living with a partner and unambiguously identified as a couple. This limitation is not a problem for us, since only the sex miscoding of persons living with a partner can generate “false” same-sex couples. In the end, we thus have 224,023 individuals living with a partner, among whom 395 (0.176%) are recorded under the wrong sex in the census.
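The successive exclusions described in this section can be retraced with simple arithmetic. The sketch below only restates the counts given in the text; the variable names are ours.

```python
ambiguous = 1_236            # survey/census sex mismatches before correction
removed_survey_error = 426   # error attributed to the survey, removed from the database
kept = ambiguous - removed_survey_error   # respondents kept in the sample
with_partner = kept - 259                 # those who reported a partner
after_reversal = with_partner - 109       # linkage reversals set aside
miscoded = after_reversal - 16 - 31       # non-cohabiting and unidentifiable couples excluded

cohabiting_sample = 224_023               # individuals living with a partner
rate = miscoded / cohabiting_sample

print(kept, with_partner, after_reversal, miscoded)
print(f"share of the 977 checked respondents removed: {removed_survey_error / 977:.0%}")
print(f"unweighted census sex miscoding rate: {rate:.3%}")
```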
15At this stage of the analysis, the sex of all individuals in our sample is assumed to be correct in the survey. We can now apply individual weights.  After weighting, the proportion of respondents to the EFL survey living with a sex-miscoded partner is 0.177%.
Sex miscoding by certain sociodemographic characteristics
16As shown in Table 2, sex miscoding in the census is less frequent among men (0.157%) than women (0.198%), and the difference is significant at the 1% level. For each sex separately, another sociodemographic variable, that of the sex of the children living in the household, also has a significant impact. Having more boys than girls significantly lowers the error rate for men and significantly increases it for women. The reverse is true for households with more girls than boys. This is probably a repetition effect: ticking the “male” box several times on the children’s individual forms may increase the risk of ticking it one time too many. This hypothesis implies that in certain households, all the individual census forms were completed by just one person.
17Being an immigrant with no qualifications seems to raise the risk of error, although the differences are not significant. None of the other characteristics seem to have a notable effect on the risk of error.
18Hence, a heterosexual couple has a 0.157% risk of appearing to be a same-sex female couple, and a 0.198% risk of appearing to be a same-sex male couple. In all, the risk of being a “false” same-sex couple is 0.355%, a level within the range observed in other countries (Table 1).
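The addition of the two risks can be checked against a slightly more careful calculation. The sketch below assumes that the two partners' errors are independent, which the text does not state explicitly, and compares the simple sum with the probability that exactly one partner is miscoded. The variable names are illustrative.

```python
p_man_miscoded = 0.00157    # man recorded as a woman -> apparent female couple
p_woman_miscoded = 0.00198  # woman recorded as a man -> apparent male couple

simple_sum = p_man_miscoded + p_woman_miscoded

# Exactly one partner miscoded, assuming the two errors are independent.
# If both were miscoded the couple would still appear heterosexual.
exactly_one = (p_man_miscoded * (1 - p_woman_miscoded)
               + p_woman_miscoded * (1 - p_man_miscoded))

print(f"simple sum:  {simple_sum:.3%}")
print(f"exactly one: {exactly_one:.3%}")
```

The two figures differ by less than 0.001 percentage point, so the simple sum is an adequate approximation at these error levels.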
Table 2. Sex miscoding rate in the census by sex and certain sociodemographic characteristics
Interpretation: Among men living with a female partner, 0.157% show up as a “woman” in the census. (a): numbers too small (below 200).
Significance levels: * 5%; ** 2%; *** 1%.
Coverage: Men and women in heterosexual cohabiting unions.
Partner’s sex miscoding
19The EFL survey provides a second way to measure sex miscoding in the census. Survey respondents report the sex of their partner, so their answer can be compared with the sex reported in the census by the partner him/herself. Our analysis is limited to couples where no other inconsistencies were detected in the respondent’s sex or the partner’s year of birth.  This gives us a total of 217,917 respondents, of whom 1,005 (0.48% after weighting) reported a partner’s sex that was different from that reported in the census by the partner him/herself.
20Here again, the mistake may originate in the census or in the survey. Since neither risk is linked, in principle, to the type of couple, it is again likely that the 1,005 individuals are, in more than 99% of cases, in a heterosexual union. If the couple appears as same-sex in the survey, the error probably originates in the survey; if not, then it likely originates in the census. Only the latter errors are of interest to us. There are 374 errors of this type which, in a sample totalling 217,917 respondents, gives an error rate of 0.191% (weighted), which can be broken down by the sex of the partner who made the mistake: 0.221% for women and 0.161% for men. These error rates are practically the same as for the respondent him/herself.
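The attribution rule used in this paragraph can be written as a small decision function. This is a sketch under the paper's stated reasoning, with the respondent's own sex taken as correct in the survey; the function name and the "M"/"F" coding are hypothetical.

```python
def attribute_partner_sex_error(respondent_sex, partner_sex_survey, partner_sex_census):
    """Guess the likely source of a partner-sex discrepancy.

    Returns "survey", "census", or None when the two sources agree.
    """
    if partner_sex_survey == partner_sex_census:
        return None       # no discrepancy to explain
    if partner_sex_survey == respondent_sex:
        return "survey"   # survey shows a same-sex couple: survey error is more likely
    return "census"       # survey shows a heterosexual couple: census error is more likely


# A woman who reports a male partner in the survey, recorded as female in the census
print(attribute_partner_sex_error("F", "M", "F"))
```

The rule is probabilistic rather than certain: it relies on the fact that more than 99% of discrepant cases are expected to be heterosexual couples.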
III – Consequences for the enumeration of same-sex couples
21Table 3 gives the numbers of people in same-sex cohabiting unions according to the census (column 1), according to the EFL survey (column 9) and according to both sources (column 4). Columns 2 and 3 show those detected in the census that were not confirmed by the survey. Those in column 2 were discarded for miscoded sex; those in column 3 because the person was found to be in a non-cohabiting couple. Columns 5, 6 and 7 show the persons identified in the survey but not in the census. Those in column 5 had not been detected because the household included more than two persons living with a partner, making it impossible to identify the couple; those in columns 6 and 7 because the person him/herself (6) or their partner (7) had not reported living with a partner.
Table 3. Number of persons in a same-sex cohabiting union (columns 1 to 9)
1. Persons in cohabiting same-sex union according to the census.
2. Miscoded sex of respondent or partner.
3. Respondent in a non-cohabiting union.
4. Persons confirmed by the survey.
5. Persons not detected in the census because household includes more than two persons living with a partner.
6. Persons not detected in the census because the respondent did not report living with a partner.
7. Persons not detected in the census because the partner did not report living with a partner (but the respondent did).
8. Total not detected in the census.
9. Total number of same-sex cohabiting couples according to the survey.
Interpretation: In 2011, 279,300 persons were identified as same-sex couples in the census, of whom 117,738 (42%) were corrected because of a miscoded sex by the respondent or his/her partner, and 6,249 (2%) because cohabitation was not confirmed. The survey, for its part, identified 17,762 persons in cohabiting same-sex couples not detected by the census, which corresponds to 11% of the persons in this situation detected by the two sources.
Coverage: Individuals included in the census, corrected, confirmed or newly identified by the EFL survey, by conjugal status and presence of children in the household (all children aged 0-17, plus those aged 18-24 reported as the children of at least one person in the couple); metropolitan France.
22In principle, none of the columns can be considered exact. However, column 4 is unlikely to contain errors since it includes persons for whom both sources provide consistent information. They represent a total of 155,300 persons (after weighting), comprising 63,700 women and 91,600 men. Among these, 12,300 women (no men) live with at least one child aged below 25. Adding the persons identified by the survey only, we have a total of 71,900 women and 101,200 men living in same-sex couples, half of whom (50.0%) are in a civil partnership (PACS). As there is no “PACS” category in the census questionnaire, 28% of persons in PACS unions declared that they were married.  So almost all the “married” persons in Table 3 are in fact in civil partnerships.
23Sex miscoding is by far the most frequent type of error found in the census: 42% of identified persons are in a “false” same-sex couple due to miscoded sex. The second type of error, due to inconsistent answers on conjugal status, accounts for just 2%. Conversely, 10% of persons were not detected in the census because they themselves (8%) or their partners (2%) did not report living with a partner. However, this correction can be variously interpreted since, unlike sex miscoding, we are dealing with contradictory declarations and not mistakes. The last type of error – incontestable this time – is marginal: less than 1% of persons in a same-sex union could not be detected in the census because the household included more than two people living with a partner.
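The percentages quoted for Table 3 can be reproduced from the weighted totals given in the table's interpretation note. The sketch below is a consistency check only; the variable names are ours, and the choice of the survey total as the denominator for persons missed by the census is an assumption.

```python
census_recorded = 279_300  # column 1: persons in same-sex couples according to the census
miscoded_sex = 117_738     # column 2: discarded for miscoded sex
non_cohabiting = 6_249     # column 3: cohabitation not confirmed
survey_only = 17_762       # column 8: found by the survey but not by the census

confirmed = census_recorded - miscoded_sex - non_cohabiting  # close to the 155,300 of column 4
survey_total = confirmed + survey_only                        # close to the 173,100 of column 9

print(f"miscoded sex:     {miscoded_sex / census_recorded:.0%} of census records")
print(f"non-cohabiting:   {non_cohabiting / census_recorded:.0%} of census records")
print(f"missed by census: {survey_only / survey_total:.0%} of the survey total")
```

The small gaps between the computed totals and the published figures come from rounding of the census total to the nearest hundred.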
24Sex miscoding is the main type of error, and is also the only type of error whose effects vary by sociodemographic category. The proportion of “false” couples is just 7% for single men with no children in the household, but 100% for those living with children. For women, these proportions are 13% and 71%, respectively. The other categories are situated between these two extremes. If observed frequency is taken as an indicator of the probability of being “false”, then it is strongly correlated with the sociodemographic category to which the person belongs.
25The French Famille et Logements survey provides a unique opportunity to study sex miscoding in the census. We find that the error rate – 0.355% of couples – is well within the range observed in other countries. While the rate is low, it still represents 117,700 persons in “false” same-sex cohabiting couples, which must be set against the 173,100 that the survey finally considers to be living in a “real” same-sex cohabiting union. For couples living with children, the ratio is even more unfavourable: 59,500 “false” versus 14,300 “real”. It is therefore essential to correct sex miscoding before using the census – or any other major survey – to study same-sex cohabiting couples.
26We observe that sex miscoding appears to be random, with the exception of the respondent’s sex and that of the children living in the household. We also observe that sex miscoding, unlike other errors, produces large disparities in the probability that an apparent same-sex couple is “real” or “false”. This characteristic could be exploited to develop a census data correction method in the absence of control data.
Centre Max Weber-CNRS, Université Lyon 2, France.
Correspondence: Maks Banens, Centre Max Weber-CNRS, Université Lyon 2, 14 av. Berthelot, 69007 Lyon, email: Maks.Banens@univ-lyon2.fr
MV2 Conseil, Montrouge, France.
In 2015, the French census introduced two new categories concerning conjugal life, but they cannot be used to detect sex miscoding.
The controls and corrections are described in the INSEE reference (2014, pp. 19-22). However, the same controls were not applied to non-cohabiting couples.
The census takes the form of annual surveys whose mode of organization will not be discussed here since it is unlikely to have any impact on sex miscoding. For the sake of simplicity, the term “census” in this article will refer to the 2011 annual census survey.
Sex miscoding due to incorrect data capture is rare in practice since this is the checkbox variable for which INSEE has the lowest error rate tolerance (http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/resultats/doc/pdf/2.3-controle-qualite-de-la-saisie.pdf).
To determine conjugal status in the EFL survey, we used q3couple_r=11 which corresponds to the cohabiting couple response with no possible ambiguity; in the census, we selected persons who reported living with a partner in a household where only one other person reported being in the same situation. This limitation keeps couple identification errors to a minimum.
Women were deliberately over-represented in the EFL survey sample (two women for one man), so the weights are very different. So long as the actual sex of respondents in the study is uncertain, weights cannot be applied.
We used the raw variable (q5sexe_c_x), as the cleansing procedure (INSEE, 2011) corrected the errors analysed here. We excluded the modalities of the raw variable that indicate an ambiguous response: deletions, more than one box checked, etc. (q5sexe_c_x =. | 11 | 22 | 23 | 32 | 33).
The survey took place in 2011, before same-sex marriage was legalized in 2013. Note also that only the census contains information on marital status.
The asymmetry is due to the fact that for households comprising two persons of the same sex, an error is immediately counted in the census if at least one of the two respondents did not report living with a partner.