Age at entry into union is a fundamental variable in the timing of conjugal and family life. Do censuses and population surveys yield the same estimates of age at first union ? Should one of these sources be preferred over the other ? For countries where it can be difficult to record ages and where the study of nuptiality is generally based solely on survey data, these questions deserve special attention. Véronique Hertrich and Solène Lardoux compare estimates of age at marriage drawn from 450 data collection operations – both surveys and censuses – carried out in 55 African countries since the 1950s, and demonstrate that these two types of source each introduce biases that pull in opposite directions. These inconsistencies originate in reporting errors at the time of data collection and are particular to each type of source. So there is no reason to prefer surveys over censuses when analysing the timing of nuptiality ; rather, the two sources should be used in tandem.
The production of national population statistics, although late in getting off the ground (in many countries the first survey dates from the 1960s and the first census from the 1970s), has made considerable progress in Africa over recent decades. As part of this process, in addition to ten-yearly censuses, huge demographic research programmes have been established, from the World Fertility Surveys (WFS, 1975-1985) through to the latest rounds of Demographic and Health Surveys (DHS) ; there have also been more specific programmes, such as the League of Arab Nations’ PAPCHILD (Pan Arab Project for Child Development) and PAPFAM (Pan Arab Project for Family Health) surveys and the MICS (Multiple Indicator Cluster Survey) studies under the guidance of UNICEF, as well as independent national surveys. As a result, a significant quantity of national demographic data is now available at the scale of the whole continent and for most African countries. In the 55 countries of Africa, a low estimate [1] puts the number of censuses and national surveys carried out between 1950 and 2010 at over 500, an average of more than nine for each country.
Although some of the information collected by censuses and surveys is identical, these different data sources are rarely used in an integrated and systematic manner to examine long-term demographic trends at the scale of the African continent. This is certainly the case for nuptiality trends. Period estimates of age at first union can be obtained from both censuses and surveys by using the standard statistical table of marital status by sex and age. However, after the seminal work in this field in the 1980s (Lesthaeghe et al., 1989 ; van de Walle, 1993), most publications describing African nuptiality trends have used only a selection of the existing data. [2]
Why is the potential for comparative analysis offered by available statistical operations so neglected ? Two main reasons can be suggested : first, data accessibility (i.e. use of data is restricted by difficulties in gaining access to them), and second, source comparability (the data collection protocols of the various sources are not uniform enough to guarantee that data from independent operations will be comparable).
For a long time, dissemination of census and survey results was often delayed and limited in scope, making analysis difficult. But the situation has changed considerably. Since the launch of the DHS, swift publication of results and easy access to data have become the norm for most African surveys. Census publications are available more quickly and easily on the websites of national statistical institutions, while the IPUMS project (Integrated Public Use Microdata Series) provides access to census microdata on a growing number of countries. Finally, with specific regard to nuptiality, the United Nations’ database (2008, 2013) that brings together indicators drawn from several different sources for each country should now make it easier to develop systematic comparative approaches (Ortega, 2014).
But even with this increasing access to data, there is still the question of whether data from censuses and surveys are sufficiently comparable to be treated as a single corpus. Does the heterogeneity of data sources bias the reconstruction of long-term trends in age at first union ? Are there consistent types of differences between the estimates produced and, if so, what causes them ? Is it possible to determine whether censuses should be preferred over surveys, or vice versa, to obtain high-quality estimates ?
These questions will be addressed using two approaches. The first compares estimates of age at first marriage taken from censuses and surveys, drawing on a pan-African database on nuptiality that brings together more than 450 censuses and surveys from the 55 countries of Africa. The second is based on analysis of a corpus of 15 MICS surveys, which recorded women’s marital status on both a household and an individual questionnaire. After determining the frequency and the direction of differences between census-based and survey-based estimates, these secondary analyses of individual data will provide a more precise view of the mechanisms that give rise to such distortions. We will begin by examining the factors that may affect the quality and comparability of estimates of age at marriage drawn from survey and census data.
The terms “marriage” and “union” are used interchangeably here when discussing women’s first unions, without reference to whether or not they have been formalized.
I – Should we expect censuses and surveys to yield different estimates of median age at first marriage ?
Cross-sectional estimates : a means of avoiding retrospective reporting biases
In the absence of vital statistics, there are two main methods for measuring age at marriage within a population, using either respondents’ retrospective reports (age at or date of marriage) or the proportions of never-married individuals by age recorded at a given point in time. Retrospective data are provided by most demographic surveys and can be used to estimate trends directly. However, their quality is limited not only by the recall errors common to all retrospective reporting but also by difficulties, particular to sub-Saharan Africa, in dating unions. For one thing, marriage processes involve various ceremonies and stages, and this leads to flexible and varying interpretations of the timing of entry into union (van de Walle, 1968 ; Mair, 1971 ; Meekers, 1992 ; Hertrich and Locoh, 1999 ; Antoine et al., 2009 ; Hertrich, 1998, 2007b ; Lardoux, 2009). Furthermore, it remains difficult to determine a precise date or age in contexts where these concepts may only recently have come into use (Roger et al., 1981 ; Ewbank, 1981 ; Waltisperger, 1988), so there is a risk of recording normative, imprecise responses. Methodological studies have concluded that these retrospective data on age at marriage in Africa are of poor or, at best, middling quality (van de Walle, 1968, 1993 ; Lesthaeghe et al., 1989 ; Blanc and Rutenberg, 1990 ; Gage, 1995 ; Hertrich and Lardoux, 2009 ; Chae, 2011). Ron Lesthaeghe (1989) and Étienne van de Walle (1968, 1993) advocate rejecting them in favour of cross-sectional indicators.
Using cross-sectional data we can avoid the risks of mistaken interpretation and dating of past events by focusing on the structure of the population by sex, age and marital status at the time of the survey or census. Under the approach proposed by Hajnal (Hajnal, 1953 ; Tabutin and Vallin, 1975 ; United Nations, 1984 ; Gubry, 1984), the series of proportions of never-married individuals by age can be equated with that of a theoretical cohort and summarized by a standard indicator such as mean age or median age at first marriage. Where entry into union is concentrated within a narrow age range, the indicator captures the nuptiality of the cohorts reaching those ages at the time of survey. In sub-Saharan Africa, where most women marry young, median age at first union is strongly correlated with the proportion of never-married women aged 15-19 and 20-24. [3] Another advantage of this method is that it can be applied to most data collection operations : marital status is generally recorded by censuses and surveys and published in a statistical table by sex and age group.
Key variables : age at time of survey and marital status
When median age at first marriage is calculated from period data, the constraints of retrospective analysis disappear. However, the quality of the indicator is still dependent on two pieces of information : the respondent’s age and marital status.
In general, age (or date of birth) is a problematic variable in sub-Saharan Africa. It raises data collection issues, since it is not well understood by respondents and long remained irrelevant to local practices ; it also poses problems during analysis because of errors and lack of precision in reporting (Ewbank, 1981 ; Blanc and Rutenberg, 1990 ; Roger et al., 1981 ; van de Walle, 1968 ; Waltisperger, 1988). Education, migration and administrative requirements have helped to improve reporting, but data quality remains a cause for concern in most African countries (Pullum, 2006). Age inaccuracies would not have significant consequences for estimating age at marriage if they were independent of respondents’ marital status. However, this is not the case. When age is unknown, the family life cycle can provide reference points for an estimate. Marital status is one of these : between two women of the same actual age, never-married women will tend to be assigned a younger age than their married counterparts, using local norms of age at marriage as a practical reference if necessary (Caldwell and Igun, 1971 ; Ewbank, 1981 ; Roger et al., 1981 ; Blanc and Rutenberg, 1990 ; Gage, 1995).
There is also a risk of error or inaccuracy in recording marital status. Where marriage formalization includes different elements, an individual could – depending on the criterion (ceremony, cohabitation, etc.) – be classified as “single” or as “married” and placed in the never-married or the ever-married category. This issue arises in particular for individuals who have uncertain or temporary marital status, or who are on the margins of the usual customs (for example, marriages in the process of being formalized, or non-cohabiting unions) ; in such situations, there is a risk that reporting will tend to reflect the expected response for a person of a given age or status. Some young women who have experienced a short period of marriage are likely to be recorded as “never-married” rather than as divorced or widowed ; at the same time, other – somewhat older – women may be reluctant to state that they have never been married, since this is not a valued status.
In general, both censuses and surveys focus on the de facto situation, relying on the respondent’s own account without laying down any precise criteria (Antoine, 2006 ; Lloyd, 2005). This pragmatic approach is probably a good solution : it assumes that individuals interviewed in their own homes will state spontaneously, for themselves and on behalf of the people who live with them, if they are in a union or if they have been so in the past. The imposition of specific criteria, on the other hand, might make them hesitant, introducing confusion and complexity into the recording process. In some cases, such as the last censuses in Kenya (KNBS, 2009) and Uganda (UBS, 2002), instruction manuals explicitly ask census enumerators to record reported marital status without asking for precise details. The category “married/in a union”, although often handled as one questionnaire response, can also be broken down into separate items, distinguishing between consensual unions and marriages, between monogamous and polygamous situations. For example, the 1996 and 2001 South African censuses (SSA) distinguished between three types of union : cohabiting unions (living together like married partners), civil marriages and customary marriages – although the most recent census (2011) no longer differentiates between the last two categories. In Mali (INSTAT), the first three censuses (1976, 1987, 1998) categorized men’s marital status by their number of wives and women’s by their number of marriages ; the last census (2009) uses three identical categories for both sexes : “In a monogamous marriage”, “In a polygamous marriage”, “Consensual union/living together”.
Whether systematic differences exist between surveys and censuses remains an open question : although surveys – the DHS in particular – have been more inclusive of cohabiting unions (Blanc and Rutenberg, 1990 ; van de Walle, 1993), censuses may have been more accurate in recording non-cohabiting marriages (Antoine, 2006 ; van de Walle, 1993). The categories that appear in questionnaires tend to reflect the variety of situations rather than a clear difference between censuses and surveys (Lloyd, 2005).
Age-reporting errors linked to marital status, and the fuzzy delimitation of marital status itself, are constraints affecting both censuses and surveys. However, other factors are liable to generate differences in results linked to the data source. These include the conditions of data collection and the eligibility criteria.
Data collection conditions
The circumstances in which data are collected for censuses and for surveys differ in at least two ways : the personnel deployed and the status of respondents. The issue of data collection personnel – in terms of numbers, level of recruitment, training and supervision – is viewed as the weak spot of censuses, whereas surveys, in contrast, “can call on better selected, better trained and better managed personnel” (Clairin, 1988). In that regard, survey data are generally considered to be of higher quality than census data (Clairin, 1988 ; Tabutin, 2006). The same is true with regard to the status of respondents (Blanc and Rutenberg, 1990). In general, censuses in Africa use a household questionnaire completed with the help of one representative of the household (often the household head), whereas surveys record most information, including marital status, using an individual questionnaire completed with the person concerned. [4] Third-party responses will inevitably be less reliable. The head of a household does not necessarily know the precise marital status of each member, particularly if – as is often the case in Africa – the household includes individuals who are not his own close kin (such as a wife’s relative or a young domestic servant). Moreover, the method by which census data are collected (repeating the same questions for each individual on the list of household members) does not lend itself to discussion of particular cases, and the respondent may be tempted to avoid reporting situations that are viewed as problematic, simply in order to fit into the expected categories. With an individual survey, the interviewer is actually speaking to the person who, it can be assumed, has the most precise knowledge of her own marital status. This does not prevent inaccurate reporting but does mean that it tends to be a deliberate choice on the part of the person concerned – for example, to conceal a situation associated with low social esteem.
The survey protocol can be designed to limit these biases, either through the interview conditions (confidentiality, duration, attentiveness) or because a given marital status often leads to further questions and can therefore be matched against other biographical data to identify and correct inconsistencies.
Such distorting factors will affect estimates of age at marriage through their impact on measures of the proportion of never-married at the start of adult life. The conditions of census data collection increase the risk of overestimating the number of never-married women aged 15-19 : the use of only one informant and the choice of recording method tend to result in young widowed or divorced women being classified as never-married, whether through ignorance or for convenience. There is some confusion around the term “single”, used in questionnaires as a synonym of “never-married”, since it is commonly taken to mean “with no partner” (including after union dissolution) – which can also contribute to this bias. [5] In some censuses – in Senegal (DPS, 2002), for example – interviewers have been alerted to the risk of error that this entails.
Eligibility and interviewer effect
Respondent eligibility criteria and the way these interact with interviewers’ practices represent yet another issue that can lead to bias, this time affecting the quality of survey data. Age transfer at the upper and lower limits of eligibility for the survey or for additional modules is a well-known phenomenon (Arnold, 1990 ; Rutstein and Bicego, 1990 ; Marckwardt and Rutstein, 1996 ; Pullum, 2006 ; Schoumaker, 2009). For example, when the individual questionnaire concerns the 15-49 age group, an imbalance between numbers aged 14 and aged 15, and between numbers aged 49 and aged 50 is often observed. Again, when a specific survey module (relating, for example, to vaccination, breast-feeding or post-partum behaviour) applies only to children aged under 5, births dated 6 years before the survey are commonly overrepresented. This phenomenon can be attributed to the convergence of two factors : respondents’ ignorance of their age and the tendency among some interviewers to lighten their workload by classifying individuals on the margins of the eligibility criteria as “out of range”. According to the standard protocol of major surveys such as the DHS and the MICS, all women aged 15-49 recorded in the sample households should be surveyed individually. Even though her individual questionnaire will be shorter, leaving out an eligible young woman necessarily reduces the interviewer’s workload. [6] According to published DHS reports, the fact that adolescent girls move around and would be obliged to return to the household to be questioned, and the possibility that interviewers are too embarrassed to ask young girls (who may not yet be sexually active) about their sexuality, both contribute to this tendency to transfer young women below the eligible age of 15 (Rutstein and Bicego, 1990 ; Marckwardt and Rutstein, 1996).
Distortions attributed to interviewers are recognized as a classic problem in the DHS and have now become the focus of particular attention in field manuals and when re-interviewing for quality control purposes (ICF Macro, 2009, 2011). This risk of bias is smaller in censuses, because most of the questions apply to the entire population regardless of age, so there is no incentive for the enumerator to underestimate an individual’s age (Rutstein and Bicego, 1990). [7] If there is age underestimation of women aged 15-16, this can be assumed to mainly concern girls with none of the markers of adulthood, especially never-married women, leading to an underestimation of the never-married proportion in the 15-19 age group, and consequently of median age at first marriage in surveys.
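The imbalance at the eligibility boundaries described above can be checked with a simple ratio of counts at adjacent single ages. The following is a minimal sketch, not a procedure taken from the surveys themselves : the function name `boundary_ratio` and the sample ages are hypothetical, and the reading of the ratio assumes that, without displacement, adjacent single-year age groups would be of roughly similar size.

```python
from collections import Counter
from typing import Iterable


def boundary_ratio(ages: Iterable[int], outside_age: int, inside_age: int) -> float:
    """Ratio of persons recorded just outside an eligibility bound to those just inside.

    A ratio well above 1 (e.g. age 14 vs 15, or age 50 vs 49 for a 15-49
    questionnaire) is compatible with age transfer out of the eligible range.
    """
    counts = Counter(ages)
    if counts[inside_age] == 0:
        raise ValueError("no one recorded at the inside age")
    return counts[outside_age] / counts[inside_age]


# Made-up household listing (illustrative only, not survey data).
listed_ages = [14] * 6 + [15] * 4 + [49] * 2 + [50] * 3
lower_bound_ratio = boundary_ratio(listed_ages, outside_age=14, inside_age=15)
upper_bound_ratio = boundary_ratio(listed_ages, outside_age=50, inside_age=49)
```

Such a ratio only flags a possible displacement ; it cannot by itself separate interviewer effects from ordinary age heaping.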
To sum up, there are several reasons why cross-sectional measures of age at marriage may differ depending on whether census data or survey data are used. There is most probably a shared tendency to underestimate the actual value of the indicator, because lower ages are frequently attributed to young never-married women in contexts where their ages are unknown. Two additional factors come into play, combining to produce an estimate of the never-married proportion in the 15-19 age group – and hence of median age at first marriage – that is higher with census data than with survey data. These are data collection conditions on the one hand, which favour survey-based estimates, and eligibility criteria and interviewer effects, on the other, which favour census-based estimates.
To what extent are these predicted differences borne out by reality ? A comparison of indicators based on data from censuses and surveys carried out in Africa over the last 50 years should shed light on the question.
II – Comparing census-based and survey-based estimates
To assess the consistency of cross-sectional estimates of age at marriage drawn from censuses and from surveys, we proceed in two stages. We first calculate, for each country and each data collection operation in turn, the difference between estimates made on the basis of the two series. We then compare nuptiality trends in the different countries according to the two sources.
Data and indicator
Here we make use of tables on the distribution of population by marital status, sex and age group, brought together in INED’s pan-African database on nuptiality (Hertrich, 2007a). This database includes 453 data collection operations for 1950-2010 [8] from the 55 countries of Africa, 41% of which are censuses (186) and 59% national surveys (130 DHS, 46 MICS and 91 others). We have seven or more data collection operations for 70% of the countries, and at least 10 operations for 35% of the countries. The majority of countries have at least one census or survey for the period 2000-2010 (98% of countries) and at least one from the preceding decade (89% of countries). Coverage of earlier periods is poorer, but still substantial : for two thirds of countries, we have at least one pre-1970 operation.
The indicator used for each data collection operation is women’s median age at first union, [9] calculated from the proportions of never-married women by five-year age group.
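As an illustration of how a median age can be derived from never-married proportions by five-year age group, here is a minimal sketch. It rests on assumptions that are not stated in the text : each proportion is assigned to the midpoint of its age group (17.5 for 15-19, 22.5 for 20-24, and so on), and the median is taken as the age at which the linearly interpolated never-married proportion falls to 50%. The function name and the example proportions are hypothetical.

```python
from typing import Optional, Sequence


def median_age_at_first_union(
    midpoints: Sequence[float],
    prop_never_married: Sequence[float],
) -> Optional[float]:
    """Age at which the interpolated never-married proportion falls to 0.5.

    Assumption: proportions are attached to age-group midpoints and joined
    by straight lines. Returns None if 0.5 is not crossed in the range given.
    """
    points = list(zip(midpoints, prop_never_married))
    for (age_lo, p_lo), (age_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= 0.5 > p_hi:
            # Linear interpolation between the two bracketing age groups.
            return age_lo + (p_lo - 0.5) / (p_lo - p_hi) * (age_hi - age_lo)
    return None


# Made-up proportions for age groups 15-19, 20-24 and 25-29 (illustrative only).
example_median = median_age_at_first_union([17.5, 22.5, 27.5], [0.70, 0.30, 0.10])
```

With these made-up values the 50% threshold is crossed halfway between the first two midpoints, i.e. at about age 20. Where fewer than half of women aged 15-19 are never-married, the threshold lies before the first midpoint and this simple version returns no value.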
There are two series of results for each country, the census series and the survey series. We compared them for the period covered by the two series, linking two estimates to each data collection operation : median age at marriage drawn from the census or survey in question and median age at marriage calculated by linear interpolation of the series from the other data source. The difference between these two estimates – made for the same date – defines our consistency indicator. This indicator is available for 250 operations [10] in 46 countries. In order to compare trends associated with the two types of source, an additional restriction was applied : four countries were excluded because their two series had only one point of comparison ; this left 42 countries in the analysis.
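The construction of the consistency indicator can be sketched as follows. This is an illustrative reading of the paragraph above rather than the authors’ own code : the names `interpolate` and `consistency_differences` are hypothetical, dates are expressed as decimal years, and the sign convention (census estimate minus survey estimate) is the one given in the caption of Table 1. An operation is kept only when its date falls within the period covered by the other series.

```python
from typing import List, Optional, Tuple

Series = List[Tuple[float, float]]  # (date in years, median age), sorted by date


def interpolate(series: Series, date: float) -> Optional[float]:
    """Linearly interpolate a series at `date`; None if the date is not covered."""
    for (d0, m0), (d1, m1) in zip(series, series[1:]):
        if d0 <= date <= d1:
            if d1 == d0:
                return m0
            return m0 + (m1 - m0) * (date - d0) / (d1 - d0)
    return None


def consistency_differences(census: Series, survey: Series) -> List[Tuple[float, str, float]]:
    """One (date, source, census minus survey) triple per comparable operation."""
    out: List[Tuple[float, str, float]] = []
    for date, median in census:
        other = interpolate(survey, date)
        if other is not None:
            out.append((date, "census", median - other))
    for date, median in survey:
        other = interpolate(census, date)
        if other is not None:
            out.append((date, "survey", other - median))
    return sorted(out)


# Made-up series for one country (illustrative only).
census_series = [(1980.0, 18.0), (2000.0, 20.0)]
survey_series = [(1990.0, 18.5), (2010.0, 19.5)]
differences = consistency_differences(census_series, survey_series)
```

In this made-up case the 1980 census and the 2010 survey have no counterpart in the other series and drop out, leaving two points of comparison.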
Comparing estimates for all the countries of Africa
Table 1 presents an overall assessment, at the scale of the African continent and its regions, of the comparability of estimates of median age at marriage drawn from censuses and from surveys. Consistency to within ±0.5 years can be observed in half of all cases and to within ±1 year in three quarters of cases. The mean difference is 0.3 years (0.8 years in absolute value). There is no trend towards reduction in discordances : on the contrary, the highest rates of discordance were recorded over the last decade (40% of differences of more than one year in the period 2000-2010 – double the rate for earlier periods) (Table 1). This can probably be attributed to a number of different factors, including the widespread rise in age at first union (Lloyd, 2005 ; Hertrich, 2007a ; Shapiro, 2014 ; Ortega, 2014) which increases the likelihood of observing large age differences, greater diversity in pathways of entry into union which make marital status even more imprecise, and persistent dating problems (Pullum, 2006). The degree of consistency varies by region, with the best score observed in Eastern Africa (90% are consistent to within ±1 year) and the worst in Southern and Middle Africa (36%). Western and Northern Africa have intermediate scores (75% are consistent to within ±1 year) (Table 1).
Table 1. Differences (census estimate – survey estimate) between estimates of median age at marriage calculated from censuses and from surveys, 1950-2010
* The correlation is affected by two outliers from Botswana. When these two surveys are excluded, the correlation coefficient becomes 0.96 for the Southern and Middle region, 0.95 for sub-Saharan Africa and 0.94 for Africa as a whole.
** Given the small number of observations available for Southern Africa (5 countries, 24 observations) and Middle Africa (9 countries, 15 observations) and the similarity of the patterns observed, we have grouped these two regions together.
Comparing trends
How do these discordances show up in the national series ? Are there random differences ? Or do patterns of error recur, suggesting that statistical information produced by censuses is fundamentally different from that of surveys ? In order to answer these questions, we grouped the countries into five categories. Series were deemed to be strictly consistent if they matched (to within approximately 0.5 years at every point) or broadly consistent if divergence was exceptional and small. Series were defined as inconsistent if differences of more than 0.5 years affected over 20% of observations ; a distinction was made between the pattern where census estimates of median age at marriage were higher than survey estimates, the pattern where they were lower and the pattern where the difference varied (Appendix Table A.1, Figure 1). [11]
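The five categories can be expressed as a small decision rule. The sketch below is one possible reading of the criteria, combining the thresholds given in the paragraph above with the notes accompanying Figure 2 ; it is not the authors’ procedure. Two points are assumptions : a series with few discordant points but at least one difference above one year is treated as inconsistent, and the direction of an inconsistent series is decided from discordant points only. The function name and labels are hypothetical.

```python
from typing import Sequence


def classify_series(differences: Sequence[float], tol: float = 0.5) -> str:
    """Classify a country's series of (census minus survey) differences, in years."""
    if not differences:
        raise ValueError("at least one point of comparison is required")
    discordant = [d for d in differences if abs(d) > tol]
    if not discordant:
        return "consistent (strict)"
    share = len(discordant) / len(differences)
    # Assumption: "exceptional and small" means at most 20% of points, none above 1 year.
    if share <= 0.20 and all(abs(d) <= 1.0 for d in discordant):
        return "consistent (broad)"
    higher = sum(d > 0 for d in discordant)  # census estimate above survey estimate
    lower = len(discordant) - higher
    # At most one discordant point in the opposite direction is tolerated.
    if lower <= 1 and higher > lower:
        return "inconsistent: census > survey"
    if higher <= 1 and lower > higher:
        return "inconsistent: census < survey"
    return "inconsistent: varying"


# Made-up series of differences (illustrative only).
strict_example = classify_series([0.2, 0.3, -0.1])
broad_example = classify_series([0.1] * 9 + [0.7, -0.8])
census_higher_example = classify_series([0.8, 0.9, 1.2, 0.1])
varying_example = classify_series([0.8, -0.9, 1.2, -0.7])
```

The second example mirrors the situation described for a series with two isolated distortions out of 11 observations, which stays below the 20% threshold.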
Figure 1. Trends in median age at first marriage : consistency of time series calculated from census data and from survey data
Only a quarter of countries (11 out of 42) displayed consistent patterns : examples include Kenya, with near-perfect correspondence in its data series (nine points), and Burkina Faso, with two isolated distortions out of 11 observations (Figure 2).
Figure 2. Trends in median age at first marriage calculated from census data and from survey data. Examples, by type of inconsistency
Consistent series – strictly defined : the difference between the two estimates never exceeds 0.5 years
Consistent series – broadly defined : no more than 20% of data points with a difference of 0.5-1 year
Series with inconsistencies : median age (censuses) > median age (surveys), with no more than one data point with a difference in the opposite direction
Series with inconsistencies : median age (censuses) < median age (surveys), with no more than one data point with a difference in the opposite direction
Series with inconsistencies : varying differences
Regional patterns emerge for groups of countries that display inconsistencies (Figure 1), with a contrast between the northern arc of the Arab countries, where age at marriage estimated from surveys is almost systematically higher than estimates from censuses, and the sub-Saharan countries, where the discordance is most often in the opposite direction. The latter group includes not only the countries of Southern Africa, where entry into union tends to be relatively late, but also those of Western Africa, where it is much earlier. There are only a few countries (five instances) that do not display a standard pattern of discordances, and they are not in any particular region. In order to illustrate the three distinct patterns of inconsistencies, Figure 2 provides the trends for Algeria, Mali and Burundi.
Are there two models of distortion ?
The two types of comparisons (point indicators and trends) between estimates drawn from censuses and from surveys converge, confirming the existence of characteristic differences between the two sources. We can see two models of distortion, which are geographically delineated. The first pattern is predominant – and to be expected – in sub-Saharan Africa : median ages at first marriage estimated from surveys are lower than estimates from census data (36% of the points of comparison and 75% of the inconsistencies). The second follows the opposite pattern and is predominant in Arab countries (42% of the points of comparison and 95% of the inconsistencies) (Table 1).
We made these comparisons for each country in turn and for the same date, so the observed differences cannot reflect real differences in nuptiality. Rather, they must result from differences in the ways that surveys and censuses record the same reality. The two characteristic patterns of distortion north and south of the Sahara suggest that we should focus on the aspects of data collection protocols that vary between the two regions. The question of eligibility and how it is handled by interviewers represents a key issue in this debate. Most surveys conducted in Arab countries (such as those in the PAPCHILD/PAPFAM programmes) have eligibility criteria that differ from the classic protocol (as used in the DHS, for example) in their handling of marital status. Data on marital status are collected via a household questionnaire and, among women aged 15-49, only ever-married women are surveyed individually. If, when completing the household questionnaire, interviewers faced with individuals whose status is unclear prefer to consider them as ineligible, we can expect results to vary according to the eligibility criterion. As we have already seen, the hypothesis that the numbers of never-married women aged 15-19 will be underestimated because some young never-married women have been transferred into the age group immediately below is consistent with the sub-Saharan context, where ages are not known and the eligibility threshold is set at age 15. Conversely, it is logical not to find this pattern of discordances in North Africa, where ages are better known and where the constraints of eligibility with regard to marital status tend to produce an overestimation of numbers of never-married women in surveys. Survey protocols (with differing eligibility-related biases) thus offer an explanation for the two characteristic patterns of distortion north and south of the Sahara.
This does not imply that censuses provide reliable estimates ; rather, it means that, because the protocol they use is similar, it does not, in principle, introduce different forms of bias from one region to another.
III – Distorting factors : an empirical examination based on MICS-2
Identifiable patterns of discordance between median ages at marriage drawn from censuses and surveys are helpful when discussing the factors underlying them, but they do not prove that these factors are at work, nor do they enable us to evaluate whether one data source is ultimately more reliable than the other. In order to do this with complete rigour, we should compare individual records (of age and marital status) from the two sources with exact data or, failing that, at least compare the sources on the basis of individual record linkage. Neither of these approaches is available to us. However, some surveys do record marital status twice, in both the household and the individual questionnaires ; this allows us to identify inconsistencies at the individual level and to evaluate their impact on the indicators. Although not exactly the same, the data collection conditions in the household module of a survey resemble, to some extent, those of a census (one respondent per household, the same list of questions about each individual, non-confidential interviews, etc.), so their effects may be similar. On the other hand, the effects of certain elements that differentiate a census from a survey are impossible to capture (selection, training and supervision of interviewers), while others can be assessed only indirectly (eligibility criteria). Despite these limitations, this exercise offers the opportunity for a straightforward, empirical approach to the inconsistencies generated at the individual level by differences in protocol. This is the angle from which we shall now analyse the MICS-2 surveys, looking first at discordances in recorded marital status, then at errors in recorded age and the resulting sample distortions. Finally, we shall attempt to decide which data source should be preferred, focusing for this purpose on sub-Saharan Africa.
MICS-2 : discordances consistent with the pattern observed between censuses and surveys
32The second round of UNICEF surveys (EGIM/MICS-2), conducted around the year 2000, recorded women’s marital status in two ways : [12] on the household questionnaire, through a question about all household members aged 15 or over ; and via the individual questionnaire administered to each woman aged 15-49 (Appendix A.2). This dual record of marital status (on both household and individual questionnaires) is available for 15 sub-Saharan countries ; [13] we downloaded the country databases and processed the data directly.
33For their consistency to be assessed accurately, the two declarations of marital status would need to be recorded independently, but the protocol does not guarantee this. On the contrary, the individual questionnaire design makes it possible for the interviewer to take into account and to check the marital status recorded on the household questionnaire (Appendix A.2). However, the extent of the discordances (Table 2) suggests that no practical steps were taken on any large scale to ensure consistency between the two records (except in Madagascar, which was excluded from our analyses). Since there is no guarantee of independence, and because in some instances both questionnaires might have been completed with the same person, the observed levels of discordance should be viewed as a minimum. We used these linked data to document the mechanisms of distortion, attributing no more than a relative value to the frequency of discordances.
34In the MICS data, we can again observe the same pattern of discordances as noted between censuses and surveys across the whole of sub-Saharan Africa. In the 15 countries whose data we analysed, the never-married proportion in the 15-19 age group is always higher when taken from the household questionnaire than from the individual questionnaire. The difference between the two estimates of median age at marriage is more than 0.5 years in nine of the 15 countries (Table in Appendix A.3), meaning that this situation is even more frequent than the divergence recorded for sub-Saharan Africa as a whole (36% of cases : Table 1).
Reported marital status : what about the “false never-married” ?
35As the individual survey of women involves a personal interview carried out, in principle, in conditions of confidentiality, we would expect it to provide better-quality information about marital status than the household survey, especially for women whose marital status is atypical or temporary. The data confirm this difference : young women who reported being “divorced” or “widowed” when interviewed individually had very often been recorded on the household questionnaire as “never-married”. In 11 of the 15 countries studied, this situation was observed for over 60% of divorced or widowed women aged 15-19 (Table 2) ; and in five countries, this type of error was more or less systematic (90% or higher). [14] The incorrect classification of widowed and divorced women by the household survey is a factor that, alone, is enough to explain the differences in never-married proportions recorded through the household and the individual questionnaires. By comparison, consistency within the categories of “married” and “never-married”, where the majority of women in this age group are concentrated, is generally high (90% or more).
Comparison of marital status recorded on the individual women’s questionnaire and on the household questionnaire, women aged 15-19 covered by both surveys*

* The proportion of consistent declarations is measured with respect to the total number of women recorded with this marital status by the questionnaire under consideration (household or women’s).
Note : Countries are listed in descending order by proportion of consistent declarations within the category of widowed or divorced women in the women’s survey. Unweighted data and indicators.
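The consistency indicator defined in the note above can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function name, the harmonized status labels and the sample records are hypothetical, and it assumes the two questionnaires have already been linked woman by woman and recoded to a common set of categories.

```python
from collections import Counter

def consistency_by_status(pairs, reference="individual"):
    """Share of consistent declarations for each marital status.

    pairs: iterable of (household_status, individual_status), one per woman.
    reference: questionnaire whose categories define the denominators
               ("individual" or "household").
    """
    idx = 1 if reference == "individual" else 0
    totals = Counter()
    agree = Counter()
    for pair in pairs:
        status = pair[idx]
        totals[status] += 1
        if pair[0] == pair[1]:
            agree[status] += 1
    return {status: agree[status] / totals[status] for status in totals}

# Hypothetical linked records (illustrative only, not MICS data)
records = [
    ("never-married", "never-married"),
    ("never-married", "never-married"),
    ("never-married", "widowed/divorced"),
    ("never-married", "widowed/divorced"),
    ("widowed/divorced", "widowed/divorced"),
    ("married", "married"),
]
by_individual = consistency_by_status(records, reference="individual")
by_household = consistency_by_status(records, reference="household")
```

In this toy example, two of the three women who reported being widowed or divorced in the individual interview appear as never-married on the household form, which is the type of misclassification described in the text.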
36These indicators throw further doubt on the quality of information collected by the household survey and may appear to suggest that data from the individual surveys should be preferred. However, we must not forget another factor that works in the opposite direction : the distortion of the sample covered by the individual survey.
Sample distortion : who are the women missed by the survey ?
37If the survey sample includes a lower proportion of never-married women than the general population, there will be a bias towards underestimation of age at first marriage. Two factors are likely to contribute to this type of underestimation in the 15-19 age group : (a) the selection of respondents on the basis of their marital status (if never-married women are more likely to be excluded from the survey) and (b) errors in estimating age correlated with marital status (underestimation of the age of young never-married women and, possibly, overestimation of the age of ever-married adolescents). When we examine the age structure of respondents who answered the individual questionnaire, the existence of such distortions seems obvious. Figure 3 illustrates this through the cases of the Central African Republic (CAR) and Cameroon. The gap between the two curves indicates the sample loss, i.e. the eligible women (aged 15-49) who should have been surveyed individually but were not. In both cases, this sample loss is especially evident in the 15-19 age group and for never-married women (in both countries, one in five never-married women aged 15-19 were not surveyed, versus one in ten ever-married women in this age group). The classic irregularities that result from age heaping are also visible, although, remarkably, 15-year-olds escape this distortion : their numbers are much lower than those of 14-year-olds. This distortion is an additional illustration of the fact that some women who should have been included in the individual surveys were classed as ineligible – and this has a direct impact on estimates of median age at first marriage, which is heavily influenced by the never-married proportion in the 15-19 age group. This phenomenon is particularly marked in the Central African Republic, which has a substantial irregularity in the age groups on either side of the eligibility range – before age 15 and after age 50.
Age distribution of women recorded on the household questionnaire and of women covered by the individual survey, MICS-2 (2000), Central African Republic and Cameroon


38To gain an overall picture of the impact of these distortions at the scale of the 15 countries whose data we analysed, we calculated two series of indicators for each country (Table 3). The first series relates to the effect of selection of ever-married women aged 15-19 and includes two indicators : the proportion of women recorded in the list of household members who were not surveyed individually and, in turn, the proportion of these who were never-married. The second series estimates the sample loss that results from some women aged 15-19 being classed in the 10-14 age group. In order to do this, we compared the total number aged 15-19 with a theoretical number calculated using the method proposed by Pullum (2006) [15] for identifying transfers from one age group to another.
39In most of the countries studied, the two types of distortion have the cumulative effect of biasing the sample towards an underestimation of the never-married proportion among women aged 15-19 covered by the individual women’s survey. Sample loss is more than 10% in 10 of the 15 countries, and exceeds 15% in seven of these (Table 3). It is not independent of marital status : with one exception (Swaziland), the never-married proportion is always higher among young women who are recorded on the household questionnaire but not covered by the individual survey. Several factors probably contribute to this situation. Some of these undoubtedly stem – as for the DHS (Rutstein and Bicego, 1990 ; Marckwardt and Rutstein, 1996) – from real difficulties in reaching these girls, because of their greater mobility or their reluctance to respond in person to a survey before they have reached full adult status (shyness), or because male interviewers are embarrassed to question adolescent girls. Other factors could also depend on the interviewers, who may make more effort to interview ever-married women than never-married women, who are rarely concerned by some of the questionnaire modules (notably those dealing with children).
Distortion of the sample of women aged 15-19, sample loss and underestimation of women’s ages(1),(2),(3),(4),(5),(6)

(1) Marital status as reported in the survey of individual women. The indicator is calculated on the basis of all women included in the individual survey.
(2) Marital status as reported in the household survey. The indicator is calculated on the basis of women recorded in the household database but not in the individual (women’s) database.
(3) This indicator estimates the proportion of women aged 15-19 who were wrongly classed in the 10-14 age group by the household survey.
(4) This indicator measures the underestimation of the number of women aged 15-19 resulting partly from incorrect classification (transfer into the younger 10-14 age group) and partly from sample loss (eligible women not surveyed individually).
(5) Number estimated using Pullum’s method (2006, Appendix D), see Footnote 15.
(6) Number of women included in the individual database.
Note : Countries in ascending order by level of underestimation of numbers aged 15-19 in the individual survey (column 4).
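The selection indicators discussed in paragraph 39 (share of listed women not surveyed individually, and never-married proportion among them) can be illustrated with a short sketch. The record layout (`id`, `never_married`), the function name and the sample values are assumptions made for the example; marital status here is the one recorded on the household questionnaire.

```python
def sample_loss_indicators(household_women, surveyed_ids):
    """Sample loss among listed women and never-married share among those missed.

    household_women: list of dicts with keys "id" and "never_married" (bool),
                     one per eligible woman on the household list.
    surveyed_ids: set of ids found in the individual (women's) database.
    """
    missed = [w for w in household_women if w["id"] not in surveyed_ids]
    n_total = len(household_women)
    loss = len(missed) / n_total if n_total else None
    if missed:
        nm_missed = sum(w["never_married"] for w in missed) / len(missed)
    else:
        nm_missed = None  # no sample loss, indicator undefined
    return {"sample_loss": loss, "never_married_among_missed": nm_missed}

# Hypothetical household list of ten women aged 15-19 (illustrative only)
women = [{"id": i, "never_married": i < 7} for i in range(10)]
surveyed = {0, 1, 2, 3, 4, 7, 8, 9}  # women 5 and 6 were not interviewed
indicators = sample_loss_indicators(women, surveyed)
```

Here the sample loss is 20% and both missed women are never-married, so the individual sample contains a lower never-married proportion than the household list.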
40However, such sample loss explains only part of the shortfall of never-married women observed in the 15-19 age group. Underestimation of the age of some girls aged 15-19, leading to their classification in the 10-14 age group, is another mechanism at work. The data do not allow us to identify and analyse the particular characteristics of girls whose ages have been underestimated and who have thus been excluded from the individual survey, but it is probable that the never-married are over-represented in the 10-14 age group.
41If we compare the observed total number aged 15-19 with the number that could be estimated on the basis of the age structure of the population (Table 3), we find a shortfall in the population recorded by the household survey in nine countries ; this difference is reversed or absent in the other six countries. But if we take into account only women aged 15-19 surveyed individually, this measured shortfall applies to all 15 countries (Table 3) : it exceeds 10% in 13 of the 15 countries and exceeds 20% in seven of them.
Household questionnaire or individual women’s questionnaire : which provides a better-quality estimate ?
42The MICS surveys confirm the existence of two biases acting with opposite effects. On the one hand, the imprecise reporting of marital status via the household questionnaire wrongly increases the proportion recorded as never-married, while, on the other hand, distortions of the individual survey sample lead to underestimation of the numbers of never-married women aged 15-19. Can we therefore finally conclude in favour of one form of data collection over the other ?
43To address this question, we estimated the never-married proportion in the 15-19 age group using the data from each of the individual and household databases of the 15 MICS surveys. We proceeded as follows :
- First we calculated the corrected (theoretical) numbers aged 10-14 and 15-19 using Pullum’s method (2006) (see Footnote 15) ;
- We then used the marital status recorded on the individual questionnaire for the group of women who took part in the survey aged 15-19 ;
- For the sub-group of missing women aged 15-19 [corrected number (1) – number who took part (2)], we assumed that the never-married proportion was the same as that observed for women aged 15-19 who appeared in the household survey [16] but not in the individual survey ;
- Finally, we recalculated the never-married proportion in the 15-19 age group as the weighted mean of the corrected estimates for the two categories, “individually surveyed” (2) and “missing” (3).
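The last two steps of this procedure amount to a weighted mean, which can be sketched as below. The function and parameter names and the numerical values are hypothetical; the corrected number aged 15-19 is taken as an input, since Pullum's correction itself is not reproduced here.

```python
def corrected_never_married(n_corrected, n_surveyed, p_surveyed, p_missing):
    """Never-married proportion at ages 15-19 after adding back missing women.

    n_corrected: corrected (theoretical) number of women aged 15-19.
    n_surveyed:  number aged 15-19 who took part in the individual survey.
    p_surveyed:  never-married proportion among surveyed women.
    p_missing:   proportion assumed for the missing women (here, the one
                 observed among women listed but not surveyed individually).
    """
    # Missing women = corrected number minus surveyed number (never negative)
    n_missing = max(n_corrected - n_surveyed, 0)
    total = n_surveyed + n_missing
    return (n_surveyed * p_surveyed + n_missing * p_missing) / total

# Hypothetical values (illustrative only)
p_corrected = corrected_never_married(
    n_corrected=1000, n_surveyed=800, p_surveyed=0.70, p_missing=0.85
)
```

With these illustrative inputs, 800 surveyed women at 70% and 200 missing women at 85% give a corrected proportion of 73%, which lies between the two group values as expected for a weighted mean.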
44Median age at first marriage was then calculated from this proportion of never-married women in the 15-19 age group, and from the proportion given by the individual survey for the 20-24 age group. Table 4 compares this estimate with those provided directly by the household survey and the survey of individual women.
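As an illustration of how a median age can be derived from never-married proportions by linear interpolation, here is a minimal sketch. It assumes that each proportion is attached to the midpoint of its five-year age group (17.5 for 15-19, 22.5 for 20-24); this midpoint convention, the function name and the input values are assumptions of the example and may differ from the exact formula used by the authors.

```python
def median_age_at_first_marriage(never_married_by_group):
    """Age at which the never-married proportion reaches 50%.

    never_married_by_group: list of (lower_age, proportion) for consecutive
    five-year age groups in ascending order, e.g. [(15, 0.73), (20, 0.30)].
    Returns None if the proportions never cross 50% within the groups given.
    """
    # Assumption: each proportion refers to the midpoint of its age group
    points = [(low + 2.5, p) for low, p in never_married_by_group]
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 >= 0.5 > p1:
            # Linear interpolation between the two bracketing age groups
            return a0 + (a1 - a0) * (p0 - 0.5) / (p0 - p1)
    return None

# Hypothetical proportions for the 15-19 and 20-24 age groups
median_age = median_age_at_first_marriage([(15, 0.73), (20, 0.30)])
```

Because the result depends directly on the never-married proportion at ages 15-19, any bias in that proportion carries over to the estimated median, which is why the choice between household and individual records matters.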
Comparison of estimates of median age at first marriage drawn from cross-sectional data : corrected (“probable”) estimate, estimates drawn from the household and individual women’s questionnaires

Note : Countries are listed in ascending order by “probable” median age at first marriage.
45In the majority of countries, we note a difference between the “probable” median age and the estimate based on the household survey (which overestimates it), as well as between the “probable” median age and the estimate based on the individual survey (which underestimates it). This difference is the result of biases introduced by both forms of data collection. In half the countries, underestimation based on individual surveys has a greater impact, and in the other half, overestimation by household surveys.
46We cannot reasonably conclude, therefore, that one data source is more reliable than the other, but only that neither type of source should be neglected.
Conclusion
47Our analyses consistently reveal standard patterns of difference between cross-sectional indicators of nuptiality drawn from censuses and from surveys in sub-Saharan Africa. It is not so much the frequency of discordances that is striking (in three quarters of cases, estimates of median age at marriage differ by less than a year), but rather their direction : age at marriage based on census data is generally higher than age at marriage based on survey data. In the countries of sub-Saharan Africa, this pattern is observed for three-quarters of the points of discordance and for 70% of countries with discordant series of indicators.
48The patterns of discordance at the scale of the continent, like those we found through a more in-depth analysis of 15 MICS surveys, clearly show that the two sources are subject to different mechanisms of error and that their results are biased in opposite directions. When marital status is recorded via a household questionnaire like those used in censuses, errors regarding the marital status of young women (most particularly, categorizing widows and divorced women with never-married women) lead to an overestimation of the never-married proportion and therefore of median age at first marriage. When data on a limited section of the population is gathered, as in surveys, the number of eligible individuals is under-recorded ; this results in sample deformation, creating bias in estimates of age at marriage. In sub-Saharan Africa, where the classic survey approach is to focus on women aged 15-49, overestimation of ever-married women is observed in the 15-19 age group. This is caused by two mechanisms : first, a tendency to underestimate the age of never-married young women, combined with an over-classification of girls aged under 15 as never-married, and second, lower survey coverage of never-married than of ever-married women in the 15-19 age group. Bias linked to eligibility has opposite effects for Arab countries : since never-married women are excluded from individual surveys, distortion logically works in the direction of overestimating the numbers of never-married women – and therefore overestimating median age at first marriage.
49Our analyses draw attention to data collection circumstances and their influence on data quality. We could reasonably have expected that experience gained from five decades of field operations might have gradually eliminated the problem. [17] However, exactly the opposite is true : the largest inconsistencies between estimates, in terms of both frequency and amplitude, are found during the last decade (the 2000s). In agreement with other recent studies (Bignami-Van Assche et al., 2003 ; Johnson et al., 2009 ; Randall et al., 2013), our analyses call for closer consideration, when interpreting results, of interviewer effects, survey protocol design and fieldwork supervision. The distortions we observed reflect, at least in part, a certain degree of standardization of records by interviewers faced with the constraints of fieldwork and the demands of data collection. When the survey protocol requires information (such as age) that the respondent is not able to provide, interviewers are obliged to improvise – with the attendant risks that they will provide answers that appear to fit the picture (equating age with marital status, for example) and prefer responses that simplify or lighten their workload. In that regard, patterns of discordances also reflect the amount of freedom left to interviewers – suggesting that there is scope for improvement in their training and supervision. Developments in these areas are reflected in the most recent guidance on DHS field staff training (ICF-Macro, 2009, 2011), confirming that there is growing awareness of the problem.
50Our results do not suggest that one data source should be preferred over the other. But they do challenge the widespread belief that surveys are of better quality than censuses. Survey data display significant and sometimes dramatic distortions, which can be easily explained by the eligibility criteria. We should therefore regard survey results with caution, in the awareness that their tendency to underestimate age at marriage inclines towards a “conservative” reading of trends which underestimates the decline in early marriages. Censuses, which tend rather to overestimate age at marriage, also have the disadvantage of being less numerous. This analysis encourages us not to give preference to one type of source, but rather to make use of both. The discordances between their results ultimately give us a more nuanced and more reliable view of trends in age at marriage than we could gain from just one data series or from simply comparing two different points in time. They do not prevent us from detecting trends ; in fact, they enable us to base them on a larger number of points of comparison and to confirm them through a convergence that is strong enough to be audible above the “noise” produced by the data’s diversity and lack of precision.
Acknowledgements
Several earlier versions of this paper were presented in Quebec (Hertrich and Lardoux, 2009), at INED (Paris) in 2012 and in Busan (Korea) in 2013, and the exchanges that followed were of great benefit to us in writing this article. We extend our warmest thanks to our colleagues for their opinions and encouragement, Dominique Tabutin, Bruno Masquelier and Bruno Schoumaker in particular.
INED’s pan-African database on nuptiality was built over a long period, and this work owes a great deal to the valuable contributions of many institutions and people, whom we also wish to thank : documentation services (CEPED, INED, INSEE, LSE, University of Texas), international databases (United Nations, MICS, DHS), colleagues who arranged for us to have access to particular data (P. Alberts, M. Barbieri, Z. Bedidi-Ouadah, S. Boodoo, S. Cardoso, N. Chobokoane, R. Dackam Ngatchou, B. Kandeh, L. Keïta, W. Muhwava, A. Sarr, F. Sepa, S. M. Traore, E. Udjo, K. Vignikin) and temporary staff who helped with data entry (G. Dabet, G. Jeanpetit, C. Léotard, S. Petit) and analysis (A. Stephan).
Table A.1. Classification of countries by consistency of census-based and survey-based series of women’s median age at first marriage based on period data
Consistent series (11 countries)

Series with inconsistencies : more than 20% of data points with a difference greater than 0.5 years (31 countries)(1)

Coverage : Countries with at least two available points of comparison for the period 1950-2010.
(1) No more than one data point in the series with a difference in the opposite direction.
Appendix A.2. Questions on marital status in MICS-2 surveys
Household Questionnaire
51Excerpt from the questionnaire
9. What is the marital status of (name) ? **
1 currently married/in union
2 widowed
3 divorced
4 separated
5 never married
52Excerpt from the Interviewer’s Instruction Manual
Q. 9. Marital status : For household members over age 15, circle the code for the response. Throughout this questionnaire ‘marriage’ refers to both formal and informal unions, such as living together.
Questionnaire for Individual Women
53The first question in the Contraceptive Use Module relates to marital status.
54The questionnaire did not include any further questions on marital life history (e.g. the first marriage).
55Excerpt from the questionnaire
1. Are you currently married or living with a man ?
Yes… 1
No, widowed, divorced, separated… 2
No, never married… 3
56Excerpt from the Interviewer’s Instruction Manual
Q. 1. Check marital status in Household Listing Form (Q. 9), or ask : Are you currently married or living with a man ? Record the woman’s status at the time of the interview. If the woman is currently married or living in an informal union, circle 1. If she is not in a union now, probe to find out if she has ever been married (circle 3 if never married) and is now widowed, divorced or separated. If one of the latter, circle 2. Remember that in this questionnaire ‘married’ means living in both formal and informal unions. If she is currently married (or in union) go on to Q. 2. If not, skip to next module after drawing a line through this one. Survey coordinators may decide to allow separate codes for ‘widowed’, ‘divorced’ and ‘separated’.
57The question on the household questionnaire and the one on the women’s questionnaire are not considered independently from each other. See information in the Interviewer’s Instruction Manual :
- “Marital status was obtained in the Household Listing Form (Q. 9) : the question is repeated here as a check.” (p. A1.19),
- “Check marital status in Household Listing Form (Q. 9), or ask : Are you currently married or living with a man ?” (p. A1.20)
58Source : UNICEF, 2000.
Table A.3. Proportion of never-married women and median age at first marriage (period measure) based on data from household questionnaires and questionnaires for individual women

(2) Median age estimated on the basis of the never-married proportion in each five-year age group.
(3) Unweighted numbers according to the household survey.
Note : Countries listed in ascending order by median age at first marriage according to household questionnaires.
Notes
-
[*]
Institut national d’études démographiques (INED), Paris.
-
[**]
Université de Montréal, Département de démographie.
Correspondence : Véronique Hertrich, Institut national d’études démographiques, 133 boulevard Davout, 75980 Paris cedex 20, tel : +33 0 (1) 56 06 21 32, e-mail : hertrich@ined.fr
-
[1]
Based on the results of Gendreau and Gubry (2009) and Hertrich and Lardoux (2009).
-
[2]
For instance, some publications describe trends by comparing period estimates from successive DHS surveys, while others focus on retrospective measures from one single survey, or compare estimates at two points in time (Garenne, 2004, 2014 ; Lloyd, 2005 ; Mensch et al., 2005, 2006 ; Shapiro and Gebreselassie, 2014 ; Tabutin and Schoumaker, 2004 ; Westoff, 2003).
-
[3]
If entry into union is early, median age at first union is correlated with the proportion of never-married women aged 15-19 (Lesthaeghe, 1989) ; when it is later, there is a stronger correlation with the proportion of never-married women aged 20-24. According to our calculations using the pan-African database on nuptiality (see below), the median age at first union is similarly correlated (coefficient of 0.95) with the proportion of never-married women aged 15-19 when median age at first union is under 21, and with the proportion of never-married women aged 20-24 when median age at first union is 21 or over.
-
[4]
Not all surveys record marital status via an individual questionnaire, however. Some (such as the first large surveys in Francophone Africa, conducted in the 1960s) consist only of a household questionnaire. Some surveys restrict the individual questionnaire to ever-married women, recording marital status through a household questionnaire, for instance the PAPFAM surveys in Arab countries. Others record marital status through both household and individual questionnaires – surveys in the MICS-2 and DHS-V programmes, for example.
-
[5]
This confusion is also found in French, around the terms “célibataire” and “jamais marié(e)”.
-
[6]
The interviewer effect has also been noted in relation to another eligibility criterion : the requirement of having spent the night preceding the survey in the household. This criterion figured in the first DHS surveys but was subsequently withdrawn as it led to artificial underestimation of the number of women aged 15-49 to be interviewed (Rutstein and Bicego, 1990 ; Marckwardt and Rutstein, 1996).
-
[7]
A comparison between censuses and DHS surveys dating from the late 1980s shows that this pattern of distortion is not apparent in censuses (Rutstein and Bicego, 1990).
-
[8]
Around 90% of censuses and demographic surveys carried out over this period.
-
[9]
i.e. the age at which the never-married proportion is 50%. This median age is calculated by linear interpolation between the age groups on either side, according to the formula : […], where C(x,x+5) is the never-married proportion in the relevant age group (x,x+5).
-
[10]
This includes each data collection operation for which an estimate associated with the other source can be calculated – a census taking place between two surveys or a survey conducted between two censuses. Operations at the extreme ends of their series are therefore not taken into account, and nor are operations falling into series that do not intersect within the time period concerned.
-
[11]
Contrary to what might have been expected, recorded inconsistencies do not increase with the number of points of comparison. Just the opposite : the proportion of countries with consistent patterns increases with the number of data collection operations – 19% for countries with fewer than five points of comparison, 27% for those with 5-8 points and 45% for those with at least eight.
-
[12]
In both cases, marital status was the subject of a brief, isolated question, and no other question on marital life history was included.
-
[13]
MICS-2 surveys were conducted in eight other countries, but in seven of them marital status was recorded only on the household questionnaire. In the eighth case (Madagascar) there was complete equivalence between the marital status variables shown in the two databases. These eight countries are therefore not included in our analyses.
-
[14]
Even among the small number of young women recorded as widowed or divorced in the household questionnaire, inconsistency with the individual questionnaire is high, suggesting that respondents, including the women themselves, are reluctant to acknowledge this status.
-
[15]
This method (Pullum, 2006, Appendix D) looks at four age groups, hypothesizing that transfer occurs between the two middle age groups (with numbers in the age groups on either side remaining correct) and that there is a log-linear variation in numbers between age groups. Thus the theoretical number aged 15-19 is calculated from the observed total numbers according to the formula : […], while the theoretical number aged 10-14 is : Nth(10-14) = No(10-14) + No(15-19) – Nth(15-19).
-
[16]
Although this is higher than the proportion recorded for women who completed an individual survey, it is certainly lower than the true proportion.
-
[17]
The methodological questions that were prominent in the demographic literature on Africa in the 1960s (for example, in the studies of Francophone Africa by the Groupe de démographie africaine [African Demography Group, GDA], or in Ewbank, 1981) became less central in the decades which followed. They are now starting to be taken up again to investigate inconsistencies in some research results.