The period total fertility rate (TFR), the most widely used summary measure of fertility at the aggregate level, has several limitations, two of which concern us here. The first is that the TFR does not take individual behaviour into account. This, together with progress in both data collection and analytical methods, is the main reason why, in the second half of the twentieth century, the scientific study of population progressively adopted a micro-level perspective (Courgeau and Lelièvre, 1997), shifting “from studies of structures to studies of processes” (Willekens, 1991, 1999). The second is that the TFR controls for the age distribution of the population but disregards other potentially relevant structural characteristics, such as parity, educational level, or place of residence.
Recently, however, a renewed interest in the link between macro- and micro-level research has emerged (Voss, 2007), and several scholars now emphasize the importance of taking both dimensions into account in order to better understand, for instance, contemporary fertility and family dynamics (Matysiak and Vignoli, 2010).
A recent paper by Hoem and Mureşan (2011a) bridges this gap explicitly: it reconciles a macro outcome, the TFR, with covariates of fertility that act at the micro level, the effects of which are estimated with event history analysis (EHA) and are thus “net” of all the other explicitly considered covariates (see also Hoem and Mureşan, 2011b, or Hoem et al., 2013). This paper pursues their line of reasoning and shows that their approach can also be applied to short panels, a new type of data that are forward- rather than backward-looking because they use panel follow-up and not retrospective questions. Short panels are becoming increasingly common in modern social sciences: examples are the ECHP (European Community Household Panel), the EU-SILC (European Union Statistics on Income and Living Conditions) and all kinds of national Labour Force Surveys (with rotating panels). With short panels, individuals and households are observed for too short a time to follow their entire life course; indeed, these datasets are normally designed to study not fertility but social and economic behaviour, such as saving or labour force participation. Sufficiently detailed retrospective fertility questions are normally lacking in these surveys. In some cases, there may even be no question at all on fertility, so that all births taking place in the short period under consideration must be inferred indirectly from changes in the household roster: all the individuals aged 0 who are present in the household at round t+1 but not at round t are counted as births between the two rounds. The parents, too, can be identified only indirectly, by looking at the relationships within the household (“spouse of person of reference”, “child of person of reference”, etc.), which is essentially a modern variant of the own-children method of fertility estimation (Cho et al., 1986). [1]
Among other things, these short panels permit analysts to calculate period total fertility rates (TFR). This is seldom done, however, because this measure is almost always available from some other, more reliable source and because, until now, it could not be related to individual characteristics and behaviour. But with Hoem and Mureşan’s (2011a) approach, it is easy to estimate a summary fertility measure that is basically a period TFR with covariates. Some of these covariates cannot be satisfactorily measured with retrospective questions because of recollection biases, ex-post adaptation and rationalization, and selection. Examples are economic conditions (e.g. income), fertility intentions and desires (Régnier-Loilier and Vignoli, 2011; Testa, 2012), happiness and confidence in the future (Baranowska and Matysiak, 2011), norms and values, or kin and environmental characteristics.
I – From event history analysis to the total fertility rate
Fertility micro-data are frequently analysed with logistic regression or, as in this paper, event history analysis (e.g. Allison, 1984). A panel of subjects (normally women, sometimes couples) is observed over a long time span, with the purpose of estimating the likelihood of a birth in each subgroup (or sub-period), relative to a reference group/period. The covariates observed at the beginning of each period can be interpreted as determinants of higher or lower fertility with respect to a reference group. The interest is typically not in the absolute level of fertility, but in the relative distance between an arbitrarily chosen reference group (e.g. married women, aged 25-29, with low education, and unemployed) and the others, differing by one characteristic at a time (the “cause” under scrutiny), all other things being equal.
The original methodology has evolved considerably in recent years, for instance with multi-process modelling (Matysiak, 2009), which tackles issues of endogeneity and selection into specific conditions (or careers), and fixed-effect models, which correct for selection on time-invariant (for instance, maternal) characteristics. Each of these approaches, however, tends to increase the complexity of the theoretical framework: the link between the empirical results of these applications and the general fertility level of the population becomes less and less evident.
Hoem and Mureşan’s (2011a) paper takes a step in the opposite direction and highlights the connections between EHA and the TFR. In their application to the Romanian GGS (Generations and Gender Survey) sample of 2005, containing detailed retrospective questions that cover the respondent’s entire life, they estimate

$$\lambda_{x,t,p,j,k} = a_{xt}\, b_p\, e_j\, u_k \qquad (1)$$

where the “piecewise constant childbearing intensities” $\lambda$ depend on the basic time factor $a_{xt}$ for age $x$ and time $t$, which combines (multiplicatively) with several covariates: birth parity $b_p$, education $e_j$, and rural/urban residence in childhood $u_k$.

The sum of these “age-specific intensities” (annual rates) gives a “total intensity rate” for the reference group

$$I_t = \sum_x a_{xt} \qquad (2)$$

for year $t$ and for the reference category $r$, which, in their case, is a woman who spent her youth in an urban environment, and has parity 0 and low education (the label and the symbol do not correspond exactly to those of Hoem and Mureşan). The weighted average of the “fertility intensities” of all the considered subgroups $g$ gives (approximately) the general “total intensity rate”

$$\bar{I}_t \approx \sum_g w_g\, I_t^{(g)} \qquad (3)$$

where $I_t^{(g)}$ is the total intensity rate of subgroup $g$ (so that $I_t^{(r)} = I_t$) and the weights $w_g$ sum to one.

If a summary fertility measure exists or can be computed for year $t$, $\mathrm{TFR}_t$, the ratio

$$S_t = \frac{\mathrm{TFR}_t}{\bar{I}_t} \qquad (4)$$

is a scaling factor (Hoem and Mureşan call it a “calibration” factor) that can be used to multiply the estimated fertility intensities (age- and group-specific), so as to make them consistent with the general fertility level. Other variables such as education, parity, and residence in childhood can be treated as inflators or deflators of the baseline fertility intensity $I_t$.
II – Short panel data
For this paper we applied Hoem and Mureşan’s (2011a) ideas to a dataset of a different kind: the four waves of the Italian section of the EU-SILC, 2004-2007. The EU-SILC is a rotational household panel survey and the reference data source for comparative income statistics in the European Union. The survey has been conducted yearly in each member state since 2004 and collects detailed longitudinal information on the social and economic characteristics of individuals (aged 16 and over) and households. In our application, the analysis focuses on women who were first interviewed in 2004, 2005, or 2006, and re-interviewed at least once, 12 months later (in 2007 at the latest), and were thus observed for one to three consecutive years. Weights are provided by the Italian National Institute of Statistics to correct the biases that may derive from the complex sampling scheme and from non-response.
This type of data, with repeated observations within a limited time window, is becoming increasingly common in social sciences: examples are the ECHP (the predecessor of the EU-SILC), Labour Force surveys, and income surveys, like the Bank of Italy SHIW (Survey on Household Income and Wealth). These surveys are not designed for demographic research, but focus on current socioeconomic conditions (income, labour market participation, help given to and received from kin and friends, etc.), and on their changes in the subsequent (short) period. Retrospective questions are rare, if not totally absent, in these surveys, limiting their suitability for demographic analyses. Occasionally, as with the EU-SILC for instance, there are no direct questions on births (not even on current births, let alone birth histories), so the birth of a child in the period under study must be inferred from changes in the household composition between two successive waves of the survey. This implies that infant mortality leads to a slight underestimation of fertility, because children who die shortly after birth go largely unnoticed in the survey. However, as infant mortality was very low in Italy in the period examined (about 3.7 per thousand births), we do not consider it an important cause for concern about data quality.
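The roster comparison described above can be sketched in a few lines. This is only a minimal illustration, not the procedure actually applied to the EU-SILC files: the roster layout (household identifier mapped to a list of person identifier and age), the identifiers and the function name `infer_births` are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical roster layout: household id -> list of (person id, age in years).
Roster = Dict[str, List[Tuple[str, int]]]


def infer_births(wave_t: Roster, wave_t1: Roster) -> Dict[str, int]:
    """Count, per household, members aged 0 at round t+1 who were absent at round t."""
    births: Dict[str, int] = {}
    for hh_id, members_t1 in wave_t1.items():
        if hh_id not in wave_t:
            continue  # household not observed at round t: no comparison possible
        present_before: Set[str] = {pid for pid, _ in wave_t[hh_id]}
        births[hh_id] = sum(
            1 for pid, age in members_t1 if age == 0 and pid not in present_before
        )
    return births


# Invented example rosters (not survey data).
wave_2004 = {"H1": [("H1-01", 31), ("H1-02", 33)], "H2": [("H2-01", 27)]}
wave_2005 = {
    "H1": [("H1-01", 32), ("H1-02", 34), ("H1-03", 0)],
    "H2": [("H2-01", 28)],
    "H3": [("H3-01", 25), ("H3-02", 0)],  # first seen in 2005, so ignored
}
births_2004_2005 = infer_births(wave_2004, wave_2005)
print(births_2004_2005)
```

Households that appear only at round t+1 are skipped, which mirrors the requirement that a woman be present in two consecutive rounds. A child who is born and dies between two rounds never enters the roster, which is the source of the slight underestimation mentioned above.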
The use of this dataset introduces a few differences from Hoem and Mureşan’s case. The first is that long life histories are not available here, complete ones even less so. We rely on short observations, covering three years at most (women observed from 2004 to 2007, type C in Figure 1), but in some cases two years (women of type B) or even only one year (type A).
Figure 1. Lexis diagram for the selection of women in the dataset, Italy, 2004-2007
Note: Arrows represent women who are considered only if they have been interviewed at least twice, in any two consecutive years between 2004 and 2007. Filled rectangles represent the interview periods. Note that, while interviews were spaced 12 months apart on average, they were not actually conducted at the beginning of each year. In this paper, however, for the sake of simplicity, fertility is imputed to the year of the interview: it is therefore convenient to imagine that women were interviewed on December 31. All women’s characteristics, including age, are observed and recorded at the beginning of each year, and, when subject to change, they are updated at every new round of the panel (i.e. they are time-varying).

The sample consists of women who have been interviewed in any two consecutive years between 2004 and 2007. Some of their characteristics (covariates) are registered at the beginning of the 12-month period, and by looking at changes in the household composition, it is possible to ascertain whether they have had children, and how many, in the 12 months that separated two interviews. If a woman is interviewed three times (the second and the third interview taking place 12 and 24 months after the first), she remains under observation for two years, and she contributes twice to the data set (women of type B in Figure 1). If she is interviewed four times, she contributes three woman-years (women of type C in Figure 1). In all cases, her characteristics (e.g. age and employment status) are registered at the beginning of each (12-month) period of observation, and may vary over time. In total, there are more than 13,000 woman-years under observation in our three-year window and slightly more than 500 births (Table 1).
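The way interviews translate into woman-years can be illustrated with a short sketch. It assumes that each woman is described simply by the list of calendar years in which she was interviewed. The function name `woman_years` and the example years are hypothetical.

```python
from typing import List, Tuple


def woman_years(interview_years: List[int]) -> List[Tuple[int, int]]:
    """Return one (start, end) pair for every two interviews held in consecutive years."""
    years = sorted(set(interview_years))
    return [(start, end) for start, end in zip(years, years[1:]) if end - start == 1]


# Illustrative interview histories (the years are examples only).
two_interviews = woman_years([2006, 2007])                # one woman-year (type A)
three_interviews = woman_years([2005, 2006, 2007])        # two woman-years (type B)
four_interviews = woman_years([2004, 2005, 2006, 2007])   # three woman-years (type C)
with_gap = woman_years([2004, 2006])                      # no consecutive pair

print(len(two_interviews), len(three_interviews), len(four_interviews), len(with_gap))
```

Each returned pair is one 12-month observation period. Covariates would be read at its start and births counted over its span, so that a woman of type C enters the dataset three times with possibly different characteristics.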
Table 1. Woman-years, births and fertility, Italy, 2004-2006
Note: Weighted data. All numbers are rounded. Fertility rates are per 1,000 women. Total fertility equals the sum of age-specific fertility rates, taking the width of the age class into account.

This paper proposes a new type of period fertility analysis that seeks to reconcile macro indicators (TFR) with micro-covariates, along the lines indicated by Hoem and Mureşan (2011a). In order to attenuate random fluctuations, it seemed preferable to combine the data for the three years of observation. Figure 1 provides an example of how we proceeded: for each five-year age group of women (e.g. 20-24 years), starting in 2004, 2005 and 2006, the observation spans 12 months and generates fertility rates (births/woman-years). These rates reflect the general shape of the Italian fertility curve reasonably well (Figure 2), but they are not perfect. The estimated TFR is just 1.01, as compared to the official value of 1.33 for Italy in those years. The average age at childbirth, on the other hand, is slightly higher in our case than in the official data (32.2 years, as opposed to about 31). In part, this pattern may depend on the under-representation of foreigners (about 3% in the sample, about 4.5% among residents), because foreigners in Italy have more children than Italians do, and at a younger age. However, this under-representation cannot explain the entire gap between the two sources, because the fertility of native Italians is estimated to be close to 1.2 children per woman (Gesano, Ongaro and Rosina, 2007) against 1.01 in this sample. Another possible explanation is that households that had very recently had a child were more likely to be unavailable for re-interview, or had moved somewhere else and could not be located, and so were (selectively) dropped from the panel. Note that the overall average non-response rate for the period 2004-2007 amounts to 18.6%.
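As a minimal sketch of the computation just described, age-specific rates are births divided by woman-years, and total fertility is their sum multiplied by the width of the age class. The counts below are invented for illustration and are not those of Table 1. The age grouping and function names are likewise assumptions.

```python
from typing import Dict

AGE_GROUP_WIDTH = 5  # five-year age groups, as in the text


def age_specific_rates(births: Dict[str, float], exposure: Dict[str, float]) -> Dict[str, float]:
    """Births per woman-year in each age group."""
    return {group: births[group] / exposure[group] for group in exposure}


def total_fertility(rates: Dict[str, float], width: int = AGE_GROUP_WIDTH) -> float:
    """Sum of age-specific rates, multiplied by the width of the age class."""
    return width * sum(rates.values())


# Invented (weighted) counts, for illustration only.
births = {"15-19": 2, "20-24": 10, "25-29": 30, "30-34": 40,
          "35-39": 20, "40-44": 4, "45-49": 0}
exposure = {group: 400 for group in births}  # woman-years per age group

rates = age_specific_rates(births, exposure)
tfr = total_fertility(rates)
print(round(tfr, 3))
```

With weighted survey data, the births and woman-years entering the ratio would be weighted sums rather than raw counts.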
Figure 2. Estimated age-specific fertility rates in Italy, 2004-2006
Note: 25-29 is the reference age group in this paper (Table 2).

If the downward bias is non-selective, the “calibration parameter”, as Hoem and Mureşan (2011a) call it, corrects it perfectly. This parameter is simply a scaling factor (see Equation 4), equalling S = 1.33/1.015 (= 1.31) in our case, the application of which, by definition, “forces” the empirical estimate to coincide with the official value. However, if the downward bias is selective, the calibration parameter cannot fully correct the bias and careful further examination of sampling and weighting procedures is necessary. For the purpose of this paper, the specific causes of the underestimation are less important, because the Italian dataset is used as an example of how to apply the proposed method rather than to investigate differential fertility. More details and general information on EU-SILC data quality can be found in Graf et al. (2011), an overview of the response rates by wave and rotational group for the Italian EU-SILC in European Commission (2010), and details on the weighting procedure in Verma et al. (2006).
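The calibration step amounts to one division and one multiplication, as the following sketch shows. The two TFR values are those quoted in the text. The age-specific rates are invented so that they sum to the panel estimate, and the function names are hypothetical.

```python
from typing import Dict

OFFICIAL_TFR = 1.33    # official value quoted in the text
ESTIMATED_TFR = 1.015  # panel-based estimate quoted in the text


def calibration_factor(official: float, estimated: float) -> float:
    """Scaling factor S: ratio of the official TFR to the estimated one."""
    if estimated <= 0:
        raise ValueError("the estimated TFR must be positive")
    return official / estimated


def calibrate(rates: Dict[str, float], factor: float) -> Dict[str, float]:
    """Multiply every age-specific rate by the same factor."""
    return {group: rate * factor for group, rate in rates.items()}


S = calibration_factor(OFFICIAL_TFR, ESTIMATED_TFR)
print(round(S, 2))  # 1.31, as reported in the text

# Invented five-year rates whose sum times 5 equals the panel estimate of 1.015.
panel_rates = {"20-24": 0.020, "25-29": 0.060, "30-34": 0.080, "35-39": 0.043}
calibrated = calibrate(panel_rates, S)
calibrated_tfr = 5 * sum(calibrated.values())
```

Because every rate is multiplied by the same constant, the age profile and all relative differences between groups are unchanged. This is why the correction is exact only when the downward bias is non-selective.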
III – Modelling period fertility with covariates
The approach proposed in this paper is a standard discrete-time EHA application based on fertility rates observed in a “typical” calendar year (average of three consecutive years, 2004 to 2006). Data preparation is straightforward: each woman is considered only if she is present both in year t and in year t+1. Births are inferred from the addition of new household members born between two successive waves. Fertility rates $h$ are computed (in our case by five-year age groups) using a standard event-history notation with $h_{x,z} = h_{r,0} \times \exp(\beta' Z)$, where $Z$ is a matrix of dichotomous variables: 0 for the reference category (e.g. “age 25-29”), and 1 for each other modality of that variable (age, in this example). The baseline fertility intensity $h_{r,0}$, also called basic time factor $a_{xt}$ in Hoem and Mureşan (2011a), gives the fertility rate of the reference group $r$ (the women for whom each dichotomous variable takes the value of 0), while the exponentiated coefficient estimates $\exp(\beta)$ act as inflators or deflators of the baseline fertility intensity.
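To make the multiplicative structure concrete, the sketch below evaluates the baseline rate times the exponential of the sum of the coefficients whose dummy equals 1. The baseline and the coefficients are invented for illustration (they are not the estimates of Table 2), and `predicted_rate` is a hypothetical helper, not part of any estimation package.

```python
import math
from typing import Dict


def predicted_rate(baseline: float, betas: Dict[str, float], z: Dict[str, int]) -> float:
    """h = h_r0 * exp(sum of beta_k * z_k), with each z_k equal to 0 or 1."""
    linear_predictor = sum(beta * z.get(name, 0) for name, beta in betas.items())
    return baseline * math.exp(linear_predictor)


# Invented values: baseline rate of the reference group and two log relative risks.
baseline_rate = 0.06
betas = {"age_30_34": math.log(1.3), "parity_1": math.log(2.0)}

reference_woman = predicted_rate(baseline_rate, betas, {})
other_woman = predicted_rate(baseline_rate, betas, {"age_30_34": 1, "parity_1": 1})
print(reference_woman, round(other_woman, 3))
```

A woman with all dummies at 0 gets the baseline rate. Each dummy set to 1 multiplies that rate by the corresponding relative risk, so the two invented effects combine to a factor of 1.3 × 2.0 = 2.6.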
The results are presented in Table 2. Model 1 is the simplest: age is the only covariate and the estimated relative risks (RR) simply reflect the estimated fertility rates of Table 1 and Figure 2. In other words, Model 1 (based on EHA) and traditional fertility analysis are identical (after scaling) if EHA includes all the age groups and refers to the entire sample of women, without stratifying or selecting them in any way (apart from age).
Table 2. Event history analysis (EHA) regression models for fertility in Italy, 2004-2006
Notes: Model results include missing categories (or non-response). Models are adjusted for intra-group correlations to account for the fact that some women are observed for more than one year. The constant represents the estimated risk for women in the reference group, in births per woman per year. Significance levels: * p ≤ 0.1, ** p ≤ 0.05, *** p ≤ 0.01.
Model 2 broadens the picture by including two covariates: parity and education. What emerges is that women of parity 1 (i.e. with one child, of any age, at the moment of the interview) were twice as fertile in the subsequent 12 months as those of parity 0. Conversely, those who had already had 2 children were less fertile (27% less than nulliparous women), and those who had already had 3 were the least likely to have another child. These results are not as surprising as they may appear at first sight, because marital status is not controlled for in this application: many childless women do not have a partner, whereas women with (at least) one child are in most cases married, or at least partnered (not shown here). Of course, marital status or living arrangement could be considered among the covariates, although this would introduce other forms of distortion, especially reverse causation, because in Italy most women who want to have a child enter a stable relationship first. [2]
Education affects fertility and the sign of the relation has changed recently (Rosina and Testa, 2009; Régnier-Loilier and Vignoli, 2011). At present it is women with medium (+27%) or high education (+30%) who have more children, while women with low education have fewer, all other things being equal. But, of course, education does not act alone and this is why, in Model 3, additional covariates are considered: (equivalent) household income, employment (of the woman) and area of residence.
As expected, in most cases the significance of the parameters is lost because of the limited number of observations and because of the cross-correlation between the covariates. For instance, it is mostly in the centre-north of Italy that women are employed and household (equivalent) incomes are higher, in part because prices, too, are higher (by about 20%; see De Santis and Maltagliati, 2010). But even if not significant, and even if they cannot be interpreted causally, the estimated signs of the regression parameters go in the expected direction, and separate analysis on each single variable or subgroup of variables (not shown here) confirms that this is indeed how these covariates and fertility are associated.
Women from high-income households have more children than others. Once again, a note of caution is in order here, because this depends in part on a “partner effect”: women without a partner are typically poorer and have fewer children.
Employed women have more children than others. A more refined analysis should take into account the type of occupation (e.g. permanent versus temporary), the working schedule (full-time versus part-time) and the employment status of the partner (see, for example, Vignoli et al., 2012), but, once again, the issue of selection would then have to be considered explicitly.
Finally, women from the south have fewer children, even after controlling for all the other variables, which is indeed a remarkable change in comparison with the still recent past in Italy. The finding is not new, but what is new is the simplicity with which the present analysis brings it to the fore, net of other covariates.
The same results can also be presented differently, as in Table 3, where percentage distributions and relative risks (RR) are translated into (rough) estimates of fertility levels of each population subgroup. Two aspects are especially worth noting here. First, the estimated values refer to period TFR, and are therefore subject to the well-known possible biases of tempo variations. This is particularly evident in the case of parity 1, which also suffers from a selection bias, in that it refers almost exclusively to partnered women (and, probably, women in recently formed couples). In all the other cases the distortion seems to be less strong.
Table 3. Estimates of TFR by various population characteristics, Italy, 2004-2006
Notes: Subgroup TFRs have been forced to average 1.33 (official national average), given the RRs (from Table 2) and the distribution of women. Weighted data.

The second aspect worth emphasizing is that while the TFR column in Table 3 provides the same information as the RR column of the same table, the TFR column is much easier to read and interpret. With regard to parity, for instance, the correct way of reading the table is as follows: if all women in Italy, in the period 2004-2007, had “behaved” (in terms of fertility) in the same way as those with no children (at the start of the 12-month period of observation), the average TFR in those same years would have been (about) 1.20. If, instead, their fertility had been the same as that of those who already had one child at the start of the period (but of whom the great majority were also partnered), fertility would have been much higher: up to 2.44.
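One way to obtain subgroup TFRs that average to the official value, given relative risks and the distribution of women, is sketched below. It is a plausible reading of the note to Table 3, not necessarily the exact computation behind it. The relative risks for parities 1 and 2 follow the text ("twice as fertile", "27% less"), whereas the value for parity 3+ and all population shares are invented.

```python
from typing import Dict


def subgroup_tfrs(relative_risks: Dict[str, float],
                  shares: Dict[str, float],
                  overall_tfr: float) -> Dict[str, float]:
    """Scale relative risks so that their share-weighted mean equals overall_tfr."""
    total_share = sum(shares.values())
    mean_rr = sum(relative_risks[g] * shares[g] for g in shares) / total_share
    return {g: overall_tfr * relative_risks[g] / mean_rr for g in shares}


relative_risks = {"parity 0": 1.00, "parity 1": 2.00, "parity 2": 0.73,
                  "parity 3+": 0.50}  # the 3+ value is hypothetical
shares = {"parity 0": 0.50, "parity 1": 0.20, "parity 2": 0.22,
          "parity 3+": 0.08}  # hypothetical distribution of women

tfr_by_parity = subgroup_tfrs(relative_risks, shares, overall_tfr=1.33)
weighted_mean = sum(shares[g] * tfr_by_parity[g] for g in shares) / sum(shares.values())
print({group: round(value, 2) for group, value in tfr_by_parity.items()})
```

The ratios between subgroup TFRs equal the ratios between the relative risks, so the two columns carry the same information; only the scale changes.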
Fertility differentials were relatively strong in Italy in those years (something that the RR column in Table 3 shows well), but even the most fertile subgroups (women of high educational level, or of high income, or employed, or living in the centre of Italy) remained far from replacement, and the TFR column of Table 3 brings this to the fore. Of course, interactions may be included in the analysis, as Hoem and Mureşan (2011a) did, for instance, and this is indispensable if one is specifically interested in the possible effect of the combined action of two or more covariates (e.g. living in the north and being employed and highly educated). But this finer analysis is possible only if the number of observations allows it, and in all cases it comes at the cost of obscuring the message that this paper wants to convey: with some approximation, a cross-sectional TFR with covariates can be easily computed on modern datasets. This approach proves particularly useful on short panels.
Conclusion
The approach proposed by Hoem and Mureşan (2011a), which, incidentally, is not the first that tries to reconcile micro-level results with aggregate fertility indicators like the TFR (see, for example, Schoumaker, 2004), has several merits. The first, and perhaps most important, is that it is very simple: it gives an order of magnitude of the fertility level of various population subgroups, all other things being equal, and does so in a way that is consistent with the general TFR, which can in fact be obtained as a weighted average of the various group-specific TFRs. In so doing, it reconciles the modern, micro analysis of fertility (EHA) with the aggregate measures that non-demographers are still more familiar with and that are still useful for describing the general level and trend of fertility in a country.
Secondly, and this is the focus of our paper, it may be applied not only to the databases typically used for fertility analysis, with detailed retrospective questions, but also to surveys with short panels. These surveys often provide information that is not available otherwise (e.g. fertility by income level before the birth of the child) or that, when collected retrospectively, is biased by memory, ex-post adaptation (e.g. about fertility intentions, values and norms), and selection of respondents. However, this advantage is to be weighed against possible disadvantages, among which attrition-related bias is probably the most important.
With limited effort, the approach yields descriptive statistics about fertility, basically a period TFR with covariates. While its results cannot be interpreted causally, they provide the general framework within which more refined analyses can be carried out.
Notes
[*] Dipartimento di Statistica, Informatica, Applicazioni (DiSIA), University of Florence.
[**] Demography Unit, Department of Sociology, Stockholm University.
Correspondence: Gustavo De Santis, DiSIA, Dip. Di Statistica Informatica, Applicazioni “G. Parenti”, Università di Firenze, Viale Morgagni 59, 50134 Firenze, Italy, tel.: +39 055 275 1560, email: gustavo.desantis@disia.unifi.it
[1] The only important difference is that in the short panel data described here the covariates can be observed not only after the birth of the child, as in the own-children method, but also before.
[2] A possible alternative would be to focus only on married (or cohabiting) women, but this selection would make it impossible for us to connect our results to the general TFR in any simple way.