CAIRN-INT.INFO : International Edition

Demographers’ interest in genealogical data is not new. The development of websites for building and pooling family trees has understandably rekindled this interest. The authors of this article assess the quality of the measurements of the demographic dynamic (fertility, mortality, and migration) based on data from one such website. Biases are numerous and the general tendency is towards an underestimation of phenomena, but the contribution of these data to existing knowledge is undoubtedly the most promising in the field of internal migration.

1Researchers in historical demography draw extensively on the data about our ancestors (births, marriages, and deaths) contained in parish registries and, following the French Revolution, in the civil registry, as well as on census data (Henry and Blayo, 1975; Dupâquier, 1981; van de Walle, 1986; Bourdieu et al., 2004; Bonneuil et al., 2008). But historians and demographers are not the only ones to use these data. Genealogists also refer to them and can thus further knowledge on the history of populations. For example, at the University of Utah, Mormons have established voluminous genealogical databases that have contributed to the study of past populations (Bean et al., 1978; Lindahl-Jacobsen et al., 2013). However, as noted by Dupâquier (1993), while the work by genealogists renders ‘outstanding service to historians’, it suffers from two main biases. The first relates to the representativeness of genealogists, who form a little world apart, peopled mostly by men of a certain age and retired. The second stems from the ascending approach generally used by genealogists to reconstitute families. This method, which consists in starting with an individual and gradually going back in time to his or her ancestors, tends to obscure collaterals not having had children. As a result, low-fertility or infertile branches are likely to be incorrectly represented (Dupâquier, 1992).

2The rise of the Internet in the 2000s considerably altered these two biases. First of all, the digitization of archives opened them up to a younger and more active population of genealogists (Hervis, 2012). The democratization of genealogy activities could thus reduce the bias of representativeness. Hervis (2012) nevertheless pointed out that most genealogists are aged over 50. Secondly, the development of the Internet was accompanied by the increased use of genealogy websites. Some of these sites provide fee-free access to electronic registries and call on web users to transcribe them collectively so that the information they contain can be stored in databases in a standardized form. The transcribed registries enable users to easily reconstitute their family tree and share it with other users. This facilitates the reconstitution of collateral branches and thus reduces the bias of representativeness of infertile branches.

3While the approach adopted by genealogy websites and their users is not as rigorous as that employed in the survey of 3,000 families initiated by Jacques Dupâquier in the early 1980s (Dupâquier and Kessler, 1992), it does have several considerable advantages. The collaborative efforts of web users serve not just to cover numerous families and a more extensive territory but also to compile information in relatively short time spans and at lower cost.

4Broadly speaking, collaborative data would appear to offer promising prospects for research. From a technical standpoint, they provide a ‘learning’ sample that can be used to estimate the parameters of statistical models (Lease and Yilmaz, 2013). Their use can be seen in medicine, for example. In this discipline, while data collection is often limited to a circle of experts, as with the CIViC [1] database aimed at increasing knowledge on cancer (Griffith et al., 2017), data can also be collected effectively by non-experts, for example in the study of sleep (Warby et al., 2014). In geography, collaborative data, notably those of OpenStreetMap [2] (OSM), an international project to create a free world map, are also used by researchers. Haklay (2010) showed that while these data contain a few errors, the information OSM users provide is fairly precise when used for cartography. Girres and Touya (2010) concluded the same, though they stressed the difficulties relating to data heterogeneity.

5What, then, is the case for collaborative data in the field of genealogy? Are the problems of representativeness highlighted by Dupâquier (1993) still an issue? Can historical demography benefit from the genealogy data entered by millions of users? Several studies have been carried out based on the data pooled by the users of genealogy websites such as wikitree.com, familysearch.org, and geni.com. This work has served to study longevity (Gavrilov et al., 2002; Gavrilova and Gavrilov, 2007; Fire and Elovici, 2015). A recent study by Kaplanis et al. (2018) exploring the family trees of several million individuals shows that the genealogy data resulting from the collaborative work of enthusiasts can be used to build high-quality family trees. This article focuses on the situation in France, drawing on genealogical data from the Geneanet website and comparing the results of the latter with those already known in the literature. [3] The first part of the article overviews the methods used to extract and classify the information contained in Geneanet data. It then analyses the representativeness of the data using three indicators: the number of births per French department, the sex ratio, and the number of children per woman. The last part of the article looks at the mortality of individuals born in metropolitan France in the early 19th century and their migration, along with that of their descendants over three generations.

I – Construction of the data

1 – Data scope

6This work draws on data collected by genealogy enthusiasts researching their ancestors. Geneanet users can publish their family tree or keep it private; the data at our disposal come purely from published trees. [4] Geneanet users can build their family tree by entering in varying degrees of detail the information collected during their research. For each individual in the tree, the information concerning three event types corresponding to civil registry certificates—the individual’s birth, marriage(s), and death—may be entered, together with the place and date of each event. The individuals making up the tree are connected according to their parents and spouse(s).

7Collaborative genealogical data have numerous advantages, including spatial coverage. Many studies are limited to a village or a specific region owing to a lack of data on a larger territory or the financial resources for collecting such data across a broader geographical scope. Here, the data available can be used to consider the territory of metropolitan France as established today. [5] In addition, the data’s content does not oblige us to limit the sample to individuals with a family name beginning with a certain n-gram, as is the case with the TRA survey in France (Bourdieu et al., 2014) and the COR database in Belgium (Matthijs and Moreels, 2010). The data can also be used to cover a greater number of individuals over several generations. Lastly, this type of data has a sizeable cost advantage. The digitization of the content of various registries is a tedious, costly, and time-consuming task. Entrusting this work to hundreds of thousands of users is part of what makes collaborative data attractive.

8But collaborative genealogical data have their limits. [6] One of them is the difficulty of grouping common branches of the trees of different users. The first obstacle is of a practical nature and results from the size of the databases. The colossal number of observations calls for considerable computing power or the use of alternatives requiring extensive computer processing. This involves segmenting data to obtain manageable objects while taking care not to isolate observations liable to ultimately belong to the same branch of a tree. [7] The second obstacle to grouping lies in the very source of the data. The copies of the registers referred to by genealogists may include errors, and the exactitude of the transcribed information varies. Because the branches of two trees are grouped using information copied by web users, any copying mistakes may hinder the grouping process, though some errors may be corrected by comparing user copies and relying on the most frequently observed values. A further limit is that collaborative genealogical data are far from exhaustive. The presence of an individual and the quantity of information concerning them in the database depend on the existence of a source and, once again, user goodwill. For example, owing to a lack of time and exact information, a genealogist may omit a childless great-great-uncle.

2 – The construction of family trees

9In this study, our analysis is restricted to individuals born in metropolitan France between 1 January 1800 and 31 December 1804 and their descendants up to three generations, or 2,457,450 people. This initial period corresponds, as pointed out by Fleury and Henry (1958), to the uninterrupted resumption in the civil registry of entries on marriages, births, and deaths across France.

10A few details should perhaps be provided on the construction of the sample. At the start of this work, a list was provided containing 238,009 names of users having mentioned in their family tree an ancestor born in metropolitan France between 1890 and 1900. The information extracted from the trees of these users contained 701,466,921 entries. The study is limited to individuals born between 1800 and 1804 and their descendants (1,547,086 individuals from the 1800–1804 generation, 402,190 children of these individuals, 286,071 grandchildren, and 222,103 great-grandchildren). Restricting ourselves to family trees including an ancestor born in France between 1890 and 1900 leads to a loss of information, which can be measured as follows. A sample including individuals born between 1800 and 1804 and their descendants is constituted, excluding trees in which no descendants are born between 1880 and 1889. Relative to the sample used in this article, this new sample contains 93.2% of the ancestors born between 1800 and 1804. The results concerning the representativeness of the data (see the following section) are only slightly impaired (no change in the sex ratio, a one-tenth change in the number of children per woman).

11To build their individual family trees, amateur genealogists mainly work in ascending genealogy (Brunet and Vézina, 2015). They start with an individual and gradually search for ancestors. The data used here correspond to this type of genealogy, consisting of direct information on the parents (where available) for each individual included in the database. Yet when studying the geographical dispersion of individuals based on their ancestors (the method used in this article), it is more appropriate to use a different approach consisting in identifying the descendants of individuals, referred to as descending genealogy. This requires reworking the data to follow descendants rather than ancestors. For example, the family tree of a user in the sample drawn on by this study includes two distinct observations concerning two individuals. The first concerns {Charles Mélanie Abel}-{Hugo}, born in Paris in 1826, the second {Adèle}-{Hugo}, born in Paris in 1830. The two observations point towards the same parents: {Victor Marie}-{Hugo}, born in Besançon in 1802, and {Adèle Julie}-{Foucher}, born in Paris in 1803. The left-hand section of Figure 1 illustrates the ascending approach and the right-hand section the descending approach of Victor Hugo and Adèle Foucher.
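
The reworking from ascending to descending genealogy can be sketched as an inversion of the child-to-parent links. The snippet below is a minimal illustration rather than the processing actually applied to the Geneanet data: the record layout, the identifier strings, and the function name `to_descending` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records in "ascending" form: each individual points to
# their parents (identifiers are illustrative labels, not database keys).
ascending = {
    "Charles Hugo (1826)": {"father": "Victor Hugo (1802)",
                            "mother": "Adèle Foucher (1803)"},
    "Adèle Hugo (1830)": {"father": "Victor Hugo (1802)",
                          "mother": "Adèle Foucher (1803)"},
}


def to_descending(records):
    """Invert child -> parents links into parent -> list of children."""
    children_of = defaultdict(list)
    for child, parents in records.items():
        for parent in parents.values():
            if parent is not None:  # a parent may be unknown
                children_of[parent].append(child)
    return dict(children_of)


descending = to_descending(ascending)
```

Each parent now carries the list of their known children, which is the structure shown on the right-hand side of Figure 1.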

Figure 1

Example of an extract from the family tree of a user

Note: On the left, data as presented in the database (two individuals with the same parents); on the right, data as we wish them to be structured (the descendants of each parent). Each box represents a person. The links between the boxes represent filiation. For simplicity, the brothers and sisters of Charles and Adèle Hugo are not presented.
Source: Geneanet.

12Extensive coupling work is thus required. Because each amateur genealogist builds their own tree, the same person may appear in several trees, resulting in numerous duplications in the raw data. This calls for substantial work on cleaning and formatting the data. [8] First, as the number of observations is considerable and computer-processing capacity limited, the sample is separated into subsamples by French department. Then, for each department, an algorithm detects the duplications. Once each department has been processed, the subsamples are grouped into a single sample for France. [9] The presence of the same individual in several trees proves useful in reinforcing the quality of the information because if an entry in a user’s tree is incomplete, the missing information may be found in the tree of another user. Merging the duplications thus serves to enrich the database.
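
How merging duplicates can enrich an entry may be illustrated as follows. This is a sketch under stated assumptions, not the algorithm used for the study: the field names are hypothetical, and the rule applied here (keep the most frequently observed non-missing value for each field) is borrowed from the remark made earlier about relying on the most frequently observed values.

```python
from collections import Counter


def merge_duplicates(records):
    """Merge entries describing one person: for each field, keep the most
    frequently observed non-missing value (ties go to the value seen first)."""
    merged = {}
    fields = {field for record in records for field in record}
    for field in sorted(fields):
        values = [r[field] for r in records if r.get(field) is not None]
        merged[field] = Counter(values).most_common(1)[0][0] if values else None
    return merged


# Three hypothetical copies of the same person taken from different trees;
# the third contains an invented transcription error in the birth year.
copies = [
    {"first_name": "Victor Marie", "birth_place": "Besançon", "birth_year": 1802},
    {"first_name": "Victor Marie", "birth_place": None, "birth_year": 1802},
    {"first_name": "Victor Marie", "birth_place": "Besançon", "birth_year": 1803},
]
merged = merge_duplicates(copies)
```

The missing birthplace in the second copy is filled in from the other trees, and the isolated birth year is outvoted.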

13To detect duplications, six methods are applied successively. The first and simplest consists in merging individuals with the same following characteristics: last name, first name, sex, date of birth, mother’s last name, and father’s last name. The second method for merging individuals consists in taking account of errors made when entering last names and first names. Two people born in the same department in the same year with very similar last and first names (for example, {Matthieu Paul}-{Henri} and {Mathieu Paul}-{Henri}), as well as very similar parent names, may be grouped in a single entry. To that end, a measurement of the distance between last and first names is calculated. [10] In the third and fourth methods, possible mergers between individuals are detected by taking into account any possible errors concerning sex and incomplete dates (missing or erroneous days or months). The fifth way of identifying duplications is by drawing on similarities in first names, this time retaining only the initial first name and not the two or three others. The sixth and final method for detecting duplications is based on brothers and sisters. Where applicable, the list of the individual’s brothers and sisters is examined, particularly where some of them have the same first name and were born or died in the same year. If this is the case, they may be grouped as one and the same person.
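
The second method can be illustrated with a string distance. The measure actually used is not specified in this section, so the sketch below assumes a Levenshtein (edit) distance; the threshold `max_distance`, the field names, and the sample birth year and department are hypothetical, and the comparison of parent names is left out for brevity.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def likely_same_person(p, q, max_distance=1):
    """Flag two entries as candidate duplicates (assumed rule, for illustration)."""
    same_context = (p["birth_year"] == q["birth_year"]
                    and p["department"] == q["department"])
    return (same_context
            and edit_distance(p["first_name"], q["first_name"]) <= max_distance
            and edit_distance(p["last_name"], q["last_name"]) <= max_distance)


# The two spellings quoted in the text, with an invented year and department.
entry_a = {"first_name": "Matthieu Paul", "last_name": "Henri",
           "birth_year": 1801, "department": "Nord"}
entry_b = {"first_name": "Mathieu Paul", "last_name": "Henri",
           "birth_year": 1801, "department": "Nord"}
is_duplicate = likely_same_person(entry_a, entry_b)
```

A single dropped letter gives a distance of 1, so the two entries are proposed for merging; a stricter or looser threshold would trade missed duplicates against false merges.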

14Once the data of the trees of different users are matched, each individual is sorted according to generation into one of four categories: individuals from the 1800–1804 generations, their children, their grandchildren, and their great-grandchildren. The distribution of the birth year of the individuals in each category is shown in Figure 2. Successive generations are born at intervals of approximately 30 years.
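
Sorting individuals by generation amounts to walking down the descending links from the individuals born between 1800 and 1804. A minimal sketch, assuming a parent-to-children mapping such as the one a descending genealogy provides (the function name and the toy identifiers are hypothetical):

```python
from collections import deque

LABELS = ["1800–1804 generation", "children", "grandchildren", "great-grandchildren"]


def assign_generations(children_of, founders, max_depth=3):
    """Breadth-first walk down the tree: founders get depth 0, their
    children depth 1, and so on, stopping at max_depth."""
    generation = {person: 0 for person in founders}
    queue = deque(founders)
    while queue:
        person = queue.popleft()
        depth = generation[person]
        if depth == max_depth:
            continue  # do not follow descendants beyond the last category
        for child in children_of.get(person, []):
            if child not in generation:  # keep the shallowest depth found
                generation[child] = depth + 1
                queue.append(child)
    return generation


# Toy tree: A and B are founders with a common child C.
toy_tree = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["E"], "E": ["F"]}
generations = assign_generations(toy_tree, ["A", "B"])
```

An individual reachable through several lines is given the shallowest depth here, which is one possible convention among others.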

Figure 2

Distribution of the birth years of the individuals in the sample according to their position in the family tree

Note: The number of births per year is expressed on the y-axis according to a logarithmic scale as the number of ancestors is large compared with that of their descendants.
Source: Geneanet; authors’ calculations.

II – The representativeness of the data

1 – Number of births per French department

15The geographical representativeness of the data is assessed here by comparing the number of individuals at the origin of the sample with institutional statistics. INSEE, the French national statistics office, provides information on the number of births per department at different points in time throughout the 19th century (Statistique générale de la France, 2010). For 1801, the values are broken down by the sex of newborns. The ratio between the headcount of the sample and that of INSEE is calculated according to sex (Figure 3). For metropolitan France, the overall percentage of births identified using genealogy data is 32.8%. The coverage varies from one department to the next but appears to be independent of sex (the correlation between the departmental coverage of men and that of women is 0.98). In departments in northern and eastern France, the share of births found in the Geneanet data is close to 60%. For men, it stands at 57.7% in the Pas-de-Calais department and 57% in the Nord department. Southern and south-western departments show a much lower proportion of the births in the INSEE data (for example, just 8.4% for men in the Gers department, 11.6% in Hautes-Pyrénées, and 12.3% in Lot-et-Garonne). The strong heterogeneity observed between departments may be partly attributed to when access to electronic public archives became available. The departments with the lowest percentage of births according to the genealogy data include Gard, Gers, and Hautes-Pyrénées. These three departments have only recently begun to digitize their old civil registry. A further explanation may lie in the fee-free access to electronic data. For example, a low percentage of births in the Geneanet data may be observed in Charente and Calvados, two departments where access to digitized archives became free of charge only recently (2015 for Charente and 2016 for Calvados).

Figure 3

Proportion (%) of births recorded by INSEE in 1801 found in the Geneanet sample, by department

Note: NA denotes a missing value.
Source: Geneanet; authors’ calculations.

2 – Sex ratio

16The final sample contains a strong over-representation of men. The sex ratio at birth, i.e. the ratio between the number of male births and female births, comes out at 116. The sex ratio of births in 1801 alone rises to 117. [11] This is a high value relative to the sex ratio established by Blayo and Henry (1967) of 105.4 for the period 1740–1829 in Brittany and Anjou. INSEE data for male and female births in 1801 give a sex ratio of 105. This male over-representation applies to all departments, with a minimum of 105 (51.2% men) in the Loire department and a maximum of 136 (57.7%) in the Drôme department (Figure 4). A strong inverse correlation between the sex ratio and the sampling rate is observed: the lower the coverage of a department, the higher its sex ratio, suggesting that the missing individuals are most often women.
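
The correspondence between a sex ratio and the percentage of men quoted alongside it follows directly from the definition (male births per 100 female births). The helper names below are ours and serve only to make the arithmetic explicit.

```python
def sex_ratio(male_births, female_births):
    """Number of male births per 100 female births."""
    return 100 * male_births / female_births


def male_share(ratio):
    """Percentage of male births implied by a sex ratio (males per 100 females)."""
    return 100 * ratio / (ratio + 100)


share_loire = round(male_share(105), 1)   # ratio reported for the Loire department
share_sample = round(male_share(116), 1)  # ratio reported for the whole sample
```

A ratio of 105 corresponds to 51.2% men and a ratio of 116 to 53.7%. Applied to the rounded ratio of 136, the same formula gives about 57.6%, so the 57.7% reported for Drôme presumably reflects the unrounded ratio.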

Figure 4

Sex ratio at birth per French department for individuals born in 1801

Source: Geneanet; authors’ calculations.

17The existence of a sex-related bias in genealogy studies has already been observed. Gavrilov and Gavrilova (2001) note an under-reporting of women and children in genealogy databases in general. They consider that the bias of female representativeness in their data on European royal families and nobility mainly concerns the oldest records, i.e. those dating from the 19th century. The TRA survey suffers from the same bias, as highlighted by Bourdieu et al. (2014). In this case, it would appear to result in part from the way the sample is built on male lines, with the wives nevertheless added to each generation. Amateur genealogists clearly construct their trees by following patriarchal lines. Brunet and Bideau (2001) note that amateur genealogists ‘generally limit themselves to an ascending line, drawing to varying degrees on collateral lines’. As maternity is easier to establish than paternity, it would be more logical to follow matrilinear lines, as Balsamo (1999) notes. But as stressed by Eichner (2014), in accordance with the Napoleonic Code, ‘women are subsumed by the civil status of their husband.’ [12] As such, following the change in a woman’s last name when marrying, it perhaps becomes more difficult to follow her in registries. Where the marriage registry does not exist or the genealogist is unable to find it, the trace of the woman is potentially lost. [13]

3 – Fertility

18In the sample, women who married during their lifetimes and who reached at least the age of 15 had an average of 1.46 children. [14] For the women in the initial generation (1800–1804), the average number of children is 1.36. The number rises to 1.61 for the following generation (children) and remains stable at 1.62 children per woman for the generation of the grandchildren. These levels are much lower than those reported by Chesnais (1986), which are as high as 4.46 children per woman at the start of the 19th century and gradually decrease to 2.9 children per woman 100 years later. The indicator is low in part because, as stressed by Brunet and Bideau (2001), the genealogies constructed by enthusiasts do not necessarily take account of some individuals, notably those who remained single. But in the next generation, the absence of single people should work in the opposite direction, since single people theoretically have fewer children. Moreover, Hollingsworth (1976) explains that genealogists are not particularly interested in the question of infant mortality in the past, as the efforts required to find out whether a child mentioned died young or not are too considerable. The absence of these individuals appears to imply de facto a biased measurement of fertility levels. The distribution of the number of children per married woman per generation is illustrated in Table 1. While the proportion of women without children for individuals in the first generation (47%) appears high relative to the proportion put forward by Houdaille and Tugault (1987), at around 15%, that of the following generations (27% and 26%) tallies with the values for the early 20th century reported by Toulemon (1995).

Table 1

Distribution (%) of the number of children per married woman according to position in the family tree

Number of children    1800–1804    Children    Grandchildren
0                     46.9         27.4        25.8
1                     24.5         39.2        40.3
2                     10.3         13.2        13.7
3                     6.1          7.5         7.9
4                     4.0          4.8         4.5
5                     2.6          2.9         2.7
6                     1.9          1.9         1.9
7                     1.2          1.2         1.2
8                     0.8          0.8         0.8
9                     0.5          0.5         0.5
10                    0.4          0.4         0.3
>10                   0.6          0.4         0.5
Total                 100          100         100
Sample size           1,547,086    402,190     286,071

Note: The sample does not provide information on the number of children of the great-grandchildren of individuals born between 1800 and 1804.
Source: Geneanet; authors’ calculations.
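
As a rough consistency check, the average number of children per woman can be recomputed from the distribution in Table 1. The sketch below copies the published percentages; because the ‘>10’ class is open-ended, it is counted as 11 children (an assumption), so the result is a lower bound, and the published percentages are themselves rounded.

```python
# Shares (%) of married women by number of children (0 to 10, then ">10"),
# copied from Table 1.
TABLE_1 = {
    "1800-1804": [46.9, 24.5, 10.3, 6.1, 4.0, 2.6, 1.9, 1.2, 0.8, 0.5, 0.4, 0.6],
    "children": [27.4, 39.2, 13.2, 7.5, 4.8, 2.9, 1.9, 1.2, 0.8, 0.5, 0.4, 0.4],
    "grandchildren": [25.8, 40.3, 13.7, 7.9, 4.5, 2.7, 1.9, 1.2, 0.8, 0.5, 0.3, 0.5],
}


def mean_children(shares, open_class_value=11):
    """Weighted mean of the number of children; the open-ended last class
    is counted as open_class_value children (an assumption)."""
    counts = list(range(11)) + [open_class_value]
    return sum(c * s for c, s in zip(counts, shares)) / sum(shares)


means = {name: round(mean_children(shares), 2) for name, shares in TABLE_1.items()}
```

Under this convention the computation gives about 1.34, 1.61, and 1.62 children per woman, in line with the 1.36, 1.61, and 1.62 reported in the text; the small gap for the first generation is compatible with women in the ‘>10’ class having more than 11 children on average.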

III – Mortality in France in the early 19th century

19This section analyses the mortality of the generations born from 1800 to 1804, at the national and then at the regional level. For each generation, we calculate the number of individuals still alive as well as the number of deaths at each age using the dates of birth and death provided for 813,551 individuals (or 53% of the total). This makes it simple to calculate the probability of survival at each age for each cohort. These estimates are compared with the data from the tables on mortality per generation produced by Vallin and Meslé (2001), [15] which provide mortality estimates starting from the first age reached in 1806. It is thus not possible to compare the data with those of Geneanet for the first ages. As such, the likelihood of survival is estimated only for individuals having reached at least the age of 6. Once these estimates have been made separately for each of the five cohorts, the average survival probability is calculated for each age, followed by the average survival probability at each age in the 1800–1804 cohorts based on the tables by Vallin and Meslé.
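
The calculation of survival probabilities conditional on reaching age 6 can be sketched as follows. The input format (ages at death in completed years), the upper age limit, and the function names are assumptions made for the illustration; the toy ages are invented.

```python
def survival_from_age(ages_at_death, start_age=6, max_age=110):
    """Share of individuals alive at start_age who are still alive at each
    later age (ages at death are assumed to be in completed years)."""
    at_risk = [a for a in ages_at_death if a >= start_age]
    if not at_risk:
        raise ValueError("no individual survived to start_age")
    return {age: sum(1 for a in at_risk if a >= age) / len(at_risk)
            for age in range(start_age, max_age + 1)}


def average_curves(curves):
    """Age-by-age average of several cohort survival curves."""
    ages = curves[0].keys()
    return {age: sum(c[age] for c in curves) / len(curves) for age in ages}


# Invented ages at death for one small cohort; two deaths occur before age 6.
toy_cohort = [0, 3, 6, 40, 70, 70, 85]
curve = survival_from_age(toy_cohort)
```

Individuals who died before age 6 are dropped from the denominator, which is what makes the curve comparable with tables that start at the first age reached in 1806.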

20Compared with historical values, these results initially underestimate the mortality of women and then overestimate it beyond age 25 (Figure 5). The mortality of men is underestimated up to more advanced ages, nearing 40. For both sexes, the estimates for ages 40 through 90 made using genealogy data are much closer to those proposed in the literature. However, a considerable underestimate for both sexes is observed beyond age 90. Despite the differences, the general shape of the curves corresponds (roughly) to that from Vallin and Meslé.

Figure 5

Comparison of the curves of survival and risk of death according to age and sex

Note: The estimates are made conditional on the individuals having survived until at least their 6th year.
Source: Authors’ calculations based on Vallin and Meslé (2001) and Geneanet.

21For each of the five cohorts, life expectancy by sex based on data from Vallin and Meslé is compared with that estimated with Geneanet data (Figure 6). [16] For the 1804 cohort, life expectancy at age 6 estimated using the Vallin and Meslé mortality tables is 49.7 years for women (95% confidence interval ranging from 49.6 to 49.9) and 48.9 years for men (confidence interval from 48.8 to 49.1). [17] The estimates made using data from family trees result in substantially lower values for women (49.2 years, with a confidence interval of 95% from 49.1 to 49.4) and higher for men (51.4 years, with a confidence interval of 95% from 51.3 to 51.6). The genealogy data tend to underestimate the residual life expectancy of women and overestimate that of men. For each cohort and each sex, the average differences of life expectancy at each age between the estimates made using genealogy data and those made using data from Vallin and Meslé vary between −0.37 years and −0.56 years for the female cohorts, and 0.55 years and 0.77 years for the male cohorts. The average of these averages per cohort comes out at −0.47 years for women and 0.57 years for men. The differences are more pronounced for life expectancy calculated at young ages.
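
A residual life expectancy and its confidence interval can be approximated directly from individual ages at death. The sketch below is only one possible approach: it assumes exact ages at death and a normal approximation for the interval, which may differ from the method behind the intervals reported above, and the toy ages are invented.

```python
from math import sqrt
from statistics import mean, stdev


def residual_life_expectancy(ages_at_death, age=6, z=1.96):
    """Mean number of years lived beyond `age` by those who reached it, with
    a normal-approximation confidence interval (z=1.96 for 95%)."""
    remaining = [a - age for a in ages_at_death if a >= age]
    estimate = mean(remaining)
    half_width = z * stdev(remaining) / sqrt(len(remaining))
    return estimate, (estimate - half_width, estimate + half_width)


# Invented ages at death; the first individual died before age 6.
estimate, (low, high) = residual_life_expectancy([2, 46, 56, 66])
```

With several hundred thousand observations per cohort, the standard error becomes very small, which is consistent with the narrow intervals quoted in the text.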

Figure 6

Life expectancy according to the age reached by cohort and sex

Source: Authors’ calculations based on Vallin and Meslé (2001) and Geneanet.

22While biases are observed in the measurement of the residual life expectancy of men and women based on Geneanet data, geographical differences are perhaps more faithfully transcribed. [18] An excellent point of comparison is provided by van de Walle (1973), who estimated life expectancy at birth per French department for 10 groups of different female cohorts. One of the groups concerned the cohorts from 1801 to 1810, which are compared with our 1800–1804 cohort. Based on each source, differences in life expectancy at birth are calculated for each department relative to the national average. The results are shown in the maps in Figure 7. Overall, the departmental contrasts are similar for both sources. The spatial correlation between the two series amounts to 0.65. The genealogy data, similar to the results of van de Walle (1973), show that women’s life expectancy is higher than the national average in Normandy and south-western France and lower in central France. As with van de Walle (1973), though slightly less clearly, a triangular area having its points in the Ille-et-Vilaine, Nièvre, and Gironde departments, and within which life expectancy at birth is lower than the national average, may be observed.
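
The comparison between the two sources can be sketched in two steps: departmental deviations from the national average, then a correlation between the two series of deviations. We assume here an ordinary Pearson coefficient, which may not be exactly the ‘spatial correlation’ computed above; the function names and the toy values are hypothetical.

```python
from math import sqrt


def deviations(values_by_department, national_average):
    """Difference between each department's value and the national average."""
    return {dept: value - national_average
            for dept, value in values_by_department.items()}


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented life expectancies (years) for two departments and a national mean.
toy_deviations = deviations({"A": 40.0, "B": 36.0}, 38.0)
```

To compare two sources, the deviations would be aligned department by department before calling `pearson` on the two resulting lists.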

Figure 7

Differences relative to the national average life expectancy at birth (in years) per department for the 1800–1804 cohorts (Geneanet) and the 1801–1810 cohorts (van de Walle)

Source: Authors’ calculations based on van de Walle (1973) and Geneanet.

IV – Sedentism and internal migration in France in the 19th century

23Besides dates relating to birth, marriage, and death, genealogy data provide information on place in the form of GPS coordinates. [19] An initial way of describing intergenerational migration consists in observing at the departmental level the existence of substantial population movements. To that end, the birth department of the observed individuals born between 1800 and 1804 in France is compared with that of their descendants. [20] Account is taken only of the coordinates of an individual from the 1800–1804 generations and a descendant (child, grandchild, or great-grandchild). With this approach to representation, a child with two known parents will be present twice in the observations, once to characterize migration relative to the mother’s place of birth and once to characterize migration relative to the father’s place of birth. The map on the left in Figure 8 shows that most of the children of the individuals at the origin of the study were born in the same department as their ancestor born between 1800 and 1804. On average, per department, only 17.4% of the children were born in a department different from that of their ancestor born between 1800 and 1804, with a standard deviation of 4.64 percentage points. This percentage is lowest in the Nord department (8.51%) and highest in the Meurthe-et-Moselle department (31.6%). Overall, the proportion of grandchildren born in a department different from that of their ancestor is slightly higher, the average per department increasing to 31.2%. The lowest and highest percentages are once again observed in the Nord (15.9%) and Meurthe-et-Moselle departments (46.5%). It is only with the generation of great-grandchildren that more emphatic geographical distinctions are observed. On average, the proportion of great-grandchildren born in a department different from that of their 1800–1804 ancestor is 44%, with a standard deviation of 8.74 percentage points. This proportion reaches a high of 67.2% in Territoire de Belfort and a low of 21.9% in Haute-Savoie.
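
The departmental indicator described above can be computed from ancestor-descendant pairs. The sketch below assumes that each observation is a pair of birth departments, one for the 1800–1804 ancestor and one for the descendant; the function name and the toy pairs are invented for the example.

```python
from collections import defaultdict


def share_born_elsewhere(pairs):
    """For each ancestor department, percentage of descendants born in a
    different department. `pairs` holds (ancestor_dept, descendant_dept)."""
    totals = defaultdict(int)
    moved = defaultdict(int)
    for ancestor_dept, descendant_dept in pairs:
        totals[ancestor_dept] += 1
        if descendant_dept != ancestor_dept:
            moved[ancestor_dept] += 1
    return {dept: 100 * moved[dept] / totals[dept] for dept in totals}


# Invented pairs; a child with two known parents contributes one pair
# per parent, as described in the text.
toy_pairs = [("Nord", "Nord"), ("Nord", "Nord"), ("Nord", "Pas-de-Calais"),
             ("Nord", "Nord"), ("Doubs", "Paris"), ("Doubs", "Doubs")]
shares = share_born_elsewhere(toy_pairs)
```

The averages and standard deviations quoted in the text would then be taken across the departmental percentages returned by such a function.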

Figure 8

Proportion (%) of descendants born in a department different from that of their ancestor born between 1800 and 1804, by department and according to position in the family tree

Source: Geneanet; authors’ calculations.

1 – Short- and long-distance migration

24Movements may be analysed in greater depth on the basis not of birth departments but of the distance travelled by individuals from one generation to the next. More specifically, birthplace coordinates may be used to measure the intergenerational distances travelled by the French population in the 19th century. This approach sheds a different light on migration from that in the previous section. Here, we compare the place of birth of a descendant (child, grandchild, or great-grandchild) with the place of birth of their ‘closest’ ancestor born between 1800 and 1804, i.e. the ancestor born between 1800 and 1804 whose place of birth is closest geographically to that of their descendant. Figure 9 illustrates these distances. As in the results of Bourdieu et al. (2000), the distribution peaks at sedentary individuals; this peak declines over the generations as the distribution spreads out, first to distances of a little over 10 km and then, in more recent generations, to distances of over 100 km.
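
The distance to the ‘closest’ ancestor can be sketched from birthplace coordinates. The distance measure is not specified in this section; the example assumes a great-circle (haversine) distance on a spherical Earth, and the function names and approximate coordinates are ours.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (latitude, longitude) points
    given in decimal degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_to_closest_ancestor(descendant_birthplace, ancestor_birthplaces):
    """Smallest distance between a descendant's birthplace and the
    birthplaces of their ancestors born between 1800 and 1804."""
    return min(haversine_km(descendant_birthplace, place)
               for place in ancestor_birthplaces)


# Approximate coordinates, for illustration only.
PARIS = (48.86, 2.35)
BESANCON = (47.24, 6.02)
closest = distance_to_closest_ancestor(PARIS, [BESANCON, PARIS])
```

For a child born in Paris with one parent born in Besançon and the other in Paris, the retained distance is zero, since the Paris-born parent is the closest ancestor.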

25In the literature, individuals’ movements are frequently characterized according to the distance travelled (short vs. long). According to Rosental (2004), beyond a certain migration distance, individuals leave a familiar environment, specific to their municipality. The increase in this distance also leads to an increase in costs, economic and otherwise (Kesztenbaum, 2008). These costs become an obstacle to migration, but some categories of the population are more willing to bear them. The literature also highlights a phenomenon of positive selection on the part of individuals completing long-distance moves. An initial selection factor concerns their place of residence, with a contrast between urban and rural environments. As noted by Rosental (2004), urban and rural populations differed considerably in the 19th century, urban populations tending to be more attracted to cities and more inclined to cover long distances. Education is also identified as a selection factor, in particular by Bourdieu et al. (2000), who show that short-distance migration is associated with modest education levels compared with long-distance migration. Bonneuil et al. (2008) ascribe this aspect to social mobility.

Figure 9

Distribution of the distances separating the birthplace of individuals in the sample from the birth place of their closest ancestor born between 1800 and 1804, according to their position in the family tree

Source: Geneanet; authors’ calculations.

26In this article, short-distance migration is limited to 20 km. By comparison, Rosental (2004) sets this value at 25 km, Bourdieu et al. (2000) at 20 km, and Kesztenbaum (2008) at 17 km. Their choice, like our own, is based largely on the median distance covered by migrants. As shown in Table 2, the proportion of short-distance migrants (19%) is practically the same as that of long-distance migrants (18%) among the children of individuals from the first generation. The 20 km threshold also has the advantage of keeping short-distance migrants within a familiar environment. Table 2 shows that 62% of the children of individuals born at the start of the 19th century were born in the same place as their parent; this share falls to 38% for the grandchildren and 24% for the great-grandchildren. At the same time, long-distance migration increases much more sharply (from 18% to 49%) than short-distance migration (from 19% to 27%). These results should be interpreted with caution, however, as the high percentages of sedentary individuals may result in part from the difficulties encountered by amateur genealogists in finding mobile ancestors.

Table 2

Distribution (%) of the individuals in the sample according to the distance between their place of birth and that of their closest ancestor born between 1800 and 1804

| | Children | Grandchildren | Great-grandchildren |
|---|---|---|---|
| Sedentary | 62 | 38 | 24 |
| Short-distance migrants | 19 | 28 | 27 |
| Long-distance migrants | 18 | 34 | 49 |
| Total | 100 | 100 | 100 |

Note: Sedentary individuals were born in the same place as their closest ancestor born between 1800 and 1804. Short-distance migrants were born less than 20 km away and long-distance migrants more than 20 km away.
Source: Geneanet; authors’ calculations.
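
The classification behind Table 2 can be sketched in a few lines. This is only an illustration under stated assumptions: ‘same place’ is approximated by a distance of exactly 0 km, a distance of exactly 20 km (a case the table note leaves unspecified) is counted as long-distance, and the names `migration_category` and `distribution_pct` are hypothetical.

```python
from collections import Counter

SHORT_DISTANCE_KM = 20.0  # threshold used in the article for short-distance migration

CATEGORIES = ("sedentary", "short-distance", "long-distance")


def migration_category(distance_km, threshold_km=SHORT_DISTANCE_KM):
    """Classify a distance to the closest ancestor born between 1800 and 1804."""
    if distance_km == 0:
        return "sedentary"        # assumption: same place means zero distance
    if distance_km < threshold_km:
        return "short-distance"
    return "long-distance"        # assumption: exactly 20 km counts as long-distance


def distribution_pct(distances_km):
    """Percentage of individuals in each category, for one generation."""
    counts = Counter(migration_category(d) for d in distances_km)
    total = sum(counts.values())
    return {category: 100 * counts[category] / total for category in CATEGORIES}


# Hypothetical distances (km) for four individuals of the same generation.
example = distribution_pct([0, 0, 5, 30])
```

Applying `distribution_pct` separately to the children, grandchildren, and great-grandchildren would give one column of the table per generation.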

V – Discussion and conclusion

27This article is based on the use of data generated through the collaborative efforts of hundreds of thousands of amateur genealogists to build and share their family trees. These data were used to study the population of metropolitan France in the 19th and 20th centuries. While the sources relied on by genealogists coincide with those used by some historians and demographers, i.e. parish and civil registries, the way in which they are digitized is entirely different. Each genealogist builds his or her own tree, including or omitting, with varying degrees of detail, what they have learned about their ancestors. The pooling of the family trees built by numerous users suggests that it is possible to use the wealth of information contained in each tree, particularly to describe the population in the past. However, whereas the methodical and rigorous work of researchers, as part of projects such as the TRA survey, serves to minimize biases in the data, there is no guarantee that the pooled genealogical data shared by enthusiasts are free of bias. This article has aimed to explore the limits of collaborative genealogical data and to examine the extent to which these data may be used in demographic research.

28The results show that these data cannot be used to study female fertility because of strong biases that lead to its underestimation. The unsatisfactory reporting of births also appears to have repercussions on infant mortality, which should not be studied using genealogical data. The study of mortality is not impossible, however, if the focus is on older adults. In this respect, estimates of the survival of individuals aged over 40 correspond to those provided in the literature. However, perhaps because of the small population sizes, Geneanet data once again underestimate mortality in very old age. The study of migration gives results similar to those found in the literature, notably on the distance of moves made by individuals from one generation to the next.

Acknowledgements

This research was carried out as part of the ‘Valorisation et nouveaux usages actuariels de l’information’ research initiative (exploitation and new actuarial uses of information) coordinated by the Fondation du Risque in partnership with GENES, Université de Rennes 1 and Université Paris-Est La Vallée. Our work was also supported by a subsidy from the Agence Nationale de la Recherche (ANR-17-EURE-0020).
A preliminary study was presented at the Science XXL event in March 2017 at INED. We would like to thank Olivier Cabrignac and Jérôme Galichon for their help in exploring the data, as well as the participants for the discussions that inspired some of the aspects of this study. We would also like to thank the members of INED’s History and Population Research Unit for their comments. We also benefited from productive dialogue with participants at several conferences, including UseR (Budapest, May 2018), Rencontres R (Rennes, July 2018), the 29th International Biometric Conference (Barcelona, July 2018), the Eco-Lunch seminar (Marseille, September 2018), and the École thématique TEPP-CNRS, Évaluation des Politiques Publiques (Aussois, March 2019).

Appendices

Appendix A

Table A.1

Sex ratio at birth by department with confidence intervals, for individuals born in 1801

| Department | Number of boys per 100 girls | 95% confidence interval |
|---|---|---|
| Ain (01) | 110.7 | [95.4, 128.5] |
| Aisne (02) | 121.1 | [108.6, 135.3] |
| Allier (03) | 112.5 | [98.6, 128.5] |
| Alpes-de-Haute-Provence (04) | 113.2 | [91.9, 139.7] |
| Alpes-Maritimes (06) | 107.4 | [88.4, 130.8] |
| Ardèche (07) | 123.2 | [106.8, 142.3] |
| Ardennes (08) | 121.6 | [104.9, 141.2] |
| Ariège (09) | 112.7 | [91.4, 139.4] |
| Aube (10) | 127.1 | [106.5, 152.3] |
| Aude (11) | 128.2 | [109.3, 150.9] |
| Aveyron (12) | 116.3 | [100.9, 134.3] |
| Bouches-du-Rhône (13) | 105.8 | [88.7, 126.3] |
| Calvados (14) | 121.0 | [102.5, 143.2] |
| Cantal (15) | 122.1 | [102.3, 146.2] |
| Charente (16) | 117.4 | [98.3, 140.7] |
| Charente-Maritime (17) | 111.6 | [98.0, 127.1] |
| Cher (18) | 109.3 | [96.7, 123.6] |
| Corrèze (19) | 117.8 | [101.3, 137.3] |
| Corse (20) | 129.4 | [102.7, 164.0] |
| Côte-d’Or (21) | 127.4 | [112.7, 144.4] |
| Côtes-d’Armor (22) | 123.0 | [112.3, 134.9] |
| Creuse (23) | 117.7 | [101.2, 137.2] |
| Dordogne (24) | 121.3 | [106.0, 139.0] |
| Doubs (25) | 118.9 | [100.5, 141.0] |
| Drôme (26) | 136.3 | [116.2, 160.6] |
| Eure (27) | 109.7 | [95.8, 125.7] |
| Eure-et-Loir (28) | 119.7 | [103.9, 138.1] |
| Finistère (29) | 111.6 | [101.3, 122.9] |
| Gard (30) | 121.7 | [100.6, 147.8] |
| Gers (32) | 125.7 | [90.7, 176.5] |
| Gironde (33) | 114.7 | [97.7, 135.0] |
| Haute-Garonne (31) | 110.8 | [94.8, 129.8] |
| Haute-Loire (43) | 117.1 | [99.6, 138.0] |
| Hautes-Alpes (05) | 124.3 | [103.7, 149.6] |
| Hérault (34) | 111.0 | [94.9, 130.0] |
| Ille-et-Vilaine (35) | 119.3 | [107.6, 132.3] |
| Indre (36) | 118.3 | [100.9, 138.9] |
| Indre-et-Loire (37) | 127.8 | [106.1, 154.8] |
| Isère (38) | 118.9 | [105.7, 133.8] |
| Jura (39) | 118.9 | [99.8, 142.0] |
| Landes (40) | 114.6 | [96.7, 136.1] |
| Loir-et-Cher (41) | 112.7 | [95.8, 132.9] |
| Loire (42) | 105.1 | [90.8, 121.7] |
| Loire-Atlantique (44) | 124.8 | [111.3, 140.2] |
| Loiret (45) | 116.3 | [100.9, 134.2] |
| Lot (46) | 121.1 | [99.3, 148.4] |
| Lot-et-Garonne (47) | 114.1 | [91.7, 142.5] |
| Lozère (48) | 120.2 | [96.9, 149.7] |
| Maine-et-Loire (49) | 120.4 | [105.3, 137.9] |
| Marne (51) | 123.6 | [110.2, 138.9] |
| Haute-Marne (52) | 120.9 | [105.4, 139.0] |
| Mayenne (53) | 117.3 | [102.5, 134.5] |
| Meurthe-et-Moselle (54) | 126.0 | [112.9, 140.9] |
| Meuse (55) | 122.5 | [108.6, 138.5] |
| Morbihan (56) | 116.6 | [105.0, 129.7] |
| Moselle (57) | 113.8 | [103.7, 125.0] |
| Nièvre (58) | 122.7 | [106.2, 142.1] |
| Nord (59) | 117.2 | [109.9, 124.9] |
| Oise (60) | 124.4 | [109.0, 142.2] |
| Orne (61) | 128.0 | [109.8, 149.7] |
| Pas-de-Calais (62) | 114.7 | [105.7, 124.6] |
| Puy-de-Dôme (63) | 120.9 | [106.9, 137.0] |
| Pyrénées-Atlantiques (64) | 110.5 | [97.7, 125.1] |
| Hautes-Pyrénées (65) | 123.1 | [89.0, 172.1] |
| Pyrénées-Orientales (66) | 121.6 | [99.7, 149.0] |
| Bas-Rhin (67) | 109.5 | [100.2, 119.6] |
| Haut-Rhin (68) | 109.3 | [97.4, 122.7] |
| Rhône (69) | 117.7 | [102.1, 136.0] |
| Haute-Saône (70) | 124.4 | [109.0, 142.3] |
| Saône-et-Loire (71) | 116.6 | [105.3, 129.3] |
| Sarthe (72) | 108.6 | [96.1, 122.9] |
| Savoie (73) | 108.5 | [94.1, 125.2] |
| Haute-Savoie (74) | 109.5 | [94.6, 126.9] |
| Paris (75) | 125.7 | [103.4, 153.5] |
| Seine-et-Marne (77) | 123.0 | [109.3, 138.6] |
| Seine-et-Oise (78) | 118.0 | [105.0, 132.6] |
| Seine-Maritime (76) | 120.7 | [108.0, 135.1] |
| Vosges (88) | 125.2 | [111.9, 140.3] |
| Manche (50) | 121.1 | [108.6, 135.1] |
| Deux-Sèvres (79) | 112.4 | [98.0, 129.1] |
| Somme (80) | 120.6 | [107.5, 135.5] |
| Tarn (81) | 108.2 | [93.3, 125.6] |
| Tarn-et-Garonne (82) | 119.3 | [96.4, 148.1] |
| Var (83) | 117.8 | [95.7, 145.6] |
| Vaucluse (84) | 120.0 | [99.9, 144.6] |
| Vendée (85) | 116.1 | [103.1, 130.9] |
| Vienne (86) | 112.0 | [97.2, 129.4] |
| Haute-Vienne (87) | 128.9 | [108.5, 153.7] |
| Yonne (89) | 117.7 | [104.0, 133.3] |
| Territoire de Belfort (90) | 123.3 | [89.3, 172.2] |

Note: Numbers in parentheses after a department’s name are the two-digit code assigned by the national statistics office of France (INSEE). The sampling rates per department, i.e. the ratio between the size of the sample and the size of the population identified by INSEE, may be used to establish a confidence interval for measuring the sex ratio. Details on the calculation are available in Appendix D in Brian and Jaisson (2007).
Source: Geneanet; authors’ calculations.

Appendix B

Bootstrapping confidence intervals for life expectancy

29The life expectancy estimates in this paper are accompanied by 95% confidence intervals obtained via bootstrapping, a method inspired by that established by Carlo-Giovanni Camarda. [21] The idea is to proceed iteratively, starting from a population $S_0$ born in a given year (for example, 1800) and a probability $q_x$ of dying within the year at age $x$ (estimated with our data for this cohort). At age $x$, if $S_x$ individuals are still living, the number of deaths $d_x$ is drawn from a binomial distribution $\mathcal{B}(S_x, q_x)$. We then set $S_{x+1} = S_x - d_x$, draw a new number of deaths for the following age, and so on. These iterative steps are carried out independently in 1,000 samples. In each sample, we calculate life expectancy at birth from the draws obtained. The empirical quantiles at 0.025 and 0.975 then give the bounds of the 95% confidence interval for the life expectancy at birth of the individuals in the cohort in question.
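
The iterative procedure described above can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: the way life expectancy is computed from the draws (deaths assumed to occur at mid-year), the simple empirical quantile rule, the requirement that the last death probability equal 1, and the names `simulate_life_expectancy` and `bootstrap_ci` are all assumptions. The example death probabilities are invented.

```python
import random


def simulate_life_expectancy(s0, qx, rng):
    """One replicate: draw deaths age by age and return life expectancy at birth.

    s0  : initial cohort size
    qx  : death probabilities by age; the last one should be 1.0 so that the
          cohort is extinct at the end (assumption of this sketch)
    rng : a random.Random instance
    """
    alive = s0
    person_years = 0.0
    for q in qx:
        # Binomial(alive, q) draw, written as a sum of Bernoulli trials.
        deaths = sum(rng.random() < q for _ in range(alive))
        # Survivors live the full year; deaths are assumed to occur at mid-year.
        person_years += (alive - deaths) + 0.5 * deaths
        alive -= deaths  # next-age survivors = current survivors minus deaths
    return person_years / s0


def bootstrap_ci(s0, qx, n_samples=1000, seed=0):
    """Empirical 2.5% and 97.5% quantiles of simulated life expectancy at birth."""
    rng = random.Random(seed)
    draws = sorted(simulate_life_expectancy(s0, qx, rng) for _ in range(n_samples))
    lower = draws[round(0.025 * n_samples)]
    upper = draws[round(0.975 * n_samples) - 1]
    return lower, upper


# Invented example: 200 individuals, a 10% risk of dying at ages 0 to 9,
# and certain death at age 10.
toy_qx = [0.1] * 10 + [1.0]
ci_low, ci_high = bootstrap_ci(200, toy_qx, n_samples=200, seed=1)
```

With these invented probabilities the expected life expectancy at birth is about 6.36 years, so the simulated interval should bracket that value. Fixing the seed makes the interval reproducible from one run to the next.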

The spatial heterogeneity (per department) of life expectancies at birth

30Genealogical data are biased in terms of estimates of life expectancy at birth. But it is useful to examine the spatial heterogeneity of the estimates. To do so, we estimate the life expectancy at birth of the individuals born between 1800 and 1804 in each of the departments. Figure B.1 shows the spatial heterogeneity of estimates of life expectancy at birth by department. The corresponding numerical values are presented in Table B.1. Regional differences are independent of the sex of the individuals. In the regions corresponding to the present-day Occitanie, as well as Normandy, the life expectancy of the individuals born between 1800 and 1804 is higher than in the other regions. In contrast, life expectancy in central France is lower than in the rest of the country.

Figure B.1

The spatial heterogeneity (per department) of life expectancy at birth for the cohorts born between 1800 and 1804

Source: Geneanet; authors’ calculations.

Table B.1

Life expectancy at birth (years) by department and sex for the cohorts born between 1800 and 1804

| Department | Women: life expectancy | Women: 95% confidence interval | Men: life expectancy | Men: 95% confidence interval |
|---|---|---|---|---|
| Ain (01) | 41.9 | [41.0, 42.8] | 40.5 | [39.5, 41.4] |
| Aisne (02) | 41.1 | [40.4, 41.8] | 39.9 | [39.1, 40.6] |
| Allier (03) | 41.5 | [40.6, 42.4] | 36.7 | [35.8, 37.6] |
| Alpes-de-Haute-Provence (04) | 42.6 | [41.2, 44.1] | 37.3 | [35.7, 38.8] |
| Alpes-Maritimes (06) | 42.6 | [41.2, 43.9] | 42.3 | [40.8, 43.7] |
| Ardèche (07) | 51.2 | [50.3, 52.0] | 47.3 | [46.3, 48.2] |
| Ardennes (08) | 46.4 | [45.5, 47.3] | 44.6 | [43.6, 45.7] |
| Ariège (09) | 49.9 | [48.5, 51.1] | 48.2 | [46.8, 49.5] |
| Aube (10) | 47.0 | [45.8, 48.2] | 44.8 | [43.4, 46.1] |
| Aude (11) | 38.7 | [37.5, 39.8] | 36.1 | [35.0, 37.4] |
| Aveyron (12) | 53.1 | [52.3, 53.9] | 48.5 | [47.6, 49.4] |
| Bas-Rhin (67) | 43.8 | [43.2, 44.3] | 43.4 | [42.9, 43.9] |
| Bouches-du-Rhône (13) | 44.0 | [42.9, 45.2] | 40.7 | [39.6, 41.8] |
| Calvados (14) | 52.9 | [52.0, 53.9] | 52.5 | [51.5, 53.5] |
| Cantal (15) | 49.4 | [48.2, 50.5] | 47.2 | [46.0, 48.4] |
| Charente (16) | 41.3 | [40.1, 42.6] | 36.5 | [35.3, 37.8] |
| Charente-Maritime (17) | 39.8 | [39.0, 40.7] | 38.0 | [37.0, 38.9] |
| Cher (18) | 35.5 | [34.7, 36.2] | 33.6 | [32.8, 34.4] |
| Corrèze (19) | 43.5 | [42.6, 44.5] | 38.9 | [38.0, 39.9] |
| Corse (20) | 46.7 | [45.5, 47.9] | 45.8 | [44.5, 47.1] |
| Côte-d’Or (21) | 41.9 | [41.1, 42.7] | 38.0 | [37.2, 38.9] |
| Côtes-d’Armor (22) | 42.5 | [41.9, 43.0] | 41.2 | [40.6, 41.9] |
| Creuse (23) | 40.6 | [39.6, 41.6] | 40.2 | [39.1, 41.2] |
| Deux-Sèvres (79) | 46.8 | [45.9, 47.7] | 45.0 | [44.1, 46.0] |
| Dordogne (24) | 41.5 | [40.6, 42.3] | 37.5 | [36.6, 38.5] |
| Doubs (25) | 48.0 | [47.0, 49.0] | 47.5 | [46.4, 48.6] |
| Drôme (26) | 46.4 | [45.4, 47.4] | 42.1 | [41.0, 43.2] |
| Eure (27) | 45.4 | [44.5, 46.3] | 43.1 | [42.1, 44.1] |
| Eure-et-Loir (28) | 43.3 | [42.4, 44.2] | 43.2 | [42.2, 44.1] |
| Finistère (29) | 41.9 | [41.4, 42.5] | 39.5 | [38.9, 40.1] |
| Gard (30) | 46.4 | [45.2, 47.6] | 41.5 | [40.1, 42.8] |
| Gers (32) | 48.3 | [46.2, 50.4] | 47.1 | [44.9, 49.3] |
| Gironde (33) | 46.7 | [45.6, 47.8] | 42.7 | [41.5, 43.9] |
| Haut-Rhin (68) | 42.4 | [41.7, 43.1] | 42.5 | [41.9, 43.2] |
| Haute-Garonne (31) | 43.7 | [42.6, 44.8] | 42.7 | [41.6, 43.9] |
| Haute-Loire (43) | 48.8 | [47.9, 49.8] | 47.3 | [46.3, 48.3] |
| Haute-Marne (52) | 40.7 | [39.8, 41.5] | 39.5 | [38.5, 40.4] |
| Haute-Saône (70) | 40.4 | [39.6, 41.2] | 38.3 | [37.4, 39.2] |
| Haute-Savoie (74) | 42.8 | [41.9, 43.7] | 42.0 | [41.1, 42.9] |
| Haute-Vienne (87) | 43.5 | [42.5, 44.5] | 38.8 | [37.8, 39.9] |
| Hautes-Alpes (05) | 36.5 | [35.3, 37.7] | 31.7 | [30.5, 32.9] |
| Hautes-Pyrénées (65) | 53.5 | [51.5, 55.5] | 51.1 | [48.9, 53.3] |
| Hérault (34) | 43.7 | [42.7, 44.8] | 41.2 | [40.1, 42.4] |
| Ille-et-Vilaine (35) | 40.8 | [40.2, 41.4] | 39.9 | [39.2, 40.5] |
| Indre (36) | 45.6 | [44.5, 46.6] | 41.7 | [40.6, 42.8] |
| Indre-et-Loire (37) | 43.7 | [42.5, 44.9] | 43.8 | [42.5, 45.1] |
| Isère (38) | 44.8 | [44.0, 45.5] | 42.7 | [41.8, 43.6] |
| Jura (39) | 42.7 | [41.7, 43.8] | 40.2 | [39.0, 41.4] |
| Landes (40) | 41.3 | [40.2, 42.4] | 38.9 | [37.7, 40.1] |
| Loir-et-Cher (41) | 38.9 | [37.9, 40.0] | 37.9 | [36.8, 39.1] |
| Loire (42) | 43.5 | [42.6, 44.5] | 42.6 | [41.7, 43.5] |
| Loire-Atlantique (44) | 43.3 | [42.6, 43.9] | 42.9 | [42.2, 43.6] |
| Loiret (45) | 43.2 | [42.3, 44.1] | 42.5 | [41.5, 43.4] |
| Lot (46) | 49.3 | [48.0, 50.6] | 45.7 | [44.2, 47.2] |
| Lot-et-Garonne (47) | 52.0 | [50.5, 53.4] | 48.4 | [46.8, 50.0] |
| Lozère (48) | 50.7 | [49.5, 52.0] | 44.2 | [42.7, 45.7] |
| Maine-et-Loire (49) | 46.5 | [45.7, 47.3] | 44.9 | [44.0, 45.9] |
| Manche (50) | 47.9 | [47.3, 48.5] | 48.3 | [47.6, 49.0] |
| Marne (51) | 42.1 | [41.4, 42.8] | 40.7 | [39.9, 41.6] |
| Mayenne (53) | 43.0 | [42.1, 43.8] | 41.1 | [40.2, 42.0] |
| Meurthe-et-Moselle (54) | 39.3 | [38.6, 40.0] | 36.5 | [35.7, 37.3] |
| Meuse (55) | 39.1 | [38.3, 39.8] | 37.8 | [36.9, 38.6] |
| Morbihan (56) | 39.5 | [38.9, 40.1] | 38.5 | [37.8, 39.2] |
| Moselle (57) | 42.7 | [42.1, 43.3] | 42.6 | [42.0, 43.3] |
| Nièvre (58) | 39.6 | [38.6, 40.5] | 35.0 | [34.0, 36.1] |
| Nord (59) | 43.7 | [43.3, 44.1] | 42.4 | [42.0, 42.8] |
| Oise (60) | 42.6 | [41.8, 43.4] | 41.1 | [40.1, 42.0] |
| Orne (61) | 49.1 | [48.1, 50.0] | 48.6 | [47.5, 49.7] |
| Paris (75) | 59.3 | [58.5, 60.1] | 57.9 | [56.9, 58.9] |
| Pas-de-Calais (62) | 45.9 | [45.4, 46.4] | 45.2 | [44.6, 45.7] |
| Puy-de-Dôme (63) | 44.3 | [43.5, 45.1] | 43.3 | [42.4, 44.1] |
| Pyrénées-Atlantiques (64) | 49.6 | [48.7, 50.4] | 47.3 | [46.5, 48.2] |
| Pyrénées-Orientales (66) | 41.7 | [40.4, 43.0] | 39.0 | [37.7, 40.5] |
| Rhône (69) | 45.1 | [44.2, 46.0] | 42.3 | [41.3, 43.3] |
| Saône-et-Loire (71) | 39.4 | [38.7, 40.0] | 37.2 | [36.5, 37.9] |
| Sarthe (72) | 46.5 | [45.7, 47.3] | 46.4 | [45.6, 47.3] |
| Savoie (73) | 37.4 | [36.5, 38.3] | 38.3 | [37.4, 39.2] |
| Seine-et-Marne (77) | 39.5 | [38.7, 40.2] | 38.8 | [38.0, 39.6] |
| Seine-et-Oise (78) | 42.1 | [41.4, 42.8] | 39.7 | [38.8, 40.5] |
| Seine-Maritime (76) | 47.7 | [47.1, 48.4] | 48.2 | [47.5, 48.9] |
| Somme (80) | 43.2 | [42.5, 43.9] | 42.0 | [41.3, 42.8] |
| Tarn (81) | 45.8 | [44.8, 46.9] | 43.3 | [42.3, 44.4] |
| Tarn-et-Garonne (82) | 45.7 | [44.3, 47.1] | 45.6 | [44.1, 47.1] |
| Territoire de Belfort (90) | 45.7 | [43.7, 47.7] | 44.2 | [42.1, 46.3] |
| Var (83) | 46.2 | [44.8, 47.6] | 41.6 | [40.0, 43.1] |
| Vaucluse (84) | 44.8 | [43.5, 46.1] | 39.1 | [37.8, 40.5] |
| Vendée (85) | 41.9 | [41.2, 42.6] | 40.1 | [39.4, 40.9] |
| Vienne (86) | 44.2 | [43.2, 45.1] | 42.0 | [41.0, 43.1] |
| Vosges (88) | 42.3 | [41.6, 43.0] | 40.0 | [39.3, 40.7] |
| Yonne (89) | 42.2 | [41.4, 43.0] | 39.2 | [38.3, 40.0] |

Note: Numbers in parentheses after a department’s name are the two-digit code assigned by the national statistics office of France (INSEE).
Source: Geneanet; authors’ calculations.

Notes

  • [1]
    Clinical Interpretations of Variants in Cancer: https://civicdb.org/home
  • [2]
  • [3]
    Geneanet (https://www.geneanet.org/) proves well adapted to a study on the situation in France as it is the leading site in France in terms of users. Through its three million members, it provides information on over six billion individuals.
  • [4]
    The case with 99% of trees.
  • [5]
    Our study relies on individuals born in metropolitan France between 1800 and 1804 and their descendants, the latter including individuals who may have been born outside France.
  • [6]
    A more exhaustive list of the problems relating to genealogical data can be found in the article by Dupâquier (1993).
  • [7]
    Further information on the data is available online at: https://3wen.github.io/genealogy/
  • [8]
    More detailed information on the procedure may be found online at: https://3wen.github.io/genealogy/
  • [9]
    Division by department is carried out based on the birth department of the 1800–1804 generations. After processing by department, all the trees are grouped in the same database and cross-departmental duplications are searched for and eliminated. Further explanations, statistics, and examples are available online at: https://3wen.github.io/genealogy/
  • [10]
    More specifically, we calculate the ‘cosine’ measure. For details on this measure of the distance between character strings, see Cohen et al. (2003).
  • [11]
    On the basis of sampling rates by department, in other words, the ratio between the size of the sample and the size of the population according to INSEE, confidence intervals to frame this value may be calculated following the method set out by Brian and Jaisson (2007). This results in limits of 116 and 120, for a confidence interval of 95%. The values obtained are included in Appendix Table A.1.
  • [12]
    For example, a woman takes the nationality of her husband on marriage.
  • [13]
    For example, retracing the tree of a given person could lead back to the couple of parents Victor Hugo and Adèle Hugo. When merging trees, if it is not known that Adèle Hugo’s parents were called Foucher, it will be difficult to connect Adèle to her mother.
  • [14]
    Fifteen is the minimum marriage age for women (18 for men) as set out in the French decree of 17 March 1803 (Henry and Houdaille, 1979). However, we found 898 observations concerning married women aged under 15.
  • [15]
    The mortality tables of Vallin and Meslé (2001) are available online (in French) at: https://www.ined.fr/Xtradocs/cdrom_vallin_mesle/texte.pdf. For the generations from 1702 to 1805, the incomplete tables are used (Tables III-D-1 and III-D-2).
  • [16]
    The estimates are based on all the available ages. In the mortality tables of Vallin and Meslé (2001), the first age available in 1806 for the 1800 cohort being 6, the calculation of residual life expectancy starts de facto at age 6. For the 1801 cohort, the first age available in 1806 being 5, the calculation of residual life expectancy starts at age 5. For estimates of residual life expectancy based on Geneanet data, the first age available is that of birth.
  • [17]
    Confidence intervals are obtained by bootstrapping. Further details on this method are provided online at: https://3wen.github.io/genealogy/
  • [18]
    Graphs illustrating the spatial heterogeneity of the estimates are available online at: https://3wen.github.io/genealogy/
  • [19]
    Where it was possible to identify them, the places are geolocated by Geneanet, in which case pairs of coordinates (longitude and latitude) are provided for each entry.
  • [20]
    Genealogical data could also help to pinpoint movements between individuals’ place of birth and the places in which they marry, have children, and die. Unfortunately, for now, the lack of information is such that these life trajectories may not be traced.
  • [21]
English

A growing number of websites offer users the possibility of building family trees. This article analyses the data collection and entry work of these users and how their results may be used in historical demography to further knowledge on past generations. To that end, the results obtained on the Geneanet website, which concern the entries of 2,457,450 French or French-origin individuals who lived in the 19th century, are compared with those established in the literature. The comparison shows a considerable bias in the sex ratio, with women underrepresented. Fertility is also substantially underestimated. Regarding mortality, the data (compared with historical values) underestimate the mortality of men up to the age of 40 and that of women up to the age of 25, after which they overestimate both. Lastly, the wealth of spatial characteristics contained in the family trees is also used to produce new data on internal migration in the 19th century.

  • genealogy
  • collaborative data
  • fertility
  • mortality
  • migration
  • historical demography

References

  • Balsamo G., 1999, Pruning the genealogical tree: Procreation and lineage in literature, law, and religion, Lewisburg, Bucknell University Press.
  • Bean L. L., May D. L., Skolnick M., 1978, The Mormon historical demography project, Historical Methods, 11(1), 45–53.
  • Blayo Y., Henry L., 1967, Données démographiques sur la Bretagne et l’Anjou de 1740 à 1829, Annales de démographie historique 1967, 91–171.
  • Bonneuil N., Bringé A., Rosental P.-A., 2008, Familial components of first migrations after marriage in nineteenth-century France, Social History, 33(1), 36–59.
  • Bourdieu J., Kesztenbaum L., Postel-Vinay G., 2014, The TRA project: A historical matrix, Population, 69(2), 191–220.
  • Bourdieu J., Postel-Vinay G., Rosental P.-A., Suwa-Eisenmann A., 2000, Migrations et transmissions inter-générationnelles dans la France du xixe et du début du xxe siècle, Annales. Histoire, sciences sociales, 55(4), 749–789.
  • Bourdieu J., Postel-Vinay G., Rosental P.-A., Suwa-Eisenmann A., 2004, La dispersion spatiale des familles: un problème de taille. Les solidarités familiales de 1800 à 1940, Revue des politiques sociales et familiales, 77(1), 63–72.
  • Brian E., Jaisson M., 2007, Appendix D: Sex ratios at birth and the calculus of probabilities, in Brian E., Jaisson M. (eds.), The descent of human sex ratio at birth, Dordrecht, Springer, 221–229.
  • Brunet G., Bideau A., 2001, Démographie historique et généalogie, Annales de démographie historique, 2000(2), 101–110.
  • Brunet G., Vézina H., 2015, Les approches intergénérationnelles en démographie historique, Annales de démographie historique, 129(1), 77–112.
  • Chesnais J.-C., 1986, La Transition démographique. Étapes, formes, implications économiques, Cahier No. 113, Paris, PUF–INED.
  • Cohen W., Ravikumar P., Fienberg S., 2003, A comparison of string metrics for matching names and records, KDD Workshop on Data Cleaning and Object Consolidation, 3, 73–78.
  • Dupâquier J., 1981, Une grande enquête sur la mobilité géographique et sociale aux xixe et xxe siècles, Population, 36(6), 1164–1167.
  • Dupâquier J., 1992, Pour une nouvelle histoire sociale, in Dupâquier J., Kessler D. (eds.), La société française au xixe siècle: tradition, transition, transformations, Paris, Fayard, 7–21.
  • Dupâquier J., 1993, Généalogie et démographie historique, Annales de démographie historique 1993, 391–395.
  • Dupâquier J., Blanchet D. (eds.), 1992, La société française au xixe siècle: tradition, transition, transformations, Paris, Fayard.
  • Dupâquier J., Kessler D., 1992, L’enquête des 3 000 familles, in Dupâquier J., Kessler D. (eds.), La société française au xixe siècle: tradition, transition, transformations, Paris, Fayard, 23–61.
  • Eichner C. J., 2014, In the name of the mother: Feminist opposition to the patronym in nineteenth-century France, Signs: Journal of Women in Culture and Society, 39(3), 659–683.
  • Fire M., Elovici Y., 2015, Data mining of online genealogy datasets for revealing lifespan patterns in human population, ACM Transactions on Intelligent Systems and Technology, 6(2), 1–22.
  • Fleury M., Henry L., 1958, Pour connaître la population de la France depuis Louis XIV. Plan de travaux par sondage, Population, 13(4), 663–686.
  • Gavrilov L. A., Gavrilova N. S., 2001, Biodemographic study of familial determinants of human longevity, Population: An English Selection, 13(1), 197–222.
  • Gavrilov L. A., Gavrilova N. S., Olshansky S. J., Carnes B. A., 2002, Genealogical data and the biodemography of human longevity, Social Biology, 49(3–4), 160–173.
  • Gavrilova N. S., Gavrilov L. A., 2007, Search for predictors of exceptional human longevity, North American Actuarial Journal, 11(1), 49–67.
  • Girres J.-F., Touya G., 2010, Quality assessment of the French OpenStreetMap Dataset, Transactions in GIS, 14(4), 435–459.
  • Griffith M., Spies N. C., Krysiak K., McMichael J. F., Coffman A. C., Danos A. M. et al., 2017, CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer, Nature Genetics, 49(2), 170–174.
  • Haklay M., 2010, How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets, Environment and Planning B: Planning and Design, 37(4), 682–703.
  • Henry L., Blayo Y., 1975, La population de la France de 1740 à 1860, Population, 30(1), 71–122.
  • Henry L., Houdaille J., 1979, Célibat et âge au mariage aux xviiie et xixe siècles en France. II. Âge au premier mariage, Population, 34(2), 403–442.
  • Hervis C., 2012, Généalogie: les nouvelles demandes du collectionneur, de l’enquêteur et de l’historien, La Gazette des archives, 227(3), 27–32.
  • Hollingsworth T. H., 1976, Genealogy and historical demography, Annales de démographie historique 1976, 167–170.
  • Houdaille J., Tugault Y., 1987, Une bourgeoisie peu malthusienne dans un pays neuf : généalogies américaines du xixe siècle, Population, 42(2), 305–320.
  • Kaplanis J., Gordon A., Shor T., Weissbrod O., Geiger D., Wahl M. et al., 2018, Quantitative analysis of population-scale family trees with millions of relatives, Science, 360(6385), 171–175.
  • Kesztenbaum L., 2008, Cooperation and coordination among siblings: Brothers’ migration in France, 1870–1940, The History of the Family, 13(1), 85–104.
  • Lease M., Yilmaz E., 2013, Crowdsourcing for information retrieval: Introduction to the special issue, Information Retrieval, 16(2), 91–100.
  • Lindahl-Jacobsen R., Hanson H. A., Oksuzyan A., Mineau G. P., Christensen K., Smith K. R., 2013, The male–female health-survival paradox and sex differences in cohort life expectancy in Utah, Denmark, and Sweden 1850–1910, Annals of Epidemiology, 23(4), 161–166.
  • Matthijs K., Moreels S., 2010, The Antwerp COR*-database: A unique Flemish source for historical-demographic research, The History of the Family, 15(1), 109–115.
  • Rosental P.-A., 2004, La migration des femmes (et des hommes) en France au xixe siècle, Annales de démographie historique, 107(1), 107–135.
  • Statistique Générale de la France, 2010, Données sur la démographie, la population et l’enseignement primaire sur la période 1800-1925 [Data file].
  • Toulemon L., 1995, Very few couples remain voluntarily childless, Population: An English Selection, 8, 1–28.
  • Vallin J., Meslé F., 2001, Tables de mortalité françaises pour les xixe et xxe siècles et projections pour le xxie siècle, Paris, INED.
  • van de Walle E., 1973, La mortalité des départements français ruraux au xixe siècle, Annales de démographie historique 1973, 581–589.
  • van de Walle E., 1986, La fécondité française au xixe siècle, Communications, 44(1), 35–45.
  • Warby S. C., Wendt S. L., Welinder P., Munk E. G. S., Carrillo O., Sorensen H. B. D. et al., 2014, Sleep-spindle detection: Crowdsourcing and evaluating performance of experts, non-experts and automated methods, Nature Methods, 11(4), 385–392.
Arthur Charpentier
Université du Québec à Montréal (UQAM).
201, avenue du Président-Kennedy, Montréal (Québec), H2X 3Y7, Canada.
Ewen Gallic
Aix-Marseille Université, CNRS, EHESS, Centrale Marseille, AMSE.
Translated by
James Tovey
Uploaded on Cairn-int.info on 04/12/2020
Electronic distribution by Cairn.info for I.N.E.D © I.N.E.D. All rights reserved for all countries. Reproducing this article (including by photocopying) in whole or in part, storing it in a database, or communicating it to the public in any form or by any means is prohibited without the prior written consent of the publisher.