In France, every town has a memorial to its inhabitants who perished in the First World War. Numbering in the hundreds of thousands, the names of these individuals were engraved on such monuments to ensure their permanence in the nation’s history. However, some names have disappeared with the deaths of their bearers. This article estimates the extent of the disappearance of surnames attributable to the war based on a comparison of two databases: the ‘Morts pour la France’ database and the INSEE surname file. How many surnames have disappeared? Which regions were most affected?
1The death toll of the First World War was particularly high. The losses, though mainly military, were considerable. The number of dead among individuals enlisted in the French army alone was around 1.5 million (Héran, 2014).  Their deaths led to the disappearance of their surnames. The scale of this extinction and hence of the decrease in the number of French family names attributable to the conflict has failed to draw the attention of researchers. The issue may admittedly be considered incidental relative to the tragedy constituted by the slaughter of the war.
2The disappearance of family names became a preoccupation among demographers and mathematicians as early as the mid-19th century. Research on the subject was motivated by fears of the gradual disappearance of upper-class families to the benefit of families from the demographically more fertile lower classes. The issue was brought to the fore by Sir Francis Galton in his eugenic thinking on racial degeneration. Working with the mathematician Henry William Watson (Watson and Galton, 1875), he laid the groundwork for a solution to a problem identified as early as 1845 by Irénée-Jules Bienaymé:
Considerable attention has been paid to the possible increase in the number of men, and various highly curious observations have recently been published on the fatality of the nobility, the bourgeoisie, and the families of illustrious men. It is said that this fatality will ineluctably lead to the disappearance of what is referred to as closed families.
4He responded to the challenge issued in 1873 by Alphonse de Candolle in these terms:
Naturally, all names must die out …. A mathematician could calculate how the reduction of names and titles will occur, consistent with the probability of all-female, all-male, or mixed births and the probability of the lack of births for a given couple.
6Looking beyond mathematical formulae, common sense suffices to understand that families having just one child have fewer chances of passing on their name than those who have more children.
7This article estimates the number of surnames that disappeared because of the First World War, taking account of possible disparities between French departments (départements).  While the focus of most of the work addressing the demographic consequences of the war is on the ‘disruption in individual behavior and practices’ (Faron, 2002), our intention here is not to explore the disruption that may have affected anthroponymic practices, even if considerable benefit may clearly be drawn from socio-anthropological approaches in the analysis of family behaviour and social and cultural practices.  Instead, our focus is on the war’s impact on France’s patronymic stock.
8Following a presentation of our data, drawn from the INSEE surname file and the file of ‘Morts pour la France’ (MPFs, or soldiers having died for their country), we will assess the Galton–Watson (G–W) model and the issues involved in applying it to the study of the disappearance of surnames. The model requires several parameters to be taken into account, including the distribution of the number of boys per household, the number of men likely to pass on their surname, and the occurrence of the surnames themselves (as a rare surname has a greater likelihood of dying out than a common one). Based on this model, we propose for each French department (French territory of 1871 corresponding to the current territory excluding Moselle, Bas-Rhin, and Haut-Rhin) and for France as a whole an estimate of the number of surnames whose extinction may be considered ‘natural’ (resulting from the simple chance of reproduction, be it socially or geographically differentiated). We then compare this estimate with the number of surnames included in the MPF file but missing from the INSEE surname file. Our discussion will focus on the geography of surname losses, a comparison of these losses with human losses, the sensitivity of the G–W model to various parameters, and the difficulties in assessing disappearances that can truly be attributed to the war.
I – Data used
9This study is based on the statistical analysis of two computer files: the surname file provided by INSEE and the MPF file provided by the Department of Memory, Heritage, and Archives (DMPA) of the French Ministry of Defence.
1 – The INSEE surname file
10The INSEE surname file (INSEE, 1985) is a classic reference for addressing questions relating to family names. For each municipality in metropolitan France, it provides the number of births registered by surname for different periods, and notably the periods P1 [1891–1915] and P2 [1916–1940] used for this work.
11The INSEE file is the most complete on this subject, but it is not exhaustive. It concerns only those people still alive in 1972, or so says INSEE, which produced the file, in its presentation.  Whatever the case, the file can be used to provide estimates of the number of different names listed by department and the number of registered births. 
2 – The ‘Morts pour la France’ file
12Established by the Act of 2 July 1915 (amended in 1922), ‘Mort pour la France’ status is an official mention added to the civil registry (in the margin of records) and subject to Articles L. 488 and L. 492bis of the French code of military invalidity and war victims’ pensions. It initially concerned only soldiers killed in combat or having succumbed to war wounds. The file of recipients (MPF file) used here includes both the last name and first name of the deceased and their date and department of birth.  While available in the individual files of recipient soldiers, data on the municipality of birth and the date and cause of death are not systematically computerized; they are thus not used here.
13The original file provided by the DMPA comprised 1,343,377 registrations, but this work takes account only of individuals with MPF status born in metropolitan France (the borders of 1871). Duplicate records were also removed. Following this ‘clean-up’, the ‘useful’ file comprises 1,211,523 MPFs corresponding to a patronymic stock of 179,037 surnames.
14Though nearly 20% of the soldiers mobilized during the war lost their lives, the chronological distribution of MPF births (Figure 1) shows that not all generations paid the same price.
Figure 1. Distribution of MPFs by year of birth
Coverage: MPFs born between 1860 and 1910 in metropolitan France (1871 borders).
15The most impacted ‘classes’ are those of the generations born between 1870 and 1900, particularly MPFs born between 1892 and 1895.  For these four classes, MPFs account for nearly 30% of mobilized soldiers. 
3 – Problems inherent in the constitution of the files
16Errors in the transcription of names are inevitable when compiling a computer file. Some surnames included in the files have been altered relative to their original written form (random variations). Some names, often in low numbers, must surely have been created artificially, while others have disappeared from the corpus through excessive standardization. Without going through all the original documents (infeasible), it is impossible to assess the exact impact of these inaccurate transcriptions. However, and while this is a major issue when assessing the patronymic stock of a department, no attempt was made to standardize the names other than the standardization imposed by the individuals having produced the two files, who chose to write surnames in capital letters with no accents.  The identity of two transcriptions is never a guarantee of a real patronymic identity, but unless transcription errors or the effects of standardization relating to the lack of accents are unevenly distributed geographically, the assessment of the patronymic stocks of departments, though inaccurate as an absolute value, may be considered correct as a relative value.
17The chronological coverage of both the INSEE and MPF files, whose origin and production are different, is incomplete, as shown in Figure 2.
Figure 2. Chronology of the corpuses used
Notes: Of MPFs born in metropolitan France (1871 borders), 265 were born after 1900, and for 127 of them no year of birth is documented. They have not been considered in this study but account for just a minute fraction of the 1,211,523 MPFs in our ‘useful’ file. The 984 MPFs born before 1861 were included in Cohort A (accounting for just over 10% of the cohort).
Coverage: INSEE surname file (periods P1 [1891–1915] and P2 [1916–1940]) and MPFs born in metropolitan France (1871 borders) before 1901.
18Besides the chronology of the two files, Figure 2 includes the three censuses used to determine the distribution of the number of children per household (necessary for estimating at the departmental level the number of disappeared surnames according to the G–W extinction model).
4 – Patronymic potential
19To assess the representativeness of the patronymic data in the MPF file relative to the reference file constituted by the INSEE surname base, patronymic potential, measured as the number of different surnames per 100 people (Darlu et al., 1997), was calculated by department for the two files (Figure 3).
Figure 3. Patronymic potential by department (number of different surnames per 100)
Note: The patronymic potential is S/N where S is the number of (distinct) surnames and N the number of births recorded.
Coverage: INSEE file (individuals born in P1 [1891–1915]) and MPFs born in metropolitan France (1871 borders) before 1901 and whose department of birth is known.
20Whether the calculation is based on the INSEE file for the P1 period [1891–1915] (Figure 3A) or on the MPF file (Figure 3B), the same departmental distribution is observed in the two estimations (linear correlation r = 0.94). The representativeness of the MPF file may thus be considered satisfactory, at least from the standpoint of this spatial distribution. In both cases, the number of different surnames is lower in north-west France and higher in north-east, south-west, and south-east France. We know that this spatial distribution indirectly reflects migration flows (for example, Darlu and Ruffié, 1992), as the regions with high patronymic potential are those having experienced substantial immigration flows in the late 19th century, bringing with them new surnames.
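As an illustration of the indicator defined in the note to Figure 3, the following minimal sketch computes the patronymic potential (100 × S/N) by department. The record layout (department, surname, number of births) and the function name are assumptions made for the example; they are not the actual structure of the INSEE or MPF files.

```python
from collections import defaultdict


def patronymic_potential(births):
    """Return {department: distinct surnames per 100 recorded births}.

    `births` is an iterable of (department, surname, count) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    surnames = defaultdict(set)   # distinct surnames seen per department (S)
    totals = defaultdict(int)     # births recorded per department (N)
    for dept, surname, count in births:
        if count > 0:
            surnames[dept].add(surname)
            totals[dept] += count
    return {d: 100.0 * len(surnames[d]) / totals[d] for d in totals}


# Toy records, not real data.
records = [("A", "MARTIN", 3), ("A", "DURAND", 1), ("B", "MARTIN", 4)]
potential = patronymic_potential(records)
```

With these toy records, department A has two surnames for four births and department B one surname for four births.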
II – Methods
21Where surnames are transmitted from father to child, which is the case in France for the periods we address, the survival of a surname from one generation to the next within a family is possible only if the father has at least one son likely himself to pass on his name to the next generation. But because the chance of survival of the surname of a father having two or more sons is necessarily higher, modeling the disappearance of a surname necessarily requires that the number of sons per father be taken into account. This is the general framework of the approach adopted by Watson and Galton to describe the disappearance of surnames in an iterative manner from one generation to the next. This model, generally used for population dynamics, is related to the theory of branching processes in discrete time. It is characterized by the fact that each individual in the population is born, reproduces, and dies, their descendants (in random number) being in turn and independently subject to the same events.
22The disappearance of a surname also depends on its frequency in the reference population. A rare surname is more likely to die out than a common one. The frequency of surnames must thus be considered when modeling the disappearance; we have done so by applying the model to the MPF data.
1 – The probability of surname disappearance with the G–W model
23Let $p_0, p_1, p_2, \dots, p_s, \dots, p_q$ be the vector $p$ of the probabilities of a person (generally a father) transmitting their name to $0, 1, 2, \dots, s, \dots, q$ sons, with
$$\sum_{s=0}^{q} p_s = 1,$$
25q being the maximum number of sons through whom the name may be transmitted. To simplify, we refer to this as father-to-son transmission, which was the most common if not the sole situation in France in the late 19th century and early 20th century, such that the probability ps corresponds to that of having s sons.
26Let x1 be the probability of the disappearance of a surname of the first generation, that of the fathers, to the second, that of the sons.
27Consider the case of the surname of a father. Where the latter has no sons, with a probability equal to $p_0$, the probability of his surname disappearing is equal to $x_1^0 = 1$. In this case, the probability of the disappearance of the surname is equal to $p_0 x_1^0 = p_0$.
28If this father has a single son to whom his name may be transmitted, the name will disappear with a probability equal to $p_1 x_1^1$, the product of the probability of having a single son, $p_1$, and that of his name not being transmitted, $x_1^1$. If the father has two sons, the probability of disappearance will be equal to $p_2 x_1^2$, the product of the probability of having two sons, $p_2$, and that of each one of them, independently, not transmitting the surname, $x_1 \times x_1 = x_1^2$. If the father has $s$ sons, the probability of the name disappearing will be $p_s x_1^s$, the product of the probability of having $s$ sons, $p_s$, and that of none of them transmitting their surname, $x_1^s$. We can thus calculate the probability $x_2$ of the disappearance of the name in the second generation (from the second generation, that of the sons, to the next):
$$x_2 = p_0 + p_1 x_1 + p_2 x_1^2 + \dots + p_q x_1^q = \sum_{s=0}^{q} p_s x_1^s$$
30Whence the following recurrence formula, which gives the probability of disappearance at generation $n$:
$$x_n = \sum_{s=0}^{q} p_s \, x_{n-1}^{s}$$
32The probability $x_n$ of the extinction of a name thus increases from one generation to the next. But complete disappearance occurs only in particular population conditions. It may be shown that, if we remain with the situation specified by Watson and Galton (Watson and Galton, 1875), i.e. the case of families with a maximum of two sons, the extinction is complete only on the condition that $p_0 \ge p_2$. Otherwise, the probability of extinction converges towards the ratio $p_0/p_2$ (Bacaër, 2011). The situation is much more complex for a higher number of children per family, as this convergence depends on the distribution of the $p_s$.
33The calculation of an extinction probability thus requires the exact number of ‘family generations’ (father to son) during which the G–W process plays out. It also requires an estimation of the vector p.
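The iteration described above can be sketched in a few lines. The function below is an illustration only: its name is ours, and it assumes that the first-generation probability is $x_1 = p_0$ (a father with no sons), which is our reading of the text rather than a formula quoted from it.

```python
def extinction_probabilities(p, generations):
    """Return [x_1, ..., x_n] for the recurrence x_n = sum_s p[s] * x_{n-1}**s.

    `p[s]` is the probability of having s sons. Starting the iteration
    from 0 gives x_1 = p[0], an assumption made for this sketch.
    """
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("the probabilities in p must sum to 1")
    xs = []
    x = 0.0
    for _ in range(generations):
        # 0.0 ** 0 evaluates to 1.0 in Python, so the first pass yields p[0].
        x = sum(ps * x ** s for s, ps in enumerate(p))
        xs.append(x)
    return xs


# Toy two-son case with p0 < p2: the sequence tends to p0 / p2 = 0.5.
xs = extinction_probabilities([0.25, 0.25, 0.5], 200)
# Toy two-son case with p0 >= p2: extinction becomes certain in the limit.
xs_certain = extinction_probabilities([0.5, 0.25, 0.25], 200)
```

The two toy vectors illustrate the two-son case discussed above: the limit is the ratio $p_0/p_2$ when $p_0 < p_2$ and 1 otherwise.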
34Because we aim to calculate the probability of extinction by department, the values of xn by department (xkn) must be calculated based on an estimation of the vector p by department: pk.
From the number of children per family to the number of sons per father
35The censuses of the late 19th century provide population data giving the number of children per household by department and for France as a whole. These data can be used to estimate the vector $p_k$ required for calculating extinction probabilities. Unfortunately, the statistics available for the vector $p_k$ do not distinguish the number of girls and boys in the number of children. Consequently, to determine the number of boys per household – the only ones likely to transmit their surname – the values of the vector $p_k$ drawn from the population data were corrected (Brouard, 1989) to obtain the values of the vector $p$ as in Equation 1C. The correction assumed that the distribution of the number $s$ of boys by sibship size $n$ obeys a binomial law of parameters $n$ and $g$, where $g$ is the usual proportion of boys, i.e. $g = 105/205$, assumed to be the same regardless of the sibship size. The probability $p_s$ of the number $s$ of boys in a family of $n$ children is thus:
$$p_s = \binom{n}{s} g^{s} (1-g)^{n-s}, \quad s = 0, 1, \dots, n$$
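The passage from a distribution of children per household to a distribution of sons can be sketched as a binomial mixture. This is a minimal illustration under the binomial assumption stated above; the correction actually applied following Brouard (1989) may include further adjustments, and the function and variable names are ours.

```python
from math import comb

G = 105 / 205  # proportion of boys at birth, as stated in the text


def sons_distribution(children_dist, g=G):
    """Convert P(n children) into P(s sons) by mixing binomial laws.

    `children_dist[n]` is the proportion of households with n children
    (the last class standing for "seven or more" in the census data).
    """
    q = len(children_dist) - 1
    p = [0.0] * (q + 1)
    for n, share in enumerate(children_dist):
        for s in range(n + 1):
            # Probability of s boys among n children, weighted by P(n).
            p[s] += share * comb(n, s) * g ** s * (1 - g) ** (n - s)
    return p


# Toy check: households that all have two children, with g = 0.5.
toy = sons_distribution([0.0, 0.0, 1.0], g=0.5)
# Toy distribution of children per household (not census data).
p_sons = sons_distribution([0.2, 0.3, 0.5])
```

The resulting vector still sums to 1 and can be passed directly to an extinction calculation.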
37The censuses selected for this study, those of 1886, 1891, and 1896,  specify for the 87 departments of metropolitan France corresponding to its borders of 1871  the number of households with no children, one child, and up to seven children or more (the rare households with over seven children are aggregated here with those with seven children). Owing to uncertainties or errors in the censuses, already reported by Bonneuil (Bonneuil, 1989, 1997), we made our calculations based on the average by department of the results of 1886, 1891, and 1896 (Figure 4). 
Figure 4. Distribution of departments according to the proportion of households with 0 children, 1 child, or up to 7 or more children
Note: Each plus sign (+) corresponds to one of the 87 departments (1871 borders). The proportion (per 1,000) is calculated based on the average of the three censuses (1886, 1891, 1896). The average points are denoted by circles.
Coverage: Households listed in metropolitan France (1871 borders).
38With these average demographic data, and by applying the G–W extinction model (Equations 1A–1C), we can estimate the probabilities of the extinction of a surname for each department k and calculate them for one or several generations. The same work may be carried out for France as a whole, by using the data on the number of children per household provided for this level of observation.
2 – Applying the G–W model to ‘Morts pour la France’ data
39The estimation of the probability of extinction as described above considers that, at the start of the iteration procedure, the probability of a father having between 0 and q sons is that of the vector p.  But this is not the case with MPFs. The MPFs born in 1850 had certainly ended their reproductive lives by 1914 and were thus able to have several sons, in which case the use of the vector p is justified. However, those born in 1898 were only 20 at the end of the war and most certainly did not have the time to have more than two sons. The probability of MPFs having between 0 and s sons thus depends both on their year of birth and their year of death (between 1914 and 1918). This requires the application of the formula (Equation 1A) conditioned by the expectancy of the maximum number of sons, v, that they could have had in their life. To do so, x2 (the probability of the extinction of the name in the second generation) simply needs to be replaced by:
41The descendants of these fathers then had the possibility of having between 0 and q sons, such that the formula (Equation 1) serving to calculate the probabilities for the ensuing family generations can be implemented without this truncation.
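The truncated formula itself is not reproduced here. Purely as an illustration, the sketch below implements one possible reading of the truncation: the distribution of sons is cut at $v$ and renormalized, i.e. conditioned on having at most $v$ sons. Both this reading and the function name are assumptions on our part; the formula actually used may differ, for instance by assigning the excess probability mass to $v$.

```python
def truncated_first_step(p, x1, v):
    """Extinction probability for a father who could have at most v sons.

    Hypothetical reading: p is truncated at v and renormalized before
    applying the usual sum over s of p[s] * x1**s.
    """
    v = min(v, len(p) - 1)
    head = p[: v + 1]
    total = sum(head)
    if total <= 0:
        raise ValueError("no probability mass at or below v sons")
    return sum((ps / total) * x1 ** s for s, ps in enumerate(head))


p = [0.25, 0.25, 0.5]   # toy distribution of the number of sons
x1 = p[0]               # assumed first-generation extinction probability
no_son_possible = truncated_first_step(p, x1, 0)
one_son_at_most = truncated_first_step(p, x1, 1)
untruncated = truncated_first_step(p, x1, 2)
```

Under this reading, the younger the MPF (the smaller $v$), the higher the probability that his name is not transmitted.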
42The question that remains is over how many ‘generations’ the G–W process needs to be pursued, i.e. how many iterations need to be considered. As the file used to identify the surnames of MPFs having disappeared is the INSEE surname file, which supposedly should only concern people still alive in 1972, it appears reasonable to pursue these iterations until that year. 
43The estimations of the probability of extinction were thus calculated for each birth cohort h in Table 1, which proposes a correspondence between the year of birth (discretized in four classes) and the (possible) number of ‘family generations’ (three to five) until 1972. 
Table 1. Correspondence (for MPFs) between year of birth, their division into cohorts, and the number of family generations separating them from 1972
Note: Cohort A includes the 984 MPFs born before 1861.
Coverage: MPFs born in metropolitan France (1871 borders) before 1901.
3 – Estimation of the number of disappeared names according to the G–W model
44The probability of extinction described above applies to each MPF. It takes account of the MPF’s birth cohort and the number of iterations. However, the disappearance of a surname according to the G–W model also depends on the frequency of the surname, the probability being lower if the name is very common. As such, we need to determine the vector $r_k$ of the proportion of names represented once, twice, …, $u$ times in the department $k$:
$$r_k = (r_{k,1}, r_{k,2}, \dots, r_{k,u})$$
46This vector can be estimated using several approaches. The first consists in calculating the vector $r_k$ based on all the names in the INSEE file for the first period P1 [1891–1915]. It may be assumed that these proportions represent, with no major biases, those of the names in the MPF file. The second consists in considering only the names in the MPF file and retaining, in the INSEE file, the number of occurrences of those names to deduce the vector $r_k$. This method, which corresponds more closely to the names in the MPF file, was selected for the rest of the calculations. It shows a considerable and significant difference with the previous method only for the share of hapaxes (names occurring only once in a corpus), of which the average proportion per department is 42% in the INSEE file but 57% in the MPF file (Figure 5).
Figure 5. Average and departmental proportion of the number of names present r times in the MPF file and the INSEE surname file
Notes: Figure 5A: Average proportion p (calculated for the 87 departments) of the number of names attested to j times (based on the MPF file: symbol ○; based on the INSEE surname file (period P1 [1891–1915]): symbol •). Figure 5B: Proportion of the number of names attested to j = 1, 2, or 3 times in the MPF base for each of the 87 departments (symbol +) and the corresponding average value (symbol ○).
Coverage: Individuals born in P1 [1891–1915] and MPFs born in metropolitan France (1871 borders) before 1901 and whose department of birth is known.
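The proportions of names attested once, twice, and so on can be obtained from a simple tally of occurrences. The sketch below is illustrative: the input layout (a mapping from surname to number of occurrences within one department) and the choice to pool all names occurring `u_max` times or more into the last class are assumptions of ours.

```python
from collections import Counter


def frequency_spectrum(name_counts, u_max):
    """Return [r_1, ..., r_u_max], the share of names occurring u times.

    Names occurring u_max times or more are pooled in the last class
    (an assumption made for this sketch).
    """
    classes = Counter(min(c, u_max) for c in name_counts.values() if c > 0)
    total = sum(classes.values())
    if total == 0:
        raise ValueError("no surnames with a positive count")
    return [classes.get(u, 0) / total for u in range(1, u_max + 1)]


# Toy department: two hapaxes, one name borne twice, one borne five times.
r = frequency_spectrum({"A": 1, "B": 1, "C": 2, "D": 5}, u_max=3)
```

In this toy department, half of the names are hapaxes.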
47The number of names likely to disappear according to the G–W process is then calculated by department k as follows:
48Let $S_k$ be the number of surnames of MPFs in department $k$. Only a part of this number is likely to have disappeared under the G–W process. Let $D_{k,h}$ be this number per department $k$ and for the cohort $h$:
50with $y^{u}_{k,h}$ the probability of extinction per department $k$ and cohort $h$ for the names in the MPF file calculated according to the procedure detailed above. The first term $r_{k,1} \times y^{1}_{k,h}$ is thus the probability of disappearance where the surname is borne by only one person in the department $k$, the second term $r_{k,2} \times y^{2}_{k,h}$ where it is borne by two people, and so on.
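The terms described above suggest a calculation of the following kind. This sketch assumes that the expected number of disappearances takes the form $D_{k,h} = S_k \sum_u r_{k,u} \, y_{k,h}^{u}$, with $y_{k,h}^{u}$ read as the $u$-th power of the extinction probability of a single line (the $u$ bearers' lines dying out independently). Both the form and this reading are our assumptions, not a formula quoted from the text.

```python
def expected_disappearances(s_k, r_k, y_kh):
    """Expected number of surnames lost in one department and cohort.

    s_k  : number of MPF surnames in the department
    r_k  : [r_1, r_2, ...], share of names borne by 1, 2, ... people
    y_kh : extinction probability of one line for this cohort
    Assumes independence between the lines of a given surname.
    """
    return s_k * sum(r_u * y_kh ** u for u, r_u in enumerate(r_k, start=1))


# Toy values only: 1,000 surnames, half of them hapaxes, y = 0.5.
d_kh = expected_disappearances(1000, [0.5, 0.3, 0.2], 0.5)
```

The hapaxes dominate the result, which is consistent with the remark that rare surnames are the most exposed to extinction.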
51For a high-quality comparison between the INSEE and MPF bases, only Cohort D should be selected, i.e. MPFs born from 1891 onwards. As they died for France, and thus were not alive in 1972, their birth should not in theory be reported in the INSEE file.
52The sum extended to the different cohorts provides an estimate of the total number of names that could have disappeared according to the G–W process:
$$D_k = \sum_{h} D_{k,h}$$
III – Results
54An IT procedure was implemented to search for surnames in the MPF file that are not included in the INSEE file. The 12,489 surnames extracted from this comparison have thus disappeared. They account for about 7% of the 179,037 surnames belonging to MPFs born in metropolitan France (1871 borders). But these disappearances have multiple causes: although no MPF should be listed as an individual in the INSEE file, their surname may well figure in the file. This is true where the surname belongs to possible descendants and/or several collateral kin alive in 1972, or if it is borne by simple homonyms. If a name attested to in the MPF file is not included in the INSEE file, then it has disappeared:
- either by the G–W process, with a number of iterations separating the age at the possible paternity of the MPF and the date of 1972 (i.e. between n = 3 and 5 according to the MPF’s birth cohort; Table 1). The model described earlier can be used to estimate the number of this type of extinction;
- or because of the war, the disappearance of the MPF’s name resulting from his death;
- or by the disappearance before 1972 of all possible people with the same surname as the MPF, whether related to him or not. 
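The comparison of the two files amounts to a set difference between two lists of surnames. The sketch below illustrates the principle only; it is not the procedure actually implemented. The normalization step (capital letters, no accents) mirrors the convention described for both files, and the function names are ours.

```python
import unicodedata


def normalize(name):
    """Upper-case a surname and strip its accents."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper().strip()


def missing_surnames(mpf_names, insee_names):
    """Return the normalized MPF surnames absent from the INSEE list."""
    insee = {normalize(n) for n in insee_names}
    return {normalize(n) for n in mpf_names} - insee


# Toy lists, not taken from either file.
missing = missing_surnames(["Lefèvre", "DUPONT", "Kerbrat"], ["LEFEVRE", "dupont"])
```

A surname returned by such a comparison is absent from the reference file for any of the three reasons listed above; the comparison alone cannot tell them apart.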
55Only Cohort D in the MPF file (births between 1891 and 1900) covers the chronology of the INSEE file (births between 1891 and 1915). The estimates of the names having disappeared for this cohort, which is strongly represented (41% of MPFs), are in theory less biased than those of the cohorts corresponding to births before 1891. We nevertheless preferred to make the calculations while taking account of an estimation established on an average of the extinction probabilities calculated for each cohort (weighted by the corresponding number of surnames) and to apply this average probability to the total number of surnames. This solution takes account of the number of different surnames observed in all the cohorts of a department, whereas a calculation based on each cohort would include surnames present in several cohorts.
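The weighting described above can be sketched as follows. The function name and the toy figures are ours; the sketch only illustrates the principle of averaging cohort-level extinction probabilities, weighted by the number of surnames in each cohort, and applying the result to the total number of distinct surnames of a department.

```python
def weighted_expected_losses(total_surnames, cohort_probs, cohort_surnames):
    """Apply a surname-weighted average extinction probability to a stock.

    cohort_probs    : extinction probability estimated for each cohort
    cohort_surnames : number of surnames observed in each cohort (weights)
    total_surnames  : distinct surnames in the department, all cohorts pooled
    """
    if len(cohort_probs) != len(cohort_surnames):
        raise ValueError("one weight is needed per cohort")
    weight = sum(cohort_surnames)
    if weight <= 0:
        raise ValueError("cohort weights must sum to a positive number")
    average = sum(p * n for p, n in zip(cohort_probs, cohort_surnames)) / weight
    return total_surnames * average


# Toy values: two cohorts with 100 and 300 surnames, 1,000 distinct surnames.
losses = weighted_expected_losses(1000, [0.1, 0.3], [100, 300])
```

Applying the average to the pooled stock avoids counting a surname once for every cohort in which it appears.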
1 – Geography of human losses
56The assessment of the human losses attributable to the First World War has been addressed by in-depth research. The oldest such studies attest to the early attention paid to the issue (Huber, 1931), while others have contributed to renewed general interest sparked by the centenary celebrations (Sangoï, 1997; Winter, 2004; Prost, 2008, 2014; Héran, 2014; Guillot and Parent, 2018). Most of this work has focused on the (direct and indirect) demographic consequences of the war.  Regarding France, however, research addressing the issue through an analysis of mortality rates by department remains relatively scarce. And research attempting to estimate the scale of departmental losses (Festy, 1984; Gilles, 2010; Gilles et al., 2014; Loez and Mariot, 2014) is hindered by a major problem: the MPF file does indeed mention the birth department of the recipients, but the calculation basis for estimating the relative weight of these deaths is a delicate choice. Should the number of MPFs born in a department be compared with the population of that department at the time of the 1911 census? For the male population only?  Or with the population of the enlisted men from the department? The choice is neither easy nor neutral. This is a sensitive issue as it is at the root of heated debate on the supposed sacrifices of certain regions that made a particularly strong contribution to the war effort. For example, the Bretons and Corsicans are said to have paid a heavier tribute than others.  Each calculation method has its merits, but none is ideal. We have selected two methods for this work. The first consists in simply considering the number of MPFs by department (Figure 6A). Thus assessed, the losses appear to be evenly distributed across France, though with slightly more deaths in Brittany, the Atlantic coast, in the north, and along a Gironde–Rhône axis.
57The second option ‘corrects’ the gross weight of these deaths by comparing it with all the male births corresponding to the most war-impacted generations (Figure 1). For this calculation, we chose as our departmental reference value the number of male births recorded between 1867 and 1899, corresponding to the mobilization of the soldiers of the ‘classes’ 1887 (called up between March and August 1916) to 1919 (April 1918). 
Figure 6. Number and proportion of MPFs by department (France, 1871 borders)
Note: Figure 6A: number of MPFs (department of birth); Figure 6B: proportion of MPFs (born in the department) relative to the total number of male births registered between 1867 and 1899.
Coverage: MPFs born in metropolitan France (1871 borders) before 1901 and whose department of birth is known, and male births (same territory) between 1867 and 1899.
58This second option reveals stronger territorial contrasts. Depending on the department, losses range from 4.6% (Bouches-du-Rhône) to 9.7% (Vendée) of the male births of the generations concerned (Figure 6B).
59Relative to the number of births per department, the (relative) losses were the highest along a line extending from Loire-Atlantique to Meurthe-et-Moselle following the lower part of the Loire and a good half of its middle part. This line continues east and north-east of the Loiret, in the Yonne and Seine-et-Marne departments, and, more broadly, those corresponding to the most intense combat zones (notably Meuse and Vosges). When considering the ‘Clémentel regions’ of 1919, the regions of Nantes and Caen (roughly corresponding to today’s Pays de la Loire and Basse-Normandie) also show substantial human losses. This calculation method highlights a sharp contrast between the north and the south, the departments lying north of an axis between La Rochelle and Mulhouse having suffered much heavier losses, although the death toll is high in the Landes and Gers in the south-west. Mortality rates relative to births are considerably lower in the departments of south-east France, notably those on the Mediterranean.
2 – Geography of surname losses
Losses by department
60We have several indicators for each department that can be used to estimate expected surname losses and real surname losses (Appendix Table).
61As the number of names belonging to MPFs changes considerably from one department to the next, the number of disappeared surnames was compared with the total number of MPF surnames per department. This proportion is much higher for surnames whose extinction is expected under the G–W model (Figure 7A) than for the surnames of MPFs absent from the INSEE file (Figure 7B). The comparison was made both for the cohort of MPFs whose birth should have been registered in the INSEE file (Cohort D, births from 1891 onwards) and for all the cohorts weighted by their size (Figure 7A), with no substantial changes in the differences between departments (linear correlation r = 0.99).
62Assessed solely on the basis of the MPF names effectively absent from the INSEE file (Figure 7B), the disappearances of surnames represent just 1% to 8% of the stock, depending on the department. A comparison of Figures 6B and 7B also shows that the geography of human losses differs considerably from that of the effective disappearances of surnames. While human losses are the highest along the middle and lower stretches of the Loire, surname losses (observed by comparing the two files) are very low in this central region of the country. Effective surname losses are substantial in Brittany and Corsica, with their low ‘patronymic potential’ and average human losses. In the Basque Country, with its high immigration levels in the late 19th century and, hence, high ‘patronymic potential’, human losses are at the national average, while surname losses are considerable compared with the rest of the country.
Figure 7. Expected and observed proportions of disappeared surnames by department (1871 borders)
Note: The proportions (expected and observed) of disappeared surnames are calculated relative to the total number of surnames of MPFs from the department.
Coverage: MPFs born in metropolitan France (1871 borders) before 1901 and whose department of birth is known.
63The application of the G–W model (Appendix Table) shows that expected surname disappearances are much higher for the most recent cohorts (births after 1881) than for previous cohorts (births before 1881). This is logical, as the older generations lived longer and thus had more opportunity to transmit their surnames. According to the G–W model and as a departmental average, Cohort D lost 618 surnames compared with just 124 for Cohort A. These differences apply to the MPF surnames that have effectively disappeared from the INSEE file, though at a more modest level, with average surname disappearances for the two cohorts totaling 59 and 13, respectively.
64Equation 3 expresses surname disappearance probabilities while considering that the death of MPFs put an end to their possibility of transmitting their name (their death having occurred before the end of their reproductive lives). The effect of this correction is more pronounced for Cohort D (the youngest MPFs, i.e. those born after 1890 and whose descendance was probably cut short). Depending on the department, the estimated number of expected surname extinctions is between 1.1 and 1.3 times what it would have been if the MPFs had not died and had thus had every opportunity to transmit their name. For example, the expected number of surname extinctions for the Eure department is estimated at 1,347 when the correction (Equation 3) is applied (1,257 without the correction). This number is 344 for Corsica (273 without the correction) and 10,178 for France as a whole (8,929 without the correction).
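The ratios implied by these figures can be checked directly. The short sketch below only reuses the numbers quoted in the paragraph above; the variable and function names are ours and are purely illustrative.

```python
# Expected surname extinctions (with correction, without correction),
# as quoted in the text for two departments and for France as a whole.
expected = {
    "Eure": (1347, 1257),
    "Corsica": (344, 273),
    "France": (10178, 8929),
}

def correction_ratio(with_correction, without_correction):
    """Ratio of corrected to uncorrected expected extinctions."""
    return with_correction / without_correction

# Ratios rounded to two decimals for readability.
ratios = {place: round(correction_ratio(*values), 2)
          for place, values in expected.items()}

for place, ratio in ratios.items():
    print(f"{place}: {ratio}")
```

The Eure ratio (about 1.07) sits at the lower end of the stated range once rounded to one decimal, and Corsica (about 1.26) at the upper end.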
65Figure 7A also shows that the expected disappearance of surnames (according to the G–W model) varies substantially from one department to the next (Appendix Table). The highest values are in the northern regions of France, particularly around Normandy, as well as in the south-east. This variability is attributable not to the model itself but to contrasting population realities: the total number of different surnames, the distribution of the number of children per household (particularly the proportion of households without sons or with only one son), and the expected number of sons a father could have given his years of birth and death all vary considerably between departments.
66These expected disappearances are directly related to the construction of the assessment, their number being strongly correlated with the frequency of households with no children (Figure 8). This result is logical since the surnames of households without children (or with a low number of children) are the most likely to disappear.
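The dependence of expected extinctions on the share of childless households can be illustrated with a minimal sketch of the textbook Galton–Watson calculation. This is not the authors' implementation of Equations 1 to 5: the offspring distributions, the number of generations, and the surname counts below are all hypothetical, and lines of descent are assumed to be independent.

```python
def pgf(p, s):
    """Offspring generating function f(s) = sum_k p[k] * s**k."""
    return sum(pk * s**k for k, pk in enumerate(p))

def extinction_probability(p, generations):
    """Probability that the line of a single bearer is extinct within
    `generations` generations: iterate q <- f(q), starting from q = 0."""
    q = 0.0
    for _ in range(generations):
        q = pgf(p, q)
    return q

def expected_extinctions(r, q):
    """r maps n (number of bearers of a surname) to the number of such
    surnames. Under independence, a surname with n bearers dies out
    with probability q**n."""
    return sum(count * q**n for n, count in r.items())

# Hypothetical offspring distributions (index = number of offspring).
p_low_p0 = [0.2, 0.3, 0.3, 0.2]   # 20% with no offspring
p_high_p0 = [0.4, 0.3, 0.2, 0.1]  # 40% with no offspring

# Hypothetical surname counts: 100 hapaxes, 40 surnames borne twice, etc.
r = {1: 100, 2: 40, 3: 10}

q_low = extinction_probability(p_low_p0, 3)
q_high = extinction_probability(p_high_p0, 3)
print(expected_extinctions(r, q_low), expected_extinctions(r, q_high))
```

After one generation the extinction probability equals p0 itself, which is why a higher proportion of households without children mechanically raises the expected number of extinctions, and why hapaxes (n = 1) weigh most heavily in the total.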
67But surprisingly, a comparison of the surname extinctions expected under G–W with the number of surnames in the MPF file effectively absent from the INSEE file shows that the number of effective extinctions (surnames missing from the INSEE file) is much lower for almost all departments than the G–W model predicts (Appendix Table). Several explanations can be posited:
681. As we mentioned earlier, more births are listed in the INSEE file for the 1881–1915 period than expected. It is therefore conceivable that MPFs, and hence their surnames, figure among this surplus.
692. The estimation of the vector rk (Equation 4) of the frequency of names represented 1, 2, …, u times (Equation 5) in the population of MPFs is possibly biased by the overestimation of rk,1, the frequency of names occurring just once among MPFs (Figure 5). The weight of the names borne by a low number of individuals – and notably that of hapaxes – is substantial in the estimation of the number of extinctions expected under the G–W model.
Relationship between the proportion of MPF surnames whose disappearance is expected according to the G–W model and the proportion of households without children (%), by department
Coverage: MPFs born in metropolitan France (1871 borders) before 1901 and whose department of birth is known, and births (same territory) between 1867 and 1899.
703. The number of expected disappearances under G–W and the number of surnames in the MPF file missing from the INSEE file are assessed by department. A name may disappear from a department, however, without disappearing from the national corpus. We can assume that some surnames expected to disappear according to the G–W model in one department remain present in others, whether or not the people with that name are related (for example, brothers, cousins, uncles, or nephews). Migration, which concerned a considerable part of the population at the beginning of the 20th century, may have led to the reintegration of some names in departments where their extinction was expected according to the G–W model.
71In any case, the departmental distribution of MPF surnames missing from the INSEE file (Figure 7B) is enlightening. While based on a low number of surnames, it reveals a closer relationship between the number of different surnames Sk and the number Nk of MPFs per department. Surname extinctions thus appear to have been largely proportional to the patronymic stock and/or total number of MPFs. Pyrénées-Atlantiques and Hautes-Pyrénées (as well as, to a lesser extent, Ariège and Corsica) are the only departments with an exceptional level of disappearances, undoubtedly attributable to the anthroponymic particularities of regions with a high number of surnames, combining names of Basque origin with those of Occitan origin (Darlu and Oyharçabal, 2006).
72In addition, an analysis of the names having effectively disappeared (i.e. those included in the MPF file but not in that of INSEE) shows that over 93% of them were hapaxes (from the MPF corpus). From an anthroponymic standpoint, the war merely served to precipitate the disappearance of extremely rare surnames, which MPFs were probably often the only people of reproductive age to bear. None of the disappeared surnames is attested more than three times in the MPF corpus, confirming that surname losses were not a result of the disappearance of large sibships that were the only bearers of a rare anthroponymic heritage. If the disappeared surnames are sorted into two groups, belonging or not belonging to MPFs born in the same department, almost all the extinctions (99%) concern geohapax names, i.e. names present exclusively in one department. There is a strong correlation (r = 0.95) between the number of MPF surnames missing from the INSEE file and the number of geohapaxes per department. The local deep-rootedness of surnames thus weakens their chances of survival.
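The classification used in this paragraph can be sketched as a simple comparison of two surname lists. The records below are invented placeholders (not data from the MPF or INSEE files), and the function names are ours; the sketch only shows how "disappeared", "hapax", and "geohapax" relate to one another.

```python
from collections import Counter, defaultdict

# Hypothetical MPF records: (surname, department of birth).
mpf_records = [
    ("Alpha", "D1"), ("Alpha", "D2"),
    ("Beta", "D1"), ("Beta", "D1"),
    ("Gamma", "D2"),
    ("Delta", "D1"),
]
# Hypothetical set of surnames found in the later reference file.
reference_surnames = {"Alpha", "Delta"}

def classify(records, reference):
    """Return the sets of disappeared, hapax and geohapax surnames."""
    counts = Counter(surname for surname, _ in records)
    departments = defaultdict(set)
    for surname, dept in records:
        departments[surname].add(dept)
    disappeared = set(counts) - set(reference)
    hapaxes = {s for s, n in counts.items() if n == 1}
    # Geohapax: all bearers were born in a single department.
    geohapaxes = {s for s, d in departments.items() if len(d) == 1}
    return disappeared, hapaxes, geohapaxes

disappeared, hapaxes, geohapaxes = classify(mpf_records, reference_surnames)
share_hapax = len(disappeared & hapaxes) / len(disappeared)
share_geohapax = len(disappeared & geohapaxes) / len(disappeared)
print(sorted(disappeared), share_hapax, share_geohapax)
```

In this toy example, "Beta" has disappeared although it is not a hapax, but it is a geohapax, which mirrors the observation that local concentration, rather than rarity alone, characterizes nearly all the extinctions.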
73Lastly, Figure 9 shows the considerable contrast between the proportion of surnames that may be expected to disappear according to the G–W model and the proportion of surnames that have effectively disappeared from the INSEE file. While the ratio of the latter to the former averages between 10% and 15% (meaning that roughly 7 to 10 times fewer surnames are absent from the INSEE file than expected), it is between 30% and 47% for a few departments in the south-west, Brittany, and Corsica.
Losses for France as a whole
74The G–W model can be applied to metropolitan France in its entirety. Although relinquishing the departmental approach tends to erase the differences whose importance has been stressed (in the estimation of the vector pk of the number of children per household or in that of the vector rk of the number of surnames present n times in department k), it serves to assess complete and definitive surname extinctions (expected according to G–W or observed through comparisons of the MPF and INSEE files) at the national level. This leaves aside the problem posed by the possible reintegration (following a migration) of a surname in a department where its disappearance may be expected according to G–W.
Number of departments according to the relation (as a %) between the number of MPF surnames absent from the INSEE file and the number of MPF surnames expected to disappear according to the G–W process
Interpretation: Five departments show a relation between the number of MPF surnames missing from the INSEE file and the number of MPF surnames expected to disappear according to the G–W process of around 10%. For 11 departments, this percentage is around 16%; for a single department (Pyrénées-Atlantiques), it exceeds 45%.
Coverage: MPFs born in metropolitan France (1871 borders) before 1901.
75Applied to France as a whole, the implementation of the surname extinction model relies on estimations of the vectors p and r made on this scale. While the average proportion of hapaxes per department in the MPF file was 57%, the proportion of hapaxes in metropolitan France is just 16% of the MPF surname corpus (some surnames that are hapaxes in one department may be present in another and are thus not hapaxes for metropolitan France). The difference is considerable and greatly affects the estimation of extinctions expected according to the G–W model, serving to reduce it.
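The effect of scale on the proportion of hapaxes can be reproduced on a toy corpus. The surnames and departments below are hypothetical and serve only to show why a name counted as a hapax in each of several departments stops being one when the departments are pooled.

```python
from collections import Counter

# Hypothetical surnames of MPFs, grouped by department of birth.
by_department = {
    "D1": ["Alpha", "Beta", "Beta"],
    "D2": ["Alpha", "Gamma"],
    "D3": ["Alpha", "Gamma", "Delta"],
}

def hapax_share(surnames):
    """Share of distinct surnames that occur exactly once."""
    counts = Counter(surnames)
    return sum(1 for n in counts.values() if n == 1) / len(counts)

# Average of the departmental shares.
departmental = [hapax_share(names) for names in by_department.values()]
mean_departmental = sum(departmental) / len(departmental)

# Share computed on the pooled (national) corpus.
pooled = [name for names in by_department.values() for name in names]
national = hapax_share(pooled)

print(round(mean_departmental, 2), national)
```

Here "Alpha" is a hapax in all three departments but is borne three times nationally, so the pooled share of hapaxes is far below the departmental average, in the same direction as the 57% versus 16% contrast reported above.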
76For France as a whole, the number of MPF surnames expected to disappear under the G–W model is 10,178, whereas the number of surnames in the MPF file missing from the INSEE file is 12,837. On the basis of this difference, the number of surnames whose disappearance may be attributed to the First World War can be estimated at around 2,600, or 1.4% of all the surnames belonging to MPFs. At the national level, the impact of the war thus appears modest, but at the departmental level it is more considerable and, as we have seen, sharply contrasted. These findings underline the problems posed when applying the G–W extinction model to different territorial scales.
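The national figures can be set against the departmental ratios with a few lines of arithmetic. The sketch uses only the two totals quoted in the paragraph above; the variable names are ours.

```python
# National totals quoted in the text.
observed_missing = 12_837   # MPF surnames absent from the INSEE file
expected_gw = 10_178        # extinctions expected under the G-W model

# Excess of observed over expected disappearances.
excess = observed_missing - expected_gw

# Observed-to-expected ratio at the national level.
national_ratio = observed_missing / expected_gw

print(excess, round(national_ratio, 2))
```

At the national scale, observed disappearances exceed the model's expectation (a ratio above 1), whereas at the departmental scale they represent on average only 10% to 15% of it. This reversal illustrates how sensitive the comparison is to the territorial scale at which the model is applied.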
77The geographical distribution of human losses resulting from the First World War is contrasted. The strong contribution made by Brittany and northern France to the war effort changes when taking account of the birth rates in the decades preceding the conflict. The ‘human sacrifice’ turns out to be greater to the north of a La Rochelle–Strasbourg axis and more modest in the south-east.
78But the patronymic losses truly attributable to the First World War remain difficult to assess. If we limit ourselves to the losses highlighted through a simple comparison of the MPF and INSEE files, the surnames of MPFs that no longer occur in the INSEE file (1891–1940) represent between just 1.3% (Eure-et-Loir department) and 8.4% (Pyrénées-Atlantiques department) of the total number of different surnames of MPFs. According to the surname extinction model proposed by Galton and Watson, the number of expected surname disappearances should have been much higher and the geographical distribution substantially different from that obtained by the comparison of the MPF and INSEE data.
79The differences observed between the two approaches result, on the one hand, from the imperfect chronological coverage of the two files, which makes them difficult to compare, and, on the other hand, from the implementation at the departmental level of the G–W surname extinction model. Though necessary from our geopatronymic perspective, this approach generates an overestimation of the number of hapaxes in the MPF file. The probability of the extinction of these hapaxes is strong, and their weight in the implementation of the G–W model is essential. Yet certain names that are hapaxes in one department are attested to elsewhere. Their extinction in a department may thus be expected according to the G–W model but is not actually the case (having possibly been reintroduced in the department following migration).
80But one thing is sure: France’s patronymic stock strongly resisted the impact of a war that marked the country deeply and on a lasting basis. After the First World War, the department-to-department mobility of the population and, hence, the mobility of surnames increased considerably, and immigration served to introduce new surnames. Consequently, the country’s stock of surnames, by department and at the national level, has continued to grow, definitively removing any risk of the impoverishment of a heritage in perpetual motion.
Acknowledgements
The authors would like to thank Laurent Veyssière, Head of the Cultural Heritage Delegation of the Department of Memory, Heritage and Archives (DMPA) of the French Defence Ministry, as well as his colleague Sandrine Aufray, who helped us to obtain the electronic database and the corresponding ‘licence to reuse public information’ (Contract of 27 May 2013 between the Defence Ministry and the UMR 7323 and 7206 mixed research units).
This work also benefitted from a subsidy from the French National Research Agency (ANR) (ANR-18-CE02-0011, MathKinD).
Out of some eight million French individuals mobilized between 1914 and 1918, including 1.8 million young people born between 1894 and 1899 (Boulanger, 2003).
Galton was probably unaware of this text.
The documentation used requires us to limit our focus to the disappearance of surnames relating to the deaths of soldiers or civilians having obtained the distinction of having ‘died for France’. The losses attributable to the deaths of soldiers or civilians not having gained this distinction elude us. They are undoubtedly limited relative to the former. For example, we know that civilians accounted for no more than 12% of deaths (Héran, 2014), including the beginning of Spanish influenza.
For a recent summary of the population impacts of the First World War, see Rohrbasser (2014). On the specific issue of mortality, Vallin (1973, 1984) remains the reference.
This assertion does not hold up to an in-depth analysis of the data. The number of (live) births in metropolitan France for the P1 period is nearly 20,000,000 (Daguet, 1995), and the mortality tables by generation (Vallin and Meslé, 2001) can be used to estimate the number of survivors in 1971 for the generations born between 1891 and 1915 at around 8,700,000. Yet the number of births listed in the file for this period is 10,500,000. The INSEE file thus surely includes people who were no longer alive in 1972 (certainly including those with MPF status).
These figures are undoubtedly underestimations of the real values owing to the non-exhaustive nature of the INSEE file.
This is still a ‘living’ file, as families may still request the attribution of MPF status for ascendants whom they consider unjustly forgotten or whose death may be directly attributable to the war.
Status of the MPF file when made available on 19 April 2013.
‘Duplications’ here are considered as registrations corresponding to MPFs with the same surname, the same first names, and the same date and place of birth as another MPF.
The word ‘class’ is used here in its military sense. A (recruitment) class corresponds to all men having reached age 20 and registered in the census tables. For MPFs, recruitment class and mobilization class may be different. The information relative to this (these) class(es) was recorded in the individual files of the soldiers having received MPF status but has not been computerized.
The calculation was made by comparing the number of MPFs in the class in question with the number of mobilized individuals as indicated in Les archives de la Grande Guerre, Volume VII, No. 19, Tables B (p. 45) and C (p. 47).
This reduces the variety of written forms and, hence, mechanically, the diversity of patronymic stocks. The impact of the (other) transcription errors on this diversity is harder to estimate (reduction or increase).
Figure 3B is also based on generations common to the two files (1891–1910), with no significant change to its appearance. The correlation between the departmental values of S/N (patronymic potential) based on the selected coverage and those based on common generations is over 0.95.
The documents used (see note 16 below) indicate the ‘number of children per family’ while the population censuses report on households; it is thus advisable to refer to the ‘number of children per household’.
It would be ideal to have the number of children per father (transmitting his surname), but it can be reasonably supposed that this number is not sufficiently different from the number of children per household to cause a major bias in the estimation of the vector pk.
The data used are accessible in electronic file format on the Centre de Recherches Historiques website of EHESS (see Béaur and Marin, 2011).
The Territoire de Belfort, corresponding to the only part of Alsace and Haut-Rhin remaining in France following the defeat of 1871, was not created until 1922. It is nevertheless possible, based on MPFs’ municipality of birth, to reattribute to this department the people who were born there, which enables us to take it into account in the calculations. This a posteriori reattribution was made when the data were computerized, since some MPFs are listed as being born in this department.
The only data appearing to have no obvious errors are those of 1886, as reported in the original census lists and reused by Chervin (1888). Those of 1891 and 1896 are undoubtedly inaccurate for several departments, with aberrant values for the Landes department in 1891 and for the Ille-et-Vilaine, Meuse, and Yonne departments in 1896. These values were corrected and replaced by the interdepartmental average.
The vector p changes over the generations, but we did not take this change into account as the lack of sufficiently reliable data meant we were unable to include this information. The selected estimation is based on the average of the three censuses (1886, 1891, and 1896). In addition, the correlations between the three series of departmental values are very high, suggesting a fairly slow change in the vector p at the end of the 19th century.
Other models could certainly have been devised, taking account of the MPF’s age at death and their age (possible for some, real for others) at paternity; because it was impossible to cover the individual histories of each person, we opted for plausible ‘iterations’.
The cohort [1891–1900] was defined a priori to respect the chronological milestones of the period P1 in the INSEE file [1891–1915] (see Figure 2). Other discretizations were attempted, which modify the results only marginally.
See above and note 5.
Because the children born during the war to an MPF father were aged between 54 and 58 in 1972, they could still have been alive at that date, and their name may thus be included in the INSEE file.
It is difficult to make the distinction between surnames having disappeared owing to the First World War and those having disappeared owing, for example, to epidemics. While Spanish influenza (1918–1920) may have had an impact, its effect on the disappearance of surnames remains difficult to assess for France, particularly at the departmental level (on this epidemic, see Darmon, 2000; Lahaie, 2011; Vagneron, 2015). The same difficulty applies to typhoid fever in 1914–1915.
See note 4.
In the MPF file in our possession (as of 19 April 2013), at least 35 women are listed. Obviously, these women were not soldiers because women were not to be admitted to the French army before the Act of 11 July 1938. These women, some of whom were nurses, were among the civilian victims given MPF status, but their number is incidental relative to that of men.
For debates over the unequal regional contribution to the war effort, see in particular Gilles et al. (2014) and Loez and Mariot (2014).
For most of the departments corresponding to the borders of 1871, we have data covering the entire period thanks to the Annuaire statistique de la France (ASF) and/or the files provided on the websites of CRH and INSEE (note 16). We have made a few minor corrections (relative to the inconsistencies observed or incorrect values detected by making comparisons of the files or with the ASF) and estimated a few missing values through various adjustments.
Without departmental mortality tables for the generations of the late 19th century, it is not possible to compare the number of MPFs born in a given department with the estimated number of men still alive in the department when the war broke out. The calculated values (%) are thus considerably lower than the weight of the real losses in the population of ‘survivors’ in 1914. According to Héran (2014), at age 20, 72% of the male generation born in 1894 had survived infant and juvenile mortality, while 25 years earlier, the generation born in 1869 had already been reduced by 37% [at the same age].
The choice of this option is not neutral. We see it as more coherent regarding the question of surname losses and the materials available for answering that question (MPF file and INSEE surname file).
The First World War was at the root of the creation of the French regions. Regional economic groupings, referred to as the ‘Clémentel regions’, were introduced in 1919 largely on the basis of chamber of commerce groupings.
The comparison of the results between departments shows that the impact of the correction (application of Equation 3) is low, albeit significant (p < .001); the correlation between departmental measurements with and without correction remains very high (r = 0.99).
This frequency corresponds to the p0 of Equation 1. The linear correlation between the values per department of the proportion of expected surname disappearances (according to the G–W model) and the values of p0 is r = 0.93. The correlation is r = 0.56 if we consider the frequency of households having just one child.
The over-representation of hapaxes in the MPF file remains difficult to explain. It may result from a large number of transcription errors in the initial data entry of the surnames. For example, the proportion of double surnames is much higher in the MPF file than in the INSEE file. The electronic file we used also includes MPF surnames that are evidently not real surnames.
Linear correlations of r = 0.92 and r = 0.86, respectively. The Seine department has a strong impact on these correlations, as they fall to r = 0.76 and r = 0.69, when it is excluded.
These disappeared surnames include a high proportion of double surnames. The latter account for just 5% of the corpus of MPF names and 6% of the names in the INSEE file but account for nearly 15% of the extinctions.
In any case, other possible contemporary homonyms, related or not to these MPFs, certainly did not have a descendance (or a descendance that did not survive beyond 1971); otherwise, their surname would be included in the INSEE file.
Those having disappeared, while borne by three MPFs, total 20: Boucaine, Codaccionni, Gamee, Goanach, Gossegin, La Bouere, Le Dicabel, Le Porcq, Leblevec, Legouvello de la Porte, Lesolec, Luzj, Marchetay, Martruc, Pasquau, Prost Toulland, Rohfristch, Roumas Bertranine, Schoonhere and Sourseau. There are also uncertain extinctions, as the written forms ‘Codaccioni’, ‘Le Blevec’, and ‘De la Bouere’ are attested to in the INSEE file.
Those who were killed shared their surname with other men in the population. These certainly exist, even if their exact number has yet to be determined because patronymic identity is not sufficient proof of a direct genealogical link, including for births registered under the same name in the same department.
On the concept of geohapax, see Chareille and Darlu (2013).
There is also a strong correlation between the number of extinctions expected by the G–W model and the number of hapaxes per department (r = 0.97) or geohapaxes per department (r = 0.80). Disappearances estimated by G–W are correlated to these indicators.