1Textual analysis begins with a count of how frequently words appear in a text or a corpus of texts. Ngram Viewer provides a means to do this on an all-inclusive scale, by counting the annual occurrences of words or word groups in the entire corpus of books contained in the Google Books database, which currently includes more than seven million titles published over five centuries in eight languages. What are the strengths and weaknesses of this new tool? What does it teach us about the diffusion of words, in the field of demography especially, over the last century? How is word usage in the Google Books corpus linked to historical events? Analysing the rise to prominence or the demise of certain demographic terms, François Héran discovers that the vocabulary of formal demography has fallen out of favour, replaced by new themes and new notions that reflect the broadening horizons of population science.
2A mere generation ago, who could have imagined that the millions of books printed since Gutenberg might be instantly accessible, and provide us with an immediate record of the frequency of word usage over a period of several centuries? First brought online on 16 December 2010, and updated in November 2013, Ngram Viewer uses the Google Books database to explore the lexicon of eight million volumes published since the sixteenth century. It instantly counts the occurrences of words or word groups in eight languages (nine, if British and American English are counted separately), opening the door to an infinite number of queries. The use of this application to study the vocabulary of demography has been tested on two occasions, with the findings published in Population and Societies (Héran, 2013), and in Demographic Research (Bijak et al., 2014). While pointing up the limits of Ngram Viewer, these first two experiments have demonstrated its potential value for demographers.
3This article proposes a more detailed historical exploration of the vocabulary of demography, focusing on moments of innovation or abrupt change, of revival or decline. How do these patterns of usage shed light on the place of demography in the body of scientific knowledge? Here, the exploration initiated in the two above-mentioned articles will focus more closely on methodological aspects, starting with two simple questions: How does Google Books select the documents to be included in its database? And how do we move from this corpus to that of Ngram Viewer? Does the omission of scientific journals from the corpus bias the results or, on the contrary, does it shed light on the relationships between demography and society, demography and culture? After an attentive look at the capabilities and limitations of Ngram Viewer using examples taken from demography and elsewhere, we will move on to a sensitive question: is demography really losing ground in written culture? If so, what are the reasons for this, and how could the situation be remedied?
I – Ngram Viewer, or how to search through an ocean of words
1 – The unstoppable advance of Google Books
4We cannot produce or analyse Ngram Viewer’s graphs without first understanding how they are constructed and, before that, how the source it exploits – the Google Books digital library – is organized. Explanations can be found on the official websites and on some institutional blogs. The most detailed information, including a description of some of Google’s methods, is contained in the technical appendix to the article in which Ngram Viewer was first presented (Michel et al., 2010). Some manuals have also recently taken an interest in this application (Hai-Jew, 2014). But these disparate elements need to be drawn together and, if possible, assessed with a critical eye.
5First, it is important to distinguish between Google Books and Ngram Viewer. Google, the American search engine giant, launched the Google Books programme in October 2004 with the aim of digitizing as many as possible of the books produced since the invention of printing, starting with the major libraries of the United States and Europe (Library Project), then moving on to publishers’ catalogues (Partner Program). While Google eagerly advertises its mission to “organize the world’s information and make it universally accessible and useful”, its goals are primarily commercial. Storing the vast mass of information in all languages accumulated in the world’s libraries is a valuable capability in itself, be it simply to provide a database for automatic translation software. It is quite another matter to promote online reading and to detect readers’ preferences in order to draw up their profile as a consumer of culture – two dimensions of the project that we shall not dwell upon here.
6The objectives of the Google Books programme may appear somewhat disconcerting to statisticians loyal to the rules of representativeness. The aim is not to digitize a random or selected sample of books or periodicals, but rather, in line with the philosophy of big data, to be globally exhaustive, and to include all books, in all languages, stored in libraries since the sixteenth century.  Google Books aims to digitize documents of all kinds, including monographs, theses, fiction, official texts and more. Periodicals are absent from the database, however, along with other items – such as maps, posters and, above all, newspapers and magazines – that are difficult to scan for reasons of format, poor conservation or page layout. This means that the written culture in Google Books is more scholarly than that of general culture (which helps to explain why the highly journalistic term “baby boom” appears very late in the corpus, as we shall see below).
7Given the Herculean nature of this digitization programme, it was rolled out on a massive scale. In the major libraries, the Google operators deployed an optical system capable of scanning one thousand pages per hour without flattening the books (distortion due to page curvature was corrected automatically). In October 2010, five years after it was launched, Google Books announced that more than 15 million volumes had been digitized in a hundred countries and in more than 400 languages. By April 2013, the figure had already doubled to 30 million volumes – representing a quarter of all the books printed and conserved by humankind since the invention of printing, if Google’s engineers are to be believed.  At this rate, the digitization of the world’s books could be complete by 2020.
8The major US libraries readily admit that they would have taken decades to digitize their holdings, a task accomplished by Google Books in a few short years. These same professionals have observed a marked slowdown in recent years, however (Howard, 2012), apparently linked to the fact that Google Books is now focusing on filling specific gaps (such as the Hispanic collection at the University of Texas). It is also reportedly spending more time on avoiding duplication, and concentrating resources on European libraries. But these librarians’ accounts do not tell the full story. The American giant is very discreet, providing small nuggets of information on blogs that it opens and closes at will, often obliging the specialized press to rely on guesswork.
9How have the French-language holdings been covered by this vast endeavour? A large share of the French-language books of the seventeenth and eighteenth centuries were printed by publishers in the Netherlands, Switzerland or Great Britain. They are therefore well represented in the holdings of the KB (National Library of the Netherlands), the Bibliothèque universitaire de Lausanne and the Bodleian Library in Oxford, all of which are major partners of Google Books in Europe. However, the French national library, the Bibliothèque nationale de France (BNF), headed by the historian Jean-Noël Jeanneney from 2002 to 2007, declined Google Books’ offer on the grounds that Europe was threatened by American cultural hegemony (Jeanneney, 2005). The BNF opted instead for the Gallica programme as part of the Europeana European digital library project. Gallica has set high quality standards but is not very user-friendly. Its main limit for our purposes is that book pages are scanned in “image mode” and not “text mode”, thus ruling out any statistical analysis of the lexicon. Since 2011 the National Fund for the Digital Society (Fonds pour la société numérique), via the Investments for the Future programme (Programme des investissements d’avenir), has funded a major programme to digitize the pre-eighteenth century holdings of the BNF, with no link to the Google Books project.
10To fill the gap left by the BNF’s “Gallican” refusal, Google Books has drawn upon collections held in American and European libraries. In 2008, it also won the contract to digitize the Bibliothèque municipale de Lyon, France’s largest municipal library (3.8 million volumes) with large holdings of historical books (Colombet, 2008). An agreement has also been signed with the online catalogue of the Lyon-based bookshop, Decitre.
2 – Ngram Viewer: data, structure and potential
11Ngram Viewer operates downstream of Google Books. It was developed in close partnership with engineers from Google, but is not one of its commercial applications. Designed by IT and language processing specialists at Harvard University, Ngram Viewer is hosted by Google Books and is freely accessible to users.  As its name indicates, it produces graphs which show the relative frequency of words or word groups (ngrams) contained in published material printed over the centuries. Of the 30 million documents digitized by Google Books, Ngram Viewer analyses 8.1 million, in eight different languages (Table 1). The French-language corpus totals 800,000 volumes, containing some 102 billion words, with an especially high number of words per volume that reflects the analytical nature of French syntax.
Dimensions of the Ngram Viewer corpus in its 2012 version
Note: A word with 50 occurrences in a volume is counted 50 times.
12Why only 8.1 million volumes and not 30 million? The successive filters applied to the Google Books corpus are briefly described by Ngram Viewer’s designers (Michel et al. 2010, supporting online material, pp. 7-8). As well as choosing just eight languages, they also excluded publications that could not be dated (by matching the document with the publisher’s or library’s metadata) or for which optical character recognition – assessed using a special algorithm – was poor. Alleged dating problems prompted the designers to exclude almost all periodicals, including scientific journals. This is a key point that we will return to later.
Chronological coverage for France between 1740 and 2008
13The time span covered by Ngram Viewer is unprecedented, especially for the French corpus, which begins in 1547. Coverage is intermittent for the sixteenth century, but becomes annual from 1609 (Figure 1). The threshold of ten million words per year is reached in around 1750.  In practice, 1750 offers a good starting point for Ngram Viewer analyses in French, as by this time – some 50 years before the English and American corpuses – the number of words is large enough to limit random variations in frequencies of occurrence.
The French corpus of Ngram Viewer: quantitative data (log scale)
14The French corpus of Ngram Viewer includes the Revolution; compared with the period 1786-1788, there are twice as many books for the years 1789-1791 and 45% more words, so the potential for research is vast. In the nineteenth and twentieth centuries, the production curves are disrupted by episodes of revolution and war that wiped out the progress of the previous decades (Figure 2). The corpus reaches a peak in 1968, though it is impossible to tell whether the subsequent downturn marks a waning of editorial effervescence or a strengthening of copyright protection. The faster pace of increase in the 1990s and 2000s may reflect the inclusion of editors’ catalogues in the corpus alongside the library collections. Overall, the French corpus can be divided into four equal parts covering the periods 1547-1866, 1867-1936, 1937-1986 and 1987-2008, with a successive shortening of each period that reflects its ever faster growth.
Number of books, pages (divided by 850) and words (divided by 140,000) in the French corpus of Ngram Viewer (arithmetic scale)
15The corpus currently stops in 2008, and this cut-off point remains unchanged for the versions developed in 2009 and 2012 which came online in 2010 and 2013, respectively. The French language corpus doubled in volume from one version to the next, from 390,000 documents to 780,000. Major improvements have been made. For example, word groups no longer extend beyond the end of a sentence, while those that straddle two pages are now recovered. Many optical recognition errors have been corrected, though instances of nativité (nativity) being incorrectly read as natalité (natality) still turn up occasionally in the seventeenth century. The long s in the middle of a word is often read as an f (for example, in the British corpus of the 1780s, “necessity” is read as “neceffity” in one-quarter of occurrences, and “reason” as “reafon” in one in seven occurrences), but it is simple enough to sum the occurrences of both spellings to resolve the problem. Word-breaks at the ends of lines are not always detected (for example, in many cases, “circum” is still an artefact produced by the uncorrected “circum-stance”). These are minor flaws, however, that reflect a legitimate respect for ancient spellings.
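The summation of the two spellings can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are invented for the example, and the rule used to generate the misread form (every s except a word-final one becomes an f) is a deliberate simplification of historical typographic practice.

```python
def long_s_variant(word):
    """Return the form an OCR engine tends to produce when every 's' except a
    word-final one was printed as a long s (simplifying assumption)."""
    return word[:-1].replace("s", "f") + word[-1:]


def merged_count(counts, word):
    """Sum the occurrences of a word and of its long-s misreading."""
    return counts.get(word, 0) + counts.get(long_s_variant(word), 0)


# Invented counts for one year of the 1780s, chosen so that the misread
# form represents one-quarter of all occurrences.
counts_1785 = {"necessity": 300, "neceffity": 100}

total = merged_count(counts_1785, "necessity")
misread_share = counts_1785["neceffity"] / total
```

With these invented figures, the merged series contains 400 occurrences, of which a quarter come from the misread spelling.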
Corpus structure: sliding sequences of one to five words
16In Ngram Viewer, a sequence of characters bounded by spaces is called a gram. An Ngram Viewer query can include a series of one to five grams, so ngrams can be unigrams, bigrams, trigrams, tetragrams or pentagrams. For the sake of simplicity, we shall call them “expressions” or “word groups”. A corpus comprises annual tables of possible sequences of one to five words. Each sequence is followed by the year, the number of occurrences in documents of that year, and the number of different pages and books in which it occurs. This means that when querying the corpus, users can use a wildcard (by convention an asterisk) before, inside or after an expression of fewer than five words. This produces the ten most frequent expressions at the end of the queried period. Introduced at the end of 2013, this innovation sheds interesting light on the most frequent word associations. For example, the query “fertility and *” or “* and fertility” reveals that the association of beauty with fertility, applied to landscapes – but also to women – reached its apogee in the 1830s. It was not until a century later that mortality took over as the word most commonly associated with fertility.
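The principle of sliding sequences and of the asterisk query can be illustrated with a short Python sketch. It is not the code used by Ngram Viewer: the function names, the two sample sentences and the matching rule (the asterisk stands for exactly one word) are assumptions made for the purpose of illustration.

```python
from collections import Counter


def sliding_ngrams(sentences, max_n=5):
    """Count every sequence of 1 to max_n consecutive words, sentence by
    sentence, so that no sequence extends beyond the end of a sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()  # a gram is a run of characters bounded by spaces
        for n in range(1, max_n + 1):
            for start in range(len(tokens) - n + 1):
                counts[tuple(tokens[start:start + n])] += 1
    return counts


def wildcard(counts, pattern, top=10):
    """Return the most frequent sequences matching a pattern in which '*'
    stands for any single word (illustrative rule)."""
    matches = [
        (gram, count)
        for gram, count in counts.items()
        if len(gram) == len(pattern)
        and all(p == "*" or p == g for p, g in zip(pattern, gram))
    ]
    return sorted(matches, key=lambda item: -item[1])[:top]


# Two invented sentences standing in for a year's documents.
counts = sliding_ngrams(["we shall not die in vain", "they did not die in vain"])
before_not_die = wildcard(counts, ("*", "not", "die"))
```

Each six-word sentence yields 6 + 5 + 4 + 3 + 2 = 20 sequences. The trigram "die in vain" is counted twice, and the query "* not die" returns the two trigrams "shall not die" and "did not die".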
17Because of the way the data are organized, the frequency of a sequence of n words is calculated from the total number of sequences of the “same length” in the documents of the year (e.g., the frequency of a bigram in all the bigrams of that year). Does this mean that expressions of different lengths cannot be compared? Not at all, as illustrated by the superposition of the trigram “die in vain” and the tetragram “not die in vain” so commonly heard in times of war (Figure 3); it shows that in the majority of cases, “die in vain” forms part of the assertion “not die in vain”. We can also calculate ratios between sequences of different lengths, such as “life expectancy / expectancy”, to discover that in half of all cases, when “expectancy” is written in English, it is part of the expression “life expectancy”, a term that was barely known in the early 1930s!
Example of comparison between a 3-word sequence and a 4-word sequence
Translation: Tombés en vain: die in vain; Pas tombés en vain: not die in vain
Note: Frequency per million, smoothing of 3.
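The calculation of a frequency within sequences of the same length, and of a ratio between sequences of different lengths, can be sketched as follows. All the figures are invented for the example and the function name is hypothetical; only the arithmetic follows the description given above.

```python
def relative_frequency(occurrences, total_same_length):
    """Frequency of an expression among all the sequences of the same
    length published in the same year."""
    return occurrences / total_same_length


# Invented totals for one year: number of unigrams and of bigrams.
totals = {1: 1_000_000_000, 2: 950_000_000}

# Invented occurrence counts for the same year.
occurrences = {("expectancy",): 4_000, ("life", "expectancy"): 1_900}

freq_unigram = relative_frequency(occurrences[("expectancy",)], totals[1])
freq_bigram = relative_frequency(occurrences[("life", "expectancy")], totals[2])

# Ratio "life expectancy / expectancy" between two frequencies that each
# have their own denominator.
ratio = freq_bigram / freq_unigram
```

With these invented values the unigram has a frequency of 4 per million and the bigram of 2 per million, giving a ratio of one half. Note that the ratio of frequencies (0.5) differs slightly from the ratio of raw counts (1,900 / 4,000 = 0.475), because unigrams and bigrams do not share the same denominator.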
Entry barrier: occurrence in at least 40 different documents
18To prevent the corpus from growing exponentially, uncommon words are excluded from Ngram Viewer. We know that word distribution in a corpus follows Zipf’s law whereby written languages contain a small number of widely used tool-words, such as articles, auxiliaries and prepositions, and a vast number of rarely used words, including hapaxes (which occur only once). Ngram Viewer’s designers decided to omit rare words – and thereby increase the program’s running speed – by setting a high threshold: to feature in a corpus, a word must appear in at least 40 different documents in a given year.
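The entry barrier amounts to a simple filter on the annual tables. The sketch below assumes that each row records the expression, the year, the number of occurrences and the number of different books, as in the corpus structure described earlier; the rows themselves and the field names are invented.

```python
MIN_DOCUMENTS = 40  # threshold stated by Ngram Viewer's designers


def apply_entry_barrier(rows, min_documents=MIN_DOCUMENTS):
    """Keep only the rows whose expression occurs in enough different books."""
    return [row for row in rows if row["books"] >= min_documents]


# Invented rows for a single year.
rows = [
    {"ngram": "de", "year": 1900, "occurrences": 4_300_000, "books": 2_500},
    {"ngram": "démographie", "year": 1900, "occurrences": 600, "books": 85},
    {"ngram": "hapax", "year": 1900, "occurrences": 1, "books": 1},
]

kept = apply_entry_barrier(rows)
kept_ngrams = [row["ngram"] for row in kept]
```

In this invented example, the tool-word and the technical term are retained, while the word found in a single book is dropped.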
19A query may include several items (separated by commas) and cover more than one language corpus (easily distinguished by a code). Within one second, Ngram Viewer answers the query in the form of a graph. For users wishing to draw the graphs themselves (as is the case in this article), the corresponding data can be downloaded very easily.  Ngram Viewer seeks an optimal compromise between search accuracy and response time. This is perhaps bad news for lexicometrists, but good news for social scientists, since the approximate period when a notion comes into practical usage is more meaningful in sociological terms than the exact date of its first appearance.
The hierarchy of frequencies and the standardization problem
20The magnitude of frequencies may vary substantially from one field to another. A query containing just the wildcard asterisk reveals that the most frequently used words in printed French are de, la and et; they represent, respectively, 4.3%, 2.4% and 1.8% of all words printed in French over two centuries. In English, the commonest words are “the”, “of”, “and” and “to” (5.5%, 3.7%, 2.4% and 2.1%). Widely used French words such as vie (life), mort (death) or âge (age) have frequencies of 0.063%, 0.025% and 0.014%, i.e. 63, 25 and 14 per 100,000. Démographie (demography), a more technical term, has a frequency six times lower, at just 6 per million (0.0006%). While well established in French society, démographie is by no means the centre of the world. Highly specialized expressions, such as espérance de vie en santé (healthy life expectancy), are practically invisible, with a frequency of just 1 per 100 million. But even at microscopic level, their rise and fall within the corpus may be highly revelatory: occurrences of espérance de vie en santé surged between 1990 and 2003. Rather than the absolute frequency calculated by Ngram Viewer, it is the variation in usage over time, and the frequencies of different expressions relative to each other (see Box), that are most interesting to observe.
21This is especially true for the relationship between everyday vocabulary and scholarly vocabulary, and within the latter, between generic and specialized vocabulary. In the 2000s for example, “historical demography” was 20 times less frequent in the bigrams than “demography” in the unigrams. Its curve is marked by a first peak in 1957, then an apogee in 1983 followed by a sharp decline. “Demography” follows a similar path. Ngram Viewer can thus be used to make comparisons on several scales, without confusing the different orders of magnitude: each has its own visibility level.
22The graphs produced by Ngram Viewer automatically adjust the frequency axis to the maximum observed over the chosen period. If we include France, Alsace, Strasbourg and Bas-Rhin in the same query, France overwhelms all the other terms since Strasbourg is 25 times less frequent than France, Alsace 40 times less and Bas-Rhin 500 times less. To produce comparable curves in such cases, several options are available: remove certain terms, multiply the frequencies by a scaling factor, or calculate the ratios between the curves. 
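Two of these options, the scaling factor and the ratio between curves, can be sketched as follows. The series are invented and merely respect the order of magnitude mentioned above (Strasbourg 25 times less frequent than France); the function names are hypothetical.

```python
def rescale(series, factor):
    """Multiply every frequency of a series by a scaling factor."""
    return [value * factor for value in series]


def ratio(numerator, denominator):
    """Year-by-year ratio between two series (None where undefined)."""
    return [n / d if d else None for n, d in zip(numerator, denominator)]


# Invented frequencies per million for three successive years.
france = [100.0, 125.0, 150.0]
strasbourg = [4.0, 5.0, 6.0]

strasbourg_x25 = rescale(strasbourg, 25)  # now on the same scale as France
share = ratio(strasbourg, france)         # Strasbourg relative to France
```

Rescaling keeps the shape of each curve and only changes its height on the graph, whereas the ratio shows whether one term gains or loses ground relative to the other.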
23Unless a ratio is specifically requested, the y-axis of the graphs produced by Ngram Viewer is a relative frequency (always expressed in “parts per million” in this article). But what are the numbers in the numerator and the denominator? An ngram query (such as the bigram “life expectancy”) includes in the numerator all the occurrences of the year, provided that the expression is found in at least 40 different documents. One might assume that the same filtering system applied to the denominator, but in fact the Ngram Viewer developers decided to include all the year’s ngrams, not only those that satisfy the 40-document criterion. This is logical, since it is with respect to the entire written output of the year that the frequency of the analysed expressions must be determined. But this “total standardization” approach has been criticized (Chateauraynaud and Debaz, 2010; Peccate, 2011) on the grounds that the rare words included in the mass of recently scanned documents generate an increasing amount of meaningless “noise” that artificially reduces the proportion of expressions that are actually used. If this hypothesis were true, it might invalidate the finding that many scientific activities, including demography, are in decline. We will examine – and refute – this hypothesis in the second part of this article.
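The difference between the two possible denominators, and the mechanism invoked by the critics, can be made concrete with a small numerical sketch. All the figures are invented and the function name is hypothetical; the sketch only illustrates the arithmetic of the argument and says nothing about its empirical validity.

```python
def frequency_per_million(occurrences, total_ngrams):
    """Relative frequency of an expression, expressed in parts per million."""
    return 1_000_000 * occurrences / total_ngrams


# Invented year: 2,000 occurrences of a bigram among 1 billion bigrams, of
# which 900 million belong to expressions found in at least 40 documents.
occurrences = 2_000
all_bigrams = 1_000_000_000
retained_bigrams = 900_000_000

# "Total standardization": every bigram of the year is in the denominator.
total_standardization = frequency_per_million(occurrences, all_bigrams)

# Alternative: only the bigrams that clear the threshold are counted.
filtered_standardization = frequency_per_million(occurrences, retained_bigrams)

# Critics' scenario: newly scanned documents add 100 million rare bigrams
# to the year without adding any occurrence of the expression.
with_added_noise = frequency_per_million(occurrences, all_bigrams + 100_000_000)
```

Under total standardization the expression stands at 2 per million. It would be slightly higher with a filtered denominator, and it falls below 2 per million once rare sequences are added to the year's total, which is the dilution effect that the critics have in mind.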
3 – Ngram Viewer and its critics
24Ngram Viewer has been criticized for two very different reasons. First, its designers’ claim to have founded a new science of “culturomics” based on this tool is seen as somewhat overblown (Michel et al., 2010). Second, the fact that users cannot access the full texts containing the digitized word sequences is viewed as a major weakness.
Culturomics: a new science?
25If its designers are to be believed (or some of them at least), Ngram Viewer has spawned a new science, at the crossroads between the digital humanities and cultural studies, that will revolutionize the comparative study of cultures. French social scientists have been quick to point up the improvised nature of the historical, sociological or cultural deductions based on “culturomic” research, as presented in the popular science book published by two French mathematicians using the 2009 version of the corpus (Delahaye and Gauvrit, 2013). For the 2012 version, the American equivalent is the manifesto written by Ngram Viewer’s designers (Aiden and Michel, 2013), which is more up to date and more detailed, but which presents the same flaws. A digital approach to culture calls for a solid mastery of digital science (mathematics, computing and linguistics), but also of culture for its own sake. The former cannot substitute for the latter without discrediting the tool. But fortunately for us all, Ngram Viewer can be used freely without subscribing to the culturomics project. The creation of such an innovative and powerful exploration tool is already a great achievement in itself. Why seek to proclaim it as the emblem of a new science?
Some useful properties of Ngram Viewer
The series plotted by Ngram Viewer fluctuate considerably from year to year. A smoothing option is available to view data as a moving average. A smoothing of n means that the data shown for the reference year will be an average of the raw count for that year plus n years on either side, i.e. a moving average over 1 + 2n years. In practice, smoothing of 3 (moving average over 7 years) provides a good compromise between chronological accuracy and clarity of the patterns that emerge. Smoothing of 5 is sometimes useful over centuries. Conversely, for events with specific dates (declaration of war, new law, creation of an institution) smoothing of 1 is preferable, or no smoothing at all.
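The moving average described here can be sketched in Python. The treatment of the first and last years of a series, where fewer than n neighbours are available, is an assumption of this sketch (the average is taken over the years that exist), and the series is invented.

```python
def smooth(values, n):
    """Moving average over 1 + 2n years centred on each year. Near the ends
    of the series the window is truncated to the available years (assumption)."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - n): i + n + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Invented raw frequencies for seven successive years.
raw = [1, 2, 3, 4, 5, 6, 7]

smoothing_of_3 = smooth(raw, 3)  # the central year averages all seven values
smoothing_of_1 = smooth([3, 6, 9], 1)
no_smoothing = smooth(raw, 0)    # identical to the raw series
```

In this example, a smoothing of 3 gives the central year the average of the seven raw values, i.e. 4.0, while a smoothing of 0 leaves the series unchanged.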
Case-sensitivity: institutions lose their imposing capitalized acronyms
Unlike standard search engines, Ngram Viewer is case-sensitive, i.e. it distinguishes between words written in upper and lower case. If the “case insensitive” option is used, Ngram Viewer counts all occurrences of a word, however it is written (EUROSTAT and Eurostat, for example). However, it is unable to identify punctuated acronyms or unaccented capital letters (in French a case-insensitive search for the word état (state) counts Etat but not État). Changes in French typographic conventions applied to institutions such as INED, OECD or UNESCO (written Ined, OCDE and Unesco under French typographic rules), have a social significance that can easily be tracked by Ngram Viewer (Figure A). Rarely cited in the immediate post-war years, such institutions have multiplied in recent decades and become part of common parlance; the imposing title I.N.E.D. has been gradually replaced in French by Ined, a graphic form no different from that of a family name. While the form INED still dominates, the days of this intermediate form appear to be numbered.
Main graphic forms of the “Institut national d’études démographiques”
Note: Cumulative frequency per million, smoothing of 1.
Words without context?
26Ngram Viewer was greeted with enthusiasm by automatic language processing specialists such as Jean Véronis, author of a popular blog, who died in December 2013. But lexicographical historians and sociologists were damning in their criticism. Just a few days after the application was launched, the author of a French language blog for historians had already given his verdict: “there is absolutely no way to access context, and there never will be. Who wrote the term being searched? In what meaning is the word used? And in what type of document? These are fundamental questions that remain unanswered” (Ruiz, 2010). In lexicometry, each word must have a “concordance”, i.e. the complete phrase in which it appears, the surrounding phrases and the exact references of the citation (author, year, edition, page, type of book). As Ngram Viewer isolates words from their context, the usual methods of interpretation cannot be applied. Readers must therefore rely upon a culture external to the corpus; the tie between the words and their context of usage is lost. So all in all, Ngram Viewer simply serves to confirm pre-existing intuitions.
27In response to such objections, Ngram Viewer’s designers point out that under their contract with Google, the digitized texts must remain anonymous (Aiden and Michel, 2013) to avoid the problem of copyright claims.  Moreover, the corpus – already the largest in existence – would become totally unmanageable, they claim, if weighted down with all these references. To which the lexicometrists retort that a change in size is no grounds for betraying what, for them, is a fundamental principle: “mastery of the corpus” (Ruiz, 2010).
28The critics are harsh, and fail to notice a very useful function in Ngram Viewer, namely the presence at the foot of each graph of a series of dates displayed in hypertext corresponding to the peaks and plateaus of each curve. A simple click on these dates gives access to the books in the Google Books library printed in the same period.  Take the example of the French words expert and expertise, representing a theme upon which Chateauraynaud is a known specialist. In his view, the curves of these two terms remain mysterious until one becomes acquainted with a particular law thesis published in 1934, or the book on the subject written by himself in 1991; the links between these and earlier texts must be examined, to see whether they proceed by citation, compilation, critique, retrieval, etc. He goes on to conclude that a detailed knowledge of the networks of links between texts is key to interpreting the Ngram Viewer graphs. Yet this objection loses its punch when we discover that the frequency curves of expert and expertise come with a set of links to Google Books which make clear reference to the two books in question! It is therefore a huge overstatement to claim that Ngram Viewer separates words “absolutely” from their context.
29By demanding complete mastery of the corpus for each citation, critics in the field of lexicography and social informatics are working at the wrong scale. They are looking for singular individuals in the quantitative landscape, with as little chance of success as a population census hoping to shed light on each citizen’s situation via their unique personal environment. No-one can interpret macro-data as if they were micro-data. Interactive and pragmatist approaches, whatever their legitimacy, cannot be applied to this type of information.
30While mastery of content is necessary for smaller corpora, in the present case it would require a bird’s-eye view of vast territories (several languages and several centuries), combined with a detailed survey of the local terrain. Much like aerial archaeology, Ngram Viewer gives an overall picture, but it is the work of field researchers on the ground to confirm the existence of structures spotted from the sky. This is an appealing metaphor, used in the conclusion to an earlier publication (Héran, 2013), but which comes up against a problem of scale: the archaeologist on the ground can visit some of the sites detected from the sky, but not the entire country. For the mere pedestrians that we are, Ngram Viewer offers a synoptic vision of the whole. Why deprive ourselves of the right to enjoy the view?
4 – Semantic drift and change
31Ngram Viewer is also criticized for assuming that word meanings remain unchanged over long periods. Yet we are reminded that the definition of a word like expert can evolve considerably over two centuries (Chateauraynaud and Debaz, 2010). Quite so, but that is stating the obvious. The links to Google Books proposed by Ngram Viewer confirm these changes of meaning. They also offer scope for discoveries that are far from intuitive.
Mortalité / natalité: why do they form a pair in French but not in English?
32A good example is the notion of mortalité (mortality). It was originally of a moral or theological nature, referring to the fact of being mortal, and “subject to death”. The references in Google Books point out that in the eyes of the Church Fathers, the incarnate Christ represented both æternitas and mortalitas. Pascal evoked “l’homme sentant sa mortalité et son néant” (man feeling his mortality and nothingness). While mortalité in French assumed its pre-demographic meaning from the eighteenth century (Figure 4A), it was first applied to the concentration of deaths among small children or in an epidemic. The 1738-1742 edition of the Dictionnaire de Trévoux cites this phrase with a decidedly scriptural flavour: “la mortalité est sur les petits enfants” (mortality is upon young children) as if struck down by a biblical plague. But as demographic calculation is not incompatible with theological fatalism, the same dictionary also introduces the notion of force de mortalité (force of mortality) brought to light in London by a certain John Graunt in his Bills of Mortality.
33There is no theological antecedent to the concept of natalité, however. For man is mortal, but not “natal”. The word natalité is a late scholarly creation not visible in the French lexicon until 1862, when it appears under its demographic definition, quite free of religious connotation. So mortalité and natalité, in fact, have very different pedigrees, largely forgotten with the passage of time.
34The English-speaking world, for its part, respecting the asymmetrical treatment of birth and death by theologians, has never been tempted by such an analogy (Figure 4B). This is why, as all good translators will know, the term mortality does not pair up with natality (though the term exists), but with birth rate. However, comparisons in Ngram Viewer suggest that the term most commonly paired with mortality has long been fertility, though the two concepts are calculated quite differently. We will leave it to the historians of English demographic terminology to pursue this investigation further.
Natalité versus mortalité in French, not found in English, which prefers to pair mortality with fertility
Note: Frequency per million, smoothing of 5.
5 – Events, perceptions, formulations
35How does the curve of an expression in Ngram Viewer correspond to the actual unfolding of events? In a number of different ways.
The many names of the First and Second World Wars
36Ngram Viewer shows that the response to major events is swift. In the nineteenth century, the documents follow the jagged curve of cholera outbreaks or the series of great exhibitions (Expositions universelles) held in France. But the link between an event and its representations may be complex, as shown by the successive French names given to the First World War over the last century (Figure 5).
Main names and written forms used in French for the First World War
Note: Frequency per million, smoothing of 1.
37When war broke out in August 1914, it was officially called the Guerre de 1914 (War of 1914), reflecting the military’s conviction that the fighting would be over by Christmas. But it went on for much longer, and the population revived an old nickname, grande guerre (great war), promoted to Grande Guerre as the conflict grew in scale. The armistice saw the emergence of guerre de 1914-1918 (1914-1918 war) and, for a time, simply 14-18. But everything changed in 1940 when it became necessary to distinguish between the two periods of conflict. Grande Guerre fell from use – it was only a first world war after all! From the 1960s, the need to commemorate the war as a landmark of history was reflected in the capitalization of its name. Première Guerre mondiale grew in frequency, followed by Grande Guerre, the former in mainly macro-historical contexts, the latter in testimonies of personal experience. The corpus does not yet include publications marking the centenary (in which Grande Guerre seems to have won the day). But overall, the very consistent shape of the curves over almost a century clearly attests to the quality of the data.
38Words are the product of a reflexive history, punctuated by dramatic reversals (the last war is not the last), historiographic upheavals, and efforts of remembrance, sometimes following long periods of indifference. At the time – and even at the Liberation – the French paid more attention to the atrocities of the First World War than to those of the Second (Figure 6). It was thanks to the work of memorialists and historians that the eyes of the next generations were finally opened. 
The horrors of war: an asymmetrical interest in the two world wars
Note: Frequency per million, smoothing of 1.
39The multiplicity of labels may be due to factors other than a simple sequence of events, as illustrated by the various names used to designate decolonized countries (Figure 7). As more and more countries gained their independence the term pays sous-développés (under-developed countries) became unacceptable. In 1952, Alfred Sauvy and Georges Balandier simultaneously launched the term Tiers Monde (Third World) in response to demands for a respectful terminology from the countries concerned. The expression quickly conquered the world, before in turn becoming tainted with condescension. Other names took its place, but this time not attributable to a specific reason. Their rapid generalization reflects the growing control of international organizations – now more influential than the scholarly literature – over official vocabulary.
The various names and written forms of terms used to designate former colonies
Translations: Tiers monde: Third world; pays en voie de développement / pays en développement: developing country; pays sous-développé: under-developed country; pays du Sud: Southern country.
Note: Frequency per million, smoothing of 3.
Baby boom and baby boom: how does the word reflect the phenomenon?
40The baby boom provides an example of a time lag between immediate realities and perceptions (Figure 8). First coined in the United States in 1946, the expression made a timid entry into the French corpus in 1949 and remained barely visible until the late 1970s. Its spectacular rise in France occurred 15 years later than in the United States and five years later than in the United Kingdom. It moved above the American curve in 2000 before dipping downward in later years. Clearly, the trajectory of the word is very different from that of the actual baby boom, be it defined as a surge in births (1946-1974) or as an increase in the fertility rate (1942-1965).
The delayed perception of the baby boom in the United States, France and the United Kingdom
Note: Frequency per million, weighted average over five years (weights 1-2-3-2-1).
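The weighted five-year average mentioned in the note above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used to draw the figure: the function name `weighted_smooth`, the sample values and the treatment of the first and last years (where only the weights that fit are used and renormalized) are assumptions made for the example.

```python
def weighted_smooth(values, weights=(1, 2, 3, 2, 1)):
    """Centred weighted moving average over a yearly series.

    Assumption: at the edges of the series, only the weights that fall
    inside the series are used, and they are renormalized to sum to 1.
    """
    half = len(weights) // 2
    smoothed = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0
        for offset, w in enumerate(weights, start=-half):
            j = i + offset
            if 0 <= j < len(values):
                total += w * values[j]
                weight_sum += w
        smoothed.append(total / weight_sum)
    return smoothed


# Hypothetical frequencies per million for five consecutive years:
# a single isolated spike in the middle year.
spike = [0.0, 0.0, 9.0, 0.0, 0.0]
print(weighted_smooth(spike))  # [1.5, 2.25, 3.0, 2.25, 1.5]
```

The example shows why smoothed curves should be read with care: a one-year spike is spread over the neighbouring years, so the apparent start of a rise may precede the first real occurrence by a year or two.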
41There are a number of possible explanations for this delayed response. First, French-speaking scientists may have been reluctant to use an expression which was triply distasteful for being an American import, a metaphor based on stock-market jargon, and a buzzword among journalists. Remember that the press is excluded from the corpus. Second, demographers may have dragged their feet in acknowledging the reality of the baby boom. It took Alfred Sauvy, the first INED director, two to three years to admit that the spectacular upturn in fertility that began in 1946 (200,000 additional births in just one year!) was anything more than a process of post-war catching-up (Lévy, 1990). Not without malice, Sauvy’s biographer suggests a powerful reason for this hesitancy: if the French people were spontaneously having more children, what purpose was served by an institute of demography? It took the combined efforts of INED’s young researchers to convince Sauvy that the baby boom was more than just a flash in the pan; French people were indeed behaving differently. From a threat to INED, the baby boom became its saviour when the authorities asked its experts to predict the resulting increase in demand for housing and schools.
42Years later, when its contribution to the birth rate was a thing of the past, interest in the baby boom continued nonetheless, due to its long-term effects on population ageing. Indeed, while the baby boom rejuvenated the population in the early years, it is now having the opposite effect; this late dawning of awareness explains why its presence in the corpus outlived the event itself. Today, the ageing baby-boomer has, in turn, become a familiar figure and is attracting less attention.
II – The rise and fall of formal demography
43Now that we understand the workings of Ngram Viewer, and the need for caution in interpreting its results, as illustrated by some examples taken from demography, we can take a closer look at recent developments in demography as a discipline, and its probable future. Compared with the written output of other nations covered by Ngram Viewer, that of France is marked by an early and sustained interest in demographic questions that seems to have declined in recent decades.
1 – The birth and rebirth of démographie
44The word démographie itself provides a first illustration of this pattern (Figure 9). Ngram Viewer detects a few occurrences following the publication of Éléments de statistique humaine, ou Démographie by Achille Guillard, the first author to coin the word. But this first birth went largely unnoticed. It was not until the French defeat in the Franco-Prussian war in 1870 that rivalry with Germany prompted the revival of the notion. But démographie was still a rare word, employed by just a handful of individuals, including Guillard’s grandson, Alphonse Bertillon. It finally gained prominence after its third and least well-known birth in the 1920s. Use of the term démographie grew steadily from then on, with no visible break during the Second World War, reaching its apogee in the 1970s and 1980s. Its frequency declined sharply from then on, for reasons that are discussed below.
Démographie more widely used in French, but in sharp decline over the last 30 years
Note: Terms used in German: Demographie + Demografie + Bevölkerungswissenschaft + Bevölkerungsforschung. Frequency per million, smoothing of 3, all combinations of upper and lower case letters.
45Though recent additions to their respective languages, démographie in French and demografia in Italian came into usage at an early date. There is no sign of the first and second births of the term (around 1855 and then 1872) in the other languages covered by the Ngram Viewer corpora. After the Second World War, its presence in written culture was far greater in France than elsewhere. No alternative expression (such as “population studies”) fills the gap with respect to France in the English-speaking world, and neither do the various designations of demography in German. In France, interest in demography extended beyond the circles of social science and national statistics to enter the public sphere, probably fuelled by the creation of INED in October 1945. It was Professor Robert Debré who conceived the idea of setting up a national institute for demographic research capable of informing public decision-making, and who put his idea to the Conseil national de la résistance (National Council of the Resistance) (Rosental, 2003). Alfred Sauvy (after an unsuccessful bid to head INSEE) was appointed director of the institute – labelled at the time as I.N.E.D. – and remained in his position until 1962. Sauvy penned numerous essays, wrote for a major daily newspaper and was elected to the Collège de France in 1959, his reputation outshining that of the Institute until the early 1970s (Figure 10).
Alfred Sauvy and INED: a celebrated individual, a developing institution
Note: Frequency per million, smoothing of 3.
2 – The decline of demographic vocabulary: artefact or reality?
46Many of the Ngram Viewer curves of demographic terms start to trend downwards in the 1980s or 1990s, suggesting that the golden age of demography is behind us, in both the French- and English-speaking worlds (Figure 11). This is the case for expressions as elementary as démographie (demography), population (population), natalité (birth rate), données démographiques (demographic data), croissance démographique (population growth), remplacement des générations (generation replacement), transition démographique (demographic transition) or démographie historique (historical demography). The decline is even more pronounced for technical terms such as structure par âge or répartition par âge and their English equivalents, age structure and age distribution (Figure 12).
Selected demographic expressions in French whose usage has declined sharply in recent years
Translations: Natalité: birth rate; démographie historique: historical demography; démographie: demography; structure par âge: age structure; données démographiques: demographic data; croissance démographique: population growth; remplacement des générations: generation replacement; transition démographique: demographic transition.
Note: To facilitate comparison of the frequency curves with that of the word démographie, the other frequencies have been multiplied by a factor indicated in the legend. Frequency per million, smoothing of 3.
Age structure and baby boomers in France and the United Kingdom
Note: Frequency per million, smoothing of 3.
47These trend reversals have since been confirmed, suggesting that an observer with the same observation tool in 1990 could easily have forecast the future simply by continuing the curve. But it is one thing to note that the relative space occupied by demography is shrinking, and quite another to interpret this observation.
48Is this an artefact? According to the authors of a study of English-language vocabulary used in climate science, “the full, unfiltered Google record includes growing numbers of characters, data, and other non-English ‘noise’” (Bentley et al., 2012) that artificially inflate the final corpus size. They suggest correcting for this by normalizing frequencies not with the gross annual total of ngrams, but with the yearly count of the most common English word, namely the definite article “the”. This is the hypothesis referred to by Bijak and his colleagues, intrigued to see that while démographie, and even more so, “demography” have declined in relative terms, they have progressed in absolute numbers of occurrences (Bijak et al., 2014). Is this plausible?
49Acerbi (2013) tried to normalize word counts with the article “the”, but the results were disappointing: while total normalization recommended by Ngram Viewer tends to slow the growth in the word count in recent decades, “the-normalization” speeds it up slightly. The difference is minute, and much too small to explain the observed decline in the words démographie and “demography”. But should we be surprised that the hypothesis is unsound? It expects us to believe that the calculations would come out right if only we could clean the national vocabulary of all foreign imports. The illusion is deepened when the word “the” is taken as a shibboleth. It is common knowledge that the definite article is often omitted in English, and that use of the zero article varies over space and time, and even between individuals. From 1945 to 2000, the frequency of “the” in Ngram Viewer fell by 11% in British English and by 14% in American English, while that of the French definite articles (le, la, les, l’) remained stable. On the choice between two expressions as ordinary as “in hospital” and “in the hospital”, a query reveals that since the 1880s, practices have diverged steadily: while “the” is used in 83% of cases in the United States today versus 39% in the United Kingdom, usage in both countries was identical just one century ago!
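The two normalizations discussed above can be compared on a toy example. The sketch below is illustrative only: the function names and all counts are invented, and say nothing about magnitudes in the actual corpus. The counts are chosen so that the share of "the" falls by 14% between the two years, in line with the order of magnitude quoted above for American English.

```python
def per_million(count, total_ngrams):
    """Standard normalization: occurrences per million ngrams printed that year."""
    return 1_000_000 * count / total_ngrams


def per_the(count, the_count):
    """Alternative normalization: occurrences relative to the count of "the"."""
    return count / the_count


def relative_change(start, end):
    """Proportional change between two values (-0.25 means a 25% fall)."""
    return end / start - 1


# Hypothetical yearly counts, chosen only for illustration.
# The share of "the" falls from 6.00% to 5.16% of all words, i.e. by 14%.
counts = {
    1945: {"word": 200, "the": 60_000, "total": 1_000_000},
    2000: {"word": 300, "the": 103_200, "total": 2_000_000},
}
first, last = counts[1945], counts[2000]

absolute = relative_change(first["word"], last["word"])
by_total = relative_change(
    per_million(first["word"], first["total"]),
    per_million(last["word"], last["total"]),
)
by_the = relative_change(
    per_the(first["word"], first["the"]),
    per_the(last["word"], last["the"]),
)

print(f"absolute count:      {absolute:+.1%}")
print(f"per million ngrams:  {by_total:+.1%}")
print(f"relative to 'the':   {by_the:+.1%}")
```

With these invented figures the word gains 50% in absolute occurrences, loses 25% per million ngrams, and still loses about 13% relative to "the". Because the denominator "the" is itself shrinking, the-normalization mechanically lifts recent values, but it cannot turn a decline into a rise unless the word's share falls more slowly than that of "the".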
50Normalizing Ngram Viewer frequencies with a word whose use varies so radically is a futile exercise. It is certainly not artificial inflation due to foreign “noise” that explains the doubling of the number of French-language volumes in Ngram Viewer between 1990 and 2000. It is a global trend. Western countries are printing ever larger numbers of texts on ever more subjects, not counting the fact that the latest digitized output now comes from publishers’ catalogues as well as from libraries.
51But the best refutation of the theory of “growing foreign noise” is the contrast observed between the decline in the vocabulary of formal demography and the increase in words associated with many social, civic or ethical topics. If the noise artefact produced the biases suspected by Bentley or Acerbi, it would affect all the lexicons, not just that of demographic analysis. Topics found increasingly in the lexicon include the end of life, discrimination, gender equality, domestic violence (Figure 13), but also, and above all, health (Figure 14).
Selected demography-related terms whose usage has increased rapidly over recent decades
Translations: égalité des sexes / des genres, droits de la femme / des femmes: sex / gender equality, women’s rights; fin de vie: end of life; violences conjugales / envers les femmes: domestic violence / violence against women.
Note: Frequency per million, smoothing of 3.
Health, medicine, hygiene, demography: changing priorities over 250 years
Note: The frequency of the word démographie was multiplied by 10 to facilitate comparison. Frequency per million, smoothing of 3.
52Other topics, such as immigration, questions of national or religious identity, and “republican values”, have progressed spectacularly in French written output. Although outside the traditional domain of demography, these questions are now central concerns of demographic research.
3 – A measure of links between science and society, not of cutting-edge scientific progress
53These new centres of interest extend beyond the sphere of academia and concern society in general. The question of how scientific journals are treated by Ngram Viewer is raised by Bijak and his colleagues (2014). They suspect that the corpus includes the vocabulary of INED’s book series, but not that of English-language demographic journals. As we have seen, it is true that periodicals are omitted from Ngram Viewer. But why should this bias have increased in recent decades?
54This brings us back to the fundamental question. If the contention is that scientific terms are increasingly confined to highly specialized publications, i.e. journals and not book series, then the phenomenon can be interpreted differently: not as selection bias, but as an indicator of true isolation. For a scientific vocabulary incapable of reaching the visibility threshold set by Ngram Viewer, including at the smallest order of magnitude (one per 100 million), is very likely disconnected from mainstream written culture. This is a key property of Ngram Viewer: its purpose is not to track the frontiers of science, but to observe its capacity to penetrate written culture (in today’s jargon, it contributes to a “measure of impact”). So it does not serve to document the competition between scholars or schools of thought, but rather to shed light on the relation between science and society. Bijak et al. hoped that Ngram Viewer would confirm the superiority of certain methods (longitudinal and multilevel approaches, simulation modelling, etc.). But apart from the fact that these methods are not specific to demography, it is aberrant to use Ngram Viewer in this way, and the results are necessarily questionable. More appropriate bibliometric tools are available to analyse changes in the content of scientific journals.
4 – A general decline in demographic analysis in favour of more targeted approaches
55We can now make a more detailed diagnosis of the decline in usage of demographic vocabulary. It is undeniable for the technical expressions defined in dictionaries and treatises (Pressat, 1979; Caselli et al., 2001-2004; Meslé et al., 2011). We can confirm this by looking at some of the major demographic themes, though our comments will be brief.
56Interest in birth control and contraception has varied widely (Figure 15). Delayed by the arrival of the baby boom, then brought to prominence by technical progress, the militant upsurge was spectacular in the 1960s and peaked in the 1970s before rapidly losing ground. The curves of the various terms now seem to be levelling off. The focus has shifted towards the problems of sterility and infecundity, with a recent peak for assistance à la procréation (assisted reproductive technology) (Figure 16).
Limitation des naissances (birth control) and planning familial (family planning): a slow decline since the militant years
Translations: Planning familial: family planning; contrôle des naissances / limitation des naissances / régulation des naissances / prévention des naissances / planification des naissances: birth control.
Note: Frequency per million, smoothing of 3, all combinations of upper and lower case letters.
Infertilité (infecundity) and assistance médicale à la procréation (assisted reproductive technology)
Translations: Fécondation in vitro: in-vitro fertilization; assistance médicale à la procréation: assisted reproductive technology; infertilité: infertility; mère(s) porteuse(s): surrogate mother(s); ICSI: Intracytoplasmic sperm injection; CECOS: Centre d’étude et de conservation des œufs et du sperme (sperm and oocyte bank).
Note: Frequency per million, smoothing of 3. The expression assistance médicale à la procréation includes procréation médicalement assistée but not “PMA” (which also stands for pays les moins avancés, “least advanced countries”).
Démographie historique (Historical demography)
57Forged by Louis Henry, and carried by the vogue for “serial history”, démographie historique is now a bygone expression (Figure 17). Beyond registres paroissiaux (parish registers) and reconstitution des familles (family reconstitution), a new generation of demographic historians and economists now exploit an array of sources and methods whose purely demographic contours are difficult to identify using a lexical approach.
The vocabulary of historical demography
Translations: Registres paroissiaux: parish registers; démographie historique: historical demography; histoire sérielle: serial history; reconstitution des familles: family reconstitution.
Note: Frequency per million, smoothing of 3.
58While taux de mortalité (mortality rate) has been losing ground since the 1970s, the rise of espérance de vie (life expectancy) observed since the mid-twentieth century continues unabated (Héran, 2013). The study of mortality is in better shape than that of fertility. But causes de mort / causes de décès (causes of death), with its long history of emulation between French and English speakers, seems to have reached a plateau (Figure 18). The components of perinatal mortality, on the other hand, are new to the scene (Figure 19). The déterminants de la mortalité (mortality determinants) are of less interest than déterminants de la santé (health determinants) (Figure 20). Topics such as crise sanitaire (health crisis) or inégalités de santé (health inequalities) are gaining ground, extending beyond purely demographic research and prompting mortality specialists to develop more causal analyses relevant to public health and epidemiology.
Causes of death and alternative terms in French and British English
Note: Frequency per million, smoothing of 3.
New explorations of mortalité infantile (infant mortality): la mortalité périnatale (perinatal mortality)
Translations: Prématurité: prematurity; mortalité néonatale: neonatal mortality; mortalité périnatale: perinatal mortality; mortinatalité: stillbirth; mortalité post-néonatale: post-neonatal mortality.
Note: Frequency per million, smoothing of 5. Variants with dashes (such as néo-natale) are included.
Health inequalities, health determinants: a new challenge for mortality studies
Translations: Inégalités de santé: health inequalities; déterminants de la santé: health determinants; crise sanitaire: health crisis; déterminants de la mortalité: mortality determinants.
Note: Frequency per million, smoothing of 3.
Nuptiality: the end of the marriage model
59The notion of nuptialité (nuptiality) predates the development of démographie as a named discipline (Figure 21). It overtakes matrimonialité to occupy a leading position when démographie is reborn after the defeat of 1870. No-one doubted that nuptiality was the linchpin of reproduction, despite suspicions that marriage, in its turn, depended on rural prosperity and agricultural prices. After the marriage market disruption of the Great War, interest in nuptialité increased – along with a fear of declining birth rates and general rediscovery of demography – and maintained a fluctuating upward trend until the 1980s. This was followed by a spectacular drop, directly linked to the rise in non-marital cohabitation. The legal markers of demographic analysis became irrelevant: calculating fertility by marriage duration is now obsolete, as is the reference to naissances illégitimes (illegitimate births) (Figure 22). By contrast, we see an increase in the frequency of passage à l’âge adulte and its English equivalent “transition to adulthood” for which the demographic markers are no longer âge au premier mariage (age at first marriage) but âge au premier rapport (age at first intercourse), à la première naissance (at first birth) or au premier enfant (at first child) (Figure 23).
Rise and fall of nuptialité since 1875
Translations: Nuptialité: nuptiality; matrimonialité: marriage (approximate translation).
Note: Frequency per million, smoothing of 3 for nuptialité; no smoothing for matrimonialité.
Births outside wedlock: the end of illegitimacy
Note: Frequency per million, smoothing of 5.
Transition to adulthood: disappearance of first marriage
Translations: Âge au premier mariage: age at first marriage; âge à la première union: age at first union; âge au premier rapport: age at first intercourse; âge au premier enfant: age at first birth.
Note: Frequency per million, smoothing of 3. Âge au premier enfant / etc. also includes âge à la première naissance, âge à la première maternité and âge au premier accouchement (age at first birth / childbearing / delivery).
60The shake-up in the marital order is followed by a similar upheaval in the sexual order. The PACS civil partnership was a topic of heated debate in 1998 before being adopted by the French parliament in November 1999. But as interest in the diversity of sexual orientations increased (Figure 24), the polemic did not end there. As yet, however, the corpus ends in 2008, before the debate about mariage pour tous (marriage for everyone) got under way in France.
Sexual orientations
Note: Frequency per million, smoothing of 3. The “all inflections” option sums the singular/plural, feminine/masculine, noun/adjective forms.
61The growing presence of international migration in written documents covered by Ngram Viewer deserves a study of its own. Our analysis here will be limited to a few key points. The technical vocabulary of migration, such as solde migratoire (net migration), follows a pattern similar to that observed for the other demographic themes, with a peak in the 1970s followed by a downturn. The opposite is true for a series of expressions that reflect the content of heated public debate. They concern the scale of flows (migration de masse [mass migration], vague migratoire [migration wave]), control of migration (contrôle des frontières [border control], carte d’identité [identity card], titre de séjour [residence permit], droit de séjour [right of residence], demandeurs d’asile [asylum seekers]), migrants’ links with religion (communautarisme [multiculturalism], laïcité [secularism], école de la République [Republican school], Islamisme [Islamism], Islam, etc.), the question of origins (origine étrangère [foreign origin], identité de la France [French identity], statistiques ethniques [ethnic statistics]), and values of social cohesion (lien social [social bond], État de droit [rule of law], valeurs communes [shared values], devoir de mémoire [duty to remember]). Most of these expressions came into common usage in the 1980s and 1990s, as political debate intensified.
62All this was already known or suspected. But Ngram Viewer also sheds a harsh light on the values meant to restore social cohesion in a context of failed migrant integration: valeurs républicaines (republican values), école républicaine (Republican school), laïcité (secularism), droits des femmes (women’s rights), droits de l’homme (human rights), respect de la dignité humaine (respect for human dignity), etc. These expressions are presented as the heritage of our ancestors, as a treasure handed down continuously since the French Revolution or the Third Republic. But in truth, the earlier generations had kept them on a back burner. These values so cherished today were brought in from outside – such as droits de l’homme after the defeat of Nazism – or created internally, as shown by the unprecedented upsurge in expressions such as École républicaine or École de la République (Figure 25). They have never been more popular than today, with proportionally 20 times more occurrences than in the times of Jules Ferry. These values do not take hold by themselves – they must be reinvented. Inculcating these values in the “newcomers” to society – not only children but also migrants (as now required by law) – calls for a more innovative and demanding educational approach than a simple history lesson.
Reference values for integration: inherited, borrowed and invented terms
Translations: Droits de l’homme: human rights; École républicaine / École de la République: Republican school.
Note: Frequency per million, smoothing of 3.
63History does play a role, but not always in the way we imagine. Ngram Viewer highlights the decisive role of the Great War in the introduction of identity papers, an emergency measure at the time that became a permanent aspect of French life (Figure 26). It also confirms that while focus has been placed on the “integration” of foreigners or immigrants in France, their “assimilation” has never been a strong priority (Héran, 2013). Use of the word increased on the eve of the Great War, then during the crisis of the 1930s and just before the start of decolonization, but with a very limited uptake each time (Figure 27). It is only retrospectively that a model of assimilation can be applied to earlier generations of immigrants. The concept of intégration des immigrés (immigrant integration) follows a very different trajectory. Brought into use by national and European policies from the 1980s, its usage is still increasing today. In French-language publications, intégration was 20 times more frequent than assimilation in 2008. Exploration of English vocabulary (Héran, 2013), by contrast, shows that “cultural assimilation” and “cultural integration” are practically interchangeable, facilitated by “hyphenated identities” (such as Korean-American), including for the “assimilated” population.
A lasting innovation of the First World War: the identity card
Translations: Carte (nationale) d’identité: (national) identity card; permis de séjour: residence permit; permis de travail: work permit; contrôle des frontières: border controls.
Note: Frequency per million, smoothing of 3. C/carte: sum of Carte + carte.
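Several figure notes mention that the curves add up variant written forms of the same expression (upper and lower case, spelling variants, inflections). A minimal sketch of that summation is given below; the function name `sum_variants` and the frequencies are hypothetical and serve only to illustrate the principle. Adding the curves is legitimate because every form is expressed per million ngrams of the same yearly total.

```python
from collections import defaultdict


def sum_variants(series_by_form, variants):
    """Add up the yearly frequencies of several written forms into one curve.

    series_by_form maps each written form to a {year: frequency} dictionary;
    a form that is absent in a given year simply contributes nothing.
    """
    combined = defaultdict(float)
    for form in variants:
        for year, freq in series_by_form.get(form, {}).items():
            combined[year] += freq
    return dict(sorted(combined.items()))


# Hypothetical frequencies per million (not taken from Ngram Viewer).
series = {
    "Carte d'identité": {1920: 0.5, 1921: 0.75},
    "carte d'identité": {1920: 1.5, 1921: 2.0, 1922: 2.5},
}

combined = sum_variants(series, ["Carte d'identité", "carte d'identité"])
print(combined)  # {1920: 2.0, 1921: 2.75, 1922: 2.5}
```

The summation should be done on the raw yearly frequencies, before any smoothing, so that the combined curve is smoothed once rather than form by form at the edges of each series.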
Contrast between assimilation and intégration
Translations: Intégration des (im)migrant(e)s: integration of immigrants; intégration des étrangers: integration of foreigners; assimilation des (im)migrant(e)s: assimilation of immigrants; assimilation des étrangers: assimilation of foreigners; assimilation des indigènes: assimilation of indigenous peoples.
Note: Frequency per million, smoothing of 3.
5 – International comparisons are difficult
64Demography is a social science which claims to be universal, and multilingual thesauri, harmonized by the United Nations, have been developed over the years. Ngram Viewer, for its part, enables us to explore vocabulary changes simultaneously in several languages. Experience shows, however, that apparent equivalences across languages can be misleading.
65For example, the term pyramide des âges is widely used in French, while “age pyramid” is invisible in Ngram Viewer and “population pyramid” infrequent. English speakers are not tempted by architectural metaphors; they prefer “age structure” or “age distribution”. While French-speaking demographers do use the equivalent terms of structure par âge(s) and répartition par âge(s), they tend to be reserved for technical usage. From one language to another, “equivalent” expressions vary in frequency and register, making comparisons difficult.
66While the pyramide des âges has stood the test of time, French expressions relating to age distribution have been losing ground since the 1980s, as is the case in both British and American English (see above, Figure 12). In all three Ngram Viewer corpora, the vocabulary of demographic analysis has declined rapidly over the last three decades. In American English, this decline is accompanied by a shift towards generational marketing. References to the “baby boomer” category skyrocketed in the 1980s and 1990s, to the point where “demography” has now been overtaken by “demographics”, the activity of selling local population data in the global “big data” market. A similar shift is observed in British English, but on a smaller scale (Figure 28). Will French demography follow suit? Is it possible to imagine “big data” replacing civil registration, the census and sample surveys with the same level of reliability?
“Demographics” overtakes “demography” in American English, and is catching up in British English
Note: Frequency per million, smoothing of 3.
Conclusion: keeping demography alive through greater openness
67The decline of demography as a discipline is by no means a new concern. The question was raised some 20 years ago by Jean-Claude Chasteland and Louis Roussel at the end of their careers. The findings of their online survey (Chasteland and Roussel, 1997) remain valid today. Centred on the canonical concepts of demography, the lexical records confirm a definite waning of interest, notably linked to the “de-institutionalization” of lifestyles as pointed out by Roussel. But analysed with greater finesse, they show something different. It is only the narrow approach to demography, confined to scientific journals and unwilling to broaden its horizons, that is faced with extinction.
68INED has a role to play in this process of outreach to other related disciplines. It has a strong presence in the written culture explored by Ngram Viewer, comparable to that of other research bodies of a similar age but much larger in size, such as INSERM, INRA, IRD or CEA (Figure 29), not counting the CNRS, which is of a different order of magnitude. Three factors may account for this exceptional level of visibility.
Visibility of INED and some other French research organizations in the last ten years of the corpus, 1999-2008
Note: Frequency per million, all combinations of upper and lower case letters, average over 10 years. INRA: Institut national de la recherche agronomique (National Institute for Agronomic Research); CEA: Commissariat à l’énergie atomique (Atomic Energy Commission); INSERM: Institut national de la santé et de la recherche médicale (National Institute for Health and Medical Research); IRD: Institut de recherche pour le développement (Institute of development-oriented research).
69First, the presence of other organizations in written culture may be declining because they have focused their output on more specialized, even esoteric, questions that are of limited interest to a more general readership.
70Second, INED, which is itself affected by this tendency, has succeeded in limiting the impact of that tendency through a broad-based approach to questions of society. Multidisciplinary from the outset (demography, history, psychosociology), INED has extended its reach (sociology, economics, geography, gender studies, public health), without abandoning its “core competencies”. Its surveys bear witness to this. Conducted in partnership with other organizations, and increasingly with university researchers, they are still firmly anchored in the spectrum of fertility, mortality and migration, but now tie in with topical or sensitive social issues such as non-marital cohabitation, outcomes of children born outside marriage, assisted reproductive technologies, abortion, sexuality, female genital mutilation, domestic violence, disability, adoption, homelessness, discrimination, end-of-life medical decisions, etc. (Héran, 2015).
71The third factor that may explain INED’s visibility is its adaptability, manifested in three successive phases since its creation.
72The first phase was that of official pro-natalism, formally laid down in the statutes but immediately reinterpreted by the creativity of the pioneers recruited by Sauvy: Louis Henry, Jean Bourgeois-Pichat, Pierre Depoid and Paul Vincent, along with a pioneer of survey methods, Jean Stoetzel, and a historian, Louis Chevalier.
73From the mid-1960s to the late 1970s, these advances were formalized with the aim of establishing the discipline’s autonomy and making it easier to teach. Demographic analysis (illustrated by Pressat’s manuals) and formal demography became the cornerstone of demographic science.
74The third phase, inaugurated in the 1980s, was marked by a movement away from pro-natalist objectives and from references to the “legitimacy” of unions and births. This went hand in hand with renewed interest in the explanatory statistics of economics and the social sciences, a historical and sociological critique of categories, and the development of both quantitative and qualitative surveys on social questions, with greater focus on the inequality, discrimination and violence that threaten social cohesion – not forgetting the difficult but necessary organization of field surveys in Southern countries.
75INED was not the only player involved. The demographers of IRD and of French universities also played a role. A similar process took place in other countries. For anyone interested in the future of demography, the lexical data of Ngram Viewer provide the necessary historical and critical hindsight. They suggest that if demography becomes hemmed in by the system of publication specific to the “hard sciences”, it runs the risk of becoming isolated from society and culture. We invite readers to look at the curve of “publish or perish” in the French and English corpora of Ngram Viewer: this injunction may itself become perishable if it prevents us from seeing that there is much to life beyond the “impact factor”. To strengthen its fragile tie with the world in which we live, be it national or transnational, demography must remain attentive to social issues and reach out to sister disciplines. Only in this way, I believe, will it survive the twenty-first century.
The English translation of this article is slightly shorter than the original French version. Certain paragraphs of limited interest for non-French speakers have been removed, and others adapted for English-speaking readers.
Director of research, INED.
Correspondence: François Héran, Institut national d’études démographiques, 133 boulevard Davout, 75980 Paris Cedex 20, email: firstname.lastname@example.org
“Our ultimate goal is to work with publishers and libraries to create a comprehensive, searchable, virtual card catalog of all books in all languages that helps users discover new books and publishers discover new readers” <https://books.google.com/googlebooks/library/index.html> (consulted 4 July 2015).
This figure was obtained by compiling and analysing the catalogues of the entire world. The counting unit is the single “volume” identified by an ISBN number, corrected for duplicates and excluding maps, posters, microfilms and audio books. In total, and without the periodicals, an estimated 135 million volumes have been printed and conserved since Gutenberg, 165 million if periodicals are included (Taycher, 2010). For the sake of simplicity, the terms “book” or “document” will be used as acceptable equivalents of a “volume”.
The decade 1740-1749 also saw the publication of the 3rd edition of the dictionary of the Académie française, the 5th edition of the Dictionnaire de Trévoux and the 8th edition of Bayle’s dictionary (whose typeface is more legible). Diderot and d’Alembert’s Encyclopédie was published between 1751 and 1772.
Take the expression “age-specific fertility rate” which features in the digitized documents. Each year, Ngram stores in a series of separate tables the five words forming the expression (“age”, “-”, “specific”, “fertility”, “rate”), the four bigrams (“age-”, “-specific”, “specific fertility”, “fertility rate”), the three trigrams (“age-specific”, “-specific fertility”, “specific fertility rate”), the two tetragrams (“age-specific fertility”, “-specific fertility rate”) and the single pentagram (the complete expression), representing a cumulative sum of S5 = 15 columns of data.
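The count of 15 can be checked with a short sketch that enumerates every run of consecutive tokens in the expression. It assumes the tokenization given in this note (the hyphen counted as a separate token). The function name `ngrams` is illustrative and is not part of any Google tool.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tokenization as described in the note: the hyphen is a token in its own right.
tokens = ["age", "-", "specific", "fertility", "rate"]

# Number of n-grams of each length, from single words (n = 1) to the full expression (n = 5).
counts = {n: len(ngrams(tokens, n)) for n in range(1, len(tokens) + 1)}
total = sum(counts.values())

print(counts)  # {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
print(total)   # 15
```

More generally, an expression of k tokens yields k + (k − 1) + … + 1 = k(k + 1)/2 series, which gives 15 for k = 5.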
The figures can be downloaded from the “var data” line of the query page source code. Users can simply copy this line into a spreadsheet with a column for each year.
Same order of magnitude as parts per million (ppm) used to measure dilution.
Queries can be written “France, (Alsace*40)”, “(France / 40), Alsace”, “Alsace / France” or “Alsace / (France + Alsace)”.
Copyright holders (authors in the English-speaking world, publishers in the French-speaking world) have lodged complaints against Google for digitizing documents without their authorization. The choice of “opt-out” (Google can digitize anything so long as the copyright holders do not complain) or “opt-in” (prior consent required) is still the subject of legal debate. The problem is most acute for the vast grey area of books which are still under copyright but no longer available for sale. See the instructive summary by Benhamou (2014, pp. 90-92).
These links do not give the immediate context of the expressions counted in Ngram Viewer, but the titles of the digitized books where they are found, with variable access to the full text. The search function does not follow the rules of Ngram Viewer, but those of Google Search: it is case-insensitive, expressions may straddle two sentences, approximate spellings are corrected automatically, etc.
Only extermination was used more in 1945-1947 than in 1915-1918. By contrast, it was above all during the Great War that witnesses and contemporary commentators used the word holocauste(s). This religious term belongs to formal speech and refers to the total sacrifice of livestock in the Bible. Capitalized and written in the singular, Holocauste is a recent addition to French vocabulary. It was not until 1978 that it became widely used in reference to the extermination of Jews in Europe by the Nazis. Since the documentary masterpiece of that name by Claude Lanzmann (1985), the term Shoa(h), the official Hebrew word for this event, has become widespread in France, but it remains rare in the English-speaking world where “Holocaust” still predominates.
Pressat’s dictionary has a more normative ambition than the two other publications.
The figure is difficult to plot, given the multitude of ways in which the names of the various establishments are presented (acronyms with or without full stops between letters, with or without capitals, full names containing more than five words, etc.). For this reason, we have limited the search to the last ten years, when acronyms predominate.