As the leading preventable cause of death in France, smoking has been one of the main targets of public health policies. The prevalence of tobacco consumption depends on new ‘entries’ (initiation) and ‘exits’ (cessation). What survival benefits can be expected in the medium term from a decrease in the flow of new smokers and an increase in that of ex-smokers? This paper provides a model of their respective effects on smoking-attributable mortality, which can be used to guide health strategies.
Smoking-attributable mortality (SAM) indicates the number of deaths caused by smoking and forms a large part of the public debate around tobacco consumption. In addition to highlighting the urgency of policy action, it also helps evaluate the effectiveness of policies that fight smoking (US Department of Health and Human Services [USDHHS], 2014). Preventing the initiation of smoking and encouraging its cessation require different interventions, and public health officials must make critical decisions on how to allocate limited resources between them. For this reason, we seek to help policymakers assess the cost-effectiveness of various interventions by simulating here the separate effects on SAM of changes in initiation and cessation, thus helping to establish the optimal investment of public resources from a public health perspective.
The standard method used to estimate SAM (Pérez-Rios and Montes, 2008; Tachfouti et al., 2014) unfortunately does not lend itself to simulating what-if scenarios. Instead, it uses two sets of readily observable data (the total number of deaths from smoking-related causes and smoking prevalence) and combines these observations with the share of deaths from each cause that can be attributed to smoking. The contribution of a specific risk factor (such as smoking) to a disease (such as lung cancer) is quantified using so-called ‘population attributable fractions’ (PAFs). [1] Estimating them is data-demanding, as it requires estimating relative risks, which in our case are the mortality risks of smokers versus non-smokers. Estimates of mortality attributed to smoking conducted in different countries therefore all employ the same PAFs, estimated from one major epidemiological study conducted in the United States (Cancer Prevention Study II [CPS-II]). The main issue with the standard method is that it calculates the PAFs for a specific population of smokers with specific smoking histories, i.e. specific ages at initiation and cessation and specific smoking intensity by age. Because smoking history summarizes the true exposure to risk better than the current status of ‘smoker’ or ‘non-smoker’, nothing guarantees that the same PAFs apply to a population with a different average smoking history.
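As a point of reference, the attributable fraction for a single exposure level can be illustrated with Levin's classic formula, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is prevalence and RR the relative risk. The sketch below is only a minimal illustration: the prevalence, relative risk, and death count are hypothetical values, and the standard method itself distinguishes several exposure categories by age and sex.

```python
def levin_paf(prevalence: float, relative_risk: float) -> float:
    """Population attributable fraction for one exposure level (Levin's formula)."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical inputs: 30% smokers and a relative risk of 10 for the cause of death.
paf = levin_paf(prevalence=0.30, relative_risk=10.0)

# Deaths attributed to smoking out of 1,000 hypothetical deaths from that cause.
attributable_deaths = paf * 1000
print(round(paf, 3), round(attributable_deaths))
```

The formula depends only on the share of smokers and one relative risk, which is why it cannot distinguish populations whose smokers have different durations or intensities.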
Observed prevalence concerns a stock of smokers at one point in time, but the same prevalence can result from very different average durations in status, e.g. 100% prevalence from ages 15 to 29 and 0% afterwards, compared to 33% prevalence from 15 to 59. This poses two problems. First, accuracy suffers when a given population’s prevalence (from which SAM is calculated) results from smoking histories that differ from those of the cohort for which relative risks are estimated. This is why the standard method uses simulated rather than observed prevalence: the simulated prevalence is made compatible with the number of deaths resulting from lung cancer, as observed in the population for which SAM is estimated. For this reason, it is called the ‘indirect approach’ (Peto et al., 1992). Lung cancer is chosen to generate adjusted smoking prevalence because the PAF for this disease does not vary much across populations (see Supplementary Material A). This study is concerned with the second problem, namely that the method accounts only for prevalence, not for changes in duration and intensity, when running scenarios to project the consequences of policies.
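The stock-versus-duration point can be checked with a few lines of arithmetic. The sketch below uses the two stylized profiles from the example above; the function name and the use of single-year ages are illustrative assumptions.

```python
def smoker_years(prevalence: float, first_age: int, last_age: int) -> float:
    """Expected years spent smoking per cohort member over an inclusive age range."""
    return prevalence * (last_age - first_age + 1)

# Profile A: everyone smokes from 15 to 29 (15 years each), nobody afterwards.
short_universal = smoker_years(1.0, 15, 29)

# Profile B: one third of the cohort smokes from 15 to 59 (45 years each).
long_concentrated = smoker_years(1 / 3, 15, 59)

print(short_universal, round(long_concentrated, 6))
```

Both profiles yield the same number of smoker-years per cohort member, and hence the same average prevalence over ages 15 to 59, yet the duration per smoker is 15 years in one case and 45 in the other.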
To simulate the relationship between policies and SAM, we propose here a method that links SAM to smoking duration and intensity instead of prevalence. Our alternative method uses the same relative risks used to derive PAFs in the standard method, but in a very different way. Its main innovation is that it allows the analyst to run what-if scenarios to discover what the effect of changes in initiation rates would be relative to changes in cessation rates. [2] Although applying relative risks estimated for a population of smokers in the United States to smokers in France may appear to reproduce the flaw highlighted in the standard method, the relative risks used in our method are conditioned on behaviours and therefore closer to biological parameters, which makes them much less susceptible to differences in individual smoking histories between the United States and France. Our method is thus less vulnerable to transposition than the standard method. Section I introduces our alternative method and the motivation for it through a literature review of mortality risk estimates based on duration (of smoking and abstinence) and intensity. Using these estimates of risk, Section II presents the method and data used. Section III applies the method to simulate various scenarios in which we see the effects on SAM for lung cancer in France as a result of changes in initiation and cessation rates. The final section presents our conclusions, describes our assumptions, and highlights the limitations of our estimates.
I – An alternative method based on risks according to smoking duration and intensity
1 – How mortality risk varies with duration and intensity
The seminal study on smoking’s effect on health by Doll and Peto (1978) followed a cohort of British doctors for 20 years and modelled the effect of duration (proxied by age) and intensity (number of cigarettes per day) on the incidence of lung cancer, showing a strong duration effect and a somewhat smaller intensity effect. The authors concluded that epidemiological data confirm the biology of cancer developing cumulatively and in discrete stages. Their additional finding that smoking duration is a key factor in lung cancer incidence has been reproduced in other populations (Flanders et al., 2003; Rachet et al., 2004; Zhang et al., 2005) after controlling for potential confounders (such as body mass index, physical exercise, education) and using mortality rather than incidence as the outcome variable. Flanders et al. (2003) used CPS-II data to show that the risk of dying from lung cancer increases by a factor of 9 to 14 for males and 4 to 8 for females when smoking duration increases from 20 to 30 years.
Epidemiological studies also demonstrate that quitting smoking at younger ages promptly reduces the risk of dying from lung cancer. Zhang et al. (2005) used a large population of middle-aged Canadian women to show that the hazard ratio of developing lung cancer decreases with earlier cessation when controlling for smoking duration and intensity. For instance, quitting before age 30 reduces the risk to the level of never-smokers. Knoke et al. (2008) used CPS-I to show that cessation had a strong effect on lung cancer mortality, reducing the risk by half after 5 years if the smoker quits at 40 (after 10 years if they quit later) and by 90% after 15–20 years. Their study controls for the ‘quitting ill’ effect, in which some smokers quit after learning they have respiratory disease. Not controlling for this effect would overstate the relative risk of quitters.
The evidence is therefore clear that lung cancer mortality is influenced by the duration of smoking and abstinence, and to a lesser extent by smoking intensity. Less is known about other smoking-related diseases, such as other cancers, chronic obstructive pulmonary disease (COPD), or cardiovascular diseases. In a meta-analysis of smoking’s effect on colorectal cancer, Liang et al. (2009) found a strong effect of smoking duration on both incidence and mortality, but they did not report on any cessation effect. Based on a meta-analysis, Forey et al. (2011) observed a higher incidence of COPD, chronic bronchitis, and emphysema among current smokers, with a strong effect of smoking intensity but no clear duration effect. Streppel et al. (2007) reported their results from a longitudinal study on smoking’s association with mortality (all-cause and cause-specific), which was conducted on men living in the Netherlands, born between 1900 and 1920, and followed up from 1960 to 2000 (Zutphen Cohort Study; Keys, 1970). Controlling for competing risks, they found that 1 more year of smoking increases the hazard of all-cause mortality by 1.2%, and one more cigarette smoked per day increases it by 1.1%. Duration is the only driver of mortality for cardiovascular diseases, COPD, and lung cancer: smoking until 40 and quitting at that age means a loss of 1.9 years of life compared to a never-smoker, but it implies a gain of 1.3 years compared to a smoker who quits at 50, and of 1.8 years over a smoker who quits at 60. The evidence on the effects of cessation for the US population is summarized in the 1990 report by the US Surgeon General, who found that after 15 years of cessation, all-cause mortality of ex-smokers relative to never-smokers is 1.0 for former light smokers (fewer than 20 cigarettes per day) and between 1.1 and 1.4 for former heavy smokers (more than 20 cigarettes per day) (USDHHS, 1990). [3] This review of the empirical literature establishes the role that duration and intensity play in lung cancer mortality and COPD. Overall, these two causes account for around 60% of smoking-related deaths and provide the best evidence for a causal link between smoking and mortality (PAFs of 0.82 and 0.79, respectively) (Godtfredsen et al., 2008; USDHHS, 2014; Ribassin-Majed and Hill, 2015; Bonaldi et al., 2019). Furthermore, they have been shown to be highly sensitive to duration. This confirms that prevalence is not a sufficient statistic for risk exposure and justifies the choice of a duration-based approach.
It is therefore worth exploring the use of risks according to duration and intensity, in combination with data on duration and intensity, to estimate and simulate the effects of changes in prevalence on SAM. Our new method is calibrated to reproduce the actual number of deaths observed in reference epidemiological sources, which then allows us to run simulations based on changes in initiation and cessation behaviours.
2 – Exposure as a function of duration (flows) rather than prevalence (stock)
The standard method has a considerable practical advantage in that it uses highly reliable data (number of deaths by cause and smoking prevalence in a given year). Its main weakness, however, is that it relies on the strong assumption that prevalence accurately represents exposure to risk (and can replace duration and intensity).
Our alternative method therefore draws on the well-established ideas in the literature that the risk of dying from smoking depends on intensity and on the duration of both smoking and cessation, and that prevalence does not adequately describe duration and intensity. For these reasons, changes in prevalence can alter relative risks, and it is impossible to use the standard method to run scenarios for the effects of these changes on SAM. To illustrate, let us assume that smoking prevalence falls in the 35–54 age group due to an increase in that cohort’s proportion of smokers who quit at age 45. If the relative risk of dying is driven mostly by current smokers older than 45, the relative risk of the age category would decline. However, the standard method will keep the relative risk of the age category unchanged and only account for a decline in prevalence.
This weakness has already been identified and acknowledged by the proponents of the standard method. According to the US Surgeon General report for 2004, ‘The burden of disease attributable to smoking is driven by those with long-term previous exposures, so unless smoking cessation among current smokers increases quite rapidly, SAM is not expected to decline substantially for many years’ (USDHHS, 2004). And, in the report for 2014, ‘Unless smoking behaviour (including cessation) is stable over time, cross-sectional SAM estimates do not accurately reflect the risks of past cohorts of smokers’ (USDHHS, 2014). This weakness is also the main reason why estimates of SAM outside the United States do not combine relative risks with in-country observations of prevalence but instead use a simulated prevalence based on a given country’s age-related smoking behaviours. In other words, the simulated prevalence is chosen to be compatible with observed mortality from lung cancer (Ribassin-Majed and Hill, 2015), in keeping with the indirect method, also known as the smoking-impact ratio or Peto–Lopez method (Peto et al., 1992; Peto et al., 1994; Ezzati and Lopez, 2003). This estimated prevalence [4] is then used to calculate SAM for all other causes of death. A more refined but conceptually similar way to use lung cancer mortality as a proxy for smoking history is to regress all-cause mortality rates on lung cancer mortality (and other factors), specifically by using the lung cancer mortality difference between smokers and non-smokers to indicate the damage from smoking and arrive at a point estimate of SAM (Preston et al., 2010; Rostron, 2010; Rostron and Wilmoth, 2011; Janssen et al., 2013).
Using lung cancer mortality to reflect the extent of the smoking epidemic effectively produces a point estimate of SAM for one country under the assumption that most lung cancer deaths are due to smoking, as substantiated by a PAF of 0.82 (USDHHS, 2014). However, this approach cannot generate predictions or what-if scenarios to assist public policies. A better, albeit more demanding, approach combines mortality risks according to durations of smoking and abstinence and intensity (estimated using CPS-I data) with observed duration and intensity in the population of interest. Such a combination would allow researchers and policymakers to discuss and assess the effectiveness and costs of various interventions for reducing SAM.
Our alternative method works as follows. For a given year, we know the proportions of current smokers, never-smokers, and ex-smokers in a given age–sex category. Additionally, we know the following about current and ex-smokers: their average smoking duration, average duration since quitting, and average smoking intensity. All these data are estimated from reconstructed smoking histories based on pseudo-cohorts in France, as explained in Section II.1. We then use the literature’s reported relative risks by age, sex, smoking duration, cessation duration, and intensity (Flanders et al., 2003; Knoke et al., 2008) to calculate for each age–sex category the probability of dying from smoking.
II – Data and method
We apply our method strictly to lung cancer mortality because the reported and estimated risks according to duration and intensity for that specific cause of death are highly reliable. [5] While our method is sound in principle, it requires specific parameters that are crucial to its implementation (relative risks by duration and intensity, see Section II.2). In our two-step process, we first reconstruct data on smoking histories to retrieve smoking duration and abstinence as well as average intensity for the French population in a reference year (2010). By applying these values (duration and intensity) to a set of lung cancer mortality risks drawn from the epidemiological literature, we generate an estimate of SAM for lung cancer in France for 2010 and thus calibrate our parameters. The second step uses these calibrated parameters and duration data first to run a projection of SAM in France up to the year 2060 and then to produce a set of simulations based on what-if scenarios to assess the impact of initiation reductions and increased cessation rates on SAM in France over the next 50 years.
1 – Reconstructing smoking histories in France
Constructing life course prevalence rates for various pseudo-cohorts
Unlike prevalence, duration distribution data are not published regularly and must be extracted from general population surveys. Thus, we use answers on current smoking status in a series of repeated cross-sections to generate duration distributions for several birth cohorts. [6] We discuss in our conclusion why we chose a pseudo-cohort method rather than using retrospective individual data on initiation and cessation. To construct our pseudo-cohorts, we pool eight cross-sectional surveys on health-related behaviours (1977 to 2010), conducted by the French national institute for health education and promotion (INPES, see Beck et al., 2007) and representative of the French population; they are described in Table 1. These were called ‘Enquêtes CFES’ from 1977 to 1990 and have been known as ‘Baromètre Santé’ since 1992. These sources were preferred over other health-related surveys with larger samples (especially Enquête Santé, INSEE), mostly because the questionnaire and definition of smoking status have been consistent over the years.
Table 1. Surveys used to reconstruct prevalence for various cohorts in France
We were unable to access the raw data for the older waves of the survey (1977, 1981, and to a lesser extent 1986) and had to use published tables for these waves, which resulted in some inconsistencies across waves in smoking status definitions, smoking prevalence measures, and age categories. [7] Questions on smoking behaviours are similar across all waves of the Baromètre Santé, except for answer categories, in which some but not all waves distinguish between regular and occasional smokers. We could not use that information accurately and pooled all smokers into a single category for all waves. The main issue with consistency concerns the sample age range (minimum changing from 12 to 18 and maximum from 75 to 85) and age categories. Consistent age categories were reconstructed mostly by interpolation. Since the results published for the survey’s first waves are not weighted, we use unweighted data for all surveys to estimate smoking prevalence rates. For the surveys providing raw data, we compared prevalence rates based on weighted and unweighted data and concluded that there were no meaningful differences; i.e. within a given age–sex category, smokers have the same response rate as non-smokers. These data allow us to create a consistent collection of smoking prevalence rates by sex and age group over a span of 35 years (from 1975 to 2010) and to recreate age profiles of prevalence for 5-year birth cohorts, starting with the cohort born in 1936–1940. Because the surveys we pool are not conducted 1 year apart but rather every 5 years, we use the median age of intervals (e.g. 22.5 years for the age category 20–24) to interpolate our set of discrete measures (survey years and age categories) into continuous duration values.
Reconstructing smoking duration and abstinence for various pseudo-cohorts
Based on these prevalence rates by age for various pseudo-cohorts, we reconstruct the smoking and abstinence duration distributions for seven birth cohorts, for which we have at least one measurement point before and after the pivotal age of 30, namely cohorts born between 1946–1950 and 1976–1980.
It is essential here to set a constant average age of initiation for all cohorts. We chose 17.5 years in our reference scenario, as this is the observed age of initiation for French smokers (INPES, 2014). Published evidence shows that the age of initiation is highly concentrated between 15 and 20, with fewer than 5% of smokers having started before 15 and fewer than 15% starting after 20 (Forster and Jones, 2001, for England; López-Nicolás, 2002, for Spain; Legleye et al., 2011, for France [8]). For current smokers (those who report smoking in 2010, the year for which we calibrate our exercise), the duration of smoking status is simply their age (median age of their age category) minus 17.5, the average age of initiation. For instance, current smokers aged 45–49 in 2010 will be attributed a smoking duration of 30 years.
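The duration rule for current smokers amounts to a one-line calculation. The sketch below reproduces the 45–49 example; the function names are illustrative, and the median of a five-year age category is taken as its lower bound plus 2.5, consistent with the value of 22.5 used for ages 20–24.

```python
AGE_AT_INITIATION = 17.5  # reference-scenario value stated in the text

def median_age(lower: int, upper: int) -> float:
    """Median age of an inclusive age category, e.g. 22.5 for 20-24."""
    return (lower + upper + 1) / 2

def smoking_duration(lower: int, upper: int,
                     age_at_initiation: float = AGE_AT_INITIATION) -> float:
    """Years smoked by a current smoker in the given age category."""
    return median_age(lower, upper) - age_at_initiation

print(smoking_duration(45, 49))  # current smokers aged 45-49 in 2010
```

Passing a different `age_at_initiation` is how one would reproduce the sensitivity checks on age of initiation reported in the results.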
For ex-smokers in 2010, we calculate the age distribution at smoking cessation for a given pseudo-cohort and by sex, based on the same cohort’s difference in prevalence rates between two consecutive waves of the survey. Cessation duration is the difference between their age in 2010 and the estimated age at cessation. We illustrate this with the 1951–1955 cohort, aged 55–59 in 2010. If the prevalence rate of those aged 40–44 in 1995 is P1 and that of 45–49-year-olds in 2000 is P2 (< P1), the proportion of quitters for the 1951–1955 cohort aged between 40–44 and 45–49 is calculated as the difference (P1 − P2). Since they were smokers at ages 40–44 and no longer at ages 45–49, we assigned them an average age at cessation of 45. The average duration of abstinence for this share (P1 − P2) of the 1951–1955 cohort in 2010 is then 12.5 years, meaning 57.5 (age in 2010) minus 45 (age at cessation).
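The pseudo-cohort calculation for ex-smokers can be sketched as follows, using the 1951–1955 cohort as the worked case. The prevalence values P1 and P2 are hypothetical, and clamping a negative difference to zero is an assumption of this sketch rather than a rule stated in the text.

```python
def quitters_between_waves(p_earlier: float, p_later: float,
                           age_at_cessation: float, age_in_2010: float):
    """Share of the cohort that quit between two waves, and its abstinence duration.

    Assumption of this sketch: if prevalence rises between waves, the share is set to 0.
    """
    share = max(p_earlier - p_later, 0.0)
    abstinence_years = age_in_2010 - age_at_cessation
    return share, abstinence_years

# Hypothetical prevalence for the 1951-1955 cohort: P1 at 40-44 (1995), P2 at 45-49 (2000).
P1, P2 = 0.40, 0.35
share, abstinence = quitters_between_waves(P1, P2, age_at_cessation=45, age_in_2010=57.5)
print(round(share, 2), abstinence)
```

Repeating this for each pair of consecutive waves gives, for one cohort, a distribution of shares by duration of abstinence.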
Because prevalence rates are not significantly different at ages 20–24 and 25–29 for most cohorts, we assume the quit rates before age 30 are 0 for all cohorts. This is corroborated by Legleye et al. (2011), who showed that fewer than 10% of smokers quit before age 30.
By reproducing this calculation and aggregating these durations by age and cohort across all cohorts alive in 2010, we can produce the cross-sectional duration distributions in smoker and ex-smoker statuses.
Assumption about smoking intensity
The same data allow us to reconstruct the distribution of the average number of cigarettes smoked per day for various cohorts (intensity). Pasquereau et al. (2018) published evidence showing that the distribution of the number of cigarettes smoked per day does not concentrate around a single number but instead spreads equally across possible values of consumption. A value of ‘15 cigarettes’ is both close to the true average and near the middle of the distribution (depending on the cohort). For simplicity, in the present example we apply a uniform intensity of 15 cigarettes per day to all cohorts and age categories, which is the empirical average for most cohorts of French smokers. Treating this as our reference scenario, we test the sensitivity of our estimates and projections to average intensity and discuss alternative options in our conclusion. As discussed below, if intensity correlates with age of initiation or cessation, this assumption of a universal intensity of 15 can have implications for the results of our simulations.
2 – Estimating mortality risk according to duration and intensity
Estimates of current smokers’ lung cancer mortality risks according to smoking duration and intensity are derived from Flanders et al. (2003), who express the absolute risk of lung cancer as an exponential function of duration (D) and intensity (I) with parameters (α, β and γ) that vary by sex (s) and age category (x):
These are complemented by estimates of lung cancer mortality risks according to cessation duration provided by Knoke et al. (2008), who estimate a decrease (denoted as f) in excess risk of lung cancer for ex-smokers compared to continuing smokers as a function of age at cessation (Ac) and the time since cessation (Tc): [9]
Because the probability in Equation 2 is relative to that of an individual with the same characteristics (age and sex) who would have smoked and never quit, we use the probabilities generated by Equation 1 as the baseline and apply the estimated decrease parameter (f) to that number.
Both estimates apply only to people aged 40–79, and the result we generate is thus a lung cancer SAM restricted to the population aged 40–79. This is a reasonable approximation, since almost no lung cancer deaths occur before age 40, and only a few thousand occur among those aged 80 and above.
Having first calculated current smokers’ probability of dying with Equation 1, and then applied Equation 2 to their mortality risk to establish the probability of dying for ex-smokers of the same age and sex, we combine each probability with the proportion of current and ex-smokers in that age–sex category. This yields an average probability of dying for that age–sex category. Next, we multiply that probability by the population in that age–sex category to generate a number of deaths attributable to smoking. This method allows one to simulate what-if scenarios by altering the smoking histories of cohorts (e.g. changing the smoking or cessation durations) to measure the effects of such changes on mortality.
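The chain of calculations described above can be sketched for a single age–sex category. This is a minimal illustration under loud assumptions, not the published specification: the log-linear risk form exp(α + β ln D + γ ln I), the way f is applied as a proportional reduction, and every numerical value below are hypothetical. Never-smokers are assumed to contribute no attributable deaths.

```python
import math

def smoker_risk(alpha, beta, gamma, duration, intensity):
    """Risk for a continuing smoker. ASSUMED form: exp(alpha + beta*ln(D) + gamma*ln(I))."""
    return math.exp(alpha + beta * math.log(duration) + gamma * math.log(intensity))

def ex_smoker_risk(baseline_risk, f):
    """ASSUMPTION: f acts as a proportional reduction of the continuing-smoker risk."""
    return baseline_risk * (1.0 - f)

def attributable_deaths(population, share_current, share_ex, risk_current, risk_ex):
    """Weight the two risks by status share, then scale by the category's population."""
    average_risk = share_current * risk_current + share_ex * risk_ex
    return population * average_risk

# Hypothetical age-sex category: every value below is made up for illustration.
p_current = smoker_risk(alpha=-20.0, beta=3.0, gamma=0.5, duration=30.0, intensity=15.0)
p_ex = ex_smoker_risk(p_current, f=0.5)
deaths = attributable_deaths(2_000_000, share_current=0.25, share_ex=0.20,
                             risk_current=p_current, risk_ex=p_ex)
print(f"{deaths:.1f}")
```

A what-if scenario then consists of changing `duration`, `f`, or the two shares and recomputing `deaths`, which is what makes a duration-based formulation usable for simulation.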
The next step is to calibrate the model to the SAM observed in France. While Flanders et al. (2003) estimated a series of absolute mortality risks for current smokers, their true focus was on the relative effects of smoking duration and intensity (parameters β and γ) rather than baseline risk (parameter α). Baseline risk cannot credibly be extrapolated from their sample, because CPS-II comprises a self-selected sample of smokers less likely to die; the sample is nevertheless highly reliable for estimating the effects of duration and intensity on the relative risk of dying (Thun et al., 2000). Thus, we take the estimates of $\beta_{s,x}$ and $\gamma_{s,x}$ for each s, x from Flanders et al. (2003) and combine them with a calibrated value of $\alpha_{s,x}$, denoted as $\alpha^{c}_{s,x}$, which calibrates the number of deaths generated by the formula to the lung cancer SAM observed in France for that age–sex category in 2010. We used mortality figures published at the time by the Institut National de Veille Sanitaire (now Santé publique France), which used a demographic model to predict lung cancer mortality by age and sex for recent years from French population data for the year 2000 (INVS, 2005). We start with the number of projected lung cancer deaths among people aged 40–79 for the years 2005–2009, divide that number by 5 to get an annual number of deaths, and then apply a PAF of 0.82 [10] (USDHHS, 2014) to obtain only lung cancer SAM. Denoting the lung cancer SAM at age x and for sex s as $\mathrm{SAM}_{s,x}$, we calculate $\alpha^{c}_{s,x}$ as follows:
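A minimal sketch of such a calibration is given below. It is an illustration under an assumed log-linear risk form, exp(α + β ln D + γ ln I) × (1 − f), and does not reproduce the published formula. Only the division by 5 and the PAF of 0.82 come from the text; the function names, group structure, and all inputs are hypothetical.

```python
import math

PAF_LUNG_CANCER = 0.82  # value cited in the text (USDHHS, 2014)

def annual_sam(projected_deaths_2005_2009):
    """Five-year projected deaths -> annual deaths -> smoking-attributable share."""
    return projected_deaths_2005_2009 / 5 * PAF_LUNG_CANCER

def calibrate_alpha(sam, population, groups, beta, gamma):
    """Solve for alpha so that modelled deaths equal `sam` in one age-sex category.

    groups: (share, duration, intensity, f) tuples, with f = 0 for current smokers.
    ASSUMED risk form: exp(alpha + beta*ln(D) + gamma*ln(I)) * (1 - f).
    """
    unscaled = sum(
        share * math.exp(beta * math.log(d) + gamma * math.log(i)) * (1.0 - f)
        for share, d, i, f in groups
    )
    return math.log(sam / (population * unscaled))

# Hypothetical inputs for one age-sex category.
sam = annual_sam(5_000)
groups = [(0.25, 30.0, 15.0, 0.0),   # current smokers
          (0.20, 30.0, 15.0, 0.5)]   # ex-smokers with a 50% risk reduction
alpha_c = calibrate_alpha(sam, 2_000_000, groups, beta=3.0, gamma=0.5)
print(round(sam, 1), round(alpha_c, 3))
```

Because α enters the assumed form multiplicatively through exp(α), calibration only rescales the level of risk; the relative effects of duration and intensity stay as estimated.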
3 – Projection and simulations
To generate lung cancer SAM over a period of 50 years (2010–2060), we apply our method under hypotheses of either unchanged smoking behaviours or future changes in prevalence by age for various cohorts, specifically changes resulting from initiation or cessation shocks.
This calculation involves the following three steps. First, we prolong the trends by projecting age-specific smoking prevalence rates to 2060, separately by sex and birth cohort. This projection is based on recursive smoothing in which, for a given birth cohort c, prevalence for age group a (observed in year N = c + a) is derived from smoking prevalence in age group a − 1 for the same birth cohort c, adjusted by reproducing the evolution of smoking behaviour between age categories a − 1 and a in the three preceding birth cohorts c − 1, c − 2, and c − 3. [11] The projection is recursive in that each projected value for an age category and cohort builds on the values for preceding cohorts (years of observation).
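One possible reading of this recursive step is sketched below. Averaging the ratio of prevalence between age groups a − 1 and a over the three preceding cohorts is an assumption of this sketch; the exact smoothing rule is documented in the supplementary material and may differ. All prevalence values are hypothetical.

```python
def project_prevalence(prev_at_a_minus_1, preceding_cohorts):
    """Project cohort c's prevalence at age group a.

    prev_at_a_minus_1: prevalence of cohort c at age group a-1.
    preceding_cohorts: (prevalence at a-1, prevalence at a) for cohorts c-1, c-2, c-3.
    ASSUMPTION: the 'reproduced evolution' is the mean ratio across those cohorts.
    """
    ratios = [at_a / at_a_minus_1 for at_a_minus_1, at_a in preceding_cohorts]
    return prev_at_a_minus_1 * sum(ratios) / len(ratios)

# Hypothetical values: each preceding cohort kept 90% of its smokers between a-1 and a.
history = [(0.40, 0.36), (0.50, 0.45), (0.20, 0.18)]
projected = project_prevalence(0.30, history)
print(round(projected, 3))
```

Once a value is projected for cohort c, it can itself serve as one of the three preceding cohorts when projecting cohort c + 1, which is what makes the procedure recursive.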
Second, we apply these simulated prevalence rates of smokers and quitters to age distribution projections for the French population (Blanpain and Chardon, 2010), [12] which allows us to generate numbers of smokers by smoking and cessation duration in a given year N by adding all cohorts or, equivalently, age categories observed in that year.
Finally, we replicate the calculation steps for SAM, by which we ultimately generate what-if scenarios that reproduce shocks to prevalence by age for a given cohort as a result of changes in either initiation or cessation behaviours. This allows us to predict future SAM evolution as a response to these shocks.
The online supplementary material provides detailed information on the following methodologies: a step-by-step description of the overall SAM calculation (Supplementary Material D); calculations for smoking prevalence projections (Supplementary Material E); and the what-if scenarios (Supplementary Material F).
III – Results: lung cancer in France
1 – Smoking histories
The age profiles of male smoking prevalence in France show a recent decline in total prevalence. This decline seems to result more from declining initiation among recent cohorts than from increasing cessation rates among older ones (Table 2). Half of male smokers will have quit before age 54, at least for the three cohorts for which we have data past age 50.
In each row of Table 2, we can follow the prevalence of smoking of a given cohort and calculate crude estimates of ‘quit rates’. Dividing the smoking prevalence of a given age group by the prevalence observed for the same cohort at its peak (either at ages 20–24 or 25–29) yields the proportion of smokers at peak consumption age who still smoke later in life. The complement to 1 of that proportion is the percentage of smokers at peak consumption age who will have quit at each later age for a given cohort.
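The crude quit rate described here is a simple ratio calculation on one row of the table. The sketch below uses a hypothetical row of prevalence values (not figures from Table 2), and the dictionary keys are illustrative labels.

```python
def quit_rates(prevalence_by_age: dict) -> dict:
    """Share of peak-age smokers who have quit by each age group, for one cohort."""
    peak = max(prevalence_by_age["20-24"], prevalence_by_age["25-29"])
    return {age: 1.0 - p / peak for age, p in prevalence_by_age.items()}

# Hypothetical prevalence row for one birth cohort (proportions, not percentages).
cohort_row = {"20-24": 0.50, "25-29": 0.48, "40-44": 0.30, "45-49": 0.25}
rates = quit_rates(cohort_row)
print({age: round(rate, 2) for age, rate in rates.items()})
```

By construction the rate is 0 at the peak age group and increases as prevalence falls below the peak.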
Table 2. Prevalence (in %) of smoking by sex and age of birth cohorts
Note: Grey cells required linear interpolations from other cohorts.
Quit rates for the 40–44 and 45–49 age groups are of particular interest, since epidemiological evidence proves that not quitting at this pivotal age of 40–45 years old significantly increases the likelihood of lung cancer. For males, quit rates at these ages have essentially increased over generations. The quit rate at ages 40–44 increased from 31% for the 1946–1950 cohort to nearly 40% for the next three cohorts. A similar pattern is observed for the quit rate at ages 45–49, which rose from 13% for the oldest cohort to 45% for the 1956–1960 cohort, before falling slightly for that of 1961–1965. Among females, the quit rate at ages 40–44 has remained stable around 30% for those born between 1946 and 1960, but it has increased to 38% for the most recent cohort (1966–1970). At ages 45–49, this rate has remained constant around 40% for the most recent cohorts. These numerical results are detailed in Supplementary Material G.
Although these quit rates are inching in the right direction, they are still too low, as more than half of ever-smokers continue to smoke at ages at which the risk of lung cancer increases exponentially.
Based on similar raw data for our pseudo-cohorts, Figure 1 illustrates typical smoking duration patterns separately for males and females. The numbers of smokers for each smoking duration are expressed as a percentage of ever-smokers, where 100% represents the maximum stock of smokers observed before age 30. Cessation starts earlier for females than for males: half the stock of ever-smokers has quit after 37 years of smoking for males but after only around 25 years for females.
2 – Smoking-attributable mortality in 2010
The simulation under our central values for intensity and age of initiation produces a lung cancer SAM of 20,439 in 2010: 16,362 males versus 4,077 females; 12,261 current smokers versus 8,178 ex-smokers. This is very close to published SAM figures for France, as expected, since our calibration parameters are INVS-projected values (2005). After using a method developed by Ribassin-Majed and Hill (2015) to adjust for some deaths of unknown cause, Bonaldi et al. (2016) estimated between 19,920 and 22,395 strictly lung cancer deaths between the ages of 30 and 80 for 2013. These estimates are similar to ours, although ours are for 2010 and for ages 40 to 80. The detailed results of SAM by age, sex, and smoking status in 2010 can be found in Table 3.
Alternative assumptions for smoothing and interpolating prevalence produce lung cancer SAM values in the range of 19,904 to 20,799, representing a variation of −2.6% to +1.8% relative to our reference scenario. This suggests that our estimate is not highly sensitive to our assumptions for imputing missing or imprecise data. The results are more sensitive to the parameters for average age of initiation and smoking intensity. Varying intensity from 10 to 20 cigarettes per day yields a SAM of 16,509 to 23,886, i.e. a variation of −19% to +17% relative to the reference scenario. Using differential intensity by age and gender as reported in a recent survey (Enquête Santé et Protection Sociale, 2010; Dourgnon et al., 2012) does not lead to significant changes in estimated lung cancer deaths, which are 20,041 versus 20,439 in the reference scenario. Conversely, varying the age of initiation between 13.5 and 21.5 years produces a SAM of 24,732 to 16,657, which represents a variation of +21% to −19% from the reference scenario. [13]
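The percentage variations quoted in this sensitivity analysis follow directly from the SAM values and the reference estimate of 20,439. The short check below recomputes them; the scenario labels are illustrative names, while the numbers are those reported in the text.

```python
REFERENCE_SAM = 20_439

def pct_change(value: float, reference: float = REFERENCE_SAM) -> float:
    """Percentage deviation of a scenario's SAM from the reference scenario."""
    return 100.0 * (value - reference) / reference

scenarios = {
    "interpolation, low": 19_904,
    "interpolation, high": 20_799,
    "intensity 10/day": 16_509,
    "intensity 20/day": 23_886,
    "initiation at 13.5": 24_732,
    "initiation at 21.5": 16_657,
}

for label, sam in scenarios.items():
    print(f"{label}: {pct_change(sam):+.1f}%")
```

The spread is roughly ten times wider for intensity and age of initiation than for the interpolation choices, which is the basis for the statement that results are more sensitive to those two parameters.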
Figure 1. Typical smoking duration patterns among smokers
Interpretation: Reading from left to right, the second dash indicates that among male ever-smokers, 90% have smoked for at least 12.5 years (i.e. until age 30); that proportion falls to 80% for those who have smoked at least 17.5 years (i.e. until age 35).
Table 3. SAM due to lung cancer in 2010, re-evaluated by age, sex, and smoking status
Note: Numbers rounded to nearest integer.
3 – Projection to 2060 and what-if scenarios: SAM evolution if initiation and cessation were to change
44If prevalence rates are projected according to the observed trend (projection scenario), lung cancer SAM will increase rapidly up to a maximum value of 29,100 deaths around 2035. This average increase of +7% every 5 years reflects the fact that the generations of heavy smokers born during the 1950s and 1960s will be at maximum risk of lung cancer within the next 20 years, after which lung cancer SAM will plateau at around 30,000 deaths (see Figure 2). From 2040 onward, the projections rest on the strong assumption that prevalence will stabilize for future cohorts, as the prevalence rates are predicted recursively using the averaged rates of the three preceding cohorts.
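To make the recursive rule concrete, here is a minimal sketch of how prevalence at one fixed age group could be extended to future cohorts. It assumes a simple unweighted mean of the three preceding cohorts. The input values are hypothetical, and the function name is illustrative rather than taken from the study.

```python
def project_prevalence(observed, n_future, window=3):
    """Extend a per-cohort prevalence series (one fixed age group) recursively.

    Each new cohort's rate is the unweighted mean of the `window` preceding
    cohorts, which may themselves be projected values.
    """
    if len(observed) < window:
        raise ValueError("need at least `window` observed cohorts")
    series = list(observed)
    for _ in range(n_future):
        series.append(sum(series[-window:]) / window)
    return series


# Hypothetical prevalence (%) at a given age for three successive 5-year cohorts.
observed = [36.0, 33.0, 30.0]
projected = project_prevalence(observed, n_future=6)
print([round(p, 2) for p in projected])
```

Because each projected value is an average of earlier ones, the series can never leave the range of the last observed cohorts. This is why such a rule implies stabilizing prevalence for future cohorts.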
Figure 2. Projected SAM due to lung cancer from 2010 to 2060 (reference scenario)
45We now illustrate how the method can be used by simulating shocks to prevalence rates, either by increasing cessation (raising quit rates by 25%, 50%, 100%, and 200%) or by markedly decreasing initiation among teenagers (reducing initiation rates by 25%, 50%, 80%, and 100%). The effects on mortality differ depending on whether the shock originates in initiation or cessation. Such shocks result from policies that alter behaviours, such as increasing taxation to reduce initiation, stringently controlling sales to youth, banning smoking in public spaces, and running health promotion campaigns. Increased quit rates usually result from subsidizing cessation therapies and, to a lesser extent, from imposing higher taxes on cigarettes and bans on smoking in public spaces. The literature is quite clear that all these methods are somewhat effective in reducing initiation and/or increasing cessation (Levy et al., 2004).
46One may legitimately expect that duration and intensity also affect mortality due to causes other than lung cancer, and it is therefore essential to bear in mind that our scenarios and conclusions apply only to lung cancer mortality, which is expressed in numbers of deaths per year between 2010 and 2060. Our extreme initiation scenario is absolute (nobody starts smoking), whereas our extreme cessation scenario is relative (some smokers never quit).
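The scenario shocks described above can be sketched as proportional changes applied to baseline rates. The example below is a hedged illustration only: the baseline initiation and quit rates are hypothetical, the helper names are not from the study, and the step that maps smoking duration and intensity to lung cancer risk is not reproduced.

```python
def shocked_rates(rates, relative_change):
    """Scale each rate by (1 + relative_change), clipped to the interval [0, 1]."""
    return [min(1.0, max(0.0, r * (1.0 + relative_change))) for r in rates]


def smoker_stock(initiation_rate, quit_rates):
    """Share of a cohort still smoking at the end of each age step."""
    stock = initiation_rate
    path = []
    for q in quit_rates:
        stock *= 1.0 - q  # a fraction q of current smokers quits in this step
        path.append(stock)
    return path


# Hypothetical baseline values, chosen only for illustration.
BASE_INITIATION = 0.40
BASE_QUIT = [0.05, 0.08, 0.10, 0.12]

# Cessation scenarios: quit rates raised by 25%, 50%, 100%, and 200%.
for shock in (0.25, 0.50, 1.00, 2.00):
    final = smoker_stock(BASE_INITIATION, shocked_rates(BASE_QUIT, shock))[-1]
    print(f"quit rates {shock:+.0%}: final smoker share {final:.3f}")

# Initiation scenarios: initiation reduced by 25%, 50%, 80%, and 100%.
for shock in (-0.25, -0.50, -0.80, -1.00):
    start = shocked_rates([BASE_INITIATION], shock)[0]
    final = smoker_stock(start, BASE_QUIT)[-1]
    print(f"initiation {shock:+.0%}: final smoker share {final:.3f}")
```

In this sketch the −100% initiation shock leaves no smokers in the cohort at all. Even tripled quit rates leave some smokers who never quit, which mirrors the absolute versus relative distinction drawn in the text.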
47Figure 3 shows that the effect on mortality from reduced initiation would be null until 2035. Were one to extrapolate, the total number of lives saved (prevented SAM) over our simulation’s 50-year period varies between 10,000 and 41,000, with the former representing a 25% reduction in initiation and the latter an extreme scenario of no initiation at all (−100% initiation) in future cohorts.
Figure 3. SAM due to lung cancer from 2010 to 2060 if smoking initiation declines among youth, reference scenario vs. various reduced initiation rates
48In contrast, the effect on SAM from increased cessation rates would begin as early as 2015, and total lives saved in this scenario would vary between 28,000 (increased quit rates of 25%) and 144,000 (tripling quit rates) over the next 50 years (Figure 4).
Figure 4. SAM due to lung cancer from 2010 to 2060 if smoking cessation increases after age 30, reference scenario vs. various increased cessation rates
IV – Discussion and conclusions
1 – Supporting adult cessation versus preventing teenage initiation
49To the best of our knowledge, ours is the first study to produce smoking prevalence age profiles and their associated smoking and abstinence durations for birth cohorts in France. We find that most of this country’s reductions in prevalence result from a drop in initiation and, to a lesser extent, from higher quit rates. The only comparable evidence comes from the United States, where average smoking duration has fallen substantially (Mannino et al., 2001), with a clear linear decline in the age at which a cohort’s smoking prevalence is at half its peak value (Pierce and Gilpin, 1996). Anderson et al. (2012) confirm the recent increase in cessation rates at each age for cohorts born after 1965 in the United States. Because cross-sectional evidence in France shows a clear educational gradient in age at cessation, with the more educated quitting sooner (Legleye et al., 2011; Bricard and Jusot, 2012; Khlat et al., 2016), a declining average age at cessation can reasonably be projected for future cohorts.
50Our method lends itself to running simulation-based estimates of lung cancer SAM so that analysts can produce what-if scenarios. These simulations and scenarios can help policymakers decide where to allocate economic resources for smoking reduction when assessing whether to prevent youth initiation or help smokers quit. While it is often argued that both avenues should be pursued simultaneously (Myers, 1999), a decision must still be made about the next unit of resource to be expended and, therefore, about whether to fund cessation programmes or programmes that target initiation.
51Our projections of SAM to 2060 under various scenarios show that an effective strategy would be to increase cessation rates. Specifically, an intervention that doubles quit rates at all ages would prevent 5,000 to 14,000 lung cancer deaths each year, whereas preventing all teenagers from starting smoking would prevent 17,000 deaths in 2060 but almost none before 2040.
52This finding is quite intuitive, as it results from the different time horizons of the two policy tools (preventing teenage initiation vs. encouraging adult cessation). Our approach provides numerical evidence on the trade-off between these policies and lends quantitative support to the argument that financial support should favour adult cessation rather than teenage prevention programmes (Hill, 1999). Such policy recommendations would ultimately have to rely on a cost-effectiveness analysis that compares not only outcomes (as our method does) but also the costs of both approaches. While tools used to prevent initiation (higher taxes and controls on sales to minors) are widely considered inexpensive because they incur no direct cost to the public purse, they do generate indirect costs in the form of policing contraband and illegal sales to youth. Additionally, they are often associated with health promotion campaigns that incur direct costs. Admittedly, cessation therapies cost more, and a next step would be to conduct a full-fledged cost-effectiveness study of both approaches. Thus, we would like to restate that our current goal is merely to illustrate how our method can be used to run the type of analysis conducted here, which is not possible using the standard method.
2 – Methodological choices: limitations and potential improvements
53Our method’s principle is sound, and it adds to the standard method by allowing analysts to conduct what-if scenarios that can help policymakers, something the standard method is neither meant nor able to do. We acknowledge, however, that it depends crucially on how it is applied, which in turn hinges on assumptions and data availability. This new method assumes that duration and intensity, not mere prevalence, determine SAM and that, furthermore, independence exists between the main determinants of mortality, namely intensity, duration, and age at cessation. Specifically, our formulae assume that duration and intensity are independent and that smoking intensity remains constant for all birth cohorts and at all ages. Yet, it is plausible that heavier smokers tend to smoke longer and have a harder time quitting than lighter smokers. If so, we underestimate SAM and, more importantly, understate the effect of cessation therapies relative to decreases in initiation rates. Only panel data following smokers over a long period could provide estimates of the association between smoking duration and intensity at various ages, and this interdependence is policy-relevant information whose effect can be tested by running scenarios with our method. Considering all potential joint probabilities of dying would require a much more complex approach, similar to the abovementioned microsimulation models developed in Schultz et al. (2012) and Hazelton et al. (2012), whereas our method serves as a tool for running simple simulations with considerable flexibility, although the process may sacrifice some realism and possibly precision.
54This new method is also more data-demanding than the standard one. Our results depended on a series of choices and assumptions for complementing the available data, which we discuss below.
Reconstruction of pseudo-cohorts
55An alternative to recreating pseudo-cohorts from cross-sections would be to use retrospective questions on the ages of initiation and cessation in a single cross-sectional survey, as in Bricard and Jusot (2012) and Bricard et al. (2015). This is certainly a valid option, but we preferred the pseudo-cohort approach for the following reasons. First, retrospective questionnaire respondents might err in reporting their ages of initiation or cessation. Kenkel et al. (2003) showed that the reported age changed across waves of a longitudinal survey for 10% of respondents, even though reporting errors are small on average. Answers on current smoking status are more reliable. Second, smokers who died young cannot report on their past smoking habits, a potential issue when studying mortality and also one of proven empirical importance when establishing prevalence among those aged 60 and over (Christopoulou et al., 2011). Pseudo-cohorts lose less information on individuals who died prematurely. Third, retrospective data allow one to reconstruct duration for only a limited number of cohorts (those alive at the time of the survey), while our simulations seek to understand changes in smoking behaviours across as many cohorts as possible. Even with a large sample size (such as the 30,000 respondents in Bricard et al., 2015), the retrospective method requires pooling many cohorts to obtain reliable estimates, as in Bricard and Jusot (2012), who pooled 15 different birth years. Our pseudo-cohorts span only 5 years. Fourth and finally, retrospective data cannot produce joint distributions of duration and intensity (i.e. that a given cohort dedicated D1 years to smoking I1 cigarettes per day, D2 years to smoking I2 cigarettes per day, etc.). This is because intensity is usually known only for current smokers and current smoking behaviour.
The present application does not generate such joint distributions, but they are theoretically possible using pseudo-cohort methods, [14] which can simulate the effects of gradual withdrawal from smoking rather than of abrupt cessation. Notably, our reconstructed histories based on pseudo-cohorts are similar to those obtained by the retrospective method (Bricard et al., 2015). [15] Our pseudo-cohort approach also confirms that prevalence at age 40 tends to decrease in more recent cohorts, due mostly to lower initiation but also in part to the higher probability of quitting at younger ages.
Identification of smokers
56We pooled both regular and occasional smokers into a single category to calculate our prevalence rates, which is unusual. However, since we are interested here in risk at the cohort and not individual level, not making this distinction in our calculations would be damaging if and only if the proportion of occasional smokers changed dramatically over time. The empirical results for years that make this distinction in the Baromètre Santé surveys (1995 to 2010) show that this proportion is stable over time (Supplementary Material C, Table C2). Changes in the proportion of occasional current smokers could be reflected in our simulations through changes in the average intensity parameter, although we do not attempt to do so in this particular exercise.
Relative risks conditional on unobserved behaviours
57Relative risks conditional on smoking duration and intensity may not be perfectly structural parameters either, as they are still conditional on unobserved behaviours. Our method assumes that the US population’s estimated relationship between lung cancer mortality risk and smoking duration and intensity applies to the French population, which may not be too strong an assumption given that the underlying mechanisms are essentially biological. These relationships are reassuringly stable over time in the United States, as observed lung cancer mortality is reproduced for birth cohorts other than those used for the original estimates (Hazelton et al., 2012; Schultz et al., 2012). Transposing these estimates to France might be less feasible, for instance, if the French smoked different types of cigarettes (as they did in the past). Another difference between French and US smokers is harder to both dismiss and confirm, namely that current smokers may self-select (i.e. start or continue smoking) differently in France than in the United States. One study suggests that quitting is not the result of a random selection process but is linked to the smoker’s knowledge of their own health status (Mehta and Preston, 2012), therefore making it possible that different populations act differently upon such information, although that study’s results are contradicted by a more robust US study using individual-level data (Lahiri and Song, 2000). We are unaware of any comparable study conducted on the French population and can only acknowledge that this could jeopardize the validity of our exercise. It may be possible that the selection process works differently in that French smokers diverge from US smokers in how they use their own health information for decisions on quitting or intensity, in which case the relative risks we use (drawn from US data) are not perfectly conditioned on behaviours and could differ across countries.
We believe that such a threat to the validity of our method is minor compared to the one afflicting the standard method, which is its strict reliance on current prevalence. Similarly, if public health campaigns are simultaneously more likely either to prevent occasional or low-intensity smokers from starting smoking or to convince them to quit before it is too late, preventing initiation may be even less efficient than our prediction. A decrease in initiation would then be followed by a decrease in cessation rates 25 to 30 years later, as the stock of smokers would comprise fewer marginal smokers. We know too little about the determinants of ever starting smoking and their relationship with the ability to quit.
58Our study introduces a new method to estimate and simulate SAM. It answers questions on the respective impacts of increasing cessation and decreasing initiation, questions that the standard method cannot address. The method is practical, does not require running complex microsimulations, and offers the potential to greatly inform public health decisions. The application we present here is limited to the case of lung cancer in France, but surveys such as the CPS-II provide data that may help bring to light the relative risks conditional on duration and intensity for other causes of death linked to smoking (COPD, for instance). What is more, the limitations we emphasize are not intractable, and we encourage overcoming them with empirical studies on the determinants of quitting or initiating smoking, which can refine both our estimates and, more importantly, our simulated what-if scenarios.
Acknowledgements
We would like to thank François Beck and Romain Guignard (OFDT) for their generous help in providing access to individual data for recent years, which allowed us to create our own age categories based on the surveys’ continuous age variables. We gratefully acknowledge the help of Thierry Rochereau and Stéphanie Guillaume (IRDES), of the Enquête Santé et Protection Sociale survey, and PROGEDO Diffusion for granting us access to the 1980 version of the Enquête décennale Santé (INSEE). For their helpful comments on previous versions of the manuscript, we thank Damien Bricard and Florence Jusot, as well as three anonymous referees for their suggestions that helped improve this paper.
We are grateful for the financial support of €8,000 from Pfizer Europe at a preliminary stage of this study.
As a potential conflict of interest regarding tobacco consumption, we would like to disclose the following: the first author was a smoker for 10 years and quit 30 years ago; the second has been a heavy smoker for the last 25 years and is trying hard to quit. We have no other conflict of interest to declare.
Notes
[1] A formal derivation of this standard method is provided in Supplementary Material A. https://www.cairn.info/docs/population-grignon-renaud-supplementary-material.pdf
[2] Microsimulation is a more sophisticated version of our method and has been proposed in Brønnum-Hansen and Juel (2000), Holowaty et al. (2002), Hazelton et al. (2012), and Schultz et al. (2012). However, our method is more straightforward and provides more flexibility for analysts interested in contributing to the health policy debate.
[3] Table B1 (Supplementary Material B) summarizes the studies discussed in this section.
[4] This may differ substantially from the observed prevalence, for instance as in Japan, where male mortality from lung cancer is much lower than what the observed prevalence would suggest (Funatogawa et al., 2013).
[5] However, the CPS-II study provides all the data necessary to estimate risks according to duration and intensity for either (a) each smoking-related cause of death or (b) all causes of death combined.
[6] We do not claim that our description is definitive and welcome other smoking history estimates of various French cohorts. The numerical application to France is merely illustrative, and we invite researchers with access to better data or more efficient duration estimates to run their own what-if scenarios for various populations by applying our method to those values.
[7] These inconsistencies and all workarounds for reconstructing smoking prevalence time series are detailed in Supplementary Material C. Using ancillary data sources and pooling several surveys, we ran crude sensitivity analyses that confirm our results do not suffer from lack of precision, since SAM estimates vary by 1% to 3% only across variants (see Supplementary Material H).
[8] Between 7% and 14% of male smokers start after age 20, depending on cohort and education level. Among female smokers, 50% started after age 20 for older cohorts (1930–1944), 25% for the 1945–1964 cohorts, and 15% for the 1965–1987 cohorts.
[9] Supplementary Material B provides additional details on the different parameter values by sex–age group for Flanders et al. (2003, Box B2) and on Knoke et al.’s (2008) numerical application of their main equation (Box B3).
[10] We derive this PAF from US data in the CPS-II study and implicitly assume that lung cancer PAF is similar in France. This is certainly not entirely true, precisely because lung cancer PAF depends on the histories (duration and intensity) of US and French cohorts of smokers. However, data presented in Peto et al. (1992) for all developed countries indicate that average lung cancer PAF is close to 0.82. This varies by sex (from the 0.60s for women to the 0.90s for men) and to a lesser extent by age, suggesting that more refined simulations by age and sex must rely on different PAF sets.
[11] On the notation here: if, for example, a represents the 45–49 age group, then a – 1 is the 40–44 age group. The same applies to birth cohorts. If c refers to people born in 1951–1955, then c – 1 is the 1946–1950 birth cohort.
[12] Our numbers are derived from a baseline scenario of 2007–2060 population projections that assume the total fertility rate will remain very high (1.95) until 2060 (Blanpain and Chardon, 2010).
[13] See details in Supplementary Material H, specifically Figure H2.
[14] This extrapolation would rely on available data for smoking intensity distributions, which have changed somewhat over time but remain centred around 15 cigarettes per day for most cohorts. Generally, approximate distributions would be reconstructed by intensity categories (e.g. < 10, 10–14, 15–20, 20–24, 25+) and then applied non-parametrically to various birth cohorts.
[15] Graph 1 in their work confirms that prevalence peaks at age 20 and declines afterwards for all sex and birth cohorts, except for the oldest female cohort, just as it does in our pseudo-cohorts.