1Sub-Saharan immigration became a major concern in Europe in the first decade of the twenty-first century. Images of migrants attempting to scramble over barbed-wire fences in Ceuta and Melilla in 2005, followed later by footage of brightly painted boats being hauled ashore on the Canary Islands, captured the attention of the general public and political decision-makers. The idea of an “African invasion” gained in currency, despite quantitative analysis showing that sub-Saharan migrants accounted for a minority of migrant flows and populations in Europe (de Haas, 2008; Lessault and Beauchemin, 2009). But the fact remains that African migration has long been under-represented in international migration research (Grillo and Mazzucato, 2008; Hatton and Williamson, 2003; Lucas, 2006). The goal of the Migration between Africa and Europe project, or MAFE project for short, was to collect quantitative data with a view to shedding new light on African migration patterns, their causes and consequences. Addressing the classic methodological problems facing the designers of international migration surveys, this article presents the approach adopted by the MAFE project. With survey methods often remaining uncertain and ill-documented in this research field, the aim is to explain and discuss our methodological choices so as to help future survey designers go further in their quest for new solutions.
2Migration is by no means an uncharted field in socio-demographic research. Some previous surveys served as invaluable sources of inspiration for the MAFE project. Two major characteristics of the project were inspired by the Mexican Migration Project (MMP) (Massey, 1987), namely its transnational sample, with data collected in major urban regions of both Africa and Europe, and its retrospective nature, with the collection of quantitative life histories. Previous life history surveys carried out in Europe and Africa served as a starting point for the design of MAFE event history questionnaires (Antoine et al., 1999; Poirier et al., 2001). Lastly, the sampling strategy was based in part on that of the project Push and Pull Factors of International Migration (Groenewold and Bilsborrow, 2008).
3The design of a survey naturally depends on its scientific objectives. The aims of the MAFE project are as wide-ranging as those of the MMP and the Push-Pull project. The idea is to produce data that can be used to analyse migration trends, causes and consequences at micro level. The project’s founding assumption is that migration should not be seen as a one-way flow from Africa to Europe, and that return migration and transnational practices are important and need to be understood in order to develop appropriate migration policies. That idea is conveyed in the name of the project, which addresses migration between Africa and Europe rather than from Africa to Europe. The idea also justifies the project’s transnational approach that consists in conducting quasi-simultaneous surveys in three origin countries and six destination countries (Table 1). More than 4,000 household interviews were completed in Africa and over 5,400 individual life history questionnaires (also called biographic questionnaires) were filled in for migrants interviewed in Europe, and for returnees and non-migrants interviewed in Africa.
Countries included in the MAFE project
Interpretation: 19,370 Congolese lived in Belgium in 2010, representing 14.8% of the Congolese living in Europe and 1.7% of the Congolese living outside DR Congo. In Belgium that year, the Congolese accounted for 1.7% of the immigrant community.
4This article explains how the MAFE project surveys were designed. The first section looks at how migratory experiences were recorded, showing how the concepts of “migrant” and “migration” were operationalized when selecting the respondents and designing the questionnaires. The second section presents the nature of the data collected and highlights the need for longitudinal, multi-thematic and multi-level data comparable in time and space. The third and final section focuses on sampling problems, which can prove particularly complex when dealing with international migration. Descriptive in nature, this article seeks to remedy the lack of factual data on the design of surveys on international migration. Highlighting the project’s innovative aspects, while also pointing up the limits of the data collected, it is also an invitation to make use of the MAFE project data.
I – Recording the migration experience
5The terms “migrant” and “migration” obviously need to be defined before carrying out a survey aimed at studying the trends, causes and consequences of international migration. The problem is that no standard practice exists in the field, with each new survey shifting the definition of the two concepts. The MAFE project initially followed the recommendations of the statistical offices of international organizations (including the United Nations and the European Commission) that define an international migrant as a person having stayed for at least 12 consecutive months in a country where they were not born. This standard definition does not explain how to record international emigrants when surveying in the origin country, a point which MAFE had to clarify, and which we will address later. Also, while adopting this definition of international migrant, the MAFE project aimed not just to look at “long” stays (of more than a year) outside birth countries but also to explore forms of infra-annual mobility reflecting aborted migration projects (migrants unable to stay in the country of their choice), the complexity of migration itineraries (migrants travelling through several countries before arriving or failing to arrive at their destination) and preparations for a long-term stay via short-term stays. Lastly, the fundamental objective of the MAFE project is to show the reversible and possibly repetitive nature of international migration – an objective that requires more than a simple record of migrants’ latest movement. Given the breadth of our ambition, the MAFE project adopted a variety of viewpoints to record migrations. Two questionnaires were designed to provide complementary views of individuals’ migration experience (Table 2).
Level of information on international migration experience by type of MAFE survey(a)
(a) Mothers and fathers living abroad were not included in the MAFE-Senegal survey.
Note: For purposes of comparison, the MAFE surveys also include non-migrants in the origin countries. They are not mentioned in this table. The questionnaires are available in English, French, Spanish and Italian on the MAFE project site: http://mafeproject.site.ined.fr/en/
Which migrants are included in the MAFE surveys?
6Data on international migration are commonly collected via surveys of households in the countries of origin. Yet no standardized methodology exists, and each survey adopts its own definition of the emigrants it includes in household questionnaires. Some use a social obligation criterion, as in the Push-Pull questionnaire, which documents “those who currently live elsewhere but whose main commitments and obligations concern this household [the surveyed household] and who are expected to return to this household or be joined by their family in the future”. Others, such as the NESMUWA surveys (Network of Surveys on Migration and Urbanization in West Africa), use place of residence criteria which document people having previously lived at least three months in the household and having lived abroad for at least six months at the time of the survey (Bocquier, 2003). Still others focus on family relationships, including the MMP, which identifies all the children of the household head regardless of their place of residence, be it in Mexico or abroad. This last option has a major advantage in that it is based on permanent relations (family ties) that are constant over time. In contrast, the notion of household, on which the other surveys are based, is problematic when the information recorded on the household does not correspond to the situation at the time of the survey.
7In the light of these studies, and taking account of the analysis objectives (to establish migration trends, understand transnational family forms and study the effects of migration on domestic economies), the MAFE project adopted a combined approach; its household questionnaire includes not just the household members but also the following individuals, who can be considered as “associated” with the household:
- all children of the household head living outside the household, independently of their place of residence (including those who are deceased). These may or may not be international migrants. This category therefore includes internal migrants;
- household members’ partners living abroad, and (for MAFE-Congo and MAFE-Ghana only) the household members’ mothers and fathers living abroad;
- all the other people related to the household head or his or her partner living abroad and in “regular” contact with the household in the 12 months preceding the survey. 
8Overall, a relatively large number of international migrants were recorded, with 44% of the households surveyed in Ghana reporting the existence of at least one international migrant, 47% in Senegal and 63% in DR Congo (Mazzucato et al., 2013). When working on this broad and composite population of migrants mentioned by the households, it is important to keep two things in mind. First, the people belonging to these three categories are, by definition, not “members” of the household that reported their existence because they do not live under the same roof. Second, the three categories of people constitute an extremely heterogeneous population. Some of them are systematically recorded regardless of their place of residence, including in the origin country (children of the household head). Others are recorded only if they live abroad (partners or relatives of one of the household members). And still others are included in the sample on the condition that they have maintained relations with the household in the last year (other people related to the household head or his/ her partner). Depending on their research objective, the users of MAFE data need to select the individuals to be included in their analyses and decide whether they want to focus purely on permanent household members, on international migrants or on another particular group. For example, the migration rates cannot be calculated on the basis of all the individuals mentioned in the household questionnaire. The same categories of people must be found in the numerator (migrants) and the denominator (people exposed to the risk of migration). This is the case for the children of the household head (recorded regardless of their place of residence) but not for the other categories, recorded simply because they are abroad.  
To take another simple example, if the researcher is interested in the composition of households in the major agglomerations of DR Congo, Ghana or Senegal, it may be in their interest to exclude all the individuals who, though recorded in the survey, are not, strictly speaking, members of the household. And if the research consists in analysing remittances, the survey population may be limited to international migrants (Rakotonarivo et al., 2013).
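The point about matching numerator and denominator can be illustrated with a minimal Python sketch. The record layout, the category codes and the toy roster below are hypothetical and do not reproduce MAFE variable names or results; the sketch only shows why a proportion is meaningful for a category recorded regardless of residence and meaningless for a category recorded only when abroad.

```python
from dataclasses import dataclass


@dataclass
class Person:
    category: str  # hypothetical codes: "child", "partner", "other_kin"
    abroad: bool   # True if living outside the origin country


def share_abroad(people, category):
    """Proportion of persons in one category who live abroad."""
    group = [p for p in people if p.category == category]
    if not group:
        raise ValueError(f"no person recorded in category {category!r}")
    return sum(p.abroad for p in group) / len(group)


# Toy roster (invented values): children of the head are listed wherever
# they live; partners and other kin are listed only if they live abroad.
roster = [
    Person("child", True),
    Person("child", False),
    Person("child", False),
    Person("child", False),
    Person("partner", True),
    Person("other_kin", True),
]

# Meaningful: numerator and denominator cover the same population.
child_share = share_abroad(roster, "child")

# Not a rate: equals 1.0 by construction, since non-migrant partners
# outside the household are never recorded.
partner_share = share_abroad(roster, "partner")
```

In this toy roster, the share for partners is 1.0 whatever the true level of migration, which is why such categories cannot enter a rate calculation.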
9The migrant sample constructed on the basis of household surveys has two advantages. First, the relatively extensive method of recording international migrants serves to limit the conventional bias of surveys in the origin country, in which it is always possible to miss migrants who have left with their entire household (migrants that no-one can report in the origin country). Second, the household data include all international migrants, independently of their destination. The household databases of the MAFE project include migrants who may live anywhere in the world. This is not true for the individuals recorded in the life history survey, since the migrant sample is limited to the countries in which the data were collected (Table 1 and Table 2). For obvious financial and logistical reasons, it was impossible to carry out the individual surveys in all the countries that receive Congolese, Ghanaian and Senegalese migrants.
10The migrant populations in the household and life history surveys do not differ merely by the variety of the destinations included. While in the household surveys migrants are recorded independently of their age, place of birth and nationality, these variables were used as selection criteria in the life history survey. These criteria were established in the same way for all individuals, regardless of their country of residence (DR Congo, Ghana or Senegal) and migrant status (current migrant, return migrant, non-migrant), so as to ensure the greatest possible homogeneity between the transnational samples. The selection criteria are as follows:
- the respondents are aged 25-75 at the time of the survey, the lower age limit serving to ensure that the life histories are sufficiently eventful;
- the respondents were born in one of the origin countries targeted by the survey (DR Congo, Ghana, Senegal), the place-of-birth criterion serving to exclude immigrants from the African samples and the children of immigrants from the European samples;
- the respondents have (or had) the nationality of the origin country. This criterion is used, alongside the place of birth, to exclude the children of immigrants in African countries (for example, children born in Senegal to French parents);
- in Europe, migrants are included only if they left Africa at the age of 18 or over. This criterion serves to harmonize the sample by focusing on adult migration.
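As an illustration, the four selection criteria listed above can be written as a single eligibility check. This is a minimal sketch under stated assumptions: the function name, its parameters and the way missing departure ages are handled are hypothetical choices made for the example, not the screening procedure actually used in the field.

```python
ORIGIN_COUNTRIES = {"DR Congo", "Ghana", "Senegal"}


def is_eligible(age, birth_country, has_or_had_origin_nationality,
                surveyed_in_europe, age_at_departure=None):
    """Apply the life history selection criteria to one candidate.

    All parameter names are hypothetical; they mirror the criteria
    stated in the text, not actual MAFE variables.
    """
    # Aged 25-75 at the time of the survey
    if not 25 <= age <= 75:
        return False
    # Born in one of the three origin countries
    if birth_country not in ORIGIN_COUNTRIES:
        return False
    # Has (or had) the nationality of the origin country
    if not has_or_had_origin_nationality:
        return False
    # In Europe only: left Africa at age 18 or over.
    # Assumption of this sketch: an unknown departure age excludes.
    if surveyed_in_europe:
        if age_at_departure is None or age_at_departure < 18:
            return False
    return True


# A 40-year-old born in Senegal, interviewed in Europe, who left at 22
example_migrant = is_eligible(40, "Senegal", True, True, age_at_departure=22)
# Same profile, but departure at 15: excluded from the European sample
example_child_migrant = is_eligible(40, "Senegal", True, True, age_at_departure=15)
```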
Migration experiences recorded in the MAFE questionnaires
11The data from the household and life history questionnaires present complementary views of migration. The data collected with households in Africa have extended (and unbiased) coverage in terms of destinations, but the variables describing the experience of international migration are limited because the information is not obtained from the migrants themselves. Conversely, the individual life history data are limited in terms of destinations (at the time of the survey) but highly detailed in their description of international mobility (Table 2).
12The household questionnaire essentially focuses on the individuals’ most recent migration experience, i.e. their last departure and their situation at the time of the survey. It also includes a few questions on first departure and return, which can be used to reconstruct migration trends (Schoumaker and Beauchemin, 2015). An additional module focuses on remittances made by the international migrants to the household. The information collected can thus be used to study relations between Africa and Europe through a description of the flows of people and goods in both directions (departure and return).
13The information in the life history questionnaire, collected through the migrants themselves (and not proxy respondents), is much more detailed and reliable than that of the household questionnaire. It includes the entire migration history of the individuals, from their birth up to the time of the survey, which can be used to describe complete migration trajectories, including circular movements or complex trajectories between countries of origin and of residence at the time of the survey. It contains information not just on the respondent’s long stays abroad (of at least 12 months, as in the household survey) but on their short stays as well. These data can be used, for example, to show the growing complexity of the itineraries taken by the migrants entering Europe. In the three origin groups, a growing proportion of respondents have transited via other countries before settling in the country where they were surveyed (Schoumaker et al., 2013a). The life history questionnaire also includes questions on migration “attempts” and on intentions to settle in the countries of destination. For example, the data show a decreasing trend in the proportion of Senegalese and Congolese migrants in Europe who plan to return to their country of origin (Flahaux, 2013).
14Not all forms of mobility are described in the same detail. Short stays (of under one year) for business or leisure purposes that are not connected to a long-term migration project are simply recorded (year and country). However, short stays as part of a migration project to settle outside the origin country (transit stays and interrupted stays in which the migrant intended to stay in a country but in the end was obliged to leave) are described in as much detail as stays of over one year through a set of modules including questions on the organization of each journey (itinerary taken, persons who decided on the journey, persons who financed the journey, persons accompanying the traveller, etc.), on the conditions of integration in each destination country (legal status, language proficiency, use of public services, etc.) and on the relations maintained with the origin country during each migration spell (remittances to relatives or friends, participation in community groups, community-based investments). The MMP questionnaire served as a basis for the modules describing migration experiences. But the questions were extensively revised and extended, since migration between Africa and Europe is much more complex than that between Mexico and the United States. First, while Mexican migrants head almost exclusively to a single destination country, African migrants target a broad range of destinations. Second, while a single border separates Mexico and the United States, African and European countries do not share a border, which means that the trajectories taken by African migrants to Europe can be considerably more tortuous than a “simple” border crossing. Taking account of the complexity of migration itineraries was one of the challenges of the MAFE life history survey.
II – Comparable, retrospective, multi-level and multi-thematic data
15While the MAFE project addresses international migration, it does not focus purely on migrants. This is because understanding the experience of migrants and the causes and consequences of their departure (and return) hinges on comparing them with people who have not migrated. The challenge and difficulties of surveys on international migration lie specifically in including the relevant comparison groups in the sample. For example, to understand the determinants of departure from a given country, the people who leave (and sometimes return) have to be compared with those who stay behind. Similarly, migrants and non-migrants must be compared in order to study the effects of international migration on individuals’ economic and family histories. This requirement of comparison has been well established in the literature (Bilsborrow et al., 1997; Massey, 1987; Rallu, 2008) but raises significant methodological problems, since it involves comparing people living in different countries and – a further difficulty – comparing them at a particular point in time which, in general, is not that of the survey. The objective of producing comparable data between different migration flows, involving multiple origin and destination countries, makes survey design even more problematic.
16To study migration, longitudinal data are needed. To understand the factors behind departure, for example, the situation of the migrants just before their departure must be compared with that of non-migrants at the exact same time. Comparing them at the time of the survey (sometimes years after migrating) will not provide any useful information on the determinants of migration. If we wish to compare individuals in the past and not only at the time of the survey, then longitudinal data must be collected. Two solutions are theoretically possible: either a panel survey involving repeat observations of the same individuals and households over time, or the collection of life histories on a one-off and retrospective basis. The first alternative was not possible for the MAFE project as not enough time or resources were available to build a panel with the timeframe required to understand changes in the medium term or to reconstitute past migration trends. Also, gathering panel data for transnational samples that include mobile and potentially vulnerable people (including undocumented individuals), poses substantial practical and methodological problems. The MAFE project, then, is based on retrospective data. Such data may have certain limitations, notably related to recall effects, but past experience has shown that high-quality data can be collected in this way using appropriate tools (Antoine et al., 1987; Freedman et al., 1988; GRAB, 1999).
17While the retrospective approach is not entirely absent from the household questionnaire, which contains a few time-specific questions, it is mainly applied to the life history questionnaire, which is designed to retrace the biographies of individual respondents in a detailed and highly standardized manner. The form of the MAFE life history questionnaire is largely based on that of biographic surveys already carried out in France and Africa (Antoine et al., 1999). It includes two separate tools: an “Ageven” grid and a book of thematic modules containing the precise questions.  The Ageven grid – Ageven being an acronym for Age Event (Antoine et al., 1987) – is an invaluable tool for establishing reliable dates for the respondents’ life events, such as migrations, unions and changes in employment. Using the grid, the interviewer and respondent can refer to exact years, the respondent’s age or other landmark events to produce a detailed inventory of his or her life history. This collection technique makes it easier to recollect events and improves dating consistency.
18Once the data have been collected and recorded, they are arranged in the form of thematic person-period files (one file per questionnaire module). Unlike more conventional databases in which each individual represents one line in the file, a single respondent can appear on several lines. For instance, in the file on international migration spells, each person is present as many times as he or she begins a stay in a new country. Each stay is listed on a new line, with the columns showing the variables describing the migrant’s living conditions. These data are designed to be used with longitudinal analysis methods, including sequential and event history analysis. 
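The person-period layout can be sketched in a few lines of Python. The column names and values below are invented for illustration and are not the actual MAFE file structure; the sketch simply shows one respondent occupying several lines and how such a file lets the analyst recover a person's situation in any given year, which is what comparisons "at the exact same time" require.

```python
from collections import Counter

# Hypothetical person-period file: one line per stay, not one line per person.
# end_year=None marks a stay still ongoing at the time of the survey.
migration_spells = [
    {"person_id": 1, "country": "Senegal", "start_year": 1965, "end_year": 1992},
    {"person_id": 1, "country": "Italy", "start_year": 1992, "end_year": 1999},
    {"person_id": 1, "country": "Senegal", "start_year": 1999, "end_year": None},
    {"person_id": 2, "country": "Ghana", "start_year": 1978, "end_year": None},
]


def lines_per_person(rows):
    """Count how many lines (stays) each respondent occupies in the file."""
    return Counter(row["person_id"] for row in rows)


def country_in_year(rows, person_id, year):
    """Return the country of residence of a respondent in a given year.

    When a move occurs during the year, two stays overlap; this sketch
    arbitrarily keeps the stay that started most recently.
    """
    matches = [
        row for row in rows
        if row["person_id"] == person_id
        and row["start_year"] <= year
        and (row["end_year"] is None or year <= row["end_year"])
    ]
    if not matches:
        return None
    return max(matches, key=lambda row: row["start_year"])["country"]


counts = lines_per_person(migration_spells)
residence_1995 = country_in_year(migration_spells, 1, 1995)
```

Here respondent 1 appears on three lines and respondent 2 on one, and the two can be compared as of 1995 even though the survey took place later.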
Multi-thematic and multi-level data
19Studying the causes and consequences of migration requires information on aspects other than migration itself. Five of the 17 modules in the life history questionnaire focus on the respondents’ family history (unions and children), economic history (training and occupation, investments) and residential history. These thematic modules, a mainstay of life history surveys, provide a series of variables (having a child, forming a union, investing in a company, etc.) that can serve as dependent variables to examine the socio-demographic and economic consequences of international migration or, inversely, as explanatory variables to study the migration process itself (González-Ferrer et al., 2014).
20Besides individual factors, migration depends on family factors as well as institutional and structural factors at community, regional, national and even international levels (Massey et al., 1993). While not having data on all those levels, the MAFE project nevertheless collected contextual information. In the life history questionnaire, several variables capture changes in the respondents’ social environment. For example, the module on residential history gathers retrospective data on the subjective well-being of each household in which the respondent has lived. For each residential spell, the interviewees were asked whether they had enough to live on and whether their living conditions were better, worse or the same as those of the other households in their village or town. An entire module is dedicated to describing the respondents’ migrant networks, and was used to verify and expand theories on the role of social capital in the migration process (Liu, 2013; Toma, 2012).
21At macro level, a contextual database was built to collate existing series of economic, socio-cultural and political variables for each project country. In addition, an original database on immigration policies, ImPol, was developed for MAFE-Senegal, coding international migration control measures each year for Spain, Italy and France (Mezger and Gonzalez-Ferrer, 2013). Community-level information could not be included in the MAFE survey system, however. A specific survey was initially planned but came up against two obstacles. The first of these was conceptual, as the idea of “community” is difficult to translate into operational terms in African urban environments (MAFE’s survey area), where it is hard to establish the boundaries of communities, particularly in the continent’s fast-changing large cities. The second obstacle was of a practical and methodological nature. Collecting community data to be included in a life history survey involves administering the survey in all the places mentioned in the respondents’ residential histories (and not just where the individual data are collected), failing which the relationship between community context and migration cannot be correctly established (Schoumaker et al., 2006). A community-based retrospective survey was therefore impracticable, given the large number of data collection locations and the high costs involved. 
The challenges of comparisons over space and time
22The longitudinal nature of the MAFE project called for particular care in designing questionnaires that have meaning for all respondents in all contexts, i.e. across countries and over time. The concepts used had to apply equally well to a Congolese man having stayed in his country in the 1970s and a Senegalese woman living in Italy in the 1990s. In short, a “one-size-fits-all” solution had to be found for the MAFE life history questionnaire design. Some concepts can be easily transposed because they are universal or because methods of comparison have already been identified and are widely recognized, as in the fields of education and socioeconomic status. Other concepts raise considerable problems because they are intimately linked to a specific context, even if they may appear universal at first glance. Devising appropriate questions involves gauging how the respondents understand the questions and then identifying the exact categories and terms that have the same meaning for everyone at any time and everywhere, keeping in mind that the questionnaires are translated into several languages. This calls for thorough preparation, with numerous survey tests and in-depth discussions between national teams. It took several years to design questionnaires that satisfied the requirements for all countries in the MAFE project. The first versions were designed in Senegal in 2005. MAFE-Senegal then carried out several tests, first in France and Senegal and then in Italy and Spain, before launching a simultaneous pilot survey in the four countries. In parallel, the questionnaires were tested and adapted in Belgium and DR Congo as part of the MAFE-Congo I survey (2007), giving rise to new adjustments that were taken into account in the final MAFE-Senegal survey (2008). Finally, after a few minor adaptations, the questionnaires were used for MAFE-Ghana (in Ghana, the Netherlands and the UK) and MAFE-Congo II (in DR Congo, Belgium and the UK) in 2009.
23While methodological problems of comparison may potentially concern any comparative survey (regardless of subject), such problems are inevitable in surveys on international migration that, by their very nature, seek to compare individuals living in different places (migrants, return migrants and non-migrants). Two examples illustrate the “one-size-fits-all” solutions devised as part of the MAFE project to improve questionnaire comparability.
24The first example concerns the legal status of migrants (i.e. documented or undocumented). The objective was to reconstitute the legal trajectories of the migrants when living outside their country of birth (and not just in Europe). The difficulty was threefold: 1) each country has its own legal system (and no worldwide database yet exists in this area); 2) the legal framework may vary over time within each country; and 3) the status of migrants is often complex, as the right to reside does not always depend on entry conditions (with or without a visa) and is not always connected to the right to work. In fact, intermediate situations exist between documented and undocumented statuses. Considering all these complications, the solution adopted by MAFE distinguished between the legal categories of “work permit” and “residence permit”. Within each category, the response modalities are designed so as to ascertain at all times whether the respondent had a permit or not, or if a permit was not needed. As a result, the questionnaire covers all possible legal situations as part of a standardized framework that is fully comparable in all situations. 
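The two-dimensional coding of legal status can be sketched as follows. The three response modalities come from the description above; the enumeration labels and the summary function are assumptions made for this illustration only and are not part of the MAFE questionnaire or datasets.

```python
from enum import Enum


class Permit(Enum):
    """The three response modalities described in the text (labels are hypothetical)."""
    HELD = "had a permit"
    NOT_HELD = "had no permit"
    NOT_NEEDED = "permit not needed"


def summarize_status(residence, work):
    """Collapse the two permit dimensions into one label.

    This three-way summary is an assumption of the sketch: it shows how
    intermediate situations emerge when the two dimensions disagree.
    """
    in_order = {Permit.HELD, Permit.NOT_NEEDED}
    if residence in in_order and work in in_order:
        return "in order on both dimensions"
    if residence not in in_order and work not in in_order:
        return "not in order on either dimension"
    return "intermediate"


# A migrant with a residence permit but no work permit
status = summarize_status(Permit.HELD, Permit.NOT_HELD)
```

Recording the two permits separately, year by year, is what allows situations such as this one to be distinguished from a simple documented/undocumented dichotomy.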
25The second example relates to the concept of the couple. The problem here is that there is no simple and universal objective criterion for determining the moment at which two people start to form a couple, particularly in the context of international migration. Living under the same roof can be used as a criterion in some single-sited surveys but is not relevant for transnational couples who, by definition, are not cohabiting since one of the partners has migrated to a different country (Baizán et al., 2014; Beauchemin et al., 2015; Mazzucato et al., 2015). Furthermore, the marriage criterion is difficult to apply in the MAFE project or at any rate in some of the countries in which it is administered. While marriage is practically universal in most sub-Saharan countries, it is relatively rare in some European countries, where consensual union has become a social norm. All in all, considering the difficulty of defining a couple with objective criteria that would be relevant in multiple contexts, it was decided to adopt a subjective definition for the MAFE project, with respondents listing the person (or people) they considered as their partner(s) at the time of the survey or in the past. That decision raised translation problems, however, since such neutral terms do not exist in all languages. For example, the translation of the word “partner” in local Senegalese languages and in Spanish is “spouse”, a term that excludes partners in consensual unions, a form of partnership that, while quite rare, does exist in these societies. To ensure that the questionnaire would be understood in the same way by all respondents in all contexts, we specified in all languages: “Let’s talk about the partners that you have had in your life, whether you were married to them or not”.
26The questionnaires are almost identical in all the countries. The sole adaptations regard cultural (religion, ethnic groups) and family variables (no polygamous unions and no reference to the family nucleus in households in Ghana and Congo; no reference to fostered children in MAFE-Senegal). The same data entry programs were used in all countries (thanks to a multi-lingual design), so the MAFE datasets, designed to facilitate comparisons across time and countries, have exactly the same structure wherever the survey took place. 
III – Sampling problems
27Building a representative sample for a survey of international migrants is a challenging task in most countries. The relatively small numbers of migrants and, even more so, of return migrants, the vulnerability of certain individuals (undocumented migrants) and the lack of appropriate sampling frames are all major obstacles. A range of methods have been tested, sometimes on an experimental basis, but none has proved ideal (Groenewold and Bilsborrow, 2008; McKenzie and Mistiaen, 2009). In this highly constrained methodological context, the aim of this section is to explain the sampling strategies used in MAFE and to document the problems encountered.
Finding an acceptable compromise
28As mentioned earlier, the fundamental objective of the MAFE project was to produce data for comparison of migrants, return migrants and non-migrants. MAFE employed two techniques that can be used to simultaneously produce information on these three categories of individuals, who, by definition, live in different countries. The first technique consists in creating a sample of households in the origin country to describe their members (mostly non-migrants, sometimes return migrants) and migrants “associated” with the households, irrespective of their destination (Table 2). The other technique involves carrying out a multi-site survey by interviewing return migrants and non-migrants in their countries of origin and migrants in their destination countries. A perfect sample would include a sub-sample representative of the population of the origin country along with dispersed sub-samples representative of the entire migrant population in the rest of the world. Of course, given the dispersion of migrants across the globe, such an approach is totally unfeasible, as it would require a quasi-worldwide survey. Multi-site surveys, by nature, call for a compromise that consists in selecting at least one destination country. Thus far, MAFE is the only project to have surveyed several destinations for one origin. For each African origin country, the project decided to systematically select the former colonial capital and at least one new destination. In fact, just two or three destination countries were chosen, and they are all located in Europe (Table 1). Other destination countries are not entirely absent from the MAFE project, however; they are included in the data collected on migrants from households in the origin countries and also figure in the migration histories of international migrants interviewed in Europe and Africa.
29In another compromise, to limit survey costs, the samples were restricted to particular regions and so do not achieve national coverage. In Senegal and DR Congo, the MAFE samples are focused exclusively on the regions of the capital cities (Dakar and Kinshasa), while in Ghana the sample covered both the capital Accra and the city of Kumasi. These regions in Senegal, DR Congo and Ghana are home, respectively, to 26%, 12% and 17% of the total population of the countries and are known for their high migration rates. For example, Dakar was the origin region for 31% of international migrants reported in 2001-2002 by Senegalese households in the ESAM II survey (Sall, 2008). The African samples, then, are not representative of the countries but are more closely linked to their capital cities, where out-migration is most frequent. Neither do MAFE’s European samples cover the destination countries in their entirety. Target regions were identified so as to maximize coverage of the target populations and minimize survey costs stemming from sample dispersion, while also collecting data on regions where migrants are less concentrated. In France, the three regions where the survey was administered account for 64% of the country’s Senegalese population. In Spain and Italy, where Senegalese also live in farming areas, the samples cover both urban and rural zones.
30Besides the general objective of comparing migrants, return migrants and non-migrants, the MAFE project initially set specific objectives on the number and characteristics of respondents. At least 150 migrants were to be included in each destination country, to allow comparisons with non-migrants. Though relatively low, that number ensured a ratio of migrants to non-migrants as yet unattained by any similar survey. In the MMP survey (Massey, 1987) and the OECD survey on migrants in the Senegal River Valley (Condé and Diagne, 1986), the ratio is one migrant in the destination country for ten non-migrants in the origin country. The ratio is much higher in the MAFE survey (see ratio (1)/(2) in Table 3). While no quotas were set beforehand, the sample of migrants in destination countries necessarily had to include undocumented migrants so as to reflect the diversity of migration experiences. For each origin country, the initial objective was to include roughly 200 return migrants. Lastly, to analyse international migration from the viewpoint of gender, our samples had to include around 50% women in the destination countries, and left-behind women (partners of migrants) were to be over-represented in the origin countries. Table 3 shows that these objectives were met, and the following section shows how this was achieved.
Description of MAFE samples (a)
(a) Depending on the country of residence at the time of the survey.
Sample selection techniques
31This section is based largely on Schoumaker and Diagne (2010) and Schoumaker et al. (2013b), who may be referred to for more detail on sampling design and weighting methods.
32In Africa, the MAFE surveys drew inspiration from the experience of the Push-Pull project (Groenewold and Bilsborrow, 2008) to ensure adequate representation of households and individuals of interest, some of whom form a potentially rare population (households with migrants, individual return migrants and partners of migrants). In each target region, the sampling strategy was based on stratified random sampling in three stages. First, a sampling frame of primary sampling units (PSUs) was built by stratifying regions according to the level of emigration. The PSUs (census zones in Dakar and Accra-Kumasi, and districts in Kinshasa) were selected randomly but with an over-representation of regions with a high prevalence of migration. In each PSU, a listing operation classified each household into one of three strata (international migrants, return migrants, non-migrants), which enabled us, in the second stage, to randomly select households by over-sampling those affected by migration. Lastly, in the third stage, the individuals were selected from households, once again on the basis of their relationship to migration. In Ghana and DR Congo, all return migrants and partners of migrants were selected, along with another non-migrant member selected randomly from each household. In Senegal, where individuals were likewise selected at random, the number of return migrants and migrants’ partners was limited to two per household. Through this repeated selection of units (i.e. areas, households and individuals) affected by migration, the initial objectives for the African samples were met (Table 3). With this selection method, weights must be used to correct for over-representation (Schoumaker et al., 2013b).
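To illustrate why weights are needed after this kind of over-sampling, the following minimal Python sketch computes a design weight as the inverse of a respondent's overall selection probability across the three stages. It is only an illustration under stated assumptions: the class and function names and all probabilities are hypothetical, and the actual MAFE weights (which also correct for non-response) are documented in Schoumaker et al. (2013b).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Respondent:
    # All probabilities below are hypothetical, chosen for illustration only.
    p_psu: float         # chance that the respondent's PSU was drawn (stage 1)
    p_household: float   # chance that the household was drawn within the PSU (stage 2)
    p_individual: float  # chance that the person was drawn within the household (stage 3)
    is_return_migrant: bool


def design_weight(r: Respondent) -> float:
    """Inverse of the overall inclusion probability across the three stages."""
    p = r.p_psu * r.p_household * r.p_individual
    if not 0.0 < p <= 1.0:
        raise ValueError("inclusion probability must lie in (0, 1]")
    return 1.0 / p


def weighted_share_return_migrants(sample: Sequence[Respondent]) -> float:
    """Weighted proportion of return migrants in the sample."""
    weights = [design_weight(r) for r in sample]
    hit = sum(w for w, r in zip(weights, sample) if r.is_return_migrant)
    return hit / sum(weights)


# Toy sample: one over-sampled return migrant, one under-sampled non-migrant.
sample = [
    Respondent(p_psu=0.5, p_household=0.8, p_individual=1.0, is_return_migrant=True),
    Respondent(p_psu=0.25, p_household=0.2, p_individual=0.5, is_return_migrant=False),
]
print(weighted_share_return_migrants(sample))
```

In this toy sample, return migrants make up half of the respondents but only about 6% once weighted, which is the kind of correction the weights are meant to deliver.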
33In Europe, the construction of representative samples of migrants was a major challenge. The absence of accessible sampling frames covering the migrant population (including undocumented migrants) practically ruled out the use of random selection techniques. Spain was the exception in this respect, and one that the MAFE project took advantage of. The country’s undocumented migrants are listed in the municipal registers (Padrón) compiled by the national institute of statistics, constituting a sampling frame from which Senegalese migrants were randomly selected. In the other countries, the quota method was used. This last approach is often recommended for constructing small samples, particularly in the absence of a sampling frame (Ardilly, 2006).  In all countries (apart from Spain), quotas were set by age and sex at least.  In France, the occupational category was also included as a criterion in the quotas, while in Belgium and the UK, the place of residence was used. The use of different recruitment methods (public spaces, snowballing, community groups) and experienced interviewers ensured that all types of migrants had a non-zero probability of being interviewed and, in particular, that undocumented migrants were represented in the samples. Random selection techniques were also introduced in different phases of the surveys. In Belgium, for example, survey locations were randomly chosen by taking account of the number of people originally from DR Congo living there. In France, Italy and Spain, some of the respondents were also selected on the basis of contacts obtained through the household survey carried out in Senegal.
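As an illustration of how quotas of this kind can be monitored during fieldwork, the sketch below tracks recruitment against targets defined by sex and age group. It is a simplified example and not the procedure used in MAFE: the class name, quota cells and target numbers are all hypothetical.

```python
from collections import Counter


class QuotaTracker:
    """Tracks interviews completed against per-cell quota targets."""

    def __init__(self, targets):
        # targets maps a quota cell, e.g. ("F", "25-34"), to the number wanted.
        self.targets = dict(targets)
        self.filled = Counter()

    def try_add(self, cell):
        """Accept a respondent only if the quota for their cell is still open."""
        if self.filled[cell] < self.targets.get(cell, 0):
            self.filled[cell] += 1
            return True
        return False

    def remaining(self):
        """Number of interviews still needed in each cell."""
        return {cell: target - self.filled[cell] for cell, target in self.targets.items()}


# Hypothetical quota cells and targets, for illustration only.
tracker = QuotaTracker({("F", "25-34"): 2, ("M", "25-34"): 1})
accepted = [tracker.try_add(("F", "25-34")) for _ in range(3)]
print(accepted, tracker.remaining())
```

A tracker of this kind only controls the composition of the sample on the quota variables; it does not by itself remove selection bias within each cell, which is why varied recruitment channels matter.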
The representativeness of MAFE data
34As detailed data on both migrants and non-migrants were needed to fully meet the objectives of the project, we constructed dispersed and heterogeneous samples. This raises the question of what exactly the MAFE data are representative of.
35We are certain that the household samples in Africa are representative of the entire population of the agglomerations surveyed (Dakar, Kinshasa, Accra and Kumasi). The selection methods ensure that all households had a chance of being interviewed (including households with immigrants in these cities), with use of weightings to correct for non-response (Razafindratsima et al., 2011; Schoumaker et al., 2013b). Migrants in the household data, for their part, are representative of the migrant population scattered across the world eligible to be reported by the surveyed households, given the criteria selected to record them (family, conjugal and other relations, as explained in Table 2).  A study in Senegal shows that the migration trends observed by the MAFE project concur with those suggested by other data sources (2002 census and the 1992 EMUS survey), namely a slight increase in the propensity to migrate out of Dakar between the 1990s and 2000s, combined with a redirection of flows from Africa to Europe, North America and other destinations (Lessault and Flahaux, 2014).
36The individual data are, by nature, more heterogeneous since they were collected in several countries. Within each origin country (Ghana, DR Congo and Senegal), individuals, like households, were representative of the populations living in the surveyed agglomerations, the exception being that immigrant populations were excluded. In DR Congo, for example, only individuals born in DR Congo and with Congolese nationality were included in the sample. Weights correct for both individual non-response and the over-representation of return migrants and international migrants’ partners (Schoumaker et al., 2013b). For each European country, it can be said that the individual data are “as representative as possible” of the migrant populations. Given the lack of a sampling frame (except in Spain), it was simply impossible to build strictly random samples. But every effort was made to diversify the sampling sources and ensure that all types of migrants had the chance of being selected (including undocumented migrants, as shown in Table 3). The important thing was to combine different forms of recruitment so that the biases of one would be offset by the biases of another. By doing so, we avoided the selection biases stemming from the exclusive use of the snowballing method initiated in the origin country (when migrants in destination countries are surveyed through contacts obtained in the origin country). This type of selection method, tested in MAFE-Senegal and inspired by the MMP, turned out to be ineffective and was not used for the MAFE-Congo and MAFE-Ghana surveys. Detailed analyses based on MAFE-Senegal revealed two main flaws in the method (Beauchemin and Gonzalez-Ferrer, 2011). First, its “yield” was very low: the migrants found and surveyed in Europe amounted to just 5% of the international migrants reported in the household questionnaires in Senegal, so other recruitment sources had to be used.
Second, the collection of contacts in the origin country resulted in biased samples, as the probability of obtaining a contact is stronger when the household head in the origin country him/herself has international migration experience and when the household is modest (not homeowners) and receives substantial financial assistance from its migrant(s). In other words, relying on a sample of individuals whose contact details are obtained in the origin country leads to over-estimating both the role of migration chains in explaining migration and the migrants’ economic contribution to their origin family.
37Assembling the data collected in the different countries – a necessary stage for comparing migrants and non-migrants and studying the causes and consequences of migration – is also problematic. The ideal scenario would be to have universal samples representative of all the Congolese, Ghanaians and Senegalese living around the world, all origin and destination countries combined. The MAFE data provide only imperfect sub-samples of these ideal samples, however, since they (partially) cover a limited number of destination countries. For analyses using these transnational samples, weightings were calculated to take account of the size of the population in each country (since for each origin group, the migrants surveyed in Europe are over-represented compared with non-migrants surveyed in Africa). The transnational samples are nonetheless marked by geographical mismatches. The first of these stems from the regional coverage of the samples. For example, while all the non-migrants and return migrants interviewed in Senegal lived in the Dakar region at the time of the survey, 35% of the migrants interviewed in Europe had never lived there. The second mismatch lies in the incomplete coverage of the destination countries of Congolese, Ghanaian and Senegalese migrants. While at the time of the survey the interviewed migrants lived in a limited number of European countries (Table 1), the return migrants interviewed in Africa may have returned from any country in the world. These mismatches call for caution when interpreting the data.
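The idea of weighting pooled country samples by population size, mentioned above, can be sketched as follows. This is a minimal illustration and not the MAFE weighting scheme itself (for which see Schoumaker et al., 2013b): the function name is hypothetical and the population figures are invented for the example.

```python
from collections import defaultdict


def rescale_to_population(records, population):
    """Rescale weights so that they sum to the population size in each country.

    records: list of (country, weight) pairs, one per respondent.
    population: dict mapping each country to the size of the target population there.
    """
    totals = defaultdict(float)
    for country, weight in records:
        totals[country] += weight
    return [weight * population[country] / totals[country] for country, weight in records]


# Invented figures: two respondents in the origin country, two in a destination country.
records = [("Senegal", 1.0), ("Senegal", 3.0), ("France", 1.0), ("France", 1.0)]
population = {"Senegal": 1000, "France": 100}
print(rescale_to_population(records, population))
```

After rescaling, each country sub-sample counts in proportion to the population it stands for, so the over-represented migrants surveyed in Europe no longer dominate pooled estimates. The geographical mismatches described above are not corrected by this step.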
Conclusion: progress and future challenges
38The aim of this article was to examine the methodological choices made in the MAFE project, to explain the rationale behind them, but also to acknowledge their limits. In short, the MAFE surveys are multi-sited (origin and destination) and comparable (between all countries), providing retrospective, multi-thematic and multi-level data. Drawing on previous experience of migration surveys, including the MMP, the Push-Pull project and the life history surveys carried out in France and Africa, the MAFE project introduced a number of innovations. It developed new retrospective modules on topics such as migrant networks, migration attempts, migration itineraries, legal status of migrants and remittances that had previously been addressed using a cross-sectional approach. These new data offer scope for new analyses. In addition, MAFE is the first project to collect data on several migration systems, enabling comparisons of three African migration flows and several destinations for each one of those flows. But these innovations also have their limits and call for further methodological research.
39The first limit concerns the quality of the data generated by the new life history modules. The numerous tests made in the development of the questionnaire and the final survey show that the individuals surveyed were able to respond to the many retrospective questions asked. Initial analysis also showed that the data collected were coherent. But further work is needed to better assess the quality of these new types of data. For example, research has already demonstrated that subjective assessments of economic well-being at the time of the survey are relatively reliable (Razafindrakoto and Roubaud, 2001), but their usefulness for retrospective research deserves more in-depth study. Retrospective information on migrant networks could also be assessed in more depth. The main question here is the extent to which respondents are able to reconstitute the migration history of their entourage. One way to address this issue might be to compare, on the basis of MAFE data, the components of a migration history recorded in the household questionnaires via household heads with the detailed migration histories collected in the life history questionnaires completed by the migrants themselves (return migrants in the three African countries and Senegalese migrants interviewed in Europe whose households were interviewed in Dakar). The use of proxy respondents is a widespread practice in surveys on international migration, and such an approach would provide an opportunity to identify the necessary conditions for obtaining reliable data by this means.
40The second limit concerns sampling. First of all, while the samples may be expanded in the future, their size is relatively limited, which can make some types of analysis difficult. Secondly, although every effort was made to ensure that the samples were “as representative as possible” in each European country, selection biases cannot be totally ruled out. Unfortunately, these biases are not measurable and are specific to each country, which means that differences in results between countries should be interpreted with caution in comparative analyses. Thirdly, the destination countries included in the life history surveys are limited in number (two or three) and in space (in Europe only). Since African migrants have diverse and varied destinations, this inevitably introduces a selection bias that needs to be taken into account in analysis. This bias could be assessed using the data from the household questionnaire. Lastly, while the MAFE survey was designed to enable comparisons between migrants, return migrants and non-migrants, mismatches exist between the samples, which also call for caution when preparing analyses and interpreting the results.
41Overall, the MAFE project has produced a set of data that are both unique and imperfect; unique, in that they can be used to make new analyses of international African migration,  and imperfect because the data have limits that call for caution in analysis and interpretation. These limits are inherent to all surveys of international migration, which face problems of sampling and information quality that are structurally different to those encountered in other fields of demographic research. To compare migrants, return migrants and non-migrants, there are only two options: to rely on limited information collected through proxy respondents and/or to work with an imperfect multisite sample that can nevertheless be used to collect a wealth of data from the concerned individuals themselves, irrespective of their geographical location. Considerable progress still needs to be made in sampling. The contribution of MAFE here is rather modest, and simply shows that contacts collected in origin countries lead to biased migrant samples in destination countries. In the absence of a satisfactory sampling frame – one including documented and undocumented migrants – other methods of selecting migrants in destination countries need to be tested and assessed (McKenzie and Mistiaen, 2009). That said, it is essential to improve the content and accessibility of existing sampling frames. From that standpoint, censuses that add modules on international migrants (as in Senegal and Morocco, among other countries), and questions that make it possible to identify return migrants are opening up a new avenue of progress. In the meantime, documenting survey design as precisely as possible should become standard practice. In the field of international migration, methodological papers, even if simply descriptive, remain all too rare (Groenewold and Bilsborrow, 2008). 
Yet recognizing and assessing the limits of the data collected is a beneficial exercise that not only allows users to analyse data with fuller knowledge of the facts but also opens the way to future improvement.
Acknowledgements
The MAFE project is the result of a collective effort that involved a number of people (researchers, interviewers and respondents) in all the project countries. Thanks are due to all those involved, particularly the staff at the INED Surveys Department and Statistical Methods Service. The MAFE project is coordinated by INED (C. Beauchemin), in partnership with Université Catholique de Louvain (B. Schoumaker), Maastricht University (V. Mazzucato), Université Cheikh Anta Diop (P. Sakho), Université de Kinshasa (J. Mangalu), the University of Ghana (P. Quartey), Universitat Pompeu Fabra (P. Baizan), Consejo Superior de Investigaciones Científicas (A. González-Ferrer), Forum Internazionale ed Europeo di Ricerche sull’Immigrazione (E. Castagnone) and the University of Sussex (R. Black). The MAFE project received funding from the Seventh Framework Programme of the European Community (subsidy 217206). The MAFE-Senegal survey was carried out with financial support from INED, the French National Research Agency (ANR), the Île-de-France region, and the FSP programme “International Migrations, territorial reorganizations and development of the countries of the South”.
French Institute for Demographic Studies (INED).
Correspondence: Cris Beauchemin, Institut national d’études démographiques, 133 Boulevard Davout, 75980 Paris Cedex 20, tel.: +33 1 56 06 20 00, email: firstname.lastname@example.org
A more extensive account of the MAFE project and methodological choices is available in Beauchemin (2012).
The concept of household is traditionally used to describe a group of people living under the same roof, under the authority of the household head, at the time of the survey. At another point in time, the head or place of residence could be different, with some members leaving and others arriving. Consequently, when speaking of a household, the reference to the future or the past is far from clear. Does it refer to the group, the place of residence or the household head?
In the MAFE project, the household is defined in a conventional manner as a group of people living together and sharing their resources in part or in full with a view to satisfying their essential needs (housing, food). To be considered as members of a household, the people in question must have lived or have the intention to live under the same roof for at least six months.
The meaning of regular contact was left to the judgment of the respondents. A questionnaire module addresses the nature and frequency of this contact.
For more on migration trends, see the article by S. Vause and S. Toma in this issue. For more information on calculating migration rates with MAFE data, see also Schoumaker and Beauchemin (2015).
For a detailed discussion of the recording of attempted migrations and an analysis of the factors in attempted and effective migrations, see Mezger (2012). Regarding return intentions, see Flahaux in this issue.
The questionnaires can be found on the project website: http://mafeproject.site.ined.fr/en/. While the MMP is presented as an “ethnosurvey” (Massey, 1987) giving interviewers the latitude to formulate the questions, the MAFE questionnaires contain precisely worded questions to be respected by interviewers.
For a complete overview of the databases, see Beauchemin et al. (2014).
See, for example, the article by K. Caarls and V. Mazzucato in this issue on the influence of migration on divorce.
Alternatively, questions could have been included in the individual questionnaire to describe the places in which the individuals have lived. But this idea was dropped as the questionnaire was already very long.
The survey tools exist in French, English, Spanish and Italian. The questionnaires were not translated into the local African languages but workshops were organized during the interviewer training sessions to discuss the translation of potentially problematic concepts.
For a detailed analysis of the legal trajectories of Senegalese migrants, see Vickstrom (2013).
MAFE project data are available in French and English and can be freely accessed at: http://mafeproject.site.ined.fr/en/data/
The sample target regions are described in Schoumaker and Diagne (2010).
The over-sampling of some categories of persons, such as women, is corrected using weights (Schoumaker et al., 2013b).
In Senegal, this process relied on the results of the 2002 census that included questions on international migration. In DR Congo, in the absence of a recent census, stratification was carried out on the basis of information provided by qualified individuals (researchers, specialists from international organizations, managers in public administration, etc.). Given the prevalence of international migration in Accra and Kumasi and the dispersion of migrants in these cities, stratification of this kind was not necessary in Ghana.
This is not the main explanation for the smaller number of return migrants in Senegal observed in Table 3. Their number was increased in Ghana and in DR Congo through finer stratification of households. For further details, see Schoumaker et al. (2013b).
Given the small sample size in each country, it was not possible either to apply alternative selection methods designed to reach rare populations in the absence of a sampling frame, such as respondent-driven sampling (Heckathorn, 1997) or intercept point surveys (McKenzie and Mistiaen, 2009; Marpsat and Razafindratsima, 2010).
Weights were calculated to fit the distribution by sex and age observed in other available sources, the source naturally varying by country. For more detail, see Schoumaker et al. (2013b).
It should be noted that some reported migrants may never have lived in the agglomerations or even the countries surveyed (a grandfather from Accra may mention a grandson born in the USA but with whom he nevertheless has regular contacts). These migrants can, where necessary, be removed from the analysis samples.
See Schoumaker et al. (2013b) for an overview of the different weightings and discussions on the use of weightings in the specific cases of transnational samples and life event history data.
For more statistical detail on these mismatches, see Beauchemin (2012). In the future, they could be minimized by extending the samples to other countries and origin regions, or both, as the retrospective nature of the data makes such an extension possible. The MMP has collected data gradually over time. In 1982 the sample concerned just five Mexican communities; today it concerns over 100 (Massey, 2000). As part of the MAFE project, a second wave of around 400 Senegalese migrants was interviewed in Spain in 2010 and 2011.
See the MAFE project website for more details on the work already carried out.
The ideal solution would be for a new survey to collect both subjective information on the well-being of households (such as questions Q312 and Q313 in the MAFE life event history questionnaire) and objective information, like the EMIUB survey, for example, which collected retrospective information on housing quality (Poirier et al., 2001).
Although the MAFE data (particularly the African samples representative of capital city regions) can also be used to address subjects other than international migration.