CAIRN-INT.INFO : International Edition

Social sciences as a whole, and sociology in particular, have for several years been adapting their methods and techniques to the processing power and new data provided by digital technology. Major computational and statistical infrastructures have been set up in every country for archiving (HAL-SHS [2]), processing (Progedo, [3] Réseau Quételet, [4] Cessda [5]), or publication (Cleo, [6] Hypothèses [7]), to mention only the French examples. In some cases, older traditions have adopted more efficient tools, [8] and in others, new domains or sub-disciplines, such as web studies, have attempted to carve out their own space (online surveys, [9] databases, etc.) within larger domains like the “digital humanities”.* [10] Sometimes digital technology is treated as one technique among others; elsewhere it is treated as a separate domain. It is not clear, however, that these choices are enough to account for what is at stake in the current digital evolution. In fact, all of the social sciences provide a means of reflexivity to societies, which are built on a set of mechanisms that sometimes take a long time to take shape but end up becoming a convention, [11] and appear natural not only to researchers but also to the public, decision makers, and others. In this article, I propose a frame of analysis for digital change as a new reflexivity offered to societies: new sources of data are now available, beyond the census and register [12] or polls and questionnaires. [13]

Table 1

The three generations of the social sciences

| | First generation | Second generation | Third generation |
|---|---|---|---|
| Concept of the social | Society(ies) | Opinion(s) | Vibration(s) |
| Means of collection | Census | Poll | Traces (big data) |
| Principle of validation | Exhaustiveness | Representativeness | Traceability |
| Co-construction institutions/research | Register/survey | Audience/poll | Following traces/analysis of vibrations |
| Main actors of reference (and funders) | State | Mass media | Brands |
| Operational actors | National institutes | Polling institutes | Web platforms (GAFAT) |
| Founding authors | Émile Durkheim | George H. Gallup, Paul Lazarsfeld | Michel Callon, Bruno Latour, John Law |
| Key problematics of initial academic approaches | Division of labor and welfare state | Propaganda and media influence (audience measurement) | Science and technology (Scientometrics) |
| Technological context | Hollerith machines (electro-mechanical tabulation) | Radio and telephones | Internet, web, and big data |
| Semiotic formats | Cross-tabulation and cards | Curves and histograms/pie charts | Graphs, timelines,* and dashboards |
| Metrics | Statistics | Sampling | Topology and tweets per second (TPS) (Scores) |
| Technical data quality criteria | Pertinence, precision, topicality, accessibility, comparability, coherence | Confidence interval, probabilities | Volume, variety, and velocity (big data) |
| Dominant modalities of the social sciences | Explanations | Descriptive, then predictive correlations | Predictive correlations |

2To help us navigate the proliferation of computer technologies, it is helpful to establish a historical perspective on these different “periods” of reflexivity which, thanks to their long technical, institutional, and economic development, have each provided evidence to researchers and the public. A table summarizing the three eras of the social sciences is presented above. It allows us to make the coherence of the comparative approach more visible, while requiring a more schematic approach that eliminates some of the specificities of each era. It should be noted that this article will not deal with the “qualitative” aspects of the methods of the social sciences, which might display the same periodization but which have had less influence on the reflexivity of societies in everyday life or on the governance of states, the media, or brands. As a final precautionary principle, while the first two generations have been widely documented, the third to some extent represents a wager on the capacity of the social sciences to organize a form of academic response to changes in the methods of quantification that deeply affect the social as a whole.

3New entities of uncertain status are thus made accessible by digital technology, entities that cannot be fully accounted for either by “society” and its socio-demographic properties, or by “opinion”. Here, I seek to draw on the conventions that constituted the reflexivity of societies to examine in greater detail the most favorable conditions for a new convention that would necessarily involve all of the interested parties mentioned in this table, as, in fact, every era of the social sciences has allied itself with institutions external to itself in order to quantify society or opinion. We should, however, be conscious of the fact that this new generation of social sciences will have little chance of existing if we do not measure how actors (platforms and producers/sensors of traces) tend to occupy the entire terrain. [14] To put it briefly, marketing and computer science appropriate and create tools for monitoring social life, brands, reputations, communities, social networks, opinions, and more, dispensing with the interpretation and models of the social sciences, which is compensated for by the unprecedented processing power and traceability of big data.* The velocity of the traces [15] collected in this way is the main characteristic of note. It provides a way to modify the rhythm of political life itself when it focuses on tweets, for example. What I will call “high-frequency politics” [16] borrows aspects from high-frequency trading* [17] in finance, for better and especially for worse, it seems. The main concern remains action and reaction, not analysis or understanding as the traditions of sociology, political science, and the other social sciences have defined them. Traces, not data; reactivity, not reflexivity: the digital world is shaped by principles that leave less and less space for the usual debates in the social sciences. 
Casuistics – the detailed study of cases in context using methods sometimes termed “thick description”, with qualitative and quantitative dimensions, and frequently comparative in aim – no longer has any status in this environment, which must aggregate, standardize, and detach data from its conditions of production to facilitate an – ideally automatic – reaction. Similarly, the hypothetical-deductive approach, often coming from a disciplinary tradition – and requiring the development of a protocol for data collection and processing guided by the hypotheses – has little meaning when machines are able to test thousands of correlations and evaluate their statistical robustness even before any interpretation is necessary. There is therefore no reason why the authority of the social sciences should not be called into question as have all authorities since the rise of networked digital technology. As Geoffrey C. Bowker writes, however, this shift comes at the cost of overturning all of our points of reference.


“For those of us brought up learning that correlation is not causation, there’s a certain reluctance to examine the possibility that correlation is basically good enough. It is surely the case that we are moving from the knowledge/power nexus portrayed by Foucault to a data/action nexus that does not need to move through theory: all it needs is data together with preferred outcomes.” [18]

5In this article I seek to make a contribution to the work of establishing the conventions needed to develop third-generation social and political sciences, which will treat the proliferation of digital traces as new material by defining more precisely the limits of the social dimension they allow us to trace: in other words, high-frequency vibrations and not long-term [19] social structures or medium-frequency shifts in opinion.

The digital age

Traces – and not persons, identities, or communities – are the primary materials

For many years, and far more extensively since the rise of social networks, the computer sciences have been calculating and modeling the social as if the traces gathered provided better access to the “real” social than any poll, survey, or census. [20] Let us take two examples, one academic and the other commercial.

7“The Web does not just connect machines, it connects people”, declared Sir Tim Berners-Lee, [21] founder of the web with Robert Cailliau in 1991. He was emphasizing the shift to a dimension of the network that was no longer technological (Internet) or documentary (World Wide Web), but social.

8Facebook, for its part, accomplished the feat of making it “normal”, from the point of view of the actors themselves, to state one’s real identity – the one provided by civil records, name and surname – against the tradition of anonymity on the web. In this way, the platform competes with Google to become the reference world or even an alternative civil record.

9Yet there is no guarantee of any connection between Facebook identities or Tim Berners-Lee’s “people”, and individuals identifiable by civil records and counted by census. Only accounts are connected and the data gathered is only the traces of activity by an entity that might potentially take the format found in civil records. In the case of scores that allow Internet sites to be ranked by a search engine, the topology that accounts for the ranking never focuses on site content as such, but on the links to and from the site which give an authority* or hub* score, in the sense of network topology [22] and not social status. This first clarification should lead us to accept the specificity of this world of traces on platforms and to reflect on the impossibility of using them to draw any lessons on society or opinion. This should oblige the social sciences as a whole to speak not of individuals but of accounts, not of communities but of clusters, not of sociability but of connectivity, not of opinions but of verbatims* from comments and other sources, using an approach marked by the “prudence” of radical empiricism. [23]
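To make the purely topological sense of “authority” and “hub” scores concrete, here is a minimal sketch in Python. It assumes the hub/authority iteration usually known as HITS, which the vocabulary above evokes; the text does not say which algorithm any particular search engine applies. The function name `hub_authority_scores` and the toy link graph are hypothetical. Note that the computation never reads page content: only links enter the score.

```python
from math import sqrt


def hub_authority_scores(links, iterations=50):
    """Return (hub, authority) scores for a directed link graph.

    links maps each page to the list of pages it links to.
    Assumption: this is the classic HITS iteration, used here only
    as an illustration of a topological score.
    """
    nodes = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {n: sum(auth[dst] for dst in links.get(n, [])) for n in nodes}
        # Normalize both vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Toy graph (invented): pages "a" and "b" both link to "c", which links to "d".
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
hub, auth = hub_authority_scores(links)
```

In this toy graph the page with the most incoming links from good hubs ends up with the highest authority, whatever it actually says, which is the sense in which the score describes network position and not social status.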

Let me specify what I mean here by traces as distinct from data. Traces range from signals (“raw”, generated by objects) to unstructured verbatims that propagate in the form of memes* (or quotes). They can be metadata (richer and more easily calculated than the content of a tweet itself) or marks of activity (links, clicks, likes, cookies) [24] exploited in databases by operators or platforms (GAFAT [25]). They can also be captured independently through the API [26] offered by these platforms that do not refer to relational databases. [27] Traces are not necessarily preformatted for precise calculation or dependent on aggregation that can be carried out later. It is easy to maintain that, despite everything, “behind” the sites or “behind” the clicks, there are humans with all of their “intentions” (which we can rarely verify because this is declarative data) and “social properties” (whose pertinence can never be guaranteed to explain a specific behavior in a precise situation). It makes more sense to track algorithms* which, conversely, only focus on one property or another depending on their aims. These properties are no longer structural and sometimes appear very secondary (such as a prior purchase, or a like on a post); they come with no guarantees, only an acceptance of approximations adjusted as a result of learning from large amounts of data. Traces, understood in this strict sense, are produced by digital platforms and technological systems but are not “signs” or indicators of anything other than themselves unless relationships with other attributes are created and demonstrated.

11This distinguishes them radically from the data that can be gathered extensively from client files or from administrative acts, which only fulfils the big-data attribute of “volume” (in reference to the three Vs – volume, variety, and velocity – associated with it). The methods associated with big data can of course be applied in both cases, but traces are in principle independent from other attributes, especially socio-demographic ones, which are rarely mobilized in the correlations sought between traces. Relations to the most classic parameters in data science are limited to time (timestamp*) and place (geolocation tags) that allow the production of timelines and maps which become modes of simplified presentation of traces. Drawing on these common references (which highlights the strategic importance of Google Maps), it is possible to make correlations between all types of data, by using the “variety” attribute of big data (such as actions on a cellphone, purchases in a store, and references in a database with addresses). Yet thanks to this radical simplification, where the reference entities are no longer individuals with all of their socio-demographic properties, traces circulate quickly and modify the status of the databases themselves, which become dynamic (sometimes even referred to as “real time”), which the social sciences never had the opportunity to process. “Velocity”, the third property of big data, brings unprecedented information to the social sciences and makes it possible for another approach to emerge based on the traceability of masses of entities detached from individuals. From this point on we need to be clear about the difference between:

  • traces made up of clicks, likes, or other markers of ephemeral and non-verbal attention (such as QR codes); [28]
  • comments which allow for lexical processing (most often co-occurrences) that is closer to the work done in opinion studies, without the status of established methods of measuring opinion;
  • hypertext links that represent preferential marks of attachment and can therefore be exploited like the data from analyses of social networks, or as markers of authority as traditionally analyzed by scientometrics in the form of citations. The analyses of web topologies [29] often seen in the “digital humanities” [30] rely in fact on the identifiable social properties of all of these nodes connected by the arcs constituted by hypertext links.
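The lexical processing of comments mentioned in the second item can be sketched as a simple co-occurrence count. This is only an illustration under stated assumptions: the tokenization rule, the tiny stopword list, and the sample comments are all invented for the example and do not come from any study cited here.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list only; a real study would use a fuller one.
STOPWORDS = {"the", "a", "is", "of", "and", "to", "for"}


def cooccurrences(comments):
    """Count how many comments contain each unordered pair of terms."""
    counts = Counter()
    for text in comments:
        # Each term is counted once per comment, whatever its frequency.
        terms = set(re.findall(r"\w+", text.lower())) - STOPWORDS
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts


# Invented sample comments.
comments = [
    "Security for the jeweler",
    "The jeweler and security",
    "Support the jeweler",
]
pairs = cooccurrences(comments)
```

Such counts describe which verbatims travel together within a corpus of comments; on their own they say nothing about who wrote them, which is why they do not have the status of an established measurement of opinion.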

Traces are produced by platforms for brands

These digital traces form a particularly profitable set of “raw materials” for platforms. Digital marketing methods are largely based on sending mass advertisements or emails to the IP addresses or email addresses that clicked on an article (retargeting), and much more rarely on sophisticated relationships with other attributes of the supposed individuals attached to these addresses or clicks (profiling).

Traces are a key resource for brands, allowing them to track the effects of their actions on their public. Reputation and notoriety are no longer expressed solely in audience measurements. [31] On the Internet, brands have to measure not only the form of the audience (reach) and the elementary ranking activities performed by undefined publics (likes, stars), but also more elaborate activities, such as comments, which represent what is called the “level of commitment”. Brands are hungry for traces, and they are the ones feeding the profits of all these platforms and, through them, the entire web. Opinion mining and sentiment analysis tools [32] are thus the response to marketers’ anxiety after launching a new product. However, the brand domain now extends to every social activity, be it commercial, cultural, political, institutional, or interindividual, once each individual must measure their quality based on rankings, as researchers themselves do, [33] despite the vigorous and widely shared criticism of these indicators. [34] Brand methods are becoming prevalent everywhere, imposing their rules and rhythms, even in public services that have to practice benchmarking. What concerns these brands above all is not data that is structured and constructed to test causalities, for example, but traces that function as indices and alerts, even approximate ones, not at the individual level but at the level of tendencies or trends. In the same way, it is primarily reactivity that is sought, not reflexivity: in other words, the capacity to determine how to react on the basis of the brand features affected.

14The political world itself is now caught in this spiral of reactivity, and its addiction to tweets leads to the conclusion that we have entered the era of “high-frequency politics” in the image of the “high-frequency trading” of speculative finance. The powerful viral phenomena found on the Facebook platform with its “like” mechanism are impressive and raise questions for political science researchers. [35] In the case of a jeweler from Nice who killed an attacker in self-defense, a page was created on 11 September 2013 that had one million likes by 14 September and reached 1,635,000 likes on 7 November. The hashtag “#JesuisCharlie” spread throughout the world: [36] tweets using the hashtag, image, and slogan created by @joachimroncin started on 7 January 2015 at 12:52 pm to reach more than 6500 tweets per minute or 3.4 million tweets in twenty-four hours, with three pages and Facebook groups reaching more than 400,000 members. In all of these cases – limited here to themes of interest to political science (since “buzz” can relate to anything) – the effects of contagion are impressive and require a specific approach. The affair of the Nice jeweler gave rise to all types of analysis, from the most constructivist critiques (“all of the likes have been bought”) to the most positivist (“it is proof that opinion – or “people” – have swung behind a reflex reaction for more security”). In the Charlie Hebdo case, it was often pointed out that users of the Twitter platform were not representative of the population as a whole; which is indisputable, although it does not eliminate all value from this new material. Worse yet, one can legitimately ask if the Twitterverse does not turn in on itself, measuring its role by the number of retweets to the exclusion of any other influence. 
This self-referential tendency has already been indicated by marketing specialists to calm the general enthusiasm of communicators in favor of these networks, while the “rates of conversion” (actual sales) are sometimes unmeasurable or uncorrelated. For these reasons, at least, it is legitimate that political science and the social sciences in general question their own uses of these sources. There are several possible positions on them, and the research conventions to be established are not at all the same.
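The orders of magnitude quoted above can be put on a common scale with a few lines of arithmetic. The sketch below only uses the figures given in this section; it assumes that the Facebook page had no likes on the day it was created, and the helper names are hypothetical.

```python
from datetime import date


def per_second(count, seconds):
    """Average number of events per second over a period."""
    return count / seconds


# #JesuisCharlie: peak of 6500 tweets per minute, 3.4 million in 24 hours.
peak_tps = per_second(6500, 60)
mean_tps = per_second(3_400_000, 24 * 3600)
peak_to_mean = peak_tps / mean_tps


def daily_rate(start_count, end_count, start, end):
    """Average number of new likes per day between two dates."""
    return (end_count - start_count) / (end - start).days


# Nice jeweler page (assumption: zero likes on the creation date).
early_rate = daily_rate(0, 1_000_000, date(2013, 9, 11), date(2013, 9, 14))
late_rate = daily_rate(1_000_000, 1_635_000, date(2013, 9, 14), date(2013, 11, 7))
```

The peak corresponds to roughly 108 tweets per second against a daily mean of about 39, and the page gained likes far faster in its first three days than in the weeks that followed. Both results illustrate the burst-like shape of these phenomena.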

Doing the same social sciences from these new sources of data

15Establishing the conventions associated with traces is an ongoing process and the proliferation of opinion mining services that I have studied, [37] as well as the extreme diversity in their quality, is proof of this fact. Many researchers in the social sciences continue to work with existing paradigms and process the data collected on digital networks according to their regular methods, while carrying out the necessary adjustments and controls. Here, I am not referring to online surveys or mass processing of extensively tagged textual corpora, such as the corpora of academic articles studied by scientometrics, as these approaches do not study native online traces or those produced by platforms. Web studies coming from the social sciences use the same classical sociological frameworks to analyze these new sources of data: a preference for economic studies based on Google requests; sociability studies on networks; longitudinal analyses of “communities”, around themes, or specific Internet sites; opinion mining and sentiment analysis approaches used to augment the monitoring of public opinion or the identification of trends. The graphs created from social networks or links between Internet sites to create communities of interest or semantic maps of a controversial domain do not change the paradigm but provide it with new tools. Digital technology thus amplifies [38] “the reality of opinions” or confirms “social structures”, objects constituted in academic traditions well before computers.

Authors such as Burt L. Monroe et al. [39] have given several examples of what could be analyzed from a political science point of view using big data methods, focusing on phenomena observable online. These authors report on a study of Chinese social networks. [40] It showed that the publications prohibited by the censors were not those that criticized the regime but those that attempted to launch collective actions. [41] To verify this hypothesis, the researchers produced their own publications based on these two criteria and were thus able to test it through a form of experimental sociology. Others [42] have used the traces left by search queries (Google Search) during Obama’s campaign to compare the proportions of racist terms in the queries for each US state. They took the queries to be spontaneous expressions that they would never have obtained from conventional methods, such as interviews.

Practicing digital methods by repurposing traces

17Another approach allows us to recover the native digital traces produced by platforms by repurposing them for academic ends. In this vein, Richard Rogers sets out a “repurposing” [43] of these traces by working, for example, on “query design”: the formulation of queries on Google to obtain responses to well-constructed hypotheses. Wikipedia represents a separate and valuable case for researchers: as the Contropedia project has shown, the platform preserves the history of every intervention, debate, and controversy, and produces arbitration for which the traces are also kept. Rogers has studied political polarizations and the terms of controversies on abortion, or the Srebrenica massacre according to the Croatian, Bosnian, or Serbian versions. But Wikipedia is unique in this regard. Rogers has also used Twitter as a “narrative machine”, to reconstitute the Iranian revolt of 2009 using the 600,000 tweets that he was able to gather: the effects of propagation are clear, as was the case for the slogan “Leave!” (“Dégage!”) used during the “Arab Spring”. Noortje Marres [44] is more concerned with responding to critiques of the dependency of researchers on data provided by platforms. Her approach to sociopolitical controversies via “issues” represents a strong position that enables us to constitute the limits of validity for empirical research (tweets – or weibos – or other traces are not generally monitored outside an arena constituted by the issues). She considers that media parameters do not need to be refined because they are part of the “process”. It is possible to correct for them, however, as she does for example by tracking the controversies surrounding Deep Packet Inspection during the 2012 World Conference on International Telecommunications (WCIT) in Dubai. 
To reproduce the queries performed on a corpus of tweets in a more pertinent way and independently of the calculations of the platform, Marres cross-referenced the most frequent hashtags with the terms used by experts in their documents. This gives an idea of how the effects of framing produced by the media can be accounted for in a detailed and methodical way, as researchers do with the conditions of implementation of non-digital polls or surveys. [45]
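The cross-referencing described above can be illustrated with a minimal sketch. It is not Marres’s actual procedure, whose details are not given here: the functions `top_hashtags` and `cross_reference`, the matching rule (exact, case-insensitive), and the sample tweets and expert terms are all assumptions made for the example.

```python
import re
from collections import Counter


def top_hashtags(tweets, n):
    """Return the n most frequent hashtags (lowercased) with their counts."""
    counts = Counter(
        tag.lower() for text in tweets for tag in re.findall(r"#(\w+)", text)
    )
    return counts.most_common(n)


def cross_reference(tweets, expert_terms, n=10):
    """Split frequent hashtags into those shared with expert vocabulary
    and those found only on the platform."""
    expert = {term.lower() for term in expert_terms}
    shared, platform_only = [], []
    for tag, count in top_hashtags(tweets, n):
        (shared if tag in expert else platform_only).append((tag, count))
    return shared, platform_only


# Invented sample data.
tweets = ["#DPI debated at #WCIT", "#wcit opens", "#dpi and #netfreedom"]
expert_terms = ["DPI", "deep packet inspection"]
shared, platform_only = cross_reference(tweets, expert_terms)
```

Separating the two lists makes the framing effect visible: the hashtags that exist only on the platform show what the medium adds to the vocabulary of the issue, while the shared ones anchor the corpus in terms defined outside it.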

Establishing the conventions of third-generation social sciences

18This article proposes a more radical perspective on empiricism [46] than the previous approaches, by positing that the digital traces produced by (current and future) platforms should be processed in their environment and not related to movements of society or opinion. This self-restriction accepts the criticism of the biases produced by these platforms by considering that these digital traces account for other phenomena and mobilize other entities than those of the social sciences of society or opinion. The phenomena of high-frequency propagation of entities (traces) in different digital contexts allow us to track and calculate the processes of imitation (invention and opposition at the same time) that Gabriel Tarde identified long ago. [47] This does not call into question the other analyses of long-term social structures and medium-term movements of opinion, but neither of these two approaches should reduce to its own principles what I will call “vibrations”, which are also constitutive of the social. The challenge is therefore not to create an extension of the fields of the classic social sciences (corresponding to the first two eras of the social sciences discussed at the beginning of this article), nor specialized methods for dealing with identical phenomena, but to establish conventions for a new “stratum” of the social sciences related to processes and entities that have until now been incalculable. It is important to ensure that the social science ambitions of reflexivity and critique are maintained in the face of the tendency to pure reactivity by the agencies and platforms that use the same data. The challenge is all the more achievable in that it relates to simple, elementary, and almost similar entities. “Statistics is the counting of similar actions, as similar as possible”, as Gabriel Tarde wrote.

To establish these conventions, it is necessary to go into further detail concerning certain favorable conditions, starting with what big data does with all of these traces. The tendencies of big data may in fact provide initial directions to be compared with those taken by the previous generations of social sciences. The quality criteria for big data are often summarized as the three Vs mentioned above: volume, variety, and velocity. Their relationship to the demands of the social sciences is striking, and justifies the approach used here.

Volume and exhaustiveness

Volume corresponds to the demand for exhaustiveness, but translated in a somewhat limited way, since nothing allows us to define the limits of the universes of data collected. The lack of a referential “whole” on the web, [48] in a high-frequency, dynamic system, prevents the construction of a “universe” on which classical statistics could be produced. Exhaustiveness must therefore be set aside, but without abandoning the imperative to state the minimal conditions under which a given corpus of traces is acceptable from the point of view of volume.

Variety and representativeness

The second criterion, variety, is a form of transcription of the demand for representativeness that has allowed all of the social sciences to operate by means of surveys, polls, and sampling. However, the criterion of “variety” is a weaker version of representativeness, since it settles for a sufficient level of variety. For third-generation social sciences that consent to losing the constraint of representativeness as established in the case of polling, this variety remains to be defined. The establishment of a set of sources (sourcing) in web studies, for example, should respond to criteria specific to digital methods and to the area studied. My work on opinion mining led me to conclude that no general description of the social-society, social-opinion, or social-traces can be produced for digital networks. The proliferation of traces makes it impossible to refer to a “whole” posited a priori or constituted a posteriori. The social sciences have to accept that they can only deal with “issues”, [49] or points that focus attention, for which digital technology can capture the traces that are specific to each issue. This considerably reduces the totalizing scope of the claims of big data but makes it possible to translate some of the imperatives of representativeness and exhaustiveness.

Velocity and traceability

22The final criterion, velocity, has not had an equivalent in the social sciences until now. These dynamic processes are not in fact their strong point or their main preoccupation. It was primarily essential to find a representation of positions at a moment t, in order to show the force of imposition of “society” on the diversity of individual behaviors, or how public opinion is structured beyond the unique expressions obtained from surveys. Indeed, through a very costly longitudinal study of the same populations or the repetition of the same questionnaires, it was possible to reconstitute something close to a dynamic without ever being able to follow closely the mediations that allowed these evolutions to be produced. Velocity therefore seems outside the domain of more classic approaches. [50] Yet this criterion seems to me to be the foundation on which the analysis of sometimes older phenomena can emerge (think of protests, fashions, stadium cheers, rumors, among others), which were impossible to examine closely with the regular mechanisms of the social sciences. This high-frequency and quickly propagated activity was in fact neglected or reduced – as the work by Emmanuel Todd on the 11 January 2015 protest [51] (#jesuisCharlie) demonstrates – to a “hysteria” entirely determined by causes all the more powerful because invisible and distant (and therefore non-documentable) according to the procedure used by Émile Durkheim in his study of the origins of religions with his “God-society”. [52] The controversy that followed the publication of Todd’s work was particularly telling of the misunderstanding between the three generations of social sciences and the risk of relating every account, or even explanation, to only one of these approaches. 
Todd therefore adopted a “long-term” position, referring to the religious foundation of territories, mobilizing a “place memory”, expressed in the form of “zombie Catholicism” that “continues even when it only exists in the form of traces, or no longer as an individual belief” (181). Without getting into a discussion of this thesis here, it appears that protest as a situational phenomenon found itself buried under powerful and untraceable causes, which were revealed by the totalizing aspect of historical statistics. Specialists in opinion [53] then had an easy time showing that the polls had been conducted after 11 January and that all of the data contradicted the “opinions” attributed to the protesters by Emmanuel Todd. The motivations and meaning of the practice could be recovered by the established method of polling. At the same time, the viral nature of the process in the streets and on computer networks was not “explained”, because this was not the object of these polls. Here is where digital methods capable of processing the velocity of these traces would allow us to account for the high frequency of the social without also thereby invalidating theses concerning the long term of belonging and belief, or those concerning the medium term of opinions.

23Nonetheless, a branch of web studies has also taken up this question of velocity in its own way, using the traces of memes propagated on the web. It is very significant that Jon Kleinberg – the same person who exported the methods of scientometrics to the study of web topology, methods that were then taken up by Google – has for many years [54] been interested in perfecting a “memetracker” with Jure Leskovec. [55] Their most famous study looked at the propagation of quotes during Barack Obama’s presidential campaign, which allowed them to produce an impressive visualization of the focusing of attention in rapidly rising and descending curves (“streams and cascades”) around certain campaign incidents. [56] Their method aggregates all of the types of traces that these quotes can leave, processed as chains of characters that can be found throughout the web. It produced a metrics anchored in time, day to day, or even minute by minute with Twitter, the unit of measurement having become the tweet per second. Taking memes into account seems promising, as long as the transformations-translations of these memes are tracked in different contexts. Each wave is propagated in a medium that possesses different properties of refraction and diffraction. It is thus legitimate to ask what this work has contributed to political science, since it was an election campaign. The authors insist first and foremost on their contribution to the understanding of the “news cycle” between blogs and media (“The dynamics of information propagation between mainstream and social media”). [57] Expressed in this way, the results seem to reserve judgment on the content of the memes traced during this cycle and are more closely related to media studies. Nonetheless, the authors set out a program of work with greater pertinence for political science:

24

“One could combine the approaches here with information about the political orientations of the different news media and blog sources to see how particular threads move within and between opposed groups.” [58]

25This statement clearly indicates that use in terms of political science would require mobilizing the classic categories of political opposition in order to understand the relationship between the circulating entities and the medium that they penetrate. Yet even though the proposed research has not been carried out by the authors, the risk of tautology can immediately be seen: these classifications will be difficult to challenge, while a radically empiricist approach would expect that the curve of propagation would give access to precisely these paths and non-classic proximities. Moving away from the self-referentiality of the media to say something about political processes founded on “opinions” (and identified camps) presents a certain risk if the power of connectivity of the entities is not maintained against a supposed “structure” of the medium of propagation.
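The time-anchored metric discussed above for the memetracker (occurrences of a quote, treated as a chain of characters, counted per unit of time) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mentions_per_second`, the sample posts, and the matching rule (a case-insensitive substring test) are assumptions made purely for illustration.

```python
from collections import Counter
from datetime import datetime


def mentions_per_second(posts, phrase):
    """Count, for each second, how many posts contain the phrase.

    `posts` is an iterable of (timestamp, text) pairs. Matching is a
    simple case-insensitive substring test (an illustrative assumption).
    """
    needle = phrase.lower()
    counts = Counter()
    for timestamp, text in posts:
        if needle in text.lower():
            # Truncate to whole seconds so each second forms one bucket.
            counts[timestamp.replace(microsecond=0)] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data, invented for this sketch.
posts = [
    (datetime(2015, 1, 7, 12, 0, 0, 100000), "An example quote, repeated"),
    (datetime(2015, 1, 7, 12, 0, 0, 900000), "AN EXAMPLE QUOTE again"),
    (datetime(2015, 1, 7, 12, 0, 1), "an unrelated post"),
    (datetime(2015, 1, 7, 12, 0, 2, 500000), "once more: an example quote"),
]

series = mentions_per_second(posts, "an example quote")
peak_rate = max(series.values())  # highest count observed in any single second
```

An exact substring test corresponds to the "identical" constraint of early memetics; tracking the transformations of a quote would require a looser matching rule, which this sketch deliberately leaves out.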

26Once these precautions are established, it becomes possible to find an academic equivalent of the velocity of big data: traceability. This becomes the essential criterion for the entities that we can study. Some favorable conditions must be present to achieve this.

27– The traces in question must have sufficient continuity for it still to be possible to say that it is the same process, without the constraint of the first memetics [59] (identical) or the laxity of generalized intertextuality, in which everything could be a sign or repeat of everything else. The simplest traces are thus the easiest to follow: likes on a Facebook page or hashtags retweeted on Twitter, for example.

28– The traces in question must allow monitoring of heterogeneous associations, in other words, a sufficient power of connectivity. For this reason, traces with a format that is too specific to a lesser-known platform cannot give rise to extensions or monitoring.

29– Monitoring the traces in question must allow all events, all transformations, and all associations to be dated precisely. The timelines are equivalent to other conventions here, such as cardinal points for topography or income levels for first-generation social sciences.

30These elementary conditions make it possible for the social sciences to shift to monitoring elements that are not individuals, groups, society, or opinion. Digital networks have amplified uncertainty concerning the status of these categories, which were already called into question by approaches like ethnomethodology. The propagation of these elements, which should be qualified beyond the technologically collected traces, becomes the object of the third generation of the social sciences: the properties of these entities allow them to create small differences and thus to circulate and affect individuals and groups, society and opinion.

31However, the conditions of feasibility of this use of traces must account for the dependency on platforms that occurs in this way, as the previously mentioned approaches using digital methods have shown. We can hardly hope to alter these traces at their source, since the platforms enjoy a position of strength. Yet it is possible to use the traces produced by the platforms by diverting them from the use for which they were designed (repurposing), as Richard Rogers proposes, but with the goal of monitoring purely digital entities. The rule here is that no explanation from another level or from another non-digital realm is taken into account, in order to compare speeds of propagation, rhythms, and possible transformations (for example, by contamination of other domains). This new generation of the social sciences must show that it is revealing processes that have never before been brought to light, either due to pre-digital limits in technology or the targets adopted by previous generations of the social sciences.

32The field is largely open for this additional approach to the social, since the era of traces has only just begun and platforms are not and will not be the only mass providers of traces. The Internet of Things is no longer just an engineer’s fantasy and daily life is starting to be populated with contactless exchanges, radio frequency identification (RFID) chips, and other automatic geolocalizations that no longer depend on people but on the objects themselves, as Bruno Latour’s interobjectivity has become traceable. [60] Their traces during their passage and their state (active or not) enable us to pilot logistical and transactional processes, often confined to the professional worlds concerned. However, their extension and open access will be more or less inevitable as we engage in the proliferation now being announced. It will no longer be possible to refer to “persons” or “social” entities in the sense of the classic social sciences. There is no reason for the social sciences not to seize upon these new sources.

Big data without theory?

33This theoretical and political ambition to establish a new generation of the social sciences could, however, come up against the force of big data methods, which are based not only on the three Vs mentioned above, but also on a radical and non-theoretical correlationist approach. Chris Anderson made waves by writing a short article called “The end of theory” in 2008. [61] His remarks were based on observation of the practices of several scientific domains (genomics and physics), where the extensive use of data to discover correlations in his view replaced the need for the models, hypotheses, and the proofs constructed to test them. For Anderson, in the same way, the web is another world that does not trace anything but makes available means to capture traces, aggregate them, calculate them, and exploit them, and, in the same movement, to change them. “Likes” do not need a theory. The platform collects the traces of actions by Internet users (or machines) who click in a standardized format; it aggregates them and produces a ranking that is posted and may be used by the platform itself to show trends. The latter allow focused advertisement placement by companies that are also seeking to measure the effects of their placement or communication choices. In essence, this is the chain that is produced. Social theory has almost no usefulness in this type of operational arrangement where the performative mechanism functions in almost the same way as audience measurement. If this was limited to this sector of marketing, the phenomenon of traces would only be an extension of the self-referential bubble that is typical of finance as speculation, a permanent play of mirrors or “economy of opinion”. [62] Questioning the need for theory touches something deeper and causes the unease described by Geoffrey C. Bowker in my introduction. 
The search for causes, often ultimate causes, is not an academic quality in and of itself, for very often it can accommodate very patchy empirical arguments, as is often the case in critical approaches, or limited testing of the categories used, which is characteristic of positivism.

34The mode of reasoning associated with big data introduces something new: not so much the absence of detours through causes to account for the data gathered, but the lack of questioning of the properties and validity of the data formatted by nonetheless very powerful computer systems. It is possible to launch correlation searches between all types of data due to the mere fact that such data are available. Processing power is now readily available, the collection and storage of data seem limitless, and all correlations can be tested without restriction. While everyone knows that “correlation is not causation”, all statistics-based work uses these correlations, on the condition that they respect certain procedures and put forward certain hypotheses that are then tested by calculation. In the case of big data, the procedure is reversed: hypotheses are produced by correlations as long as they are sufficiently robust. It amounts to a form of “automatization of induction” that depends entirely on the categories implicitly present in the data. Machines in fact end up finding correlations because they can choose the most effective algorithm from a portfolio of algorithms. Not only data, but also algorithms can be combined at will. Any errors produced are not a problem because methods of learning allow lessons to be learned along the way, beyond the starting examples validated by experts, in the context of supervised learning.
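The reversal described above, in which every available correlation is tested and the robust ones become candidate hypotheses, can be made concrete with a small sketch. The variable names, the sample values, and the 0.9 threshold are hypothetical; the Pearson coefficient is used only as one familiar measure of correlation, not as the method of any particular platform.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no usable correlation
    return cov / sqrt(var_x * var_y)


def strong_correlations(columns, threshold=0.9):
    """Test every pair of variables and keep those above the threshold."""
    found = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical variables, chosen only because they happen to be available.
columns = {
    "likes": [1, 2, 3, 4, 5],
    "shares": [2, 4, 6, 8, 10],
    "complaints": [5, 4, 3, 2, 1],
    "noise": [3, 1, 4, 1, 5],
}

candidates = strong_correlations(columns)
```

Nothing in this loop questions what "likes" or "complaints" actually measure: the categories are taken as given by the collection system, which is precisely the dependence on implicit categories noted above.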

35Big data is thus not only a question of volume, but also depends in large part on this new version of artificial intelligence known as “machine learning”.* Models of learning have been developed [63] and are just as important for understanding the issues of big data. This principle of high-frequency trial and error makes any attempt at theorization (other than computer or statistical theory) relatively futile, since it has no a priori discriminatory power in relation to other correlations. Of course, this does not prevent the work of interpretation, but when the goal consists of acting in a certain way (finding therapeutic protocols or generating online activity related to a brand, for example) the most important thing is the result, not its explanation. These approaches are very attractive to decision-makers, especially in situations of political responsibility. If the concern for short-term action is stronger than the thirst for explanation, then the social sciences may be faced with a relative withdrawal of interest. Thus, the evaluation of public policy (in every area that can apply a benchmarking approach using series of sufficiently significant data) may completely shift to a type of approach that is agnostic in theoretical terms, on the condition that it never questions the entities traced by the systems of data collection or their conditions of production. In this way, it is possible to speak of an “algorithmic positivism”, which could only spread as a result of the exhaustion of the far-reaching explanatory models of the social sciences. The theory of vibrations claims to contribute to renewing the ambition of theory, while being able to mobilize methods and approaches of machine learning above and beyond mere data collection.

From traces to vibrations

36To solidify the foundations of the third-generation social sciences, these heterogeneous traces need to be given scholarly status. First, we should remember that all of these traces, which may still be connected to personal data, will probably not be accessible under the same conditions in a few years. The success of Adblock in preventing cookies and other intrusive advertisements is growing (200 million downloads, 40% installation on Firefox in 2014). Generalized encryption will become a necessity given the inability of platforms and information services to regulate their own predatory practices in terms of personal data. [64] For this reason, taking traces into account on the very surface of networks without any connection to structured and socio-demographically significant data represents a solid foundation for the social sciences, against all those who continue to want to apply their models of society and opinion to a universe that is only accessible to them because of a highly temporary laxity. Working on the surface of these traces, without any connection with personal data, allows us to reduce the ethical contradictions in which the social sciences of society and opinion that want to use these sources find themselves trapped.

37As I have emphasized, the production of traces is directly dependent on the platforms that generate their analyses. It is, however, necessary to base third-generation social sciences on a proposition that is not in thrall to the uses made of these traces, in the same way that polls are not only for the media or censuses only for the state. To the register/survey pair, established by Alain Desrosières, [65] and audience/opinion polling, we should add the pair traces/X – where X is the space which remains to be defined for the recovery of traces by the social sciences. I suggest using the term “vibrations”. It provides a suggestive metaphor, one related to earthquakes (aftershocks), where we know it is possible to track the waves that come before and after the quake. It has the advantage of focusing the attention less on particles than on these waves, and it echoes the “buzz” with which brands and the media are obsessed but which has never been theorized.

38Other concepts could claim this space, like attention, influence, issues, actor-network, memes, replies, or even conversations. [66] The word “vibrations”, however, offers the advantage of being familiar and polysemous, with each of its meanings making sense within a science that treats traces as raw materials. The key elements lie in the decentering away from the notion of actors, strategies, and representations, which are all legitimate in the context of the other social sciences, but do not allow us to account for the “power to act” of the circulating entities known as vibrations. It is impossible to determine beforehand the size and the status of these entities. Only investigations of mass corpora allow them to be identified, when their replication emerges from the sensors used – from the platforms of course, but according to our research objectives.

39A sociology of vibrations relies on the imperative to follow elements (for example, “je suis Charlie” as a hashtag, icon, or quote outside Twitter) without knowing how they will be aggregated to sets of varying size (very heterogeneous groups are caught in the dynamic of propagating “je suis Charlie” and no border can be drawn). The position adopted is therefore “elementalist” but should never become atomist, since variable geometry is a quality that we have learned from ANT. [67] Any variation in “je suis Charlie” or “je ne suis pas Charlie” is of interest. Both cases come from the same process of imitation/opposition, as proposed by Gabriel Tarde. What is sought is not the substance of the opinion of supposed individuals, but the power of circulation of a vibration that changes depending on the contexts it affects. The goal is not to move towards social physics either, which seeks out supposedly transversal laws through all flows according to known models of fluid mechanics or corpuscular physics, [68] since social vibrations retain their particularities and are not studied in general but according to the issues and problems that trigger their movement (“je suis Charlie” is not of the same nature as the “Nice jeweler” and no general “law” would be pertinent to account for them).

40The vibration approach allows us to construct infinite combinations by following the extensions, propagations, and repetitions, on the condition of remaining focused on the issues that are conveyed and animated by the vibrations, which in this sense are very different from “raw” traces. The object of study is not so much the element, which may have very different properties, in scope and materiality, nor aggregates alone, which tends to occur with clustering in graphing methods; [69] it is the process of circulation and aggregation or disaggregation at the moment curves bifurcate. It is not sufficient to locate the clusters produced by the propagation of “je suis Charlie” to Facebook accounts or websites to find the underlying potential “trends in opinion” (blogs on the left, far-left, environmentalist, and others). The temporal dynamic must be restored while identifying the moments when “je suis Charlie” visually transforms, switches to other formats after Twitter, mutates into “je ne suis pas Charlie”, or aggregates with a call for protest, for example. In these curves, it is important to focus on moments of emergence and on degree, as Gabriel Tarde suggested, and not on the peaks that function as aggregates, as recognized by the first-generation memetrackers, or on the plateaus, as Quetelet did. [70] The object of this science of vibrations is precisely their agency, as they propagate and finally grip us. Collective experience intuitively gives us an example of this, as it did in the mimetic impulse that “gripped” millions of people who went to protest on 11 January 2015 following the Charlie Hebdo shootings. As Gabriel Tarde demonstrated, ideas traverse people and act upon them, and not the contrary. [71]

41

“Rays of imitation first, and then the entities whose existence we induce from the variations they impose on the flows of imitation.” [72]

42It then becomes possible to study the properties of these vibrations to compare their chances of survival or contamination. These chances depend on differences in their properties that are always directly connected to the issues they convey. As we can see, the approach through vibrations is the gateway to a monadology (which is radically different from an atomist view) or to an ecology. [73]
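The distinction drawn earlier between moments of emergence and peaks can be made operational in a simple way. The sketch below is only one possible reading: it treats an "emergence" as the start of a run of rapid growth in a count series and contrasts it with the position of the maximum. The function names, the doubling threshold, the minimum count, and the sample series are all assumptions introduced for illustration.

```python
def emergence_points(counts, min_ratio=2.0, min_count=5):
    """Return the indices where a run of rapid growth begins.

    A bucket counts as "growing fast" when it reaches `min_count` and is
    at least `min_ratio` times the previous bucket. Only the first bucket
    of each such run is reported, as the moment of emergence.
    """
    onsets = []
    in_burst = False
    for i in range(1, len(counts)):
        previous, current = counts[i - 1], counts[i]
        growing_fast = current >= min_count and current >= min_ratio * max(previous, 1)
        if growing_fast and not in_burst:
            onsets.append(i)
        in_burst = growing_fast
    return onsets


def peak_index(counts):
    """Return the index of the highest value, the usual focus of aggregate views."""
    return max(range(len(counts)), key=counts.__getitem__)


# Hypothetical hourly counts of one circulating element.
hourly = [0, 1, 1, 6, 15, 40, 90, 100, 95, 60, 30, 70, 72]

onsets = emergence_points(hourly)
peak = peak_index(hourly)
```

On this invented series the peak falls several buckets after the first onset, and a second onset appears after the decline, the kind of bifurcation that a view restricted to peaks would miss.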

43Traceability is not provided as such by platforms and requires the production of tools and methods adapted to a vibration approach rather than one just intended for traces alone. Jure Leskovec and Jon Kleinberg were precursors in this area when they proposed their memetracker. They are indeed capable of reconstituting flows of all kinds, which they call “streams” and “cascades”. The development of methods to trace vibrations should take this existing method into account, while making sure to test them on corpora constituted to this end, at the risk of sacrificing “realism”. I began this work in 1987 by examining how, within workplaces, conversations about television transformed to become “local public opinions”. [74] I took the same approach to studying the attributes of a photograph on the Flickr database [75] and transposable signs in a body of websites connected to a region. [76] In the first case, the attributes of the photograph (for example, the crossed arms selected by Roland Barthes as punctum) became attractors for tags and thus connected accounts or photographs that would never have been connected a priori according to these criteria. However, I went no further than the principle of this work, unable to expand it empirically to a sufficient scale. In the second case, the propagation of the Breton flag on the web became an indicator of a connection which could be compared with other entities constitutive of regional imagery where the latter had failed to propagate in the same way. On the basis of work undertaken manually on almost 600 sites by Mariannig Le Béchec, it was possible to set out an analysis of the vibrations created by this flag and to note that its semiotic properties were not unconnected to its capacity for circulation. Tags or icons are thus vibrations that we can follow, even when they do not have the explicit character of verbatims or expressions, as in the meme tracker, nor their extensive nature. 
Potentially, all of the traces that I have identified, such as likes, tweets, recommendations, and others, may be the object of these observations. They require specific traceability tools, which today only exist, so to speak, on Twitter. These tools have to respond to the specifications of a traceability of vibrations, not just traces for their own sake or for brand responsiveness.

44The specifications of a new generation of social sciences seem to be an unattainable goal given the scale of the methodological and political issues (how will the platforms be involved?). Nevertheless, this historic moment that disturbs the status quo of the ways of doing social sciences is not unprecedented. I have called this approach a “third generation” to highlight its similarities and differences with the two other generations presented in the initial table. It is no longer possible to question the idea of “society” that exists independently of the conceptual history and the mechanisms that stabilized it. And yet, as Alain Desrosières has shown, [77] it took many years of compiling statistics to enable this society to exist quantitatively and make it independent of the concepts of Émile Durkheim. In the same way, public opinion now seems to possess a certain reality. Yet opinion polls are what give it the force of proof, starting with Gallup [78] in 1936. They remain challenged, but the idea itself of public opinion has ended up by becoming “obvious” and they can thus be considered to have won. As Bruno Latour noted, an object holds because it was well-built, and in that moment it escapes our control. [79] The third-generation social sciences must find their place alongside the other social sciences of society and opinion, and not replace them: they are another stratum of the social that is emerging as a result of these mechanisms of high-frequency traceability. For this to happen, we need to define how conventions are to be developed; a step that allowed the previous generations to survive and move beyond the current situation in which uncontrolled methods proliferate. The dissatisfaction with the quality of results obtained from practicing the “algorithmic positivism” mentioned above is also expressed by brands and agencies. 
This resource represents the chance for the social sciences to reach agreement on establishing a convention with all of the interested parties, so that the principles arising from academic requirements simultaneously produce quality results from an operational point of view.

45* * *

46Constructing a proposal for the social sciences of the third generation is not guaranteed. I have tried to present potentially favorable conditions for it here, in line with actor-network theory, which set out its premises, and Gabriel Tarde, who declared its principles. For the moment, however, the tendency towards the end of theory and the presence of web platforms (GAFAT), which produce, calculate, and publish on these traces themselves, still remain dominant, primarily for commercial reasons: the demand for studies using these approaches largely comes from brands. I recognize their interest in learning how to react by using these metrics, as well as the interest of the social sciences of society and opinion in continuing to develop their approaches by using these sources. In this sense, I argue for the coexistence of these approaches, for learning how to switch perspective from one to the other, [80] and for recognizing the conditions of possibility of each generation with the support of the state, the media, or brands. Each study of a question resulting from ordinary experience or posed by these parties should lead to combining the three generations, albeit without laying a claim to totalization, on the condition that the research adopts a specific framework for these traces that are invading our world. Each social wavelength should have a method and limits of validity. This would institute a welcome precautionary principle for the use of these traces. [81] In this way, the social sciences of the third generation can both help us understand unprecedented phenomena and delineate the domain of validity of each generation to avoid any instance of the “cause-finalier” mindset that Gabriel Tarde deplored. My intention is only to contribute to laying the bases of a convention, an investment of form, [82] allowing a social theory and an object – vibrations – to emerge, which do not reduce digital technology to either “digital methods” or “digital humanities”. 
There is a new raw material that deserves to be studied in itself and that produces a third stratum of the social, one that can be measured according to different principles and is not reducible to society or opinion. Society ended up existing, opinion ended up existing, and vibrations should end up existing in the same way. [83]

Glossary

47Big data

48This term refers to both the volume of data available for calculations of all types, and its variety, since it is now possible to combine data recorded by official services (such as requests addressed to the Pôle emploi [the French employment agency]), traces resulting from activity on networks (searches via Google to create a CV), and verbatims recovered from forums or press Internet sites, for example. Their constant updating (velocity) thanks to the networks gives this data a different status than that found in databases, with which the social sciences are familiar. The term also refers to methods of calculation that use these sources of data and that contribute to machine learning and inferences from correlations tested en masse.

49High-frequency trading

50High-frequency trading is one of the financial techniques of automatic trading; in other words trading managed directly by computers and their algorithms. It allows one to play on the disparities between sales and purchases and to profit from these differences in a microsecond. It attempts to influence the portfolios of other investors by artificially generating trends towards increase or decline, for example by inundating the market with orders that take a long time for competitors to calculate and are finally not concluded. This eminently speculative technique requires the most high-performance machines and networks.

51Verbatims

52Transcriptions of interviews, which are well-known in the social sciences, are here extended to extracts of expressions collected in various environments (forums, social network posts, consumer reviews, among others) under a non-preformatted written form that can nonetheless be calculated by algorithms of Natural Language Processing (NLP).

53Timestamp

54Each operation performed with a computerized machine and on networks is associated with metadata that indicates the date and time, and is collected in an event log. This metadata can be processed independently of the content of this operation and allows for various calculations.
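As a minimal illustration of processing this metadata independently of content, the sketch below computes the intervals between successive operations in an event log without ever reading what the operations contain. The log structure (a list of dictionaries with `timestamp` and `content` keys) and the sample entries are hypothetical.

```python
from datetime import datetime


def inter_event_gaps(event_log):
    """Return the gaps, in seconds, between successive logged operations.

    Only the `timestamp` field is read; the `content` field is ignored.
    """
    times = sorted(datetime.fromisoformat(entry["timestamp"]) for entry in event_log)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Hypothetical event log, invented for this sketch.
event_log = [
    {"timestamp": "2015-01-11T15:00:30", "content": "operation B"},
    {"timestamp": "2015-01-11T15:00:00", "content": "operation A"},
    {"timestamp": "2015-01-11T15:02:00", "content": "operation C"},
]

gaps = inter_event_gaps(event_log)
```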

55Timeline

56The timeline is a way to visualize data graphically in a chronological manner on the same temporal line, which may become a chronological strip with interactive aspects that allow direct access to the documents associated with events.

57Digital humanities

58The expression “digital humanities” designates all possible combinations between computer technology and the human or social sciences. They include the digitization of sources, the mobilization of techniques of digital capture and analysis of corpora of interest to HSS, the creation of shared corpora allowing experimentation, including the use of new sources of data produced by digital networks, and the mobilization of novel techniques of processing and visualization.

59Amplification

60This concept was used by Elizabeth L. Eisenstein [84] in her historical analysis of printed materials to show that all of the tendencies already present in the societies at the time were amplified by printing, before it could be noted much later that some of them benefitted more clearly from this new technology. This concept allows us to relativize the notion of revolution that is too often applied to digital technology without knowing what trends will be most successful over the long term. It also makes it possible to account for the proliferation of innovations and uses related to these technological innovations, which causes a form of disorientation but also enables political debate on the best technological choices.

61Memes

62This concept was introduced by Richard Dawkins in The Selfish Gene. [85] It extended the evolutionary theory of genetics to elementary cultural entities, or memes, which have the particularity of propagating by imitation or derivation. Analysis of all of the most fundamental cultural processes (language, the self, the brain) was attempted, for example by Susan Blackmore, [86] before being more or less abandoned. The concept of the “viral” on the web has given this concept new relevance, to the point that it has become a recognized and followed method of production, using reproduced and modified images and highly contagious terms that participate in the “buzz” phenomenon, analyzed here as vibrations.

63Machine learning

64Machine learning is a new version of artificial intelligence that no longer relies on an exhaustive categorization of the world into a priori ontologies to perform calculations, as in the twentieth century (which is only possible in a few, very standardized worlds, such as the aeronautics industry). Machines can now learn from the data provided to them in a permanent flow and in sufficient quantity (big data) to test the correlations within it. Machine learning engages multiple models of learning and no longer models of the world. It chooses pertinent algorithms from libraries of algorithms, testing them according to the type of data, and advances through supervised learning, including the responses and validations of experts in the domain. This extreme flexibility makes it adaptable to any context but implies enormous processing and storage capacities, which are now available at an affordable price for major companies.
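The step of choosing an algorithm from a library by testing it against the data can be sketched as follows. This is a deliberately tiny, hypothetical portfolio of three prediction rules, scored by how well each would have predicted every next value of a series. It illustrates the selection step only, not supervised learning with expert validation, and none of the names comes from an actual library.

```python
def predict_mean(history):
    """Predict the next value as the average of everything seen so far."""
    return sum(history) / len(history)


def predict_last(history):
    """Predict that the next value repeats the last one."""
    return history[-1]


def predict_trend(history):
    """Predict the next value by repeating the last observed step."""
    return history[-1] + (history[-1] - history[-2])


# Hypothetical "portfolio of algorithms".
PORTFOLIO = {"mean": predict_mean, "last": predict_last, "trend": predict_trend}


def select_algorithm(series, warmup=2):
    """Score each rule by its total absolute error and return the best one."""
    errors = {}
    for name, predictor in PORTFOLIO.items():
        total = 0.0
        for t in range(warmup, len(series)):
            # Predict value t from the values before it, then compare.
            total += abs(predictor(series[:t]) - series[t])
        errors[name] = total
    best = min(errors, key=errors.get)
    return best, errors


series = [2, 4, 6, 8, 10, 12]  # invented data with a steady increase
best, errors = select_algorithm(series)
```

The rule that wins is retained because it predicts well on this data, not because it explains why the series increases, which is the point made in the glossary entry about models of learning replacing models of the world.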

65Algorithm

66An algorithm allows us to solve a problem by breaking it down into a series of instructions or operations. It is related to a procedure but its components must be precisely defined. Under these conditions, it can be executed by computer when written in the form of code.

67Authority score

68When collecting links (edges) between Internet sites (nodes) on a network, it is possible to calculate an authority score for each node according to the number of sites that point to it or the inbound links (in-degree). The authority score is one of the aspects of web topology exploited by Google to produce its page rankings.

69Hub score

70When collecting links (edges) between sites (nodes) on a network, it is possible to calculate a hub score for each node related to the number of sites to which it points, the outbound links (out-degree). The hub score is one of the aspects of web topology exploited by Google to produce its page rankings.
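The two preceding entries can be illustrated together. The sketch below follows the degree-based definitions given in this glossary (authority as in-degree, hub as out-degree) and leaves aside any iterative refinement of the scores. The function name and the sample links are hypothetical.

```python
from collections import Counter


def degree_scores(edges):
    """Compute degree-based authority and hub scores from (source, target) links.

    Authority = number of distinct sites pointing to a node (in-degree).
    Hub = number of distinct sites a node points to (out-degree).
    """
    authority = Counter()
    hub = Counter()
    for source, target in set(edges):  # count each distinct link once
        hub[source] += 1
        authority[target] += 1
    return authority, hub


# Hypothetical links between four sites; ("a", "c") is collected twice.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c")]

authority, hub = degree_scores(edges)
```

Here site `c` scores high on authority because two sites point to it, while site `a` scores high as a hub because it points to two sites. A site that receives no links, such as `a`, has an authority score of zero.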

Notes

  • [1]
    For a precise definition of these technical terms, I refer the reader to the glossary located at the end of this article. The first mention of terms in the glossary is indicated with an asterisk.
  • [2]
    HAL-SHS open archive (human and social sciences).
  • [3]
    PROduction et GEstion des DOnnées [Production and Management of Data] in the human and social sciences.
  • [4]
    The Quételet network coordinates French data archiving, documentation, and distribution activities in the human and social sciences.
  • [5]
    Consortium of European Social Science Data Archives, European network of micro-databanks for research in SHS (the Quételet network is the French member of Cessda).
  • [6]
    Center for open electronic publishing, which manages OpenEdition Press.
  • [7]
    Platform for research notebooks open to all disciplines in the human and social sciences.
  • [8]
    This is the case for historians when they digitize their sources, establish new types of archives, and use new inquiry methods where series of data have higher visibility, as seen in the recent issue of the Revue d’histoire moderne et contemporaine (558 (4 bis), 2011). The debate started by David Armitage and Jo Guldi on the return of long-term perspectives, triggered by the unprecedented growth in the size of now digitized archives and by the use of tools like Google N-grams, shows how changes in the scale of collection, for example, can affect the framework of thought itself. This is something that others have argued against or relativized in a recent issue of Annales on “the long term under debate”. See in particular, David Armitage, Jo Guldi, “Le retour de la longue durée: une perspective anglo-américaine”, Annales. Histoire, sciences sociales, 70(2), 2015, 289-318.
  • [9]
    For example, Données Infrastructures et Méthodes d’Enquête en Sciences Humaines et Sociales (DIME-SHS).
  • [10]
    Pierre Mounier (ed.), Read/Write Book 2. Une introduction aux humanités numériques (Paris: OpenEdition, 2013).
  • [11]
    François Eymard-Duvernay, Olivier Favereau, André Orléan, Robert Salais, Laurent Thévenot, “L’économie des conventions ou le temps de la réunification dans les sciences sociales”, Problèmes économiques, 2838, January 2004, 1-8.
  • [12]
    Alain Desrosières, La politique des grands nombres. Histoire de la raison statistique (Paris: La Découverte, 1993); Emmanuel Didier, En quoi consiste l’Amérique? Les statistiques, le New Deal et la démocratie (Paris: La Découverte, 2009).
  • [13]
    Loïc Blondiaux, La fabrique de l’opinion. Une histoire sociale des sondages (Paris: Seuil, 1998). This seminal work has provided key support for several of my reflections here.
  • [14]
    I should note that Roger Burrows and Mike Savage came well before me in sounding a warning to sociology as a whole, particularly in relation to the emergence of digital “transactional data” likely to trigger a future crisis of empiricism: Roger Burrows and Mike Savage, “The coming crisis of empirical sociology”, Sociology, 41(5), 2007, 885-99. These authors recently took another look at this matter in the light of big data: Roger Burrows and Mike Savage, “After the crisis? Big data and the methodological challenges of empirical sociology”, Big Data & Society, April-June 2014, 1-6.
  • [15]
    Digital traces do not have the status of regular social scientific data, nor of the traces recorded by historians, such as those referred to by Charles Seignobos at the start of the twentieth century. He classically distinguished between direct traces (archaeological remains, clothing, etc.) and indirect traces (the printed word or archives, etc.). See Charles Seignobos, La méthode historique appliquée aux sciences sociales (Paris: Félix Alcan, 1901).
  • [16]
    Dominique Boullier, “Plates-formes de réseaux sociaux et répertoires d’action collective”, in Sihem Najar (ed.), Les réseaux sociaux sur internet à l’heure des transitions démocratiques (Paris: Karthala, 2013), 37-50.
  • [17]
    For an idea of the speed of these transactions, here is what Alexandre Laumonier says about them: “The Nasdaq matching engine generates for example one million messages per second (or one every millionth of a second) – by message, I mean an order, a valuation, etc. In a single day, more than 25 billion messages pass through their platform, which is enormous. […] In terms of the transmission of information between trading platforms (for example, between New York, where the principal stock markets are located, and Chicago, where the derivatives markets are), the ultimate limit, or the speed of light in a vacuum (or 300 meters per millisecond), has almost been attained. The McKay Brothers microwave network, the leader in New York to Chicago transmissions, allows traders to connect the two cities in 8.12 milliseconds, or 95% of the speed of light. It is unlikely that we can physically go much faster” (“L’accélération de la finance: entretien avec Alexandre Laumonier. Propos recueillis par N. Auray, V. Bourdeau, S. Cottin-Marx, S. Ouardi”, Mouvements, 79(3), 2014, 92-9). [Translator’s note: All translations from the French are by the translator of this article, unless an alternative published English-language source is given in the footnotes.]
  • [18]
    Geoffrey C. Bowker, “The theory/data thing: commentary”, International Journal of Communication, 8, 2014, 1795-9.
  • [19]
    The place of the long term (longue durée) in the work of historians is discussed in detail by Claire Lemercier, “Une histoire sans sciences sociales?”, Annales. Histoire, sciences sociales, 70(2), 2015, 345-57.
  • [20]
    See, for example, one of the first mass studies of Twitter, based on 41 million profiles and 106 million tweets, carried out by computer scientists at KAIST in South Korea: Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, “What is Twitter, a social network or a news media?”, Proceedings of the 19th International World Wide Web (www) Conference, Raleigh, 26-30 April 2010. Or the treatment of the 2008 American presidential campaign based on the circulation of memes on the network: Jure Leskovec, Lars Backstrom, and Jon Kleinberg, “Meme-tracking and the dynamics of the news cycle”, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Paris, 2009.
  • [21]
    Knight Foundation, 14 September 2008.
  • [22]
    Jon Kleinberg, David Gibson, Prabhakar Raghavan, “Inferring web communities from link topology”, Proceedings of the 9th ACM Conference on Hypertext and Hypermedia (HYPER-98), New York, 20-4 June 1998, 225-34.
  • [23]
    William James, Essays in Radical Empiricism (New York: Longmans, Green, and Co., 1912).
  • [24]
    Dominique Cardon has provided a precise inventory of the four types of quantification on the web according to the position of the person doing the calculation: he distinguishes clicks, links, likes, and traces. See Dominique Cardon, “Du lien au like sur Internet: deux mesures de la réputation”, Communications, 93, 2013, 173-86.
  • [25]
    GAFAT: Google, Apple, Facebook, Amazon, and Twitter, to which we should add Weibo, the Chinese Twitter.
  • [26]
    API: Application Programming Interfaces, which allow an application to connect to the services and data of these platforms in a way that is limited but sufficient to produce interoperability.
  • [27]
    Geoffrey C. Bowker and Susan Leigh Star, Sorting Things Out. Classification and its Consequences (Cambridge: MIT Press, 1999).
  • [28]
    Franck Cochoy, Jan Smolinski, and Jean-Sébastien Vayre, “From marketing to ‘market-things’ and ‘market-ting’: accounting for technicized and digitalized consumption”, in Barbara Czarniawska (ed.), A Research Agenda for Management and Organization Studies (Cheltenham: Edward Elgar, forthcoming, 2016).
  • [29]
    Dominique Cardon, Camille Roth, and Guilhem Fouetillou, “Topographie de la renommée en ligne: un modèle structurel des communautés thématiques du web français et allemand”, Réseaux, 148, 2014, 85-120.
  • [30]
    Dominique Boullier, “La nouvelle fabrique des SHS”, preface to Bernard Reber, Claire Brossaud (eds), Humanités numériques (Paris: Hermès/Lavoisier, 2007), 17-20.
  • [31]
    Although the tradition of audience measurement is directly related to establishing public opinion via polls and sampling, some continue to speak incorrectly of audience measurement on the web, without any of the frameworks of analysis of this tradition that are the subject of an insightful study by Cécile Méadel, Quantifier le public. Histoire des mesures d’audience de la radio et de la télévision (Paris: Economica, 2010).
  • [32]
    Dominique Boullier and Audrey Lohard, Opinion mining et sentiment analysis. Méthodes et outils (Paris: OpenEdition Press, 2012).
  • [33]
    Isabelle Bruno, Emmanuel Didier, Benchmarking. L’État sous pression statistique (Paris: Zones, 2013).
  • [34]
    Yves Gingras, Les dérives de l’évaluation de la recherche. Du bon usage de la bibliométrie (Paris: Raisons d’agir, 2014).
  • [35]
    For a review of these approaches centered on forms of expression and communication, see Fabienne Greffet, “Le web dans la recherche en science politique: nouveaux terrains, nouveaux enjeux”, Revue de la BnF, 40(1), 2012, 78-83; and on the socio-demographic properties of Internet users in politics, see Anaïs Theviot, “Qui milite sur Internet? Esquisse du profil sociologique du ‘cyber-militant’ au PS et à l’UMP”, Revue française de science politique, 63(3), 2013, 663-78.
  • [36]
    Louise Merzeau, “#jesuischarlie ou le médium identité”, Médium, 43, April 2015, 36-46.
  • [37]
    Dominique Boullier and Audrey Lohard, Opinion mining.
  • [38]
    Elizabeth L. Eisenstein, La révolution de l’imprimé dans l’Europe des premiers temps modernes (Paris: La Découverte, 1991).
  • [39]
    Burt L. Monroe, Jennifer Pan, Margaret E. Roberts, Maya Sen, and Betsy Sinclair, “No! Formal theory, causal inference and big data are not contradictory trends in political science”, PS: Political Science & Politics, 48(1), January 2015, 71-4. My thanks to Nonna Mayer for pointing out this article.
  • [40]
    Gary King, Jennifer Pan, Margaret E. Roberts, “How censorship in China allows government criticism but silences collective expression”, American Political Science Review, 107(2), 2013, 326-43.
  • [41]
    This approach echoes the more qualitative work of Séverine Arsène on how Chinese Internet users frame their own statements in a context of censorship: Séverine Arsène, “De l’auto-censure aux mobilisations: prendre la parole en ligne en contexte autoritaire”, Revue française de science politique, 61(5), October 2011, 893-915.
  • [42]
    Seth Stephens-Davidowitz, “The cost of racial animus on a Black presidential candidate: evidence using Google search data”, Journal of Public Economics, 118, 2014, 26-40.
  • [43]
    Richard Rogers, Digital Methods (Cambridge: MIT Press, 2013).
  • [44]
    Noortje Marres, “Why map issues? On controversy analysis as a digital method”, Science, Technology and Human Values, March 2015, 1-32.
  • [45]
    Nonna Mayer, Sociologie des comportements politiques (Paris: Armand Colin, 2010). The question of the role of platforms in framing expression can indeed be read in the tradition of work on “agenda setting” (Maxwell McCombs and Donald Shaw, “The agenda-setting function of mass media”, Public Opinion Quarterly, 36(2), 1972, 176-87) and on “framing” (Shanto Iyengar, Is Anyone Responsible? How Television Frames Political Issues (Chicago: The University of Chicago Press, 1991)).
  • [46]
    William James, Essays in Radical Empiricism.
  • [47]
    Gabriel Tarde, Les lois de l’imitation (Paris: Félix Alcan, 1890).
  • [48]
    Bruno Latour, Pablo Jensen, Tommaso Venturini, Sébastien Grauwin, and Dominique Boullier, “‘The whole is always smaller than its parts’: a digital test of Gabriel Tarde’s monads”, British Journal of Sociology, 63(4), 2012, 590-615.
  • [49]
    Noortje Marres, “The issues deserve more credit: pragmatist contributions to the study of public involvement in controversy”, Social Studies of Science, 37, 2007, 759-78; Noortje Marres and Esther Weltevrede, “Scraping the social? Issues in live social research”, Journal of Cultural Economy, 6(3), 2013, 313-35.
  • [50]
    I will not mention here the approaches of social physics, ethology, or social epidemiology that have produced models without reference to the traditions of the social sciences.
  • [51]
    Emmanuel Todd, Qui est Charlie? Sociologie d’une crise religieuse (Paris: Seuil, 2015).
  • [52]
    Bruno Latour, “Formes élémentaires de la sociologie, formes avancées de la théologie”, Archives de sciences sociales des religions, 167, July-September 2014, 255-77.
  • [53]
    Nonna Mayer and Vincent Tiberj (Le Monde, 19 May 2015) and Luc Rouban (Notes du Cevipof, 13 May 2015).
  • [54]
    Jon Kleinberg, “Bursty and hierarchical structure in streams”, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2002, 91-101. Jon Kleinberg starts here by identifying “topics” that provoke “bursts” in the flow of electronic messages, his objective being to help structure them and not to track them as such. He does, however, make the connection with problems of time in narration as discussed by Genette.
  • [55]
    Jure Leskovec et al., “Meme-tracking”, op. cit.
  • [56]
    We should note the similarity between “streams” and the “streams of thought” and “streams of consciousness” of William James in his Principles of Psychology, for whom “it thinks” (William James, Principles of Psychology, Vol. 1 (Mineola, NY: Dover Books, 1950)).
  • [57]
    Jure Leskovec et al., “Meme-tracking”, op. cit.
  • [58]
    Jure Leskovec et al., “Meme-tracking”, op. cit.
  • [59]
    Richard Dawkins, The Selfish Gene (Oxford: Oxford University Press, 1976); Susan Blackmore, The Meme Machine (Oxford: Oxford University Press, 1999).
  • [60]
    Bruno Latour, “Une sociologie sans objet? Remarques sur l’interobjectivité”, Sociologie du travail, 4, 1994, 587-607.
  • [61]
    Chris Anderson, “The end of theory: the data deluge makes the scientific method obsolete”, Wired Magazine, 23 June 2008.
  • [62]
    André Orléan, Le pouvoir de la finance (Paris: Odile Jacob, 1999).
  • [63]
    My thanks to Bilel Benbouzid for indicating in particular the role of the theory of Vladimir Vapnik. It allowed for testing of the capacity for generalization of a learning model built on the basis of a number of completed examples.
  • [64]
    Bruce Schneier, Data and Goliath. The Hidden Battles to Collect Your Data and Control Your World (New York: W. W. Norton, 2015).
  • [65]
    Alain Desrosières, Gouverner par les nombres. L’argument statistique II (Paris: Presses de l’École des Mines, 2008). See also his latest book: Alain Desrosières, Prouver et gouverner. Une analyse politique des statistiques publiques (Paris: La Découverte, 2014).
  • [66]
    Dominique Boullier, La télévision telle qu’on la parle. Trois études ethnométhodologiques (Paris: L’Harmattan, 2004), and “La fabrique de l’opinion publique dans les conversations télé”, Réseaux, 126, 2004, 57-87.
  • [67]
    Madeleine Akrich, Michel Callon, and Bruno Latour, Sociologie de la traduction. Textes fondateurs (Paris: Presses des Mines de Paris, 2006).
  • [68]
    Alex Pentland, Social Physics. How Good Ideas Spread. The Lessons From a New Science (New York: Penguin Press, 2014).
  • [69]
    Guilhem Fouetillou, “Le web et le traité constitutionnel européen: écologie d’une localité thématique compétitive”, Réseaux, 147, 2008, 229-57.
  • [70]
    Gabriel Tarde, Les Lois de l’imitation, 173.
  • [71]
    Gabriel Tarde, Monadologie et sociologie (Paris: Félix Alcan, 1893).
  • [72]
    Bruno Latour, “Gabriel Tarde. La société comme possession: la preuve par l’orchestre”, in Didier Debaise, Philosophie des possessions (Dijon: Les Presses du réel, 2011), 9-34 [back translated from French].
  • [73]
    Term used by Yves Citton following Gilles Deleuze: Yves Citton, Pour une écologie de l’attention (Paris: Seuil, 2014). Citton also uses the concept of vibrations to study the literary influence of Spinoza: Yves Citton, “Le réseau comme résonance: présence ambiguë du spinozisme dans l’espace intellectuel des Lumières”, in Wladimir Berelowitch and Michel Porret (eds), Réseaux de l’esprit en Europe, des Lumières au 19e siècle (Geneva: Droz, 2009), 229-49.
  • [74]
    Dominique Boullier, La télévision.
  • [75]
    Dominique Boullier and Maxime Crépel, “Biographie d’une photo numérique et pouvoir des tags: classer/circuler”, Revue d’anthropologie des connaissances, 7(4), 2013, 785-813.
  • [76]
    Mariannig Le Béchec and Dominique Boullier, “Communautés imaginées et signes transposables sur un ‘web territorial’”, Études de communication, 42, 2014, 113-25.
  • [77]
    Alain Desrosières, La politique des grands nombres.
  • [78]
    George H. Gallup, Public Opinion in a Democracy (Princeton: Herbert L. Baker Foundation/Princeton University, 1939 (Stafford Little lectures)).
  • [79]
    Bruno Latour, Petite réflexion sur le culte moderne des dieux faitiches (Paris: Les Empêcheurs de penser en rond, 1996).
  • [80]
    This approach leads to extending the work of reassembling the social started by Bruno Latour, in a more diplomatic vein here. The social sciences that produce level two society maintain their place since, as all sociology of science has taught us, they were able to make their principles become naturalized conventions. See Bruno Latour, Reassembling the Social. An Introduction to Actor-Network-Theory (Oxford: Oxford University Press, 2005).
  • [81]
    Here I share the conclusions drawn by Claire Lemercier in her recent article in Annales cited above: “Discussing and enriching the causal models of the other social sciences, in particular in thinking together the different temporalities: this program still applies today”. I would insist, however, on generations constituted historically and not as disciplines, on distinct objects dealt with in this way, and on the impossibility of thinking these temporalities “together”, but rather in turn.
  • [82]
    Laurent Thévenot, “Les investissements de forme”, in Laurent Thévenot (ed.), Conventions économiques (Paris: CEE/PUF, 1986), 21-71.
  • [83]
    This article benefitted from the critique and commentary of Emmanuel Didier, Geoffrey Bowker, Bruno Latour, Nonna Mayer, Dominique Cardon, Noortje Marres, Guilhem Fouetillou, Sylvain Parasie, Richard Rogers, Franck Cochoy, Michel Wieviorka, Françoise Thibault, Yves Citton, and Michel Legrand, in particular during a semester-long seminar in 2015 on the theme of the third generation of social sciences at the Foundation of the Maison des sciences de l’Homme. My sincerest thanks to them, as well as to the editorial team of RFSP for the final revision of the manuscript. A summary of this text was first presented at the colloquium on big data organized by the Collège de France in June 2014 (Chair: Pierre-Michel Menger).
  • [84]
    Elizabeth L. Eisenstein, La révolution de l’imprimé dans l’Europe des premiers temps modernes.
  • [85]
    Richard Dawkins, The Selfish Gene.
  • [86]
    Susan Blackmore, The Meme Machine.
English

Big Data dealing with the social is used by agencies who process it to produce predictive correlations for the benefit of brands and web platforms. Beyond ‘society’ and ‘opinion’ – whose genealogy this article sketches – lie new ‘traces’ which we should theorize as ‘vibrations’ if we wish to reap the benefits of the widespread traceability of these still uncertain entities. The phenomena of collective high vibrations did exist before the emergence of digital networks but now they leave traces that can be computed. The third generation of social sciences currently emerging must assume the specific nature of the world of data created by digital networks, without reducing them to the categories of the ‘social’ or of ‘opinion’.

Dominique Boullier
A university professor in sociology, Dominique Boullier is a member of the Sciences Po médialab. His most notable publications include: La ville-événement. Foules et publics urbains (Paris: PUF, 2010); (with Stéphane Chevrier and Stéphane Juguet) Événements et sécurité. Les professionnels des climats urbains (Paris: Les Presses des Mines, 2012); (with Audrey Lohard) Opinion Mining et Sentiment Analysis: méthodes et outils (Paris: OpenEdition Press, 2012); “L’écume des territoires numériques”, in Marta Severo and Alberto Romele (eds), Traces numériques et territoires (Paris: Presses de l’École des Mines/ParisTech, 2015), 113-34; “Cosmopolitics: ‘to become within’ – from cosmos to urban life”, in Albena Yaneva (ed.), What Is Cosmopolitical Design? Design, Nature and the Built Environment (Farnham: Ashgate, 2015), 39-56. His current research is focused on the sociopolitical implications of digital technology, personal data policies, and methods for monitoring digital vibrations. (Médialab, Sciences Po, 84, rue de Grenelle, 75007 Paris.)
Uploaded on Cairn-int.info on 12/09/2016
Electronic distribution by Cairn.info for Presses de Sciences Po © Presses de Sciences Po. All rights reserved for all countries. Unless the publisher has given prior written consent, it is prohibited to reproduce this article in whole or in part (in particular by photocopying), to store it in a database, or to communicate it to the public in any form or by any means.