It has become second nature for the vast majority of us. As soon as we don’t know something, or have the slightest doubt, we turn to Google. The search engine responds. The search engine always responds. A modern Oracle of Delphi that uses algorithms instead. A simple, reassuring interface. There are a lot of people and a lot of things going on behind the scenes, but we prefer it when things appear simple. We prefer it when things are reassuring. We use Google as a guide, a compass, and a map, all in one. It scans, indexes, and ranks documents that can be accessed through the Web. It fixes its lens on something and decides who will be allowed the spotlight and who, on the other hand, will be condemned to remain in the background as an extra, or behind the digital scenes. It warns: “this content, for you, at this moment, in the place where you are, is undoubtedly more relevant than this other content, which is itself probably more relevant than this other content, which is itself relevant, but less so, and so on.”
It is not surprising, and frankly rather reassuring, that this sort of thing is controversial. If a piece of technology claims to handle information on a global scale, it will likely be accused, rightly or wrongly, of not handling it the way it ought to. This is all the more true given that, with Google’s search engine, you find yourself in a situation where, on the one hand, you don’t know exactly how it works, and on the other, its proprietor has a quasi-monopoly on the business in most countries in the world.
Among the many controversies surrounding Google, of particular interest here are five debates regarding the way the search engine handles information. Rather than being viewed as problems that urgently require solutions, these controversies can be considered consubstantial with the search engine, in that they contribute to making it what it is: a socio-technical information and communication tool that is not, and never will be, perfectly satisfactory from the point of view of all parties concerned by the work it does. The engineers in charge of ensuring that the search engine functions correctly, and of its development, are confronted with these five controversies on a daily basis. They can neither deny nor solve them, and must work with them, resorting to certain workarounds, concessions, and adjustments. This is why what is being argued here is that these controversies contribute to making the search engine what it effectively is, rather than what its designers (or anybody else) would ideally have liked it to be. They are part of Google. Since they are not meant to be resolved, they will always be a part of it. It is therefore best to become familiar with them if one wishes to take the search engine’s results with the appropriate grain of salt when entering a query.
Search engine neutrality
One of the main controversies surrounding the way that Google handles information is linked to the way we perceive the role a search engine plays: is it a neutral object that ought to be completely objective, or is it inherently subjective and biased? The first way of viewing this involves picturing a search engine as a passive relayer (Pasquale 2006; Chandler 2008). From this perspective, it should treat all documents in the same fashion, and never favor one source or opinion over others. The primary flaw with this point of view is that ranking is, by definition, an act of favoring: taken literally, it implies a search engine should not rank content at all (Grimmelmann 2014).
The opposite point of view considers search engines to be active, subjective editors that are responsible for their output in the same way that press organizations are, and that, like them, should be protected in the United States by the First Amendment of the Constitution (Goldman 2006; Volokh and Falk 2012). In this case, the results list expresses the point of view of the search engine’s designers, who cannot be challenged over the fact that one document has been favored over another.
Those who see the search engine as a neutral tool emphasize the technical aspect of the way the documents are organized, whereas those who see it as an editor underline its socially constructed nature. The former focus on what the engine does, the latter on what it says (Grimmelmann 2014).
According to James Grimmelmann (2014), the irreconcilable nature of the two arguments stems from the fact that the former (which he calls “conduit theory”) looks at the work Google does from the point of view of content publishers, whereas the latter (which he calls “editor theory”) looks at things from the point of view of the search engine’s designers. The relationship between search engine and content is thus presented as a binary: conduit or editor, objective or subjective, technical or human. Grimmelmann expresses his surprise at the fact that the user’s point of view is never, or at least not sufficiently, taken into account, and suggests a third approach. Putting himself in the user’s shoes, he suggests that Google acts as an “adviser,” whose quality essentially depends on its ability to guarantee access to information and to prove its loyalty by not taking into account, in its algorithmic calculations, interests other than those of the person doing the Googling.
Manual intervention
It’s common to hear web publishers complain, rightly or wrongly, about having been manually penalized by Google. Even though the company has long loudly proclaimed the contrary, we now know that it is possible for a website to be manually “blacklisted.” However, we do not know the exact conditions that lead to such penalization, nor how frequent manual interventions are. Some observers believe that Google would be justified in punishing content farms [1] that are specifically designed to trick its algorithm. By the same argument, Google would also be justified in manually penalizing websites that are pornographic, revisionist, and so on. The opposite argument is advanced by proponents of the search engine’s neutrality, who hold that it should not be the prerogative of Google’s employees to decide what content ought to be listed. Proponents of this argument fear a kind of censorship, and defend the idea of the search engine as a simple “conduit.”
It was recently revealed that home-based workers employed by companies such as Leapforce, Butler Hill, and Lionbridge play a part in the selection and ranking process. A 160-page manual is sent to evaluators, who are then asked to give each of the websites they visit one of the following ratings: “vital,” “useful,” “relevant,” “slightly relevant,” “off-topic or useless,” or “unratable.” A second indicator, “page quality,” is also assigned manually: for it, evaluators must assess the website’s reputation as “good,” “bad,” “ambiguous,” “ok,” “malicious,” or “impossible to determine.”
We don’t know what impact this manual work has on the list of results. However, it is interesting to consider that Google can be led to do “exactly the opposite of what it is programmed to” through manual evaluation, which de facto limits the “automation of the algorithm” (Farchy and Méadel 2013).
The controversy remains. At what point should one manually intervene? For what reason? How? Doesn’t the fact that Google’s engineers can manually intervene confer upon them a degree of editorial responsibility whenever illegal content appears in the results? The controversy seems all the more insoluble given that Google will not publish the details of its ranking algorithm, and that, consequently, it is impossible to empirically verify the good faith of its spokespersons when they claim that no manual intervention has taken place.
Algorithmic transparency
Apart from the handful of engineers who came up with it, nobody knows which criteria and factors go into calculating document relevance. [2] Starting in the early 2000s, certain observers argued that it was preposterous for an algorithm meant to rank information and organize knowledge on a global scale to remain a trade secret (Brown and Duguid 2000; Introna and Nissenbaum 2000).
Google’s founders, despite having stated when they created the search engine that they wanted the algorithm to be transparent (Brin and Page 1998), at a certain point decided to stop revealing the details of the criteria and factors that went into their formula. This allowed them to make things more difficult for ill-intentioned web publishers trying to trick the search engine into giving irrelevant documents greater visibility. If the algorithm were known to all, this sort of practice would risk becoming increasingly common, and the first people to suffer from it would be the search engine’s users. The controversy that emerged from this lack of transparency, however necessary it may have been, is essentially an epistemological one: Google will answer every question, provided no internet user asks it how its search engine is able to perform such a feat. As philosopher Paul Mathias (2009) put it: “It’s a bit like knowing required us not to know how, or why, we know!” The controversy is also an economic one: if the algorithm were published, Google’s competitors might use it to design their own algorithms. However, unlike a trade secret such as the Coca-Cola recipe, what is at stake with Google is not a refreshing beverage but a communication tool. Not publishing the algorithm prevents internet users from reflecting on the criteria that determine how information is ranked (i.e., it keeps them from understanding and critiquing the method).
The controversy is still ongoing. Google claims, with some justification, that if the algorithm were transparent, the results would risk being less relevant, whereas its opponents maintain that it is unthinkable to use a tool that claims to organize knowledge without being able to know what criteria that organization is based on. It is as part of this debate that Jean-Noël Jeanneney (2005) suggested setting up a transparent European algorithm, which could be used “in full knowledge of the facts and also critiqued, for the purpose of potential improvements, by whoever wanted to do so.”
Preferential treatment incentive
According to some economists, Google’s desire to respond to a user’s query as best it can runs directly against its desire to maximize profits (Hagiu and Jullien 2011; Taylor 2013). In particular, the company’s management may have a vested interest in favoring results from its own websites in the search engine’s results, rather than those of the competition. For example, if an internet user based in France enters the search query “Paris-Brest route,” it is in Google’s interest to give preference in its results to links that redirect toward Google Maps, rather than toward, say, ViaMichelin or Mappy. In the same way, if an internet user in France is looking to compare prices, Google has a vested interest in directing them toward Google Shopping rather than toward Twenga. This is why some of Google’s competitors are so outspoken in their protests against the company, accusing it of using its general search engine’s dominant position to engage in unfair competition in the specialized search market, by referencing its own services over those of the competition. It is this accusation that made Google the object, in mid-April 2015, of a statement of objections published by the European Commission, which could warrant a fine of up to 10 percent of its sales revenue, i.e. approximately 6 billion dollars.
Furthermore, Google operates an advertising network that helps publishers monetize their content with advertisements drawn from its pool of advertisers. Publishers can thus form a partnership with Google, whereby the company takes care of finding relevant advertisements for them to display alongside their content. The revenue from these advertisements is then shared between the publisher and Google. Economically speaking, Google therefore has a vested interest in favoring its partners in its search engine’s results, in addition to its own services (Rieder and Sire 2014; 2015).
Since the algorithm is unknown, it is impossible to determine whether Google (whose spokespersons swear that such is not the case) tends to prioritize its own websites in its general search engine’s results, or those of partner companies, in its capacity as a mediator within the advertising market. For, as Google’s founders themselves observed, in the absence of transparency it would be perfectly possible to “add a small factor to search results from ‘friendly’ companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect, but could still have a significant effect on the market” (Brin and Page 1998).
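To make the mechanism concrete, here is a minimal sketch, in Python, of the kind of hidden adjustment Brin and Page describe. Every name and number in it is hypothetical; it only illustrates how a small additive factor can flip a ranking while remaining invisible from the outside.

```python
# A minimal sketch (not Google's actual code) of the hidden bias Brin
# and Page describe: a small additive factor applied to an otherwise
# legitimate relevance score. All domains and weights are hypothetical.

FRIENDLY_BONUS = {"maps.google.example": 0.05}       # "friendly" sites nudged up
COMPETITOR_PENALTY = {"viamichelin.example": 0.05}   # competitors nudged down

def adjusted_score(domain: str, relevance: float) -> float:
    """Fold a small, undisclosed bias into a document's relevance score."""
    score = relevance
    score += FRIENDLY_BONUS.get(domain, 0.0)
    score -= COMPETITOR_PENALTY.get(domain, 0.0)
    return score

# The adjustment is tiny relative to the overall score, so a biased and
# an unbiased results list look almost identical from the outside --
# yet the small factor is enough to flip the order here.
results = [("maps.google.example", 0.81), ("viamichelin.example", 0.83)]
ranked = sorted(results, key=lambda r: adjusted_score(*r), reverse=True)
print(ranked)  # maps.google.example now outranks the more relevant page
```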
Finally, observer Dan Sullivan holds that there is a fundamental conflict of interest in Google’s situation, asking how a company that relays a significant amount of traffic to publishers can also provide them with their means of monetizing that traffic (Sullivan 2009). The economist Alexander White (2013), for his part, expressed surprise, upon reading the various inquiries undertaken by competition authorities, at the silence surrounding the fact that Google seems to have a decided incentive to manipulate its search engine results: “one wonders whether there’s an elephant in the room,” he muses.
Personalization of search results
Google personalizes its search engine’s results. To put this another way, if two internet users enter the same query, they may obtain very different results, based on what the engine knows about them. This induces a kind of “dialectical tension” between individualization and standardization (Miconi 2014). Even though certain observers see it as a solution to the problem of systematic bias inherent in the search engine’s workings, and although it may lead to more relevant results for individual users (Goldman 2006), personalization sparks fiery debates.
The primary argument against it consists in pointing out the risk of enclosing internet users within their own cultural understanding, by preventing them from being confronted with ways of thinking that differ from their own (Sunstein 2002; Van der Hof and Prins 2008). At a very high level of personalization, the search engine could become a confirmation tool rather than an information tool: it would comfort internet users with the knowledge they possessed prior to expressing a need for information, instead of opening up new horizons for them (Bozdag 2013).
Eli Pariser (2011) has contributed significantly to publicizing this controversy by explaining, in a much-discussed book, how Google might be keeping users inside an information “filter bubble.” According to him, an internet user the engine had identified as having progressive tendencies would be suggested more progressive content, or even exclusively progressive content. They would hardly, if at all, be confronted with opinions that contradicted their own. Pariser sees a kind of gatekeeping at work here, the danger of which is that it could lead to a subtle form of censorship. This is why he suggests developing a tool that would allow users, regardless of how technically savvy they are, to easily control the degree to which the results they receive are personalized whenever they enter a query into the search engine.
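As a rough illustration, here is a minimal sketch, in Python, of the kind of re-ranking Pariser describes. The user profile, the “leaning” labels, and the weighting are all hypothetical; the point is simply to show how boosting content that matches a user’s inferred profile narrows what they see.

```python
# A hypothetical personalization step: boost documents whose inferred
# leaning matches the user's profile. Labels and weights are invented.

from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    leaning: str      # inferred editorial leaning of the source
    relevance: float  # topical relevance to the query, 0..1

def personalized_rank(docs: list[Doc], user_leaning: str,
                      strength: float) -> list[Doc]:
    """Re-rank docs; strength=0.0 is the unpersonalized order,
    higher values tighten the 'filter bubble'."""
    def score(d: Doc) -> float:
        return d.relevance + (strength if d.leaning == user_leaning else 0.0)
    return sorted(docs, key=score, reverse=True)

docs = [
    Doc("Op-ed A", "progressive", 0.70),
    Doc("Op-ed B", "conservative", 0.75),
    Doc("Report C", "neutral", 0.72),
]
# Unpersonalized: B > C > A. For a user profiled as progressive,
# a boost of only 0.1 moves A to the top.
print([d.title for d in personalized_rank(docs, "progressive", 0.0)])
print([d.title for d in personalized_rank(docs, "progressive", 0.1)])
```

Incidentally, the strength parameter in this sketch plays the role of the control Pariser calls for: exposing it to users would let them dial personalization down to zero for any given query.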
Some authors disagree with Pariser. This is the case, for example, with sociologist Dominique Cardon (2011), according to whom the idea that personalization “reduces the user’s worldview to their own interests hardly checks out. The internet contributes less to isolating online users in a confined information space than it does to reinforcing the divide between those who expose themselves to large amounts of information and those who increasingly keep the public sphere at arm’s length.”
The blurriness surrounding the way the search engine works makes it difficult to know to what extent results are personalized, and if they are, how, and according to what criteria. This is why the debate rages on between supporters and critics of Google’s personalization of its results. In spite of the methodological difficulties that measuring the degree of personalization entails, certain empirical studies have nonetheless been attempted, revealing that Google’s results were not as personalized as one might have been led to believe upon reading Pariser’s work (Von Schoultz and Van Niekerk 2012).
Moreover, part of the debate around personalization relates to who benefits from these techniques. While some authors think that personalization can be beneficial to users (Goldman 2005), others claim that it probably tends to benefit the advertisers for whom Google offers to target users based on their identified interests (Feuz et al. 2011). This kind of bias, favoring advertisers over users, would risk increasing the hold of filter bubbles and might contribute to making their borders more impenetrable (Granka 2010).
Google’s search engine is a meta-editorial tool. It produces a discourse about discourses, an outlook on outlooks. Over the course of the 2000s, as the company was asserting itself over its competition, the search engine’s designers went from being driven by a will to describe (if a page is interesting, it will be at the top) to an ambition to normalize (if a page is at the top, then it is interesting) (Eisermann 2009). Today, the company has a quasi-monopoly over online search. In Europe, more than 90 percent of queries are submitted via its search engine (AT Internet 2015). Given this context, Google has been the object of a number of controversies, some of which question what the search engine does to, and with, information. The ones discussed here can be summarized in the following series of questions:
- Should Google’s search engine be objective and neutral, or should it embrace its role as a necessarily subjective editor?
- Should engineers leave it up to the algorithm to sort through information, or should they sometimes manually intervene?
- Should the algorithm’s details be transparent, or should they be kept secret?
- Does Google favor its own websites within its search engine’s results, as well as those of its economic partners?
- Should Google personalize its results, and if so, then to what extent?
There are no obvious answers to these questions, nor will there ever be. In all likelihood, they will remain controversial. What is being asserted here is that they are consubstantial with treating information algorithmically, on a global scale, as Google claims to do. By becoming aware of these controversies, users will be able to bear in mind the limitations of the tool they are using, and form their own opinions about what can and should be done with the information the search engine grants them access to. In so doing, they will avoid both the pitfall of claiming that “the search engine is beyond reproach, you can use it without giving it a second thought,” and the pitfall of saying it should never be used again. Finally, while an awareness of these controversies allows one to remain vigilant, it also makes it possible to compare Google’s search engine with those of its competitors: users can learn how a given engine handles each of the aforementioned questions, and make a choice based on what they themselves think the answer to each of those questions ought to be.
Notes
[1] Content farms are websites that host large amounts of content with very little added value, usually copied from elsewhere and published automatically, with the sole purpose of inserting advertisements into the articles and thereby generating income at little cost.
[2] We know some of the criteria that Google takes into account, such as centrality (i.e. the number of links pointing to the document), performance (i.e. how quickly the page loads), and social signals (the number of “likes” on Facebook, the number of tweets). However, on the one hand, we don’t know how these criteria factor into the algorithm, and on the other, we don’t know what other criteria are taken into account (there could be between 200 and 300, according to specialists).
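For illustration only, here is a minimal sketch, in Python, of how the criteria named in this note might be combined, assuming a simple linear weighting. Both the weights and the linear form are invented for the example; the real formula is precisely what remains secret.

```python
# A hypothetical linear combination of the few publicly known signals.
# Weights and functional form are invented for illustration; the actual
# algorithm reportedly combines 200-300 signals in an unknown way.

def relevance_score(inbound_links: int, load_time_seconds: float,
                    facebook_likes: int, tweets: int) -> float:
    """Combine a handful of known criteria into one illustrative score."""
    centrality = inbound_links                      # links pointing to the document
    performance = 1.0 / (1.0 + load_time_seconds)   # faster pages score higher
    social = facebook_likes + tweets                # crude social signal
    # Hypothetical weights; the real weighting is a trade secret.
    return 0.6 * centrality + 0.3 * performance + 0.1 * social

print(relevance_score(inbound_links=120, load_time_seconds=0.8,
                      facebook_likes=45, tweets=12))
```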