October 2015: ‘Cash Investigation’, a French investigative television programme, broadcast a striking enquiry into firms that trade personal data to optimise their activity and expand their market share. Large business groups such as Apple or Carrefour have based their profits to a great extent on data mining and data tracking. They are called ‘datavores’: firms that use marketing inferences drawn from users’ online activity to increase their efficiency. And the trend shows no sign of stopping. It has spread to every area of society: education, police surveillance, urban facilities, public service improvements and marketing strategies.

 

But what exactly is datafication? It is ‘the transformation of social action into online quantified data, thus allowing for real-time tracking and predictive analysis’. This quantification has reached unexpected fields: relationships are being quantified, and so are social life and even democratic participation.

The question is the following: how does datafication influence the dynamics of inclusion and exclusion in democratic participation?

              How does it work?

How come the Internet seems to know us so well? All activity carried out online is memorised and indexed. It is not just a matter of data storage: the data collected from a user’s web activity (‘digital traces’) is aggregated, compared and linked together. This association allows salient patterns or habits to be identified. Every new element added to the already known data is then categorised and used to predict the user’s future behaviour on the basis of his/her previous activity on the web. This is what is called ‘algorithmic inference’. Although it tends to present itself as impartial and objective, that is far from being the case …
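The steps just described (aggregating traces, spotting a dominant pattern, folding in each new element, predicting) can be illustrated with a deliberately naive sketch. Everything in it is an assumption made for illustration: the user names, the page categories and the rule ‘predict the most frequent past category’ are hypothetical and do not describe how any real platform works.

```python
from collections import Counter, defaultdict

# Hypothetical digital traces: (user, category of the page visited).
TRACES = [
    ("alice", "sports"),
    ("alice", "sports"),
    ("alice", "travel"),
    ("bob", "cooking"),
    ("bob", "cooking"),
    ("bob", "sports"),
]


def build_profiles(traces):
    """Aggregate raw traces into one category counter per user."""
    profiles = defaultdict(Counter)
    for user, category in traces:
        profiles[user][category] += 1
    return profiles


def record(profiles, user, category):
    """Fold a new trace into the already known data."""
    profiles[user][category] += 1


def predict_interest(profiles, user):
    """Predict the next interest as the user's most frequent past category."""
    history = profiles.get(user)
    if not history:
        return None  # nothing is known about this user
    return history.most_common(1)[0][0]


profiles = build_profiles(TRACES)
record(profiles, "alice", "sports")  # a new visit reinforces the pattern
print(predict_interest(profiles, "alice"))
```

Even in this toy version the prediction is not neutral: it depends on which categories someone chose to record, and it can only ever propose more of what the user has already done.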

              10 consequences

1. Managing algorithms requires some form of tacit knowledge (knowledge which is not contained in the data itself) and hence some specific skills. In the workplace, companies increasingly require employees with computer skills. What is more, not everybody considers themselves experienced enough to participate online. This can lead to a ‘spiral of silence’, that is to say an attitude of self-censorship.

2. It reduces the citizen to a consumer and/or producer of technological devices, which again excludes those who refuse to use such devices or are not familiar with data.

3. Politicians may use the data collected about individuals to exclude them from civic participation and curtail their political rights, for example when they are deemed dangerous.

4. The web is still largely dominated by large corporations such as Google and Facebook, forming an oligopoly of shareholders. We should step back from the idyllic image of new social media. Facebook has extended its reach across the World Wide Web by opening its structure to other platforms and websites through social plugins (e.g., the Like button); at the same time, since all these links and plugins flow back to Facebook, it recentralises the whole fabric of the web.

5. Algorithmic inference induces peer affiliations. For example, people on social networks are grouped together by virtue of their common interests or their common background. The information offered to us only reinforces our convictions, without challenging them and without allowing us to broaden our horizons. This is what is called the ‘filter bubble’.

6. The conclusions drawn by algorithmic inference may be inaccurate or ethically questionable. Take the story of a woman who learnt she was pregnant after receiving a stream of advertisements for pregnancy products.

7. The power of money: alternative narratives and alternative ways of participating in mainstream platforms are discredited because of their lack of resources. For most websites, revenues stem from advertising policies and visibility strategies. The more financially prosperous a website or platform is, the more likely it is to attract viewers, which in turn sets off a virtuous circle ensuring its viability over time.

8. The uneven distribution of information (as in countries where communication is muzzled) increases exclusion.

9. Datafication leads to discrimination, wittingly or unwittingly. While data can be incorrect because of hasty correlations, discrimination may also stem from information about a user that is too accurate. Certain information ought not to be known because it should not be taken into account when providing access to goods and services: one’s sexual orientation, for example.

10. Marginalising the already marginalised. Example: the software ‘Zip Lookup’ developed by Esri, a geographic information system company. The platform uses the zip codes of individuals living in the United States to deduce socio-economic characteristics such as their level of education, their way of life, their family structure and their consumption patterns. From these inferences, the software claims to know how people spend their time and therefore who they are.

Using zip codes, a map of the United States divided into clusters is created, each cluster being associated with a given profile. To create the map, US census demographic data is combined with inferred marketing data. Each segment is thus defined by its socio-economic and demographic composition. Segments which share common patterns are grouped together as ‘markets’ (for instance, people from the same generation or from the same country of origin). Life-mode groups are labelled as ‘soccer moms’, ‘fresh ambition’, ‘military proximity’, ‘college towns’, ‘dorms to diplomas’ etc., and are correlated to urbanization groups.

For instance, ‘principal urban centres’ which gather the ‘fresh ambitions’ and ‘laptops and lattes’ groups are defined as ‘young, mobile, diverse populations (…), occupied by singles or roommates (…) constantly connected, (…) focused on style and image with liberal spending’.

There are, however, some problems. First, the software overlooks the heterogeneity and complexity within a single cluster (on the contrary, it endeavours to remove all outliers so as to create homogeneous clusters). Second, and more importantly, the data used was inferred from what citizens have done online and then generalised as a global pattern, as if it were equally shared. This data then becomes self-reinforcing, without accounting for the exclusions and dynamics at work. Ultimately, it entrenches the segmentation of clusters and prevents negatively labelled areas from prospering, for example by discouraging businesses or tourism from settling there.
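The first problem can be made concrete with a minimal sketch of zip-code segmentation. It is not Esri’s method: the nearest-profile rule, the two features, the zip codes and all the numbers are hypothetical, and only the segment labels are borrowed from those quoted above.

```python
import math

# Hypothetical segment profiles: (median age, share of residents with a degree).
# The labels come from the text; the numbers are invented for illustration.
SEGMENTS = {
    "laptops and lattes": (30.0, 0.70),
    "soccer moms": (40.0, 0.45),
    "college towns": (22.0, 0.35),
}

# Hypothetical aggregates per zip code (census-style data merged with
# inferred marketing data). The zip codes are placeholders.
ZIP_AGGREGATES = {
    "00001": (31.0, 0.68),
    "00002": (39.0, 0.40),
}


def _distance(a, b):
    """Euclidean distance, with age rescaled so both features weigh similarly."""
    return math.hypot((a[0] - b[0]) / 10.0, a[1] - b[1])


def assign_segment(zip_code):
    """Label a zip code with the segment whose profile is closest to its averages."""
    features = ZIP_AGGREGATES[zip_code]
    return min(SEGMENTS, key=lambda name: _distance(features, SEGMENTS[name]))


def label_resident(zip_code, resident_features):
    """Label one resident. Their own features are ignored on purpose:
    everyone living in the zip code inherits the label of the area."""
    return assign_segment(zip_code)


# A 67-year-old resident without a degree still receives the area's label.
print(label_resident("00001", (67.0, 0.0)))
```

Because the label is computed from area averages, a resident who looks nothing like the profile is silently absorbed into it, which is the loss of heterogeneity described above.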

              What to do then?

The point is not to vilify data: data offers great potential for inclusion and transparency in the production of knowledge and in social interactions. The point is rather to make data accessible, not a priori, but within the practical complexity of social dynamics, taking into account that some groups within the population do not have the same competencies as others.

The fundamental aim of datafication should be to empower individuals, making them active users capable of free and rational choices, and ultimately to strengthen public debate and the quality of democracy.