Critical data studies

Critical data studies is the exploration of and engagement with the social, cultural, and ethical challenges that arise when working with big data. It is practiced by taking a critical approach to data from a variety of perspectives. As its name implies, critical data studies draws heavily on critical theory, with its strong focus on the organization of power structures, and applies that lens to the study of data.

Interest in this field began in 2011, when the scholars danah boyd and Kate Crawford posed various questions for the critical study of big data and recognized its potentially threatening impacts on society and culture. It was not until 2014, after further exploration and conversation, that the term 'critical data studies' was coined by the scholars Craig Dalton and Jim Thatcher, who placed a large emphasis on understanding the context of big data in order to approach it more critically. Researchers such as Daniel Ribes, Robert Soden, Seyram Avle, Sarah E. Fox, and Phoebe Sengers focus on understanding data as a historical artifact and take an interdisciplinary approach to critical data studies. Other key scholars in the discipline include Rob Kitchin and Tracey P. Lauriault, who focus on reevaluating data through different spheres.

Critical frameworks that can be applied to analyze big data include feminist, anti-racist, queer, Indigenous, decolonial, and anti-ableist approaches, as well as symbolic and synthetic data science. These frameworks help make sense of data by addressing power, bias, privacy, consent, and underrepresentation or misrepresentation in data, and by suggesting how to approach and analyze data with a more equitable mindset.

Motivation
In the article in which they coin the term 'critical data studies,' Dalton and Thatcher also provide several justifications as to why data studies is a discipline worthy of a critical approach. First, 'big data' is an important aspect of twenty-first-century society, and its analysis allows for a deeper understanding of what is happening and for what reasons. Big data is central to critical data studies because it is the type of data used within the field. Big data does not necessarily mean a large data set: it can be a set with millions of rows, but it can also be a smaller set with a wide variety and expansive scope of data, or one that captures whole populations rather than samples. Furthermore, big data as a technological tool and the information that it yields are not neutral, according to Dalton and Thatcher, making it worthy of critical analysis in order to identify and address its biases. Building off this idea, another justification for a critical approach is that the relationship between big data and society is an important one, and therefore worthy of study.

Ribes et al. argue that the need for an interdisciplinary understanding of data as a historical artifact is a motivating aspect of critical data studies. The overarching consensus in the Computer-Supported Cooperative Work (CSCW) field is that people should speak for the data, not let the data speak for itself.

The sources of big data and their relationship to varied metadata can be complicated, which leads to data disorder and a need for ethical analysis. Additionally, Iliadis and Russo (2016) have called for studying data assemblages: data have innate technological, political, social, and economic histories that should be taken into consideration. Kitchin argues that data are almost never raw and almost always 'cooked,' meaning that they are always spoken for by the data scientists utilizing them. Thus, big data should be open to a variety of perspectives, especially those of a cultural and philosophical nature, for data contain hidden histories, ideologies, and philosophies.

Big data technology can cause significant changes in society's structure and in the everyday lives of people, and, being a product of society, big data technology is worthy of sociological investigation. Moreover, data sets are almost never free of influence. Rather, data are shaped by the vision or goals of those gathering them: during the collection process, certain things are quantified, stored, sorted, and even discarded by the research team. A critical approach is thus necessary in order to understand and reveal the intent behind the information being presented.

One of these critical approaches is feminist data studies, which applies feminist principles to the critical study of data collection and analysis in order to address power imbalances in data science and society. According to Catherine D'Ignazio and Lauren F. Klein, a power analysis can be performed by examining power, challenging power, evaluating emotion and embodiment, rethinking binaries and hierarchies, embracing pluralism, considering context, and making labor visible. Feminist data studies is part of a movement toward making data benefit everyone rather than deepen existing inequalities. Moreover, data alone cannot speak for themselves; in order to possess any concrete meaning, data must be accompanied by theoretical insight or alternative quantitative or qualitative research measures.

Critical data studies also takes up specific social issues, as in anti-racist data studies, which can use classification approaches to secure representation for affected communities. Desmond Upton Patton and others developed their own classification system in Chicago communities to help target and reduce violence among young teens on Twitter. They had students from those communities help decipher the terminology and emojis in teens' tweets in order to identify language that preceded violence beyond the computer screen. This is one real-world example of critical data studies in application. Dalton and Thatcher argue that if one were to think of data only in terms of its exploitative power, there would be no possibility of using data for revolutionary, liberatory purposes. Finally, Dalton and Thatcher propose that a critical approach allows 'big data' to be combined with older 'small data,' creating more thorough research and opening up more opportunities, questions, and topics to be explored.

Issues and concerns for critical data scholars
Data plays a pivotal role in the emerging knowledge economy, driving productivity, competitiveness, efficiency, sustainability, and capital accumulation. The ethical, political, and economic dimensions of data dynamically evolve across space and time, influenced by changing regimes, technologies, and priorities. Technically, the focus lies on handling, storing, and analyzing vast data sets, utilizing machine learning-based data mining and analytics. This technological advancement raises concerns about data quality, encompassing validity, reliability, authenticity, usability, and lineage.

The use of data in modern society brings about new ways of understanding and measuring the world, but also brings with it certain concerns or issues. Data scholars attempt to bring some of these issues to light in their quest to be critical of data.

Technical and organizational issues could include the scope of the data set, meaning there is too little or too much data to work with, leading to inaccurate results. It becomes crucial for critical data scholars to carefully consider the adequacy of data volume for their analyses.

The quality of the data itself is another facet of concern: a data set may be incomplete or messy, with missing or inaccurate values. Addressing these issues often requires scholars to make edits and assumptions about the data to ensure its reliability and relevance.

Data scientists could have improper access to the actual data set, limiting their abilities to analyze it. Linnet Taylor explains how gaps in data can arise when people of varying levels of power have certain rights to their data sources. These people in power can control what data is collected, how it is displayed and how it is analyzed.

The capabilities of the research team also play a crucial role in the quality of data analytics. A team with inadequate skills or organizational capabilities may produce biased analyses. This can also lead to ecological fallacies, in which an assumption is made about an individual based on data or results from a larger group of people.

These technical and organizational challenges highlight the complexity of working with data and emphasize the need for scholars to navigate a landscape where issues related to data scope, quality, access, and team capabilities are intricately interwoven.

Some of the normative and ethical concerns addressed by Kitchin include surveillance through one's data (dataveillance); the privacy of one's data, which the National Cybersecurity Alliance notes is increasingly treated as a corporate asset, with companies realizing the potential value in collecting, using, and sharing it; the ownership of one's data, where Scassa writes that debates over ownership rights in data have been heating up and that European policymakers have raised the possibility of creating sui generis ownership rights in data; the security of one's data, since data breaches pose a threat to both individuals and organizations; anticipatory and corporate governance, where corporate data, unlike information, is a raw form without proper meaning or usefulness unless processed and transformed into meaningful forms; and the profiling of individuals by their data. Profiling is heavily emphasized in work on data colonialism, where data sovereignty is encouraged for individuals being harmed, because data can be a powerful tool for those it represents. A common theme across these approaches to data sovereignty is deciding when and how to collect, protect, and share data, and sharing it only with those who have a legitimate or appropriate need to access it. As Vallor puts it: "The labels that we attach to the data are always going to be cruder and less representative of what they describe than what we would like them to be. Treating candidates under a single label, whether it's a gender label, an age group, consumers of a particular product, or people suffering from a particular disease, can cause people to be treated as interchangeable and fungible data points. Every one of those individuals with that label is unique and has the right to be respected as a person" (Vallor: Data Ethics). All of these concerns must be taken into account by scholars of data in their objective to be critical.

Following in the tradition of critical urban studies, other scholars have raised similar concerns around data and digital information technologies in the urban context. For example, Joe Shaw and Mark Graham have examined these in light of Henri Lefebvre's 'right to the city'.

Practical applications of critical data studies
Among the most practical and pressing applications of critical data studies is the intersection of ethics and privacy. Tendler, Hong, Kane, Kopaczynski, Terry, and Emanuel explain that in an age when private institutions use customer data for marketing, research on customer wants and needs, and more, it is vital to protect the data collected. In the medical studies field, one small step toward protecting participants is informed consent.

Algorithmic bias and discrimination are widespread in data. Many scholars emphasize their importance in the healthcare field because of the gravity of data-driven decisions for patient care, and because of how such data is used and why it is collected. Institutions and companies can promote fairness and fight systemic racism by using critical data studies to highlight algorithmic bias in data-driven decision making. Nong explains that a well-known example involves insurance algorithms and access to healthcare: insurance companies use algorithms to allocate care resources across clients, and the algorithms used demonstrated “a clear racial bias against Black patients,” which caused estimated “health expenditures [to be] based on historical data structured by systemic racism and perpetuating that bias in access to care management.”

For many trained machine learning and artificial intelligence models, there is no standard reporting procedure to properly document performance characteristics. When these models are applied to real-life scenarios, the consequences have major effects in the real world, most notably in healthcare, education, and law enforcement. Timnit Gebru explains how the lack of sufficient documentation for these models makes it challenging for users to assess their suitability for specific contexts; this is where model cards come into play. Model cards are short records that accompany machine learning models to provide information about a model's characteristics, intended uses, potential biases, and measures of performance. Model cards aim to give users important information about the capabilities and limitations of machine learning systems and to promote fair and inclusive outcomes in the use of machine learning technology.
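The idea of a model card can be illustrated with a minimal sketch. The field names and values below are hypothetical, chosen only to show the kind of information a model card records (the published proposals specify much richer templates); the key design point is that performance is reported per demographic group, not just in aggregate, so disparities are visible to prospective users.

```python
# Hypothetical, minimal sketch of a model-card record; field names are
# illustrative, not the published template.
from dataclasses import dataclass, field


@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    out_of_scope_uses: list = field(default_factory=list)
    # Per-group performance makes disparities visible instead of
    # hiding them inside a single aggregate metric.
    performance_by_group: dict = field(default_factory=dict)
    known_biases: list = field(default_factory=list)


card = ModelCard(
    model_name="triage-risk-v1",  # invented example name
    intended_use="Research on care-resource allocation; not for clinical use.",
    out_of_scope_uses=["automated insurance decisions"],
    performance_by_group={"group_a": {"accuracy": 0.91},
                          "group_b": {"accuracy": 0.83}},
    known_biases=["training labels proxy cost, not health need"],
)

# A gap like this is exactly what a model card is meant to surface.
accs = [m["accuracy"] for m in card.performance_by_group.values()]
print(round(max(accs) - min(accs), 2))  # accuracy gap across reported groups
```

A reader of this card can see at a glance that the model performs unevenly across groups and that certain uses are explicitly out of scope, which is the kind of assessment Gebru argues undocumented models make impossible.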

Theoretical frameworks of critical data studies
The data feminism framework promotes thinking about data and ethics guided by ideas of intersectional feminism. Data feminism examines practices in which data science reinforces power inequalities in the world, and how users can use data to challenge existing power and commit to creating more balanced data. According to D'Ignazio and Klein, the intersectionality of data feminism acknowledges that data must account for intersecting factors like identity, race, and class to provide a complete and accurate representation of individuals' experiences. The framework also highlights ethical considerations by advocating for informed consent, privacy, and the responsibility data collectors have to the individuals from whom data is collected.

Dataveillance is the monitoring of people through their online data. Unlike surveillance, dataveillance goes far beyond monitoring people for specific reasons; it infiltrates people's lives with constant tracking for blanket, generalized purposes. According to Raley, it has become the preferred way of monitoring people across their various online presences. This framework offers ways to approach and understand how data are collected, processed, and used, emphasizing ethical perspectives and the protection of individuals' information.

Datafication focuses on understanding the processes associated with the emergence and use of big data. According to José van Dijck, it highlights the transformation of social actions into digital data, allowing real-time tracking and predictive analysis. Datafication emphasizes the interest-driven character of data collection, since social activities change while their transformation into data does not. It also examines the societal changes that take effect as digital data become more prevalent in everyday life. Datafication stresses the complicated relationship between data and society and goes hand in hand with dataveillance.

The algorithmic bias framework addresses systematic and unjust biases against certain groups or outcomes in algorithmic decision making. Häußler notes that researchers focus on how algorithms can produce discriminatory outcomes, particularly with respect to race, gender, age, and other characteristics, and can reinforce social inequities and unjust practices. Key components of the framework generally include bias identification, data quality, impact assessment, fairness and equity, transparency, remediation, and implications.
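The "bias identification" component can be made concrete with a small, hypothetical example: one common check is demographic parity, which asks whether an automated system's positive-decision rate is roughly equal across groups. The data and group names below are invented for illustration, and demographic parity is only one of several fairness criteria discussed in this literature.

```python
# Hypothetical bias-identification check: demographic parity, i.e. comparing
# positive-decision rates across groups. Data below is invented.
from collections import defaultdict

decisions = [  # (group, approved) pairs from a hypothetical decision system
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]

totals = defaultdict(int)
positives = defaultdict(int)
for group, approved in decisions:
    totals[group] += 1
    positives[group] += approved

# Per-group approval rates and the gap between the best- and
# worst-treated groups; a large gap flags a potential disparity.
rates = {g: positives[g] / totals[g] for g in totals}
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

Here group_a is approved 75% of the time and group_b only 25%, a gap of 0.5 that a critical data scholar would flag for further investigation into the data quality, impact, and remediation components of the framework. Satisfying demographic parity does not by itself establish fairness; it is one diagnostic among the several the framework names.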