
Big Data & Discrimination
The process of gathering massive amounts of information from numerous online platforms in order to identify potential patterns is known as “big data.” From that aggregated data, the process of data mining can ensue. Data mining aims to locate statistical relationships in a data set and discover useful patterns. Those patterns can then be used by public or private entities to inform decision making: law enforcement agencies, for example, may use them to determine where to most effectively allocate resources, and medical professionals may use them to discover the side effects of prescription drugs.

According to some, the streamlined decision making afforded by big data is a step toward equality and fairness, such as fairer hiring decisions. In the job market, workforce analytics are being used by tech companies to highlight qualities that are more predictive of job performance than, for instance, the reputability of an individual’s alma mater. An example of this is the tech start-up Gild, an online company that seeks out talented computer programmers and tracks how often the code they write is reused by other developers. This recruitment strategy has been embraced by many, including the American government, for its ability to find capable job candidates who may be shut out of the job market because of attributes like their race, gender, or lack of formal training.

Others argue that the era of big data is full of risk. Rather than reducing inequality, data mining can exacerbate existing inequalities among marginalized members of society. Though big data is often thought of as an unbiased, purely statistical body of information, it is collected and assembled by individuals who carry their own biases. The algorithmic systems that process that data are not infallible, and as a result, data mining can inherit the prejudice of decision makers or reflect the widespread biases that exist in society. Approached without care, data mining can reproduce patterns of discrimination, mask opportunities for marginalized groups, and reinforce bias.

Big data and its discriminatory potential have become a growing concern in the United States, to the extent that they have been addressed at the political level. In May 2014, the White House released a report entitled "Big Data: Seizing Opportunities, Preserving Values." The report, otherwise known as the Podesta Report, found that “big data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace.”

Training Data & Labelling
Data mining can function as an automated form of artificial intelligence because it learns by example. What a data mining model learns “depends on the examples to which it has been exposed." Training data are the examples fed into the system to train the model to behave in a particular way. When the training data are biased, the result is a discriminatory model: if the model treats prejudiced examples as accurate examples to learn from, that prejudice is reproduced in its output.

Training data acquires the capacity to become prejudiced when the labels used to classify that data are left to the discretion of the data miners. In some cases, data arrives pre-labelled: for example, “an employer using grades previously given at performance reviews is using pre-labelled examples.” In other cases, data miners must devise a way to label the data themselves, and the labelling can become biased as a result. Biased training data skews the model's findings in all future cases, and if those findings are then used to inform an institution's decision making, the resulting decisions can be founded in prejudice.
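This dynamic can be illustrated with a minimal sketch. Everything below is invented for illustration: a naive model trained on pre-labelled performance reviews that historically under-rated one group simply learns, and then reproduces, that disparity.

```python
# Minimal sketch of biased labels propagating into a model.
# All data here is invented; "group" stands in for any protected
# attribute that influenced past reviewers' labels.
from collections import defaultdict

# Pre-labelled training data: (group, past performance-review label).
# Suppose reviewers historically under-rated members of group "B".
training_data = [
    ("A", 1), ("A", 1), ("A", 1), ("A", 0),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

# A naive model: learn the rate of positive labels per group.
counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
for group, label in training_data:
    counts[group][0] += label
    counts[group][1] += 1

def predicted_positive_rate(group):
    positives, total = counts[group]
    return positives / total

# The model faithfully reproduces the reviewers' bias.
print(predicted_positive_rate("A"))  # 0.75
print(predicted_positive_rate("B"))  # 0.25
```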

Admissions Discrimination at St George's Hospital Medical School
An example of prejudiced labelling occurred at St George's Hospital Medical School in the United Kingdom. In 1979, a computer program was developed by Dr. Geoffrey Franglen, a member of the faculty, to pre-screen the school's applicants and reduce the work involved in selecting candidates for interview. The training data used to select candidates was based on the personal information, such as name and place of birth, of past applicants who had been accepted into the program. The computer used each candidate's information to generate a score, which determined whether or not they would be interviewed. Because the majority of past students accepted to the program were male and Caucasian, the system was biased in favour of applicants with those characteristics. As a result, the program unfairly discriminated against women and applicants with non-European-sounding names; in the years the system was used, women and members of racial minorities had a reduced chance of being interviewed. Though it was developed with the intent of eliminating inconsistencies in how admissions staff selected ideal applicants, the prejudice embedded in the training data perpetuated sex- and race-based discrimination. St George's Hospital Medical School was ultimately found guilty by the Commission for Racial Equality of practising racial and sexual discrimination.
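The details of Franglen's program were never published in full, so the sketch below is a toy reconstruction rather than the actual system: a screening score derived from the attributes of past admits, with a threshold deciding who is interviewed. Because the historical admits are mostly male with European names, the learned weights automate exactly the bias described above.

```python
# Toy reconstruction (not the actual St George's program): a screening
# score learned from past admitted applicants. The records and the
# interview threshold are invented for illustration.
past_admits = [
    {"sex": "male", "european_name": True},
    {"sex": "male", "european_name": True},
    {"sex": "male", "european_name": True},
    {"sex": "female", "european_name": True},
]

def attribute_weight(attribute, value):
    # Weight = fraction of past admits who share this attribute value.
    matching = sum(1 for admit in past_admits if admit[attribute] == value)
    return matching / len(past_admits)

def screening_score(applicant):
    return sum(attribute_weight(attr, val) for attr, val in applicant.items())

INTERVIEW_THRESHOLD = 1.5

# A male applicant with a European-sounding name clears the bar...
print(screening_score({"sex": "male", "european_name": True}))     # 1.75
# ...while an equally qualified woman with a non-European name does not.
print(screening_score({"sex": "female", "european_name": False}))  # 0.25
```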

Training Data and Representation
Training data also has the potential to discriminate against marginalized groups when the data sample is biased or incorrect, or does not represent the entire population, for instance, when marginalized groups are either underrepresented or overrepresented in the data. Past studies have found that many institutions maintain systematically less timely, accurate, precise, and complete records for certain classes of people. This underrepresentation of marginalized groups in data is often due to a lack of access to certain technologies. Scholars Jonas Lerman and Kate Crawford point out that factors such as poverty, lifestyle, and geography systemically omit those who live on the margins of big data, and that as a result, some citizens and communities are overlooked. Because historically marginalized groups are “less involved in the formal economy and its data-generating activities, have unequal access to and relatively less fluency in the technology necessary to engage online, or are less profitable customers or important constituents,” they are more likely to be the victims of big data discrimination.

The Street Bump Example
An example of big data discrimination resulting from underrepresentation can be found in Street Bump, an application used by many Boston residents. Using the accelerometers built into smartphones, Street Bump detects when a driver rides over a pothole. While Street Bump has been praised as a cost-effective way for the city of Boston to track where to allocate resources, it has also been criticized by scholars like Kate Crawford, who points out that the data collected about which roads need repair will be biased by the uneven distribution of smartphones across the city. The citizens of Boston who have smartphones, and the neighbourhoods they reside in, will be highly represented in the data generated by Street Bump; members of the population who do not have smartphones, and the neighbourhoods they inhabit, will be underrepresented. Should the city of Boston use Street Bump data to decide which areas most need road maintenance, marginalized communities would likely be underserved, and the city would risk discriminating against those who cannot report problems as effectively as those with smartphones.
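A small simulation makes Crawford's point concrete. The numbers are invented: two neighbourhoods have the same number of potholes, but very different rates of smartphone ownership, so one generates far more reports than the other.

```python
# Illustrative simulation with invented numbers: equal road damage,
# unequal smartphone ownership, unequal data.
import random

random.seed(0)

def simulated_reports(potholes, smartphone_rate, trips_per_pothole=100):
    # A pothole only enters the dataset when a trip over it happens to
    # involve a phone running the app; the ownership rate stands in
    # for that probability.
    reports = 0
    for _ in range(potholes * trips_per_pothole):
        if random.random() < smartphone_rate:
            reports += 1
    return reports

# Same 50 potholes in each neighbourhood:
print(simulated_reports(50, smartphone_rate=0.8))  # affluent area: ~4000 reports
print(simulated_reports(50, smartphone_rate=0.2))  # marginalized area: ~1000 reports
```

Ranked by report counts alone, the affluent neighbourhood would appear to need roughly four times as much maintenance, despite identical road conditions.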

Proxies
While bias does play a role in big data's potential to discriminate, big data does not have to be biased in order to be discriminatory. Even if discrimination is not artificially introduced into the data mining process, data mining can still produce worse outcomes for members of marginalized groups. This occurs when data-informed decisions perpetuate and exacerbate real-world inequalities; for instance, “when the criteria that are genuinely relevant in making rational and well-informed decisions also happen to serve as reliable proxies for class membership.” An institution may not intentionally discriminate against certain groups because of their race or class; however, the criteria the institution uses to sort individuals according to their qualifications, such as post-secondary education or professional networks, may also sort individuals according to factors like race or class.
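The proxy effect can be shown in a few lines. The candidate records below are invented: the screen only ever consults a facially neutral, job-relevant criterion, yet because that criterion correlates with class in this population, the shortlist sorts by class anyway.

```python
# Hedged sketch of a facially neutral criterion acting as a proxy.
# All candidate data is invented for illustration.
candidates = [
    {"name": "c1", "low_income": False, "elite_network": True},
    {"name": "c2", "low_income": False, "elite_network": True},
    {"name": "c3", "low_income": True,  "elite_network": False},
    {"name": "c4", "low_income": True,  "elite_network": False},
]

# The screen never looks at class...
shortlist = [c for c in candidates if c["elite_network"]]

# ...yet the outcome sorts by class anyway.
print([c["name"] for c in shortlist])           # ['c1', 'c2']
print(any(c["low_income"] for c in shortlist))  # False
```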

Data mining has the ability to uncover patterns of lower performance or skill among groups that are protected by law. Such discoveries not only show that inequalities exist, but reveal that marginalized groups are unequivocally subject to them. The most significant implication of what the data reveals is how that information shapes decision making. Though data mining is efficient and gives companies broad insight into who is an ideal employee, a low risk for life insurance, or most likely to pay off a loan in a timely manner, it also means less favourable outcomes for the individuals who do not meet those criteria. In the job market, for example, by granting more opportunities to the potential employees who, according to the data, will prove most competent at a given task, employers may subject members of marginalized groups to discriminatory treatment, because members of marginalized groups tend to possess the characteristics employers find desirable at systemically lower rates than non-marginalized individuals.

Commercial Discrimination
The discriminatory practices undertaken by decision makers are not necessarily a result of their own prejudiced beliefs. As actors in a capitalist market, their primary priority is to turn a profit, and in doing so they often make decisions that perpetuate inequalities that already exist in society. The act of discriminating against certain groups in the name of turning a profit is referred to as “commercial discrimination." One industry where commercial discrimination is present is health care. According to the Pew Internet and American Life Project, over 70 percent of adults in the United States go online to seek health information from sources like government agencies, commercial websites, and discussion forums. These sites allow third-party advertisers to use their platforms, and in doing so, the advertisers gain the ability to track users and learn about their health-related browsing habits. Website operators and third-party advertisers can then take the details they have gathered about an individual's health concerns and misuse them in order to make money. Tim Libert found that online advertisers “use aggregate browsing information to place users into ‘data silos’, marking the desirable as ‘target’ and the less fortunate as ‘waste’." Because such a high percentage of bankruptcies in the United States are medically related, Libert infers that those with a browsing history of health concerns may be categorized as waste and denied the interest rates and discounts awarded to individuals in the target category.
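Libert does not publish the advertisers' actual rules, so the following is a hypothetical sketch of the kind of "target"/"waste" sorting he describes, with an invented keyword rule standing in for aggregate browsing analysis.

```python
# Hypothetical sketch of "target"/"waste" siloing; the rule, keywords,
# and threshold are invented for illustration.
HEALTH_KEYWORDS = {"symptoms", "diagnosis", "treatment", "oncology"}

def classify_user(browsing_history):
    # Users whose aggregate browsing suggests costly health concerns
    # are siloed as "waste"; everyone else is a marketing "target".
    health_hits = sum(1 for page in browsing_history if page in HEALTH_KEYWORDS)
    return "waste" if health_hits >= 2 else "target"

print(classify_user(["news", "sports", "shopping"]))         # target
print(classify_user(["symptoms", "treatment", "oncology"]))  # waste
```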

Algorithmic Discrimination
Algorithms are automated decision-making systems used by both private and public institutions to help them make decisions more efficiently and rationally. Modern algorithms use advanced computer technology and aggregated datasets not only to automate decisions, but also to gain information and make assumptions about individuals. The goal of using algorithms is, for example, to cut costs in the private sector or to expand social welfare in the public sector. There are two primary ways that algorithms result in discrimination. The first is ongoing discrimination in American society and the way that discrimination is reflected in the big data that algorithms draw from: discriminatory data leads algorithms to “arrive at biased results that disadvantage people of colour and people from low and moderate income communities.” The second arises from who structures the algorithms. Scholar James Allen points out that the decision-making actions of algorithms are primarily structured by a homogenous group of individuals who “develop algorithms without transparency, auditing, or oversight.” Two examples of algorithmic discrimination can be found in personal networks and algorithmic redlining.

Personal Networks
Big data discriminates not only on the basis of who you are and who you know, but also on the basis of who you are similar to. Based on an individual's online practices and preferences, technical mechanisms like predictive analysis make inferences about the connections and similarities between that individual and other online users. For example, friends are more likely to share a taste in music, movies, or clothing than two random strangers. This information is extremely useful to online advertisers and marketing agents looking for customers. In some instances, however, categorizing users based on their personal networks can lead to discriminatory practices. Despite aspirations for equal opportunity, "the networked nature of modern life can lead to very different outcomes for different groups of people." In the American job market, discrimination on the basis of attributes like sex, religion, age, ability, sexuality, creed, or race is legally prohibited. What is not prohibited, however, is an employer discriminating against certain applicants based on their personal networks. The widespread use of algorithmic decision making has increased the frequency of this personal network discrimination.
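A minimal sketch of this kind of inference, with invented users and tastes: the system attributes to a user whatever the majority of her connections are known to like, even though she has never stated it herself.

```python
# Minimal sketch (invented data) of predictive inference from a
# personal network.
network = {
    "alice": ["bob", "carol"],
    "bob":   ["alice"],
    "carol": ["alice"],
}
known_tastes = {"bob": {"jazz"}, "carol": {"jazz", "folk"}}

def inferred_tastes(user):
    # Assume a user resembles the majority of their connections.
    tallies = {}
    for friend in network.get(user, []):
        for taste in known_tastes.get(friend, set()):
            tallies[taste] = tallies.get(taste, 0) + 1
    majority = len(network.get(user, [])) / 2
    return {taste for taste, count in tallies.items() if count > majority}

print(inferred_tastes("alice"))  # {'jazz'}: inferred, never stated by alice
```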

One online platform that facilitates network discrimination is LinkedIn, a social networking site for both employers and employees. Individuals who join LinkedIn looking for employment not only create a public resume, but also identify their personal and professional networks on the site. The goal of displaying these networks is to solicit public approval from those connections; essentially, users want to show potential employers that they are well connected in the professional realm. When employers use LinkedIn, a common practice for identifying suitable candidates is to target individuals who are connected to the company's current workforce in some way, in other words, candidates who are a good “cultural fit.” Where discrimination becomes a factor is in who this process excludes. Targeting potential employees based on their personal connections to the company means that candidates with institutional privilege are more likely to be attractive to employers, while individuals who have been historically excluded from employment opportunities are overlooked. Discrimination based on personal networks is not a new practice; however, it takes on a new detrimental significance when it "becomes automated and occurs at a large scale.”

In addition to the potential discrimination that accompanies the public articulation of personal and professional contacts, marginalized groups are also disadvantaged by a lack of insider knowledge about how best to align their public resume with the sort of language that algorithms look for. LinkedIn is used far more by employers recruiting professional employees than by those hiring service or manual labour employees. It has therefore become increasingly common for large companies hiring minimum-wage or low-skill labour employees to filter candidates computationally through tracking and screening software. This process is attractive to large enterprises because of the time it saves, but it also means that candidates can be labelled unqualified for something as small as not using the right buzzword in their list of skills. This creates a barrier for applicants, who must figure out the type of language the algorithms look for. That knowledge is “often shared within personal networks, so much so that if you're not properly connected, you might not even know how to play the game." As a result, marginalized groups who have limited professional networks, or who experience a language barrier, are ultimately discriminated against by potential employers.
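A hedged sketch of such a keyword screen, with invented buzzwords: a candidate who describes the same skills in different words is filtered out before any human sees the application.

```python
# Hedged sketch of keyword-based applicant screening; the required
# "buzzwords" are invented for illustration.
REQUIRED_KEYWORDS = {"customer service", "inventory management"}

def passes_screen(resume_text):
    text = resume_text.lower()
    return all(keyword in text for keyword in REQUIRED_KEYWORDS)

resume_a = "Customer service and inventory management experience."
resume_b = "Helped shoppers and kept the stockroom organized."  # same skills, wrong words

print(passes_screen(resume_a))  # True
print(passes_screen(resume_b))  # False
```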

Algorithmic Redlining
In addition to personal networks, algorithmic redlining is a second form of algorithmic discrimination. The term “redlining” was initially coined to describe a practice of the Home Owners’ Loan Corporation (HOLC) for deciding to whom it would grant loans. The HOLC drew red lines on a map to outline low-income neighbourhoods that it viewed as too financially risky to serve, neighbourhoods that were predominantly home to people of colour. According to scholars like James Allen, modern algorithms have turned redlining from a physical practice into a digital one. Like traditional redlining, algorithmic redlining excludes and segregates minority and low-income members of society from access to adequate housing.

Algorithmic redlining takes many different forms, one of which is in credit scoring and access to loans. Algorithms that determine how creditworthy an individual is are of particular significance because creditworthiness determines whether or not that person has access to the funds necessary to acquire housing. This is especially significant given that home equity is often cited as “an important factor in helping families enter the middle class.” In addition to biased data impacting the credit scores of marginalized groups, credit score algorithms have begun to include data points that extend beyond personal financial transactions. For example, the credit card company American Express was criticized by customers who noticed that even though they had paid their credit card bills on time, “their scores were tarnished for shopping at establishments where other patrons are considered less ‘creditworthy’." What makes instances like this problematic is that creditworthiness is used to assess whether or not someone qualifies for housing financing: the mortgage industry, for example, relies on credit scores to decide who it grants loans to. When credit scores are generated using racist, classist data, the scoring serves to disadvantage marginalized groups.
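American Express has not disclosed how merchant data entered its scoring, so the following is an illustrative sketch only, with invented weights: a score that mixes repayment history with the average "creditworthiness" of other patrons at the establishments where the applicant shops.

```python
# Illustrative sketch only: a credit score mixing repayment history
# with a merchant-based signal. Weights and data points are invented.
def credit_score(on_time_payment_rate, merchant_patron_scores):
    # merchant_patron_scores: average "creditworthiness" of other
    # patrons where the applicant shops (0 = risky, 1 = safe).
    payment_component = 600 + 250 * on_time_payment_rate
    merchant_component = 100 * (sum(merchant_patron_scores) / len(merchant_patron_scores))
    return round(payment_component + merchant_component)

# Identical perfect repayment records, different neighbourhood shops:
print(credit_score(1.0, [0.9, 0.8, 0.9]))  # 937
print(credit_score(1.0, [0.3, 0.2, 0.4]))  # 880: penalized by proxy
```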