User:ML11336/sandbox

Introduction
On this page, you will find my edits, which are mainly my assignments. I am a student at Middlesex University. The edits on Data collection: Data Collection Ethics, GDPR, Sensitive Data Leaks, Behavioural Identification and Profiling Online, Data Collection Methods on Social Networking Sites are my own, while others such as Social Capital and Pursuit of the Clicks, Other Side of Online Social Capital and Panopticon in the modern world are follow-ups to my peer's edits on Social Capital and Panopticon in the modern world respectively.

Data Collection Ethics
In recent years, we have witnessed growing concerns regarding data collection. Social media has enabled easy access to an enormous base of users and their private information. Unclear and unenforced laws and policies can leave room for exploitation. The lack of clarity about how user data is used is a further cause for concern. Some of the known cases involved:


 * Facebook and Cambridge Analytica
 * Polar
 * Exactis
 * Aadhaar
 * Marriott
 * Panera Bread
 * Google

Cambridge Analytica was able to harvest data from 87 million Facebook users without their knowledge. Polar, a fitness app, exposed the data of U.S. military and security personnel due to insufficient protection. Exactis, a marketing company, left the data of nearly 340 million people on a publicly accessible server. The Indian biometric system, Aadhaar, had collected the home addresses, fingerprints, and photos of nearly 1.1 billion Indian citizens. It also suffered a breach in which login credentials to the system were sold through WhatsApp. The hospitality company Marriott International was repeatedly attacked from 2014 to 2018, leaving the hackers with data on 327 million guests. Panera Bread, an American chain of stores, leaked its customers' records on its website, and the leak remained open for over eight months. The company then claimed that the problem had been fixed, but researchers found that it had not, which resulted in further exposure of data until the website was taken offline and the leak was fixed.

Since 2016, Facebook has embedded hidden code in images uploaded to the site. The IPTC metadata embedded in the photos allows extensive tracking of an image by Facebook as well as third parties. Because the metadata travels with the image, it can be used to associate users: it records the original upload and reveals who re-uploads the image. It can also be used for targeting, by identifying the content and then suggesting associated posts to the user. In comparison, Twitter strips even basic IPTC metadata from uploaded content.
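To illustrate how such embedded metadata can be read back by anyone who holds the file, the following is a minimal sketch in Python. It parses standard IPTC IIM datasets (a 0x1C marker, a record number, a dataset number, a two-byte length, then the value) from a raw byte block. It assumes the IPTC block has already been extracted from the image file, and the sample identifier "TRACK-0001" is invented for illustration; which field a platform actually writes to is not specified here.

```python
import struct

# Names for a few IPTC IIM application-record (record 2) datasets.
DATASET_NAMES = {
    (2, 40): "Special Instructions",
    (2, 80): "By-line",
    (2, 103): "Original Transmission Reference",
}


def parse_iptc_datasets(block: bytes) -> dict:
    """Parse standard IPTC IIM datasets from a raw byte block.

    Extended datasets (length field with the high bit set) are not handled.
    """
    fields = {}
    pos = 0
    while pos + 5 <= len(block) and block[pos] == 0x1C:
        record, dataset, length = struct.unpack(">BBH", block[pos + 1:pos + 5])
        value = block[pos + 5:pos + 5 + length]
        name = DATASET_NAMES.get((record, dataset), f"{record}:{dataset}")
        fields[name] = value.decode("utf-8", errors="replace")
        pos += 5 + length
    return fields


def make_dataset(record: int, dataset: int, text: str) -> bytes:
    """Build one dataset; used here only to create sample input."""
    data = text.encode("utf-8")
    return bytes([0x1C, record, dataset]) + struct.pack(">H", len(data)) + data


# Hypothetical metadata, for illustration only.
sample = make_dataset(2, 40, "TRACK-0001") + make_dataset(2, 80, "Jane Doe")
print(parse_iptc_datasets(sample))
```

If an identifier like this survives every download and re-upload, two uploads carrying the same value can be linked to the same original image.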

General Data Protection Regulation
The European Union introduced the General Data Protection Regulation (GDPR). It puts a legal obligation on those responsible for data collection and processing to implement technological and organisational safeguards for data protection. However, it also leaves member states some flexibility in how certain provisions are applied. Outside the EU, many jurisdictions have no equivalent regulation, so institutions and companies there may not be obliged to disclose breaches and leaks. After Brexit, the United Kingdom still applies the GDPR, but as a domestic version of the regulation. It allows for exceptions in certain circumstances, such as national security or immigration.

Sensitive Data Leaks
Various kinds of data are collected from users. Some of it is of little importance, while other data can cause serious or irreversible harm once it is collected or leaked.

GEDmatch faced an incident that could have exposed the DNA profiles of 1.3 million users to law enforcement. The platform allows users to search its database for potential relatives, identify their ethnic roots and so on. The most important aspect is that DNA cannot be changed, so any information that leaked may cause harm that is irreversible. GEDmatch claims to have followed its data security measures. The website was briefly taken down as a precaution.

Behavioural Identification and Profiling Online
The security of our devices and accounts is usually assured by one of three basic authentication factors:


 * 1) Knowledge Factors (passwords)
 * 2) Possession Factors (authentication token such as Google Authenticator)
 * 3) Inherence Factors (biometric data such as fingerprints)

Most websites use passwords as a means of authentication, and some add other factors to increase security. However, depending on a site's policies, the data gathered for authentication may also lead to extensive profiling.

Christophe Rosenberger determined that it is possible to identify a user's gender after a few keystrokes. The more a user types, the more information can be discovered, as typing patterns are unique to a person.
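As an illustration of what a "typing pattern" can look like in practice, the sketch below computes two timing features commonly used in keystroke dynamics: dwell time (how long a key is held) and flight time (the gap between releasing one key and pressing the next). The event format and the sample timings are assumptions made for this example; it does not reproduce Rosenberger's method.

```python
from statistics import mean


def keystroke_features(events):
    """Compute simple timing features from key events.

    events: non-empty list of (key, press_ms, release_ms) in typing order.
    """
    # Dwell time: how long each key was held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {
        "mean_dwell_ms": mean(dwell),
        "mean_flight_ms": mean(flight) if flight else 0.0,
    }


# Hypothetical timings in milliseconds, for illustration only.
sample_events = [("h", 0, 90), ("e", 150, 230), ("y", 310, 410)]
features = keystroke_features(sample_events)
print(features)
```

Features like these can be collected by any page that listens to key events, which is why typing alone can contribute to a profile.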

Google's reCAPTCHA v3 analyses activity patterns on a website to determine whether the user is a human or a bot. To work well, its code has to be included on every page of a website so that it can learn the usual patterns of a person. It also checks whether the browser sending a request holds a Google cookie, as a further signal that the user is human. This has raised privacy concerns, as Google confirmed that the API sends software and hardware data to the company for analysis. The company claims that the information is only used to fight spam and abuse. However, its privacy policy does not state how the data is used.
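From the website owner's side, the result of this analysis arrives as a score between 0.0 (likely a bot) and 1.0 (likely a human). The sketch below shows, under stated assumptions, how a site might send the token it received from the browser to Google's verification endpoint and act on the score. The helper names and the 0.5 threshold are choices made for this example, not requirements of the service.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret: str, token: str) -> dict:
    """POST the client token to the verification endpoint and return the JSON reply."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def looks_human(result: dict, expected_action: str, threshold: float = 0.5) -> bool:
    """Decide whether to treat the request as human.

    The threshold is an assumption; each site has to pick its own.
    """
    return (
        result.get("success") is True
        and result.get("action") == expected_action
        and result.get("score", 0.0) >= threshold
    )


# Hypothetical reply, for illustration only (no network call is made here).
sample_result = {"success": True, "score": 0.9, "action": "login"}
print(looks_human(sample_result, "login"))
```

The site only sees the score; the behavioural and device data behind it stay with Google, which is the source of the privacy concern described above.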

Data Collection Methods on Social Networking platforms
Research on social media sites is usually carried out for academic, business, or political purposes. The following tools are used in the process:


 * Scraping/Crawling tools (Web-Harvest, Storm, Tubekit, Octoparse, Datahut, ParseHub, Dexi.io, Mozenda, ScrapingHub, Import.io, various APIs)
 * Processing and storing data (Neo4j, NVivo, SPSS)
 * Visualisation (amCharts, Gephi)
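The three stages above can be sketched end to end with Python's standard library alone: extracting mentions from a scraped post, turning them into relationships between users, and writing a "Source,Target" edge list of the kind that graph tools such as Gephi can import. The HTML structure (links with class "mention"), the user names and the helper names are all hypothetical; real sites differ, and their terms of service may restrict scraping.

```python
import csv
import io
from html.parser import HTMLParser


class MentionParser(HTMLParser):
    """Collect the user name from every <a class="mention" href="..."> link."""

    def __init__(self):
        super().__init__()
        self.mentions = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "mention" and "href" in attrs:
            # Keep only the last path segment, e.g. "/users/bob" -> "bob".
            self.mentions.append(attrs["href"].rstrip("/").rsplit("/", 1)[-1])


def extract_edges(author, html):
    """Return (author, mentioned_user) pairs found in one post."""
    parser = MentionParser()
    parser.feed(html)
    return [(author, target) for target in parser.mentions]


def edges_to_csv(edges):
    """Serialise edges as a Source,Target edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical scraped post, for illustration only.
sample_post = (
    '<p>Thanks <a class="mention" href="/users/bob">@bob</a> and '
    '<a class="mention" href="/users/carol">@carol</a>!</p>'
)
edges = extract_edges("alice", sample_post)
print(edges_to_csv(edges))
```

Even this small example shows how a handful of public posts becomes a map of who interacts with whom.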

Social Capital and Pursuit of the Clicks
The pervasive presence of social media has led to increased attention to online social capital. The use of click farms and bot campaigns is proof of how much importance we attach to gaining likes and followers. The difference between offline popularity and its current form is that the latter can be measured and flaunted as a status. Even when we are aware that these measures can be artificially inflated, the importance of a user can still be determined by a number. There is also a non-quantifiable form of online social capital, such as sharing knowledge.

Other Side of Online Social Capital
For social networking sites, online social capital is an opportunity to benefit from free labour: users produce the data that the sites then collect.

Online social capital can also be used to bring together groups with malicious intent, such as right-wing extremists. The exclusion and marginalisation of users can in turn promote intolerance. This is reinforced by filter bubbles, which effectively negate the user's attempts to branch out. Where groups with malicious intent are involved, this can be severely harmful.

Francis Fukuyama argued that the increased number of users can lead to a larger group gaining access to social capital and thus avoiding exclusion to a great extent.

Fuchs argued that the consequence of online social interaction is that platforms gather the fruits of users' digital labour. This devalues the opportunities that social networking platforms present.

Panopticon in the modern world
China has implemented the Social Credit System. It is designed to rate and classify citizens based on their behaviour and actions. Actions such as paying bills on time or doing charity work increase the score, while crimes and tickets, playing video games for long periods of time or simply being friends with people who have a low score can decrease it. The score determines whether a citizen is able to get a loan, rent a property or a car, access more prestigious education, or travel by air. A lower rating also excludes citizens from leadership positions and puts them further back in the queue for medical care.

The score is calculated by an algorithm, and all the data is kept on government servers. To determine a citizen's rating, the system draws on their online activity, from social media to an analysis of their browsing history or online shopping, combined with 200 million cameras (roughly one camera for every seven citizens) and other biometric data.

The combination of measures pushes citizens to forge relationships that are beneficial to their rating, shapes their personal and professional lives, and can be a deciding factor in their choice of education. This example of intensive surveillance forces people into self-regulation not only in real life, where cameras and biometric data are collected and rated, but also in the digital sphere. It erases the little freedom users experience online. In China, algorithms have taken the place of prison guards in the online sphere.