User:Tommytheprius/sandbox

== Article Evaluations ==

=== Information Privacy Evaluation ===
The article "Information Privacy" seems generally well written at first glance. It is divided into subsections on different aspects of privacy, and there are plenty of links to other pages that expand on certain topics within information privacy. While a couple of paragraphs lack citations, there are a good number at the end and enough throughout to suggest that most of the article is of scholarly quality. Looking at the references, I can tell that at least a couple are from the past few years, so I'd assume this page is relatively up to date. While it does cover a lot of ground, if I had to add something it would be a subsection on information privacy as it pertains to minors.

The article doesn't seem to be biased and has a very neutral tone, and the coverage of each subsection appears to be relatively equal, except in the case of the Legality section where there is a link to another page dedicated to the subject.

The references I clicked on do work, but unfortunately some of them lead to less scholarly sources, such as newspapers like USA Today. Facts look to be backed up by evidence except where they are linked to sources other than peer-reviewed journals.

This article is part of WikiProject Computing, WikiProject Mass Surveillance, and WikiProject Internet, and was rated C-class by all three projects. On the talk page, users have discussed removing sections for bias, changing the title to better fit common search terms, and how some links are out of date. The editors are mainly discussing logistical aspects of the page, not really its content.

=== Phishing Evaluation ===
The article "Phishing" has almost 200 references, but many of them link to unreliable or non-scholarly sources such as blogs, magazines, and news outlets like CNN and The New York Times. In fact, on closer inspection, the majority of citations are not to peer-reviewed journals, though this may be due to the nature and specificity of the topic. The introduction even carries a [better source needed] tag at the end of the article's first paragraph. It also looks like somebody took time to add to and revise the article fairly recently, because many references were retrieved in September 2016. The article is organized well, both thematically and chronologically by phishing attack. The subsection on SMS phishing is only one sentence long and could use more information to bring it up to the length of the others. The last two subsections under the technical approaches section also look weak in terms of length.

It is hard to say whether this article is biased: while there may not be scholarly sources backing each claim about major phishing attacks, or many of the other stated facts, the tone throughout is neutral.

Of the dozens of links I clicked, only one led to a reliable academic journal. The rest came mainly from tech blogs and low-level news sites, and most of these sites also looked very outdated. Not all of the facts are backed up as they should be.

This article is rated B-class. It is part of WikiProject Computer Security/Computing and the Wikipedia Version 1.0 Editorial Team. Most of the discussion on the talk page concerns outdated information, especially some of the bolder statistics the article gives, such as the claim that people lose $5 billion per year to phishing.

== ASA Bibliography and Annotations ==

 * 1) Allen, Jeffrey and Ashley Hallene. 2018. “Privacy and Security Tips for Avoiding Financial Chaos.” American Journal of Family Law 101–7.
 * 2) Castellà-Roca, Jordi, Alexandre Viejo, and Jordi Herrera-Joancomartí. 2009. “Preserving User’s Privacy in Web Search Engines.” Computer Communications 32:1541–51.
 * 3) Link: https://ac.els-cdn.com/S014036640900125X/1-s2.0-S014036640900125X-main.pdf?_tid=c3a8a45f-8799-42f6-9ff5-534d94b9d029&acdnat=1538685871_a6a7da0a81906c4acc5dc8312ec07bf2
 * 4) Caufield, James. 2005. “Where Did Google Get Its Value?” Portal: Libraries and the Academy 5(4):555–72.
 * 5) Link: https://muse-jhu-edu.libproxy.berkeley.edu/article/188622/pdf
 * 6) Annotation: The main focus of this article, as the title suggests, is determining how Google became so prominent. It shows how the Google search engine has, unlike other search engines, adopted what the author refers to as “library values” and created a remarkable algorithm. These library values are essentially the code of ethics that physical libraries adhere to. Caufield asserts that early search engines failed due to their lack of adherence to traditional library values; these failures include letting private interests interfere with search results and having poor indexing systems. When Google invented its patented PageRank algorithm, it was revolutionary in how much it improved search results. But this is also when privacy started becoming a major concern, because it is when collecting user data became a tool for producing better results. The author notes Google's practice of placing cookies on a user's computer, which then track their search queries and the websites they visit. Caufield makes sure to show that in addition to improving the user experience, ad targeting is another reason Google may be collecting data. This is a slippery slope, though, because a search engine trying to enhance the user experience can still cross the line into what users would consider a breach of privacy by collecting personal data. It is also unclear to what extent Google and other search engines sell user data. This article is a very useful source because it takes a thoughtful approach to how, and to what ends, Google collects data on its users. It seems reliable and unbiased because Caufield does his best to discuss both sides of the privacy debate at equal length and doesn't necessarily assume fishy motives. The article was very easy to read, although at times dull in the pre-Google search engine section, and its target audience is probably people who want to learn more about the search engines they use daily. 
Although it may not have introduced me to new subjects, I do think this article did a good job of presenting Google’s data collection in a fair way by showing the reader that they may simply be trying to improve user experiences. I think this article will help me remember not to take a biased tone against search engines when eventually writing my own article.
 * 7) Chiru, Claudiu. 2016. “Search Engines: Ethical Implications.” Economics, Management, and Financial Markets 11(1):162–67.
 * 8) Link: http://content.ebscohost.com/ContentServer.asp?T=P&P=AN&K=114462805&S=R&D=bth&EbscoContent=dGJyMNLe80SeprI4zdnyOLCmr1Cep7JSs6y4SLCWxWXS&ContentCustomer=dGJyMPGnsEq0qbVIuePfgeyx43zx
 * 9) Church, Peter and Georgina Kon. 2007. “Google at the Heart of a Data Protection Storm.” Computer Law & Security Report 23(5):461–65.
 * 10) Link: https://ac.els-cdn.com/S0267364907000726/1-s2.0-S0267364907000726-main.pdf?_tid=df88ab86-b149-44fa-bffe-da4ce4f3e10a&acdnat=1538682341_4b8078ef5750370d997f33b01041f75f
 * 11) de Mars, Sylvia and Patrick O'Callaghan. 2016. "Privacy and Search Engines: Forgetting or Contextualizing?" Journal of Law and Society 43(2):257–84.
 * 12) Link: http://eds.a.ebscohost.com/eds/pdfviewer/pdfviewer?vid=6&sid=592a1411-f464-4c86-96fc-6db29efaee0c%40sdc-v-sessmgr01
 * 13) Dolin, Ron A. 2010. “Search Query Privacy: The Problem of Anonymization.” Hastings Science and Technology Law Journal 2(2):137–82.
 * 14) Link: https://heinonline.org/HOL/P?h=hein.journals/hascietlj2&i=151
 * 15) Annotation: The purpose of this article by Ron Dolin is to look at the pros and cons of search engines anonymizing (i.e., deleting) user search queries, and to caution against society hastily deciding to take that route. Mainly, Dolin outlines the upsides of sticking with the status quo and the benefits of letting search engines log IP addresses, browser information, and data collected from cookies. Regarding a user's search query history, the most important arguments Dolin gives for the usefulness of data collection are that it helps search engines return the most relevant results and advertisements, correct users' spelling mistakes, and decide what level of filtering to apply to profanity and inappropriate content. On the search engine's side, Dolin claims that collecting cookies and IP addresses helps companies improve their algorithms and protect users from spam and phishing attacks. Dolin concludes that the arguments for anonymization for the sake of protecting user privacy fail to fully consider the tradeoffs involved. I found this article very unbiased and reliable because Dolin takes care to include both sides of the anonymization debate. I thought it was quite useful because it shed much more light on the argument for search engines collecting user information, which is unusual compared to most of the articles I have read so far on the subject. I would recommend it to people who may have made up their minds on the pro-privacy-protection side, because it provides solid arguments for the collection side. The target audience looks to be everyday people interested in learning more about the subject, because the reading level is fairly easy; there is no technical jargon that was hard to understand, perhaps because there was no experiment. 
It has changed how I think about search engine privacy in that, as I said before, it shed a lot more light on why it can be beneficial for both parties to collect user data.
 * 16) Evans, David S. 2009. “The Online Advertising Industry: Economics, Evolution, and Privacy.” The Journal of Economic Perspectives 23(3):37–60.
 * 17) Link: https://www.jstor.org/stable/27740539
 * 18) Annotation: In this article, Evans focuses mainly on the new phenomenon of targeted advertising. He discusses how advertising companies have changed their models in adapting to online advertising, how search engines are paid for different types of ads, and what there is to be gained from targeting methods. Evans also notes what sorts of user data are being collected by both search engines and advertisers, and to what end. The section most relevant to search engine privacy is called “The Privacy Dilemma,” which details how search engines collect and store user search queries and IP addresses for months and how advertisers use tracking cookies and web beacons to learn more about user behavior. He concludes with a discussion of the role of public policy in targeted online advertising, saying that the most important issue is property rights over private data. I found this article very objective because Evans always considered both sides of the argument; when discussing the use of behavioral data for targeted advertising, he acknowledged the benefits it could bring. He also noted how consumers, to an extent, control how private their data is via the privacy settings available on most platforms. I think the article is aimed at public policy makers, but I would also recommend it to anyone interested in how targeted advertising works and the privacy costs involved, since it was relatively easy to read. This article has altered my understanding of search engine privacy because it articulated the exact ways in which advertisers and search engines work together to compromise user privacy. It also showed me, more specifically, the financial incentive for search engines to do so: advertisers are willing to pay a premium to place ads they think will be better received by certain users, which can only be achieved by the search engine sharing users' behavioral data with the advertiser.
 * 19) Foley, Jayni. 2007. “Are Google Searches Private? An Originalist Interpretation of the Fourth Amendment in Online Communication Cases.” Berkeley Technology Law Journal 22(1):447–75.
 * 20) Link: https://www.jstor.org/stable/24118241
 * 21) Annotation: The objective of Foley's paper is to shed light on what sort of data search engines like Google collect about their users and how, in her view, it ought to be protected under the Fourth Amendment when law enforcement agencies subpoena it. She begins by outlining facts on how many people use the internet, how frequently, and exactly what data Google stores about each search query. In her literature review, Foley discusses court decisions regarding whether Google had to comply with government subpoenas for search query data, and the result was somewhat surprising. The court ruled that Google did not have to turn over all the data the government was asking for, not because of undue burden or irrelevance, but because doing so would make Google less trustworthy in its users' eyes and could therefore have an unduly large negative effect on its business. The court decisions she discusses also generally seem to respect the privacy of search engine users. But the courts did not fully side with Google, which was forced to provide the government with tens of thousands of URLs from searches. Her main conclusion about the privacy implications for search engine users, so far as the Fourth Amendment applies, is that it is still a grey area whether one has a reasonable expectation of privacy: more specific court cases differ in their findings, and there is no consensus on how the Constitution applies. Foley, however, is adamant in her view that the Fourth Amendment should indeed apply to user search queries, because she thinks the Framers would have deemed search engine data to fall under the umbrella of personal information. She suggests that although no court case has protected a right to privacy in search engine data, Congress could, and perhaps should, create a law to protect it anyway, since companies like Google may not put up much resistance to subpoenas. 
This source was extremely useful, easy to read, and cited plenty of sources to show its credibility. It seemed generally objective, although I did not see much discussion in defense of the government agencies that were requesting data from search engines. I'd definitely recommend this article to anyone interested in learning what happens behind the scenes with their search engine data, although the review of court cases was somewhat lengthy and dull. For me, this article provided a solid amount of knowledge about government interactions with search engines and legal precedent that will definitely help shape my understanding of these institutions. I didn't realize there were so many laws to consider when thinking about search engine privacy, so this source will be very valuable when writing my Wikipedia page.
 * 22) Ghose, Anindya, Panagiotis G. Ipeirotis, and Beibei Li. 2014. “Examining the Impact of Ranking on Consumer Behavior and Search Engine Revenue.” Management Science 60(7):1632–54.
 * 23) Link: https://pubsonline.informs.org/doi/pdf/10.1287/mnsc.2013.1828
 * 24) Annotation: The authors of this article set out to research the effects that a search engine's ranking can have on consumer behavior. They identify direct effects of ranking, the relationship between product ratings and rankings, and a personalization effect of ranking as the three main types of search engine ranking effects they intend to study. In order to conduct experiments and manipulate scenarios, the researchers designed and built their own hotel search engine. It is also important to note that in their experiments the researchers were measuring projected search engine revenue streams. They found that a utility-based ranking method would be highly effective at increasing search engine revenue. They also concluded that there was a strong relationship between where a product was ranked in search results and its popularity: when a high-end hotel was ranked lower, its popularity dropped noticeably. To maximize search engine revenue, the researchers suggest incorporating social media signals into the ranking system. Finally, they found that under an active personalized ranking mechanism, which does not really exist in real-world hotel search, people made fewer purchases, perhaps because of information overload. The researchers also confirm that search engine rankings have a substantial effect on how much users click certain links and on their overall purchase behavior. The paper looked extremely reliable and objective because the researchers took a lot of time to explain how they designed and tested their model and used randomized experiments. The authors also make sure to include in the conclusion the limitations of their experiment, mainly that the internet is naturally heterogeneous, which makes it hard to control for all variables. 
The target audience of this article appears to be both search engine companies, since it offers advice on how to increase profits, and other academics, since the researchers end with many suggestions for further research. The reading level was somewhat difficult given all the technical jargon and equations used to explain their models. I would only recommend it to people with a true interest in the subject who are fluent enough in technical language to understand the middle section. While I found some aspects of this article interesting, I didn't find it very useful for search engine privacy, because it focuses on search engine advertising, with little to no mention of privacy implications. The only real relation to privacy I can see is search engines measuring user click rates and giving that information to advertisers.
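To make the utility-based ranking idea concrete, here is a minimal Python sketch over invented hotel data. The authors' actual model estimates utility econometrically from consumer behavior, so everything below (the hotels, the `utility` and `rank_by_utility` names, the `rating_weight` of 50) is an illustrative assumption, not their method:

```python
# Toy sketch of utility-based ranking, under invented assumptions:
# rather than sorting results by price or rating alone, sort them by an
# estimated consumer utility (here, value from the rating minus the price).

def utility(hotel, rating_weight=50.0):
    # Hypothetical utility: value grows with the average rating,
    # and the price paid is subtracted out.
    return rating_weight * hotel["rating"] - hotel["price"]

def rank_by_utility(hotels):
    # Highest estimated utility first.
    return sorted(hotels, key=utility, reverse=True)

hotels = [
    {"name": "Budget Inn", "price": 80.0, "rating": 3.0},
    {"name": "Luxury Palace", "price": 400.0, "rating": 4.8},
    {"name": "Midtown Suites", "price": 150.0, "rating": 4.5},
]

for h in rank_by_utility(hotels):
    print(h["name"], round(utility(h), 1))
```

Under these made-up numbers the mid-priced, well-rated hotel outranks both the cheap and the luxury option, which mirrors the paper's point that a utility ordering differs from a pure price or rating ordering.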
 * 25) Goldfarb, Avi and Catherine Tucker. 2011. “Online Display Advertising: Targeting and Obtrusiveness.” Marketing Science 30(3):389–404.
 * 26) Link:
 * 27) Annotation: In this article, Goldfarb and Tucker conduct a study examining different types of online advertising and their implications for privacy concerns. The two major types are obtrusive ads and targeted plain-text ads. In the face of the dwindling popularity and effectiveness of banner ads, the authors note that obtrusive ads, purposely designed to distract users and make ignoring them harder, have gained popularity among advertisers. The second major type, plain-text ads targeted precisely to the websites visited, is highly profitable and perhaps spearheaded by Google's AdSense. The downside of targeted plain-text ads, however, is that consumers may see them as more intrusive, making them less effective. The researchers created a model to test which form of advertising was most effective: obtrusive advertising, targeted plain-text advertising, or a combination of the two. Their main conclusion was that either approach worked on its own, but when put together they undercut each other and became ineffective overall. One possible reason, the authors say, is that users' privacy concerns may be heightened when a targeted ad is more obtrusive. This article seemed unbiased and reliable because, in addition to having lengthy references, it reads like a scientific study. The authors even say they were surprised by the finding that combining the techniques proved ineffective, which suggests they are not trying to push a specific point. They also acknowledge the limitations of their work and use that platform to suggest further research, which tells me the intended audience is other academics. The reading level was fairly easy, although some of the charts and tables may be a little difficult for a general reader to interpret. 
I think this source will be useful for me because it articulates how privacy concerns relate to online advertising, although the article does not discuss search engines much. It has definitely broadened my knowledge of online advertising, but I think the majority of the article will not be directly applicable to search engine privacy.
 * 28) Goldfarb, Avi and Catherine Tucker. 2011. “Search Engine Advertising: Channel Substitution When Pricing Ads to Context.” Management Science 57(3):458–70.
 * 29) Link: https://pubsonline.informs.org/doi/pdf/10.1287/mnsc.1100.1287
 * 30) Annotation: Goldfarb and Tucker's aim in this paper is to determine whether there is a substitution relationship between online and offline advertising, meaning that an online ad can serve the same purpose as an offline one, and to see how advertisers use targeting in relation to that substitutability. They achieve this by looking at how much it costs advertisers to buy ads attached to certain keywords in different geographical locations. Goldfarb and Tucker conclude that there is indeed a substitution relationship between search engine and offline advertising, especially when the targeted group is small. When the targeted group is very large, the two are less substitutable because, the authors claim, an ordinary mass-media campaign is effective enough. In their conclusions the authors also find a silver lining: intrusions on user privacy by search engines are curbed by restrictions on actively soliciting customers during grieving periods. While I did not find this article extremely useful to my topic, I did find it reliable and objective, as the authors controlled for various factors and conducted the study in a scientific manner. It also has plenty of references, and the authors acknowledge how narrow their study is. The article is most applicable to antitrust authorities, other academics, and advertising agencies. I probably would not recommend it to a fellow student because it was rather dry, especially in the discussion of the intricacies of pricing for different keywords, though it was not very hard to read. Although it did broaden my understanding of how specific pricing schemes work and how new technologies interact with existing markets, I don't think there are any profound implications for my search engine privacy article beyond the general theme that search engines participate in targeted advertising.
 * 31) Grimmelmann, James. 2007. “The Structure of Search Engine Law.” Iowa Law Review 93(1):1–63.
 * 32) Link: http://eds.a.ebscohost.com.libproxy.berkeley.edu/eds/pdfviewer/pdfviewer?vid=3&sid=50213593-d844-4eed-968a-d9c695eaa3a2%40sessionmgr4007
 * 33) Annotation: Grimmelmann's intention in this article is to shed light on how the relatively new world of search engines is governed by current law and what legal issues have arisen around search. He begins by describing how a search engine works, introducing indexing, queries, results, and content as information flows. When discussing the privacy of search queries, he notes that users who repeatedly make queries make it possible for search engines to construct profiles of them. He also notes an apparent conflict between users, search engines, and third parties with regard to privacy: users want privacy, but not at the expense of good results; search engines don't want to be seen as compromising privacy, but need the data to improve results; and third parties may want to compromise user privacy for any number of ostensibly legitimate reasons. While he examines many different laws, including the First Amendment and free speech, and their applicability to search engines, Grimmelmann's main conclusion is that the field is important, complex, and developing. I found this article reliable and unbiased because the author states at the outset that he is not arguing for an agenda but simply analyzing the current framework so that others can be informed from a public policy standpoint. The target audience seems to be other researchers and legal scholars, because Grimmelmann repeatedly notes the need for more research and discussion of search engine law. I do not believe I'd recommend it to others because, while informative, it was difficult to identify key points, and it may be out of date since it was written over a decade ago and the legal framework may well have changed. 
This article has expanded my knowledge of search engine privacy in that I have learned much more about the legal framework surrounding it, but I do not think it will be very useful for my Wikipedia article because, in addition to any laws or cases discussed being at least 11 years old, it doesn't come to many concrete conclusions.
 * 34) Hands, Africa. 2012. “Duckduckgo http://www.duckduckgo.com or http://www.ddg.gg.” Technical Services Quarterly 29(4):345–47.
 * 35) Link: https://doi.org/10.1080/07317131.2012.705751
 * 36) Lenard, Thomas M. and Paul H. Rubin. 2010. “In Defense of Data: Information and the Costs of Privacy.” Policy & Internet 2(1):1–56.
 * 37) Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1407731
 * 38) Annotation: Lenard and Rubin's article, like Dolin's, discusses the negative ramifications of increasing user privacy and thereby restricting commercial access to user data. The authors say that targeted advertising improves the user experience and that if user data were restricted, consumers would have a harder time finding the information they want while being inundated with irrelevant ads. Another point they make is that advertising supports free use of search engines. From an economic standpoint, they also argue that if consumers care about privacy, firms are incentivized to respond to privacy concerns to stay competitive, and that better information increases efficiency. Another argument for leaving user data relatively unrestricted is what they call the “public good,” whereby the spread of user information can aid various innovations. The main conclusions of this article are that the spread of search engine user data is what allows for targeted advertising, which funds internet content and search engines, and that if consumers truly worried about and valued privacy, the market would supply more privacy protection. In true economist fashion, Lenard and Rubin end by saying there should only be regulation in the case of market failure. The article proved both unbiased, in tone and in shedding light on both sides of the data-sharing debate, and reliable, with its extensive references. The target audience is most likely public officials who have a say in policy, because the authors caution them against regulating for privacy's sake before fully understanding the benefits of using consumer data. I'd recommend this article to others because I found it easy to read, informative, and useful to my topic of search engine privacy. It has broadened my knowledge of the topic by discussing search engine privacy in an economic sense rather than a merely moral one. 
I hadn’t thought before about how search engines would be in business if not for the revenue they get from advertisers who largely use targeted advertising.
 * 39) Ma, Ruxia, Xiaofeng Meng, and Zhongyuan Wang. 2012. “Preserving Privacy on the Searchable Internet.” International Journal of Web Information Systems 8(3):322–44.
 * 40) Link: http://delivery.acm.org/10.1145/2100000/2095577/p238-ma.pdf?ip=169.236.57.65&id=2095577&acc=ACTIVE%20SERVICE&key=CA367851C7E3CE77%2E21EA8071FB88D747%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1538514488_6c3105b16b9a53a48780b5564d748e23
 * 41) Nissenbaum, Helen. 2011. “A Contextual Approach to Privacy Online.” Daedalus, the Journal of the American Academy of Arts & Sciences 140(4):32–48.
 * 42) Link: https://www.jstor.org/stable/pdf/41060684.pdf?refreqid=excelsior%3A4ccfaff93780a3838889981343db8870
 * 43) Annotation: The focus of Nissenbaum's article is internet privacy in general, but she does note a couple of search-engine-related situations. She discusses the notice-and-consent model, which is what most current privacy policies follow, and explores what she proposes as a better way to preserve privacy online. Notice and consent, which consists of showing the user a privacy policy and having them click through, supposedly lets the user freely decide whether to use the website. This decision, however, may not actually be made so freely, because the costs of opting out can be very high. Another big issue with simply putting the privacy policy in front of users and having them accept quickly is that such policies are often very hard to understand, even in the unlikely case that a user decides to read them. While discussing how commercialized the internet is, Nissenbaum makes sure to note that the founders of Google didn't originally think this way; in fact, they thought that commercialization could corrupt Google's mission, which was to provide a way for people to easily access information. Unlike many other articles I've read, Nissenbaum gives Google the benefit of the doubt when she claims that it is committed to protecting its users' privacy, though to her credit she does question how capable it is of doing so. One major concern she points out is the flow of personal information between subsidiaries of large companies like Google. Overall, Nissenbaum believes that online privacy protection is a subset of privacy protection in general, governed by the privacy norms that she thinks prop up social institutions. Her approach to online privacy is not to think of it as a new field, but to apply long-standing rules and norms to it, and not to let the protection of privacy fall into the hands of the companies that breach it. This paper has been one of the less useful ones I've read to date. 
While it does deal vaguely with search engines, they are not its main focus, and nothing Nissenbaum discusses is really new information; instead, she proposes thinking about online privacy in a different way. Her target audience is most likely regulators and people involved in public policy decisions. I trust that the information she provides is reliable because of the volume of sources cited, and objective because she does attempt to describe both sides of the privacy debate. Though easy to read, I probably would not recommend this article to others because it was dry and communicated little information not found elsewhere. It did not shape my view of search engine privacy, but it was good to hear her opinion on the field, and I may include some of her thoughts in my article.
 * 44) Pàmies-Estrems, David, Jordi Castellà-Roca, and Alexandre Viejo. 2016. “Working at the Web Search Engine Side to Generate Privacy-Preserving User Profiles.” Expert Systems with Applications 64:523–35.
 * 45) Link: https://www.sciencedirect.com/science/article/pii/S0957417416304328?via%3Dihub
 * 46) Peddinti, Sai Teja and Nitesh Saxena. 2014. “Web Search Query Privacy: Evaluating Query Obfuscation and Anonymizing Networks.” Journal of Computer Security 22(1):155–99.
 * 47) Link: http://content.ebscohost.com/ContentServer.asp?T=P&P=AN&K=94007304&S=R&D=a9h&EbscoContent=dGJyMNLe80SeprI4zdnyOLCmr1Cep7dSr6%2B4S7GWxWXS&ContentCustomer=dGJyMPGnsEq0qbVIuePfgeyx44Dt6fIA
 * 48) Annotation: The main focus of this article's authors is to analyze the effectiveness of query obfuscation and anonymizing network techniques in protecting search engine user privacy. They accomplish this by looking at TrackMeNot, a query obfuscation plugin for Firefox, and Tor, a network that anonymizes users. TrackMeNot functions by obscuring true user searches with machine-generated ones mixed in. Tor works by having a pool of anonymizing network users and trusting that a search engine will not be able to link particular queries to particular users within the pool. The authors find that neither of these techniques holds up when tested against readily available machine learning software. Specifically, the naive, unsophisticated adversarial search engine the researchers modeled to attack user privacy was able to match about 50% of user queries with high certainty when the user used the TrackMeNot plugin, and about 20% of queries made through an anonymizing network with a pool of 1,000 users. The researchers were able to use real data in their study, from 60 users involved in the 2006 AOL search data public release. Overall, this article felt very unbiased and reliable because it was scientific in nature: it had an experimental setup and plenty of references to demonstrate its rigor. Its target audience appears to be academics working in the field, since the authors suggest avenues for further research. The reading level was somewhat difficult, especially in sections that got into the nitty-gritty of the experimental methodology. I would recommend this article to someone interested in protecting their own search engine privacy, though one need look no further than the abstract to see that the researchers find neither method effective. All in all, I did find this article quite interesting, but it was very similar to another by Albin Petit, “SimAttack: Private Web Search under Fire,” so I did not learn much new information. 
Therefore, I do not see this being hugely useful to me in the writing of my search engine privacy article, but it  definitely would have been extremely useful had I not already read the other article. I suppose it’s also generally good to know that there is consensus in the field about these two methods.
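The query obfuscation idea that TrackMeNot implements can be sketched in a few lines. This is a toy illustration only; the decoy list and function names here are hypothetical stand-ins, not TrackMeNot's actual implementation, which draws its dummy queries from richer sources such as RSS feeds:

```python
import random

# Hypothetical decoy vocabulary; the real plugin builds its dummy
# queries from evolving sources (RSS feeds, seed lists) to look genuine.
DECOY_TERMS = ["weather", "recipes", "sports scores", "movie times", "news"]

def obfuscated_stream(real_queries, decoys_per_query=3):
    """Interleave each real query with machine-generated decoys so the
    search engine's log mixes true and fake interests."""
    stream = []
    for query in real_queries:
        batch = [query] + random.sample(DECOY_TERMS, decoys_per_query)
        random.shuffle(batch)  # hide which entry in the batch is real
        stream.extend(batch)
    return stream

log = obfuscated_stream(["tor browser download", "privacy laws eu"])
print(len(log))  # 8: each of the 2 real queries travels with 3 decoys
```

The weakness the researchers found is visible even in this sketch: machine-generated decoys come from a fixed, unrealistic distribution, so a classifier can learn to separate them from genuine queries.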
 * 10) Pekala, Shayna. 2017. “Privacy and User Experience in 21st Century Library Discovery.” Information Technology and Libraries 36(2):48–58.
 * 11) Link: http://content.ebscohost.com/ContentServer.asp?T=P&P=AN&K=124036046&S=R&D=a9h&EbscoContent=dGJyMNXb4kSeprE4zdnyOLCmr1Cep7VSsKy4SrSWxWXS&ContentCustomer=dGJyMPGnsEq0qbVIuePfgeyx43zx
 * 12) Petit, Albin et al. 2016. “SimAttack: Private Web Search under Fire.” Journal of Internet Services and Applications 7(2):1–17.
 * 13) Link: https://jisajournal.springeropen.com/track/pdf/10.1186/s13174-016-0044-x
 * 14) Annotation: The authors introduce their topic by reminding us that search engines, which are by far the most popular route for people to get answers from the internet, collect and use information on users via their queries. This paper evaluates the effectiveness of the methods of unlinkability and indistinguishability, which are used to obscure user search engine data, against a privacy intrusion software called SimAttack, which compares user profiles and queries and tries to match them. Unlinkability “consists in hiding the user’s identity from the search engine” while indistinguishability works by “altering the user’s queries or hiding the user’s interests” (1). A couple examples of indistinguishability solutions are the Firefox TrackMeNot plugin and the GooPIR program for Google, both of which create fake search queries under the guise of a user to trick the search engine, but it’s hard to make them look real. An example of an unlinkability solution is using a VPN or anonymous network when searching the internet. To test how competent these privacy solutions were at protecting user data, the researchers tested them using an AOL search dataset. The researchers found that the unlinkability solutions were somewhat easily penetrated by the SimAttack software, exposing the search queries that were supposedly protected. The SimAttack software was also quite successful in decoding the queries protected by indistinguishability solutions, at a much higher rate than a simple machine learning program, although a higher number of fake queries did offer better protection. The authors conclude that while still not by any means perfect at preserving privacy, one’s best bet may be to combine an unlinkability and an indistinguishability technique.
This was a very useful source because it discussed current solutions to search engine privacy issues, how they worked, and then how they were penetrable by SimAttack, showing that even going to some lengths to protect privacy may still be futile. This source appears to be very objective, going over the pros and pitfalls of each solution and approaching the problem scientifically. Its target audience is privacy-oriented people who use search engines often and researchers who may want to expand on the topic. The beginning of the article was very readable, but it got a bit more difficult when graphs and calculations started appearing. This article was extremely helpful and informative because I had never heard of any of these privacy preserving techniques or “unlinkability” and “indistinguishability” solutions. It provided insight into just how easy it is for someone who wants to access anonymized data to do so.
 * 15) Reyman, Jessica. 2013. “User Data on the Social Web: Authorship, Agency, and Appropriation.” College English 75(5):513–33.
 * 16) Link: https://www.jstor.org/stable/pdf/24238250.pdf
 * 17) Annotation: In this article, Reyman discusses the “social Web”, which is basically what the internet is now with the proliferation of social media sites and interactions between users online. She then delves into how user data is protected or mined by various tech companies and the ethical implications of what they do with user data. She often uses the specific example of a university collecting data about its students, since it’s easy to collect it when they use their specific accounts. There is then a lengthy discussion of social media policies on data use and how user data can be appropriated. Reyman’s conclusions center on the complexities of user contributions to the social Web. Basically anything a user does online can be considered a contribution once it is recorded, but the rights over what can be done with that data are a grey area. Reyman notes that data mining, customization, and targeted advertising are so common among most communication and social networking platforms that they are essentially conditions of use. Her fear about this ubiquitous data mining is that eventually only a few large companies will be able to use the extensive data they collect, somewhat secretively, for their own purposes. Instead, Reyman wants people to be able to know what is going on with their personal data and to be able to make educated decisions on its management and regulation. While this article was interesting, much of it was spent discussing social media, which wasn’t quite applicable to search engine privacy. Reyman is objective throughout the article, summarizing both sides of debates and not injecting personal opinions. I probably would not recommend this article to others because although it’s from a scholarly journal, it’s not structured like a normal journal article with abstract, literature review, results, and conclusion sections, so it is harder to pick out important information.
While the language is easy to read, the structure makes it feel like a very dense article. Her target audience was most likely other academics; as she mentions, professors need to spread the word about unfair data practices with students and colleagues. Since this article was introducing a new way to think about the way we behave on the internet and how our data from that behavior is collected and stored, I think I did learn to see search engine privacy in a slightly different light. However, Reyman didn’t really suggest major solutions other than advising people to envision a new system of transparency. What was useful were the parts of her discussion that focused on giving up data for access to online platforms.
 * 18) Ridgway, Renee. 2017. “Against a Personalisation of the Self.” Ephemera: Theory & Politics in Organization 17(2):377–97.
 * 19) Link: http://web.b.ebscohost.com/ehost/pdfviewer/pdfviewer?vid=4&sid=5dbce403-5587-4aeb-9215-2749e8a5a5c6%40pdc-v-sessmgr06
 * 20) Annotation: In this article, Ridgway attempts to discern if there is a way to be a truly anonymous user on the internet in a world where search engines like Google are collecting tons of user data to personalize results. She investigates this by doing an experiment where she used her Mac with Google personalization and a PC with Tor, software known for providing anonymity, to search the same terms and compared the results. Ridgway first offers a discussion of the ways in which Google violates the privacy of its users, claiming that it captures IP addresses, location information, and data for advertising when one uses the autocomplete feature. She says one doesn’t even have to be logged into their Google account for the search engine to log search queries and tailor a user’s results and ads. Ridgway then goes into a discussion of how Tor anonymizes users. In terms of results, Ridgway’s experiment was essentially a success, concluding that the results she got when using Tor were both unique from and ranked differently than the ones from Google. Using Tor, she says, is a solid alternative to Google in that it protects users from the privacy-compromising Google personalization and collection of data. I found this article to be useful in that it provided a valid way to use search engines more privately, but I also didn’t find it to be incredibly reliable. I’m not saying her results are invalid, but Ridgway herself acknowledges many shortcomings of her study, especially several that were out of her control. It seemed unbiased because the facts she offered line up with the other articles I’ve read. The reading level was easy, and I’d recommend it to people who are interested in DIY solutions to search engine privacy. This group would also probably be her target audience. All in all, this source did not greatly contribute to my understanding of search engine privacy.
It did, however, provide a few useful and easy-to-understand specifics on how Google personalizes results for its users. I could see using some of the statistics she provides on the number of queries and types of data collected in my future article.
 * 21) Sieg, Ahu, Bamshad Mobasher, and Robin Burke. 2007. “Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search.” IEEE Intelligent Informatics Bulletin 8:7–18.
 * 22) Link: https://www.semanticscholar.org/paper/Learning-Ontology-Based-User-Profiles%3A-A-Semantic-Sieg-Mobasher/3dd95a886c04947008f0304ffae005e56bf95d62
 * 23) Strahilevitz, Lior Jacob and Matthew B. Kugler. 2016. “Is Privacy Policy Language Irrelevant to Consumers?” The Journal of Legal Studies 45(S2).
 * 24) Link: https://www.journals.uchicago.edu/doi/pdfplus/10.1086/68993
 * 25) Squitieri, Chad. 2015. “Confronting Big Data: Applying the Confrontation Clause to Government Data Collection.” Virginia Law Review 101(7):2011–49.
 * 26) Link: https://www-jstor-org.libproxy.berkeley.edu/stable/pdf/24643632.pdf?refreqid=excelsior%3Af68cbaa5bcf9f97bb41b730f323ff0ee
 * 27) Annotation: The purpose of Squitieri’s article is to inspect “big data” by looking at what it is, how the Confrontation Clause of the Sixth Amendment is applicable, and how Google responds to government subpoenas. When someone is manipulating big data, they have bought access to large amounts of cheap data and aggregated it to get a clearer picture of people. Squitieri’s main argument in his article is that whenever search engines like Google hand over user data to government agencies, it falls under the category of a testimonial statement according to the Confrontation Clause when it’s part of a prosecutor’s case in a criminal trial. Essentially, Squitieri is claiming in this article that when Google provides the government with user data that is used to prosecute someone, it is providing the government with a testimonial statement. Further, he states that when a Google “producer” decides what data is relevant and provides the government with what it has asked for, that producer is making a testimonial statement. He goes on to say that the Google analyst who certifies the proper collection and transmission of data and the custodian who keeps records, in addition to the producer, must all be available to be confronted by the accused party under the Confrontation Clause. I found this article to be very reliable, with many references to verifiable court cases. The only bias I could detect was in the main argument, since the author was clearly attempting to argue for certain protections under the Confrontation Clause, although I believe he kept it pretty neutral in tone. I think the target audience was people in the legal field, and as such I would only recommend it to people interested in understanding very specific legal implications in relation to big data privacy.
The reading level was generally easy, and I think that this article will be useful in my search engine privacy article because it provided me with a new set of legal interpretations relating to the topic. I had read another article that discussed several laws and specifically the Fourth Amendment, but this application of the Sixth Amendment and the author’s conclusion that the Confrontation Clause checks uses of government collected big data was new to me and quite informative.
 * 28) Tene, Omer. 2008. “What Google Knows: Privacy and Internet Search Engines.” Utah Law Review 2008(4):1433–92.
 * 29) Link: http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=43799183&site=eds-live&authtype=ip,guest&custid=s1226370&groupid=main&profile=eds
 * 30) Annotation: In this article, Tene provides an in-depth account of what data is collected by search engines, how it is used, whom it may be used by or given to, how user privacy can be compromised, and how user privacy can be protected by privacy enhancing technologies. In terms of user data in a legal sense, Tene notes that it is collected by search engines, and their logs may be subpoenaed by government agencies or by third-party litigators long after it is collected. Tene uses a definition of privacy set out by privacy scholar Daniel Solove and says that search engine practices relating to how they store and use logs of search query data violate it. The author also says that in his opinion, user privacy and users’ “reasonable expectation of privacy” are violated by Google in that personally identifiable information is used for a secondary purpose without the informed consent of the user. The first purpose in Tene’s mind is that Google uses the query made by a user to return relevant information. He is sure to note that this is his intuition, not fact, and that the laws are very different between the European Union and United States. He also mentions that governments are on the side of requiring search engines to retain user data, not erase it soon after it is collected, which makes it ready to be subpoenaed whenever the government may want it. Tene ends with a discussion about the law of confidentiality and his opinion that a user’s search queries should be protected under it. Overall, this article seemed extremely reliable since it constantly cited well known legal cases, data breaches, and search engine practices. I cannot say, however, that it was totally unbiased, because Tene did inject his own opinions about privacy concerns into the article, although he did acknowledge when he did so. I do think he could have spent more time on the reasons why data collection can be a bonus, but that was not the stated purpose of the article.
The reading level was easy, and I’d recommend this article to anyone interested in getting a comprehensive lesson on search engine privacy. This article was extremely useful, and in a way it’s fitting that it’s the last one I read because it seemed to touch on almost all of the other research I’ve read on the subject. While it may not have broadened my knowledge on the subject, Tene’s proposal for using the law of confidentiality was a new take on search engine privacy concerns and solutions.
 * 31) Trautman, Lawrence J. and Peter C. Ormerod. 2017. “Corporate Directors’ and Officers’ Cybersecurity Standard of Care: The Yahoo Data Breach.” SSRN Electronic Journal 66(5).
 * 32) http://p8888-ucelinks.cdlib.org.libproxy.berkeley.edu/sfx_local?genre=article&atitle=CORPORATE%20DIRECTORS%27%20AND%20OFFICERS%27%20CYBERSECURITY%20STANDARD%20OF%20CARE%3A%20THE%20YAHOO%20DATA%20BREACH&title=American%20University%20Law%20Review&issn=00031453&isbn=&volume=66&issue=5&date=20170601&aulast=Trautman,%20Lawrence%20J.&spage=1231&pages=&sid=EBSCO:InfoTrac%20LegalTrac:edsgcl.509109656
 * 33) Tsai, Janice Y., Serge Egelman, Lorrie Cranor, and Alessandro Acquisti. 2011. “The Effect of Online Privacy Information on Purchasing Behavior: An Experimental Study.” Information Systems Research 22(2):254–68.
 * 34) Link: http://www.jstor.org/stable/23015560.
 * 35) Annotation: This article describes an experiment that tested consumer response to perceived privacy protections on shopping websites: whether consumers would factor privacy into purchasing decisions, and whether consumers who were particularly concerned about privacy would pay extra for the same good and more privacy. The researchers used a search engine for the treatment group called Privacy Finder, which scans websites and automatically generates an icon to show the level of privacy the site will give the consumer as it compares to the privacy policies that consumer has specified that they prefer. The authors created a utility function to show the factors that go into a consumer’s decision to buy a product from a given website, such as price, appraisal of the good, and whatever privacy concerns the consumer may have about that company. Perhaps unsurprisingly, the results of the experiment were that subjects in the treatment group, those who were using a search engine that indicated privacy levels of websites, purchased products from websites that gave them higher levels of privacy, whereas the participants in the control groups opted for the products that were simply the cheapest. The authors acknowledge the limitations of their study, but it seems that they’ve accounted for what they can, so I deem it reliable and would recommend it to others because it’s easy to read and pretty interesting. It seems that the experimenters have done their best to avoid bias, going as far as creating two control groups in the study. This study affirms the hypotheses about consumers and privacy that the article begins with. The target audience is both researchers, who the authors believe should further study their topic, and businesses, which the authors think could stand to benefit monetarily from making good privacy policies more visible. I think that this article will prove useful to me because it addresses search engine privacy in a slightly different manner than I first imagined.
Instead of discussing the privacy a search engine affords somebody in terms of their browsing history and the data it collects, it makes claims about a search engine that identifies privacy policies of other websites, specifically that if consumers are given a search engine that makes privacy concerns more obvious, people are likely to change their buying habits. This shows the influence search engines wield in general and the effect they can have on browsing and buying habits of consumers.
 * 36) van Otterlo, Martijn. 2014. “Automated Experimentation in Walden 3.0: The Next Step in Profiling, Predicting, Control and Surveillance.” Surveillance & Society 12(2):255–72.
 * 37) Link: https://ojs.library.queensu.ca/index.php/surveillance-and-society/article/view/walden3/walden
 * 38) Annotation: This article by van Otterlo discusses a somewhat scientific approach to evaluating how data is collected and manipulated through algorithms and artificial intelligence and what implications that has for the privacy of users. A few of the main ideas it deals with are profiling of users, targeting, bias of models, and A/B testing of users. The beginning of the article focuses on four points that the author believes support his notion of “privacy-as-control”: access and sharing of data, using data to make predictions, creating predictive models, and finally the generation of the model that is most discussed, Walden 3.0. In his discussion of models, van Otterlo goes over discriminative ones, which serve to differentiate users, and generative ones, which more complexly gather data and are better at profiling. Moving on to predictions, van Otterlo claims that profiling is divided into the categories of socialization, which uses data from others to predict one’s tastes, and personalization, which uses one’s own data to alter their search results. Going back to Walden 3.0, the author discusses it as a future continuation of an old model that will be adapted to use new technologies and behavior profiles. What is perhaps most pertinent to search engine privacy in this article are the claims van Otterlo makes about what search engines like Google do that would seem to violate user privacy. This includes Google experimenting on users by giving people different search results, targeting them for certain ads and offers, and using the conclusions to adjust its algorithms in a way that will eventually increase Google’s profits. This experimenting is often done using A/B testing, which is when Google alters results and sees, on a large scale, how unknowing people react. In addition to general profiling, privacy concerns in relation to Google exist in even simple search query completion and click behavior.
This article was useful in explaining the motives and methods by which search engines, mainly Google, attain user data. Because of its scientific tone and frequent quoting of sources, the article seems very reliable and objective. I would recommend it to people who are interested in learning about data profiling but have some knowledge of modeling or algorithms, because there is a good amount of technical jargon. The level of reading difficulty was somewhat high in the middle section due to this technical analysis of different types of models, but the introduction and conclusion were relatively easy to read. While this article didn’t introduce me to a ton of new information, I did appreciate its discussion of profiling. Profiling is a key concept when thinking about search engine privacy, so I think this article will help me explain that concept in more detail. It hasn’t really changed the way I think about search engine privacy, but it has given me more evidence to confirm what I’ve learned elsewhere.
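The A/B testing van Otterlo describes is usually implemented by deterministically bucketing users, so each unknowing user consistently sees one variant of the results. A minimal sketch (the function and experiment name here are hypothetical, not Google's actual system):

```python
import hashlib

def assign_variant(user_id, experiment="ranking-v2"):
    """Deterministically split users into two arms by hashing their ID,
    the standard bucketing mechanism behind large-scale A/B tests."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user always lands in the same arm, so the altered results
# look like "normal" results from that user's point of view.
print(assign_variant("user42") == assign_variant("user42"))  # True
```

Because the split is a function of the user's identifier, the experimenter can later compare click behavior across arms without the users ever knowing they were assigned to one.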
 * 39) Viejo, Alexandre and Jordi Castellà-Roca. 2010. “Using Social Networks to Distort Users’ Profiles Generated by Web Search Engines.” Computer Networks 54(9):1343–57.
 * 40) Link: https://ac.els-cdn.com/S1389128609003557/1-s2.0-S1389128609003557-main.pdf?_tid=46dccfc0-496b-4b10-b2b8-6d2d48ddbd03&acdnat=1538508577_f8a55d96c8bd6c613ff90df14b4b0bf2
 * 41) Annotation: Viejo and Castellà-Roca’s goal in their paper is to showcase a solution to search engine privacy issues that is low cost, communication- and computation-wise, and that they believe performs better than past solutions while protecting users from profiling by search engines. Simply put, the dilemma the authors are trying to solve is how search engines can personalize results while maintaining privacy. The setup of their proposal is that there would be groups of people, and instead of each person submitting their own search queries to a search engine, their search is forwarded to other users in the group, one of whom will eventually submit it on the original user’s behalf. The effect is that all queries in the group are distributed equally, so each user’s profile is not indicative of that person, because they have submitted queries on behalf of everyone else in their group and their own search queries are distributed amongst many. The weak points in their plan are dishonest users who may try to collect data on users in their group and selfish users who don’t submit queries on behalf of others but do use others to submit their own queries. The authors go over many different simulations of their model that cannot be discussed here due to length, but they are worth looking at if one is interested in the technical aspects. I found this article extremely useful, and I agree with the authors that it covers a largely unaddressed problem. It was easy to read, although some middle sections got quite technical. I’d recommend it to anyone interested in privacy solutions, although the target audience may be other academics in the field who have more of a say in getting these ideas to the public. With its scientific methodology and plenty of citations, the article seemed both reliable and objective.
I liked reading it because this article had a seemingly great solution to protecting search engine privacy for users without compromising results, though the authors did acknowledge potential shortcomings. The article definitely broadened my understanding of the field, and it has made me consider adding a “solutions” section to my Wikipedia page.
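The group-forwarding idea can be illustrated with a small simulation. This is a loose sketch of the concept under simplified assumptions (a uniformly random submitter, no cryptography or forwarding chain), not the authors' actual protocol:

```python
import random
from collections import defaultdict

def submit_via_group(user_queries, rng=random):
    """Each query is submitted by a randomly chosen group member, so the
    profile the search engine builds for any one user mixes queries from
    the whole group rather than reflecting that person's interests."""
    profiles = defaultdict(list)  # what the engine sees, keyed by submitter
    members = list(user_queries)
    for owner, queries in user_queries.items():
        for q in queries:
            submitter = rng.choice(members)  # may or may not be the owner
            profiles[submitter].append(q)
    return profiles

group = {
    "alice": ["knitting patterns", "flu symptoms"],
    "bob": ["used cars", "tax forms"],
    "carol": ["hiking trails", "coffee grinders"],
}
seen = submit_via_group(group)
print(sum(len(v) for v in seen.values()))  # 6: every query still reaches the engine
```

Even this toy version shows the scheme's two weak points from the annotation: a dishonest member learns other users' queries when asked to submit them, and a selfish member could use the group without ever submitting for others.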
 * 42) Watters, Carolyn and Ghada Amoudi. 2002. “GeoSearcher: Location-Based Ranking of Search Engine Results.” Journal of the American Society for Information Science and Technology 54(2):140–51.
 * 43) Link: https://onlinelibrary.wiley.com/doi/full/10.1002/asi.10191
 * 44) Wicker, Jörg and Stefan Kramer. 2017. “The Best Privacy Defense Is a Good Privacy Offense: Obfuscating a Search Engine User’s Profile.” Data Mining and Knowledge Discovery 31(5):1419–43.
 * 45) Link: https://link.springer.com/content/pdf/10.1007%2Fs10618-017-0524-z.pdf
 * 46) Annotation: The goal of the authors of this article is to show the reader a new way for users to protect their own privacy instead of relying on the back end, service providers, to do this for them. Their approach involves the user using a data mining technique to assist in obfuscating their real search queries, rendering whatever profile the search engine creates of them inaccurate. This idea is very similar to the indistinguishability solutions examined in Petit’s article, although Wicker and Kramer do conclude that it is possible for this obfuscation to work in protecting user privacy, while Petit thought it to be a weak solution. Their discussion of what data search engines collect and why confirms what Evans said about the financial incentive search engines have to collect user data. The authors then present how they made their model and the methodology of their study, which involved testing their approach by using it with search queries and analyzing the resulting personalized ads that the search engine generated. This approach involved using an algorithm not too different from a machine learning one to add search queries intended to obfuscate the user profile from the search engine. The authors conclude that it would be feasible to confuse a search engine using their method, although they acknowledge that this is only a study and that they cannot be certain it would work in the real world, since their model of a user was more simplified than most users actually are. I found this article to be useful, thought provoking, reliable, and objective. It was clearly reliant on a very scientific methodology, so I trust it, and the authors do note the limitations of their work. The reading level was slightly difficult, especially during the method section, but I’d recommend it to their target audience of other academics in the field.
This article has changed my thinking in that, unlike Petit’s assessment of indistinguishability solutions, the authors see a user confusing a search engine as a real possibility. They succeed in showing, in a somewhat simple situation, that their approach was able to modify the results of a search engine in terms of the ads shown, which was new information.

Thoughts on Article
I'm creating this article because I feel like search engines are used so regularly and are such an integral part of everyday life that people forget about what goes on behind the scenes. While there is a subsection of the Internet Privacy Wikipedia page that addresses search engine privacy, I think the topic deserves more attention. In terms of what I will be including in this article, I hope to address how people's searches are being catalogued, used, and passed on to others and what types of data search engines collect. This would include user data being given to government departments, sold to advertisers, the search engine itself using the data to rank search results, and where all records are kept. I also hope to include a review of the different search engines and how they compare on the privacy front, as well as a discussion of laws related to search engine privacy, such as the GDPR and some US laws. There is already a page that compares search engines, but it doesn't have much text and is mostly a table of years active. Additionally, there are a lot of relevant Wikipedia pages I want to link to throughout this future article, such as Internet Privacy, Google, Data retention, Web search engine, Search engine optimization, List of search engines, Search engine marketing, General Data Protection Regulation, DuckDuckGo, and Comparison of web search engines.

Simhhyena peer evaluation
Your article is already very detailed and uses a lot of sources. Nice job! A few minor corrections I would make, however, are the following. The opening sentence was slightly confusing, and I had to read it a couple of times to fully understand it. Rewording it would help the reader stay engaged past the lead section. I like the hyperlinks, as they seem to be meticulously placed where the reader may want to find out more about the topic. I feel like the beginning of the privacy policies section can be seen as subjective when you say “many people never read them and therefore are unaware” unless you cite this information from a study. I like the summary you gave in the rest of that section; it was short but conveyed what was necessary to get across. In the Legal rights and court cases section, I would make sure to hyperlink consistently; either hyperlink each subsection court case or do none at all. Also, I was slightly confused when you posed the questions “How is this data protected by search engines/subsidiaries? How long do they keep it? What do they do with it?” Are you planning on answering these questions in further drafts? Or are these questions a part of your article?

Overall, I liked your article a lot; its bones are already really good and I think if you address the stuff above, you’ll already have a great article. I’m excited to see what more sources will add to your article and potentially the additional sections you may add to it.

Angryflyingdolphins peer review
Overall, your first draft is already very well developed, so I only recommend a few minor changes. The first sentence in your lead section is a bit clunky and should be reworded. The font sizes in the legal section are not uniform, and not every part of the titles is hyperlinked, which may be a bit jarring for readers. I feel like the "User profiling" section does not belong in the "Types of data collected by search engines" section, since they seem to be two separate subtopics. The two sections within the "Ethical debates and controversies" section could also be split into two separate subtopics. Your "Comparison of search engines" section does not seem like a comparison section; the information provided just seems like brief explanations of different search engine policies. The questions you provide at the beginning of the comparison section are also a bit confusing and unnecessary. I feel like there is a better way to introduce the section using a non-question, paragraph format. Sentences throughout the article could also be edited for wording to ensure flow.

Your article was great. Every idea is already well developed with proper citations and hyperlinks. Everything was informative and relevant. Awesome work so far!

Response to peer reviews
These peer reviews contain very similar advice. They mention that the questions at the beginning of the comparison section and in the types of data collected section are confusing. I agree; they are only there as part of my former outline that I hadn't gotten to filling out yet, so I will try to take them out or answer them in the future. As for not hyperlinking every court case, Gonzalez v. Google did not have its own Wikipedia page to link to, which I addressed in lab and they understood. My response to Simhhyena's comment that the beginning of the privacy policies section "can be seen as subjective when you say 'many people never read them and therefore are unaware'" unless I cite the claim from a study is that this information is actually in a couple of my articles, and I will try to find that information and cite it in the future. These peer reviews also both mention the clarity of the first sentence of my lead section, which I agree was an issue and have altered to make the intent clearer. As for the font sizes in the legal section, I believe Angryflyingdolphins is referring to the fact that Smith v. Maryland is in a smaller size than the other court cases, but I did this intentionally, which I thought was appropriate since it was decided based on precedent set in United States v. Miller, which had a bigger font. Angryflyingdolphins suggests splitting up ethical debates and controversies, which I think is a good tip and something I changed. I also agree with him that my comparisons section doesn't really appear to be a comparisons section. I intend to add to this section to turn it into more of a comparison and answer the questions, or to combine it with the privacy policies section, where I already distinguish between search engines. I disagree, however, with the suggestion that user profiling not be included in the types of data collected section, because a user profile is made up of the very user data search engines collect, and the two are discussed in the same section.

Breadyornot peer review
Overall, great job! This article looks about ready to be published to the main space. A few minor tweaks could make it a bit easier for the reader to understand. First off, the lead section uses "search engine privacy" a bit too much; a couple of sentences could be simplified by getting rid of the phrase. In terms of formatting and structure, I think the "proposed recommendations and solutions" portion could be split up into a couple of categories, just so the reader can easily find something they're looking for specifically. Also, the court cases are oddly different sizes/colors, which I know correlates to hyperlinks and the last court case, but maybe putting a note below the titles with links would be better, in order to clean up the overall appearance of the article. Finally, I really like the tone and wording of most of the article; it holds a fairly neutral standpoint, which I think can be pretty hard when analyzing a lot of different authors.

Midwestmich99 peer review
I think your lead gave a good summary of your overall article. You touched on every section included in your article in the lead, so the reader has a good idea of what information will be included in the article.

The information in your article is well balanced. I didn’t feel like one section overpowered any of the other sections. There are some smaller subsections within a section that could have more information, such as the "Baidu" and “Bing” sections in the "Comparison of privacy policies". However, the overall article is balanced well.

A small suggestion would be changing the format of the “Proposed recommendations and solutions”. There is a lot of information presented in this section, so breaking up each recommendation into its own smaller subsection can make it easier for the reader to interpret. For example, you can have a small subsection titled “Software Solutions” or one named “Unlinkability and indistinguishability”. Also, the heading is a little unclear. I suggest specifying in the title that the recommendations are geared towards consumers.

The information provides a great overview of Search Engine Privacy. I thought you did a good job of adding sources and citing the information you presented in your article. Some sections, such as “Proposed recommendations and solutions” section, could have a few more citations to help the reader understand where all the facts are coming from.

The overall tone of the article was unbiased. I think you do a great job of presenting the information in a factual way. Overall, I think the information you have so far is really good. I also thought all the examples you provided were really well explained and easy to follow.

Response to peer reviews
Both of these reviews had the suggestion to make the "Proposed recommendations and solutions" section a little less bulky. I agreed with this recommendation and assigned subheadings accordingly. Also, I changed the title like Midwestmich99 suggested to reflect that these proposals were to protect user privacy. Breadyornot mentioned the formatting of the legal section, which was something that the peer reviews from last week mentioned as well. At first, I explained that it looked weird because one of the court cases simply didn't have a Wikipedia page to link to, but I now agree that it looks inconsistent, so I switched to hyperlinking each case within the text. I also made the Smith v. Maryland case subheading the same size as the others because I realized it was confusing, even though its ruling was somewhat based on the United States v. Miller case and the two are discussed together in the literature. Breadyornot also suggested that I used the term "search engine privacy" too often in the lead section, so I took one out to make it flow better. The final suggestion was from Midwestmich99, who recommended using more citations, specifically in the solutions section. This is something I intend to work on in general, especially since I have now read all my articles. In lab, there was also the suggestion that I add in information about Mozilla, so that is something I intend to look into as well. For now, I put a short sentence about the Mozilla manifesto in as a placeholder.

Lead Section
I think the second sentence, "both types... information privacy", seems a little abrupt and out of place. I would recommend either working it in somewhere else, or seeing if there's a diagram that shows information privacy branching down into internet privacy and the other type of privacy, then highlighting which branch search engine privacy is on and how it ties back to the overarching idea of information privacy.

Maybe for sentence structure too, "this is a controversial topic..." seems very abrupt and unprofessional. Try "This is controversial because search engines...". Also, I'm not sure search engines pertain to pronouns, so the sentence could read something like "This is controversial because search engines often claim to collect a user's data in order to tailor better results to that specific user and provide the user with a better searching experience. However, search engines can also abuse and compromise the privacy of the user's data by selling it to advertisers in order to gain profits." There were some things that were not clear as to who was gaining profit and which data belonged to whom. The last sentence is a little long for one sentence. You could write something like, "For individuals interested in preserving their privacy from search engines, there are many routes (maybe change the word choice of routes) available to them. One of these solutions includes anonymity software (only use this if you have multiple solutions that are not all different softwares but completely different types, like say a hardware or a software solution), like Tor, which attempts to separate the location and information from the user's search."

Privacy Policies
"many people...logged" - not sure what tense this sentence should be in.

"supposedly" - not sure if this word makes the sentence seem biased, as if the website is luring in the user to click things.

"Another big issue with simply putting the privacy policy in front of users and having them accept quickly, is that they are often very hard to understand, even in the unlikely case that a user decides to read them." - potentially fixed a grammatical issue?

"Private search ... Google or Yahoo." maybe reword this because there's a lot of clutter due to the commas. "Private search engines can state in their privacy policies if they collect less, if any, data than bigger public? search engines. For example, DuckDuckGo has been seen to state in their privacy policy that they collect much less data than Google or Yahoo."

Google --> DuckDuckGo
Using the phrase "by far" might seem subjective.

How come there is a Wikipedia page for the Yahoo privacy policy but not one for Google's privacy policy?

Using the phrase "relatively small amount", maybe say compared to Google, it is relatively small or something, so there is some sort of measure as to how it is okay to be classified as smaller than other search engines.

Legal rights and court cases
I am not sure you need the sentence "Such laws include the Fourth Amendment" because 1, the link would make more sense in the paragraph below and 2, it intrudes a little bit into the intro and doesn't talk about anything else, so why does the 4th Amendment take priority over the other laws?

Add a source for Katz v. US?

"In the case of a search engine company...keeps records" - this is a very very long sentence but also there is no verb. In the case a search engine company does xyz, these witnesses are the ones.... There is no xyz stated, so I'm not sure what exactly the search engine company does that would then lead to witnesses handing over data to the government.

I think the sentence is worded a little weirdly from "to show that people...compromising to children" it didn't really make sense? Was people's search information compromising to children?

Katz v. US the wording has a little bit too many directions, cutting it down would be good - "Katz v. United States was debating if it was unconstitutional for the government to electronically listen to and record a conversation a bystander (do you need to be specific it was him when you are just general outlining the case?) had from a public phone booth."

What does it mean that the Law of Confidentiality is not strictly defined yet it's still a law? How is it treated differently?

Types of data collected by search engines
Why is there a citation in the middle of the sentence but then 3 at the end of the sentence? Also, should they be in chronological order? Is the "Data can be stored for extended period of time", can this be turned into a new paragraph?

"With more effective ads comes more purchases from consumers that they may not have made otherwise." - I think this might be subjective.

Is there any update to the complaint by EPIC against Google? If so, add it to the paragraph.

Ethical debates
Make a new paragraph when you start talking about the contrasting POV.

Data and privacy breaches
Elaborate on Yahoo example?

The proposed recommendation parts are really good, short and to the point!

I am not sure for social network solution and anonymity networks if you need the opening "another solution", "another option"; it's very repetitive and unnecessary. Just start diving into the subject. Split the paragraphs of Unlinkability and indistinguishability so it's easier to see that there are three points being made.

Overall, this page is very well written, informational, and hits a lot of topics. There were some phrases that seemed to be a little subjective; I've pointed them out, and taking a look at them just to make sure it's not just my interpretation would be helpful. For some parts as well, information would sometimes be interjected into the middle of a paragraph when it was not needed. Elaborating on some topics, like the one about Google and Yahoo under the Data and privacy breaches section, would also help, because it is not as balanced as the first controversy with the AOL search leak. Having those would help present-day controversies get examined. The solutions were all really good, and I couldn't find that many instances of bias anywhere.

Response to Peer Review
She suggested altering a few sentences in the lead section, but I had already reworded them after previous peer reviews, so I think I'm going to keep them as is for another week and see if I can think of how to improve it. However, I did take her recommendation for breaking up one sentence into two for clarity. Rainbowdolph suggested that using the word "supposedly" may sound biased, so I reworded that sentence in my privacy policies section. I also reworded another sentence that made it sound like there were private and public search engines to fit my original meaning that there were privacy minded search engines. Though she said that the phrase "by far" may seem subjective when saying how Google is by far the most used search engine, I think I'm going to keep it in for now because Google does account for over 70% of search engine queries made worldwide. As for there being a Yahoo privacy policy link but not one for Google as she pointed out, I've fixed it by taking away the Yahoo link. I clarified the "relatively small amount" that she said was unclear to make it "relatively small amount compared to other search engines". In the legal rights and court cases section, I took out the sentence about the Fourth Amendment in the intro, moved the link to the Fourth Amendment subsection, and added a citation to the Katz v. US case as she suggested. I agree with Rainbowdolph that a sentence in the confrontation clause was very wordy and confusing, so I rephrased and broke it up into two sentences. I also added a little to a sentence that she suggested wasn't totally clear in regards to COPA. Rainbowdolph makes a good point that the Katz v. US interpretation could use some work, and I have a specific page of an article that I will add information from this week. I amended the info about the law of confidentiality to clarify that it is common law, not law enacted by Congress, since she said it was unclear.
I see the point she's making when she says it's weird to have 3 citations at the end of the sentence and one in the middle of the first sentence of the types of data collected section. The 3 at the end are lingering citations to privacy policies that I am working on getting rid of and replacing with academic article citations. I also made a sentence there less biased-seeming and broke that section up into two paragraphs as she suggested, which I think made sense because it was a bit bulky. While there is probably an update on the EPIC complaint against Google, I don't believe it was in the scholarly article that I read, so I don't think I can include it unless I happen upon information in another article. I split up the ethical debates section like she advised, which definitely makes it easier to read. I only recently found an academic article about the Yahoo data breaches, so I do plan on expanding that section like Rainbowdolph suggested. Finally, I did split up the unlinkability and indistinguishability material with subheadings, which makes it look much more proportional to the rest of the section, and eliminated some repetitive language from that general section.

Week 9 Mrzy732993 peer review
The lead section looks really nice! You use brief sentences to cover all the content in the following sections, and for relatively absolute arguments such as “the legal framework for protecting user privacy is not very solid”, you add a citation there so the readers won’t think it is your personal opinion.

The privacy policies part is really interesting. I especially learned a lot from the Google and Yahoo section because I never realized that the potential use of personal data is already shown to all users, but we easily ignore it. I like how you compare Google & Yahoo with more privacy-focused engines like DuckDuckGo, and the language also looks neutral and encyclopedic. But in the DuckDuckGo part, there are some conclusions like “DuckDuckGo does not collect or share any personal information of users” which sound a little bit absolute. Does this come from the official statement of the company, or has it been tested by a relevant institution? Although you cite the source, it wouldn’t hurt to clarify how this argument is derived. Maybe a rewrite like “DuckDuckGo claims that…” or “It has been proved by XXX institution that….” Besides, you can let readers know more about StartPage and Disconnect if you can find any other information~

The legal rights part is very informative! I like how you put all the cases together so the readers can read them in a clearer way. As for advice, I feel there are 2 or 3 cases in the middle where you only discussed the meaning but didn’t clarify the process of the case (like United States v. Miller and Smith v. Maryland: what happened between Smith and the Maryland court?). In addition, the cases you mentioned are from the Constitution or common law, from the U.S. or Europe. It might make more sense if you add some sub-headers to this big section, so the readers can also spend less time distinguishing one case from another.

In the “types of data collected by search engines” part, I got so much new information about how my data could be used by search engine companies, so I really enjoyed reading this section! I personally feel that maybe this section could be moved up a bit? Since when you talk about privacy issues in search engines, the types of data being collected should be among the first things discussed. You can think about how to order each section better, but it’s totally fine if you feel the current arrangement makes more sense~ For the sub-headers in the “Uses” part, the “google” title can be changed into “improving search quality” etc. Since the previous two sub-headers are “user personalization” and “targeted advertising”, the third one should also be some type of function rather than just “google”.

The ethical debates part is really educational, and you separate both positive and negative voices regarding search engine data collection, which provides enough objectivity. In the data breaches part, there are many real-world examples, but they look a little dense. You could also try to separate them by adding sub-headers, like “AOL search data leak” and “Google+ data breach”. Otherwise, the content is very informative and interesting to read!

The recommendations and solution part is very easy to read due to the sub-headers. The explanation of each part is also very clear, so the readers can easily understand. But I feel the last sub-header “privacy ratings integrated with search engines” does not belong to this section because it does not talk about related solution or recommendation. If you feel the same way, maybe you could integrate this part into other section or just create a new section?

Overall, the article is very well-developed! You’ve included 21 sources and many hyperlinks. The section length is well-balanced, and the language is very encyclopedic and neutral. I can’t really find any problems haha. Keep working on it, we’re almost done!!

Response to peer review
Firstly, I'm grateful for the compliments! She mentioned that one sentence about DuckDuckGo was possibly subjective, but since I have a citation in it, I think I'm going to keep it as is for now. She also said that I could add more info about StartPage and Disconnect, but the article I saw those names in didn't have any more information, and I don't feel like it's worth going out of my way to do more research on the subject since DuckDuckGo is more known and used than those two search engines. I agree with the suggestion about making more subheadings in the legal rights section. I think originally I had fewer subheadings, but now there are many more, which makes the section look cluttered. Therefore, I've added United States and Europe subheadings to break it up a little bit. I appreciate the suggestion that the types of data collected section could be moved up slightly. I've been thinking about reorganizing the page, and I agree that the types of data collected section is relevant enough to be moved up. The only issue with this is that I feel like it makes sense to go from there to ethical debates to data breaches to solutions, meaning I would have to move the legal section to the bottom, which I think is fine for now. I also really like the suggestion made about subheadings in the Uses section, and I've switched it from saying "google" to saying "improving search quality", which I think makes much more sense. I also implemented her suggestion for creating more subtitles in the data breaches section. The last suggestion she makes is about the "Privacy ratings integrated with search engines" section and moving it out of the general solutions section if I agreed with her in the thought that it doesn't really belong.
I do agree that it doesn't quite fit there, so I moved it to the ethical debates section under the title "user perceptions of privacy" because I felt like it might make sense to show that this study indicates users valued privacy over the financial incentive they were given.

= Search engine privacy =
Search engine privacy is a subset of internet privacy that deals with user data being collected by search engines. Both types of privacy fall under the umbrella of information privacy. Privacy concerns regarding search engines can take many forms, such as search engines logging individual search queries, browsing history, IP addresses, and cookies of users and conducting user profiling in general. The collection of personally identifiable information of users is commonly thought of as a search engine "tracking" its users. This is controversial because search engines often claim to collect a user's data in order to tailor better results to that specific user and provide the user with a better searching experience. However, search engines can also abuse and compromise the privacy of the user's data by selling it to advertisers in order to gain profits. Users must decide what is more important to their search engine experience: relevance and speed of results or their privacy. The legal framework for protecting user privacy is not very solid. A few of the most popular search engines are Google, Yahoo, Bing, and Baidu, but many other search engines that are focused on privacy have cropped up recently, such as DuckDuckGo. There have been several well publicized breaches of search engine user privacy that occurred with companies like AOL and Yahoo. For individuals interested in preserving their privacy from search engines, there are many routes available, such as using anonymity software like Tor, which attempts to separate the user's location and information from their search.

Privacy policies
Search engines generally have privacy policies in order to inform users about what data of theirs they may be collecting and what purposes it may be used for. While these policies may be an attempt at transparency by search engines, many people never read them and are therefore unaware of how much of their private information, like data collected from cookies, may be logged. This ties in with the phenomenon of notice and consent, which is how many privacy policies are structured. Notice and consent essentially consist of a site showing the user a privacy policy and having them click through to agree. This is intended to let the user freely decide whether or not to go ahead and use the website. This decision, however, may not actually be made so freely because the costs of opting out can be very high. Another big issue with simply putting the privacy policy in front of users and having them accept quickly is that the policies are often very hard to understand, even in the unlikely case that a user decides to read them. Privacy-minded search engines, such as DuckDuckGo, state in their privacy policies that they collect much less, if any, data than search engines such as Google or Yahoo. As of 2008, search engines were not in the business of selling user data to third parties, though they do note in their privacy policies that they comply with government subpoenas.

Google and Yahoo
Google, founded in 1998, is by far the most widely used search engine, receiving billions of search queries every month. Google logs all search terms in a database along with the date and time of search, browser and operating system, IP address of the user, the Google cookie, and the URL that shows the search engine and search query. The privacy policy of Google states that they do pass on user data to various affiliates, subsidiaries, and "trusted" business partners. Yahoo, founded in 1995, also collects user data. It is a well known fact that users do not read privacy policies, even for services that they use daily, such as Yahoo Mail and Gmail. This persistent failure of consumers to read privacy policies can be disadvantageous to them because while they may not pick up on differences in the language of privacy policies, judges certainly do. This means that search engine and email companies like Google and Yahoo are technically able to keep up the practice of targeting advertisements based on email content since they declare that they do so in their privacy policies. A study was done to see how much consumers cared about the privacy policies of Google, specifically Gmail, and their detail, and it determined that users often thought that Google's practices were somewhat intrusive but that users would not often be willing to counteract this by paying a premium for their privacy.

DuckDuckGo
DuckDuckGo, founded in 2008 and therefore a much newer search engine than Google, is known for being privacy focused and not tracking its users. DuckDuckGo does not collect or share any personal information of users such as IP addresses or cookies, which other search engines usually do log and keep for some time. It also does not have spam, and protects user privacy further by anonymizing search queries from the website the user chooses and using encryption. Similarly privacy oriented search engines include StartPage and Disconnect.

Types of data collected by search engines
Most search engines can, and do, collect personal information about their users according to their own privacy policies. This user data could be anything from location information to cookies, IP addresses, search query histories, click-through history, and online fingerprints. This data is then often stored in large databases, and users may be assigned numbers in an attempt to provide them with anonymity.

Data can be stored for extended periods of time. For example, the data collected by Google on its users is retained for up to 9 months. Some studies state that this number is actually 18 months. This data is then used for various reasons such as optimizing and personalizing search results for users, targeting advertising, and trying to protect users from scams and phishing attacks. Such data can be collected even when a user is not logged in to their account or when using a different IP address, by using cookies.

User profiling and personalization
What search engines often do once they've collected information about a user's habits is create a profile of them, which helps the search engine when it decides which links to show for different search queries submitted by that user or which ads to target them with. An interesting development in this field is the use of automated learning. Using this, search engines can refine their profiling models to more accurately predict what any given user may want to click on by doing A/B testing of results offered to users and measuring the reactions of users.

Companies like Google, Netflix, YouTube, and Amazon have all started personalizing results more and more. One notable example is how Google Scholar takes into account the publication history of a user in order to produce results it deems relevant. Personalization also occurs when Amazon recommends books or when IMDb suggests movies by using previously collected information about a user to predict their tastes. For personalization to occur, a user need not even be logged into their account.

Targeted advertising
The internet advertising company DoubleClick, which helps advertisers target users for specific ads, was bought by Google in 2008 and was a subsidiary until June 2018, when Google rebranded and merged DoubleClick into its Google Marketing Platform. DoubleClick worked by depositing cookies on users' computers that would track the sites they visited with DoubleClick ads on them. There was a privacy concern when Google was in the process of acquiring DoubleClick that the acquisition would let Google create even more comprehensive profiles of its users, since they would be collecting data about search queries and additionally tracking websites visited. This could lead to users being shown ads that are increasingly effective with the use of behavioral targeting. With more effective ads comes the possibility of more purchases from consumers that they may not have made otherwise.

Improving search quality
Besides ad targeting and personalization, Google also uses data collected on users to improve the quality of searches. An example of this is how Google uses databases of information to refine Google Spell Checker.

Privacy organizations
There are many who believe that user profiling is a severe invasion of user privacy, and there are organizations such as the Electronic Privacy Information Center (EPIC) and Privacy International that are focused on advocating for user privacy rights. In fact, EPIC filed a complaint in 2007 with the Federal Trade Commission claiming that Google should not be able to acquire DoubleClick on the grounds that it would compromise user privacy.

Ethical debates
Many individuals and scholars have recognized the ethical concerns regarding search engine privacy.

Pro data collection
The collection of user data by search engines can be viewed as a positive practice because it allows the search engine to personalize results. This implies that users would receive more relevant results, and be shown more relevant advertisements, when their data, such as past search queries, location information, and clicks, is used to create a profile for them. Also, search engines are generally free of charge for users and can remain afloat because one of their main sources of revenue is advertising, which can be more effective when targeted.

Anti data collection
This collection of user data can also be seen as an overreach by private companies for their own financial gain or as an intrusive surveillance tactic. Search engines can make money using targeted advertising because advertisers are willing to pay a premium to present their ads to the most receptive consumers. Also, when a search engine collects and catalogs large amounts of data about its users, there is the potential for it to be leaked accidentally or breached. The government can also subpoena user data from search engines when they have databases of it. Search query database information may also be subpoenaed by private litigants for use in civil cases, such as divorces or employment disputes.

User perceptions of privacy
Experiments have been done to examine consumer behavior when given information on privacy of retailers by integrating privacy ratings with search engines. Researchers used a search engine for the treatment group called Privacy Finder, which scans websites and automatically generates an icon to show the level of privacy the site will give the consumer as it compares to the privacy policies that consumer has specified that they prefer. The results of the experiment were that subjects in the treatment group, those who were using a search engine that indicated privacy levels of websites, purchased products from websites that gave them higher levels of privacy, whereas the participants in the control groups opted for the products that were simply the cheapest. The study participants also were given financial incentive because they would get to keep leftover money from purchases. This study suggests that since participants had to use their own credit cards, they had a significant aversion to purchasing products from sites that did not offer the level of privacy they wanted, indicating that consumers value their privacy monetarily.

AOL search data leak
One major controversy regarding search engine privacy was the AOL search data leak of 2006. For academic and research purposes, AOL made public a list of about 20 million search queries made by about 650,000 unique users. Although they assigned unique identification numbers to the users instead of attaching names to each query, it was still possible to ascertain the true identities of many users simply by analyzing what they had searched, including locations near them and names of friends and family members. A notable example of this was how the New York Times identified Thelma Arnold through "reverse searching". Users also sometimes do "ego searches" where they search themselves to see what information about them is on the internet, making it even easier to identify supposedly anonymous users. Many of the search queries released by AOL were incriminating or seemingly extremely private, such as "how to kill your wife" and "can you adopt after a suicide attempt".

This data has since been used in several experiments that attempt to measure effectiveness of user privacy solutions.

Google and Yahoo
In October 2018, there was a Google+ data breach that potentially affected about 500,000 accounts which led to the shutdown of the Google+ platform.

Both Google and Yahoo were subjects of a Chinese hack in 2010. While Google responded to the situation seriously by hiring new cybersecurity engineers and investing heavily into securing user data, Yahoo took a much more lax approach. Google started paying hackers to find vulnerabilities in 2010 while it took Yahoo until 2013 to follow suit. Yahoo was also identified in the Snowden data leaks as a common hacking target for spies of various nations, and Yahoo still did not give its newly hired chief information security officer the resources to really effect change within the company. In 2012, Yahoo hired Marissa Mayer, previously a Google employee, to be the new CEO, but she chose not to invest much in the security infrastructure of Yahoo and went as far as to refuse the implementation of a basic and standard security measure to force the reset of all passwords after a breach.

Yahoo is known for being the subject of multiple breaches and hacks that have compromised large amounts of user data. As of late 2016, Yahoo had announced that at least 1.5 billion user accounts had been breached during 2013 and 2014. The breach of 2013 compromised over a billion accounts while the breach of 2014 included about 500 million accounts. The data compromised in the breaches included personally identifiable information such as phone numbers, email addresses, and birth dates as well as information like security questions (used to reset passwords) and encrypted passwords. Yahoo made a statement saying that their breaches were a result of state sponsored actors, and in 2017, two Russian intelligence officers were indicted by the United States Department of Justice as part of a conspiracy to hack Yahoo and steal user data. As of 2016, the Yahoo breaches of 2013 and 2014 were the largest of all time.

Government subpoenas of data
The government may seek to subpoena user data from search engines for any number of reasons, which is why such subpoenas are a major threat to user privacy. In 2006, the government sought this data as part of its defense of the Child Online Protection Act (COPA), and only Google refused to comply. While protecting the online privacy of children may be an honorable goal, there are concerns about whether the government should have access to such personal data to achieve it. At other times, the government may want the data for national security purposes; access to large databases of search queries in order to prevent terrorist attacks is a common example. Whatever the reason, it is the fact that search engines create and maintain these databases of user data that makes government access possible. Another concern regarding government access to search engine user data is "function creep," a term that here refers to how data originally collected by the government for national security purposes may eventually be used for other purposes, such as debt collection, which many would consider government overreach. While protections for search engine user privacy have recently begun to develop, the government has increasingly been on the side that wants to ensure search engines retain data, leaving users less protected and their data more available for anyone to subpoena.

Switching search engines
A different, though popular, route for a privacy-centered user is simply to switch to a privacy-oriented search engine, such as DuckDuckGo. This search engine maintains the privacy of its users by not collecting data on or tracking them. While this may sound simple, users must weigh the tradeoff between privacy and relevant results when deciding to switch search engines, since results for search queries can be very different when the search engine has no search history to aid in personalization.

Using privacy oriented browsers
Mozilla is known for its commitment to protecting user privacy in Firefox. Mozilla Firefox users can delete the tracking cookie that Google places on their computer, making it much harder for Google to group data. Firefox also has a "Clear Private Data" button that gives users more control over their settings; Internet Explorer users have this option as well. When using a browser like Google Chrome or Safari, users can also browse in "incognito" or "private browsing" mode, respectively. In these modes, the user's browsing history and cookies are not collected.

Opting out
The Google, Yahoo!, AOL, and MSN search engines all allow users to opt out of the behavioral targeting they use. Users can also delete search and browsing history at any time. The Ask.com search engine also has AskEraser, which, when used, purges user data from their servers. Deleting a user's profile and history of data from search engine logs also helps protect user privacy in the event a government agency wants to subpoena it. If there are no records, there is nothing the government can access.

Social network solution
An innovative solution, proposed by researchers Viejo and Castellà-Roca, is a social network solution whereby user profiles are distorted. In their plan, each user would belong to a group, or network, of people who all use the search engine. Every time somebody wanted to submit a search query, it would be passed on to another member of the group to submit on their behalf until someone submitted it. This would ideally lead to all search queries being distributed evenly among all members of the network. This way, the search engine cannot build a useful profile of any individual user in the group, since it has no way to discern which query actually belonged to which user.
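The idea can be illustrated with a toy simulation (the user names and the random-choice submission rule are illustrative assumptions, not the researchers' actual protocol):

```python
import random

def submit_via_group(queries_by_user):
    """Toy sketch of the group-submission idea: each query is submitted
    by a randomly chosen group member rather than its author, so the
    search engine's log cannot be reliably linked to the real querier."""
    users = list(queries_by_user)
    log = []  # what the search engine observes: (submitter, query) pairs
    for author, queries in queries_by_user.items():
        for query in queries:
            submitter = random.choice(users)  # query handed around the group
            log.append((submitter, query))
    return log

# Hypothetical three-person group; names are illustrative only.
group = {"alice": ["flu symptoms"], "bob": ["cheap flights"], "carol": ["tax law"]}
observed = submit_via_group(group)
```

Because the engine sees only (submitter, query) pairs and the submitter is usually not the author, per-user profiles built from the log are scrambled.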

Delisting and reordering
After the Google Spain v. AEPD case, it was established that people have the right to request that search engines delete personal information from their search results, in compliance with other European data protection regulations. This process of removing certain search results is called delisting. While effective in protecting the privacy of people who do not want information about them accessible through a search engine, delisting does not necessarily preserve the contextual integrity of search results. For data that is not highly sensitive or compromising, reordering search results is another option, whereby people could rank how relevant certain data is at a given point in time, which would then alter the results returned when someone searches their name.
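The reordering idea can be sketched as follows (a hypothetical data model in which a person assigns relevance scores to results about them; this is an illustration, not a deployed system):

```python
def reorder(results, relevance):
    """Toy sketch of reordering: rank search results about a person by
    user-assigned relevance scores instead of deleting entries outright.
    Unscored results default to 0.0 and sink to the bottom."""
    return sorted(results, key=lambda r: relevance.get(r, 0.0), reverse=True)

# Hypothetical results and scores, purely illustrative.
results = ["old debt notice", "professional profile", "charity work"]
scores = {"professional profile": 0.9, "charity work": 0.7, "old debt notice": 0.1}
ranked = reorder(results, scores)  # the low-scored debt notice drops last
```

Unlike delisting, every result stays retrievable; only its prominence changes.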

Anonymity networks
A do-it-yourself option for privacy-minded users is to use software like Tor, an anonymity network. Tor functions by encrypting user data and routing queries through thousands of relays. While this process is effective at masking IP addresses, it can slow the speed of results. Studies have also shown that a simulated attacker could still match search queries to users even when anonymized using Tor.
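Tor's layered relay routing can be illustrated with a toy sketch. Real Tor peels a layer of public-key encryption at each relay; here plain base64 encoding stands in for encryption purely to show the layering, and the relay names are hypothetical:

```python
import base64

def wrap(message, relays):
    """Toy onion layering: wrap the message once per relay so each relay
    can strip exactly one layer. Base64 is a stand-in for real encryption."""
    data = message.encode()
    for relay in reversed(relays):  # innermost layer is the last relay's
        data = base64.b64encode(relay.encode() + b"|" + data)
    return data

def route(data, relays):
    """Each relay peels its own layer, learning only the next hop's data."""
    for relay in relays:
        hop, data = base64.b64decode(data).split(b"|", 1)
        assert hop == relay.encode()  # this layer was addressed to this relay
    return data.decode()

relays = ["relay1", "relay2", "relay3"]
packet = wrap("search query", relays)
```

No single relay sees both the user's address and the plaintext query, which is what masks the IP address from the search engine.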

Unlinkability and indistinguishability
Unlinkability and indistinguishability are also well-known solutions to search engine privacy, although they have proven only somewhat effective at actually anonymizing users' search queries. Both types of solution try to decouple search queries from the user who made them, making it impossible for the search engine to definitively link a specific query to a specific user and build a useful profile. This can be done in a couple of different ways.

Unlinkability
An unlinkability solution hides information, such as the user's IP address, from the search engine. This is perhaps the simpler and easier approach, because any user can achieve it with a VPN, although it still does not guarantee total privacy from the search engine.

Indistinguishability
An indistinguishability solution has the user run a plugin or other software that generates multiple different search queries for every real query the user makes. It functions by obscuring the user's real searches so that the search engine cannot tell which queries are the software's and which are the user's. This makes it more difficult for the search engine to use the data it collects on a user to, for example, target ads.
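A minimal sketch of such a plugin's core step might look like this (the decoy pool and query strings are illustrative assumptions, not any real tool's behavior):

```python
import random

def obfuscate(real_query, decoy_pool, n_decoys=3):
    """Toy indistinguishability sketch: mix the real query with randomly
    chosen decoys in random order, so the search engine cannot tell
    which query in the batch the user actually made."""
    batch = random.sample(decoy_pool, n_decoys) + [real_query]
    random.shuffle(batch)
    return batch  # every query in the batch would be submitted to the engine

# Hypothetical decoy pool; a real tool would draw from a much larger corpus.
decoys = ["weather today", "movie times", "news headlines",
          "recipe ideas", "sports scores"]
sent = obfuscate("sensitive topic", decoys)
```

The privacy gained depends heavily on how plausible the decoys are; decoys that are easy to distinguish from genuine queries weaken the scheme.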

Legal rights and court cases
Because the internet and search engines are relatively recent creations, no solid legal framework for privacy protections in terms of search engines has been put in place. However, scholars do write about the implications of existing privacy law to determine what right to privacy search engine users have. As this is a developing field of law, there have been several lawsuits concerning the privacy that search engines are expected to afford their users.

The Fourth Amendment
The Fourth Amendment is well known for the protections it offers citizens from unreasonable searches and seizures, but in Katz v. United States (1967), these protections were extended to cover intrusions of privacy of individuals in addition to simply intrusion of property and people. Privacy of individuals is a broad term, but it is not hard to imagine that it includes the online privacy of an individual.

The Sixth Amendment
The Confrontation Clause of the Sixth Amendment is applicable to the protection of big data from government surveillance. The Confrontation Clause essentially states that defendants in criminal cases have the right to confront witnesses who provide testimonial statements. If a search engine company like Google gives information to the government to prosecute a case, these witnesses are the Google employees involved in the process of selecting which data to hand over to the government. The specific employees who must be available to be confronted under the Confrontation Clause are the producer who decides what data is relevant and provides the government with what they've asked for, the Google analyst who certifies the proper collection and transmission of data, and the custodian who keeps records. The data these employees of Google curate for trial use is then thought of as testimonial statement. The overall effectiveness of the Confrontation Clause on search engine privacy is that it places a check on how the government can use big data and provides defendants with protection from human error.

Katz v. United States
This 1967 case is prominent because it established a new interpretation of privacy under the Fourth Amendment, specifically that people have a reasonable expectation of it. Katz v. United States concerned whether it was constitutional for the government to electronically listen to and record, using a device attached to a public phone booth, a conversation Katz had from that booth. The court ruled that it did violate the Fourth Amendment, because the government's actions constituted a "search" and therefore required a warrant. When thinking about search engine data collected about users, the way telephone communications were classified under Katz v. United States could serve as a precedent for how it should be handled. In Katz v. United States, public telephones were deemed to have a "vital role" in private communications. The case took place in 1967, but today the internet and search engines arguably play that vital role, and people's search queries and IP addresses can be thought of as analogous to private phone calls placed from public booths.

United States v. Miller
This 1976 Supreme Court case is relevant to search engine privacy because the court ruled that when third parties gathered or had information given to them, the Fourth Amendment was not applicable. Jayni Foley argues that the ruling of United States v. Miller implies that people cannot have an expectation of privacy when they provide information to third parties. When thinking about search engine privacy, this is important because people willingly provide search engines with information in the form of their search queries and various other data points that they may not realize are being collected.

Smith v. Maryland
In the Supreme Court case Smith v. Maryland of 1979, the Supreme Court went off the precedent set in the 1976 United States v. Miller case about assumption of risk. The court ruled that the Fourth Amendment did not prevent the government from monitoring who dialed which phone numbers by using a pen register because it did not qualify as a "search".

Both United States v. Miller and Smith v. Maryland have been used to deny users the Fourth Amendment privacy protections for the records that internet service providers (ISPs) keep. This was also articulated in the Sixth Circuit case Guest v. Leis, as well as in United States v. Kennedy, where the courts ruled that Fourth Amendment protections did not apply to ISP customer data because customers willingly provide ISPs with their information simply by using their services. Similarly, the current legal framework around privacy and assumption of risk can be interpreted to mean that users of search engines cannot expect privacy in the data they communicate by using them.

Electronic Communication Privacy Act
The Electronic Communications Privacy Act (ECPA) of 1986 was passed by Congress in an effort to begin building a legal structure for privacy protections in the face of new forms of technology, although it was by no means comprehensive: current technologies raise considerations that Congress in 1986 never imagined and could not have accounted for. The ECPA does little to regulate ISPs and mainly prevents government agencies from gathering information stored by ISPs without a warrant. Unsurprisingly, because it was enacted before internet usage became commonplace, the ECPA says nothing about search engine privacy or the protections users are afforded with respect to their search queries.

Gonzales v. Google Inc.
The background of this 2006 case is that the government was trying to bolster its defense of the Child Online Protection Act (COPA). It was conducting a study of how effective its filtering software was with regard to child pornography. To do this, the government subpoenaed search data from Google, AOL, Yahoo!, and Microsoft to use in its analysis and to show that people search for information that is potentially compromising to children. The search data the government wanted included both the URLs that appeared to users and users' actual search queries. Of the search engines subpoenaed to produce search queries and URLs, only Google refused to comply, even after the request was reduced in size. Google claimed that handing over these logs would amount to handing over personally identifiable information and user identities. The court ruled that Google had to hand over 50,000 randomly selected URLs to the government, but not search queries, because disclosing queries could seed public distrust of the company and thereby compromise its business.

Law of Confidentiality
While not a strictly defined law enacted by Congress, the Law of Confidentiality is common law that protects information shared by a party who has trust in, and an expectation of privacy from, the party they share the information with. If the content of search queries and the logs they are stored in is thought of in the same manner as information shared with a physician, as it is similarly confidential, then it ought to be afforded the same privacy protections.

Google Spain v. AEPD
The European Court of Justice ruled in 2014, in the Google Spain SL v. Agencia Española de Protección de Datos case, that European citizens have a "Right to Be Forgotten", meaning the right to demand that search engines remove certain data collected about them. While this single court decision did not directly establish a general "right to be forgotten", the court interpreted existing law to mean that people have the right to request that some information about them be removed from the search results provided by search engine companies like Google. The background of this case is that a Spanish citizen, Mario Costeja González, sought to erase himself from Google's search results because they revealed potentially compromising information about his past debts. In ruling in favor of Costeja González, the court noted that search engines can significantly impact the privacy rights of many people and that Google controlled the dissemination of personal data. The decision did not hold that all citizens may request that information about them be completely wiped from Google at any time, but rather that there are specific types of information, particularly information obstructing one's right to be forgotten, that need not be so easily accessible on search engines.

General Data Protection Regulation (GDPR)
The GDPR is a European regulation put in place to protect the data and privacy of European citizens, regardless of whether they are physically in the European Union. This means that countries around the globe have had to comply with its rules so that any European citizen residing in them is afforded the proper protections. The regulation became enforceable in May 2018.