Google Personalized Search

Google Personalized Search is a personalized search feature of Google Search, introduced in 2004. All searches on Google Search are associated with a browser cookie record. When a user performs a search, the search results are not only based on the relevance of each web page to the search term, but also on which websites the user (or someone else using the same browser) visited through previous search results. This provides a more personalized experience that can increase the relevance of the search results for the particular user. Such filtering may also have side effects, such as the creation of a filter bubble.

Later changes to Google's search algorithm placed less weight on user data, limiting the impact of personalization on search results. In response to criticism, Google has also made it possible to turn off the feature.

History
Personalized Search was originally introduced on March 29, 2004 as a beta test of a Google Labs project. On April 20, 2005, it was made available as a non-beta service, but still separate from ordinary Google Search. On November 11, 2005, it became part of normal Google Search, but only for users with Google Accounts.

Beginning on December 4, 2009, Personalized Search was applied to all users of Google Search, including those who are not logged into a Google Account.

In addition to customizing results based on personal behavior and interests associated with a Google Account, Google also introduced social search results in October 2009, based on the people a user knows. Operating on the assumption that one's associates share similar interests, these results gave a ranking boost to sites from within a user's "Social Circle". By February 2011, the two services had been integrated into regular results, which were expanded to include content shared by people the user knows through social networks.

Data collection
Google's search personalization is driven by collecting and storing web history in its databases. For users who are not signed in, Google reads the anonymous cookie stored in the user's browser and matches its unique identifier against those stored in Google's databases. For users signed into a Google Account in Google Chrome, Google uses their web history to learn which sites and content they like and bases the search results it presents on that information. Using the data provided by the user, Google constructs a profile including gender, age, languages, and interests, based on prior behavior across Google services.

When a user performs a search, Google uses the keywords or terms to generate ranked results based in part on the PageRank algorithm. According to Google, PageRank is its "system of counting link votes and determining which pages are most important based upon them. These scores are then used along with many other things to determine if a page will rank well in a search." Google further explains: "PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves 'important' weigh more heavily and help to make other pages 'important.' Using these and other factors, Google provides its views on the pages' relative importance."
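The link-voting idea in the quotation can be illustrated with the classic power-iteration formulation of PageRank. The following is a minimal textbook sketch, not Google's production ranking. Assumptions: the damping factor of 0.85 is the value commonly cited from the original PageRank paper, pages with no outgoing links spread their score evenly across all pages, and the function name `pagerank` and the example graph are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Textbook power-iteration PageRank over a small link graph.

    `links` maps each page to the pages it links to, i.e. the pages it
    "votes" for. Scores always sum to 1.
    """
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    pages = sorted(all_pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Every page receives an equal "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Score held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each page splits its current score among the pages it links to,
        # so a vote from a high-scoring page is worth more.
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, page C collects votes from three pages and ends up with the highest score, while page D, which nobody links to, keeps only the baseline random-jump share.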

Since the search division launched the first version with customized search results in 2005 and began to take previously visited sites into account, new factors have been added to refine search results. According to Google, many years of testing have led it to conclude that the search phrase itself, not user data, is by far the best indicator of which results are relevant to the user, and that personalization of search results is not as big a factor as it used to be.

Harvard law professor Jonathan Zittrain disputed the extent to which personalization filters distort Google search results, saying that "the effects of search personalization have been light". Google also lets users turn off personalization features by deleting Google's record of their search history and setting Google not to remember their search keywords and visited links in the future.

Types of data collected
Google uses more than 50 factors, which it calls 'signals', to determine search results. The top factors in personalizing search results are:
 * Location
 * Search history
 * Web history
 * Social networks

Each of these variables factors into the personalization of a user's search results, with the aim of quickly providing the most relevant results for whatever question is being asked.

Location data
Location data allows Google to provide information based on the user's current location and places they have visited in the past, derived from GPS on an Android smartphone or from the user's IP address. Google uses this location data to provide local listings grouped with search results, using the Google Local platform, which features detailed reviews and ratings from Zagat.

Search history
Search history was first used to personalize search results in 2005, based on individual users' previous searches and clicked links. In 2009, Google announced that personalized search would no longer require a user to be logged in; instead, Google would use an anonymous cookie in the web browser to customize search results for users who were not logged in.

Web history
Web history differs from search history in that it is a record of the actual pages a user visits, but it also contributes to the ranking of search results.

Social networks
Google's social networking service, Google+, also collects demographic data about its users, such as age, gender, location, work history, interests, and social connections. Google uses this information in search results, most visibly when presenting reviews and ratings from people within a user's immediate circle.

Effectiveness
To measure the actual impact of search customization on end users, researchers at Northeastern University compared results for logged-in users with those of a control group and found that 11.7% of results show differences due to personalization. The research showed that this figure varies widely by search query and by result ranking position.

In a separate experiment, the Portent team searched for 'JavaScript', then searched for 'Programming Textbooks' and 'Books on HTML' before searching for 'JavaScript' again; this changed the results by bringing in three book listings that were not part of the original set. In the Northeastern study, the two factors with the most measurable impact were whether the user was logged in with a Google account and the IP address of the searching user. The same study also examined the 11.7% personalization figure by comparing results obtained through Amazon Mechanical Turk (AMT), a crowdsourcing Internet marketplace that is part of Amazon Web Services, against a control group. The results showed that top-ranked URLs are less likely to change because of personalization, and that most personalization takes place at lower ranks of the results pages.
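The idea of measuring personalization as the share of results that differ from a control, broken down by ranking position, can be sketched as follows. This is a deliberately simplified position-by-position comparison under stated assumptions, not the Northeastern study's actual methodology: the helper names `position_differences` and `personalization_rate` are hypothetical, and the URLs in the example are made up.

```python
from collections import defaultdict


def position_differences(personalized: list[str], control: list[str]) -> list[bool]:
    """For each rank, True if the two result lists show a different URL there.

    Crude by design: a URL that merely moves down one slot counts as a
    difference at both positions.
    """
    length = max(len(personalized), len(control))
    diffs = []
    for i in range(length):
        a = personalized[i] if i < len(personalized) else None
        b = control[i] if i < len(control) else None
        diffs.append(a != b)
    return diffs


def personalization_rate(pairs):
    """Return the overall share of differing results and a per-rank breakdown.

    `pairs` is an iterable of (personalized_results, control_results) for the
    same query issued by a test account and a control account.
    """
    total = changed = 0
    per_rank_total = defaultdict(int)
    per_rank_changed = defaultdict(int)
    for personalized, control in pairs:
        diffs = position_differences(personalized, control)
        for rank, differs in enumerate(diffs, start=1):
            total += 1
            per_rank_total[rank] += 1
            if differs:
                changed += 1
                per_rank_changed[rank] += 1
    overall = changed / total if total else 0.0
    by_rank = {r: per_rank_changed[r] / per_rank_total[r] for r in sorted(per_rank_total)}
    return overall, by_rank


# Two hypothetical queries, each seen by a test account and a control account.
pairs = [
    (["a.com", "b.com", "c.com", "d.com"], ["a.com", "b.com", "x.com", "y.com"]),
    (["p.com", "q.com", "r.com", "s.com"], ["p.com", "q.com", "r.com", "t.com"]),
]
overall, by_rank = personalization_rate(pairs)
print(overall)   # 0.375
print(by_rank)   # {1: 0.0, 2: 0.0, 3: 0.5, 4: 1.0}
```

In this made-up data, the top two positions never change while the lower positions do, which is the kind of per-rank pattern the study reports, although the numbers here carry no empirical weight.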

Reception
Several concerns have been raised about the feature. It decreases the likelihood of finding new information, since it biases search results toward what the user has already found. It also introduces privacy problems, since a user may not be aware that their search results are personalized, and it affects the search results of other people who use the same computer (unless they are logged in as a different user). The feature also has profound effects on the search engine optimization (SEO) industry: because search results are not ranked the same way for every user, it is more difficult to identify the effects of SEO efforts. Since personalization makes the search experience inconsistent across users, the SEO industry must track both personalized and non-personalized search results when trying to improve rankings.

Personalized search can add considerable background noise to search results. One source is the carry-over effect, in which one search is followed by another and the second search is influenced by the first if the timeout period is not set at a high enough threshold. For example, a search for a store in Hawaii could carry over results from a previous, failed search that returned the same store in California, creating noise.
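The carry-over effect can be illustrated with a toy model in which the terms of the previous query are blended into the current one whenever the two fall inside a session window. Everything here is hypothetical and chosen only to show the effect: the `Query` type, the `window_s` parameter, the word-overlap scoring, and the example documents are assumptions, not a description of how Google handles sessions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    text: str
    timestamp: float  # seconds since some reference point


def effective_terms(current: Query, previous: Optional[Query], window_s: float) -> set[str]:
    """Terms used for ranking: the current query, plus the previous query's
    terms if it was issued within the (hypothetical) session window."""
    terms = set(current.text.lower().split())
    if previous is not None and 0 <= current.timestamp - previous.timestamp <= window_s:
        terms |= set(previous.text.lower().split())
    return terms


def rank(documents: dict[str, str], terms: set[str]) -> list[str]:
    """Toy ranking: order documents by how many query terms they contain."""
    def score(doc_id: str) -> int:
        return len(set(documents[doc_id].lower().split()) & terms)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(documents, key=lambda d: (-score(d), d))


docs = {
    "store-hawaii": "acme store honolulu hawaii",
    "store-california": "acme store los angeles california",
}
previous = Query("acme store los angeles california", timestamp=0)
current = Query("acme store hawaii", timestamp=30)

# Wide window: the earlier California search leaks into the Hawaii search.
print(rank(docs, effective_terms(current, previous, window_s=300)))
# Narrow window: only the current query counts.
print(rank(docs, effective_terms(current, previous, window_s=10)))
```

With the wide window, the California listing outranks the Hawaii one even though the user asked about Hawaii; with the narrow window, the previous query is ignored and the Hawaii listing comes first.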

However, more recent research suggests that search engines do not create filter bubbles to the extent previously thought. In a study of the political impact of search engines in seven countries, carried out at Michigan State University, researchers found that search engines complemented the other news sources people already used. Users consulted an average of 4.5 news sources across various media to form an understanding, and those with a specific interest in politics consulted even more. The researchers noted that filter bubbles sound like a real problem, but that they mostly seem to apply to other people rather than to oneself. They concluded that the problem is overblown, that the evidence for it is anecdotal, and that the empirical evidence produced by their study does not show search engines contributing to the creation of filter bubbles.