Talk:Collaborative filtering

Add history and characteristics of Collaborative filtering
We could introduce how collaborative filtering techniques were developed and have evolved over time, and what characterizes collaborative filtering compared with content-based recommendation.

Add a Limitation Section
We could add a section to introduce the limitations of collaborative filtering, such as the "cold start" problem for new users or new items, the sparsity problem, etc. We can also discuss how they might be solved.

Adding an Applications Section
I think we can add a section "Applications" to introduce how the collaborative filtering techniques are applied in different products and websites. — Preceding unsigned comment added by Nova2358 (talk • contribs) 00:05, 22 March 2012 (UTC)

Innovations in Collaborative Filtering
This section seems more like a tentative opinion piece. DanielLemire (talk) 20:21, 25 August 2009 (UTC)

AlexLit
AlexLit.com is not a collaborative filtering company or product but rather a bookstore that sells eBooks. I think it should be removed.


 * Actually, AlexLit.com has a recommendation system that uses collaborative filtering. Look at the left-hand side of the page for 'Recommender' under Departments.

Barnes & Noble
On the other hand, where is the collaborative filtering on Barnes & Noble?
 * "B&N Customers who bought this book also bought...". See an example.

Analysis of applications

 * 1) Audiobaba         Alexa rank = 760,909      Possibly commercial
 * 2) FilmAffinity      Alexa rank = 18,305       Commercial
 * 3) KindaKarma        Alexa rank = 2,780,513    And Commercial
 * 4) MovieCritic       Alexa rank = -            Now Closed from Macromedia so clearly was commercial
 * 5) Musicmobs         Alexa rank = 193,467      And Commercial
 * 6) MyStrands         Alexa rank = 153,177      Commercial
 * 7) Popularism        Alexa rank = NO DATA      Not working
 * 8) Rate Your Music   Alexa rank = 11,012       Commercial


 * Speaking of Alexa ranks... Why is Everyone's a Critic (AR 3,392,523) listed, but not Criticker (AR 89,981) or Movielens (AR 916,562)...? --88.193.78.58 (talk) 17:32, 7 June 2009 (UTC)

For Wikipedia
Some enterprising graduate may want to create a tool that recommends Wikipedia articles you would like to edit based on the set of edits made by users.


 * Heh, see SuggestBot. -- ForteTuba 00:14, 8 April 2006 (UTC)

Difference bet. "passive filtering" and "implicit filtering"?
What really is the difference between "passive filtering" and "implicit filtering"? --Amit 20:22, 18 September 2006 (UTC)

The difference is subtle: passive filtering checks what you do, while implicit filtering is more about converting actions into data. That is, the first looks at your browsing, while the second assumes that what you buy is what you like. | french user Leafar

They seem like the same thing, I think the sections should be merged. --Robbrown 23:23, 24 October 2006 (UTC)


 * Only if there are sufficient references stating that they are the same would a merger be recommended. At this time the sections could probably be made clearer. --Amit 00:01, 25 October 2006 (UTC)

Reading this article, I really can't see any difference between the two. It's very confusing! -- orangejon

Just as a passerby, I think the section on active vs. passive and implicit vs. explicit is a little off. Active filtering does not imply explicit voting, and passive filtering does not imply implicit voting. Rather, passive filtering is where one collects metadata about an item and ranks it accordingly (e.g., everyone buys this book, so you should, too). Active filtering is where one collects data on each user and then compares users to each other (or items, if it's an item-centric CF). The way you collect the information, either explicitly or implicitly, is not directly linked to the filtering method. For example, one could use product purchases in an active filtering way, or take votes for articles in a passive way (think of Digg). Just my two cents.

These are the same thing. See my comments on this under the "Cleanup" section. --Donn 30 April 2008, 14:07:29 (UTC+0200) —Preceding unsigned comment added by 83.180.79.78 (talk) 14:08, 30 April 2008 (UTC)

Introduction
The first sentence in the introduction is patent (but amusing) nonsense. "The growth of the Internet has made it much more difficult to effectively extract useful information from all the available online information." Akselx (talk) 20:21, 11 September 2016 (UTC)

The animated GIF is really annoying when you are trying to read the article. Please make it stop! 209.6.95.251 (talk) 21:39, 7 February 2017 (UTC)

Broken link
"A collection of past and present "information filtering" projects (including collaborative filtering) at MIT Media Lab" is broken - anyone fancy figuring out whether the same content is now elsewhere?

Cleanup
This article needs a cleanup, along with the ones on information filtering systems and recommender systems. There is too much overlap between these articles. —Preceding unsigned comment added by IKiddo (talk • contribs) 22:29, 11 September 2007 (UTC)


 * I agree. A brief appraisal of relevant literature shows that there are two broad types of CF - user-based and item-based. Although item-based is shown as one of the "Types" (a list which seems to contrast factors of CF which are not really comparable), user-based is not, and is instead discussed in the introduction as if it is the default method. User-based finds users that are similar to the active user, and then attempts to predict ratings for items which the active user has not rated but the similar users have. Item-based CF, on the other hand, finds similarly rated items and, for an active user, recommends unrated items in this group. Think of user-based CF as clustering similar users, and item-based CF as clustering similar items. ref - page 144


 * The article also blurs the distinction between CF techniques (e.g. item-based vs. user-based) and the sources of CF data (implicit ratings vs. explicit ratings). These should not be compared this way, as they are on inherently different levels. ref - page 17


 * As for implicit versus explicit, and active versus passive, these are the same concept. Implicit CF collects the ratings data without having the user explicitly rate the items. That means exactly what is said in the current article. The implicit ratings are inferred from some user interaction such as purchasing an item, clicking on search results, etc. Explicit CF collects ratings by having users explicitly rate items, such as films (Netflix) and books (Amazon). The implications are that explicit data is usually more accurate, but the users have to spend the time and energy to make the recommender system work well. --Donn 30 April 2008, 15:58:03 (UTC+0200) —Preceding unsigned comment added by 83.180.79.78 (talk) 14:06, 30 April 2008 (UTC)
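
For anyone following this thread, here is a minimal sketch of the user-based versus item-based distinction described above. It is only an illustration under stated assumptions: the toy `ratings` data, the function names, and the choice of plain cosine similarity (with missing ratings treated as zero) are all hypothetical and are not taken from the article or the cited references.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not from any real dataset.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5},
    "cat": {"A": 1, "B": 2, "C": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def transpose(r):
    """Turn user -> item -> rating into item -> user -> rating."""
    out = {}
    for user, items in r.items():
        for item, value in items.items():
            out.setdefault(item, {})[user] = value
    return out

def predict_user_based(r, user, item):
    """Weight other users' ratings of `item` by their similarity to `user`."""
    pairs = [(cosine(r[user], r[o]), r[o][item])
             for o in r if o != user and item in r[o]]
    total = sum(abs(s) for s, _ in pairs)
    return sum(s * x for s, x in pairs) / total if total else None

def predict_item_based(r, user, item):
    """Weight the user's own ratings by each rated item's similarity to `item`."""
    by_item = transpose(r)
    pairs = [(cosine(by_item[item], by_item[j]), x)
             for j, x in r[user].items() if j != item]
    total = sum(abs(s) for s, _ in pairs)
    return sum(s * x for s, x in pairs) / total if total else None
```

Both functions read the same ratings; they differ only in whether similarity is computed between rows (users) or columns (items). That is the sense in which user-based versus item-based is a choice of technique, separate from how the ratings were collected.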

Addition of introductory paragraph on the definition of CF
I wanted to explain why I wrote an introductory paragraph. The term collaborative filtering is in fact a general term, like planning and scheduling or robotics. It includes recommender systems, and they are very popular at the moment, however it seems insufficiently general to define CF without a broader inclusion of other applications.

Cindy Mason www-formal.cs.stanford.edu/~cmason —Preceding unsigned comment added by Cindymason (talk • contribs) 21:29, 27 December 2007 (UTC)

It's a good introductory paragraph, and an excellent point (I know someone who is looking at using collaborative filtering to find relationships among diseases). However, the rest of the article is still written as if the term only applied to recommender systems; in fact, there are two introductory paragraphs. The article needs to be rewritten to incorporate this information. Tritium h3 (talk) 19:24, 27 February 2008 (UTC)

Cleanup References to Software Libraries
I just completed an evaluation of some of the available tools to aid in CF. I used the list provided here as a starting point, but quickly learned that a few of the open source tools are no longer being actively supported (in particular, both Cofi and Cofe are not being supported). I was wondering if someone could either remove these or potentially add a note that they are unsupported open source tools? —Preceding unsigned comment added by Cjpeltz (talk • contribs) 13:07, 10 August 2008 (UTC)

Confusing taxonomy
The taxonomy of CF in this article is rather confusing and is, I think, inconsistent.

I would suggest two possible taxonomies:
 * J. S. Breese et al. (ref) categorized CF algorithms into two classes, either memory-based or model-based. The former is relatively slow and is suitable for small or medium scale computation; while the latter needs additional training time but offers rapid yet precise prediction (e.g., by using a Bayesian network).
 * B. Sarwar et al. (ref) suggested item-based CF and attributed the old CF algorithms as "user-based".

A detailed "Algorithms" section is also probably needed, as CF is strongly related to machine learning and data mining. Merging with Recommender systems can be considered as a subtask of this process. I may set about doing this later. --Allenchue (talk) 02:42, 12 January 2009 (UTC)

Article header is unclear, mechanical
The cold copy of the present article header fails to make it clear that the subject involves computer software, rather than human beings. CF is as old as one fisherman asking another where the big ones are. The header is not only unclear but unnecessarily full of jargon and academe-speak -- the topic is not all that complicated to explain in a more prosaic way. Maybe explain so that even an ad exec could understand. Twang (talk) 23:11, 22 September 2009 (UTC)

Na yah. There is a concept and this is not the fisherman. (A try to comment) —Preceding unsigned comment added by 84.75.166.62 (talk) 00:47, 30 May 2010 (UTC)

Apparently it didn't work (the comment).

I agree that the header is pretty off. CF is just a mathematical approach, but not at all that simple. —Preceding unsigned comment added by 84.75.166.62 (talk) 00:52, 30 May 2010 (UTC)

econophysics
I would also look for the roots of collaborative filtering in econophysics papers, not to forget game theory.

Where CF is just a representation of a given state. —Preceding unsigned comment added by 84.75.166.62 (talk) 00:23, 30 May 2010 (UTC)

CF in matrixes
The two-step description is simply wrong.

At least in theory. All you need is the matrix (users,objects). Then the CF will simply evaluate the "overlap". —Preceding unsigned comment added by 84.75.166.62 (talk) 00:32, 30 May 2010 (UTC)
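
One possible reading of the "overlap" idea, sketched under assumptions: the binary matrix `M`, the function names, and the scoring rule below are hypothetical illustrations, not something stated in the article or in the comment above. Here the overlap between two users is the number of objects they share, which is the corresponding entry of the matrix multiplied by its own transpose.

```python
# Hypothetical binary user-object matrix: rows are users, columns are objects,
# and 1 means the user collected or liked the object.
M = [
    [1, 1, 0, 1],  # user 0
    [1, 1, 0, 0],  # user 1
    [0, 0, 1, 1],  # user 2
]

def overlap(matrix):
    """Return user-user overlap counts, i.e. matrix multiplied by its transpose."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
            for row_i in matrix]

def scores(matrix, user):
    """Score each object the user lacks by the summed overlap of users who have it."""
    w = overlap(matrix)[user]
    return {
        obj: sum(w[v] * matrix[v][obj] for v in range(len(matrix)) if v != user)
        for obj in range(len(matrix[0]))
        if matrix[user][obj] == 0
    }
```

In this sketch, computing the overlap and then scoring objects are still two distinguishable operations, even though both are derived from the single (users, objects) matrix.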

Coverage in Recommender system
Recommendation_system has significant coverage of CF. I think some information there is not present here. --Chealer (talk) 13:59, 16 January 2012 (UTC)

GA fail
Apart from the summary, the rest of the article does not appear to have any copyright issues. As the summary is meant to reflect the content of the article, this should be an easy fix. Cheers! Encycloshave (talk) 16:36, 9 May 2012 (UTC)

Error in formula
In the "memory based" section constant k is denoted as 1/sum(abs(sim)) but I believe it should be n/sum(abs(sim)). Someone with better knowledge of this area should however confirm before updating the article. — Preceding unsigned comment added by JohanLand (talk • contribs) 17:43, 12 August 2012 (UTC)
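
A small numeric check may help whoever confirms this. It assumes the formula in question is the mean-centred weighted-sum aggregation, in which the prediction is the active user's mean rating plus k times the similarity-weighted sum of the neighbours' deviations from their own means. The function name, the `scale_by_n` flag, and the neighbour values are hypothetical and chosen only for illustration.

```python
def predict(mean_u, neighbours, scale_by_n=False):
    """Mean-centred weighted-sum prediction.

    neighbours: list of (similarity, neighbour_rating, neighbour_mean).
    scale_by_n switches the normalizer from 1/sum|sim| to n/sum|sim|.
    """
    total = sum(abs(s) for s, _, _ in neighbours)
    k = (len(neighbours) if scale_by_n else 1.0) / total
    return mean_u + k * sum(s * (r - m) for s, r, m in neighbours)

# Every hypothetical neighbour rated the item exactly one point above their own mean.
neighbours = [(0.9, 4.0, 3.0), (0.5, 5.0, 4.0), (0.2, 3.0, 2.0)]

with_one = predict(3.0, neighbours)                   # k = 1 / sum|sim|
with_n = predict(3.0, neighbours, scale_by_n=True)    # k = n / sum|sim|
```

With k = 1/sum(abs(sim)) the weights sum to one in absolute value, so the prediction is a weighted average of the neighbours' deviations and lands one point above the active user's mean. With n/sum(abs(sim)) the deviation is multiplied by the number of neighbours and can leave the rating scale. Under the assumption above, this suggests the 1/sum form is the intended normalizer.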

Impact of recent student edits
This article has recently been edited by students as part of their course work for a university course. As part of the quality metrics for the education program, we would like to determine what level of burden is placed on Wikipedia's editors by student coursework.

If you are an editor of this article who spent time correcting edits to it made by the students, please tell us how much time you spent on cleaning up the article. Please note that we are asking you to estimate only the negative effects of the students' work. If the students added good material but you spent time formatting it or making it conform to the manual of style, or copyediting it, then the material added was still a net benefit, and the work you did improved it further. If on the other hand the students added material that had to be removed, or removed good material which you had to replace, please let us know how much time you had to spend making those corrections. This includes time you may have spent posting to the students' talk pages, or to Wikipedia noticeboards, or working with them on IRC, or any other time you spent which was required to fix problems created by the students' edits. Any work you did as a Wikipedia Ambassador for that student's class should not be counted.

Please rate the amount of time spent as follows:
 * 0 -No unproductive work to clean up
 * 1 - A few minutes of work needed
 * 2 - Between a few minutes and half an hour of work needed
 * 3 - Half an hour to an hour of work needed
 * 4 - More than an hour of work needed

Please also add any comments you feel may be helpful. We welcome ratings from multiple editors on the same article. Add your input here. Thanks! -- LiAnna Davis (WMF) (talk) 19:46, 27 May 2012 (UTC)

Substantial, unattributed text copied from a journal paper.
It looks like almost all of the material added to this article on 26 March 2012 was copied directly from a review paper. [http://www.hindawi.com/journals/aai/2009/421425/ Xiaoyuan Su and Taghi M. Khoshgoftaar, “A Survey of Collaborative Filtering Techniques,” Adv. in Artificial Intelligence, vol. 2009, Article ID 421425, 2009. ]

In particular, the paragraphs beginning with the following lines all include several sentences taken directly from the paper, with no attribution and at most extremely minor changes. "In practice, many commercial recommender systems. . ." "As the numbers of users and items grow. . ." "For example, the seemingly different items. . ." "Gray sheep refers to the users whose. . ."

Although the publisher of the journal may raise suspicions, as far as I can tell it was originally published in 2009, and seems to be the legitimate origin of this text. The original is published under a CC Attribution license, so in principle I suppose it could be legitimately included here; however, proper attribution seems complicated.

As someone without much experience editing wikipedia articles, I'm hesitant to remove the offending material wholesale, and would welcome suggestions from seasoned editors about how best to proceed.

HiramJ (talk) 02:26, 7 December 2012 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to 5 external links on Collaborative filtering. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive http://web.archive.org/web/20120606225352/http://www.redbeemedia.com:80/insights/integrated-approach-tv-vod-recommendations to http://www.redbeemedia.com/insights/integrated-approach-tv-vod-recommendations
 * Added archive http://web.archive.org/web/20131019134152/http://uai.sis.pitt.edu/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=231&proceeding_id=14 to http://uai.sis.pitt.edu/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=231&proceeding_id=14
 * Added archive http://web.archive.org/web/20101023032716/http://research.yahoo.com:80/pub/2435 to http://research.yahoo.com/pub/2435
 * Added archive http://web.archive.org/web/20080602151647/http://ieeexplore.ieee.org:80/xpls/abs_all.jsp?arnumber=1423975 to http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423975
 * Added archive http://web.archive.org/web/20101023032716/http://research.yahoo.com:80/pub/2435 to http://research.yahoo.com/pub/2435

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.—cyberbot II  Talk to my owner :Online 23:23, 2 March 2016 (UTC)

New research progress in collaborative filtering
The Wikipedia article for collaborative filtering (CF) lacks content about recent research progress in this field. We suggest citing a 2014 paper entitled "Collaborative Filtering beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges". In this survey, Shi et al. note that the user-item (U-I) matrix provides a basis for collaborative filtering techniques; in recent years, however, a large body of research on CF algorithms has benefited from other data sources beyond the U-I matrix. These data consist of additional information about users or items (e.g. social links, tags, and comments), as well as information about interactions between users and items. The authors present an introduction to the recommendation algorithms that have been developed by incorporating auxiliary data sources. Citing this paper would not only demonstrate the importance of the topic and of this Wikipedia article, but also improve the article's quality by making it more comprehensive and up to date. 01:12, 26 February 2017 (UTC)Xianteng.


 * Great idea and very good reference, Xianteng Rostaf (talk) 14:54, 28 February 2017 (UTC)

Adding Context-aware collaborative filtering
This article lacks discussion of recent state-of-the-art techniques that can tackle real-world problems. In recent years, with the availability of different data sources that reflect various dimensions of users' preferences, the traditional user-item matrix representation has been extended to higher-order tensor representations. In a common scenario, a third-order tensor has three dimensions: users, items, and context. The context dimension usually represents the time, location, or device associated with the target users for whom the recommendation is made. It has been shown that leveraging this additional aspect can significantly increase the effectiveness of collaborative filtering recommendations. Therefore, in my opinion, adding context-aware collaborative filtering to this article, alongside a brief introduction to a common technique in this field such as tensor factorization, would make the article more useful in practice. However, before discussing tensor factorization, I suggest adding a subsection on the matrix factorization approach, which is used both to reduce the high dimensionality of the user-item matrix and to handle data sparsity. Compared with tensor factorization, this topic is easier for readers to understand. Besides, I think there should be more discussion of item-based and user-based collaborative filtering. The pros and cons of these two important variations of CF should be compared, with more detail on how they can be modified to handle the cold start problem in collaborative filtering.

Sdjavadi (talk) 03:51, 26 February 2017 (UTC)

I've added a more concrete paragraph about matrix factorization and its role in collaborative filtering. I also created a new section about context-aware collaborative filtering. Sdjavadi (talk) —Preceding undated comment added 21:42, 11 April 2017 (UTC)
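
For readers of this thread who want a concrete picture of the matrix factorization idea discussed above, here is a minimal sketch. It is not the method described in the article: it assumes plain stochastic gradient descent on squared error with L2 regularization, and the function names, hyperparameters, and toy ratings are all hypothetical. Tensor factorization with a context dimension is not shown.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, steps=2000,
              lr=0.01, reg=0.02, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating.

    ratings: list of (user_index, item_index, rating) for observed entries only.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(rank):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating for user u and item i, including unobserved pairs."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Hypothetical sparse observations: 6 of the 9 user-item cells are known.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(observed, n_users=3, n_items=3)
```

Only the observed cells enter the loss, which is how this approach copes with sparsity, and each user and item is described by just `rank` numbers, which is the dimensionality reduction mentioned above.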

Black sheep
This is a very broad statement that is doubtful. There are other approaches to recommendation, based on modeling preferences by direct entry instead of by observation. Therefore calling this an acceptable failure is an excuse rather than a neutral assessment. It would be correct to say it is a drawback and limitation that is to be expected from the approach.

Furthermore, the terms black and gray sheep have doubtful connotations that do not fit a scientific description.

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Collaborative filtering. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20120422120844/http://www.readwriteweb.com/archives/collaborative_filtering_social_web.php to http://www.readwriteweb.com/archives/collaborative_filtering_social_web.php

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 17:01, 4 September 2017 (UTC)

Conclusion of Memory-based section needs improvement
The conclusion of the Memory-based section is dubious. Most of these conclusions seem implementation specific, while much of the previous discussion is general. As a key example, the statement "Although it can efficiently handle new users because it relies on a data structure" is not particularly meaningful or necessarily true. — Preceding unsigned comment added by Ivirshup (talk • contribs) 06:18, 17 June 2020 (UTC)

Wikipedia Ambassador Program course assignment
This article is the subject of an educational assignment supported by the Wikipedia Ambassador Program.

The above message was substituted from by PrimeBOT (talk) on 15:56, 2 January 2023 (UTC)