Thursday, October 21, 2010

Collaborative Filtering

"If a user A has liked the movies "Matrix" and "The Lord of the Rings", and many other users who liked these two movies also liked "Memento", then it is likely that "Memento" will be recommended to user A."

Collaborative Filtering is a widely implemented type of recommender system, and it is generally regarded as giving more accurate predictions than other approaches.

The basic idea of the algorithms in the collaborative filtering area is to provide recommendations based on what people with similar taste have liked in the past. These people, the neighbors, are selected by comparing the user's past preferences (usually represented as ratings on items) with those of other users. So, by measuring the similarity between ratings, it's possible to recommend items liked by the neighborhood. There are two major techniques for comparing ratings: user-based and item-based.


Let us consider a user as an N-dimensional vector of ratings, where each cell represents the user's rating of a certain item. Then, in the user-based technique, the similarity between a user A and a user B is measured by the similarity between their vectors. It is possible, for example, to apply the Pearson correlation to the two vectors and obtain what we call the similarity coefficient between A and B (Sim AB). Once we have done this, we can rank each item according to how many similar users have liked it.
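The user-based steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the toy ratings dictionary, the 1-5 scale, the similarity threshold `min_sim=0.5` and the "liked" threshold `like=4` are all hypothetical choices. Pearson correlation is computed only over the items both users have rated, and unseen items are ranked by how many sufficiently similar users liked them, mirroring the Matrix / Lord of the Rings / Memento example at the top of this post.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "A": {"Matrix": 5, "Lord of the Rings": 5, "Titanic": 1},
    "B": {"Matrix": 5, "Lord of the Rings": 4, "Titanic": 2, "Memento": 5},
    "C": {"Matrix": 4, "Lord of the Rings": 5, "Titanic": 1, "Memento": 4},
    "D": {"Matrix": 1, "Lord of the Rings": 2, "Titanic": 5, "Ghost": 5},
}

def pearson(ra, rb):
    """Pearson correlation (Sim AB) computed over the items both users rated."""
    common = [i for i in ra if i in rb]
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ra[i] for i in common]
    ys = [rb[i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / (sx * sy)

def recommend_user_based(target, ratings, min_sim=0.5, like=4):
    """Rank items the target has not rated by how many similar users liked them."""
    counts = {}
    for other, their in ratings.items():
        # Skip the target itself and users who are not similar enough (hypothetical threshold)
        if other == target or pearson(ratings[target], their) < min_sim:
            continue
        for item, r in their.items():
            if item not in ratings[target] and r >= like:
                counts[item] = counts.get(item, 0) + 1
    # Most-liked first; ties broken alphabetically
    return sorted(counts, key=lambda i: (-counts[i], i))
```

Here user A correlates strongly with B and C and negatively with D, so `recommend_user_based("A", ratings)` returns `["Memento"]`. Many systems weight each neighbor's rating by its similarity instead of simply counting, but counting matches the description above.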


In the item-based method, instead of comparing the vectors of users, we compare items. Suppose that all the user vectors are stacked together in a matrix. It is then possible to look at its columns instead of its rows: if the rows represent the users, then the columns represent the items. The basic idea of the algorithm consists in comparing the columns of this matrix, thus finding the similarity between item vectors. The prediction is then made by selecting items similar to the ones that the target user has preferred.
This approach was implemented by Amazon, and it is described in "Amazon.com Recommendations: Item-to-Item Collaborative Filtering" by Greg Linden, Brent Smith, and Jeremy York.
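The item-based idea of comparing columns can be sketched as follows. This is a hedged illustration under assumptions of my own: the matrix values are hypothetical, 0 stands for "not rated", and plain cosine similarity is used between item columns (Sarwar et al. also compare other measures, such as adjusted cosine and correlation). An unrated item's score is the sum of its similarities to the items the user liked (rating of at least the hypothetical threshold `like=4`).

```python
from math import sqrt

# Hypothetical user x item rating matrix: rows are users, columns are items,
# and 0 means "not rated" (an assumption that lets us use plain cosine similarity).
ITEMS = ["Matrix", "Lord of the Rings", "Memento", "Titanic", "Ghost"]
MATRIX = [
    [5, 5, 0, 1, 0],  # user A
    [5, 4, 5, 2, 0],  # user B
    [4, 5, 4, 1, 0],  # user C
    [1, 2, 0, 5, 5],  # user D
]

def column(matrix, j):
    """The item vector: every user's rating of item j."""
    return [row[j] for row in matrix]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_item_based(matrix, names, user, like=4):
    """Rank the user's unrated items by their summed similarity to the items the user liked."""
    row = matrix[user]
    liked = [k for k, r in enumerate(row) if r >= like]
    scores = {
        names[j]: sum(cosine(column(matrix, j), column(matrix, k)) for k in liked)
        for j, r in enumerate(row)
        if r == 0
    }
    return sorted(scores, key=lambda name: -scores[name])
```

For user A (row 0), `recommend_item_based(MATRIX, ITEMS, 0)` ranks "Memento" ahead of "Ghost", because the Memento column closely resembles the Matrix and Lord of the Rings columns. Since item-to-item similarities depend only on the matrix, they can be computed ahead of time; the Amazon paper builds such a similar-items table offline for scalability.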


Monday, October 18, 2010

Recommender Systems

"Suggest new items that fit the user’s preference."


The increasing amount of information on the web has driven advances in recommender systems research.
These systems help users by offering them useful suggestions. The aim of Recommender Systems is to provide personalized recommendations, and they play a fundamental role in e-commerce (they are widely used by companies such as Amazon, Netflix and Google).
They highlight items that users have not yet seen and may appreciate. Such items include books, restaurants, web pages or even lifestyles. A suggestion is usually based on the user's historical preferences.
These preferences may be collected implicitly or explicitly. When a user buys an item or visits a web page, for example, they are giving implicit preference feedback. When a user rates an article, they are providing explicit feedback.
A substantial challenge in this area is the volume of information available. A recommendation algorithm may have to deal with enormous data sets, so scalability and effectiveness have to be taken into account.
These systems are usually classified based on how recommendations are made. The most common categories are: Collaborative Filtering, Content-Based Filtering and Hybrid Filtering.

Recommender Systems Classification 

  • Content-Based Filtering
"If a user has liked the movie "Titanic", a typical recommendation would be "Ghost", because both items share the feature "Romance category"."

This approach relates the user's past preferences to content information. It therefore identifies items of interest to a user based on the features associated with those items.
The content-based technique creates a user profile from the user's preferences on items. This profile is then compared with information about other items. The items most similar to the profile, that is, those whose descriptions are closest to it, are then recommended to the user.
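A content-based recommender along these lines can be sketched as follows. The item features (genres), the profile built by counting the features of the liked items, and the cosine comparison between that profile and each item's binary feature vector are illustrative assumptions; real systems often use richer descriptions, such as weighted keywords extracted from text.

```python
from collections import Counter
from math import sqrt

# Hypothetical item descriptions: item -> set of features (here, genres)
features = {
    "Titanic": {"Romance", "Drama"},
    "Ghost": {"Romance", "Fantasy"},
    "Matrix": {"Action", "Sci-Fi"},
    "Memento": {"Thriller", "Mystery"},
}

def build_profile(liked_items):
    """User profile: how often each feature appears among the liked items."""
    return Counter(f for item in liked_items for f in features[item])

def similarity(profile, item_features):
    """Cosine similarity between the profile counts and a binary item feature vector."""
    dot = sum(profile[f] for f in item_features)  # Counter returns 0 for missing features
    norm_p = sqrt(sum(c * c for c in profile.values()))
    norm_i = sqrt(len(item_features))
    return dot / (norm_p * norm_i) if norm_p and norm_i else 0.0

def recommend_content_based(liked_items):
    """Rank items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked_items)
    candidates = [i for i in features if i not in liked_items]
    return sorted(candidates, key=lambda i: -similarity(profile, features[i]))
```

For a user who liked "Titanic", "Ghost" scores 0.5 because the two share the Romance feature, while "Matrix" and "Memento" score 0. This also shows the lack of originality discussed under Hybrid Filtering: nothing outside the profile's features can rank highly.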

  • Collaborative Filtering
"If a user A has liked the movies "Matrix" and "The Lord of the Rings", and many other users who liked these two movies also liked "Memento", then it is likely that "Memento" will be recommended to user A."

The second technique recommends an item by comparing a user with a neighborhood of other users. The collaborative filtering method starts from the set of ratings a user has given to items; it then identifies a group of other users and compares the first user's ratings with the ratings of the group.

  • Hybrid Filtering
Each of these two major recommender techniques has its own strengths and weaknesses. Collaborative filtering doesn't need much information about the items. However, it has problems when a new item is inserted into the system, since no one has rated it yet. Although the content-based method does not have this new-item problem, its recommendations can lack originality: the recommended items are very similar to the ones the user has already seen, leading to uncreative suggestions. In addition, this method works well for items with good textual descriptions, but it can fail with multimedia items that lack such descriptions.
A hybrid approach is therefore proposed to overcome these problems. Hybrid Filtering combines the two methods above to mitigate the problems of each. In "Hybrid Recommender Systems: Survey and Experiments", Robin Burke presents many ways of combining the two approaches into a hybrid method.
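Burke's survey names weighted and switching hybrids among several combination strategies. The sketch below illustrates just those two ideas under stated assumptions: a weighted blend of a collaborative score and a content-based score, with a switch to the content-based score alone when an item has too few ratings (the new-item problem described above). The function name, the weight `alpha=0.7` and the cutoff `min_ratings=5` are hypothetical, and both input scores are assumed to already be on a comparable 0-1 scale.

```python
def hybrid_score(cf_score, content_score, num_ratings, alpha=0.7, min_ratings=5):
    """Blend collaborative and content-based scores for one item.

    Switching: items with too few ratings (the new-item problem) use the
    content-based score alone. Weighted: otherwise, a weighted average is used.
    """
    if num_ratings < min_ratings:
        return content_score
    return alpha * cf_score + (1 - alpha) * content_score

# Hypothetical per-item inputs: (collaborative score, content score, number of ratings)
candidates = {
    "Memento": (0.9, 0.2, 120),
    "Ghost": (0.4, 0.8, 80),
    "New Release": (0.0, 0.9, 0),  # nobody has rated it yet
}

ranked = sorted(candidates, key=lambda item: -hybrid_score(*candidates[item]))
```

With these made-up numbers, "New Release" can still be recommended (score 0.9) even though collaborative filtering alone knows nothing about it, followed by "Memento" (0.69) and "Ghost" (0.52).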

Where can I get a Recommender System?

If you want your own recommender system, you have two options: you can either build one from scratch, or download an existing package from the internet and adapt it to your needs.
If you plan to build your own, there are many good articles on the subject. I suggest that you first understand your own dataset, discover which approach is best for you, and then develop it. For test data, you can use the MovieLens data set (from GroupLens Research), a set of ratings that users have given to movies.
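As a starting point for experimenting with MovieLens, the sketch below turns raw rating lines into a user -> {item: rating} dictionary. It assumes the tab-separated `user item rating timestamp` layout of the MovieLens 100K `u.data` file; other MovieLens releases use different separators (for example `::` or CSV), so the `split` call may need adjusting. The function name `load_ratings` is hypothetical.

```python
from collections import defaultdict

def load_ratings(lines):
    """Parse tab-separated 'user item rating timestamp' lines into user -> {item: rating}.

    Assumes the MovieLens 100K 'u.data' layout; adjust the separator for other releases.
    """
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        ratings[user][item] = float(rating)
    return dict(ratings)
```

Because it accepts any iterable of lines, it can read a file directly, e.g. `with open("u.data") as f: ratings = load_ratings(f)`, or be tested on a few in-memory lines.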
If, however, you can't spend that much time and effort on the subject, you can download an open source project instead. I recommend the Apache Mahout project, built in Java.

Some useful links and articles

Greg Linden, Brent Smith, and Jeremy York. Amazon.com recommendations: Item-to-item collaborative filtering.

Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction.

Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms.