algorithm - K nearest neighbour vs User based nearest neighbour

- February 15, 2013

I was reading about recommender systems on Wikipedia, and the section on "Algorithms" seems to suggest that K nearest neighbour and a collaborative-filtering-based, user-based algorithm are 2 different things. Is this correct? Given my understanding, aren't they both the same? If not, what are the differences between them? Thanks.

Not exactly. Although they are similar (they share the same ideas), there are several major differences between them. In fact, the Wikipedia article describes only the 2 most distinct ways to implement recommender systems, but there are many more that combine ideas from both.

So here's how I understood the Wikipedia article.

1st approach (kNN/profile similarity)

First of all, kNN is not the main feature of the first approach. It's just an algorithm for finding the nearest items in a whole collection, so it can be used in collaborative filtering as well. The most important idea lies in the term "similarity". To recommend something to the user in question, you find people in their neighborhood who have similar profiles. For example, say you want to make a recommendation for user John on Facebook. You look at his FB profile and then at the profiles of his friends. You find the 10 people with the most similar profiles and check what they like. If 8 of those 10 people like a new film, John will most probably like it too.
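
A minimal Python sketch of the profile-similarity idea, assuming each profile has already been turned into a numeric feature vector and each friend comes with the set of items they like (the names `cosine`, `knn_vote`, and the dict layout are hypothetical). It ranks friends by cosine similarity to the user, keeps the k nearest, and returns the share of them who like a given item; that share plays the role of the "8 of 10" in the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric profile vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_vote(profile, friends, item, k=10):
    """Fraction of the k friends most similar to `profile` who like `item`.

    friends: dict name -> (profile_vector, set_of_liked_items)
    """
    # Sort friends by similarity to the user, most similar first, keep k.
    nearest = sorted(
        friends.values(),
        key=lambda friend: cosine(profile, friend[0]),
        reverse=True,
    )[:k]
    if not nearest:
        return 0.0
    return sum(1 for _, liked in nearest if item in liked) / len(nearest)


friends = {
    "ann": ([1, 0, 1], {"new_film"}),
    "bob": ([1, 0, 0], {"new_film"}),
    "cid": ([0, 1, 0], set()),
}
john = [1, 0, 1]
print(knn_vote(john, friends, "new_film", k=2))  # 1.0
```

Here Ann and Bob are John's two nearest friends and both like the film, so the vote is 1.0. Cosine similarity is only one possible choice; any similarity measure can be plugged into the sort key, and a threshold on the returned share decides whether to recommend.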

So, there are 2 important points here:

you look at the user's neighborhood
you measure the similarity of profiles

The Wikipedia article doesn't cover the question of how to choose a similarity measure, but there are many ways, including searching for common terms in the profiles' text, finding best friends (by the number of messages between them, connection graph analysis, etc.) and many others.
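
As one illustration of the "common terms" option, here is a hedged sketch that compares two free-text profiles by the Jaccard index of their word sets. The regex tokenizer and the function names `profile_terms` and `profile_jaccard` are assumptions for illustration; a real system would probably also drop stop words and weight rare terms more heavily.

```python
import re


def profile_terms(text):
    """Lower-cased set of words in a free-text profile (naive tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def profile_jaccard(text_a, text_b):
    """Jaccard index of the two profiles' word sets: |A & B| / |A | B|."""
    a, b = profile_terms(text_a), profile_terms(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


print(profile_jaccard("films hiking jazz", "films jazz"))  # 0.6666666666666666
```

The two profiles share 2 words out of 3 distinct words overall, hence 2/3. The same score could replace cosine similarity as the ranking key in a kNN neighborhood search.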

2nd approach (collaborative filtering)

In the second approach you don't need to analyze the neighborhood and find similar profiles, but you do need to collect users' choices. Let's recall the example of Facebook user John. Imagine that we can get all the "likes" of all FB users, including John's. With them we can build a very large user-item matrix, where rows are user IDs and columns are all possible items they may "like". If a user actually "liked" an item, the cell for that user and that item is set to 1; otherwise it is 0.
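
The matrix described above can be sketched in plain Python, assuming the likes arrive as a mapping from user ID to the set of item IDs that user liked (a hypothetical input format; `build_like_matrix` is an illustrative name). Rows follow the sorted user IDs and columns the sorted item IDs.

```python
def build_like_matrix(likes):
    """Build a binary user-item matrix from {user_id: set_of_liked_items}.

    Returns (users, items, matrix) where matrix[i][j] == 1 if users[i]
    liked items[j], and 0 otherwise.
    """
    users = sorted(likes)
    items = sorted({item for liked in likes.values() for item in liked})
    column = {item: j for j, item in enumerate(items)}  # item -> column index
    matrix = [[0] * len(items) for _ in users]
    for i, user in enumerate(users):
        for item in likes[user]:
            matrix[i][column[item]] = 1
    return users, items, matrix


users, items, matrix = build_like_matrix(
    {"john": {"pirates2"}, "mary": {"pirates2", "pirates3"}}
)
print(users)   # ['john', 'mary']
print(items)   # ['pirates2', 'pirates3']
print(matrix)  # [[1, 0], [1, 1]]
```

With millions of users such a matrix would be almost entirely zeros, so in practice it would be kept in a sparse form (or left abstract, as the text notes) rather than as nested lists.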

With such a matrix (built explicitly or kept abstract) you can use association mining to find the strongest associations. For example, if 10000 people who liked "Pirates of the Caribbean 2" also liked "Pirates of the Caribbean 3", but only 500 of them liked "Saw", we can suppose that the association between the 2 "Pirates" episodes is much stronger. Note that we haven't analyzed either the users or the films themselves (we didn't take into account film names, plots, actors or anything like that, only "likes"). This is the major advantage of collaborative filtering over methods based on profile similarity.
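
One common way to quantify such an association in association rule mining is the confidence of a rule A -> B: the fraction of users who liked A that also liked B. The sketch below computes it directly from the likes. The data is hypothetical, built only to mirror the numbers in the text, under the added assumption that the 10000 users are all the "Pirates 2" fans there are.

```python
def confidence(likes, a, b):
    """Confidence of the association rule a -> b: the share of users
    who liked item `a` that also liked item `b`.

    likes: dict mapping user ID -> set of liked item IDs (hypothetical format).
    """
    fans_of_a = [items for items in likes.values() if a in items]
    if not fans_of_a:
        return 0.0
    return sum(1 for items in fans_of_a if b in items) / len(fans_of_a)


# Hypothetical data: 10000 users who liked both "Pirates" films,
# 500 of whom also liked "Saw".
likes = {
    f"user{i}": {"pirates2", "pirates3"} | ({"saw"} if i < 500 else set())
    for i in range(10000)
}

print(confidence(likes, "pirates2", "pirates3"))  # 1.0
print(confidence(likes, "pirates2", "saw"))       # 0.05
```

Confidence alone tends to favor items that nearly everyone likes, which is why association-mining tools usually report support and lift alongside it.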

Finally, to recommend a film to John, you just iterate over his "likes" and, for each one, find the other items that have the strongest associations with it.
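
The final step could look like the following sketch: for each item John liked, every item he hasn't liked is scored by the confidence of liked item -> candidate, computed over the other users only, and each candidate keeps its best score. The function names and the max-based scoring rule are assumptions; summing or averaging the scores across John's likes would be equally reasonable.

```python
def confidence(likes, a, b):
    """Share of users who liked `a` that also liked `b` (rule a -> b)."""
    fans_of_a = [items for items in likes.values() if a in items]
    if not fans_of_a:
        return 0.0
    return sum(1 for items in fans_of_a if b in items) / len(fans_of_a)


def recommend(likes, user, top_n=3):
    """Rank items `user` hasn't liked by their strongest association
    with any item they did like."""
    own = likes[user]
    # Mine associations from everyone else, so the target user's own
    # missing likes don't dilute the scores.
    others = {u: items for u, items in likes.items() if u != user}
    candidates = {item for items in others.values() for item in items} - own
    best = {}
    for liked in own:
        for candidate in candidates:
            score = confidence(others, liked, candidate)
            if score > best.get(candidate, 0.0):
                best[candidate] = score
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


likes = {
    "john": {"pirates2"},
    "ann": {"pirates2", "pirates3"},
    "bob": {"pirates2", "pirates3", "saw"},
    "cid": {"saw"},
}
print(recommend(likes, "john"))  # ['pirates3', 'saw']
```

Both Pirates 2 fans liked Pirates 3 (confidence 1.0) but only one of them liked Saw (0.5), so Pirates 3 comes first. Note that nothing about the films or the users' profiles is used, only the likes.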

So, the important points here are:

you don't use the neighborhood, but instead the complete database of all users
you use people's choices and find associations

Both approaches have their strong and weak points. The 1st approach relies on some kind of connection between people (e.g. friends on Facebook) and can hardly be used for services like Amazon. At the same time, the 2nd approach is based on the averaged preferences of all users, and thus isn't an option for systems whose users have widely differing tastes.
