# INCORPORATING USER-ITEM SIMILARITY IN HYBRID NEIGHBORHOOD-BASED RECOMMENDATION SYSTEM

**Anonymous authors**
Paper under double-blind review

ABSTRACT

Modern hybrid recommendation systems require a sufficient amount of data.
However, several internet privacy issues make users skeptical about sharing their
personal information with online service providers. This work introduces various novel methods utilizing the baseline estimate to learn user interests from
their interactions. Subsequently, extracted user feature vectors are implemented
to estimate the user-item correlations, providing an additional fine-tuning factor
for neighborhood-based collaborative filtering systems. Comprehensive experiments show that utilizing the user-item similarity can boost the accuracy of hybrid
neighborhood-based systems by at least 2.11% while minimizing the need for
tracking users’ digital footprints.

1 INTRODUCTION

The continuously accelerated growth of communication technology and data storage in the past
decades has benefited customers with an enormous amount of online multimedia content such as
movies, music, news, and articles, creating billion-dollar industries. Following this evolution, recommendation systems (RSs) have been widely developed to automatically help users filter redundant
information and suggest only suitable products that fit their needs. Such systems are used in a variety
of domains and have become a part of our daily online experience (Ricci et al., 2015).

RSs are commonly classified into three main types (Adomavicius & Tuzhilin, 2005): the contentbased technique, the collaborative filtering technique, and the hybrid technique. The content-based
approach, as in Lang (1995); Pazzani & Billsus (1997); Lops et al. (2011); Narducci et al. (2016),
learns to recommend items that are similar to the ones that a user liked based on items’ features.
The main weakness of this approach is the lack of available and reliable metadata associated with
item (Mooney & Roy, 2000). Meanwhile, the collaborative filtering (CF) approach does not require
product information but only relies on users’ interaction history which can be either explicit or implicit feedback. CF systems can be divided into two major categories: i) neighborhood-based models
which suggest items that are most similar to the item that a user is interested in (Herlocker et al.,
2000; Tintarev & Masthoff, 2007), and ii) matrix factorization models which could explore the latent factors connecting items to users in order to make accurate recommendations (Ricci et al., 2015;
Koren et al., 2009; Koren, 2008). However, it is often the case that there is not enough transaction
data to make accurate recommendations for a new user or item. To tackle this so-called cold-start
problem, hybrid methods are proposed by combining auxiliary information into CF models (Singh
& Gordon, 2008; Agarwal & Chen, 2009; Wang & Blei, 2011; Li et al., 2011; Rendle, 2010).

In the interest of the hybrid approach and its advantages, our study attempts to improve typical
neighborhood-based RSs utilizing available content-related knowledge. The main contributions of
this work are summarized as follows:

-  Introducing new methods to represent user preference via combining user’s interaction data
and item’s content-based information, which helps to estimate the similarity between a user
and an item.

-  Integrating the user-item similarity degree into the baseline estimate of neighborhood-based
RSs to provide more precise recommendations, surpassing competitive hybrid models.


-----

The remainder of this paper is organized as follows. Section 2 reviews the basic knowledge on
neighborhood-based CF systems, including hybrid models. Detail descriptions of our proposed
methods are presented in Section 4. Section 5 gives experimental results and in-depth analysis. At
last, we conclude this study in Section 6.

2 PRELIMINARIES

In this paper, u, v denote users and i, j denote items. rui denotes the preference by user u for item
_i, also known as the rating, where high values indicate strong preference, and all the (u, i) pairs are_
stored in the set K = {(u, i)|rui is known}. Meanwhile, R(u) denotes the set of all items rated by
user u. In rating prediction task, the objective is to predict unknown rating ˆrui where user u has not
rated item i yet.

Popular neighborhood-based CF techniques for the rating prediction task and an existing hybrid
variant are briefly reviewed as follows.

2.1 NEIGHBORHOOD-BASED MODELS

The neighborhood-based approach is one of the most popular techniques in CF, which is only based
on the similarity between users or items to give recommendations. There are currently two methods
for implementing neighborhood-based CF models: i) user-oriented (or user-user) model which predicts a user’s preference based on similar users, and ii) item-oriented (or item-item) model which
finds similar items to the item a user liked and recommends these items to her. Of the two methods,
the latter introduced by Sarwar et al. (2001) has become dominant. This is due to the fact that the
number of users in real-life systems is often orders of magnitude bigger than of items, which makes
the user-oriented model inefficient. Furthermore, the item-oriented model is capable of providing
a rational explanation for recommendations (Ricci et al., 2015). Therefore, our implementations in
this work adopt the item-item approach as the base model.

The fundamental of neighborhood-based models is similarity measure. By computing the similarity
degree sij between all pairs of items i and j using popular similarity measures such as Cosine
similarity (Cos) or Pearson Correlation Coefficients (PCC), we can identify the set of k neighbors
S[k](i, u) which consists of k most similar items to i rated by user u. Then, ˆrui can be predicted as a
weighted average of the ratings of similar items:


_sijruj_
_j∈SX[k](i,u)_ (1)

_sij_
_j∈SX[k](i,u)_


_rˆui =_


Even though Equation (1) can capture the user-item interactions, much of the observed ratings are
due to the bias effects associated with either users or items, independently of their interactions. In
detail, some items usually receive higher ratings than others, and some users tend to give higher ratings than others. kNNBaseline model proposed by Koren (2010) adjusts the above formula through
a baseline estimate which accounts for the user and item effects as follows.

_sij (ruj −_ _buj)_

_rˆui[kNNBaseline]_ = bui + _j∈SX[k](i;u)_ (2)

_sij_
_j∈SX[k](i,u)_

where bui = µ + bu + bi denotes the baseline estimate, µ denotes the mean of overall ratings, bu
and bi correspond to the bias of user u and item i, respectively, which can be trained using popular
optimization algorithms such as Stochastic Gradient Descent (SGD) or Alternating Least Squares
(ALS).

2.2 INTEGRATING CONTENT-BASED INFORMATION INTO NEIGHBORHOOD-BASED MODELS

A number of problems regarding kNN models using similarity measure on the rating information
were noticed by Duong et al. (2019). The first problem is the sparsity of the rating matrix, which


-----

might yield an inaccurate similarity score between two items that share only a few common users.
Secondly, filtering common users who rated both items to calculate the similarity score is a timeconsuming task due to a large number of users. To address these problems, a novel similarity measure was proposed using item content-based information. Assuming that each item i is characterized
by a feature vector qi = {qi1, qi2, ..., qif _} ∈_ R[f] where f is the number of features, which is stored
in matrix Q ∈ R[n][×][f] . The value of each element encodes how strong an item exhibits particular
properties. The similarity score sij between movies i and j is calculated as follows.

_f_
_k=1_ _[q][ik][q][jk]_
_s[Cos]ij_ [content] = (3)
_f_ _f_
_kP=1_ _[q]ik[2]_ _k=1_ _[q]jk[2]_

or qP qP

_fk=1[(][q][ik][ −]_ **_q[¯]i)(qjk_** **_q¯j)_**
_s[PCC]ij_ [content] = _−_ (4)
_fk=1P[(][q][ik][ −]_ **_q[¯]i)[2]_** _fk=1[(][q][jk][ −]_ **_q[¯]j)[2]_**

where ¯qi and ¯qj are the mean of feature vectorsqP **_qi andqP qj, respectively. Experiments showed that_**
the item-oriented CF models using Cos[content] and PCC[content] provide equivalent accuracy to the stateof-the-art CF models using rating information whilst performing at least 2 times faster. Hereafter,
kNNBaseline model using one of these similarity measures is referred to as kNNContent.

3 EXPERIMENTAL SETTING

3.1 MOVIELENS DATASET AND EVALUATION CRITERIA

In this work, the MovieLens 20M dataset is used as a benchmark. This is a widely used dataset in
the study of RSs which originally contains 20,000,263 ratings and 465,564 tag applications across
27,278 movies. The ratings are float values ranging from 0.5 to 5.0 with a step of 0.5. Tag Genome
data, which is computed on user-contributed content including tags, ratings, and textual reviews, is
firstly introduced in this version of MovieLens dataset (Harper & Konstan, 2016). To utilize this
kind of data, a cleaning process is applied to the dataset. Specifically, the movies without genome
tags are excluded from the dataset. Then, only movies and users with at least 20 ratings are kept.
Table 1 summarizes the result of the cleaning stage.

Table 1: Summary of the original MovieLens 20M and the preprocessed dataset.

|Dataset|# Ratings|# Users|# Movies|Sparsity|
|---|---|---|---|---|
|Original|20,000,263|138,493|27,278|99.47%|
|Preprocessed|19,793,342|138,185|10,239|98.97%|


**Dataset** # Ratings # Users # Movies Sparsity

Original 20,000,263 138,493 27,278 99.47%

Preprocessed 19,793,342 138,185 10,239 98.97%


The preprocessed dataset is split into 2 distinct parts: 80% as the training set and the remaining
20% as the testing set. To evaluate the performance of the proposed models, three commonly used
indicators in the field of rating prediction are used: RMSE (Root Mean Squared Error) and MAE
(Mean Absolute Error) for accuracy evaluation where smaller values indicate better performance,
and Time [s] for timing evaluation. Here, RMSE and MAE are defined as follows.


(ˆrui _rui)[2]_ _/_ TestSet (5)

su,i∈XTestSet _−_ _|_ _|_

_rˆui_ _rui_ _/_ TestSet (6)
_u,i∈XTestSet_ _|_ _−_ _|_ _|_ _|_


RMSE =

MAE =


where |TestSet| is the size of the testing set. The total duration of the model’s learning process on
the training set and predicting all samples in the testing set is measured as Time [s]. All experiments
are carried out using Google Colaboratory with 25GB RAM and no GPU.


-----

3.2 BASELINE MODELS

In this paper, several popular models are selected as baselines to evaluate the proposed methods.
Firstly, we implement two competitive neighborhood-based models including kNNBaseline (Koren,
2010) and kNNContent (Duong et al., 2019). Besides, SVD (Funk, 2006) and SVD++ (Koren,
2008), two well-known representatives of matrix factorization technique, are also experimented due
to their superior accuracy and flexible scalability with extremely sparse data (Funk, 2006).

Error rates and the time to make predictions are measured for comparison. The optimal hyperparameters for each baseline model are carefully chosen using 5-fold cross-validation. In particular, the error rates of the neighborhood-based models are calculated with the neighborhood size
_k ∈{10, 15, 20, 25, 30, 35, 40}. Due to better performance compared to Cos and Cos[content], PCC_
and PCC[content] are chosen as the similarity measures in kNNBaseline and kNNContent models. SVD
and SVD++ models are trained using 40 hidden factors with 100 iterations and the step size of 0.002.

4 PROPOSED SYSTEMS

So far, kNNBaseline models have successfully applied the item-item similarity exploiting rating
information and the available metadata representing item features provided by users. In contrast,
the knowledge about user-user correlation finds it difficult to be deployed in practical applications
due to its modest performance and high memory requirement (Ricci et al., 2015). Besides, to our
best knowledge, the interest of a user in individual characteristics of an item also lacks careful
consideration, which is a major problem restricting the growth of RSs. One of the main reasons is
that the user-item correlation is commonly defined as a similarity degree between a user’s interest
in individual item features and an item feature vector, which requires a customer to provide her
personal preferences as much as possible for accurate recommendations. In reality, it is impractical
due to a variety of data privacy concerns (Jeckmans et al., 2013).

This study first tackles this problem by introducing various novel methods to represent a user preference in the form of a vector, utilizing her past interactions with items and the feature vectors of
those items. We then propose a modification to the baseline estimate of kNNBaseline model and its
variants by integrating the user-item similarity score, which boosts the precision of the conventional
kNNBaseline model.

4.1 ESTIMATING USER INTERESTS FOR USER-ITEM SIMILARITY MEASURE

In RSs, there are two main sources of information: the interaction records (such as transactions
history, ratings, ...) and the item content information (item catalog, movie genres, or the Tag Genome
data in the MovieLens 20M dataset). User personal data, however, is not stored or included in
publicly available datasets due to the risk of exposing user identities. Therefore, in most datasets for
research, there is rarely any data or statistic that directly specifies user interest in each item feature.
In this section, we present 3 different methods to characterize a user’s interest based on the ratings
and metadata of the movies she watched.

The most straightforward approach to estimate a user interest is via a weighted average of the feature
vectors qi of items that she rated as follows.

_rui[norm]_ **_qi_**

_·_

**_p[norm]u_** = _i∈XR(u|R)_ (u)| (7)

where rui[norm] is the rating of user u for item i which has been normalized to the range of [0, 1]. As a
result, the normalized feature vector p[norm]u of user u has the same dimension and range of element
values as an item vector qi. More importantly, each user is currently described in an explainable way:
elements with higher values indicate that the user has a greater preference for the corresponding item
attributes and vice versa.

Although this method helps to create a simple shortcut to understand user preferences, all users
are treated in the same way. Specifically, users’ ratings are all normalized using the minimum and
maximum values of the system’s rating scale. Whereas, in practice, different users have a variety


-----

tendencies of rating an item according to their characters. For example, easy-going people often
rate movies a little higher than they really feel, and conversely, strict users often give lower scores
than the others. That means if two users have conflicting views after watching a movie but accept
to give a 3-star rating for that movie, for example, then the system will implicitly assume they have
the same weight of opinion. This fact leads to the researches taking into account the user and item
biases, which have a considerable impact on kNNBaseline and biased SVD models (Koren et al.,
2009; Koren, 2010). Therefore, a modification of Equation (7) incorporating the effect of biases is
proposed as follows.

_zui_ **_qi_**
_·_

**_p[biased]u_** = _i∈XR(u)_ (8)

_zui_
_|_ _|_
_i∈XR(u)_

less than her expectation to a moviewhere the residual rating zui = rui − i. In more detail, this formula applies the residual ratings asbui denotes how much extra rating a user u gives more or
weighting factors to the corresponding item feature vectors, which helps to eliminate the restrictions
of rui[norm]. The resulting biased user feature vector p[biased]u now has its elements in the value range of

[−1, +1], where −1 / +1 indicates that she totally hates / loves the respective item attribute, and 0
is neutral preference. It is expected that p[biased]u could measure the user interest in each item attribute
more precisely than its normalized version p[norm]u .

However, both of the above methods treat all items equally in profiling a user interest. For example,
consider user Janet and two movies “Titanic” and “Mad Max”. The scores of “Titanic” and “Mad
Max” for the romantic genre are 0.90 and 0.05, respectively, which means “Titanic” is a romantic
movie while “Mad Max” has almost no romantic scene. Assume that Janet’s normalized ratings
for these movies are ˜rJanet,Titanic = 0.7 and ˜rJanet,Mad Max = 0.72, which are almost identical. Thus,
the romantic genre score of Janet calculated by Equation (7) is quite low: (0.7 × 0.9 + 0.72 ×
0.05)/2 = 0.333. The fact that “Mad Max” has no romantic element does not mean that Janet
doesn’t like romantic movies. Equation (8) also encounters the same problem. This might lead to
misunderstanding the character of a user in some cases.

This problem can be solved by alleviating the influence of low score features whilst primarily focusing on features with high values. Accordingly, the simplest method is to use the scores themselves as
the weights in parallel with normalized ratings to estimate user feature vectors. Specifically for the
above example, the score of Janet for the romantic genre is equal to [(0][.][7][×](0[0][.].[9]7[×]×[0]0[.].[9)+(0]9)+(0[.].[72]72[×]×[0]0[.].[05]05)[×][0][.][05)] =

0.854, which is much more reasonable than measuring the affection of a user for a kind of genre
based on items that are not relevant to that genre. The biased feature vector of user u weighted by
item feature vector p[w]u _[−][biased]_ can be formulated as follows.


_zui_ **_qi[2]_**
_·_
_i∈XR(u)_ (9)

_zui_ **_qi_**
_|_ _| ·_
_i∈XR(u)_


**_pu[w][−][biased]_**


From the user feature vectors calculated using one of the above methods, it is noticeable that each
value describing user interests has comparatively the same meaning as the corresponding value in
the item feature vector. Therefore, the strength of the relevance between a user and an item can be
evaluated using common similarity measures such as Cos or PCC (Section 2.2), which eventually
calculates a user-item similarity matrix. In the next section, we demonstrate the effectiveness of
these vector representations by integrating user-item similarity into the popular kNNBaseline model
and its variants.

4.2 INTEGRATING THE USER-ITEM CORRELATIONS INTO THE BASELINE ESTIMATE

In item-oriented kNNBaseline, the baseline estimate takes the main role of predicting the coarse
ratings while the analogy between items serves as a fine-tuning term to improve the accuracy of
the final predicted ratings. Furthermore, it also means that the more precise the baseline estimate
is to the targeted rating, the better the kNNBaseline models get in terms of prediction accuracy
(Duong Tan et al., 2020). However, we realize that a conventional baseline estimate only considers


-----

the biases of users and items separately, ignoring the user-item correlations, which might lead to a
rudimentary evaluation approach.

For example, an RS needs to estimate the ratings of user James to two movies “Titanic” and “Mad
Max”. Assuming that the average rating, µ, is 3.7 stars. Furthermore, “Titanic” is better than an
ordinary movie, so it tends to be rated 0.5 stars above the average. Meanwhile, James is a critical
user, who usually rates 0.3 stars lower than a moderate user. Thus, the baseline estimate of “Titanic”
rated by James would be 3.9 stars (= 3.7 − 0.3 + 0.5). On the other hand, “Mad Max” tends
to be rated 0.6 stars higher than the mean rating; hence, the baseline estimate of James for “Mad
Max” would be 4.0 stars (= 3.7 − 0.3 + 0.6). However, from James’s past interactions with other
movies, the system estimates James’s interests using one of the methods described in Section 4.1
and discovers that a romantic and drama movie like “Titanic” seems to be very suitable for James
while his personality is contradictory compared to an action and thriller movie like “Mad Max”.
Consequently, the above predicted ratings of James now turn out to be rather irrational.

Figure 1: The residual rating of several users with respect to the user-item similarity degree. sij
values are calculated using PCC similarity measure to compare between Tag Genome data of the
movies in the MovieLens 20M dataset and the p[w-biased]u matrix. The red trendlines are determined
using linear regression.

As illustrated in Figure 1, there is an approximate-linear relationship between the user-item correlations and the residual ratings: the more interested a user is in a movie (i.e., the larger user-item
similarity score), the higher she tends to rate that movie. In order to take the analogy between
user and item into account, we propose a revised version of the baseline estimate by integrating the
user-item similarity score as follows.
_bui = µ + bu + bi + ω × sui_ (10)
where sui is the similarity degree between user u and item i, and ω serves as the weight to adjust
the contribution of the user-item correlation term to fit the rating information.

By introducing ω, the least squares problem of the enhanced baseline estimate term is now updated
to the following function.


_b[∗]u[, b][∗]i_ _[, ω][∗]_ [= arg min]
_bu,bi,ω_


_u,iX∈K(rui −_ (µ + bu + bi + ωsui))[2]


(11)


(11)

+ λ _b[2]u_ [+] _b[2]i_ [+] _ω[2]_

 

_u_ Xi _u,iX∈K_

[X] 

In this paper, two common optimization techniques, namely SGD and ALS, are experimented to
solve this problem. An SGD optimizer minimizes the sum of the squared errors in Equation (11)
using the following update rule.
_bu ←_ _bu + α(eui −_ _λ.bu)_
_bi ←_ _bi + α(eui −_ _λ.bi)_ (12)
_ω_ _ω + α(eui.sui_ _λ.ω)_
_←_ _−_


-----

where eui = rui − _rˆui is the predicting error, α is the learning rate, and λ is L2 regularization term._

Different from SGD, the ALS technique decouples the calculation of one parameter from the others
(Koren, 2010). In one iteration, the ALS process can be described as follows. First, for each item i,
the optimizer fixes the bu’s and ω to solve for the bi’s.


_rui_ _µ_ _bu_ _ωsui_
_−_ _−_ _−_
_u|(Xu,i)∈K_ (13)

_λi + |{u|(u, i) ∈_ K}|


_bi =_


Then, for each user u, the optimizer fixes the bi’s and ω to solve for the bu’s.


_rui_ _µ_ _bi_ _ωsui_
_−_ _−_ _−_
_i|(u,iX)∈K_ (14)

_λu + |{i|(u, i) ∈_ K}|


_bu =_


Finally, the optimizer fixes both the bu’s and the bi’s to solve for ω.


_sui(rui_ _µ_ _bu_ _bi)_
_u,iX∈K_ _−_ _−_ _−_ (15)

_λω + |K|_


_ω =_


Here, the regularization terms λi, λu, and λω are the shrinkage and vary due to the number of the
ratings that affect each parameter. Therefore, each parameter of bu’s, bi’s, and ω needs a distinct
value of λ, which can be determined by cross-validation. By applying a learnable weighting factor
_ω to the user-item similarity term, the new kNNBaseline model is capable of exploiting auxiliary_
information to achieve more precise predictions.

5 PERFORMANCE EVALUATION

To assess the new methods of characterizing user preferences and the proposed baseline estimate in
Section 4, Tag Genome in the MovieLens 20M dataset is used to construct a movie feature vector:
**_qi = {gi1, gi2, ..., gik, ...} where gik is the genome score of genome tag k[th]. In our experiments,_**
**_p[norm]u_**, p[biased]u, and p[w-biased]u are first integrated into the traditional baseline estimate to find the optimal
technique of profiling user interest in terms of predicting accuracy. The enhanced baseline estimate
is then implemented into several neighborhood-based models to comprehensively evaluate its impact
on the final rating prediction.

5.1 ACCURACY OF THE BASELINE ESTIMATE UTILIZING THE USER-ITEM CORRELATION

The enhanced baseline estimates are learned using both optimization algorithms SGD and ALS
for comparison. For SGD, the baseline are trained using the learning rate α = 0.005 and the
regularization λ = 0.02. For ALS, typical values for λu and λi in the MovieLens dataset are 15 and
10, respectively (Hug, 2020). However, the number of training points in set K is much larger than
the number of appearances of each user or item, which completely differs the value of λω from λu
and λi. Therefore, a grid search is performed on λω, which finds that λω = −9, 500, 000 is the best
choice.

Table 2 shows that utilizing the user-item correlation helps to improve the accuracy of the traditional
baseline estimate at the price of increased complexity. Empirical results also prove the superior of
**_p[w-biased]u_** over its counterparts for both similarity measures being used. Specifically, calculating the
user-item similarity with PCC achieves the coarse rating prediction with 6.46% lower RMSE and
6.71% lower MAE but takes approximately 3.6 times as much time as the original baseline estimate
(optimized via ALS). A noteworthy point here is that ALS achieves consistently lower error rates
than SGD for all cases at the expense of requiring an additional hyperparameter tuning process (and
thus a further computational complexity). However, this trade-off is acceptable at this stage because
the absolute time to determine the baseline estimate compared to the total time to make the final
prediction is negligible. Hence, ALS is selected as the optimizer for the proposed baseline estimate
hereafter.


-----

Table 2: Performance of the enhanced baseline estimates with different types of user feature vectors.
The conventional baseline estimate without the user-item similarity is included for comparison.

|User feature vectors|Similarity measure|SGD|Col4|Col5|ALS|Col7|Col8|
|---|---|---|---|---|---|---|---|
|||RMSE|MAE|Time [s]|RMSE|MAE|Time [s]|
|None||0.8593|0.6595|24|0.8576|0.6590|34|
|pnorm u|Cos|0.8553 (-0.47%)|0.6567 (-0.42%)|71 (x3.0)|0.8351 (-2.62%)|0.6348 (-3.67%)|114 (x3.4)|
||PCC|0.8432 (-1.87%)|0.6474 (-1.83%)|75 (x3.1)|0.8184 (-4.80%)|0.6274 (-4.79%)|121 (x3.6)|
|pbiased u|Cos|0.8153 (-5.12%)|0.6239 (-5.40%)|73 (x3.3)|0.8129 (-5.21%)|0.6228 (-5.49%)|117 (x3.4)|
||PCC|0.8096 (-5.78%)|0.6201 (-5.97%)|79 (x3.3)|0.8072 (-5.88%)|0.6186 (-6.13%)|126 (x3.7)|
|pw-biased u|Cos|0.8149 (-5.17%)|0.6235 (-5.46%)|74 (x3.1)|0.8057 (-6.05%)|0.6172 (-6.34%)|119 (x3.5)|
||PCC|0.8069 (-6.10%)|0.6171 (-6.43%)|80 (x3.3)|0.8022 (-6.46%)|0.6148 (-6.71%)|122 (x3.6)|


**User feature** **Similarity** **SGD** **ALS**
**vectors** **measure** **RMSE** **MAE** **Time [s]** **RMSE** **MAE** **Time [s]**

_None_ _0.8593_ _0.6595_ _24_ _0.8576_ _0.6590_ _34_

0.8553 0.6567 71 0.8351 0.6348 114
Cos

_(-0.47%)_ _(-0.42%)_ _(x3.0)_ _(-2.62%)_ _(-3.67%)_ _(x3.4)_

**_p[norm]u_** 0.8432 0.6474 75 0.8184 0.6274 121

PCC

_(-1.87%)_ _(-1.83%)_ _(x3.1)_ _(-4.80%)_ _(-4.79%)_ _(x3.6)_

0.8153 0.6239 73 0.8129 0.6228 117
Cos

_(-5.12%)_ _(-5.40%)_ _(x3.3)_ _(-5.21%)_ _(-5.49%)_ _(x3.4)_

**_p[biased]u_** 0.8096 0.6201 79 0.8072 0.6186 126

PCC

_(-5.78%)_ _(-5.97%)_ _(x3.3)_ _(-5.88%)_ _(-6.13%)_ _(x3.7)_

0.8149 0.6235 74 0.8057 0.6172 119
Cos

_(-5.17%)_ _(-5.46%)_ _(x3.1)_ _(-6.05%)_ _(-6.34%)_ _(x3.5)_

**_p[w-biased]u_** 0.8069 0.6171 80 **0.8022** **0.6148** **122**

PCC

_(-6.10%)_ _(-6.43%)_ _(x3.3)_ _(-6.46%)_ _(-6.71%)_ _(x3.6)_


5.2 PERFORMANCE OF THE UNIFIED NEIGHBORHOOD-BASED SYSTEM

Finally, the advanced baseline estimates are integrated into kNNBaseline and kNNContent models
to refine the ultimate rating predictions. For calculating the item-item similarity in kNNBaseline
model, two common measures Cos and PCC are examined. The same goes for kNNContent model,
where Cos[content] and PCC[content] are both implemented for comparison.

Figure 2: Error rates of kNNBaseline and kNNContent models when incorporating the user-item
correlations with different sizes of the neighborhood.


-----

As shown in Figure 2, the outperformance of the modified baseline estimates over their original
makes a significant improvement in predicting accuracy: the newly proposed neighborhood-based
models totally surpass their initial versions for all cases. It is noticeable that even though incorporating sui calculated by PCC is clearly better than Cos when using p[norm]u, the difference between these
two similarity measures gets much smaller in the case of p[biased]u and almost disappears with p[w-biased]u .
This is because the user feature vector generated by Equation (8) or Equation (9) has the original
ratings subtracted by the baseline estimate, which makes the mean of the resulting vector come
close to 0. Therefore, applying Cos or PCC to the approximately zero-mean vectors produces nearly
identical results. In the following experiments, p[w-biased]u and PCC are thus opted for calculating the
user-item similarity for best accuracy.

Table 3 shows a comparison between the neighborhood-based models incorporating the user-item
correlations and several common CF ones. The most accurate model, kNNContent with sui, gains:

-  4.80% lower RMSE and 4.88% lower MAE than original kNNBaseline.

-  2.11% lower RMSE and 2.03% lower MAE than original kNNContent.

-  2.56% lower RMSE and 2.91% lower MAE than SVD.

-  2.22% lower RMSE and 2.10% lower MAE than SVD++.

Table 3: Performance of the neighborhood-based models utilizing the user-item correlations against
popular CF models.

|Model|RMSE|MAE|Time [s]|
|---|---|---|---|
|kNNBaseline (k = 40) kNNContent (k = 20) SVD (40 factors) SVD++ (40 factors)|0.8108 0.7885 0.7922 0.7894|0.6167 0.5988 0.6042 0.5992|565 293 292 27,387|
|kNNBaseline incorporating s ui (k = 40)|0.7853|0.5981|659|
|kNNContent incorporating s ui (k = 25)|0.7719|0.5866|392|


**Model** **RMSE** **MAE** **Time [s]**

kNNBaseline (k = 40) 0.8108 0.6167 565

kNNContent (k = 20) 0.7885 0.5988 293

SVD (40 factors) 0.7922 0.6042 292

SVD++ (40 factors) 0.7894 0.5992 27,387

kNNBaseline incorporating sui 0.7853 0.5981 659
(k = 40)

kNNContent incorporating sui **0.7719** **0.5866** **392**
(k = 25)


These improvements in predicting accuracy are achieved at the expense of the additional complexity.
However, in practice evaluating the user-item similarity matrix from fixed-length vectors could be
performed in parallel with a low computational cost. Hence, we consider that this trade-off is worth
it in real-life applications.

6 CONCLUSION

In this paper, we first introduced various techniques to characterize user preferences utilizing both
rating data and item content information. The new user representations not only help to understand
user interests in each item attribute but also make it possible to measure the user-item correlations.
An innovative method was then proposed to adjust the baseline estimate of kNNBaseline model that
takes the user-item similarity into account. Thereby, the resulting hybrid models achieve at least
2.11% lower RMSE and 2.03% MAE compared to their neighborhood-based counterparts. This
leads to the conclusion that neighborhood-based RSs could be greatly improved by integrating both
the item-item and user-item correlations in the predicting model.

REFERENCES

Gediminas Adomavicius and Alexander Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE transactions on knowledge
_and data engineering, 17(6):734–749, 2005._


-----

Deepak Agarwal and Bee-Chung Chen. Regression-based latent factor models. In Proceedings of
_the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp._
19–28. ACM, 2009.

Tan Nghia Duong, Viet Duc Than, Tuan Anh Vuong, Trong Hiep Tran, Quang Hieu Dang, Duc Minh
Nguyen, and Hung Manh Pham. A novel hybrid recommendation system integrating contentbased and rating information. In International Conference on Network-Based Information Sys_tems, pp. 325–337. Springer, 2019._

Nghia Duong Tan, Tuan Anh Vuong, Duc Minh Nguyen, and Quang Hieu Dang. Utilizing an
autoencoder-generated item representation in hybrid recommendation system. IEEE Access, PP:
1–1, 04 2020. doi: 10.1109/ACCESS.2020.2989408.

Simon Funk. Netflix update: Try this at home, 2006.

F Maxwell Harper and Joseph A Konstan. The movielens datasets: History and context. Acm
_transactions on interactive intelligent systems (tiis), 5(4):19, 2016._

Jonathan L Herlocker, Joseph A Konstan, and John Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on Computer supported cooperative
_work, pp. 241–250. ACM, 2000._

Nicolas Hug. Surprise: A python library for recommender systems. Journal of Open Source Soft_[ware, 5(52):2174, 2020. doi: 10.21105/joss.02174. URL https://doi.org/10.21105/](https://doi.org/10.21105/joss.02174)_
[joss.02174.](https://doi.org/10.21105/joss.02174)

Arjan JP Jeckmans, Michael Beye, Zekeriya Erkin, Pieter Hartel, Reginald L Lagendijk, and Qiang
Tang. Privacy in recommender systems. In Social media retrieval, pp. 263–281. Springer, 2013.

Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model.
In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and
_data mining, pp. 426–434. ACM, 2008._

Yehuda Koren. Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Trans_actions on Knowledge Discovery from Data (TKDD), 4(1):1, 2010._

Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender
systems. Computer, 42(8):30–37, 2009.

Ken Lang. Newsweeder: Learning to filter netnews. In Machine Learning Proceedings 1995, pp.
331–339. Elsevier, 1995.

Wu-Jun Li, Dit-Yan Yeung, and Zhihua Zhang. Generalized latent factor models for social network
analysis. In Twenty-Second International Joint Conference on Artificial Intelligence, 2011.

Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. Content-based recommender systems:
State of the art and trends. In Recommender systems handbook, pp. 73–105. Springer, 2011.

Raymond J Mooney and Loriene Roy. Content-based book recommending using learning for text
categorization. In Proceedings of the fifth ACM conference on Digital libraries, pp. 195–204.
ACM, 2000.

Fedelucio Narducci, Pierpaolo Basile, Cataldo Musto, Pasquale Lops, Annalina Caputo, Marco
de Gemmis, Leo Iaquinta, and Giovanni Semeraro. Concept-based item representations for a
cross-lingual content-based recommendation process. Information Sciences, 374:15–31, 2016.

Michael Pazzani and Daniel Billsus. Learning and revising user profiles: The identification of
interesting web sites. Machine learning, 27(3):313–331, 1997.

Steffen Rendle. Factorization machines. In 2010 IEEE International Conference on Data Mining,
pp. 995–1000. IEEE, 2010.

Francesco Ricci, Lior Rokach, and Bracha Shapira. Recommender systems: introduction and challenges. In Recommender systems handbook, pp. 1–34. Springer, 2015.


-----

Badrul Munir Sarwar, George Karypis, Joseph A Konstan, John Riedl, et al. Item-based collaborative filtering recommendation algorithms. Www, 1:285–295, 2001.

Ajit P Singh and Geoffrey J Gordon. Relational learning via collective matrix factorization. In
_Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and_
_data mining, pp. 650–658. ACM, 2008._

Nava Tintarev and Judith Masthoff. A survey of explanations in recommender systems. In 2007
_IEEE 23rd international conference on data engineering workshop, pp. 801–810. IEEE, 2007._

Chong Wang and David M Blei. Collaborative topic modeling for recommending scientific articles.
In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and
_data mining, pp. 448–456. ACM, 2011._


-----