Transcript
Scalable infrastructures for personalization
Anne-Marie Kermarrec

Inria, France: 8 research centers, 150 research teams
The eight Inria research centers:
- Inria RENNES Bretagne Atlantique
- Inria BORDEAUX Sud-Ouest
- Inria PARIS - Rocquencourt
- Inria LILLE Nord Europe
- Inria NANCY Grand Est
- Inria SACLAY Île-de-France
- Inria GRENOBLE Rhône-Alpes
- Inria SOPHIA ANTIPOLIS Méditerranée

3 June 2014, A.-M. Kermarrec (Inria)

A cry for personalization

Why is personalization so difficult?
- Huge volume of data, of which only a small portion is of interest
- Dynamic interests
- Interesting stuff does not always come from friends
- Classical notification systems filter either too little or too much
Scalable personalization infrastructures

KNN computation over large data
Basic building block for many applications:
- Similarity search
- Machine learning
- Data mining
- Image processing
- Collaborative filtering

KNN-based user-centric collaborative filtering
- Provide each user with her k closest neighbors (each user owns a profile, the system has its favorite similarity metric)
- Use this topology for personalized notifications and recommendation
[Figure: KNN graph over users Alice, Bob, Carl, Dave, Ellie]

Dealing with truly big data
Want to scale? Think P2P

Do not look exhaustively

The key to scalability in KNN graph construction
- Consider a partial set of candidates
- Sampling-based approach

P2P KNN graph construction
- Which nodes are close? Similarity metric
- How to discover them? Sampling

Which nodes are close?
- Model: U(sers), I(tems)
- Profile(u) = vector of liked/shared/viewed items
- Cosine similarity metric
- Jaccard metric
- Minimal information: no tags, no user input, generic
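As a minimal sketch of the two metrics named on the "Which nodes are close?" slide, assume a profile is simply the set of item identifiers a user liked, shared or viewed (a binary vector). The function names and the example profiles below are hypothetical and are not taken from the talk.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |a & b| / |a | b|; 0.0 when both profiles are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: set, b: set) -> float:
    """Cosine similarity of two binary profile vectors: |a & b| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical profiles: item identifiers each user liked, shared or viewed.
alice = {"i1", "i2", "i3", "i4"}
bob = {"i3", "i4", "i5", "i6"}

print(jaccard(alice, bob))  # 2 shared items out of 6 distinct ones
print(cosine(alice, bob))   # 2 / sqrt(4 * 4)
```

Both metrics need only the item sets, which matches the "minimal information" point: no tags and no explicit user input are involved.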
How to discover them: Gossip-based computing
- Each node maintains a set of neighbors (c entries)
- Peer exchange: shuffle between P and Q
- Result: random graph
- Highly resilient against churn and partitions
- Small diameter
[JGKVV, ACM TOCS 2007]

KNN construction
- Similarity computation
- Exchange of neighbor lists
- Neighborhood optimization
[Figure: steps 1 and 2 on a graph of Alice, Bob, Carl, Dave, Ellie, Frank]

Decentralized KNN selection [FGKL, Middleware 2010]
- RPS layer: provides random sampling
- KPS clustering layer: gossip-based topology clustering
[Figure: the two layers over Alice, Bob, Carl, Dave, Ellie; legend: interest-based link, random link]

Convergence
[Plot: c current neighbors versus the c closest, over cycles; curves: biased sampling, random sampling]

Applications
- Decentralized news recommendation [BFGJK, IPDPS 2013]
- Top-K [BGKL, ACM TODS 2011] [BGK, ACM TOIT 2014]
- Geo recommendation [BKKT, ICDCS 2012]

DECENTRALIZED NEWS RECOMMENDER
Notification is taking over

What's wrong with news feeds
- Interests are dynamic
- Classical notification systems filter at the wrong granularity
- Only a small portion of the available information is of interest
- Interesting stuff does not always come from friends

WhatsUp in a nutshell
- KNN selection
- Dissemination

Dissemination: orientation and amplification
- Orientation: to whom?
  - Exploit: forward to friends
  - Explore: forward to random users
- Amplification: to how many?
  - Increase fanout (log(n))
  - Decrease fanout (1)
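The orientation and amplification choices above can be sketched as a single forwarding decision. This is an illustration under assumptions the slide does not spell out: a liked item is forwarded to friends with a fanout of about log(n), a disliked item goes to a single random user, the logarithm is natural, and n is known locally. The function and parameter names are hypothetical.

```python
import math
import random


def choose_targets(liked, friends, random_peers, n, rng=random):
    """Pick forwarding targets for one news item.

    liked:        the local user's opinion on the item
    friends:      current KNN neighbors (interest-based links)
    random_peers: peers obtained from the random sampling layer
    n:            estimated number of nodes (assumed to be known)
    """
    if liked:
        fanout = max(1, math.ceil(math.log(n)))  # amplification: fanout ~ log(n)
        pool = friends                           # orientation: exploit
    else:
        fanout = 1                               # amplification: fanout of 1
        pool = random_peers                      # orientation: explore
    return rng.sample(pool, min(fanout, len(pool)))


# Hypothetical usage with 480 nodes, 10 friends and 10 random peers.
friends = [f"f{i}" for i in range(10)]
peers = [f"r{i}" for i in range(10)]
print(choose_targets(True, friends, peers, n=480))
print(choose_targets(False, friends, peers, n=480))
```

Sending disliked items to one random user keeps a small exploration channel open, so an item can still reach interested users outside the current neighborhood.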
WhatsUp in action on the survey (480 users)

Approach          Precision  Recall  F1-Score  Messages
Gossip (f=4)      0.34       0.99    0.51      2.3 M
Cosine-CF         0.50       0.65    0.57      5.9k
WhatsUp (f=10)    0.471      0.83    0.60      2.4k

[Figure 7: Cold start and dynamics in WHATSUP. Panels over cycles 80 to 200: a panel whose title is cut off ("... w (WHATSUP)"); (b) Similarity in WUP view (WHATSUP-Cos); (c) Reception of liked news items (WHATSUP)]

[Paper excerpt, cropped on the left in the slide: eiving news quickly as shown in n the number of interesting news ode joins. This is a result of both (Section II-D) and our metrics h small profiles. Once the nodes mber of received news per cycle arable to those of the reference oining node reaches 80% of the after only a few cycles. e, we select a pair of random ataset and, at 100 cycles into the r interests and start measuring the uild their WUP views. Figure 7 by averaging 100 experiments. auses the views to converge faster cycles as opposed to over 100. ecall and precision for the nodes nterests never decrease below 80% ues. These results are clearly tied window, set to about 40 cycles in windows would in fact lead to an]

... nodes (machines and users) deployed on a 25-node cluster equipped with the ModelNet network emulator. For practical reasons we consider a shorter trace and very fast gossip and news-generation cycles of 30 sec, with 5 news items per cycle. These gossip frequencies are higher than those we use in our prototype, but they were chosen to be able to run a large number of experiments in reasonable time. We also use a profile window of 4 min, compatible with the duration of our experiments (1 to 2 hours each).

[Plot (a) Survey: F1-Score (0 to 0.7) versus fanout (Flike, 2 to 12); curves: Simulation, PlanetLab, ModelNet]
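The F1-Score reported in the survey results above is the harmonic mean of precision and recall. The short check below recomputes it from the precision and recall values given for the three approaches; the helper name is an assumption made for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the survey results (480 users).
rows = {
    "Gossip (f=4)": (0.34, 0.99),
    "Cosine-CF": (0.50, 0.65),
    "WhatsUp (f=10)": (0.471, 0.83),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals, the recomputed values agree with the F1-Score column (0.51, 0.57 and 0.60).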
Orientation (survey)
News items received through a dislike forward

Number of dislikes        0     1     2     3     4
Fraction of liked news    54%   31%   10%   3%    2%

[Figure 6: Survey (f_LIKE = 5): Impact of amplification of BEEP. Axes: NB Nodes, NB Hops; curves: Forward by like, Infection by like, Forward by dislike, Infection by dislike]

WhatsUp versus Pub/Sub

Approach   Precision  Recall  F1-Score
Pub/Sub    0.40       1.0     0.58
WhatsUp    0.47       0.83    0.60

WhatsUp versus cascading

Approach   Precision  Recall  F1-Score
Cascading  0.57       0.09    0.16
WhatsUp    0.56       0.57    0.57

PRIVACY MATTERS

Privacy issues
- During user clustering: exchange of profiles in clear
- During item dissemination: predictive nature of the protocol
- Profile obfuscation
- Randomized dissemination

Privacy
- Obfuscation
  - Does not reveal the exact profile
  - Reveals only the least sensitive information
- Randomized dissemination
  - Flips the opinion with a given probability (pf)

Structure profiles
- Private profile: in clear, full information about the interests
- Compact profile: aggregate signatures of liked items

Structure profiles
- Private profile: in clear, full information about the interests
- Compact profile: aggregate signatures of liked items
- Filter profile: interests of users that like similar items
- Item profile: aggregate interests of users that liked it
- Obfuscated profile: least sensitive information about interests

Obfuscation mechanism
[Diagram: news item (received); private profile; legend: profiles kept locally]
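Going back to the randomized dissemination point on the Privacy slide (the opinion is flipped with probability pf), here is a minimal sketch of such a flip. It only illustrates the coin toss itself; the function name and the way the random source is passed in are assumptions, and nothing here reproduces the actual protocol.

```python
import random


def randomized_opinion(liked: bool, pf: float, rng=random) -> bool:
    """Return the opinion used for dissemination: the true opinion,
    flipped with probability pf."""
    if rng.random() < pf:
        return not liked
    return liked


# Empirical check with a hypothetical pf of 0.3: about 30% of opinions flip.
rng = random.Random(42)
trials = 10_000
flips = sum(randomized_opinion(True, 0.3, rng) is False for _ in range(trials))
print(flips / trials)
```

With pf = 0 the true opinion is always used, and a larger pf hides more of the user's real taste at the cost of forwarding decisions that match it less often.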
Obfuscation mechanism
[Diagram: news item (received); private profile; compact profile; news item (forwarded); operator: +; labels: signature, item profile; legend: profiles kept locally, profiles exchanged with others]

Obfuscation mechanism
[Diagram: news item (received); private profile; compact profile; filter profile; obfuscated profile; news item (forwarded); operators: x, +; labels: signature, item profile, item profile, mask of popularity, system parameter; legend: profiles kept locally, profiles exchanged with others]

Randomized dissemination
- Flips the opinion with a given probability (pf)
- An attacker could still learn from the profile
- The private profile contains a field with the result of the randomized decision
- Generate a randomized compact profile
- Users still use their non-randomized profile locally for clustering
- Differentially private protocol

Experimental setup
- Simulations and PlanetLab
- Alternatives: cleartext profile (CT); 2DP (DP dissemination and randomized profile for clustering)
- Metrics
  - Recommendation: recall/precision
  - Privacy: distance between obfuscated profile and real profile
- Dataset: real survey, 120 users on 200 news items (4 instances)

Impact of randomization
- Decrease of precision with increasing pf

Operational prototype
- http://131.254.213.98:8080/wup/
- Tested on 500 users at TrentoRise last year
- TRY IT

Take away message
- Personalization is needed
- Decentralization is healthy
- Gossip-based computing is one (the) way to go

For those who are afraid of P2P

Hybrid recommendation engine

HyRec: Taking the best of both worlds
- Online KNN selection
- Restricted candidate set (k)
- No data stored at the client
- HyRec client: JavaScript (widget) running in the browser
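The HyRec points above suggest a client that receives a restricted candidate set, picks its k most similar users on the fly, and keeps nothing afterwards. The sketch below illustrates that selection step only. It is written in Python for readability although the slide says the real client is a JavaScript widget; how the server builds the candidate set is not stated in the talk, and Jaccard similarity, the function names and the example data are all assumptions.

```python
import heapq


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_knn(own_profile: set, candidates: dict, k: int) -> list:
    """Return the ids of the k candidates most similar to own_profile.

    candidates maps a user id to that user's item set, as it would be
    sent by the server; nothing is kept after the call returns.
    """
    return heapq.nlargest(
        k, candidates, key=lambda uid: jaccard(own_profile, candidates[uid])
    )


# Hypothetical candidate set sent by the server for one request.
own = {"a", "b", "c"}
candidates = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "x"},
    "u3": {"y"},
    "u4": {"a", "b"},
}
print(select_knn(own, candidates, k=2))
```

Because the candidate set is restricted, the work per request depends on its size and not on the total number of users, which is what keeps the selection cheap enough to run inside a browser.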
View similarity

Recommendation quality

Dataset      Users    Items    Ratings
MovieLens1   943      1,700    100,000
MovieLens2   6,040    4,000    1,000,000
MovieLens3   69,878   10,000   10,000,000
Digg         59,167   7,724    782,807

HyRec versus the client load
- Impact of HyRec
- Impact of the client load
- Negligible disruption of HyRec
- 50% load
