anne-marie kermarrec – scalable personalization infrastructures

49
Scalable infrastructures for personalization Anne-Marie Kermarrec

Upload: yandex

Post on 23-Aug-2014

450 views

Category:

Science


5 download

DESCRIPTION

The ever-growing amount of data available on the Internet calls for personalization. Yet, the most effective personalization schemes, such as those based on collaborative filtering (CF), are notoriously resource greedy. We argue that scalable infrastructures should rely on P2P design to scale to that increasing number of users, data and dynamics. In this talk, I will present a scalable algorithm for collaborative filtering, which P2P flavor provides scalability by design. This scheme can been instanciated in various settings: P2P, hybrid infrastructure offloading CPU-intensive recommendation tasks to front-end client browsers, while retaining storage and orchestration tasks within back-end servers, as well as centralized infrastructures. As personalization relies on users giving away information, I will also present the potential encountered privacy issues and a range of solutions to preserve users privacy in such systems.

TRANSCRIPT

Page 1: Anne-Marie Kermarrec – Scalable personalization infrastructures

Scalable infrastructures for personalization

Anne-Marie Kermarrec

Page 2: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Inria, France: 8 research centers, 150 research teams

June 2014

Page 3: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria) June 2014

Page 4: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

A cry for personalization

June 2014

Page 5: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Why is personalization so difficult?

• Huge volume of data: small portion of interest

• Dynamic interests

• Interesting stuff does not come always from friends

• Classical notification systems do not filter enough or too much

Scalable personalization infrastructures

June 2014

Page 6: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

KNN computation over large data

Basic building block for many applications

• Similarity search

• Machine learning

• Data mining

• Image processing

• Collaborative filtering

June 2014

Page 7: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

KNN-based user-centric collaborative filtering

Provide each user with her k closest neighbors

(Users owns a profile, the system has its favorite similarity metric)

Use this topology for

• personalized notifications

• recommendation

AliceBob

Carl

Dave

Ellie

June 2014

Page 8: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Dealing with truly big data

Want to scale? Think P2P

June 2014

Page 9: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Do not look exhaustively

June 2014

Page 10: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

The key to scalability in KNN graph construction

Consider a partial set of candidates

Sampling-based approach

June 2014

Page 11: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

P2P KNN graph construction

Which nodes are close?

How to discover them?

Similarity metric

Sampling

June 2014

Page 12: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Which nodes are close?

Model

U(sers) × I(tems) (items)

Profile(u) = vector of liked/shared/viewed items

Cosine similarity metric Jaccard metric

Minimal information: no tag, no user’s input, generic June 2014

Page 13: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Each node maintains a set of neighbors (c entries)

Peer exchange

ShuffleP Q

How to discover them: Gossip-based computing

Result random graph

Highly resilient against churn, partition

Small diameter

[JGKVV, ACM TOCS 2007]

June 2014

Page 14: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

KNN construction

Similarity computation

exchange ofneighbors lists

neighborhoodoptimization1 2

Alice Bob

Carl

DaveEllie

Frank

June 2014

Page 15: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Decentralized KNN selection

[FGKL Middleware 2010]

RPS layer providing random sampling

KPS clustering layer gossip-based topology clustering

Interest-based linkRandom link

AliceBob

Carl

Dave

Ellie

AliceBob

Carl

Dave

Ellie

June 2014

Page 16: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Convergence

Cycles

c current neighbors versus the c closest

Biasedsampling

Random sampling

June 2014

Page 17: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Applications

- Decentralized news recommendation [BFGJK, IPDPS

2013]

- Top-K [BGKL, ACM TODS 2011] [BGK, ACM TOIT 2014]

- Geo recommendation [BKKT, ICDCS 2012]

June 2014

Page 18: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

DECENTRALIZED NEWS RECOMMENDER

Notification is taking over

June 2014

Page 19: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

What’s wrong with news feed

Interest are dynamic

Wrong granularity for filtering of classical notification systems

Small portion of the available information is of interest

Interesting stuff does not come always from friends

An implicit notification system

based on collaborative filtering

June 2014

Page 20: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

WhatsUp in a nutshell

KNN selection

Dissemination

June 2014

Page 21: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Dissemination: orientation and amplification

Orientation: to whom?

Exploit: ForwardTo friends

Explore: Forward to random users

Amplification: to how many?

Increase Fanout(Log(n))

DecreaseFanout(1)

June 2014

Page 22: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

WhatsUp in action on the survey (480 users)

Precision Recall F1-Score Messages

Gossip (f=4) 0.34 0.99 0.51 2.3 M

Cosine-CF 0.50 0.65 0.57 5,9k

Whatsup (f=10)

0.471 0.83 0.60 2,4k

June 2014

Page 23: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Orientation (survey)

News items received through a dislike forward

Number of dislikes

0 1 2 3 4

Fraction of liked news

54% 31% 10% 3% 2%

June 2014

Page 24: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

WhatsUp versus Pub/Sub

Approach Precision Recall F1-Score

Pub/Sub 0.40 1.0 0.58

WhatsUp 0.47 0.83 0.60

June 2014

Page 25: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

WhatsUp versus cascading

Approach Precision Recall F1-ScoreCascading 0.57 0.09 0.16WhatsUp 0.56 0.57 0.57

June 2014

Page 26: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

PRIVACY MATTERS

June 2014

Page 27: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Privacy issues

During user clustering• Exchange of profile in clear

During item dissemination• Predictive nature of the protocol

Profile Obfuscation

Randomized dissemination

June 2014

Page 28: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Privacy

Obfuscation• Does not reveal the exact profile• Does not reveal the least sensitive information

Randomized dissemination• Flips the opinion with a given probability (pf)

June 2014

Page 29: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Structure profiles

Private Profile

Compact profile

In clear: Full information about the interests

Aggregate signatures of liked items

June 2014

Page 30: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Structure profiles

Private Profile

Compact profile

Filter profile

Item profile

Obfuscated profile

In clear: Full information about the interests

Aggregate signatures of liked items

Interests of users that like similar items

Least sensitive information about interests

Aggregate interests of users that liked it

June 2014

Page 31: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private profile

Profiles kept locally

June 2014

Page 32: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private profile Compact profile

News item(forwarded)

+

Profiles kept locally

Profiles exchanged with others

signatureitem

profile

June 2014

Page 33: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private Profile Compact Profile Filter Profile

Obfuscated ProfileNews item(forwarded)

x+

Profiles kept locally

Profiles exchanged with others

signatureitem

profileitem

profile

mask ofpopularity

System parameter

June 2014

Page 34: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Randomized dissemination

Flips the opinion with a given probability (pf)• Attacker could still learn from the profile• Private profile contains a field with the result of the randomized

decision

Generate Randomized compact profile• Users still use locally their non randomized profile for clustering• Differentially private protocol

June 2014

Page 35: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Experimental setup

Simulations and PlanetlabAlternatives• Cleartext profile (CT); • 2DP (DP dissemination and randomized profile for clustering)Metrics• Recommendation: recall/precision• Privacy: Distance between obfuscated profile and real profile; Dataset: Real survey, 120 users on 200 news items (4 instances)

June 2014

Page 36: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Impact of randomization

June 2014

Page 37: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Impact of randomization

Decrease of precision with increasing pf

June 2014

Page 38: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

http://131.254.213.98:8080/wup/

Operational prototype

Tested on 500 users @ TrentoRise last year

TRY IT

Take away message

Personalization is needed

Decentralization is healthy

Gossip-based computing is one (the) way to go

June 2014

Page 39: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

For those who are afraid of P2P

June 2014

Page 40: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Hybrid recommendation engine

June 2014

Page 41: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria) June 2014

Page 42: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

HyRec: Taking the best of both worlds

Online KNN selectionRestricted andidate set (k)No data stored at the client

HyRec client: Javascript (widget) running in the browser

June 2014

Page 43: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria) June 2014

Page 44: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

View similarity

June 2014

Dataset Users Items RatingsMovieLens1 943 1700 100,000

MovieLens2 6,040 4000 1,000,000

MovieLens3 69,878 10,000 10,000,000

Digg 59,167 7724 782,807

Page 45: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

Recommendation quality

June 2014

Page 46: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

HyRec versus the client load

Impact of HyRec Impact of the client load

Negligible disruption of HyRec 50% load<60ms on smartphone<10ms on laptop

June 2014

Page 47: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

HyRec versus a centralized recommender

Impact of the request stressImpact of the profile size

June 2014

Page 48: Anne-Marie Kermarrec – Scalable personalization infrastructures

A.-M. Kermarrec (Inria)

To take away

Personalization is crucial

P2P in a design mindset

Randomization and obfuscation provides a good tradeoff between privacy and quality

June 2014

Page 49: Anne-Marie Kermarrec – Scalable personalization infrastructures

Thank you