Apéro RubyBdx - MongoDB - 8-11-2011

Pierre-Louis Gottfrois, Bastien Murzeau. Apéro Ruby Bordeaux, November 8, 2011

Upload: pierrerenaudin

Post on 31-May-2015

TRANSCRIPT

Page 1: Apéro RubyBdx - MongoDB - 8-11-2011

Pierre-Louis Gottfrois, Bastien Murzeau. Apéro Ruby Bordeaux, November 8, 2011

Page 2: Apéro RubyBdx - MongoDB - 8-11-2011

• Brief introduction

• Case study

• Map / Reduce

Page 3: Apéro RubyBdx - MongoDB - 8-11-2011

What is mongoDB?

mongoDB is a NoSQL database that is

• schemaless

• document-oriented

Page 4: Apéro RubyBdx - MongoDB - 8-11-2011

schemaless

• Very useful in 'agile' development (iterations, quick changes, flexibility for developers)

• Supports features that, in relational databases, would be:

  • nearly impossible (storing an open-ended set of items, e.g. tags)

  • overly complex for what they are (migrations)

Page 5: Apéro RubyBdx - MongoDB - 8-11-2011

document-oriented

• mongoDB stores documents, not rows

• documents are stored as BSON (binary JSON)

• the query syntax is nearly as rich as SQL's, although there are no joins

• 'embedded' documents solve many of the problems that joins would otherwise address
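To make the embedded-document idea concrete, here is a minimal sketch in plain JavaScript of what such a document might look like. The field names (`messages`, `tags`, `author`, `body`) are illustrative assumptions, not taken from the talk's actual data.

```javascript
// Hypothetical conversation document: all field names are illustrative only.
const conversation = {
  _id: 1,
  public: false,
  // Embedded documents live inside their parent, so reading them needs no join.
  messages: [
    { author: "alice", body: "Hello" },
    { author: "bob", body: "Hi there" },
  ],
  // An open-ended list such as tags fits naturally in a schemaless document.
  tags: ["ruby", "mongodb"],
};

// Reading the embedded messages is a plain property access.
const authors = conversation.messages.map((message) => message.author);
console.log(authors);
```

In a relational schema the messages and tags would typically sit in separate tables; here they travel with the parent document.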

Page 6: Apéro RubyBdx - MongoDB - 8-11-2011

document-oriented

• Documents are stored in a collection; in RoR terms, a collection = a model

• part of this data is indexed to optimize performance

• a document is not a dumping ground!

Page 7: Apéro RubyBdx - MongoDB - 8-11-2011

storing large volumes of data

• mongoDB (and other NoSQL databases) are better suited to horizontal scaling

• servers are added to increase storage capacity ("sharding")

• which also improves availability

• optimized load balancing between nodes

• scaling out is transparent to the application

Page 8: Apéro RubyBdx - MongoDB - 8-11-2011

Case study

• The ORM becomes an ODM; the reference gem is mongoid

• alternatives: MongoMapper, DataMapper

• Creating an application backed by MongoDB (NoSQL)

  • rails new nosql

  • edit the Gemfile

    • gem 'mongoid'

    • gem 'bson_ext'

  • bundle install

  • rails generate mongoid:config

Page 9: Apéro RubyBdx - MongoDB - 8-11-2011

Case study

• edit config/application.rb: comment out rails/all and require the individual railties instead, so that ActiveRecord is not loaded

  • # require 'rails/all'

  • require "action_controller/railtie"

  • require "action_mailer/railtie"

  • require "active_resource/railtie"

  • require "rails/test_unit/railtie"

Page 10: Apéro RubyBdx - MongoDB - 8-11-2011

Case study

class Conversation
  include Mongoid::Document
  include Mongoid::Timestamps

  field :public, :type => Boolean, :default => false

  has_many :scores, :as => :scorable, :dependent => :delete
  has_and_belongs_to_many :subjects
  belongs_to :timeline
  embeds_many :messages
end

class Subject
  include Mongoid::Document
  include Mongoid::Timestamps

  has_many :scores, :as => :scorable, :dependent => :delete, :autosave => true
  has_many :requests, :dependent => :delete
  belongs_to :author, :class_name => 'User'
end

Page 11: Apéro RubyBdx - MongoDB - 8-11-2011

Map Reduce

Page 12: Apéro RubyBdx - MongoDB - 8-11-2011

Example

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }

A "ticket" collection

Page 13: Apéro RubyBdx - MongoDB - 8-11-2011

Problem

• We want to

  • Compute the sum of 'checkout' across all objects in our tickets collection

  • Be able to distribute this operation over the network

  • Be fast!

• We don't want to

  • Go over every object again when an update is made

Page 14: Apéro RubyBdx - MongoDB - 8-11-2011

Map : emit(checkout)

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   215   73

The 'map' function emits (selects) the checkout value of every object in our collection.
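The map step above can be reproduced outside the database with a few lines of plain JavaScript. This is only a sketch: the `emit` function and the `emitted` array below are stand-ins written for illustration (in MongoDB, `emit` is provided by the server), and `map` is called with each document bound to `this`, as the shell does.

```javascript
// The four tickets from the slide.
const tickets = [
  { id: 1, day: 20111017, checkout: 100 },
  { id: 2, day: 20111017, checkout: 42 },
  { id: 3, day: 20111017, checkout: 215 },
  { id: 4, day: 20111017, checkout: 73 },
];

// Stand-in for the server-side emit: it just records key/value pairs.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function reads the current document through `this`.
function map() {
  emit(null, this.checkout);
}

// Run map once per document.
tickets.forEach((doc) => map.call(doc));

const values = emitted.map((pair) => pair.value);
console.log(values);
```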

Page 15: Apéro RubyBdx - MongoDB - 8-11-2011

Reduce : sum(checkout)

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   215   73
  142        288
        430

Page 16: Apéro RubyBdx - MongoDB - 8-11-2011

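Reduce function

The 'reduce' function applies the aggregation logic to the values emitted by the 'map' function for each key.

To be called repeatedly on partial results, or across a distributed system, this function must be commutative, associative and idempotent: its result must not depend on the order of the values, and reducing its own output again must give the same result.

reduce(k, [A, B]) == reduce(k, [B, A])
reduce(k, [A, B]) == reduce(k, [reduce(k, [A, B])])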
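The identities above can be checked directly for a summing reduce. The sketch below assumes the array-based signature `reduce(key, values)` used later in the talk's shell session; the three boolean names are chosen here for illustration.

```javascript
// Summing reduce with the (key, values) signature used in the mongo shell.
function reduce(key, values) {
  let sum = 0;
  for (const value of values) sum += value;
  return sum;
}

// Order of the values does not matter.
const commutative = reduce(null, [100, 42]) === reduce(null, [42, 100]);

// Reducing an already reduced value changes nothing.
const idempotent =
  reduce(null, [reduce(null, [100, 42])]) === reduce(null, [100, 42]);

// Partial results can be reduced again to get the full result.
const associative =
  reduce(null, [reduce(null, [100, 42]), reduce(null, [215, 73])]) ===
  reduce(null, [100, 42, 215, 73]);

console.log(commutative, idempotent, associative);
```

A reduce that, say, averaged its inputs would fail the third check, which is why averages are usually carried as a sum and a count.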

Page 17: Apéro RubyBdx - MongoDB - 8-11-2011

Inherently Distributed

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   215   73
  142        288
        430

Page 18: Apéro RubyBdx - MongoDB - 8-11-2011

Distributed

Since the 'map' function emits the values to be reduced, and the 'reduce' function processes each group of emitted values independently, the work can be distributed across multiple workers.

[Diagram: map, reduce]
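As a rough illustration of that distribution, the sketch below splits the tickets into two chunks, lets each hypothetical "worker" map and reduce its own chunk, and then reduces the partial results. The `runWorker` helper and the two-way split are assumptions made for the example; a real deployment would run the chunks on separate shards.

```javascript
const tickets = [
  { id: 1, day: 20111017, checkout: 100 },
  { id: 2, day: 20111017, checkout: 42 },
  { id: 3, day: 20111017, checkout: 215 },
  { id: 4, day: 20111017, checkout: 73 },
];

// Summing reduce, usable on raw values and on partial sums alike.
const reduce = (key, values) => values.reduce((sum, value) => sum + value, 0);

// Hypothetical worker: maps its chunk, then reduces it locally.
function runWorker(chunk) {
  const emitted = chunk.map((doc) => doc.checkout); // map step
  return reduce(null, emitted); // local reduce step
}

// Two workers, two tickets each (run sequentially here for simplicity).
const partials = [tickets.slice(0, 2), tickets.slice(2)].map(runWorker);

// Final reduce over the partial sums.
const total = reduce(null, partials);
console.log(partials, total);
```

The partial sums match the middle row of the tree on the previous slide, and the final reduce gives its root.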

Page 19: Apéro RubyBdx - MongoDB - 8-11-2011

Logarithmic Update

For the same reason, when an object is updated, we don't have to reprocess every object.

We only need to call the 'map' function on the updated objects, then re-run 'reduce' along the path from the changed value to the final result.

Page 20: Apéro RubyBdx - MongoDB - 8-11-2011

Logarithmic Update

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   215   73
  142        288
        430

Page 21: Apéro RubyBdx - MongoDB - 8-11-2011

Logarithmic Update

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   210   73
  142        288
        430

Page 22: Apéro RubyBdx - MongoDB - 8-11-2011

Logarithmic Update

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   210   73
  142        283
        430

Page 23: Apéro RubyBdx - MongoDB - 8-11-2011

Logarithmic Update

{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }

100   42   210   73
  142        283
        425
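The sequence of slides above amounts to updating one leaf of a sum tree and recomputing only its ancestors. The sketch below illustrates that idea with an array-backed binary tree; it is an assumption about how to model the slides, not a description of how MongoDB stores map/reduce results, and `buildTree` and `updateLeaf` are hypothetical helpers.

```javascript
// Sum tree stored in a flat array: node i has children 2i and 2i+1,
// and the leaves start at index n. Assumes a power-of-two number of leaves.
function buildTree(values) {
  const n = values.length;
  const tree = new Array(2 * n).fill(0);
  for (let i = 0; i < n; i++) tree[n + i] = values[i];
  for (let i = n - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
  return tree;
}

// Replace one leaf and recompute only its ancestors.
// Returns how many internal nodes were recomputed.
function updateLeaf(tree, index, value) {
  const n = tree.length / 2;
  let i = n + index;
  tree[i] = value;
  let recomputed = 0;
  for (i = Math.floor(i / 2); i >= 1; i = Math.floor(i / 2)) {
    tree[i] = tree[2 * i] + tree[2 * i + 1];
    recomputed++;
  }
  return recomputed;
}

const tree = buildTree([100, 42, 215, 73]);
const before = tree[1]; // root before the update
const recomputed = updateLeaf(tree, 2, 210); // ticket 3: 215 becomes 210
const after = tree[1]; // root after the update
console.log(before, after, recomputed);
```

With four leaves, two internal nodes are recomputed (288 becomes 283, then 430 becomes 425), which is log2(4); the other partial sum, 142, is never touched.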

Page 24: Apéro RubyBdx - MongoDB - 8-11-2011

Let’s do some code!

Page 25: Apéro RubyBdx - MongoDB - 8-11-2011

$> mongo

> db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
> db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
> db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
> db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })

> db.tickets.count()
4

> db.tickets.find()
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
...

> db.tickets.find({ "_id": 1 })
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }

Page 26: Apéro RubyBdx - MongoDB - 8-11-2011

> var map = function() {
... emit(null, this.checkout)
... }

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
... }

Page 27: Apéro RubyBdx - MongoDB - 8-11-2011

Temporary Collection

> sumOfCheckouts = db.tickets.mapReduce(map, reduce)
{
  "result" : "tmp.mr.mapreduce_123456789_4",
  "timeMillis" : 8,
  "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
  "ok" : 1
}

> db.getCollectionNames()
[ "tickets", "tmp.mr.mapreduce_123456789_4" ]

> db[sumOfCheckouts.result].find()
{ "_id" : null, "value" : 430 }

Page 28: Apéro RubyBdx - MongoDB - 8-11-2011

Persistent Collection

> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.getCollectionNames()
[ "sumOfCheckouts", "tickets", "tmp.mr.mapreduce_123456789_4" ]

> db.sumOfCheckouts.find()
{ "_id" : null, "value" : 430 }

> db.sumOfCheckouts.findOne().value
430

Page 29: Apéro RubyBdx - MongoDB - 8-11-2011

Reduce by Date

Page 30: Apéro RubyBdx - MongoDB - 8-11-2011

> var map = function() {
... emit(this.day, this.checkout)
... }

> var reduce = function(key, values) {
... var sum = 0
... for (var index in values) sum += values[index]
... return sum
... }

Page 31: Apéro RubyBdx - MongoDB - 8-11-2011

> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })

> db.sumOfCheckouts.find()
{ "_id" : 20111017, "value" : 430 }
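Because all four tickets share the same day, the result above has a single row. The sketch below runs the same grouping in plain JavaScript with one extra ticket on another day to show a second group appearing. The `mapReduce` helper is a hypothetical in-memory stand-in, not a MongoDB API, the fifth ticket is invented for the example, and `emit` is passed to `map` as an argument here whereas MongoDB provides it as a global.

```javascript
// Minimal in-memory stand-in for mapReduce (hypothetical helper).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Call map with the document bound to `this`, as the shell does.
  docs.forEach((doc) => map.call(doc, emit));
  const out = [];
  for (const [key, values] of groups) {
    // A key with a single value is passed through without calling reduce.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

const tickets = [
  { id: 1, day: 20111017, checkout: 100 },
  { id: 2, day: 20111017, checkout: 42 },
  { id: 3, day: 20111017, checkout: 215 },
  { id: 4, day: 20111017, checkout: 73 },
  { id: 5, day: 20111018, checkout: 10 }, // hypothetical ticket, not in the slides
];

// Key each checkout by its day.
const map = function (emit) {
  emit(this.day, this.checkout);
};
const reduce = (key, values) => values.reduce((sum, value) => sum + value, 0);

const sumOfCheckouts = mapReduce(tickets, map, reduce);
console.log(sumOfCheckouts);
```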

Page 32: Apéro RubyBdx - MongoDB - 8-11-2011

What we can do

Page 33: Apéro RubyBdx - MongoDB - 8-11-2011

Scored Subjects per User

Subject   User   Score
1         1      2
1         1      2
1         2      2
2         1      2
2         2      10
2         2      5

Page 34: Apéro RubyBdx - MongoDB - 8-11-2011

Scored Subjects per User (reduced)

Subject   User   Score
1         1      4
1         2      2
2         1      2
2         2      15

Page 35: Apéro RubyBdx - MongoDB - 8-11-2011

$> mongo

> db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
> db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
> db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })

> db.scores.count()
6

> db.scores.find()
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
...

> db.scores.find({ "_id": 1 })
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }

Page 36: Apéro RubyBdx - MongoDB - 8-11-2011

> var map = function() {
... emit([this.user_id, this.subject_id].join("-"),
...   { subject_id: this.subject_id, user_id: this.user_id, score: this.score });
... }

> var reduce = function(key, values) {
... var result = { user_id: "", subject_id: "", score: 0 };
... values.forEach(function (value) {
...   result.score += value.score;
...   result.user_id = value.user_id;
...   result.subject_id = value.subject_id;
... });
... return result
... }
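One detail worth noticing in this reduce is that it returns an object with the same shape as the values emitted by map. That is what allows its output to be fed back into reduce. The sketch below checks this in plain JavaScript for the key "2-2"; the variable names `direct`, `partial` and `rereduced` are chosen for the example.

```javascript
// Same reduce as in the shell session, written as standalone JavaScript.
const reduce = function (key, values) {
  const result = { user_id: "", subject_id: "", score: 0 };
  values.forEach(function (value) {
    result.score += value.score;
    result.user_id = value.user_id;
    result.subject_id = value.subject_id;
  });
  return result;
};

// The two values map would emit for user 2 on subject 2.
const emitted = [
  { subject_id: 2, user_id: 2, score: 10 },
  { subject_id: 2, user_id: 2, score: 5 },
];

// Reducing everything at once...
const direct = reduce("2-2", emitted);

// ...gives the same result as reducing a partial result again.
const partial = reduce("2-2", [emitted[0]]);
const rereduced = reduce("2-2", [partial, emitted[1]]);

console.log(direct, rereduced);
```

If reduce returned a bare number instead, the second call would read `value.score` on a number and the sum would break.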

Page 37: Apéro RubyBdx - MongoDB - 8-11-2011

ReducedScores Collection

> db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })

> db.getCollectionNames()
[ "reduced_scores", "scores" ]

> db.reduced_scores.find()
{ "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
{ "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
{ "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
{ "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }

> db.reduced_scores.findOne().value.score
4

Page 38: Apéro RubyBdx - MongoDB - 8-11-2011

Querying the Results from Rails

ruby-1.9.2-p180 :007 > ReducedScores.first
 => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'), "subject_id"=>BSON::ObjectId('...'), "score"=>4.0}>

ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
 => 2

ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score']
 => 4.0

ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score']
 => 2.0

Page 39: Apéro RubyBdx - MongoDB - 8-11-2011

Questions?