Random generation of relational Bayesian networks
DESCRIPTION
Presentation given at JFRB’14, 25-27 June, IHP, Paris, France. Only the title page (Génération aléatoire de réseaux Bayésiens relationnels) is in French.
TRANSCRIPT
PRMs Random generation Population Conclusion & ongoing work
Génération aléatoire de réseaux Bayésiens relationnels (Random generation of relational Bayesian networks)
Mouna Ben Ishak (1,2), Philippe Leray (2) and Nahla Ben Amor (1)
(1) Laboratoire de Recherche Opérationnelle de Décision et de Contrôle de Processus (LARODEC), ISG Tunis, Tunisia
(2) Laboratoire d’Informatique de Nantes Atlantique (LINA), UMR CNRS 6241, Université de Nantes, France
Génération aléatoire de réseaux Bayésiens relationnels JFRB’14 25-27 juin, IHP, Paris, France 1/27
Motivation (1/3)
[Figure: flat data representation. A training set (observations x1 ... xn described by features f1 ... fm, with values v1, v2, v3, ...) is fed to a learning algorithm, which outputs a learned model.]
Motivation (2/3)
[Figure: application layers (presentation, business logic, data); the data are stored in a relational representation.]
How to use relational data with classical machine learning algorithms?
Motivation (3/3)
Propositionalization
  It has been shown that propositionalization is not always appropriate to perform learning in relational domains (Maier et al., 10)
Relational transition
  Extend classical machine learning techniques in the context of relational data representation
Outline ...
1. PRMs
2. Random generation
2.1. Relational schema random generation
2.2. PRM random generation
3. Population
4. Conclusion & ongoing work
Bayesian networks (BN) (Pearl, 85)
Definition
  G: qualitative description of conditional dependences/independences between variables, as a directed acyclic graph (DAG)
  Θ: quantitative description of these dependences, as conditional probability distributions (CPDs)
[Figure: example BN over Gender, Age and Occupation, where Occupation depends on Age and Gender.]
P(Gender)
  F     M
  0.6   0.4
P(Age)
  High  Middle  Low
  0.3   0.3     0.4
P(Occupation | Age, Gender)
  Age, Gender   Oc1   Oc2   Oc3
  High, M       0.2   0.3   0.5
  High, F       0.3   0.5   0.2
  Middle, M     0.9   0.1   0
  Middle, F     0.2   0.4   0.4
  Low, M        0.3   0.5   0.2
  Low, F        0.5   0.1   0.4
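As an illustration of how G and Θ work together, here is a minimal Python sketch that draws samples from the example BN by sampling parents before children. The CPD values are the ones read from the slide (the column order of the extracted tables is an assumption), and the function name `sample_one` is hypothetical.

```python
import random

# CPDs as read from the slide; the pairing of values to states is an assumption.
GENDER = {"M": 0.4, "F": 0.6}
AGE = {"High": 0.3, "Middle": 0.3, "Low": 0.4}
# P(Occupation | Age, Gender), probabilities for Oc1, Oc2, Oc3.
OCCUPATION = {
    ("High", "M"): [0.2, 0.3, 0.5],
    ("High", "F"): [0.3, 0.5, 0.2],
    ("Middle", "M"): [0.9, 0.1, 0.0],
    ("Middle", "F"): [0.2, 0.4, 0.4],
    ("Low", "M"): [0.3, 0.5, 0.2],
    ("Low", "F"): [0.5, 0.1, 0.4],
}
OCC_STATES = ["Oc1", "Oc2", "Oc3"]


def sample_one(rng):
    """Draw one (gender, age, occupation) tuple, parents before children."""
    gender = rng.choices(list(GENDER), weights=list(GENDER.values()))[0]
    age = rng.choices(list(AGE), weights=list(AGE.values()))[0]
    occupation = rng.choices(OCC_STATES, weights=OCCUPATION[(age, gender)])[0]
    return gender, age, occupation


rng = random.Random(0)
data = [sample_one(rng) for _ in range(1000)]
```

Each tuple in `data` is one row of a flat training set of the kind shown in the motivation slides.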
BN structure learning
Constraint-based methods
  BN = independence model
  ⇒ find conditional independences (CI) in data in order to build the DAG
  ex: IC (Pearl & Verma, 91), PC (Spirtes et al., 93)
  problem: reliability of CI statistical tests (ok for n < 100)
Score-based methods
  BN = probabilistic model that must fit data as well as possible
  ⇒ search the DAG space in order to maximize a scoring function
  ex: Maximum Weighted Spanning Tree (Chow & Liu, 68), Greedy Search (Chickering, 95), evolutionary approaches (Larranaga et al., 96) (Wang & Yang, 10)
  problem: size of search space (ok for n < 1000)
Hybrid / local search methods
  local search / neighbor identification (statistical tests)
  global (score) optimization
  usually for scalability reasons (ok for high n)
  ex: MMHC algorithm (Tsamardinos et al., 06)
Evaluating structure learning algorithms
Standard practice
  generating data from a reference model
  applying a structure learning algorithm to this data
  comparing the learned and reference models
Which reference model?
  existing reference benchmarks (e.g., Asia, Alarm, ...)
  randomly generated models (Ide et al., 04)
  arbitrarily large BNs by tiling (Tsamardinos et al., 06)
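The last step of the standard practice (comparing the learned and reference models) can be sketched as follows. The slides do not name a metric; edge precision, edge recall and a simple structural Hamming distance are used here as assumptions, and `compare_structures` is a hypothetical helper that works on raw DAGs rather than on equivalence classes.

```python
def compare_structures(reference, learned):
    """Return (precision, recall, shd) for two lists of directed edges."""
    ref, lrn = set(reference), set(learned)
    true_positives = len(ref & lrn)
    precision = true_positives / len(lrn) if lrn else 1.0
    recall = true_positives / len(ref) if ref else 1.0
    # SHD variant: each missing or extra adjacency counts once,
    # and each reversed edge counts once.
    ref_skeleton = {frozenset(edge) for edge in ref}
    lrn_skeleton = {frozenset(edge) for edge in lrn}
    missing_or_extra = len(ref_skeleton ^ lrn_skeleton)
    reversed_edges = sum(
        1 for (a, b) in lrn if (b, a) in ref and (a, b) not in ref
    )
    return precision, recall, missing_or_extra + reversed_edges


reference = [("Age", "Occupation"), ("Gender", "Occupation")]
learned = [("Occupation", "Age"), ("Gender", "Occupation"), ("Gender", "Age")]
result = compare_structures(reference, learned)
```

In this toy case one edge is recovered, one is reversed and one is spurious, so the distance is 2.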
Which kind of data?
BN learning from data... but which kind of data?
How to deal with structured data?
Relational schema
[Figure: relational schema with three classes. Movie (key Movie; attributes RealiseDate, Genre), User (key User; attributes Gender, Age, Occupation), Vote (key Vote; references Movie and User; attribute Rating).]
A relational schema R
  classes + relational variables
  reference slots (e.g., Vote.Movie, Vote.User)
  slot chain = a sequence of reference slots
  allows walking through the relational schema to create new variables
  ex: Vote.User.User^-1.Movie: all the movies voted on by a particular user
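To make the slot chain Vote.User.User^-1.Movie concrete, here is a small Python sketch over a toy instance of the Vote class. The data and the helper names (`ref`, `inverse`, `movies_voted_by_same_user`) are hypothetical and only serve the illustration.

```python
# Toy instance of the Vote class: each vote references one user and one movie.
votes = [
    {"id": "v1", "User": "U1", "Movie": "M1"},
    {"id": "v2", "User": "U1", "Movie": "M2"},
    {"id": "v3", "User": "U2", "Movie": "M1"},
]


def ref(vote_ids, slot):
    """Forward reference slot Vote.<slot>, applied to a set of vote ids."""
    return {v[slot] for v in votes if v["id"] in vote_ids}


def inverse(targets, slot):
    """Inverse slot <slot>^-1: all votes referencing one of the targets."""
    return {v["id"] for v in votes if v[slot] in targets}


def movies_voted_by_same_user(vote_id):
    """Evaluate Vote.User.User^-1.Movie starting from one vote."""
    users = ref({vote_id}, "User")        # Vote.User
    their_votes = inverse(users, "User")  # .User^-1
    return ref(their_votes, "Movie")      # .Movie


movies = movies_voted_by_same_user("v1")
```

Starting from vote v1, the chain reaches user U1, then all of U1's votes, then the movies of those votes.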
Probabilistic Relational Models
(Koller & Pfeffer, 98)
Definition
A PRM Π associated with R:
  a qualitative dependency structure S (with possibly long slot chains and aggregation functions)
  a set of parameters θS
[Figure: PRM over the Movie/User/Vote schema, with attributes Movie.RealiseDate, Movie.Genre, User.Age, User.Gender, User.Occupation and Vote.Rating; Vote.Rating depends on Movie.Genre and User.Gender.]
P(User.Gender)
  F     M
  0.6   0.4
P(Vote.Rating | Movie.Genre, User.Gender)
  Movie.Genre, User.Gender   High   Low
  Comedy, F                  0.4    0.6
  Comedy, M                  0.5    0.5
  Horror, F                  0.1    0.9
  Horror, M                  0.8    0.2
  Drama, F                   0.7    0.3
  Drama, M                   0.5    0.5
Aggregators
  Vote.User.User^-1.Movie.Genre → Vote.Rating: a movie rating from one user can depend on the genre of all the movies voted on by this user
  how to describe the dependency with an unknown number of parents?
  solution: using an aggregated value, e.g. γ = MODE
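A minimal sketch of the MODE aggregator: it turns a multiset of parent values of unknown size into a single value, which can then index an ordinary CPD. The tie-breaking rule (smallest value in sorted order) and the sample data are assumptions made for the illustration; the input is assumed to be non-empty.

```python
from collections import Counter


def mode(values):
    """Most frequent value; ties are broken by sorted order (an assumption)."""
    counts = Counter(values)
    best = max(counts.values())
    return min(value for value, count in counts.items() if count == best)


# Genres of all the movies voted on by each user (hypothetical data).
genres_by_user = {"U1": ["Comedy", "Horror", "Comedy"], "U2": ["Drama"]}
aggregated = {user: mode(genres) for user, genres in genres_by_user.items()}
```

Whatever the number of movies a user voted on, `aggregated` holds exactly one genre per user.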
Ground Bayesian Network
GBN
  BN created from one PRM and an instantiated database
  = relational skeleton + probabilistic dependencies
  used for probabilistic inference
[Figure: ground BN for a skeleton with users U1 to U3 (Age, Gender, Occupation), movies M1 to M5 (Genre, RealiseDate) and nine votes, each with a Rating: (#U1, #M1), (#U1, #M2), (#U2, #M1), (#U2, #M3), (#U2, #M4), (#U3, #M1), (#U3, #M2), (#U3, #M3), (#U3, #M5).]
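The grounding step can be sketched in Python for the skeleton of the figure. The sketch only instantiates the two dependencies Movie.Genre → Vote.Rating and User.Gender → Vote.Rating; the node naming scheme and the function `ground` are hypothetical.

```python
# Relational skeleton from the figure: 3 users, 5 movies, 9 votes.
votes = [
    ("U1", "M1"), ("U1", "M2"), ("U2", "M1"), ("U2", "M3"), ("U2", "M4"),
    ("U3", "M1"), ("U3", "M2"), ("U3", "M3"), ("U3", "M5"),
]


def ground(votes):
    """Instantiate Movie.Genre -> Vote.Rating and User.Gender -> Vote.Rating."""
    edges = []
    for user, movie in votes:
        child = f"Rating({user},{movie})"
        edges.append((f"Genre({movie})", child))
        edges.append((f"Gender({user})", child))
    return edges


gbn_edges = ground(votes)
```

Each vote contributes one Rating node with two parents, so the class-level dependencies are copied once per object of the skeleton.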
PRM structure learning
Constraint-based methods
  relational PC (Maier et al., 10), relational CD (Maier et al., 13)
  do not deal with aggregation functions
Score-based methods
  greedy search (Getoor et al., 07)
Hybrid methods
  relational MMHC (Ben Ishak et al., in progress)
Criticisms of previous work
  lack of an evaluation process in a common framework
  absence of relational benchmarks for evaluating algorithms
  absence of a relational data generation process
Proposal
  a synthetic approach to randomly generate PRMs and populate databases
PRMs random generation
Related work
(Maier et al., 10, 13)
  relational schemas are generated as tree structures ... too simple
(Wuillemin et al., 12)
  object-oriented paradigm rather than relational one
  no population nor interaction with a relational database
The overall process
[Figure: overall process. Model generation: a relational schema and probabilistic dependencies form the PRM. Instance generation: the PRM is instantiated (relational skeleton + probabilistic dependencies) into a ground BN, which is sampled to produce a DB instance.]
The overall platform
[Figure: platform components: relational database (RDB), PRM API, PRM learning (parameter learning: statistical + Bayesian; structure learning: score-based + constraint-based + hybrid), inference, visualization, benchmarking + evaluation.]
FIGURE: PRM API under the PILGRIM platform
Generating the relational schema
Hypotheses
  with respect to the relational model definition (Date, 08): avoid referential cycles when generating constraints
  ∀ Xi, Xj ∈ X there exists a referential path from Xi to Xj: searching for DAG structures with a single connected component
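One simple way to obtain a DAG with a single connected component is sketched below in Python. This is not the authors' generation algorithm, only an illustration of the two hypotheses: nodes are placed in a random order, every node is linked to one earlier node (connectivity), and all edges point towards earlier nodes (no referential cycle). The function name and the `extra_edge_prob` parameter are hypothetical.

```python
import random


def random_connected_dag(n, extra_edge_prob, rng):
    """Return sorted edges (i, j), read as: class i references class j."""
    order = list(range(n))
    rng.shuffle(order)
    edges = set()
    for pos in range(1, n):
        # Link each node to one earlier node: a single connected component.
        edges.add((order[pos], order[rng.randrange(pos)]))
        # Optional extra references, always towards earlier nodes,
        # so no cycle can be created.
        for earlier in order[:pos]:
            if rng.random() < extra_edge_prob:
                edges.add((order[pos], earlier))
    return sorted(edges)


schema_graph = random_connected_dag(4, 0.3, random.Random(0))
```

Each edge of `schema_graph` can then be turned into a foreign key from the referencing class to the referenced class.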
Example
[Figure: example with four classes Clazz0 to Clazz3. A DAG G over nodes n0 to n3 is generated, then (1) primary keys are generated (clazz0id, clazz1id, clazz2id, clazz3id), (2) attributes are generated (att0, att1, att2, att3, depending on the class), (3) foreign keys are generated (clazz0fkatt03, claszz1fkatt13, clazz2fkatt23, clazz1fkatt12, clazz1fkatt10).]
Generating the PRM
Goal
  randomly generating the probabilistic dependency structure S between the attributes of the classes
  sampling CPDs as for usual BNs
Hypothesis
  the dependency structure S should be a DAG
  one descriptive attribute depends on another one, but through which slot chain?
  we need a user-defined maximum slot chain length Kmax
Principle
  step I: add dependencies while keeping a DAG structure, first into classes, then intra classes
  step II: random choice of a legal slot chain, weighted by its length
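Step II can be illustrated with a short Python sketch on the Movie/User/Vote schema: enumerate the slot chains of length at most Kmax, then draw one at random. The slides do not give the weighting function, so the weight exp(-length), which favours short chains, is an assumption, as are the names `SLOTS`, `slot_chains` and `pick_chain`. The empty chain (intra-class dependency) is left out, and a slot immediately followed by its own inverse is not filtered.

```python
import math
import random

# (slot name, domain class, range class); "^-1" marks an inverse slot.
SLOTS = [
    ("Movie", "Vote", "Movie"),
    ("User", "Vote", "User"),
    ("Movie^-1", "Movie", "Vote"),
    ("User^-1", "User", "Vote"),
]


def slot_chains(start, k_max):
    """All chains of 1..k_max slots starting at `start`, with their end class."""
    chains, frontier = [], [((), start)]
    for _ in range(k_max):
        frontier = [
            (chain + (name,), range_cls)
            for chain, cls in frontier
            for name, domain_cls, range_cls in SLOTS
            if domain_cls == cls
        ]
        chains.extend(frontier)
    return chains


def pick_chain(chains, rng):
    """Random choice in which longer chains are less likely (assumed weights)."""
    weights = [math.exp(-len(chain)) for chain, _ in chains]
    return rng.choices(chains, weights=weights)[0]


chains = slot_chains("Vote", k_max=3)
chain, end_class = pick_chain(chains, random.Random(0))
```

With Kmax = 3, the candidates from Vote include ("User", "User^-1", "Movie"), the chain used in the relational schema example.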
Example
[Figure: example dependency structure over the attributes of Clazz0 to Clazz3. Inter-class dependencies are labelled with slot chains: [Clazz0.clazz1fkatt10], [Clazz2.clazz1fkatt12], and, with the MODE aggregator, [Clazz2.clazz2fkatt23^-1], [Clazz2.clazz1fkatt12.clazz1fkatt12^-1] and [Clazz2.clazz2fkatt23^-1.claszz1fkatt13.clazz1fkatt10^-1].]
GBN creation and sampling
Generating the relational skeleton
  generate a random number of objects per class
  add links between objects: all referencing classes have their generated objects related to objects from referenced classes
Creating the GBN
  the GBN is constructed by using the CPDs already defined by the PRM
Populating the database
  sampling from the GBN
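The skeleton generation step can be sketched as follows in Python. The schema, the object naming scheme and the `max_objects` bound are hypothetical; the sketch only shows the two operations listed above (random number of objects per class, then links from referencing to referenced objects).

```python
import random

# Hypothetical schema: each class lists the classes it references.
REFERENCES = {"User": [], "Movie": [], "Vote": ["User", "Movie"]}


def random_skeleton(references, max_objects, rng):
    """Generate objects per class, then link referencing to referenced objects."""
    objects = {
        cls: [f"{cls}{i}" for i in range(rng.randint(1, max_objects))]
        for cls in references
    }
    links = {}
    for cls, targets in references.items():
        for obj in objects[cls]:
            # One randomly chosen referenced object per reference slot.
            links[obj] = {t: rng.choice(objects[t]) for t in targets}
    return objects, links


objects, links = random_skeleton(REFERENCES, max_objects=5, rng=random.Random(0))
```

The GBN is then built over `objects` and `links`, and sampling it fills the descriptive attributes of each generated object.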
Conclusion - Perspectives
Conclusion
  we proposed a process to randomly generate PRMs and instantiate them to populate a relational database
Ongoing work
  propose a new approach to learn PRM structure from relational data
  compare it with existing state-of-the-art approaches, on databases generated by our random generation process
  extend our generation approach to address other relational probabilistic graphical models (e.g., DAPER)
To be continued :-)
  Thursday 9:30 - Ghada Trabelsi - Evaluation of structure learning algorithms for dynamic BNs
  Thursday 10:00 - Anthony Coutant - Learning an extension of PRMs
  Friday 10:30 - Maroua Haddad - Learning possibilistic networks
Any questions?