
Page 1: Thèse Utilisation des ontologies contextuelles pour le partage

N° d’ordre 04-ISAL-0094 Année 2004

Thèse

Utilisation des ontologies contextuelles pour le partage sémantique entre les systèmes

d’information dans l’entreprise

Présentée devant L’Institut National des Sciences Appliquées de Lyon

Pour obtenir Le Grade de Docteur

École doctorale : Informatique et Information pour la société (EDIIS-EDA 335) Spécialité : Documents Multimédia, Images et Systèmes D’Information Communicants (DISIC)

Par RAMI DIB RIFAIEH

(DEA en Informatique)

Soutenue le 20 Décembre 2004 devant la Commission d’examen

Jury

Michel Schneider, Professeur (Université Blaise Pascal, Clermont-Ferrand), Rapporteur
Jean-Pierre Giraudin, Professeur (lab. LSR, IMAG, Grenoble), Rapporteur
Jacky Akoka, Professeur (Conservatoire National des Arts et Métiers, Paris), Examinateur
Jacques Kouloumdjian, Professeur Émérite (lab. LIRIS, INSA de Lyon), Directeur de thèse
Aïcha-Nabila Benharkat, Maître de Conférences (lab. LIRIS, INSA de Lyon), Directeur de thèse
André Bonnevialle, Directeur R&D (TESSI Informatique et Conseil, St-Etienne), Examinateur

Laboratoire d'InfoRmatique en Images et Systèmes d'information- FRE 2672 CNRS

Institut National des Sciences Appliquées de Lyon

Page 2
Page 3

To my Mother,

Page 4

À ma Mère,

Page 5

MAI 2003

INSTITUT NATIONAL DES SCIENCES APPLIQUEES DE LYON

Directeur : STORCK A.

Professeurs :

AUDISIO S. PHYSICOCHIMIE INDUSTRIELLE

BABOT D. CONT. NON DESTR. PAR RAYONNEMENTS IONISANTS

BABOUX J.C. GEMPPM***

BALLAND B. PHYSIQUE DE LA MATIERE

BAPTISTE P. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIERS

BARBIER D. PHYSIQUE DE LA MATIERE

BASTIDE J.P. LAEPSI****

BAYADA G. MECANIQUE DES CONTACTS

BENADDA B. LAEPSI****

BETEMPS M. AUTOMATIQUE INDUSTRIELLE

BIENNIER F. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIERS

BLANCHARD J.M. LAEPSI****

BOISSON C. VIBRATIONS-ACOUSTIQUE

BOIVIN M. (Prof. émérite) MECANIQUE DES SOLIDES

BOTTA H. UNITE DE RECHERCHE EN GENIE CIVIL - Développement Urbain

BOTTA-ZIMMERMANN M. (Mme) UNITE DE RECHERCHE EN GENIE CIVIL - Développement Urbain

BOULAYE G. (Prof. émérite) INFORMATIQUE

BOYER J.C. MECANIQUE DES SOLIDES

BRAU J. CENTRE DE THERMIQUE DE LYON - Thermique du bâtiment

BREMOND G. PHYSIQUE DE LA MATIERE

BRISSAUD M. GENIE ELECTRIQUE ET FERROELECTRICITE

BRUNET M. MECANIQUE DES SOLIDES

BRUNIE L. INGENIERIE DES SYSTEMES D’INFORMATION

BUREAU J.C. CEGELY*

CAVAILLE J.Y. GEMPPM***

CHANTE J.P. CEGELY*- Composants de puissance et applications

CHOCAT B. UNITE DE RECHERCHE EN GENIE CIVIL - Hydrologie urbaine

COMBESCURE A. MECANIQUE DES CONTACTS

COUSIN M. UNITE DE RECHERCHE EN GENIE CIVIL - Structures

DAUMAS F. (Mme) CENTRE DE THERMIQUE DE LYON - Energétique et Thermique

DOUTHEAU A. CHIMIE ORGANIQUE

DUFOUR R. MECANIQUE DES STRUCTURES

DUPUY J.C. PHYSIQUE DE LA MATIERE

EMPTOZ H. RECONNAISSANCE DE FORMES ET VISION

ESNOUF C. GEMPPM***

EYRAUD L. (Prof. émérite) GENIE ELECTRIQUE ET FERROELECTRICITE

FANTOZZI G. GEMPPM***

Page 6

FAVREL J. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIERS

FAYARD J.M. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS

FAYET M. MECANIQUE DES SOLIDES

FERRARIS-BESSO G. MECANIQUE DES STRUCTURES

FLAMAND L. MECANIQUE DES CONTACTS

FLORY A. INGENIERIE DES SYSTEMES D’INFORMATIONS

FOUGERES R. GEMPPM***

FOUQUET F. GEMPPM***

FRECON L. REGROUPEMENT DES ENSEIGNANTS CHERCHEURS ISOLES

GERARD J.F. INGENIERIE DES MATERIAUX POLYMERES

GERMAIN P. LAEPSI****

GIMENEZ G. CREATIS**

GOBIN P.F. (Prof. émérite) GEMPPM***

GONNARD P. GENIE ELECTRIQUE ET FERROELECTRICITE

GONTRAND M. PHYSIQUE DE LA MATIERE

GOUTTE R. (Prof. émérite) CREATIS**

GOUJON L. GEMPPM***

GOURDON R. LAEPSI****

GRANGE G. GENIE ELECTRIQUE ET FERROELECTRICITE

GUENIN G. GEMPPM***

GUICHARDANT M. BIOCHIMIE ET PHARMACOLOGIE

GUILLOT G. PHYSIQUE DE LA MATIERE

GUINET A. PRODUCTIQUE ET INFORMATIQUE DES SYSTEMES MANUFACTURIER

GUYADER J.L. VIBRATIONS-ACOUSTIQUE

GUYOMAR D. GENIE ELECTRIQUE ET FERROELECTRICITE

HEIBIG A. MATHEMATIQUE APPLIQUEES DE LYON

JACQUET-RICHARDET G. MECANIQUE DES STRUCTURES

JAYET Y. GEMPPM***

JOLION J.M. RECONNAISSANCE DE FORMES ET VISION

JULLIEN J.F. UNITE DE RECHERCHE EN GENIE CIVIL - Structures

JUTARD A. (Prof. émérite) AUTOMATIQUE INDUSTRIELLE

KASTNER R. UNITE DE RECHERCHE EN GENIE CIVIL - Géotechnique

KOULOUMDJIAN J. INGENIERIE DES SYSTEMES D’INFORMATION

LAGARDE M. BIOCHIMIE ET PHARMACOLOGIE

LALANNE M. (Prof. émérite) MECANIQUE DES STRUCTURES

LALLEMAND A. CENTRE DE THERMIQUE DE LYON - Energétique et thermique

LALLEMAND M. (Mme) CENTRE DE THERMIQUE DE LYON - Energétique et thermique

LAUGIER A. PHYSIQUE DE LA MATIERE

LAUGIER C. BIOCHIMIE ET PHARMACOLOGIE

LAURINI R. INFORMATIQUE EN IMAGE ET SYSTEMES D’INFORMATION

LEJEUNE P. UNITE MICROBIOLOGIE ET GENETIQUE

LUBRECHT A. MECANIQUE DES CONTACTS

MASSARD N. INTERACTION COLLABORATIVE TELEFORMATION TELEACTIVITE

MAZILLE H. PHYSICOCHIMIE INDUSTRIELLE

MERLE P. GEMPPM***

MERLIN J. GEMPPM***

Page 7

MIGNOTTE A. (Mle) INGENIERIE, INFORMATIQUE INDUSTRIELLE

MILLET J.P. PHYSICOCHIMIE INDUSTRIELLE

MIRAMOND M. UNITE DE RECHERCHE EN GENIE CIVIL - Hydrologie urbaine

MOREL R. MECANIQUE DES FLUIDES ET D’ACOUSTIQUES

MOSZKOWICZ P. LAEPSI****

NARDON P. (Prof. émérite) BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS

NIEL E. AUTOMATIQUE INDUSTRIELLE

NORTIER P. DREP

ODET C. CREATIS**

OTTERBEIN M. (Prof. émérite) LAEPSI****

PARIZET E. VIBRATIONS-ACOUSTIQUE

PASCAULT J.P. INGENIERIE DES MATERIAUX POLYMERES

PAVIC G. VIBRATIONS-ACOUSTIQUE

PELLETIER J.M. GEMPPM***

PERA J. UNITE DE RECHERCHE EN GENIE CIVIL - Matériaux

PERRIAT P. GEMPPM***

PERRIN J. INTERACTION COLLABORATIVE TELEFORMATION TELEACTIVITE

PINARD P. (Prof. émérite) PHYSIQUE DE LA MATIERE

PINON J.M. INGENIERIE DES SYSTEMES D’INFORMATION

PONCET A. PHYSIQUE DE LA MATIERE

POUSIN J. MODELISATION MATHEMATIQUE ET CALCUL SCIENTIFIQUE

PREVOT P. INTERACTION COLLABORATIVE TELEFORMATION TELEACTIVITE

PROST R. CREATIS**

RAYNAUD M. CENTRE DE THERMIQUE DE LYON - Transferts Interfaces et Matériaux

REDARCE H. AUTOMATIQUE INDUSTRIELLE

RETIF J-M. CEGELY*

REYNOUARD J.M. UNITE DE RECHERCHE EN GENIE CIVIL - Structures

RIGAL J.F. MECANIQUE DES SOLIDES

RIEUTORD E. (Prof. émérite) MECANIQUE DES FLUIDES

ROBERT-BAUDOUY J. (Mme) (Prof. émérite) GENETIQUE MOLECULAIRE DES MICROORGANISMES

ROUBY D. GEMPPM***

ROUX J.J. CENTRE DE THERMIQUE DE LYON – Thermique de l’Habitat

RUBEL P. INGENIERIE DES SYSTEMES D’INFORMATION

SACADURA J.F. CENTRE DE THERMIQUE DE LYON - Transferts Interfaces et Matériaux

SAUTEREAU H. INGENIERIE DES MATERIAUX POLYMERES

SCAVARDA S. AUTOMATIQUE INDUSTRIELLE

SOUIFI A. PHYSIQUE DE LA MATIERE

SOUROUILLE J.L. INGENIERIE INFORMATIQUE INDUSTRIELLE

THOMASSET D. AUTOMATIQUE INDUSTRIELLE

THUDEROZ C. ESCHIL – Equipe Sciences Humaines de l’Insa de Lyon

UBEDA S. CENTRE D’INNOV. EN TELECOM ET INTEGRATION DE SERVICES

VELEX P. MECANIQUE DES CONTACTS

VIGIER G. GEMPPM***

VINCENT A. GEMPPM***

VRAY D. CREATIS**

Page 8

VUILLERMOZ P.L. (Prof. émérite) PHYSIQUE DE LA MATIERE

Directeurs de recherche C.N.R.S. :

BAIETTO-CARNEIRO M-C. (Mme) MECANIQUE DES CONTACTS ET DES SOLIDES

BERTHIER Y. MECANIQUE DES CONTACTS

CONDEMINE G. UNITE MICROBIOLOGIE ET GENETIQUE

COTTE-PATAT N. (Mme) UNITE MICROBIOLOGIE ET GENETIQUE

ESCUDIE D. (Mme) CENTRE DE THERMIQUE DE LYON

FRANCIOSI P. GEMPPM***

MANDRAND M.A. (Mme) UNITE MICROBIOLOGIE ET GENETIQUE

POUSIN G. BIOLOGIE ET PHARMACOLOGIE

ROCHE A. INGENIERIE DES MATERIAUX POLYMERES

SEGUELA A. GEMPPM***

Directeurs de recherche I.N.R.A. :

FEBVAY G. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS

GRENIER S. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS

RAHBE Y. BIOLOGIE FONCTIONNELLE, INSECTES ET INTERACTIONS

Directeurs de recherche I.N.S.E.R.M. :

PRIGENT A.F. (Mme) BIOLOGIE ET PHARMACOLOGIE

MAGNIN I. (Mme) CREATIS**

*CEGELY CENTRE DE GENIE ELECTRIQUE DE LYON

**CREATIS CENTRE DE RECHERCHE ET D’APPLICATIONS EN TRAITEMENT DE L’IMAGE ET DU SIGNAL

***GEMPPM GROUPE D’ETUDE METALLURGIE PHYSIQUE ET PHYSIQUE DES MATERIAUX

****LAEPSI LABORATOIRE D’ANALYSE ENVIRONNEMENTALE DES PROCEDES ET SYSTEMES INDUSTRIELS

Page 9

INSA DE LYON DEPARTEMENT DES ETUDES DOCTORALES ET RELATIONS INTERNATIONALES SCIENTIFIQUES OCTOBRE 2001

Ecoles Doctorales et Diplômes d’Etudes Approfondies

Habilités pour la période 1999-2004

CHIMIE DE LYON (Chimie, Procédés, Environnement), EDA206
Responsable principal : M. D. SINOU, UCBL1, Tél 04.72.44.62.63, Sec 04.72.44.62.64, Fax 04.72.44.81.60
Correspondant INSA : M. P. MOSZKOWICZ, Tél 83.45, Sec 84.30, Fax 87.17
DEA INSA :
- Chimie Inorganique (910643), M. J.F. QUINSON, Tél 83.51, Fax 85.28
- Sciences et Stratégies Analytiques (910634)
- Sciences et Techniques du Déchet (910675), M. P. MOSZKOWICZ, Tél 83.45, Fax 87.17

ECONOMIE, ESPACE ET MODELISATION DES COMPORTEMENTS (E2MC), EDA417
Responsable principal : M. A. BONNAFOUS, LYON 2, Tél 04.72.72.64.38, Sec 04.72.72.64.03, Fax 04.72.72.64.48
Correspondant INSA : Mme M. ZIMMERMANN, Tél 84.71, Fax 87.96
DEA INSA :
- Villes et Sociétés (911218), Mme M. ZIMMERMANN, Tél 84.71, Fax 87.96
- Dimensions Cognitives et Modélisation (992678), M. L. FRECON, Tél 82.39, Fax 85.18

ELECTRONIQUE, ELECTROTECHNIQUE, AUTOMATIQUE (E.E.A.), EDA160
Responsable principal : M. G. GIMENEZ, INSA DE LYON, Tél 83.32, Fax 85.26
DEA INSA :
- Automatique Industrielle (910676), M. M. BETEMPS, Tél 85.59, Fax 85.35
- Dispositifs de l’Electronique Intégrée (910696), M. D. BARBIER, Tél 85.47, Fax 60.81
- Génie Electrique de Lyon (910065), M. J.P. CHANTE, Tél 87.26, Fax 85.30
- Images et Systèmes (992254), Mme I. MAGNIN, Tél 85.63, Fax 85.26

EVOLUTION, ECOSYSTEME, MICROBIOLOGIE, MODELISATION (E2M2), EDA403
Responsable principal : M. J.P. FLANDROIS, UCBL1, Tél 04.78.86.31.50, Sec 04.78.86.31.52, Fax 04.78.86.31.49
Correspondant INSA : M. S. GRENIER, Tél 79.88, Fax 85.34
DEA INSA :
- Analyse et Modélisation des Systèmes Biologiques (910509), M. S. GRENIER, Tél 79.88, Fax 85.34

INFORMATIQUE ET INFORMATION POUR LA SOCIETE (EDIIS), EDA 407
Responsable principal : M. J.M. JOLION, INSA DE LYON, Tél 87.59, Fax 80.97
DEA INSA :
- Documents Multimédia, Images et Systèmes d’Information Communicants (992774), M. A. FLORY, Tél 84.66, Fax 85.97
- Extraction des Connaissances à partir des Données (992099), M. J.F. BOULICAUT, Tél 89.05, Fax 87.13
- Informatique et Systèmes Coopératifs pour l’Entreprise (950131), M. A. GUINET, Tél 85.94, Fax 85.38

INTERDISCIPLINAIRE SCIENCES-SANTE (EDISS), EDA205
Responsable principal : M. A.J. COZZONE, UCBL1, Tél 04.72.72.26.72, Sec 04.72.72.26.75, Fax 04.72.72.26.01
Correspondant INSA : M. M. LAGARDE, Tél 82.40, Fax 85.24
DEA INSA :
- Biochimie (930032), M. M. LAGARDE, Tél 82.40, Fax 85.24

MATERIAUX DE LYON, UNIVERSITE LYON 1, EDA 034
Responsable principal : M. J. JOSEPH, ECL, Tél 04.72.18.62.44, Sec 04.72.18.62.51, Fax 04.72.18.60.90
Correspondant INSA : M. J.M. PELLETIER, Tél 83.18, Fax 84.29
DEA INSA :
- Génie des Matériaux : Microstructure, Comportement Mécanique, Durabilité (910527), M. J.M. PELLETIER, Tél 83.18, Fax 85.28
- Matériaux Polymères et Composites (910607), M. H. SAUTEREAU, Tél 81.78, Fax 85.2
- Matière Condensée, Surfaces et Interfaces (910577), M. G. GUILLOT, Tél 81.61, Fax 85.31

MATHEMATIQUES ET INFORMATIQUE FONDAMENTALE (Math IF), EDA 409
Responsable principal : M. NICOLAS, UCBL1, Tél 04.72.44.83.11, Fax 04.72.43.00.35
Correspondant INSA : M. J. POUSIN, Tél 88.36, Fax 85.29
DEA INSA :
- Analyse Numérique, Equations aux dérivées partielles et Calcul Scientifique (910281), M. G. BAYADA, Tél 83.12, Fax 85.29

MECANIQUE, ENERGETIQUE, GENIE CIVIL, ACOUSTIQUE (MEGA), EDA162
Responsable principal : M. J. BATAILLE, ECL, Tél 04.72.18.61.56, Sec 04.72.18.61.60, Fax 04.78.64.71.45
Correspondant INSA : M. G. DALMAZ, Tél 83.03, Fax 04.72.89.09.80
DEA INSA :
- Acoustique (910016), M. J.L. GUYADER, Tél 80.80, Fax 87.12
- Génie Civil (992610), M. J.J. ROUX, Tél 84.60, Fax 85.22
- Génie Mécanique (992111), M. G. DALMAZ, Tél 83.03, Fax 04.78.89.09.80
- Thermique et Energétique (910018), Mme M. LALLEMAND, Tél 81.54, Fax 60.10

En grisé : Les Ecoles doctorales et DEA dont l’INSA est établissement principal

Page 12
Page 13

Remerciements

Tout d’abord, je tiens à remercier mes directeurs de thèse, Mme Nabila Benharkat et M. Jacques Kouloumdjian, pour leur aide et leur assistance, qui m’ont permis de mener à bien mes travaux de recherche. Je salue encore leurs qualités humaines remarquables et leurs encouragements continus. Je remercie également M. André Bonnevialle, responsable R&D à Tessi Informatique, qui m’a encadré durant la période du contrat CIFRE : il m’a initié à la recherche dans le secteur industriel tout en gardant l’esprit d’équipe.

Je suis aussi reconnaissant envers les personnes avec qui j’ai travaillé étroitement et qui ont contribué, par leurs idées et leur assistance, à ma recherche : Ahmed Arara, Uddam Chukmol, Pierre-Henri Phauntanaud et Sana Sellami.

Je remercie, de même, les rapporteurs de ma thèse, M. M. Schneider et M. J.-P. Giraudin, qui ont contribué, par leurs remarques constructives, à l’aboutissement de la thèse. Je remercie aussi M. J. Akoka pour sa participation comme membre du jury et pour ses encouragements.

Au-delà de tout, je dois beaucoup de reconnaissance à ma mère, Mme Rouyaida Abou-Taha. Ses encouragements m’ont aidé à surmonter les étapes les plus difficiles et à bien tenir dans la vie. Ainsi, je lui dédie cette thèse comme expression de ma profonde reconnaissance. Je remercie également mes frères Amer, Nasser et Imad, ma sœur Randa et leurs familles pour leurs encouragements et leur appui.

Je suis également reconnaissant à la personne qui a le plus marqué ma vie et qui est toujours là pour le meilleur et pour le pire. Ainsi, je remercie Mlle Lisa Deutz pour sa patience, ses efforts pour corriger cette thèse et pour son soutien.

Un message de reconnaissance à mes amis, surtout à mon confident Chadi Nour, à tous les membres des Associations d’Etudiants Libanais à Lyon, à l’Amicale de l’Ecole des Pères Carmes, surtout Mohammed Rafie et Nazih Jamal El-Dine, à mes collègues de l’Université Libanaise, surtout Oussama Zein et Siba Haidar, et aux membres de l’Arab Computer Society.

Je salue aussi mes collègues de Tessi Informatique à Saint-Etienne : Cédric Barllier, Franck Ducottet et Denis Perrera. Je remercie aussi mes collègues chercheurs du LIRIS pour leur amitié et leur soutien : Ludwig Seitz, Jean-Marc Pierson, Lionel Brunie, Robert Laurini, Frédrique Lebel, Rachid Saadi, David Coquil, Solomon Atnafu, Chirine Ghedira, Girma Berhe, Abraham Alvarez, Youakim Badr, Richard Chbeir, Reim Doumat, Ahmed Taher, Maryian Scuturici, Tarek Chaari, Amjad Rattrout, Samer Abel Gafour, Mohmmed Safadi, Wassim Aouk, etc. Je remercie enfin tous les chercheurs du LIRIS qui ont rendu mon séjour parmi eux très agréable et confortable.

Page 14

Acknowledgements

Firstly, I sincerely thank my supervisors, Mrs. Nabila Benharkat and Mr. Jacques Kouloumdjian. Their assistance allowed me to achieve successful results in my research work. I acknowledge, as well, their remarkable human qualities and their continuous encouragement. I also thank Mr. André Bonnevialle, R&D manager at Tessi Informatique, who supervised me during the period of the CIFRE contract. He introduced me to research in the industrial sector while keeping the team spirit and the enterprise aims ahead.

I am also grateful to the people with whom I worked closely and who contributed their ideas and assistance to my research: Ahmed Arara, Uddam Chukmol, Pierre-Henri Phauntanaud and Sana Sellami.

I thank, as well, the reviewers of my thesis, Mr. M. Schneider and Mr. J.-P. Giraudin, who helped me through their constructive remarks to improve the contents of the thesis. I also thank Mr. J. Akoka for his participation as a member of the jury and for his encouragement.

Beyond all, I owe my adorable mother, Mrs. Rouyaida Abou-Taha, much recognition. She taught me the basic lessons of life and showed me how to stand courageous and overcome problems and difficult situations. Thus, I dedicate this thesis to her as an expression of my gracious recognition. I also thank my brothers Amer, Nasser and Imad, my sister Randa, and their families for their encouragement and support.

I am also grateful to the person who has marked my life the most and stood by me for better and for worse. Thus, I thank Miss Lisa Deutz for her patience, her effort in correcting this thesis and her unlimited encouragement.

A message of acknowledgment to all my friends, especially my confidant Chadi Nour, all the members of the Associations of Lebanese Students in Lyon, the members of the Graduate Club of the Carmelite Fathers, especially Mohammed Rafie and Nazih Jamal El-Dine, my colleagues at the Lebanese University, especially Oussama Zein and Siba Haidar, and the members of the Arab Computer Society.

I also salute my colleagues from Tessi Informatique: Cédric Barllier, Franck Ducottet and Denis Perrera. I thank as well my fellow researchers at LIRIS for their friendship and support: Ludwig Seitz, Jean-Marc Pierson, Lionel Brunie, Frédrique Lebel, Rachid Saadi, David Coquil, Solomon Atnafu, Chirine Ghedira, Girma Berhe, Abraham Alvarez, Youakim Badr, Richard Chbeir, Reim Doumat, Ahmed Taher, Maryian Scuturici, Tarek Chaari, Amjad Rattrout, Samer Abel Gafour, Mohmmed Safadi, Wassim Aouk, etc. I finally thank all the researchers of LIRIS who made my stay very pleasant and comfortable.

Page 15

Résumé

Aujourd’hui, le partage sémantique entre les systèmes d’information est devenu un défi dans les entreprises. En effet, ces systèmes constituent les fondements de la gestion du métier de l’entreprise, mènent les stratégies économiques et décisionnelles, et gèrent la communication avec les partenaires. Pour ces raisons, ces systèmes doivent être amenés à fonctionner ensemble pour permettre d’atteindre les objectifs visés par l’entreprise. Nous constatons que le partage sémantique représente une réelle barrière empêchant de faire interopérer et de réutiliser ces systèmes à travers une architecture pratique. Depuis quelques années, l’usage des ontologies est de plus en plus répandu pour résoudre le problème du partage sémantique. Le but de cette thèse est de contextualiser les ontologies et de s’en servir comme base formelle pour mettre en place des plateformes d’intégration et d’échange de données dans les systèmes d’information d’entreprise (SIE). Une architecture et une implémentation couvrant des scénarios d’utilisation viennent à l’appui du cadre formel.

Dans un premier temps, nous présenterons un état de l’art sur l’ingénierie des systèmes d’information dans les entreprises et nous mettrons l’accent sur le problème du partage sémantique entre les systèmes. Nous détaillerons, ensuite, deux instances des SIE qui couvrent l’entreposage de données et l’échange de données informatisées. Nous proposerons un modèle d’expression de mappage pour exprimer les processus d’intégration et de transformation dans les SI respectifs. Nous étudierons ensuite l’utilisation des ontologies dans le cadre du partage sémantique entre les SIE et examinerons, également, les raisons de la réticence des entreprises envers les ontologies, ainsi que les arguments en faveur de leur utilisation. L’utilisation de la notion de contexte associée aux ontologies se dessine alors comme le moyen de pallier les difficultés du partage sémantique. Nous proposerons alors un formalisme basé sur les logiques modales de description pour assurer ce couplage. Nous illustrerons ensuite les objectifs du projet EISCO, qui montrent la mise en valeur des ontologies contextuelles dans les entreprises. Une architecture et plusieurs scénarios d’utilisation, dont un qui sera complété par une étude de cas, seront également présentés. Finalement, nous récapitulerons les différentes implémentations réalisées à partir des modèles et outils proposés.

Mots clés : système d’information dans les entreprises, entrepôt de données, échange de données informatisées, ontologies, contextes, logiques modales de description, multi-représentation.

Page 16

Abstract

Enterprise information systems (EIS) offer the cornerstone for managing enterprise business, applying strategic and economic decisions, and holding communication with partners. Bringing systems to work together is increasingly becoming essential for leveraging enterprise information systems and reaching common goals. Currently, enterprises develop their systems independently, with little consideration for the collaboration these systems could have with other systems. Certainly, semantic sharing represents the daunting barrier to making these systems work together through a more convenient global architecture providing interoperability and reusability. In the last decade, theoretical research suggested ontologies and context separately as formal support for treating the semantic sharing problem. This thesis concentrates on studying the application of tying together context and ontologies, which can serve as a formal background for reaching a suitable global enterprise environment. The study covers the usefulness of contextual ontologies for data integration and data exchange platforms. It invests in resolving the semantic sharing problem between these platforms as well as suggesting individual solutions for their state of the art. It brings along an architecture and an implementation, with some scenarios of use within EIS.

This thesis covers, firstly, the state of the art in enterprise information systems and exposes the semantic sharing problem. It discusses, later on, two instances of these systems, including an EDI translator and a data warehouse. It suggests a mapping expression model for managing the data integration and message translation processes. It studies, as well, the use of ontologies for resolving semantic sharing across EIS. It also includes a discussion concerning the reluctance to use ontologies in the enterprise domain and how to make ontologies more attractive for practical use. The thesis suggests pairing up the two notions of ontology and context in order to overcome the multi-representation issue left open in semantic sharing. It presents a formalism based on modal description logics for representing the coupling between ontologies and context. It introduces the EISCO (Enterprise Information Systems Contextual Ontology) project along with its architecture and scenarios of use. It concentrates also on detailing a selected scenario with a fully implemented case study. Finally, it sketches the different implementations realized from the suggested models and architecture. In essence, this thesis aims to advance the cooperation between EIS and can be counted as yet another approach to removing a brick from the wall of semantic sharing.

Keywords: Enterprise Information Systems, Data warehouse, Electronic Data Interchange, Ontologies, Context, Modal Description Logics, Multi-perspectives.

Page 17

Table of Contents i

TABLE OF CONTENTS

TABLE OF CONTENTS....................................................................................................................................... I

LIST OF FIGURES .......................................................................................................................................... VII

LIST OF TABLES .............................................................................................................................................. IX

RÉSUMÉ EN FRANÇAIS ................................................................................................................................... 1

I. INTRODUCTION ........................................................................................................................................... 1 II. ETAT DE L’ART SUR LES SYSTÈMES D’INFORMATION DANS LES ENTREPRISES (SIE) .................................. 3

II.1 Définitions et développement ............................................................................................................ 3 II.2 Problématique du partage sémantique.............................................................................................. 4 II.3 Perspectives des SIE.......................................................................................................................... 5

III. ETUDE DES PLATEFORMES D’INTÉGRATION ET D’ÉCHANGE DE DONNÉES (DW ET EDI)......................... 5 III.1 Etude des outils d’entreposage de données....................................................................................... 6 III.2 Etude du problème de traduction des messages dans les systèmes EDI ........................................... 6 III.3 Modèle d’expression des mappages .................................................................................................. 8 III.4 Utilisation avec les entrepôts de données (outil QELT).................................................................... 9 III.5 Utilisation avec l’Echange des Données Informatisées .................................................................. 11

III.5.1 Traducteur EDI....................................................................................................................................... 11 III.5.2 Algorithme de matching entre les schémas EDI XML (EX-SMAL) ...................................................... 13

IV. LES ONTOLOGIES DANS LES ENTREPRISES............................................................................................. 16 IV.1 Ontologies définition et utilisations ................................................................................................ 16

IV.1.1 L’Interopérabilité.................................................................................................................................... 17 IV.1.2 La Réutilisation ...................................................................................................................................... 17

IV.2 Les Ontologies formelles et les systèmes d’information des entreprises......................................... 17 IV.2.1 Spécification des ontologies locales pour les SIE................................................................................... 17 IV.2.2 Les ontologies locales à des SIE et les ontologies globales de l’entreprise ............................................ 18

IV.3 Coupler les ontologies et les contextes............................................................................................ 18 IV.3.1 La notion de contexte dans les systèmes d’information.......................................................................... 18 IV.3.2 Comment coupler les ontologies locales et les contextes ....................................................................... 19

V. LE FORMALISME DES ONTOLOGIES CONTEXTUELLES ................................................................................ 20 V.1 Exemple d’utilisation dans les SIE.................................................................................................. 20 V.2 Mécanisme d’estampillage.............................................................................................................. 21

Page 18: Thèse Utilisation des ontologies contextuelles pour le partage

Table of Contents ii

V.3 Mécanisme de rapprochement sémantique ..................................................................................... 21 V.4 Mécanisme d’inférence global ........................................................................................................ 22 V.5 La logique de description et la logique modale (syntaxe et sémantique de ALCNM) ....................... 22 V.6 Représentation des ontologies contextuelles avec la logique modale de description (ALCNM) ...... 23 V.7 Résolution d’exemple : .................................................................................................................... 25

VI. UTILISATION DES ONTOLOGIES CONTEXTUELLES DANS LES SIE........................................................... 26 VI.1 Architecture et composants dans le projet EISCO .......................................................................... 26

VI.1.1 EISCO KB Server .................................................................................................................................. 27 VI.1.2 EISCO Core Service............................................................................................................................... 27 VI.1.3 EISCO Accessibility Server ................................................................................................................... 29

VI.2 Scénario d’utilisation et Etude de cas ............................................................................................. 29 VII. IMPLÉMENTATION ................................................................................................................................ 31

VII.1 Implémentation de QELT ................................................................................................................ 31 VII.1.1 Scénario d’utilisation.............................................................................................................................. 31 VII.1.2 Résultats: ................................................................................................................................................ 33

VII.2 Implémentation du Traducteur EDI ................................................................................................ 33 VII.2.1 Utilisation du modèle des expressions de mappage................................................................................ 33

VII.2.1.1 Usage scenario......................................................................................... 34
VII.2.1.2 Results ............................................................................................................ 34

VII.2.2 Implementation of the matching algorithm (EX-SMAL) .................................... 35
VII.2.2.1 Execution steps .......................................................................................... 35
VII.2.2.2 Performance tests ....................................................................................... 36

VII.3 Implementation & feasibility study of the EISCO project................................................. 38
VII.3.1 Feasibility study ................................................................................................ 38
VII.3.2 Prototype implementation.................................................................................. 39
VII.3.3 Test scenario....................................................................................................... 40

VII.3.3.1 Role of the Administrator ................................................................................. 40
VII.3.3.2 Role of the Client ............................................................................................. 41

VIII. CONCLUSION ........................................................................................................................................ 41

1. INTRODUCTION........................................................................................................................................... 43

1.1 RESEARCH CONTEXT AND PROBLEM IDENTIFICATION ......................................................................... 44
1.2 MOTIVATIONS ...................................................................................................................................... 44
1.3 RESEARCH METHODOLOGY AND GOALS .............................................................................................. 45
1.4 ORIGINALITY OF THIS WORK AND RESULTS.......................................................................................... 46
1.5 CIFRE GRANT ..................................................................................................................................... 47
1.6 OUTLINE OF THE THESIS ...................................................................................................................... 47

2. UNDERSTANDING ENTERPRISE INFORMATION SYSTEMS........................................................... 49

2.1 WHAT IS AN ENTERPRISE INFORMATION SYSTEM ABOUT? ................................................................... 50
2.1.1 Position of Enterprise Information Systems .................................................................................... 50
2.1.2 Spectrum of Enterprise Information Systems .................................................................................. 51

2.1.2.1 Horizontal systems ................................................................................................................................. 51
2.1.2.2 Vertical systems ..................................................................................................................................... 52
2.1.2.3 External systems..................................................................................................................................... 53

2.1.3 Enterprise Information System Development .................................................................................. 53
2.1.4 Architecture of Enterprise Information Systems ............................................................................. 55


Table of Contents iii

2.1.5 Perspectives of Enterprise Information Systems.............................................................................. 57
2.2 STATE OF THE ART OF EIS PROBLEMS .................................................................................................. 58

2.2.1 Semantic sharing............................................................................................................................. 58
2.2.1.1 Semantic heterogeneity........................................................................................................................... 58
2.2.1.2 Understanding the problem .................................................................................................................... 59

2.2.2 Interoperable and reusable Enterprise Information Systems.......................................................... 60
2.2.2.1 Interoperability ....................................................................................................................................... 60
2.2.2.2 Dimension of interoperability................................................................................................................. 60
2.2.2.3 Reusability.............................................................................................................................................. 62
2.2.2.4 Dimension of reusability ........................................................................................................................ 63
2.2.2.5 Common enterprise understanding ......................................................................................................... 64

2.3 SUMMARY ............................................................................................................................................ 64

3. IMPROVING ENTERPRISE DATA INTEGRATION AND DATA INTERCHANGE PLATFORMS 65

3.1 DATA WAREHOUSING SYSTEMS ........................................................................................................... 67
3.1.1 Data warehouse Components ......................................................................................................... 67

3.1.1.1 ETL tools................................................................................................................................................ 68
3.1.1.2 Restitution process ................................................................................................................................. 69
3.1.1.3 Meta-data Management .......................................................................................................................... 69

3.1.2 Issues for improving data warehouse systems ................................................................................ 70
3.1.2.1 XML vehicle for improving Data warehouse ......................................................................................... 70
3.1.2.2 Needed Data warehouse features............................................................................................................ 71
3.1.2.3 Evolution of Business rules .................................................................................................................... 71
3.1.2.4 How Meta-data Improves Data Transformation ..................................................................................... 72

3.2 AN ANALYSIS OF EDI MESSAGE TRANSLATION AND MESSAGE INTEGRATION PROBLEMS.................... 72
3.2.1 EDI Technology .............................................................................................................................. 73

3.2.1.1 Existing standards................................................................................................................................... 74
3.2.1.2 Components of EDI message.................................................................................................................. 74

3.2.2 Our EDI Translator ........................................................................................................................ 75
3.2.2.1 Existing translators ................................................................................................................................. 75
3.2.2.2 Translation unit ...................................................................................................................................... 76
3.2.2.3 Translation scenarios .............................................................................................................................. 77

3.2.3 Challenging issues for our EDI Translator .................................................................................... 79
3.2.3.1 Uniform Message Representation........................................................................................................... 79
3.2.3.2 Schema matching applied to message translation ................................................................................... 79

3.3 MAPPING GUIDELINE & MAPPING EXPRESSION MODEL ........................................................................ 80
3.3.1 Existing solutions ............................................................................................................................ 80
3.3.2 Mapping expression examples ........................................................................................................ 81
3.3.3 Mapping expression model ............................................................................................................. 81
3.3.4 Usefulness of mapping expression .................................................................................................. 84
3.3.5 Comparable approaches................................................................................................................. 84

3.4 PRACTICAL USE .................................................................................................................................... 85
3.4.1 QELT............................................................................................................................................... 85

3.4.1.1 QELT Architecture................................................................................................................................. 86
3.4.1.1.1 Meta-Data components ..................................................................................................................... 86
3.4.1.1.2 Extraction Process ............................................................................................................................. 87



3.4.1.1.3 Loading process ................................................................................................................................ 87
3.4.1.2 SQL Generator and transformation process............................................................................................ 87

3.4.2 Semi-Automatic EDI Translator...................................................................................................... 88
3.4.2.1 Similarity Algorithms............................................................................................................................. 88
3.4.2.2 Basic similarity....................................................................................................................................... 89
3.4.2.3 Structural similarity ................................................................................................................................ 90
3.4.2.4 Pair-wise element similarity ................................................................................................................... 92
3.4.2.5 Filtering .................................................................................................................................................. 92

3.5 SUMMARY ............................................................................................................................................ 93

4. ONTOLOGIES IN THE SERVICE OF ENTERPRISE INFORMATION SYSTEMS ........................... 95

4.1 WHAT IS AN ONTOLOGY ABOUT? ......................................................................................................... 97
4.1.1 Origins ............................................................................................................................................ 97
4.1.2 Contemporary definition ................................................................................................................. 97
4.1.3 Formal ontology and information systems ...................................................................................... 99
4.1.4 What ontologies can do for Enterprise Information Systems ........................................................ 100

4.1.4.1 Communication .................................................................................................................................... 101
4.1.4.2 Interoperability ..................................................................................................................................... 101
4.1.4.3 System Engineering.............................................................................................................................. 101
4.1.4.4 How can we use ontologies for semantic sharing ................................................................................. 103

4.1.5 Overview of languages, implementation tools, and applications .................................................. 104
4.1.5.1 Ontology representation languages....................................................................................................... 104
4.1.5.2 Ontology development techniques, implementation tools, and methodologies .................................... 105
4.1.5.3 Ontology applications .......................................................................................................................... 106

4.2 DIFFERENCE BETWEEN ENTERPRISE ONTOLOGY AND LOCAL INFORMATION SYSTEMS ONTOLOGY... 107
4.2.1 Enterprise ontologies .................................................................................................................... 107

4.2.1.1 Edinburgh Enterprise Ontology (EEO)................................................................................................. 108
4.2.1.2 Core Enterprise Ontology (CEO) ......................................................................................................... 108
4.2.1.3 Toronto Virtual Enterprise (TOVE) ontology ...................................................................................... 109

4.2.2 Specifying local Enterprise Information System Ontology............................................................ 109
4.2.3 Local enterprise information system ontology versus global enterprise ontology ........................ 109

4.3 OUR VISION FOR PROMOTING ONTOLOGIES WITHIN THE ENTERPRISE ................................................ 110
4.3.1 Is it ontology-phobia or ignorance? .............................................................................................. 111
4.3.2 Defeating ontology-phobia ........................................................................................................... 111

4.4 CONTEXT AND ONTOLOGY ................................................................................................................. 112
4.4.1 The notion of context in information systems ................................................................................ 112
4.4.2 Contextualizing local ontologies (pairing-up ontology and context) ............................................ 113
4.4.3 Requirements for Contextual ontologies in EIS ............................................................................ 114
4.4.4 What problems do contextual ontologies help solve in EIS? ........................................................ 114

4.5 SUMMARY .......................................................................................................................................... 114

5. CONTEXTUAL ONTOLOGIES FORMALISM ...................................................................................... 117

5.1 EXAMPLES OF MULTI-REPRESENTATION ............................................................................................ 118
5.1.1 Example PMIS & HRIS (Example 1) ............................................................................................ 119
5.1.2 Example EDI & DW (Example 2) ................................................................................................. 119

5.2 STUDYING CONTEXTUAL ONTOLOGIES MECHANISMS ....................................................................... 119
5.2.1 Stamping Mechanism .................................................................................................................... 120



5.2.2 Semantic Similarity Mechanism .................................................................................................... 120
5.2.3 Semi-Global Interpretation Mechanism........................................................................................ 121

5.3 DESCRIPTION LOGICS & MODAL LOGICS ........................................................................................... 121
5.3.1 Description Logics ........................................................................................................................ 122
5.3.2 DL-based Knowledge Representation System ............................................................................... 123
5.3.3 Modal Description Logics............................................................................................................. 124
5.3.4 Syntax and semantics of ALCNM ................................................................................................... 125

5.4 EXPRESSING CONTEXTUAL ONTOLOGIES WITH MODAL DESCRIPTION LOGICS (ALCNM) ................... 126
5.5 REVISITED EXAMPLES ........................................................................................................................ 128

5.5.1 Example 1 ..................................................................................................................................... 128
5.5.2 Example 2 ..................................................................................................................................... 128

5.6 RELATED WORK ................................................................................................................................. 129
5.6.1 Comparison with Other Techniques.............................................................................................. 129
5.6.2 Comparison with Other Formalisms ............................................................................................ 131

5.6.2.1 Logic based models .............................................................................................................................. 131
5.6.2.2 Ontology based models ........................................................................................................................ 132
5.6.2.3 Evaluation............................................................................................................................................. 133

5.7 SUMMARY .......................................................................................................................................... 134

6. ENTERPRISE INFORMATION SYSTEMS CONTEXTUAL ONTOLOGIES (EISCO) PROJECT. 137

6.1 EISCO PROJECT ................................................................................................................................. 138
6.1.1 Project identity .............................................................................................................................. 138
6.1.2 EISCO logical architecture ........................................................................................................... 139
6.1.3 EISCO software architecture ........................................................................................................ 140

6.1.3.1 EISCO KB Server ................................................................................................................................ 140
6.1.3.2 EISCO Core Service............................................................................................................................. 141

6.1.3.2.1 Applications Resources Provider .................................................................................................... 142
6.1.3.2.2 Applications Importer ..................................................................................................................... 142
6.1.3.2.3 Knowledge Manager ....................................................................................................................... 142

6.1.3.3 EISCO Accessibility Server ................................................................................................................. 143
6.2 SCENARIOS OF USE ............................................................................................................................. 143

6.2.1 Used Example ............................................................................................................................... 143
6.2.2 Interoperability ............................................................................................................................. 144
6.2.3 Query answering ........................................................................................................................... 145
6.2.4 Reusability..................................................................................................................................... 147

6.3 CASE STUDY FOR REUSABILITY .......................................................................................................... 148
6.4 SUMMARY .......................................................................................................................................... 151

7. IMPLEMENTATION AND FEASIBILITY STUDY................................................................................ 153

7.1 PROTOTYPING QELT.......................................................................................................................... 154
7.1.1 Scenario of using QELT ................................................................................................................ 154
7.1.2 Results ........................................................................................................................................... 156

7.2 PROTOTYPING EDI TRANSLATOR ....................................................................................................... 157
7.2.1 Using Mapping Expression Model with XQuery-based EDI Translator ...................................... 157

7.2.1.1 Scenario of test ..................................................................................................................................... 158
7.2.1.2 Results .................................................................................................................................................. 159

7.2.2 Semi-automatic matching discovery (EX-SMAL Algorithm) ......................................................... 160



7.2.2.1 Scenario of test ..................................................................................................................................... 162
7.2.2.2 Results .................................................................................................................................................. 162

7.3 PROTOTYPING EISCO PROJECT ......................................................................................................... 163
7.3.1 Feasibility study & used technologies........................................................................................... 163

7.3.1.1 EISCO KB Server ................................................................................................................................ 163
7.3.1.2 EISCO Core services............................................................................................................................ 164
7.3.1.3 EISCO Accessibility Server ................................................................................................................. 164

7.3.2 Prototype implementation ............................................................................................................. 165
7.3.3 Scenario of test .............................................................................................................................. 167

7.3.3.1 Role of the Administrator ..................................................................................................................... 167
7.3.3.2 Role of the Client ................................................................................................................................. 168

7.4 SUMMARY .......................................................................................................................................... 169

8. CONCLUSION AND PERSPECTIVES..................................................................................................... 171

8.1 ACHIEVED GOALS .............................................................................................................................. 172
8.2 PERSPECTIVES & FUTURE WORK ........................................................................................................ 173
8.3 PERSONAL FEEDBACK ........................................................................................................................ 174

REFERENCES.................................................................................................................................................. 175

APPENDIX A: ORGANIZATION OF ROLES BETWEEN THE PROJECT’S MEMBERS.................. 191

APPENDIX B: A SNAPSHOT OF ISO 15944-4 (OPEN-EDI ONTOLOGY COLLABORATION MODEL) ............................................................................................................................................................ 192

APPENDIX C: THE UML MODEL REPRESENTING QELT TOOL ...................................................... 193

APPENDIX D: THE UML MODEL REPRESENTING EDI TRANSLATOR TOOL.............................. 194

APPENDIX E: THE XQUERY USED WITHIN EDI TRANSLATOR BETWEEN (PAYMUL AND MT103)............................................................................................................................................................... 195

APPENDIX F: THE BRANCHING DIAGRAM OF RUM REPRESENTATION .................................... 200

APPENDIX I: SNAPSHOT OF EX-SMAL PROTOTYPE ....................................................................... 201

APPENDIX J: SCREENSHOT OF EISCO PROTOTYPE ADMINISTRATOR INTERFACE WITH LINKING CONCEPT OPTION...................................................................................................................... 202


List of Figures vii

LIST OF FIGURES

Figure 1.1: Research Methodology .................................................................................................... 45
Figure 2.1: Classification of EIS ........................................................................................................ 52
Figure 2.2: Linkages among the Architecture Views ([C4ISR 97]) ................................................ 56
Figure 2.3: The Problem Icon ............................................................................................................ 59
Figure 2.4: Levels and corresponding computing environments ([C4ISR 97]) ............................. 61
Figure 2.5: Dimensions of interoperability (adapted from [Obrst 2003]) ...................................... 62
Figure 3.1: Data warehouse System Architecture ............................................................................ 67
Figure 3.2: An EDI Translator's Architecture ................................................................................. 73
Figure 3.3: Scenario of translation between SWIFT MT103 & MT202......................................... 74
Figure 3.4: Scenario of Use of EDI message translation between EDIFACT & SWIFT.............. 78
Figure 3.5: Illustration of Mapping Expressions Model .................................................................. 82
Figure 3.6: QETL Architecture ......................................................................................................... 86
Figure 3.7: Short description of EX-SMAL ...................................................................................... 89
Figure 3.8: Aggregation Function...................................................................................................... 90
Figure 4.1: Ontology spectrum (adapted from [Obrst 03]) ............................................................. 99
Figure 4.2: Ontology architectures .................................................................................................. 103
Figure 5.1: A side of UML model for HRIS.................................................................................... 118
Figure 5.2: A side of UML model for PMIS.................................................................................... 118
Figure 5.3: A snapshot of UML model for EDI Translator........................................................... 119
Figure 5.4: A snapshot of UML model for DW, QETL ................................................................. 119
Figure 5.5: DL-based Knowledge Representation System............................................................. 124
Figure 5.6: Labeled oriented graph representing Kripke’s structure .......................................... 127
Figure 5.7: Accessibility relation between worlds or contextual ontologies................................. 127
Figure 6.1: Multi layer Architecture ............................................................................................... 139
Figure 6.2: EISCO based architecture ............................................................................................ 139
Figure 6.3: EISCO Server architecture........................................................................................... 141
Figure 6.4: A side of UML model for HRIS.................................................................................... 143
Figure 6.5: A side of UML model for PMIS.................................................................................... 143
Figure 6.6: Collaboration diagram of Interoperability Scenario.................................................. 144
Figure 6.7: Collaboration Diagram of Query Answering Scenario .............................................. 146
Figure 6.8: Collaboration Diagram of Reusability Scenario ......................................................... 147
Figure 6.9: Use Case Diagram of Reusability Case Study ............................................................. 148
Figure 6.10: Sequence Diagram of Reusability Case Study........................................................... 149


List of Figures viii

Figure 6.11: Pseudo-code of DefineMappingExpression Method .................................................. 150 Figure 7.1: Levels of prototyping..................................................................................................... 154 Figure 7.2: Snapshots for QELT Interfaces.................................................................................... 155 Figure 7.3: Target model for After-sales- Service .......................................................................... 156 Figure 7.4: Case study meta-data model including Mapping meta-data ..................................... 156 Figure 7.5: Scenario of use................................................................................................................ 158 Figure 7.6: General view of EDI Translator implementing EX-SMAL algorithm ..................... 161 Figure 7.7: Extraction of the matching result between PAYMUL, MT103 and their self.......... 162 Figure 7.8: Component Diagram of EISCO Prototype.................................................................. 164 Figure 7.9: Deployment Diagram of EISCO Prototype ................................................................. 165 Figure 7.10: Screenshot of EISCO’s Administrator Graphical Interface.................................... 166 Figure 7.11: Screenshot of Client EISCO Interface ....................................................................... 167 Figure 7.12: The list of methods using the selected concept .......................................................... 168 Figure 7.13: The Source-code of selected method .......................................................................... 169


List of Tables ix

LIST OF TABLES

Table 3.1 ETL tools capabilities ......................................................................................................... 68 Table 3.2 Translators features ........................................................................................................... 76 Table 4.1 Comparison between XML-based ontology languages ([Rifaieh 01]).......................... 105 Table 5.1: Syntax and Semantic of ALCN ........................................................................................ 123 Table 5.2: Syntax and Semantic of modal operators...................................................................... 125 Table 5.3: Comparative analysis ...................................................................................................... 130 Table 5.4: Comparison between related works formalisms .......................................................... 134 Table 7.1: Description of XML based EDI messages used in the test........................................... 159 Table 7.2: Results values................................................................................................................... 160


RÉSUMÉ EN FRANÇAIS

I. Introduction

The demand for semantic sharing between information systems is constantly growing in the enterprise domain. Indeed, these systems form the foundation of the management of the company's business, drive its economic and decision-making strategies, and manage its communication with partners. For these reasons, such systems must be made to work together so that the company can reach its objectives. Enterprise Information Systems (EIS) face two major challenges in their operation. On the one hand, interoperability must be ensured between heterogeneous systems that are, in general, developed independently on the basis of a single representation of the domain. On the other hand, the enterprise environment is dynamic, which entails continuous changes in these systems as needs evolve; the reuse of modules across the different systems must therefore be optimized in order to speed up development. Analyzing these challenges, we observe that the problem of semantic sharing is a major factor that degrades interoperability and hinders reuse between EIS. A possible way to tackle the problem of semantic heterogeneity is to reduce, or even eliminate, terminological and conceptual incompatibilities. Establishing a common understanding (terminological and conceptual), with multiple points of view, can then help to establish a basis for communication between users, to manage interoperability between systems, and to improve the reuse engineering process. Over the last ten years, artificial intelligence (AI) techniques such as ontologies and contexts have established themselves in research fields such as systems engineering, heterogeneous data integration, etc.
Local ontologies thus play an essential role in resolving semantic conflicts between systems, while the notion of context makes it possible to identify views and perspectives for the development of complex systems. However, in enterprises, these research results are not yet exploitable. Indeed, EIS evolve under strong demands for efficiency and information quality. They change with the company's business and with market challenges: mergers, acquisitions, partnerships, information and knowledge sharing, transparency, etc. In other words, the evolution of EIS depends on two factors: the push


Résumé Français 2

of technologies developed by researchers and the pull of industry. Our aim is to bring these two worlds closer together, so that theoretical results (ontologies, contexts) can be exploited by EIS tools and software. We therefore propose a platform for EIS that makes it possible to better design its components, make them cooperate, and reuse them. More precisely, we followed this methodology:

• Analyze data integration and exchange platforms in order to better identify the semantic sharing problem between the EIS under study. We thus studied data warehousing (DW) systems and Electronic Data Interchange (EDI) systems.

• Propose local solutions for these systems (the mapping expression model) and develop tools (QELT and an EDI Translator) based on this model.

• Use the results of this practical experience to highlight the difficulties of semantic sharing between the systems and applications developed, taking the question of reuse into particular consideration.

• Propose a theoretical solution to the semantic sharing problem, based on the formalism of modal description logics.

• Propose an architecture and usage scenarios, demonstrating the technical contributions with implementations and a case study dealing with reuse.

The work presented in this thesis was carried out jointly by the Laboratoire d'InfoRmatique en Images et Systèmes d'information and the R&D team of TESSI Informatique et Conseil. It was partially funded by the Association Nationale de Recherche Technologique (ANRT) through CIFRE agreement No. 20000661, between May 2001 and April 2004, alternating a total of 330 days in the company and 450 days in the laboratory. In summary, this work highlights the problem of semantic sharing between EIS. Our contribution to solving this problem consists in examining models of ontologies and contexts, their applications, and their uses in the enterprise. The main idea we defend in this thesis is to encourage the adoption of contextual ontologies in EIS. The results obtained by this research work are summarized below:

• theoretical results:
o the mapping expression model;
o the choice of an adequate formalism for contextual ontologies.

• practical results:
o the modeling of a data warehousing tool (QELT) based on the mapping expression model defined above;
o the definition of an algorithm and a tool for determining the semantic similarity between the elements of EDI message implementation guidelines. This tool is to be integrated into an EDI/XML Translator, which also implements the mapping expression model;
o the design of an architecture and a prototype promoting the use of contextual ontologies in EIS, within the EISCO project, together with usage scenarios and a case study applied to the QETL and EDI Translator tools.



II. State of the art on Enterprise Information Systems (EIS)

Modern enterprises use EIS daily. Indeed, these systems carry out tasks that are indispensable for improving employee productivity, through business intelligence tools, the communication of information and data with partners, the management of the company's business, etc.

II.1 Definitions and development

By definition, EIS are a set of systems that cover the company's information infrastructure and also offer services to users. Like any information system, they consist of three essential parts: application programs, information resources such as databases or knowledge bases, and user interfaces. Three categories of EIS can be identified: horizontal systems, vertical systems, and external systems [Rockart 96]. Horizontal systems are used by employees of the various departments. Vertical systems are used by managers for decision support. External systems enable the company's computerized communication with its partners. Developing an EIS is not a plug-and-play process; on the contrary, it requires a considerable effort to analyze and identify the company's needs. Typically, system development goes through several stages, such as requirements analysis, specification, conceptualization, etc. For instance, software engineering [Conger 94] identifies three main families of requirements specification methods: process-oriented methods [Benaroch 02] [McCarthy 83], data-oriented methods [Chen 76] [Codd 70], and object-oriented methods [UML 98]. These techniques are not perfectly suited to the development of EIS; information systems engineering needs to take into account factors such as multiple representations, different points of view, and ontology-driven conceptualization. We summarize these needs as follows: Multi-representation: in the field of information modeling, the multi-representation problem has long been known through the use of views in databases to personalize access for a group of users.
Specification with multiple views: the notion of viewpoints has commonly been used to ease the development of complex systems. A viewpoint can thus be defined as a local object that encapsulates partial knowledge of the system and of the domain [Finkelstein 94]. The notion of views has also been used in requirements engineering [Nusebeih 94] as a means of separating perceptions. Contextualization: differences of opinion or perception among those who gather and provide information raise modeling problems. Contextualization can be seen as an abstraction mechanism that ensures the separation of the collected information [Mylopoulous 98]. Likewise, a context is any information that can characterize the entities involved (whether a person, an object, etc.) according to a particular perception [Dey 00].



Ontologies as specifications of a conceptualization: an ontology is an explicit specification of a conceptualization. It is a description, in the same sense as a program specification, of the concepts and relations that exist for an agent or a community of agents [Gruber 93]. In short, an ontology provides a shared vocabulary for a common understanding of the domains considered. It resembles a conceptual model as used in building information systems [Colomb 93]. In reality, the two are very close but not quite identical: a conceptual model is tied to a particular system, whereas an ontology can be shared among several systems. Nevertheless, an ontology can be expressed with the techniques used for conceptual models, such as UML [Cranefield 99], ER, etc. Ontologies additionally offer the possibility of inference and consistency checking. These techniques, stemming mainly from AI, make ontologies usable at system run-time [Guarino 98].

II.2 The semantic sharing problem

Because of their growing complexity, systems are led to cooperate in order to meet certain needs. The semantics of the concepts used by these systems must then be expressible explicitly. Several factors combine to produce the semantic sharing problem; Figure 1 illustrates these factors. The problem of semantic heterogeneity is extensively described in [Kashysap 97] as a hard one, and research in this area remains topical. Moreover, technologies such as CORBA, or again Java and XML, further underline this problem. Semantics represents interpretation according to a particular point of view or context [Rifaieh-a 05]. Semantic sharing is therefore guided by the context and by the goal to be reached. This common understanding is defined through a consensus called an ontology. The problem is that building such a consensus is costly and complex. We believe that sharing must be established on the basis of:

• a representation of the systems by means of ontologies;
• a set of mappings relating these ontologies according to a context;
• the use of semi-automatic similarity detection algorithms to build this set of mappings.

Figure 1: An illustration of the problem

II.3 Perspectives for EIS

The current trend in the evolution of EIS depends on the two push and pull factors and can be summed up in the following objectives:

• rapidly create personalized EIS without major effort, fundamental changes, or heavy investment;
• create virtual enterprises defined on top of existing ones;
• democratize the use of EIS so that they become ubiquitous and autonomous, able to react to various situations (P2P, pervasive computing, etc.).

We study this problem from the angle of interoperability and reusability. Interoperability: interoperability is the ability of information systems to take part in an activity whose data and task execution are distributed among those systems. It is thus essential to identify which service of which system is in charge of carrying out the requested operation, and then to return the results to the requesters. Although information systems are built for specific needs, they should retain some potential for interoperating within the enterprise architecture. To this end, we must define explicit semantics for EIS and add the dimension of context to the analysis, development, and use of semantically enriched objects. In particular, we must develop an EIS platform that enables semantic sharing and genuinely ensures interoperability between these systems. Reuse: reuse, by definition, concerns the adaptation of an application or module for use in another situation. Such reuse essentially relies on modules and patterns, sparing developers from reinventing the wheel; programming languages with suitable frameworks have emerged to reach this objective. The semantics of these systems can be expressed with ontologies while taking the notion of context into account. We are thus particularly interested in achieving interoperability and reuse through the use of contextual ontologies.

III. A study of data integration and exchange platforms (DW and EDI)

Among data integration and exchange platforms, which comprise a set of tools and software, we focus on two kinds of systems: data warehousing (DW) systems and Electronic Data Interchange (EDI) systems. Data warehouses are collections of subject-oriented data coming from different sources and organized for decision making. This decision support plays an essential role in improving the company's performance by monitoring its key economic indicators. In the architecture of a DW, an ETL (Extraction, Transformation, and Loading) tool is used to carry out the data warehousing process (the integration of the various sources). EDI systems are used to send and receive messages between EIS without any human intervention. These systems require a large number of message translations daily in order to communicate correctly with customers and suppliers. Software solutions called EDI translators provide the translation functionality from the source message format to the target format. In what follows, we study these two kinds of systems and detail the problems of ETL tools and of EDI translators.

III.1 A study of data warehousing tools

Building a data warehouse system involves two essential steps. The first populates the base with data coming from the various sources (data integration). The second makes the data accessible to users. The architecture of a data warehouse thus essentially contains the following components:

• Data warehousing (ETL) tools: these tools are responsible for supplying the warehouse with the collection of data extracted from the various sources [Chaudhuri 97], transforming these data to conform to an integration structure, and loading them into the warehouse.

• Data delivery tools: these tools allow analysts to make appropriate decisions based on the collected data, through suitable models and tools such as data cubes, specialized analysis software, or data mining techniques.

• Metadata: the data about the data held in the warehouse. It covers the warehousing tools, the delivery tools, the schemas of the data to be integrated, etc.

The ETL solutions available on the market have the following limitations:

• Most tools use a proprietary format, which penalizes cooperation between them and thus the performance of the warehousing process.

• The maintenance and updating of these systems is complex and affects the metadata and the tools separately.

• These tools make almost no real use of the metadata (associated with the source models, the warehouse model, and the delivery models).

The ETL tool we propose therefore aims at a tight interaction with the metadata when carrying out the various tasks of the warehousing process. To this end, we first study the problem of transformation between representations. This problem is solved by defining a model called the Mapping Expression model, and then by setting up an architecture for an ETL tool that exploits metadata defined with this model [Rifaieh-a 02].

III.2 The message translation problem in EDI systems



Electronic Data Interchange (EDI) is a mature technology that has held the leading position for more than two decades in promoting the computerization of inter-enterprise exchanges. The arrival of new technologies, such as XML, has made EDI easier to use and more competitive [Kenney 02]. The coexistence of EDI and XML (EDI/XML and ebXML) is thus set to take the enterprise data exchange platform market by storm. However, translating messages between the various EDI standards is critical for companies that communicate with several suppliers and customers [Omelyenko 02] [Rifaieh-a 03]. For example, the commercial payment (VCOM) messages of EDIFACT1 (such as Payext, Paymul, Payord) and of SWIFT2 (such as MT103, MT102) contain similar data used in a complementary way (Figure 2). A translation between these messages is clearly unavoidable to guarantee successful data exchange. In general, EDI messages use branching diagrams that define the message components, their nesting, and their meaning. Translating between several EDI messages therefore amounts, first, to identifying the similarities between the message elements based on these branching diagrams [Rifaieh-b 03], and then to expressing the set of mappings that may exist between the similar elements. Currently, this work is done through specific programs that let an expert define the similarities between message elements. To date, no model exists for expressing the mapping expressions between similar elements; these expressions are left to the experience and expertise of the developer in charge of writing the mapping program.
We observe that this mapping problem closely resembles the schema mapping problem found in data warehousing tools [Rifaieh-a 02] [Rifaieh-a 03]. In the following, we propose a mapping (correspondence) model, called the Mapping Expression model, which can be used to define the mapping expressions between the elements of similar messages. We also study the problem of automatically discovering similarities between the components

1 http://www.unece.org/trade/untdid/ 2 http://www.swift.com/

Figure 2: EDI message usage scenario: translation between EDIFACT & SWIFT



of EDI/XML messages. In particular, we focus on the study of EDI messages formatted in XML, whose branching diagrams are expressed in XML Schema1.

III.3 The mapping expression model

While examining the problems of data warehousing tools, we identified the need to define, in the metadata, the mappings between the elements of the source schema and the target schema. Likewise, while studying the EDI message translation problem, we observed the need to express the mappings between the branching diagrams of the messages in a clear and concise way. In this section we propose a model that eases the definition of mappings between source and target elements. Two instances of this model can be used, one for ETL tools and one for EDI translators. We call a mapping guideline the set of mapping expressions defined between two representations (e.g., schemas, messages, etc.). We use the following notation:

• S1, S2, …, Sn denote n data sources;
• a_1^{S1}, a_2^{S1}, …, a_m^{S1} denote m attributes/fields of the source S1;
• T denotes a target, which can be another schema or message;
• A denotes an attribute/field of the target T.

Two kinds of metadata are associated with each attribute/field: the identity metadata of the attribute/field (name, type, domain, etc.) [Do 00], [Stöhr 99], [Rahm 01], and the mapping metadata [Rifaieh-a 02], which defines the mapping of the attribute/field. Using the representation defined in [Miller 00], µ(A) denotes the set of metadata associated with the attribute/field A. Thus µ(A) is a set of values {µ1(A), µ2(A), …, µz(A)}. For readability, these elements (µ1, µ2, …, µz) can be given names such as Type, Attribute_Name, Relation_Name, etc. We assume that µi(A) is the mapping metadata of the attribute/field A: µi(A) defines how A can be generated from the source representations. This element µi(A)

consists of a set of mapping expressions (α1(A), α2(A), …, αs(A)), where each αi(A) is an expression that generates a part of the target attribute/field. We then define the mapping guideline MG as the set of metadata µi( ) for all the attributes of

1 http://www.w3c.org/XML/Schema

Figure 3: The mapping expression model



the target T from the sources S1, S2, …, Sn. Thus MG(S1, S2, …, Sn, T) = {µi(A1), µi(A2), …, µi(Aw)}, where A1, A2, …, Aw are the attributes/fields of the target T. We now study these mapping expressions αi in detail (Figure 3). We define αi as a triple αi = <fi, li, ci> where:

• fi is the mapping function, which can be arithmetic or any composite string-manipulation function (e.g., substring). fi can thus be applied to a set of attributes/fields belonging to one or several sources. We define Attribute(fi) = {a_r^{Sr}, a_p^{Sp}, …, a_e^{Se}}, where a_r ∈ Sr is an attribute/field of Sr, as the set of source attributes involved in the mapping expression. If S = S1 ∪ S2 ∪ … ∪ Sn, then fi : S × S × … × S → T, (a_r, a_p, …, a_e) ↦ A (or a sub-attribute of A);

• li is a set of filters on the source data. One filter can be defined for each attribute/field a_r, so li = {li(a_r) / a_r ∈ Sr}. Such a filter can test the value of the attribute/field a_r, or the value of one or more attributes/fields of the same source;

• ci is the condition to be satisfied before the mapping is performed. It allows data that do not satisfy the condition to be ignored.

Finally, the attribute/field is defined as the concatenation of the values produced by each expression. Hence,

$A = \mathrm{Concat}\big(\dots,\ \underbrace{[\,f_i(a_r^{S_r}, a_p^{S_p}, \dots, a_e^{S_e}),\ l_i,\ c_i\,]}_{\alpha_i},\ \dots\big)$

(the mapping expression model)
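The evaluation of a triple αi = <fi, li, ci> can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the thesis tools: the class and function names (MappingExpression, target_value, etc.) are hypothetical.

```python
# Illustrative sketch of the mapping expression model: alpha_i = <f_i, l_i, c_i>.
# All names here are hypothetical and chosen for the example.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class MappingExpression:
    f: Callable[..., str]                # f_i: mapping function over source attributes
    attributes: List[str]                # Attribute(f_i): names of the source attributes
    filters: Dict[str, Callable] = field(default_factory=dict)  # l_i: per-attribute filters
    condition: Callable[[dict], bool] = lambda row: True        # c_i: guard condition

    def apply(self, row: dict) -> str:
        """Return this expression's contribution to the target attribute, or '' if skipped."""
        if not self.condition(row):      # c_i: ignore data failing the condition
            return ""
        values = []
        for a in self.attributes:
            v = row[a]
            flt = self.filters.get(a)
            if flt is not None:
                v = flt(v)               # l_i: filter the source value
            values.append(v)
        return self.f(*values)           # f_i applied to the filtered source values

def target_value(expressions: List[MappingExpression], row: dict) -> str:
    # A = Concat(..., alpha_i(row), ...)
    return "".join(e.apply(row) for e in expressions)

# Example: name = firstName CONCAT lastName, with a whitespace-stripping filter on firstName
alpha = MappingExpression(
    f=lambda fn, ln: f"{fn} {ln}",
    attributes=["firstName", "lastName"],
    filters={"firstName": str.strip},
)
print(target_value([alpha], {"firstName": " Rami ", "lastName": "Rifaieh"}))
```

One mapping expression per contribution keeps the three roles (function, filters, condition) separate, mirroring the triple in the model above.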

This mapping expression model can be used in the following processes:

• The mapping process between database schemas: studied in [Madhavan 01], [Berlin 02]. The model can define the formulas to apply between the elements identified by matching algorithms.

• The data warehousing (ETL) process: the data transformation phase can use the model to define the rules and formulas to apply when integrating a source into the warehouse. These expressions can be stored in the metadata catalog for later use [Rifaieh-a 02] [Rifaieh-b 02].

• The mapping process between EDI messages: the translation between EDI messages can be expressed with the mapping expression model, defining in a clean and concise way the functions, conditions, and filters to use [Rifaieh-b 03]. Translators based on this model can then allow these expressions to be defined graphically, as studied in [Grundy 01].

III.4 Use with data warehouses (the QELT tool)

We designed the QELT tool (Query-based ETL Tool), which carries out the data warehousing process by applying the mapping guideline MG(S, T) between S and T. The tool thus uses the mapping expression model between a source S and a target T. QELT makes it possible to manage



changes in the warehousing process actively, based on the mapping metadata expressed with the mapping expression model. In this architecture, the metadata can be grouped as follows:

• Mapping metadata (MG): expresses the mapping guideline MG(S, T) and the mapping expressions that transform the source data into data conforming to the target.

• Source Model (SM): describes the source data model; it covers relational, object-oriented, and semi-structured models.

• Target Model (TM): describes the target data model; it defines how the collected data are stored and exploited (e.g., a multidimensional model, OLAP).

The QELT architecture (Figure 4) resembles that of a data warehouse with a classical ETL tool, except that the warehousing process is carried out by the database engine itself. Indeed, QELT generates SQL queries from the mapping expressions defined in the metadata; these queries implement the requested data transformations on a temporary database. The order of the warehousing process is reversed with respect to classical ETL tools and consists of: The extraction process: the role of this process is to collect the data from the various sources. These sources can be traditional databases, documents produced by the company's internal systems, or data gathered from partners. The SM metadata provide the parameters needed by the extraction process. Extraction programs, or even extraction queries, then supply the warehouse with the data it needs. The loading process: the second step loads the data into a temporary database. This database, whose structure is similar to that of the source database, is later used to run the data transformation queries. When several data sources are integrated, one temporary database must be created per source. The data model of this temporary database is expressed in the Source Model (SM). The loading process thus places the extracted data in the database where the transformations will be performed.

Figure 4: QELT architecture



The transformation process: the query generator, SQL Generator (Figure 4), is the module responsible for reading the parameters, rules, expressions, and the mapping guideline in order to generate SQL queries. Concretely, these queries are generated from the mapping metadata (MG). Transformation procedures containing the SQL queries are then generated in the temporary database. Running these procedures filters, checks, cleans, etc. the data of the temporary database. The process also creates a new target database to hold the transformed data; the structure of this database is described in the Target Model metadata (TM). Once the target database has been created, the process deletes the temporary database. We use SQL queries because they are easy to understand and efficient inside the database; by contrast, most existing ETL tools require writing specific programs to perform the transformations before loading the data into the target database. The transformation process is connected directly to the metadata without any intervention: the developer does not need to implement it, since it is generated automatically from the mapping expressions extracted from the mapping guideline (MG). In summary, QELT takes advantage of the performance of query execution on a temporary database to create the target database, and it makes active use of the metadata to guarantee the coherence and consistency of the mapping expressions used.
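The generation step can be illustrated with a minimal sketch: one mapping expression is turned into an INSERT … SELECT statement and executed on a temporary database. This is an assumption-laden illustration, not the QELT implementation; the helper name and table names are invented, and an in-memory SQLite database stands in for the temporary database.

```python
# Hypothetical sketch of an SQL-generation step in the spirit of QELT:
# one mapping expression becomes an INSERT ... SELECT run on the temporary database.
import sqlite3

def generate_transform_sql(target_table, target_attr, f_sql, source_table, condition_sql):
    """Build SQL for one alpha_i: f_i as a SQL expression, c_i as a WHERE clause."""
    return (f"INSERT INTO {target_table} ({target_attr}) "
            f"SELECT {f_sql} FROM {source_table} WHERE {condition_sql}")

con = sqlite3.connect(":memory:")        # stands in for the temporary database
con.execute("CREATE TABLE src (firstName TEXT, lastName TEXT)")
con.execute("CREATE TABLE tgt (name TEXT)")
con.executemany("INSERT INTO src VALUES (?, ?)",
                [("Rami", "Rifaieh"), (None, "Unknown")])

# name = firstName CONCAT lastName, skipping rows with a missing first name (c_i)
sql = generate_transform_sql("tgt", "name",
                             "firstName || ' ' || lastName", "src",
                             "firstName IS NOT NULL")
con.execute(sql)
print(con.execute("SELECT name FROM tgt").fetchall())   # [('Rami Rifaieh',)]
```

Pushing the transformation into a single set-oriented query is what lets the database engine, rather than a dedicated program, do the warehousing work.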

III.5 Use with Electronic Data Interchange (EDI)

It is important to distinguish between matching (correspondence discovery) and mapping-expression discovery: these are two complementary processes [Rifaieh-b 03]. Matching is an operation that takes, for example, two data schemas as input and returns semantic similarity values between the elements of those schemas (e.g., name/firstName = 0.6; name/lastName = 0.6) [Madhavan 01]. Mapping-expression discovery is a post-matching process that finds an expression (logical, mathematical, string operation, ...) combining a set of source-schema elements to produce a target-schema element, by applying the mapping-expression model (e.g., name = firstName CONCAT lastName) [Rifaieh-a 03].
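The contrast between the two outputs can be made concrete with the example pairs quoted above; the tiny sketch below is illustrative only.

```python
# Matching output: similarity values between schema elements
# (the pairs and scores are the ones quoted in the text).
matching_result = {
    ("name", "firstName"): 0.6,
    ("name", "lastName"): 0.6,
}

# Mapping-expression discovery output: an executable expression that
# builds the target element (name = firstName CONCAT lastName).
def mapping_expression(first_name, last_name):
    return first_name + " " + last_name

print(mapping_expression("Rami", "Rifaieh"))  # → Rami Rifaieh
```

Matching tells us *which* elements correspond and how strongly; the mapping expression tells us *how* to compute the target element from the source ones.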

III.5.1 EDI Translator

An EDI translator lets the company's internal system communicate with external systems by formatting incoming and outgoing messages. It handles message validity control, error checking, constraint validation, and so on. Translations are performed between comparable messages, i.e., messages conveying the same information: for example, EDIFACT ORDERS and X12 850 Purchase Orders; or X12 820, EDIFACT PAYMUL, and SWIFT MT103. We propose a semi-automatic, graphical EDI translator based on the mapping-expression model. A graphical tool can reduce the effort of defining mapping expressions by providing lists of functions, filters, etc., and lets even users who do not master the underlying technology carry out the mapping process. Using the mapping-expression model, we can express all the kinds of formulas needed for translation.


For n comparable messages, n² − n translations are needed, which is acceptable for n ≤ 3. If instead we use a pivot format, we can reduce the number of required translations for n > 3. Taking one of the existing messages as the pivot format reduces the number of translations to 2(n − 1). This solution is nevertheless difficult to implement, since that message must be able to carry all the data contained in the other formats; we therefore opt for an extensible pivot-message format. Finally, since the validity of transmitted messages must be verified, we favor a structure that can easily be validated. This discussion leads us to propose XML as the pivot format for message translation in the proposed EDI Translator. XML is extensible enough to cover the messages currently supported as well as future ones, and it gives us access to a range of existing tools for validating transmitted messages. It also lets us use one of the XML-related languages, such as XSLT, XPath, or XQuery, to perform the translations. Moreover, XML is an open standard, which is an advantage for our tool compared with a proprietary format. The tests we performed to validate the use of XML within the proposed EDI Translator are detailed later. Finally, the mapping-expression model lets us express without difficulty the four categories of translation between EDI messages that the translator must handle:

Simple translation (1:1): a simple translation process that turns one input message into exactly one output message.

Translation with splitting (1:n): the translation cuts the initial message into several target messages; these messages may share elements coming from the initial message.

Translation with grouping (n:1): the process must group data coming from several input messages into a single output message.

Translation with splitting and grouping (n:m): this translation scenario amounts to several grouping translations (n:1) placed side by side. Since each source message is independent of the others, it can also be seen as giving rise to several translations with splitting (1:n).
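The translation counts above are easy to check numerically; this small sketch assumes each ordered pair of message formats needs its own translation.

```python
# Translation counts with and without a dedicated pivot format.

def direct_translations(n):
    # every format translated to every other format: n * (n - 1)
    return n * n - n

def pivot_translations(n):
    # every format translated to and from a dedicated pivot format
    return 2 * n

for n in (3, 5, 10):
    print(n, direct_translations(n), pivot_translations(n))
```

At n = 3 the two approaches cost the same (6 translations each); beyond that the pivot wins, which matches the break-even point stated in the text.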

Algorithm
Input: S1, S2: two XML schemas
Output: set of triples <E1i, E2j, Vsim>
  where E1i is an element of schema S1,
        E2j is an element of schema S2,
        Vsim ∈ [0, 1] is the similarity value between E1i and E2j

Matching(S1, S2) {
  - Compute the basic similarity between each E1i ∈ S1 and E2j ∈ S2
  - Compute the structural similarity between each E1i ∈ S1 and E2j ∈ S2
  - For each pair <E1i, E2j>, compute Vsim from the basic and structural
    similarities found
  - Select the most plausible correspondence pairs <E1i, E2j>
}

Figure.5 Brief description of the algorithm


III.5.2 A matching algorithm for EDI XML schemas (EX-SMAL)

Matching between business messages is currently done with various graphical applications that require not only substantial human effort but also expertise in the tools used [Rifaieh-b 03] [Chukmol, Rifaieh 05]. This is the main reason our algorithm EX-SMAL (EDI/XML Semi-automatic Schema Matching ALgorithm) aims to automate the matching process; its main objective is to discover the semantic correspondence between the elements of two schemas to be compared. The compatibility of two messages from two different standards rests on the semantics of those messages. Indeed, the various standards share the same principles: each message comes with a usage guide that textually explains the role of the message's component elements (e.g., DTM, UNB, UNH, M23B, M71A, ...), whose names are not meaningful to human readers. The guide also gives their data types (character string, numeric, formatted date, ...) and any associated constraints (e.g., if UNB/@007 = 5 then SIRET). Each message follows a structure that is also specified in the guide: the message elements are organized in a given way and separated by field delimiters (e.g., +, :, ', ~, @, ...). Our solution relies on a basic similarity between individual elements, based on the textual descriptions and data types taken from the usage guides (specified as XML schemas), and on a structural similarity obtained by comparing the structural neighborhoods of the input message elements. The two similarities are combined to compute the final similarity of each pair of elements, which is then filtered to produce the final matching result.

Given our choice of XML schemas (without shared elements) as the internal data structure, the algorithm works on a tree structure. It proceeds in the three steps described in Figure.5: compute the similarity between all elements of the two schemas; for each pair of elements, first compute their basic similarity, then their structural similarity; and use these two values to compute the pair's final similarity. Let S1 and S2 be the two schemas given as input to the matching process, and let e1 ∈ S1 and e2 ∈ S2 be a pair of elements whose similarity value is to be found. Let sim_base(e1, e2) ∈ [0, 1] be the basic similarity between e1 and e2, and sim_struct(e1, e2) ∈ [0, 1] their structural similarity. The similarity of e1 and e2 is a weighted sum of sim_base(e1, e2) and sim_struct(e1, e2), following this formula:

sim(e1, e2) = coeff_base*sim_base(e1, e2) + coeff_struct*sim_struct(e1, e2);

with 0 ≤ coeff_base ≤ 1 the basic-similarity coefficient, 0 ≤ coeff_struct ≤ 1 the structural-similarity coefficient, and coeff_base + coeff_struct = 1.

Basic similarity: the basic similarity is the most important part of our algorithm. It takes into account the data that a schema element carries individually, namely its textual


description and its data type. To compute the similarity between two descriptions, we adopted an information-retrieval technique, a choice based on the fact that such techniques are well suited to classifying documents in a corpus: the textual description of each element is a short text, and our goal is to discover the similarity between these texts. Data-type similarity also enters the basic-similarity computation, in order to refine it. Let sim_desc(e1, e2) ∈ [0, 1] be the textual-description similarity between e1 and e2, and sim_type(e1, e2) ∈ [0, 1] their data-type similarity. The basic similarity of e1 and e2 is then a weighted sum of sim_desc(e1, e2) and sim_type(e1, e2), following this formula:

sim_base(e1, e2) = coeff_desc*sim_desc(e1, e2) + coeff_type*sim_type(e1, e2);

with 0 ≤ coeff_desc ≤ 1 the textual-description coefficient, 0 ≤ coeff_type ≤ 1 the data-type coefficient, and coeff_desc + coeff_type = 1.

Structural similarity: the structural similarity of two elements is based on their neighborhoods and on the basic similarity of each element in the two schemas. Computing an element's neighborhood means finding the essential information tied to that element that determines its position in the message. The neighborhood of an element e is a quadruple <anc(e), fr(e), fimm(e), feuille(e)>:

Item 1: anc(e): the set of ancestor nodes from the root down to element e
Item 2: fr(e): the set of sibling nodes sharing the same parent node as e
Item 3: fimm(e): the set of immediate children, i.e., the direct descendants of e
Item 4: feuille(e): the set of leaf nodes of the subtree rooted at e
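The two weighted combinations (final similarity and basic similarity) can be sketched directly; the coefficient values used below are illustrative defaults, not values prescribed by the thesis.

```python
# Weighted similarity combinations from EX-SMAL (coefficients are
# illustrative; in the tool they are user-tunable and sum to 1).

def sim_base(sim_desc, sim_type, coeff_desc=0.7, coeff_type=0.3):
    """Basic similarity: weighted sum of description and type similarity."""
    assert abs(coeff_desc + coeff_type - 1.0) < 1e-9
    return coeff_desc * sim_desc + coeff_type * sim_type

def sim(sim_base_val, sim_struct_val, coeff_base=0.6, coeff_struct=0.4):
    """Final similarity: weighted sum of basic and structural similarity."""
    assert abs(coeff_base + coeff_struct - 1.0) < 1e-9
    return coeff_base * sim_base_val + coeff_struct * sim_struct_val

b = sim_base(0.8, 0.5)          # 0.7*0.8 + 0.3*0.5 = 0.71
print(round(sim(b, 0.4), 3))    # → 0.586
```

Because each level is a convex combination of values in [0, 1], the final similarity also stays in [0, 1].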

Hence, the structural similarity of two elements depends on the similarity of the elements in their neighborhoods. The structural similarity between two elements e1 ∈ S1 and e2 ∈ S2 (S1 and S2 being the schemas to compare) is the weighted sum of four similarities: ancestor-node similarity, sibling-node similarity, immediate-child similarity, and leaf-node similarity. Let:

<anc(e1), fr(e1), fimm(e1), feuille(e1)> be the neighborhood of element e1, denoted C(e1)

<anc(e2), fr(e2), fimm(e2), feuille(e2)> be the neighborhood of element e2, denoted C(e2)

C(e1).Item[1] = anc(e1); C(e1).Item[2] = fr(e1); C(e1).Item[3] = fimm(e1); C(e1).Item[4] = feuille(e1)

Let M be an empty matrix
Let thr be the threshold value to apply in the agg function

For ancE1i ∈ anc(e1) {
  For ancE2j ∈ anc(e2) {
    M[ancE1i][ancE2j] = sim_base(ancE1i, ancE2j);
  }
}
sim_anc(e1, e2) = agg(M, thr);

Figure.6 Aggregation method


C(e2).Item[1] = anc(e2); C(e2).Item[2] = fr(e2); C(e2).Item[3] = fimm(e2); C(e2).Item[4] = feuille(e2)

We then define an aggregation function that takes two inputs: a matrix M containing the basic-similarity values found between the elements of C(e1).Item[i] and those of C(e2).Item[i] (with i ∈ [1, 4]), and a threshold value thr between 0 and 100. The function returns a single value aggregating those in the input matrix. It relies on the arithmetic mean and the standard deviation (from descriptive statistics) and is sketched briefly in Figure.6.
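The text only specifies that agg relies on the arithmetic mean and the standard deviation, so the concrete aggregation rule below (discarding values that fall more than a thr-scaled deviation below the mean, then averaging the rest) is an assumption made for illustration.

```python
import statistics

def agg(matrix, thr):
    """Aggregate a matrix of basic-similarity values into one value.

    thr is a percentage in [0, 100]. Assumed rule: keep only values no
    further than (thr/100) * stdev below the mean, then average them.
    """
    values = [v for row in matrix for v in row]
    if not values:
        return 0.0
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    kept = [v for v in values if v >= mean - (thr / 100) * stdev]
    return statistics.mean(kept)

M = [[0.9, 0.2], [0.8, 0.1]]
print(round(agg(M, 50), 3))  # → 0.85 (the two low scores are filtered out)
```

Whatever the exact rule, the design intent is visible: the aggregate favors consistently high pairwise similarities instead of letting a few low scores drag the neighborhood similarity down.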

Finally, the structural similarity sim_struct between two elements e1 and e2 is computed according to the following formula:

sim_struct(e1, e2) = coeff_anc*sim_anc(e1, e2) + coeff_fr*sim_frere(e1, e2) + coeff_fimm*sim_fimm(e1, e2) + coeff_feuille*sim_feuille(e1, e2);

with: coeff_anc + coeff_fr + coeff_fimm + coeff_feuille = 1
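A minimal sketch of this four-way weighted combination, together with a simple threshold filter over the resulting pairs, is given below; the coefficient values and the `filter_pairs` helper are illustrative assumptions.

```python
# Structural-similarity combination and threshold filtering (illustrative).

def sim_struct(sim_anc, sim_fr, sim_fimm, sim_feuille,
               coeffs=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of ancestor, sibling, immediate-child and leaf
    similarities; the coefficients must sum to 1."""
    assert abs(sum(coeffs) - 1.0) < 1e-9
    parts = (sim_anc, sim_fr, sim_fimm, sim_feuille)
    return sum(c * s for c, s in zip(coeffs, parts))

def filter_pairs(pairs_with_sim, thr_accept):
    """Keep only the pairs whose similarity reaches the acceptance
    threshold thr_accept in [0, 1]."""
    return [(e1, e2, v) for (e1, e2, v) in pairs_with_sim if v >= thr_accept]

print(round(sim_struct(0.8, 0.6, 0.4, 0.2), 2))  # → 0.5
print(filter_pairs([("name", "firstName", 0.9), ("name", "city", 0.3)], 0.5))
```

Lowering `thr_accept` keeps more candidate pairs and therefore shifts the matching cardinality from 1-1 toward 1-n and n-m, which is the behavior discussed in the filtering phase.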

All the coefficients used in this algorithm can be modified by the user to tune them for a given use case.

Filtering the corresponding element pairs: this is the last phase of our two-schema matching process. We opted for a simple filtering based on a threshold value thraccept, here between 0 and 1. After computing the similarity of every pair of elements from the two input schemas, we obtain a list of similarity values; filtering removes the pairs whose similarity value is below thraccept. Depending on the value of thraccept, the matching cardinality can vary from 1-1 and 1-n to n-m. Our algorithm belongs to the family of matchers working on data schemas and is classified among the hybrid approaches, since it combines structure and description. Our solution has the following distinctive features:

• Processing of the elements' textual descriptions: this makes our algorithm considerably richer than other matching algorithms working on data schemas. This feature is directly tied to the nature of EDI message schemas. We adopted an information-retrieval technique to compute the similarity of two textual descriptions.

• The structural processing is complete and takes into account all the elements needed to define the position of elements in the schemas. However, a few special cases remain to be handled for the algorithm to adapt well to most shapes of schemas to compare.

Despite these strengths, our algorithm also has some limitations:

• having several coefficient values for realizing the different similarity types may cause some difficulty for end users;

• as with all existing algorithms that filter on a threshold value, the matching cardinality varies with the chosen value;

• an extension handling the other contents of EDI message usage guides, such as the constraints, statuses, and cardinalities associated with the elements, still needs to be carried out.

To conclude, the mapping-expression model is useful both for the QELT tool and for the EDI Translator. However, a difficulty arises if we wish to reuse modules


developed for the former in order to build the latter: the representations used to develop the two tools (Annexes C-D) are not identical. In what follows, we study how ontologies can help solve this problem.

IV. Ontologies in the enterprise

Ontology is a word over which much ink has flowed through the centuries, and which long remained a puzzle for philosophers. Over the last decade, the idea was imported into computer science, in particular into AI for knowledge management, into information systems for cooperation between systems, and so on.

IV.1 Ontologies: definition and uses

The term "ontology" appeared in computer science in the early 1990s, notably through the ARPA Knowledge Sharing Effort project [Gruber 91]. In [Gruber 93], Gruber gave a first definition of ontologies in the field of artificial intelligence (AI): "an ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of existence. For AI systems, what exists is that which can be represented." Given this definition, it is not surprising that the term ontology appears not only in artificial intelligence but also in information systems. Just as the world can be perceived at various levels of abstraction, ontologies too come at several levels of abstraction. In [Guarino 98], Guarino distinguishes in particular:

• Generic, or top-level, ontologies, containing conceptualizations valid across different domains (e.g., time, place);

• Domain ontologies, which describe the vocabulary inherent to a domain (e.g., medicine);

• Representation ontologies, which express the conceptualizations of knowledge-representation languages but do not capture knowledge about the world;

• Application ontologies, composed of concepts derived from all the kinds of ontologies mentioned above. These ontologies emanate from particular groups working on a specific application, and are the only ones directly usable by designers of applications backed by knowledge bases or information bases.

In other words, an ontology defines the shared vocabulary needed to reach a common understanding of a given domain. It contains the definitions of the domain's concepts and of the relations between them, and it may contain axioms and rules that allow new knowledge to be inferred. Ontologies of this kind are considered formal; other, informal ontologies can also be used, such as taxonomies, thesauri, conceptual models, etc. Formal ontologies are, in general, explicit and machine-processable, which makes it possible to manage semantics and inference through a reasoning engine. Formal ontologies have diverse applications covering many domains and activities; we study in particular interoperability and reuse.


IV.1.1 Interoperability

Ontologies can be used for communication between people or between machines. In the first case, informal ontologies can help preserve coherence and eliminate terminological confusion about the terms used. Communication between machines, in contrast, is expressed in terms of system interoperability. Formal ontologies contribute mainly to solving the problem of semantic heterogeneity between applications at runtime. Using ontologies makes the semantics inherent in the applications explicit, so systems can cooperate without any confusion about the objects, concepts, calls, methods, etc. coming from different systems [Rifaieh-a 04].

IV.1.2 Reuse

A second category of ontology applications is oriented toward development time: it takes ontologies into account during system development rather than at runtime. In software engineering, ontologies can help, for example, with the reuse process. Lack of reuse is recognized as one of the weaknesses of software-development techniques, and reuse can cut the time and cost of system development [Gamma 94]. Managing reuse depends largely on sharing the conceptualization: we must characterize the concepts and relations of a domain in order to identify their reuse scenarios. A clear and concise semantics is indeed required to identify the concepts, processes, tasks, etc. to reuse when building another system. Ontologies make it possible to characterize the classes, tasks, rules, relations, etc. of a given domain, which in turn makes it possible to generate frameworks that determine which elements of the ontology are reusable. Complementary tuning work, starting from the framework, is then required for the final application. An essential task in reuse thus consists of "contextualizing" the knowledge described in the system [Mizoguchi 97].

IV.2 Formal ontologies and enterprise information systems

Few ontologies are currently active in enterprise architectures, because of the complexity of the tools and techniques they use. Other factors, such as time and cost, have also contributed to an underestimation of what ontologies can bring to enterprises.

IV.2.1 Specifying local ontologies for enterprise information systems (SIE)

Traditionally, for each piece of software to build, we devise a conceptualization of the domain that defines the concepts and relations to implement in the application. Ontologies can be used to ease specification and to build reusable applications [Girardi 03]. The main idea thus amounts to putting in place a vocabulary for specifying the requirements of one or more applications [Jasper 99].


Unfortunately, applications developed with traditional specification methods do not yield explicit local ontologies [Benaroch 02]. We are therefore interested in defining ontologies that explicitly share domain knowledge (ontology-driven information systems). UML defines several kinds of diagrams that can represent the static and dynamic modeling of a system; this technique, widespread in industry, is easy for SIE developers and designers to adopt. Since ontologies can be expressed in UML through class diagrams and the Object Constraint Language (OCL) [Cranefield 99], we can use this technique to express the local ontologies defined by the designers of each application.

IV.2.2 Local ontologies of SIE components versus global enterprise ontologies

Local ontologies are defined, in contrast to global enterprise ontologies, to represent the specification and the conceptualization of the domain and processes that a system must implement. Global enterprise ontologies are hard to design, since modeling the whole sector of an enterprise's activities is much harder than modeling one particular activity. Existing global enterprise ontologies are generic and hard to customize; they are designed with top-down approaches, which are not the most realistic way to identify the specification of a given system. In our work we focus on local ontologies for each system of the SIE, in other words "ontology-driven" SIEs. In this setting, we represent the specification, the conceptualization, and the vocabulary used by each system with a local ontology; this ontology represents the system's context of use and its users' point of view. The global dimension, ensured by coupling ontologies and contexts, is described in the following paragraphs.

IV.3 Coupling ontologies and contexts

The notion of context first appeared as a way to partition knowledge into manageable sets [Hendrix 79], or as constructs that ease reasoning [Guha 91]. A first formal representation of context was developed by McCarthy in [McCarthy 87]. The two notions, ontology and context, strike us as complementary: ontologies represent knowledge shared by several actors (applications, users, etc.), while a context represents the local information concerning one actor's point of view on an application. In what follows, we focus on coupling these two techniques within SIEs to solve the semantic-sharing problem.

IV.3.1 The notion of context in information systems

The notion of context was introduced in information systems to design and implement views, aspects, and roles. Context makes it possible to represent the semantics tied to the definition of an object (concept, component, etc.) and its relations to other objects. For example, context is


defined in databases through views, to personalize the semantics and the understanding of the data accessible to a given category of users. In software engineering, the notion of context (or viewpoint) is used to identify objects defined locally with partial knowledge, and to convey the notion of a usage profile [Nuseibeh 94] ("as a vehicle for separation of concerns"). The advantages of using this notion in several domains were studied in [Akman 96]; they can be summarized as: economy of representation, efficiency of reasoning, tolerance of inconsistencies and contradictory information, and resolution of lexical ambiguities. Today, with the spread of P2P and loosely coupled applications, the notion of context has become attractive for representing local knowledge [Gold 01]. Moreover, formal approaches to representing context have been studied within context-aware computing [Wang 04]. Finally, the notion of context can help solve the problem of multi-representation of elements and their uses. The need for applications where concepts are multi-represented was clearly identified in [Balley 04], and applications where the semantics of objects vary with context in [Serafini 97]. In multi-representation, the same concept may be defined to represent two objects that are not identical; such concepts may nevertheless have attributes or relations that are semantically identical. Furthermore, context-based semantic sharing requires a method for separating, restructuring, and organizing information according to the context it belongs to.
An example of multi-representation is given by the models associated with the QELT and EDI Translator tools, which contain semantically similar concepts that differ in their representations.

IV.3.2 How to couple local ontologies and contexts

We propose a coupling of the two notions, ontologies and contexts, which we call ontology contextualization, to face the problems arising from semantic sharing between the information systems of an enterprise. The proposed paradigm aims to show how specifications of a conceptualization can be defined while respecting several points of view. Contextualizing local ontologies thus boils down to the ability to define concepts with multiple representations: local ontologies ensure local coherence between concepts, while the notion of context tolerates a certain incoherence due to differences in perception. Three essential elements are therefore needed to put contextual ontologies to work:

• locally, define the contexts of use by associating with the local ontologies a mechanism that identifies which context elements belong to;

• between the local context ontologies, define a semantic-correspondence mechanism reflecting the similarities between concepts that are used by different points of view or that are multi-represented in their local contexts;

• in addition, globally define a reasoning mechanism based on the local ontologies, to manage the global knowledge extracted from the local contexts.

We then studied the multi-representation of concepts that have similar definitions but are not quite identical with respect to their attributes or their relations to other concepts.


For example, considering the two SIE systems we studied, DW and EDI, we could reuse some components of one to ease the implementation of the other. We must therefore associate a conceptualization that respects each context of use, and then define the semantic links between the elements of each context. The work on ontology contextualization was carried out jointly with [Arara-b 04] [Rifaieh-a 04], where we studied the choice of a formalism ensuring the definition of the syntax and semantics of contextual ontologies. This formalism is detailed in the next section.

V. The formalism of contextual ontologies

The semantic-sharing problem in SIEs arises from the semantic heterogeneity between these systems. After studying ontologies and contexts separately, we showed that coupling them can help solve the problem both locally and globally. We also identified three essential points for making contextual ontologies work: a membership mechanism, a semantic-link mechanism, and a reasoning mechanism. We now look for a language or formalism that can express contextual ontologies while respecting these three mechanisms. Just as logical languages have been used to express ontologies (e.g., Description Logics), we directed our work toward finding a logical language, or a combination of several languages, to express contextual ontologies. We propose to adapt Modal Description Logics (e.g., ALCNM) to match the characteristics of contextual ontologies. We will also use an example to show how to solve the semantic-sharing problem by associating contextual ontologies with these SIEs using ALCNM.

V.1 An example of use in SIEs

We previously presented two systems, QELT and the EDI Translator; we now take them up again as examples to study with contextual ontologies. The UML models associated with these two systems are defined in Annex C and Annex D; two excerpts of these models are shown in Figure.7 and Figure.8. These models express the mono-representation of each system without accounting for the semantic similarity between the concepts they describe. This semantic similarity comes from the fact that the two systems each implement, in their own way, the mapping-expression model proposed earlier in this thesis. Several multi-represented concepts can be distinguished in these two models: for example, the concepts Mapping, Field, and Record in the EDI Translator have their counterparts (Mapping, Attribute, and Entity) in QELT. Suppose we start by associating two local-ontology models with these two systems. These two ontological representations do not let us manage knowledge globally: if we are interested in managing reuse between the two systems, we need to define which ontology each element belongs to (concepts, roles, individuals), to define the semantic relations that exist between these elements, and then to exploit these relations to


identify the reusable modules. In what follows, we study a solution based on contextual ontologies for managing reuse.

V.2 Stamping mechanism

To distinguish the concepts of the different ontologies and to identify their membership, we can attach a stamping mechanism to these concepts. Stamping an element lets us distinguish its roles, its instances, etc. within its context, and thus manage the characterization of multiple representations. We make stamping mandatory for all elements in order to keep the mechanism consistent. The mechanism consists in prefixing every element defined in the contextual-ontology formalism with a label identifying the local context of its ontology: for a concept Ci coming from context Si, we write Si:Ci. Since we are looking for a logical formalism for contextual ontologies, we can adopt the stamping mechanism defined for description logics in [Benslimane 03]. If the formalism chosen for contextual ontologies is based on description logic, which looks quite favourable, we can apply this mechanism directly to define formulas such as Ctx1:C1 ∩ Ctx2:C2, ¬Ctx1:C1, etc., where Ctx1 and Ctx2 denote two different contexts.
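The stamping idea above can be sketched in a few lines. This is an illustrative sketch, not the thesis's implementation; the class and method names are invented for the example.

```python
# Minimal sketch (hypothetical names): stamping every ontology element
# with the label of its local context, as in the Si:Ci notation.

class StampedElement:
    def __init__(self, context: str, name: str):
        self.context = context  # local context label, e.g. "h" or "p"
        self.name = name        # element name, e.g. "Mapping"

    def qualified(self) -> str:
        # Rendered in the Ctx:Element notation used in the text
        return f"{self.context}:{self.name}"

    def same_context(self, other: "StampedElement") -> bool:
        return self.context == other.context

edi_mapping = StampedElement("h", "Mapping")   # EDI Translator ontology
qelt_mapping = StampedElement("p", "Mapping")  # QELT ontology

print(edi_mapping.qualified())                 # h:Mapping
print(edi_mapping.same_context(qelt_mapping))  # False: same name, different contexts
```

The stamp is what keeps two homonymous concepts (here the two Mapping concepts) distinguishable once they coexist in a global knowledge base.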

V.3 Semantic alignment mechanism

This second mechanism consists in defining, for contextual ontologies, the semantic links between multi-represented concepts. These directional links are called the semantic alignment rules between the concepts of the local ontologies. The rules define how a concept of one ontology is related to a concept of another ontology. This alignment mechanism is at the heart of contextualization, since it makes it possible to navigate between contexts. The links are directional because a semantic rule is not necessarily symmetric. Several works have defined semantic rules between representations, e.g. [Borgida 02] and [Mitra 00]; if we choose a logical language to represent contextual ontologies, we can build on their results.

Figure.7 Excerpt of the UML model of the EDI Translator

Figure.8 Excerpt of the UML model of QELT


The following examples illustrate some rules proposed by these works:
• Identity: a concept A of IS1 is identical to B in IS2 if A ⊆ B and B ⊆ A
• Subsumption: a concept A of IS1 subsumes a concept B of IS2 if every instance satisfying the description of B also satisfies the description of A, which we note B ⊆ A
• Inclusion: every instance of B in IS2 has a corresponding instance of A in IS1

For small ontologies, these rules can be defined manually, but the task becomes more and more complicated as the size of the ontology grows. In that case we should consider applying semi-automatic semantic matching techniques such as the one we developed between XML Schemas for EDI messages.
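The three rule types above can be checked extensionally on instance sets. The sketch below is only illustrative; the instance sets and the correspondence table are invented for the example.

```python
# Illustrative checks for the identity / subsumption / inclusion rules
# over extensional (instance-set) views of two concepts.

def is_identical(ext_a: set, ext_b: set) -> bool:
    # Identity: A ⊆ B and B ⊆ A
    return ext_a == ext_b

def subsumes(ext_a: set, ext_b: set) -> bool:
    # Subsumption: every instance of B is also an instance of A (B ⊆ A)
    return ext_b <= ext_a

def included(ext_b: set, correspondence: dict) -> bool:
    # Inclusion: every instance of B has a corresponding instance of A
    return all(b in correspondence for b in ext_b)

field_h = {"name", "IDField", "amount"}   # invented instances of h:Field
attribute_p = {"name", "IDField"}         # invented instances of p:Attribute

print(subsumes(field_h, attribute_p))     # True
print(is_identical(field_h, attribute_p)) # False
print(included(attribute_p, {"name": "name", "IDField": "IDField"}))  # True
```

In practice the rules are stated intensionally over concept descriptions, but the extensional reading above is what the definitions in the text quantify over.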

V.4 Global inference mechanism

In addition, we need to define a global inference mechanism over the ontologies; it must take into account the local consistency of each ontology and the global consistency of the whole set of contextual ontologies. Some concepts can thus be interpreted locally while others must be interpreted globally in order to guarantee decidability. Currently, inference rules over ontologies are interpreted through the underlying logical formalism. Description logic makes it possible to define, in a very concise way, inference rules, their decidability algorithms, etc. What remains is to find a means of defining global inference rules.

V.5 Description logic and modal logic (syntax and semantics of ALCNM)

We direct our search towards logical languages for expressing contextual ontologies. These languages offer strong expressive power and have been used successfully in various domains [Catarci 93] (bioinformatics, e-commerce, environment, data integration, etc.). Description logics (based on KL-One), by virtue of their maturity, are used to express explicitly the knowledge contained in ontologies and provide a suitable inference mechanism. They represent an advance over approaches based on conceptual graphs (CG) [Sowa 00] or frame logic [Farquhar 97]. Despite their expressiveness, however, description logics support neither multi-representation nor the expression of inter-ontology relations, so a complementary solution is needed. The search for this solution led us to identify modal logic (ML) [Lemmon 77] as the formalism which, combined with DLs, could provide an adequate means of expressing contextual ontologies. Indeed, combining DLs with modal logics has already yielded temporal description logics, which examine truth in its relation to time [Artale 01]. Below, we summarize the DL and ML languages, then check whether their union is suited to expressing the three mechanisms required by contextual ontologies. In what follows, we propose to combine DLs and ML to examine truth relative to the contexts of the local ontologies.


Modal logics distinguish modalities of truth, in contrast to the single notion of truth of description logics, where the semantics of expressions or formulas is defined in terms of truth in one particular case called a world. With modal logics, one can thus define an expression that is true in one world and false in another. This modality of truth is handled through two operators: necessity (true everywhere) and possibility (true in one world but not in another). Worlds are connected by links that make it possible to move from one to another; such a link is called an accessibility relation. The set of worlds together with the accessibility relations defines a structure called a Kripke structure [Wolter 98]. ALCNM is a language combining modal logic and description logic. Its syntax contains the classical constructors of ALCN [Baader 03] and the two modal operators: necessity (noted □r) and possibility (noted ◊r). The operator □r applied to a concept C means that C necessarily holds in every world accessible through the accessibility relation r. The primitive symbols of the language comprise concepts C0, C1, …, relations R0, R1, …, and objects a0, a1, …. Through the description-logic dimension of ALCNM it is possible to define complex expressions such as C0 ∧ C1, ¬C1, ∃R.C0. The modal dimension then makes it possible to apply, to two concepts C0 and C1, the necessity and possibility operators □r and ◊r: □r C0, ◊r C1. The syntax and semantics of the modal operators are defined as follows (see the table below):

Finally, the semantics of ALCNM combines two semantics:

• The conventional description-logic semantics, defined by the Tarski model [Baader 03], which associates an interpretation I = (∆I, .I), where ∆I is the interpretation domain and .I is a function mapping each concept to a subset of ∆I and each role to a subset of ∆I × ∆I.

• The semantics given by the Kripke structure, which requires the accessibility relations between worlds to be defined. This semantics identifies the world w ∈ W in which a concept can be correctly interpreted. If |W| = 1 (a single world), the interpretation reduces to the classical Tarski interpretation.
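The combined semantics can be made concrete with a tiny Kripke-style evaluator: worlds stand for local ontologies, accessibility edges carry the alignment relation, and the necessity/possibility operators quantify over accessible worlds. All data below is invented for illustration.

```python
# Sketch of the Kripke-style semantics of the modal operators.
# interpretation[(w, C)] plays the role of C^I(w): the extension of
# concept C in world w.

interpretation = {
    ("w_h", "Mapping"): {"m1", "m2"},
    ("w_p", "Mapping"): {"m1"},
}

# Accessibility relation ∇i: pairs (w, v) meaning v is accessible from w
accessibility = {("w_h", "w_p")}

def accessible(w):
    return [v for (u, v) in accessibility if u == w]

def necessity(concept, w):
    # (□i C)^I(w): objects belonging to C in EVERY world accessible from w
    worlds = accessible(w)
    if not worlds:
        return set()
    return set.intersection(*[interpretation.get((v, concept), set()) for v in worlds])

def possibility(concept, w):
    # (◊i C)^I(w): objects belonging to C in SOME world accessible from w
    result = set()
    for v in accessible(w):
        result |= interpretation.get((v, concept), set())
    return result

print(necessity("Mapping", "w_h"))    # {'m1'}
print(possibility("Mapping", "w_h"))  # {'m1'}
```

With a single world and an empty accessibility relation, only the local (Tarski-style) extensions remain, matching the |W| = 1 remark above.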

V.6 Representing contextual ontologies with modal description logic (ALCNM)

We have identified, on the one hand, the mechanisms required by contextual ontologies and, on the other hand, the characteristics of the modal description logic language ALCNM. In this section we propose to express contextual ontologies with the modal description logic (MDL) formalism. The arguments supporting this proposal are the following:

Syntax and semantics of the modal operators:

□i C  Necessity operator: (□i C)^I(w) = {x | for every v such that w ∇i v, x ∈ C^I(v)}

◊i C  Possibility operator: (◊i C)^I(w) = {x | there exists v such that w ∇i v and x ∈ C^I(v)}


• The MDL formalism makes it possible to define worlds of truth, each of which admits a local interpretation under classical description logic (Tarski interpretation) [Wolter 98]. These interpretations are individually consistent with the local definitions and descriptions of the local concepts, roles, and objects, and are identical to the interpretations currently used to define domain ontologies. By letting each world of the formalism represent a local ontology, we obtain the means to design and manage local knowledge without ambiguity: the management of a local ontology (a world in the formalism) is expressed with the DL formalism.

• A stamping mechanism is indispensable for distinguishing the concepts of each world defined with classical DL. We use the extension proposed in [Benslimane 03] to stamp the DL and thus mark the worlds unambiguously according to their membership.

• It is not enough to define the worlds in isolation from one another; otherwise no global inference across several worlds would be possible. Since contextual ontologies are designed to manage knowledge both globally and locally, we must define a means of meeting this need. In the MDL formalism, the worlds can be connected by accessibility relations, and these relations adhere to a global interpretation mechanism, the Kripke structure. To manage global knowledge we therefore need a set of relations between the local ontologies. Taking into account the semantic alignment mechanism required for contextual ontologies, we observe that these semantic links can form a set of accessibility relations between two worlds, where ∇i = {ri1, ri2, …, rin}. In other words, the accessibility relations that can exist between worlds (in the MDL formalism) reflect the semantic alignment mechanism required for contextual ontologies (Figure.9). The set of existing worlds with their accessibility relations defines a labelled directed graph representing the Kripke structure (Figure.10).

• The ALCNM language offers the necessity and possibility operators for expressing global knowledge across several worlds: □i and ◊i, where i is the accessibility relation used between the worlds concerned. There are thus as many possibility operators (and likewise necessity operators) as there are accessibility relations [Wolter 98]. These operators can be applied in front of expressions of the classical ALCN language to form global expressions.

Figure.9 Accessibility relation between two worlds (contextual ontologies)

Figure.10 Labelled directed graph representing the Kripke structure


• The approach we propose for representing contextual ontologies with MDL also respects the two customization parameters of modal logic: domain finiteness and rigid designators [Wolter 98]. The first requires the set of worlds to be finite, which is the case with the objects in EIS. The second requires objects to be unique: when an EIS speaks of an object of the concept Employee with the designator John Smith, we consider that this designator denotes the same object in every world.

We conclude that the MDL formalism is well suited to representing contextual ontologies while respecting the above parameters.

V.7 Worked example

Consider again the example of the EDI Translator and the QELT tool, for which we assumed that an ontological representation had been developed for each system individually. The two representations contain multi-represented concepts, such as Mapping, Field, and Record in the first, corresponding respectively to Mapping, Attribute, and Entity in the second. Using the contextual-ontology formalism, we can define each ontology with the stamping mechanism over description logic. If Oh denotes the ontology of the first system (EDI Translator), we can define:

h:Record ⊆ (∀name.String) ∩ (≥1 hasField.Field) ∩ …

h:Mapping ⊆ (∀ID_Mapping.String) ∩ (≥1 hasMapField.Field) ∩ (≥1 hasSelection.Selection) ∩ (≥1 hasCondition.Condition)

h:Field ⊆ (∀name.String) ∩ (∀IDField.String) ∩ (≥1 hasMapField.Manager) ∩ (≥1 hasField⁻¹.Selection) ∩ (≥1 hasField⁻¹.MappingCondition) ∩ (≥1 hasField⁻¹.Record)

(all roles in these definitions are likewise stamped with h)

If Op denotes the ontology of the second system (QELT), we can define:

p:Mapping ⊆ (∀name.String) ∩ (≥1 hasSources.Entity) ∩ (≤1 hasSources.Entity) ∩ (≥1 hasTarget.Entity) ∩ (≤1 hasTarget.Entity) ∩ (≥1 useTransformation.Transformation)

p:Transformation ⊆ (∀name.String) ∩ (≥1 useTransformation⁻¹.Mapping) ∩ (≥1 hasTarget.Attribute) ∩ (≤1 hasTarget.Attribute) ∩ (≥1 hasSources.Attribute) ∩ (≤1 hasSources.Attribute) ∩ …

(all roles in these definitions are likewise stamped with p)

We now need to define the semantic correspondences that constitute the accessibility relations between the concepts of the two ontologies. To contextualize these ontologies, we define an accessibility relation (∇i) with its semantic correspondence rules rij, ∇i = {ri1, ri2, …}. These rules are established between the multi-represented concepts.


A rule rij of this set then defines an identity correspondence between the concept Mapping of QELT and the concept Mapping of the EDI Translator, illustrated as follows:

h:Mapping ≡ p:Mapping. The other candidate rules are studied individually, taking into account the semantic similarity between the concepts in question. These rules conform to the semantic alignment mechanism of contextual ontologies.

VI. Using contextual ontologies in EIS

Having studied contextual ontologies, we now address the feasibility of this technique in EIS. To this end we introduced the EISCO project (Enterprise Information Systems Contextual Ontologies). EISCO promotes the use of contextual ontologies to solve the semantic-sharing problem between EIS, and provides a global view over the enterprise's information systems. The goals of the project can be summarized in the following three points:

• Define an architecture that ensures semantic sharing between several enterprise applications. The architecture uses a global knowledge base describing the contextual ontologies, and offers the systems it manages a set of "plug and play" functionalities covering their operational cooperation needs.

• Present three usage scenarios covering interoperability, reuse, and querying between EIS. We do not devote a complete study to these topics with contextual ontologies; rather, we use examples to show how they can be used in real cases.

• Focus on one scenario and introduce a case study dealing with reuse between several EIS. We apply the case study to the two systems already proposed in this thesis (EDI Translator and QELT), and try to show the potential of contextual ontologies and of the proposed architecture for solving a real case of semantic sharing between two EIS.

VI.1 Architecture and components of the EISCO project

The logical architecture of the EISCO project is based on a 5-tier client/server architecture, to whose service layer we add the ability to define shared, contextual semantics. The software architecture contains components ensuring: access to the ontologies and to the knowledge-base server, management of the ontologies and their associated contexts, ambiguity resolution (inference engine), and management of reuse and interoperability between the systems. We group the components of the EISCO architecture into 3 categories: EISCO KB Server, EISCO Core Services, and EISCO Accessibility Server, illustrated in Figure.11.


VI.1.1 EISCO KB Server

The centralized EISCO knowledge base contains and manages the contextual ontologies and is complemented by an inference engine. This layer contains the following components: KB Server Interface: ensures communication between the knowledge base and the other components of the architecture, implementing all the knowledge-base management methods. Reasoning System (inference engine): reasons over the knowledge to derive new knowledge. This engine must be able to handle global and local inferences with the inference mechanism of contextual ontologies formalized in MDL. Thus the EISCO KB Server provides not only access to the knowledge base but also inference over it, whose results depend on the explicit semantics defined in the contextual ontologies.

VI.1.2 EISCO Core Services

EISCO Core Services is the service layer of the architecture; it manages, controls, and shares knowledge across the systems. It offers the essential functionalities required to ensure semantic sharing between applications (inter-system interoperability service, reusable-objects service, etc.) and includes the following service components: Ontology Manager, Semantic Mapper, Context Manager, Reusability Manager, Models Importer, etc. The implementation of this layer should follow the logic of an application server: on one side it connects to the knowledge-base interface and its contextual ontologies; on the other side it is linked to the accessibility server, which is in turn connected to the enterprise's systems.

Figure.11 Architecture of the EISCO server

This layer offers three types of services. Applications Resources Provider: a set of components in charge of providing the connected systems with the services they request.

• Query Manager: manages the life cycle of a global query operating over several databases in the different systems. It decomposes the query into several local sub-queries after resolving their meanings through the contextual ontologies, then collects, combines, and returns the retrieved data to the requesting user.

• Reusability Manager: offers developers an essential service for easing the reuse of developed components. Based on the contextual ontologies, the developer explores the available modules through the semantic links defined between existing concepts and the concept being implemented, then completes the retrieved components and adapts them to the needs of the new application.

• Interoperability Manager: offers interfaces allowing the connected applications to cooperate. The systems can thus work together through these interfaces, relying on the contextual ontologies to resolve semantic heterogeneity.
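A Reusability Manager lookup of the kind described above can be sketched as follows. This is a hedged illustration, not the EISCO implementation; the link table, component registry, and function name are all hypothetical.

```python
# Hypothetical sketch: given the developer's working context and concept,
# follow the stored semantic links to concepts of other ontologies and
# return the reusable components registered for them.

semantic_links = {
    # (context, concept) -> list of (rule, target context, target concept)
    ("EDI", "Mapping"): [("identity", "QELT", "Mapping")],
    ("EDI", "Field"): [("subsumption", "QELT", "Attribute")],
}

components = {
    # (context, concept) -> reusable components registered for it
    ("QELT", "Mapping"): ["DefineMappingExpression"],
}

def find_reusable(context, concept):
    hits = []
    for rule, ctx, cpt in semantic_links.get((context, concept), []):
        for comp in components.get((ctx, cpt), []):
            hits.append((comp, ctx, cpt, rule))
    return hits

print(find_reusable("EDI", "Mapping"))
# [('DefineMappingExpression', 'QELT', 'Mapping', 'identity')]
```

In the full architecture the links would come from the knowledge base (possibly enriched by the inference engine) rather than from a static table.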

Applications Importer: eases the integration of models into the contextual ontologies of the knowledge base.

• Models Importer: accessible to the EISCO server administrator, it imports models (UML, etc.) representing the new systems to be plugged into the architecture, converting each model to the format used by the EISCO KB Server.

• Ontologies Importer: accessible to the EISCO server administrator, it imports ontologies (OWL, DAML/OIL, etc.) relating to new systems and helps configure each imported ontology to conform to the format used.

Knowledge Manager: is responsible for managing the knowledge base and its use. It is used by the other EISCO Core Services components to access the knowledge base, and is accessible to the EISCO server administrator. It contains the following components:

• Ontology Manager: manages the ontologies, records the traces of model changes, handles versioning for ontology evolution, and resolves possible conflicts.

• Semantic Mapper: defines the semantic correspondences across the existing ontologies and retrieves them from the contextual ontologies on demand. The semantic correspondence process is left to the administrator's expertise; one could, however, imagine a semi-automatic matching module to discover the semantic correspondences between the concepts of the ontologies in question, which amounts to defining an algorithm similar to EX-SMAL but for ontologies.

• Context Manager: manages the stamping mechanism of the existing ontologies. It associates with each concept a label that identifies its context before the concept is inserted into the knowledge base.


VI.1.3 EISCO Accessibility Server

The last layer of the EISCO server connects the enterprise's systems to the architecture. It offers a set of middleware services, tools, and frameworks that ease the plugging-in of a system. More specifically, it ensures connectivity with the systems' databases, the setup of containers for reusable objects, security management, concurrency management, etc. This layer also manages the physical accessibility layer with technologies such as TCP/IP, CORBA, RMI, JNI, etc., and specific connectors.

VI.2 Usage scenario and case study

We devote the usage scenario and case study to showing the usefulness of contextual ontologies in a practical setting, turning to the two systems already proposed and designed in this thesis as verification examples. We observed that the two applications implement the mapping-expression model in two different ways, and that the question of reuse rests in part on semantic sharing between the systems. We therefore illustrate reuse by relying on the ontological representations presented earlier together with the stamping and semantic alignment mechanisms between concepts. We assume at the outset that one of the two systems (QELT) has been implemented and is accessible through the EISCO server: the knowledge base contains a model (an ontology) corresponding to the conceptualization used by QELT, and the modules implemented for this system are also accessible to the architecture through a container of reusable components. The succession of roles between the administrator and the developer (client) is as follows:

• The EISCO server administrator is responsible for creating and maintaining the ontologies in the knowledge base.
• The administrator defines the semantic links (semantic alignment mechanism).
• The client (the developer of the EDI Translator application) asks the system how to proceed with reuse.

We can illustrate the events with a sequence diagram (Figure.12), described as follows:

1. The creation of the new system starts with its modelling: a UML diagram (class diagram) is associated with the system. The administrator then imports the model into the EISCO KB knowledge base, and a contextual ontology is created to represent the new system (EDI Translator).


2. The administrator then identifies the semantic links that exist between the concepts of the new ontology and the concepts of the ontologies already in the knowledge base (including QELT's). These semantic links (semantic alignment mechanism) are expressed through a set of accessibility relations ∇i and rules (Ri1, …, Rin) between the concepts. For example, the administrator defines an identity rule between the concepts O1:Mapping and O2:Mapping, where O1 denotes the QELT ontology and O2 the EDI Translator ontology.

3. The client connects to EISCO Core Services, specifying his working context (EDI Translator). EISCO CS in turn searches EISCO KB for the semantic links defined by the administrator. EISCO KB returns the set of contexts to which the initial context is linked, together with their concepts and the defined semantic rules; the identity relation between O1:Mapping and O2:Mapping defined above is thus returned to the client. The results are either direct links or links discovered by the inference engine.

4. The client chooses the closest context from the returned list, in this case QELT. Among the associated concepts, he selects the concept (O1:Mapping) or concepts related to the new concept to implement (O2:Mapping), then selects among the application's reusable components those that can help him complete the implementation of the system. In our example, we assume that a reusable component called DefineMappingExpression is used. The client takes the source of this component and modifies or adjusts it to the new needs. Looking closely at the source code of this component (left in Tableau.1) together with the representations defined in Figure.7 and Figure.8, we can adjust it to define the code of the new component (right in Tableau.1).

Figure.12 Sequence diagram for the case study


5. Finally, the new component is assigned to the system (EDI Translator) and at the same time stored among the reusable components for future reuse.

We observed in the implementation of the EISCO project that the time required to develop the new component is substantially reduced compared with a development that ignores reuse. We will therefore consider next an illustration of this case study with a partial implementation of the functionalities of the EISCO architecture, using the J2EE platform to manage reuse through reusable objects called Enterprise Java Beans (EJB).

VII. Implementation

We developed several prototypes and tools to put into practice the ideas and formalisms studied in this thesis. These prototypes differ in their implementation depth and in the level of functionality they provide (vertical or horizontal prototypes).

VII.1 Implementation of QELT

We proposed the QELT tool, which implements the mapping-expression model to improve the data-warehousing process. We developed a prototype (in Visual Basic) that follows the logic of this tool, using SQL queries generated from metadata (Figure.13). We used it with the configuration of a real scenario: the integration, into the warehouse, of the after-sales-service data of a supermarket chain1.

VII.1.1 Usage scenario

The data are initially stored on an operational mainframe as VSAM files. The scenario consists in using QELT to generate a data warehouse and then statistics and reports. The development cost of each statistic is on the order of 5 man-days: one must write COBOL extraction programs, then write transformation programs specific to each statistic, taking into account the metadata describing the mapping between the stored structure and the data to extract.

1 Case study carried out at Tessi Informatique et Conseil.

QELT component (left in Tableau.1, reused):

void DefineMappingExpression(Attribute A)
begin
    Entity E = A.hasAttribute();
    Transformation T = A.myTransformation;
    Function F = A.myFunction;
    Vector Vt = T.hasSources();
    Vector Vf = F.hasSources();
end

Adapted EDI Translator component (right in Tableau.1):

void DefineMappingExpression(Field A)
begin
    Record R = A.hasField();
    Mapping M = A.hasMapField();
    Selection S = A.mySelection;
    MappCondition Mc = A.myMappCond;
    Vector Vs = S.hasField();
    Vector Vc = Mc.hasField();
end

Tableau.1 Example of the pseudo-code of the reusable components


Given that the client requested a large number of statistics, we opted for a data-warehouse-oriented solution. Our QELT tool was used with the MS-SQL Server1 database management system. This scenario allowed us to verify the tool's effectiveness in generating SQL queries from the mapping guideline and the expressions defined to transform source data into target data. We proceeded in four steps:

Step 1: Extraction process. Since the source is not SQL-compatible, we had to write COBOL programs to extract the data from the VSAM files and generate ASCII files. Our test set covered 273,000 lines of data stored in these files.

Step 2: Loading process. We loaded the files containing the extracted data into a temporary database whose schema follows the data structure defined in the source metadata (SM), using CREATE TABLE commands and the DTS (Data Transformation Services) tool integrated into MS-SQL Server.

Step 3: Transformation process. Using QELT, we automatically created the data transformations from the metadata describing the source and target schemas and the mapping expressions. The SQL queries were stored in the DBMS and executed on the temporary database to generate the target database, after which the temporary database was dropped. The queries use SQL Server proprietary functions such as CONVERT, DATETIME, SUBSTRING, etc.
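The transformation step, generating SQL from mapping metadata, can be sketched as follows. The mapping structure, table names, and column expressions below are invented for illustration and do not reproduce QELT's actual metadata format.

```python
# Illustrative sketch of step 3: turning mapping metadata into an
# INSERT ... SELECT transformation query over the temporary database.

mapping = {
    "target_table": "dw_sales",
    "source_table": "tmp_sales",
    "columns": [
        # (target column, mapping expression over the source columns)
        ("sale_id", "CONVERT(INT, id)"),
        ("sale_date", "CONVERT(DATETIME, dt)"),
        ("product", "SUBSTRING(label, 1, 20)"),
    ],
}

def generate_transform_sql(m):
    targets = ", ".join(col for col, _ in m["columns"])
    exprs = ", ".join(expr for _, expr in m["columns"])
    return (f"INSERT INTO {m['target_table']} ({targets}) "
            f"SELECT {exprs} FROM {m['source_table']}")

print(generate_transform_sql(mapping))
```

The generated statement would then be stored as a procedure in the DBMS and executed against the temporary database, as described in step 3.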

1 Microsoft SQL SERVER 2000

Figure.13 Views of the QELT interfaces


Step 4: Reporting process. We used Business Objects1 (BO) to report on the data from the newly created warehouse. This tool makes it easier to produce statistics and reports from a data warehouse.

VII.1.2 Results

We ran our test on a Pentium III PC (667 MHz processor, 128 MB RAM) under Windows NT Server 4.0. The following results were observed:

• The extraction process generated three files containing the data extracted from the mainframe application. The test set covered 6% of the client's existing data targeted for the warehouse.

• The loading process was very fast: about 25 seconds for a total of 273,000 rows of data.

• The transformation process was automated through the creation of stored procedures in the DBMS reproducing the mapping expressions between the data source and target. Executing the SQL queries against the temporary database containing the loaded data took about 2 minutes and 15 seconds for the whole process.

Using QELT effectively halved the development time for the statistics requested by the client: the work of analyzing the correspondences between source and target and implementing them in hand-written programs was eliminated by generating the queries automatically from the already existing mapping metadata.

VII.2 Implementation of the EDI Translator

While studying the problem of translating EDI messages defined in several standards, we proposed to develop a translator based on the XML format. We identified the mapping expression model's ability to define the rules to apply when translating a message from one format to another. We proposed a semi-automatic graphical tool using the mapping expression model and an algorithm that computes the correspondences (matching) between message structures. The current prototype, shown in Figure.14, focuses on the EX-SMAL algorithm (EDI/XML Semi-automatic Schema Matching Algorithm). The resulting correspondences can then be completed with mapping expressions. As a first step, we tested the mapping expression model and the EX-SMAL algorithm separately; a future extension will combine both components in the same prototype.

VII.2.1 Using the mapping expression model

1 http://www.france.businessobjects.com/


This vertical slice of the prototype (the mapping expressions) was implemented with the XQuery language. XQuery is a high-level language for processing XML documents, proposed by the XML Query working group at the W3C. Designed as a query language over XML documents, it can merge, query, join, and generate documents; the goal is a language similar to SQL, but for XML databases.

VII.2.1.1 Usage scenario

The usage scenario, shown in Figure 2, involves a client who uses EDI technology to pay its suppliers. In this scenario, a translation process takes place in every institution where the incoming source message differs from the message to be generated. As a case study we take the conversion from EDIFACT (the PAYMUL message) to SWIFT (the MT103 message). We defined an XQuery query (Annexe-E) that follows the mapping expression model between PAYMUL message elements in order to generate MT103 messages. As a test set, we built 15 files of increasing size containing an increasing number of PAYMUL messages. The set was essentially fabricated from a single PAYMUL message, duplicated 49,860 times in the largest file.

VII.2.1.2 Results

Using the GNU Kawa implementation of XQuery, also called Qexo1, we ran the tests on a PIII PC, 1000 MHz, 256 MB RAM, under Windows 2000 Server, with JDK 1.4 and the Eclipse2 IDE. To improve the readability of the XQuery queries, we wrote a new XQuery formatting plugin for Eclipse. Running these tests on the set of files containing the EDI messages led to the following conclusions:

1 http://www.gnu.org/software/qexo/ 2 http://www.eclipse.org

Figure.14 General view of the EDI Translator implementing EX-SMAL


• XML files cannot be processed as a single block: splitting them into several messages, to be handled individually, is indispensable.

• XSLT is far less efficient than XQuery for performing the transformations. Moreover, XSLT cannot directly generate the (n:1) and (n:m) translation scenarios required by our translator.

• Performance-wise, the Qexo implementation let us process an unlimited number of split messages. Processing the largest file (containing 49,860 PAYMUL messages) took about 1813.78 seconds, which is acceptable for the volumes currently handled in this study. This result is better than the one obtained with XSLT (using Napa).

VII.2.2 Implementation of the matching algorithm (EX-SMAL)

We have already presented the EX-SMAL matching algorithm and its operation from a theoretical standpoint. We now describe how the algorithm was implemented and the results obtained. The prototype was developed in Java for portability and for compatibility with the several free APIs we used. Before starting the implementation work, we converted the user guides into XML schemas, where the textual descriptions are represented as <annotation>…</annotation> elements. The algorithm runs in six steps.

VII.2.2.1 Execution steps:

Step 1 - Tree construction: this step takes the two input XML schemas and converts them into trees. Each tree node is an object containing: the path from the root (e.g. /UNB, /UNB/UNH/DTM, …), the data type (initialized to string by default), the textual description, and its own label (UNB, UNH, UNZ, …).

Step 2 - Base similarity computation: this step takes as input the two trees whose nodes were just described and computes the similarity between individual nodes of each tree, based on their textual descriptions and data types. Descriptions are compared using the Lucene1 API. This API (version 1.4.0) makes it possible to extract the vector of related terms, to take into account the number and frequency of terms that two descriptions share, and the cosine of the angle between the two corresponding vectors. Combining the result of Lucene's PhraseQuery queries with the cosine between the vectors yields the description similarity. For the data type similarity, we defined a static table containing the similarity values between XML Schema types. The base similarity values are then stored for use in later steps.

Step 3 - Computation of each tree node's neighborhood vectors: this step runs in parallel with step 2. Both trees are traversed and, for each node, its neighborhood is extracted, i.e. the four vectors containing respectively the ancestor nodes, the sibling nodes, the immediate child nodes, and the leaf nodes of the subtree rooted at the element in question.
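As an illustration of step 2, the sketch below re-implements the idea of combining a description similarity (a plain cosine over term-count vectors, standing in for the Lucene machinery) with a static data-type table. The coefficient and the table values are invented for the example; they are not the thesis's actual settings:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative re-implementation of the base-similarity idea: cosine of the
// term-count vectors of two descriptions, mixed with a static datatype table.
public class BaseSimilarity {
    public static Map<String, Integer> termCounts(String description) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : description.toLowerCase().split("\\W+"))
            if (!t.isEmpty()) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    public static double cosine(String d1, String d2) {
        Map<String, Integer> a = termCounts(d1), b = termCounts(d2);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet())
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        for (int v : a.values()) na += v * v;
        for (int v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static double typeSim(String t1, String t2) { // invented static table
        if (t1.equals(t2)) return 1.0;
        if (t1.equals("string") || t2.equals("string")) return 0.5; // string is permissive
        return 0.2;
    }

    // coeffDesc weighs description similarity against datatype similarity
    public static double baseSim(String d1, String t1, String d2, String t2, double coeffDesc) {
        return coeffDesc * cosine(d1, d2) + (1 - coeffDesc) * typeSim(t1, t2);
    }

    public static void main(String[] args) {
        System.out.println(baseSim("payment date", "date", "date of payment", "string", 0.7));
    }
}
```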

1 http://jakarta.apache.org/lucene/docs/index.html


Step 4 - Structural similarity computation: the structural similarity between two nodes is computed from the base similarity results of step 2, taking into account the similarities of each element of their neighborhoods (ancestors, siblings, immediate children, and leaves). The structural similarity values are likewise stored for the next step.

Step 5 - Final similarity of the element pairs of the input schemas: the outputs of steps 2 and 4 are the inputs of this step. For each pair of nodes, the final similarity value is computed from their base similarity combined with their structural similarity. These final results are stored for the last step.

Step 6 - Filtering of the matching element pairs: this last step selects, among the correspondences found in the previous step, those that are most plausible. Using a threshold value between 0 and 1, the element pairs whose similarity is below the threshold are eliminated. The set of correspondences remaining after filtering constitutes the final result of a matching process between two schemas, displayed graphically (see Figure.14). We store the final correspondences as triplets <source_element_path, target_element_path, value>, where source_element_path is the path from the root of an element of the source schema (e.g. /Interchange_EDI/UNB/UNH/ for UNH), target_element_path is the path from the root of an element of the target schema (e.g. /Message_swift/Sender for Sender), and value is the final similarity degree between the two elements, a numeric value in [0, 1]. These triplets can be stored in a Hashtable or in an XML file for future use.
Thus, the matching elements whose value is above the filtering threshold are linked by lines in the graphical interface (see Figure.14).
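Steps 5 and 6 can be sketched as follows. The linear combination and the example coefficient values are illustrative assumptions (the thesis only states that base and structural similarity are combined via coeff_base); the Triplet class mirrors the <source_element_path, target_element_path, value> triplet described above:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of final-similarity combination (step 5) and threshold filtering
// (step 6). The weighted mix below is an assumed form of the combination.
public class MatchFilter {
    public static final class Triplet {
        public final String sourcePath, targetPath;
        public final double value;
        public Triplet(String s, String t, double v) { sourcePath = s; targetPath = t; value = v; }
    }

    public static double finalSim(double base, double structural, double coeffBase) {
        return coeffBase * base + (1 - coeffBase) * structural;
    }

    public static List<Triplet> filter(List<Triplet> candidates, double threshold) {
        List<Triplet> kept = new ArrayList<>();
        for (Triplet t : candidates)
            if (t.value >= threshold) kept.add(t);   // keep only plausible pairs
        return kept;
    }

    public static void main(String[] args) {
        List<Triplet> c = List.of(
            new Triplet("/Interchange_EDI/UNB/UNH", "/Message_swift/Sender",
                        finalSim(0.8, 0.6, 0.5)),
            new Triplet("/Interchange_EDI/UNB/DTM", "/Message_swift/Sender",
                        finalSim(0.2, 0.1, 0.5)));
        System.out.println(filter(c, 0.5).size()); // only the first pair survives
    }
}
```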

VII.2.2.2 Performance tests:

Our current prototype offers two main features: schema matching proper and a performance batch. The two functions are distinct, but batch runs can be used to help improve the matching process. For schema matching, the user chooses two schemas to compare and a set of required values: coeff_desc for the base similarity; coeff_anc, coeff_fr, coeff_fimm and thr for the structural similarity; and coeff_base for the final similarity between schema element pairs. Once matching is finished, the user can filter with different values of thraccept to keep only the most plausible correspondences, and finally save the final result (the set of correspondences after filtering) to an XML file.

The main goal of the batch is to let users derive the best coefficient values for matching two schemas. We provide four batch types: one for the base similarity, two for the structural similarity, and one for the final similarity. If the user wants a better set of coefficients for matching schemas S and T, it is recommended to launch two batch campaigns, one on each input schema. A given batch matches a schema against itself while varying the coefficient values in order to obtain the best result. The result of each batch is a set of XML files that can be displayed as graphs (developed with the JFreeChart1 API). Empirically, we propose a measure that determines the best coefficient to use for matching two schemas S and T based on the batch results of S against S and of T against T.

We ran our tests using schemas from the real commercial world. We observed that our algorithm behaves well and yields very encouraging results even in extreme cases (comparing PAYMUL with MT103). Nevertheless, several important points (the statuses, cardinalities, and constraints attached to the schema elements being compared) are not yet handled in this first prototype. Our performance test2 used two real schemas: the EDIFACT PAYMUL message (243 elements) and the SWIFT MT103 message (50 elements). Matching the 243-element schema (PAYMUL) against the 50-element one (MT103) took roughly 5 minutes. We proceeded as follows:

• compare PAYMUL with PAYMUL and MT103 with MT103, launching batches to determine the best values for the various coefficients;

• compare PAYMUL with MT103, computing recall and precision by comparing the correspondences produced by a human expert with those returned by our prototype.

Comparing PAYMUL with PAYMUL and MT103 with MT103 gives perfectly reliable results, which confirms that our algorithm behaves well. We used matching quality measures based on precision (number of true correspondences found automatically / total number of correspondences found automatically) and recall (number of true correspondences found automatically / total number of correspondences found manually). The comparison of PAYMUL with MT103, a deliberately atypical example since the two schemas differ greatly in structure (PAYMUL is deep while MT103 is very wide), gives an interesting result: we reach 25% precision with the default coefficient values. In other words, a quarter of the correspondences found automatically are true correspondences. The human effort needed to eliminate the wrong correspondences is still considered acceptable.
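The two quality measures quoted above translate directly into code; the match sets in the example are invented, with correspondences encoded as "source->target" strings:

```java
import java.util.Set;

// Precision = true matches found automatically / all matches found automatically.
// Recall    = true matches found automatically / all matches found manually.
public class MatchQuality {
    public static double precision(Set<String> automatic, Set<String> manual) {
        long truePos = automatic.stream().filter(manual::contains).count();
        return automatic.isEmpty() ? 0 : (double) truePos / automatic.size();
    }

    public static double recall(Set<String> automatic, Set<String> manual) {
        long truePos = automatic.stream().filter(manual::contains).count();
        return manual.isEmpty() ? 0 : (double) truePos / manual.size();
    }

    public static void main(String[] args) {
        Set<String> auto = Set.of("a->x", "b->y", "c->z", "d->w");
        Set<String> expert = Set.of("a->x", "e->v", "f->u", "g->t");
        System.out.println(precision(auto, expert)); // 1 of 4 automatic matches is true
        System.out.println(recall(auto, expert));
    }
}
```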

The test and the quality measure above give a basic idea of our algorithm's performance. However, to extract all the performance details of this algorithm, a large-scale test should be carried out using a large number of message schemas from the real commercial world. The prototype could be improved by taking into account modifications made by the user (accepting, removing, or adding correspondences) and by reusing previously found and stored results.

1 http://www.jfree.org/jfreechart/ 2 Our prototype ran on a Pentium IV PC at 2.8 GHz with 448 MB of RAM.


VII.3 Implementation and feasibility study of the EISCO project

The implementation of the EISCO server prototype proceeded in several steps:

• feasibility study of the architecture components;
• choice and installation of the Knowledge Base server and the application server;
• feasibility study of importing UML schemas into the Knowledge Base server and of using EJBs in the Core Services;
• development of the interface handling communication with the Knowledge Base server;
• development of the administrator and client graphical interfaces;
• development of some Core Services components to carry out the reuse case study.

VII.3.1 Feasibility Study

The feasibility study aims to verify that the chosen techniques fit the needs of the EISCO project. It covers the servers to be used (application server and knowledge base server), the various services they offer, and their components; it also proposes fallback solutions in case of technical impossibility.

EISCO KB Server. We propose to use existing knowledge base servers such as Racer [Haarslev 01] or FaCT [Bechhofer 00]. Both servers come from research projects, are open source and free, and offer similar functionality. FaCT relies on a CORBA architecture, while Racer is based on RMI. Both require few resources and can be installed on a desktop PC. Since CORBA is somewhat heavier to deploy, and Racer comes with a Java interface (JRacer) that makes it very simple to use in our project, we chose Racer. Racer can load a UML model formatted in XMI and generate the associated domain ontology; it checks the consistency of the model and satisfiability with modal logics. However, the handling of modal logics is not sufficiently clear and concise, so we opted for a workaround to store and manage the semantic links. Use of the knowledge base server is simplified through JRacer (the Java API for Racer); an indispensable interfacing module nevertheless remains to be developed to manage communication with the other modules.

EISCO Accessibility Server. We propose to use existing application servers for the J2EE platform, such as JBoss, JOnAS, Tomcat, etc. J2EE (Java 2 Enterprise Edition) is a platform whose use has proven effective in easing application development. The application server must allow a client (an application) to connect to the server, access the reusable objects created in the EISCO Core Services, publish reusable objects, access databases, manage security, etc. We chose JOnAS for our application. The EJBs (Enterprise Java Beans) used in JOnAS are distributed server-side components written in Java. They improve the encapsulation of functions through the interfaces they expose, which can be used by clients. An EJB is deployed in a container (JBoss, JOnAS, etc.), and it is this container that the client accesses. An EJB can implement business logic such as access to a DBMS or to another information system, and can serve to develop Web applications, Web Services, etc.

EISCO Core Services. The Core Services components must be implemented in full. They provide the functionality requested by users and their applications.

Figure.15 Deployment diagram

Figure.16 EISCO Administrator interface
We chose Java as the development language for two main reasons: portability of the implemented modules, and uniformity with the language used by the other layers of the architecture.

VII.3.2 Prototype implementation

The first phase of this implementation consisted in installing and configuring the Racer and JOnAS servers. We then had to develop the various programs of the application: the server administration part and the client part, as well as various Core Services components. The implementation was done with the Eclipse integrated development environment for Java. The deployment diagram of the prototype across the different servers is shown in Figure.15. The graphical interfaces developed for the EISCO project are:

Administrator: an administrator interface for managing the EISCO server. It handles starting the application and Knowledge Base servers, loading ontologies and models into the knowledge base, managing the semantic links between contextual ontologies, etc. This graphical application integrates JRacer so as to use the services offered by Racer (Figure.16).

Client: a graphical application that lets a user connect to the server and perform querying, development, and other operations there. The client application must be able to use the services offered by the EISCO Core Services; to this end, it must be able to connect to the server and browse the open ontologies as well as the EJBs present in the JOnAS container (Figure.17).

Figure.17 EISCO Client interface

VII.3.3 Test scenario

The sequence of events in this scenario is split between the roles of the administrator and of the client of the EISCO server, as follows:

VII.3.3.1 Administrator's role:

The administrator is mainly responsible for managing the consistency and the updates of the models of the systems attached to the architecture.

• Everything starts with the creation of the UML class diagram for the new application (EDI Translator), using a modeling tool such as ArgoUML1, Poseidon2, Rational Rose3, etc. This schema is then exported in XMI format, and the XMI file (version 1.2) must be transformed into a file usable by Racer. The current version of Racer (1.7.19), used in our architecture, does not support direct integration of the XMI file. We used a workaround: applying an XSLT stylesheet with MS-XSL4 (developed within the exff5 project) to transform the model into an OWL file, which Racer accepts (UML → XMI → OWL). We assume that an ontology representing QELT has already been integrated into the EISCO KB by the administrator, following the same logic.
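The XMI-to-OWL transformation step can be sketched with the JDK's standard XSLT API (javax.xml.transform) in place of MS-XSL; the exff stylesheet itself and the input model are assumed to be provided by the caller:

```java
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Sketch of the XMI -> OWL step: apply an XSLT stylesheet to an XMI model,
// producing an OWL document that a knowledge base server can load.
public class XmiToOwl {
    public static void transform(Source xmi, Source stylesheet, Result owl) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(stylesheet);
        t.transform(xmi, owl);
    }

    public static void main(String[] args) throws Exception {
        // Tiny demonstration stylesheet; the real exff stylesheet is assumed elsewhere.
        String xsl = "<xsl:stylesheet version='1.0' "
                   + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                   + "<xsl:template match='/'>"
                   + "<owl:Ontology xmlns:owl='http://www.w3.org/2002/07/owl#'/>"
                   + "</xsl:template></xsl:stylesheet>";
        StringWriter out = new StringWriter();
        transform(new StreamSource(new StringReader("<uml/>")),
                  new StreamSource(new StringReader(xsl)),
                  new StreamResult(out));
        System.out.println(out);
    }
}
```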

• Studying the QELT and EDI Translator models, we found multi-represented concepts, such as the Mapping concept in the EDI Translator model and the Mapping concept in the QELT model. These two concepts, whose purposes are very close, are modeled differently. Following the contextual-ontology formalism, the administrator defines a semantic (identity) link between these two multi-represented concepts. A graphical application was developed to let the administrator enter the links between two ontologies quickly and simply, using a JTree to display the ontologies' contents, and to save these links in an XML file (see Annexe-J). Racer is announced in [Haarslev 01] as handling modal logics, but this capability is not sufficiently clear and concise in the user guide; we therefore opted for a workaround, saving the semantic links in an XML file. This file can be read by the client application to return the links between two concepts. In addition, each concept of an ontology is associated with a list of the EJBs in which the concept is used. In our case, the MappingExpressionEJB of QELT is associated with the concepts Entity, Mapping, Attribute, Transformation, and Function (see Tableau.1).

1 http://argouml.tigris.org/ 2 http://www.gentleware.com/ 3 http://www-306.ibm.com/software/rational/ 4 http://www.microsoft.com/downloads 5 http://homepages.nildram.co.uk/~esukpc20/exff2004_11/exffindex.html

VII.3.3.2 Client's role:

Once the model is imported into the EISCO KB, the developer can use the EISCO server to complete the development of the EDI Translator application. He proceeds as follows:

• He starts the client application, which uses an EJB that automatically connects to the EISCO server and returns the list of ontologies active on the server.

• The developer chooses from this list the context of his work, which in this scenario is EDI Translator. The application then calls another EJB, which fetches the concepts defined in this system's ontology and displays them.

• The developer then selects a concept he wants to manipulate, here the Mapping concept of the EDI Translator. An EJB is used to fetch the list of concepts that have semantic links (in the contextual-ontology formalism) with the selected concept, and returns this list to the client. This list is then passed as a parameter to another EJB, which fetches the list of EJBs using the concept. This search returns QELT's MappingExpressionEJB, which implements QELT's Mapping concept.

• The developer can then select the EJB and obtain the list of methods it contains; he thus finds the MappingExpression method.

• The developer can finally select the method of interest and retrieve its code in order to customize it for the application (EDI Translator). This customization step is simple because the content of QELT's MappingExpression method is very similar to that of the EDI Translator's MappingExpression (see Tableau.1).

• Finally, the developer saves the new MappingExpressionEJB for the EDI Translator, created by customizing the code of QELT's MappingExpression. The new EJB is then registered in the container and awaits deployment by the EISCO server administrator.

VIII. Conclusion

Semantic sharing between information systems in the enterprise is essential: it is the key element enabling these systems to cooperate in carrying out everyday tasks. In this thesis, following a bottom-up methodology, we begin by studying data integration and data exchange platforms in order to become familiar with these systems. We then identify a mapping expression model that answers the needs of these systems. We observe that these two systems, implementing the same model differently, contain multi-represented elements; this semantic incompatibility prevents reuse between them. We then study ontologies as a formal solution for semantic sharing between enterprise information systems, but here too we face the multi-representation problem. We therefore propose to couple the notion of context with ontologies to overcome this difficulty, and we use the ALCNM language to represent this combination formally. We build an architecture and several scenarios for using contextual ontologies in the enterprise. A series of prototypes, one for each part of our research, has been implemented: QELT, the EDI Translator, and EISCO.

Perspectives for this work include: extending the EISCO implementation to take into account the modal-logic operators used to define global or semi-global knowledge; extending the EX-SMAL algorithm to handle the other elements of the user guides in EDI messages, such as constraints, statuses, etc.; studying the applicability of EX-SMAL to matching contextual ontologies; and studying the complexity of the satisfiability algorithms for the formalism adopted for contextual ontologies. We also foresee opening contextual ontologies toward the domain of pervasive and autonomous systems.


CHAPTER 1

INTRODUCTION

"The Science of today is the Technology of tomorrow" Edward Teller

The introduction of computers into business in the 1950s has had a strong impact on corporations. In today's conquest of marketplaces, enterprises call upon their systems to meet strategic requirements and competitive challenges. These systems become responsible for supplying employees with the tools they need to communicate, to cooperate, to answer their queries, and to support their decisions. In this perspective, Enterprise Information Systems (EIS) cover the set of information systems (e.g. DW, EDI, etc.) used by organizations to fulfill their operational needs and to support their strategic goals. This includes the integration of systems within the organization and the electronic linking of internal processes to those of other organizations. In essence, this emerging area of research is concerned with the study of enterprise information systems as tools supporting the business processes and management of organizations.


1.1 Research Context and Problem Identification

For these EIS, the emphasis is on the properties of openness, scalability, and autonomy. An architecture of distributed components, together with a shared understanding of them, is necessary to permit the reuse of these components in different contexts and the interoperability of heterogeneous systems developed by different organizations. In fact, the lack of reusability of software components is a universal problem in computer science. Given EIS challenges, interoperability is additionally a key issue requiring significant effort. Lack of interoperability, caused by heterogeneity, compromises collaboration between enterprises and deprives organizations of efficient knowledge sharing within and between them. In addition, providing a common understanding within an enterprise becomes essential to cope with the semantics of complex, separate systems. Despite these facts, many EIS are developed independently, based on specifications intended as a mono-representation of the domain of interest; taken separately, they therefore offer little support for global querying or for any data/knowledge sharing. By analyzing the preceding issues, we can see that the problem originates in semantic heterogeneity, which degrades interoperability and reusability for EIS. Semantic interoperability, for instance, is fundamentally driven by the purpose of communication [Obrst 03]. To make systems interoperate, semantics sharing is essential and should be guided by interpretation within a particular context and from a particular point of view.

Therefore, the way to address the problem of semantic heterogeneity is to reduce or eliminate conceptual and terminological mismatches. A shared understanding can function as a unifying framework for the different viewpoints, and serve as the basis of communication between people, of interoperability between systems, and of other system engineering benefits such as reusability, reliability, and specification. In the last decade, ontologies and context have separately played a major role in many AI applications, in the integration of heterogeneous and distributed systems, and in system engineering. On one hand, local information system ontologies are foreseen to play a key role in partially resolving the semantic conflicts and differences that exist among systems. On the other hand, the notion of context, carried through views, aspects, and roles, can support the development process of complex systems; it is defined as a locally managed object that encapsulates partial knowledge about a particular system. In summary, this subject examines contemporary themes in enterprise information systems; it investigates research and practice in ontology and context paradigms for improving semantics sharing among these systems.

1.2 Motivations

In today’s challenging economy, companies are growing horizontally, seeking mergers and solid partnerships with other enterprises. Conglomeration has become widely practiced by many enterprises. Because of the strong market competition that has arisen over niche markets, a crucial demand for integrated and cooperative information systems is now widely recognized by IT (Information Technology) decision makers as well as by academia. EIS should therefore be adaptable to changes in the enterprise environment and to predictable evolution. Organizations are endowed with more and more complex structures allowing a high degree of autonomy for their members. This autonomy can be considered the result of deploying an effective information-systems structure, which enables system autonomy and, at the same time, coherence within the global architecture. Meanwhile, improving EIS depends on two factors: the push from technology, assisted by academic research, and the pull from industry. In the effective use of EIS to support business goals, a number of key challenges face all large enterprises, whether governmental, in the private sector, or in universities. Firstly, enterprise information systems are growing due to the high demand for efficiency and data quality. They should evolve in step with the enterprise’s business, which consequently involves information and knowledge sharing with partners. Secondly, binding academic and industrial research is important to overcome EIS and semantics-sharing problems. Indeed, current research results on these topics are far from being usable in commercial products, nor have they proven their implementation adequacy in terms of ROI (Return On Investment).

1.3 Research Methodology and Goals

This work aims, firstly, at gaining an appropriate and solid theoretical foundation for research into enterprise information systems and, secondly, at providing research output and results relevant to EIS. In order to resolve the main problem of semantic sharing among EISs, we intend to pair up the two notions of context and ontologies. Contextualization can be applied at the ontology level in order to meet multiple-view and multi-representation requirements. Furthermore, designing a complete distributed-system architecture that resolves semantics sharing and takes into account such important attributes as interoperability, reusability, and common understanding is also at the core of our work.

Figure 1.1: Research Methodology

In order to address the semantics-sharing problem and to reach the EIS goals, we proceeded as follows in our research methodology (Figure 1.1):

• Selecting EIS relevant to the study: These systems should reflect the context of the research and the interests of the research-project members. We chose the data warehousing system because it is essential for enterprise business intelligence. In addition, we studied the EDI system, a mature, widely used technique enabling enterprise communication through message exchange.


• Resolving the theoretical and practical problems involved in these systems: This consists of exploring the difficulties and reaching a local solution for these problems. This step is especially important because it helps us to identify some of the EIS problems and to contribute to resolving them.

• Using the outcome to expose the semantic-sharing problem: After locally solving the identified problems, we observed that the mapping expression can be used for both systems. The lack of a common understanding makes it hard to share the model, let alone to reuse the implemented components. Therefore, sharing the developers’ understanding of the model across the two systems can help to improve reusability. In this step, we explored existing techniques for semantic sharing, such as the Unified Modeling Language and ontologies, and we identified that the problems reside in the multiple views and multiple perspectives that can be given to a concept in each of these systems.

• Associating theoretical and logical techniques to pair up ontology and context: We found that another notion, context, could help resolve the remaining issue in semantic sharing. We therefore paired up the two notions of ontologies and context to reach a dual expressivity, associating the Description Logics and Modal Logics formalisms to express this union.

• Suggesting practical applications, such as an architecture, that enable us to reach the goals: This step is very useful from a practical point of view, because reaching a theoretical result is not the goal per se; facilitating EIS users’ tasks is the foreseen goal. We illustrated several scenarios for using the suggested architecture, and implemented a prototype with a case study to show the feasibility of these suggestions.

1.4 Originality of this Work and Results

The main idea we argue for consists in encouraging plans for evolving EIS while taking theoretical research, such as ontologies and contexts, into account. The discussions in this thesis therefore combine academic research and enterprise practice. Hence, one of the credits of this work is to create a space for exchanging expertise and ideas between these two worlds, with respect to the push and pull factors. In terms of theoretical research, we introduced mainly two ideas:

• Mapping expression model: This model clarifies the expressions used to map from one representation to another. Two instances of this model permit, respectively, the description of meta-data in data warehousing systems and the mapping guidelines of EDI message-translation systems.

• Contextual ontologies formalism: Contextual ontology, co-studied in [Arara 04] [Rifaieh-a 04], offers a strong, machine-processable formalism that meets EIS semantics-sharing needs. It brings an adequate ontology-based solution to the real situations of multiple views and multi-representation. In contrast to a simple ontology, a contextual ontology can be used between communities with minimal ontological commitments, and without establishing a unified consensus, which is often abandoned because of its complexity or unknown cost. The contextual ontology is formalized using Modal Description Logics (i.e. the ALCNM language).
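As a purely illustrative sketch (the concept names and axioms below are invented; the precise ALCNM syntax is defined in Chapter 5), context-indexed modal operators let one and the same concept carry different definitions in different contexts:

```latex
% Illustrative only: the concept Invoice is constrained differently in an
% accounting context c1 and an EDI context c2, via context-indexed box
% modalities over description-logic axioms.
\begin{align*}
[c_1]\,\bigl(\mathit{Invoice} &\sqsubseteq \mathit{Document}
      \sqcap \exists\,\mathit{hasAmount}.\mathit{Currency}\bigr)\\
[c_2]\,\bigl(\mathit{Invoice} &\sqsubseteq \mathit{Message}
      \sqcap \exists\,\mathit{hasSegment}.\mathit{EDISegment}\bigr)
\end{align*}
```

Both axioms can hold simultaneously without forcing a unified consensus: each community commits only to the axioms boxed by its own context.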

In terms of enterprise applications, we introduced the following ideas:

• Developing a query-based Extraction, Load and Transformation tool (QELT) used for creating enterprise data warehousing systems;

• Studying enterprises’ communication with XML/EDI (XML-based Electronic Data Interchange) systems;

• Designing algorithms that determine the semantic similarity between XML/EDI schemas and employing their output in the mapping process for EDI translators;

• Conceiving an architecture and developing a prototype promoting the use of contextual ontologies within the EISCO (Enterprise Information System Contextual Ontologies) project. This project also includes setting up usage scenarios and a case study which implements the preceding findings, including the mapping expression model and the tools (QELT & EDI Translator).
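To give a flavor of what a name-level similarity measure can contribute to schema matching, here is a deliberately minimal sketch. It is not the thesis’s EDI-Matcher algorithm: the element names and the threshold are invented, and only a generic string-similarity ratio from the standard library is used.

```python
# Minimal, hypothetical sketch of a name-based schema matcher: it scores
# pairs of element names from two message schemas with a string-similarity
# ratio and proposes candidate mappings above a threshold.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two element names, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_schemas(source: list[str], target: list[str], threshold: float = 0.6):
    """Propose (source, target, score) mappings whose score reaches the threshold."""
    mappings = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            mappings.append((s, best, round(score, 2)))
    return mappings

# Invented element names for illustration:
invoice_edi = ["InvoiceNumber", "InvoiceDate", "TotalAmount"]
invoice_xml = ["invoice_no", "invoice_date", "amount_total"]
print(match_schemas(invoice_edi, invoice_xml))
```

A real matcher would combine such lexical scores with structural and semantic evidence; the point here is only that candidate mappings can be proposed automatically and validated by a human.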

1.5 CIFRE Grant

The work presented in this thesis was coordinated between the Lyon Research Center for Images and Intelligent Information Systems (LIRIS) and the R&D team at TESSI Informatics. This research project was financed under the aegis of the National Association of Technological Research (ANRT) through an industrial convention of technology transfer (CIFRE N°2000661, from May 2001 to April 2004).

The research results presented in this thesis are well aligned with the business needs of TESSI Informatics. In fact, the developed prototypes and tools (i.e. QELT, EDIMapper, XBalo, etc.) have a direct economic impact by improving TESSI Informatics’ offerings, including the EDI banking system, the TESSI PRO-ERP system, and CE+ client information supervision. Other studied topics helped to introduce new ideas and technologies, for instance the EISCO project. The research outcomes contributed to overcoming the main challenges and succeeded through an effective transfer of knowledge. The effort of this research project amounts to 780 man-days, with 330 days for enterprise practice and 450 days for laboratory research.

1.6 Outline of the Thesis

The remainder of this thesis is organized as follows:

Chapter 2 will study enterprise information systems. It will survey EIS requirements-specification methods and point out the problem of semantics sharing for interoperability and reusability. It also includes a round-trip of information modeling techniques in computer science, within AI (knowledge representation), data modeling (databases), and requirements analysis (software engineering and information systems). One of its main underpinnings will show that multiple views and multi-representation are being used for complex systems, and that they are essential for future EIS.

Chapter 3 will study in detail two EISs covering data integration and data exchange platforms. It will suggest a mapping expression model and show its practical use with Data Warehouse and Electronic Data Interchange systems. It will reveal, through these systems, how semantics sharing is essential for reusability.

Chapter 4 will focus on unveiling how ontology and context can help resolve semantic-sharing problems. It will study these notions separately and investigate their usefulness for EIS. It will discuss the notion of adaptability of enterprise ontologies and present the context paradigm. It will conclude on how to assist IT decision makers in overcoming their fears of using ontologies.

Chapter 5 will seek a formalism for defining the contextual-ontology paradigm. This formalism will allow each ontological concept to be identified in a different manner and to play a different role depending on the context of use. It will suggest Modal Description Logics as the underlying formalism for contextual ontologies and show simple examples of their use within EIS. Substantial parts of this chapter were developed in collaboration with A. Arara.

Chapter 6 will define a common framework for the use of contextual ontologies in EIS. It will show where and when a contextual ontology can help in overcoming challenging scenarios involving interoperability, reusability, and query answering. It will present the architecture of the EISCO (Enterprise Information System Contextual Ontology) project and analyze, in detail, a case study using the preceding findings.

Chapter 7 will describe the different implementations of the suggested models and formalisms, including QELT, EDI-Matcher, and the EISCO project applying the reusability scenario.
Chapter 8 will conclude this work and present perspectives and future issues.


2. CHAPTER 2

UNDERSTANDING ENTERPRISE INFORMATION SYSTEMS

"The desire to understand the world and the desire to reform it are the two engines of progress "

Bertrand Russell In Manners and Morals, 1929

Nowadays, information technology is used day-to-day within business applications to help employees do their jobs better and faster. In the information economy, an organization’s competitiveness is largely determined by the amount of information it owns or has access to. Putting together the infrastructure required to collect, disseminate, and archive this information, and to make intelligent use of it, is an increasingly complex task. In order to achieve this goal, we have to recast the role of information systems in various areas of use. This includes the integration of systems within the organization, and the electronic linking of internal processes to those of other organizations. For instance, the growth and emergence of the internet brings up new organizational models, new business processes, and new ways of using knowledge.

Chap 2- Understanding Enterprise Information Systems- Problems and Perspectives

In today’s global environment, information systems and other global technologies (e.g. the internet) are creating new opportunities for organizational coordination and innovation. In essence, this area of research is concerned with the study of information systems as tools supporting the business processes and management of organizations. The work aims, firstly, at gaining an appropriate and solid theoretical foundation for research into enterprise information systems and, secondly, at providing research output with results relevant to IS. Therefore, we argue for designing a complete distributed-systems architecture that takes into account such important attributes as reliability, scalability, interoperability, reusability, and performance. The purpose of this chapter is to bring to the surface the common problems of existing enterprise information systems. In particular, it focuses on the problem of semantics sharing and the extent to which it slows down the effectiveness of cooperative applications. It pinpoints how some research advances (e.g. ontologies and context) can be used to resolve these problems. At first, we expose the position of enterprise information systems in terms of improving an enterprise’s business activities. Next, we discuss the spectrum of enterprise information systems and their applications. Because we are concerned with how to improve the conception of these systems, we glance over some engineering issues. Afterwards, we concentrate on identifying the position of the problem with respect to semantics sharing, and study the issues of interoperability, reusability, and query answering within information systems.

2.1 What is an Enterprise Information System about?

An Enterprise Information System, hereafter EIS, provides the information infrastructure for an enterprise by offering a set of essential services to users. As any information system (IS), an EIS consists of components of three different types: application programs, information resources such as databases and/or knowledge bases, and user interfaces. The services offered by an EIS are the backbone of the business tasks supported by the enterprise and are exposed as local and/or remote interfaces. An EIS information resource may include relational database systems, object database systems, mainframe transaction-processing systems, etc. An application program provides EIS-specific functionality, such as an implemented view in a relational database or a transaction program in legacy information systems or mainframe systems. Finally, user interfaces concern the interaction between the users and the applications, in terms of user-friendliness, ease of use, etc. This area is also known as man-machine interfaces (MMI).

2.1.1 Position of Enterprise Information Systems

Enterprise Information Systems are indispensable instruments for the management of every enterprise. They represent an implicit component of enterprise structures, into which companies continuously invest significant financial sums. Any organization, small or large, whether business, research, academic or other, can use information systems to conduct more of its functions electronically, making it more efficient and competitive. Hence, EISs exist as part of an organization and adhere to its strategies and goals. To understand enterprise information systems, we therefore need to understand organizations in their domain principles, components, structure and behavior. In [Falkenberg 98], the authors identify several disciplines that help in understanding enterprise systems: organizational science, computer science, system science, cognitive science, semiotics and certain aspects of philosophy. Studying EIS therefore involves technologies and business models developed in a fully integrated multidisciplinary environment. In particular, we restrict our interest to the following:

• explaining how information systems can transform organizations;
• analyzing the role played by major types of information systems in organizations;
• appraising system-building alternatives;
• selecting appropriate strategies in order to design and implement advanced information systems.

As a matter of fact, EIS did not originate only in academic research; they have also been stimulated and adopted by industry. Hence, we can constantly find differing, or even controversial, opinions on each of the questions involving EIS.

2.1.2 Spectrum of Enterprise Information Systems

Enterprise Information Systems cover the collective activities of users involved in enterprise-level processes. The level of information processing varies with responsibility and with the enterprise hierarchy. We can therefore distinguish four levels of information processing that correspond to different categories of users:

• Operational-level systems monitor elementary activities, answer routine questions, and keep track of transactions (receipts, cash deposits, flow of materials).
• Knowledge-level systems support knowledge and data workers.
• Management-level systems support monitoring, controlling, decision-making, and administrative activities.
• Strategic-level systems support long-range planning activities, such as future employment levels, industry cost trends, etc.

We can also stratify EIS, based on their function, with respect to the users they intend to serve. Extending the work of Jack Rockart [Rockart 96], we divide enterprise information systems into three categories: horizontal systems, vertical systems, and external systems. This classification is depicted in Figure 2.1 and can be understood as follows:

2.1.2.1 Horizontal systems

Horizontal systems provide the day-to-day services for each category of users in the enterprise. These daily processes are increasingly being broken down into multiple systems, each often run by the users of a single service. Many systems constitute this category, such as:

Management Information Systems (MIS): support managers, at different levels of an organization, in taking decisions, and provide information in the form of reports and responses to their queries.

Human Resource Information Systems (HRIS): support the human-resource functions, which include personnel records, payroll, employee performance records, personnel training, skills inventory, etc.


Figure 2.1: Classification of EIS

Manufacturing Information Systems (MFIS): provide services to support common manufacturing functions such as production scheduling, shipping, receiving, materials requirements planning, resource planning, quality control systems, etc.

Accounting Information Systems (AIS): cover commonly used functions such as fixed-asset accounting, budgeting, tax accounting, etc.

Marketing Information Systems (MKIS): assist in marketing functions such as sales analysis, sales forecasting, promotion, pricing, marketing research, direct-mail advertising, etc.

Financial Information Systems (FIS): provide the necessary support for financial functions such as cash management, accounts payable, capital-expenditure analysis, financial forecasting, credit analysis, etc.

2.1.2.2 Vertical systems

Vertical systems are used essentially for decision support and benefit from coordinating multiple searches over multiple heterogeneous information sources (e.g. horizontal systems).

Data Warehousing (DW): integrates historical or current data from many sources for a particular use. These integrated data are intended to serve Business Intelligence, OLAP (On-Line Analytical Processing), and Data Mining systems.

Business Intelligence (BI): permits organizations to use the information coming from diverse systems in order to improve decision making, with respect to increasing client services, sales, and profit.

Customer Relationship Management (CRM): records details of customers and their transactions with a company. It can provide personalized online ordering and analyze preferred channels of communication for marketing purposes.

Enterprise Application Integration (EAI): provides data integration between multiple systems (i.e. back-end integration).

Enterprise Resource Planning (ERP): supports several areas of a business by combining a number of applications.
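The data warehousing integration step just mentioned can be sketched in a deliberately minimal, generic way as a query-based load-and-transform over an in-memory database. This is only an illustration of the general idea, not the QELT tool itself; the table and column names are invented.

```python
# Generic sketch of a query-based "extract, load, then transform" step:
# raw rows are first loaded into a staging table, then a SQL query inside
# the target database reshapes them into a warehouse fact table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staging_sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO staging_sales VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 50.0)],
)

# The transformation is expressed as a query, run where the data now lives:
con.execute("""
    CREATE TABLE fact_sales AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS nb_orders
    FROM staging_sales
    GROUP BY region
""")
print(con.execute("SELECT * FROM fact_sales ORDER BY region").fetchall())
# → [('north', 200.0, 2), ('south', 50.0, 1)]
```

Expressing the transformation as a query, rather than as imperative row-by-row code, lets the database engine optimize it and keeps the mapping logic declarative.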


2.1.2.3 External systems

Some of the preceding systems are completely independent, but most are interconnected through electronic networks to exchange information. For example, electronic commerce (i.e. e-commerce) is adopted to achieve market-based coordination mechanisms. It allows businesses to transact with each other, B2B (Business-to-Business); brings customers closer to businesses, B2C (Business-to-Customer); and enables better interaction between government and business enterprises, G2B (Government-to-Business). Another typical application is an online catalogue. This category includes:

Electronic Data Interchange (EDI) systems: provide the possibility of sending and processing messages between information systems without any human intervention.

ebXML Registry: ebXML enables companies to meet and conduct business through the exchange of XML-based messages. An ebXML registry is an information system that securely manages any content type and the standardized metadata that describes it.

Supply Chain Management (SCM) systems: support the reverse cycle for items returned from the buyer back to the seller. They therefore integrate the linkage and coordination of activities among supplier, manufacturer, and distributor.
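As a toy illustration of such machine-to-machine message processing, the snippet below parses an XML-based invoice message and extracts its fields without human intervention. The message structure and field names are invented for the example; real XML/EDI and ebXML vocabularies define far richer, standardized structures.

```python
# Parse a hypothetical XML-based business message and extract its fields.
import xml.etree.ElementTree as ET

message = """
<invoice>
  <number>2004-0042</number>
  <partner>ACME</partner>
  <total currency="EUR">1250.50</total>
</invoice>
"""

root = ET.fromstring(message)
invoice = {
    "number": root.findtext("number"),
    "partner": root.findtext("partner"),
    "total": float(root.findtext("total")),
    "currency": root.find("total").get("currency"),
}
print(invoice)
# → {'number': '2004-0042', 'partner': 'ACME', 'total': 1250.5, 'currency': 'EUR'}
```

The difficulty addressed later in this thesis is not parsing itself, but agreeing on what `number`, `partner`, or `total` mean when two partners use different message schemas.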

2.1.3 Enterprise Information System Development

EIS development is not a plug-and-play task, because it requires considerable endeavor to analyze complex enterprises. Typically, information-systems development starts from the problem space, as described in [Castro 01]. In order to ensure cost-effectiveness and quality, developers employ some kind of system-development process model (e.g. waterfall, 2TUP, spiral, etc.). Typical activities performed include system conceptualization, system requirements analysis, system design, specification of software requirements, detailed design, software integration, testing, etc. For instance, software engineering identifies three broad methods of requirements specification [Conger 94]:

Process-oriented methods: concentrate on describing the function of a system in terms of the steps taken to perform some procedure [Benaroch 02]. One of the well-known process-oriented methods is REA (Resource Event Agent) Enterprise Modeling [McCarthy 83].

Data-oriented methods: focus on specifying the data requirements of an application, on the premise that data change less than processes. Current information systems have widely adopted methods and technologies such as the Entity-Relationship (ER) meta-model of [Chen 76] and the Relational Database (RDB) model of [Codd 70].

Object-oriented methods: inspired by object-oriented programming, these methods think in terms of objects that encapsulate both data and the processes acting on that data. For instance, the Unified Modeling Language (UML) has been commonly used by the software engineering community, and its scope is broadening to include more diverse modeling tasks.

Not surprisingly, traditional techniques for building information systems are no longer adequate. Information-system engineering needs to be extended to support new, flexible, and more powerful information modeling. Indeed, information modeling plays a central role during information-system development.
In order to use information, one needs to represent it, capturing its meaning and inherent structure [Mylopoulous 98]. Such representations are important for communicating information between people, but also for building information systems, which manage and exploit this information. In addition, new trends of IS modeling are emerging. These trends tackle multiple-view requirements, multi-representation requirements, context-aware engineering, ontology-driven conceptualization, etc. They concern projects involving heavy use of multiple descriptions of one and the same entity, such as multiple versions of the same design or multiple perspectives on the same data. In this case, the use of contextualization is recommended to organize and rationalize these perspectives of the same reality. Moreover, in the mid-’90s, researchers started looking at conceptual models that began to incorporate ontologies and rich taxonomies to support various forms of parameterization and contextualization. We briefly discuss some of these trends in the following.

Multi-representation requirements: The multi-representation problem is well known in the discipline of information modeling. For databases, a large amount of work addresses specific facets of multi-representation. The first technique one could think of is the well-known view mechanism. Database views have traditionally been used to present partial, but consistent, viewpoints of the contents of a database to different user groups. Views allow one to derive a new (virtual) representation from the representations already defined in the database. In spatial databases, the work in [Balley 04] has investigated the problem and proposed an extension of an existing ER-based model called MADS. A stamping mechanism for data elements (concepts, attributes, instances) and relationships is suggested to enable manipulating data elements from several representations.

Multiple-view requirements specifications: The notion of viewpoints has been used to support the actual development process of complex systems.
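The database view mechanism discussed above, the classic multi-representation device, can be illustrated minimally as follows. The schema, data, and view names are invented for the example: two user groups see two consistent but partial representations of the same underlying table.

```python
# Two views derive two partial representations from one employee table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [("Ana", "sales", 3000.0), ("Bob", "it", 3500.0)])

# HR viewpoint: salaries are visible.
con.execute("CREATE VIEW hr_view AS SELECT name, salary FROM employee")
# Directory viewpoint: only organizational data, no salaries.
con.execute("CREATE VIEW directory_view AS SELECT name, dept FROM employee")

print(con.execute("SELECT * FROM hr_view").fetchall())
print(con.execute("SELECT * FROM directory_view").fetchall())
```

Each view is virtual: it stores nothing itself, so the two representations can never drift out of sync with the shared data, which is precisely the consistency property views offer.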
In this respect, a viewpoint is defined as a locally managed object which encapsulates partial knowledge about the system and its domain. An approach was proposed in [Finkelstein 94], in which a framework was devised to describe complex systems based on viewpoints. The framework is independent of any specific development methodology and actively encourages multiple representations in software development. Views are also used in requirements engineering [Nusebeih 94] as “a vehicle for separation of concerns”. In this case, participants are allowed to address only those concerns or criteria that are of interest to them, and ignore others that are unrelated. Using viewpoints to relativize descriptions in an information base also serves as a mechanism for dealing with inconsistency in requirements specifications [Robinson 94].

Contextualization: A problem inherent in any modeling task is that there are often differences of opinion or perception among those gathering, or providing, information. Contextualization can be seen as an abstraction mechanism which allows partitioning and packaging the descriptions being added to an information base [Mylopoulous 98]. According to [Dey 00], context is any information that can be used to characterize the situation of entities (i.e. a person, place or object) considered relevant to the interaction between a user and an application, including the user and the application themselves. The notion of context has been studied since the early days of AI. Contexts have found uses in problem solving, as a means of representing intermediate states during a search by a problem solver in its quest for a solution [Hewitt 71]. In knowledge representation, contexts were used as representational devices for partitioning a knowledge base (e.g. [Hendrix 79]). Recent work in [Nixon 02] suggests


integrating context services into a tiered architecture, and studies the impact of context on the architecture. This impact mainly concerns the front tiers (presentation and session), where the application can change its presentation based on a user’s context.

Ontology as specification of conceptualization: An ontology, defined as a specification of a conceptualization, looks very much like a conceptual model, as used for many years in designing information systems [Colomb 03]. In fact, the two are closely related, but not identical. A conceptual model relates to a particular information system, whereas an ontology is a representation of a world external to any particular system, usually shared by a community of interoperating information systems. An ontology is therefore an object very much like a conceptual model, and can be developed using adaptations of conceptual modeling tools like Enhanced Entity-Relationship-Attribute or UML class diagrams. Using ontologies as specifications is discussed in [Jasper 99]; other work [Cranefield 99] considers UML as an ontology modeling language. However, there are differences between ontologies and conceptual models (e.g. UML class/object models). Conceptual schemas lack consistency checking, and no inference is possible over them. They are often not accessible at run time [Guarino 98] and usually have no formal semantics, whereas ontologies, having formal semantics, are accessible at run time. Finally, an ontology provides a domain theory, not a data structure, and a language for describing ontologies is semantically richer than those designed for databases.

The preceding round-trip of information modeling techniques in computer science included techniques developed in AI (knowledge representation), data modeling (databases), and requirements analysis (software engineering and information systems).
One of the main underpinnings is that an ontology forms a specialization hierarchy of domain concepts and their definitions. The lower-level concepts have close links with data-level terms in ER models or UML models and with implementation-specific database schemas. Contextualization can be applied at the ontology level, typically to meet multiple-view and multi-representation requirements. The issues concerning context and ontologies for EIS will be studied throughout this thesis.
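The specialization hierarchy just described can be caricatured in a few lines. The concept names and the parent table below are invented for illustration; real ontology languages add axioms, formal semantics, and automated inference on top of such bare hierarchies.

```python
# Toy ontology as a specialization hierarchy: each concept points to its
# parent, and a subsumption test walks up the hierarchy.
PARENT = {
    "Invoice": "Document",
    "PurchaseOrder": "Document",
    "Document": "Thing",
}

def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` itself or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False

print(subsumes("Document", "Invoice"))   # True
print(subsumes("Invoice", "Document"))   # False
```

Even this trivial structure already supports a form of reasoning (subsumption) that a plain conceptual schema does not offer at run time, which is the contrast drawn in the paragraph above.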

2.1.4 Architecture of Enterprise Information Systems

From system engineering point of view, EIS models and architectures have evolved to allow, on one hand, unified modeling using object oriented paradigm such as Unified Modeling Language UML. On the other hand, a suitable framework such as Model Driven Architecture (MDA) permits natural and economical modeling of design and analysis domains and the relationships between them [Skene 03]. This advent conforms to the change of EIS architectural needs including horizontal, vertical and external systems. In order to better understand how EIS is organized and conduct their role, we study in this section the architectural aspect of EIS. Information systems architecture provides a unifying framework into which various people with different perspectives can organize and view the fundamental building blocks of information systems. We consider that there exist 4 levels of EIS architecture: business architecture (i.e. modeling the business to be supported by the system); functional architecture (i.e. modeling virtual components, that fulfill together the functions of the system); software architecture (i.e. modeling the components that

Page 82: Thèse Utilisation des ontologies contextuelles pour le partage

Chap 2- Understanding Enterprise Information Systems- Problems and Perspectives 56

realize the system); and technical architecture (i.e. modeling the hardware infrastructure and its operating systems) [C4ISR 97]. We restrict our interest to software architecture. The most common definition of software architecture is given in [Bass 98]: “The software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components and the relationships among them”. Therefore, software architecture is conventionally concerned with structures at a high abstraction level describing the main constituents of a software system. Despite this, there seems to be no common agreement on what these structures exactly are [Smolander 02]. This is admitted in the “IEEE Recommended Practice for Architectural Description of Software-Intensive Systems” [IEEE 00]. In practice, an architecture comprises a set of models, each describing an architectural view of the system. The models are related and must give a complete and consistent definition of the system. Consistency ensures that all models are free of errors and do not imply contradictions when put together. Completeness ensures that all relevant aspects of the system are defined, i.e. that there is sufficient information for constructing the system. By an architectural viewpoint model, we mean a framework that defines the relevant aspects or concerns that must be taken care of when designing software architecture. Hence, the viewpoints cannot be standardized; they must be selected or defined according to the environment and the situation.

Figure 2.2: Linkages among the Architecture Views ([C4ISR 97])

For instance, the C4ISR framework (US Department of Defense Architecture Framework) [C4ISR 97] provides guidance on describing architectures. This standard-like viewpoint model includes three major views that logically combine to describe the architecture. These three architecture views are: The operational architecture view is a description of the tasks and activities, operational elements, and information flows required to accomplish or support an operation.


The systems architecture view is a description, including graphics, of systems and interconnections providing for, or supporting, needed functions. The technical architecture view is the minimal set of rules governing the arrangement, interaction, and interdependence of system parts or elements, whose purpose is to ensure that a conformant system satisfies a specified set of requirements. Figure 2.2 illustrates some of the linkages that describe the interrelationships among these three architecture views. Other viewpoint frameworks appear in the research literature. The 4+1 view model [Kruchten 95] is perhaps the best-known model defining architectural viewpoints. It specifies four viewpoints (logical, process, development, and physical), accompanied by unifying scenarios. The Sowa & Zachman framework [Sowa 92] tries to describe all the relevant aspects of information systems from an enterprise perspective. In [Smolander 02], the authors seek ways to clarify the situation and to define the nature of software architecture and software architecture design. The OMG standard MDA1 defines three viewpoints from which a system can be seen. These models, which are representations of a given system, are the Computation Independent Model (CIM), the Platform Independent Model (PIM), and the Platform Specific Model (PSM). Some recent research [Duric 04] combines ontologies with MDA so that the well-known UML notation can be used in ontological engineering. Others, such as [Ekstedt 04], propose a utility-cost based approach founded on a consistent Enterprise Software System Architecture meta-model. This architecture is organized to reflect the technical aspects of the systems, as well as the business context they are intended to support. To summarize, the notion of viewpoint is architecturally coherent for reflecting an architectural view of a system (technical, physical, etc.).
It differs from a requirements viewpoint, which involves the level of detail, user interest, location identity, and state of people and groups.

2.1.5 Perspectives of Enterprise Information Systems:

In the age of the information revolution, information is becoming a ubiquitous, abundant, and precious resource for the enterprise. Just as futurists have been predicting for many years, information technology has penetrated every aspect of the enterprise, and its exploitation has affected the structure of many industries. In essence, the foreseen evolution of EIS is constrained by two factors: the push from technology and the pull from business needs. Therefore, input should be drawn from all the stakeholders concerned with IT to improve EIS. In particular, the amount of information to be processed by humans is going to be tremendous. Information will no longer be limited by human knowledge and will exceed human capacity. Many notions of AI will take a larger role in the enterprise and will work side by side with enterprise employees. Indeed, as Mark S. Fox predicted [Liu 99], each member will have his own personal agent that will extract, explore, infer, and discover information for him. The effect of AI trends in the enterprise will not eliminate people from the enterprise, but instead allow them to focus their energies on the truly creative tasks that remain.

1 http://www.omg.org/mda/


However, many other evolutions can be foreseen for future EIS [Liu 99], such as:

• Creating rapidly customizable enterprise information systems without fundamental changes and reducing investments of time, money, and expertise.

• Centralizing organizational coordination and decision making. Virtual enterprises will be defined on top of existing ones, with blurred borders across distributed organizational collaborations.

• The evolution of enterprise information systems will enable autonomous systems to react to specific situations and events. More pervasive, P2P-like, loosely coupled systems will grow across communities and enterprises.

2.2 State of the art of EIS problems

Because of the strong market competition that has arisen to win niche markets, a crucial demand for integrated and cooperative information systems has become widely recognized by IT decision makers. Therefore, EIS should offer the possibility of adapting to enterprise environment changes and predictable evolution. Organizations are endowed with more and more complex structures allowing a high degree of autonomy for their members. This autonomy can be considered a result of deploying an effective information systems structure, one which enables each system's autonomy and, at the same time, its coherence with the global architecture. Offering autonomy to users as well as to their systems encourages loosely coupled systems, which are interrelated through articulation mechanisms. Many issues arise from this new challenge; indeed, semantic sharing between these systems becomes a daily necessity for reaching these goals [Rifaieh-a 05].

2.2.1 Semantic sharing

Many factors come into play in the problems of current EIS. In Figure 2.3, we represent a simplified icon gathering these factors. We identify that, with the increasing complexity of systems and IT needs, systems are increasingly working together. Meanwhile, we must reach toward maximizing the richness of semantics and making it increasingly explicit. In addition, disparate modeling methods, paradigms, languages, and software tools severely limit interoperability. The potential for reuse should reduce the wasted effort of re-inventing the wheel. Likewise, the lack of shared understanding leads to poor communication among people and their organizations, generating difficulties in identifying requirements and systems' specifications.

2.2.1.1 Semantic heterogeneity

Semantic heterogeneity has been identified as one of the most important and toughest problems to tackle in cooperation among multiple systems [Kashyap 97]. For databases, the problem of semantic heterogeneity is defined in [Sheth 93] as the identification of semantically related objects in different databases and the resolution of schematic differences among them. Hence, semantic heterogeneity occurs due to different conceptualizations of the same real-world entities or phenomena. Naming heterogeneities (i.e. synonyms, homonyms, etc.) are the simplest case of semantic heterogeneities. Compared to other types of heterogeneity, such as syntactic and schematic, semantic heterogeneity is the most difficult to tackle.


Figure 2.3: The Problem Icon

The problems of integrating and developing information systems and databases in a heterogeneous, distributed environment have been translated, from a technical perspective, into the issue of systems interoperability. In this respect, several technical solutions have been proposed to cope with the problem of heterogeneity, whether at the system (physical or network), syntactic, or semantic level. Technologies such as CORBA, RMI, Java, XML, etc., have successfully resolved a great part of the physical and syntactic interoperation of systems, but the semantic issue is still, and will continue to be, an open issue for further research and investigation. Furthermore, each EIS needs to have its own local explicit semantics in order to reach full cooperation with other systems. These explicit semantics should be machine-processable and suitable for reasoning support. Thus, clear semantics are essential for the evolution from tightly coupled toward loosely coupled systems [Obrst 03], which encompass more global interaction and integration of multiple systems in enterprises and communities.

2.2.1.2 Understanding the problem

From these factors, we identify that semantic sharing is the bottleneck of current EIS. Indeed, semantics is fundamentally interpretation within a particular context and from a particular point of view. Semantic sharing is driven by context and purpose. A common semantics (i.e. an ontology consensus) can be identified for a community of users. Such a unified consensus is often impractical, due to the complexity of the process and the cost of maintaining it. Therefore, loosely coupled systems should be based on explicit semantic sharing with minimum consensus. We think that, in order to enable semantic sharing, we should consider:

• Establishing the semantic representation via some conceptualization technique (e.g. ontologies, context)

• Defining semantic mappings among representations

• Designing algorithms that determine the semantic similarity and employing their output in a semantic mapping process.
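These three steps can be illustrated with a minimal sketch. The schema field names, the lexical similarity measure, and the 0.5 threshold below are all invented for illustration; a realistic matcher would combine lexical, structural, and ontology-based evidence rather than string similarity alone:

```python
from difflib import SequenceMatcher

# Hypothetical field names from two independently designed schemas
# (step 1: two explicit representations to be reconciled).
SOURCE_FIELDS = ["client_name", "client_addr", "order_total"]
TARGET_FIELDS = ["customer_name", "customer_address", "total_amount"]

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1]; a real matcher would also use synonyms."""
    return SequenceMatcher(None, a, b).ratio()

def propose_mappings(source, target, threshold=0.5):
    """Steps 2-3: use similarity scores to propose semantic mappings."""
    mappings = []
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            mappings.append((s, best))
    return mappings

print(propose_mappings(SOURCE_FIELDS, TARGET_FIELDS))
```

Fields whose best score falls below the threshold (here, "order_total") are left unmapped and would be handed to a human expert, which is exactly where a shared context or ontology is meant to help.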


We can recognize, for instance, three applications that take advantage of semantic sharing: interoperability, reusability, and common enterprise understanding. We briefly study each of these issues in the following sections.

2.2.2 Interoperable and reusable Enterprise Information Systems

2.2.2.1 Interoperability

Currently, more than ever before, the challenge of conducting joint operations is increasingly summed up by interoperability. In effect, to interoperate is to participate in a common purpose; the purpose is the intention, the goal to which activity is directed [Obrst 03]. Determining how various systems are pulled together to accomplish a joint mission is one of the major challenges facing information systems architecture developers. The interoperability approach to information management and to sets of information systems is based largely on identifying which service is in charge of an operation and then making it accessible for cooperation purposes. Even though information systems are built to meet specific requirements, they must still provide an appropriate level of service interoperability to meet joint requirements. Regrettably, many EIS are developed independently, based on specifications intended for a single representation. Separately, they offer little potential for interoperability, global querying, and knowledge sharing. Different contexts and viewpoints reinforce the heterogeneity of semantics throughout the interoperation process. Therefore, we should start to change our habits and behavior patterns, taking the issue of interoperability into consideration. As such, understanding the specific nature and degree of interoperability required should become a key consideration that must be accounted for when designing, constructing, and deploying any information technology architecture. Today, interoperability is achieved implicitly; semantic considerations exist only in the heads of designers and implementers. We should focus on making semantics an explicit part of the computational process and on adding the dimensions of operational context and purpose to the analysis, design, and implementation of semantically enriched objects.
In particular, we need to develop a rich framework that copes with tight and loose coupling between systems. The configuration of semantic sharing will be negotiated between the service requester and the potential service provider through this architecture.

2.2.2.2 Dimension of interoperability

There are differing opinions among users about what is meant by interoperability. For instance, some consider the ability to translate data into text files and exchange them using simple e-mail as “achieving” interoperability. This is one way for two systems to work together, but this restricted view leaves out many other capabilities that are needed to satisfy an operational need. We consider a definition of interoperability that goes beyond the ability to move data from one system to another: it includes the ability to exchange and share services between systems. The Levels of Information Systems Interoperability (LISI) Reference Model [C4ISR 97] identifies a partition with several levels of information systems interoperability. This partition focuses on increasing levels of sophistication for system-to-system interaction. The LISI Reference Model defines five levels (Figure 2.4), which can be described as follows:


Figure 2.4: Levels and corresponding computing environments ([C4ISR 97])

Level 0-Isolated: systems have no physical connection. Data exchange between these systems typically occurs via an extractable common media format (e.g. diskette, CD, DVD).
Level 1-Connected: systems are linked electronically. These systems exchange homogeneous data types, such as simple text, e-mail, or simply exchanged files (e.g. FTP-like).
Level 2-Functional: systems are distributed, i.e. they reside on local networks that allow complex, heterogeneous data sets to be passed from system to system. Formal data models (logical and physical) are present, but generally only the logical data model is agreed across programs, and each program defines its own physical data model.
Level 3-Domain: information at this level is shared between independent applications. A domain-based data model (logical and physical) is present that is understood, accepted, and implemented across a functional area or group of organizations that comprises a domain.
Level 4-Global (sometimes known as Enterprise): systems are capable of operating using a distributed global information space across multiple domains. Multiple users can access and interact with complex data simultaneously. Data and applications are fully independent and can be distributed throughout this space to support information fusion.
On one hand, to reach the goal of interoperability, we borrow the LISI Reference Model and apply it over six elements. The elements, which constitute the artifacts of interoperability, are: data, object, component, application, system, and enterprise. Each level of interoperability should thus be conceived as enabling these elements. On the other hand, we consider four types or dimensions of interoperability:
• Syntactic – the syntactical representation of the element (e.g. XML);
• Structural – the structure or information of the element (e.g. data, meta-data, schema);
• Semantic – the understanding or interpretation of the element (e.g. local knowledge);
• Contextual – the extended semantics with many interpretations for the same entity (e.g. global knowledge).
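As a rough illustration of how the ordered LISI levels can be manipulated programmatically, the sketch below encodes them as an enumeration. The `joint_level` rule (two systems interoperate at the lower of their two levels, since the less capable side is the bottleneck) is our own simplifying assumption, not part of the LISI model:

```python
from enum import IntEnum

class LisiLevel(IntEnum):
    """The five LISI levels of system-to-system interaction [C4ISR 97]."""
    ISOLATED = 0    # no physical connection; exchange via removable media
    CONNECTED = 1   # electronic link; homogeneous data (text, e-mail, files)
    FUNCTIONAL = 2  # distributed systems sharing a logical data model
    DOMAIN = 3      # domain-wide data model across organizations
    GLOBAL = 4      # distributed global information space (enterprise)

def joint_level(a: LisiLevel, b: LisiLevel) -> LisiLevel:
    # Assumption for illustration: joint interoperability is capped by
    # the less capable of the two systems.
    return min(a, b)

print(joint_level(LisiLevel.DOMAIN, LisiLevel.CONNECTED).name)  # → CONNECTED
```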


Figure 2.5: Dimensions of interoperability (adapted from [Obrst 03])

These dimensions do not influence the capabilities to the same degree; Figure 2.5 shows the distribution of the influence of these dimensions over the elements of interoperability. To the original figure, adapted from [Obrst 03], we add the contextual dimension, which we believe is implicitly present in every underlying interoperability process. Indeed, this contextual dimension (also known as pragmatic, viewpoint, granularity, etc.) is present throughout the levels and with each element. Therefore, we can formalize interoperability as a tuple I=<L, E, D> where: L represents the levels of interoperability, such as isolated, connected, etc.; E represents the elements that interoperability can be applied to, such as data, object, etc.; D represents the dimensions of interoperability: syntactic, semantic, structural, and contextual. We exclude L0, which represents level 0, from the instances of this model (e.g. I=<L0, Data, Contextual>), because interoperability cannot be established between isolated systems.
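The tuple I=<L, E, D> and the exclusion of level 0 can be sketched as a small validated data structure. The string encodings of the levels, elements, and dimensions below are our own illustrative choice:

```python
from dataclasses import dataclass

LEVELS = {"L1", "L2", "L3", "L4"}  # L0 (isolated) is excluded by definition
ELEMENTS = {"data", "object", "component", "application", "system", "enterprise"}
DIMENSIONS = {"syntactic", "structural", "semantic", "contextual"}

@dataclass(frozen=True)
class Interoperability:
    """An instance I = <L, E, D> of the interoperability model."""
    level: str
    element: str
    dimension: str

    def __post_init__(self):
        # Interoperability cannot be established between isolated systems (L0).
        if self.level not in LEVELS:
            raise ValueError(f"invalid level {self.level!r}: L0 and unknown levels are excluded")
        if self.element not in ELEMENTS or self.dimension not in DIMENSIONS:
            raise ValueError("unknown element or dimension")

i = Interoperability("L2", "data", "semantic")   # a valid instance
```

Making the model a frozen dataclass means an instance such as I=<L0, Data, Contextual> simply cannot be constructed, which mirrors the exclusion rule stated above.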

2.2.2.3 Reusability

The object-oriented approach of frameworks and design patterns considers reusability and capitalizes on experience for future use. For example, the Java programming language (e.g. JavaBeans and Enterprise JavaBeans) allows objects built for one particular activity to be used in another. Nowadays, reusability is well attained at the software development level, with the reuse of objects and modules by the programmer for building new systems without re-inventing the wheel. Reusability, by definition, deals with the adaptation of applications or modules to different situations. It defines the degree of adaptability of a software module or any written code for use in more than one program or software system [Russ 99]. In other words, it identifies a quality that measures the extent to which a program or process can be used in contexts different from the one for which it was originally designed. Quantitatively, reusability can be evaluated by considering how many modules implementing each activity would be used in another activity.
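This quantitative view can be sketched as follows. The activities and module names are hypothetical, and the metric simply counts, over all module usages, the fraction that reuse a module already used by another activity:

```python
# Hypothetical map of enterprise activities to the modules that implement them.
activity_modules = {
    "invoicing":  {"pdf_export", "tax_rules", "currency"},
    "purchasing": {"tax_rules", "currency", "supplier_db"},
    "reporting":  {"pdf_export", "currency", "charting"},
}

def reuse_ratio(activities: dict) -> float:
    """Fraction of module usages that reuse a module already used elsewhere."""
    all_modules = set().union(*activities.values())
    usages = sum(len(mods) for mods in activities.values())
    # A module counts as reused once for every activity beyond the first that uses it.
    reused = usages - len(all_modules)
    return reused / usages

print(f"{reuse_ratio(activity_modules):.2f}")  # → 0.44
```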


Furthermore, reuse should be thought of in the early stages of software development. Therefore, well-planned and comprehensive technical architecture views facilitate integration and promote reusability across systems and compatibility among related architectures. As part of a disciplined process to build systems, technical architecture reduces information technology costs across an organization by highlighting risks, identifying technical or programmatic issues, and driving technology reuse. Additionally, adherence to a technical architecture streamlines and accelerates systems definition, approval, and implementation. The need for redundant, functionally equivalent applications is diminished, since applications can be shared as readily as data at this level. Decision-making takes place in the context of, and is facilitated by, enterprise-wide information found in this global information space.

2.2.2.4 Dimension of reusability

Lack of reusability of software components is a universal problem in computer science. Nevertheless, we cannot avoid the ambitious goal of making knowledge and component modules reusable. Most projects do not deal with capitalization and reuse because of the delicate semantic sharing of knowledge related to conflict resolution [Couturier 03]. Existing works propose conceptual/software architectures dedicated to the cooperation of IS and provide prototypes allowing the validation of the concepts used. They do not, however, deal with the process of knowledge reuse, nor with the design of reusable elements that facilitate the development of future cooperative applications. Moreover, reusability and usability can be considered incompatible, as they pull the specification and the level of abstraction in opposite directions. It is consequently difficult to simultaneously obtain high usability (i.e. adequacy for a specific use) and reusability (i.e. adequacy for several uses) [Russ 99]. Indeed, a design requires specificity, and specificity prohibits reusability; conversely, reusability requires generality, and generality hinders design: it seems an all-or-nothing proposition. To be effective, a fair balance between reusability and usability should be struck by the system architect. We can stratify reusability into two categories: knowledge reusability and element reusability. On the one hand, the reusability of knowledge is one of the most serious issues, because un-reusable knowledge is worthless in a knowledge base, and therefore no one tries to accumulate such knowledge. Thus, lack of reusability causes two drawbacks: building an EIS requires a tremendous amount of work because it is always built from scratch, and knowledge embedded in EIS does not accumulate well. Knowledge reusability can be realized through the definition of a library of patterns (e.g. design patterns [Gamma 94]) extracted from existing models and processes. On the other hand, the reusability of elements should not be ignored, because of its crucial help in speeding up the creation of new systems. Elements of reusability can be seen as an effective implementation of pattern-based frameworks covering many levels of software components, such as objects, packages, processes, and application programmer interfaces (API). We realize the need for declarative representations with as many generalization concepts as possible, in order to maximize the possibilities of reusability. At the same time, these representations must correspond as closely as possible to the things and processes they are supposed to represent.


2.2.2.5 Common enterprise understanding

Another concern of semantic sharing consists of reaching shared semantics at the enterprise level of EIS and enabling communication between people and organizations referring to many context-dependent systems. Communication covers human and machine communication. Human-oriented communication occurs between users and the community at the enterprise level; machine communication is classified as an interoperability issue. For query answering, one of the most important issues is to determine the meaning of a term used in a query (i.e. understanding user queries). Since a query defines a user context, the semantics used may generate many queries and involve many systems and contexts. Furthermore, query answering involves studying how a global query is propagated, how data answers are collected, and how data is cross-combined to generate relevant information. For example, ontologies contribute to query answering by understanding query meaning, identifying the meaning of documents that would satisfy the query, and obtaining only meaningful, relevant documents [Ceusters 03]. A formalization of query and user context is given in [Tzitzikas 04]. Specifically, it studies context-based queries and their flexibility over a taxonomy-based source. This formalization consists of C=<S,q,A>, where C denotes the user context, S is called the source view of C, q the query of C, and A the answer of C. We denote by S(q)=A the query q on a source S in a given context C having the answer A.
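A toy rendering of this C=<S,q,A> formalization over a taxonomy-based source might look as follows. The taxonomy, documents, and subsumption rule are invented for illustration and only approximate the model of [Tzitzikas 04]:

```python
# S is the source view, q the query term, and A = S(q) the answer in a context.
TAXONOMY = {                      # child -> parent (broader term)
    "laptop": "computer",
    "desktop": "computer",
    "computer": "electronics",
}

SOURCE_VIEW = {                   # S: document -> indexed terms
    "doc1": {"laptop"},
    "doc2": {"desktop"},
    "doc3": {"camera"},
}

def _is_narrower(t: str, term: str) -> bool:
    """True if `term` is an ancestor of `t` in the taxonomy."""
    while t in TAXONOMY:
        t = TAXONOMY[t]
        if t == term:
            return True
    return False

def broaden(term: str) -> set:
    """All terms subsumed by `term` (the term itself included)."""
    known = set(TAXONOMY) | set(TAXONOMY.values()) | {term}
    return {t for t in known if t == term or _is_narrower(t, term)}

def answer(source: dict, q: str) -> set:
    """A = S(q): documents indexed under q or any narrower term."""
    terms = broaden(q)
    return {doc for doc, idx in source.items() if idx & terms}

print(sorted(answer(SOURCE_VIEW, "computer")))  # → ['doc1', 'doc2']
```

Here the same query q evaluated against a different source view S (a different context C) would yield a different answer A, which is the point of making the context explicit.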

2.3 Summary

The problem of semantic sharing is becoming commonly known in EIS due to many factors: multiple views, contextual requirements, semantic heterogeneity, the diversity of organizations, users, applications, abstraction mechanisms, etc. This chapter has studied EIS semantic sharing for improving interoperability and reusability. It has pinpointed new challenges for common EIS perspectives, including organizational objectives and long-term goals (operational, tactical, and strategic levels), autonomy, communication, virtual organizations, the distributed enterprise, etc. To conclude, we showed the need for a framework that enables EIS to overcome communication failures, cooperation deficiencies, and unfulfilled requirements through explicit semantics. This framework should consider multiple contexts, view specifications, and user perspectives. The machine-understandable semantics should cover rich modeling and context-enabled facilities. In order to show the EIS weaknesses thoroughly, we analyze some of them. In the next chapter, we will study in detail two EISs belonging to different categories of the EIS classification: the data integration and data exchange platforms (e.g. data warehouse and EDI systems).


3. CHAPTER 3

IMPROVING ENTERPRISE DATA INTEGRATION AND

DATA INTERCHANGE PLATFORMS

(Description of Mapping Expression Model & Implemented Tools)

"If we knew what it was we were doing, it would not be called research, would it? "

Albert Einstein

Enterprise data integration and exchange platforms include a range of tools and software, which are responsible for building and managing integration and exchange functionality for a set of enterprise information systems. We concentrate in this chapter on two systems covering enterprise data integration and data exchange, which are, respectively data warehouse (DW) and Electronic Data Interchange (EDI) systems.


On one hand, data warehousing is an essential element of decision support. It aims at enabling the knowledge user to make better and faster daily business decisions. In order to supply a decisional database, meta-data is needed to enable the communication between the various functional areas of the warehouse, and an ETL (Extraction, Transformation, and Load) tool is needed to define the warehousing process. The meta-data contains the data dictionary and repository, and it describes the warehousing process, data storage, and information delivery. Current ETL tools do not take the metadata contents into consideration, although doing so would help reduce their update cost. On the other hand, EDI is characterized by the possibility of sending and processing messages between information systems without any human intervention. A large number of translations is needed to enable the communication between an enterprise and its suppliers and clients. Therefore, a translator should be used to carry out these translations between source and target messages. Furthermore, we identify an increasing role for XML-based techniques in both of these platforms. Indeed, for data integration, XML enables data to migrate from relational databases and other sources into future applications. It integrates structured and unstructured data to present new application and management opportunities. Furthermore, EDI is evolving toward XML-based exchange with techniques such as ebXML, EDI/XML, etc. We will study the usefulness of this widespread standard with these platforms. The goal of this chapter is twofold. First, it presents enterprise data integration and data exchange platforms. It discusses the architecture of data warehouse systems and points out the need for ETL tools to communicate with metadata. Indeed, a formalization of this metadata is still missing, especially the part concerning the mapping from many sources to a target.
The chapter also investigates the issue of translation between different EDI messages. It clarifies the mode of use of an EDI translator as an enterprise data exchange platform and points out the need for a clear transformation model. We conclude by identifying the need of many applications (i.e. ETL, EDI translator) to establish mapping meta-data or a mapping guideline. Second, the chapter defines a model covering the different types of mapping expressions that can be used for resolving the preceding platforms' problems. For instance, this model is used to create an active ETL tool, which incorporates queries to achieve the warehousing process, and to simplify the EDI message translation mechanism. Finally, we suggest the use of matching techniques to automatically generate the matches between message components, creating a new generation of EDI translators. Such a translator differs by being semi-automatic and easy to use for managers and people with limited technical skills. The chapter is organized as follows: in the next section, we glance over DW components and architecture. In section 3, we present EDI technology and its components, and we study message translation with some real examples. Section 4 defines the first contribution of this chapter, including the mapping guideline, mapping expressions, and their models. Section 5 describes the design of QELT (Query-based Extraction, Load, and Transformation tool) as a practical use of the predefined model. In addition, a discussion of a semi-automatic translator tool for EDI messages is included.
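As a preview of what such a mapping guideline might contain, the sketch below pairs target fields with source fields and mapping expressions (concatenation, arithmetic conversion, constant). All field names, rules, and the exchange rate are hypothetical; they merely illustrate the kind of source-to-target mapping an ETL tool or EDI translator must record:

```python
# A minimal mapping guideline: each entry maps source field(s) to a
# target field through a mapping expression.
MAPPING_GUIDELINE = [
    # (target field, source fields, expression)
    ("customer",   ["first_name", "last_name"], lambda f, l: f + " " + l),     # concatenation
    ("amount_eur", ["amount_usd"],              lambda a: round(a * 0.92, 2)), # arithmetic conversion
    ("country",    [],                          lambda: "FR"),                 # constant/default
]

def apply_mapping(record: dict) -> dict:
    """Transform one source record into a target record using the guideline."""
    target = {}
    for field, sources, expr in MAPPING_GUIDELINE:
        target[field] = expr(*(record[s] for s in sources))
    return target

print(apply_mapping({"first_name": "Ada", "last_name": "Lovelace", "amount_usd": 100.0}))
```

Storing such expressions as explicit, inspectable meta-data (rather than hard-coding them in the tool) is what allows the same guideline to drive both warehouse loading and message translation.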


Figure 3.1: Data warehouse System Architecture

3.1 Data warehousing systems

Data warehousing is an essential element of decision support, which has increasingly become a focus of the database industry. It provides a collection of data used for decision-making and is intended to improve performance and decision support in the enterprise. Therefore, a data warehouse (DW) system helps to analyze and monitor the economic indicators of an organization, to generate information from data, and to bring together information for monitoring evolution. The success of data warehouse implementations for business intelligence activities implies an increasing demand for new concepts and solutions [Quix 99], including growth across platforms, tools, and applications.

3.1.1 Data warehouse Components:

The construction of a DW is divided into two stages, known as the back room and the front room. The first supplies the warehouse database; the second provides the restitution of data from data marts in order to fulfill analysts' demands. According to the standard data warehouse architecture (Figure 3.1), data warehouse systems are composed of:

• ETL or warehousing tools, a set of tools responsible for preparing the data constituting the warehouse database. They collect and organize data coming from different sources into a coherent environment in order to proceed with analyzing and building the indicators for running the enterprise.

• Restitution tools, which give the analysts the possibility to make business decisions. They include systems such as data mining, executive information systems, and OLAP (On-Line Analytical Processing) systems, which operate with many representation and modeling techniques such as star schemas, snowflake schemas, data cubes, pattern rules, etc.

• Meta-data, which brings together the data about all the components inside the DW. In general, the meta-data should cover the acquisition, access, and distribution of warehouse data and should be the key to providing the business user with a complete map of the data warehouse.


3.1.1.1 ETL tools: For a data warehouse system, ETL (Extraction, Transformation, and Load) tools are the warehousing or population tools. These tools are globally responsible for the constitution process of the data warehouse [Chaudhuri 97]. They are used to supply the DW with clean data coming from various operational systems. This process includes a mapping transformation of the data, which must take into account the heterogeneity of the sources and targets. An ETL tool proceeds in three steps to accomplish its job:

• Extract: the extraction of the data, defining the selection parameters with respect to certain conditions.

• Transform: the cleansing of the data, elimination of redundancy, construction of aggregations from the original data, generation of substitution keys, etc.

• Load: applying the target data model, verifying the integrity constraints, etc.

Many commercial products and services are now available, and all of the principal vendors of database management systems now have offerings in the ETL area. Thus, a diverse range of tools exists with different capabilities, such as DTS, Data Stage, Sagent, Informatica, Data Junction, Oracle Data Warehouse Builder, DB2 Warehouse Manager, etc. These tools differ by their performance, data source integration, complexity of use, platforms, etc. A subset of their capabilities is described in Table 3.1, after being tested in a comparison study1 [Rifaieh-a 02].
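The three-step process can be sketched as follows. This is a minimal illustration rather than one of the tools above: the table names, columns, and aggregation rule are invented for the example, and an in-memory SQLite database stands in for both the operational source and the warehouse.

```python
import sqlite3

def extract(conn, min_amount):
    # Extract: select source rows according to the selection parameters.
    cur = conn.execute("SELECT sku, amount FROM sales WHERE amount >= ?",
                       (min_amount,))
    return cur.fetchall()

def transform(rows):
    # Transform: cleanse and aggregate (here, total amount per SKU).
    totals = {}
    for sku, amount in rows:
        totals[sku] = totals.get(sku, 0) + amount
    return sorted(totals.items())

def load(conn, rows):
    # Load: apply the target model and write into the warehouse table.
    conn.execute("CREATE TABLE dw_sales (sku TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", rows)
    conn.commit()

# Hypothetical source data; the negative row is filtered out by extraction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("A1", 10.0), ("A1", 5.0), ("B2", 3.0), ("C3", -1.0)])
load(conn, transform(extract(conn, min_amount=0)))
print(conn.execute("SELECT * FROM dw_sales ORDER BY sku").fetchall())
```

In a real tool each step would be configurable through meta-data rather than hard-coded, which is precisely the limit discussed below.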

| Product / Characteristic | DTS | Oracle DW Builder | Data Stage | Sagent |
|---|---|---|---|---|
| Performance | Average; high with SQL Server | High with Oracle | Depends on data source | Depends on data source |
| Maintenance | Updating ActiveX programs | Compiling modified programs | Modifying used objects | Modifying plan content |
| Reusability | Defined functions | Defined functions | Set of objects or plan | Reusable plan |
| Access to meta-data | None | None | Meta-data stage | None |
| Plan for process extraction | Plan with timer | Plan with triggers | Event management | Sagent automation |
| Data sources (data integration) | Txt; via ODBC, OLE DB (MS Access…), … | Flat files, Oracle, Sybase; via ODBC, OLE DB (MS Access…), … | SQL Server, Oracle, DB2, Txt, VSAM, Informix, … | SQL Server, Oracle, DB2, Txt, VSAM, Informix, … |

Table 3.1 ETL tools capabilities

Nevertheless, the existing tools share common limits:

• The use of a proprietary format: most tools use a proprietary format for the integrated data when establishing the warehousing process. The communication between different tools thus becomes hard, yet this communication is needed to combine tools and thereby improve the warehousing process.

1 This study has been achieved at Tessi Informatics.

• The maintenance: it is complicated wherever a set of programs has to be updated. An update takes place in two steps: once by updating the meta-data, and again by modifying the warehousing programs.

• A limited interaction with meta-data: meta-data is used very passively because querying it is limited.

Moreover, warehousing tools face the further challenge of providing availability, task management, and evolution support. Data integration and reuse possibilities are wide open but not yet well realized. Although some tools provide reusable functions, these solutions remain limited. Indeed, the existing functions do not allow users to take an existing transformation plan and parameterize it to create a new data warehouse. The maintenance process, for instance, is very difficult: if a user wants to change a SKU (Stock Keeping Unit) number definition from five digits to seven, how many programs need to be changed to effect this enhancement? With most of the existing tools, to achieve this operation a query has to be formulated against the data dictionary, and the user then has to update all the programs concerned.

3.1.1.2 Restitution process: The restitution process constitutes the foundation that enables business intelligence. It lets organizations' managers and decision makers access and analyze information coming from diverse operational systems through data marts and the data warehouse. The restitution process is mostly based on OLAP (On-Line Analytical Processing). We can differentiate between OLTP (On-Line Transaction Processing) and OLAP by several factors. The former treats dynamic data and is application oriented; it is used by diverse employees with predefined queries and real-time data access. The latter treats static data and is subject oriented; it is used by analysts with ad hoc queries over a huge volume of historically accumulated data. A set of existing tools deploys data via a multidimensional model and data marts with snowflake or star models and multidimensional cubes. These tools offer operations such as roll-up & drill-down and slice & dice. In this category, we can identify Microstrategy1, Business Objects2, Sagent3, etc., which are commercial implementations of these techniques. Other, more sophisticated tools, based on data mining, provide users with knowledge-discovery capabilities over their data warehouse.

3.1.1.3 Meta-data Management: Traditionally, meta-data is data about the data. The meta-data allows the various functional areas in the warehouse to communicate. It encompasses all corporate resources: database catalogs, data dictionaries, and data models [Stöhr 99]. In other words, meta-data is the essential information that defines the what, where, how, and why of the data in use. It can range from a conceptual overview of the real world to detailed physical specifications for a particular database management system [Sampaio 01]. A data dictionary includes items such as element names, lengths, valid values, element descriptions, type descriptions, attribute property descriptions, and process method descriptions. Data models comprise the sources' data models, the target model, and the data mart models. The meta-data repository is the vehicle of the meta-data; it contains tools that facilitate its manipulation and querying. As systems have become more diverse, distributed, and complex, managing the meta-data has become increasingly difficult.

1 http://www.microstrategy.com
2 http://www.businessobjects.com
3 http://www.sagent.com
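A data dictionary entry of this kind can be sketched as a small record type. The field names and the sample SKU entry are illustrative assumptions, not the schema of any particular repository.

```python
from dataclasses import dataclass, field

@dataclass
class DictionaryEntry:
    # One data-dictionary record: the "what" of a warehouse element.
    name: str
    type_desc: str
    length: int
    description: str
    valid_values: list = field(default_factory=list)

# A tiny repository indexed by element name, supporting the kind of
# lookup query an administrator would run against the meta-data.
repository = {}

def register(entry):
    repository[entry.name] = entry

def describe(name):
    e = repository[name]
    return f"{e.name} ({e.type_desc}, len={e.length}): {e.description}"

register(DictionaryEntry("SKU", "CHAR", 5, "Stock Keeping Unit number"))
print(describe("SKU"))
```

Making such a repository "active" (i.e., driving the warehousing programs from it) is the direction argued for later in this chapter.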

3.1.2 Issues for improving data warehouse systems:

In this section, we study the current issues in evolving and improving data warehouse systems. We will show the challenging issues along with the usefulness of XML-based techniques.

3.1.2.1 XML as a vehicle for improving the data warehouse: The integration of data from heterogeneous sources is the essential stage of the warehousing process. These sources may arise from existing legacy systems, e-business, B2B activities, ERP, etc. XML can contribute to the warehousing process in data integration, cleansing procedures, data storage, and front-end information delivery. We study in the following how XML and its standards support the data warehousing process, especially legacy data extraction, input transaction capture, cleansing procedures, direct storage of XML, and front-end information delivery.

Source Data Integration: If systems submit data in XML format, then partners sharing a common XML schema can transmit and receive data. Although older legacy applications may need to be retrofitted with XML writers, many modern systems are equipped to write data in XML format, and existing relational databases already support query output directly in XML form [Rifaieh-c 02]. The ability to read remote data via XML greatly simplifies data extraction, because custom parsers for the remote data sources do not have to be written. This, in turn, promotes a more distributed and opportunistic approach to a spread-out "data web-house". A data web-house is a data warehouse whose source data come from XML web documents; it constitutes an essential search engine for a collection of web data [Xyleme 01].

Native XML Data Storage: Native XML databases [Bourret 04] [Chaudhri 03] can be used to store the core of an XML-based data warehouse. They provide the possibility to integrate semi-structured data inside the data warehouse, and this direct XML storage can fulfil the storage needs of a data web-house. In this category, we can identify Tamino (http://www.softwareag.com/tamino/), Natix (http://www.dataexmachina.de/natix.html), and Lore (http://www-db.stanford.edu/lore/). Hybrid scenarios, in which data in XML form can be joined inside the database directly to data in conventional relational tables, are also interesting. Indeed, commercial DBMSs offer the capability to bulk-load data into relational tables from XML sources [Kappel 01]; this step is done with the help of an XML schema.

Front-end Information Delivery:


More and more sophisticated tools are used to deliver information outside the data warehouse. The widespread deployment of XML will help move query and reporting tools off end users' terminals: an XML data transfer, with an associated XSLT formatting specification, is enough to produce any desired user-interface presentation on a remote browser. Often, the information in data warehouses is published on a company's intranet web site, and HTML is currently used to build these sites. Thus, restitution tools should provide the ability to generate HTML pages from the warehouse database [Rifaieh-c 02].
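As a minimal sketch of front-end delivery, the following renders restitution rows as an HTML table. A production system would more likely apply an XSLT stylesheet to an XML transfer, and the row data here are invented.

```python
import html

def render_html_table(headers, rows):
    # Build a plain HTML table from restitution rows, escaping cell values.
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    parts = [f"<table><tr>{head}</tr>"]
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)

# Hypothetical restitution rows (aggregated totals per SKU).
page = render_html_table(["SKU", "Total"], [("A1", 15.0), ("B2", 3.0)])
print(page)
```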

3.1.2.2 Needed data warehouse features:

Efficiency: XML is a neutral format that can be used to enable communication between different warehousing tools; no proprietary format is needed to link and merge them. Indeed, it is sometimes useful to divide the process and use one tool for extraction, another for transformation, and so on. The efficiency of the global process will be higher, especially when the tools perform well at their respective steps. We can chain a high-level extraction tool with XML generation capability to a high-level transformation tool with XML input, etc. Since XML is used by all the tools, no extra transformation between different proprietary formats is required [Rifaieh-c 03].

Maintenance: The business rules of an organization are never the same, due to changes in the real world. So, how do we maintain consistency when business rules change as a result of corporate reorganizations, regulatory changes, or other changes in business practices? How many places are impacted by each of these potential changes? The user has to update all the programs concerned. To make the maintenance process work better, the warehousing tool has to be updated automatically. XML offers the ability to specify the transformations using XSLT; if the system is able to generate a new transformation automatically, no extra update is needed to enable the evolution.

Scalability: An important aspect of achieving scalability is that changes in the implementation, on either the application side or the data side, must not change the data format transported between the two applications. This means that neither the client application nor the server application needs to be aware of the change. One way of achieving this goal is to move the data between these components in a standardized format; currently, the main format used in all such scenarios is XML.
The idea is to take the data, in the data system or in the application system, move it into the standardized format of XML, and transport it between the different systems.
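That round trip can be sketched as follows: relational rows are read and serialized into a neutral XML document that either side can consume. The table, tag names, and data are illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(rows, root_tag, row_tag, columns):
    # Serialize relational rows into a neutral XML document, so that the
    # client and server applications only depend on the XML format.
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            ET.SubElement(item, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical operational table standing in for the data system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ref TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES ('ORD-1', 99.5)")
doc = rows_to_xml(conn.execute("SELECT ref, amount FROM orders").fetchall(),
                  "orders", "order", ["ref", "amount"])
print(doc)
```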

3.1.2.3 Evolution of business rules: A data warehouse system is a complex system whose components frequently evolve independently of each other [Quix 99]. The enterprise model can evolve with the enterprise's objectives and strategies; the technical environment changes as products evolve and are updated; the software components can be upgraded, completed, etc. Moreover, the warehousing tools can receive new schedules or new algorithms. The existing DW tools do not deal with this evolution of the meta-data. The meta-data of a business warehouse contains, for instance, the management rules. These rules are never the same, due to changes in the real world; they define which elements are decisional and how they are calculated. So, how do we maintain consistency when business rules change as a result of corporate reorganizations, regulatory changes, or other changes in business practices? How many places are impacted by each of these potential changes? In order to answer this question, meta-data access should be improved. Moreover, the evolution has to propagate automatically to the warehousing process: ideally, no program needs to be modified to enable new decision elements or a new restitution formula for an analyst [Rifaieh-b 02].

3.1.2.4 How meta-data improves data transformation: Meta-data are used to answer administrators' queries, which provide information concerning the structure, the models, and the warehousing process. Although the set of meta-data is passive and its querying limited, it can sometimes be used by automated tools to improve data interpretation and exploitation. A passive use of meta-data implies two sorts of updates: in the meta-data repository, and in the tools that bring data into the warehouse. Maintaining these tools to support evolution is a hard and time-consuming task. The idea of making meta-data active can improve warehouse systems and the warehousing process. The solution can be realized with a traditional transformation query and an automatic query generator: we therefore suggest generating queries automatically from the meta-data to keep the warehousing tool updated [Rifaieh-b 02]. A QELT (Query based Extraction Load Transformation) tool is suggested later in this chapter. To sum up, studies of data integration and warehousing tools are wide open, but not yet well realized. Currently, few tools communicate directly with the meta-data repository. In order to improve existing ETL tools, we have to consider creating a model representing the meta-data of the data warehousing process and making these meta-data active, so as to dynamically create the needed extraction, transformation, and load process [Rifaieh-a 02].
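The idea of generating the transformation query from mapping meta-data can be sketched as below. The mapping dictionary, table names, and the tax-style expression are hypothetical; a real QELT would draw them from the meta-data repository, so that editing the meta-data changes the generated query with no program update.

```python
# Hypothetical mapping meta-data: target column -> SQL expression over the source.
mapping = {
    "sku":       "source.sku_code",
    "total_ht":  "SUM(source.amount)",
    "total_ttc": "SUM(source.amount) * 1.196",
}

def generate_transform_query(mapping, source, target, group_by):
    # Generate the INSERT ... SELECT that performs the transformation step.
    select = ", ".join(f"{expr} AS {col}" for col, expr in mapping.items())
    return (f"INSERT INTO {target} ({', '.join(mapping)}) "
            f"SELECT {select} FROM {source} source GROUP BY {group_by}")

q = generate_transform_query(mapping, "sales", "dw_sales", "source.sku_code")
print(q)
```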

3.2 An Analysis of EDI Message Translation and Message Integration Problems

Electronic data interchange (EDI) is considered the primary data exchange platform for organizations. It has been supporting B2B e-commerce for about 25 years, and it still offers significant benefits and opportunities to vendors and enterprises alike. Despite long-standing predictions, electronic data interchange is not dead: EDI continues to evolve to meet changing enterprise requirements and to deliver significant benefits to enterprises across a broad range of industries. Therefore, enterprise IT decision makers cannot ignore the enduring importance of EDI. A recent research report [Kenney 02] showed that pressure from new Internet-based entrants is making EDI more competitive through a combination of these two technologies. Message translation, however, is a critical component of any enterprise EDI system. A growing number of applications give enterprises new opportunities to implement cost-effective translation solutions [Kenney 02] [Omelyenko 02]. The providers of EDI translation tools, for example, are developing application integration strategies that leverage the powerful translation engines at the heart of comprehensive integration broker suites. This well-established technology can be a valuable element of an application integration strategy, if it is properly managed.


Figure 3.2: An EDI Translator's Architecture

However, for the Internet-based approach, XML is far from being the silver bullet that we all thought would save us from the inadequacies of other forms of electronic data interchange [Segev 97]. Despite the advent of XML-based standards, there are many variants of document and message formats that need to be translated. Message-oriented middleware (MoM) systems also typically require extensive message mapping to facilitate systems integration. Existing EDI, XML, MoM messaging, and translation tools provide little high-level support for message translation, requiring developers to write complex programs or scripting code. Most EDI message translations are done by hand-coded applications. For XML, various XSLT-based translation tools exist, but these have limited expressive power and require considerable effort to use. The MoM integration tools likewise provide limited translation capabilities and limited visual formalisms, requiring scripting and coding [Grundy 01], [Stonebraker 01], [Fensel 02], [Yan 02].

3.2.1 EDI Technology:

EDI is characterized by the possibility of sending and processing messages between information systems without any human intervention. The emergence of electronic data interchange enables companies to communicate easily (e.g. sending orders, funds transfers, acknowledgements of receipt, etc.). Many companies handle different types of messages and standards, and the use of a translator between these messages is critical for enabling communication between different partners. A recent study of EDI's position in the market [Kenney 02] shows adequate profit for enterprises: for instance, the SWIFT1 messaging standard carried over 1.5 billion messages in 2001, and the average daily value of payment messages on SWIFT is estimated at above USD 6 trillion. Furthermore, this standard is being improved to ensure security control over messages and fund transfers [Zhu 02]. EDI technology uses tools for translation, communication, and administration to establish the communication between partners. The first module extracts the data from the enterprise database and creates a message with respect to an existing standard; the translator constitutes the core of any efficient EDI tool (Figure 3.2). The communication module provides the exchange layer, whereas an administration unit is useful for controlling exchanges, daily reports, etc.

1 http://www.swift.com


Figure 3.3: Scenario of translation between SWIFT MT 103 & MT 202

3.2.1.1 Existing standards: In order to normalize the messages used between different communities, several international organizations have defined standards for EDI messages. The UN/CEFACT established EDIFACT1 and ANSI created X122, etc. These different standards define sets of messages to be used between partners. Sometimes these standards define the same type of message but with different representations; the messages then use different structures and treat the information in a different manner. The selection of a standard depends on the application domain. In practice, influential enterprises force their medium-sized suppliers to use their own EDI standard; conversely, a vital supplier imposes its protocol on its clients. Many enterprises with different clients and suppliers therefore need to communicate with different standards, and the enterprise information systems should be able to send and receive information without limits.

3.2.1.2 Components of an EDI message: Each standard defines a set of messages to be used in an application domain; EDIFACT (EDI for Administration, Commerce, and Transport) is used across the areas it is named for. These messages are made up of hierarchical record structures (segments, records, and fields), encoded into a serialized form for transmission. Each message includes a set of segments structured in a specific order, beginning with a header and closing with an ending segment. This structure of the messages and their components is standardized and known as the branching diagram (e.g. ISO 9735 EDIFACT).

In order to facilitate the use of a message, the branching diagram is associated with a Message Implementation Guideline (MIG). The MIG is used between collaborators to identify the semantics of the data during the exchange. For example, to exchange information that does not exist in the standard message, the collaborators can decide to use an unused field to carry it. This specific use should be stated in the MIG.

1 http://www.unece.org/cefact/
2 http://www.x12.org/
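A much-simplified parser for such a serialized segment structure can be sketched as follows. It assumes EDIFACT's default service characters ("'" as segment terminator, "+" as element separator) and ignores composite-element and release-character handling; the sample message content is invented.

```python
def parse_segments(message):
    # Split a serialized message into segments (terminated by "'"), then
    # split each segment into a tag and its data elements (separated by "+").
    # Composite elements (":") and release characters ("?") are ignored here.
    segments = []
    for raw in message.rstrip("'").split("'"):
        tag, *elements = raw.split("+")
        segments.append((tag, elements))
    return segments

# Invented EDIFACT-like payment extract: header, reference, amount, trailer.
sample = "UNH+1+PAYEXT:D:96A:UN'BGM+452+PAY001'MOA+9:1500:EUR'UNT+4+1'"
for tag, elements in parse_segments(sample):
    print(tag, elements)
```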

3.2.2 Our EDI Translator:

A translator is needed to connect the inner systems of the enterprise with outer systems by exchanging EDI messages. An EDI translator aims at enabling the extraction of data from a system file and its conversion into an EDI message, and vice versa. It provides syntax error checking, validation of rules and constraints, and parsing of the message (Figure 3.2). An EDI translator is able to translate the information from a message A to a message B provided that a mapping table or mapping rules exist between the messages. A translator can be mono-standard, meaning that translation to a single standard is supported; other translators are multi-standard, having many standards as input/output. A universal translator is one able to translate between all standards; for such a translator, other tasks could be imagined, such as translation from fax or any other enterprise portal to any other format. Moreover, the translation process can also take place between messages belonging to the same standard. Figure 3.3 shows an intra-standard message translation between the SWIFT messages MT 103 and MT 202. Inter-standard translation should also be considered. For example, if a client in France buys articles from a supplier located in the USA, the client sends a demand to his bank to pay the supplier (Figure 3.4). The client has already established an agreement with his bank for electronic data interchange, using EDIFACT messages (PAYMUL, PAYEXT, etc.); the bank has another agreement to issue international transfer commands in the SWIFT standard. An enormous effort is needed to convert the information existing in the PAYEXT message into SWIFT's MT103 message. This work includes the identification of the data concerning the client, the supplier, the amount, the invoice number, etc.

3.2.2.1 Existing translators: Existing solutions differ by being visual, script, or program based. Market solutions include Gentran (www.sterlingcommerce.com), Edifecs (www.edifecs.com), Mercator (www.mercator.com), GXS (www.geis.com), DataBroker (Medita) (www.telino.com), Symphonia (www.orion.co.nz), SAIOS (www.saios.com), etc. A visual translation tool that helps the integrator is essential to reduce the time needed to implement translation between messages. Other translators use a pivot format during conversion between different messages; flat files and database tables are used to achieve this step. Using flat files requires an appropriate parser to yield and render the desired message, while database tables complicate the process because of the extended database constraints and queries involved. In [Grundy 01] and [Stonebraker 01], translators using a visual language are presented; the visual language is used to define the mapping between the source and the target message. In Table 3.2, we define the essential features needed to evaluate any market translator. Studying the preceding tools against the parameters of Table 3.2 helped us identify the common issues of existing tools:

• the mapping between messages must be defined manually,
• no hint is offered to discover the mapping between elements,
• limited visual mapping,


• difficult to use for people who are not IT (Information Technology) professionals,
• a long process to define a large number of transformations or mappings,
• complicated developer tasks.

Each feature is weighted by importance, from "very important" down to "not very important": error reporting and recovery; debugging; graphical monitoring; openness; ease of use; breadth of sources and targets supported; visual mapping interface; transformation language and script; web services support; performance and throughput; parallel processing; platform support; accepted networks; integration with existing information systems (API); protocols (FTP/SMTP); archiving of files; security; documentation.

Table 3.2 Translators features

We suggest the creation of a new EDI translator, whose functionality is described in the following sections. Moreover, to eliminate the preceding constraints, we propose a semi-automatic matching algorithm for our EDI translator.

3.2.2.2 Translation unit: As previously stated, a translator performs a set of tasks such as constraint validation, error checking, etc. In this section, we are interested in the core of the translator, known as the translation unit. We consider translation between comparable messages. By comparable, we mean messages treating the same type of information: for example, ORDERS of EDIFACT and the 850 Purchase Order of X12; X12's 820 corresponds to PAYMUL of EDIFACT and MT103 of SWIFT.

Let M1 and M2 be two comparable messages of two standards N1 and N2. RM(1, 2) is the set of rules used to translate from M1 to M2; it defines the mapping from the representation A1 of M1 to the representation A2 of M2. RM(2, 1) is the set of rules needed from A2 to A1.


If RM(1, 2) is reversible between the elements of A1 and A2, i.e. RM(2, 1) = RM(1, 2)⁻¹, it is sufficient to define RM(1, 2) and apply these rules in the opposite direction to return to the original message. Thus, for n comparable messages, we need n(n − 1)/2 rule sets. Such a result is acceptable if n ≤ 3. This implies the need for a pivot format to reduce the number of rules when n > 3. Apparently, we could use one of the messages (Mi) as a pivot message during translation; hence, for n messages we would need n − 1 rule sets. Nevertheless, it is not easy to choose the message Mi, and this solution can cause a loss of data if Mi is not chosen properly. Indeed, the structure of Mi is predefined and cannot be extended to include new data appearing in a new message Ms. This drives us to look for an extensible pivot format. Let P be the pivot format chosen for the exchange of messages M1, M2, …, Mn; for n messages we then need to define n rule sets. One remaining issue is how to validate the pivot format and its information, and how to create the associated tools to manipulate this format. This leads us to choose a format equipped with validation and manipulation tools [Rifaieh-b 03]. We suggest using XML as the pivot format within the translation unit of our translator; for n messages we thus need n transformation rules. The XML format is extensible, so we can define the existing messages and extend them where needed. We can also validate the messages with the help of a DTD (Document Type Definition) or an XML Schema. Using an extensible format such as XML lets the translation engine take advantage of the transformation languages associated with XML, such as XSLT, XPath, or XQuery. This pivot format is also an open standard rather than a proprietary format. Finally, integrating the data content of the messages with other systems may be easier from XML files than from the ASCII flat files of EDI messages.
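The rule counts in this argument can be checked numerically. The sketch below assumes, as in the reversible case above, one rule set per unordered pair of messages without a pivot, and one rule set per message with an extensible pivot.

```python
def pairwise_rules(n):
    # Reversible rule sets between every pair of n comparable messages.
    return n * (n - 1) // 2

def pivot_rules(n):
    # One rule set per message to and from an extensible pivot format.
    return n

# The pairwise count grows quadratically, the pivot count linearly.
for n in (3, 5, 10):
    print(n, pairwise_rules(n), pivot_rules(n))
```

For n = 3 both approaches need three rule sets, which is why the pivot only pays off when n > 3.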

3.2.2.3 Translation scenarios: We define in this section some scenarios of translation between EDI messages. We have identified four categories of translation scenarios. M1, M2, and M3 are the messages used, having respectively G1, G2, and G3 as structures; let G be the generic structure for the messages M1, M2, and M3, denoted G = <G1, G2, G3, …>.

Simple Translation (1:1): Let M1 and M2 be two comparable messages belonging to two different standards. These messages use a generic structure G. The translator converts the information existing in the source message and formats the target message. Thus:

M2 = (Select M1 Condition C1 Fields X1… Xn) With Mapping G1 to G And Mapping G to G2.
M1 = (Select M2 Condition C2 Fields Y1… Yn) With Mapping G2 to G And Mapping G to G1.

Example: the translation between the ORDERS message of EDIFACT and the 850 Purchase Order of X12 is performed as follows: first read the information in the ORDERS message and fill the pivot structure with the data. G is the pivot format having a generic structure for the messages, so the mapping rules between G1 and G are known. In order to generate the target message, we can use a selection with conditions based on the values of many fields. This simple translation is the most used between messages.
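A simple (1:1) translation through the pivot can be sketched with dictionaries standing in for messages. The segment tags and the field mappings are invented for the illustration.

```python
# Hypothetical field mappings: source field -> pivot field, pivot -> target.
G1_TO_G = {"BGM": "document_ref", "MOA": "amount"}
G_TO_G2 = {"document_ref": "BEG", "amount": "AMT"}

def translate(message, source_map, target_map, condition=lambda p: True):
    # Simple translation (1:1): map M1 into the generic structure G,
    # check the selection condition, then render M2 from the pivot.
    pivot = {source_map[k]: v for k, v in message.items() if k in source_map}
    if not condition(pivot):
        return None
    return {target_map[k]: v for k, v in pivot.items()}

orders = {"BGM": "ORD-42", "MOA": "1500"}
print(translate(orders, G1_TO_G, G_TO_G2))
```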


Figure 3.4: Scenario of Use of EDI message translation between EDIFACT & SWIFT

Translation by slicing (1:n): Let M1 be an EDI source message having G1 as structure; M2 and M3 are target messages having respectively G2 and G3 as structures. The messages M1, M2, and M3 are not necessarily comparable. The translator reads the source message M1 and stores the data in the pivot format having a generic structure G; a mapping between G1 and G is performed. The second stage consists of selecting the data from the pivot message with precise conditions and rendering the target messages M2 and M3 by using the mappings G to G2 and G to G3.

M2 = (Select M1 Condition C1 Fields X1…Xn) With Mapping G1 to G And Mapping G to G2.
M3 = (Select M1 Condition C2 Fields Y1…Yn) With Mapping G1 to G And Mapping G to G3.

Example: The EDIFACT message PAYMUL contains the data element FII (Financial Institution Information), which defines the information about the establishment (e.g. bank, company, etc.). The data element MOA (Monetary Amount) contains the amount of the transfer. To enable better customer service, the bank needs to generate two different messages from the initial message: an acknowledgment of receipt for the sender and an inter-bank transfer message in the SWIFT standard. The same information is used to generate both messages: MOA will be included in the SWIFT message together with FII, and a message confirming the debit of the client's account, including MOA and FII, is also needed. These messages use different structures than the original message while carrying the same information.

Translation by grouping (n:1): In this scenario, the translator reads the source messages M1 and M2, stores the data in a generic structure, and formats the target message M3 with respect to the structure G3.

M3 = Merge [(Select M1 Condition C1 Fields X1, …, Xn With Mapping G1 to G), (Select M2 Condition C2 Fields Y1, …, Yn With Mapping G2 to G)] With Mapping G to G3.
Example: Continuing the previous example, the bank centralizes the messages received from many clients and sends to its head branch a single SWIFT message containing many transfer requests. The translator should create a new message structure and gather all the received data into it before sending.
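The slicing and grouping scenarios above can be sketched as follows; the message contents, field names, and mapping tables are invented for illustration and do not reflect the thesis implementation.

```python
# Sketch of pivot-based EDI translation. A message is modeled as a flat
# dict of fields; mappings are dicts from source field to pivot field.

def to_pivot(message, mapping):
    """Map a source message into the generic pivot structure G."""
    return {pivot_field: message[src_field]
            for src_field, pivot_field in mapping.items()
            if src_field in message}

def render(pivot, mapping, fields):
    """Render a target message from the pivot, keeping selected fields."""
    return {tgt_field: pivot[pivot_field]
            for pivot_field, tgt_field in mapping.items()
            if pivot_field in fields and pivot_field in pivot}

# Translation by slicing (1:n): one PAYMUL-like source, two targets.
m1 = {"FII": "BANK-XYZ", "MOA": "1500.00 EUR"}
g1_to_g = {"FII": "institution", "MOA": "amount"}
pivot = to_pivot(m1, g1_to_g)

# Hypothetical SWIFT-like target and acknowledgment target.
m2 = render(pivot, {"institution": "58A", "amount": "32A"},
            {"institution", "amount"})
m3 = render(pivot, {"amount": "ACK_AMOUNT"}, {"amount"})

# Translation by grouping (n:1): merge several pivot messages into one.
def merge(pivots):
    merged = {}
    for p in pivots:
        merged.update(p)
    return merged
```

A full n:m translation is then m such groupings placed side by side, as described next.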


Translation by slicing and grouping (n:m): This scenario amounts to several translations by grouping, each applied individually over n sources and one target. Since the target messages do not interact with each other, we can consider this type of translation as m translations by grouping (n:1) placed side by side.

3.2.3 Challenging issues for our EDI Translator:

EDI challenges include supporting a huge number of messages and fulfilling companies’ needs to treat different types of messages and standards. A translator between messages is critical for enabling communication between different partners. After studying the message translation and conversion process, we perceive the following issues: a large number of translations are needed to bind enterprises with their clients and suppliers; existing standards define the same types of messages but with different representations; and the messages use different structures and treat the information in different manners. Meanwhile, a mapping model capable of defining the transformation process between different messages is missing. We suggest studying this issue from the perspectives of EDI message translation and data warehouse schema integration. Furthermore, we study using a semi-automatic translator, which can help developers reduce the time needed to identify the matching between the components of two messages [Chukmol, Rifaieh 05].

3.2.3.1 Uniform Message Representation: In this section we defined the usefulness of a uniform representation for the translation of many EDI messages. We showed that using an extensible format such as XML helps unify message translation through a pivot format. To verify this issue, we studied the fund transfer messages used between clients and their banks and between banks themselves. The messages belong to the EDIFACT and SWIFT standards. We defined a Uniform Message Representation (UMR) for this domain of use (Appendix G) [Rifaieh-b 03]. We implemented the UMR in XML to encompass all data coming from the fund transfer messages. A mapping using mapping expression formulae is needed to map the original message to the pivot format, and another mapping expression helps render the destination message. The use of the UMR can consequently reduce the number of rules and mappings needed to integrate a large number of comparable messages in the EDI system. We performed a mapping between the EDIFACT PAYEXT message and the SWIFT MT103 by using this pivot representation. The use of the UMR in the translation unit may be seen as akin to the use of a mediator schema (or global schema) for the integration of data from many sources [Pottinger 02]. A comparison of the approach used in databases with the one briefly discussed in this section is out of the scope of this thesis.

3.2.3.2 Schema matching applied to message translation: We showed that a part of the message translation problem is translating between different message schemas. Message schemas may use different names, somewhat different data types, and different ranges of allowable values. Fields are grouped into structures that also may differ between the two formats. For example, one may be a flat structure that simply lists fields, while another may group


related fields. Alternatively, both formats may use nested structures but may group fields in different combinations. Translating between different message schemas is, in part, a schema-matching problem [Rifaieh-a 03]. As defined in [Rahm 01] for database schemas, a match operation would reduce the amount of manual work by generating a draft mapping between the two message schemas. The application designer would still need to validate and probably modify the result of the automated match. Nonetheless, the problem of matching and mapping is simpler for relational database schemas than for the branching diagrams of EDI messages. Indeed, a branching diagram is a complicated tree structure with n levels of depth, whereas a relational schema has one level of depth. Therefore, the description of mapping correspondences for relational database schemas defined in [Miller 00] is less complicated than the tree-structure matching defined in [Kurgan 02]. Moreover, the authors in [Kurgan 02] treat only (1 to 1) mapping, which cannot be applied directly to EDI message translation. However, we argue that using schema-matching techniques to identify the mapping between message schemas can considerably reduce the users' task; we estimate it covers about 80% of the work needed to define a mapping between two messages. In essence, this technique is useful when no mapping guideline is defined between the source and the target messages, which is the actual situation for mapping between SWIFT and EDIFACT messages. For elements that are difficult to match, a heuristic match will be presented to the user for validation [Madhavan 01]. An EDI message translator applying such techniques would go beyond any existing tool. Moreover, if we add machine learning techniques (learning algorithms) to enable the translator to learn from existing mapping scenarios, as used in [Yan 02], the translator would be unique. Thus, the simple EDI translator would become an adequate translation engine for a large number of translations.

3.3 Mapping guideline & mapping expression model: In this section, we are interested in the representation of the mapping guideline. By mapping guideline, we mean the set of information defined by developers in order to achieve the mapping between the attributes of two database schemas or the elements/fields of two EDI message representations. We define the mapping expression of an attribute/element as the information needed to recognize how a target attribute/element is created from the source attributes/elements.

3.3.1 Existing solutions:

Currently, different kinds of mapping guidelines are used for many applications. Traditionally, these guidelines are defined manually during the implementation of the system; in the best case, they are saved as paper documents. These guidelines are used as references each time we need to understand how an attribute/element of a target schema has been generated from the source attributes/elements. This solution is very weak for ensuring the maintenance and evolution of the system. Keeping these guidelines up to date is a very hard task, especially when several versions exist: updating the mapping of an attribute or element/field in the system should also include updating the paper document guideline. Thus, it is extremely difficult to maintain such tasks, especially with simultaneous updates by different users.


Therefore, we need to put the mapping guideline inside the system meta-data and use this meta-data to reduce the complexity of manual definition of mapping expression.

3.3.2 Mapping expression examples:

We can identify some examples of mapping expressions drawn from different types of applications.

• Break-down/concatenation: in this example, the value of a field is established by breaking down the value of a source and concatenating it with another value (Example.1). For instance, the SWIFT MT103 EDI message contains the field :23:B, which concatenates the date, currency, and amount (DD/MM/AA XYZ NNN). In order to find the date, we have to break down the value of this field and take the first six digits.

• Arithmetic operation with multiple data: in this case, an arithmetic function is defined that calculates the resulting value of the target attribute. The example uses a reference table (rate) in the calculation (Example.2). Therefore, converting between messages while changing the currency requires a reference table.

• Conditional mapping: sometimes the value of a target attribute depends on the value of another attribute. In the example, if X=1 then Y=A else Y=B (Example.3).

• Mapping with breaking key: the representation of a row depends on the value of a field. In our example, if X=1 the row represents Enr1 and if X=2 the row represents Enr2 (Example.4).

• Mapping using a correspondence table with multiple columns: the value of a target field is a function of several values in a conversion table (Example.5). For instance, in EDI, translating between EDIFACT PAYORD and SWIFT MT100 requires a correspondence table.
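As an illustration only, the mapping expression types listed above can be sketched in Python; all field contents, rates, and table entries below are hypothetical.

```python
# Break-down: keep the first six digits (DDMMAA) of the composite :23:B field.
def breakdown_date(field_23b):
    digits = [c for c in field_23b if c.isdigit()]
    return "".join(digits[:6])

# Arithmetic operation using a reference table (invented rates).
RATE = {"EUR": 1.0, "USD": 0.85}

def convert_amount(amount, currency):
    return round(amount * RATE[currency], 2)

# Conditional mapping: if X = 1 then Y = A else Y = B.
def conditional(x, a, b):
    return a if x == 1 else b

# Correspondence table with multiple columns (illustrative entries).
CORRESPONDENCE = {("PAYORD", "credit"): "MT100"}

def correspond(message_type, operation):
    return CORRESPONDENCE[(message_type, operation)]
```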

3.3.3 Mapping expression model:

Example.5: Correspondence tables with multiple columns


In this section, we define a formal model to represent mapping expressions and the mapping guideline [Rifaieh-a 03] [Rifaieh-a 02]. First, we consider the mapping guideline as a set of mapping expressions between two different schema/message representations. The notations used are:

• Let S1, S2, …, Sn represent n sources of data (schemas/messages)
• Let a1^S1, a2^S1, …, am^S1 represent the m attributes/fields of the source S1
• Let T represent the target data container (or target schema/message)
• Let A represent an attribute/field of the target T

Each attribute/field of the source has associated meta-data, which consist of two parts: the attribute/field identity meta-data and the mapping meta-data. The attribute/field identity meta-data exists within the description of the source schema/message. It covers the attribute/field name, the relation name, the schema/message name, the database name, the owner of the data, the domain name, etc., as well as information such as max-value, min-value, range, data types, etc. A uniform model to represent the attribute/field identity meta-data and other types of meta-data is described in [Do 00], [Stöhr 99], [Rahm 01]. We adopt the representation of attribute/field meta-data described in [Miller 00] for database schemas. Thus, with each attribute/field is associated its set of meta-data including all the different kinds of meta-data. µ(A) denotes the meta-data associated with an attribute/field A. Formally, µ(A) is a set {µ1(A), µ2(A), …, µz(A)} of values. For convenience, the authors in [Miller 00] give these values µ1, µ2, …, µz names: for example, the attribute/field name is denoted attrname(A) and the relation name is denoted relname(A). Therefore, attrname(A)="Address", relname(A)="Orders", and attrdatatype(A)="String" represent a set of meta-data values of the attribute A. Based on this representation, we assume that µi(A) denotes the mapping meta-data of the attribute/field A. µi(A) describes how to generate the target attribute/field A from the sources. Unlike the attribute/field identity meta-data, µi(A) is a tuple of mapping expressions (α1(A), α2(A), …, αs(A)). Indeed, αi(A) is a mapping expression applied on multiple sources to generate a part of the target attribute/field A or to generate a sub-attribute/field of A.
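The meta-data set µ(A) can be pictured, under illustrative conventions of our own, as a dictionary of named values following the naming convention of [Miller 00]; the keys and values are invented for the sketch.

```python
# Sketch of the meta-data µ(A) for one target attribute: identity meta-data
# plus the mapping meta-data, a tuple of mapping expressions (alpha_1, ..., alpha_s).

mu_A = {
    "attrname":     "Address",    # attribute/field identity meta-data
    "relname":      "Orders",
    "attrdatatype": "String",
    "max-value":    None,
    "min-value":    None,
    # mapping meta-data: each expression describes how a part of A
    # is generated from the sources (here, two identity mappings)
    "mapping": (
        {"f": "identity", "sources": ["S1.Street"],  "l": None, "c": None},
        {"f": "identity", "sources": ["Ss.ZIPCode"], "l": None, "c": None},
    ),
}

def attrname(mu):
    """Named accessor, mirroring the attrname(A) notation of the text."""
    return mu["attrname"]
```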

Figure 3.5: Illustration of Mapping Expressions Model


Therefore, the mapping guideline MG (Figure 3.5) is the set of mapping meta-data µi( ) for all the attributes/fields of the target T from the different sources S1, S2, …, Sn. Thus, MG(S1, S2, …, Sn, T) = {µi(A1), µi(A2), …, µi(Aw)} where A1, A2, …, Aw are the attributes/fields of the target T. In Example.6, the attribute/field A of the target T has attrname(A)="Address"; it is composed of the Street from the source S1 and the ZIP Code from the source Ss. α1(A) and αs(A) are the mapping expressions for the attribute/field A. Below we discuss these mapping expressions. Let αi = <fi, li, ci> (Figure 3.5), where fi is a mapping function; it can be an arithmetic function or any string function such as substring, etc. Hence, fi can be applied to a set of attributes/fields, which may belong to one or several sources/messages. Hereafter, we use "attribute" to denote either a database schema attribute or an EDI message field. Attribute(fi) = {ar^Sr, ap^Sp, …, ae^Se} where ar Є Sr, i.e., ar is an attribute of Sr.

Let S = S1 ∪ S2 ∪ … ∪ Sn

Thus fi : S × S × … × S → T, (ar, ap, …, ae) ↦ A (or a sub-attribute of A). li is a set of filters on the source rows; we may find a filter li(ar) for each source attribute/field ar, thus li = {li(ar) / ar Є Sr}. This filter can test the value of ar itself or the value of any other attribute of the same row. In particular, the filters include joins between attributes of the same source; the use of foreign keys is useful to materialize these filters. In addition, ci is a condition on the mapped value of the attribute/field A. This selection enables us to ignore mapped rows that do not satisfy the condition ci. In Example.6, α1(A) and αs(A) are simple kinds of mapping expressions where fi is the identity function without any particular filter li or condition ci.

Example.7 treats a more complicated situation. We have attrname(A)="Value", α1(A)=<f1, l1, c1> and Attribute(f1)={a1, a2}, where a1 = substring(Key_Element, 1, 2) of the source S1 and a2 = substring(Conversion, 1, 1) of the source S2. The function f1 is a simple arithmetic multiplication, f1(a1, a2) = a1 * a2.


We have to select the results of Element of S1 where substring(Key_Element, 4, 2)="AB"; this implies l1(a1): [(substring(Key_Element, 4, 2)="AB") == True]. We have to select the value of Table_con of S2 where Type="€"; this implies l2(a2): [(Type="€") == True]. Finally, we have to select the results of the multiplication that are at most 45; this implies the condition c1: Value ≤ 45.
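A minimal sketch of evaluating α1(A) = <f1, l1, c1> for Example.7; the source rows are invented, and substring is taken to be 1-indexed as in the text.

```python
# Evaluate one mapping expression alpha_1 = <f1, l1, c1>:
# f1 multiplies two substrings, l1/l2 filter the source rows, c1 bounds the result.

def substring(s, start, length):
    """1-indexed substring, matching substring(x, start, length) in the text."""
    return s[start - 1: start - 1 + length]

S1 = [{"Key_Element": "12 AB"}, {"Key_Element": "99 CD"}]      # invented rows
S2 = [{"Conversion": "3", "Type": "€"}, {"Conversion": "7", "Type": "$"}]

def alpha_1():
    results = []
    for r1 in S1:
        if substring(r1["Key_Element"], 4, 2) != "AB":          # filter l1(a1)
            continue
        a1 = int(substring(r1["Key_Element"], 1, 2))
        for r2 in S2:
            if r2["Type"] != "€":                               # filter l2(a2)
                continue
            a2 = int(substring(r2["Conversion"], 1, 1))
            value = a1 * a2                                     # f1 = multiplication
            if value <= 45:                                     # condition c1
                results.append(value)
    return results
```

Only the row "12 AB" passes l1 and only the "€" row passes l2, so the result is the single value 12 * 3 = 36.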

A = Concat(…, [fi(ar^Sr, ap^Sp, …, ae^Se), li, ci], …)    (Mapping expression model)

where the bracketed term [fi(…), li, ci] is the mapping expression αi(A).

This model is general for all types of mapping expressions between an attribute/field A of a target schema/message T and its sources [Rifaieh-a 02] [Rifaieh-a 03]. Generally, in order to generate the complete value of an attribute/field A, we have to concatenate the results of the different mapping expressions of its sub-attributes.

3.3.4 Usefulness of mapping expression:

We can identify some of the applications where mapping expressions can be used:

• Data warehousing tool (ETL): an ETL tool includes a transformation process where the correspondence between the source data and the target DW data is defined. These mapping expressions can be defined in the meta-data of the system [Rifaieh-a 02]. We have implemented the mapping guideline of type MG(S, T) (from one source S to a target T) and its set of mapping meta-data, including the mapping expression formalism. We applied this formalism to create an automatic ETL tool using SQL queries. This tool is able to read the mapping guideline of the application and generate the complete set of SQL queries needed to extract, transform, and load data into the DW database. The tool is presented later in this chapter, and a case study of its use is shown in the implementation chapter of this thesis [Rifaieh-d 03].

• EDI message mapping: complex message translation is required for EDI, where data must be transformed from one EDI message format into another [Rifaieh-a 03]. This process includes a mapping between the contents of the messages. We also tried to implement an EDI translator using the mapping expression model; a description of a case study for this suggested EDI translator is presented in the implementation chapter.

• EAI (Enterprise Application Integration): the integration of information systems and applications needs a middleware to manage this process [Stonebraker 01]. It includes management rules for an enterprise's applications, data propagation rules for the concerned applications, and data conversion rules. Indeed, the data conversion rules define the mapping expressions of the integrated data. Studying the usefulness of the mapping expression model with EAI is not in the scope of this thesis and can be considered as future work.

3.3.5 Comparable approaches:

Related work falls into three main categories: high-level languages to express data transformations, mapping expressions or value correspondences, and algorithms to support the matching operation.


Several languages have been proposed to express data transformations, especially for performing higher-order operations on relational data [Glhardas 01], [Raman 01]. In [Glhardas 01], a declarative data cleaning language, model, and algorithms are defined; the framework was implemented in the AJAX system. Potter's Wheel [Raman 01] promotes an interactive approach whereby users apply a set of simple transformations to samples of data and see the results interactively. However, unlike the mapping expression model, these approaches aim at defining declarative transformation languages. Secondly, several frameworks have been proposed for mapping expressions. Mapping expressions have been studied as value correspondences in [Miller 00] and as correspondence rules in [Abiteboul 97]. In [Madhavan 02], mapping is studied with query answerability, mapping inference, and mapping composition; these studies focus on manipulating mappings mostly as a form of logical reasoning and on using mappings between two schemas to define data cleaning. Clio's goal, in [Miller 01], is to detect the mapping between schemas and apply it with value correspondences. The idea of value correspondences is similar to mapping expressions. Our model of mapping expressions complements the model of value correspondences in order to accept conditional mapping, mapping with breaking key, and mapping using correspondence tables with multiple columns. One way or another, our model can be considered an extension of the value correspondence model. In the third category, schema mapping work studies the problem of mapping and matching discovery. Our work complements these studies: we studied the case of an existing mapping reference, assumed that this mapping exists in the meta-data of our system, and applied our framework to a data warehousing system. Our system can be used after a matching or mapping discovery algorithm.
We believe that the mapping meta-data discovered by a matching algorithm are useful for later purposes. Modeling, preserving, and using these mapping meta-data constitute the core of our approach. Our approach is thus complementary to the work on schema matching, and it is indispensable for mapping source and target schemas: without the mapping expressions, we can only identify the corresponding elements, but we cannot apply the mapping to transfer the data.

3.4 Practical use: The purpose of this section is to show how we can supply an ETL tool with query processing to perform the transformation and load process. For this, we allow the DBMS to play an expanded role as a data transformation engine as well as a data store. Therefore, SQL queries are used throughout the construction process, especially in data extraction, cleansing procedures, direct storage, and front-end information delivery. As we also deal with the implementation of mapping meta-data, the automatic generation of queries is an appropriate way to take advantage of this implementation. We present QELT, an SQL-query-based Extraction, Load and Transformation tool. The basic idea of this tool consists of using SQL to create an ETL tool where the DBMS is used as an engine to perform the warehousing process. Therefore, the DBMS is used for two purposes: as a data storage system and as a transformation engine.

3.4.1 QELT:


Figure 3.6: QELT Architecture

Some commercial tools now support data extraction from different data sources and multiple DBMSs to feed the warehouse. Nevertheless, the designer must define the logical mapping between the source schemas and the warehouse schema. Furthermore, these tools do not interact with the meta-data repository to realize the warehousing process, which makes them weak in meta-data evolution and reusability. QELT takes advantage of its ability to automatically generate data transformations from the mapping guideline (mapping meta-data). It enables the generation of a set of procedures to be stored in the DBMS and called to refresh the DW data. For that, it provides extraction from multiple sources by using SQL queries on the data sources. It uses meta-data to optimize the flow of these data. It reduces the updating of the transformation process when meta-data rules evolve, thanks to the automatic generation of SQL transformation procedures. Finally, it offers the possibility to load data into the target DW database.

3.4.1.1 QELT Architecture: The QELT architecture (Figure 3.6) is very similar to that of a traditional warehousing tool. It integrates data from different and heterogeneous sources, applies transformations, and loads data into the DW database. Moreover, QELT is an active ETL: it interacts with meta-data to generate the transformations and to specify the loading with the target schema. Hence, it optimizes the flow of data and reduces the updating of the warehousing process by making the creation of valid transformations automatic. QELT does not preserve the usual order of the process: it loads data within the DBMS into a temporary database, which has the same schema as the source data. After the execution of the transformation procedures and the creation of the DW database, the temporary database is deleted. The different components of the QELT tool are described below:

3.4.1.1.1 Meta-Data components: The meta-data repository is the essential element of the system. It includes the following information:

• The mapping meta-data (MG): this model is used to describe the mapping guideline and its mapping expressions. By mapping expression, we mean the needed information to identify how a target field could be mapped from a set of source fields.


• The source model (SM) contains the model of the source data. It covers relational, object-oriented, and semi-structured data modeling.

• The target model (TM) is similar to (SM); it describes the target data model. Moreover, it could cover the multidimensional model used by OLAP processing.

• Management rules (MR) are the set of rules defined by the administrator in order to fulfill business requirements.

3.4.1.1.2 Extraction Process: The role of this process consists of collecting the needed data from different sources. These sources can be traditional databases, internal digital documents produced by applications in the enterprise, legacy data, or even web documents published by partners. The meta-data repository supplies the extraction process with the needed parameters. If the data source is a traditional database, the extraction process consists of queries stored in the meta-data repository; other data sources need extra programs to extract data. Therefore, the main role of the extraction process is to supply the system with the needed data.

3.4.1.1.3 Loading process: The second step consists of loading data into a temporary database, which will be used later to execute the transformation queries. The temporary database has the same structure as the source database; this implies the creation of a temporary database for each source. When integrating data from flat files or XML documents, we create a temporary database with a simple structure, or we use an XML Schema description for XML documents. The loader reads the description of the data sources (physical model) from the meta-data repository and creates the temporary database. The second goal of the loader consists of filling the temporary database with the data. In order not to fill the temporary database with unused data, which could have a dramatic influence on the DBMS, the extraction process should optimize the flow of data and not supply the loader with unneeded data.

3.4.1.2 SQL Generator and transformation process: The SQL Generator is a module that reads the useful parameters, rules, and mapping guideline from the meta-data repository to create an SQL transformation. Concretely, a query is performed on the mapping model (MG); the resulting file is used by the generator to produce the transformation queries. Then, a procedure containing these transformation queries is created in the DBMS. The selections and filters get rid of superfluous data, and during this process all controls and checks are applied to the data. Thus, these queries are more than a cosmetic change to the data: they tackle its content, structure, and valid values. We use SQL queries because they are easy to understand and very efficient inside the DBMS. Existing tools must create a set of programs to achieve the transformation process, and optimizing these programs is not easy; the SQL Generator, on the other hand, can rely on query optimization to get high performance. The transformation process consists of the execution of these procedures, including the full creation of the target database or the refresh of the target data. The last statement of the transformation procedure removes the temporary database. We consider that the data transformation process is generated directly from meta-data. Thus, the designer will not be called upon to verify the consistency of the process, since the process uses


mapping guidelines extracted from the meta-data repository. At the same time, the traditional querying of the meta-data remains accessible to the system's administrator. As we are using the DBMS as an engine, we perform this functionality by generating the SQL procedures from the meta-data.
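The query-generation idea can be sketched as follows; the guideline format, column names, and table names are assumptions of ours for illustration, not the QELT implementation.

```python
# Sketch of an SQL Generator: read a mapping guideline (here a simple list of
# (target_column, sql_expression, optional_filter) tuples, standing in for the
# mapping meta-data) and emit one INSERT ... SELECT transformation query.

def generate_sql(target_table, source_table, guideline):
    cols  = ", ".join(t for t, _, _ in guideline)
    exprs = ", ".join(e for _, e, _ in guideline)
    where = " AND ".join(f for _, _, f in guideline if f)
    sql = (f"INSERT INTO {target_table} ({cols}) "
           f"SELECT {exprs} FROM {source_table}")
    if where:
        sql += f" WHERE {where}"
    return sql

# Illustrative guideline: a concatenation mapping and an arithmetic mapping
# with a filter, in the spirit of the mapping expression examples.
guideline = [
    ("address", "CONCAT(street, ' ', zip_code)", None),
    ("amount",  "amount * rate",                 "currency = 'EUR'"),
]
query = generate_sql("dw_orders", "tmp_source", guideline)
```

The generated string would then be wrapped into a stored procedure and executed by the DBMS against the temporary database.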

3.4.2 Semi-Automatic EDI Translator

First, we have to differentiate between matching and mapping. A matching between elements is needed to identify the corresponding elements; afterward, a mapping expression is required to express how the matched elements can be mapped. Two processes achieve this goal. The input of the matching algorithm is the schema or the branching diagram of the messages, and its output is a list of matched elements. The input of the mapping process is then this list of matched elements, and its output is the individual mapping expression of each target element [Rifaieh-b 03]. There is much literature on schema matching and learning algorithms [Bernstein 00], [Raman 01], [Berlin 02], [Xu 03]. These studies differentiate between classical schema integration, schema matching, and the transformation and cleaning process. In schema matching, some prototypes have been developed, such as Cupid [Madhavan 01] and the Clio project [Miller 01]. Cupid is a schema-mapping algorithm between two different representations. It treats the problem of mapping discovery, which returns mapping elements that identify related elements of two schemas; Cupid does not investigate how to apply the mapping between these elements. An offshoot of the Clio project, code-named Chocolate, is intended to enable the mapping of XML documents to other XML documents. Chocolate, in this instance, would devise a common format for defining a customer: XML documents can be mapped from the way they are originally structured to the way they are to be presented to the user [Miller 01]. These works are not suitable for the matching of EDI messages. Indeed, EDI messages do not have significant field names: for instance, NAD represents the address in EDIFACT and 32A represents the amount of the transfer in SWIFT.
Instead, an element of an EDI message is defined by: a textual description (a short text describing the element's role in the message), a data type, constraints (conditions depending on the instance value of the element, which can influence the value restriction of another element in the message), a status (information indicating whether the element's existence in the message is mandatory, optional, …), and a cardinality (the possible number of occurrences of an element within another element of a message). Another important fact concerns the variation of an element's meaning due to its location in the message (structural influence). Therefore, we have to identify a new similarity algorithm that takes into consideration the specific characteristics of EDI message branching diagrams expressed with XML Schema. Our choice of XML Schema is directly related to the fact that it can help us define all the semantics of each message element as well as the structure of an element, which is important to refine our matching result. This choice is also in accordance with our suggested EDI translator.

3.4.2.1 Similarity Algorithms: In this section we present EX-SMAL (EDI/XML semi-automatic Schema Matching ALgorithm), proposed as a solution for EDI message schema matching [Chukmol, Rifaieh 05]. The criteria for matching include data type, structure, and element descriptions. Other information related to an element (constraints, status, cardinality) will be taken into account in a future extension of this work.


Algorithm EX-SMAL:

Input: S, T: two XML Schemas
Output: a set of triplets <Si, Tj, Vsim>
  with Si: an element of S
       Tj: an element of T
       Vsim: the similarity value between Si and Tj

Matching(S, T) {
  Convert S and T to trees
  For each pair of elements <Si, Tj>, compute {
    Basic similarity value
    Structural similarity value
    Pair-wise element similarity value
  }
  Filter: eliminate the element pairs whose Vsim is below an acceptance threshold value
}

Figure 3.7: Short description of EX-SMAL

The algorithm is briefly described in Figure 3.7 and can be understood as follows:

• The algorithm takes two XML schemas (S and T) as input and returns a set of triplets <es, et, vsim> in which es is an element of S, et is an element of T, and vsim is the pair-wise element similarity value between them.

• To match the input schemas, the algorithm converts them into trees (each edge represents a containment relation and each node is an XML Schema element, attribute, attributeGroup, …). A tree node is an object characterized by its label, path from the root, textual description, data type, constraints, status, and cardinality.
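The tree-node object just described can be sketched as a small data class; the class, its field defaults, and the sample nodes are our illustration, not the thesis code.

```python
# Sketch of the tree node EX-SMAL works on: label, path from root, textual
# description, data type, constraints, status, cardinality, plus children
# to represent the containment edges of the schema tree.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SchemaNode:
    label: str
    path: str                       # path from the root of the schema tree
    description: str = ""           # textual description of the element
    datatype: str = "string"
    constraints: Optional[str] = None
    status: str = "optional"        # mandatory, optional, ...
    cardinality: str = "1"
    children: List["SchemaNode"] = field(default_factory=list)

# Illustrative fragment of a PAYMUL-like schema tree.
root = SchemaNode("PAYMUL", "/PAYMUL", "payment order message")
root.children.append(
    SchemaNode("FII", "/PAYMUL/FII", "financial institution information"))
```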

3.4.2.2 Basic similarity
This similarity is computed as the weighted sum of the textual description similarity and the data type similarity. We calculate the basic similarity between a pair of elements, each of which comes from one of the input schemas. Because we are dealing with a subset of element criteria, an element has a strong basic similarity value with another if their textual descriptions and data types are strongly similar. We compute the basic similarity of two elements s and t with the following formula:

basicSim(s, t) = descSim(s, t)*coeff_desc + dataTypeSim(s, t)*coeff_type
where coeff_desc + coeff_type = 1, 0 ≤ coeff_desc ≤ 1 and 0 ≤ coeff_type ≤ 1.

Textual description similarity
Instead of working on element names to get the basic similarity of the elements, we choose the textual description associated with each element. Indeed, element names are not useful for comparing EDI message elements because they are neither significant nor readable. The textual description similarity indicates how similar two elements are according to their textual descriptions. We use information retrieval techniques to solve this problem. From each description to compare, we extract a term vector containing every term with its frequency in the description. We then compute the cosine of the two term vectors to evaluate one part of the pair-wise description similarity. This alone is not sufficient to determine the textual description similarity because it takes into account only the term frequencies in both descriptions. Therefore, we add another computation to this comparison by supposing that all the textual descriptions associated with the elements of the target schema form a corpus, which is indexed. From every description extracted from a source element, a query is formulated; we can query the above index to get a set of scores indicating how relevant this query is to the descriptions in the corpus. The query type we handle takes into account the order of terms in the description. The score and the description affinity resulting from the vector cosine computation are finally combined to calculate the description affinity between two given elements.

Data type similarity
We use a static matrix defining the affinity between XML Schema primitive data types. The values given as the data type affinity between two elements are obtained from an empirical study of those data types' formats and value boundaries. These similarity values help to obtain the basic affinity degree of two compared elements' types.
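A rough sketch of the basic similarity computation described above: a term-frequency cosine for one part of descSim, plus a static data-type affinity matrix. The affinity values, coefficient choices, and example descriptions are illustrative assumptions; the thesis' corpus-indexing refinement of descSim is omitted.

```python
import math
from collections import Counter

def desc_sim(desc_a: str, desc_b: str) -> float:
    """One part of descSim: cosine of the two term-frequency vectors."""
    va, vb = Counter(desc_a.lower().split()), Counter(desc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Toy data-type affinity matrix; the values are illustrative, not the
# empirically derived matrix used in the thesis.
TYPE_AFFINITY = {("string", "string"): 1.0, ("string", "token"): 0.8,
                 ("decimal", "decimal"): 1.0, ("decimal", "integer"): 0.7}

def data_type_sim(t1: str, t2: str) -> float:
    return TYPE_AFFINITY.get((t1, t2), TYPE_AFFINITY.get((t2, t1), 0.0))

def basic_sim(desc_a, type_a, desc_b, type_b, coeff_desc=0.7, coeff_type=0.3):
    # basicSim(s, t) = descSim(s, t)*coeff_desc + dataTypeSim(s, t)*coeff_type
    assert abs(coeff_desc + coeff_type - 1.0) < 1e-9
    return (coeff_desc * desc_sim(desc_a, desc_b)
            + coeff_type * data_type_sim(type_a, type_b))

print(round(basic_sim("payment order reference", "string",
                      "reference of the payment order", "token"), 3))
```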

3.4.2.3 Structural similarity
The structural similarity is computed using two modules: the structural neighbors computation and the aggregation function agg (see Figure 3.8). This computation is based on the fact that two elements are structurally similar if their structural neighbors are similar. Let M be a matrix:

Input: thr, the threshold value defined by the user

For Item[x](E1i) ∈ Item[x](e1) {
    For Item[x](E2j) ∈ Item[x](e2) {
        M[Item[x](E1i)][Item[x](E2j)] = sim_base(Item[x](E1i), Item[x](E2j));
    }
}
sim_Item[x](e1, e2) = agg(M, thr);

Figure 3.8: Aggregation Function

Structural neighbors
The structural neighbors of an element e form a quadruplet <ancestor(e), sibling(e), immediateChild(e), leaf(e)> in which:

• Item[1](e) = ancestor(e): the set of parent elements from the root down to the direct parent of the element e

• Item[2](e) = sibling(e): the set of sibling elements of e (the elements sharing the same direct parent element)

• Item[3](e) = immediateChild(e): the set of direct descendants of the element e

• Item[4](e) = leaf(e): the set of leaf elements of the sub-tree rooted at e.

We write Item[x](e) to refer to any one of these preceding items.
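The four neighbor sets can be sketched over a plain tree structure. The dict representation and the `node` helper below are our own illustrative assumptions, not the thesis' implementation.

```python
# Illustrative tree nodes as plain dicts; this representation and the
# helper `node` are assumptions for the sketch.

def node(label, *children):
    n = {"label": label, "children": list(children), "parent": None}
    for c in children:
        c["parent"] = n
    return n

def ancestors(e):
    """Item[1](e): parents from the root down to e's direct parent."""
    out, p = [], e["parent"]
    while p is not None:
        out.append(p)
        p = p["parent"]
    return list(reversed(out))

def siblings(e):
    """Item[2](e): elements sharing e's direct parent."""
    p = e["parent"]
    return [c for c in p["children"] if c is not e] if p else []

def immediate_children(e):
    """Item[3](e): direct descendants of e."""
    return list(e["children"])

def leaves(e):
    """Item[4](e): leaf elements of the sub-tree rooted at e."""
    if not e["children"]:
        return [e]
    return [l for c in e["children"] for l in leaves(c)]

def structural_neighbors(e):
    return (ancestors(e), siblings(e), immediate_children(e), leaves(e))

root = node("A", node("B", node("D")), node("C"))
b = root["children"][0]
anc, sib, imm, lf = structural_neighbors(b)
print([n["label"] for n in anc])  # ['A']
```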

The choice for these four items defining the structural neighbors of an element is related to many structural observations [Chukmol, Rifaieh 05], that we can summarize as follows:


• Ancestor elements influence their descendants' meaning; however, they do not define the entire structural semantics of a given element.

• The sibling nodes are worth considering in the structural neighbors. In fact, two elements can share exactly the same ancestral structure but differ by the influence of their siblings.

• To reinforce the exact semantics of each element, we need to look in more detail into the depth of an element. This is related to the fact that the detail of an element resides in its composing elements (immediate children and last-level descendants). We choose to weight the immediate children because they define the basic structure of the parent element. The choice of the last-level descendants helps us reach the finest-grained content or intentional detail of an element.

Structural similarity value computing
After calculating the neighbors of each tree node, we can proceed to the actual structural matching step. Let s and t be two elements to match, and C(s), C(t) the structural neighbors of s and t respectively:

• C(s) = <ancestor(s), sibling(s), immediateChild(s), leaf(s)>, the structural neighbors of s
• C(t) = <ancestor(t), sibling(t), immediateChild(t), leaf(t)>, the structural neighbors of t

Let:
• ancSim(s, t): ancestor item similarity (between ancestor(s) and ancestor(t))
• sibSim(s, t): sibling item similarity (between sibling(s) and sibling(t))
• immCSim(s, t): immediate child item similarity (between immediateChild(s) and immediateChild(t))
• leafSim(s, t): leaf item similarity (between leaf(s) and leaf(t))

The structural similarity value of two elements s and t depends on the similarity values resulting from the comparison of each pair of structural neighbors items (ancSim(s, t), sibSim(s, t), immCSim(s, t) and leafSim(s, t)). Therefore, the structural similarity value is computed as a function of the ancestor, sibling, immediate child, and leaf item similarities. The similarity value of each pair of structural neighbors items is computed using the function agg(M, thr), which takes a matrix M and a threshold value thr ∈ [0, 100] as input. It returns the aggregated value of the input matrix M (see Figure 3.8). The function agg uses the arithmetic mean (avg) and the standard deviation (sd) of descriptive statistics to compute the variation coefficient (vc) of all the values in M. Thus, M forms a population that contains only basic similarity values. We use the standard deviation around the arithmetic mean as the dispersion measure because it is markedly more precise than other dispersion measures (inter-quartile range, variance, etc.). We compute the arithmetic mean avg and the standard deviation sd of M respectively with:

avg = (1 / (|ancestor(s)| × |ancestor(t)|)) × Σ_{i=1}^{|ancestor(s)|} Σ_{j=1}^{|ancestor(t)|} M[s_i][t_j]

sd = sqrt( (1 / (|ancestor(s)| × |ancestor(t)|)) × Σ_{i=1}^{|ancestor(s)|} Σ_{j=1}^{|ancestor(t)|} (M[s_i][t_j] − avg)² )

(shown here for the ancestor item; the same formulas apply to the other items)

We compute the variation coefficient vc of M by: vc = (sd / avg) × 100


By comparing the computed variation coefficient with the thr value given by the user, agg decides whether the arithmetic mean of M will be the aggregated value of M. If the user gives thr ≥ vc, then agg returns avg as the aggregated value of M. If the user gives thr < vc, we eliminate from M all the values below avg × (1 − thr/100) that interfere in the arithmetic mean computation. We obtain a subset of the values in M and apply the aggregation function again. We apply this computation to all the structural neighbors item similarities (ancSim(s, t), sibSim(s, t), immCSim(s, t) and leafSim(s, t)). Finally, the structural similarity value between two elements s and t, structSim(s, t), is computed with the following formula:

structSim(s, t) = ancSim(s, t)*coeff_anc + sibSim(s, t)*coeff_sib + immCSim(s, t)*coeff_immC + leafSim(s, t)*coeff_leaf

where 0 ≤ coeff_anc ≤ 1, 0 ≤ coeff_sib ≤ 1, 0 ≤ coeff_immC ≤ 1, 0 ≤ coeff_leaf ≤ 1, and coeff_anc + coeff_sib + coeff_immC + coeff_leaf = 1.
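The aggregation function agg can be sketched as follows. This is a hedged reading of the description above: values below avg·(1 − thr/100) are dropped and the function is reapplied; the matrix flattening and the stopping guard are our own additions to keep the sketch runnable.

```python
import statistics

def agg(M, thr):
    """Aggregate a matrix of basic-similarity values: return the arithmetic
    mean when the variation coefficient is within the user threshold thr,
    otherwise drop the values below avg * (1 - thr/100) and re-aggregate."""
    # Accept either a matrix (list of rows) or an already-flattened list.
    values = [v for row in M for v in row] if M and isinstance(M[0], list) else list(M)
    if not values:
        return 0.0
    avg = statistics.mean(values)
    if avg == 0:
        return 0.0
    sd = statistics.pstdev(values)   # population standard deviation
    vc = sd / avg * 100              # variation coefficient
    if thr >= vc:
        return avg
    kept = [v for v in values if v >= avg * (1 - thr / 100)]
    if len(kept) == len(values):     # guard: nothing was removed, stop here
        return avg
    return agg(kept, thr)

# Example: the outlier (0.1) is filtered out, then the mean is accepted.
print(agg([[0.9, 0.1], [0.8, 0.85]], 30))
```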

3.4.2.4 Pair-wise element similarity
After computing the basic similarity value and the structural similarity value for each pair of elements, we can compute their pair-wise element similarity value. This value is computed as the weighted sum of the basic similarity value and the structural similarity value. It is proposed as the final similarity value for a pair of elements in our approach. Let s and t be two elements to match. The pair-wise element similarity of s and t is computed by the following formula:

similarity(s, t) = basicSim(s, t)*coeff_base + structSim(s, t)*coeff_struct

where 0 ≤ coeff_base ≤ 1, 0 ≤ coeff_struct ≤ 1, and coeff_base + coeff_struct = 1.

3.4.2.5 Filtering
This is the last step of our algorithm; it consists of eliminating all the pairs of elements whose pair-wise element similarity value is below the value thraccept given by the user (0 ≤ thraccept ≤ 1).
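The final weighting and filtering steps can be sketched as below; the coefficient defaults and the element names in the example are illustrative, not values from the thesis.

```python
def similarity(basic: float, struct: float,
               coeff_base: float = 0.5, coeff_struct: float = 0.5) -> float:
    """similarity(s, t) = basicSim*coeff_base + structSim*coeff_struct."""
    assert abs(coeff_base + coeff_struct - 1.0) < 1e-9
    return coeff_base * basic + coeff_struct * struct

def filter_pairs(pairs, thr_accept: float):
    """Keep only the triplets <s, t, v_sim> reaching the acceptation threshold."""
    return [(s, t, v) for (s, t, v) in pairs if v >= thr_accept]

pairs = [("BGM", "Header", similarity(0.9, 0.7)),   # strong match
         ("DTM", "Date",   similarity(0.4, 0.2))]   # weak match
print(filter_pairs(pairs, 0.5))  # keeps only the BGM/Header pair
```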

This algorithm can be classified among the hybrid schema-based approaches. In fact, it combines structural similarity and textual description similarity. It differs from other approaches through the following particularities:

• It treats the textual description of the elements, which is richer than other approaches treating element names, such as Cupid [Madhavan 01] or Clio [Miller 00]. This choice was directed by the particularity of EDI branching diagrams. We used well-known information retrieval techniques to find the similarity of two elements' descriptions.

• It fully treats the structure of an element by covering the structural neighbors' items: ancestors, siblings, immediate children, and leaves.


However, some limits can be identified for our algorithm:

• Using many coefficients makes the algorithm hard for a non-advanced user to initialize.

• The current algorithm does not take into consideration other important elements of the branching diagram such as constraints, status, cardinality, etc. A full solution for an EDI matching algorithm should consider all these elements.

We implemented the EX-SMAL algorithm and applied it to real-world EDI messages (PAYMUL, MT103). The details of implementation issues and results are described in Chapter 7.

3.5 Summary
Enterprise integration and data exchange platforms are essentially used to enable cooperation between an organization and the external world. The first ensures the integration of data coming from different databases within a data warehouse system. The second facilitates electronic data exchange with partners, clients, and suppliers by sending and processing messages. Meta-data plays an essential role in data warehouses, and is used as a mapping guideline for translating EDI messages. It is affected by the evolution of management rules, which makes it essential to develop tools and utilities able to propagate this evolution directly.

We investigated the importance of the mapping guideline for these applications and concluded that a model for expressing the mapping is lacking. In essence, we specified a model to represent a mapping guideline and mapping expressions and illustrated them with some examples. The last goal of this chapter consisted in applying the mapping expression model in practice in order to create an ETL tool and an EDI translator. On the one hand, the suggested tool QELT (Query-based ETL) is original in its capability to read the mapping guideline defined in the meta-data repository to create the transformation process. A case study using this tool as a proof of effectiveness is developed later in this thesis. On the other hand, we suggest the creation of a semi-automatic EDI transformation tool. This EDI translator differs by being easy to use for managers and people with limited technical skills. Additionally, a case study of automatic matching for EDI messages will be presented in the implementation chapter.

However, studying these systems leads us to recognize that any EIS deployed in an enterprise architecture can be related to other systems through a common or similar process. The recurrent problem which manifests itself in this case is reusability.
As we showed in the previous chapter, reusability is affected by semantic sharing between information systems. Thus, establishing a semantic sharing mechanism between the studied applications (i.e. the EDI translator and the DW) is essential to overcome the reusability problem. Therefore, local ontologies (e.g. UML based) can be considered in this case as a first step toward reaching shared semantics. Some research works, such as [Van Zyl 99] and [Kappel 01], have studied the definition of an ontology at the meta-data level. They aim at creating a more suitable and efficient model for describing the meta-data of a data warehouse. Moreover, the Open EDI ontology (Appendix B), which is based on the REA model, also tries to define an ontology for EDI exchange. In the next chapter, we will concentrate on studying the idea of putting ontologies to work in enterprises. We will analyze the preceding issue of semantic sharing through local ontologies, and show how context can assist in reaching this goal.


4. CHAPTER 4

ONTOLOGIES IN THE SERVICE OF ENTERPRISE

INFORMATION SYSTEMS

"The philosophers have only interpreted the world; the thing, however, is to change it"

Karl Marx

Ontology is a word that has been the subject of many studies and a puzzle for many philosophers and scientists alike. In the last decade, this debated subject has spread to the computer science community. Indeed, research on ontological issues has been widely active in various areas such as knowledge engineering, intelligent information integration, knowledge management, cooperative information systems, etc. In a nutshell, computer science research on ontology began in the AI domain [Borgida 89] and with the ARPA Knowledge Sharing Effort project [Gruber 91]. Since then, many other computer science communities have begun using ontologies to establish a joint terminology between humans and machines as members of interest [Chandrasekaran 99].

In general terms, an ontology is an explicit specification of a conceptualization. Thus, an ontology is a description, like a formal specification of a program, of the concepts and the relationships that can exist for an agent or a community of agents [Gruber 93]. A suitable formalization of conceptualization and ontology was introduced by Guarino in [Guarino 95]. In other words, an ontology provides a shared vocabulary for a common understanding of a domain. It includes computer-usable definitions of basic concepts in the domain and the relationships among them. Ontology can be used for human and machine communication, interoperability between systems, software engineering, etc. On one side, ontologies can be used in a very shallow or general sense, namely as taxonomical structures. On the other side, ontologies can be very formal, with clear semantics and a machine-processable format along with efficient reasoning support.

Ontologies contribute to breaking down the problem of heterogeneous systems and fulfill cooperative applications' need for shared interpretation. Indeed, many applications need to communicate with shareable semantics. The use of a shared explicit ontology for a class of systems can unify the interpretation of the concepts used. Therefore, at the semantic level, confusion will be reduced for people or machines interpreting information coming from different application systems. In the context of an evolving economy, one of the major problems for enterprises is to ensure coherent and fast partnerships and to share a common understanding of a given application domain. Local enterprise ontologies are considered, versus enterprise ontologies, to specify systems and to support semantic sharing for distributed and inter-organizational applications.
As a matter of fact, few effective enterprise ontologies exist today in a production setting, due to the novelty and complexity of the tools, methods and techniques involved (i.e. artificial intelligence and knowledge management). The development of an enterprise ontology is a challenging work confronted by domain, time, and cost constraints. All these factors have contributed to enterprises' depreciation of ontology technology contributions.

Additionally, a contextual ontology is defined as an ontology whose concepts can be seen from different points of view. This suggested paradigm provides wider support for accessing heterogeneous information sources by various applications in a domain. It essentially treats the problem of multi-representation of concepts and multi-view requirements in the various contexts of use, in order to ensure a clear and clean semantic sharing between many systems.

In this chapter, we begin by presenting the origin of ontology research and show different definitions of this topic in the computer science field. We then give an overview of ontology research applications, languages and formalisms. We concentrate next on unveiling the relation between formal ontologies and information systems. Afterwards, we differentiate between enterprise ontology and local information system ontologies. We discuss, later on, how ontologies can take an effective role in modern enterprise information systems. We suggest, finally, pairing up the notions of context and ontology in order to overcome the semantic sharing difficulties when moving from one context to another.


4.1 What is an ontology about?
Ontology is the term used to refer to the shared understanding of some domain of interest, which may be used as a unifying framework to solve the problems of semantic integration and semantic sharing. An ontology necessarily entails or embodies some sort of world view with respect to a given domain. A domain is a specific subject area or area of knowledge, like medicine, tool manufacturing, real estate, automobile repair, financial management, etc. The world view is often conceived as a set of concepts (e.g. entities, attributes, processes), their definitions and their interrelationships; this is referred to as a conceptualization. Such a conceptualization may be implicit, e.g. existing only in someone's head, or embodied in a piece of software. The word 'ontology' is sometimes used to refer to this implicit conceptualization. However, the more standard usage, and the one we adopt, is that an ontology is an explicit account or representation of a conceptualization. An example of an EDI-oriented ontology, showing what an ontology looks like for a specific domain, is described in Appendix B.

4.1.1 Origins

The word ontology comes from the Greek ontos (being) and logos (science). In philosophy, ontology is a branch of metaphysics concerned with the nature and relations of being. The dictionary definition in [M-Webster’s 04] gives the following:

1. A science or study of being: specifically, a branch of metaphysics relating to the nature and relations of being; a particular system according to which problems of the nature of being are investigated; first philosophy.

2. A theory concerning the kinds of entities and specifically the kinds of abstract entities that are to be admitted to a language system.

In other words, ontology is the theory or study of being as such [Britanica 04], i.e., of the basic characteristics of all reality. Though the term was first coined in the 17th century, ontology is synonymous with metaphysics or “first philosophy” as defined by Aristotle in the 4th century BC. In the 18th century, Christian Wolff contrasted ontology, or general metaphysics, with special metaphysical theories of souls, bodies, or God, claiming that ontology could be a deductive discipline revealing the essences of things [Corazzon 04]. This view was later strongly criticized by David Hume and Immanuel Kant. Ontology was revived in the early 20th century by Edmund Husserl and Martin Heidegger. Interest in ontology was renewed in the mid-20th century by W.V.O. Quine, and it became a central discipline of analytic philosophy. A set of definitions by leading philosophers from Christian Wolff to the present day is discussed in [Corazzon 04], and more discussion of the history of ontology can be found in [Buffalo 04].

4.1.2 Contemporary definition


There are many definitions of what is meant by ontology in computer science. We show in the following a set of normative definitions of ontology. An ontology is explicit, which means that it cannot be implicitly assumed and should be machine-processable. Welty [Welty 03] therefore defines ontology as:

"Ontology is a discipline of philosophy whose name dates back to 1613 and whose practice dates back to Aristotle. It is the science of what is, the kinds and structures of objects, properties, events, processes, and relations in every area of reality. ... What the field of ontology research attempts to capture is a notion that is common to a number of disciplines: software engineering, databases, and AI to name but a few. In each of these areas, developers are faced with the problem of building an artifact that represents some portion of the world in a fashion that can be processed by a machine”.

The most referred definition for ontology is given by Gruber in [Gruber 93] baptizing ontology as specification of conceptualization. “An ontology is a specification of a conceptualization. …In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set-of-concept-definitions, but more general. And it is certainly a different sense of the word than its use in philosophy. … In that context, an ontology is a specification used for making ontological commitments. The formal definition of ontological commitment is given below. For pragmatic reasons, we choose to write an ontology as a set of definitions of formal vocabulary. … Practically, an ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology.”

This definition has, however, been criticized by Nicola Guarino who, after examining many possible interpretations of ontology, wrote in [Guarino 95]:

"A starting point in this clarification effort will be the careful analysis of the interpretation adopted by Gruber. The main problem with such an interpretation is that it is based on a notion of conceptualization (introduced in: Genesereth, Michael R. and Nilsson, L. "Logical Foundation of Artificial Intelligence" Morgan Kaufmann, Los Altos, California, 1987) which doesn't fit our intuitions, (...): according to Genesereth and Nilsson, a conceptualization is a set of extensional relations describing a particular state of affairs, while the notion we have in mind is an intensional one, namely something like a conceptual grid which we superimpose to various possible state of affairs. We propose in this paper a revised definition of a conceptualization which captures this intensional aspect, while allowing us to give a satisfactory interpretation to Gruber's definition."

Guarino gives this definition:


"… Aristotle’s ontology is always the same, independently of the language used to describe it. On the other hand, in its most prevalent use in AI, an ontology refers to an engineering artifact, constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words. This set of assumptions has usually the form of a first-order logical theory, where vocabulary words appear as unary or binary predicate names, respectively called concepts and relations. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation”

To summarize, Sowa [Sowa 00] gives the following definition of ontology:

"The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. … Ontological analysis clarifies the structure of knowledge. Given a domain, its ontology forms the heart of any system of knowledge representation for that domain. Without ontologies, or the conceptualizations that underlie knowledge, there cannot be a vocabulary for representing knowledge. ... Second, ontologies enable knowledge sharing."

We adopt Gruber’s definition, referring to ontology as an explicit (not implicit) specification of a conceptualization, together with Guarino’s interpretation of conceptualization.

4.1.3 Formal ontology and information systems

An explicit ontology may take a variety of forms, but it will necessarily include a vocabulary of terms and a specification of their meaning (i.e. definitions). Ontologies are distinguished along a spectrum of formality, referring to the degree of formality with which a vocabulary is created. In the literature, ontologies fall into many types based on different spectra [Guarino 98], [Sowa 00], [Obrst 03]. Therefore, the same term ontology can be used to describe models with different degrees of structure. Figure 4.1 shows a snapshot of this spectrum and can be understood as follows:

Figure 4.1: Ontology spectrum (adapted from [Obrst 03])


• Highly informal: expressed loosely in natural language. An informal ontology contains a list of types that are either undefined or defined only by statements in a natural language.

• Semi-informal: expressed in a restricted and structured form of natural language. In this zone, we identify weak structures such as taxonomies, the Yahoo hierarchy, biological taxonomy, etc.

• Semi-formal: expressed in an artificial, formally defined language. This zone includes elements with more structure: at the bottom, database schemas and metadata schemes (e.g. ICML, ebXML, WSDL); at the top, conceptual models (OO models, UML, XML topic maps, etc.); and in between, thesauri (e.g. WordNet).

• Rigorously formal: usually expressed in a logic-based language (e.g. first-order logic) with theorem and proof mechanisms. The fundamentals of formal ontology include axiomatic theories organized into a partial order (lattice). Axiomatic theories, or simply axioms, use first-order logic for a rich expression of the constraints between entity and relation types. We distinguish between theorems and theories: theorems are licensed by a valid proof using inference rules; theories are possible further theorems, as yet unproven. Using the axiomatic-deductive method provides deduction and derivation (non-monotonic reasoning), which are based on completeness and decidability.

According to the level of granularity, in [Guarino 98] and [Gomez Perez 99] ontologies' levels of abstraction are classified into six types:

• Top-level ontologies describe very general concepts such as space, time, matter, object, event, action, etc., which are independent of a particular problem or domain. It seems therefore reasonable, at least in theory, to have unified top-level ontologies for large communities of users.

• General ontologies define a large number of concepts relating to fundamental human knowledge.

• Domain ontologies describe the vocabulary related to a specific domain (like medicine or automobiles).

• Task ontologies define concepts related to the execution of a particular task or activity, such as diagnosing or selling.

• Application ontologies define concepts essential for planning a particular application. They describe concepts depending on both a particular domain and a task, and are often specializations of both related ontologies.

• Meta-ontologies, also called generic or core ontologies, define concepts which are common across various domains; these concepts can be further specialized to domain-specific concepts.

4.1.4 What ontologies can do for Enterprise Information Systems

Formal ontologies can be used at development time or at runtime; however, we should distinguish between ontology-aware IS and ontology-driven IS. Every information system (IS) can have its own ontology, since it ascribes meaning to the symbols used according to a particular view of the world [Guarino 98]. Creating ontology-driven information systems amounts to favoring the role that ontology plays in all aspects and all components of the IS.


Furthermore, the ontology community has become increasingly aware that a single representation is sometimes inadequate, owing to levels of granularity and context-dependent characteristics. Therefore, ontologies should also provide the ability to integrate different user perspectives and representations. We identify three main categories of uses for ontologies. Within each, other distinctions may be important, such as the nature of the software, who the intended users are, etc.

4.1.4.1 Communication
Ontology-based human communication aims in particular at reducing and eliminating terminological and conceptual confusion by defining a shared understanding, i.e., a unifying framework enabling communication and cooperation amongst people for reaching a better enterprise organization. Ontologies offer the possibility to define normative models to be used within any large-scale community. Presently, one of the most important roles an ontology plays in communication is that it provides unambiguous definitions for the terms used in a software system. Examples of use cases have been developed in many areas, such as:

• Disease Ontology project intends to create a comprehensive hierarchical and controlled vocabulary for human disease representation (http://diseaseontology.sourceforge.net);

• FAO (Food and Agriculture Organization of the United Nations), is committed to help information dissemination by providing consistent access to information for the community of people and organizations through the Agricultural Ontology Service (AOS) project (http://www.fao.org/agris/aos/);

• The Open-edi ontology, developed within ISO/IEC JTC 1/SC 32 (http://www.jtc1sc32.org/), defines an ontology for data management and interchange between enterprises. A snapshot of this model is described in Appendix B.

4.1.4.2 Interoperability
Interoperability is the ability of several systems to cooperate for a common purpose. It includes the possibility of exchanging data between systems and of calling services across many platforms. The main obstacle to interoperability is heterogeneity, which includes structural (schema) heterogeneity and semantic heterogeneity [Kashyap 96]. Ontologies can effectively help to resolve the problem of semantic heterogeneity and to reach a well-established interoperability process. They can act as a conceptual model representing enterprise consensus semantics within an integrating environment (e.g. a global architecture). This environment, covering enterprise-wide systems, enables different software tools and information systems to collaborate.
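To make this concrete, the following sketch (with invented system and field names) shows how two systems with heterogeneous schemas can exchange records by mapping their local fields onto concepts of a shared ontology rather than onto each other directly:

```python
# Sketch (hypothetical names): two information systems resolve their
# structural heterogeneity by mapping local field names to concepts
# of a shared enterprise ontology, then exchanging records via those
# concepts rather than via their local schemas.

# Local-schema-to-ontology mappings for two systems.
CRM_MAPPING = {"cust_name": "Customer.name", "cust_tel": "Customer.phone"}
BILLING_MAPPING = {"client": "Customer.name", "telephone": "Customer.phone"}

def to_ontology(record, mapping):
    """Lift a local record into shared ontology concepts."""
    return {mapping[field]: value for field, value in record.items()}

def from_ontology(concepts, mapping):
    """Project shared ontology concepts back into a local schema."""
    inverse = {concept: field for field, concept in mapping.items()}
    return {inverse[concept]: value for concept, value in concepts.items()}

def exchange(record, source_mapping, target_mapping):
    """Translate a record from one system's schema to another's."""
    return from_ontology(to_ontology(record, source_mapping), target_mapping)

crm_record = {"cust_name": "Durand", "cust_tel": "04-72-00-00-00"}
billing_record = exchange(crm_record, CRM_MAPPING, BILLING_MAPPING)
```

With n systems, this design needs only n mappings to the shared ontology instead of n(n-1) pairwise schema translations, which is precisely the benefit of the integrating environment described above.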

4.1.4.3 System Engineering
The applications of ontologies considered to this point have focused on the role ontologies play in the operation of software systems (i.e. at runtime). In this section we consider applications of ontologies that support the design and development of the software systems themselves (i.e. at development time). Specification: Ontology covers, by definition, both specification and conceptualization. The basic idea is to use an ontology to model the application domain and to provide a vocabulary for specifying requirements. In other words, a shared understanding of the problem and the task at hand can assist in the specification


of software systems. This ontology helps considerably during the development of the software application. An approach consisting of specifying local ontologies in support of semantic interoperability was suggested in [Benaroch 02] and has been applied to distributed inter-organizational applications. This use of ontology as an aid for identifying specifications is also considered in many research works, such as [Cranefield 99] and [Osterwalder 02]. In [Jasper 99], a set of scenarios for the use of ontologies is presented, along with a classification of ontology applications. One of these scenarios consists of using an ontology as specification by building a conceptual model of the domain in UML, which then comprises the explicit ontology for the application. Either way, the ontology’s role in specification varies with the degree of formality of the system design methodology:

• An informal ontology facilitates the process of identifying requirements and understanding the organization and deployment of the system. It is particularly helpful for systems involving a distributed team of designers working in different domains.

• A formal ontology provides a declarative specification of the software system. It allows the designers to reason about what the system is designed for, rather than about how its functionality is supported.

Reliability: reliability is the ability of a system or component to perform its required functions under stated conditions. We can differentiate between the roles an ontology might play in the reliability of software systems. Informal ontologies can improve reliability by serving as a basis for manually checking the design against the specification. Formal ontologies enable semi-automated consistency checking of the software system with respect to its declarative specification. Reusability: the accomplishment of reusability largely depends on sharing a similar conceptualization. By characterizing classes of domains and of tasks within these domains, ontologies provide a framework for determining which aspects of a system are reusable between different domains and tasks. Indeed, a clear semantics is needed for the concepts related to the components being reused. Thus, ontology-driven IS can import and export modules and components between their systems. One of the key issues for reusability is the "de-contextualization" of the knowledge or ontology [Mizoguchi 97]. Every piece of knowledge or ontology is tuned to the context in which it is expected to be applied. The first thing to do is to formalize the context and then establish the terminological correspondences; these two steps are necessary, though not sufficient, for making knowledge and components reusable. The underlying problem is that when software tools are applied to new areas or contexts, they may not perform as expected, since they relied on assumptions that were satisfied in the original applications or contexts but not in the new ones. By characterizing classes of domains, tasks within these domains, and their coordination rules, ontologies provide the possibility of generating frameworks for determining which parts of an ontology are reusable between different domains and contexts. A further process of completing and adjusting the framework may be needed before


running the new application, identifying along the way which objects, methods, and components can be reused between the two contexts. For instance, considering ontologies together with the tools that we have suggested (QELT and the EDI Translator), both based on the common Mapping Expressions Model, we could reuse components implemented in one to speed up the development of the other.

Figure 4.2: Ontology architectures

4.1.4.4 How can we use ontologies for semantic sharing?
Ontologies can also be used for the identification and association of semantically corresponding information concepts. There are many ways in which ontologies may be employed to establish this correspondence and share semantics. In general, three different directions can be identified: single ontology approaches, multiple ontology approaches, and hybrid approaches. Figure 4.2 gives an overview of the three main architectures as described in [Wache 01]. We discuss these three architectures in the following. Single ontology approach: this approach consists of using a global ontology to provide a shared vocabulary and specification of the semantics. Thus, all information systems are necessarily related to this global ontology. Sometimes it is necessary to combine (import) a new ontology in order to take into consideration a new system plugged into the global architecture. Nevertheless, this process is possible only when the imported ontology does not have a different view of the domain (e.g. different granularity or perspectives). In this approach, maintaining the global ontology and the minimal ontological commitments is a hard task. Multiple local ontologies approach: this approach consists of defining a local ontology for each information system. These ontologies do not necessarily share the same semantics, and no common minimal ontological commitment is needed. The local ontologies can be developed independently and can simply be added to or removed from the architecture without requiring any change. On the other hand, the lack of a common vocabulary makes it difficult to make the systems work together. To overcome this problem, an additional representation formalism defining the inter-ontology mapping is needed. The inter-ontology mapping identifies semantically corresponding terms of the different source ontologies, e.g. which terms are semantically equal or similar.


Hybrid approach: it brings together the single and multiple approaches. The semantics of each system is defined locally, as in the multiple approach, but in order to make the local ontologies comparable to each other, they are built from a global shared vocabulary [Goh 97]. The shared vocabulary contains the basic terms (the primitives) of a domain, which are combined in the local ontologies to describe more complex semantics. Sometimes the shared vocabulary is itself an ontology, as described in [Stuckenschmidt 00]. The advantage of a hybrid approach is that new sources can easily be added without the need for modification. It also supports the acquisition and evolution of ontologies. The use of a shared vocabulary makes the source ontologies comparable and avoids the disadvantages of the multiple ontology approaches. To conclude, in this section we used the word mapping to signify the relation or connection between ontologies, or between parts of one ontology and another ontology. This notion has been treated differently in many studies, for example as alignment between ontologies [Sowa 00], defined mappings [Peerce 99], lexical relations [Mena 96], and top-level grounding [Calvanese 01].
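As a toy illustration of the hybrid approach (all vocabulary items and terms below are invented), local terms can be described as combinations of primitives from the shared vocabulary, which makes terms of independently developed ontologies directly comparable:

```python
# Sketch (hypothetical vocabulary): in the hybrid approach each local
# ontology describes its terms as combinations of primitives drawn from
# a shared vocabulary, which makes terms from different sources comparable.

SHARED_VOCABULARY = {"person", "organization", "buys", "sells", "goods"}

# Each local term is described by the set of shared primitives it combines.
ONTOLOGY_A = {"Customer": {"person", "buys", "goods"}}
ONTOLOGY_B = {"Acheteur": {"person", "buys", "goods"},
              "Fournisseur": {"organization", "sells", "goods"}}

def well_formed(ontology):
    """A local ontology may only use primitives of the shared vocabulary."""
    return all(description <= SHARED_VOCABULARY
               for description in ontology.values())

def semantically_equal(term_a, term_b):
    """Two terms correspond if they combine the same shared primitives."""
    return ONTOLOGY_A[term_a] == ONTOLOGY_B[term_b]
```

Here the comparison is a naive set equality; real systems use richer descriptions and subsumption reasoning, but the principle — comparability through a shared set of primitives — is the same.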

4.1.5 Overview of languages, implementation tools, and applications

During the last decade, a vast number of studies have been conducted on ontologies. After importing the word from philosophy and identifying its meaning for the computer science community, researchers studied representation languages and implementation tools along with ontology engineering methods. The second interest of the community was to provide examples of developed ontologies: projects such as the CYC upper ontology1, LinkBase (1 million medical concepts, 420 link types)2, and the Towntology project [Keita 04] aim to provide useful, ready-to-use ontologies. In addition, a set of libraries has become available to access, reference, or import ontologies during the development cycle, among them the DAML ontology library3, Ontolingua4, the OWL ontology library5, and the Kactus library6.

4.1.5.1 Ontology representation languages
The development of ontology languages, i.e., languages for expressing ontologies at the implementation stage, has been motivated by the wide use of ontologies in diverse domains that may require different levels of expressiveness and inference mechanisms. We can stratify the ontology representation languages as follows: Description logic-based: there is an overwhelming dominance of systems using some variant of description logics as an ontology representation language. We find in this category Loom [MacGregor 91], CLASSIC [Borgida 89], CARIN [Goasdoué 99], OIL [Fensel 01], etc. Last but not least, description logics were chosen as the underlying formalism for OWL7 (Ontology Web

1 http://www.opencyc.org/
2 http://www.landcglobal.com/pages/linkbase.php
3 http://www.daml.org/ontologies/
4 http://www.ksl.stanford.edu/software/ontolingua/
5 http://protege.stanford.edu/plugins/owl/owl-library/index.html
6 http://web.swi.psy.uva.nl/projects/NewKACTUS/library/library.html
7 http://www.w3.org/TR/owl-features/


Language) suggested by the W3C for realizing the Semantic Web paradigm. These languages vary in their constructors (e.g. logical operators, slot constraints, axioms), in the efficiency of their reasoning support, and in the soundness and completeness of their algorithms. A comparison of the features and expressiveness of a set of these languages can be found in [Wache 01]. Frame-based: this family covers languages such as Frame Logic (F-logic) [Kifer 95], the KIF (Knowledge Interchange Format)-based Ontolingua [Farquhar 97], and OKBC (Open Knowledge Base Connectivity) [Chaudhri 98]. On the one hand, they offer common elements for the definition of concepts and relations; on the other hand, they are based on first-order logic axioms and differ in expressiveness and computational properties. Only OKBC does not provide an axiom language sufficient for the description of terminological axioms [Wache 01]. We should also refer to [Corcho 00], which provides an evaluation of the ontology languages mentioned above. Other approaches: besides the two main families of languages, we can identify several less used approaches. Formal concept analysis is based on well-founded mathematical models such as lattice theory. Object languages: the object-oriented paradigm offers a set of techniques and methods that can be used for modeling ontologies. Although [Cranefield 99] suggested the use of UML and OCL as an ontology language, this technique lacks support for reasoning services. As an indication, Table 4.1 compares the expressiveness of ontology languages using an XML syntax.

Table 4.1 Comparison between XML-based ontology languages ([Rifaieh 01])

4.1.5.2 Ontology development techniques, implementation tools, and methodologies

Ontologies are developed by a team including domain experts, who have the domain knowledge, and ontologists, who can formally model knowledge. Ontologies are usually developed using special tools that can model rich semantics and assist team members in building ontologies (e.g. Protégé1).

1 http://protege.stanford.edu/

Language  | Data types              | Types of properties  | Property elements         | Classes
          | Primitive | Numeric     | Transitive | Inverse | Import   | Individual   | Negation/        | Inheritance
          | data-type | min, max    |            |         | element  | element      | Disjoint classes |
----------+-----------+-------------+------------+---------+----------+--------------+------------------+-----------
XOL       | Yes       | Yes         | No         | Yes     | No       | Yes          | No               | No
OIL       | No        | No          | Yes        | Yes     | No       | Yes          | Yes              | —
RDFS      | No        | No          | No         | No      | No       | Yes          | No               | Yes
DAML+OIL  | Yes       | Yes         | Yes        | Yes     | No       | Yes          | Yes              | Yes
OWL       | Yes       | Yes         | Yes        | Yes     | Yes      | Yes          | Yes              | Yes

(—: not specified)
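One feature from the comparison above, transitive properties (supported by DAML+OIL and OWL), illustrates the reasoning support that sets DL-based languages apart. The following sketch, with invented facts, mimics the inference a reasoner performs for a property declared as owl:TransitiveProperty:

```python
# Sketch: a minimal illustration of the kind of inference a DL-based
# language such as OWL licenses when a property is declared transitive
# (owl:TransitiveProperty). The facts below are invented for illustration.

def transitive_closure(pairs):
    """Saturate a set of (x, y) facts under transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Asserted facts for a property partOf declared transitive:
FACTS = {("engine", "car"), ("piston", "engine")}
INFERRED = transitive_closure(FACTS)
```

From partOf(piston, engine) and partOf(engine, car), a reasoner infers partOf(piston, car); languages without transitivity support (e.g. RDFS, per the table) cannot license this inference.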


Currently, no available tools support the complete development cycle, but many tools exist for tasks such as ontology capture, representation, visualization, and editing [Arara 02]. The authors of [Duineveld 00] gave a comparative study of existing ontological engineering tools. Gruber laid the foundations of ontology engineering in [Gruber 93]. He proposed a set of design criteria to guide the development of ontologies, discussed the basic design principles required to construct formal ontologies, and suggested a compromise among conflicting design features, namely: clarity, coherence, extendibility, minimal encoding bias, and minimal ontological commitment. Afterwards, extensive research was carried out on methodologies (e.g. Methontology [Fernández-López 99]), creation (e.g. Open Ontology Forge [Kawazoe 03]), and versioning (e.g. OntoView [Klein 02]). As for ontology development, many approaches have been suggested, sometimes based on a domain vocabulary and sometimes on an explicit domain model. Regardless, we can group development methods into three main categories: top-down, bottom-up, and middle-out. A brief summary of these approaches follows: Top-down: recommended by Sowa in [Sowa 95]; the ontology is built by first determining the top (most generic) concepts and then specializing them to build the structure. Bottom-up: proposed in [Van Der Vet 98]; the ontology is built by first determining the most specific concepts and then generalizing them into a conceptual hierarchy. Middle-out: suggested in [Uschold 96]; core concepts are identified in each domain, then generalized and specialized to complete the ontology.
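The top-down and bottom-up strategies can be contrasted in a small sketch (concept names invented for illustration); both arrive at the same taxonomy, but build it from opposite ends:

```python
# Sketch (invented concepts): the same small taxonomy reached top-down
# (specializing a generic root) and bottom-up (generalizing leaf concepts).
# The taxonomy maps each concept to its list of direct specializations.

def top_down():
    """Start from the most generic concept and specialize."""
    taxonomy = {"Thing": []}
    def specialize(parent, child):
        taxonomy.setdefault(parent, []).append(child)
        taxonomy.setdefault(child, [])
    specialize("Thing", "Document")
    specialize("Document", "Invoice")
    specialize("Document", "PurchaseOrder")
    return taxonomy

def bottom_up():
    """Start from the most specific concepts and generalize."""
    taxonomy = {concept: [] for concept in ("Invoice", "PurchaseOrder")}
    def generalize(children, parent):
        taxonomy.setdefault(parent, []).extend(children)
    generalize(["Invoice", "PurchaseOrder"], "Document")
    generalize(["Document"], "Thing")
    return taxonomy
```

The middle-out strategy would start from the core concept (here, Document) and grow the hierarchy in both directions.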

4.1.5.3 Ontology applications
It is almost impossible to enumerate all the applications where ontologies have been exploited, but we can classify them according to some key technological and technical trends, as follows: Interoperable ontology-based information systems: many research works have focused on ontologies and semantic interoperability. For instance, [Fonseca 99] introduces a geographic information system architecture based on ontologies and studies the interoperability of the proposed system. The Ontobroker project [Decker 98] studies interoperability for semi-structured data in the context of the Web. Information searching and the Semantic Web: ontologies allow a content-based search instead of a keyword-based search. Several Web search services (e.g. Yahoo, Google) use ontology-based techniques. The use of ontologies for realizing the Semantic Web is currently a very active research area (e.g. OWL) [Berners-Lee 04]. Intelligent metadata and specification of information technology (IT) systems: an important role played by ontologies is the explicit description of specifications for information systems, as proposed in [Cranefield 99] and applied in the domain of management software tools in [Osterwalder 02]. Ontology can


easily be used to describe metadata for application systems. In [Van Zyl 99], an ontology-based metadata representation for data warehouse systems is presented. Query and reasoning support services: ontology-based queries allow for rich semantics that lead to precise query results. Formal ontologies based on description logics support reasoning services and provide users with inferred results for their queries. For instance, the OBSERVER project [Mena 96] studies query processing in global information systems, where separate ontologies are used for describing the information sources. Some scenarios of using ontologies for query answering are presented in [Ceusters 03]. The above categories show the widely diverse spectrum of possible applications of ontologies. In addition, a framework for understanding and classifying ontology applications was presented in [Jasper 99], and a survey of existing approaches and applications for integrating information using ontologies is given in [Wache 01]. It should also be noted that these applications have similarities and common factors: they all need to share the meaning of terms in a given domain, and they all strive to work and communicate in cooperative environments. Through the use of ontology, barriers among applications in a specific domain of interest can be drastically reduced and semantics can be more easily shared.

4.2 Difference between Enterprise Ontology and Local Information Systems Ontology

As previously presented, ontologies are a suitable technology for semantic sharing. Thus, it is natural to use them to improve an enterprise's domain of activities. Indeed, achieving coherence and rapid partnership is one of the major challenges enterprises face. This consists of sharing a common understanding of a given application domain and exchanging information between systems. In this respect, we differentiate between two uses of ontologies in the context of enterprises. Firstly, ontologies can be used for modeling enterprise activities and business organization. Secondly, local enterprise ontologies can be used for modeling enterprise information systems and capturing specifications (ontology-driven IS). The aim of the latter is to support semantic sharing for distributed and inter-organizational applications. Some will argue against this stratification, because modeling an enterprise information system comes down to modeling a business activity. In fact, each use applies a different strategy for building a specific enterprise ontology (bottom-up or top-down) and differs in its implementation goals: the first assumes very general concepts, whereas the second defines specific system concepts (a local ontology). Let us begin by studying each of these approaches; we will then compare them.

4.2.1 Enterprise ontologies

The objective of an enterprise ontology is the conceptualization of the common economic phenomena of a business enterprise unaffected by application-specific demands. We find in the research literature


many projects that aim at modeling enterprise activities. We can enumerate the Core Enterprise Ontology [Bertolazzi 03], the Edinburgh Enterprise Ontology [Uschold 98], and the TOVE ontology1. Moreover, some methodologies and languages have been suggested for capturing enterprise ontologies, such as IDEF52, PIF3, and BEM (Business Engineering Model). IDEF5 is presented as a methodology for capturing an ontology. PIF is a frame language based on KIF and presented as a language for describing processes. BEM is part of the Open Information Model (OIM), whose concepts are described using UML class diagrams. Other proposals cannot be considered enterprise ontologies as such, like the MIT Process Handbook4, which represents about four thousand classified processes; this handbook can nonetheless be very useful for identifying enterprise activities and business processes. The REA ontology [Geerts 00], for its part, focuses in detail on internal enterprise business processes, but not on inter-enterprise collaboration. In the following, we sketch some enterprise ontologies and show the difficulties in creating such ontologies.

4.2.1.1 Edinburgh Enterprise Ontology (EEO)
The Edinburgh Enterprise Ontology [Uschold 98] is a collection of terms and definitions relevant to business enterprises. It proposes a set of carefully defined concepts that are widely used for describing enterprises in general and that can serve as a stable basis for specifying software requirements. Conceptually, the EEO is divided into a number of main sections describing enterprise aspects such as activities and processes, organisation, strategy, marketing, etc. The EEO is represented both informally (a text version) and formally, in Ontolingua. This enterprise ontology contributes to the reuse of business models; nevertheless, it covers only general and abstract concepts, so it is very hard to construct concrete, operational business models from it.

4.2.1.2 Core Enterprise Ontology (CEO)
The Core Enterprise Ontology [Bertolazzi 03] comprises a categorization of enterprise concepts and a first proposal of an upper ontology. A specific enterprise ontology can be built top-down starting from CEO, through refinement and decomposition hierarchies. CEO gathers the most general business concepts, common to the majority of enterprises, independently of the specific field of activity. It also includes a concept definition language, which consists of refining the concepts already existing in CEO in order to build a specific enterprise ontology [Bertolazzi 01]. CEO recommends modeling a specific industry sector, or even a specific enterprise, by starting with a few well-established general concepts that will guide business experts in defining their enterprise ontology.

1 http://www.eil.utoronto.ca/enterprise-modelling/tove/index.html 2 http://www.idef.com/ 3 http://ccs.mit.edu/pif/ 4 http://ccs.mit.edu/ph/


4.2.1.3 Toronto Virtual Enterprise (TOVE) ontology
The TOVE project has formally defined a set of concepts general enough to allow their use in different applications and, for each concept, a set of Prolog axioms that define the semantics of the concept. The definition of each concept is given in first-order logic. The concepts, similarly to the Edinburgh Enterprise Ontology, are grouped into thematic sections. The ontology contains generic concepts like time, causality, activity, and constraint. For each concept, properties and relations are also defined. The concepts are structured into taxonomies and are represented by constants and variables; the attributes and relations are represented with predicates.

4.2.2 Specifying local Enterprise Information System Ontology

In the traditional software engineering domain, a new conceptualization is developed for each new application to be built. Given the contribution of ontologies to this domain, they can be used as specifications for high-level reusable software, like domain models and frameworks [Girardi 03]. The main idea consists of providing a vocabulary for specifying the requirements of one or more target applications [Jasper 99]. Unfortunately, applications developed using traditional IS specification methods do not make their local ontology explicit, because their underlying semantic information is usually implicit in the software [Benaroch 02]. Therefore, if we do not start by modeling and using an ontology, it will be hard to extract the semantics from the implemented systems. In that case, we can proceed in three ways to recover these semantics: reading the source code, analyzing runtime behavior, and interviewing users. An ideal alternative would be to create applications based on an IS specification method that makes their local ontologies explicit in the first place (ontology-driven IS). For instance, UML defines several types of diagrams that can be used to model the static and dynamic behavior of a system. It is widely adopted in industry and has a very large and rapidly expanding user community. Designers of enterprise information systems are more likely to be familiar with this notation than with KIF or DL, which are not widely known outside the AI research community [Cranefield 99]. In UML, ontology information is usually modeled with class diagrams to depict the concepts of the domain, object diagrams to show particular named instances of those classes, and the Object Constraint Language (OCL) to offer powerful expressiveness for constraints.
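As a rough sketch of this idea (the Order class and its invariant are invented for illustration), an OCL invariant attached to a UML class can be rendered as an explicit predicate on instances, making part of the local ontology's semantics checkable:

```python
# Sketch (invented example): a UML class with an OCL-style invariant made
# explicit as a predicate. An OCL invariant such as
#   context Order inv: self.quantity > 0 and self.unitPrice >= 0
# becomes a method evaluated on instances.

class Order:
    """UML class Order, with attributes quantity and unit_price."""

    def __init__(self, quantity, unit_price):
        self.quantity = quantity
        self.unit_price = unit_price

    def satisfies_invariants(self):
        """Evaluate the OCL invariant on this instance."""
        return self.quantity > 0 and self.unit_price >= 0
```

An ontology-driven IS would check such invariants against the specification, rather than leaving them implicit and scattered through application code.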

4.2.3 Local enterprise information system ontology versus global enterprise ontology

In general, a system implements a certain number of business processes, but with a specific use and intention. Although enterprise ontologies can be very useful for IS requirements engineering, they are not easy to understand and manipulate. We summarize below some drawbacks of enterprise ontology approaches:

• It is difficult to design a precise and comprehensive enterprise ontology. Modeling a specific industry sector, or even a specific enterprise, is a more difficult challenge than modeling a set of systems.


• Some enterprise ontologies are very generic and difficult to personalize for a specific activity.

• Even assuming that these ontologies exist and are effective, no guideline is provided on how to use the predefined concepts to derive the specification of the enterprise information systems.

• Systems designers prefer to work on their systems independently, with a basic vocabulary, rather than a very tight enterprise ontology.

• Most of them follow a top-down approach, whereas bottom-up seems a more realistic method for capturing a system’s requirements.

• These ontologies adhere to a single ontology approach for sharing semantics, whereas multiple or hybrid approaches are more promising.

In order to avoid the gap between top-level enterprise design and specific system design, we believe that using ontologies in the enterprise should combine the specification of each system with a common vocabulary (which can itself be an ontology); in other words, the hybrid approach to sharing semantics. Therefore, in order to reach ontology-driven EIS, the elicitation and modeling of software requirements can be accomplished by binding local ontologies and a global ontology. Thereby, a local EIS system ontology can be identified based on a top-level enterprise ontology and on the particular requirements of the implemented application. Hereafter in this thesis, we are interested in specifying local ontologies for each information system, i.e., local ontology-driven information systems, without assuming the existence of a global enterprise ontology. These local ontologies represent the system’s context and its users’ points of view. A more complete solution can be reached by extending our work and combining the local ontologies with a specific enterprise ontology.

4.3 Our Vision for Promoting Ontologies within the Enterprise

Although we have shown the adequacy of ontologies for enterprise evolution and for a set of essential productivity goals, only a small number of enterprises have yet embarked on this adventure. Studies of ontology use in practical enterprise domains show that more attention should be given to this promising technique. In April of this year, the market research firm Gartner1 identified taxonomies/ontologies as one of the leading IT technologies, ranking it third in its list of the top 10 technologies forecast for 2005 [Denny 04]. However, market interest has not matched these predictions: through 2006, more than 70 percent of firms that invest in unstructured information-management initiatives will not achieve their targeted return on investment, due to underinvestment in taxonomy/ontology building [Knox 03]. Therefore, putting ontology to work in an enterprise environment requires changing the way we think and act, and IT decision makers do not yet seem ready for this mind shift. In the next section, we discuss how to overcome this fear of adopting ontology techniques in enterprises, allowing them to take their place in the production chain.

1 http://www4.gartner.com/Init


4.3.1 Is it ontology-phobia or ignorance?

Based on our three years of practical experience at TESSI Informatics & Consulting and on our discussions with IT executives and other consultants, we have perceived a persistent ignorance of ontology applications and their advantages. Currently, few IT decision makers even know that ontologies exist, and those that do have developed a fear of ontologies as a promising technology. This exaggerated and inexplicable fear translates into an ontology-phobia, ruining ontologies’ chances of taking their place in the enterprise architecture. Let us summarize the elements that contribute to this situation:

• Enterprises are goal-oriented, with high profit targets. They generally consider ROI (Return On Investment) as the measure of every activity. In an uncertain economic environment, they tend towards short- or medium-term investments. Surprisingly, it is sometimes preferred to keep repeatedly fixing existing systems rather than re-conceiving the whole set of system applications.

• Ontology development is a complicated process in which ontologists and computer scientists have to work together: mutual work is required to classify the concepts and conceive their structure and knowledge base rules. All these elements combine to make ontology development a very expensive and risky investment. Therefore, ontology developments are still limited. In contrast, adapting ontologies from existing ones seems a more attractive approach.

• Enterprises that adopt ontologies must embrace new architectures and conceive their systems differently than they are used to. Sometimes they may have to build their own solutions using tools from several vendors. At the largest scale, when the amount of information requires a huge number of concepts and categories that change frequently, this exercise becomes a laborious and often error-prone process.

• Enterprises doubt the maturity of ontological technologies and their applications, arguing that the announced goals are not yet reachable and that cost-effectiveness is low.

However, in the long term of an enterprise's evolution, there will be an urgent need for conceptualization. Thus, a set of enterprise domain ontologies is needed to support this evolution. The definition of ontologies is being investigated for organizing meta-data in data warehouse systems and for incorporation into enterprise mediator systems. For instance, some studies consider improving electronic data exchange, classifying enterprise catalogues, improving client relationship management, and optimizing knowledge management.

4.3.2 Defeating ontology-phobia:

To defeat ontology-phobia, we argue along two lines. Firstly, we assure interested parties that the ontological approach has already attained maturity, thanks to the existing components of ontology techniques. This research and these applications were developed to answer the basic requirements. Other arguments can be cited, such as:

• Well-organized ontologies and knowledge bases can save an enterprise money by decreasing the amount of employee time spent trying to find, integrate, share, and reuse information in the myriad of EIS.

Page 138: Thèse Utilisation des ontologies contextuelles pour le partage

Chap 4- Ontology in the Service of EIS and Semantic Sharing 112

• Many satisfying results have been attained in the last decade, including formal ontologies, languages, engineering methodologies, tools, etc. In particular, we have almost all the elements required to design, develop, and construct ontologies. Other research areas, such as versioning, evolution, and maintenance of ontologies, are current and interesting issues for investigation.

• Efficient tools are available for the development of enterprise ontologies. A recent census of ontology development tools, covering many application areas, counted up to 105 tools [Denny 04].

Secondly, we should take the responsibility of making ontologies more attractive; therefore, a certain number of works should be carried out:

• Academic research should promote the originality of these techniques and prove their efficiency. Recently, many sessions and conferences have considered enterprise applications of ontologies, which are becoming more and more in vogue.

• Developing ontologies and making them widely accessible. Indeed, enterprises are more interested in adapting existing domain ontologies to their business with a "minor" process of adaptation, rather than constructing an ontology from scratch.

• Defining a methodology for adapting ontologies and showing success stories in widespread domains are non-negligible arguments.

• Presenting enterprise architectures and case studies where ontology techniques play the main role. This thesis adheres to this goal: it takes the responsibility to argue the usefulness of ontology-driven IS and shows an architecture and a case study (in Chapter 6) as supporting elements.

4.4 Context and ontology

Like ontology, the notion of context emerged in many disciplines, such as philosophy, linguistics, and cognitive psychology, before arriving in computer science. Focusing on knowledge representation in AI, contexts appear as a means of partitioning knowledge into manageable sets [Hendrix 79], or as logical constructs that facilitate reasoning activities [Guha 91]. McCarthy introduced, in [McCarthy 93], contexts as abstract mathematical entities with properties, which constituted one of the first formalizations of context.

4.4.1 The notion of context in information systems

In information systems, the notion of context was carried through views, aspects, roles, and workspaces. In particular, context is defined in multidatabase systems as a key component that captures the semantics related to an object's definition and its relationships to other objects. In software engineering, the notion of viewpoints (i.e. contexts) has been used to support the actual development process of complex systems. In this respect, a viewpoint is defined as a locally managed object or agent, which encapsulates partial knowledge about the system and its domain. Views are also used in requirements engineering [Nuseibeh 94] "as a vehicle for separation of concerns". The advantages of using the notion of context in several disciplines have been studied in [Akman 96]. They can be summed up as: economy of representation; efficiency of reasoning; allowing inconsistencies and contradictory information; resolving lexical ambiguity; and flexible entailment. In particular, the notion of context is becoming more interesting with the proliferation of loosely coupled systems and Peer-to-Peer (P2P) technology, where context is used to represent peer knowledge [Gold 01]. Recently, in context-aware computing, formal approaches have been taken to model context-based logical reasoning mechanisms. Context reasoning is used to achieve the consistency of contexts and the deduction of high-level, implicit contexts from low-level explicit ones [Wang 04].

Furthermore, the notion of context can help to solve the problem of multi-represented elements and their manipulation. There is currently a crucial need for an abstraction mechanism that deals with an explicit description of multi-represented concepts [Balley 04] and with the semantics of context-dependent objects [Serafini 97]. With multi-represented concepts, the same data element might be used by different entities of different applications to mean different things. Different data element names could also be used to represent the same things, potentially creating hundreds of instances of the same data, all inconsistently named. In addition, context semantic sharing considers the notion of context as a way of restructuring and organizing information elements according to the context in which they occur.

4.4.2 Contextualizing local ontologies (pairing-up ontology and context)

In the multiple and hybrid ontology approaches (ontology architectures) presented previously, the interesting point is how the local ontologies are described. The local description of information is called context, as used in COIN [Goh 97] in the form of an attribute-value vector. Hence, the role of the ontology is to describe the terms and the structure of the concepts, while context is used to label the belonging of information elements when a mapping is performed between the two representations. The paradigm combining ontology and context can achieve an explicit specification of conceptualization from different perspectives or contexts. Contextualizing local ontologies, as an abstraction mechanism, comes down to contextualizing ontological concepts and attributes, in such a way that ontological information is provided according to a specific interest, purpose, and level of detail. Thus, the notion of context is used as a form of views to deal with concepts from different perspectives. Context is also used to handle inconsistent and contradictory concepts in the same ontology base, as long as they are treated in different contexts. We consider that a contextual ontology is an ontology whose concepts can be seen from different points of view (multi-represented concepts). This idea respects the notion defined in [Arara 04]: "Context in ontology representation can play a major role in structuring and packaging concepts and their relationships according to user needs and interests. Hence, contexts will be used in our approach as an abstraction mechanism that enables the partitioning of ontological concepts and their links in order to resolve the conflicts and differences resulting from the multiple representation problems. It should be restated that we hypothesize in this thesis that a real world phenomenon can be multi-represented, and it is a fact of life that several representations (each representation is associated with a context) of the same, sharable object are possible." Therefore, this suggestion to pair up the two essential elements (context and ontology) is mainly proposed to deal with the problems of information modeling and semantic heterogeneity, in particular the multiple views and multi-representation problems. In other words, this approach differs from those that use context modeling in the sense that ontology with multi-represented concepts and context with multiple views are our prime concern. The term Contextual Ontologies will be used to indicate that the ontology we are dealing with is context based, referring to this paradigm.

4.4.3 Requirements for Contextual ontologies in EIS

From the previous discussion, the use of ontologies, and of contextual ontologies in particular, can: reduce the loss of semantics in information exchange among heterogeneous applications; improve reusability (e.g. of objects, components, etc.); and provide a global view over enterprise architectures, infrastructures, and applications. Therefore, contextual ontologies are well placed to help resolve semantic sharing among EIS. We argue that semantic sharing can be enabled through the Contextual Ontologies paradigm by considering:

• Establishing the context semantic representation via local ontologies and expressing their belonging using a special mechanism.

• Defining a semantic mapping mechanism (i.e. bridge rules or coordination rules) among the multi-represented concepts of Contextual Ontologies, and treating these mappings with logical theories.

• Defining a global inference mechanism that is based on the local ontologies' knowledge and the semantic mappings.

4.4.4 What problems do contextual ontologies help solve in EIS?

Like ontologies, contextual ontologies aim first at helping with heterogeneous system problems (e.g. heterogeneous databases), whereby different organizational units and service providers have radical differences between their systems, including syntactic (i.e. format), structural (i.e. schema), and semantic (i.e. meaning or interpretation) ones. They all speak different languages for access, description, schemas, and meaning. Based on the previous discussion, we strongly argue that using a contextual ontology as a common description for EIS can make systems reusable and interoperable, along with providing a global query answering service. This contribution appears broadly in enterprise-wide system interoperability, currently systems-of-systems and vertical stovepipes, where contextual ontologies act as a conceptual model representing the enterprise's shared semantics. In addition, a contextual ontology can be used to improve system reusability by identifying semantically similar elements and revealing their reusable code. Finally, it can provide a global understanding to be used in query answering to obtain only meaningful, relevant information.

4.5 Summary

After studying ontologies per se as a formal solution for EIS communication, interoperability, and reusability, we showed the position of these ontologies in the enterprise architecture. Next, we argued the relevance of defining ontologies for improving EIS. A major theme for the use of ontologies in this domain is the creation of a shared model and a common environment for different information systems and software tools. Any information technology application should use these integrated enterprise models, spanning activities, resources, and services. We identified, as well, the limits of simple ontologies in answering EIS needs, especially when facing the multiple views and multi-representation problems. Therefore, we suggested pairing up the two notions of ontology and context to overcome these problems. Indeed, context is closely attached to the semantics of an unambiguous set of information; in other words, information can only be understood in its own context. Thus, context is exploited to capture contextual information in order to enable the implementation of concepts with multiple perspectives. Hence, the formal representation of contextual ontologies should preserve adequate reasoning mechanisms, namely: concept satisfiability, concept subsumption, concept consistency, and instance checking. A machine-understandable semantics and interpretation should be given to information in a context, according to a specific point of view. In the next chapter, we will seek a formalism for contextual ontologies. This formalism should cover the mechanisms required by Contextual Ontologies; it should also allow explicit, machine-understandable semantics and logical inference. We will also summarize the work related to contextual ontologies and their formalisms. The work of formalizing Contextual Ontologies with modal description logics was carried out in collaboration with A. Arara [Arara 04] and resulted in a common framework illustrated in [Rifaieh-a 04] and [Rifaieh-b 04].


5. CHAPTER 5

CONTEXTUAL ONTOLOGIES FORMALISM

"He who loves practice without theory is like the sailor who boards ship without a rudder and compass and

never knows where he may cast " Leonardo Da Vinci

In essence, semantics sharing among users of large communities with diversified perspectives is a challenging research direction that requires more attention. At present, concepts are expressed formally as a single representation, in the sense that the representation language defines a unique concept and its properties as a fixed data set. In contrast to this assumption, a real world entity is unique but can have several representations [Benslimane 03] due to various interests, purposes, or perspectives. In fact, the multi-representation phenomenon becomes the norm rather than the exception once interoperation among systems is sought. The notion of context is used as a form of views to deal with concepts from different perspectives [Arara-a 04]. Contextual ontologies are defined at the abstraction level to take into account these diverse points of view and multi-representation.


Chap 5- Contextual Ontologies Formalism 118

From the previous chapter's discussion, contextual ontologies can be considered essential for the growth of and competition among enterprises. Using contextual ontologies requires a formalism for expressing them and a common framework for their use. In this respect, a formal and machine-processable representation becomes essential to satisfy the needs of EIS. In this chapter, we introduce multi-representation through examples from Enterprise Information Systems. We then study the three mechanisms required for contextual ontologies (the stamping mechanism, the semi-global interpretation mechanism, and the semantic similarity mechanism). Next, we suggest using the modal description logics formalism to cope with the requirements of contextual ontologies. Similarly to the use of description logics for expressing ontologies, we show the use of modal description logics for expressing contextual ontologies. More specifically, a language based on ALCN augmented with modal operators will be used for this purpose. Finally, we give an overview of the related work relevant to contextual ontologies and multi-representation issues.

5.1 Examples of Multi-Representation

Taking inspiration from e-commerce, a product can have different definitions, scales, and prices (e.g. with respect to currency). Furthermore, this same product needs to be marketed by many enterprises, each of which has its own representation, which varies in properties and possibly in structure from the other representations. In brief, multiple representations are rooted in the abstraction mechanism, where several conceptualizations are associated with the same object due to factors such as viewpoints, contexts, special interests, etc. Contextual ontologies permit us to overcome this problem by allowing multiple representations of a concept. A set of context-dependent ontologies can be defined and put together without being integrated. Hence, a contextual ontology is an ontology that is kept locally but inter-related with other ontologies through mapping relationships between similar concepts. We argue that using contextual ontologies as a common description for EIS conceptualization (based on multiple views of a system specification) can give systems the ability to easily share more semantics.

Figure 5.1: A part of the UML model for HRIS

Figure 5.2: A part of the UML model for PMIS


5.1.1 Example PMIS & HRIS (Example.1):

Let us consider two information systems used in an enterprise: PMIS (Project Management Information System) and HRIS (Human Resource Information System). The UML models in Figure 5.1 and Figure 5.2 represent the mono-representation of each system. These systems contain concepts that use the same identifier and have the same meaning, such as Manager in PMIS and Manager in HRIS, as well as concepts that are named differently but have the same components and structure and are semantically similar, such as Engineer in PMIS and Developer in HRIS. In this case, we can identify that the concepts Manager and Engineer are multi-represented in these systems.
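The detection of such multi-represented concepts can be sketched as follows (an illustrative Python fragment; the concept lists are abridged from Figures 5.1 and 5.2, and the declared similarity set is an assumption of ours, not part of the thesis):

```python
# Illustrative sketch: finding candidate multi-represented concepts
# between PMIS and HRIS, either by a shared identifier or by a
# declared structural/semantic similarity.
pmis_concepts = {"Manager", "Engineer", "Employee"}
hris_concepts = {"Manager", "Developer", "Staff"}
declared_similar = {("Engineer", "Developer")}  # same structure, similar semantics

# Concepts with the same identifier and meaning in both systems:
shared_identifiers = pmis_concepts & hris_concepts
# Multi-represented concepts of PMIS: shared names plus declared matches.
multi_represented = shared_identifiers | {a for (a, b) in declared_similar}

assert shared_identifiers == {"Manager"}
assert multi_represented == {"Manager", "Engineer"}
```

This matches the observation above: Manager and Engineer are the multi-represented concepts of the example.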

5.1.2 Example EDI & DW (Example.2):

With respect to our findings in Chapter 2, we have developed two different systems using the mapping expression model. The first implements the QELT meta-data used to generate the transformation queries. The second implements the mapping guideline of the EDI-Translator, which is used to identify the mapping between EDI messages. Figure 5.3 and Figure 5.4 show snapshots of the UML class models of the implemented systems; the full models are described in Appendix C and Appendix D. We can recognize that these two systems have many multi-represented concepts. For instance, the concepts Mapping, Field, and Record of the EDI Translator are multi-represented with respect to Mapping, Attribute, and Entity of QELT, respectively.

5.2 Studying Contextual Ontologies Mechanisms

The contextual ontologies approach aims at supporting several applications that are associated with several representations of the same real world entities. Our basic idea is to treat such systems as local views, which preserve their local semantics, while allowing coordination (bridge rules) for expressing the communication among locally independent systems. Globally, the approach respects the representation of many points of view of the same concept. It also provides the possibility of defining global knowledge over the local ontologies.

Figure 5.3: A snapshot of UML model for EDI Translator

Figure 5.4: A snapshot of UML model for DW, QELT


In order to obtain the advantages of contextual ontologies, we should consider three mechanisms enabling the previous features: the stamping mechanism, the semi-global interpretation mechanism, and the semantic similarity mechanism. We briefly study these mechanisms in the following.

5.2.1 Stamping Mechanism:

First of all, we need to differentiate between concepts that belong to different contexts. For this reason, we recall the stamping (or trademarking) technique, which can be used to distinguish one representation of an element from its other representations. Thus, multi-representation stamps, or simply stamps, are used to characterize the several representations of a real-world phenomenon in the multi-representation paradigm. A concept can be used in one or more representations, but each representation of a concept is stamped, or labeled, differently. A stamping mechanism characterizing database elements in GIS applications was proposed in the MADS system [Balley 04] to support several representations of data. To guarantee stamping consistency, stamps are obligatorily assigned to every piece of information in order to customize it. Similarly, the stamping mechanism is used in our contextual ontologies to stamp the components of each ontology along with their operators and constructors. The usefulness of this technique consists in resolving the ambiguity of identifying a concept by its context. Hence, this labeling technique permits each concept Ci to be known (identified) by the context Ctxi that it belongs to, e.g., Ctx1:C1, Ctx1:C1 ⊔ Ctx2:C2, etc. A stamped primitive concept denotes a primitive concept that is specifically available in some given contexts. If we consider Example.1 as context C1, we can define C1:Engineer, C1:Manager, C1:Staff, etc. A Description Logics stamping technique respecting the preceding characteristics has been studied in [Benslimane-b 03], for stamping ontologies coded in that description logics formalism.
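The stamping idea can be sketched as a qualified identifier pairing a context label with a concept name (an illustrative Python fragment; the class and attribute names are ours, not part of the thesis):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StampedConcept:
    """A concept identified by the context (stamp) it belongs to."""
    context: str
    name: str

    def __str__(self) -> str:
        return f"{self.context}:{self.name}"

# The same real-world notion stamped in two contexts (Example.1):
c1_engineer = StampedConcept("C1", "Engineer")    # PMIS view
c2_developer = StampedConcept("C2", "Developer")  # HRIS view

# Stamps disambiguate: equality compares (context, name), not name alone.
assert c1_engineer != c2_developer
assert str(c1_engineer) == "C1:Engineer"
```

The frozen dataclass makes stamped concepts hashable, so they can populate the sets and mappings manipulated by the other two mechanisms.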

5.2.2 Semantic Similarity Mechanism:

The second issue concerning contextual ontologies is the ability to define semantic relationships between multi-represented concepts. We need to be able to state that two concepts of two ontologies, though contextually different, are related because they both refer to the same object in the world. Therefore, a directional bridge rule asserting the type of semantic relationship is expressed for these concepts [Rifaieh-a 04]. The directionality of a bridge rule is important information that helps to understand its meaning. We can identify many types of bridge rules, such as those defined in [Borgida 02], [Huhn 02], and [Mitra 00]:

• The identity: a concept A of IS1 is identical to B in IS2 if A ⊆ B and B ⊆ A.

• The subsumption: a concept A from IS1 subsumes the concept B from IS2 if every instance satisfying the description of B also satisfies the description of A; we note B ⊆ A.

• The inclusion: all the instances of B in IS2 have corresponding instances of A in IS1 (A ⊇ B).

• Etc.


For a small ontology, these rules can be identified manually, but this gets more complicated with a real-scale ontology. A semantic matching algorithm can be useful to reduce the complexity of this task; we can draw on the EX-SMAL algorithm to reach this goal. In Example.1, we can define a bridge rule asserting that the concept Employee of PMIS subsumes the concept Manager of HRIS: C1:Employee ⊇ C2:Manager. In Example.2, we can define a bridge rule asserting an identity relationship between the concept Mapping of QELT and the concept Mapping of the EDI Translator: C1:Mapping ≡ C2:Mapping.
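These bridge rules can be sketched as directed assertions between stamped concepts, with identity derived from mutual subsumption (an illustrative Python fragment; the function names and string encoding are ours, not part of the thesis):

```python
# Minimal sketch: bridge rules as directional assertions between the
# stamped concepts of two systems.
bridges = set()

def add_bridge(src, rel, dst):
    """Record a directional bridge rule, e.g. ('C1:Employee', 'subsumes', 'C2:Manager')."""
    bridges.add((src, rel, dst))

def identical(a, b):
    """Identity holds when each concept subsumes the other (A ⊆ B and B ⊆ A)."""
    return (a, "subsumes", b) in bridges and (b, "subsumes", a) in bridges

# Example.1: Employee of PMIS subsumes Manager of HRIS (one direction only).
add_bridge("C1:Employee", "subsumes", "C2:Manager")
# Example.2: the two Mapping concepts subsume each other, hence are identical.
add_bridge("C1:Mapping", "subsumes", "C2:Mapping")
add_bridge("C2:Mapping", "subsumes", "C1:Mapping")

assert identical("C1:Mapping", "C2:Mapping")
assert not identical("C1:Employee", "C2:Manager")
```

Storing the rules as directed triples preserves the directionality discussed above: a single subsumption rule never implies identity.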

5.2.3 Semi-Global Interpretation Mechanism

The last issue concerns the definition of rules or axioms involving one or many contexts. These rules are very useful for building a semi-global knowledge base for a contextual ontology. We speak of a semi-global interpretation mechanism because these rules do not necessarily involve all the existing contexts; the semi-global interpretation considers coherence within a subset of the available contexts. Whereas the local interpretation considers concepts belonging to one context, the semi-global interpretation defines rules with concepts belonging to many contexts [Rifaieh-b 04]. In Example.1, the existing multi-represented concepts can help us to create a new global concept. For instance, the concept Management_committee_member can be defined by a rule asserting that the members of the management committee are the instances of Manager from HRIS and the instances of Manager from PMIS. Let I1 = "John Smith" be an instance of Manager in the HRIS system and J1 = "Thomas Green" an instance of Manager in the PMIS system. The semi-global interpretation mechanism should offer the possibility of interpreting I1 and J1 as instances of Management_committee_member.
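The Management_committee_member rule can be sketched as a union over the local instance sets of the contexts involved (an illustrative Python fragment; the data layout and function name are ours, not part of the thesis):

```python
# Sketch: a semi-global rule spans a subset of contexts, here HRIS and PMIS.
local_instances = {
    ("HRIS", "Manager"): {"John Smith"},
    ("PMIS", "Manager"): {"Thomas Green"},
}

def semi_global_concept(contexts_and_concepts):
    """Interpret a rule over several (not necessarily all) contexts:
    the new concept collects the local instances of each named concept."""
    members = set()
    for key in contexts_and_concepts:
        members |= local_instances.get(key, set())
    return members

# Management_committee_member = Manager@HRIS ∪ Manager@PMIS
committee = semi_global_concept([("HRIS", "Manager"), ("PMIS", "Manager")])
assert committee == {"John Smith", "Thomas Green"}
```

As required above, I1 ("John Smith") and J1 ("Thomas Green") are both interpreted as instances of the new semi-global concept, even though a third context would not be consulted.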

5.3 Description Logics & Modal Logics

In practice, to use contextual ontologies we encounter the following challenges: (i) finding a suitable representation model for contextual ontologies that balances expressivity and tractability; (ii) dealing with the required mechanisms studied before. In this section, we seek a formalism suitable for expressing these mechanisms. We have oriented our research toward logic-based approaches, because they have been used successfully to explicitly specify the concepts of a domain of interest and their relationships (e.g. the use of Description Logics for expressing ontologies). In the last thirty years, terminological knowledge representation systems (TKRS) have been commonly used to represent facts about an application domain [Catarci 93]. Ontologies have been expressed using formalisms such as CG (Conceptual Graphs) [Sowa 00], frame logic [Farquhar 97], or KL-ONE-based languages (the DL family), etc. However, the maturity of mathematical logics in general, and of description logics (DLs) in particular, has encouraged their use for expressing ontologies, with applications in bioinformatics, e-commerce, the environment, urban applications, etc. Although DLs are very expressive and machine processable, several research problems still hinder their use. For instance, the multiplicity of representations is not treated in DLs. Therefore, we should seek an additional solution for expressing the multi-representation issue.

We suggest associating DLs with Modal Logics (MLs) [Lemmon 77] to express our contextual ontologies. In modal logics, the semantics of an expression or formula is defined in terms of modalities, as the truth of things in different worlds or contexts. Thus, modal logics distinguish modes of truth, in contrast with classical Description Logics, where things are just true or false rather than true in one situation (context) and false in another. For generalized modal logic, modes of truth are explained by referring to (abstract) worlds, namely: truth in all (accessible) worlds, and truth in some (accessible) world. Modal logics can also express the temporal progression modality, the belief modality, etc. In fact, the combination of DLs and MLs is already used in applications such as temporal databases, software specifications, etc. In the following, we briefly review the basic issues concerning DLs, knowledge base systems, and MLs. We will then study the suitability of a combination of DLs and MLs for expressing contextual ontologies and the required mechanisms.
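The two modes of truth just mentioned can be sketched over a small Kripke-style structure (an illustrative Python fragment; the world names, accessibility relation, and function names are ours, not part of the thesis):

```python
# Sketch: necessity (truth in all accessible worlds) and possibility
# (truth in some accessible world) over a finite set of worlds.
worlds = {"w1", "w2", "w3"}
access = {("w1", "w2"), ("w1", "w3"), ("w2", "w3")}  # accessibility relation
truth = {"p": {"w2", "w3"}}  # atomic proposition p holds in w2 and w3

def necessarily(p, w):
    """Box p at w: p holds in every world accessible from w."""
    return all(v in truth[p] for (u, v) in access if u == w)

def possibly(p, w):
    """Diamond p at w: p holds in some world accessible from w."""
    return any(v in truth[p] for (u, v) in access if u == w)

assert necessarily("p", "w1")   # both successors w2, w3 satisfy p
assert possibly("p", "w2")      # w3 is accessible and satisfies p
assert necessarily("p", "w3")   # vacuously true: w3 has no successors
```

Reading "world" as "context" gives exactly the intuition exploited later: a concept description may hold in one context and fail in another.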

5.3.1 Description Logics:

Description logics (DLs) are a family of object-centered knowledge representation formalisms that has been used for many Information Systems (IS) applications (e.g. DB integration) and Artificial Intelligence (AI) applications (e.g. intelligent agents). Recently, they have received more attention since they were chosen as the basic formalism for semantic web ontologies and the language OWL (Web Ontology Language). The basic elements of a concept language are concepts, roles, and objects. The elementary descriptions are atomic concepts and atomic roles. A concept is a class that describes a set of objects with common properties; roles describe binary relationships between objects. Indeed, Description Logics view the world as being populated by individuals that can be grouped into classes and related to each other by binary relationships. A specific DL language (such as ALC, ALCN, ALCNH, etc.) provides a set of constructors and operators for building more complex concepts and roles; complex descriptions can be built inductively from the elementary ones with concept constructors. Concept descriptions in ALCN are formed according to the syntax rules defined in the left side of Table 5.1. A DL language distinguishes between two types of collection:

The T-Box (Terminological Box): terminological axioms make statements about how concepts or roles are related to each other. We then single out definitions as specific axioms, and identify terminologies as sets of definitions, by which atomic concepts are introduced as abbreviations (names) for complex concepts. Thus, the T-Box is a collection of subsumption assertions specifying the terminology used to describe some application domain; it is somewhat similar to an IS schema [Baader 03].

The A-Box (Assertional Box): this second component of a knowledge base describes a specific state of affairs of an application domain in terms of concepts and roles. Some of the concept and role atoms in the A-Box may be names defined in the T-Box. In the A-Box, one introduces individuals, by giving them names, and one asserts properties of these individuals. Thus, it is a collection of assertions about individuals describing some state of the world; it is somewhat similar to a database of facts in an IS.
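The T-Box/A-Box split can be sketched with a tiny knowledge base drawn from Example.1 (an illustrative Python fragment; the encoding, the role name manages, and the individual ProjectX are hypothetical additions of ours):

```python
# Minimal sketch of the two collections of a DL knowledge base K = <T, A>.
tbox = {
    # terminological (subsumption) axiom: every Manager is an Employee
    ("Manager", "is_subsumed_by", "Employee"),
}
# assertional knowledge about named individuals:
abox_concept_assertions = {("JohnSmith", "Manager")}
abox_role_assertions = {("JohnSmith", "manages", "ProjectX")}

def instances_of(concept):
    """Individuals asserted to belong to a concept, directly or via
    one T-Box subsumption step (a deliberately shallow lookup)."""
    direct = {i for (i, c) in abox_concept_assertions if c == concept}
    inherited = {i for (i, c) in abox_concept_assertions
                 if (c, "is_subsumed_by", concept) in tbox}
    return direct | inherited

assert instances_of("Manager") == {"JohnSmith"}
assert instances_of("Employee") == {"JohnSmith"}  # follows from the T-Box axiom
```

Even this toy lookup shows the division of labor: the A-Box states facts about individuals, while the T-Box licenses additional memberships that are never asserted explicitly.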


Syntax (constructor): Semantics

⊤ (universal concept): ⊤^I = Δ^I

⊥ (bottom concept): ⊥^I = ∅

A (atomic concept): A^I ⊆ Δ^I

C ⊓ D (conjunction): (C ⊓ D)^I = C^I ∩ D^I

C ⊔ D (disjunction): (C ⊔ D)^I = C^I ∪ D^I

¬C (concept negation): (¬C)^I = Δ^I \ C^I

∀R.C (universal quantification): (∀R.C)^I = {d1 ∈ Δ^I | ∀d2: (d1,d2) ∈ R^I ⇒ d2 ∈ C^I}

∃R.C (existential quantification): (∃R.C)^I = {d1 ∈ Δ^I | ∃d2: (d1,d2) ∈ R^I ∧ d2 ∈ C^I}

(≤ n R) (at-most number restriction): (≤ n R)^I = {d1 ∈ Δ^I | #{d2 | (d1,d2) ∈ R^I} ≤ n}

(≥ n R) (at-least number restriction): (≥ n R)^I = {d1 ∈ Δ^I | #{d2 | (d1,d2) ∈ R^I} ≥ n}

Table 5.1: Syntax and Semantic of ALCN

A DL knowledge base K is a pair <T,A> where T is a terminology (T-Box) and A is an A-Box. Our research will focus on the DL language ALCN. The semantics is given by an interpretation I = (Δ^I, ·^I), which consists of an interpretation domain Δ^I and an interpretation function ·^I. The interpretation function ·^I assigns to every atomic concept A a set A^I ⊆ Δ^I and to every atomic role R a binary relation R^I ⊆ Δ^I × Δ^I. The interpretation function is extended to concept descriptions by the inductive definition in the right side of Table 5.1. For instance, we say that two concepts C and D are equivalent, and write C ≡ D, if C^I = D^I for all interpretations I. In the semantics of the number restrictions, "#{·}" denotes the cardinality of a set.
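The inductive definition of ·^I can be sketched as a small evaluator over a finite interpretation (an illustrative Python fragment; the tuple encoding of concept descriptions and all names are ours, not part of the thesis):

```python
# Sketch: evaluating the ALCN semantics of Table 5.1 over a finite
# interpretation I = (Δ^I, ·^I).
domain = {"d1", "d2", "d3"}                       # Δ^I
concept_ext = {"A": {"d1", "d2"}, "D": {"d2", "d3"}}   # atomic concepts
role_ext = {"R": {("d1", "d2"), ("d1", "d3")}}          # atomic roles

def ext(c):
    """Extension C^I of a concept description, by structural induction."""
    if c == "TOP": return set(domain)
    if c == "BOTTOM": return set()
    if isinstance(c, str): return concept_ext[c]
    op = c[0]
    if op == "and": return ext(c[1]) & ext(c[2])
    if op == "or":  return ext(c[1]) | ext(c[2])
    if op == "not": return domain - ext(c[1])
    if op in ("all", "some"):
        _, r, d = c
        test = all if op == "all" else any
        return {x for x in domain
                if test(y in ext(d) for (a, y) in role_ext[r] if a == x)}
    if op in ("atmost", "atleast"):
        _, n, r = c
        cmp = (lambda k: k <= n) if op == "atmost" else (lambda k: k >= n)
        return {x for x in domain
                if cmp(len({y for (a, y) in role_ext[r] if a == x}))}
    raise ValueError(op)

assert ext(("some", "R", "D")) == {"d1"}         # only d1 has an R-successor in D
assert ext(("atleast", 2, "R")) == {"d1"}        # d1 has two R-successors
assert ext(("all", "R", "D")) == {"d1", "d2", "d3"}  # d2, d3 hold vacuously
```

Each branch mirrors one row of Table 5.1; the vacuous truth of ∀R.C for role-less individuals falls out of the `all(...)` over an empty generator.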

5.3.2 DL-based Knowledge Representation System:

A Knowledge Representation (KR) system based on Description Logics provides facilities to set up knowledge bases, to reason about their content, and to manipulate them. Figure 5.5 sketches the architecture of such a system. A KR system not only stores terminologies and assertions, but also offers services that reason about them. Typical reasoning tasks for a terminology are to determine whether a description is satisfiable (i.e. non-contradictory), or whether one description is more general than another, i.e. whether the first subsumes the second. Algorithms that check subsumption are also employed to organize the concepts of a T-Box in a taxonomy according to their generality [Baader 03]. Checking the satisfiability of concepts is a key inference: a number of other important inferences for concepts can be reduced to (un)satisfiability, for instance checking whether a domain model is correct, or optimizing queries that are formulated as concepts. Further interesting relationships between concepts are equivalence and disjointness. These properties are formally defined as follows:

• Satisfiability: a concept C is satisfiable with respect to the TBox T if there exists a model I of T such that C^I is nonempty. In this case, we also say that I is a model of C.

• Subsumption: a concept C is subsumed by a concept D with respect to T if C^I ⊆ D^I for every model I of T. In this case we write C ⊑_T D or T ⊨ C ⊑ D.

• Equivalence: two concepts C and D are equivalent with respect to T if C^I = D^I for every model I of T. In this case, we write C ≡_T D or T ⊨ C ≡ D.

Page 150: Thèse Utilisation des ontologies contextuelles pour le partage

Chap 5- Contextual Ontologies Formalism 124

• Disjointness: two concepts C and D are disjoint with respect to T if C^I ∩ D^I = ∅ for every model I of T.
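In a single, fixed finite model, the four properties reduce to elementary set checks on the concept extensions, as the hedged sketch below illustrates (real TBox reasoning must of course quantify over all models of T, not just one; the extensions and instance names are invented):

```python
# Checks over ONE finite model only; genuine DL reasoning ranges over all models of T.
def satisfiable(C_ext):          # C^I is nonempty
    return len(C_ext) > 0

def subsumed(C_ext, D_ext):      # C^I ⊆ D^I
    return C_ext <= D_ext

def equivalent(C_ext, D_ext):    # C^I = D^I
    return C_ext == D_ext

def disjoint(C_ext, D_ext):      # C^I ∩ D^I = ∅
    return not (C_ext & D_ext)

Manager, Engineer = {"ann", "bob"}, {"bob", "eve"}
assert satisfiable(Manager)
assert not subsumed(Engineer, Manager)   # eve is an engineer but not a manager
assert not disjoint(Manager, Engineer)   # they share bob
```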

[Figure: a knowledge base composed of a TBox (terminological knowledge) and an ABox (knowledge about individuals), formulated in description logics; a DL reasoner provides reasoning over the knowledge base, which rules and application programs access through a KB server.]

Figure 5.5: DL-based Knowledge Representation System

The basic reasoning mechanism provided by DL systems is checking the subsumption of concepts. Successful KR systems such as Racer [Haarslev 01] or FaCT [Bechhofer 00] provide, in addition to TBox reasoning, ABox consistency checking, together with optimization techniques.

5.3.3 Modal Description Logics:

Modal logic [Lemmon 77] is a form of logic that deals with expressions qualified by modalities such as possibly, necessarily, and contingently. Whereas traditional first-order logic only states whether assertions are true or false, modal logic deals with logical relationships expressing that an assertion is true in one situation and false in another. A large variety of modal logics exists for a variety of applications. In the following, we sketch two of them. Propositional Dynamic Logics (PDL) are designed for reasoning about the behavior of programs; PDL was introduced by [Fischer 77] in order to simplify the task of defining program behavior in large software engineering projects. Temporal logics are designed for reasoning about time-dependent information; they have applications in databases, automated verification of programs, hardware, distributed systems, natural language processing, planning, etc. [Baader 03]. They are used to "temporalize" Description Logics and differ in whether the basic temporal entities are time points or time intervals (e.g. ALCQIreg is used for temporal description logics) [Artale 01]. In general, modal logic is used for talking about modalities of truth. It defines a set of situations (called worlds) and the relations existing between these worlds, and it allows the definition of an interpretation for a statement in these worlds. If a statement is true in all possible worlds, then it is a necessary truth. A


statement that is true in some possible world (not necessarily our own) is called a possible truth. The necessity and possibility operators are written, respectively, □C (interpreted as the set of all possible worlds where C necessarily holds) and ◊C (interpreted as the set of all possible worlds where C possibly holds).

In the work of Kripke [Kripke 59], these worlds can be related by so-called accessibility relationships. The accessibility relations are binary relations between pairs of worlds, and the set of accessibility relations defines the Kripke structure of worlds. We can, however, apply some restrictions to these binary relations. For example, we obtain the modal logic S4 by restricting the Kripke structures to those where the accessibility relation is reflexive and transitive [Baader 03]. Other modal logics restrict the accessibility relation to be symmetric, an equivalence relation, etc. Moreover, the number of accessibility relations may be greater than one; we then speak of multi-modal logics, where each accessibility relation ∇i can be thought of as corresponding to one agent and is quantified using the modal operators ◊i and □i. We are interested in multi-modal logic languages because they permit defining many accessibility relationships between the identified worlds. In this respect, Table 5.2 shows the syntax and semantics of the modal operators in a multi-modal language.

5.3.4 Syntax and semantics of ALCNM:

The combination of DLs and MLs yields Modal Description Logic languages (MDLs) such as ALCNM. The syntax of the MDL ALCNM consists of the classical ALCN constructs and the modal operators □ and ◊. If we consider multi-modal logics, the modal operators become □i and ◊i. One must be careful in such a combination not to ruin the balance between expressiveness and effectiveness [Wolter 98]. The syntax of the modal description language ALCNM permits expressions such as ◊C, □C, C ⊓ D, ¬C, ∃R.C, etc. The intended semantics of ALCNM is a natural combination of the standard Tarski-type semantics for the description part and the Kripke-type semantics for the modal part. In other words, the formal semantics of the ALCNM language is interpreted by:

• The conventional ALC interpretation (description logics) with Tarski's model, where an interpretation is a pair I = (∆^I, ·^I), such that the set ∆^I is the interpretation domain and ·^I is the interpretation function that maps every concept to a subset of ∆^I and every role to a subset of ∆^I × ∆^I.

• Kripke's structure, stating what the necessary relations between worlds are and which formulas necessarily hold in some worlds. It defines how concepts are interpreted with respect to a set of worlds or contexts denoted by W. If |W| = 1 then our interpretation becomes the

| Syntax | Semantics |
|---|---|
| □iC (necessity operator) | (□iC)^I(w) = {x ∈ ∆^I(w) : ∀v with w ∇i v, x ∈ C^I(v)} |
| ◊iC (possibility operator) | (◊iC)^I(w) = {x ∈ ∆^I(w) : ∃v with w ∇i v, x ∈ C^I(v)} |

Table 5.2: Syntax and Semantics of the modal operators


classical interpretation of the Tarski model described above. The syntax and semantics of the modal operators are given in Table 5.2.
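This Kripke-style evaluation can be sketched minimally as follows; the three-world structure, accessibility pairs, and extensions are assumptions made for illustration. The necessity operator keeps the elements that belong to C in every accessible world; the possibility operator keeps those that belong to C in at least one.

```python
# Assumed finite Kripke structure: acc is the accessibility relation ∇_i,
# dom[w] is ∆^I(w), C_ext[w] is C^I(w). Rigid designators are assumed,
# so the same names denote the same objects in every world.
acc = {("w1", "w2"), ("w1", "w3")}
dom = {"w1": {"x", "y"}, "w2": {"x", "y"}, "w3": {"x", "y"}}
C_ext = {"w1": {"x"}, "w2": {"x", "y"}, "w3": {"x"}}

def successors(w):
    return {v for (u, v) in acc if u == w}

def box(w):      # (□_i C)^I(w): x is in C^I(v) for EVERY v accessible from w
    return {x for x in dom[w] if all(x in C_ext[v] for v in successors(w))}

def diamond(w):  # (◊_i C)^I(w): x is in C^I(v) for SOME v accessible from w
    return {x for x in dom[w] if any(x in C_ext[v] for v in successors(w))}

assert box("w1") == {"x"}           # only x is in C in both w2 and w3
assert diamond("w1") == {"x", "y"}  # y is in C in w2
```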

Having a set of contexts or worlds W = {w1, …, wn}, the interpretation of concepts requires a Kripke structure <W, ∇r, I(w)>, where ∇r denotes a set of binary accessibility relations between contexts and I(w) is the interpretation over W. A model of ALCNM based on a frame F = <W, ∇0, ∇1, …> is a pair M = <F, I> in which I is a function associating with each w ∈ W a structure I(w) = <∆^I(w), R0^I(w), …, C0^I(w), …, a0^I(w), …> where:

• ∆^I(w) is the interpretation domain in context w ∈ W; it is a non-empty set, the domain of M;

• Ri^I(w) are the roles interpreted in context w, in other words binary relations on ∆^I(w);

• Ci^I(w) are the concepts interpreted in context w (i.e. Ci^I(w) ⊆ ∆^I(w));

• ai^I(w) are the objects interpreted in context w.

The issue of satisfiability and algorithmic complexity in description logics with modal operators has been studied in [Wolter 98]. The authors of [Donini 96] proved that the reasoning tasks are reducible to the satisfaction problem for formulas, and showed that this problem is decidable in the class of all ALCM models.

5.4 Expressing Contextual Ontologies with Modal Description Logics (ALCNM)

We have studied separately the mechanisms of contextual ontologies and the modal description logic language ALCNM. Just as DL languages have been used successfully for expressing ontologies, we suggest in this section using MDLs to express our contextual ontologies [Rifaieh-a 04] [Rifaieh-b 04]. We therefore present the arguments that support this idea.

• Firstly, the formalism of MDLs offers the possibility of defining a set of worlds where each world is an interpretation of classical description logics (a Tarski interpretation) [Wolter 98]. These worlds have coherent interpretations for their concepts, roles, and objects, identical to the interpretation of ontologies expressed with any DL language. Therefore, we can maintain local coherence for our contextual ontologies by assuming that each local ontology represents a world in the MDL formalism.

• Since each local ontology is represented with the plain DL formalism, we can use the stamping mechanism suggested in [Benslimane 03] to differentiate between concepts belonging to different worlds (or local ontologies). This extension of DL helps us to put together, without ambiguity, the multi-represented elements of contextual ontologies.

• The mechanism of semantic similarity can be expressed using the accessibility relationships between worlds. In other words, the MDL formalism permits expressing the relationships between the local ontologies, since each ontology is considered a world in this formalism. The accessibility relationship that exists between two worlds (i.e. local ontologies) is defined as a set of semantic bridge rules between the concepts of these ontologies (Figure 5.7). Thus, if ∇i = {ri1, ri2, …, rin} is the accessibility


relationship between two ontologies, where each rij represents a bridge rule between two concepts of these ontologies.

• As we studied before, our contextual ontologies need a semi-global interpretation mechanism, which should permit expressing rules or knowledge that are global (i.e. involving all the contexts) or semi-global (i.e. involving a subset of the contexts). Since Kripke's structure provides an interpretation for a set of worlds with respect to their accessibility relationships and the modal operators, the semi-global interpretation mechanism can be considered as the Kripke interpretation in MDL of these worlds with respect to the accessibility relationships defined between local ontologies (see Figure 5.6).

• In addition, we should respect, in the MDL formalism used, some design parameters such as rigid designators and finiteness [Wolter 98]. The first imposes referring to the same object in any world using the same designator. For instance, if we use John Smith as an object of the concept Employee, we consider that this designator represents the same object whatever world we are in. This parameter is valid because our worlds represent conceptualizations of a domain where the designators of objects are unique. The second parameter imposes a finite number of worlds. This parameter is also valid in our case, because the number of contextual ontologies representing the systems within the enterprise is finite.

• Finally, the language ALCNM offers the possibility of multi-modal logics, with the necessity and possibility operators □i and ◊i, where i refers to the accessibility relationship ∇i between two worlds. Therefore, we have as many operators as we have accessibility relationships.

All these elements allow us to conclude that MDL languages (e.g. ALCNM) are adequate for expressing our contextual ontologies. Indeed, MDL languages enable us to express all the mechanisms required by contextual ontologies.

Figure 5.6: Labeled oriented graph representing Kripke’s structure

Figure 5.7: Accessibility relation between worlds or contextual ontologies.


5.5 Revisited examples
In this section we revisit the examples presented earlier, aiming to show the use of the MDL formalism on them.

5.5.1 Example 1:

Let us consider the PMIS & HRIS example. We assume that an ontological representation has been associated with each system and that we need to define a new concept Management-committee-member. This concept is defined using the new operators and with respect to the mechanisms described above. Suppose that a Management-committee-member is every manager in HRIS and every engineer in HRIS having a management responsibility according to PMIS. According to the multi-representation ontology paradigm, we can define for each contextual ontology the relevant concepts used by its system. Let us label the concepts of HRIS with h and the concepts of PMIS with p to avoid any ambiguity. We consider the ontology of HRIS, Oh, with the definition of the following concepts:

h:Engineer ⊑ h:Human_Ressource ⊓ (∀field.String)
h:Manager ⊑ h:Human_Ressource ⊓ (∀occupation.String)
h:Staff ⊑ (∀voting.String) ⊓ (≥ 1 Member_of⁻¹.Manager)

The ontology of PMIS, Op, includes the concepts:

p:Human ⊑ p:Ressources
p:Developper ⊑ p:Human ⊓ (∀field.String)
p:Manager ⊑ p:Human ⊓ (∀occupation.String)
p:Task ⊑ (∀Task_ID.Integer) ⊓ (≥ 1 Responsible⁻¹.Manager) ⊓ (≥ 1 Contributes_to⁻¹.Ressources)

These ontologies have been contextualized by defining an accessibility relation ∇i, including a set of bridges between their concepts. The relation ∇i = {ri1, ri2, …} contains the bridge rule rij attesting that h:Employee ⊒ p:Manager. The new global concept Management-committee-member can then be defined using the accessibility relation between the contextual ontologies Op and Oh as follows:

Manag_committe_member ≡ h:Manager ⊔ (h:Engineer ⊓ □i p:Manager)

The operator □i applied to the concept p:Manager relates the objects of p:Manager in Op, by necessity through the accessibility relation ∇i, to being a subset (⊒) of the concept h:Employee in Oh. In other words, the formula means: the members of the Management Committee are the instances of h:Manager together with the intersection of the instances of h:Engineer and p:Manager, where the instances of p:Manager are by necessity obtained through the rule h:Employee ⊒ p:Manager.
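Under an illustrative reading of this definition (the instance data below is invented, and ∇i is assumed to link the HRIS world to the single PMIS world, so that □i p:Manager simply yields p:Manager's extension there), the formula can be evaluated as:

```python
# Hypothetical extensions in the two worlds; rigid designators are assumed,
# so the same names denote the same objects in HRIS and PMIS.
h_Manager = {"alice"}
h_Engineer = {"bob", "carol"}
p_Manager = {"bob"}           # engineers carrying management duties in PMIS

# □_i p:Manager with a single accessible world reduces to p:Manager's extension.
box_p_Manager = p_Manager

# Manag_committe_member ≡ h:Manager ⊔ (h:Engineer ⊓ □_i p:Manager)
committee = h_Manager | (h_Engineer & box_p_Manager)
assert committee == {"alice", "bob"}   # alice as manager, bob as managing engineer
```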

5.5.2 Example 2:


Considering the example of the EDI Translator and QELT, we can assume that an ontological representation has been associated with each of these systems. The multi-represented concepts Mapping, Field, and Record of the EDI Translator correspond to the concepts Mapping, Attribute, and Entity of QELT. If the ontology of the EDI Translator is represented by Oh, using the stamping mechanism of contextual ontologies we can express these concepts as follows:

h:Record ⊑ (∀name.String) ⊓ (≥ 1 hasField.Field) ⊓ …
h:Mapping ⊑ (∀Mapping_ID.String) ⊓ (≥ 1 hasMapField.Field) ⊓ (≥ 1 hasSelection.Selection) ⊓ (≥ 1 hasCondition.Condition)
h:Field ⊑ (∀name.String) ⊓ (∀FieldID.String) ⊓ (≥ 1 hasMapField⁻¹.Mapping) ⊓ (≥ 1 hasField⁻¹.Selection) ⊓ (≥ 1 hasField⁻¹.MappingCondition) ⊓ (≥ 1 hasField⁻¹.Record)

Likewise, if the ontology of QELT is represented by Op, we can express the following concepts:

p:Mapping ⊑ (∀name.String) ⊓ (≥ 1 hasSources.Entity) ⊓ (≤ 1 hasSources.Entity) ⊓ (≥ 1 hasTarget.Entity) ⊓ (≤ 1 hasTarget.Entity) ⊓ (≥ 1 useTransformation.Transformation)
p:Transformation ⊑ (∀name.String) ⊓ (≥ 1 useTransformation⁻¹.Mapping) ⊓ (≥ 1 hasTarget.Attribute) ⊓ (≤ 1 hasTarget.Attribute) ⊓ (≥ 1 hasSources.Attribute) ⊓ (≤ 1 hasSources.Attribute) ⊓ …

These ontologies have been contextualized by defining an accessibility relation ∇i, including a set of bridges between their concepts. The relation ∇i = {ri1, ri2, …} contains the bridge rule rij attesting that h:Mapping ≡ p:Mapping. Defining the bridge rules in conformance with the mechanism of semantic similarity allows us to use this information to carry inferences across the two ontologies. For instance, if I1 is an instance of the concept h:Mapping of the EDI Translator, we can deduce that I1 can also be considered an instance of p:Mapping of QELT, with respect to the bridge rule rij of the accessibility relation ∇i.
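This instance-level propagation can be sketched as follows; the encoding of bridge rules as a dictionary is an assumption made for illustration, not the thesis's data structure.

```python
# One equivalence bridge rule r_ij of the accessibility relation ∇_i.
bridges = {("h:Mapping", "p:Mapping"): "equiv"}

def also_instance_of(concept):
    """Concepts in which an instance of `concept` can also be asserted."""
    out = {concept}
    for (c1, c2), kind in bridges.items():
        # An equivalence bridge propagates instances in both directions.
        if kind == "equiv" and concept in (c1, c2):
            out |= {c1, c2}
    return out

# I1 : h:Mapping in the EDI Translator ontology crosses over to QELT's p:Mapping.
assert also_instance_of("h:Mapping") == {"h:Mapping", "p:Mapping"}
assert also_instance_of("h:Field") == {"h:Field"}   # no bridge, nothing added
```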

5.6 Related work
Having studied the formalism tailored for contextual ontologies, we compare in this section our approach with other approaches having similar goals. We distinguish between alternative techniques and alternative formalisms.

5.6.1 Comparison with Other Techniques


Table 5.3: Comparative analysis

We compare our approach with other ontology-based techniques relevant to information integration and semantic interoperability: integrated ontology, contextual ontology, P2P ontology, and locally independent ontologies. These techniques are summarized as follows:

Integrated ontology: a representation of global semantics. It defines a global understanding based on an established consensus [Staab 04]. The consensus is needed for the common integrated ontology and must be renewed each time an update occurs. This technique suffers from the loss of the original understanding, or loss of information, in favor of the unified representation. Moreover, reaching the consensus requires considerable effort, cost, methodology, and update time. Therefore, this solution does not suit the dynamic nature of EIS, which need to make business systems with new functionalities accessible as fast as possible (even without any consensus).

Contextual ontology: provides local and global semantics. It allows a global view without losing the original representations. Indeed, it adapts the models to co-exist through the relationships between contexts, where the contexts are related to an interpretation with a pre-defined structure. Thus, contextual ontology provides a dynamic consensus rather than the static consensus offered by an integrated ontology [Rifaieh-c 04].

P2P ontology: considers that nodes (ontologies) are equipotential in terms of functionalities and capabilities. Each peer holds a different amount of knowledge that depends on the interactions it has performed in the network of available ontologies, and can acquire or extend its knowledge only by querying the peers that hold this information [Castano 03]. Therefore, P2P ontology offers autonomy and low-cost updating, but no global view can be reached with this technique.

Locally independent ontologies: offer local semantics and interpretation, where each system is viewed as being separate from the others. Note that no global view can be expected from independent ontologies.

Table 5.3 shows a comparison between the preceding techniques, based on a set of criteria illustrated as follows:


• Global view: whether the approach provides a global understanding or a global representation;
• Local view: whether the approach provides a local understanding or a local representation;
• Consensus: whether a commitment must be established between community members;
• Updating complexity: the computational complexity of updating the elements used by the approach;
• Autonomy: whether a system depends on external information to answer local needs;
• Cost of adding a new ontology: the complexity of incorporating a new ontology into the approach;
• Query answering: the ability to answer global or local queries;
• Reusability: the ability to reuse existing components for creating a new system;
• Interoperability: the ability to cooperate for a common purpose.

The values in the comparison are deduced from an analysis of the state of the art for each approach. However, to reinforce the table, these results should be checked with empirical and experimental tests.

5.6.2 Comparison with Other Formalisms

In reviewing the literature, we found several works that deal with context, ontologies, and logics. A survey of context modeling is given in [Strang 04], which stratifies context approaches into key-value models, markup scheme models, graphical models, object-oriented models, logic-based models, and ontology-based models. We are interested here in logic-based and ontology-based models.

5.6.2.1 Logic based models: A logic defines the conditions under which a concluding expression or fact may be derived through the inference process. In a logic-based context model, the context is consequently defined in terms of facts, expressions, and rules. In this respect, several formalisms have been suggested in logic to cope with the notion of context. One of the first logic-based context modeling approaches was researched and published as Formalizing Context in early 1993 by McCarthy [McCarth 93]. He introduced contexts as abstract mathematical entities with properties useful in artificial intelligence, as well as the basic relations ist(c,p) and value(c,e): ist(c,p) is a predicate asserting that the proposition p is true in context c, and value(c,e) is a function that gives the value of the term e in the context c. In [Guha 91], the author elaborates on these basic relations by presenting lifting rules, which relate propositions and terms in sub-contexts to possibly more general propositions and terms in the outer context. We can also identify the work in [Giunchiglia 93] [Ghidini 01], referred to as Multi-Context Systems (MCS), which focuses less on context modeling than on context reasoning. It treats a context as a specific subset of the complete state of an individual entity that is used for reasoning about a given goal. An extension of this work is currently revived with MCS/LMS (MCS/Local Model Semantics) in [Roelofsen 04], which considers propositional modal logics for tackling context reasoning and seeking satisfiability algorithms. In [Borgida 02], formal semantics of distributed description logics and distributed first order logics [Ghidini 98] are presented to provide coordination among a set of distributed information systems. In this approach, the authors argue that there will be no single global view, but correspondence between objects


in the local domains should be furnished through directed mappings using bridge rules. They also show that it is possible to translate distributed description logics (DDLs) reasoning to description logics (DLs) reasoning. They claim, as well, that DDLs extend the reasoning available on ordinary schemas to the case of multiple schemas connected by arbitrary binary correspondences between individuals (objects). The accessibility relations define a set of bridge rules that enable expressing the semantic similarity between the worlds' components. In [Goh 97], the context is treated as a formal object expressed in first order logic (FOL) to serve as a frame of reference for sentences that are relative to some context. The COIN (COntext Interchange) team at MIT has investigated the notion of context for representing context knowledge and a context mediation engine in [Firat 03] and [Goh 97]. COIN is built on deductive and object-oriented data models, and it combines the syntax and semantics of F-logic and predicate calculus. To conclude, logic-based context models allow distributed composition, but partial validation is difficult to maintain. Their level of formality is extremely high, but without partial validation the specification of contextual knowledge within a logic-based context model is very error-prone. Incompleteness and ambiguity do not seem to be addressed either, and these approaches do not appear to have been carried through to mature implementations.

5.6.2.2 Ontology based models
One of the first approaches to modeling context with ontologies was proposed in [Ötztürk 97]. Recently, the subject of combining ontologies and context has received more attention, for different reasons and applications. Thus, some formalisms have been suggested to cope with the issue of defining a framework for this combination. In [Bouquet 03], a framework and a concrete language, C-OWL (Contextual OWL), for contextualized ontologies was proposed. A contextualized ontology is there defined as one that keeps its contents local but puts them in relation with the contents of other ontologies via context mappings. The C-OWL language is augmented with constructs for relating (syntactically and semantically) concepts, roles, and individuals of different ontologies. The global interpretation is divided into the local interpretation and the interpretation of the mapping rules. The local interpretation has two parts: Tarski's well-known interpretation, and a "hole" interpretation for axioms that can be defined through other contexts and may be unsatisfiable in the first one. Another approach, the Aspect-Scale-Context information (ASC) model, has proposed ranges for contextual information called scales [Strang 03]. Using ontologies, the ASC model provides a uniform way of specifying the model's core concepts as well as an arbitrary number of sub-concepts and facts, altogether enabling contextual knowledge to be evaluated by an ontology reasoner. This work builds up the core of a non-monolithic Context Ontology Language (CoOL), which is supplemented by integration elements such as schema extensions for Web Services. All ontology-based context models inherit their strengths in the field of normalization and formality from ontologies. The CONON context modeling approach in [Wang 04] is based on the same idea as the ASC/CoOL approach, namely to develop a context model based on ontologies because of their knowledge sharing, logic inferencing, and knowledge reuse capabilities. In CONON, an upper ontology captures the general features of basic contextual entities, and a collection of domain-specific ontologies captures their features in each sub-domain. First order logic and description logics are used as the reasoning mechanisms implementing the CONON model, allowing consistency checking and contextual reasoning. The ASC model and the CONON model inherently support quality meta-information and ambiguity [Strang 03]; however, their applicability to different existing models is limited by the type of formalism they adopt.

5.6.2.3 Evaluation
The definition of contextual ontologies in C-OWL [Bouquet 03] is identical to our perception of contextual ontologies, but the formalism suggested in this chapter differs from the one suggested for C-OWL. Our approach also differs from those that use context modeling, in the sense that an ontology with multi-represented concepts is our prime concern; context is exploited to capture contextual information in order to enable the implementation of concepts with multiple perspectives [Arara-b 04]. Hence, our formal representation of contextual ontologies should also preserve adequate reasoning mechanisms, namely concept satisfiability, concept subsumption, concept consistency, and instance checking. We compare our formalism for contextual ontologies with the other formalisms surveyed in this section. Table 5.4 examines their properties based on the following criteria:

• Expressiveness: the power of the language to describe a phenomenon or situation; for example, ALC has limited expressiveness and cannot describe "an elephant has precisely four legs".

• Extensibility: the ability to add operators and constructors; a language can be extended to represent needed phenomena, e.g. the extension of ALC with number restrictions, ALCN.

• Tractability: concerns reasoning with rules; it evaluates the difficulty of reasoning correctly with a representational language. Typically, we say a problem is tractable if there (provably) exists an algorithm solving it whose run-time is (at worst) polynomial; otherwise we call the problem intractable.

• Computational complexity: related to language expressiveness, and determines tractability. Algorithmic complexity increases with increasing expressiveness, so a tradeoff between expressiveness and computational complexity is important.

• Purpose: the foreseen applications of the formalism.


| Formalism | Syntax | Semantics | Extensibility | Tractability of reasoning | Computational complexity | Purpose |
|---|---|---|---|---|---|---|
| DDL | DL syntax + bridge rules | Tarskian interpretation | Limited to the DLs family | Tractable | Same as DLs (simulation to DL proved) | Distributed information systems |
| COIN | F-Logic (extended DataLog) | Horn logics | Extension based on built-in predicate and function symbols | ALP (Abductive Logic Programming) | ? | Loose information integration in databases |
| MCS/LMS | Propositional logics | Deductive facts reasoning | ? | CSAT: Massacci's tableaux-based procedure for PLC, and equivalence results with modal logics | Polynomial (CSAT procedure) | Information integration |
| CoOL (ASC) | DAML+OIL, OWL, Horn-logic | Tarski interpretation | Limited by DL and F-logic | Tractability of Ontobroker with F-Logic | ? | Contextual interoperability and web services |
| CONON | DL + first order logics | Tarskian interpretation | Limited by RDF and OWL | ? | ? | Pervasive computing |
| C-OWL | DL syntax (OWL) + bridge rules | Local Tarskian interpretation with hole | Limited with OWL (SHOIQ) | Tractable, with reservation of local holes inconsistency | Not studied yet | Semantic web |
| Contextual Ontology (our approach) | Modal DL | Local Tarskian + Kripke's structure | Extensible with accessibility relation | ALCN (tractable) with respect to tractability operator | Not studied yet; a specific algorithm is needed | Semantics sharing |

Table 5.4: Comparison between related works formalisms

5.7 Summary
In the scope of EIS ontologies, several conceptualizations and categorizations of concepts are likely to occur, which leads to the problems of multiple views and multiple representations. In order to tackle the multi-representation problem, we proposed a contextual ontology approach, where contexts are used as a mechanism for partitioning the domain ontology. The contextual ontology approach aims at


coordinating a set of context-dependent ontologies without constructing a global ontology. Thus, the problems of updating, cost, and loss of information that come with a global ontology are avoided. The approach is based on the well-founded theory of modal description logics to build a semantically rich conceptual model that supports scalability and extensibility. Moreover, the proposed formalism of contextual ontologies supports information accessibility, filtering, and reasoning services based on contextual information. In the domain of EIS, the approach can be exploited to address the existing problems of semantic interoperability and reusability related to multiple representations. The approach is applied in the next chapter to a research project named EISCO. The EISCO project provides a global understanding over EIS to support semantic sharing within these systems; it also aims at achieving information systems interoperability (intra- and inter-), communication between systems, and reusability. We present a case study along with scenarios of use to validate the proposed architecture.


6. CHAPTER 6

ENTERPRISE INFORMATION SYSTEMS CONTEXTUAL ONTOLOGIES (EISCO)

PROJECT

"We think in generalities, but we live in details" Alfred North Whitehead

Enterprise Information Systems (EIS) represent the set of systems used for managing an enterprise and establishing its business. Our perspective for improving EIS includes providing a global enterprise view over EIS with a data integration initiative, combined with interoperability and reusability services. Ontologies are foreseen to play a key role in partially resolving the semantic conflicts and differences that exist among systems. Local ontologies are constructed by capturing a set of concepts and their links according to various criteria, such as the abstraction paradigm, the granularity scale, the interests of user communities, and the perception of the ontology developer. Thus, different applications of the same


domain end up having several representations of the same real-world phenomenon. A contextual ontology is an ontology (or a set of ontologies) that characterizes ontological concepts by a variable set of properties with respect to the context. In this chapter we introduce the EISCO (Enterprise Information Systems Contextual Ontologies) project. This project aims at validating the usefulness of contextual ontologies in EIS, providing an architecture, and showing scenarios of use. Furthermore, EISCO's goals include discussing a case study of reusability in EIS based on a contextual ontology. The project does not currently aim at proving the advantage of contextual ontologies for interoperability, query answering, etc.; these issues can be considered as future work, where a separate study can be developed for each topic. The following section presents the project's identity and discusses its software architecture. Next, we study scenarios of using the EISCO project for interoperability, reusability, and global query answering. In order to put EISCO's advantages into evidence, we finish this chapter with a case study of reusability between the previously studied data exchange and data integration platforms.

6.1 EISCO project

EISCO (Enterprise Information System Contextual Ontology) is a software engineering project that exploits the model representing contextual ontologies. These ontologies are meant to be used in common by several systems and applications within the enterprise. EISCO allows sharing the same concept among several applications and representing each concept with a multiplicity of representations, such as different roles, attributes, and instances. The notion of context is used to allow systems to preserve their semantics locally, whereas the inter-relation between information sources is expressed using coordination or bridge rules. Consequently, the EISCO project defines a high level of conceptualization for EIS by using context-dependent ontologies. It intends to be a full description of the shared models involved in EIS.
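As a minimal illustration of this idea, the sketch below represents a set of context-dependent local ontologies coordinated only by identity bridge rules, without any global ontology. All class and method names are hypothetical, introduced here for illustration; they are not part of the EISCO implementation.

```python
class ContextualOntology:
    """Sketch: local ontologies preserved per context, coordinated by
    identity bridge rules instead of a merged global ontology."""

    def __init__(self):
        self.local_concepts = {}   # context id -> {concept name, ...}
        self.bridges = set()       # {(ctx_a, concept_a, ctx_b, concept_b)}

    def add_concept(self, ctx, name):
        self.local_concepts.setdefault(ctx, set()).add(name)

    def add_bridge(self, ctx_a, a, ctx_b, b):
        """Identity bridge rule ctx_a:a ≡ ctx_b:b (stored symmetrically)."""
        self.bridges.add((ctx_a, a, ctx_b, b))
        self.bridges.add((ctx_b, b, ctx_a, a))

    def counterparts(self, ctx, name):
        """Concepts in other contexts multi-representing the same notion."""
        return sorted((cb, nb) for (ca, na, cb, nb) in self.bridges
                      if ca == ctx and na == name)

# Illustrative use: the concept Manager is kept locally in two contexts
# and related by a bridge rule, without merging the two views.
eis = ContextualOntology()
eis.add_concept("C1", "Manager")
eis.add_concept("C2", "Manager")
eis.add_bridge("C1", "Manager", "C2", "Manager")
```

Each local view remains intact; only `counterparts` crosses context boundaries, which mirrors how bridge rules avoid building a global ontology.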

6.1.1 Project identity

The EISCO project encompasses an architecture enabling automated collaborative processes in a cohesive framework for systems cooperation. It is a superset of "plug and play" capabilities, whereby a system can be added to the enterprise architecture and coupled with other systems to reach reusability, interoperability, and query answering. The EISCO project is carried out with the following goals in mind:

• Exploiting and showing the usefulness of contextual ontologies to deal with the semantic sharing problem among EIS. Some concepts are going to be used by different applications, where the definition and the description of these concepts depend on the application itself.

• Providing an architecture serving as a shared enterprise platform for reaching operational goals of cooperation between EIS.

• Showing three scenarios of use with information systems interoperability, reusability, and global query answering.

• Discussing a case study which focuses on the reusability scenario to validate the proposed architecture and formalism. This case study explores the models of two previous studies (QETL and the EDI Translator) and creates a contextual ontology for them. Throughout this case study, we show the


contribution of the EISCO project in resolving real-world systems' problems and its impact on managing reusability.

Currently, the EISCO project focuses on a single concern: reusability for EIS within a contextual ontology framework. It does not address interoperability, query answering, scalability, etc.; later on, separate studies may be developed for each of these topics.

6.1.2 EISCO logical architecture

In order to reach the desired characteristics, the EISCO project needs to adhere to a clear architecture model. Since EISCO's main purpose includes providing services over EIS, we turned toward a client-server or application-server architecture. The n-layer or n-tier (e.g. 3-layer, 4-layer, 5-layer) architecture defines the principal layers for structuring distributed computing and increasing testability and maintainability. The term tier can imply a physical separation on separate processors, so many prefer the term layer [Jackson 03], implying a separate software component independent of location.

The basic architecture is often referred to as a three-layer design, separating the user interface (client) layer, the problem domain classes, and the data access classes and other technical services. Nevertheless, several layers can be gathered into one, or a layer can be broken down into several, creating a more detailed architecture. Figure 6.1 describes a 5-layer architecture, showing the separation between layers: client, application, enterprise, mapping, and physical. The client layer is the part of the client-server configuration that contains the user interface and other objects used to access the application. The application layer includes objects specific to one application. The business logic layer, or enterprise layer, contains the objects that implement transversal functionality common to many applications. The mapping layer includes the adapters needed for accessing physical resources.

Figure 6.1: Multi-layer Architecture

Figure 6.2: EISCO-based architecture


The data layer, or physical layer, includes the underlying physical resources: databases, CICS transactions, etc. We suggest adapting the 5-layer architecture to take contextual ontologies into account. Therefore, a 5-layer ontology-based architecture will serve as the logical model for developing the EISCO project. It conserves the five identified layers and adds new layers for semantic representation and contextual requirements: an Ontology & Semantic Layer and a Service Layer. Unlike the traditional architecture, the new Service Layer provides contextual reasoning and coordination. The Ontology & Semantic Layer hosts ontologies, taxonomies, common models, etc. Alternatively, we can merge the mapping layer with the service layer, and the physical resource layer with the KB/ontology layer, to match the original architecture. Indeed, the additional layers can be considered as a specific part of the mapping and physical layers, because they ensure similar functionalities (i.e. resources/KB and mapping/access to resources). Either way, we consider the 5-layer architecture the most suitable model for the EISCO project.
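The layer separation described above can be sketched as a chain of objects in which each layer depends only on the one directly below it. Everything in this sketch is hypothetical: the class names, the hard-coded value in the stand-in physical layer, and the bonus computation are illustrative, not part of EISCO.

```python
class PhysicalLayer:
    """Underlying resources (databases, transactions, the KB)."""
    def fetch(self, key):
        return {"extern_mission": 7}.get(key)   # stand-in for a real store

class MappingLayer:
    """Adapters translating logical requests into physical access."""
    def __init__(self, physical):
        self.physical = physical
    def resolve(self, term):
        return self.physical.fetch(term)

class EnterpriseLayer:
    """Transversal business logic shared by many applications."""
    def __init__(self, mapping):
        self.mapping = mapping
    def missions_for_manager(self):
        return self.mapping.resolve("extern_mission")

class ApplicationLayer:
    """Objects specific to one application (here: a bonus computation)."""
    def __init__(self, enterprise):
        self.enterprise = enterprise
    def travel_bonus(self, rate=100):
        return rate * self.enterprise.missions_for_manager()

class ClientLayer:
    """User interface: only talks to the application layer."""
    def __init__(self, app):
        self.app = app
    def show_bonus(self):
        return f"bonus: {self.app.travel_bonus()}"
```

Because each constructor receives only the layer below, layers can be merged or split (as the text notes) without the client code changing.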

6.1.3 EISCO software architecture

The software architecture is needed to describe the components that satisfy the purpose of the logical architecture. If pertinent characteristics are omitted, the logical architecture may not be useful. If unnecessary characteristics are included, the architecture effort may prove infeasible, given the available technologies, or the architecture may be confusing and/or cluttered with details that are superfluous to the issues at hand. Since the impact of the EISCO server architecture is at the enterprise level of information systems, each running EIS should be accessible through a connector to the EISCO server in the global architecture, regardless of its horizontal, vertical, or external purpose. Thus, the EISCO server encompasses and ensures a high level of services useful for the cooperation of distributed enterprise information systems. The EISCO server ensures functionalities such as: (i) access to the ontologies, the KB server, and the inference engine; (ii) control of the EISCO contextual ontologies and coordination rules; (iii) services for resolving ambiguities (e.g. about concepts, attributes, methods, etc.); (iv) an underlying architecture for service reusability (objects, components, patterns, etc.), interoperability, and query answering (treating and resolving global queries); (v) a set of objects that can be remotely called by different applications; (vi) accessibility connectors to running applications. Figure 6.3 depicts the EISCO Server architecture. It includes three main components: the EISCO KB Server, the EISCO Core Services, and the EISCO Accessibility Server. Each of these components is studied separately in the following sections.

6.1.3.1 EISCO KB Server

The EISCO knowledge base is a centralized repository for organizing the system models representing ontologies (i.e. contextual ontologies) with machine-processable semantics, for the dissemination of knowledge used to optimize information retrieval, interoperation, and reuse. In general, a knowledge base is not a static collection of information, but a dynamic resource that may itself have the capacity to improve based on the inference engine's results.


Figure 6.3: EISCO Server architecture

KB Server Interface: the role of the KB Server Interface is to ensure the communication between the EISCO KB Server and the other components of the architecture. This interface implements all the methods for managing and maintaining the knowledge base.

Reasoning System (inference engine): a main advantage of using an underlying KB in our architecture is its reasoning capabilities. The reasoning engine derives new knowledge by inference from existing assertions. Therefore, the EISCO knowledge base server provides not only access to the KB, but also reasoning capacity over a knowledge base whose results depend only on explicit semantics. Furthermore, this engine should be compatible with the logic paradigm of contextual ontologies. Some successful KBS implementations, such as Racer [Haarslev 01] or FaCT [Bechhofer 00], can load a UML model formatted in XMI and generate the corresponding ontology. Racer can also be used to check the coherence of a UML model; it is considered a prover for the modal logic Km as well. The FaCT system is also a DL reasoner, with a generic API defined in IDL that communicates via an ORB. Therefore, the implementation of the EISCO KB Server can use either of these two available tools.

6.1.3.2 EISCO Core Services

The EISCO Core Services (CS) component contains the common services of the architecture platform for managing, controlling, and disseminating knowledge flow throughout the system, and for sharing services among applications (e.g., reusable objects, information, data, etc.). It includes many functionality-oriented components, such as the Ontology Manager, Semantic Mapper, Context Manager, Reusability Manager, Models Importer, etc. [Benharkat, Rifaieh 04]. The implementation of this part of the architecture should follow the logic of an application server. On one side, an interface to the knowledge base and the underlying ontologies should be considered as part of the Core Services implementation, to ensure the communication with the EISCO KB. On the other side, EISCO CS is connected to the Accessibility Server, which serves as a bridge for the systems to reach the


core services. In essence, we can differentiate between three categories of services provided by EISCO CS.

6.1.3.2.1 Applications Resources Provider: this set of components is responsible for offering connected applications (i.e. systems) the basic outcome of the EISCO Server architecture. It includes:

Query Manager: manages the life cycle of global queries and assigns to each user query its corresponding context. It queries the KB Server in order to find the similarities between the contextual concepts. According to the results, the system generates several distinct queries performed on the local systems. Finally, it crosses the results to answer the global query.

Reusability Manager: provides the developer with a set of services for creating a new system. The model designer attaches a set of objects and concepts to be reused afterward. The developer can express a need for a reusable object related to a concept defined in an ontology. The Reusability Manager then uses the set of existing bridge rules, as well as links inferred from the knowledge base, to identify similar concepts and implemented objects. Some of these objects can be reused directly, and some may be reusable with certain adaptations.

Interoperability Manager: provides interfaces for connected applications to cooperate and to share processes and information. Making systems work together is therefore achieved by interfacing through the Interoperability Manager, which takes into consideration how to resolve ambiguity with the help of the EISCO KB.

6.1.3.2.2 Applications Importer: ensures the input to the EISCO system. This category permits applications to be inserted into the EISCO architecture once their models or local ontologies are imported.

Models Importer: is accessible through the administrator interface to permit the import of an application or a system into the EISCO architecture. This component provides the possibility to convert the imported system model (UML, etc.) into an ontological model accepted by the EISCO KB Server.

Ontologies Importer: helps the administrator to import an ontology and configure it to be accessible through the architecture. It helps to convert the imported ontology so that it conforms to an accepted ontology language such as OWL, DAML+OIL, etc.
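The conversion performed by the Models Importer can be sketched as follows. This is a hedged illustration only: a real importer would parse XMI, whereas here the UML class model is given as a plain dictionary, and the predicate names (`rdf:type`, `hasAttribute`, `rdfs:subClassOf`) are assumed vocabulary for the generated ontological model.

```python
def import_uml_model(context, uml_classes):
    """Convert a flattened UML class description into ontology triples
    stamped with the importing context.

    uml_classes: {class_name: {"attributes": [...], "superclass": str|None}}
    Returns a list of (subject, predicate, object) triples.
    """
    triples = []
    for name, desc in uml_classes.items():
        subject = f"{context}:{name}"
        triples.append((subject, "rdf:type", "owl:Class"))
        for attr in desc.get("attributes", []):
            triples.append((subject, "hasAttribute", attr))
        superclass = desc.get("superclass")
        if superclass:
            triples.append((subject, "rdfs:subClassOf", f"{context}:{superclass}"))
    return triples

# Hypothetical fragment of a system model imported under context C1.
model = {
    "Employee": {"attributes": ["name"]},
    "Manager": {"attributes": ["name", "travel_assignment"],
                "superclass": "Employee"},
}
kb_triples = import_uml_model("C1", model)
```

Stamping every subject with its context keeps the imported model local, so that later bridge rules (rather than renaming) relate it to other contexts.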

6.1.3.2.3 Knowledge Manager: is responsible for ensuring the management of the KB, and its use by the other Core Services components. All of these services are accessible to the architecture administrator and are tightly related to the other components of the EISCO Core Services.

Ontology Manager: is used for managing ontologies, helping to resolve conflicts, and keeping track of changes from previous manipulations. It also manages the versioning for ontology evolution.

Semantic Mapper: permits manipulating the semantic mappings and saving them in a conventional form of ontology mapping rules. It offers the recall of these mappings on demand from the EISCO


CS components. It can also provide manual and/or semi-automatic matching to create the alignment between the ontologies used.

Context Manager: manages the contextual information and manipulates the stamping techniques used by the contextual ontology. It also helps the administrator of the EISCO server to assign the stamping mechanism between the ontologies used.

6.1.3.3 EISCO Accessibility Server

The EISCO Accessibility Server contains the infrastructure services providing a low-level but robust suite of middleware services, tools, and frameworks that simplify the development of EISCO connections to existing EIS. More specifically, challenging software development practices such as threading, concurrency, database connectivity, object pooling, and load-balancing implementations are off-loaded from the development of EISCO and integrated into the Accessibility Server. It manages the distributed computing, the data flow throughout the system, data exchange, and physical accessibility (e.g., TCP/IP, CORBA, JNI). The Accessibility Server also provides the connectors used to plug systems into the EISCO server. Each EIS should be available through a specific connector bridging it to the Accessibility Server.

6.2 Scenarios of use

Scenarios of use are 'stories' that represent typical or possible activities. A scenario is an illustration, a thumbnail, of how the offering meets the users' requirements. Scenarios give users the possibility to participate actively throughout the software development process. Three main applications are presented in this section in a fairly specific and concrete form. Firstly, the semantic interoperability scenario is treated with EISCO. Secondly, a query answering scenario with EISCO is developed to show the possibility of ensuring global querying over EIS. Lastly, a scenario about reusability with EISCO is treated to enable architects and developers to increase their efficiency and shorten the development process by using reusable objects, design patterns, frameworks, XML Schemas, etc. In particular, the reusability of objects is studied in this scenario through a case study.

6.2.1 Used Example:

Figure 6.4: A side of UML model for HRIS

Figure 6.5: A side of UML model for PMIS


For these scenarios, let us consider two information systems used in an enterprise: PMIS (Project Management Information System) and HRIS (Human Resource Information System). The UML models defined in Figure 6.4 and Figure 6.5 represent the mono-representation of each system. These systems contain concepts having the same identifier and the same meaning, such as Manager in PMIS and Manager in HRIS; or concepts that are different but have component, structure, and semantic similarities, such as Engineer in PMIS and Developer in HRIS. In this case, we can identify that the concepts Manager, Engineer, etc. are multi-represented in these systems. Each system using one of these models offers, on its own, little potential for global querying and knowledge sharing. For example, it will not be possible to answer queries with global visibility, such as: what are all the roles that a manager plays in a company? We do not consider defining a unified representation for the multi-represented concepts. Although one may argue that an integrated representation could answer this query, we should not forget that reaching a common representation is, on one side, not easy to achieve and entails the loss of the original representations. On the other side, reaching a common representation is sometimes undesirable, or not feasible, because of the low commitment existing between the partners of these systems.

6.2.2 Interoperability

Interoperability, by definition, is the ability of several systems to participate together toward a common goal. Given two systems, interoperability is illustrated here through the use of methods existing in one system to compute a value needed by the other.

Figure 6.6: Collaboration diagram of Interoperability Scenario

Let us consider the two systems HRIS (Human Resources Information System) and PMIS (Project Management Information System), hereafter C1 and C2 respectively. We are trying to calculate in HRIS the "manager_travel_bonus", which applies if a Manager has traveled for a number of missions. The information concerning the current missions of the Manager is part of PMIS, and can be computed with the "number_extern_mission" method. The two systems must therefore collaborate to reach


this goal. The term "travel_assignment" is therefore going to be translated into the term "extern_mission" via the contextual ontologies. The bridge rules are defined as part of the accessibility relation in the contextual ontologies: ∇i = {ri1, ri2, …}. One of these rules rij represents the multi-representation aspect of the concept "Manager", defined through an identity bridge rule C1:Manager ≡ C2:Manager. The sequence of events is represented in Figure 6.6 as a collaboration diagram and can be interpreted as follows:

1. The system C1 asks through the interface (i.e., in our architecture, through the EISCO Accessibility Server and the system connectors): give me "travel_assignment".

2. The EISCO CS_I1 reformulates the question and asks the EISCO KB what "travel_assignment" is in the context C1: C1:travel_assignment = ??

3. The underlying ontology, after looking at the local C1 ontology, answers that C1:travel_assignment = C1:manager_mission.

4. The EISCO CS_I1 communicates with the EISCO CS_I2 of the second system, asking: give me C1:manager_mission.

5. The EISCO CS_I2 goes back to the EISCO KB, asking which concepts in the context C2 correspond to C1:manager_mission: C2:?? = (C1:manager_mission).

6. The ontology answers: C2:extern_mission ≡ (C1:manager_mission).

7. The EISCO CS_I2 asks the system C2: give me "extern_mission".

8. The system computes "number_extern_mission" and returns: here is the "extern_mission".

9. The EISCO CS_I2 answers the EISCO CS_I1: here is the C1:manager_mission.

10. The EISCO CS_I1 answers the system C1: here is "travel_assignment".

Finally, the system C1 is able to compute "manager_travel_bonus" using the resulting value of "travel_assignment".
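The ten steps above can be sketched as a message exchange. This is only an illustrative sketch: the two lookup tables stand in for the EISCO KB, the `PMIS` class for system C2, and the returned mission count and bonus rate are invented values.

```python
# Stand-ins for the EISCO KB: local term resolution and bridge rules.
LOCAL_TERMS = {
    ("C1", "travel_assignment"): ("C1", "manager_mission"),   # steps 2-3
}
BRIDGE_RULES = {
    ("C1", "manager_mission"): ("C2", "extern_mission"),      # steps 5-6
}

class PMIS:                                   # system C2
    def number_extern_mission(self):
        return 4                              # step 8: computed locally

def kb_resolve(ctx, term):
    return LOCAL_TERMS[(ctx, term)]

def kb_bridge(ctx, term):
    return BRIDGE_RULES[(ctx, term)]

def ask_travel_assignment(pmis):
    # Steps 1-3: C1 asks; CS_I1 resolves the term in C1's local ontology.
    ctx, term = kb_resolve("C1", "travel_assignment")
    # Steps 4-6: CS_I2 maps the concept into context C2 via the bridge rule.
    _, c2_term = kb_bridge(ctx, term)
    assert c2_term == "extern_mission"
    # Steps 7-10: C2 computes the value and the answer flows back to C1.
    return pmis.number_extern_mission()

missions = ask_travel_assignment(PMIS())
manager_travel_bonus = 150 * missions         # C1's own computation
```

Note that C1 never sees C2's vocabulary: the translation happens entirely in the KB lookups, which is the point of the contextual ontology.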

6.2.3 Query answering

For query answering, one of the most important issues is to determine the significance of a term used in a query. A query defines the user's context, which needs to be brought as close as possible to one or many system contexts. The underlying contextual ontologies help the EISCO Server to understand what is meant by the used term, and thus to match the query context with the existing contexts. Afterwards, the system should identify the relevant information, documents, or data from the different systems and cross their results. Using the same example of Figure 6.4 and Figure 6.5, let us consider a query requesting the roles that a manager "John Smith" plays in the company. Firstly, we can recognize that this query should use HRIS and PMIS simultaneously to extract the useful information. The EISCO Server architecture offers the possibility to access the set of information systems of an enterprise through a common portal. Thus, it will be possible to find the complete answer for this query. Suppose that a context C represents the query context and that C1 and C2 represent, respectively, the contexts of HRIS and PMIS. Let us also assume that 'John' is an engineer, and that an attribute 'title' of the concept 'Engineer' describes the current mission of an engineer with multiple possible values such as developer, project manager, etc.


Figure 6.7: Collaboration Diagram of Query Answering Scenario

One axiom (A) in our model says that if an engineer has the title project-manager, he has the same responsibilities (roles) as a manager. The query over the EISCO Server invokes the Query Interpreter, which asks the EISCO KB to resolve the query terms (attributes, classes, objects) and collate the answer. Let us consider that the bridge rules are defined as part of the accessibility relation in the contextual ontologies as ∇i = {ri1, ri2, …}. One of these rules rij represents the multi-representation aspect of the concept "Manager", defined through an identity bridge rule C1:Manager ≡ C2:Manager. The sequence of events is represented by a collaboration diagram in Figure 6.7 and depicted as follows:

1. First of all, the Query Manager creates the context C of the query: C:Manager is a concept, C:"John Smith" is an instance of the concept C:Manager, and C:Roles is the set of relationships of the concept C:Manager.

2. The EISCO CS asks the ontology (i.e. the EISCO KB) to resolve the terms used in the context of the query and to identify one or more contexts representing the concept in question. Many techniques can be used: at first, comparing the context C to all the existing contexts, or comparing it with one contextual ontology and exploiting the bridges between multi-represented concepts.

3. The EISCO KB answers that: C:Manager corresponds to C1:Manager and C2:Manager; C1:"John Smith" is an instance of Engineer and, with the axiom (A), C1:"John Smith" is an instance of C1:Manager; C1:Manager Responsible_of Task; C2:Manager is_member_of_management_committee.

4. The EISCO CS creates two separate queries for the corresponding systems, in order to collect the data based on the answer of the EISCO KB.

5. Each system collects the needed information and sends it back to the EISCO CS.

6. The EISCO CS collates the results and sends them back to the Query Manager.

In the end, the Query Manager concludes from the preceding information that C:"John Smith" is_member_of_management_committee and C:"John Smith" Responsible_of Task, with values coming from the two systems.


It is obvious that HRIS has offered one part of the result and PMIS the other part, whereas the EISCO Server has crossed the results to show the roles that a manager carries throughout all the EIS. Thus, EISCO supports query answering by identifying the relevant information sources, translating the user query into collections of sub-queries, and returning the answers.
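The decomposition-and-crossing mechanism of this scenario can be sketched as follows. Everything here is a hypothetical stand-in: the KB answer and the per-system results are hard-coded to mirror the steps above, not fetched from real systems.

```python
def kb_resolve_query_context(term):
    """Steps 2-3 (stand-in): C:Manager corresponds to C1:Manager and
    C2:Manager; axiom (A) makes the engineer titled project-manager an
    instance of C1:Manager as well."""
    return {"C1": "Manager", "C2": "Manager"}

def query_hris(concept, instance):            # sub-query sent to C1 (HRIS)
    return ["Responsible_of Task"]

def query_pmis(concept, instance):            # sub-query sent to C2 (PMIS)
    return ["is_member_of_management_committee"]

def answer_roles(instance):
    """Global query: steps 1-3 resolve the query context, steps 4-5 run
    one sub-query per matched context, step 6 collates the results."""
    contexts = kb_resolve_query_context("Manager")
    results = []
    if "C1" in contexts:
        results += query_hris(contexts["C1"], instance)
    if "C2" in contexts:
        results += query_pmis(contexts["C2"], instance)
    return results

roles = answer_roles("John Smith")
```

Each system answers only in its own context; the complete list of roles exists only in the collated result, as in the scenario.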

6.2.4 Reusability

The reuse of patterns, objects, and software components is a well-known and widely practiced technique within object-oriented programming. Studying reusability with the EISCO Server includes finding the relations between implemented components defined in different contexts. Reusability can also be considered through the use of design patterns in the system modeling process; this issue can be studied separately from the reuse of components, which we present here. Let us assume that the implementation of PMIS includes the model layer of the system and the implementation layer containing objects, components, etc. Let us consider the method Worked_hours, developed in the context of the system PMIS (C1). In order to help the human resources office, the enterprise decided to create a new HRIS. The development of any system starts by identifying a specification and creating a conceptual model, which should be done with a modeling language such as UML. In order to integrate the new system into the global architecture of the enterprise, an EISCO contextual ontology should be created for the system HRIS with respect to the context (C2). The contextual ontology should also consider the definition of the semantic similarity of the multi-represented concepts between the contexts C1 and C2. Therefore, an identity bridge rule rij is defined as part of the accessibility relation in the contextual ontologies ∇i = {ri1, ri2, …}, asserting that C1:Manager ≡ C2:Manager.

Figure 6.8: Collaboration Diagram of Reusability Scenario

After achieving this step, we should consider how to simplify the implementation of the new system by reusing some existing components. The sequence of events is represented with the collaboration diagram in Figure 6.8 and interpreted as follows:

1. The Reusability Manager defines the current context (C1), the needed concept C1:Manager, and the target context C2.


2. The EISCO CS asks the EISCO KB to identify the concepts that are multi-represented between C1:Manager and C2:??.

3. EISCO KB answers that C1:Manager and C2:Manager are multi-represented, and thus the implementation of the first can contribute to the implementation of the second.

4. The EISCO Core Services make the source code of the objects, methods, etc. using this concept accessible to the developers. For instance, C1:Worked_hours will be accessible to be adjusted for the new context C2.

5. To be more practical, the EISCO Core Services include a container where reusable objects are gathered.

6. The developer will proceed to create the object C2:Overtime_wage by adjusting the initial C1:Worked_hours.

7. The EISCO CS attaches the new object to the HRIS in order to complete the implementation of this system.

The difficult step in this process is to identify the right reusable components after resolving the semantic similarity between concepts. The developer needs to understand the source context the component comes from, in order to reuse it without incoherence with the target context.
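The lookup performed by the Reusability Manager in this scenario can be sketched as follows. The bridge-rule table and the component registry are illustrative stand-ins for the EISCO KB and Core Services; the component names come from the scenario, but the function itself is hypothetical.

```python
# Stand-in for the KB: identity bridge rule rij: C1:Manager ≡ C2:Manager.
BRIDGE_RULES = {("C1", "Manager"): ("C2", "Manager")}

# Stand-in for the CS container: objects implemented per (context, concept).
COMPONENTS = {("C1", "Manager"): ["Worked_hours"]}

def find_reusable(src_ctx, concept, target_ctx):
    """Steps 1-4: follow the identity bridge rule from the source concept;
    if it reaches the target context, expose the source components so the
    developer can adapt them."""
    bridged = BRIDGE_RULES.get((src_ctx, concept))
    if bridged and bridged[0] == target_ctx:
        return COMPONENTS.get((src_ctx, concept), [])
    return []

reusable = find_reusable("C1", "Manager", "C2")
# Steps 5-7: the developer adapts the retrieved component into the new
# object C2:Overtime_wage, which the CS then attaches to the HRIS.
overtime_wage = {"context": "C2", "derived_from": "C1:Worked_hours"}
```

The sketch makes the difficulty noted above visible: the bridge rule only says which components are candidates; adapting them to the target context remains the developer's job.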

6.3 Case study for reusability

Case studies, in general, examine the interplay of all variables in order to provide as complete an understanding of an event or situation as possible. This illustrative case study is primarily interested in showing a descriptive analysis of the preceding reusability scenario. In particular, it aims at combining the results of two earlier research works (described in Chapter 3), namely DW and EDI, to show the usefulness of contextual ontologies and of the EISCO architecture. QETL and the EDI Translator basically share a common core model, the Mapping Expression Model (Chapter 3), which reflects a set of meta-data used by these tools to integrate and/or translate between database schemas and/or EDI messages. The first deals with data warehousing and meta-data management, the second with EDI translation and mapping techniques. The full UML class models of these systems are included in Appendices C & D. For clarity, the case study uses portions of these class diagrams, illustrated in Figure 6.11 and Figure 6.12. In summary, the goal of this case study is to extend the previous reusability scenario so that objects implemented in one system can be reused in the other.

Figure 6.9: Use Case Diagram of Reusability Case Study


Figure 6.10: Sequence Diagram of Reusability Case Study

At first, we assume that one of these systems (e.g. DW-QETL) has been implemented and hooked to the EISCO architecture. In other words, an ontology existing in the EISCO KB represents this system (i.e. O1), and the implemented objects of this system are accessible to the EISCO CS. The system Administrator is responsible for creating and maintaining the ontologies in the EISCO architecture and proceeds with the import of the new system's model. The Client (i.e. developer) is in charge of implementing the new system (e.g. the EDI Translator). An important concern of this work is to identify the reusable components that can help in this implementation process. Figure 6.9 illustrates the use case diagram representing this case study. The sequence of events involving the two actors, Administrator and Client, is illustrated in Figure 6.10 and depicted as follows:

1. The creation of a second system starts by associating a UML class diagram with this system. The Administrator starts by importing the UML class model that represents the target system. An ontology representation (i.e. O2) conforming to the UML class model is generated into the KB. There are a number of good reasons why UML is used in this case study: (i) UML is a graphical notation appreciated and used by a variety of companies in a wide spectrum of industries and domains; (ii) UML is supported by widely-adopted CASE tools. These UML CASE tools are more accessible to software developers than the current ontology tools from the research community, such as Ontolingua and Protégé, which require expertise in knowledge representation.

2. The Administrator creates the semantic mapping between the introduced ontology (O2) and existing ones (e.g. O1). Using the contextual ontologies formalism, these semantic rules are expressed with the accessibility relationships ∇i = {ri1 , ri2 ,… } between contexts and the bridge rules


(ri1,…,rin) between concepts. The Administrator thus identifies an identity bridge rule rij between the concepts O1:Mapping and O2:Mapping (O1:Mapping ≡ O2:Mapping).

3. The Client connects to the EISCO CS, referring to a context-dependent system (e.g. the EDI Translator). The EISCO CS asks the EISCO KB to find the contexts accessible from the client's context. The EISCO CS then makes accessible the information concerning the existing systems plugged into the architecture and shows the list of contexts related to the client's context through accessibility relationships.

4. The Client can choose the concept and the semantic rule to get the reusable implemented object. The development of the new system relies on the semantic rules defined in the KB and/or discovered by the KB Engine. In our case, the client exploits the identity relationship between the concepts O1:Mapping and O2:Mapping. The EISCO CS identifies the set of components implemented for the DW system that can be useful: it selects the reusable components that implement the Mapping object and makes them available to the Client.

Figure 6.11: Pseudo-code of DefineMappingExpression Method

5. The Client manages reusability through the reusable components: he creates, modifies, or extends them. Thus, the Client will be able to access the source code of the reusable component named DefineMappingExpression in order to create a new component of the new system. A close look at the code of each method can help derive one from the other. The contents of these

Void DefineMappingExpression(Field A)
Begin
    Record R = A.hasfield();
    Mapping M = A.hasMappField();
    Selection S = A.mySelection;
    MappCondition Mc = A.myMappCond;
    Vector Vs = S.hasField();
    Vector Vc = Mc.hasField();
End

Void DefineMappingExpression(Attribute A)
Begin
    Entity E = A.hasattribute();
    Transformation T = A.myTransformation;
    Function F = A.myFunction;
    Vector Vt = T.hasSources();
    Vector Vf = F.hasSources();
End


methods are illustrated in the preceding pseudo-code (Figure 6.11). Finally, the component is assigned to the new system.
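The lookup that steps 2-4 describe can be condensed into the following sketch. This is a hedged illustration only: the class and method names are our assumptions, not the EISCO CS API. An identity bridge rule declared between O1:Mapping and O2:Mapping lets a client working in context O2 retrieve the components implemented in context O1.

```java
import java.util.*;

// Hedged sketch (illustrative names, not the EISCO API): an identity
// bridge rule between two contexts lets a client resolve, from its own
// context, the components registered for the equivalent concept.
public class BridgeRuleDemo {
    record BridgeRule(String srcCtx, String srcConcept, String tgtCtx, String tgtConcept) {}

    public static void main(String[] args) {
        // Components implemented for the DW system (context O1).
        Map<String, List<String>> implemented = new HashMap<>();
        implemented.put("O1:Mapping", List.of("DefineMappingExpression"));

        // Identity bridge rule declared by the Administrator (step 2).
        List<BridgeRule> rules = List.of(new BridgeRule("O1", "Mapping", "O2", "Mapping"));

        // Client in context O2 asks for reusable components of O2:Mapping (step 4).
        String wanted = "O2:Mapping";
        for (BridgeRule r : rules) {
            if ((r.tgtCtx() + ":" + r.tgtConcept()).equals(wanted)) {
                System.out.println(implemented.get(r.srcCtx() + ":" + r.srcConcept()));
            }
        }
    }
}
```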

Hence, the tool is appreciated for reducing the time spent writing code from scratch: only the initial code needs adjusting. This case study has been fully implemented with the architectural components of the EISCO Project, using the reusable objects of the J2EE platform (known as Enterprise JavaBeans, EJB). The issues concerning the implementation of the case study will be discussed in detail in the next chapter.

6.4 Summary

In this chapter, the contextual ontologies approach has been applied to a research project named EISCO. A logical and physical architecture was suggested to deal with the three scenarios of use. Furthermore, a case study was considered to show how implementation effort can be reduced by assisting developers with contextual-ontology semantic sharing and existing components. This case study aims at developing the reusability scenario for the two studied EIS (i.e. EDI, DW) and demonstrating how the EISCO Project can meet semantic sharing requirements. The work presented in this chapter leads us to study in depth the feasibility and implementation analysis of the EISCO project. This feasibility study is considered from a technical (e.g. existing technologies such as the J2EE platform and EJB, existing KB servers such as RACER and FaCT, etc.) and an economical (e.g. resources, time, return on investment, etc.) point of view. In terms of implementation, the architecture combines knowledge bases driven by ontological conceptualization with the J2EE platform as an implementation framework for reaching reusability and interoperability. A survey of return on investment, system complexity, and feasibility will be considered as well throughout the life cycle of the project.


CHAPTER 7

IMPLEMENTATION AND FEASIBILITY STUDY

"I didn’t think; I experimented" Wilhelm Roentgen

After studying the theoretical and architectural issues, we devote this chapter to showing how these studies materialize in prototypes. In essence, prototyping is the key to evaluating the usability of the whole system. It allows identifying problems, analyzing their causes, and finding solutions for fully implemented systems. Prototypes differ according to their level of realism [Nielsen 92]. We can distinguish two levels of prototyping according to the level of interaction provided by the prototype (Figure 7.1). A horizontal prototype only presents the visible part of the software; it is, in effect, only the man-machine interface (MMI). The main functionalities of the application are developed in a vertical prototype, which allows running user tests. It implements a consistent set of functionalities so that the user can carry out typical scenarios of use, during which the critical points are evaluated.


Figure 7.1: Levels of prototyping

In this chapter, we try to show the practicability of our proposals through the implementation of prototypes. This includes prototyping the QELT tool (vertical and horizontal), the XQuery-based EDI Translator (vertical), the semi-automatic EDI Matcher (vertical and horizontal), and the EISCO Project (vertical and horizontal). We present these tools separately, with their feasibility studies and their associated scenarios of use.

7.1 Prototyping QELT

In this section, we focus on implementing the DW architecture and the meta-data to show where and how our approach can reduce the complexity of the ETL process. In other words, we try to show how much easier it can be to acquire and transform data by using SQL queries. We also discuss the automatic generation of these queries from the mapping meta-data. A prototype (horizontal and vertical) of QELT has been developed in Visual Basic (see Figure 7.2). This prototype uses the implementation of the mapping guideline and mapping expression models to automatically create the needed transformations.

7.1.1 Scenario of using QELT:

In order to verify the effectiveness and efficiency of QELT, we built a test scenario with samples coming from a legacy system. This source was not designed to be SQL compatible, since it includes VSAM files. The scenario consists of applying QELT to generate a DW and using this warehouse to generate statistics and reports. In current use, producing each statistic costs up to 5 man-days of development; QELT can reduce this to less than 2 days. Without QELT, complete COBOL programs have to be developed on the mainframe. Because clients continuously demand different types of statistics, their needs are never the same; this pushed us to create the DW on a separate server using the QELT tool.


Figure 7.2: Snapshots of QELT interfaces

The data describes the After-Sales Service of a group of hypermarket stores. The target model is defined in Figure 7.3. We used MS-SQL Server as the target DBMS. The meta-data is created on the same server and is used to support the warehousing process. It contains the source data description, such as file structures, records, and field descriptions. The meta-data also describes the mapping guideline between the source and the target; the meta-data model used is defined in Figure 7.4. We proceed in three steps to guide our test. These steps follow the ETL (Extraction, Transformation, and Load) process used to create data warehouses, but in a different order.

Step 1 - The extraction process: Since our data source is not SQL compatible, we used COBOL extraction programs that generate ASCII files from VSAM mainframe files. These programs are generic for many applications, meaning that the extraction process uses a limited set of parameters received from the meta-data. The resulting ASCII files are transmitted by FTP to the DBMS server. We limited the extracted files to 273,000 rows of data.

Step 2 - The loading process: To make the data accessible to the DBMS, we created a temporary database with a simple representation, whose table structure is derived from the source meta-data. A simple CREATE TABLE query was generated, and the data in the ASCII files was loaded into the database. We used DTS (Data Transformation Services), a module of MS-SQL Server, to load the data; it enables the integration of data coming from sources such as ASCII files into the DBMS.

Step 3 - The transformation process: Using the QELT interfaces, we requested the creation of transformations. This request was translated into the creation of a procedure inside the DBMS


including SQL statements. These statements perform the data transformation and produce a new database (the target DW). They were generated from the mapping expression described in the mapping meta-data and the mapping guideline.

Customer(ID_customer, Name, Address, Phone)
Header_After_sale_customer(ID_HASC, ID_customer, Date_HASC)
Line_After_sale_customer(ID_LASC, ID_HASC, Type, Description, Line_statute, Characteristic, Sending_date, Expected_delivery_date, Weight, Quantity, Amount, Owner, Category, Location)
Supplier(ID_supplier, Name, Address, Phone)
Header_After_sale_supplier(ID_HASS, ID_supplier, Date_HASS)
Line_After_sale_supplier(ID_LASS, ID_HASS, ID_LASC, Submission_date, Reparation_amount)

Figure 7.3: Target model for After-Sales Service

VSAM_file(ID_file, Name_file)
Record(ID_Record, ID_file, Name_Record, Master_pg, Length, Key_length, Code_record)
Field(ID_fields, ID_Record, Order, Title, Type, Length, Decimal, Key, Name_cobol)
Program(ID_pg, Creation, Modification, Type)
Parameters(ID_parameter, ID_pg, Type, Length, Decimal, ID_field)
Customer(ID_customer, Name_customer, Stock_directory, Code_society)
Application(ID_application, Title, ID_customer)
Entity(ID_entity, ID_application, Title, Table_name)
Field_application(ID_field_app, ID_entity, Data_type, Code_field, Title, Length, Decimal, Primary_key, Foreign_key, Join_type)
Mapping(ID_mapp, ID_field_app)
Mapp_field(ID_mapp_field, ID_mapp, ID_fields, Start_position, End_position, Order, Suffix, Prefix)
Mapping_Condition(ID_condition, ID_mapp, Order, Operator, ID_fields, Start_position, End_position, Tested_value)
Selection(ID_Selection, ID_field_app)
Condition(ID_condition, ID_Selection, Order, Operator, Start_position, End_position, Tested_value, ID_field_app)
Statistique(Code_stat, Title, ID_customer)
Stat_parameters(ID_parameter, Code_stat, Order_number, Title, Type, Length, Decimal, Pack_zone)
Stat_fields(ID_field_stat, Code_stat, ID_field_app)
Selection_stat(ID_Selection, Code_stat, Order, ID_parameter, Operator, ID_field_stat)

Figure 7.4: Case study meta-data model including mapping meta-data

Moreover, these statements use a set of SQL Server functions, such as CONVERT (used to convert characters to integer, float, or date), DATETIME, and SUBSTRING. The last statements of the procedure remove (DROP) the temporary table and database.

Step 4 - The restitution process: We used Business Objects (BO) to report data from the target DW. This tool enabled us to easily generate the needed statistics and reports.
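To make the generated transformation statements concrete, the sketch below builds one T-SQL expression from a Mapp_field-like entry (start and end positions on a source field plus a target data type). This is a hedged illustration: the helper name, the field names, and the exact statement shape are our assumptions, not QELT's actual code.

```java
// Hedged sketch: turning one mapping-meta-data entry into a T-SQL
// fragment combining SUBSTRING and CONVERT. Names are illustrative.
public class SqlGenDemo {
    static String mappingToSql(String sourceField, int start, int end, String targetType) {
        // T-SQL SUBSTRING takes (expression, start, length); CONVERT casts the result.
        int length = end - start + 1;
        return "CONVERT(" + targetType + ", SUBSTRING(" + sourceField + ", " + start + ", " + length + "))";
    }

    public static void main(String[] args) {
        // e.g. read positions 3..10 of a raw COBOL field as an integer
        System.out.println(mappingToSql("raw_line", 3, 10, "INT"));
    }
}
```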

7.1.2 Results:

We performed the test on a PC Pentium III Processor 667 MHz and 128 MB RAM with Windows NT Server 4.0. The case study gave the following results:

• At the extraction level: the COBOL extraction programs (on the mainframe machine) generated three files. The first and second contain the customers and the suppliers, respectively. The third file contains all the rows needed to fill the other tables. This test scenario covered 6% of the client's existing data. This step is common whatever ETL tool is used.

• At the loading level: this process executed the temporary-database creation query. We used DTS to upload the data into the database tables. This process was very fast: it took 25 seconds for files with a total size of 106 MB.


• The generation of the SQL transformations from the meta-data was not a complicated process. These transformations were stored as procedures in the DBMS. Their execution took 2 minutes and 15 seconds, including the creation of the target DW, the execution of the mapping expressions between source and target attributes, and the drop of the temporary database. The performance of QELT is good even without query optimization.

To conclude, we should keep in mind the strengths of the QELT tool:

• Efficiency: QELT proved efficient in the transformation process. Indeed, the transformations executed by the DBMS perform well, even though the procedures were not optimized with any query optimization technique. We believe that optimizing these queries would let QELT overtake other ETL tools.

• Maintenance: The active aspect of the QELT architecture supports maintenance. Since the system is able to generate new transformations automatically, no extra update is needed for an evolution. Thus, the administrator is neither called to create the transformations manually according to the mapping guideline, nor to verify the consistency of the transformations and the target DW.

• Improved data transformation: The QELT meta-data describes the mapping between source and target data. During the warehousing process, these mapping specifications are expressed by the mapping expression, which is transformed into SQL statements. This ensures full interaction between the meta-data and the transformation process.

7.2 Prototyping the EDI Translator

The goal of this implementation is twofold: first, checking the feasibility of the mapping expression model for EDI/XML message translation; second, verifying the proposal of using schema matching for EDI messages. Indeed, the suggested mapping expression model can be used for expressing the mappings for EDI message translation and transformation. We therefore developed two prototypes separately, each with its own test scenario. Integrating these prototypes into one efficient graphical tool is also a foreseen goal.

7.2.1 Using Mapping Expression Model with XQuery-based EDI Translator:

Since we are particularly interested in XML-based EDI messages, we considered the techniques for XML document transformation. Although many techniques for restructuring XML documents exist in the literature, two of them are standardized by the W3C (i.e. XSLT and XQuery). XQuery is a high-level language for processing XML documents. It is proposed by the XML Query Working Group at the W3C and, at the time of writing, is still a working draft. One of the goals of this group is to define a language for XML data similar to SQL. A query is a set of expressions and is defined by a general FLWR (For Let Where Return) expression. These expressions can define a variable using Let; other expressions enable iterative loops or conditions. In XQuery, we can use the predefined functions offered by the XPath language, such as substring, concat, etc. Since we are interested in applying the mapping expression model to transform XML-based EDI messages, we can choose between XSLT and XQuery as the transformation language for expressing these mappings. Both languages use the functions offered by XPath, such as substring and concat. Thus,


the goal of this vertical prototype consists of testing the potential of these languages to be used at the heart of the suggested EDI Translator. Hence, questions such as size limits, computational time, and effectiveness with real-world message flows should be answered. Some works in the literature study the performance and potential of these languages, and benchmarks have been realized with implementations of them. We cite in this category work on XPath [Gottlob 02] [Gottlob 03], which shows that the computational time can be exponential in the size of the queries, and on XSL [Moerkotte 02], which evaluates the transformation of XML data coming from a relational database source. However, up to the time we started this study, no performance evaluation benchmark had been realized for XQuery. The prototype developed for this implementation also included a syntax-highlighting plug-in (for the Eclipse IDE1) to facilitate the writing of XQuery; this plug-in is needed because no user-friendly editor existed for writing the queries.

7.2.1.1 Scenario of test

The test scenario is defined in Figure 7.5; it shows a client using EDI messages to pay a supplier through bank interchange. We focus in this test on XML-based EDI messages: the PAYMUL message of EDIFACT and the MT103 message of SWIFT, translated to XML format, are used in this scenario. Based on the EDIFACT2 message description guideline offered by the CFONB3 and on the SWIFT4 Standard General Information, we defined a set of rules to translate between these two messages. These rules reflect the mapping expression model and are translated into an XQuery query, given in Appendix E. We then defined a set of XML files, PAYMUL_etendu_1b.xml up to PAYMUL_etendu_15b.xml (15 files), containing an increasing number of messages (Table 7.1). These files are essentially made up of duplications of two PAYMUL messages, each including two sequences. The structure of these messages is described in Appendix F.

Figure 7.5: Scenario of use

1 http://www.eclipse.org/ 2 http://www.unece.org/cefact/ 3 http://www.cfonb.org/ 4 http://www.swift.com


File | Messages | Size
1b   | 3        | 8 KB
2b   | 6        | 16 KB
3b   | 12       | 32 KB
4b   | 24       | 64 KB
5b   | 48       | 128 KB
6b   | 96       | 256 KB
7b   | 194      | 512 KB
8b   | 390      | 1024 KB
9b   | 778      | 2048 KB
10b  | 1558     | 4096 KB
11b  | 3116     | 8192 KB
12b  | 6232     | 16384 KB
13b  | 12466    | 32768 KB
14b  | 24932    | 65536 KB
15b  | 49860    | 131072 KB

Table 7.1: Description of the XML-based EDI messages used in the test

7.2.1.2 Results

To apply the test scenario, we used the GNU Kawa implementation of XQuery (written in Java) named Qexo1. We developed a message slicing program (in C++) using the Xerces XML parser2. The tests were run on a PC PIII, 1000 MHz, Windows 2000 Server, with 256 MB of RAM, JDK 1.4, and the Eclipse 2.0 IDE. We ran each test 5 times and calculated the average of the results. We chose Qexo after trying some examples with other implementations of XQuery:

• The Oracle prototype for XQuery does not allow generating output files, which is essential for our tests.

• Galax is an open-source implementation of XQuery, but it is written in OCaml, which we are not familiar with.

• The Microsoft implementation of XQuery is accessible only through a demo server and is not available for download or local testing.

For XSLT we used Napa3, an open-source XSLT implementation written in C++. We applied the needed transformation with XSLT and XQuery twice each: first without allowing fragmentation of the treated files, then by performing the transformation over slices of the treated files and accumulating the results.

1 http://www.gnu.org/software/qexo/ 2 http://xml.apache.org/xerces-c/ 3 http://homepage.ntlworld.com/kjjones/


                                | Best time 1b (8 KB) | Best time 10b (4096 KB) | Best time 11b (8192 KB) | Best time 15b (131072 KB) | Maximum size
XSLT (Napa)                     | 0.131 s             | 14341.75 s              | 55513.7 s               | Out of memory             | <= 8192 KB
Parser + XSLT (Napa + Xerces)   | 0.64 s              | 262.69 s                | 556.7 s                 | 8701.8 s                  | Unlimited
XQuery (Qexo)                   | 1.43 s              | 43.42 s                 | Out of memory           | Out of memory             | <= 4096 KB
Parser + XQuery (Qexo + Xerces) | 1.49 s              | 43.58 s                 | 110.38 s                | 1813.78 s                 | Unlimited

Table 7.2: Results values

We learn from this benchmark (Table 7.2) that:

• Neither of the solutions (XSLT, XQuery) can treat the file named 15b (~132 MB) in one pass. The maximum sizes are 8192 KB and 4096 KB for XSLT (Napa) and XQuery (Qexo), respectively.

• Therefore, the file needs to be sliced into many parts. This solution poses no problem: such a file contains many payment messages, so we extract each payment and create a corresponding file separately. Using this technique, there is no size limit on the treated files with either language.

• Using XSLT with Napa is more time-consuming than using XQuery with Qexo.
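The slicing step above was implemented in the thesis as a C++ program using the Xerces parser. The following is a minimal sketch of the same idea in Java using the standard StAX streaming API; the element name "Message" and the flat batch structure are illustrative assumptions, not the real PAYMUL tags.

```java
import javax.xml.stream.*;
import java.io.*;
import java.util.*;

// Minimal sketch: split a large XML batch of payment messages into one
// self-contained document per message, so each slice stays within the
// transformer's memory limits. Attributes are ignored for brevity.
public class MessageSlicer {
    static List<String> slice(Reader in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> slices = new ArrayList<>();
        StringBuilder current = null;
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("Message"))
                current = new StringBuilder();
            if (current != null) {                       // copy events inside a message
                if (ev == XMLStreamConstants.START_ELEMENT)
                    current.append('<').append(r.getLocalName()).append('>');
                else if (ev == XMLStreamConstants.CHARACTERS)
                    current.append(r.getText());
                else if (ev == XMLStreamConstants.END_ELEMENT)
                    current.append("</").append(r.getLocalName()).append('>');
            }
            if (ev == XMLStreamConstants.END_ELEMENT && r.getLocalName().equals("Message")) {
                slices.add(current.toString());          // one payment = one slice
                current = null;
            }
        }
        return slices;
    }

    public static void main(String[] args) throws Exception {
        String batch = "<Batch><Message><Id>1</Id></Message><Message><Id>2</Id></Message></Batch>";
        List<String> slices = slice(new StringReader(batch));
        System.out.println(slices.size());               // prints 2
        System.out.println(slices.get(0));
    }
}
```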

From a practical point of view, the use of EDI files of similar size is realistic, because banks exchange a huge number of messages daily. A simple example is a branch that sends, daily, a journal of international transactions to be performed by the head office. Some banks send between 50,000 and 90,000 messages daily and expect the flow to reach 1,000,000 to 5,000,000 within the next five years, with the volume of treated information growing from 132 MB/day up to 2384 MB/day. An interesting subject of further study in this benchmark would be to use compressed XML EDI messages and perform XQuery over them; this technique has recently received more attention for improving the performance of XML-based applications [Ferragina 04].

7.2.2 Semi-automatic matching discovery (EX-SMAL Algorithm):

After presenting the EX-SMAL algorithm theoretically, we implemented a vertical and horizontal prototype to test its efficiency on real-world examples. The prototype was implemented in Java, using multiple open-source APIs presented in this section. First, to enable matching for EDI branching diagrams written in XML Schema, we saved the textual information concerning each element in a corresponding annotation field. Following the algorithm's sequence, the six steps below were executed:


Step 1 - Tree building: this step converts the algorithm's input, i.e. XML Schemas, into useful tree structures. Each node of these trees is an object containing: the path from the root (e.g. /UNB, /UNB/UNH/DTM, …), the data type, the textual description, and the name of the node (e.g. UNB, UNH, UNZ, …).

Step 2 - Computing basic similarity: this step uses the previously generated data structures and computes the basic similarity from the elements' textual descriptions and data types. The textual similarity was delegated to the Lucene API1. This API (version 1.4.0) offers the possibility to generate a term vector and to calculate term frequencies. Furthermore, we combined the results of PhraseQuery with the cosine of the corresponding vectors to find the description similarity. The data type similarities are extracted from a static table giving the affinity degree between the different XML Schema types.

Step 3 - Building the vectors of each element's neighbors: this step creates vectors corresponding to the neighbors of each element in the source and target schemas. Four vectors are defined for each node of the generated trees: the ancestor vector, the sibling vector, the immediate-child vector, and the leaf vector.

Step 4 - Computing structural similarity: in this step, the algorithm calculates the structural similarity between each pair of nodes of the generated trees, using essentially the basic similarity between their neighbors' vectors.

Step 5 - Computing final similarity: the results of Step 2 (basic similarity) and Step 4 (structural similarity) are combined to calculate the final similarity between each pair of elements of the input schemas.

Step 6 - Filtering: this last step keeps, among the final similarities, those most likely to be useful: every final similarity whose value is below a threshold chosen by the user is eliminated.
The set of remaining similarities can be represented as lines between the two schemas, as shown in Figure 7.6.
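Steps 5 and 6 can be condensed into the following sketch. The coefficient names follow the thesis (coeff_base, coeff_struct); the basic/structural values of the three element pairs and the 0.5 threshold are illustrative assumptions, not EX-SMAL's real data.

```java
import java.util.*;

public class ExSmalSketch {
    // final(e1,e2) = coeff_base * basic + coeff_struct * structural;
    // only pairs reaching the user-chosen threshold are kept (Step 6).
    static double finalSim(double basic, double structural, double coeffBase, double coeffStruct) {
        return coeffBase * basic + coeffStruct * structural;
    }

    public static void main(String[] args) {
        double coeffBase = 0.6, coeffStruct = 0.4, threshold = 0.5;
        // (basic, structural) similarity for three illustrative element pairs
        double[][] pairs = { {0.9, 0.8}, {0.4, 0.2}, {0.7, 0.3} };
        List<Double> kept = new ArrayList<>();
        for (double[] p : pairs) {
            double f = finalSim(p[0], p[1], coeffBase, coeffStruct);
            if (f >= threshold) kept.add(Math.round(f * 100) / 100.0); // round for display
        }
        System.out.println(kept);   // the middle pair (0.32) is filtered out
    }
}
```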

Figure 7.6: General view of EDI Translator implementing EX-SMAL algorithm

1 http://jakarta.apache.org/lucene/docs/index.html


The results of our similarity matching algorithm can be saved (as an XML representation or any other data structure) for possible future use with the mapping expression.

7.2.2.1 Scenario of test

We applied the EX-SMAL algorithm to some real-world examples coming from the well-known EDI standards EDIFACT and SWIFT. The scenario is the one defined in Figure 7.5; it calls for EDI schema matching to facilitate the message translation process. We used the schema of the MT103 message (50 elements) from SWIFT and of the PAYMUL message (243 elements) from UN/EDIFACT. We started the test by matching MT103 with itself and PAYMUL with itself. As the two schemas (PAYMUL and MT103) are structurally very different (see Appendices F-G), we recognized that the structural similarity value does not help much to refine the pairwise element similarity values. However, to make our structural processing flexible and complete, we examine the structural neighbors of each pair of elements before deciding which values to use (e.g. when two matched elements have no sibling elements, the coeff_sib value is made available to the other coefficients). In that case, we dispatch the value of coeff_sib equally over the other three coefficients, so that each of the remaining coefficients becomes the sum of its initial value and a share of coeff_sib. Moreover, we implemented the possibility for the user to run a performance batch, which helps determine a good set of coefficients to use in the matching process. Empirically, if a user wishes to compute the matching between two schemata S and T, the best sets of coefficients obtained from matching S with S and T with T give an idea of a good set of coefficients to use for matching S with T (see Appendix I). The results of a batch are sets of XML files that can be graphically rendered using the JFreeChart1 API.

Figure 7.7: Extract of the results of matching PAYMUL and MT103 with themselves

7.2.2.2 Results

In order to evaluate the performance of our algorithm2, we started the test by matching MT103 with itself and PAYMUL with itself (auto-matching). The curve in Figure 7.7 shows the similarity values between

1 http://www.jfree.org/jfreechart/ 2 The prototype was run on a PC Pentium IV with a 2.8 GHz processor and 448 MB of RAM.


the identical elements in the schema of the MT103 message. The results of this identical-schema matching are perfectly reliable, showing that our algorithm works well. After that, we started a test campaign, using auto-matching with varying coefficient values, to find the set that seems best to use in future matching processes (see Appendix I). We found a set of values that we fixed as the default coefficients for later matching: coeff_desc = 0.7, coeff_type = 0.3, coeff_anc = coeff_sib = coeff_immC = coeff_leaf = 0.25, thr = 20, coeff_base = 0.6, and coeff_struct = 0.4. Afterwards, we started the matching between MT103 and PAYMUL using these coefficient values. The choice of matching MT103 with PAYMUL was deliberately made as an extreme example, because the two schemata are structurally very different from one another. The matching of the MT103 (50 elements) and PAYMUL (243 elements) schemas gave 25% precision for around 5 minutes of running time. Precision is calculated with the formula: number of true automatic matchings / number of total automatic matchings. Accuracy is calculated with the formula: number of true automatic matchings / number of true manual matchings. However, in order to have a complete performance test of our algorithm, we should envisage enlarging our test with a larger number of real-world EDI message schemas. Furthermore, we should consider improving our prototype to allow the user's intervention after the matching process, in order to define the mapping expression between the matched elements.
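As a worked example of the two formulas above, the small program below computes precision and accuracy from hypothetical counts (the counts are illustrative assumptions, not the thesis's raw data):

```java
// Worked example of the two evaluation formulas above.
// The counts are illustrative assumptions, not the thesis's raw data.
public class MatchQuality {
    public static void main(String[] args) {
        int trueAutomatic = 10;   // correct correspondences found automatically
        int totalAutomatic = 40;  // all correspondences the matcher proposed
        int trueManual = 25;      // correspondences a human expert would accept

        double precision = (double) trueAutomatic / totalAutomatic; // 10/40 = 25%
        double accuracy  = (double) trueAutomatic / trueManual;     // 10/25 = 40%
        System.out.println((int) (precision * 100) + "% " + (int) (accuracy * 100) + "%");
    }
}
```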

7.3 Prototyping the EISCO Project

The prototype of the EISCO project led us to carry out several studies:

• Studying the feasibility of the architecture in terms of existing techniques.
• Choosing and installing the needed KB server and accessibility server.
• Studying the feasibility of importing UML diagrams into the KB server and of using reusable objects in the EISCO Core Services.
• Developing the interface between the KB Server and the Core Services.
• Developing a horizontal prototype including the Client Interface and the Administrator Interface.
• Prototyping some components of the EISCO Core Services, in order to validate the case study of reusability described in the preceding chapter.

7.3.1 Feasibility study & used technologies:

The feasibility study permits selecting the technologies to use for implementing the EISCO Project. It aims at:

• Defining not only how to implement the architectural components, but also their operating modes;
• Finding solutions to technical problems and proposing fallback solutions in case of an insurmountable problem.

In the following, we summarize the results of the feasibility study for the components of the EISCO Project:

7.3.1.1 EISCO KB Server

Searching the literature, we found many implementations of DL-based knowledge base servers, such as RACER [Haarslev 01] and FaCT [Bechhofer 00]. The FaCT system is a DL reasoner with a generic API defined in IDL that communicates via an ORB. The Renamed ABox and Concept Expression Reasoner (RACER) is a Description Logic reasoning system with support for TBoxes

Page 190: Thèse Utilisation des ontologies contextuelles pour le partage

Chap 7- Implementation and Feasibility Analysis

164

with generalized concept inclusions and ABoxes. Furthermore, RACER is a prover for modal logic Km with graded modalities and axioms. It provides a proprietary Java-server interface that implements most of the knowledge representation retrieval and query methods (JRACER). It accepts different formats of ontologies such as DAML, OWL, and RDFS. In addition, it enables the integration of UML models encoded in XMI format. We chose to include RACER in the implementation of our architecture. Indeed, the use of RACER is very simplified through the Java available API (JRACER). However, an additional interface module needed to be written in order to bind the KB server with the other components of the EISCO architecture.

Figure 7.8: Component Diagram of EISCO Prototype

7.3.1.2 EISCO Core Services

The EISCO Core Services include many components and provide the main functionalities of our architecture. We suggest implementing these functionalities as an application server whose logic relies on reusable objects. Each implemented component enables a new functionality or service. The main components to implement first are the ontology manager and the context manager; indeed, they are responsible for the communication between the knowledge base server and the core services, and thus with the users' applications. The remaining Core Services components can then be implemented one after another. In order to keep the implementation languages coherent, we chose Java to implement this application server.

7.3.1.3 EISCO Accessibility Server

For implementing the accessibility server, we chose J2EE-based services. Indeed, the Java 2 Enterprise Edition (J2EE) is one of several industry-adopted, open-standard frameworks for enterprise computing. It provides a suite of middleware services, together with a compatibility test suite, which facilitates the building of enterprise-class applications in Java. The wide acceptance of the J2EE platform in the software industry encouraged us to adopt it for enabling accessibility in our architecture.


Furthermore, J2EE off-loads from the developer the implementation of threading, reusable-object management, concurrency, database connectivity, object pooling, etc. For instance, reusable objects on the J2EE platform are known as EJBs (Enterprise JavaBeans), and their persistence can be managed either by the container (CMP, Container-Managed Persistence) or by the bean itself (BMP, Bean-Managed Persistence). An EJB container thus provides the use of, and consistency control over, a set of EJBs. There exist three different types of EJB:

• Entity EJBs represent an image of consistent data and can be shared by many applications and clients. In other words, they provide an object view of transactional data in an underlying data store, allow shared access from multiple users, including session objects and remote clients, and directly represent data in the data store.
• Session EJBs represent the business logic of a specific client application. They can be stateless or stateful, execute on behalf of a single client, and may be transaction-aware.
• Message EJBs are used to manage messages between applications.

Many implementations of EJB containers are currently available or under development, such as JBoss1 and JOnAS2. We chose JOnAS as the EJB container. It provides deployment and runtime support for application components, as well as a federated view of the underlying application server and available services.

Figure 7.9: Deployment Diagram of EISCO Prototype

7.3.2 Prototype implementation

1 Developed by JBoss Inc. (Matrix Partners, Accel Partners, Intel)
2 Developed by ObjectWeb (Bull, France Telecom, INRIA, Red Hat)


Figure 7.10: Screenshot of EISCO’s Administrator Graphical Interface

The first step in this implementation consisted of installing and configuring the Racer and JOnAS servers. After that, we developed some of the needed components of the Core Services. The implemented components of the EISCO Project and their inter-relationships are described in the component diagram of Figure 7.8. This diagram shows implementation aspects including the source code structure, dependencies among software components, binary code components, and executable components. The configuration of the run-time processing elements and of the software components, processes, and objects that live on them is shown in the deployment diagram1 (see Figure 7.9). In addition, a horizontal prototype comprising the Administrator Graphical Interface and the Client Graphical Interface was implemented as well (see Figure 7.10 and Figure 7.11). The core services were also implemented in Java (under the Eclipse IDE), both to stick to a single programming language and to benefit from the portability of Java code. More specifically, the roles of the developed graphical user interfaces of the EISCO Project are:

Administrator: this interface enables the EISCO server administrator to start and control the EISCO server. It enables the management of essential functionalities such as starting the KB server, loading a new ontology, creating semantic rules between contextual ontologies, etc. This application uses JRacer in order to access the KB server (Racer version 1.7.19). To simplify the administrator's role, launching the EISCO Server automatically starts the KB Server (Racer) and the Accessibility Server (JOnAS) (see Figure 7.10).

Client: this interface enables users to connect to the EISCO Server. It provides utilities such as querying, development of new systems, etc., using the EISCO Core Services. Thus, it is possible to connect to the server, access the KB and the ontologies, query the KB, invoke the EJBs existing in JOnAS, etc. (see Figure 7.11).

1 http://umlcenter.visual-paradigm.com/umlresources/nota_11.pdf


Figure 7.11: Screenshot of Client EISCO Interface

7.3.3 Test scenario

We implemented the elements of the architecture needed to validate the reusability scenario studied in Chapter 6. According to this scenario, we try to reuse some developed components of the QELT system to speed up the development of the EDI Translator. The scenario thus tests the ability of the EISCO project and its implementation to support reusability. More specifically, we assume that a reusable object MappingExpressionEJB has been developed for QELT, and we try to reuse it to develop an object for the EDI Translator, also called MappingExpressionEJB. Since our architecture uses J2EE reusable objects, we assume that an EJB called MappingExpressionEJB attached to QELT exists in the EJB container (JOnAS). We rely on the EISCO Server to find the similarity between the concepts used, to provide the code of QELT's reusable objects (e.g. MappingExpressionEJB), and to attach the new object to the newly developed system (EDI Translator). The sequence of events performed in this scenario is divided between the roles of the administrator and of the client in the EISCO architecture:

7.3.3.1 Role of the Administrator

In the EISCO architecture, the administrator's essential responsibilities come down to managing the consistency of the ontologies used, updating and connecting new systems to the architecture, deploying reusable objects, etc.

• Everything starts with the creation of a UML class diagram for the new application (EDI Translator), using a CASE tool such as ArgoUML1, Poséidon2, Rational Rose3, etc. The next step consists of exporting this diagram to the XMI (XML Metadata Interchange v1.2) format. The XMI file must then be translated into a format accepted by the current version of Racer (version 1.7.19). We therefore use a workaround which consists in applying an XSLT transformation (developed by the exff1 project) with MS-XSL2 to render the XMI file into OWL format. The sequence is thus: from UML to XMI, and from XMI to OWL. We assume that an ontology representing the QELT system has already been integrated into the EISCO KB by the administrator, using the same procedure.

1 http://argouml.tigris.org/ 2 http://www.gentleware.com/ 3 http://www-306.ibm.com/software/rational/
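The XMI-to-OWL rendering step can be illustrated with Java's standard XSLT machinery. The stylesheet and input below are simplified stand-ins of our own, not the exff project's actual stylesheet, which is far richer:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Toy illustration of the XMI -> OWL step: each (simplified) XMI Class
// element is rendered as an owl:Class. Stylesheet and input are illustrative.
public class XmiToOwl {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/Model'>"
      + "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' "
      + "xmlns:owl='http://www.w3.org/2002/07/owl#'>"
      + "<xsl:for-each select='Class'>"
      + "<owl:Class rdf:ID='{@name}'/>"   // attribute value template
      + "</xsl:for-each>"
      + "</rdf:RDF>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String xmi) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xmi)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A UML class exported as (simplified) XMI becomes an OWL class.
        System.out.println(transform("<Model><Class name='Mapping'/></Model>"));
    }
}
```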

• After studying the QELT and EDI Translator models, we noticed that some concepts are multi-represented in the two systems, for instance the concept Mapping of QELT and the concept Mapping of EDI Translator. These two concepts refer to the same notion but are represented differently. According to the formalism of contextual ontologies, the administrator defines a semantic rule (identity) between these two concepts. A graphical interface was developed in order to simplify the task of relating concepts with semantic rules. This interface uses a JTree to display the ontology concepts and lets users draw the links (see Appendix-J). The rules are then saved in an XML file. Indeed, although Racer is declared in [Haarslev 01] to handle modal logics, this functionality lacks clarity in the user guide; we therefore used a workaround consisting of saving the bridge rules in XML-formatted files. In addition, each concept in the ontology is associated with a set of EJBs that implement it. In our case, the EJB MappingExpressionEJB of QELT is associated with the concepts Entity, Mapping, Attribute, Transformation, and Function (see the case study in Chapter 6).
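The workaround of persisting bridge rules in XML files can be sketched as follows; the element and attribute names are illustrative, since the exact file format is not fixed here:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of saving bridge rules (e.g. identity links between
// contextualized concepts) as an XML file. Format is illustrative only.
public class BridgeRuleStore {

    // Each rule is a (type, source concept, target concept) triple.
    private final List<String[]> rules = new ArrayList<>();

    // Register a semantic rule between two contextualized concepts.
    public void addRule(String type, String source, String target) {
        rules.add(new String[] { type, source, target });
    }

    // Serialize all registered rules to an XML document.
    public String toXml() {
        StringBuilder sb = new StringBuilder("<bridgeRules>\n");
        for (String[] r : rules) {
            sb.append("  <rule type=\"").append(r[0])
              .append("\" source=\"").append(r[1])
              .append("\" target=\"").append(r[2]).append("\"/>\n");
        }
        return sb.append("</bridgeRules>\n").toString();
    }

    public static void main(String[] args) {
        BridgeRuleStore store = new BridgeRuleStore();
        store.addRule("identity", "QELT#Mapping", "EDITranslator#Mapping");
        System.out.println(store.toXml());
    }
}
```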

Figure 7.12: The list of methods using the selected concept

7.3.3.2 Role of the Client

Once the models are imported into the EISCO KB, the user utilizes the EISCO Server to work on the development of the new system (EDI Translator). The user proceeds as follows:

1 http://homepages.nildram.co.uk/~esukpc20/exff2004_11/exffindex.html 2 http://www.microsoft.com/downloads


Figure 7.13: The Source-code of selected method

• The client application starts by invoking an EJB that connects to the EISCO Server and returns the list of active ontologies on the server.

• The user chooses from the list the context corresponding to the interest of the EDI Translator. The application calls a new EJB, which searches for and displays the concepts defined in the ontology of this system.

• The user chooses the concept to be manipulated, in this case the concept Mapping of EDI Translator. An EJB searches for the list of concepts having semantic rules (in the formalism of contextual ontologies) with the selected concept and returns this list to the client. This list is then used as a parameter for a new EJB, which searches for the EJBs using these concepts. This step returns the EJB MappingExpressionEJB of QELT, as it uses the concept Mapping of QELT.

• The user selects the EJB and requests the list of methods it implements. The method MappingExpression is thus returned with the list (see Figure 7.12).

• The user can select the MappingExpression method and retrieve its source code in order to use it for the new EDI Translator application (see Figure 7.13). This personalization step is simplified because the content of the MappingExpression method of QELT is very similar to that of the MappingExpression method of EDI Translator (see the case study in Chapter 6).

• Finally, the user saves the new EJB, MappingExpressionEJB of EDI Translator, which has been personalized from the method MappingExpression of QELT. This new EJB is added to the EJB container (JOnAS) in order to be deployed later by the EISCO Administrator.
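The lookup chain above can be sketched in memory as follows. In the prototype each step is a call to a server-side EJB; the map-based index and the names used here are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of the client-side lookup chain: from a selected concept,
// follow semantic rules to related concepts, then find the reusable EJBs
// indexed under those concepts. Data structures and names are illustrative.
public class ReuseLookup {

    // concept -> concepts related by a semantic rule (e.g. identity)
    static final Map<String, List<String>> semanticRules = new HashMap<>();
    // concept -> EJBs implementing that concept
    static final Map<String, List<String>> ejbIndex = new HashMap<>();

    // Collect the EJBs attached to every concept semantically related
    // to the selected one.
    static List<String> findReusableEjbs(String concept) {
        List<String> result = new ArrayList<>();
        for (String related : semanticRules.getOrDefault(concept, List.of())) {
            result.addAll(ejbIndex.getOrDefault(related, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        semanticRules.put("EDITranslator#Mapping", List.of("QELT#Mapping"));
        ejbIndex.put("QELT#Mapping", List.of("MappingExpressionEJB"));
        // The concept Mapping of EDI Translator leads to QELT's reusable EJB.
        System.out.println(findReusableEjbs("EDITranslator#Mapping"));
    }
}
```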

7.4 Summary

In this chapter, we glanced over several prototypes realized throughout our research projects. We discussed the use of the mapping expression formalism with QELT and the EDI Translator. We showed, as well, the realization of the semi-automatic schema matching tool used for EDI/XML (EX-SMAL). Finally, we sketched the feasibility study of the EISCO Project and the implemented prototype. This implementation aimed at validating the case study illustrated in the previous chapter.


However, being involved in a technology transfer contract, we also designed, implemented, and supervised the development of other tools. These tools essentially promoted the use of XML technologies in enterprise applications:

• BaloXML: a tool that feeds a mainframe decisional system, which monitors the situation of enterprises, with parameters extracted from BALO announcements (Bulletin d'Annonce Legal et Obligatoire). These announcements are available daily on a supplier's site in HTML format. The tool uses XQuery to manipulate the XML files generated from these HTML files, in order to extract the needed parameters. The full implementation of this tool is under the copyright of Tessi Informatique & Conseil and is currently in use.

• Tessi-Tool Plugin: an application of XQuery to generate Java code. The tool facilitates writing the Java programs that access the physical layer (relational databases) in an n-tier architecture. The structure of a database is captured in XML files using JDBC and Torque (an Apache open-source project). The tool generates the redundant pieces of code used for Hibernate objects; Hibernate is currently the most popular object/relational mapping solution for Java. Using this tool considerably reduces the time dedicated to writing these pieces of code. The tool is integrated as a plug-in within the Eclipse IDE. Its implementation is also under the copyright of Tessi Informatique & Conseil, and it is currently used in many development projects.


8. CHAPTER 8

CONCLUSION AND PERSPECTIVES

"you never can tell what you have said or done till you have seen it reflected in other people’s minds "

Robert Frost

In the Age of the Information Revolution, information is becoming a ubiquitous, abundant, and precious resource for the enterprise. In this respect, enterprise information systems occupy the cornerstone of the enterprise, enabling efficient daily internal and external activities. Thus, EIS must continue to evolve with new requirements and the needs of a competitive market. Perspectives for improving EIS include providing a global enterprise view over them through a data integration initiative combined with interoperability and reusability services. In particular, the semantic sharing problem appears in EIS, due to semantic heterogeneity, and forms a barrier to making these systems communicate and cooperate easily. Indeed, many EIS are developed independently, based on specifications intended as a mono-representation of the domain of interest.


Taken separately, they offer little potential for global querying or any data/knowledge-sharing ability. Furthermore, these systems sometimes represent the same real-world object differently, which generates the multi-representation issue. This thesis studies ontology-based solutions in the domain of enterprise-wide information systems. Two platforms, for data integration and data exchange, are considered in this study. We argue that ontology formalisms and applications, as a key technology for information sharing and exchange, will certainly become great assets for enterprises in the near future. Firstly, ontologies are foreseen to play a key role in partially resolving semantic sharing and removing the conflicts that exist among systems. Secondly, ontologies by themselves are not sufficient and need to be coupled with context in order to resolve the problem of multi-representation. We argue that combining the two notions of context and ontology can support semantic sharing in the use of EIS. We suggest a general framework that makes use of existing resources developed previously within the data integration and data interchange platforms. In this chapter, we summarize the research effort developed in this thesis. In the next section, we present the findings of our work. We then discuss future trends and some improvement issues for the developed tools and methods. We wrap up this thesis with brief personal feedback on the research experience.

8.1 Reached Goals

In this thesis, we proceeded through a bottom-up methodology, studying first some examples of EIS in order to identify their problems. Since most enterprises have a fairly dynamic environment, and consequently dynamic information entities and relationships, their systems are apt to evolve, to communicate, etc. We found that semantic sharing is the essential element for making systems work together in a convenient enterprise architecture. Establishing an adequate communication medium among various systems and their users is always possible through a shared common vocabulary. Thus, a formal ontology for a specific domain of interest can be used to improve semantic sharing between these systems. However, ontologies are still puzzling for some decision makers and are not considered essential for the growth of and competition among enterprises. Therefore, we set out to define a common framework for enterprise information systems that pinpoints the advantages and shortcomings of an unaware or unplanned procedure for ontologizing the enterprise's information systems, and we conducted research to address the issue of semantic sharing between EIS. The major contributions of our work can be summarized chronologically as follows:

• We studied a query-based Extraction, Load and Transformation tool (QELT) used for creating enterprise data warehousing systems. This tool is characterized by being active with the metadata of transformation: it provides automatic generation of the transformation process from the mapping guideline.

• We introduced the Mapping Expression Model, which defines the expressions needed to map from one representation to another. This model answers the need for a clean description of data transformation in the systems studied above. Two applications of this model permit, respectively, the description of metadata in data warehousing systems and the mapping guideline of EDI message translation systems.

• We addressed enterprise communication with the EDI Translator using XML/EDI (XML-based Electronic Data Interchange) systems. We reused the mapping expression model to express the mapping guideline, and we designed an algorithm that determines the semantic similarity between XML/EDI schemas, employing its output in the mapping process of the EDI Translator.

• We investigated a formalism for contextual ontologies (co-studied in [Arara 04] & [Rifaieh-a 04]) that copes with the problems of semantic heterogeneity and multi-representation. The theoretical framework considers the following: (i) different conceptualizations provide a set of local ontologies that can be autonomously represented and managed; (ii) inter-relationships between contextual ontologies can be discovered and represented; and (iii) global knowledge interpretation over contextual ontologies can be performed with semantic-based services. We adopted modal description logics (e.g. the ALCNM language) for representing our contextual ontologies.

• We conceived an architecture, with a prototype, promoting the use of contextual ontologies in the EISCO (Enterprise Information System Contextual Ontologies) project. This project also includes setting up usage scenarios and a case study that implements the preceding findings, including the Mapping Expression Model and the tools (QELT & EDI Translator).

8.2 Perspectives & Future works

Future work in this research domain falls into two categories: first, direct work related to the implemented systems and suggested models; second, other research tracks and domains foreseen to be interested in contextual ontologies. On the one hand, we plan:

• Studying inter-enterprise semantic sharing: in this thesis, our research focused on intra-enterprise semantic sharing. Inter-enterprise semantic sharing enables companies to maximize their IT investment through a whole-of-enterprise approach to knowledge management and service delivery, while at the same time interacting with collaborative and competitive external organizations.

• Completing the EX-SMAL algorithm: we intend to enable EX-SMAL to take into consideration the other characteristics of the message guideline, including constraints, status, cardinality, etc. The usefulness of the EX-SMAL algorithm for matching between contextual ontologies is also attractive to study.

• Extending the implementation of the EISCO project: as future work, we intend to complete the proposed prototype implementation based on the EISCO project architecture. We should also consider the modal operators to enable global and semi-global knowledge.


• Investigating the satisfiability algorithm: we also aim to seek a satisfiability algorithm for richer DL languages, such as SHOIQ, within the formalism of contextual ontologies.

On the other hand, contextual ontologies can find audiences among both autonomous and loosely coupled systems. The work in [Chen 2003] includes a context modeling approach based on ontologies, known as the CoBrA system. This system provides a set of ontological concepts to characterize entities such as persons, places, or several other kinds of objects within their contexts. It aims at using OWL in a broker-centric agent architecture for pervasive computing. We can anticipate the application of contextual ontologies in this domain as well. For an autonomous system, the major challenge is the ability to maintain an accurate internal representation of pertinent information about the environment in which it operates. Real-world modeling in autonomous systems can make use of ontological modeling and other knowledge techniques. Furthermore, loosely coupled frameworks allow individual nodes in a distributed system to change without affecting, or requiring change in, any other part of the system. Explicitly represented contextual ontologies and logical semantics can serve as a solid framework for reaching a common purpose of communication between loosely coupled systems.

8.3 Personal feedback

Getting involved in PhD research work drives one to face the burden of long and difficult tasks, where it is sometimes hard to know what is expected or even where to start. However, it is a passionate experience that leads to satisfaction when it is carried out properly. This research experience has taught me to be a better team player within several groups, to develop my communication skills, and to learn to investigate and discover. I found that I had to become more persistent in my ideas, more argumentative, and also more outgoing to get my points across. Working in two different groups, whose goals sometimes differed, pushed me to develop two different work strategies depending on the context. Negotiation and compromise were important to satisfy both academic expectations and the enterprise's return-on-investment rule. I have also had the opportunity to present my work through research papers and presentations, as well as practical prototypes, tools, and industrial demonstrations. This has been enormously helpful in learning how to explain complex ideas in various situations, with respect to the audience. Overall, I found it an enriching and pleasant experience; more than that, it gave me overwhelming confidence.


REFERENCES

[Abiteboul 97] S.Abiteboul, S.Cluet, T.Millo. Correspondence and Translation for Heterogeneous Data, In: Foto N. Afrati, Phokion G. Kolaitis (Eds.), Proc. of the 6th International Conference on Database Theory (ICDT), January 1997, Delphi, Greece. Lecture Notes in Computer Science 1186 Springer 1997, pp 351-363, ISBN 3-540-62222-5.

[Akman 96] Varol Akman and Mehmet Surav, Steps Toward Formalizing Context, AI Magazine, Volume 17 number 3, pp.55-72, 1996, [online] <http://cogprints.ecs.soton.ac.uk/archive/00000464/> (visited 10/09/04)

[Alsène 99] E. Alsène, The Computer Integration of the Enterprise, IEEE Transactions on Engineering Management, February 1999, vol. 46, pp. 26-35.

[Arara 02] Ahmed Arara, Djamel Benslimane, Towards Ontologies Building: A Terminology based Approach, In: K.Yétongnon and M.Amin (Eds.), Proc. of the 2nd IEEE ISSPIT, Marrakesh, Morocco, December 2002, pp. 11-16, IEEE Publisher, ISBN 0-9727186-0-5.

[Arara-a 04] Ahmed Arara, Djamal Benslimane, Multi-perspectives Description of Large Domain Ontologies, In Christiansen et al. (Eds.), Proc. of FQAS’2004, June 2004, Lyon, France. Lecture Notes in Artificial Intelligence 3055 Springer 2004, pp.150-160.

[Arara-b 04] Ahmed Arara, Contextual ontologies formalization, Modal Description Logics Approach, submitted PhD Thesis 2004, Supervised by R.Laurini and D.Benslimane, Claude Bernard University Lyon I.

[Artale 01] A.Artale and E.Franconi, Temporal Description Logics. In D.Gabbay, M.Fisher, and L.Vila (Eds.), Handbook of Time and Temporal Reasoning in Artificial Intelligence, The MIT press, 2001.

[Baader 03] F.Baader et al., The Description Logic Handbook, Theory, Implementation, and Applications, Cambridge University Press, Cambridge, UK, 2003, 555p. ISBN 0521781760.

[Balley 04] S. Balley, C.Parent, S.Spaccapietra, Modeling Geographic data with Multiple Representations, Int. Journal Geographical Information Science June 2004, Vol.18, No4, pp.327-352. ISSN 1365-8816.

[Bass 98] L.Bass, O.Clements, R.Kazman, Software Architecture in Practice, Addison-Wesley, 1998. 447p. ISBN 0-201-19930-0.

[Bechhofer 00] S. Bechhofer and I. Horrocks, Driving user interfaces from FaCT, In Proceedings of the 2000 Description Logic Workshop (DL 2000), Darmstadt, Germany, 13-18 August 2000, pp. 45-54, [online] <http://www.cs.man.ac.uk/~horrocks/Publications/download/2000/DL00-interfaces.ps.gz>, (visited 14/06/2004).

[Benaroach 02] Benaroach Michel, Specifying Local Ontologies in Support of Semantic Interoperability of Distributed Inter-organizational Applications. In: Alon Y. Halevy, Avigdor Gal (Eds.), Proc. of 5th International Workshop Next Generation Information Technologies and Systems, NGITS 2002, Caesarea, Israel, June 2002, Lecture Notes in Computer Science 2382 Springer 2002, ISBN 3-540-43819-X, pp 90-106.

[Benharkat, Rifaieh 04] A.Benharkat, R.Rifaieh, Towards an Architecture of Enterprise Information Systems using Contextual Ontologies, accepted to appear in the International Conference on Information Integration and Web-based Applications and Services, IIWAS 2004.

[Benslimane-a 03] Djamal Benslimane, Ahmed Arara, Towards a Contextual Content of Ontologies, in Foundations of Intelligent Systems, Lecture Notes in Computer Science, Springer-Verlag Heidelberg, Volume 2871 / 2003, pp. 339 – 343, ISBN: 3-540-20256-0.

[Benslimane-b 03] D.Benslimane, C.Vangenot, C.Roussey and A. Arara, The Multi-representation Ontologies: A Contextual Description Logics Approach, In the proceeding of The 15th Conference On Advanced Information Systems Engineering, Austria, 16 - 20 June, 2003, Lecture Notes in Computer Science, Volume 2798, Springer-Verlag, Heidelberg, 2003, pp. 4-15, ISBN: 3-540-20047-9.

[Berlin 02] J.Berlin, A.Motro, Database Schema Matching Using Machine Learning with Feature Selection, Proceedings of the 14th International Conference on Advanced Information Systems Engineering, Toronto, 2002, Lecture Notes In Computer Science, Springer-Verlag London, UK, 2002, pp. 452 - 466 , ISBN:3-540-43738-X.

[Berners-Lee 04] T.Berners-Lee, Keynote speech in 13th International World Wide Web Conference 2004, [online] <http://www.w3.org/2004/Talks/0519-tbl-keynote/> (visited 09.09.2004).

[Bernestein 00] P. A. Bernestein, A. Y. Halevy, and R. A. Pottinger: A Vision of Management of Complex Models, ACM SIGMOD Record, ACM Press New York, NY, USA, 2000, Volume 29, issue 4, pp.55-63, ISSN:0163-5808.

[Bertolazzi 01] P.Bertolazzi, C.Krusich, M.Missikoff, An Approach to the Definition of a Core Enterprise Ontology: CEO, in Managing globally with information technology, Idea Group Publishing, Hershey, PA, USA, 2003. pp.104 – 115, ISBN:1-931777-42-X.

[Borgida 02] A. Borgida and L. Serafini, Distributed Description Logics: Directed Domain Correspondences in Federated Information Sources. In R. Meersman and Z. Tari (Eds.), Proc. of On the Move to Meaningful Internet Systems, 2002 DOA/CoopIS/ODBASE 2002 Confederated International Conferences DOA, CoopIS and ODBASE 2002, LNCS Springer Verlag, London, UK 2002, Volume 2519, pp 36-53. ISBN:3-540-00106-9.

[Borgida 89] A. Borgida, R.J. Brachman, D.L. McGuinness, and L.A. Resnick; CLASSIC: A Structural Data Model for Objects, Proceedings of the 1989 ACM SIGMOD international conference on Management of data, Portland, Oregon, USA, 1989, ACM Press New York, NY, USA, pp58 - 67 . ISBN:0-89791-317-5

[Bouquet 03] P.Bouquet, Fausto Giunchiglia, Frank van Harmelen, et al., C-OWL: Contextualizing Ontologies, Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, October 2003, Sanibel Island, Florida, USA. ACM Press 2004, New York, NY, USA, pp.270-271, ISBN:1-58113-912-8.

[Bourret 04] R.Bourret, "XML and Databases", Last update July 2004, [online] <http://www.rpbourret.com/xml/XMLAndDatabases.htm > (visited 25 August 2004)


[Britanica 04] "Ontology" Encyclopædia Britannica. 2004. Encyclopædia Britannica Premium Service. [online] <http://www.britannica.com/eb/article?eu=58583>, (visited 13 July 2004).

[Buffalo 04] State University of New York at Buffalo, Department of Philosophy, Ontology Site. [online] <http://ontology.buffalo.edu/>, (visited 13 July 2004).

[C4ISR 97] Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance (C4ISR) architecture Framework Version 2.0, US Department of Defense, 1997, [online], <http://www.afcea.org/education/courses/archfwk2.pdf> (visited 10 August 2004).

[Calvanese 01] D.Calvanese, G.De Giacomo, M.Lenzerini, Ontology of Integration and Integration of Ontologies, In Proceedings of the 2001 Description Logic Workshop (DL 2001), Stanford University, California, USA, August , 2001, CEUR Electronic Workshop Proceedings, [Online], <http://ceur-ws.org/Vol-49>, pp.10-19, (visited 16/09/04).

[Castano 03] S.Castano, A.Ferrara, and S.Montanelli, h-match: an Algorithm for Dynamically Matching Ontologies in Peer-based Systems, In Isabel F. Cruz, Vipul Kashyap, Stefan Decker, Rainer Eckstein (Eds.), Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases, Co-located with VLDB 2003, Humboldt-Universität, Berlin, Germany, September 7-8, 2003, pp.231-250.

[Castro 01] J.Castro, M.Kolp, J.Mylopoulos, Towards Requirements-Driven Information Systems Engineering: The Tropos Project, Proc. of the 13th international conference on advanced information systems engineering (CAiSE*01), Interlaken, Switzerland, June 2001, Information Systems Volume 27, Issue 6, Elsevier Science Ltd. Oxford, UK, September 2002, pp. 365-389, ISSN:0306-4379.

[Catarci 93] T.Catarci and M.Lenzerini, Representing and Using Inter-schema Knowledge in Cooperative Information Systems, International Journal of Intelligent and Cooperative Information Systems, Vol. 2, No.4, IEEE Computer Society Press, Los Alamitos, CA, USA, 1993, pp.375-398.

[Ceusters 03] Werner Ceusters, Barry Smith, Maarten Van Mol, Using Ontology in Query Answering Systems: Scenarios, Requirements and Challenges, In the proceedings of the 2nd CoLogNET-ElsNET Symposium, Amsterdam, December 2003, [online], <http://ontology.buffalo.edu/smith/articles/Q_A2003.pdf>, (visited 22/09/2004).

[Chandrasekaran 99] B.Chandrasekaran, J.R.Josephson, V.R.Benjamins, What are ontologies, and why do we need them?, IEEE Intelligent Systems, Volume 14, Issue 1 (January 1999), IEEE Educational Activities Department, Piscataway, NJ, USA, 1999, pp.20-26, ISSN:1094-7167.

[Chaudhri 03] A.Chaudhri, A. Rashid, R.Zicari, “XML Data Management”, Addison-Wesley Pearson Education, Boston, USA, 2003, p.641, ISBN 0-201-84452-4.

[Chaudhri 98] V.K.Chaudhri, A.Farquhar, R.Fikes, P.D.Karp, and J.P.Rice, Open Knowledge Base Connectivity, Technical Report, OKBC Working Group, Knowledge Systems Laboratory, Stanford University, April 1998, [online] <http://www.ai.sri.com/~okbc/spec.html>, (visited 10/10/2004).

[Chaudhuri 97] S.Chaudhuri, U.Dayal, "An Overview of Data Warehousing and OLAP Technology", In M.Franklin (Ed.), Research Surveys, ACM SIGMOD Record, Volume 26, Number 1, March 1997, pp. 65-74.

[Chen 2003] Harry Chen, Tim Finin, and Anupam Joshi, Using OWL in a Pervasive Computing Broker, In Proceedings of the Workshop on Ontologies in Agent Systems, AAMAS-2003, Melbourne, Australia, July 2003, [online], <http://ebiquity.umbc.edu/v2.1/paper/html/id/79/>, (visited 27/09/2004).

[Chen 76] P.Chen, The Entity-Relationship Model: Toward a Unified View of Data, Special issue: papers from the international conference on very large data bases: September 22-24, 1975, Framingham, MA, ACM Transactions on Database Systems (TODS), Volume 1, Issue 1 (March 1976), ACM Press, New York, NY, USA, 1976, pp. 9-36, ISSN:0362-5915.

[Chukmol, Rifaieh 05] U.Chukmol, R.Rifaieh, N.Benharkat, EX-SMAL: An EDI/XML Semi-automatic Schema Matching Algorithm, Submitted

[Codd 70] E.F.Codd, A relational model of data for large shared data banks, Communications of the ACM, Volume 13, Issue 6 (June 1970), ACM Press, New York, NY, USA, 1970, pp. 377-387, ISSN:0001-0782.

[Colomb 02] R.Colomb, Ontologies for Interoperation of Information Systems: A Tutorial, Technical Report 20/02 ISIB-CNR, Padova, Italy, November, 2002, [Online], <www.loa-cnr.it/Papers/ISIB-CNR-TR-20-02.pdf>, (visited 10/08/04)

[Conger 94] Conger, S. The New Software Engineering, Wadsworth Publishing Company, Belmont, California, 1994, 817 p. ISBN 0534171435.

[Corazzon 04] Raul Corazzon, Descriptive and Formal Ontology: A Resource Guide to Contemporary Research, [online], <http://www.formalontology.it/index.htm>, (visited 13 July 2004).

[Corcho 03] Oscar Corcho, Mariano Fernández-López, and Asunción Gómez-Pérez, Methodologies, tools and languages for building ontologies: where is their meeting point?, Data & Knowledge Engineering, Volume 46, Issue 1, July 2003, Elsevier Science, Amsterdam, The Netherlands, 2003, pp.41-64, ISSN:0169-023X.

[Couturier 03] Vincent Couturier, Magali Seguran, Patterns And Components To Capitalize And Reuse A Cooperative Information System Architecture, Proceedings of the 5th International Conference on Enterprise Information Systems ICEIS 2003, Angers, France, April 2003, pp.225-231.

[Cranefield 99] S.Cranefield and M.Purvis, UML as an Ontology Modelling Language, In T.Dean (Ed.), Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), Stockholm, Sweden, 31 July - 6 August 1999, Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, ISBN:1-55860-613-0.

[Decker 98] S. Decker, M. Erdmann, D. Fensel, and R. Studer, Ontobroker in a Nutshell, In Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries, Crete, Greece, 1998, LNCS 1513, Springer-Verlag, London, UK, 1998, pp. 663-664, ISBN:3-540-65101-2.

[Denny 04] M.Denny, Ontology Tools Survey-Revisited, XML.COM [Online] <http://www.xml.com/pub/a/2004/07/14/onto.html>, (visited 14/07/2004)

[Dey 00] A.K.Dey, Providing Architectural Support for Building Context-Aware Applications, PhD dissertation, Georgia Institute of Technology, November 2000, [online] <http://www.cc.gatech.edu/fce/ctk/pubs/dey-thesis.pdf>, (visited 16/09/2004).

[Do 00] Do, H.H; Rahm, E., On Metadata Interoperability in Data Warehouses, Technical Report 01-2000. Dept. of Computer Science, Univ. of Leipzig, March 2000, [Online], <http://dol.uni-leipzig.de/pub/2000-13>, (visited 16/09/04).

[Doan 02] A. Doan, J. Madhavan, P. Domingos, and A. Halevy, Learning to map between ontologies on the semantic web, In Proceedings of the eleventh international conference on World Wide Web WWW'02, Honolulu, Hawaii, USA, May 2002, ACM Press, New York, NY, USA, 2002, pp. 662-673, ISBN:1-58113-449-5.

[Donini 96] F.Donini, D.Nardi, and R.Rosati, Ground Nonmonotonic Modal Logics, Journal of Logic and Computation, Vol. 7, No. 4, 1996, pp.523-548, [online], <ftp://ftp.dis.uniroma1.it/pub/ai/papers/DoNR96.ps.gz>, (visited 09/10/04).

[Duineveld 00] A. J. Duineveld, R. Stoter, M.R. Weiden, B. Kenepa, and V. R. Benjamins, Wondertools? A Comparative Study of Ontological Engineering Tools, International Journal of Human-Computer Studies, Volume 52, Issue 6 (June 2000), Academic Press, Inc., Duluth, MN, USA, 2000, pp.1111-1133, ISSN:1071-5819

[Duric 04] D.Duric, MDA-based Ontology Infrastructure, International Journal on Computer Science and Information Systems, vol.1, No.1, February 2004, ComSIS Consortium, Belgrade, Serbia and Montenegro, 2004, pp. 91-116, ISSN:1820-0214.

[Ekstedt 04] M.Ekstedt, P.Johnson, Å.Lindström, M.Gammelgård, E.Johansson, L. Plazaola, E.Silva, J.Liliesköld, Consistent Enterprise Software System Architecture for the CIO- A Utility Cost-based Approach, in the Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37 2004), CD-ROM / Abstracts Proceedings, Big Island, HI, USA, January 2004, IEEE Computer Society, 2004, ISBN 0-7695-2056-1

[Falkenberg 98] E. Falkenberg, W. Hesse, P. Lindgreen, B.E. Nilsson, J.L.H. Oei, C. Rolland, R.K. Stamper, F.J.M. Van Assche, A.A. Verrijn-Stuart, K. Voss, FRISCO: A Framework of Information System Concepts - The FRISCO Report, IFIP WG 8.1 Task Group FRISCO. [online] <ftp://ftp.leidenuniv.nl/pub/rul/fri-full.zip> (visited 15/09/2004)

[Farquhar 97] A.Farquhar, R.Fikes, J.Rice, The Ontolingua Server: A Tool for Collaborative Ontology Construction, International Journal of Human-Computer Studies, Volume 46, Issue 6 (June 1997), Special issue: innovative applications of the World Wide Web, Academic Press Inc., Duluth, MN, USA, 1997, pp.707-727, ISSN: 1071-5819.

[Fensel 01] D.Fensel, F.van Harmelen, I.Horrocks, D.McGuinness, and P.F.Patel-Schneider, OIL: An ontology infrastructure for the semantic web, IEEE Intelligent Systems, Vol. 16, No. 2, March/April, 2001, IEEE Educational Activities Department Piscataway, NJ, USA, 2001, pp.38-45, ISSN:1094-7167.

[Fensel 02] Fensel, D., Omelayenko, B., Ying Ding, Klein, M., Flett, A., Schulten, E., Botquin, G., Brown, M., Dabiri, G, Intelligent Information Integration for B2B Electronic Commerce, The Kluwer International Series in Engineering and Computer Science, Vol. 710, 2003, p.160, ISBN:1-4020-7190-6

[Fernández-López 99] M.Fernández-López, A.Gómez-Pérez, A.Pazos-Sierra, and J.Pazos-Sierra, Building a Chemical Ontology Using Methontology and the Ontology Design Environment, IEEE Intelligent Systems & their Applications, Volume 14 , Issue 1 (January 1999), IEEE Educational Activities Department, Piscataway, NJ, USA, 1999, pp.37-46, 1999, ISSN:1094-7167.

[Ferragina 04] P.Ferragina et al., XML Compressed Document Engine (XCDE), Freeware [online] <http://roquefort.di.unipi.it/~ferrax/xcde/index.htm>, (visited 15/09/04)

[Finkelstein 94] A.Finkelstein, D.Gabbay et al., Inconsistency Handling in Multiperspective Specifications, IEEE Transactions on Software Engineering, Volume 20, Issue 8 (August 1994), IEEE Press, Piscataway, NJ, USA, 1994, pp.569-578, ISSN: 0098-5589.

[Firat 03] Aykut Firat, Information Integration Using Contextual Knowledge and Ontology Merging, Ph.D. Thesis, (supervised by S. Madnick, B. Grosof, M. Siegel), Massachusetts Institute of Technology, August 2003, [online], <http://ebusiness.mit.edu/bgrosof/paps/phd-thesis-aykut-firat.pdf>, (visited 10/08/04).

[Fischer 77] M.Fischer and R.Ladner, Propositional Modal Logic of Programs, In Proceedings of the Ninth Annual ACM Symposium on Theory of Computing, Boulder, Colorado, USA, ACM Press, New York, NY, USA, 1977, pp.286-294.

[Fonseca 99] F.Fonseca and M.Egenhofer, Ontology-Driven Geographic Information Systems, Proceedings of the seventh ACM international symposium on Advances in geographic information systems, Kansas City, Missouri, United States, 1999, ACM Press New York, NY, USA, 1999, pp.14-19, ISBN:1-58113-235-2.

[Gamma 94] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison Wesley Professional Computing Series, MA, USA, 1994, p.395, ISBN 0-201-63361-2.

[Geerts 00] G.Geerts and W.E.McCarthy, The Ontological Foundation of REA Enterprise Information Systems, working paper, Michigan State University (August 2000), [Online], <http://www.msu.edu/user/mccarth4/Alabama.doc>, (visited 10/08/2004).

[Ghidini 01] C. Ghidini, F. Giunchiglia, Local models semantics, or contextual reasoning = locality + compatibility, Artificial Intelligence Journal, Volume 127, Number 2, April 2001, Elsevier, 2001, pp.221-259, ISSN: 0004-3702.

[Ghidini 98] C. Ghidini and L. Serafini, Distributed First Order Logics, In F.Baader and K.U.Schulz (Eds.), Studies in Logic and Computation, Frontiers of Combining Systems 2, Berlin, 1998, Research Studies Press, 1998, pp.121-140.

[Girardi 03] R.Girardi, C.G. de Faria, A Generic Ontology for the Specification of Domain Models, In S.Overhage, K.Turowski (Eds.), Proceedings of the 1st Int. Workshop Component Engineering Methodology, September 24, 2003, Erfurt, Germany, [online], pp. 41-51. <http://wi2.wiwi.uni-augsburg.de/downloads/gi-files/WCEM/WCEM-Tagungsband.pdf>, (visited 23/09/04)

[Giunchiglia 93] F. Giunchiglia, Contextual Reasoning, Epistemologia, Italian Journal for the Philosophy of Science, Special issue on I Linguaggi e le Macchine, 16 (1993), Tilgher, Genova, 1993, pp.345-364. Also IRST Technical Report 9211-20, IRST, Trento, Italy.

[Glhardas 01] H.Galhardas, D.Florescu et al., Declarative Data Cleaning: Language, Model, and Algorithms, In Proceedings of the 27th International Conference on Very Large Data Bases, Roma, Italy, 2001, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001, pp. 371-380, ISBN:1-55860-804-4.

[Goasdoue 99] F.Goasdoue and C.Reynaud, Modeling Information Sources for Information Integration, In D.Fensel and R.Studer (Eds.), Proceedings of 11th European Workshop, EKAW'99, Dagstuhl Castle, Germany, May 26-29, 1999, Knowledge Acquisition, Modeling and Management, Lecture Notes in Artificial Intelligence Vol. 1621, Springer, Berlin, Germany, 1999, pp. 121-138, ISBN: 3-540-66044-5.

[Goh 97] Cheng Hian Goh, Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Sources, PhD Thesis, MIT, 1997, [online], <http://context2.mit.edu/coin/publications/goh-thesis/goh-thesis.pdf>, (visited 26/09/04).

[Gold 01] R.Gold and C.Mascolo, Use of Context-Awareness in Mobile Peer-to-Peer Networks, In Proceedings of the 8th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS'2001), Bologna, Italy, October 2001, IEEE Computer Society, Washington, DC, USA, 2001, p.142, ISSN:1071-0485.

[Gomez Perez 99] A. Gomez Perez and V. R. Benjamins, Applications of Ontologies and Problem-Solving Methods, AI Magazine, Vol. 20, No.1, AAAI Press, 1999, pp.119-122, ISSN 0738-4602.

[Gottlob 02] G. Gottlob, C. Koch, and R. Pichler, Efficient Algorithms for Processing XPath Queries, In Proc. of the 28th Int. Conf. on Very Large Data Bases (VLDB), Hong Kong, China, August 2002, Morgan Kaufmann Publishers/Elsevier Science, St. Louis, USA, 2002, pp. 95-106, ISBN 1-55860-869-9.

[Gottlob 03] G.Gottlob, C.Koch, and R.Pichler, XPath Query Evaluation: Improving Time and Space Efficiency, In Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE'03), Bangalore, India, March, 2003, IEEE Computer Society, 2003, pp.379-390, ISBN 0-7803-7665-X.

[Gruber 91] T.R.Gruber, The Role of Common Ontology in Achieving Sharable, Reusable Knowledge Bases, In J.A.Allen, R.Fikes, and E.Sandewall (Eds), Principles of Knowledge Representation and Reasoning, Proceedings of the Second International Conference, Cambridge, MA, 1991, Morgan Kaufmann, 1991, pp. 601-602.

[Gruber 93] T.Gruber, Toward Principles for The Design of ontologies used for Knowledge Sharing, In N.Guarino (Ed.), International Workshop on Formal Ontology, Padova, Italy, 1993, Published in the International Journal of Human-Computer Studies, Special issue: the role of formal ontology in the information technology, Volume 43, Issue 5-6 Nov./Dec. 1995, Academic Press, Inc., Duluth, MN, USA, 1995, pp.907-928, ISSN: 1071-5819.

[Grundy 01] J.C.Grundy, W.B.Mugridge, J.G.Hosking, and P.Kendal, Generating EDI Message Translations from Visual Specifications, In Proceedings of the 16th International Conference on Automated Software Engineering, San Diego, California, USA, Nov 2001, IEEE CS Press, pp.35-42.

[Guarino 95] N.Guarino and P.Giaretta, Ontologies and Knowledge Bases: Towards a Terminological Clarification, In Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, N. Mars (Ed.), IOS Press, Amsterdam, The Netherlands, 1995, pp. 25-32.

[Guarino 98] N.Guarino, Formal Ontology and Information Systems, In N. Guarino (Ed.), Proceedings of the 1st International Conference, Trento, Italy, June 1998, IOS Press Amsterdam, The Netherlands, 1998, p. 337, ISBN: 9051993994.

[Guha 91] R.V.Guha, Contexts: A Formalization and Some Applications, PhD thesis, Stanford University, 1991. Also published as technical report STAN-CS-91-1399-Thesis, and MCC Technical Report Number ACT-CYC-423-91.

[Haarslev 01] Volker Haarslev, Ralf Möller, Description of the RACER System and its Applications, In Proceedings International Workshop on Description Logics (DL-2001), Stanford, USA, August 2001, [Online], <http://www.sts.tu-harburg.de/~r.f.moeller/papers/DL-2001-Racer.pdf >, (visited 17/06/2004)

[Hendrix 79] G.Hendrix, Encoding Knowledge in partitioned Networks, In N.Findler (Ed.), Associative Networks, Academic Press, New York, USA, 1979, pp. 51-92.

[Hewitt 71] C.Hewitt, Procedural embedding of knowledge in PLANNER, In D.C.Cooper (Ed.), Proceedings of the 2nd International Joint Conference on Artificial Intelligence (IJCAI), September 5-8, 1971, London, England, William Kaufmann, 1971, pp.167-182, ISBN 0-934613-34-6

[Huhn02] M.N. Huhns, L.M.Stephens, Semantic Bridging of Independent Enterprise Ontologies, In Proceedings of the IFIP TC5/WG5.12 International Conference on Enterprise Integration and Modeling Technique: Enterprise Inter- and Intra-Organizational Integration: Building International Consensus, ICEIMT 2002, April 2002, Valencia, Spain, Kluwer, Deventer, The Netherlands, 2003, pp. 83-90, ISBN 1-4020-7277-5.

[IEEE 00] IEEE Recommended Practice for Architectural Description of Software-Intensive Systems, IEEE Std 1471-2000, Institute of Electrical and Electronics Engineers, 1 May 2000, 29 p., ISBN: 0-7381-2518-0.

[Jackson 03] R.B.Jackson, J.W.Satzinger, Teaching the Complete Object-oriented Development Cycle, Including OOA and OOD, with UML and the UP, Information Systems Education Journal, 1 (28), ISSN: 1545-679X, [online] <http://isedj.org/1/28/>. (Also appears in The Proceedings of ISECON 2003: §2432, ISSN: 1542-7382), [online] <http://isedj.org/isecon/2003/2432/index.html>, (visited 08/08/2004).

[Jasper 99] R.Jasper and M.Uschold, A Framework for Understanding and Classifying Ontology Applications, Twelfth Workshop on Knowledge Acquisition, Modeling and Management, Banff, Canada, Oct 99, [online], <http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html>, (visited 24/09/04)

[Kappel 01] G. Kappel, E.Kapsammer, W.Retschitzegger, XML and Relational Database Systems: A Comparison of Concepts, In P.Graham, M.Maheswaran, M.R.Eskicioglu (Eds), Proceedings of the 2001 International Conference on Internet Computing IC’2001, Las Vegas, USA, June 2001, CSREA Press, June 2001, pp. 199-205, ISBN:1892512823.

[Kashyap 97] V. Kashyap and A. Sheth, Semantic heterogeneity in global information systems: the role of metadata, context and ontologies. In M. P. Papazoglou and G. Schlageter (Eds), Cooperative Information Systems: Trends and Directions, Academic Press, London, United Kingdom, 1997, pp.139-178, ISBN:0125449100.

[Kawazoe 03] A.Kawazoe, T.Mullen, K.Takeuchi, T.Wattarujeekrit, N.Collier, Open Ontology Forge: A Tool for Ontology Creation and Text Annotation Applied to the Biomedical Domain, Journal of Genome Informatics Vol. 14, 2003, pp.677–678, [online], <http://hc.ims.u-tokyo.ac.jp/JSBi/journal/GIW03/GIW03P012.pdf >, (visited 09/10/04).

[Keita 04] A.Keita, R.Laurini, C.Roussey, M.Zimmerman, Towards an Ontology for Urban Planning: The Towntology Project, Accepted to appear in the 24th UDMS Symposium, Chioggia, Venice, Italy, October 2004.

[Kenney 02] F. Kenney, B. Lheureux, EDI Translators: New Offerings Present New Opportunities, Gartner Report, 22 May 2002, Note Number: COM-15-9499.

[Kifer 95] Michael Kifer, Georg Lausen, and James Wu. Logical foundations of object oriented and frame-based languages. Journal of the ACM (JACM), Volume 42, Issue 4 (July 1995), ACM Press, New York, NY, USA, 1995, pp.741-843, ISSN:0004-5411.

[Klein 02] M.Klein, D.Fensel, A.Kiryakov, and D.Ognyanov, Ontoview: Comparing and Versioning Ontologies, In Collected Posters ISWC 2002, Sardinia, Italy, [online], <http://www.cs.vu.nl/~mcaklein/papers/ISWC02-poster.pdf>, (visited 15 July 2004).

[Knox 03] R.E. Knox, D.Logan, What Taxonomies do for the Enterprise, Gartner Research Articles, AV-20-8780, 10 September 2003.

[Kripke 59] Saul Kripke, A Completeness Theorem in Modal Logic, The Journal of Symbolic Logic, Vol 24, 1959, pp. 1-14.

[Kruchten 95] P.B.Kruchten, The 4+1 View Model of Architecture, IEEE Software, Volume 12, Issue 6 (November 1995), IEEE Computer Society Press, Los Alamitos, CA, USA, 1995, pp.42-50, ISSN:0740-7459.

[Kurgan 02] L.Kurgan, W.Swiercz, K.J.Cios, Semantic Mapping of XML Tags using Inductive Machine Learning, In M.A.Wani, H.R.Arabnia, K.J.Cios, K.Hafeez, G.Kendall (Eds.), Proceedings of the 2002 International Conference on Machine Learning and Applications - ICMLA 2002, June 24-27, 2002, Las Vegas, Nevada, USA, CSREA Press, 2002, pp. 99-109, ISBN 1-892512-29-7.

[Lemmon 77] E.J. Lemmon (with Dana Scott), An Introduction to Modal Logic, Krister Segerberg (Ed.), American Philosophical Quarterly Monograph Series, no. 11, Basil Blackwell, Oxford, 1977.

[Liu 99] K.Liu et al., Enterprise Information Systems: Issues, Challenges and Viewpoints, In J.B.Filipe (Ed.), Enterprise Information Systems, Kluwer Academic Publishers, 2000, pp. 1-13, ISBN: 0-7923-6239-X.

[MacGregor 91] R. MacGregor, Inside the LOOM Classifier, ACM SIGART Bulletin, Special issue on implemented knowledge representation and reasoning systems, Volume 2 , Issue 3 (June 1991), ACM Press, New York, NY, USA, 1991, pp.88-92, ISSN:0163-5719

[Madhavan 01] J.Madhavan, P.A.Bernstein, E.Rahm, Generic Schema Matching with Cupid, In Proc. 27th Int. Conf. on Very Large Data Bases (VLDB) 2001, pp. 49-58, [online], <http://lips.informatik.uni-leipzig.de/pub/2001-28/en>, (visited 07/08/2004).

[Madhavan 02] Jayant Madhavan, Philip A. Bernstein, Pedro Domingos, Alon Y. Halevy., Representing and Reasoning about Mappings between Domain Models, Proceedings of the Eighteenth National Conference on Artificial Intelligence, Edmonton, Alberta, Canada, August 2002, American Association for Artificial Intelligence, Menlo Park, CA, USA, 2002, pp.80-86, ISBN:0-262-51129-0.

[McCarth 87] John McCarthy, Generality in Artificial Intelligence, Communications of the ACM, Volume 30, Issue 12 (December 1987), ACM Press, New York, NY, USA, 1987, pp. 1030-1035, ISSN: 0001-0782.

[McCarthy 83] W.McCarthy, The REA Accounting Model: A Generalized Framework for Accounting Systems in a Shared Data Environment, The Accounting Review (July 1982), pp. 554-78, [online] <http://www.msu.edu/user/mccarth4/paplist1.html>, (visited 9/10/04).

[McCarthy 93] John McCarthy, Notes on formalizing contexts, In Ruzena Bajcsy (Ed.), Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, San Mateo, California, 1993. Morgan Kaufmann Publishers, Inc., San Francisco, CA, USA, pages 555-560, ISBN: 1-55860-613-0.

[MELN 00] S. Melnik, S. Decker, A Layered Approach To Information Modelling and Interoperability On The Web, Proc. of the Workshop on the Semantic Web at the 4th European Conference on Research and Advanced Technology for Digital Libraries ECDL'00, September 2000, Lisbon, Portugal, [online], <http://dbpubs.stanford.edu/pub/2000-30>, (visited 23/09/04).

[Mena 96] E.Mena, A.Illarramendi, V.Kashyap, and A.Sheth, Observer: An Approach for Query Processing in Global Information Systems based on Interoperability between pre-existing Ontologies, Distributed and Parallel Databases, Volume 8 , Issue 2 (April 2000), Kluwer Academic Publishers, Hingham, MA, USA, 2000, pp.223-271, ISSN:0926-8782.

[Miller 00] R.J.Miller, L.M.Haas, M.A.Hernandez, Schema Mapping as Query Discovery, In Amr El Abbadi, Michael L. Brodie, Sharma Chakravarthy, Umeshwar Dayal, Nabil Kamel, Gunter Schlageter, Kyu-Young Whang (Eds.), Proceedings of 26th International Conference on Very Large Data Bases VLDB 2000, September 10-14, 2000, Cairo, Egypt, Morgan Kaufmann Inc., San Francisco, CA, USA, 2000, pp.77-88, ISBN:1-55860-715-3.

[Miller 01] R.J.Miller, M.A.Hernandez, L.M.Haas, L.L.Yan, C.T.H.Ho, R.Fagin, and L.Popa, The Clio Project: Managing Heterogeneity, ACM SIGMOD Record, Volume 30, Issue 1 (March 2001), ACM Press, New York, NY, USA, 2001, pp.78-83, ISSN: 0163-5808.

[Mitra 00] P. Mitra, G.Wiederhold, and M. L. Kersten, A Graph-Oriented Model for Articulation of Ontology Interdependencies, In Carlo Zaniolo, Peter C. Lockemann, Marc H. Scholl, Torsten Grust (Eds.): Advances in Database Technology EDBT 2000, 7th International Conference on Extending Database Technology, Konstanz, Germany, March 27-31, 2000, Proceedings. Lecture Notes in Computer Science 1777 Springer, Berlin, Germany, 2000, pp.86-100, ISBN 3-540-67227-3.

[Mizoguchi 97] Riichiro Mizoguchi, Mitsuru Ikeda, and Katherine Sinitsa, Roles of Shared Ontology in AI-ED Research: Intelligence, Conceptualization, Standardization, and Reusability, In Proceedings of the 8th World Conference On Artificial Intelligence In Education AIED-97, Kobe, Japan, August, 1997, pp.537-544, [Online], <http://www.ei.sanken.osaka-u.ac.jp/ieee/Them.paper.html>, (visited 27/09/2004).

[Moerkotte 02] G.Moerkotte, Incorporating XSL Processing Into Database Engines, In Proc. of the 28th Int. Conf. on Very Large Data Bases (VLDB), Hong Kong, China, August 2002, Morgan Kaufmann Publishers/Elsevier Science, St. Louis, USA, 2002, pp. 107-118, ISBN 1-55860-869-9.

[M-Webster’s 04] Merriam-Webster Online, [online], <http://www.m-w.com/>, (visited 25/07/04).

[Mylopoulos 98] J. Mylopoulos, Information Modeling in the Time of Revolution, Information Systems, Volume 23, Issue 3-4 (May 1998), Special issue: selected papers from the 9th International Conference on advanced information systems engineering (CAISE '97), Elsevier Science Ltd., Oxford, UK, 1998, pp.127-155, ISSN:0306-4379.

[Nielsen 92] Jakob Nielsen, Usability Engineering Life Cycle, Computer, Volume 25, Issue 3 (March 1992), IEEE Computer Society Press, Los Alamitos, CA, USA, 1992, pp.12-22, ISSN:0018-9162.

[Nixon 02] P.Nixon, F.Wang, S.Terzis, T.Walsh, and S.Dobson, Engineering Context Aware Enterprise Systems, in Workshop on Engineering Context-Aware Object-Oriented Systems and Environments (ECOOSE), in connection with OOPSLA 2002, November 2002, Vancouver, British Columbia, Canada, [online] <http://www.dsg.cs.tcd.ie/ecoose/oopsla2002/papers/08-nixon.pdf>, (visited 26/09/2004).

[Nuseibeh 94] B. Nuseibeh, J. Kramer, and A. Finkelstein, Expressing the Relationships between Multiple Views in Requirements Specification, IEEE Transactions on Software Engineering, Volume 20, Issue 10 (October 1994), In Proc. of the 15th ICSE, IEEE Press, Piscataway, NJ, USA, 1994, pp.760-773, ISSN:0098-5589.

[Obrst 03] Leo Obrst, Ontologies for Semantically Interoperable Systems, In Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM 2003), New Orleans, LA, USA, ACM Press, New York, NY, USA, 2003, pp. 366-369, ISBN:1-58113-723-0.

[Omelyenko 02] B. Omelayenko, D. Fensel, and C. Bussler, "Mapping Technology for Enterprise Integration", In Proceedings of the Fifteenth International Florida Artificial Intelligence Research Society Conference (FLAIRS-2002), Pensacola, FL, USA, May 14-16, 2002, AAAI Press, 2002, pp. 419-424, ISBN:1-57735-141-X.

[Osterwalder 02] Alexander Osterwalder, An e-Business Model Ontology for the Creation of New Management Software Tools and IS Requirement Engineering, CAiSE 2002 Doctoral Consortium, June 2002, Toronto, [online], <http://www.mics.ch/getDoc.php?docid=226&docnum=1>, (visited 24/08/2004).

[Ötztürk 97] P.Öztürk, A.Aamodt, Towards a Model of Context for Case-based Diagnostic Problem Solving, Proceedings of the First International and Interdisciplinary Conference on Modeling and Using Context (Context-97), Rio de Janeiro, Brazil, February 1997, pp.198-208, [online], <http://www-poleia.lip6.fr/~brezil/Pages2/Publications/CONTEXT-97/18/paper.ps>, (visited 28/09/2004).

[Pottinger 02] R. A. Pottinger, P. A. Bernstein, Creating a Mediated Schema Based on Initial Correspondences, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, September 2002, Vol.25, No.3, pp. 26-31, [online], <http://sites.computer.org/debull/ieee-list1.htm>, (visited 15 August 2004).

[Preece 99] A.Preece, K.Hui, W.Gray, P.Marti, T.Bench-Capon, D.Jones, and Z.Cui, The Kraft Architecture for Knowledge Fusion and Transformation, In Proceedings of the 19th SGES International Conference on Knowledge-Based Systems and Applied Artificial Intelligence (ES’99), Springer, 1999, [online], <http://www.csd.abdn.ac.uk/~apreece/research/download/es1999.pdf >, (Visited 10/10/2004)

[Quix 99] C.Quix, Repository Support for Data Warehouse Evolution, In S.Gatziu et al. (Eds.), Proceedings of the Intl. Workshop on Design and Management of Data Warehouses, DMDW'99, Heidelberg, Germany, June 1999, CEUR Workshop Proceedings Vol. 19, Technical University of Aachen (RWTH), 1999.

[Rahm 01] E. Rahm and P. A. Bernstein, A survey of approaches to automatic schema matching, The VLDB Journal, Volume 10, Issue 4 (December 2001), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2001, pp.334-350, ISSN: 1066-8888.

[Raman 01] V. Raman and J. M. Hellerstein, Potter's Wheel: An Interactive Data Cleaning System, In Proc. of the 27th VLDB Conference, Roma, Italy, September 2001, Morgan Kaufmann, 2001, pp.381-390.

[Rifaieh 01] R.Rifaieh, N.A.Benharkat, A Translation Procedure to Clarify the Relationship between Ontologies and XML Schema, In Peter Graham, Muthucumaru Maheswaran, M. Rasit Eskicioglu (Eds.), Proceedings of the International Conference on Internet Computing, IC'2001, Las Vegas, Nevada, USA, June 25-28, 2001. CSREA Press, 2001, ISBN 1-892512-8-X, Volume 1, pp.164-170.

[Rifaieh-a 02] R. Rifaieh, A. N. Benharkat, Query-based Data warehousing Tool, Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP, DOLAP 02, McLean, USA, November 2002, ACM Press New York, NY, USA, 2002, ISBN:1-58113-590-4, pp.35-42.

[Rifaieh-a 03] R.Rifaieh, N.Benharkat, A Mapping Framework for EDI Message Translation, In the proceedings of the ACS/IEEE conference AICCSA'03, Tunisia, July 2003, IEEE Press, Piscataway, NJ, USA, 2003, ISBN:0-7803-7983-7, p.87.

[Rifaieh-a 04] R.Rifaieh, A.Arara, A.N.Benharkat, Multi-representation Ontologies in the Context of Enterprise Information Systems, Americas Conference on Information Systems 2004, August 6- 8, New York City, NY, USA, [online], <http://aisel.isworld.org/article_by_author.asp?Author_ID=5561>, (visited 27/09/2004).

[Rifaieh-a 05] R.Rifaieh, N.Benharkat, Studying Semantic Sharing Problem within Enterprise Information Systems, accepted to appear in the proceedings of the 3rd ACS/IEEE Conference AICCSA'05.

[Rifaieh-b 02] R. Rifaieh, A. N. Benharkat, A Mapping Expression Model used for a Meta-data Driven ETL Tool, In K.Yétongnon and M.Amin (Eds.), Proc. of the 2nd IEEE ISSPIT, Marrakech, Morocco, December 2002, IEEE Publisher, ISBN 0-9727186-0-5, pp.288-293.

[Rifaieh-b 03] R.Rifaieh, N.Benharkat, An Analysis of EDI’s Message Translation and Integration Problems, In N.Debnath, G.Montejano, and D.Riesco (Eds.), Proceedings of the International Conference on Computer Science, Software Engineering, Information Technology, e-Business, and Applications (CSITeA’03), June 5-7, 2003, Rio de Janeiro, Brazil, ISBN:0-9742059-0-7, pp.254-260.

[Rifaieh-b 04] “Representation Ontology as a foundation of Enterprise Information Systems”, accepted to the 7th International Conference on Information Technology CIT’04, to appear in LNCS, Springer.

[Rifaieh-c 02] R.Rifaieh, A.N.Benharkat, An XML-based Data Warehousing Tool using an Ontological Semantic Modeling, In R.Rimane (Ed.), Proceedings of the International Conference on Information Integration and Web-based Applications and Services IIWAS’2002, Bandung, Indonesia, September 2002, SCS-Europe BVBA, Ghent, Belgium, 2002, pp.62-73, ISBN:3-936150-18-4.

[Rifaieh-c 03] R.Rifaieh, N.Benharkat, A Data Warehouse Architecture Using An XML-Based ETL Tool, In M.Khosrow-Pour (Ed.), Proceedings on CD-ROM of the 2003 Information Resources Management Association International Conference IRMA 2003, Philadelphia, PA, USA, May 18-21, 2003, Idea Group Publishing, Hershey, PA, USA, 2003.

[Rifaieh-c 04] R.Rifaieh, A.Arara, A.N.Benharkat, “A View of Enterprise Information Systems Based on Contextual Ontologies”, accepted to appear in the proceedings of the IEEE International Conference on Computational Cybernetics ICCC 2004.

[Rifaieh-d 03] R.Rifaieh, N.Benharkat, A Case Study for A Query Based Warehousing Tool, Poster in ICEIS’03, France, 2003.

[Robinson 94] W.Robinson, and S.Fickas, Supporting Multi-Perspective Requirements Engineering, proceedings IEEE Conference on Requirements Engineering, Colorado Springs, USA, April 1994, pp. 206-215.

[Rockart 96] J. Rockart, M. Earl and J. Ross, Eight Imperatives for the New IT Organization, MIT Sloan Management Review, Vol. 38, No. 1, Fall 1996, Massachusetts Institute of Technology, 1996, pp.43-55.

[Roelofsen 04] Floris Roelofsen and Luciano Serafini, Complexity of Contextual Reasoning, In proceedings of AAAI-04 Conference, San Jose, California, USA, 25-29 July, 2004, pp.118-123, [online], <http://www.dit.unitn.it/~context/paper/aaai04_roelofsen_serafini.pdf>, (visited 26/09/2004).

[Russ 99] Thomas Russ, Andre Valente, Robert MacGregor and William Swartout, Practical Experiences in Trading Off Ontology Usability and Reusability, Twelfth Workshop on Knowledge Acquisition, Modeling and Management, Banff, Canada, Oct 99, [online], <http://sern.ucalgary.ca/KSI/KAW/KAW99/papers.html>, (visited 24/09/04).

[Sampaio 01] Marcus Costa Sampaio, Eduardo M. F. Jorge and Cláudio de Souza Baptista, Metadata for an Extensible Data Warehouse Server, In Proceedings of the First International Workshop on Information Integration on the Web (WIIW), Rio de Janeiro, Brazil, April 2001, pp. 133-140, [Online], <http://www.cos.ufrj.br/wiiw/papers/18-Marcus_Sampaio(07).pdf >, (visited 25/04/2004).

[Serafini 97] Luciano Serafini and Chiara Ghidini, Context Based Semantics for Information Integration, In S.Buvač and L.Iwanska (Eds.), Working Papers of the AAAI Fall Symposium on Context in Knowledge Representation and Natural Language, American Association for Artificial Intelligence, Menlo Park, California, USA, 1997, pp.152-160.

[Segev 97] A. Segev, J. Porra, and M. Roldan, Internet-based EDI strategy, Decision Support Systems, Volume 21, Issue 3 (November 1997), Elsevier Science Publishers B.V., Amsterdam, The Netherlands, 1997, pp.157-170, ISSN:0167-9236.

[Sheth 93] Amit P. Sheth, Vipul Kashyap, So Far (Schematically) yet So Near (Semantically), In David K. Hsiao, Erich J. Neuhold, Ron Sacks-Davis (Eds.), Proceedings of the IFIP WG 2.6 Database Semantics Conference on Interoperable Database Systems (DS-5), Lorne, Victoria, Australia, 16-20 November 1992, IFIP Transactions A-25 North-Holland, 1993, ISBN 0-444-89879-4, pp.283-312.

[Skene 03] J. Skene, W.Emmerich, Model Driven Performance Analysis of Enterprise Information Systems, Electronic Notes in Theoretical Computer Science, Volume 82 Number 6 (2003), [online], <http://www.cs.ucl.ac.uk/staff/w.emmerich/publications/ETAPS03/mdp.pdf > (visited 7/08/2004).

[Smolander 02] K. Smolander, K. Hoikka, J.Isokallio, M.Kataikko, T.Mäkelä, What is Included in Software Architecture? A Case Study in Three Software Organizations, Proceedings of the Ninth Annual IEEE International Conference and Workshop on the Engineering of Computer-Based Systems (ECBS'02), Lund, Sweden, April 8-11, 2002, IEEE Computer Society, Washington, DC, USA, 2002, pp.131-138, ISBN:0-7695-1549-5.

[Sowa 00] John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA, USA, 2000, p.594, ISBN 0-534-94965-7.

[Sowa 92] J.F.Sowa and J.A.Zachman, Extending and Formalizing the Framework for Information Systems Architecture, IBM Systems Journal, Volume 31, Issue 3 (1992), IBM Corp., Riverton, NJ, USA , 1992, pp.590-616, ISSN:0018-8670.

[Sowa 95] J. Sowa, Peircean foundations for a theory of context, In D. Lukose et al., eds., Conceptual Structures: Fulfilling Peirce's Dream, Lecture Notes in AI 1257, Springer-Verlag, Berlin, 1997, pp.41-64.

[Staab 04] S.Staab and R.Studer (Eds.), Handbook on Ontologies, Series of International Handbooks on Information Systems, Springer, 2004, p.660, ISBN:3-540-40834-7.

[Stöhr 99] T.Stöhr, R.Müller, E.Rahm, An integrative and Uniform Model for Metadata Management in Data Warehousing Environments, In S.Gatziu, M.A.Jeusfeld, M.Staudt, Y.Vassiliou (Eds.), Proceedings of the Intl. Workshop on Design and Management of Data Warehouses, DMDW'99, Heidelberg, Germany, June 14-15, 1999. Technical University of Aachen (RWTH) 1999, [Online], <http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-19/paper12.pdf>, (visited 27/09/2004).

[Stonebraker 01] M. Stonebraker and J. Hellerstein, Content integration for e-business, International Conference on Management of Data, Proceedings of the 2001 ACM SIGMOD international conference on Management of data SIGMOD/PODS 2001, Santa Barbara, California, USA, May 21-24, 2001, ACM Press, New York, NY, USA, 2001, pp.552-560, ISBN:1-58113-332-4.

[Strang 03] Thomas Strang, Service Interoperability in Ubiquitous Computing Environments, PhD Thesis, Ludwig-Maximilians-University Munich, Oct 2003, Research Report, VDE-Verlag, Berlin, Germany, 2003, ISBN: 3-8007-2823-0.

[Strang 04] Thomas Strang, Claudia Linnhoff-Popien, A Context Modeling Survey, accepted for the Workshop on Advanced Context Modelling, Reasoning and Management as part of The Sixth International Conference on Ubiquitous Computing, Nottingham, England, September 2004, [online], <http://pace.dstc.edu.au/cw2004/Paper15.pdf>, (visited 05/08/2004).

[Stuckenschmidt 00] H.Stuckenschmidt, H.Wache, T.Vögele, and U.Visser, Enabling Technologies for Interoperability, In U.Visser and H.Pundt (Eds.), Proceedings of Workshop on the 14th International Symposium of Computer Science for Environmental Protection, Bonn, Germany, 2000, pp.35-46, [online], <http://citeseer.ist.psu.edu/>, (visited 10/10/2004).

[Tzitzikas 04] Y. Tzitzikas, C.Meghini, N.Spyratos, A Unifying Framework for Flexible Information Access in Taxonomy-Based Sources, In Christiansen et al. (Eds.), Proc. of FQAS’2004, June 2004, Lyon, France, Lecture Notes in Artificial Intelligence 3055 Springer 2004, pp.161-174.

[Uschold 96] M.Uschold and M.Gruninger, Ontologies: Principles, Methods and Applications, In S.Parsons, A.E.Howe (Ed.), The Knowledge Engineering Review, volume 11, number 2 (June 1996), [online], <http://www.aiai.ed.ac.uk/project/pub/documents/1996/96-ker-intro-ontologies.ps>, (visited 27/09/2004).

[Uschold 98] Mike Uschold, Martin King, Stuart Moralee and Yannis Zorgios, The Enterprise Ontology, In M.Uschold and A.Tate (Eds.), The Knowledge Engineering Review, Special Issue on Putting Ontologies to Use Volume 13 (1998), [online], <http://www.aiai.ed.ac.uk/project/pub/documents/1998/98-ker-ent-ontology.ps>, (visited 27/09/2004)

[Van Der Vet 98] Vet, P.E. and N.J.I. Mars, Bottom-up Construction of Ontologies, IEEE Transaction on Knowledge and Data Engineering, Volume 10, Issue 4 (July 1998), IEEE Educational Activities Department, Piscataway, NJ, USA , 1998, pp.513-526, ISSN:1041-4347.

[Van Zyl 99] J. Van Zyl, D.Corbett, M.W.Vincent, An Ontology of Metadata for a Data warehouse Represented in Description Logics, In Proceedings of the International Symposium on Cooperative Database Systems for Advanced Applications CODAS'99, Wollongong, Australia, March 27-28, 1999, Springer, Singapore, 1999, pp. 27-38, ISBN 9814021644.

[Wache 01] H.Wache, T.Voegele, U.Visser, H.Stuckenschmidt, G.Schuster, H.Neumann, and S.Huebner, Ontology-Based Integration of Information - A Survey Of Existing Approaches, In Proceedings of the International Joint Conference on Artificial Intelligence IJCAI'01, pp.108-117, [online], <http://www.tzi.de/buster/IJCAIwp/Finals/wache.pdf>, (visited 15/07/2004).

[Wang 04] X.Wang, D.Zhang, et al., Ontology-Based Context Modeling and Reasoning using OWL, Proceedings of the Second IEEE International Conference on Pervasive Computing and Communications (PerCom 2004), 14-17 March 2004, Orlando, FL, USA, IEEE Computer Society, Washington, DC, USA, 2004, pp.18-22, ISBN 0-7695-2090-1.

[Welty 03] Christopher A. Welty, Ontology Research, Guest Editorial, AI Magazine, Volume 24, Number 3, Fall 2003, pp.11-12, American Association for Artificial Intelligence, [online], <http://www.aaai.org/Resources/Papers/AIMag24-03-002.pdf >, (visited 27/09/2004).

[Wolter 98] F.Wolter and M.Zakharyaschev, Description Logics with Modal Operators, In A.G.Cohn, L.Schubert, S.C.Shapiro (Eds.), Proceedings of the 6th International Conference on Principles of Knowledge Representation and Reasoning (KR'98), Montreal, Canada, 1998, Morgan Kaufmann, pp.512-523.

[Xu 03] L. Xu and D. W. Embley, Discovering Direct and Indirect Matches for Schema Elements, Proceedings of the Eighth International Conference on Database Systems for Advanced Applications DASFAA 2003, Kyoto, Japan, March 2003, IEEE Computer Society, Washington, DC, USA, 2003, p.39, ISBN:0-7695-1895.

[Xyleme 01] Lucie Xyleme, A Dynamic Warehouse for XML Data of the Web, In Alon Halevy (Ed.), IEEE Data Engineering Bulletin, Volume 24, Number 2 (June 2001), pp.40-47, [online], <http://sites.computer.org/debull/A01JUN-CD.pdf>, (visited 25/08/2004).

[Yan 02] G.Yan, W.K.Ng, and E.P.Lim, Product Schema Integration for Electronic Commerce- A Synonym Comparison Approach, IEEE Transactions on Knowledge and Data Engineering, Volume 14, Issue 3 (May 2002), IEEE Educational Activities Department, Piscataway, NJ, USA, 2002, pp.583-598, ISSN:1041-4347.

[Zhu 02] D. Zhu, Security Control in Inter-Bank Fund Transfer, In M.S. Raisinghani and J.H.Nugent (Eds.), Journal of Electronic Commerce Research, Vol. 3, No. 1, Feb. 2002, ISSN:1526-6133.


Appendixes


APPENDIX A: ORGANIZATION OF ROLES BETWEEN THE PROJECT’S MEMBERS

[Figure: chart showing the organization of roles between the promoter (ANRT), the industrial team (R&D, Tessi Informatique) and the laboratory team (LIRIS, INSA de Lyon). The milestones shown include: the industrial research problem; formal ideas and research interest; the proposal, its review and approval; literature reading and other findings; analysis and problem discovery; focusing on DW and ETL; the theoretical model; annual reports; prototyping QETL; alignment with the local research topic; the framework to develop contextual ontologies; refocusing on EDI systems; implementation of research findings; prototyping the EDI-Translator; drawing the EIS foreseen goals; thesis writing; the thesis results; and the end of the convention.]


APPENDIX B: A SNAPSHOT OF ISO 15944-4 (OPEN-EDI ONTOLOGY COLLABORATION MODEL)

[Figure: UML snapshot of the Open-edi ontology collaboration model. Its concepts include Bilateral Collaboration, Mediated Collaboration, Business Transaction, Agreement, Economic Contract, Economic Commitment, Economic Event, Economic Event Type, Economic Resource, Economic Resource Type, Economic Agent, Business Role, Partner, Third Party and Regulator; they are linked by relationships such as governs, stockflow (from/to), duality, fulfills, reciprocal, establish, typifies, specifies, qualifies, reserves, involves, participates, requires and constrains.]
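The duality and stockflow relationships at the heart of this model can be illustrated with a small sketch. The Python class and attribute names below are our own naming for illustration, not identifiers from ISO 15944-4.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EconomicAgent:          # e.g. a Partner, Third Party or Regulator
    name: str

@dataclass
class EconomicResource:
    name: str

@dataclass
class EconomicEvent:
    # "stockflow": the event moves a resource from one agent to another
    resource: EconomicResource
    from_agent: EconomicAgent
    to_agent: EconomicAgent
    dual: Optional["EconomicEvent"] = None  # duality link between paired events

def make_dual(give: EconomicEvent, take: EconomicEvent) -> None:
    """Relate two economic events by duality (e.g. a shipment and its payment)."""
    give.dual, take.dual = take, give

# Hypothetical example: a sale paired with its payment
buyer, seller = EconomicAgent("buyer"), EconomicAgent("seller")
shipment = EconomicEvent(EconomicResource("goods"), seller, buyer)
payment = EconomicEvent(EconomicResource("cash"), buyer, seller)
make_dual(shipment, payment)
```

An Economic Commitment would then be modeled as a promise that "fulfills" such an event; that part is omitted here for brevity.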

APPENDIX C: THE UML MODEL REPRESENTING QELT TOOL

[Figure: UML model of the QELT tool; diagram not reproduced.]

APPENDIX D: THE UML MODEL REPRESENTING EDI TRANSLATOR TOOL

[Figure: UML model of the EDI Translator tool; diagram not reproduced.]


APPENDIX E: THE XQUERY USED WITHIN EDI TRANSLATOR BETWEEN PAYMUL AND MT103

This query implements the transformation rules between PAYMUL and MT103:

define function usefichier ($f, $sender, $receiver) {
  <MT> {
    <sender> {string($sender)} </sender>,
    <MT>103</MT>,
    <receiver> {string($receiver)} </receiver>,
    <M20> {let $y := $f/GP4/LIN
           return concat(string($y/RFF[1]/@N1154), '-', string($y/RFF[1]/@N1153), '/',
                         string($y/RFF[2]/@N1154), '-', string($y/RFF[1]/@N1153))} </M20>,
    <M23B> {let $y := $f/GP4/LIN/GP11[1]/SEQ/PAI
            return concat("CRED+", string($y/@N4461))} </M23B>,
    <M32A> {let $y := $f/GP4/LIN
            return concat(string($y/DTM/@N2380),
                          string($y/GP11[1]/SEQ/MOA/@N6345),
                          string($y/GP11[1]/SEQ/MOA/@N5004))} </M32A>,
    <M33B> {let $y := $f/GP4/LIN/GP11[1]/SEQ/MOA
            return concat(string($y/@N6345), string($y/@N5004))} </M33B>,
    <M50K> {let $y := $f/GP4/LIN
            return concat(string($y/GP6/FII/@N3194), ' - ',
                          string($y/GP7/NAD/@N3039), ' - ',
                          string($y/GP7/NAD/@N3124a), ' ', string($y/GP7/NAD/@N3124b), ' ',
                          string($y/GP7/NAD/@N3124c), ' ', string($y/GP7/NAD/@N3124d), ' ',
                          string($y/GP7/NAD/@N3124e), ' ', string($y/GP7/NAD/@N3036a), ' ',
                          string($y/GP7/NAD/@N3036b), ' ', string($y/GP7/NAD/@N3036c), ' ',
                          string($y/GP7/NAD/@N3042a), ' ', string($y/GP7/NAD/@N3042b), ' ',
                          string($y/GP7/NAD/@N3042c), ' ', string($y/GP7/NAD/@N3164), ' ',
                          string($y/GP7/NAD/@N3229), ' ', string($y/GP7/NAD/@N3251), ' ',
                          string($y/GP7/NAD/@N3207))} </M50K>,
    <O53A> {let $y := $f/GP4/LIN/GP11[1]/SEQ
            return concat(string($y/GP12/FII/@N3194), ' - ',
                          string($y/GP13[1]/NAD/@N3039), ' - ',
                          string($y/GP13[1]/NAD/@N3124a), ' ', string($y/GP13[1]/NAD/@N3124b), ' ',
                          string($y/GP13[1]/NAD/@N3124c), ' ', string($y/GP13[1]/NAD/@N3124d), ' ',
                          string($y/GP13[1]/NAD/@N3124e), ' ', string($y/GP13[1]/NAD/@N3036a), ' ',
                          string($y/GP13[1]/NAD/@N3036b), ' ', string($y/GP13[1]/NAD/@N3036c), ' ',
                          string($y/GP13[1]/NAD/@N3042a), ' ', string($y/GP13[1]/NAD/@N3042b), ' ',
                          string($y/GP13[1]/NAD/@N3042c), ' ', string($y/GP13[1]/NAD/@N3164), ' ',
                          string($y/GP13[1]/NAD/@N3229), ' ', string($y/GP13[1]/NAD/@N3251), ' ',
                          string($y/GP13[1]/NAD/@N3207))} </O53A>,
    <M59a> {let $y := $f/GP4/LIN/GP11[1]/SEQ/GP16/PRC/GP17[1]
            let $z := $f/GP4/LIN/GP11[2]/SEQ/GP13
            return (
              concat("\ facture fournisseur=", string($y/DOC/@N1004),
                     " \Montant=", string($y/DOC/MOA/@N5004), string($y/DOC/MOA/@N6345),
                     " \Date emission facture=", string($y/DOC/DTM/@N2380),
                     "- reference client=", string($y/DOC/RFF/@N1153), '\',
                     string($y/DOC/RFF/@N1154)),
              concat(string($z/NAD/@N3039), ' - ',
                     string($z/NAD/@N3124a), ' ', string($z/NAD/@N3124b), ' ',
                     string($z/NAD/@N3124c), ' ', string($z/NAD/@N3124d), ' ',
                     string($z/NAD/@N3124e), ' ', string($z/NAD/@N3036a), ' ',
                     string($z/NAD/@N3036b), ' ', string($z/NAD/@N3036c), ' ',
                     string($z/NAD/@N3042a), ' ', string($z/NAD/@N3042b), ' ',
                     string($z/NAD/@N3042c), ' ', string($z/NAD/@N3164), ' ',
                     string($z/NAD/@N3229), ' ', string($z/NAD/@N3251), ' ',
                     string($z/NAD/@N3207))
            )} </M59a>,
    <O70> {let $y := $f/GP4/LIN/GP11[1]/SEQ
           return concat(string($y/RFF[1]/@N1153), '\', string($y/RFF[1]/@N1154), ' - ',
                         string($y/RFF[2]/@N1153), '\', string($y/RFF[2]/@N1154))} </O70>,
    <M71A>OUR</M71A>,
    <O71G>0</O71G>
  } </MT>
}

/* Main program */
let $reference := document("nombre_message.xml"),
    $nombre_m := number($reference/result/@nombre_message)
return (
  for $numero in (1 to $nombre_m)
  return (
    let $racine := document(concat("PAYMUL_etendu_", $numero, "b.xml")),
        $x := $racine/Interchange_EDI
    return (
      for $f in $x/UNB/UNH
      let $sender := $x/UNB/UNH/@N0004,
          $receiver := $x/UNB/UNH/@N0010
      return write-to(usefichier($f, $sender, $receiver),
                      concat("MT", string($f/@N0062), ".xml"))
    )
  )
)
/* END */
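For readers less familiar with this early XQuery dialect, the same mapping pattern (navigate the source tree, concatenate data-element values, emit a target field) can be sketched in Python with the standard library. Only the M23B rule is shown, and the one-line PAYMUL fragment is hand-made for illustration.

```python
import xml.etree.ElementTree as ET

def build_m23b(paymul_root: ET.Element) -> ET.Element:
    """Build the MT103 M23B field from the first PAI segment, mirroring the
    <M23B> rule of the XQuery above ("CRED+" followed by data element 4461)."""
    pai = paymul_root.find("GP4/LIN/GP11/SEQ/PAI")  # first match, like GP11[1]
    m23b = ET.Element("M23B")
    m23b.text = "CRED+" + pai.get("N4461", "")
    return m23b

# Minimal hand-made PAYMUL fragment (illustrative only)
src = ET.fromstring(
    "<UNH><GP4><LIN><GP11><SEQ><PAI N4461='3'/></SEQ></GP11></LIN></GP4></UNH>"
)
print(ET.tostring(build_m23b(src), encoding="unicode"))  # <M23B>CRED+3</M23B>
```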

APPENDIX F: THE BRANCHING DIAGRAM OF EDIFACT PAYMUL MESSAGE

[Figure: branching diagram of the EDIFACT PAYMUL message; diagram not reproduced.]


APPENDIX G: THE BRANCHING DIAGRAM OF SWIFT MT103 MESSAGE


APPENDIX H: THE BRANCHING DIAGRAM OF RUM REPRESENTATION


APPENDIX I: SNAPSHOT OF EX-SMAL PROTOTYPE

Snapshot of the Batch Runner interface & Results Implemented within EX-SMAL Prototype

Calculating the best value of a coefficient: let α0, 0 ≤ α0 ≤ 1, be the best value for matching a schema S with itself, i.e. lim_{x→S} f(Match(x, S)) = α0, and let α1, 0 ≤ α1 ≤ 1, be the best value for matching a schema T with itself, i.e. lim_{x→T} f(Match(x, T)) = α1. We aim at finding the best value for matching S with T, lim_{x→S} f(Match(x, T)) = ?. Since the variation of Match is a continuous function, the point closest to both α0 and α1 at the same time is the midpoint (α0 + α1)/2. We therefore take this value as the best f to use for matching S and T. This method should be applied to discover all the coefficients used for our matching; since the process is difficult and time-consuming, we suggest the values discovered after many batch runs.

This curve shows the value of similarity between the same elements during the auto-match for coeff base=0.6
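As a worked example of the midpoint rule above (the two self-match limits are chosen purely for illustration): with α0 = 0.5 and α1 = 0.7, the suggested coefficient for matching S with T is (0.5 + 0.7)/2 = 0.6.

```python
def best_coefficient(alpha0: float, alpha1: float) -> float:
    """Midpoint of the two self-match limits, taken as the S-to-T coefficient."""
    assert 0.0 <= alpha0 <= 1.0 and 0.0 <= alpha1 <= 1.0
    return (alpha0 + alpha1) / 2

print(best_coefficient(0.5, 0.7))  # 0.6
```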


APPENDIX J: SCREENSHOT OF EISCO PROTOTYPE ADMINISTRATOR INTERFACE WITH LINKING CONCEPT OPTION

The graphical links are translated to an XML file, having the following structure:

• A tag representing the source and the target schema
• A tag representing the semantic rule (bridge rule) between two concepts.

<?xml version="1.0" encoding="ISO-8859-15"?>
<Document>
  <fichier nameSource="DWv1" nameDestination="EDIv1">
    <lien nature="Equivalence">
      <concept name="(|http://owl.from.uml#Mapping|)"/>
      <concept name="(|http://owl.from.uml#Mapping|)"/>
    </lien>
    <lien nature="Equivalence">
      <concept name="(|http://owl.from.uml#Entity|)"/>
      <concept name="(|http://owl.from.uml#Entity|)"/>
    </lien>
    <lien nature="Equivalence">
      <concept name="(|http://owl.from.uml#File_Store|)"/>
      <concept name="(|http://owl.from.uml#Record|)"/>
    </lien>
  </fichier>
</Document>
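A minimal sketch of how such a bridge-rule file could be consumed, assuming only the structure shown above (the sample document is embedded inline; in the prototype it would be read from the exported file):

```python
import xml.etree.ElementTree as ET

# Inline sample with the same structure as the EISCO export above
xml_text = """<Document>
  <fichier nameSource="DWv1" nameDestination="EDIv1">
    <lien nature="Equivalence">
      <concept name="(|http://owl.from.uml#File_Store|)"/>
      <concept name="(|http://owl.from.uml#Record|)"/>
    </lien>
  </fichier>
</Document>"""

def read_bridge_rules(text: str):
    """Return (source schema, target schema, [(nature, src concept, dst concept)])."""
    root = ET.fromstring(text)
    fichier = root.find("fichier")
    rules = []
    for lien in fichier.findall("lien"):
        src, dst = lien.findall("concept")  # one source and one target concept
        rules.append((lien.get("nature"), src.get("name"), dst.get("name")))
    return fichier.get("nameSource"), fichier.get("nameDestination"), rules

src_schema, dst_schema, rules = read_bridge_rules(xml_text)
print(src_schema, dst_schema, rules)
```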