Page 1

MADONNE

MAsses de DONnées issues de la Numérisation du patrimoiNE (masses of data from heritage digitization)

Project Leader: Jean-Marc OGIER

L3i Laboratory, La Rochelle University

Tel: 0033 5 46 45 82 15 – [email protected]

NAVIDOMASS

NAVigation In DOcument MASSes

Two French Projects on Analysis of Cultural Heritage Documents

Mathieu Delalandre (CVC)

IDoc Meeting, Valencia (Spain)

22nd February 2007

Page 2

MADONNE: MAsses de DONnées issues de la Numérisation du patrimoiNE

French ANR program “Masse de données”

Length: 36 months
Funding: 110 000 €

NAVIDOMASS: NAVigation In DOcument MASSes

French ANR program “Masse de données et connaissances”

Length: 36 months
Funding: 550 000 €

Introduction

[Diagram labels: Cultural Document Images; System; Strategy; Model; Processing; GUI; High-Level Meta-Data of images; Structured and Indexed Information]

Scope of projects …

Cultural heritage documents represent a very large mass of data.

The Madonne/NaviDoMass projects develop document analysis systems for indexing and browsing this mass of data.

Calendar …

[Timeline figure: years 2003 to 2009]

Page 3

Consortium

Centre de Recherche en Informatique de Paris 5 (Paris)

Institut de Recherche en Informatique et Systèmes Aléatoires (Rennes)

Laboratoire Informatique (Tours)

Laboratoire d'InfoRmatique en Image et Systèmes d'information (Lyon)

Laboratoire Lorrain de Recherche en Informatique et ses Applications (Nancy)

Laboratoire d'Informatique de Traitement de l'Information (Rouen)

Laboratoire d’informatique image et interaction (La Rochelle)

Centre d’Études Supérieures de la Renaissance (Tours)

55 Project Members

Professor: 8
Lecturer: 14
Post-Doctoral: 3
PhD Student: 9
Master Student: 15
Engineer: 6

[Legend: Permanent; Over the last 3 years]

20 Project Partners

Companies: 5 (HP, APROGEIDE …)
Libraries: 5 (CHAN, British Library …)
Research Centers: 10 (CVC, Indian SI …)

Page 4

Overview

Document Layout [Ramel’05]

Block segmentation into footnote, text zone, dropcap, figure, …

Background analysis, foreground analysis, merging

10 000 pages of old printed books

Text density

Graphic density

Collection Modelling [Journet’06]

Directional rose

Old printed books

Page 5

Overview

Grapheme-based signature for handwritten patronymic retrieval

Document Layout and Retrieval [Couasnon’05]

Segmented Cells

(1) Line extraction based on Kalman Filter

(2) Positioning Grammar to correct and build cells from extracted lines
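Step (1) names a Kalman filter for line extraction but the slide gives no state model. The following is only a minimal sketch, assuming a scalar random-walk model for the row position of a single ruling line tracked column by column. The function name `track_line`, its parameters, and the noise values are illustrative assumptions and are not taken from [Couasnon’05]. It shows how the filter can bridge gaps where a printed line is broken: a missing measurement triggers the prediction step only.

```python
from typing import List, Optional, Sequence


def track_line(measurements: Sequence[Optional[float]],
               process_var: float = 0.01,
               meas_var: float = 1.0) -> List[float]:
    """Scalar Kalman filter over per-column row measurements of one line.

    measurements[i] is the observed row of the line in column i,
    or None where the line is broken (no observation).
    Returns one filtered row estimate per column.
    """
    # Initialise the state from the first available observation.
    first = next((m for m in measurements if m is not None), None)
    if first is None:
        return []
    y = first        # state estimate: row position of the line
    p = meas_var     # variance of the state estimate
    estimates: List[float] = []
    for z in measurements:
        # Predict: random-walk model, state unchanged, uncertainty grows.
        p += process_var
        if z is not None:
            # Update: blend prediction and observation using the Kalman gain.
            k = p / (p + meas_var)
            y += k * (z - y)
            p *= (1.0 - k)
        estimates.append(y)
    return estimates
```

For example, `track_line([10.0, 10.0, None, 10.0])` keeps the estimate at row 10 across the gap in the third column. A real form would need one tracker per candidate line and a model that also tracks slope; the positioning grammar of step (2) is not sketched here.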

60 000 forms from the 19th century

[Screenshot labels: Form viewer; Retrieved patronymic; “access to form”; Query Text Field]

Page 6

Overview

[Figure labels: text, erasure, interline]

Document Layout [Nicolas’06]

Handwritten pages from the 19th century

Segmentation based on Markov Random Field
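The slide does not describe the Markov Random Field model, the features, or the inference method used in [Nicolas’06]. As a rough sketch under stated assumptions, the block below labels a grid of sites using a Potts smoothness prior and Iterated Conditional Modes (ICM), both chosen here for illustration only. The function name `icm_potts`, the `unary` cost layout, and the `beta` weight are hypothetical.

```python
from typing import List


def icm_potts(unary: List[List[List[float]]],
              beta: float = 1.0,
              n_iter: int = 5) -> List[List[int]]:
    """Label a 2-D grid by ICM under a Potts prior.

    unary[r][c][k] is the cost of giving label k to site (r, c).
    beta is the penalty paid for each 4-neighbour with a different label.
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    # Start from the labels that minimise the unary cost alone.
    labels = [[min(range(n_labels), key=lambda k: unary[r][c][k])
               for c in range(cols)] for r in range(rows)]
    for _ in range(n_iter):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c),
                                             (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]

                def energy(k: int) -> float:
                    # Local energy: data cost plus Potts disagreement cost.
                    return unary[r][c][k] + beta * sum(
                        1 for n in neighbours if n != k)

                best = min(range(n_labels), key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # reached a local minimum of the energy
    return labels
```

With labels such as text, erasure, and interline, the prior discourages isolated sites whose label disagrees with all of their neighbours. ICM only finds a local minimum, so the result depends on the initial labelling.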

Dropcap Retrieval [Parreti’05] [Uttama’05] [Delalandre’06] [Salmon’05]

10 000 dropcap images

[Plot axes: Pattern rank; Frequency]

[Figure labels: Style retrieval; textures; MST; image; Structure retrieval; Printing retrieval; query; compactness; RLE; Accuracy; Letter retrieval; image; capital letter; combination of shape descriptors]
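The slide mentions RLE among the dropcap retrieval features. Assuming this stands for run-length encoding of binarized image rows (the slide does not spell it out), a minimal sketch of the encoding looks as follows. The function names are illustrative and are not taken from the cited papers.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def run_length_encode(row: Iterable[int]) -> List[Tuple[int, int]]:
    """Encode a row of pixel values as (value, run length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(row)]


def run_length_decode(runs: Iterable[Tuple[int, int]]) -> List[int]:
    """Rebuild the original row from (value, run length) pairs."""
    row: List[int] = []
    for value, length in runs:
        row.extend([value] * length)
    return row
```

For a binary row `[0, 0, 1, 1, 1, 0]` the encoding is `[(0, 2), (1, 3), (0, 1)]`, and decoding restores the row. How run lengths are turned into a retrieval signature is not stated on the slide and is left out here.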

Page 7

76 Publications

PhD Thesis: 4
Journal Paper: 8
Conference Paper: 43
Master Thesis: 15
Technical Report: 6

http://l3iexp.univ-lr.fr/madonne/publications.html

33 Software Tools

Licence: 2
Free: 4
Prototype: 27

http://l3iexp.univ-lr.fr/madonne/ressources.html

Conclusion

Results

Consortium 8 laboratories, 55 members

Renewal of the project as NaviDoMass

WP related to MADONNE

Perspectives

NaviDoMass started in November 2007 …

5 Work Packages (WP)

1. Document Layout analysis and structure based indexing

2. Information spotting

3. Structuring the feature space

4. User needs, participative design and groundtruthing

5. Interactive extraction and relevance feedback

New topics