High-Frequency Observation and Characterization of the Marine Environment: Completion and Spectral Clustering of Multivariate Time Series

Grassi K.1,2,3, Dezecache C.2, Phan T. T. H.2,4, Poisson-Caillault E.2, Bigand A.2, Lefebvre A.1

1 Ifremer, Laboratoire Environnement et Ressources, 62321, Boulogne sur Mer, France
2 LISIC, EA 4491, Université du Littoral Côte d’Opale, 62228 Calais, France
3 WeatherForce, 31000 Toulouse, France
4 VNUA - Vietnam National University of Agriculture


Post on 28-Jun-2020


Page 1: High-Frequency Observation and Characterization of the ... · High-Frequency Observation and Characterization of the Marine Environment: Completion and Spectral Clustering of Multivariate


Page 2: Nested scales

Figure: nested scales of observation. Source: [Dickey, 2003]

- Low frequency (weekly or less often)
- High frequency (weekly or more often)

Page 3: Nested scales

Figure: nested scales of observation with the observing systems placed on them (REPHY/SRN, MAREL-Carnot, FerryBox, satellite images, sampling). Source: [Dickey, 2003]

- Low frequency (weekly or less often)
- High frequency (weekly or more often)

REPHY: RÉseau de surveillance du PHYtoplancton et des PHYcotoxines (phytoplankton and phycotoxin monitoring network)
SRN: Suivi Régional des Nutriments (regional nutrient monitoring)

Page 4: Multisource and multiparameter data

- Satellite
- Mobile stations: FerryBox, SMATCH
- Fixed stations: MAREL-Carnot, MESURHO
- Low-frequency sampling: REPHY / SRN

Page 5: Data gap problem

Figure: Suivi Régional des Nutriments (SRN) series, with the date 28/03 marked.

Page 6: Data gap problem

Figure: Suivi Régional des Nutriments (SRN) series, with the date 28/03 marked.

Page 7: Data gap problem

Problem of missing values (Phan, 2018)

MAREL Carnot station dataset:
- 19 time series, sampled every 20 minutes
- Missing values:
  • 62.2% for phosphate
  • 59.9% for nitrate
  • 27.2% for pH
  • 12.3% for fluorescence
- Gaps range from small to large (up to 7 months for pH)
- A moving average or a linear regression is inappropriate for filling such gaps
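The missing-value figures on this slide combine two quantities: the share of missing samples and the length of each gap. As a minimal sketch, assuming a series stored as a Python list with `None` for missing samples (the function names and the toy pH values are illustrative, not taken from the MAREL Carnot dataset), both can be computed as follows:

```python
def missing_summary(series):
    """Return (percent_missing, gaps); gaps is a list of (start_index, length)."""
    gaps = []
    start = None
    for i, value in enumerate(series):
        if value is None:
            if start is None:
                start = i                      # a new gap opens here
        elif start is not None:
            gaps.append((start, i - start))    # the gap closed just before i
            start = None
    if start is not None:                      # the series ends inside a gap
        gaps.append((start, len(series) - start))
    n_missing = sum(length for _, length in gaps)
    percent = 100.0 * n_missing / len(series) if series else 0.0
    return percent, gaps


def gap_duration_hours(length, step_minutes=20):
    """Convert a gap length in samples to hours (20-minute sampling by default)."""
    return length * step_minutes / 60.0


# Toy series, for illustration only
ph = [7.9, None, None, 8.0, 8.1, None, 8.2, None, None, None]
percent, gaps = missing_summary(ph)
longest = max(length for _, length in gaps)
print(percent, gaps, gap_duration_hours(longest))
```

Separating gap length from the overall percentage matters here because a short gap and a seven-month gap call for different completion strategies.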

Page 8: Data gap problem

Methods: Dynamic Time Warping (DTW) based imputation

DTWBI => univariate time series / DTWUMI => multivariate time series

Figure: time series with the labels Q and G.

References:
- DTWUMI: https://cran.r-project.org/web/packages/DTWUMI/index.html
- DTWBI: https://cran.r-project.org/web/packages/DTWBI/index.html
- Thi-Thu-Hong Phan, Emilie Poisson Caillault, Alain Lefebvre, André Bigand, Dynamic time warping based imputation for univariate time series data, Pattern Recognition Letters, 2017, ISSN 0167-8655, https://doi.org/10.1016/j.patrec.2017.08.019.
- T. T. H. Phan, E. P. Caillault, A. Bigand and A. Lefebvre, "DTW-Approach for uncorrelated multivariate time series imputation," 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, 2017, pp. 1-6. https://doi.org/10.1109/MLSP.2017.8168165.
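The general idea behind DTW-based imputation can be illustrated with a simplified sketch. It assumes a single gap in a univariate series, takes the values just before the gap as a query, searches the rest of the series for the window most similar to that query under DTW, and copies the values that follow that window into the gap. This is not the DTWBI package code: the function names are hypothetical, and refinements such as feature-based pre-selection of candidate windows are left out.

```python
def dtw_distance(a, b):
    """Classic DTW with the absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def impute_gap(series, gap_start, gap_len):
    """Fill series[gap_start:gap_start + gap_len] (None values) from the best match."""
    if gap_start < gap_len:
        raise ValueError("not enough data before the gap to build a query")
    query = series[gap_start - gap_len:gap_start]   # values just before the gap
    gap_end = gap_start + gap_len
    best_pos, best_cost = None, float("inf")
    for pos in range(len(series) - 2 * gap_len + 1):
        # The candidate window and the values after it must not touch the gap
        if not (pos + 2 * gap_len <= gap_start or pos >= gap_end):
            continue
        block = series[pos:pos + 2 * gap_len]
        if any(v is None for v in block):
            continue
        cost = dtw_distance(query, block[:gap_len])
        if cost < best_cost:
            best_pos, best_cost = pos, cost
    if best_pos is None:
        raise ValueError("no complete candidate window found")
    filled = list(series)
    # Copy the values that follow the best-matching window into the gap
    filled[gap_start:gap_end] = series[best_pos + gap_len:best_pos + 2 * gap_len]
    return filled


# Periodic toy signal with a three-sample gap
signal = [0, 1, 2, 3, 2, 1] * 4
signal[14:17] = [None, None, None]
print(impute_gap(signal, 14, 3)[14:17])
```

Unlike a moving average or a linear interpolation, this approach reuses a shape already observed in the signal, which is what makes it usable on long gaps.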

Page 9: Data gap problem

Methods: Dynamic Time Warping (DTW) based imputation (same text, figure labels and references as page 8)

DTWBI => univariate time series / DTWUMI => multivariate time series

Page 10: Detection of environmental states

Recursive method: Multi-level Spectral Clustering

Pre-processing:
- DTW completion
- Data normalization (centering, scaling)

Data:
- Raw HF data: size 92,968 × 9; NAs: 320,401 (38%)
- Scaled data: size 105,192 × 9; NAs: 0 (0%)

Classification in the spectral space, with an increasing level of detail:
- 1st spectral clustering
- 2nd spectral clustering
- 3rd spectral clustering

Figure: tree of the states obtained at each level (labels state1 to state5 visible).
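The two ingredients named on this slide, normalization and recursive clustering, can be sketched as below. The assumptions are stated plainly: `scale_columns` uses the population standard deviation, `multilevel_labels` is a hypothetical helper that re-clusters each state found at the previous level, and `mean_split` is only a placeholder standing in for a real spectral clustering step so that the example stays short and dependency-free.

```python
from statistics import mean, pstdev


def scale_columns(rows):
    """Centre each column on 0 and scale it to unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]      # constant columns are left unscaled
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def multilevel_labels(rows, split_fn, depth):
    """Recursively split the data; each row gets a label path such as (0, 1, 0)."""
    labels = [() for _ in rows]

    def recurse(indices, level):
        if level == depth or len(indices) < 2:
            return
        parts = split_fn([rows[i] for i in indices])
        for i, p in zip(indices, parts):
            labels[i] = labels[i] + (p,)
        for p in set(parts):                     # re-cluster every state found
            recurse([i for i, q in zip(indices, parts) if q == p], level + 1)

    recurse(list(range(len(rows))), 0)
    return labels


def mean_split(rows):
    """Placeholder for a clustering step: split on the mean of the first column."""
    m = sum(r[0] for r in rows) / len(rows)
    return [0 if r[0] < m else 1 for r in rows]


raw = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0], [30.0], [31.0]]
scaled = scale_columns(raw)
two_levels = multilevel_labels(scaled, mean_split, depth=2)
three_levels = multilevel_labels(scaled, mean_split, depth=3)
print(len(set(two_levels)), len(set(three_levels)))
```

With a binary split at each level, two levels give at most 4 states and three levels at most 8, which matches the s1 to s4 and s1 to s8 labels shown for the 2nd and 3rd clusterings.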

Page 11: Detection of environmental states

Results

Figure: for the 1st, 2nd and 3rd spectral clusterings, the state tree (scaled data, then s1 and s2, then s1 to s4, then s1 to s8), the dynamics of the states along the time index, and the frequency of the states by month.

Page 12: Detection of environmental states

Events: pressure and response

Figure (3rd spectral clustering): states along the time index, shown with nitrate, fluorescence and dissolved oxygen; succession of events over time, from pressure to response.

- Classification is independent of time and of the fluorescence signal

Page 13: Detection of environmental states

Intermittent events: rare/extreme

Figure (3rd spectral clustering): phosphate and states along the time index, with rare/extreme events marked.

- Phosphate, state 7: correlation = 0.62

Page 14: Detection of environmental states

Intermittent events: rare/extreme

Figure (3rd spectral clustering): phosphate and states along the time index, with rare/extreme events marked.

- Correlation between phosphate and turbidity

Page 15: Detection of environmental states

Intermittent events: rare/extreme

Figure (3rd spectral clustering): phosphate and states along the time index, with rare/extreme events marked.

- Correlation between phosphate and turbidity
- Phosphate desorption
- New phosphate stock available for phytoplankton

Page 16: CONCLUSIONS and PERSPECTIVES

The protocol made it possible to:
- Optimize HF data processing
- Define states in multi-parameter time series
- Detect, identify and characterize these states
- Characterize events and extract labels for frequent, rare or extreme events

Figure: processing chain from the HF databases through DTWBI and spectral clustering (Spectral-C) to the labelled states S1 to S8.

Perspective: adding new data sources

Page 17: CONCLUSIONS and PERSPECTIVES

Figure: diagram of a multi-agent learning system, with the following elements:
- HF databases (satellite and in-situ), processed with DTWBI, spectral clustering (Spectral-C) and ML/DL, giving states S1 to S8
- Label/data correspondence, used as the training dataset
- Several machine learning / deep learning (ML/DL) models, plus a satellite model, each predicting a state (S1, S2, ..., Sx)
- New data passed to the label classification system, where a majority vote over the predicted states gives the label
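The majority vote shown in the diagram can be sketched in a few lines. The sketch assumes each model returns one state label per new sample. The function name, the example predictions and the tie-breaking rule (the label seen first among the tied ones wins) are illustrative choices, since the slide does not specify them.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):          # votes of all models for one sample
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Three hypothetical models, three new samples
model_a = ["S1", "S2", "S1"]
model_b = ["S1", "S2", "S3"]
model_c = ["S2", "S2", "S3"]
print(majority_vote([model_a, model_b, model_c]))
```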

Page 18: Thank you for your attention

10/10/2018

The authors also want to acknowledge H2020 JERICO-Next for their financial contribution as well as the organizers.

This work has been partly funded by the French government and the region Hauts-de-France in the framework of the project CPER 2014-2020 MARCO

Kelly Grassi's PhD is funded by WeatherForce as part of its R & D program "Building an Initial State of the Atmosphere by Unconventional Data Aggregation".

Page 19

Figure panels: K-means, spectral clustering (Spectral-C), hierarchical clustering (Hierarchical-C).

Page 20: Spectral Approach

Data segmentation:
- Multi-sensor base
- No information
- States

Educational dataset:
- A circle and a ball
- 2000 points each

Figure: the educational dataset in two panels (Dimension 1 against Dimension 2), with the K-means result.

Page 21: Spectral clustering algorithm

Figure: steps of the algorithm:
- Sampled data
- Number of groups N chosen according to the gap
- K-means on the linearly separable data
- Projection of the classification onto the initial space, using the K-nearest-neighbours algorithm to classify the initial base
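The steps listed for the spectral clustering algorithm can be sketched as follows. This is a minimal illustration that requires NumPy and rests on several assumptions not stated on the slide: a Gaussian affinity with bandwidth `sigma`, the normalized affinity matrix, reading "the gap" as the largest gap between consecutive eigenvalues, and a deterministic farthest-point initialisation for K-means. All function names are hypothetical.

```python
import numpy as np


def spectral_embedding(X, sigma=1.0, max_k=10):
    """Return (k, U): number of groups from the eigengap and the spectral coordinates."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq / (2.0 * sigma ** 2))          # Gaussian affinity (assumption)
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    vals, vecs = vals[::-1], vecs[:, ::-1]        # largest eigenvalues first
    m = min(max_k, len(vals))
    k = int(np.argmax(vals[:m - 1] - vals[1:m])) + 1   # largest eigengap
    U = vecs[:, :k]
    return k, U / np.linalg.norm(U, axis=1, keepdims=True)


def kmeans(U, k, n_iter=50):
    """K-means with a deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def knn_propagate(X_sample, labels_sample, X_all, n_neighbors=3):
    """Give every point of the full dataset the majority label of its nearest sampled points."""
    X_sample = np.asarray(X_sample, dtype=float)
    labels_sample = np.asarray(labels_sample)
    out = []
    for x in np.asarray(X_all, dtype=float):
        nearest = np.argsort(np.sum((X_sample - x) ** 2, axis=1))[:n_neighbors]
        out.append(int(np.bincount(labels_sample[nearest]).argmax()))
    return np.array(out)


sample = [[0, 0], [0.5, 0], [0, 0.5], [10, 10], [10.5, 10], [10, 10.5]]
k, U = spectral_embedding(sample, sigma=1.0, max_k=5)
labels = kmeans(U, k)
full = knn_propagate(sample, labels, [[0.2, 0.2], [9.8, 10.1]])
print(k, labels, full)
```

Clustering a sample and then propagating the labels with nearest neighbours avoids building an affinity matrix over the whole dataset, which would be very large for series of about 100,000 time steps.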

Page 22: K-means algorithm

Criterion: minimize the intra-group distances

$$\min_{\text{labels}} J, \qquad J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

with $X = \{x_i\}$ the data, $K$ the number of groups, $C_k$ the groups and $\mu_k$ the barycentre of group $k$.

1) Initialization of K centers
2) Assignment of each point to its nearest center
3) New estimation of the centers
4) Calculation of the criterion J; return to 2) if the criterion is not satisfied
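The four steps of the K-means algorithm can be written out directly. The sketch below is a plain-Python illustration under stated assumptions: the centers are initialised with the first K points, the criterion J is taken as the sum of squared distances to the assigned centers, and the loop stops once J no longer decreases by more than a tolerance `tol`. These choices are not specified on the slide.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, tol=1e-9):
    # 1) Initialization of K centers (assumption: the first k points)
    centers = [list(p) for p in points[:k]]
    prev_j = float("inf")
    for _ in range(max_iter):
        # 2) Assign each point to its nearest center
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # 3) Re-estimate each center as the barycentre of its group
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
        # 4) Compute the criterion J and stop when it no longer improves
        j = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
        if prev_j - j <= tol:
            break
        prev_j = j
    return labels, centers, j


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers, j = kmeans(points, k=2)
print(labels, centers, j)
```

Because K-means separates groups by distance to a barycentre, it works well on compact, linearly separable groups, which is the situation after projection into the spectral space.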

Page 23: RESULTS

Rare and extreme events

Figure: PCA of state 7 (dim1/dim2) and PCA of the regular series (dim1/dim2).

- Correlation: 0.62

Page 24: STATES DYNAMIC AND MAIN CONTRIBUTING PARAMETERS

1st Spectral clustering (scaled data split into s1 and s2)

Figure: dynamics of the states along the time index, frequency of the states by month, and temperature.

- Classification is independent of time but shows seasonal dynamics

Correlation of each parameter for a given cluster:

Parameter            S1       S2
Salinity            -0.35     0.35
Turbidity            0.30    -0.30
Temperature         -0.73     0.73
Dissolved Oxygen     0.52    -0.52
Nitrate              0.38    -0.38
Phosphate            0.21    -0.21
Silicate             0.38    -0.38
PAR                 -0.21     0.21
Sea Level            0.014   -0.014
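The slide does not say how the per-cluster correlations were computed. One reading that is consistent with the table, where the S1 and S2 values are exact opposites, is a Pearson correlation between each parameter and a binary indicator of membership in the state. The sketch below illustrates that reading only. The function names and the toy temperature values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def state_correlations(values, labels):
    """Correlate one parameter with the 0/1 membership indicator of each state."""
    result = {}
    for state in sorted(set(labels)):
        indicator = [1.0 if lab == state else 0.0 for lab in labels]
        result[state] = pearson(values, indicator)
    return result


# Toy example: low values in state 1, high values in state 2
temperature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
states = [1, 1, 1, 2, 2, 2]
r = state_correlations(temperature, states)
print(r)
```

With only two states the two indicators sum to one, so their correlations with any parameter are opposite in sign and equal in magnitude, as in the table.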

Page 25: STATES DYNAMIC AND MAIN CONTRIBUTING PARAMETERS

2nd Spectral clustering (scaled data split into s1 and s2, then into s1 to s4)

Figure: dynamics of the states along the time index, frequency of the states by month, nitrate and silicate.

- New structuring variables: oxygen, nitrate, silicate
- Actively involved in developing production processes

Correlation of each parameter for a given cluster:

Parameter            S1       S2       S3       S4
Salinity             0.04    -0.41     0.30     0.13
Turbidity           -0.08     0.40    -0.11    -0.23
Temperature         -0.48    -0.39     0.30     0.53
Dissolved Oxygen     0.62     0.05    -0.46    -0.19
Nitrate             -0.16     0.53     0.11    -0.48
Phosphate           -0.14     0.34    -0.02    -0.19
Silicate            -0.06     0.47     0.02    -0.42
PAR                 -0.09    -0.15    -0.05     0.26
Sea Level            0.02    -0.002    0.009   -0.02

Page 26: STATES DYNAMIC AND MAIN CONTRIBUTING PARAMETERS

3rd Spectral clustering (scaled data split into s1 and s2, then s1 to s4, then s1 to s8)

Figure: dynamics of the states along the time index and frequency of the states by month.

- 8 states with different dynamics

Page 27: EXAMPLE OF STATE LABELLING

Figure (3rd spectral clustering): salinity and states along the time index.

Page 28: EXAMPLE OF STATE LABELLING

Figure (3rd spectral clustering): salinity and states along the time index, with one state labelled "Sensor failure".

Page 29

Page 30: Data gap problem

Identification of similar sub-sequences:
- Pre-selection of sequences based on global features over all available time series
- Minimization of the matching path based on the DTW cost matrix

Fig. DTW cost matrix showing the optimal matching path
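The DTW cost matrix and its optimal matching path can be computed with the standard dynamic-programming recurrence. The sketch below is a generic textbook version, with the absolute difference as local cost and the three usual step directions. It is not code from the DTWBI or DTWUMI packages, and the function name is illustrative.

```python
def dtw_path(a, b):
    """Return (total_cost, path); path is a list of (i, j) index pairs into a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # acc[i][j] is the cheapest cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the last cell to the first along the cheapest predecessors
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


cost, path = dtw_path([1, 2, 3], [1, 2, 2, 3])
print(cost, path)
```

In this example the second sequence repeats the value 2, and the path absorbs the repetition by matching index 1 of the first sequence to indices 1 and 2 of the second, so the total cost is zero.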

Page 31: Dynamic Time Warping (DTW) based imputation

Additional features within package DTWUMI

- Other methods are provided which aim at overcoming the limitations of DTW:
  • Derivative Dynamic Time Warping (DDTW)
  • Adaptive Feature Based Dynamic Time Warping (AFBDTW)
- Several functions are included to assess the similarity between time series:
  • Similarity
  • Root Mean Square Error (RMSE)
  • Normalized Mean Absolute Error (NMAE)
  • Fraction of Standard Deviation (FSD)
  • Fractional Bias (FB)
  • Fraction of data that satisfied smoothing amplitude cover (FA2)
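The indicators listed above compare an imputed series `y` with the true series `x`. The sketch below uses the definitions commonly given for these indicators in the imputation literature. They are assumptions on my part rather than a transcription of the R package, so the package documentation should be checked before relying on them. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def rmse(x, y):
    """Root mean square error between true x and imputed y."""
    return sqrt(mean((b - a) ** 2 for a, b in zip(x, y)))


def nmae(x, y):
    """Mean absolute error divided by the range of the true values."""
    return mean(abs(b - a) for a, b in zip(x, y)) / (max(x) - min(x))


def similarity(x, y):
    """Mean of 1 / (1 + normalized absolute error); 1 means identical series."""
    span = max(x) - min(x)
    return mean(1.0 / (1.0 + abs(b - a) / span) for a, b in zip(x, y))


def fsd(x, y):
    """Relative difference between the two standard deviations."""
    sx, sy = stdev(x), stdev(y)
    return 2.0 * abs(sy - sx) / (sx + sy)


def fb(x, y):
    """Relative difference between the two means."""
    mx, my = mean(x), mean(y)
    return 2.0 * abs(my - mx) / (mx + my)


def fa2(x, y):
    """Fraction of points whose ratio y/x lies between 0.5 and 2."""
    return sum(1 for a, b in zip(x, y) if 0.5 <= b / a <= 2.0) / len(x)


true_values = [1.0, 2.0, 3.0, 4.0]
imputed = [3.0, 2.0, 3.0, 4.0]
print(rmse(true_values, imputed), nmae(true_values, imputed),
      similarity(true_values, imputed), fa2(true_values, imputed))
```

Under these definitions, RMSE, NMAE, FSD and FB are close to 0 for a good imputation, while the similarity and FA2 are close to 1.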