
Introduction to Spatial Epidemiology Analyses and Methods

Miguel Angel Luque Fernandez

Instituto de Investigación Biosanitaria (ibs.GRANADA), Universidad de Granada. CIBER de Epidemiología y Salud Pública (CIBERESP).

London School of Hygiene and Tropical Medicine. Harvard T.H. Chan School of Public Health.

29 November 2018

Miguel Angel Luque Fernandez ibs.GRANADA 29 de noviembre de 2018 1 / 87

Content

1. Introduction to Spatial Epidemiology
2. Spatial Epidemiology study in Harare
3. Disease mapping: measures of Disease Occurrence (SIR) and Mortality (SMR)
4. Spatial Modelling: Poisson Regression (GLM)
5. Spatial Modelling: GLMM and Smoothing (Empirical Bayes Estimate)
6. Spatial Modelling: Conditional Autoregressive Modelling (INLA)

What is epidemiology?

Some textbook definitions:
- The study of the distribution and determinants of disease frequency in man (MacMahon and Pugh 1970).
- The discipline on principles of occurrence research in medicine (Miettinen 1985).
- The study of the distribution and determinants of health-related states and events in specified populations, ... (Porta (ed.), Dictionary of Epidemiology, 2014).

Epidemiology

Classical Epidemiology: focuses on the triad of person, place and time.

GIS: Modern Epidemiology increasingly incorporates the spatial perspective (place) into research designs and models using Geographic Information System methods:

- Geocoding.
- Distance estimation.
- Record linkage and data integration (Disease Mapping).
- Spatial and spatio-temporal clustering.
- Small area estimation and Bayesian applications to disease mapping.

Spatial Epidemiology

John Snow identified the spatial aggregation of cholera cases in London in 1854.

Introduction

Types of spatial analysis in epidemiology
- Disease mapping (health services research focused: social epi).
- Geographical correlation (social and environmental health epi).
- Risk assessment in relation to point or line sources (infectious epi).
- Cluster detection and disease clustering (infectious epi).

https://www.cdc.gov/dhdsp/maps/gisx/mapgallery/index.html

Introduction: Spatial Epidemiology Definition

Definitions
- English D. 1992: "The description of spatial patterns of disease incidence and mortality".
- Lawson AB. 2003: "Spatial Epidemiology concerns the analysis of the spatial/geographical distribution of the incidence of disease".

Spatial epidemiology is the description and analysis of geographically indexed health data with respect to demographic, environmental, behavioral, socioeconomic, genetic, and infectious risk factors.

Introduction: Spatial Epi and Health Disparities

Spatial Epidemiology and Health Disparities: Examples
- Physical and social environments give rise to HEALTH DISPARITIES.
- Longer distances to reach mammography facilities (delay in diagnosis) [Nattinger AB. 2001].
- Pedestrian-friendly environments and obesity [Gordon-Larsen, 2006].
- Residents in major traffic corridors and cardiovascular disease [McEntee JC. 2008].

Introduction: Problems

Problems in Spatial Epi
- Scale (e.g., autonomous region, province, municipality, hospital, school, neighborhood, ZIP code, census tract).
- Changes of boundaries.
- Unsuccessful geocoding rates (changes in representativeness) or even errors in geocoding.
- Misalignment.

Introduction: Study Designs

Cross-sectional ecological studies (mostly descriptive)
- The unit of analysis is grouped by political/administrative units (e.g. nation, state, autonomous region, ZIP code, census tract), health facility, school, or other organizational unit.
- Spatial dependence (clustering) must be accounted for using smoothing techniques, spatial regression or multi-level modeling.
- Ecological fallacy.
- Generalization of hypotheses.

Introduction: Study Designs

Case-control, crossover and cohort studies of environmental risk factors
- Spatial dependence (clustering).
- GIS can help to estimate measures of access (e.g. distance to facility) or other local estimates derived from spatial surfaces (e.g., deprivation index).
- Geocoding of addresses linked to census-level socio-demographic and environmental variables (e.g. air pollution, water quality).

Introduction: Methodologies (Geocoding)

Geocoding evaluation
1. Match rate: percentage of records being geocoded.
2. Match score: how well the standardized address matches the street database.
3. Match type: kind of precision, i.e., geocoding at the street level or ZIP Code.
4. Protect privacy of individuals: Geomasking.
5. Quality has a price: ESRI and ArcGIS Pro.
6. Example: Lian et al. found travel time and facility density were poorly correlated with odds of late-stage breast cancer.

Introduction: Methodologies (Distance)

Why do we use distances?
To evaluate the impact of long distances on the provision and utilization of health services.

Distance estimation
1. Travel distance: Euclidean or network-based (impedance must be incorporated).
2. Travel cost.
3. Impedance: en route conditions (congestion).
4. Quality has a price: ArcGIS Pro can be used to calculate distance and travel time using ESRI's cloud-based road network data.
5. Example: Lian et al. found travel time and facility density were poorly correlated with odds of late-stage breast cancer.
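As a minimal sketch of the straight-line option in the distance list, the snippet below computes a great-circle (haversine) distance between a residence and candidate facilities. It assumes unprojected latitude/longitude coordinates in degrees. The function names and the facility dictionary are hypothetical illustrations, not part of the original slides. Network distance, travel cost and impedance need road network data and are not covered here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(home, facilities):
    """Return (name, distance_km) of the facility closest to `home`.

    `home` is a (lat, lon) tuple; `facilities` maps a name to a (lat, lon) tuple.
    """
    distances = (
        (name, haversine_km(home[0], home[1], lat, lon))
        for name, (lat, lon) in facilities.items()
    )
    return min(distances, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    clinics = {"clinic_A": (0.0, 1.0), "clinic_B": (0.0, 0.5)}
    print(nearest_facility((0.0, 0.0), clinics))
```

Straight-line distance is cheap to compute for every geocoded record, which makes it a common first approximation before paying for network-based travel times.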

Introduction: Methodologies (Clustering)

Count statistics
When individual disease cases are aggregated into spatial units, clustering is typically evaluated with a count statistic that accounts for spatial autocorrelation.

1. Moran's I: tells us whether nearby units tend to exhibit similar rates. It ranges from -1 to +1: a value of -1 denotes that units with low rates are located near units with high rates, while a value of +1 indicates a concentration of spatial units exhibiting similar rates.

2. Kulldorff's spatial scan statistic: identifies the most likely disease clusters by maximizing the likelihood that disease cases are located within a set of concentric circles that are moved across the study area.
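To make the Moran's I description concrete, here is a small sketch of the global statistic, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The binary chain-shaped weight matrix and the rates are made-up illustrations. Significance testing (for example by permutation) is not shown.

```python
def morans_i(values, weights):
    """Global Moran's I for area-level `values` and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # total weight
    denom = sum(d * d for d in dev)                # sum of squared deviations
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / s0) * cross / denom


# Four hypothetical areas in a row: each area neighbours the next one.
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print(morans_i([1, 1, 5, 5], CHAIN))   # similar rates sit together: positive I
    print(morans_i([1, 5, 1, 5], CHAIN))   # low rates next to high rates: negative I
```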

Introduction: Methodologies (Disease risk estimation)

Small Area Estimation
Small size = unstable estimates = spurious associations

Small area instability
SE(log SMR) ≈ sqrt(1/cases)

A small number of cases leads to a larger SE (unstable estimates).
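A short sketch of why small areas are unstable: the sqrt(1/cases) rule is applied here on the log-SMR scale, assuming Poisson-distributed observed counts and a fixed expected count. Two hypothetical areas share the same SMR but differ a hundredfold in size. The function name and the numbers are illustrative assumptions.

```python
import math


def smr_with_ci(observed, expected, z=1.96):
    """Return (SMR, SE of log SMR, lower, upper) using a log-scale Wald interval.

    Requires observed > 0; the interval is an approximation for Poisson counts.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    smr = observed / expected
    se_log = math.sqrt(1.0 / observed)
    lower = smr * math.exp(-z * se_log)
    upper = smr * math.exp(z * se_log)
    return smr, se_log, lower, upper


if __name__ == "__main__":
    # Same SMR of 1.5 in a small and a large hypothetical area.
    for obs, exp_cases in [(3, 2.0), (300, 200.0)]:
        smr, se_log, lo, hi = smr_with_ci(obs, exp_cases)
        print(f"O={obs:>3} E={exp_cases:>5}: SMR={smr:.2f} SE(log)={se_log:.3f} CI=({lo:.2f}, {hi:.2f})")
```

The small area's interval spans values well below and well above 1, while the large area's interval excludes 1, even though both maps would colour the two areas identically.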

Introduction: Methodologies (Disease risk estimation)

Approaches to deal with small areas
1. Multi-level modelling: using GLMM to account for the random area-level effects not explained by the covariates alone. We can fit these models under a Frequentist (Empirical Bayes estimation) or Hierarchical Bayesian approach (posterior probabilities) (Clayton and Kaldor, 1987).

2. Conditional Autoregressive models: in addition to unexplained variability (overdispersion), we can also use the spatial structure of the data to improve the small area estimates.

BYM: the Besag-York-Mollié model is the most commonly used. In this model, the spatially structured component is modelled according to an adjacency structure given by a neighbourhood matrix, which specifies that two areas are neighbours if they have a common boundary.
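The neighbourhood matrix described above can be sketched as a binary adjacency matrix built from a list of area pairs that share a boundary. The area names and border list are hypothetical. The second function illustrates the core idea of an intrinsic CAR prior, where an area's spatial effect is centred on the average of its neighbours' effects. It is an illustration of the concept, not the BYM fitting procedure.

```python
def adjacency_matrix(area_ids, shared_borders):
    """Binary symmetric matrix: w[i][j] = 1 if areas i and j share a boundary."""
    index = {area: i for i, area in enumerate(area_ids)}
    n = len(area_ids)
    w = [[0] * n for _ in range(n)]
    for a, b in shared_borders:
        i, j = index[a], index[b]
        if i == j:
            continue  # an area is not its own neighbour
        w[i][j] = 1
        w[j][i] = 1
    return w


def neighbour_mean(w, effects, i):
    """Mean of the spatial effects over the neighbours of area i."""
    neighbours = [j for j, flag in enumerate(w[i]) if flag]
    if not neighbours:
        raise ValueError("area has no neighbours (an island)")
    return sum(effects[j] for j in neighbours) / len(neighbours)


if __name__ == "__main__":
    areas = ["A", "B", "C", "D"]                      # hypothetical areas
    borders = [("A", "B"), ("B", "C"), ("B", "D")]    # hypothetical shared boundaries
    W = adjacency_matrix(areas, borders)
    for area, row in zip(areas, W):
        print(area, row, "neighbours:", sum(row))
    print(neighbour_mean(W, [0.2, 0.0, -0.4, 0.8], 1))
```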

CONTENT: 2

Cholera Epidemic in Harare
Let's critically review the study I will be presenting: a Spatial Epidemiology Analysis of the Cholera Epidemic in Harare, Zimbabwe.

CONTENT 3: Disease mapping

Disease mapping
- Provides risk estimates in the region of study.
- Usually uses data collected by Health Authorities.
- Crude measures of mortality and morbidity (incidence) can be mapped.
- However, the standardized measures SIR and SMR are most commonly mapped.

CRUDE Measures of occurrence and mortality

CRUDE Incidence and Mortality Rates

Standardization of Rates

Standardization

DIRECT Standardization

INDIRECT Standardization
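Since the standardization slides survive only as titles, the following is a minimal sketch of indirect standardization under the usual definition: the expected count is obtained by applying reference stratum-specific rates to the area's own population, and the SMR (or SIR) is observed divided by expected. The age strata, populations and rates are invented for illustration.

```python
def expected_cases(area_population, reference_rates):
    """Expected count: sum over strata of area population times reference rate."""
    return sum(area_population[s] * reference_rates[s] for s in area_population)


def standardized_ratio(observed, area_population, reference_rates):
    """SMR (deaths) or SIR (incident cases): observed divided by expected."""
    expected = expected_cases(area_population, reference_rates)
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


if __name__ == "__main__":
    # Hypothetical age strata: person counts in the area and reference rates per person.
    population = {"0-39": 1000, "40-64": 500, "65+": 200}
    rates = {"0-39": 0.001, "40-64": 0.004, "65+": 0.02}
    print("Expected:", expected_cases(population, rates))
    print("SMR:", standardized_ratio(14, population, rates))
```

A ratio above 1 means the area recorded more events than its age structure alone would predict from the reference rates.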

Disease mapping

Conclusions
Once you have estimated your SIR or SMR, you will want to load it into your favourite GIS software (usually by merging it, using your ID, into the .dbf database) and map it.

Please be honest and consistent (present only strong effects and relevant patterns).

Disease Mapping

Conclusions
- In general, the use of map displays should be minimised and only used when ancillary statistical information is available.
- Any map which may be used for interpretation should be as simple as possible and report statistical information closely without undue extra processing.
- For case event data, the simplest form of representation of relative risk is a contoured risk surface.
- To reduce the potential bias in interpretation of such surfaces, it is probably better to portray the surface as a probability (p-value) surface which displays the associated variability directly, rather than presenting the estimated relative risk surface itself.
- Probability maps may account for the population size better than the SMR, which may show high extreme values in low populated areas (consider overdispersion with spatial dependence).
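One common way to build the probability map mentioned above is to compute, for each area, the Poisson probability of seeing at least the observed count given the expected count. This is a sketch under a plain Poisson assumption with no overdispersion or spatial dependence. The counts are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), with observed a non-negative integer."""
    if observed < 0 or expected <= 0:
        raise ValueError("observed must be >= 0 and expected must be > 0")
    term = math.exp(-expected)      # P(X = 0)
    below = 0.0                     # accumulates P(X <= observed - 1)
    for k in range(observed):
        below += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - below


if __name__ == "__main__":
    # Two hypothetical areas with the same SMR of 2 but very different sizes.
    for obs, exp_cases in [(2, 1.0), (40, 20.0)]:
        p = poisson_upper_tail(obs, exp_cases)
        print(f"O={obs}, E={exp_cases}: SMR={obs / exp_cases:.1f}, P(X >= O) = {p:.6f}")
```

Both areas would get the same colour on an SMR map, but the tail probability separates the sparsely populated area, where an SMR of 2 is unremarkable, from the large one, where it is not.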

Disease Mapping

Conclusions
- For aggregated count data, users may prefer coloured maps, but there is some justification for the use of greyscale maps, in that tonal quality can bias interpretation.
- The use of class boundaries defined by percentiles of the observed distribution, or other cut points which produce internally standardised relative schemes, should be avoided in favour of reporting of grouped rates.
- In general, the use of maps of relative risk should be limited to an aid to presentation of statistical results rather than a basic inferential tool.

Computing SIR and SMR

PRACTICAL#1 using Stata and R
Measures of Disease Occurrence (SIR), Mortality (SMR) and Risk in Spatial Epidemiology.

CONTENT: 4 GLM and Poisson Regression

Modeling counts
Modeling counts over time (RATES), Poisson Regression and GLM

Poisson distribution

Poisson distribution
- θ is called the canonical or natural parameter.
- The associated function (log for Poisson) is called the canonical link function.
- The parameter φ is known as the scale or dispersion parameter (φ = 1 for the Poisson).
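The formula that these remarks annotate did not survive extraction. As a hedged reconstruction using the standard exponential-family notation (the functions a, b and c are the conventional textbook symbols, assumed here rather than taken from the slide), the Poisson distribution can be written as follows:

```latex
% General exponential-family form
f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}

% Poisson with mean \mu
f(y;\mu) = \frac{e^{-\mu}\mu^{y}}{y!}
         = \exp\left\{ y\log\mu - \mu - \log y! \right\}

% Matching terms
\theta = \log\mu, \qquad b(\theta) = e^{\theta}, \qquad a(\phi) = \phi = 1, \qquad c(y,\phi) = -\log y!

% Mean and variance follow from b
E[Y] = b'(\theta) = \mu, \qquad \operatorname{Var}[Y] = b''(\theta)\,a(\phi) = \mu
```

The last line shows why the Poisson model forces the variance to equal the mean, which is the assumption that overdispersion violates.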

Poisson process to model RATES

Poisson Process

Poisson Assumption

Generalized Linear Models

GLM
The family of generalised linear models (GLMs) is a larger class of models (derived from the exponential family) which enables us to develop and fit models for a much wider range of outcome types (continuous, binary and count outcomes) (Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989).

GLM: Poisson modelling

GLM: Poisson Process (rates) modelling

Poisson Assumption: overdispersion

PRACTICAL#2:

Practicals using R and Stata
Generalized Linear Model: Poisson family and log link, Risk Ratios and Overdispersion.

CONTENT: 5 GLMM and EB estimation

GLMM
Generalized Linear Mixed Effect Models and Empirical Bayes Estimation

Levels: Hierarchical structure

Notation

Several names but same concept

Motivation: Random effect models

Random Effect Models visualization

A simple frequentist Random Effect Model

Types of Shrinkage estimation

Empirical Bayes Estimation

Types of Shrinkage estimation: differences

Linear Mixed Effect Model in Stata

Frequentist vs Bayesian

LME estimation

EBE estimation

Visual interpretation

Random-effects Poisson Regression

REPR
- An important limitation of the SMR is that estimates for small areas are very imprecise.
- This problem can be addressed by using random-intercept Poisson models in conjunction with Empirical Bayes (EB) estimation or prediction.
- The resulting SMRs are shrunken toward the overall SMR, thereby borrowing strength from other areas.
- Full likelihood estimation is possible with gamma-distributed random effects, a.k.a. negative binomial regression (NBR).
- REPR is the only non-normal context where GEE and random-effects models are estimating the same thing.
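The shrinkage of small-area SMRs toward the overall SMR can be illustrated with a simple method-of-moments global Empirical Bayes smoother. This is a swapped-in, simpler technique, not the random-intercept Poisson model fitted by maximum likelihood that the slides describe. It is shown only to make the "borrowing strength" idea tangible. The function name and the counts are hypothetical.

```python
def eb_smooth(observed, expected):
    """Method-of-moments global EB smoothing of SMRs.

    Returns (overall_smr, smoothed_smrs). Each smoothed value is a weighted
    average of the area's raw SMR and the overall SMR, with a weight that
    grows with the area's expected count.
    """
    n = len(observed)
    total_e = sum(expected)
    raw = [o / e for o, e in zip(observed, expected)]
    overall = sum(observed) / total_e
    # Weighted variance of the raw SMRs around the overall SMR.
    s2 = sum(e * (r - overall) ** 2 for e, r in zip(expected, raw)) / total_e
    # Estimated between-area variance, truncated at zero.
    between = max(s2 - overall / (total_e / n), 0.0)
    smoothed = []
    for e, r in zip(expected, raw):
        denom = between + overall / e
        weight = between / denom if denom > 0 else 0.0
        smoothed.append(overall + weight * (r - overall))
    return overall, smoothed


if __name__ == "__main__":
    obs = [2, 30, 95, 8]               # hypothetical observed counts
    exp = [1.0, 20.0, 100.0, 10.0]     # hypothetical expected counts
    overall, smoothed = eb_smooth(obs, exp)
    print(f"overall SMR = {overall:.3f}")
    for o, e, s in zip(obs, exp, smoothed):
        print(f"E={e:>6}: raw={o / e:.2f} smoothed={s:.2f}")
```

The area with an expected count of 1 is pulled almost all the way to the overall SMR, while the area with an expected count of 100 keeps most of its raw value.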

Random-effects Poisson Regression

Model Specification 1:

\[ \ln(\mu_j) = \ln(e_j) + \beta_0 + \sum_{k=1}^{K} \beta_k X_{kj} + U_j \]

Model Specification 2:

\[ \ln(\mu_j) - \ln(e_j) = \beta_0 + \sum_{k=1}^{K} \beta_k X_{kj} + U_j \]

Model Specification 3:

\[ \ln\left(\frac{\mu_j}{e_j}\right) = \beta_0 + \sum_{k=1}^{K} \beta_k X_{kj} + U_j \]

Here $U_j \sim N(0, \tau^2)$ is a random intercept representing unobserved heterogeneity between areas, and $\ln(e_j)$ is the log of the expected number of outcome cases in area j based on its age distribution. It is introduced in the model as an offset, a covariate with regression coefficient set to 1. The purpose of the offset is to ensure that $\beta_0 + U_j$ (plus the covariate terms, when present) can be interpreted as a model-based region-specific log SMR. This interpretation becomes clear by subtracting the offset from both sides of the equation, which turns Specification 1 into Specifications 2 and 3.

Empirical Bayes Estimation interpretation

Interpretation: The log of the expected number of lip cancer cases in a county increases by 0.07 for every unit increase in x. The corresponding incidence rate ratio is 1.07 (= exp(0.068)), corresponding to a 7% increase in the incidence rate per unit increase in x.
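The arithmetic behind this interpretation is a single exponentiation. The sketch below reproduces it for the coefficient 0.068 quoted above and extends it to a larger change in x. The helper names are illustrative.

```python
import math


def rate_ratio(beta, delta=1.0):
    """Incidence rate ratio for a change of `delta` units in x under a log link."""
    return math.exp(beta * delta)


def percent_change(beta, delta=1.0):
    """Percentage change in the incidence rate for a change of `delta` units in x."""
    return 100.0 * (rate_ratio(beta, delta) - 1.0)


if __name__ == "__main__":
    beta = 0.068  # log rate ratio per unit of x, as quoted in the text
    print(f"IRR per 1 unit:   {rate_ratio(beta):.2f} ({percent_change(beta):.1f}% increase)")
    print(f"IRR per 10 units: {rate_ratio(beta, 10):.2f}")
```

Because the model is linear on the log scale, effects multiply: a 10-unit increase corresponds to exp(0.68), not to ten times the 7% increase.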

Empirical Bayes estimation in R

Empirical Bayes Smoothing

PRACTICAL#3: Random-intercept Poisson Regression

Empirical Bayes Estimation in Stata and R
Empirical Bayes estimation with GLLAMM in Stata and with INLA (GLM) in R.

Digression about ESTIMATION

Digression about MODELLING

CONTENT: 6 Accounting for the Spatial Structure

BYM and INLA
Besag-York-Mollié and Integrated Nested Laplace Approximations Modeling in R

Accounting for the Spatial Structure

Local Empirical Bayes Smoothing

Neighbours Mesh Data Structure

Local Empirical Bayes Smoothing: CAR = BYM model

INLA Modelling

PRACTICAL#4:

R-INLA
Empirical Bayesian estimation of CAR (BYM) using INLA (Besag) in R

References

Classical but consolidated: excellent overview
1. dos Santos Silva, I. (1999). Cancer Epidemiology: Principles and Methods. International Agency for Research on Cancer, Lyon.
2. Breslow, N.E., Day, N.E. (1987). Statistical Methods in Cancer Research, Vol. II: The Design and Analysis of Cohort Studies. IARC, Lyon.
3. Clayton, D., Hills, M. (1993). Statistical Models in Epidemiology. OUP, Oxford.
4. Pfeiffer, D.U., Robinson, T.P., et al. (2008). Spatial Analysis in Epidemiology. OUP, Oxford.
5. Lawson, A.B. (2006). Statistical Methods in Spatial Epidemiology. Wiley, London.
6. Blangiardo, M., Cameletti, M. (2015). Spatial and Spatio-Temporal Bayesian Models with R-INLA. Wiley, London.

Thank you for your attention!

Miguel Angel Luque Fernandez
miguel.luque.easp@juntadeandalucia.es