Bio-inspired algorithms for clustering and visualization of geospatial data

Upload: askroll

Post on 14-Jun-2015

TRANSCRIPT

  • 1. Faculté des Hautes Etudes Commerciales (HEC), Institut des Systèmes d'Information (ISI). Bio-inspired algorithms for clustering and visualization of geospatial data. Miguel Arturo Barreto Sánz

  • 2. Outline
      • Bio-inspired algorithms?
      • Challenges in clustering and visualization of geospatial data
      • Bio-inspired algorithms used in clustering and visualization of geospatial data
      • Conclusions
  • 3. 1. Bio-inspired? Speedo's "Fastskin" suit, inspired by shark skin. Aerodynamic surfaces for vehicles. Source: "Technologies Inspired by Sharks", by Tracy Staedter, Feb 2009, Discovery News.
  • 4. Inspired by human skin: a clear version of Touchco's multitouch sensor platform (by Nick Bilton, Dec 30 2009, The New York Times). Sensors capture the variation in pressure levels of a pencil drawing. Sensors pick up the pressure of a hand placed on a Touchco device.
  • 5. Nature has been innovating, inventing, testing, validating, improving, and diversifying living systems for hundreds of millions of years. The bio-inspired approach studies nature's inventions and tricks in order to draw inspiration from them and create solutions (which does not necessarily mean copying). Countless natural engineering solutions are already used in the development of new materials, artificial retinas, etc. (Andres Perez-Uribe)
  • 6-7. Sources of inspiration (diagram): evolution, self-organization, learning, emergence. Axis labels: long term, short term; individual, populations.
  • 8. Self-organization: the rat whisker-barrel system. It is also the rat's sensory system of choice for exploring the environment and collecting information about the location, shape, size and texture of objects around it.
    The system is well suited to examining neural coding issues because of its functional efficiency and its elegant structural organization. The whisker area of somatosensory cortex (known as barrel cortex) is arranged as a topographic map of the whiskers. This means that sensory signals arising in one whisker are channelled through a restricted population of neurons and can be sampled by an electrode at different stages of the sensory system.
  • 9. Bio-inspired clustering. Neural networks have solved a wide range of problems and have good learning capabilities. Their strengths include adaptation, ease of implementation, parallelization, speed, and flexibility. Bio-inspired clustering is closely related to the concept of competitive learning.
  • 10. Bio-inspired clustering: hard and soft competitive learning. Hard:
      a) k initial "means" are chosen.
      b) k clusters are created by associating every observation with the nearest mean.
      c) The centroid of each of the k clusters becomes the new mean.
      d) Steps b and c are repeated until convergence has been reached.
  • 11. Bio-inspired clustering: hard and soft competitive learning. Soft: $m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, (x - m_i(t))$, where $\alpha(t)$ is the learning rate. The neighborhood function $h_{ci}(t)$ is centered over the best-matching neuron $m_c$, which is shown as a black cell. The neighboring neurons whose weights are recalculated together with this best match are shown in gray. Other neurons are not affected.
  • 12. Bio-inspired clustering: hierarchical self-organizing structures. Self-organizing Hierarchical Feature Maps; Adaptive Hierarchical Incremental Grid Growing; Growing Hierarchical SOM.
  • 13. Bio-inspired clustering: hierarchical self-organizing structures. Fuzzy Growing Hierarchical Self-organizing Networks (FGHSON).
  • 14. 2. Challenges in clustering and visualization of geospatial data. Information received from remote sensing systems and environmental monitoring devices is used in:
      • Agro-ecology
      • Environmental change
      • Species distribution
      • Disease propagation
      • Urban dynamics
      • Migration patterns
  • 15. The special nature of spatio-temporal data poses several challenges to clustering and visualization. For instance:
      1. Visualizing clusters in both geographic and feature space.
      2. Spatial and temporal relationships exist at various levels (scales).
      3. Handling fuzzy boundaries in geospatial clusters.
      4. The temporal context in which some variables are involved.
      5. The high dimensionality of geospatial data sets.
      6. The large quantity of data.
  • 16. Geographic space and feature space. Geographic space is concerned with surface features such as the terrain we walk on. Feature space is concerned with the representation of similarities associated with geo-referenced sites in the geographic space. [Figure panels: Geographic space; Feature space]
  • 17. Geographic space and feature space. The clusters found in feature space are in many cases not the same as those found in geographic space. To represent clusters of a multidimensional space, the multidimensional data are mapped onto a two-dimensional lattice of cells. [Figure: Similarity of sugarcane growing environmental conditions (1999-2005) using self-organizing maps]
  • 18. Heterogeneity in scales. Methodologies are needed to evaluate clusters at different scales in order to find interesting patterns between levels. The analysis of cluster structure at different scales can be improved by creating representations of the clusters that facilitate the selection of clusters at different scales. [Figure panels: Geographic space; Feature space]
  • 19. Boundaries in geospatial data: crisp vs. fuzzy. Algorithms for clustering spatio-temporal databases have to consider the neighbors of the geo-referenced data. Part of the complexity of the problem lies in the fact that the boundaries of these neighbors are not hard, but rather soft.
  • 20. Temporal relationships between spatial objects. The relationship between spatial objects can change over time. These dynamic relationships can be observed, for instance, in the way clusters change over time. [Figure: Similarity of sugarcane growing environmental conditions (1999-2001) using self-organizing maps]
  • 21. 3. Bio-inspired algorithms used in clustering and visualization of geospatial data. Why use bio-inspired algorithms?
      1. Discovering natural clusters in unlabeled data sets.
      2. Reducing the information redundancy contained in the data.
      3. Maximizing the mutual information between the inputs and the outputs of a network in the presence of noise.
      4. Helping discover nonlinear, local, or partial correlations between variables.
      5. Working with data of unknown distribution.
  • 22. A trivial case: finding zones with analogous precipitation and air temperature in South America by using FGHSON. Reminder: FGHSON stands for Fuzzy Growing Hierarchical Self-organizing Networks.
  • 23-24. A trivial case: finding zones with analogous precipitation and air temperature in South America by using FGHSON. [Figure: January air temperature and precipitation]
  • 25. Clusters of sites with similar characteristics in time and space. For commercial (mass production) crops (rice, corn), the when and where are known. For native crops (e.g. guanabana, lulo), this is not the case. When and what must I cultivate? Market demand. The COCH project.
  • 26. Clusters of sites with similar characteristics in time and space. Soil, climate, genotype: what crops or varieties are likely to perform well where and when (source: Homologue). Homologous places for Colombian coffee production: Brazil, Ecuador, East Africa, and New Guinea.
  • 27. Clusters of sites with similar characteristics in time and space. Harvest of the same crop at different times.
  • 28-29. Using FGHSON to find analogous ecoregions through time.
  • 30. Conclusions (I)
      • Discovering natural clusters in unlabeled data sets. The continuous updating, large quantity, and diverse uses of geospatial data make it difficult to label observations in order to define classes.
      • Reducing the information redundancy contained in the data. Soft competitive learning algorithms create prototypes of the observations. Hence, large data sets can be reduced with little or no loss of information.
      • Maximizing the mutual information between the inputs and the outputs of a network in the presence of noise. Geospatial variables are usually measured by instruments operating in difficult, uncontrolled environmental conditions (e.g. satellites, meteorological stations).
      • Helping discover nonlinear, local, or partial correlations between variables. Several soft competitive learning algorithms allow the projection of a high-dimensional space onto a two-dimensional grid. This enables visual exploratory analysis of the data and makes nonlinear, local, or partial correlations easier to discover.
      • Working with data of unknown distribution. Many clustering algorithms have been developed to deal with particular data distributions (e.g. Gaussian distributions). Soft competitive learning algorithms are very useful when working with geospatial data because they do not need to assume any data distribution.
  • 31. Conclusions (II). FGHSON advantages:
      1. FGHSON does not require the number of clusters to be set a priori. This is critical when dealing with geospatial data, because it is usually not possible to estimate a priori the optimal number of clusters that best represents a data set.
      2. The membership of the observations in the clusters is fuzzy.
      3. The final structure does not necessarily lead to a balanced hierarchy (i.e. a hierarchy with equal depth in each branch). Therefore, areas of the input space that require more units for appropriate data representation create deeper branches than others. This is important when dealing with geographic data, because in many cases some regions must be represented in more detail than others.
  • 32. Conclusions (III). FGHSON advantages:
      4. The algorithm executes self-organizing processes that can be performed in parallel. Hence, when dealing with large data sets, the tasks can be divided, distributing the computational cost.
      5. Software using the FGHSON algorithm in geosciences is in development.
      6. The maps on individual layers cannot grow irregularly in shape, and connections between neighboring units cannot be removed. As a result, some information about the input data is lost.
    Disadvantages:
      1. FGHSON cannot project a high-dimensional space onto a two-dimensional space.
      2. FGHSON is a new algorithm.
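
Note on slide 10: the hard competitive learning steps a) to d) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the presentation. The function name `kmeans`, the choice of the first k observations as initial means, and the `max_iter` cap are assumptions made for the example.

```python
def kmeans(points, k, max_iter=100):
    """Hard competitive learning (k-means) on a list of equal-length tuples."""
    # a) k initial "means" (assumption: simply the first k observations)
    means = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # b) associate every observation with its nearest mean
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            for p in points
        ]
        # d) convergence: the assignment did not change since the last pass
        if new_labels == labels:
            break
        labels = new_labels
        # c) the centroid of each cluster becomes its new mean
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


sites = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(sites, 2)
print(labels)
```

Each observation ends up in exactly one cluster, which is what makes this variant "hard".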
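
Note on slide 11: one soft competitive learning update can be sketched as follows, reading the slide's rule as m_i + alpha(t) h_ci(t) (x - m_i) with `alpha` as the learning rate. The function name `som_step`, the grid `positions` argument, and the "bubble" neighborhood (h = 1 within `radius` of the best-matching unit, 0 elsewhere) are assumptions chosen to match the black and gray cells described on the slide. Other neighborhood functions, such as a Gaussian, are equally common.

```python
def som_step(weights, positions, x, alpha, radius):
    """One update m_i <- m_i + alpha * h_ci * (x - m_i) for every neuron i."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # c: index of the best-matching unit, the neuron whose weights are closest to x
    c = min(range(len(weights)), key=lambda i: sqdist(weights[i], x))
    updated = []
    for m_i, r_i in zip(weights, positions):
        # bubble neighborhood on the grid: 1 near the winner, 0 elsewhere (assumption)
        h = 1.0 if sqdist(r_i, positions[c]) <= radius ** 2 else 0.0
        updated.append([m + alpha * h * (xi - m) for m, xi in zip(m_i, x)])
    return c, updated


# A chain of four neurons with one-dimensional weights.
positions = [(0,), (1,), (2,), (3,)]
weights = [[0.0], [1.0], [2.0], [3.0]]
winner, weights = som_step(weights, positions, [1.2], alpha=0.5, radius=1)
print(winner, weights)
```

Unlike the hard variant, the winner's grid neighbors also move toward the observation, while neurons outside the neighborhood keep their weights.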
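
Note on slides 19 and 31: the idea of fuzzy membership can be illustrated with the standard fuzzy c-means membership formula, u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)). The slides do not state which formula FGHSON uses, so this is a stand-in for illustration only. The function name `fuzzy_memberships` and the fuzzifier default `m=2.0` are assumptions.

```python
def fuzzy_memberships(x, prototypes, m=2.0):
    """Fuzzy c-means style membership of observation x in each prototype's cluster."""
    # squared Euclidean distance from x to every prototype
    d2 = [sum((a - b) ** 2 for a, b in zip(x, p)) for p in prototypes]
    # an observation lying exactly on a prototype belongs fully to it
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    # exponent 1/(m-1) on squared distances equals 2/(m-1) on plain distances
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in d2) for dj in d2]


# A site between two cluster prototypes, closer to the first one.
print(fuzzy_memberships((1.0,), [(0.0,), (4.0,)]))
```

The memberships sum to one, so a site near a cluster border is shared between clusters instead of being forced into a single one.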