W7: Data Mining as a Tool for Environmental Scientists (DM-TES'06)

Organized by Jessica Spate, Eibe Frank, Karina Gibert, Miquel Sànchez-Marrè, Joaquim Comas, and Ioannis Athanasiadis

Goals and Scope

In this workshop we aim to introduce interested parties to a range of data mining techniques and a selection of software packages. We also invite presentations of interesting applications of data mining to environmental problems.

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields as diverse as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems.

Certain techniques, such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks, have found application in environmental modelling, often being used to address difficult and important problems. Other methods, for example Classification and Association Rule Extraction, have not been taken up by environmental modellers on any wide scale. Classical statistical techniques for data analysis, such as regression, time series analysis and principal components analysis, are also suitable and have been applied to the mining of environmental data. Finally, integrating different techniques (whether from machine learning or statistics) into a single data mining process may produce significant improvements over the use of either approach alone, and remains an open issue to be explored.

Several high-quality software packages have been developed that enable easy investigation of data using multiple techniques. The WEKA package is one of the most widely applied; it is open source and freely available for download. GESCONDA is a package specific to environmental science that is currently under development. While a small number of environmental science projects have taken advantage of such technology and the wealth of recent data mining research, most of these have been undertaken by or with data mining specialists, and the majority of environmental modellers remain unaware of the tools available.

In this workshop, we introduce interested parties to a range of data mining techniques and to a small selection of software packages. We would like to bring those working specifically in the environmental modelling area into contact with data mining software and software developers, to make data mining techniques more accessible to modellers and to give developers a better idea of the needs and desires of the modelling community. The WEKA and GESCONDA packages will be introduced and discussed specifically, but not exclusively. We also invite presentations of interesting applications of data mining to environmental problems from workshop participants.


Work emphasizing (but not limited to) the following topics is of particular interest to the workshop:

Web References

The WEKA project: http://www.cs.waikato.ac.nz/~ml/weka/
GESCONDA design statement: http://www.eu-lat.org/eenviron/Marre.pdf

The workshop program will include a special hands-on tutorial session in which a real data set is analyzed with the WEKA and GESCONDA packages. Workshop participants are encouraged to attend and explore the possibilities of data mining for a real application.

Workshop Organizing Committee:


Position Paper

JM Spate, K Gibert, M Sànchez-Marrè, E Frank, J Comas, I Athanasiadis and Rebecca Letcher: Data Mining as a Tool for Environmental Scientists


Kyoko Fukuda, Phillip Pearson: Data mining and image segmentation approaches for classifying defoliation in aerial forest imagery
Alfredo Vellido, Joaquim Comas, Raul Cruz, Eugenia Marti: Finding relevant features for the characterization of the ecological status of human altered streams using a constrained mixture model
Javier Aroba, M. Luisa De la Torre: Application of Data Mining Techniques to Obtain Qualitative Models for Agricultural Contaminants in Ground Waters
Xavier Flores Alsina, Joaquim Comas, Karina Gibert, Miquel Sànchez-Marrè, Ignasi Rodriguez Roda: A Data Mining Approach to Enhance Knowledge Extraction in Environmental Databases
Kris Villez, Dae Sung Lee, Christian Rosen, Peter Vanrolleghem: Comparison of linear and non-linear PLS methods for soft-sensing of an SBR for nutrient removal
Terence A. Etchells, Alfredo Vellido, Eugenia Marti, Paulo J.G. Lisboa, Joaquim Comas: On the prediction of the ecological status of human-altered streams and its rule-based interpretation
Saara Hyvonen, Heikki Junninen, Lauri Laakso, Miikka Dal Maso, Tiia Gronholm, Boris Bonn, Petri Keronen, Pasi Aalto, Veijo Hiltunen, Toivo Pohja, Samuli Launiainen, Pertti Hari, Heikki Mannila, Markku: Data mining approaches to explaining aerosol formation
Joaquin Izquierdo, Rafael Perez, P. Amparo Lopez, Pedro L. Iglesias: Neural identification of fuzzy anomalies in pressurized water systems
Alexander Campbell, Binh Pham, Yu-Chu Tian: A framework for spatio-temporal data analysis and hypothesis exploration
Priscilla Minotti, Ana Scopel, Fernando Ruiz Selmo: Data mining approaches for monitoring and modeling the invasion of honeylocust tree (Melia azedarach) in El Palmar Nacional Park
Dimitri Solomatine (could not attend): Optimal Modularization of Learning Models in Forecasting Natural Phenomena