W9: State of the art in methods and software for the identification, resolution and apportionment of contamination sources

Organized by Roma Tauler, Philip Hopke and Pentti Paatero


This workshop aims to compare current approaches, recent developments and software for multivariate factor analysis of environmental data. This type of approach has not been presented at previous iEMSs meetings, although interest and research in this direction have grown in recent years, including recent EPA implementations.

The three initial promoters of the session are Roma Tauler, from the Institute of Chemistry and Environmental Research, CSIC, Spain (see below); Phil Hopke, Director of the Center for Air Resources Engineering and Science at Clarkson University, Potsdam, NY; and Pentti Paatero, from the Department of Physical Sciences, University of Helsinki. We plan to invite other researchers in the field.


Environmental systems (air, water, soils, biota...) are very complex, and simplified descriptions of reality are needed to produce mathematical models that can be computed with current computer technology. Although the mathematical modeling of transport, dilution, transformation, diffusion and dispersion of contaminants in the environment has improved significantly in recent years, there are still many cases where these models (usually based on the solution of large systems of differential equations) are insufficient to support the full development of effective and efficient environmental quality management strategies. Moreover, operating these models appropriately requires detailed knowledge and control of a large number of parameters, which makes this approach unrealistic in many practical situations. Other approaches are therefore needed to assist in the identification of contamination sources, in the determination of their distribution (geographical, temporal, among environmental compartments...) and in the apportionment of the contamination sources at a particular sampling point.

Environmental monitoring studies often produce huge numbers of measured physical parameters and chemical concentrations at distant geographical sites and during different time periods. Moreover, the content of chemicals is also estimated in different environmental compartments (e.g. air, water, sediments, biota...). These data sets are difficult to handle and evaluate quickly with simple univariate statistical and modelling tools, especially because of their large size and their multicomponent and multivariate nature. In order to discover relevant patterns and sources of variation within large environmental data sets, the application of modern chemometric methods based on statistical multivariate data analysis and on factor analysis is proposed. The basic assumption of these methods is that each of the measured parameters or chemical concentrations in a particular sample is mainly determined by contributions from multiple independent sources. With these methods, point and area sources of contaminants in the environment and their origin (natural, anthropogenic, industrial, agricultural...) can be identified, and their relative distribution among samples (geographical, temporal, among environmental compartments) can be evaluated. At each sampling site, the relative contribution of each source is estimated quantitatively, allowing an assessment of the environmental impact, distribution and time evolution of the sources.
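The basic assumption stated above can be written as a bilinear model: each measured value x_ij (sample i, variable j) is approximated by the sum over sources k of g_ik * f_kj, where g_ik is the contribution of source k to sample i and f_kj is the composition profile of source k. The sketch below illustrates this idea only. It uses a plain non-negative matrix factorization with multiplicative updates as a stand-in; it is not the UNMIX, PMF, ME or MCR-ALS algorithm, it ignores measurement uncertainties, and the synthetic data, the function names (nmf, source_shares) and all parameter values are assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def residual_ss(x, g, f):
    """Sum of squared residuals of the bilinear model X ~ G F."""
    xhat = matmul(g, f)
    return sum((x[i][j] - xhat[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def nmf(x, n_sources, n_iter=500, seed=0, eps=1e-12):
    """Non-negative factorization X ~ G F by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start keeps every later update non-negative.
    g = [[rng.random() + 0.1 for _ in range(n_sources)] for _ in range(n)]
    f = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_sources)]
    for _ in range(n_iter):
        # Update profiles: F <- F * (G^T X) / (G^T G F)
        gt = transpose(g)
        num = matmul(gt, x)
        den = matmul(matmul(gt, g), f)
        f = [[f[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(n_sources)]
        # Update contributions: G <- G * (X F^T) / (G F F^T)
        ft = transpose(f)
        num = matmul(x, ft)
        den = matmul(g, matmul(f, ft))
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(n_sources)] for i in range(n)]
    return g, f

def source_shares(g, f):
    """Fraction of each sample's modelled total mass assigned to each source."""
    profile_mass = [sum(row) for row in f]
    shares = []
    for row in g:
        mass = [row[k] * profile_mass[k] for k in range(len(row))]
        total = sum(mass)
        shares.append([v / total for v in mass])
    return shares

# Synthetic example (hypothetical numbers): 2 sources, 4 variables, 6 samples.
F_TRUE = [[0.7, 0.2, 0.1, 0.0],
          [0.0, 0.1, 0.3, 0.6]]
G_TRUE = [[5.0, 1.0], [4.0, 2.0], [3.0, 3.0],
          [2.0, 4.0], [1.0, 5.0], [2.5, 0.5]]
X = matmul(G_TRUE, F_TRUE)

g, f = nmf(X, n_sources=2)
shares = source_shares(g, f)
for i, row in enumerate(shares):
    print(i, [round(v, 2) for v in row])
```

Each printed row gives, for one sample, the estimated share of the modelled total attributed to each resolved source. The resolved sources come out in arbitrary order and scale, and bilinear solutions of this kind are in general not unique, which is one reason the dedicated methods compared in this workshop add further constraints and uncertainty estimates.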


In this workshop we would like to bring those working specifically in environmental modelling into contact with multivariate data analysis and with recently developed chemometric methods and software. In particular, different methods (some of them selected as EPA reference methods for atmospheric source apportionment) will be reviewed and compared, including methods derived from Principal Component Analysis and Factor Analysis, UNMIX, PMF-ME and MCR-ALS, among others. Some specific topics will be:

This workshop is based on an initial selection of 4-5 presentations (30 minutes) and discussion (10-15 minutes) showing current approaches in the field. In particular, the UNMIX (Henry, R.C., 2003. Multivariate receptor modeling by N-dimensional edge detection. Chemometrics and Intelligent Laboratory Systems 65, 179-189), PMF (Paatero, P., 1997. Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent Laboratory Systems 37, 23-35), ME (Paatero, P., 1999. The Multilinear Engine: a table-driven, least squares program for solving multilinear problems, including the n-way parallel factor analysis model. Journal of Computational and Graphical Statistics 8(4), 854-888) and MCR-ALS (Tauler, R., 2005. Chemometrics and Intelligent Laboratory Systems 76(1), 101-110; http://www.ub.es/gesq/mcr/mcr.htm) approaches will be presented and compared on a common benchmark data set. Participants interested in making a short presentation (10-15 minutes) are invited to email a short abstract (200 words maximum) to the organiser at rtaqam@iiqab.csic.es. Accepted abstracts will appear in the conference proceedings.

As an outcome, conclusions summarizing the main results of the method comparison on the benchmark data set will be written and incorporated into the final version of the position paper.


  1. Roma Tauler - Department of Environmental Chemistry, IIQAB-CSIC, Jordi Girona 18-26, Barcelona 08034, Spain, e-mail rtaqam@iiqab.csic.es
  2. Philip Hopke - Center for Air Resources Engineering and Science, Clarkson University, Box 5708, Potsdam, NY 13699-5708, e-mail hopkepk@clarkson.edu
  3. Pentti Paatero - Department of Physical Sciences, University of Helsinki, Box 64, FIN-00014, e-mail Pentti.Paatero@helsinki.fi

Position Paper

Roma Tauler, Pentti Paatero and Philip Hopke, with the assistance of Ronald C. Henry, Cliff Spiegelman, Eun Sug Park and Richard L. Poirot: State of the art in methods and software for the identification, resolution and apportionment of contamination sources


Cliff Spiegelman, Eun Sug Park, Byron Gajewski: Jackknife uncertainty estimation for receptor modeling
Mar Viana, Jon Zabalza, Xavier Querol, Andres Alastuey, Jesus Miguel Santamaria, Jose Inaki Gil, Marina Menendez, Philip K. Hopke: Comparative analysis of PMF and PCA-MLRA results for PM2.5 at an industrial site in Northern Spain
Richard Poirot, Rudolf Husar: Combined Aerosol Trajectory Tools (CATT) for Regional Air Pollution Source Apportionment
Roma Tauler, Emma Pera-Trepat: Application of Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) to the analysis of environmental monitoring data sets
Ronald Henry: Nonparametric Regression for Air Quality Source Apportionment
Pentti Paatero, Shelly Eberly, Philip K. Hopke: Suggestions for optimized planning of multivariate monitoring of atmospheric pollution
Philip K. Hopke, Pentti Paatero, Shelly Eberly: The US EPA Implementation of Positive Matrix Factorization and a New Approach to Uncertainty Estimation