W10: Uncertainty Management, Spatial and Temporal Reasoning, and Validation of Intelligent Environmental Decision Support Systems

organised by Miquel Sànchez-Marrè, Ignasi Rodríguez-Roda, Richard S. Sojda, Jean Philippe Steyer, Peter Struss


Title: Uncertainty Management, Spatial and Temporal Reasoning and Validation of Intelligent Environmental Decision Support Systems

Authors: Miquel Sànchez-Marrè, Karina Gibert, Rick Sojda, Jean Philippe Steyer, Peter Struss, Ignasi Rodríguez-Roda

Abstract: Several open challenges appear during the daily operation of Intelligent Environmental Decision Support Systems (IEDSS). Uncertainty in the data being processed is intrinsic to the environmental system, which is monitored through several on-line sensors and off-line data sources. Anomalous values at the data-gathering level, or uncertain reasoning at later levels such as diagnosis, decision support or planning, can therefore drive the environmental process into unsafe, critical operating states. At the diagnosis, decision support and planning levels, spatial reasoning, temporal reasoning or both can also influence the reasoning carried out by the IEDSS. Most environmental systems must take into account the spatial relationships between the target area and nearby areas, and the temporal relationships between the current state and past states of the system, in order to make accurate and reliable assertions for use in diagnosis, decision support or planning. Finally, a related and crucial issue is evaluation: are the decisions proposed by the IEDSS really reliable and safe? Are we sure about the quality and performance of the proposed solutions? How can we ensure a correct evaluation of the IEDSS? The main goal of this paper is to analyse these four issues.

Title: Risk Assessment Module of the IWA/COST simulation benchmark: Validation and extension proposal

Authors: Jordi Dalmau, Ignasi Rodriguez-Roda, Jean-Philippe Steyer, Joaquim Comas, Manel Poch

Abstract: The IWA/COST simulation benchmark has often been used by the wastewater research community as a standardized simulation protocol to evaluate and compare different control strategies for a biological nitrogen removal process. It includes a plant layout, a simulation model and parameters, a detailed description of the influent disturbances (dry weather, storm and rain events), and performance evaluation criteria to determine the relative effectiveness of proposed control strategies. The plant layout consists of five completely mixed reactors, including a pre-denitrification section. The ASM1 was selected to model the biological processes, while the Takács ten-layer model was chosen to describe the settling processes. The activated sludge process is a complex system, consisting of a multi-species microorganism population that often evolves towards imbalances causing severe operational problems. The lack of basic knowledge about the interaction mechanisms between the microorganism communities and the operational parameters, which are not described by standard models, is an obvious limitation when evaluating control strategies via simulation. In this context, an expert reasoning module called the Risk Module was developed to detect conditions favouring filamentous bulking, foaming, rising and deflocculation. Taking into account the several platforms on which the simulation benchmark is used, a proposal for implementing the Risk Module using standard modelling and equations has been developed. This proposal allows the different benchmarking groups to have the Risk Module available on their own platform. Validation will be carried out through expert evaluation and real data from pilot and full-scale plants. In order to detect conditions favouring different problems related to anaerobic digestion, an extension of the Risk Module for the Anaerobic Digestion Models (ADM) will be developed. The Risk Module will therefore provide qualitative evaluation criteria for the whole plant within the IWA/COST simulation benchmark.

Title: Non-Linear, Multivariate Forecasting of Hydrologic and Anthropogenic Responses to Meteorological Forcing With Case Studies

Authors: Edwin Roehl, Terry Murray

Abstract: Managers and users of natural resources often face two challenging problems. One is forecasting future natural system conditions for optimal resource allocation. Here, the natural system comprises the weather and a dependent hydrologic system that contains a water resource. The second problem is forecasting the behavior of a combined natural and man-made system, which also includes anthropogenic resource consumers. Even though detailed meteorological forecasting over weeks and months is impractical, hydrologic behaviors such as groundwater cycling can transpire over months and years. Man-made systems, in contrast, exhibit behaviors that both lag and lead causal forcing, e.g., seasonal weather changes. This paper compares forecasts of the behavior of two systems. One is the upper Klamath Basin in Oregon and California, where resource managers allocate water among competing interests, e.g., hydropower, farming, and fisheries. The second system is a water utility in coastal South Carolina, whose demand varies significantly with seasonal irrigation. Similar technical approaches were used to model both systems, but the results are instructively dissimilar. Time series of meteorological parameters, basin inflows, and consumer demand were decomposed to separate signals representing predominant “standard” (average) behaviors from subtler, more interesting “non-standard” (chaotic) behaviors. Next, dynamic process models were synthesized using non-linear, multivariate artificial neural networks to predict non-standard output behaviors (basin inflow or consumer demand) from non-standard inputs. Prediction accuracy was good for both systems, with R² values exceeding 0.9. Finally, the sensitivity of the predictions to shifting the output forward in time relative to the inputs was determined. Klamath predictions decayed towards a predictability horizon of less than 20 weeks, far short of the minimum seven-month forecast sought by the water resource managers. Conversely, prediction accuracy for the partly anthropogenically driven water demand decayed far more slowly, easily spanning the critical six-month spring-to-fall irrigation season over which utility managers sought to forecast.
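
The decomposition and the lead-time test described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "standard" behavior is taken to be a per-period mean profile, and the R² of a one-input linear fit stands in for the paper's neural network models. The function names and the toy series are hypothetical.

```python
def split_standard(series, period):
    """Split a series into a periodic mean profile and the residual around it."""
    if len(series) < period:
        raise ValueError("series must cover at least one full period")
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(series):
        sums[t % period] += value
        counts[t % period] += 1
    profile = [s / c for s, c in zip(sums, counts)]
    standard = [profile[t % period] for t in range(len(series))]
    non_standard = [v - s for v, s in zip(series, standard)]
    return standard, non_standard


def r_squared(x, y):
    """Squared Pearson correlation: the R2 of a one-input linear fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no predictive information
    return sxy * sxy / (sxx * syy)


def lead_sensitivity(inputs, outputs, max_lead):
    """R2 between inputs at time t and outputs at time t + lead, per lead."""
    scores = {}
    for lead in range(max_lead + 1):
        x = inputs[: len(inputs) - lead]
        y = outputs[lead:]
        scores[lead] = r_squared(x, y)
    return scores


# Toy series with period 3: the repeating profile is the "standard" part.
standard, non_standard = split_standard([10, 20, 30, 12, 22, 32], period=3)

# Toy forcing and a response built as the forcing delayed by two steps
# (assumed data, not taken from either case study).
forcing = [1, 4, 2, 5, 3, 6, 2, 4]
response = [0, 0, 1, 4, 2, 5, 3, 6]
scores = lead_sensitivity(forcing, response, max_lead=3)
```

Because the toy response is the forcing delayed by two steps, the score is highest at a lead of two. With real data, the interest lies in how quickly the score decays as the lead grows.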

Title: Experiences on Empirical Assessment of Rule-based AI Systems for Ecological Modelling

Authors: Virginia Brilhante

Abstract: In working with Artificial Intelligence techniques, more specifically logic-based knowledge representation and reasoning, applied to environmental modelling, and more specifically still to automating aspects of the construction of ecological simulation models of the system dynamics kind, I have had a couple of opportunities to work on projects where comparative empirical assessments of systems were performed. More and more, the degree of complexity of AI systems renders them unsuitable for purely theoretical analytical studies, compelling us to resort to empirical methods which, through data, can flesh out the workings of a system and help us understand its behaviour and results. The first project developed a technique for eliciting sources of uncertainty in ecological simulation models [3]. This was done within a logic-based approach, for it lent itself well to declarative representation of sources of uncertainty as well as to their propagation and combination throughout the models during simulation. To experiment with and validate the technique, we reconstructed in logic (through a Prolog implementation) a large system dynamics simulation model of a tropical forest area in Brazil, originally developed using the Stella modelling tool (isee systems, inc.), that included carbon cycling and production of commercial and non-commercial tree species. The assessment experiment consisted of comparing the reconstructed model with the original one, in order to verify whether we had accomplished a reasonable approximation of the original model to which we could apply the uncertainty elicitation technique. The findings were that although the logic-based implementation of the model had been simplified in several ways (use of difference equations instead of differential ones, disregard of inputs from a Nitrogen cycle submodel that the Stella model included, etc.), its simulation results were fairly close to the ones produced by the original model.
This could be observed in the very similar shapes of the curves produced by plotting values (for the logic-based model results, using interpolation) of corresponding variables in the two models, such as carbon in above-ground vegetation, density of species per DBH (Diameter at Breast Height) class, etc., with respect to simulated time. The second project's aim was to explore ontology-based knowledge reuse, on the grounds that in order to reuse knowledge, people, or software systems, need to know its meaning, and ontologies make it possible to elicit such meaning. Two rule-based systems were built, S-0 and S-R, both able to synthesise conceptual system dynamics ecological models (Forrester diagrams) from data annotated through an ontology called Ecolingua [2] (or from metadata, for short). S-0 performs synthesis using metadata only as its information resource. S-R, in turn, performs synthesis using both metadata and reference models, which are matched with the new metadata to synthesise new models. S-R thus demonstrates that on top of benefiting from 'knowledge specified through an ontology' (metadata, in our context here), systems can also benefit from reusing 'knowledge that can be derived from knowledge specified through an ontology' (the reference models), which has been a promise of the ontological approach in knowledge representation. For the evaluation experiment itself, the motivating question was: 'once ontology-based knowledge reuse has been achieved (as S-R did), what practical gains does this bring to systems?'. The experiment's overall goal was then to provide empirical evidence towards answering this question. In the computational realm, where resources are still limited, gains in efficiency are sought. This led to efficiency being chosen as the performance criterion on which the two systems would be compared in the experiment.
Since we had at hand a comparative evaluation of two systems, a characterisation of the differences between them was needed. Features in which S-0 and S-R differed were identified and their contribution to relatively increased or decreased run time efficiency considered. Four such features were identified: the model building algorithms, the constraints for synthesis of model components and the metadata retrieval mechanism, which make S-R more efficient than S-0, and the mechanism for selection of local partial solutions, implemented only in S-R, which makes it less efficient than S-0. The next step was to clearly define our experimental hypothesis and the evaluation criterion to be measured. The formulated hypothesis was: "S-R's improved features through reuse of reference models give, compared to S-0, a net increased efficiency leading to shorter synthesis run times." The evaluation criterion, at this stage already loosely set to be efficiency, was more precisely defined as a measure of resources consumed as a function of the size of the task tackled, namely, CPU time as a function of the complexity of the synthesised models, for which a metric was also defined. With such definitions, we could then proceed with designing an experimental procedure for producing scenarios in which we could compare the run times of the two systems over a range of models of different complexities under the same experimental conditions. A sample of models was taken from the literature and, for each of them, a metadata set was either artificially generated (by a program) or manually specified. A larger sample of metadata sets was derived from this initial sample through a systematic partition (also by a program) of each initial metadata set into subsets.
The experimental procedure consisted of various scenarios for collecting run time measurements, which were created by exploring relations holding between three models given a metadata set: 1) a model synthesised from the metadata set using S-0, 2) a reference model, and 3) a model synthesised from the metadata set through reuse of the reference model using S-R. The procedure was automated and around 600 scenarios were executed, each one providing one run time measurement of S-R that could be compared with that of S-0. These results were plotted (using an interpolation method where necessary), showing the run time of the two systems in relation to the complexity of the synthesised models, so that they could be visualised and interpreted. The interpretation consisted of drawing correlations between the systems' run time behaviour and their features, identified earlier, that had an impact on efficiency. The plots also revealed that processing manually specified metadata was significantly more demanding for both systems, making them less efficient than in scenarios where only artificially generated metadata was used. In sum, the experimental results supported the hypothesis: using a reference model improved synthesis performance remarkably (on the hardware/software platform used, S-0 run times ranged from 1.5 to 190 s, while S-R's ranged from near 0 to 3 s, approximately). There was a trade-off, however, between run time efficiency and metadata usage: S-0 was a slower system but thoroughly exploited the metadata evidence available for synthesis, while S-R did not, because the synthesised models were bound by the reference models. The final step was to generalise the experimental results by identifying the factors in model design problems and model synthesis systems, not restricted to the ecological modelling domain, that were essential for reproducing the behaviour of the ontology-supported knowledge reuse technique as observed in the experiment [1].
The generalisation was formulated as a generic causal explanation for the technique's expected behaviour, as far as efficiency was concerned, in relation to characteristics of modelling problems and systems. In retrospect, the experiments summarised here share the same empirical methodological framework, more elaborated in the second experiment than in the first, which consists of: defining assessment criteria, identifying similarities and differences in the compared systems that have an effect on the assessment criteria, formulating an experimental hypothesis, designing an experimental procedure, collecting data for the experiment, generating experimental results by applying the procedure, and then interpreting and generalising the results. This does not diverge from the practices of other scientific disciplines with a stronger tradition in empirical studies. In fact, AI has a lot to draw from classic empirical methods, as Paul Cohen brilliantly discusses in [4]. I recall once discussing the empirical assessment of the model synthesis systems with a group of researchers and being asked why I had chosen efficiency as the criterion and not something like the coverage of the ontology or how well the synthesised models represented the real ecological systems. My honest answer was that efficiency was a computational measure that allowed me to have more control over the experiments, in that it did not depend on any subjective judgment by domain experts. Assessing quantifiable, computational aspects of AI environmental systems makes up a kind of comfort zone for us computer scientists, engineers and the like. When dealing with qualitative or less crisp but nevertheless important aspects of such systems, such as effectiveness in decision support, quality of model designs or even uncertainty representation, we find ourselves in more uncharted and open territory.

References:
[1] Brilhante, V. Ontology-Enabled Reuse of Structural Data-Based Models. In Proceedings of the Workshop on Ontologies and their Applications at the 17th Brazilian Symposium on Artificial Intelligence (SBIA-2004), Sao Luis, Brazil, 2004.
[2] Brilhante, V. Ecolingua: a Formal Ontology for Data in Ecology. Journal of the Brazilian Computer Society. Freitas, F., Stuckenschmidt, H. and Noy, N.F. (eds.), Special Issue on Ontologies Issues and Applications. To appear.
[3] Brilhante, V. and Campos dos Santos, J. L. Eliciting Sources of Uncertainty in Ecological Simulation Models. In Pahl-Wostl, C., Schmidt, S., Rizzoli, A.E. and Jakeman, A.J. (eds), Complexity and Integrated Resources Management, Transactions of the 2nd Biennial Meeting of the International Environmental Modelling and Software Society, iEMSs. Osnabrueck, Germany, 2004.
[4] Cohen, P. Empirical Methods for Artificial Intelligence. The MIT Press, 1995.
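
The first project in this abstract replaced differential equations with difference equations and still obtained results close to the original model. A minimal Python sketch of why this can work, using a single stock with constant inflow and proportional loss, is given here. It is not the author's Prolog model: the stock, the parameter values and the function names are assumptions chosen only for illustration.

```python
import math


def simulate_stock(c0, inflow, loss_rate, dt, steps):
    """Difference-equation update: C[t + dt] = C[t] + dt * (inflow - loss_rate * C[t])."""
    values = [c0]
    for _ in range(steps):
        current = values[-1]
        values.append(current + dt * (inflow - loss_rate * current))
    return values


def exact_stock(c0, inflow, loss_rate, t):
    """Closed-form solution of dC/dt = inflow - loss_rate * C."""
    equilibrium = inflow / loss_rate
    return equilibrium + (c0 - equilibrium) * math.exp(-loss_rate * t)


# Hypothetical carbon stock: 50 units at the start, gaining 4 units per
# time step and losing 2% of its content per time step.
trajectory = simulate_stock(c0=50.0, inflow=4.0, loss_rate=0.02, dt=1.0, steps=100)

# Largest absolute gap between the difference-equation run and the exact curve.
largest_gap = max(
    abs(value - exact_stock(50.0, 4.0, 0.02, step * 1.0))
    for step, value in enumerate(trajectory)
)
```

For these assumed values the two curves have the same shape and stay within one unit of each other on a stock that grows from 50 towards 200. A larger step size or a faster loss rate would widen the gap.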

Title: Confronting EDSS-maintenance validation

Authors: Claudia Turon, Joaquim Comas, Ulises Cortes, Manel Poch

Abstract: Daily operation and maintenance tasks are needed to guarantee the correct performance of Constructed Wetlands (CWs). Defining these activities is a complex task, since the actions vary according to the technology, configuration and design of the wastewater treatment plant, the characteristics of the community and the features of the receiving media. To support the definition of these actions, an Environmental Decision Support System (EDSS) has been constructed (EDSS-maintenance). The methodology used to develop EDSS-maintenance is based on five steps: environmental problem analysis, data and knowledge acquisition, model selection, model implementation and validation. The first four steps have been completed: the data and knowledge required to solve the environmental problem were acquired and translated into a knowledge base composed of IF-THEN rules. The validation process is ongoing. There are no clear guidelines on EDSS validation; on the contrary, it is still an open problem. This document presents a new approach to this step. The validation process has to guarantee both the functioning of EDSS-maintenance and its compliance with the user requirement specifications: identify the CWs' problems, identify the causes triggering these disturbances and propose the most appropriate corrective actions. The validation procedure is carried out in several stages: checking the syntax and semantics of the rules (Validation-1 step), comparing the tasks proposed by EDSS-maintenance with real operation and maintenance protocols (Validation-2 step), expert evaluation of the guidelines proposed by EDSS-maintenance (Validation-3 step), and evaluating the results of applying the operation and maintenance protocols in new CWs (Validation-4 step). Two numerical indices allow verification of EDSS-maintenance performance and checking of the compliance of the protocols with the user requirements. Moreover, another index enables easy revision and improvement of the knowledge bases (problems, causes and actions) and so enhances the decision support system.
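
The Validation-2 step compares the tasks proposed by the system with those in a real protocol. The abstract does not define its numerical indices, so the Python sketch here uses two simple overlap ratios as stand-ins rather than the authors' indices. The function name and the task names are hypothetical.

```python
def task_agreement(proposed, reference):
    """Compare proposed tasks with a reference protocol using set overlap."""
    proposed, reference = set(proposed), set(reference)
    shared = proposed & reference
    return {
        # Share of the reference protocol that the system also proposed.
        "coverage": len(shared) / len(reference) if reference else 1.0,
        # Share of the system's proposals that appear in the reference protocol.
        "precision": len(shared) / len(proposed) if proposed else 1.0,
        # Tasks to review when revising the knowledge base.
        "missing": sorted(reference - proposed),
        "extra": sorted(proposed - reference),
    }


# Hypothetical task lists for one constructed wetland.
proposed_tasks = [
    "remove sludge from primary treatment",
    "harvest vegetation",
    "clean inlet distributor",
    "check outlet level",
]
protocol_tasks = [
    "remove sludge from primary treatment",
    "harvest vegetation",
    "inspect embankments",
]
report = task_agreement(proposed_tasks, protocol_tasks)
```

The `missing` and `extra` lists are the part most useful for revision: each entry points to a rule that may be absent from, or superfluous in, the knowledge base.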

Title: Comparing Different Cluster Algorithms In Environmental Databases Using GESCONDA

Authors: Xavier Flores Alsina, Joaquim Comas, Karina Gibert, Miquel Sanchez-Marre, Ignasi Rodriguez Roda

Abstract: An Intelligent Environmental Decision Support System (IEDSS) can be defined as an intelligent information system that reduces the time in which decisions are made in an environmental domain, and improves the consistency and quality of those decisions. The success of an IEDSS depends mainly on the knowledge it embodies, which gives the system enhanced abilities to reason about the environmental system in a more reliable way. Classic approaches are based on acquiring knowledge through manual interactive sessions with environmental experts. However, when databases summarize the past behaviour of the environmental system, there is a more interesting and promising approach: using common automated techniques from both the statistics and machine learning fields to induce this knowledge automatically or semi-automatically. Clustering techniques are of great importance in knowledge discovery because they are able to discover new groups, or clusters, of objects in environmental databases. The objective of this paper is to compare and analyze different clustering algorithms using GESCONDA, a prototype data mining tool. Several partitioning methods are proposed and the results are presented and discussed. The data analysis is carried out on an environmental data set from a Catalan wastewater treatment plant. Both water quality and operational parameters from different spatial locations are mined by means of different clustering methods, and the results obtained are finally compared.
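
One common way to compare the partitions produced by two clustering methods is the Rand index, the fraction of object pairs on which both partitions agree. The abstract does not say which comparison measure was used, so the Python sketch here is an assumption, and the labels are made-up values rather than plant data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs that both partitions treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are needed")
    agreements = 0
    for i, j in pairs:
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        # Agreement: the pair is grouped together in both partitions,
        # or kept apart in both.
        if together_a == together_b:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical cluster labels for six daily records from two methods.
partition_one = [0, 0, 1, 1, 2, 2]
partition_two = [1, 1, 0, 0, 0, 2]
score = rand_index(partition_one, partition_two)
```

The index depends only on which objects are grouped together, not on the label values themselves, so two methods that number their clusters differently can still be compared directly.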

Title: Using Multi-Agent Systems for decision-making at river basin scale

Authors: Thania Rendon

Abstract: The management of environmental systems has become quite difficult due to the complexity of the environmental problems they face. Moreover, river basin systems can be especially complex to manage at catchment scale, where problematic features such as uncertainty and imprecision of data, heterogeneity, intrinsic instability, and so on are encountered in daily operation. Traditional software approaches are usually unable to cope with the intricacy of environmental problems. The use of different Artificial Intelligence techniques has therefore arisen as a suitable solution for designing and developing decision support systems. Among these techniques, we find the use of Multi-Agent Systems (MAS) particularly valuable for modelling real environmental situations. Our work seeks to provide feasible solutions that support decision-making through the modelling and simulation of different scenarios in a river basin system by means of a MAS. To ensure the correct performance of our river basin MAS, we shall validate it with real data from the case study. So far, neither benchmarks nor clear strategies or protocols for validating systems in the environmental domain can be found in the literature. Thus, we have to evaluate the system empirically. A set of representative situations from the case study should be defined as input scenarios, and the communication between agents must be verified. The solutions proposed by the system then have to be confirmed and approved by the wastewater treatment plant manager. As a result, feedback from human experts plays an indispensable role in the validation process.