https://doi.org/
You're currently viewing an old version of this dataset.

Data for: Predicting Microbial Water Quality in On-Site Water Reuse Systems with Online Sensors

Widespread implementation of on-site water reuse is hindered by the limited availability of monitoring approaches that ensure microbial quality during operation. In this study, we developed a methodology for monitoring microbial water quality in on-site water reuse systems using inexpensive and commercially available online sensors. An extensive dataset containing sensor and microbial water quality data for six of the most critical types of disruptions in membrane bioreactors with chlorination was collected. We then tested the ability of three typological machine learning algorithms – logistic regression, support-vector machine, and random forest – to predict the microbial water quality as “safe” or “unsafe” for reuse. The main criterion for model optimization was to ensure a low false positive rate (FPR) – the percentage of safe predictions when the actual condition is unsafe – which is essential to protect users' health. We therefore enforced a fixed FPR ≤ 2%. Maximizing the true positive rate (TPR) – the percentage of safe predictions when the actual condition is safe – was given second priority. Our results show that logistic-regression-based models using only two out of the six sensors (free chlorine and oxidation-reduction potential) achieved the highest TPR. Including sensor slopes as engineered features made it possible to reach similar TPRs using only one sensor instead of two. Analysis of the occurrence of false predictions showed that these were mostly early alarms, a characteristic that could be regarded as an asset in alarm management. In conclusion, the simplest algorithm in combination with only one or two sensors performed best at predicting the microbial water quality.
This result provides useful insights for water quality modeling or for applications where small datasets are a common challenge and a general advantage might be gained by using simpler models that reduce the risk of overfitting, allow better interpretability, and require less computational power.
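
The optimization criterion described above (a fixed cap on the FPR, then the highest possible TPR) can be illustrated with a small sketch. This is not the authors' code: the function names `rates` and `pick_threshold`, the toy scores, and the labels are hypothetical, and the scores stand in for the "safe" probabilities that a classifier such as logistic regression might output. Following the definitions in the abstract, "positive" means predicted safe.

```python
import math


def rates(scores, labels, threshold):
    """Return (FPR, TPR), where 'positive' means predicted safe.

    labels: 1 = actually safe, 0 = actually unsafe.
    A sample is predicted safe when its score is >= threshold.
    """
    pred_safe = [s >= threshold for s in scores]
    n_unsafe = sum(1 for y in labels if y == 0)
    n_safe = sum(1 for y in labels if y == 1)
    fp = sum(1 for p, y in zip(pred_safe, labels) if p and y == 0)
    tp = sum(1 for p, y in zip(pred_safe, labels) if p and y == 1)
    fpr = fp / n_unsafe if n_unsafe else 0.0
    tpr = tp / n_safe if n_safe else 0.0
    return fpr, tpr


def pick_threshold(scores, labels, max_fpr=0.02):
    """Lowest threshold whose FPR stays within max_fpr.

    Lowering the threshold can only raise FPR and TPR, so the lowest
    admissible threshold also gives the highest TPR under the cap.
    """
    best = math.inf  # predicting nothing as safe gives FPR = 0
    for t in sorted(set(scores), reverse=True):
        fpr, _ = rates(scores, labels, t)
        if fpr > max_fpr:
            break
        best = t
    return best


# Hypothetical toy data (not taken from the dataset).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_threshold = pick_threshold(toy_scores, toy_labels, max_fpr=0.02)
toy_fpr, toy_tpr = rates(toy_scores, toy_labels, toy_threshold)
```

In practice the threshold would be chosen on training or validation data and then checked on held-out data, since an FPR cap met on one sample is not guaranteed to hold on another.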

Metadata

Author
  • Reynaert, Eva
  • Steiner, Philipp
  • Yu, Qixing
  • D'Olif, Lukas
  • Joller, Noah
  • Schneider, Mariane Yvonne
  • Morgenroth, Eberhard
Curator Reynaert, Eva
Contact Morgenroth, Eberhard <Eberhard.Morgenroth@eawag.ch>