A team of students from the 2020/2021 UBC Master of Data Science Program (MDS) completed a capstone project in partnership with Urban Data Lab (UDL). The goal of the 7-week project was to propose an approach for a real-time anomaly detection system built on Urban Data Lab’s InfluxDB time series database.
UDL has access to the UBC Energy and Water Services (EWS) SkySpark analytics platform, which collects data from buildings on the UBC campus, including heating, ventilation and air conditioning (HVAC) equipment data and energy data. UDL stores data from SkySpark in its own InfluxDB database. Potentially erroneous data have been observed in the data reported from SkySpark, but there is currently no system in place with InfluxDB to flag these data.
The project goal was to develop a real-time anomaly detection system using open-source tools. The detection system would allow users to understand when data are potentially erroneous such that sensors can be investigated or anomalous data can be removed from analyses. The system would also include visualization of data identified as potentially anomalous on a dashboard and ideally provide automated notifications to users.
Sensor data from the Campus Energy Centre (CEC) hot water boiler facility were selected as the subset of SkySpark data for the project, as the facility provides a variety of sensors for testing.
Anomaly Detection Framework
UDL is currently implementing live streaming from the SkySpark platform to InfluxDB using Telegraf. Telegraf is a plugin-driven server agent that listens for JSON data posted over HTTP by EWS, parses the data into the format required by InfluxDB, and writes the result to the database.
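The parsing step can be illustrated with a minimal Python sketch. This is not Telegraf's implementation, and the JSON field names (`sensor_id`, `value`, `timestamp_ns`) and the measurement name `sensor_reading` are assumptions made for illustration; the actual EWS payload may differ. The sketch only shows how one JSON record could map to a line of InfluxDB line protocol.

```python
import json


def escape_tag(value: str) -> str:
    """Escape commas, equals signs, and spaces, which line protocol requires for tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(record: dict, measurement: str = "sensor_reading") -> str:
    """Convert one parsed JSON record into a line protocol string.

    The record keys used here are hypothetical, not the real EWS schema.
    """
    tags = f"sensor_id={escape_tag(record['sensor_id'])}"
    value = float(record["value"])
    timestamp_ns = int(record["timestamp_ns"])
    # Layout: <measurement>,<tag set> <field set> <timestamp>
    return f"{measurement},{tags} value={value} {timestamp_ns}"


# Hypothetical payload, as it might arrive in an HTTP POST body.
payload = '{"sensor_id": "CEC Boiler 1", "value": 82.5, "timestamp_ns": 1620000000000000000}'
line = to_line_protocol(json.loads(payload))
print(line)  # sensor_reading,sensor_id=CEC\ Boiler\ 1 value=82.5 1620000000000000000
```

In the framework described here, Telegraf performs this conversion through its plugins, so no custom parsing code is needed in production.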
The approach used in this study provides near real-time anomaly detection with InfluxDB and Telegraf. Model training is completed by querying sensor data on an infrequent basis (for example monthly), training, and saving the models. Anomaly detection occurs on a continuous basis by reading recent data, loading and running the previously trained models, and writing predictions to InfluxDB. A schematic of the framework is shown below.
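The separation between infrequent training and continuous detection can be sketched as follows. This is a minimal illustration, not the project's code: a mean and standard deviation stand in for the trained model, the JSON file stands in for a saved Keras model, and the function names and the factor `k` are hypothetical. In the actual framework the readings would come from InfluxDB queries and the predictions would be written back to InfluxDB.

```python
import json
import statistics
import tempfile
from pathlib import Path


def train_and_save(history: list[float], model_path: Path) -> None:
    """Infrequent step (for example monthly): fit a stand-in model and persist it."""
    model = {
        "mean": statistics.fmean(history),
        "stdev": statistics.pstdev(history),
    }
    model_path.write_text(json.dumps(model))


def load_and_predict(recent: list[float], model_path: Path, k: float = 3.0) -> list[bool]:
    """Continuous step: load the saved model and flag recent points without retraining."""
    model = json.loads(model_path.read_text())
    limit = k * model["stdev"]
    return [abs(x - model["mean"]) > limit for x in recent]


# Train once on historical readings, then reuse the saved model for new data.
model_file = Path(tempfile.mkdtemp()) / "sensor_model.json"
train_and_save([20.0, 21.0, 19.0, 20.0, 20.0], model_file)
flags = load_and_predict([20.5, 35.0, 19.5], model_file)
print(flags)  # [False, True, False]
```

Because the detection step only loads a saved model, it can run frequently at low cost, while retraining can be scheduled independently.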
This framework was implemented in a test InfluxDB environment using Docker, as live streaming from the SkySpark platform to InfluxDB was still being implemented during the project timeframe.
The model criteria for this study included the following:
- Provide anomaly detection on univariate data such that the model could be applied to individual sensors without a requirement of additional data.
- Be flexible enough that the model could reasonably be applied to any building sensor. While the project study used the CEC boiler sensors, a wide variety of sensors is available in SkySpark and the goal was to provide an anomaly detection method that would be generally applicable.
- Be usable in a continuous predict-detect approach so that model training is not required every time a prediction is made on new data.
- Be trainable in an unsupervised approach. Labelled anomalies were not readily available for the CEC sensors, and it is anticipated that this will be the case for most sensors that may use the model.
A long short-term memory recurrent neural network model with an encoder-decoder architecture (LSTM-ED) was selected for the study as it met the above criteria. The model was implemented in Keras with TensorFlow.
An LSTM-ED model is trained for each sensor in an unsupervised approach using sequence reconstruction of input data. Anomaly predictions are then based on identifying data with high sequence reconstruction error using a simple maximum error threshold rule.
In other words, the model tries to recreate a sequence of data. Data are assumed normal if the model recreates the sequence well, with error under a selected threshold. Data are flagged as anomalous if the sequence is reconstructed poorly and the error is higher than the selected threshold. A qualitative assessment criterion was used in this study, and the model was found to have good initial performance on the selected subset of CEC sensors. A data pattern was identified that the model had trouble detecting, but it is believed performance can be improved using more sophisticated anomaly identification threshold rules. For example, the threshold rule could look at the expected distribution of reconstruction error within a sequence of data instead of using a simple maximum error threshold.
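The maximum error threshold rule described above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions: the reconstructions are hard-coded rather than produced by an LSTM-ED, the error is taken as the absolute difference per time step, and the threshold value and function names are hypothetical. The study does not state how its threshold was chosen.

```python
def max_reconstruction_error(sequence, reconstruction):
    """Largest absolute difference between a sequence and its reconstruction."""
    return max(abs(x - y) for x, y in zip(sequence, reconstruction))


def flag_sequences(sequences, reconstructions, threshold):
    """Flag a sequence as anomalous when its maximum error exceeds the threshold."""
    return [
        max_reconstruction_error(seq, rec) > threshold
        for seq, rec in zip(sequences, reconstructions)
    ]


# Hypothetical inputs: the second sequence contains a spike that the
# (stand-in) reconstruction fails to reproduce.
sequences = [[1.0, 1.1, 0.9], [1.0, 5.0, 1.0]]
reconstructions = [[1.0, 1.0, 1.0], [1.0, 1.2, 1.0]]
flags = flag_sequences(sequences, reconstructions, threshold=0.5)
print(flags)  # [False, True]
```

A rule of this kind needs only the trained model's output and one threshold per sensor, which keeps it simple to apply across many sensors.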
It should be noted that anomaly identification for the CEC Main Power Meter sensor shown above could potentially be achieved through a simple rule (for example, setting a maximum rate of change, as most events appear to be data spikes). An individual rule for this sensor may even outperform the LSTM-ED for detecting specific types of anomalies: the above graphs show that, while all events are identified, not every data point within an event is flagged. The purpose of using the LSTM-ED is that it provides a general model that should be capable of identifying multiple types of anomalies without needing to set specific rules on an individual sensor basis. This is important for an automated detection system given the wide variety of sensors on campus.
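For comparison, a maximum rate of change rule of the kind mentioned above might look like the following sketch. It was not part of the study; the function name, the readings, and the `max_change` value are hypothetical.

```python
def flag_rate_of_change(values, max_change):
    """Flag each point whose change from the previous point exceeds max_change.

    The first point has no predecessor, so it is never flagged.
    """
    if not values:
        return []
    flags = [False]
    for prev, curr in zip(values, values[1:]):
        flags.append(abs(curr - prev) > max_change)
    return flags


# Hypothetical power meter readings with a single spike at index 2.
readings = [100.0, 101.0, 180.0, 102.0, 101.5]
flags = flag_rate_of_change(readings, max_change=20.0)
print(flags)  # [False, False, True, True, False]
```

Note that the point following the spike is also flagged, because the return to normal is itself a large change. A rule like this needs a suitable `max_change` for each sensor, which is the per-sensor tuning the general model is meant to avoid.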
Dashboard and Notification System
A dashboard and notification system were also implemented with the anomaly detection model in the test InfluxDB environment. InfluxDB provides a built-in dashboard tool for visualizing data stored in the database, and templates can be saved for future use. The template functionality is useful when setting up multiple dashboards and would likely be required by UDL. The dashboard implemented for the project provides a simple display of sensor data with data identified as anomalous highlighted. A user can select various time periods, and the graphs are updated in real time as predictions are made. An example dashboard with five of the CEC sensors used in the study is shown below.
A simple notification system was also set up using built-in InfluxDB functionality. Notifications can be configured through the InfluxDB user interface, although it is understood that they can also be set directly using InfluxDB Flux tasks. This study tested notifications sent to Slack, and message alerts were successfully received when data were predicted as anomalous for a sensor in the test environment.
The capstone project gave MDS students the opportunity to work on a project with many components, from developing a model that could be used in live streaming to building a test environment in Docker for implementing the InfluxDB anomaly detection framework. The study also provided an initial open-source anomaly detection approach with InfluxDB that can be considered by UDL. The approach is general and should be applicable to a variety of sensors, which was a goal of the project.
The project GitHub repository provides additional details on the study including source code, a full report, and a demo walk-through notebook for the InfluxDB test environment.
Studies that can be considered as next steps for the project include:
- Implementing the model online and monitoring real-time performance for a select group of sensors.
- Testing a wider variety of SkySpark sensors and using quantifiable performance measures where labelled data are available.
- Improving the anomaly detection threshold method. The threshold rule used in this study is a maximum error threshold, and it is believed that a more sophisticated rule or algorithm can improve anomaly detection without modification to the LSTM-ED.
- Comparison of the LSTM-ED with additional models.
- Building a more complex dashboard and notification system as required. Grafana provides an alternate option if the InfluxDB built-in tools do not provide sufficient functionality.
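As one possible direction for the threshold improvement listed above, the sketch below derives a per-point error limit from the distribution of training errors and flags a sequence only when too large a share of its points exceed that limit. This is an assumption about what a distribution-based rule could look like, not a method from the study; the function names and the parameters `k` and `max_share` are hypothetical.

```python
import statistics


def fit_error_limit(training_errors, k=3.0):
    """Per-point error limit: mean plus k standard deviations of training errors."""
    return statistics.fmean(training_errors) + k * statistics.pstdev(training_errors)


def flag_by_error_share(sequence_errors, limit, max_share=0.25):
    """Flag a sequence when more than max_share of its points exceed the limit."""
    exceed = sum(e > limit for e in sequence_errors)
    return exceed / len(sequence_errors) > max_share


# Hypothetical reconstruction errors observed on normal training data.
training_errors = [0.1, 0.2, 0.1, 0.15, 0.2, 0.15]
limit = fit_error_limit(training_errors)

# One isolated high error is tolerated; several high errors flag the sequence.
normal = flag_by_error_share([0.1, 0.2, 0.9, 0.1, 0.15], limit)
anomalous = flag_by_error_share([0.9, 1.1, 0.2, 1.0, 0.1], limit)
print(normal, anomalous)  # False True
```

A rule like this changes only the post-processing of reconstruction errors, so it could be evaluated without retraining the LSTM-ED.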
Ideally, the detection system could ultimately provide campus and building managers with real-time or near real-time notifications of potential issues in system operations, reducing operational costs, downtime, and unexpected maintenance.
2021 UBC MDS Capstone Project Team
Nathan holds a B.A.Sc. in Civil Engineering and has over 10 years of experience in water resource engineering, providing analysis and management of data collection programs. He developed an interest in applying machine learning to support analyses typically approached using more traditional methods, and in using interactive applications to better communicate results and uncertainties. After several years of informal learning using R and Python, Nathan is completing the Master of Data Science degree.
Ryan had his first taste of statistical programming working on Monte Carlo method problems during his undergraduate degree. He graduated with a B.Sc. in Physics and worked as a Lab Technician in the engineering department of a fireplace manufacturer for 4 years. Looking for a field that combined his interests in programming and statistics, he came across UBC’s Master of Data Science program, which he is currently finishing.
Mitch holds a B.Sc. in applied mathematics. He taught himself software engineering and some data science, then worked in ad-tech for a couple of years. He enjoys building machine learning prototypes as well as the implementation and supporting framework around them. Mitch returned to school for his Master of Data Science to fill gaps in his knowledge and to formalize his self-teaching.