A group of students from the 2019/2020 cohort of UBC’s Master of Data Science (MDS) program has completed a capstone project in partnership with the UBC Urban Data Lab (UDL), applying machine learning techniques to data from UDL’s InfluxDB instance of live-streaming building energy datasets. The focus of the 9-week project was to build a semi-supervised classification model capable of assigning a secondary energy end-use category to every energy usage sensor (i.e., electrical meters and submeters) in the Pharmaceutical Sciences Building on UBC’s Vancouver campus. The resulting labels can then be used to sum energy usage by end-use category for a given time period and display the totals in a web-based visualization dashboard.
About the Data
The data used for this project came from UDL’s InfluxDB SKYSPARK instance (more details here) and was queried using the InfluxDB-Python package and the InfluxQL language. The project scope was limited to the Pharmaceutical Sciences Building (Pharmacy Building) because this modern building has a significant number of sensors that are well documented with descriptive tags.
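As a rough illustration of that kind of query, the sketch below builds an InfluxQL statement for one building and one time window. The measurement name `sensor_readings` and the tag key `building` are hypothetical placeholders, not UDL's actual schema, and the connection details in the comments are assumptions as well.

```python
def build_query(measurement, building, start, end):
    """Build an InfluxQL query for one building and time window.

    The measurement and tag names passed in are hypothetical; the real
    UDL schema may differ. Times are RFC 3339 strings.
    """
    return (
        f'SELECT * FROM "{measurement}" '
        f"WHERE \"building\" = '{building}' "
        f"AND time >= '{start}' AND time < '{end}'"
    )


def fetch_points(client, query):
    """Run the query and return the rows as a list of dicts.

    `client` is expected to behave like influxdb.InfluxDBClient, whose
    query() method returns a ResultSet with a get_points() generator.
    """
    return list(client.query(query).get_points())


# Example usage against a real server (host and credentials are assumptions):
#   from influxdb import InfluxDBClient
#   client = InfluxDBClient(host="localhost", port=8086, username="user",
#                           password="secret", database="SKYSPARK")
#   rows = fetch_points(client, query)
query = build_query(
    "sensor_readings", "Pharmacy",
    "2020-05-01T00:00:00Z", "2020-05-02T00:00:00Z",
)
```

Keeping the query construction separate from the client call makes it easy to inspect the statement before sending it to the database.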
Two Types of Sensors
The only sensors requiring end-use labels were those that measure energy consumption (such as an electrical meter at the inlet of the building, or a sub-meter for energy fed to a specific system, floor, or piece of equipment). These sensors of interest are referred to in this project as Energy Consumption (EC) sensors. All other sensors in the building (such as temperature sensors, room occupancy sensors, fan speed sensors, ventilation system sash opening sensors, etc.) are referred to as Non-Energy Consumption (NC) sensors. NC sensors were not labelled themselves, but were used as inputs to the model to help predict the end-uses of the EC sensors. There were just over 200 EC sensors and 7,800 NC sensors present in the Pharmacy Building’s data.
Energy Consumption (EC) Sensors
Non-Energy Consumption (NC) Sensors
About the Model
The end-use classification model was written in Python and makes use of common packages for data wrangling, analysis, and modelling, such as pandas and scikit-learn. The general concept is shown in the flowchart below.
The basic idea is to first condense the ~7,800 NC sensors in the Pharmacy Building down to a more manageable number of groups using clustering. Next, the model extracts information on how each cluster of NC sensors behaves relative to each EC sensor. Finally, the numbers representing that relationship between NC and EC sensors are passed into a supervised classification model, along with some other information about the EC sensors and a “training” set of EC sensors that have already been manually labelled with end-use types. Once the model has been trained on this set of EC sensors, it can be used to predict end-uses for sensors without known end-uses, and the results can be stored back into UDL’s InfluxDB instance.
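The three steps can be sketched on synthetic data as follows. This is a minimal illustration, not the team's implementation: the use of k-means, Pearson correlation as the NC-to-EC relationship measure, a random forest classifier, the end-use names, and all array sizes are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical sizes, far smaller than the real building).
n_steps, n_nc, n_ec, n_clusters = 96, 60, 20, 4
nc = rng.normal(size=(n_nc, n_steps))  # one row per NC sensor, one column per time step
ec = rng.normal(size=(n_ec, n_steps))  # one row per EC sensor

# Step 1: condense the NC sensors into a few clusters of similar behaviour.
# StandardScaler works column-wise, so transpose to standardise each sensor's series.
nc_scaled = StandardScaler().fit_transform(nc.T).T
cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(nc_scaled)
cluster_series = np.vstack(
    [nc_scaled[cluster_ids == k].mean(axis=0) for k in np.unique(cluster_ids)]
)


# Step 2: describe how each EC sensor relates to each NC cluster.
# Assumption: Pearson correlation with the cluster's mean series.
def correlation_features(ec_series, clusters):
    ec_c = ec_series - ec_series.mean(axis=1, keepdims=True)
    cl_c = clusters - clusters.mean(axis=1, keepdims=True)
    numerator = ec_c @ cl_c.T
    denominator = (
        np.linalg.norm(ec_c, axis=1)[:, None] * np.linalg.norm(cl_c, axis=1)[None, :]
    )
    return numerator / denominator


features = correlation_features(ec, cluster_series)

# Step 3: train a classifier on the manually labelled EC sensors,
# then predict end-uses for the rest. End-use names are hypothetical.
end_uses = np.array(["lighting", "hvac", "plug_load"])
n_labelled = 12
known_labels = rng.choice(end_uses, size=n_labelled)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[:n_labelled], known_labels)
predicted = clf.predict(features[n_labelled:])
```

Because the synthetic series are random noise, the predictions here carry no meaning; the point is only the shape of the pipeline, in which every EC sensor ends up with one feature per NC cluster.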
Please see the project’s GitHub repository for a full report and more details on the model.
Results and Visualization Dashboard
The resulting end-use labels for all EC sensors in the Pharmacy Building were:
Table 1: Number of Sensors per End-Use (count and % of sensors)
Each EC sensor’s ID and end-use category label was stored back in the UDL InfluxDB at the end of the modelling process. This allows the label to be accessed along with the energy usage readings for that sensor. Using the open-source visualization software Grafana and the Flux language, the team created a pie chart as a proof of concept for the overall process: querying data from the database, classifying the end-use type of each EC sensor, storing the results, and finally accessing and visualizing the stored results.
At this time, the proof-of-concept dashboard is not publicly accessible. As seen in the screenshot above, the pie chart shows the total energy used within the Pharmacy Building, grouped by end-use category, for the given date range.
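The aggregation behind such a chart can be illustrated in a few lines of pandas. The team's dashboard used Grafana and Flux; the sketch below is only an equivalent calculation on made-up data, and the column names, sensor IDs, end-use names, and readings are all hypothetical.

```python
import pandas as pd

# Hypothetical end-use labels, one row per EC sensor.
labels = pd.DataFrame({
    "sensor_id": ["ec_001", "ec_002", "ec_003"],
    "end_use": ["lighting", "hvac", "lighting"],
})

# Hypothetical energy readings in kWh.
readings = pd.DataFrame({
    "sensor_id": ["ec_001", "ec_001", "ec_002", "ec_003", "ec_003"],
    "time": pd.to_datetime([
        "2020-05-01 00:00", "2020-05-01 01:00", "2020-05-01 00:00",
        "2020-05-01 00:00", "2020-05-02 00:00",
    ]),
    "energy_kwh": [1.5, 2.0, 4.0, 0.5, 1.0],
})


def energy_by_end_use(readings, labels, start, end):
    """Sum energy per end-use category for readings with start <= time < end."""
    in_range = readings[(readings["time"] >= start) & (readings["time"] < end)]
    merged = in_range.merge(labels, on="sensor_id", how="inner")
    return merged.groupby("end_use")["energy_kwh"].sum()


totals = energy_by_end_use(
    readings, labels, pd.Timestamp("2020-05-01"), pd.Timestamp("2020-05-02")
)
shares = totals / totals.sum()  # fraction of total energy, as a pie chart would show
```

Joining on the sensor ID is what makes the stored label usable alongside the readings; the reading on 2020-05-02 falls outside the window and is excluded from the totals.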
This capstone project was one of the first projects undertaken using UDL’s InfluxDB instance of live-streaming building energy data. It gave the MDS students an opportunity to work with real-world time-series data and to apply their recently acquired data science skills to design and test a model that classifies energy usage data in the database into end-use types. Additionally, feedback from the MDS capstone team on the capabilities and limitations of the data and database structure has allowed UDL to implement several performance and functionality improvements to the system.
There are many opportunities to improve the accuracy, performance and scalability of the model. Three main focuses for future work are:
- Adjust the model to work with the updated database structure. This would improve accuracy, since the unique sensor IDs it produces give a cleaner dataset, and improve performance, since aggregation could be done within the query.
- Sort out the hierarchy of electrical meters in the Pharmacy Building. This will help make the model scalable for other UBC buildings and make the dashboard more informative.
- Optimize the code. For example, make the feature selection part of the model dynamic so that it can handle data fed in from multiple buildings.
2020 UBC MDS Capstone Project Team
B.S. in Statistics from the University of California, Santa Barbara and currently a Master of Data Science student. With over 4 years of experience in data analytics providing e-commerce businesses with technical solutions, such as Tableau and Domo dashboards. Pursuing data science to assist stakeholders with making more effective business decisions through predictive analytics.
B.Eng in Mechanical Engineering and currently a Master of Data Science student at the University of British Columbia’s Okanagan Campus. After almost five years of working as a construction engineer and seeing the improvements in efficiency achieved as he learned more about Excel, Connor decided to return to school in order to learn more appropriate methods of analyzing data.
B.Eng in Civil Engineering from the University of Victoria with 16 months of work experience in the industry. With an interest in data analytics and coding, Claudia was drawn to the Data Science Program at UBC. Claudia was surprised with her interest in the Machine Learning aspect of the program and hopes to continue learning more in this field.
B.Sc. in Mineral Engineering and currently a student enrolled in University of British Columbia Okanagan’s Master of Data Science program. After over a decade of working as a mine engineer in a variety of roles in both underground potash and open-pit copper, Alex decided to kick off the steel-toed boots and return to school to plumb the depths of his ignorance about statistics, probability, and data analysis tools that don’t start with “Ex” and end with “cel”. He’s still looking for the bottom.