Paper released: synthetic building operation dataset

This new paper presents a synthetic building operation dataset that includes operating conditions for HVAC, lighting, and miscellaneous electric load (MEL) systems, along with occupant counts, environmental parameters, and end-use and whole-building energy consumption, all at 10-minute intervals.

The building sector accounts for over 30% of the final energy consumption and emits about one-third of the greenhouse gas (GHG) emissions worldwide according to several studies. Residential and commercial buildings consume about 60% of the electricity globally.

Improving building energy efficiency is therefore essential to meeting energy savings and carbon emission reduction goals. As electrification progresses, there is an ongoing trend to replace traditional fossil fuels with renewable power generation. The European Union planned for renewable generation to reach at least 20% of energy demand by 2020, and 32% by 2030.

In the United States, the renewable target is to reach 14% by 2025 and 30% by 2030. The growing penetration of renewables requires buildings to be flexible so that supply and demand can be balanced. In this context, Grid-Interactive Efficient Buildings (GEB) have become a hot research topic in recent years. Improving buildings’ energy efficiency and flexibility while maintaining good quality of building services and indoor environmental quality is of core interest in the building science domain.

Building energy models provide critical support to researchers aiming for the aforementioned goals. In general, the models can be classified into

  1. physics-based (white-box) models, which simulate the building physics with detailed building and system characteristics and operation schedules;
  2. reduced order (grey-box) models, which represent building physics with simplified equations identified with building operation data or by human expertise;
  3. data-driven (black-box) models, which utilise contextual, environmental, or energy features with statistical or machine learning techniques to predict future energy and/or environmental trends in buildings.

Those models have been used in different phases of building lifecycles. For example, physics-based whole building energy simulations have been widely used in the building design phase to assist building energy code compliance. Predictive building controls using physics-based models, grey-box models, and data-driven models are proposed and implemented during the operation phase. Those models are also widely used for fault detection and diagnostics in the operation phase.
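To make the grey-box category in the list above more concrete, here is a minimal sketch of a single-zone 1R1C (one resistance, one capacitance) thermal model whose two lumped coefficients are identified from time-series data by least squares. It is an illustration only and is not taken from the paper: the function names, the parameter values, the outdoor temperature profile, and the heating schedule are all assumptions, and the "measured" data are generated by the same model without noise.

```python
import math

def simulate_1r1c(t_in0, t_out, q_heat, a, b):
    """Simulate T[k+1] = T[k] + a*(T_out[k] - T[k]) + b*Q[k].

    a = dt/(R*C) and b = dt/C are the lumped grey-box coefficients.
    """
    temps = [t_in0]
    for t_o, q in zip(t_out, q_heat):
        t = temps[-1]
        temps.append(t + a * (t_o - t) + b * q)
    return temps

def fit_1r1c(t_in, t_out, q_heat):
    """Estimate (a, b) by ordinary least squares via the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(len(t_in) - 1):
        x1 = t_out[k] - t_in[k]      # indoor/outdoor temperature difference
        x2 = q_heat[k]               # heat input during the step
        y = t_in[k + 1] - t_in[k]    # observed indoor temperature change
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot identify a and b")
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b

# Hypothetical parameters (assumptions, not values from the paper).
DT = 600.0        # time step in seconds (10-minute interval)
R = 0.005         # thermal resistance, K/W
C = 1.0e7         # thermal capacitance, J/K
A_TRUE = DT / (R * C)
B_TRUE = DT / C

# Two days of synthetic inputs at 10-minute resolution (144 steps per day).
steps = 288
t_out = [10.0 + 5.0 * math.sin(2.0 * math.pi * k / 144.0) for k in range(steps)]
q_heat = [2000.0 if (k // 36) % 2 == 0 else 0.0 for k in range(steps)]

# Generate noise-free "measurements", then recover the coefficients from them.
t_in = simulate_1r1c(20.0, t_out, q_heat, A_TRUE, B_TRUE)
a_hat, b_hat = fit_1r1c(t_in, t_out, q_heat)
print(a_hat, b_hat)
```

Because the synthetic data contain no noise, the fit recovers the coefficients used to generate them. With real operation data the estimates would depend on measurement quality and on how well a single lumped zone represents the building.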

Regardless of the modelling approach, a comprehensive building operation dataset is valuable. For physics-based models, system-level or end-use-level information and time-series data can help improve model assumptions and calibration. For grey-box and data-driven approaches, such a comprehensive dataset is critical for training reliable models.

To date, there have been numerous efforts to either collect data from measurements or synthesise data with simulations. However, each dataset has its strengths and limitations. For instance, the Building Data Genome Project 2 dataset is a collection of whole-building electrical, heating and cooling, water, steam, and solar meter data, plus on-site weather data, for over 1 600 non-residential buildings. However, it doesn’t provide more granular information about the systems and thermal zones.

CU-BEMS provides system-level sub-metering of electricity consumption and zone-level indoor environmental measurements. But it doesn’t contain system-level operation data, and its data spans only a little over a year. Other common limitations of existing datasets include the lack of clear metadata describing the building systems and meter structures, and the lack of occupancy information at high spatial and temporal resolutions.

The full paper, presented by Berkeley Lab researchers, is available here.