Abstract

Commercial buildings generally have large thermal inertia and thus can provide services to power grids (e.g., demand response (DR)) by modulating their heating, ventilation, and air conditioning (HVAC) systems. Shifting consumption on timescales of minutes to an hour can be accomplished through temperature setpoint adjustments that affect HVAC fan consumption. Estimating the counterfactual baseline power consumption of HVAC fans is challenging but is critical for assessing the capacity and participation of DR from HVAC fans in grid-interactive efficient buildings (GEBs). DR baseline methods have been developed for whole-building power profiles. This study evaluates those methods on total HVAC fan power profiles, which have different characteristics than whole-building power profiles. Specifically, we assess averaging methods (e.g., Y-day average, HighXofY, and MidXofY, with and without additive adjustments), which are the most commonly used in practice, and a least squares-based linear interpolation method recently developed for baselining HVAC fan power. We use empirical submetering data from HVAC fans in three University of Michigan buildings in our assessment. We find that the linear interpolation method has a low bias and by far the highest accuracy, indicating that it is potentially the most effective existing baseline method for quantifying the effects of short-term load shifting of HVAC fans. Overall, our results provide new insights on the applicability of existing DR baseline methods to baselining fan power and enable the more widespread contribution of GEBs to DR and other grid services.

1 Introduction

The U.S. Federal Energy Regulatory Commission (FERC) defines demand response (DR) as “changes in electric usage by demand-side resources from their normal consumption patterns in response to changes in the price of electricity over time or to incentive payments designed to induce lower electricity use at times of high wholesale market prices or when system reliability is jeopardized” [1]. DR is one of the most flexible and effective solutions to reduce power system investment and operation costs and to displace generation and network reinforcement [2]. It is also capable of mitigating the impacts of renewable energy fluctuations and enhancing system reliability during periods of high demand [3].

Commercial buildings account for roughly 20% of the energy consumed in the United States [4]. They are well suited for DR as they generally have high thermal inertia that can be utilized as an energy reservoir for short periods, without negatively impacting the comfort of occupants [5]. Within a building, some equipment is more responsive and has a higher DR potential, while other equipment is less responsive or even nonresponsive. Heating, ventilation, and air conditioning (HVAC) systems represent the bulk of power consumption in most commercial buildings. They are also becoming increasingly controllable with relatively sophisticated control and communications architectures, especially in grid-interactive efficient buildings (GEBs) [6,7]. Thus, HVAC systems are one of the largest DR resources [8]. Among many DR strategies that modulate HVAC power consumption, directly or indirectly controlling HVAC supply and return fans is a key source of flexibility enabling the provision of grid services [7,9]. For example, Ref. [10] introduces a feed-forward architecture to control the fans in commercial building HVAC systems to provide frequency regulation to the grid.

The objective of this article is to evaluate the performance (e.g., accuracy and bias) of existing DR baseline estimation methods in quantifying the effects of short-term (i.e., timescales of minutes to an hour) load shifting of commercial building HVAC fan power using empirical data from real buildings. We apply the methods to estimate the baseline power on days without DR events and then compare the estimated baseline with the actual measurements, i.e., the true baseline on days without DR. The evaluation is focused on simple, interpretable, and DR participant-friendly baseline methods that are commonly used or satisfy the practical requirements of electric utilities and independent system operators (ISOs) [11–13].

DR baseline estimation seeks to estimate the counterfactual power profile that would have occurred without DR. Accurate baseline estimation methods are critical for assessing the performance of both individual GEBs and DR programs. Baselines are also needed for determining compensation to GEBs participating in DR programs or electricity markets, providing utilities and ISOs with a prediction of how much GEB flexibility was or is expected, and for a variety of grid operational and planning problems [14,15]. It is challenging to measure or calculate what would have occurred without DR and thus, fundamentally, baselines are imperfect [16]. In particular, for smaller devices and more granular end-uses with irregular or unpredictable power consumption, establishing a robust and accurate baseline can be difficult [17–19]. When a load is dependent on consumer behavior, it is also typically more difficult to establish a baseline [20].

We note that baseline estimation is different from electric load forecasting. The latter forecasts what the load will be in the future. In contrast, DR baseline estimation predicts what the load would have been without DR in the past or future, where prediction of the past can leverage a posteriori data. Similar building models could be used for both tasks; however, the choice of model/method to use is always a trade-off between accuracy and simplicity, and DR applications tend to favor simplicity. Specifically, DR baseline methods should also be simple enough for all stakeholders (including electricity customers participating in DR programs, referred to as DR participants) to understand, calculate, and implement [21–23]. Therefore, in practice, much simpler methods are used for DR baseline estimation than for load forecasting.

A variety of DR baseline methods have been proposed, generally based on whole-building electric load (power) profiles. Those baseline methods can be classified into four categories: averaging methods, regression methods, control group methods, and machine learning methods. Among them, averaging methods are the most commonly used by electric utilities and ISOs [12,21,22,24]. Averaging methods use the average load of several days selected from recent days without DR events to estimate the baseline [25,26]. Averaging methods typically also incorporate a multiplicative or additive adjustment [25]. Their application and performance highly depend on the availability of recent days without DR events, which can be very limited if the DR resource is frequently actuated to provide DR services.

It may be possible to establish more accurate DR baseline estimates if, rather than using whole-building electric load data, we use sub-metered load data from the equipment or devices providing DR [14]. For instance, for short-term load shifting via room temperature setpoint control, we expect a response primarily from the supply and return fans and secondarily from the chiller(s) [6,10]. Not only does submetering have the potential to improve baselining to provide more accurate DR performance estimation and grid service delivery verification but it also allows us to attain a more granular understanding of and insights into the impact of DR actions [14].

In this article, we evaluate the ability of existing DR baseline methods to estimate total HVAC fan power baselines. Since these methods were designed for baselining whole-building electric load profiles, their applicability for baselining the total HVAC fan power is unknown. Table 1 lists the baseline methods selected for the evaluation. In addition to averaging methods, we investigate a least squares-based linear interpolation method recently developed for baselining HVAC fan power data [6]. All of these methods are commonly used or are adequately simple and interpretable to be used by utilities and ISOs. More sophisticated methods (e.g., regression methods, control group methods, and machine learning methods) are not included in the evaluation, as they are typically not used in practice due to their much higher data requirements, excessive complexity, lower interpretability, and only marginal gains in accuracy [14].

Table 1

Baseline methods evaluated in this work

| Method type | Method examples | Examples of application in practice |
|---|---|---|
| Y-day average | 5-day average | ISO New England [11] |
| | 10-day average | California ISO [12] |
| HighXofY | High4of5 | PJM Interconnection [13] |
| | High5of10 | New York ISO [23] |
| MidXofY | Mid4of6 | Electric Reliability Council of Texas [13,21] |
| LowXofY | Low4of5 | Unknown (proposed in Ref. [22], used in Refs. [24,27,28]) |
| | Low5of10 | Unknown (proposed in Ref. [22], used in Refs. [28,29]) |
| NearestXofY | Nearest3of6 | Unknown (proposed in Ref. [30], used in Ref. [31]) |
| | Nearest5of10 | Unknown (proposed in Ref. [30]) |
| Linear interpolation | Linear interpolation | Unknown (proposed in Ref. [6], used in Refs. [30–34]) |

The contributions of this article are threefold: (1) We evaluate how existing baseline methods developed for baselining whole-building power consumption translate to baselining HVAC fan power consumption; (2) we apply selected baseline methods to empirical data from three University of Michigan buildings to quantitatively assess their performance; and (3) we find that the linear interpolation method has a low bias and by far the highest accuracy among the methods tested, making it potentially the most effective existing DR baseline method for quantifying the effects of short-term load shifting of HVAC fans. Our evaluation serves as one of the first systematic assessments of the applicability of existing DR baseline methods in baselining HVAC fan power. Note that compared with our technical report [30], this study contributes novel results based on a refined selection of evaluation metrics, improved and more practical implementation of adjustments for averaging methods, and new analyses and discussions. The results inform better baselining and in turn can enhance the implementation, financial settlement, and benefits realization of HVAC-based DR from GEBs.

The remainder of this article is organized as follows. Section 2 describes how HVAC fans can be used for DR. Section 3 introduces the baseline methods evaluated in this study. Section 4 introduces the evaluation methodology. Section 5 describes the data used in our evaluation. Section 6 presents the numerical results of the evaluation. Section 7 concludes this article and discusses the future work.

2 Demand Response From HVAC Fans

Modulating the power consumption of HVAC fans in GEBs is a key source of flexibility for grid services [7,9,10]. HVAC fans can respond quickly, providing high-value fast DR, which can help accommodate more renewable energy sources in the grid. Figure 1 shows an example of our recent experiments using temperature setpoint control to attain short-term load shifting of HVAC fans [32,33]. The morning experiment is called an up-down test. During the response window, room temperature setpoints are decreased below nominal values for 30 min and then increased symmetrically above nominal values for 30 min, causing HVAC fan power to go up and then go down. The fans return to normal operation after a settling window. The afternoon experiment is a down-up test with opposite setpoint changes. In this building, we have sub-metered all supply and return fans in the HVAC system. As shown in Fig. 1, the response is clearly identifiable from the fan power data. In contrast, it can be far less obvious with only whole-building electric load data.

Fig. 1
Total HVAC fan power profile on a DR experiment day (Sep. 26, 2017) at the Rackham Building, University of Michigan

To assess the impact of the load shifting strategy, fan power baselines, i.e., counterfactual fan power profiles that would have occurred without DR events, need to be estimated. In Fig. 1, we have included a linear baseline created through linear interpolation (see Sec. 3.8 for details), which does not aim to accurately capture the unknown time-series but aims to capture average changes in fan power consumption during the response and settling windows (specifically, here, we are interested in the average change in power in the first 30 min of the response window, second 30 min of the response window, and in the 60-min settling window). Figure 1 also helps us to highlight how baseline estimation is different from electric load forecasting. Load forecasting is based on predictions of explanatory variables and aims to produce accurate time-series forecasts of load profiles. In contrast, baseline estimation methods used to predict the past can leverage a posteriori knowledge, e.g., here we use data from before the response window and after the settling window to develop our linear interpolation. Furthermore, to assess the capacity, participation, performance, and financial rewards of GEBs providing grid services, which are the primary uses of DR baseline models [9,14], in most cases, we do not necessarily need accurate time-series predictions (i.e., to capture detailed load dynamics), but rather we need accurate predictions of average load over time windows.

As mentioned in Sec. 1, in this article, we consider only simple and interpretable DR baseline methods. Here, we provide more justification for why we have excluded alternative methods. Although regression methods [35] are used by some entities to baseline whole-building electric load, in our previous study, we found that HVAC fan power and outdoor temperature (the most commonly used regressor in estimating whole-building electric load baselines) have a low correlation (0.28 on average) [30]. Therefore, the applicability of regression methods for baselining HVAC fan power is greatly limited. Control group and machine learning methods are typically not used in practice as they can be complicated, less interpretable, and/or require a large volume of data. Control group methods estimate the baseline using load data of nonresponsive buildings with the most similar load patterns to DR participants [36]. However, it is challenging to match or cluster similar buildings since there is generally a large amount of variability in HVAC equipment. Machine learning methods find the relationship between the load and its related factors by training black-box models (e.g., neural networks [37]), which are difficult to interpret and explain to DR participants. Moreover, these more sophisticated methods only attain marginal gains in accuracy according to some ISOs [14]. Physics-based HVAC fan power models could also be used to generate DR baseline estimates, but they require significant data and effort for model calibration. Such models are generally more suitable for building analysis and load forecasting than DR baseline estimation.

To the best of our knowledge, there are only three methods in the literature specifically designed to baseline HVAC fan power. One is the linear interpolation method proposed in Ref. [6], illustrated in Fig. 1, and included in our evaluation. Another is the tensor completion based method developed in Ref. [31]. It estimates baselines by finding dominant fan power patterns hidden in high-dimensional data and is classified as a machine learning method. We do not include it in our evaluation for the reasons mentioned earlier. The third is the signal bandwidth separation method proposed in Ref. [10]. It estimates the fan power baseline using a low-pass filter on load data from DR events. This method is applicable to settings in which DR signals vary much faster than the baseline load. This method is not evaluated here either, because our experimental data correspond to load shifting on timescales of minutes to an hour, i.e., timescales that overlap with the timescales of baseline load variation. In contrast, Ref. [10] uses it for baselining fan power in GEBs participating in ancillary services by following DR signals that vary every 2–10 s.

We also note that for some DR programs ex-post baselines are not necessary because buildings are asked to self-schedule their baseline and deliver services around that baseline. However, this makes the job of the building harder as it not only needs to perform DR but also compensate for baseline forecast error. Self-scheduled baselines are less common in traditional DR programs than in emerging DR programs like loads providing frequency regulation.

3 Baseline Methods

Before describing the baseline methods, we define DR days as the days when DR events occur and other days as baseline days. Note that weekdays and weekends normally have different load patterns. Our current data set only includes a limited number of weekends. Therefore, we only consider weekdays in this work. Nevertheless, the methods for weekday baseline estimation also apply to weekends, and the same process of baseline method performance evaluation could be conducted on load data from weekends.

In this section, we first present the generic form of averaging baseline methods and then explain how it varies in different averaging methods. After that, we introduce an additive adjustment method that can be applied to the averaging methods. Finally, we present the linear interpolation baseline method. The nomenclature used in this section is partly based on Ref. [22], but with modifications for clarity.

3.1 Generic Form of Averaging Methods.

Let i be the index for GEBs and d be the index for days. According to the sampling time of the power data, we divide a day into a set of time-steps $T = \{1, 2, \ldots, |T|\}$, and let t be the index for time-steps. We define the actual load and estimated baseline load (i.e., the total HVAC fan power) of GEB i on day d at time-step t as $p_i(d,t)$ and $\hat{p}_i(d,t)$, respectively.

Averaging methods estimate the baseline using the data from some selected recent baseline days preceding the DR day. Let D(d) be the set of baseline days selected by an averaging method to baseline the DR event on day d. Let d′ be the index of baseline days in the set D(d). The baseline of GEB i on day d at time-step t, i.e., $\hat{p}_i(d,t)$, is then obtained by taking the mean of the same GEB’s load at time-step t among the selected days:

$$\hat{p}_i(d,t) = \frac{1}{|D(d)|} \sum_{d' \in D(d)} p_i(d',t) \qquad (1)$$
Although different averaging methods select the set of baseline days D(d) in different manners, Eq. (1) is the generic form of their baseline estimates. Next, we describe how the set D(d) varies for different averaging baseline methods.
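To make Eq. (1) concrete, the following is a minimal Python sketch. It assumes each day's load is stored as a plain list of per-time-step fan power values (e.g., 1-min readings) and that the set D(d) has already been chosen by one of the selection rules below; the function name `averaging_baseline` is hypothetical and not taken from the study's code.

```python
def averaging_baseline(selected_profiles):
    """Eq. (1): time-step-wise mean of the selected baseline days' load profiles.

    selected_profiles[k][t] is p_i(d', t) for the k-th day d' in D(d).
    """
    if not selected_profiles:
        raise ValueError("D(d) must contain at least one baseline day")
    n_steps = len(selected_profiles[0])
    if any(len(profile) != n_steps for profile in selected_profiles):
        raise ValueError("all profiles must have the same number of time-steps")
    n_days = len(selected_profiles)
    # zip(*...) groups the values of all selected days at the same time-step t.
    return [sum(values) / n_days for values in zip(*selected_profiles)]


# Toy example: two baseline days with three time-steps each.
print(averaging_baseline([[10.0, 12.0, 14.0], [20.0, 22.0, 24.0]]))  # [15.0, 17.0, 19.0]
```

All averaging methods in this section share this function; they differ only in which profiles are passed in.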

3.2 Y-Day Average Method.

For the Y-day average method [25], the set D(d) consists of the Y most recent baseline days preceding day d with the same day type (e.g., day of week, or weekend/weekday) as day d. We denote this set of days by R(Y,d), i.e., D(d)=R(Y,d). In this study, we use weekday/weekend day types and evaluate the 5-day average method (i.e., Y = 5) and the 10-day average method (i.e., Y = 10), which are used by ISO New England and California ISO, respectively [11,12].
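The selection of R(Y,d) can be sketched as follows. This assumes baseline days are given as `datetime.date` objects and that the day type is only weekday versus weekend, as in this study; holidays are not treated specially here, which a practical implementation would likely need to handle. The helper name `recent_baseline_days` and the example baseline days are hypothetical.

```python
from datetime import date


def recent_baseline_days(baseline_days, dr_day, y):
    """Return R(Y, d): the Y most recent baseline days before dr_day of the same day type.

    Day type here is weekday vs. weekend only (holidays are not treated specially).
    """
    def day_type(day):
        # date.weekday(): Monday == 0, ..., Saturday == 5, Sunday == 6
        return "weekend" if day.weekday() >= 5 else "weekday"

    candidates = sorted(day for day in baseline_days
                        if day < dr_day and day_type(day) == day_type(dr_day))
    if len(candidates) < y:
        raise ValueError(f"only {len(candidates)} eligible baseline days, need {y}")
    return candidates[-y:]  # the Y latest eligible days, oldest first


# Hypothetical baseline days Sep. 11 to 25, 2017; DR day Tuesday, Sep. 26, 2017.
days = [date(2017, 9, k) for k in range(11, 26)]
print([day.isoformat() for day in recent_baseline_days(days, date(2017, 9, 26), 5)])
# ['2017-09-19', '2017-09-20', '2017-09-21', '2017-09-22', '2017-09-25']
```

The weekend days Sep. 16, 17, 23, and 24 are skipped, so the 5-day window reaches back to Sep. 19.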

3.3 HighXofY Average Method.

The HighXofY average method averages the load of the X days that have the highest daily electricity consumption among the Y most recent baseline days of the same day type as the DR day. We denote this set of days by H(X,Y,d), i.e., D(d)=H(X,Y,d). Let d″ be the index of baseline days that are in the set R(Y,d) but not in the set H(X,Y,d). The set H(X,Y,d) is determined by the following conditions:

  • $H(X,Y,d) \subseteq R(Y,d)$;

  • $|H(X,Y,d)| = X$; and

  • $p_i(d') \ge p_i(d'')$ for any $d' \in H(X,Y,d)$ and $d'' \in R(Y,d) \setminus H(X,Y,d)$, where $p_i(d)$ is the daily total load of GEB i on day d, specifically, $p_i(d) = \sum_{t \in T} p_i(d,t)$.

The first condition requires that the method select days from the set R(Y,d), i.e., the Y most recent baseline days preceding day d with the same day type as day d. The second condition requires that X days are selected. The third condition requires that, among the days in R(Y,d), any selected day has a daily total load greater than or equal to that of any unselected day. In this article, we evaluate the High4of5 baseline method and the High5of10 baseline method used by PJM Interconnection and New York ISO, respectively [13,23].
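A minimal sketch of the HighXofY selection, assuming the Y profiles of R(Y,d) are passed in as equal-length lists and that the daily total $p_i(d')$ is the plain sum of the per-time-step values. The helper `high_x_of_y` is hypothetical; it returns positions within R(Y,d), and ties in daily totals are resolved by Python's stable sort, which is an assumption the text does not specify.

```python
def high_x_of_y(recent_profiles, x):
    """Return the positions in R(Y, d) of the X days with the highest daily total load.

    recent_profiles[k] is the load profile of the k-th day in R(Y, d); the daily
    total p_i(d') is taken as the sum over all time-steps.
    """
    if not 0 < x <= len(recent_profiles):
        raise ValueError("need 0 < X <= Y")
    # Rank the Y days by daily total load, highest first (stable for ties).
    ranked = sorted(range(len(recent_profiles)),
                    key=lambda k: sum(recent_profiles[k]), reverse=True)
    return sorted(ranked[:x])  # H(X, Y, d), returned in input order


# Five toy profiles with daily totals 30, 10, 50, 20, 40: High4of5 drops only day 1.
profiles = [[10, 20], [5, 5], [25, 25], [10, 10], [20, 20]]
print(high_x_of_y(profiles, 4))  # [0, 2, 3, 4]
```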

Note that this method is useful for baselining peak days with high electricity consumption. We do not expect it to work well here, since we use it to baseline both peak days and off-peak days, which is required for assessing DR used for grid services that may be needed at any time. We expect this method to produce positively biased baselines and so we explore the impact of an additive adjustment, described in Sec. 3.7, which can reduce bias.

3.4 LowXofY Average Method.

The LowXofY average method averages the load of the X days that have the lowest daily electricity consumption among the Y most recent baseline days of the same day type as the DR day. We denote this set of days by L(X,Y,d), i.e., D(d)=L(X,Y,d), which is determined by the following conditions:

  • $L(X,Y,d) \subseteq R(Y,d)$;

  • $|L(X,Y,d)| = X$; and

  • $p_i(d') \le p_i(d'')$ for any $d' \in L(X,Y,d)$ and $d'' \in R(Y,d) \setminus L(X,Y,d)$.

Similar to the HighXofY average method, the first and second conditions require that X days are selected from the set R(Y,d). The third condition requires that, among the days in R(Y,d), any selected day has a daily total load less than or equal to that of any unselected day. In this article, we evaluate the Low4of5 and Low5of10 baseline methods, which are proposed in Ref. [22] and used in many articles such as Refs. [24,27–29].

This method is useful for baselining days with low electricity consumption and, again, we do not expect it to work well here. Reference [22] reports that it produces negatively biased baselines, but that the baselines can also have high accuracy. Again, we explore the impact of an additive adjustment, described in Sec. 3.7, to reduce bias.
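The LowXofY baseline combines the selection rule with Eq. (1). A minimal sketch under the same assumptions as before (profiles as equal-length lists, daily total as the plain sum); the function name `low_x_of_y_baseline` is hypothetical.

```python
def low_x_of_y_baseline(recent_profiles, x):
    """LowXofY sketch: average (Eq. (1)) the X days of R(Y, d) with the lowest daily totals."""
    if not 0 < x <= len(recent_profiles):
        raise ValueError("need 0 < X <= Y")
    # L(X, Y, d): sort by daily total load p_i(d') in ascending order and keep the first X.
    selected = sorted(recent_profiles, key=sum)[:x]
    # Eq. (1): time-step-wise mean over the selected days.
    return [sum(values) / x for values in zip(*selected)]


# Daily totals 30, 10, 50, 20, 40: Low4of5 drops the day with total 50.
profiles = [[10, 20], [5, 5], [25, 25], [10, 10], [20, 20]]
print(low_x_of_y_baseline(profiles, 4))  # [11.25, 13.75]
```

Because the highest-consumption day is always discarded, the resulting baseline tends to sit below a typical day, which is consistent with the negative bias reported in Ref. [22].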

3.5 MidXofY Average Method.

The MidXofY average method is used by the Electric Reliability Council of Texas [21]. It averages the load of the X days that have middling levels of daily electricity consumption among the Y most recent baseline days of the same day type as the DR day. We denote this set of days by M(X,Y,d), i.e., D(d)=M(X,Y,d), which is determined based on the following conditions:

  • $M(X,Y,d) \subseteq R(Y,d)$;

  • $|M(X,Y,d)| = X$; and

  • $M(X,Y,d) = R(Y,d) \setminus \left(H(Z,Y,d) \cup L(Z,Y,d)\right)$, where $Z = (Y-X)/2$ and $(Y-X) \bmod 2 = 0$.

The first and second conditions again require that X days are selected from the set R(Y,d). The third condition requires that the Z days with the lowest electricity consumption and the Z days with the highest electricity consumption are dropped, retaining X days with the middling levels of electricity consumption. In this article, we evaluate the Mid4of6 baseline method [13].
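The MidXofY selection can be sketched by ranking the Y days by daily total load and trimming Z = (Y − X)/2 days from each end; for Mid4of6, Z = 1. The helper `mid_x_of_y` is hypothetical and, as above, assumes the daily total is the plain sum of the per-time-step values.

```python
def mid_x_of_y(recent_profiles, x):
    """Return positions in R(Y, d) forming M(X, Y, d).

    Drops the Z = (Y - X) / 2 lowest and Z highest days by daily total load.
    """
    y = len(recent_profiles)
    if not 0 < x <= y or (y - x) % 2 != 0:
        raise ValueError("need 0 < X <= Y with Y - X even")
    z = (y - x) // 2
    # Rank days by daily total load, lowest first.
    ranked = sorted(range(y), key=lambda k: sum(recent_profiles[k]))
    return sorted(ranked[z:y - z])  # keep the X middling days, in input order


# Mid4of6 (Z = 1): daily totals 30, 10, 50, 20, 40, 60, so the 10 and the 60 are dropped.
print(mid_x_of_y([[30], [10], [50], [20], [40], [60]], 4))  # [0, 2, 3, 4]
```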

3.6 NearestXofY Average Method.

We also evaluate the NearestXofY average method proposed in our technical report [30] and used in Ref. [31]. It averages the load of the X days among the Y most recent baseline days of the same day type as the DR day that have load profiles outside of the DR event window nearest to that of the DR day. The DR event window includes the response window and a settling window, as shown in Fig. 1. We denote this set of days by N(X,Y,d), i.e., D(d)=N(X,Y,d), which is determined based on the following conditions:

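  • $N(X,Y,d) \subseteq R(Y,d)$;

  • $|N(X,Y,d)| = X$; and

  • $\left|\sum_{t \in T \setminus T_i^{\mathrm{DR}}(d)} \left[p_i(d,t) - p_i(d',t)\right]\right| \le \left|\sum_{t \in T \setminus T_i^{\mathrm{DR}}(d)} \left[p_i(d,t) - p_i(d'',t)\right]\right|$ for any $d' \in N(X,Y,d)$ and $d'' \in R(Y,d) \setminus N(X,Y,d)$, where $T_i^{\mathrm{DR}}(d)$ is the set of time-steps within the DR event window (including the response window and settling window) for GEB i on day d.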

Again, the first and second conditions require that X days are selected from the set of baseline days R(Y,d). The third condition requires that the electricity consumption over the entire DR day except for the DR event window is closer to that of selected baseline days than to that of unselected baseline days. In this article, we test the Nearest3of6 and Nearest5of10 baseline methods.
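A minimal sketch of the NearestXofY selection follows. Note that the criterion compares the absolute value of the summed difference (i.e., the net energy difference) outside the DR event window, not a pointwise distance, so a baseline day can rank as "nearest" even if its load shape differs. The helper `nearest_x_of_y` is hypothetical, and `event_steps` is assumed to be a set of 0-based time-step indices standing in for $T_i^{\mathrm{DR}}(d)$.

```python
def nearest_x_of_y(dr_profile, recent_profiles, x, event_steps):
    """Return positions in R(Y, d) forming N(X, Y, d).

    dr_profile: load profile of the DR day d.
    event_steps: set of 0-based time-step indices in the DR event window T_i^DR(d),
        which are excluded from the comparison.
    """
    if not 0 < x <= len(recent_profiles):
        raise ValueError("need 0 < X <= Y")

    def distance(profile):
        # |sum over t outside the event window of [p_i(d, t) - p_i(d', t)]|
        return abs(sum(p_dr - p_base
                       for t, (p_dr, p_base) in enumerate(zip(dr_profile, profile))
                       if t not in event_steps))

    ranked = sorted(range(len(recent_profiles)),
                    key=lambda k: distance(recent_profiles[k]))
    return sorted(ranked[:x])


# Time-step 1 is the (toy) event window; distances outside it are 0, 4, 10, 1.
dr_day = [10, 99, 10]
candidates = [[10, 0, 10], [12, 0, 12], [5, 50, 5], [9, 0, 10]]
print(nearest_x_of_y(dr_day, candidates, 2, {1}))  # [0, 3]
```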

3.7 Adjustment Method.

Averaging methods are easy to understand and implement for both utilities and DR participants, but can have large errors [26]. Their performance highly depends on the similarity of power profiles between the DR day and the selected baseline days. However, conditions on DR and baseline days can be very different. Therefore, adjustments (including additive and multiplicative adjustments) based on the DR day data are frequently applied to improve accuracy and reduce bias. Additive adjustments add a fixed load to (or subtract it from) the estimated baseline load at each time-step, while multiplicative adjustments multiply the estimated baseline load at each time-step by a fixed factor; in both cases, the adjustment is chosen such that the adjusted baseline is equal to the observed load on average during a time window shortly before the start of the DR event, referred to as an adjustment window. Additive adjustments are generally preferred to multiplicative adjustments, as baselines can become volatile under multiplicative adjustments [38]. Note that by using an adjustment, we assume that the GEB does not take anticipatory actions (e.g., building pre-cooling) during the adjustment window.

In this article, we test an additive adjustment defined as the average difference between the actual load and the estimated baseline load during the adjustment window. Specifically, the adjustment factor is expressed as follows:
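$$\Delta_i(d) = \frac{1}{|T_i^{\mathrm{AD}}(d)|} \sum_{t \in T_i^{\mathrm{AD}}(d)} \left[p_i(d,t) - \hat{p}_i(d,t)\right] \qquad (2)$$
where $T_i^{\mathrm{AD}}(d)$ is the set of time-steps within the adjustment window for GEB i on DR day d, and the adjusted baseline is $\hat{p}_i(d,t) + \Delta_i(d)$ for every $t \in T$. In this study, we use the 2-h period directly before the DR event as the adjustment window, as in Refs. [24,25]. None of the buildings take anticipatory actions before the DR event. To evaluate the effectiveness of the adjustment method, we compare the overall performance of the averaging methods with and without the additive adjustment.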
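The additive adjustment can be sketched in a few lines. Only load data inside the adjustment window enter the adjustment factor, and the same constant is then added at every time-step. The function names are hypothetical; in the setting described above, `adjustment_steps` would correspond to the 120 one-minute time-steps of the 2-h window before the event, whereas the example uses a toy two-step window.

```python
def additive_adjustment(actual, baseline, adjustment_steps):
    """Eq. (2): mean of p_i(d, t) - p_hat_i(d, t) over the adjustment window T_i^AD(d)."""
    steps = list(adjustment_steps)
    if not steps:
        raise ValueError("the adjustment window must contain at least one time-step")
    return sum(actual[t] - baseline[t] for t in steps) / len(steps)


def adjusted_baseline(actual, baseline, adjustment_steps):
    """Shift the unadjusted averaging baseline by the same constant at every time-step."""
    delta = additive_adjustment(actual, baseline, adjustment_steps)
    return [value + delta for value in baseline]


# Toy day: the first two time-steps form the adjustment window; the baseline there
# is 3 units too low on average, so the whole baseline is raised by 3 units.
actual = [12, 14, 30, 30]
baseline = [10, 10, 20, 20]
print(adjusted_baseline(actual, baseline, range(2)))  # [13.0, 13.0, 23.0, 23.0]
```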

3.8 Linear Interpolation Method.

The linear interpolation method was first proposed in Ref. [6] and then used and improved in Refs. [30–34]. It estimates the baseline by a simple linear interpolation on the fan power data within short time windows immediately before and after the DR event window, i.e., before the DR event starts and after the fans settle back to their normal operation. The baseline of GEB i on day d at time-step t is
$$\hat{p}_i(d,t) = a_i(d)\,t + b_i(d) \qquad (3)$$
where $a_i(d)$ and $b_i(d)$ are scalar constants. To obtain $a_i(d)$ and $b_i(d)$, we use least squares to fit the 1-min interval load data from the 5-min period just before the DR event and the 5-min period immediately after the settling time [33]. See Fig. 2 for an example. Since the linear interpolation method uses some data from immediately after the DR event window, it is applicable to ex-post analyses, but not to look-ahead analyses requiring forecasts. In line with the objectives of baseline estimation described in Sec. 2, the linear interpolation method does not aim to produce accurate time-series predictions, but instead to generate estimates that are accurate on average over DR event windows. In previous studies, this baseline method appeared to perform well; this article evaluates it on larger data sets and compares it with the other baseline methods.
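A minimal sketch of the least-squares fit behind Eq. (3), using the closed-form ordinary least-squares slope and intercept. The function name `fit_linear_baseline` is hypothetical, and the time axis (minutes relative to the first fit point) and the perfectly linear toy loads are assumptions made only so the result is easy to check.

```python
def fit_linear_baseline(times, loads):
    """Ordinary least-squares fit of p_hat_i(d, t) = a_i(d) * t + b_i(d), Eq. (3).

    times: time-steps of the points used for the fit (pre-event and post-settling).
    loads: measured fan power at those time-steps.
    Returns the pair (a, b).
    """
    n = len(times)
    if n < 2 or n != len(loads):
        raise ValueError("need at least two (time, load) pairs")
    t_mean = sum(times) / n
    p_mean = sum(loads) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("the time-steps must not all be identical")
    s_tp = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, loads))
    a = s_tp / s_tt              # slope
    b = p_mean - a * t_mean      # intercept
    return a, b


# Minutes relative to the first fit point: 5 points before a 120-min event window
# (minutes 5 to 124) and 5 points after it; toy loads lie exactly on 50 + 0.5 * t.
times = list(range(0, 5)) + list(range(125, 130))
loads = [50 + 0.5 * t for t in times]
a, b = fit_linear_baseline(times, loads)
print(a, b)  # 0.5 50.0
event_baseline = [a * t + b for t in range(5, 125)]  # baseline over the event window
```

With real data, the ten fit points would not lie on a line, and the fit averages out minute-to-minute noise at the two ends of the event window rather than interpolating between two single readings.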
Fig. 2
An example of the linear interpolation baseline method, illustrated on total HVAC fan power data from Weill Hall (WH) on a baseline day, assuming a DR event window from 9:00 a.m. to 11:00 a.m. The method uses data points labeled “actual load” to generate the linear interpolation.

4 Performance Evaluation Methodology

In this section, we introduce the methodology used to evaluate the performance of the selected baseline methods. We first explain the evaluation process. After that, the metrics for assessing the baseline methods are introduced. Finally, we discuss the limitations of our evaluation methodology.

4.1 Evaluation Process.

We evaluate the baseline methods assuming two different DR event windows, i.e., 9:00 a.m. to 11:00 a.m. (referred to as the morning event window) and 1:00 p.m. to 3:00 p.m. (referred to as the afternoon event window), which correspond to the times of our DR events on DR days [32,33,39]. Specifically, on a DR day, we conducted two short-term load shifting DR experiments, each with a 1-h response window, i.e., 9:00 a.m. to 10:00 a.m. and 1:00 p.m. to 2:00 p.m., followed by a 1-h settling window.

We use 1-min interval data corresponding to the total HVAC fan power on baseline days to evaluate the baseline methods. That is, we apply the methods to estimate the baseline fan power on days without DR events (i.e., baseline days). If a baseline method is perfectly accurate, the estimated fan power should be exactly the same as the measured fan power data on baseline days. By comparing the estimated baseline with the measured fan power data (i.e., the true baseline), we can calculate and evaluate the baseline method error.

A rolling origin blocked cross validation process is conducted to evaluate each averaging method on each data set [31,40,41]. For example, to evaluate the 5-day averaging method on a data set with a total of 16 baseline days that are chronologically ordered, we first use the data of days 1–5 to estimate the baseline of day 6 and then use the data of days 2–6 to estimate the baseline of day 7. This process goes on until we reach the last run using the data of days 11–15 to estimate the baseline of day 16. The linear interpolation method does not require a cross validation paradigm. It is run directly on each day of a data set. In each run of a baseline method, evaluation metrics are calculated and used for the statistical analysis of the method’s performance on the data set.
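The rolling-origin procedure described above can be sketched as a generator of (training days, test day) pairs. The function name `rolling_origin_runs` is hypothetical, and days are numbered 1 to N in chronological order as in the 16-day example.

```python
def rolling_origin_runs(n_days, y):
    """Yield (training_days, test_day) pairs for rolling-origin blocked cross validation.

    Days are numbered 1..n_days in chronological order; each run uses the y days
    immediately preceding the test day.
    """
    for test_day in range(y + 1, n_days + 1):
        yield list(range(test_day - y, test_day)), test_day


runs = list(rolling_origin_runs(16, 5))
print(len(runs), runs[0], runs[-1])
# 11 ([1, 2, 3, 4, 5], 6) ([11, 12, 13, 14, 15], 16)
```

A data set of N baseline days therefore yields N − Y runs for a method whose largest look-back is Y days.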

4.2 Evaluation Metrics.

The performance of the evaluated baseline methods is quantified by the following metrics.

4.2.1 Coefficient of Variation.

The coefficient of variation (CV) is used to evaluate the accuracy of baseline estimates. It is defined as the root mean squared error (RMSE) of the baseline estimates normalized by the mean of the true values. For a baseline method applied to a DR event window of GEB i on day d, the CV is expressed as follows:

$$\mathrm{CV}_i(d) = \frac{\sqrt{\dfrac{1}{|T_i^{\mathrm{DR}}(d)|}\sum_{t \in T_i^{\mathrm{DR}}(d)} \left[\hat{p}_i(d,t) - p_i(d,t)\right]^2}}{\dfrac{1}{|T_i^{\mathrm{DR}}(d)|}\sum_{t \in T_i^{\mathrm{DR}}(d)} p_i(d,t)} \qquad (4)$$

which is also referred to as the normalized root mean square error. Lower CV values indicate more accurate baseline estimates. We calculate the CV for each event window on each baseline day considered in the evaluation data set and then take the mean across all baseline days to obtain the average CV for the morning event window and the average CV for the afternoon event window. The average CVs are then used to indicate the accuracy of a baseline method on that data set.
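A minimal sketch of Eq. (4) and of the averaging across baseline days, assuming the actual and estimated profiles have already been restricted to the event window $T_i^{\mathrm{DR}}(d)$. The function name `cv` and the toy values are hypothetical.

```python
import math
import statistics


def cv(actual, estimate):
    """Eq. (4): RMSE over the event window divided by the mean of the true values."""
    n = len(actual)
    if n == 0 or n != len(estimate):
        raise ValueError("profiles must be non-empty and of equal length")
    rmse = math.sqrt(sum((e - a) ** 2 for a, e in zip(actual, estimate)) / n)
    return rmse / (sum(actual) / n)


# Toy event window: estimates alternate 2 units above and below a flat true load of 10.
print(cv([10, 10, 10, 10], [12, 8, 12, 8]))  # 0.2

# Average CV for one event window over several (toy) baseline days.
daily_cvs = [cv([10, 10], [11, 9]), cv([10, 10], [13, 7])]
print(statistics.mean(daily_cvs))
```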

4.2.2 Normalized Mean Bias Error.

The normalized mean bias error (NMBE) is used to evaluate the bias of baseline estimates. It is defined as the mean bias error normalized by the mean of the true values. For a baseline method applied to a DR event window of GEB i on day d, the NMBE is expressed as follows:
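$$\mathrm{NMBE}_i(d) = \frac{\dfrac{1}{|T_i^{\mathrm{DR}}(d)|}\sum_{t \in T_i^{\mathrm{DR}}(d)} \left[\hat{p}_i(d,t) - p_i(d,t)\right]}{\dfrac{1}{|T_i^{\mathrm{DR}}(d)|}\sum_{t \in T_i^{\mathrm{DR}}(d)} p_i(d,t)} \qquad (5)$$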
The average NMBEs associated with the morning and afternoon event windows are calculated by taking the mean of NMBEs over all baseline days in the evaluation data set and are used to indicate the bias of a baseline method on that data set. A positive NMBE indicates overestimation of the baseline load, while a negative NMBE indicates underestimation. When the NMBE is close to zero and the CV is larger than zero, the baseline method sometimes overestimates and sometimes underestimates the baseline, but overall the overestimations and underestimations balance each other out. As described in Sec. 2, most applications of DR baseline estimation require accurate predictions of average load over time windows rather than accurate predictions of dynamic time series. Therefore, bias is more important than the accuracy, specifically, the closer the NMBE is to zero, the better the baseline method.
In Sec. 6, in addition to providing the average CV and average NMBE of each baseline method on each data set, we also report the 95% confidence intervals for each metric. The 95% confidence intervals of the CV and NMBE for a morning or afternoon event window for GEB i are expressed as follows:
$$\mathrm{mean}_{d}\left(\mathrm{CV}_{i,d}\right) \pm 1.96\,\frac{\mathrm{std}_{d}\left(\mathrm{CV}_{i,d}\right)}{\sqrt{N_i}}, \qquad \mathrm{mean}_{d}\left(\mathrm{NMBE}_{i,d}\right) \pm 1.96\,\frac{\mathrm{std}_{d}\left(\mathrm{NMBE}_{i,d}\right)}{\sqrt{N_i}}$$
where means and standard deviations (std) are taken over the CVs or NMBEs of all baseline days in the data set that the baseline method is tested on and $N_i$ is the number of those days [31,40,41]. We also present box plots of the metrics, which visualize the error performance statistics of each method.
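A minimal sketch of the interval computation, assuming the normal-approximation factor 1.96 and the sample standard deviation (the choice of estimator is an assumption here). For the smallest data sets in Table 3 (e.g., 6 days), a Student-t quantile would give somewhat wider intervals. The function name and the daily values are hypothetical.

```python
import math
from statistics import fmean, stdev


def mean_with_ci95(daily_values, z=1.96):
    """Return (mean, half-width) of the 95% CI of a per-day metric (CV or NMBE)."""
    n = len(daily_values)
    if n < 2:
        raise ValueError("need at least two baseline days")
    return fmean(daily_values), z * stdev(daily_values) / math.sqrt(n)


# Hypothetical daily NMBE values (%) for one event window of one building-year.
mean, half_width = mean_with_ci95([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{mean:.2f} +/- {half_width:.2f}")  # 3.00 +/- 1.39
```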

4.3 Limitations of the Evaluation Process.

Our error evaluation process may underestimate the true baseline method error because we assume that the length of the settling window is known exactly. Specifically, we assume the HVAC system settles back to its baseline operation 1 h after the DR event and use that time frame for our error assessment. In practice, however, the settling time is unknown and must be estimated, so this assumption affects the accuracy of our error assessment. Moreover, the practical implementation of some baseline methods, including the linear interpolation method, depends on an estimate of the settling time, which introduces additional error that our assessment does not capture. We leave the estimation of the settling time, and the impact of its error on the baseline error, for future investigation.

5 Data

We installed current sensors in three buildings on the University of Michigan campus to submeter HVAC supply and return fans. The three buildings are the Bob and Betty Beyster Building (BBB), the Rackham Building (RAC), and Weill Hall (WH). BBB is a 104,132 ft² classroom/office building constructed in 2005; RAC is a 157,957 ft² office/auditorium building constructed in 1938; and WH is a 97,989 ft² classroom/office building constructed in 2006 [32,33]. All three buildings have single-duct variable air volume HVAC systems.

We use data from the summers of 2017 and 2018, specifically, the 1-min single-phase current of each HVAC fan in each building. Because the voltages and power factors generally vary only slightly when the buildings are occupied, we assume constant power factors (0.95 for supply fans and 0.99 for return fans) and a constant voltage (275.8 V), which were determined from 1 week of measured voltage and power factor data, and use these values to estimate the three-phase fan power [32,33]. The data are separated into five data sets corresponding to five building-years, i.e., BBB-2017, RAC-2017, WH-2017, BBB-2018, and RAC-2018. Table 2 summarizes the five data sets used in our evaluation. This study evaluates the baseline methods on each data set separately. In future work, we aim to evaluate how utilizing multiple years of building data, and data from other buildings with similar characteristics, could improve baseline estimation.
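The conversion from a measured single-phase current to three-phase fan power can be sketched as follows. The exact expression is not stated in the text, so the formula is an assumption: we treat 275.8 V as the line-to-neutral voltage of a balanced three-phase supply (close to a nominal 277 V phase voltage), giving P = 3·V·I·pf; if the voltage were line-to-line, the factor 3 would become √3. The power factors are the values given above; the function and constant names are hypothetical.

```python
# Constants from the text; the formula is an ASSUMPTION (balanced three-phase
# load, 275.8 V taken as the line-to-neutral voltage).
VOLTAGE_LN_V = 275.8
POWER_FACTOR = {"SF": 0.95, "RF": 0.99}  # supply fan, return fan


def fan_power_kw(current_a, fan_type):
    """Three-phase power (kW) estimated from the current (A) of one phase."""
    return 3.0 * VOLTAGE_LN_V * current_a * POWER_FACTOR[fan_type] / 1000.0


def total_fan_power_kw(currents):
    """Total fan power at one minute from (fan_type, current_a) pairs."""
    return sum(fan_power_kw(i, kind) for kind, i in currents)


# Hypothetical minute: one supply fan at 10 A and one return fan at 4 A.
print(round(total_fan_power_kw([("SF", 10.0), ("RF", 4.0)]), 4))  # 11.1368
```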

Table 2

Summary of the HVAC fan power data sets

Data set | # of fans | # of baseline days | Peak total fan power in occupied mode (kW) | Average total fan power in occupied mode (kW)
--- | --- | --- | --- | ---
BBB-2017 | 1 SF, 1 RF | 55 (June to October) | 35.8 | 12.2
BBB-2018 | 4 SFs, 3 RFs | 16 (October) | 105.3 | 38.3
RAC-2017 | 4 SFs, 4 RFs | 49 (July to October) | 63.6 | 18.7
RAC-2018 | 4 SFs, 4 RFs | 30 (May to October) | 63.0 | 24.7
WH-2017 | 2 SFs, 2 RFs | 86 (June to October) | 125.9 | 45.7

Note: SF, supply fan; RF, return fan.

6 Results and Discussion

In this section, numerical results quantifying the performance of the baseline methods are presented and discussed. The baseline methods are evaluated on the data of each building-year.

6.1 Overall Performance.

Figure 3 compares the overall performance of the averaging and linear interpolation (Lin. intrpl.) methods. The left plot shows the results of the averaging methods without the additive adjustment, and the right plot shows the results with the additive adjustment. The CV is the mean of ten average CV values corresponding to five building-years and two DR event windows, and likewise for the NMBE. That is, although the evaluation data sets of different building-years have different numbers of days that a baseline method is tested on, the average CV or NMBE for each building-year has the same weight in assessing the method’s overall performance. Table 3 lists the number of days that each baseline method is tested on for each data set. The numbers (i.e., Ni) in Table 3 are also used to calculate the confidence intervals presented in Sec. 6.2.
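To make the equal weighting concrete, the sketch below averages the ten linear interpolation CV entries reported in Tables 4 and 5, so that each building-year and event window counts once, and contrasts this with a day-weighted mean that uses the linear interpolation N_i values of Table 3. The day-weighted figure is not used in this study; it only illustrates how much the weighting choice can shift an overall summary. The helper names are hypothetical.

```python
from statistics import fmean

# Average CV (%) of linear interpolation from Tables 4 (morning) and 5 (afternoon),
# in column order BBB-2017, RAC-2017, WH-2017, BBB-2018, RAC-2018.
MORNING_CV = [14.47, 5.95, 12.60, 9.68, 6.48]
AFTERNOON_CV = [12.38, 3.90, 9.20, 7.21, 3.25]
# Number of baseline days N_i for linear interpolation (Table 3), same order.
N_DAYS = [55, 49, 86, 16, 30]


def equal_weight_mean(*windows):
    """Each building-year/window average counts once (the scheme used for Fig. 3)."""
    return fmean(v for window in windows for v in window)


def day_weighted_mean(windows, n_days):
    """Alternative weighting: each average weighted by its number of baseline days."""
    num = sum(v * n for window in windows for v, n in zip(window, n_days))
    return num / (sum(n_days) * len(windows))


print(round(equal_weight_mean(MORNING_CV, AFTERNOON_CV), 3))            # 8.512
print(round(day_weighted_mean([MORNING_CV, AFTERNOON_CV], N_DAYS), 3))  # 9.314
```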

Fig. 3
Overall performance comparison. Left: averaging method results without the additive adjustment; right: averaging method results with the additive adjustment. Note that the scales are different.
Table 3

The number of days each baseline method is tested on for each data set (Ni)

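Baseline method | BBB-2017 | RAC-2017 | WH-2017 | BBB-2018 | RAC-2018
--- | --- | --- | --- | --- | ---
5-day average | 50 | 44 | 81 | 11 | 25
10-day average | 45 | 39 | 76 | 6 | 20
High4of5 | 50 | 44 | 81 | 11 | 25
High5of10 | 45 | 39 | 76 | 6 | 20
Mid4of6 | 49 | 43 | 80 | 10 | 24
Low4of5 | 50 | 44 | 81 | 11 | 25
Low5of10 | 45 | 39 | 76 | 6 | 20
Nearest3of6 | 49 | 43 | 80 | 10 | 24
Nearest5of10 | 45 | 39 | 76 | 6 | 20
Linear interpolation | 55 | 49 | 86 | 16 | 30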

Figure 3 shows that the additive adjustment greatly improves the performance of the averaging baseline methods. Specifically, the CV and the NMBE magnitude of all averaging methods decrease, except for the NMBE of the Nearest5of10 average method, which increases slightly from 0.86% to 1.36%. The figure also shows that the Nearest3of6 and Nearest5of10 average methods perform well in selecting the baseline days used to compute the averages. Specifically, the left plot (without the additive adjustment) shows that they have the smallest CV values (i.e., the highest accuracy) and low NMBE values (i.e., small bias) compared with the other averaging methods. However, the right plot shows that they are no longer the best averaging methods when the adjustment is applied. In the future, it will be worth investigating how the NearestXofY average method can be appropriately adjusted and improved.

In general, the linear interpolation method is the most accurate (i.e., it has the lowest CV). The Low4of5 and 5-day average methods with additive adjustments and the linear interpolation method are the best in terms of NMBE, which, as explained earlier, is a more relevant metric than CV for most DR applications. According to guidelines by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) [42], it is much more difficult to achieve a low CV than a low NMBE. Therefore, the linear interpolation method generally has the best performance in our evaluation. Furthermore, as shown later in this section, when tested on different building-years and DR event windows, the linear interpolation method always attains the smallest confidence interval for the average NMBE, indicating that it is the most stable method in terms of bias variability. However, these results may be specific to our setting; in particular, we assume short (i.e., 2 h) DR event windows. The relative performance of the linear interpolation method is likely a function of the event window duration, and more bias may be present for longer windows.

6.2 Different Building-Years and Demand Response Event Windows.

In Tables 4–7, we report the average values and 95% confidence intervals of the CV and NMBE for each baseline method (averaging methods were implemented with the additive adjustment), building-year, and DR event window. In each column of Tables 4–7, the smallest average value and the smallest confidence interval are given in bold. The smallest average CV (respectively, the average NMBE closest to zero) corresponds to the method with the highest accuracy (respectively, the lowest bias) on average, and the smallest confidence interval corresponds to the method with the most consistent accuracy or bias across different days in the data set.
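The selection rule for the bold entries can be sketched as below, applied to the BBB-2017 column of Table 6. The sketch assumes that the "smallest" average NMBE means the one closest to zero; the function and variable names are hypothetical. In this column, the best average and the narrowest interval belong to different methods, which is why the two are marked separately.

```python
# Table 6, BBB-2017 column (morning NMBE, %): method -> (average, 95% CI half-width).
BBB_2017_MORNING_NMBE = {
    "5-day average": (-1.02, 4.82),
    "10-day average": (-2.81, 5.24),
    "High4of5": (-3.89, 4.82),
    "High5of10": (-9.45, 5.15),
    "Mid4of6": (-2.34, 5.18),
    "Low4of5": (0.88, 5.06),
    "Low5of10": (3.84, 5.61),
    "Nearest3of6": (-1.08, 4.90),
    "Nearest5of10": (-1.66, 4.88),
    "Linear interpolation": (1.58, 1.17),
}


def bold_entries(column, metric):
    """Return (method with best average, method with narrowest CI) for one column.
    CV: smallest average is best; NMBE: average closest to zero is best (assumed)."""
    scores = {
        "CV": lambda m: column[m][0],
        "NMBE": lambda m: abs(column[m][0]),
    }
    best_mean = min(column, key=scores[metric])
    best_ci = min(column, key=lambda m: column[m][1])
    return best_mean, best_ci


print(bold_entries(BBB_2017_MORNING_NMBE, "NMBE"))  # ('Low4of5', 'Linear interpolation')
```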

Table 4

Average value ± 95% confidence interval of CV (%) – morning event window

Baseline method | BBB-2017 | RAC-2017 | WH-2017 | BBB-2018 | RAC-2018
--- | --- | --- | --- | --- | ---
5-day average | 21.42 ± 2.32 | 13.31 ± 3.24 | 20.50 ± 3.34 | 13.97 ± 3.32 | 20.24 ± 7.04
10-day average | 21.48 ± 2.50 | 13.36 ± 2.95 | 19.74 ± 3.03 | 16.00 ± 4.31 | 18.56 ± 7.51
High4of5 | 21.99 ± 2.31 | 14.19 ± 3.85 | 21.47 ± 3.47 | 14.66 ± 4.05 | 21.73 ± 7.23
High5of10 | 23.41 ± 2.60 | 18.14 ± 4.12 | 22.78 ± 3.41 | 16.86 ± 6.02 | 17.61 ± 6.37
Mid4of6 | 22.39 ± 2.51 | 13.66 ± 3.09 | 20.30 ± 3.04 | 15.07 ± 3.73 | 20.11 ± 7.57
Low4of5 | 22.18 ± 2.42 | 13.06 ± 2.99 | 19.89 ± 3.08 | 14.29 ± 3.27 | 18.86 ± 6.76
Low5of10 | 22.68 ± 2.80 | 12.39 ± 2.87 | 18.88 ± 2.98 | 15.96 ± 3.29 | 20.82 ± 9.02
Nearest3of6 | 22.23 ± 2.27 | 12.39 ± 2.77 | 19.45 ± 3.50 | 16.26 ± 4.59 | 19.27 ± 7.52
Nearest5of10 | 21.30 ± 2.13 | 11.98 ± 2.92 | 17.61 ± 2.83 | 15.72 ± 6.42 | 18.34 ± 7.78
Linear interpolation | 14.47 ± 0.48 | 5.95 ± 1.33 | 12.60 ± 1.43 | 9.68 ± 0.69 | 6.48 ± 3.64
Table 5

Average value ± 95% confidence interval of CV (%) – afternoon event window

Baseline method | BBB-2017 | RAC-2017 | WH-2017 | BBB-2018 | RAC-2018
--- | --- | --- | --- | --- | ---
5-day average | 14.08 ± 0.89 | 7.09 ± 1.58 | 15.26 ± 1.60 | 10.24 ± 1.50 | 8.68 ± 2.29
10-day average | 14.17 ± 0.98 | 6.85 ± 1.66 | 14.49 ± 1.54 | 10.17 ± 1.16 | 8.15 ± 1.99
High4of5 | 14.26 ± 0.91 | 7.42 ± 1.60 | 16.34 ± 1.75 | 10.86 ± 1.58 | 9.10 ± 2.50
High5of10 | 14.79 ± 0.99 | 8.65 ± 1.82 | 17.40 ± 2.05 | 10.69 ± 1.20 | 9.88 ± 2.42
Mid4of6 | 14.38 ± 0.91 | 6.80 ± 1.53 | 16.70 ± 1.76 | 11.22 ± 1.66 | 7.55 ± 2.04
Low4of5 | 14.39 ± 0.93 | 6.61 ± 1.54 | 15.60 ± 1.69 | 10.41 ± 1.46 | 6.83 ± 1.80
Low5of10 | 14.81 ± 1.01 | 6.09 ± 1.55 | 13.31 ± 1.60 | 10.29 ± 1.51 | 7.27 ± 1.81
Nearest3of6 | 14.68 ± 0.93 | 6.39 ± 1.44 | 14.82 ± 1.60 | 10.89 ± 1.45 | 8.35 ± 1.76
Nearest5of10 | 14.39 ± 1.00 | 6.10 ± 1.67 | 14.07 ± 1.58 | 10.66 ± 1.13 | 7.71 ± 2.37
Linear interpolation | 12.38 ± 0.52 | 3.90 ± 0.35 | 9.20 ± 1.29 | 7.21 ± 0.91 | 3.25 ± 0.55
Table 6

Average value ± 95% confidence interval of NMBE (%) – morning event window

Baseline method | BBB-2017 | RAC-2017 | WH-2017 | BBB-2018 | RAC-2018
--- | --- | --- | --- | --- | ---
5-day average | −1.02 ± 4.82 | 0.28 ± 4.42 | −0.93 ± 5.14 | −5.78 ± 6.50 | 0.50 ± 9.97
10-day average | −2.81 ± 5.24 | 0.50 ± 4.44 | −2.50 ± 4.91 | −12.13 ± 5.30 | 1.93 ± 10.39
High4of5 | −3.89 ± 4.82 | 0.54 ± 4.98 | −1.74 ± 5.38 | −7.06 ± 6.91 | 0.27 ± 10.51
High5of10 | −9.45 ± 5.15 | 3.33 ± 6.26 | −4.02 ± 5.65 | −13.06 ± 6.99 | 0.59 ± 9.08
Mid4of6 | −2.34 ± 5.18 | 0.18 ± 4.47 | −1.72 ± 4.91 | −9.57 ± 5.63 | −0.29 ± 10.42
Low4of5 | 0.88 ± 5.06 | 0.05 ± 4.17 | −0.18 ± 4.86 | −5.92 ± 6.53 | 0.68 ± 9.37
Low5of10 | 3.84 ± 5.61 | −2.34 ± 4.04 | −0.99 ± 4.70 | −11.19 ± 5.08 | 3.27 ± 12.22
Nearest3of6 | −1.08 ± 4.90 | −0.76 ± 3.91 | −1.58 ± 5.04 | −9.31 ± 7.32 | 0.88 ± 10.23
Nearest5of10 | −1.66 ± 4.88 | −2.23 ± 3.98 | −2.55 ± 4.33 | −10.67 ± 7.95 | 2.41 ± 10.49
Linear interpolation | 1.58 ± 1.17 | −1.23 ± 1.60 | −4.25 ± 2.11 | 2.21 ± 1.46 | 1.46 ± 3.31
Table 7

Average value ± 95% confidence interval of NMBE (%) – afternoon event window

Baseline method | BBB-2017 | RAC-2017 | WH-2017 | BBB-2018 | RAC-2018
--- | --- | --- | --- | --- | ---
5-day average | −0.42 ± 1.97 | −0.91 ± 2.36 | 1.02 ± 2.79 | 1.61 ± 3.90 | 2.07 ± 3.80
10-day average | −0.40 ± 2.32 | −1.66 ± 2.39 | 1.24 ± 2.73 | 0.64 ± 5.36 | −0.21 ± 3.91
High4of5 | 0.01 ± 1.98 | −1.39 ± 2.43 | 1.78 ± 3.03 | 2.93 ± 4.13 | 1.63 ± 4.06
High5of10 | 1.29 ± 2.37 | −3.84 ± 2.76 | 3.72 ± 3.30 | 2.20 ± 5.33 | −2.45 ± 4.66
Mid4of6 | −0.01 ± 2.07 | −1.42 ± 2.23 | 2.92 ± 3.11 | 2.90 ± 4.41 | −0.68 ± 3.38
Low4of5 | −0.71 ± 2.05 | −0.85 ± 2.20 | 1.80 ± 2.90 | 1.86 ± 3.88 | 0.41 ± 2.96
Low5of10 | −2.08 ± 2.40 | 0.53 ± 2.15 | −1.25 ± 2.54 | −0.93 ± 5.40 | 2.04 ± 3.34
Nearest3of6 | −0.17 ± 2.07 | −0.07 ± 2.08 | 0.39 ± 2.69 | 1.95 ± 3.92 | 0.75 ± 3.55
Nearest5of10 | 0.04 ± 2.26 | −0.15 ± 2.24 | −0.02 ± 2.65 | 1.18 ± 5.50 | 0.07 ± 3.90
Linear interpolation | −3.13 ± 1.02 | −0.16 ± 0.53 | −2.36 ± 1.30 | −0.12 ± 1.09 | −0.20 ± 0.93

As shown in Tables 4 and 5, the linear interpolation method has the smallest average CV and the smallest confidence interval for all building-years and both event windows. According to the ASHRAE guidelines [42], the suggested acceptable maximum CV is 30% when using hourly data and 15% when using monthly data. Here, with 1-min data, we find that the average CV of the linear interpolation method is below 15% in all cases. In contrast, the averaging baseline methods have average CV values above 15% in many cases and even above 20% in some cases.

As for the bias reported in Tables 6 and 7, none of the baseline methods consistently attains the average NMBE closest to zero. Nevertheless, the linear interpolation method always has the smallest confidence interval, indicating that its bias varies the least across different days in the data set. The performance of the linear interpolation method is also consistent across different data sets. According to the ASHRAE guidelines [42], the suggested acceptable maximum NMBE is ±10% when using hourly data and ±5% when using monthly data. Here, with 1-min data, we find that the average NMBE of the linear interpolation method is within ±5% in all cases, while the other methods have average NMBE values outside ±5%, and even outside ±10%, in some cases.
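The comparison with the ASHRAE limits quoted above can be expressed as a simple check. The sketch below uses only the hourly and monthly limits stated in the text and applies them to two entries from Tables 4 and 6. It mirrors the informal comparison made here, not a formal calibration test, since the limits were defined for hourly and monthly data rather than 1-min event windows. The names are hypothetical.

```python
# ASHRAE limits quoted in the text, in %: resolution -> (max CV, max |NMBE|).
ASHRAE_LIMITS = {"hourly": (30.0, 10.0), "monthly": (15.0, 5.0)}


def within_limits(cv, nmbe, resolution):
    """True if both the average CV and the average NMBE fall inside the limits."""
    max_cv, max_abs_nmbe = ASHRAE_LIMITS[resolution]
    return cv <= max_cv and abs(nmbe) <= max_abs_nmbe


# Linear interpolation, WH-2017, morning window (Tables 4 and 6).
print(within_limits(12.60, -4.25, "monthly"))  # True
# High5of10, BBB-2018, morning window: the NMBE exceeds even the hourly limit.
print(within_limits(16.86, -13.06, "hourly"))  # False
```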

The average CV, average NMBE, and their confidence intervals are generally smaller in the afternoon event window. That is, the baseline methods perform better in baselining afternoon DR events. The reason may be that the total fan power profile is more stable in the afternoon, while it is more volatile in the morning.

In Figs. 4–8, we visualize the statistics of the evaluation metrics for each building-year using box plots. The figures show that the performance of each baseline method varies from day to day. The linear interpolation method generally has the least variable performance.

Fig. 4
Boxplots of the evaluation metrics for BBB-2017 (with an additive adjustment for the averaging methods)
Fig. 5
Boxplots of the evaluation metrics for RAC-2017 (with an additive adjustment for the averaging methods)
Fig. 6
Boxplots of the evaluation metrics for WH-2017 (with an additive adjustment for the averaging methods)
Fig. 7
Boxplots of the evaluation metrics for BBB-2018 (with an additive adjustment for the averaging methods)
Fig. 8
Boxplots of the evaluation metrics for RAC-2018 (with an additive adjustment for the averaging methods)

6.3 Example Time Series Plots.

To give more intuition for the results, Fig. 9 shows time-series plots of the actual total fan power of the WH building on Aug. 2, 2017, and baselines estimated by the implemented methods, where the averaging methods use additive adjustments. Specifically, the upper-left and upper-right plots show the actual and estimated fan power curves during the morning event window and the lower-left and lower-right plots show similar results for the afternoon event window. For clarity, each plot only includes five estimated baselines. For the same example, Fig. 10 shows the time-series errors of three baseline methods: the linear interpolation method, the Low4of5 average method, and the 5-day average method, which generally have the best overall performance as discussed in Sec. 6.1. However, they are not necessarily the best for this specific example.

Fig. 9
Example time-series plots of the actual total fan power and baseline estimates. The averaging methods are implemented with additive adjustments.
Fig. 10
Example time-series plots of the errors of the three best baseline methods

As shown in Figs. 9 and 10, the linear interpolation method has the best performance. The averaging methods have relatively larger errors in this case, though the CV and NMBE of the High4of5 average method baseline for the morning window are 8.40% and 1.26%, respectively, which are comparable to those of the linear interpolation baseline.

In general, we have found that averaging methods do not perform well in baselining fan power data, as it is difficult for them to precisely capture the minute-scale variation in HVAC fan power. As shown in Fig. 9, the actual baseline fan power is highly volatile, and although the averaging methods produce volatile estimates, these estimates usually do not align with the actual load. In contrast, the linear interpolation method does not try to estimate the minute-scale variation of the HVAC fan power. It assumes that the HVAC fan power trend is approximately linear and expects the positive and negative errors to balance out over time. Such a method is appropriate when we care about the average response over the event (e.g., for overall impact assessment and financial settlement) rather than the accuracy of instantaneous estimates. As shown in Figs. 9 and 10, the linear interpolation method captures the hourly trend, and its minute-scale positive and negative errors somewhat balance out over the DR event window. In many cases, the actual baseline fan power follows an approximately linear trend in the short term, which also helps explain why the linear interpolation method generally performs best.
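The idea of fitting a linear trend around the event can be sketched as follows. This is not the study's exact formulation of the least squares-based linear interpolation method; it is a simplified reading under stated assumptions: an ordinary least-squares line is fitted to 1-min samples from a hypothetical 1 h window before the event and a hypothetical 1 h window after the assumed 1 h settling period, and the line is evaluated over the 2 h event window. Event and settling samples are excluded from the fit. All names are hypothetical.

```python
def fit_line(samples):
    """Ordinary least-squares fit of power = a + b * t to (t, power) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(t for t, _ in samples) / n
    p_mean = sum(p for _, p in samples) / n
    s_tt = sum((t - t_mean) ** 2 for t, _ in samples)
    if s_tt == 0:
        raise ValueError("samples must span more than one time step")
    s_tp = sum((t - t_mean) * (p - p_mean) for t, p in samples)
    slope = s_tp / s_tt
    return p_mean - slope * t_mean, slope


def linear_interpolation_baseline(pre_event, post_settling, event_minutes):
    """Evaluate over the event window a line fitted to pre-event samples and to
    samples after the (assumed known) settling period."""
    intercept, slope = fit_line(list(pre_event) + list(post_settling))
    return [intercept + slope * t for t in event_minutes]


# Hypothetical day: fan power rises by 0.1 kW/min from 10 kW at minute 0.
power = {t: 10.0 + 0.1 * t for t in range(360)}
pre = [(t, power[t]) for t in range(60, 120)]      # 1 h before the event
event = range(120, 240)                            # 2 h DR event window
post = [(t, power[t]) for t in range(300, 360)]    # 1 h after a 1 h settling period
baseline = linear_interpolation_baseline(pre, post, event)
print(round(baseline[0], 6), round(baseline[-1], 6))  # 22.0 33.9
```

On real fan data the fitted line will not track the minute-scale fluctuations; as discussed above, the method relies on the positive and negative deviations roughly cancelling over the event window.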

Another reason that the linear interpolation method generally outperforms the averaging methods is its more effective use of a posteriori knowledge. The HighXofY, MidXofY, LowXofY, and NearestXofY methods use the daily electricity consumption on the DR day in selecting the baseline days included in the baseline calculation. However, as the results show, daily electricity consumption may not be a sufficiently effective indicator of short-term fan power. In contrast, the additive adjustment, which uses data from a short window (i.e., 2 h) before the DR event, greatly improves the performance of the averaging methods, showing that such pre-event data are an effective indicator. The linear interpolation method uses data from both before and after the DR event window, which turn out to be effective indicators of the baseline fan power.
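The day-selection rules and the additive adjustment discussed above can be sketched as follows. This is an illustrative sketch under our own assumptions, not the exact implementation used in this study or by any utility or ISO: candidate days are the Y most recent non-event days, ranking uses daily consumption, MidXofY drops an equal number of days from each end of the ranking, NearestXofY compares against the DR day's daily consumption, and the adjustment is a single additive offset computed over the pre-event window. All function names and data are hypothetical.

```python
from statistics import fmean

# Each candidate day is (daily_kwh, profile); profile holds 1-min fan power (kW)
# over the same clock times: the adjustment window followed by the event window.


def select_days(candidates, method, x, y, dr_day_kwh=None):
    """Pick baseline days from the Y most recent non-event days (most recent first)."""
    recent = candidates[:y]
    by_kwh = sorted(recent, key=lambda day: day[0])  # ascending daily consumption
    if method == "average":   # Y-day average
        return recent
    if method == "high":      # HighXofY
        return by_kwh[-x:]
    if method == "low":       # LowXofY
        return by_kwh[:x]
    if method == "mid":       # MidXofY: drop (Y - X) / 2 days at each end
        drop = (y - x) // 2
        return by_kwh[drop:drop + x]
    if method == "nearest":   # NearestXofY: closest to the DR day's consumption
        if dr_day_kwh is None:
            raise ValueError("NearestXofY needs the DR day's daily consumption")
        return sorted(recent, key=lambda day: abs(day[0] - dr_day_kwh))[:x]
    raise ValueError(f"unknown method: {method}")


def average_profile(days):
    """Minute-by-minute mean of the selected days' profiles."""
    return [fmean(values) for values in zip(*(profile for _, profile in days))]


def additive_adjustment(baseline, actual, adj_window):
    """Shift the whole baseline so it matches the DR day on average over adj_window."""
    offset = fmean(actual[adj_window]) - fmean(baseline[adj_window])
    return [b + offset for b in baseline]


# Hypothetical candidates: constant profiles over 2 adjustment + 2 event minutes.
kwh = [100, 120, 90, 110, 130, 95]
candidates = [(e, [e / 10] * 4) for e in kwh]
mid = average_profile(select_days(candidates, "mid", x=4, y=6))
print(mid)                                                # [10.625, 10.625, 10.625, 10.625]
print(additive_adjustment(mid, [12.0] * 4, slice(0, 2)))  # [12.0, 12.0, 12.0, 12.0]
```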

6.4 Practical Implications.

While HVAC fan power control represents an important DR resource and submetering fan power data can improve baselining [7,14], the averaging baseline methods commonly used by utilities and ISOs do not produce accurate fan power baseline estimates. In general, these commonly used methods are outperformed by the simple linear interpolation method. As a result, building owners and operators aiming to utilize HVAC fans to provide DR and other grid services may benefit from adopting this simple method over traditional baselining methods.

7 Conclusions and Future Work

In this article, a variety of baseline methods were evaluated on building HVAC fan power data. Our numerical results show that, with an additive adjustment and assuming the building takes no anticipatory actions before the DR event, averaging methods work well for baselining fan power in some cases. Nevertheless, their performance is not consistent across all cases. The simple linear interpolation method generally has the best performance. In particular, it has a low bias and by far the highest accuracy on average. For DR applications such as analyzing the overall impacts of DR actions or financial settlement, the linear interpolation method, Low4of5 average method, and 5-day average method are the best baseline methods among the methods we tested as they have the smallest levels of bias on average. However, the linear interpolation method may be preferable for two reasons. First, as mentioned, it has low bias and the highest accuracy on average. Second, for every building-year in both DR event windows, it has the smallest CV and NMBE confidence intervals, indicating that its performance is more stable than that of the other methods.

In future work, we aim to collect HVAC fan power data sets covering more geographically diverse areas with different climates, further evaluate the existing baseline methods, and validate our results using such extended data sets, including data sets that have, or can be processed to have, different temporal granularity. We also plan to explore the applicability of other baseline methods. While we found in Ref. [30] that regression methods using outdoor air temperature as the main explanatory variable are inappropriate for baseline estimation of total fan power, some time-series methods, e.g., Refs. [43,44], and machine learning methods, e.g., Refs. [45,46], may be applicable. It may also be possible to take advantage of fan power data from individual fans rather than using only total fan power data. Specifically, more granular data could be used to obtain fan power patterns that are consistent among different fans and over different days, in turn improving our estimates of the total fan power baseline. In this regard, it is worth exploring methods such as tensor decomposition [31,47,48], which is capable of high-dimensional data mining and analysis. A further research topic is exploring how these methods could be leveraged to improve the linear interpolation method.

In the end, our results have implications for enabling deeper participation of commercial buildings in grid services. The widespread use of DR strategies that control building HVAC system fan power could provide much needed quick response to grids with increasing penetrations of intermittent renewable energy sources. Through better baselining of HVAC fan power, individual GEBs, flexibility aggregators, and ISOs can better assess the capacity and participation of building HVAC systems to deliver grid services that improve the reliability, economics, and sustainability of power grids.

Acknowledgment

This work was supported by the U.S. Department of Energy Building Technologies Office under the project I-DREEM: Impact of Demand Response on short and long-term building Energy Efficiency Metrics (Contract Number DE-AC02-76SF00515).

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

References

1. FERC, 2018, “Assessment of Demand Response and Advanced Metering,” Staff Report, Federal Energy Regulatory Commission (FERC), Washington, DC, USA.
2. Strbac, G., 2008, “Demand Side Management: Benefits and Challenges,” Energy Policy, 36(12), pp. 4419–4426.
3. Aghaei, J., and Alizadeh, M.-I., 2013, “Demand Response in Smart Electricity Grids Equipped With Renewable Energy Sources: A Review,” Renewable Sustainable Energy Rev., 18, pp. 64–72.
4. EIA, 2020, “Use of Energy in the United States Explained,” Energy Information Administration (EIA), Washington, DC, Technical Report, https://www.eia.gov/energyexplained/use-of-energy/.
5. Aduda, K. O., Labeodan, T., and Zeiler, W., 2018, “Towards Critical Performance Considerations for Using Office Buildings as a Power Flexibility Resource—A Survey,” Energy Build., 159, pp. 164–178.
6. Beil, I., Hiskens, I., and Backhaus, S., 2015, “Round-Trip Efficiency of Fast Demand Response in a Large Commercial Air Conditioner,” Energy Build., 97, pp. 47–55.
7. DOE, 2019, “Grid-Interactive Efficient Buildings Technical Report Series: Overview of Research Challenges and Gaps,” Department of Energy (DOE), Washington, DC, Technical Report No. DOE/GO-102019-5227.
8. Hao, H., Middelkoop, T., Barooah, P., and Meyn, S., 2012, “How Demand Response From Commercial Buildings Will Provide the Regulation Needs of the Grid,” Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 1–5, pp. 1908–1913.
9. Lee, Z. E., Sun, Q., Ma, Z., Wang, J., MacDonald, J. S., and Zhang, K. M., 2020, “Providing Grid Services With Heat Pumps: A Review,” ASME J. Eng. Sustain. Build. Cities, 1(1), p. 011007.
10. Hao, H., Lin, Y., Kowli, A. S., Barooah, P., and Meyn, S., 2014, “Ancillary Service to the Grid Through Control of Fans in Commercial Building HVAC Systems,” IEEE Trans. Smart Grid, 5(4), pp. 2066–2074.
11. ISO-NE, 2017, “Revisions to Implement Full Integration of Demand Response,” ISO New England (ISO-NE), Inc., Holyoke, MA.
12. Nexant, 2017, “California ISO Baseline Accuracy Work Group Proposal,” Nexant, Inc., San Francisco, CA.
13. KEMA, 2011, “PJM Empirical Analysis of Demand Response Baseline Methods,” White Paper, KEMA, Inc., Clark Lake, MI.
14. DOE, 2019, “Grid-Interactive Efficient Buildings Technical Report Series: Whole-Building Controls, Sensors, Modeling, and Analytics,” Department of Energy (DOE), Washington, DC, Technical Report No. DOE/GO-102019-5230.
15. Zhang, Y., Chen, W., Xu, R., and Black, J., 2016, “A Cluster-Based Method for Calculating Baselines for Residential Loads,” IEEE Trans. Smart Grid, 7(5), pp. 2368–2377.
16. EnerNOC, 2011, “The Demand Response Baseline,” White Paper, EnerNOC, Inc., Boston, MA.
17. Thanos, G., Minou, M., Ganu, T., Arya, V., Chakraborty, D., Van Deventer, J., and Stamoulis, G. D., 2013, “Evaluating Demand Response Programs by Means of Key Performance Indicators,” International Conference on Communication Systems and Networks, Bangalore, India, Jan. 7–10, pp. 1–6.
18. Jain, R. K., Smith, K. M., Culligan, P. J., and Taylor, J. E., 2014, “Forecasting Energy Consumption of Multi-Family Residential Buildings Using Support Vector Regression: Investigating the Impact of Temporal and Spatial Monitoring Granularity on Performance Accuracy,” Appl. Energy, 123, pp. 168–178.
19. Sevlian, R., and Rajagopal, R., 2018, “A Scaling Law for Short Term Load Forecasting on Varying Levels of Aggregation,” Int. J. Electric. Power Energy Syst., 98, pp. 350–361.
20. Nolan, S., and O’Malley, M., 2015, “Challenges and Barriers to Demand Response Deployment and Evaluation,” Appl. Energy, 152, pp. 1–10.
21. KEMA, 2013, “Development of Demand Response Mechanism: Baseline Consumption Methodology—Phase 1 Results,” KEMA, Inc., Clark Lake, MI, Technical Report, https://www.aemo.com.au/-/media/Files/PDF/Baseline-consumption-methodology---Phase-I-Reportpdf.pdf.
22. Wijaya, T. K., Vasirani, M., and Aberer, K., 2014, “When Bias Matters: An Economic Assessment of Demand Response Baselines for Residential Customers,” IEEE Trans. Smart Grid, 5(4), pp. 1755–1763.
23. Grimm, C., 2008, “Evaluating Baselines for Demand Response Programs,” AEIC Load Research Workshop, San Antonio, TX, Feb. 25–27, pp. 1–31.
24. Mohajeryami, S., Doostan, M., Asadinejad, A., and Schwarz, P., 2017, “Error Analysis of Customer Baseline Load (CBL) Calculation Methods for Residential Customers,” IEEE Trans. Ind. Appl., 53(1), pp. 5–14.
25. Coughlin, K., Piette, M. A., Goldman, C., and Kiliccote, S., 2009, “Statistical Analysis of Baseline Load Models for Non-Residential Buildings,” Energy Build., 41(4), pp. 374–381.
26. Wang, F., Li, K., Liu, C., Mi, Z., Shafie-Khah, M., and Catalão, J. P. S., 2018, “Synchronous Pattern Matching Principle-Based Residential Demand Response Baseline Estimation: Mechanism Analysis and Approach Description,” IEEE Trans. Smart Grid, 9(6), pp. 6972–6985.
27. Oyedokun, J., Bu, S., Han, Z., and Liu, X., 2019, “Customer Baseline Load Estimation for Incentive-Based Demand Response Using Long Short-Term Memory Recurrent Neural Network,” IEEE PES Innovative Smart Grid Technologies Conference–Europe, Bucharest, Romania, Sept. 29–Oct. 2, pp. 1–5.
28. Lee, E., Lee, K., Lee, H., Kim, E., and Rhee, W., 2019, “Defining Virtual Control Group to Improve Customer Baseline Load Calculation of Residential Demand Response,” Appl. Energy, 250, pp. 946–958.
29. Li, K., Wang, F., Mi, Z., Fotuhi-Firuzabad, M., Duić, N., and Wang, T., 2019, “Capacity and Output Power Estimation Approach of Individual Behind-the-Meter Distributed Photovoltaic System for Demand Response Baseline Estimation,” Appl. Energy, 253, p. 113595.
30. Lei, S., Mathieu, J. L., and Jain, R. K., 2019, “Performance of Existing Baseline Models in Quantifying the Effects of Short-Term Load Shifting of Campus Buildings,” SLAC National Accelerator Laboratory, Menlo Park, CA, Technical Report No. SLAC-R-1131.
31. Lei, S., Hong, D., Mathieu, J. L., and Hiskens, I. A., 2020, “Baseline Estimation of Commercial Building HVAC Fan Power Using Tensor Completion,” Electric Power Syst. Res., 189, p. 106624.
32. Keskar, A., Anderson, D., Johnson, J. X., Hiskens, I. A., and Mathieu, J. L., 2018, “Experimental Investigation of the Additional Energy Consumed by Building HVAC Systems Providing Grid Ancillary Services,” ACEEE Summer Study on Energy Efficiency in Buildings, Pacific Grove, CA, Aug. 12–17, pp. 1–12.
33. Keskar, A., Anderson, D., Johnson, J. X., Hiskens, I. A., and Mathieu, J. L., 2020, “Do Commercial Buildings Become Less Efficient When They Provide Grid Ancillary Services?,” Energy Efficiency, 13, pp. 487–501.
34. Afshari, S., Wolfe, J., Nazir, M. S., Hiskens, I. A., Johnson, J. X., Mathieu, J. L., Lin, Y., Barnes, A. K., Geller, D. A., and Backhaus, S. N., 2017, “An Experimental Study of Energy Consumption in Buildings Providing Ancillary Services,” IEEE PES Innovative Smart Grid Technologies Conference–North America, Washington, DC, Apr. 23–26, pp. 1–5.
35. Mathieu, J. L., Price, P. N., Kiliccote, S., and Piette, M. A., 2011, “Quantifying Changes in Building Electricity Use, With Application to Demand Response,” IEEE Trans. Smart Grid, 2(3), pp. 507–518.
36. Song, T., Li, Y., Zhang, X.-P., Li, J., Wu, C., Wu, Q., and Wang, B., 2018, “A Cluster-Based Baseline Load Calculation Approach for Individual Industrial and Commercial Customer,” Energies, 12(1), p. 64.
37. Jazaeri, J., Alpcan, T., Gordon, R., Brandao, M., Hoban, T., and Seeling, C., 2016, “Baseline Methodologies for Small Scale Residential Demand Response,” IEEE PES Innovative Smart Grid Technologies Conference–Asia, Melbourne, VIC, Australia, Nov. 28–Dec. 1, pp. 747–752.
38. Goldberg, M. L., and Agnew, G. K., 2013, “Measurement and Verification for Demand Response,” Department of Energy (DOE) and Federal Energy Regulatory Commission (FERC), Washington, DC, Technical Report, https://www.ferc.gov/sites/default/files/2020-04/napdr-mv.pdf.
39. Keskar, A., Lei, S., Webb, T., Nagy, S., Lee, H., Hiskens, I. A., Mathieu, J. L., and Johnson, J. X., 2020, “Stay Cool and Be Flexible: Energy-Efficient Grid Services Using Commercial Buildings HVAC Systems,” ACEEE Summer Study on Energy Efficiency in Buildings, Virtual Conference, Aug. 17–21, pp. 1–16.
40. Snijders, T. A. B., 1988, “On Cross-Validation for Predictor Evaluation in Time Series,” On Model Uncertainty and its Statistical Implications, T. K. Dijkstra, ed., Springer, Berlin/Heidelberg, Germany, pp. 56–69.
41. Taieb, S. B., and Hyndman, R., 2014, “Boosting Multi-step Autoregressive Forecasts,” International Conference on Machine Learning, Beijing, China, June 21–26, pp. 109–117.
42. ASHRAE, 2014, “ASHRAE Guideline 14–2014, Measurement of Energy, Demand, and Water Savings,” American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE), Atlanta, GA, Technical Report, https://www.techstreet.com/ashrae/standards/guideline-14-2014-measurement-of-energy-demand-and-water-savings?gateway_code=ashrae&product_id=1888937.
43. Vemuri, S., Huang, W. L., and Nelson, D. J., 1981, “On-Line Algorithms for Forecasting Hourly Loads of an Electric Utility,” IEEE Trans. Power Apparatus Syst., PAS-100(8), pp. 3775–3784.
44. Christiaanse, W. R., 1971, “Short-Term Load Forecasting Using General Exponential Smoothing,” IEEE Trans. Power Apparatus Syst., PAS-90(2), pp. 900–911.
45. Rahman, S., and Bhatnagar, R., 1988, “An Expert System Based Algorithm for Short Term Load Forecast,” IEEE Trans. Power Syst., 3(2), pp. 392–399.
46. Karatasou, S., Santamouris, M., and Geros, V., 2006, “Modeling and Predicting Building’s Energy Use With Artificial Neural Networks: Methods and Results,” Energy Build., 38(8), pp. 949–958.
47. Kolda, T. G., and Bader, B. W., 2009, “Tensor Decompositions and Applications,” SIAM Rev., 51(3), pp. 455–500.
48. Hong, D., Lei, S., Mathieu, J. L., and Balzano, L., 2019, “Exploration of Tensor Decomposition Applied to Commercial Building Baseline Estimation,” IEEE Global Conference on Signal and Information Processing, Ottawa, Ontario, Canada, Nov. 11–14, pp. 1–5.