This work leverages the current state of the art in reinforcement learning for continuous control, the Deep Deterministic Policy Gradient (DDPG) algorithm, towards the optimal 24-hour dispatch of shared energy assets within building clusters. The modeled DDPG agent interacts with a battery environment, designed to emulate a shared battery system. The aim here is to not only learn an efficient charged/discharged policy, but to also address the continuous domain question of how much energy should be charged or discharged. Experimentally, we examine the impact of the learned dispatch strategy towards minimizing demand peaks within the building cluster. Our results show that across the variety of building cluster combinations studied, the algorithm is able to learn and exploit energy arbitrage, tailoring it into battery dispatch strategies for peak demand shifting.

This content is only available via PDF.
You do not currently have access to this content.