Keywords: Attention-masking, deep Q network, combinatorial optimization
Deep neural network-based reinforcement learning (DRL) methods, which have demonstrated unprecedented success in games and robotic control, are gradually gaining attention as a way to solve combinatorial optimization problems. However, effective operation of a smart grid system is subject to various constraints, such as the power demand-supply balance, the lower and upper bounds on battery electricity, and the market price. Because of these constraints, DRL algorithms are not efficient at obtaining an optimized result. In this paper we address this issue by developing an attention-masking extended deep Q network (AME-DQN) reinforcement learning algorithm. Special focus is placed on the predictive ability of the trained AME-DQN model under various weather conditions and demand profiles. The results are further compared with mixed-integer linear programming (MILP) results, and we demonstrate that AME-DQN is able to predict optimized actions that satisfy all the constraints, while MILP failed to meet the conditions in most of the cases.
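To illustrate how a constraint such as the battery bounds can be kept out of the agent's action choice, the following is a minimal sketch of generic invalid-action masking on Q-values. It is not necessarily the attention-masking mechanism of AME-DQN, whose details are not given here. The action set `ACTIONS`, the helper names `feasible_mask` and `masked_greedy_action`, and all numeric values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical discrete battery actions in kWh per step (an assumption):
# negative = discharge, zero = idle, positive = charge.
ACTIONS = (-1.0, 0.0, 1.0)


def feasible_mask(soc, soc_min, soc_max):
    """Return True for each action that keeps the battery level within bounds."""
    return [soc_min <= soc + a <= soc_max for a in ACTIONS]


def masked_greedy_action(q_values, mask):
    """Pick the highest-Q feasible action by setting infeasible Q-values to -inf."""
    if not any(mask):
        raise ValueError("no feasible action in this state")
    masked = [q if ok else -math.inf for q, ok in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)


# The battery is full (soc == soc_max), so charging is infeasible even though
# the made-up Q-values below rank it highest.
q = [0.2, 0.5, 0.9]
mask = feasible_mask(soc=5.0, soc_min=0.0, soc_max=5.0)
action = masked_greedy_action(q, mask)
```

With the mask applied, the greedy choice falls back to the best feasible action, so a constraint-violating action is never selected regardless of its estimated value.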