Microgrid monitoring focused on power data, such as voltage and current, has become increasingly significant in the development of decentralized power supply systems. The power data transmission delay between distributed generators is vital for evaluating the stability and economic performance of the overall grid. In this thesis, both hardware and simulation approaches are discussed for optimizing data packet transmission delay, energy consumption, and collision rate. To minimize transmission delay and collision rate, the state-action-reward-state-action (SARSA) and Q-learning methods, based on a Markov decision process (MDP) model, are used to search for the most efficient data transmission scheme for each agent device. A comparison of the training processes of SARSA and Q-learning is presented to illustrate the training speed of the two methods in a source-relay-destination scenario. To balance the exploration and exploitation involved in these two methods, a parameter is introduced to reduce the time cost of the training process. Finally, simulation results for the average throughput and data packet collision rate in a network with 20 agent nodes are presented to demonstrate the feasibility of applying reinforcement learning algorithms to the development of scalable networks. The results show that the average throughput and collision rate remain at the expected performance level as long as the number of nodes is not too large. In addition, a hardware implementation based on Bluetooth Low Energy (BLE) is used to illustrate the data packet transmission process.


Vincent J. Winstead

Committee Member: Xuanhui Wu

Committee Member: Jianwu Zeng

Date of Degree




Document Type



Degree: Master of Science (MS)


Department: Electrical and Computer Engineering and Technology


College: Science, Engineering and Technology



Rights Statement: In Copyright