Abstract

The World Health Organization (WHO) declared COVID-19 a pandemic in March 2020. As of September 3, 2020, 26,795,847 cases had been reported worldwide, and 878,963 people had died of the illness. Predicting the course of the COVID-19 pandemic will enable policymakers to optimize the use of healthcare system capacity and allocate resources so as to minimize the fatality rate. In this research, we design a novel hybrid reinforcement learning-based algorithm capable of solving complex optimization problems. We apply our algorithm to several well-known benchmarks and show that the proposed methodology provides quality solutions for most complex benchmarks. We also show that the proposed method outperforms state-of-the-art methods on several measures. Moreover, to demonstrate the efficiency of the proposed method on real-world optimization problems, we apply our approach to the most recent data from Quebec, Canada, to predict the COVID-19 outbreak. Combined with the most recent mathematical model for COVID-19 pandemic prediction, our algorithm accurately reflected the future trend of the pandemic, with a mean square error of 6.29E−06. Furthermore, we generate several scenarios to deepen our insight into the growth of the pandemic. We identify essential factors and offer managerial insights to help policymakers make decisions regarding future social measures.