
Reinforcement Learning Algorithms: Survey and Classification


  • Dr. Ambedkar Institute of Technology, Bangalore – 560056, Karnataka, India
  • Department of Computer Science, New Horizon College of Engineering, Bangalore – 560103, Karnataka, India


Reinforcement Learning (RL) has emerged as a strong approach in artificial intelligence, particularly in machine learning and robotic navigation. In this paper we briefly survey the various RL algorithms and offer a perspective on the direction in which research is moving. We also propose viewing RL as a three-dimensional (3-D) problem and describe how the algorithms have progressed along each of these dimensions. We begin with a quick recap of the basic classifications in RL and of some popular, though older, algorithms. We then discuss recent trends and summarize the overall landscape from a bird's-eye view, and we conclude with the challenges that remain. We have deliberately kept detailed mathematical equations and concepts out of this paper, since its purpose is to give an overview and abstractions to the serious reader. We hope it serves as a brief summary of RL and its trends for researchers, students and interested scholars.


Artificial Intelligence, Cognitive Search, Game Theory, Machine Learning, Reinforcement Learning.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.