Book record last updated: 2025-05-09 06:40:41

Reinforcement Learning and Optimal Control


Book Details


About the Book:

The purpose of this book is to consider large and challenging multistage decision problems that can in principle be solved by dynamic programming and optimal control, but whose exact solution is computationally intractable. The book discusses solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and are also known by names such as approximate dynamic programming and neuro-dynamic programming. The subject of the book arose from the interplay of ideas from optimal control and artificial intelligence. One of the book's aims is to explore the common boundary between these two fields and to build a bridge that is accessible to practitioners with a background in either one.
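For orientation, here is a minimal sketch of the finite-horizon formulation this description refers to, written in standard dynamic programming notation (state $x_k$, control $u_k$, random disturbance $w_k$, system function $f_k$, stage cost $g_k$); these symbols follow the usual textbook convention and are not quoted from this listing. Exact dynamic programming computes the optimal costs-to-go by the backward recursion

$$ J_k^*(x_k) \;=\; \min_{u_k \in U_k(x_k)} E\bigl\{\, g_k(x_k,u_k,w_k) + J_{k+1}^*\bigl(f_k(x_k,u_k,w_k)\bigr) \,\bigr\}, $$

and it is this recursion that becomes intractable for large problems. Approximation in value space replaces the exact cost-to-go $J_{k+1}^*$ with an approximation $\tilde{J}_{k+1}$ and obtains a suboptimal policy from the one-step lookahead minimization

$$ \tilde{\mu}_k(x_k) \;\in\; \arg\min_{u_k \in U_k(x_k)} E\bigl\{\, g_k(x_k,u_k,w_k) + \tilde{J}_{k+1}\bigl(f_k(x_k,u_k,w_k)\bigr) \,\bigr\}. $$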

Table of Contents:

1. Exact Dynamic Programming
   1.1. Deterministic Dynamic Programming
        1.1.1. Deterministic Problems
        1.1.2. The Dynamic Programming Algorithm
        1.1.3. Approximation in Value Space
   1.2. Stochastic Dynamic Programming
   1.3. Examples, Variations, and Simplifications
        1.3.1. Deterministic Shortest Path Problems
        1.3.2. Discrete Deterministic Optimization
        1.3.3. Problems with a Termination State
        1.3.4. Forecasts
        1.3.5. Problems with Uncontrollable State Components
        1.3.6. Partial State Information and Belief States
        1.3.7. Linear Quadratic Optimal Control
        1.3.8. Systems with Unknown Parameters - Adaptive Control
   1.4. Reinforcement Learning and Optimal Control - Some Terminology
   1.5. Notes and Sources
2. Approximation in Value Space
   2.1. Approximation Approaches in Reinforcement Learning
        2.1.1. General Issues of Approximation in Value Space
        2.1.2. Off-Line and On-Line Methods
        2.1.3. Model-Based Simplification of the Lookahead Minimization
        2.1.4. Model-Free Q-Factor Approximation in Value Space
        2.1.5. Approximation in Policy Space on Top of Approximation in Value Space
        2.1.6. When is Approximation in Value Space Effective?
   2.2. Multistep Lookahead
        2.2.1. Multistep Lookahead and Rolling Horizon
        2.2.2. Multistep Lookahead and Deterministic Problems
   2.3. Problem Approximation
        2.3.1. Enforced Decomposition
        2.3.2. Probabilistic Approximation - Certainty Equivalent Control
   2.4. Rollout
        2.4.1. On-Line Rollout for Deterministic Discrete Optimization
        2.4.2. Stochastic Rollout and Monte Carlo Tree Search
        2.4.3. Rollout with an Expert
   2.5. On-Line Rollout for Deterministic Infinite-Spaces Problems - Optimization Heuristics
        2.5.1. Model Predictive Control
        2.5.2. Target Tubes and the Constrained Controllability Condition
        2.5.3. Variants of Model Predictive Control
   2.6. Notes and Sources
3. Parametric Approximation
   3.1. Approximation Architectures
        3.1.1. Linear and Nonlinear Feature-Based Architectures
        3.1.2. Training of Linear and Nonlinear Architectures
        3.1.3. Incremental Gradient and Newton Methods
   3.2. Neural Networks
        3.2.1. Training of Neural Networks
        3.2.2. Multilayer and Deep Neural Networks
   3.3. Sequential Dynamic Programming Approximation
   3.4. Q-Factor Parametric Approximation
   3.5. Parametric Approximation in Policy Space by Classification
   3.6. Notes and Sources
4. Infinite Horizon Dynamic Programming
   4.1. An Overview of Infinite Horizon Problems
   4.2. Stochastic Shortest Path Problems
   4.3. Discounted Problems
   4.4. Semi-Markov Discounted Problems
   4.5. Asynchronous Distributed Value Iteration
   4.6. Policy Iteration
        4.6.1. Exact Policy Iteration
        4.6.2. Optimistic and Multistep Lookahead Policy Iteration
        4.6.3. Policy Iteration for Q-factors
   4.7. Notes and Sources
   4.8. Appendix: Mathematical Analysis
        4.8.1. Proofs for Stochastic Shortest Path Problems
        4.8.2. Proofs for Discounted Problems
        4.8.3. Convergence of Exact and Optimistic Policy Iteration
5. Infinite Horizon Reinforcement Learning
   5.1. Approximation in Value Space - Performance Bounds
        5.1.1. Limited Lookahead
        5.1.2. Rollout
        5.1.3. Approximate Policy Iteration
   5.2. Fitted Value Iteration
   5.3. Simulation-Based Policy Iteration with Parametric Approximation
        5.3.1. Self-Learning and Actor-Critic Methods
        5.3.2. A Model-Based Variant
        5.3.3. A Model-Free Variant
        5.3.4. Implementation Issues of Parametric Policy Iteration
        5.3.5. Convergence Issues of Parametric Policy Iteration - Oscillations
   5.4. Q-Learning
        5.4.1. Optimistic Policy Iteration with Parametric Q-Factor Approximation - SARSA and DQN
   5.5. Additional Methods - Temporal Differences
   5.6. Exact and Approximate Linear Programming
   5.7. Approximation in Policy Space
        5.7.1. Training by Cost Optimization - Policy Gradient, Cross-Entropy, and Random Search Methods
        5.7.2. Expert-Based Supervised Learning
        5.7.3. Approximate Policy Iteration, Rollout, and Approximation in Policy Space
   5.8. Notes and Sources
   5.9. Appendix: Mathematical Analysis
        5.9.1. Performance Bounds for Multistep Lookahead
        5.9.2. Performance Bounds for Rollout
        5.9.3. Performance Bounds for Approximate Policy Iteration
6. Aggregation
   6.1. Aggregation with Representative States
        6.1.1. Continuous Control Space Discretization
        6.1.2. Continuous State Space - POMDP Discretization
   6.2. Aggregation with Representative Features
        6.2.1. Hard Aggregation and Error Bounds
        6.2.2. Aggregation Using Features
   6.3. Methods for Solving the Aggregate Problem
        6.3.1. Simulation-Based Policy Iteration
        6.3.2. Simulation-Based Value Iteration and Q-Learning
   6.4. Feature-Based Aggregation with a Neural Network
   6.5. Biased Aggregation
   6.6. Notes and Sources
   6.7. Appendix: Mathematical Analysis

About the Author:

Dimitri P. Bertsekas is a tenured professor at MIT, a member of the U.S. National Academy of Engineering, and a guest professor at the Research Center for Complex and Networked Systems at Tsinghua University. An internationally renowned author in electrical engineering and computer science, he has written more than a dozen widely used textbooks and monographs, including Nonlinear Programming, Network Optimization, and Convex Optimization.

Other Information:

None available.


Download Feedback

  • Stable (441+)
  • Expired (805+)
  • Smooth (950+)
  • Annotated (176+)
  • Hidden gem (384+)
  • Scanned copy (639+)
  • Illustrated (312+)
  • Damaged (483+)
  • With table of contents (845+)
  • Pleasant surprise (214+)
  • Carefully proofread (888+)
  • Lossless (607+)
  • Direct link (418+)
  • Up to date (245+)
  • Audio-readable (238+)
  • Watermarked (254+)
  • Printable (892+)
  • MOBI (770+)
  • Thanks (735+)

Download Comments

  • User 1720498439: (2024-07-09 12:13:59)

    The audio feature paired with the PDF/AZW3 formats gives a lossless digital reading experience; recommended download.

  • User 1722902777: (2024-08-06 08:06:17)

    The illustrated e-book downloads instantly and supports export to EPUB/TXT formats; easy to use.

  • User 1726374498: (2024-09-15 12:28:18)

    A high-quality fiction resource; the audio design improves the reading experience; easy to use.

  • User 1728661096: (2024-10-11 23:38:16)

    A complete fiction resource; the audio design improves the reading experience; easy to use.

  • User 1730903178: (2024-11-06 22:26:18)

    Lightning-fast download of AZW3/TXT files; a complete academic work, recommended and worth keeping.


Related Reviews

No one has reviewed this book yet.


Recommended Book Lists