socool1003/Markov_chain


Abstract: This project demonstrates the layered application of Markov processes to robot state analysis and decision optimization through a mobile robot navigation example. Starting from Markov chain modeling, it exposes the limitations of a fixed policy in a stochastic system, then extends the model to a Markov Decision Process (MDP) to solve for an optimal policy, providing a theoretical and practical foundation for intelligent robot control under uncertainty.

Part 1: Robot Motion Analysis Based on Markov Chains

This section establishes a simplified model of a mobile robot navigating a 3x3 grid environment, and analyzes the robot's state-transition patterns under a fixed control policy combined with random movement noise.

Key Findings:

  1. Problem Modeling: Defined 8 legal grid cells as system states and constructed an $8 \times 8$ probability transition matrix based on a fixed "preferential right-turn" policy and 80%/10%/10% movement noise (intended direction/left/right).
  2. Short-Term State Evolution: Simulated and calculated the robot's positional probability distribution at different time steps (k=0, 5, 10, 20) starting from an initial position, visually demonstrating its movement trend through heatmaps.
  3. Long-Term Stationary Distribution: Solved for the stationary distribution of the Markov chain, revealing that under the current fixed policy, the robot would be "stuck" in the bottom-right corner ($S_8$) for approximately 77% of the time, unable to effectively explore the entire environment, highlighting the severe limitations of a fixed policy.
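The three computations above can be sketched in a few lines of numpy. The toy 3-state transition matrix below is illustrative only, not the project's actual $8 \times 8$ matrix; substitute the matrix built from the right-turn policy and 80%/10%/10% noise to reproduce the reported results.

```python
import numpy as np

# Illustrative 3-state chain (NOT the project's 8x8 matrix).
P = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

def distribution_at(pi0, P, k):
    """k-step distribution: pi_k = pi_0 @ P^k."""
    return pi0 @ np.linalg.matrix_power(P, k)

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1,
    normalized so its entries sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

pi0 = np.array([1.0, 0.0, 0.0])          # robot starts in state 0
for k in (0, 5, 10, 20):                 # short-term evolution (Part 1, item 2)
    print(f"k={k}:", np.round(distribution_at(pi0, P, k), 3))
print("stationary:", np.round(stationary(P), 3))  # long-term behavior (item 3)
```

With the project's actual matrix, the largest entry of `stationary(P)` would be the roughly 0.77 mass on $S_8$ noted above.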

Part 2: Policy Optimization Based on Markov Decision Processes

To overcome the fixed-policy deficiencies identified in the Markov chain analysis, the project extends the model to a Markov Decision Process (MDP). By introducing an action space and a reward function, the problem shifts from passively analyzing the system to actively solving for an optimal policy.

Key Content & Discoveries:

  1. MDP Modeling: Redefined the action space (Up, Down, Left, Right) and the reward function (high reward for target $S_3$, cost for movement, penalty for collision), and set a discount factor $\gamma=0.9$.
  2. Goal-Oriented Policy: Solving the MDP with standard algorithms such as value iteration or policy iteration yields an optimal policy that efficiently guides the robot to the target position ($S_3$).
  3. Breaking Limitations: The MDP framework enabled the learning of intelligent, goal-oriented behaviors, completely resolving the problem of the robot being "stuck" as seen in the Markov chain analysis, thereby optimizing robot behavior and facilitating more effective environmental exploration.
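As a concrete illustration of the solution step in item 2, here is a minimal value iteration sketch on a hypothetical 3-state, 2-action MDP. The transition tensor and rewards below are invented for the example; the project's actual model has 8 states, 4 actions (Up/Down/Left/Right), a goal reward at $S_3$, a movement cost, a collision penalty, and $\gamma = 0.9$.

```python
import numpy as np

gamma = 0.9
# P[a, s, s']: probability of moving s -> s' under action a (assumed values).
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],  # action 0: "stay left"
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],  # action 1: "move right"
])
# R[s, a]: movement cost everywhere, large reward at the goal state 2.
R = np.array([[-1.0, -1.0], [-1.0, -1.0], [10.0, 10.0]])

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality update until the value function converges,
    then return the values and the greedy (optimal) policy."""
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V, policy = value_iteration(P, R, gamma)
print("V:", np.round(V, 2))
print("policy:", policy)  # greedy action per state
```

In this toy model the optimal policy chooses the goal-directed action in every state, mirroring how the project's MDP policy steers the robot toward $S_3$ instead of drifting into the $S_8$ trap.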

Technical Stack & Dependencies

  • Programming Language: Python
  • Core Libraries: numpy (for numerical computation and matrix operations), matplotlib (for plotting), seaborn (for heatmap visualization).

Project Summary & Learning Outcomes

Through this project, I systematically modeled, analyzed, and optimized a mobile robot system operating in an uncertain environment. Markov chains provided a quantitative tool for understanding stochastic system behavior, while Markov Decision Processes offered a powerful theoretical framework for designing intelligent, goal-oriented behaviors.

About

A simulator for analyzing robot navigation using Markov Chains and MDPs.
