From 5cf4b9b5cbd88c094427cffff5a4881d3cc48118 Mon Sep 17 00:00:00 2001
From: Ojaswi Chopra
Date: Sat, 22 Jun 2024 23:01:52 +0530
Subject: [PATCH] Trial-2

---
 contrib/machine-learning/reinforcement-learning.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/machine-learning/reinforcement-learning.md b/contrib/machine-learning/reinforcement-learning.md
index 47c350b..760d530 100644
--- a/contrib/machine-learning/reinforcement-learning.md
+++ b/contrib/machine-learning/reinforcement-learning.md
@@ -300,11 +300,11 @@ Congratulations on completing your journey through this comprehensive guide to r
 
 *Happy coding, and may your RL adventures be rewarding!*
 
-\( Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) \)
+$$ Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) $$
 
 where:
-- \( Q(s, a) \) is the Q-value of state \( s \) and action \( a \).
-- \( r \) is the observed reward.
-- \( s' \) is the next state.
-- \( \alpha \) is the learning rate.
-- \( \gamma \) is the discount factor.
+- $Q(s, a)$ is the Q-value of state $s$ and action $a$.
+- $r$ is the observed reward.
+- $s'$ is the next state.
+- $\alpha$ is the learning rate.
+- $\gamma$ is the discount factor.
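
For context, the formula this patch reformats is the tabular Q-learning update. The sketch below is a minimal Python illustration of that update; it is not part of the patch or of reinforcement-learning.md, and the names and sizes (`q_table`, `n_states`, `n_actions`, `alpha`, `gamma`) are assumptions chosen for the example.

```python
import numpy as np

# Minimal tabular Q-learning update, mirroring the formula in the patched section:
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

n_states, n_actions = 16, 4            # assumed sizes for illustration
q_table = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99               # learning rate and discount factor

def q_update(s, a, r, s_next):
    """Apply one Q-learning update for the transition (s, a, r, s')."""
    td_target = r + gamma * q_table[s_next].max()   # r + gamma * max_a' Q(s', a')
    td_error = td_target - q_table[s, a]            # temporal-difference error
    q_table[s, a] += alpha * td_error

# Example: one transition from state 0 to state 1 via action 2 with reward 1.0
q_update(s=0, a=2, r=1.0, s_next=1)
```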