prevision_quantum_nn.applications.reinforcement_learning.policy
¶
Policy module
Contains the classes that handle policies in the reinforcement learning framework.
Module Contents¶
Classes¶
Policy | Policy, base class for defining policies.
BehaviorPolicy | Behavior Policy.
TargetPolicy | Target Policy.
-
prevision_quantum_nn.applications.reinforcement_learning.policy.
LEARNING_TYPES
= ['monte_carlo', 'q_learning', 'td_learning']¶
-
class
prevision_quantum_nn.applications.reinforcement_learning.policy.
Policy
(params)¶ Policy, base class for defining policies.
-
params
¶ parameters of the policy
- Type
dictionary
-
iteration
¶ the iteration the policy is currently at
- Type
int
-
epsilon
¶ the parameter of the epsilon-greedy method
- Type
float
-
epsilon_decay
¶ the decay parameter applied to epsilon after each iteration
- Type
float
-
gamma
¶ the discount factor of the temporal difference method
- Type
float
-
learner
¶ the learner used to evaluate the Q-value function
- Type
RLLearner
-
use_memory_replay
¶ if True, memory replay will be activated
- Type
bool
-
memory_replay_period
¶ the period of the memory replay method
- Type
int
-
learning_type
¶ one of LEARNING_TYPES: “monte_carlo”, “q_learning” or “td_learning”
- Type
str
-
associate_learner
(self, learner)¶ associates a learner to the policy
-
step
(self, state)¶ Step one iteration further.
- Parameters
state (numpy array) – state at which the agent is before taking an action
- Returns
action taken by the policy
- Return type
int
-
update_epsilon_greedy_parameter
(self)¶ Updates epsilon if it is higher than its minimum value.
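The update rule itself is not documented here. As a rough illustration only, the sketch below assumes a multiplicative decay with a lower bound; the function name `decay_epsilon` and the `epsilon_min` argument are hypothetical and not part of this module.

```python
def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.01):
    """Return the next epsilon, never going below epsilon_min.

    Assumption: multiplicative decay. epsilon_min is a hypothetical name.
    """
    if epsilon > epsilon_min:
        # Shrink epsilon, but clamp it at the floor.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


epsilon = 1.0
for _ in range(3):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.5)
```

With a decay of 0.5, epsilon halves at every call until it reaches the floor, so exploration fades gradually instead of stopping abruptly.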
-
fit_learner
(self, state, action, reward, new_state)¶ Fits the learner.
- Parameters
state (numpy array) – state at which the learner is
action (int) – action that has just been taken in that state
reward (float) – reward obtained by taking this action in the current state
new_state (numpy array) – state reached after taking the action
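How the learner is fitted is not specified on this page. For the “q_learning” learning type, a common choice is to regress the Q-value of the (state, action) pair toward a temporal difference target built from reward, gamma and the Q-values of new_state. The sketch below assumes that standard Q-learning target; `q_learning_target` and `next_q_values` are hypothetical names.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a').

    Assumption: next_q_values holds one estimated Q-value per action
    for the new state. This is not confirmed to be the module's formula.
    """
    return reward + gamma * max(next_q_values)


target = q_learning_target(reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
```

Here the best next action is worth 2.0, so the target is 1.0 + 0.5 * 2.0.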
-
abstract
get_action
(self, state, action=None)¶ Gets the action for the current state. To be overridden by subclasses.
-
-
class
prevision_quantum_nn.applications.reinforcement_learning.policy.
BehaviorPolicy
(params)¶ Bases:
prevision_quantum_nn.applications.reinforcement_learning.policy.Policy
Behavior Policy.
Implements an exploratory policy with the epsilon-greedy method.
-
get_action
(self, state, action=None)¶ Returns an action to be taken by the agent, according to the epsilon-greedy policy.
- Parameters
state (numpy array) – state at which the agent is
action (int) – action that has just been taken
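As an illustration of epsilon-greedy selection in general, and not of this class's actual code, the sketch below picks a random action with probability epsilon and the highest-valued action otherwise. The helper `epsilon_greedy_action` and the `q_values` list are hypothetical; in the library the Q-values would presumably come from the associated learner.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploitation: index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q_values = [0.1, 0.7, 0.2]
greedy = epsilon_greedy_action(q_values, epsilon=0.0, rng=rng)
explored = [epsilon_greedy_action(q_values, epsilon=1.0, rng=rng) for _ in range(20)]
```

With epsilon set to 0.0 the call always returns the greedy action. With epsilon set to 1.0 every action is drawn at random.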
-
-
class
prevision_quantum_nn.applications.reinforcement_learning.policy.
TargetPolicy
(params)¶ Bases:
prevision_quantum_nn.applications.reinforcement_learning.policy.Policy
Target Policy.
Implements a purely exploitative policy without random actions.
-
get_action
(self, state, action=None)¶ Get action.
- Parameters
state (numpy array) – state at which the action needs to be taken
- Returns
the action taken at that state
- Return type
int
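A policy with no random actions is deterministic: the same state always yields the same action. The sketch below shows this under the assumption that the action is the argmax of estimated Q-values. `greedy_action` and `toy_q_function` are hypothetical stand-ins, not part of this module.

```python
def greedy_action(q_function, state):
    """Return the action with the highest estimated Q-value (no exploration)."""
    q_values = q_function(state)
    return max(range(len(q_values)), key=q_values.__getitem__)


def toy_q_function(state):
    # Hypothetical stand-in for a fitted learner: one Q-value per action.
    return [sum(state), 1.0, -1.0]


state = [0.2, 0.3]
actions = [greedy_action(toy_q_function, state) for _ in range(5)]
```

Repeated calls on the same state return the same action, which is the key difference from an exploratory policy.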
-