prevision_quantum_nn.applications.reinforcement_learning.policy

Policy module

Contains the classes that handle policies in the reinforcement learning framework.

Module Contents

Classes

Policy Policy, base class for defining policies.
BehaviorPolicy Behavior Policy.
TargetPolicy Target Policy.
prevision_quantum_nn.applications.reinforcement_learning.policy.LEARNING_TYPES = ['monte_carlo', 'q_learning', 'td_learning']
class prevision_quantum_nn.applications.reinforcement_learning.policy.Policy(params)

Policy, base class for defining policies.

params

parameters of the policy

Type:dictionary
iteration

the current iteration of the policy

Type:int
epsilon

the parameter of the epsilon-greedy method

Type:float
epsilon_decay

the decay parameter applied to epsilon after each iteration

Type:float
gamma

the parameter of the temporal difference method

Type:float
learner

the learner used to evaluate the Q-value function

Type:RLLearner
use_memory_replay

if True, memory replay will be activated

Type:bool
memory_replay_period

the period of the memory replay method

Type:int
learning_type

one of the values in LEARNING_TYPES: “monte_carlo”, “q_learning” or “td_learning”

Type:str
associate_learner(self, learner)

Associates a learner with the policy.

step(self, state)

Step one iteration further.

Parameters:state (numpy array) – state the agent is in before taking an action
Returns:action taken by the policy
Return type:int
update_epsilon_greedy_parameter(self)

Updates epsilon if it is higher than the minimum.
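
The decay step described above can be sketched as follows. This is an illustration only, not the library's implementation: the multiplicative form of the decay, the helper name decay_epsilon and the epsilon_min argument are assumptions, since the source states only that epsilon is updated while it is above a minimum.

```python
def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.01):
    """Return the next epsilon, never going below epsilon_min.

    Hypothetical helper: multiplicative decay is assumed here.
    """
    if epsilon > epsilon_min:
        # Shrink epsilon, then clamp it at the floor.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


epsilon = 1.0
for _ in range(5):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.5, epsilon_min=0.1)
# After enough iterations epsilon stays at the floor value.
```

Clamping at a floor keeps a small amount of exploration alive even late in training.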

fit_learner(self, state, action, reward, new_state)

Fits the learner.

Parameters:
  • state (numpy array) – state the learner is in
  • action (int) – action that has just been taken in that state
  • reward (float) – reward obtained by taking this action in the current state
  • new_state (numpy array) – state reached after taking this action
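
To illustrate how gamma typically enters a fit of this kind, here is a sketch of the standard one-step Q-learning target, r + gamma * max Q(new_state, a). Whether fit_learner computes exactly this quantity is an assumption; the function name q_learning_target and the plain-list Q-values are hypothetical and chosen only to keep the example self-contained.

```python
def q_learning_target(reward, next_q_values, gamma):
    """One-step Q-learning target: reward + gamma * max over next Q-values.

    Hypothetical helper, not part of the documented module.
    """
    return reward + gamma * max(next_q_values)


# Q-values of the two actions available in the new state (made-up numbers).
target = q_learning_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
```

A learner would then be fitted so that its estimate for (state, action) moves toward this target.
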
get_action(self, state, action=None)

Gets the action for the current state. To be overridden by subclasses.

class prevision_quantum_nn.applications.reinforcement_learning.policy.BehaviorPolicy(params)

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Behavior Policy.

Implements an exploratory policy with the epsilon-greedy method.

get_action(self, state, action=None)

Returns an action to be taken by the agent, chosen according to the epsilon-greedy policy.

Parameters:
  • state (numpy array) – state the agent is in
  • action (int) – action that has just been taken
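
A minimal sketch of epsilon-greedy action selection, for readers unfamiliar with the method. It does not reproduce BehaviorPolicy.get_action: the function name epsilon_greedy_action, the q_values list and the rng argument are assumptions made for illustration.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Hypothetical helper; q_values holds one estimated value per action.
    """
    if rng.random() < epsilon:
        # Explore: any action index, uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.7, 0.2]
greedy = epsilon_greedy_action(q_values, epsilon=0.0)  # never explores
explored = epsilon_greedy_action(q_values, epsilon=1.0, rng=random.Random(0))
```

With epsilon set to 0 the choice is always the greedy action, and with epsilon set to 1 it is always random.
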
class prevision_quantum_nn.applications.reinforcement_learning.policy.TargetPolicy(params)

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Target Policy.

Implements a purely exploitative policy without random actions.

get_action(self, state, action=None)

Get action.

Parameters:state (numpy array) – state at which the action needs to be taken
Returns:the action taken at that state
Return type:int
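
A purely exploitative policy can be sketched as an argmax over estimated Q-values. The class below is a hypothetical stand-in, not the library's TargetPolicy: the name GreedyPolicySketch and the q_function callable are assumptions, and only the get_action(state, action=None) signature is taken from the documentation above.

```python
class GreedyPolicySketch:
    """Hypothetical stand-in for a policy that always exploits."""

    def __init__(self, q_function):
        # q_function: assumed callable mapping a state to a list of Q-values.
        self.q_function = q_function

    def get_action(self, state, action=None):
        # The action argument is unused in this sketch; it is kept only to
        # mirror the documented signature.
        q_values = self.q_function(state)
        # No randomness: always return the index of the largest Q-value.
        return max(range(len(q_values)), key=lambda a: q_values[a])


# Toy Q-function with three actions (made-up values).
policy = GreedyPolicySketch(lambda state: [sum(state), 1.0, -1.0])
chosen = policy.get_action([0.5, 2.0])
```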