prevision_quantum_nn.applications.reinforcement_learning.policy

Policy module

Contains the classes that handle policies in the Reinforcement Learning framework.

Module Contents

Classes

Policy

Policy, base class for defining policies.

BehaviorPolicy

Behavior Policy.

TargetPolicy

Target Policy.

prevision_quantum_nn.applications.reinforcement_learning.policy.LEARNING_TYPES = ['monte_carlo', 'q_learning', 'td_learning']
class prevision_quantum_nn.applications.reinforcement_learning.policy.Policy(params)

Policy, base class for defining policies.

params

parameters of the policy

Type

dictionary

iteration

the iteration the policy has currently reached

Type

int

epsilon

the parameter of the epsilon-greedy method

Type

float

epsilon_decay

the parameter controlling how epsilon decays after each iteration

Type

float

gamma

the discount factor used by the temporal difference method

Type

float

learner

the learner used to evaluate the Q-value function

Type

RLLearner

use_memory_replay

if True, memory replay will be activated

Type

bool

memory_replay_period

the period of the memory replay method

Type

int

learning_type

one of the values in LEARNING_TYPES: “monte_carlo”, “q_learning” or “td_learning”

Type

str

associate_learner(self, learner)

Associates a learner with the policy.

step(self, state)

Step one iteration further.

Parameters

state (numpy array) – state at which the agent is before taking an action

Returns

action taken by the policy

Return type

int

update_epsilon_greedy_parameter(self)

Updates epsilon if it is higher than the minimum.
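As an illustration of decaying epsilon down to a floor, here is a minimal sketch. It assumes a multiplicative decay and a hypothetical `epsilon_min` argument; neither the decay rule nor the name of the minimum is confirmed by this page, so treat it as one plausible reading rather than the library's implementation.

```python
def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.01):
    """Decay epsilon multiplicatively, never dropping below epsilon_min.

    Assumption: the decay is multiplicative and epsilon_min is a
    hypothetical name for the floor; the library may differ on both.
    """
    if epsilon > epsilon_min:
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


# Track epsilon over a few iterations.
epsilon = 1.0
history = []
for _ in range(5):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.5, epsilon_min=0.1)
    history.append(epsilon)

print(history)
```

Once epsilon reaches the floor it stays there, so the policy keeps a small amount of exploration instead of becoming fully greedy.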

fit_learner(self, state, action, reward, new_state)

Fits the learner.

Parameters
  • state (numpy array) – state at which the learner is

  • action (int) – action that has just been taken in that state

  • reward (float) – reward obtained by taking this action in the current state

  • new_state (numpy array) – state reached after taking this action
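To make the roles of `state`, `action`, `reward` and `new_state` concrete, the sketch below shows a textbook tabular Q-learning update. It is not the library's `RLLearner`: the Q-table, the integer state indices, the learning rate `lr` and both function names are assumptions made for illustration only.

```python
import numpy as np


def q_learning_target(reward, next_q_values, gamma):
    """Textbook Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * float(np.max(next_q_values))


def tabular_update(q_table, state, action, reward, new_state,
                   gamma=0.9, lr=0.5):
    """Move Q(state, action) toward the target by a step of size lr.

    Assumption: states are integer row indices into q_table, which is a
    simplification of the numpy-array states described above.
    """
    target = q_learning_target(reward, q_table[new_state], gamma)
    q_table[state, action] += lr * (target - q_table[state, action])
    return q_table


# Two states, two actions; state 1 already has some value estimates.
q_table = np.zeros((2, 2))
q_table[1] = [1.0, 3.0]

tabular_update(q_table, state=0, action=1, reward=1.0, new_state=1)
print(q_table[0, 1])
```

Here the target is 1.0 + 0.9 * 3.0 = 3.7, and half of the gap between the old estimate (0.0) and the target is applied.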

abstract get_action(self, state, action=None)

Gets the action for the current state; to be overridden by subclasses.

class prevision_quantum_nn.applications.reinforcement_learning.policy.BehaviorPolicy(params)

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Behavior Policy.

Implements an exploratory policy with the epsilon-greedy method.

get_action(self, state, action=None)

Returns an action to be taken by the agent, chosen according to the epsilon-greedy policy.

Parameters
  • state (numpy array) – state at which the agent is

  • action (int) – action that has just been taken
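The epsilon-greedy rule itself can be sketched as follows. This is a generic version under stated assumptions: `q_values` stands in for whatever the associated learner predicts for the current state, and the function name and the explicit `rng` argument are hypothetical, not part of this module.

```python
import numpy as np


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a random action, else the best one.

    Assumption: q_values is a 1-D array of estimated action values for
    the current state.
    """
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return int(rng.integers(len(q_values)))
    # Exploit: action with the highest estimated value.
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.2])

# epsilon = 0 never explores, so the greedy action is returned.
greedy = epsilon_greedy_action(q_values, epsilon=0.0, rng=rng)

# epsilon = 1 always explores, so actions are drawn uniformly.
random_actions = {
    epsilon_greedy_action(q_values, epsilon=1.0, rng=rng) for _ in range(100)
}
```

Passing the random generator explicitly keeps the sketch reproducible; a real policy would more likely hold it as internal state.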

class prevision_quantum_nn.applications.reinforcement_learning.policy.TargetPolicy(params)

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Target Policy.

Implements a purely exploitative policy without random actions.

get_action(self, state, action=None)

Get action.

Parameters

state (numpy array) – state at which the action needs to be taken

Returns

the action taken at that state

Return type

int
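A purely exploitative choice reduces to taking the action with the highest estimated value. The sketch below assumes the value estimates are available as a 1-D array; the helper name `greedy_action` is hypothetical and not part of this module.

```python
import numpy as np


def greedy_action(q_values):
    """Return the index of the highest estimated action value.

    np.argmax returns the first index when several values tie, so the
    choice is deterministic.
    """
    return int(np.argmax(q_values))


best = greedy_action(np.array([0.2, 0.9, 0.4]))
tied = greedy_action(np.array([0.5, 0.5, 0.1]))
```

Because no randomness is involved, repeated calls in the same state always give the same action, which is what distinguishes this from the exploratory behavior policy.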