`prevision_quantum_nn.applications.reinforcement_learning.policy`¶

Policy module

Contains the classes that handle policies in the reinforcement learning framework.

Module Contents¶

Classes¶

`Policy`	Policy, base class for defining policies.
`BehaviorPolicy`	Behavior Policy.
`TargetPolicy`	Target Policy.

prevision_quantum_nn.applications.reinforcement_learning.policy.LEARNING_TYPES = ['monte_carlo', 'q_learning', 'td_learning']¶

class prevision_quantum_nn.applications.reinforcement_learning.policy.Policy(params)¶

Policy, base class for defining policies.

params¶

parameters of the policy

Type: dictionary

iteration¶

the current iteration of the policy

Type: int

epsilon¶

the exploration probability of the epsilon-greedy method, i.e. the probability of taking a random action

Type: float

epsilon_decay¶

the decay parameter applied to epsilon after each iteration

Type: float

gamma¶

the discount factor of the temporal difference method

Type: float

learner¶

the learner used to evaluate the Q-value function

Type: RLLearner

use_memory_replay¶

if True, memory replay will be activated

Type: bool

memory_replay_period¶

the period of the memory replay method

Type: int

learning_type¶

one of the values in LEARNING_TYPES: “monte_carlo”, “q_learning” or “td_learning”

Type: str

associate_learner(self, learner)¶: associates a learner with the policy

step(self, state)¶

Step one iteration further.

Parameters

state (numpy array) – state the agent is in before taking an action

Returns

action taken by the policy

Return type

int

update_epsilon_greedy_parameter(self)¶: updates epsilon if it is higher than its minimum value
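
The exact decay rule is not stated here. As a minimal sketch, assuming a multiplicative decay with a lower bound, the update could look like the following. The function name `decay_epsilon` and the `epsilon_min` argument are hypothetical and not part of the library.

```python
def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.01):
    """Return the decayed epsilon, never going below epsilon_min.

    Assumption: multiplicative decay; the library may use a different rule.
    """
    if epsilon > epsilon_min:
        # Shrink epsilon, but clamp it at the floor.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


# Three iterations starting from full exploration.
epsilon = 1.0
for _ in range(3):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.5)
print(epsilon)
```

Once epsilon reaches the floor, further calls leave it unchanged, so the policy keeps a small amount of exploration.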

fit_learner(self, state, action, reward, new_state)¶

Fits the learner.

Parameters

state (numpy array) – state the learner is in
action (int) – action that has just been taken in that state
reward (float) – reward obtained by taking this action in the current state
new_state (numpy array) – state reached after taking this action
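
How the learner uses these arguments depends on `learning_type`. As an illustration only, the standard one-step Q-learning target combines `reward`, `gamma` and the Q-value estimates for `new_state`. The helper `q_learning_target` and its `next_q_values` argument are hypothetical names, and the library's learner may compute its targets differently.

```python
def q_learning_target(reward, next_q_values, gamma):
    """One-step Q-learning target: reward + gamma * max over a' of Q(new_state, a').

    next_q_values: Q-value estimates for every action in new_state (assumed input).
    """
    return reward + gamma * max(next_q_values)


# Reward of 1.0, best next-state value of 2.0, discount factor of 0.5.
target = q_learning_target(reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5)
print(target)
```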

abstract get_action(self, state, action=None)¶: gets the action for the current state; to be overridden by subclasses

class prevision_quantum_nn.applications.reinforcement_learning.policy.BehaviorPolicy(params)¶

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Behavior Policy.

Implements an exploratory policy with the epsilon-greedy method.

get_action(self, state, action=None)¶

Returns an action to be taken by the agent, according to the epsilon-greedy policy.

Parameters

state (numpy array) – state the agent is in
action (int) – action that has just been taken
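
As a minimal sketch of epsilon-greedy selection, assuming the Q-values of the current state are already available as a plain list: with probability epsilon a random action is drawn, otherwise the action with the highest Q-value is chosen. The function `epsilon_greedy_action` and its arguments are hypothetical and only illustrate the technique, not the library's implementation.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.7, 0.2]
# With epsilon=0.0 the random branch is never taken, so the choice is greedy.
print(epsilon_greedy_action(q_values, epsilon=0.0))
```

Passing a seeded `random.Random` instance as `rng` makes the exploratory draws reproducible.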

class prevision_quantum_nn.applications.reinforcement_learning.policy.TargetPolicy(params)¶

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Target Policy.

Implements a purely exploitative policy without random actions.

get_action(self, state, action=None)¶

Get action.

Parameters

state (numpy array) – state at which the action needs to be taken

Returns

the action taken at that state

Return type

int
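
A purely exploitative choice amounts to taking the action with the highest estimated Q-value. The sketch below assumes those estimates are given as a plain list; `greedy_action` is a hypothetical helper, not a library function.

```python
def greedy_action(q_values):
    """Return the index of the largest Q-value (the first one on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


print(greedy_action([0.1, 0.7, 0.2]))
```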
