`prevision_quantum_nn.applications.reinforcement_learning.policy`¶

Policy module

Contains the classes that handle policies in the reinforcement learning framework.

Module Contents¶

Classes¶

`Policy`	Policy, base class for defining policies.
`BehaviorPolicy`	Behavior Policy.
`TargetPolicy`	Target Policy.

prevision_quantum_nn.applications.reinforcement_learning.policy.LEARNING_TYPES = ['monte_carlo', 'q_learning', 'td_learning']¶

class prevision_quantum_nn.applications.reinforcement_learning.policy.Policy(params)¶

Policy, base class for defining policies.

params¶

parameters of the policy

Type:	dictionary

iteration¶

the current iteration of the policy

Type:	int

epsilon¶

the parameter of the epsilon-greedy method

Type:	float

epsilon_decay¶

the decay parameter applied to epsilon after each iteration

Type:	float

gamma¶

the parameter of the temporal difference method (conventionally the discount factor)

Type:	float

learner¶

the learner used to evaluate the Q-value function

Type:	RLLearner

use_memory_replay¶

if True, memory replay will be activated

Type:	bool

memory_replay_period¶

the period of the memory replay method

Type:	int

learning_type¶

one of LEARNING_TYPES: “monte_carlo”, “q_learning” or “td_learning”

Type:	str
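As an illustration of how a `learning_type` value relates to `LEARNING_TYPES`, here is a minimal sketch of a validation step. The helper `check_learning_type` is hypothetical and is not part of the documented module; whether and how the library validates this value is not stated on this page.

```python
# Mirrors the module-level constant documented above.
LEARNING_TYPES = ["monte_carlo", "q_learning", "td_learning"]


def check_learning_type(learning_type):
    """Hypothetical helper: return learning_type if supported, else raise."""
    if learning_type not in LEARNING_TYPES:
        raise ValueError(
            f"Unknown learning_type {learning_type!r}; "
            f"expected one of {LEARNING_TYPES}"
        )
    return learning_type


print(check_learning_type("q_learning"))
```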

associate_learner(self, learner)¶: associates a learner with the policy

step(self, state)¶

Step one iteration further.

Parameters:	state (numpy array) – state the agent is in before taking an action
Returns:	action taken by the policy
Return type:	int

update_epsilon_greedy_parameter(self)¶: updates epsilon if it is higher than its minimum value
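The exact update rule is not given on this page. A common choice, shown here only as a sketch, is a multiplicative decay with a lower bound. The function name `decay_epsilon`, the `epsilon_min` argument and its default value are assumptions made for illustration.

```python
def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.01):
    """Assumed rule: multiply epsilon by epsilon_decay, never going below epsilon_min."""
    if epsilon > epsilon_min:
        # Decay only while epsilon is still above the floor.
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


# Epsilon shrinks at each call until it reaches the floor.
epsilon = 1.0
for _ in range(3):
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.5)
print(epsilon)
```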

fit_learner(self, state, action, reward, new_state)¶

Fits the learner.

Parameters:	state (numpy array) – state the learner is in
	action (int) – action that has just been taken in that state
	reward (float) – reward obtained by taking this action in the current state
	new_state (numpy array) – state reached after taking the action
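For the “q_learning” case, the value a learner is typically fitted towards is the standard one-step target `reward + gamma * max Q(new_state, a)`. The sketch below shows that textbook formula only. The function `q_learning_target`, its `next_q_values` argument and the `done` flag are assumptions, and the library's own `fit_learner` may compute its target differently.

```python
def q_learning_target(reward, gamma, next_q_values, done=False):
    """Textbook one-step Q-learning target: r + gamma * max_a Q(s', a)."""
    if done:
        # No future reward after a terminal transition.
        return float(reward)
    return float(reward + gamma * max(next_q_values))


# Q-values of the new state for two possible actions (illustrative numbers).
print(q_learning_target(reward=1.0, gamma=0.5, next_q_values=[0.0, 2.0]))
```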

get_action(self, state, action=None)¶: gets the action for the current state; to be overridden by subclasses

class prevision_quantum_nn.applications.reinforcement_learning.policy.BehaviorPolicy(params)¶

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Behavior Policy.

Implements an exploratory policy with the epsilon-greedy method.

get_action(self, state, action=None)¶

Returns an action to be taken by the agent, chosen according to the epsilon-greedy policy.

Parameters:	state (numpy array) – state the agent is in
	action (int) – action that has just been taken
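The epsilon-greedy rule itself can be sketched independently of the library: with probability epsilon pick a random action, otherwise pick the action with the highest estimated Q-value. The function `epsilon_greedy_action` and its `q_values` argument are assumptions for illustration; in the library the Q-values would presumably come from the associated learner.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: uniform choice among all action indices.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.9, 0.3]
print(epsilon_greedy_action(q_values, epsilon=0.0))  # always the greedy action
print(epsilon_greedy_action(q_values, epsilon=1.0))  # always a random action
```

With `epsilon=0.0` the random branch is never taken, which matches the purely exploitative behavior described for `TargetPolicy`.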

class prevision_quantum_nn.applications.reinforcement_learning.policy.TargetPolicy(params)¶

Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy

Target Policy.

Implements a purely exploitative policy, without random actions.

get_action(self, state, action=None)¶

Get action.

Parameters:	state (numpy array) – state at which the action needs to be taken
Returns:	the action taken at that state
Return type:	int
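A purely exploitative choice reduces to an argmax over the estimated Q-values of the state. The sketch below assumes a hypothetical `q_function` callable standing in for the learner's Q-value estimate; neither `greedy_action` nor `q_function` is part of the documented API.

```python
def greedy_action(state, q_function):
    """Return the index of the highest Q-value; ties go to the lowest index."""
    q_values = q_function(state)
    return max(range(len(q_values)), key=lambda a: q_values[a])


def toy_q_function(state):
    # Hypothetical stand-in for a learner: fixed Q-values for three actions.
    return [0.1, 0.7, 0.7]


# The same state always yields the same action, since nothing is random.
print(greedy_action(state=[0.0, 1.0], q_function=toy_q_function))
```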
