prevision_quantum_nn.applications.reinforcement_learning.policy
Policy module
Contains the classes used to handle policies in the reinforcement learning framework.
Module Contents
Classes
Policy | Base class for defining policies.
BehaviorPolicy | Behavior policy.
TargetPolicy | Target policy.
-
prevision_quantum_nn.applications.reinforcement_learning.policy.LEARNING_TYPES = ['monte_carlo', 'q_learning', 'td_learning']
-
class prevision_quantum_nn.applications.reinforcement_learning.policy.Policy(params)
Base class for defining policies.
-
params
Parameters of the policy.
Type: dictionary
-
iteration
Current iteration of the policy.
Type: int
-
epsilon
Exploration parameter of the epsilon-greedy method: the probability of taking a random action.
Type: float
-
epsilon_decay
Decay parameter used to reduce epsilon after each iteration.
Type: float
-
gamma
Discount factor used by the temporal difference method.
Type: float
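In reinforcement learning, gamma weights a reward received k steps in the future by gamma**k. The following is a minimal, library-independent sketch (the function name discounted_returns is hypothetical, not part of prevision_quantum_nn) that computes the discounted return of every step of one episode, which is the quantity standard Monte Carlo control fits towards.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Hypothetical helper: G_t = r_t + gamma * G_{t+1} for each step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

Computing the returns backwards keeps the cost linear in the episode length.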
-
learner
Learner used to estimate the Q-value function.
Type: RLLearner
-
use_memory_replay
If True, memory replay is enabled.
Type: bool
-
memory_replay_period
Period of the memory replay method.
Type: int
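Memory replay is commonly implemented as a bounded buffer of (state, action, reward, new_state) transitions from which random batches are fitted again. The sketch below assumes that memory_replay_period counts stored transitions between two replays; the ReplayMemory class and its capacity and batch_size arguments are hypothetical and not part of prevision_quantum_nn.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, new_state); states are tuples here for simplicity.
Transition = Tuple[tuple, int, float, tuple]


class ReplayMemory:
    """Hypothetical replay buffer, not part of prevision_quantum_nn."""

    def __init__(self, capacity: int, period: int, batch_size: int, seed: int = 0):
        # deque(maxlen=...) drops the oldest transition once full.
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.period = period
        self.batch_size = batch_size
        self.count = 0
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> List[Transition]:
        """Store a transition; every `period` pushes, return a random batch."""
        self.buffer.append(transition)
        self.count += 1
        if self.count % self.period != 0:
            return []  # not a replay step
        k = min(self.batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


memory = ReplayMemory(capacity=100, period=4, batch_size=2)
batch_sizes = [len(memory.push(((i,), 0, 1.0, (i + 1,)))) for i in range(8)]
print(batch_sizes)  # [0, 0, 0, 2, 0, 0, 0, 2]
```

Sampling at random from past transitions breaks the correlation between consecutive steps, which is the usual motivation for replay.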
-
learning_type
One of LEARNING_TYPES: “monte_carlo”, “q_learning” or “td_learning”.
Type: str
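The three learning types differ in the target the Q-value of (state, action) is fitted towards. A sketch under stated assumptions: "q_learning" bootstraps on the best next action, "td_learning" is assumed here to mean SARSA-style TD(0), which bootstraps on the next action actually taken, and "monte_carlo" uses the full discounted return of the episode. The helper td_target and its arguments are hypothetical.

```python
from typing import Optional, Sequence

LEARNING_TYPES = ["monte_carlo", "q_learning", "td_learning"]


def td_target(learning_type: str, reward: float, gamma: float,
              next_q_values: Sequence[float], next_action: Optional[int] = None,
              done: bool = False, episode_return: Optional[float] = None) -> float:
    """Hypothetical helper: value the Q-estimate is fitted towards."""
    if learning_type not in LEARNING_TYPES:
        raise ValueError(f"unknown learning_type: {learning_type!r}")
    if learning_type == "monte_carlo":
        # Monte Carlo: no bootstrapping, use the observed discounted return.
        if episode_return is None:
            raise ValueError("monte_carlo needs episode_return")
        return episode_return
    if done:
        # Terminal transition: nothing to bootstrap from.
        return reward
    if learning_type == "q_learning":
        # Off-policy: bootstrap on the greedy (best) next action.
        return reward + gamma * max(next_q_values)
    # "td_learning", assumed SARSA-like: bootstrap on the next action taken.
    if next_action is None:
        raise ValueError("td_learning needs next_action")
    return reward + gamma * next_q_values[next_action]


print(td_target("q_learning", 1.0, 0.5, [0.5, 2.0]))  # 2.0
```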
-
associate_learner(self, learner)
Associates a learner with the policy.
-
step(self, state)
Steps one iteration further.
Parameters: state (numpy array) – state of the agent before taking an action
Returns: action taken by the policy
Return type: int
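One plausible shape for step, sketched with a hypothetical SketchPolicy class rather than the library's implementation: it delegates the choice to get_action, which subclasses override, then advances the iteration counter and updates epsilon. Whether the real step also updates epsilon is an assumption, and the update used here is a simplified multiplicative decay with no minimum.

```python
from typing import Optional, Sequence


class SketchPolicy:
    """Hypothetical stand-in for Policy, illustrating step() only."""

    def __init__(self, epsilon: float = 1.0, epsilon_decay: float = 0.9):
        self.iteration = 0
        self.epsilon = epsilon
        self.epsilon_decay = epsilon_decay

    def get_action(self, state: Sequence[float], action: Optional[int] = None) -> int:
        # To be overridden by subclasses (behavior / target policies).
        raise NotImplementedError

    def update_epsilon_greedy_parameter(self) -> None:
        # Simplified: plain multiplicative decay, no minimum value.
        self.epsilon *= self.epsilon_decay

    def step(self, state: Sequence[float]) -> int:
        action = self.get_action(state)  # choose the action
        self.iteration += 1              # one iteration further
        self.update_epsilon_greedy_parameter()
        return action


class FixedActionPolicy(SketchPolicy):
    """Toy subclass that always picks action 1."""

    def get_action(self, state, action=None):
        return 1


policy = FixedActionPolicy()
print(policy.step([0.0, 0.0]), policy.iteration, policy.epsilon)  # 1 1 0.9
```

Keeping the action choice in an overridable get_action lets the behavior and target policies share the same step logic.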
-
update_epsilon_greedy_parameter(self)
Decays epsilon if it is still higher than its minimum value.
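A minimal sketch of "decay epsilon while it is above a minimum", assuming a multiplicative decay by epsilon_decay and a floor called epsilon_min. Both the decay rule and the name epsilon_min are assumptions; the description above only states that a minimum exists.

```python
def decay_epsilon(epsilon: float, epsilon_decay: float, epsilon_min: float) -> float:
    """Hypothetical helper: multiply by epsilon_decay, never going below epsilon_min."""
    if epsilon > epsilon_min:
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return epsilon


eps = 1.0
history = []
for _ in range(5):
    eps = decay_epsilon(eps, epsilon_decay=0.5, epsilon_min=0.1)
    history.append(eps)
print(history)  # [0.5, 0.25, 0.125, 0.1, 0.1]
```

Clamping at the floor keeps a small amount of exploration for the rest of training.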
-
fit_learner(self, state, action, reward, new_state)
Fits the learner.
Parameters: - state (numpy array) – state in which the action was taken
- action (int) – action that has just been taken in that state
- reward (float) – reward obtained by taking this action in the current state
- new_state (numpy array) – state reached after taking this action
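To make the four arguments concrete, here is a sketch of a Q-learning style fit against a hypothetical tabular learner standing in for RLLearner. TabularLearner, predict, fit and learning_rate are illustrative names, not the RLLearner API. The target bootstraps on the best Q-value in new_state, and the stored estimate for (state, action) moves a fraction learning_rate of the way towards it.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


class TabularLearner:
    """Hypothetical stand-in for RLLearner: one Q-value list per state."""

    def __init__(self, n_actions: int, learning_rate: float):
        # Unseen states start with all Q-values at 0.
        self.q: DefaultDict[Tuple[float, ...], List[float]] = defaultdict(
            lambda: [0.0] * n_actions)
        self.learning_rate = learning_rate

    def predict(self, state: Sequence[float]) -> List[float]:
        return self.q[tuple(state)]

    def fit(self, state: Sequence[float], action: int, target: float) -> None:
        q_values = self.q[tuple(state)]
        # Move the estimate a fraction learning_rate of the way to the target.
        q_values[action] += self.learning_rate * (target - q_values[action])


def fit_learner(learner: TabularLearner, gamma: float, state: Sequence[float],
                action: int, reward: float, new_state: Sequence[float]) -> float:
    """Q-learning style fit: bootstrap on the best Q-value in new_state."""
    target = reward + gamma * max(learner.predict(new_state))
    learner.fit(state, action, target)
    return target


learner = TabularLearner(n_actions=2, learning_rate=0.5)
learner.q[(1.0,)] = [0.0, 4.0]  # pretend new_state already has estimates
print(fit_learner(learner, 0.5, (0.0,), 1, 1.0, (1.0,)))  # 3.0
print(learner.predict((0.0,)))  # [0.0, 1.5]
```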
-
get_action(self, state, action=None)
Returns the action for the current state. To be overridden by subclasses.
-
-
class prevision_quantum_nn.applications.reinforcement_learning.policy.BehaviorPolicy(params)
Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy
Behavior policy.
Implements an exploratory policy with the epsilon-greedy method.
-
get_action(self, state, action=None)
Returns the action to be taken by the agent, according to the epsilon-greedy policy.
Parameters: - state (numpy array) – state at which the agent is
- action (int) – action that has just been taken
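The epsilon-greedy rule can be sketched as follows, assuming the Q-values of the current state are available as a sequence (for example from the associated learner). The function name epsilon_greedy_action is hypothetical. With probability epsilon a uniformly random action is explored; otherwise the action with the highest Q-value is exploited.

```python
import random
from typing import Sequence


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the highest Q-value."""
    if rng.random() < epsilon:
        # Exploration: uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploitation: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(42)
q = [0.1, 0.9, 0.3]
actions = [epsilon_greedy_action(q, epsilon=0.2, rng=rng) for _ in range(1000)]
# Fraction of picks of action 1; its expected value is 0.8 + 0.2 / 3.
print(actions.count(1) / len(actions))
```

Note that a random draw can also land on the greedy action, so action 1 is picked slightly more often than 1 - epsilon.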
-
-
class prevision_quantum_nn.applications.reinforcement_learning.policy.TargetPolicy(params)
Bases: prevision_quantum_nn.applications.reinforcement_learning.policy.Policy
Target policy.
Implements a purely exploitative policy, without random actions.
-
get_action(self, state, action=None)
Returns the action for the given state.
Parameters: state (numpy array) – state at which the action needs to be taken
Returns: the action taken at that state
Return type: int
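A purely exploitative policy reduces to an argmax over the Q-values of the current state. A minimal sketch, assuming the Q-values are given as a sequence; greedy_action is a hypothetical name, and breaking ties in favour of the lowest action index is a choice of this sketch.

```python
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Purely exploitative choice: index of the highest Q-value."""
    # max() returns the first maximal index, so ties go to the lowest action.
    return max(range(len(q_values)), key=lambda a: q_values[a])


print(greedy_action([1.0, 3.0, 3.0, 2.0]))  # 1
```

Because no randomness is involved, the same Q-values always yield the same action.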
-