Deep Deterministic Policy Gradient

Deep Deterministic Policy Gradient (DDPG).

class jax_agents.algorithms.ddpg.DDPG(config: jax_agents.algorithms.ddpg.DDPGConfig)

Bases: object

classmethod load(config, state_path): Return a DDPG instance initialized with the pickled algo_state.

select_action(state, algo_func, algo_state): Output the on-policy action for the given state.

train_step(data_batch, algo_func, algo_state): Update all functions (the policy and the Q-function) using one batch of data.
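As an illustration of the kind of update train_step refers to, here is a minimal sketch of the two DDPG losses in JAX. It is not the library's implementation: pi_apply, q_apply, q_loss, pi_loss and sgd_train_step are hypothetical names, the networks are toy linear maps, plain gradient descent replaces the Adam optimizers mentioned in this page, and the bootstrap target uses the current networks because DDPGState documents no separate target parameters (an assumption about the library, not a confirmed fact).

```python
import jax
import jax.numpy as jnp


def pi_apply(pi_params, state):
    # Toy linear policy squashed to [-1, 1]; stands in for pi_net.
    return jnp.tanh(state @ pi_params)


def q_apply(q_params, state, action):
    # Toy linear critic on the concatenated (state, action) vector.
    return jnp.concatenate([state, action], axis=-1) @ q_params


def q_loss(q_params, pi_params, batch, gamma):
    s, a, r, s_next, done = batch
    # Bootstrapped target r + gamma * Q(s', pi(s')), held constant.
    next_q = q_apply(q_params, s_next, pi_apply(pi_params, s_next))
    target = jax.lax.stop_gradient(r + gamma * (1.0 - done) * next_q)
    return jnp.mean((q_apply(q_params, s, a) - target) ** 2)


def pi_loss(pi_params, q_params, batch):
    s = batch[0]
    # Deterministic policy gradient: maximize Q(s, pi(s)).
    return -jnp.mean(q_apply(q_params, s, pi_apply(pi_params, s)))


def sgd_train_step(pi_params, q_params, batch, gamma=0.99, lr=1e-2):
    # One plain gradient-descent step on each loss (the library uses Adam).
    q_grad = jax.grad(q_loss)(q_params, pi_params, batch, gamma)
    pi_grad = jax.grad(pi_loss)(pi_params, q_params, batch)
    return pi_params - lr * pi_grad, q_params - lr * q_grad


# Illustrative shapes: state_dim=3, action_dim=1, batch of 8 transitions.
k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)
pi_params = 0.1 * jax.random.normal(k1, (3, 1))
q_params = 0.1 * jax.random.normal(k2, (4,))
batch = (
    jax.random.normal(k3, (8, 3)),                            # states
    jax.random.uniform(k4, (8, 1), minval=-1.0, maxval=1.0),  # actions
    jnp.ones(8),                                              # rewards
    jax.random.normal(k5, (8, 3)),                            # next states
    jnp.zeros(8),                                             # done flags
)
new_pi, new_q = sgd_train_step(pi_params, q_params, batch)
```

The critic step minimizes the squared temporal-difference error, while the policy step ascends the critic's value of the policy's own action. That is the sense in which both functions are updated from the same batch.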

class jax_agents.algorithms.ddpg.DDPGConfig

Bases: tuple

Config to initialize DDPG.

Parameters:

state_dim – the dimension of the state vector.
action_dim – the dimension of the action vector.
pi_net_size – a list of int corresponding to the hidden sizes of the policy network.
q_net_size – a list of int corresponding to the hidden sizes of the q network.
learning_rate – the learning rate of the adam optimizer.
gamma – the discount factor of the algorithm.
seed – the random seed for initialization of the networks.
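Because DDPGConfig derives from tuple, it presumably behaves like a named tuple. The sketch below mirrors the documented fields in a hypothetical stand-in class, DDPGConfigSketch, rather than importing the library. The field order, the types, and every value shown are assumptions for illustration, not library defaults.

```python
from typing import List, NamedTuple


class DDPGConfigSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented DDPGConfig fields."""

    state_dim: int
    action_dim: int
    pi_net_size: List[int]
    q_net_size: List[int]
    learning_rate: float
    gamma: float
    seed: int


# Illustrative values only; none of these come from the library.
config = DDPGConfigSketch(
    state_dim=3,
    action_dim=1,
    pi_net_size=[64, 64],
    q_net_size=[64, 64],
    learning_rate=1e-3,
    gamma=0.99,
    seed=0,
)

# Named tuples are immutable; _replace returns a modified copy.
tuned = config._replace(learning_rate=3e-4)
```

An immutable config of this shape can be passed around freely without the risk of one component silently changing a hyperparameter that another relies on.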

class jax_agents.algorithms.ddpg.DDPGFunc

Bases: tuple

Config to initialize the DDPG functions.

Parameters:

pi_net – policy neural network.
q_net – q function neural network.
pi_optimizer – policy optimizer (adam).
q_optimizer – q function optimizer (adam).
gamma – discount factor.
state_dim – dimension of the state vector.
action_dim – dimension of the action vector.

class jax_agents.algorithms.ddpg.DDPGState

Bases: tuple

State of the DDPG networks.

Parameters:

pi_params – policy neural network parameters.
q_params – q function neural network parameters.
pi_opt_state – state of the policy optimizer (adam).
q_opt_state – state of the q function optimizer (adam).
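Since DDPG.load restores a pickled algo_state, it may help to see how a tuple-based state like this one can be round-tripped through pickle. The sketch below uses a hypothetical stand-in class, DDPGStateSketch, with placeholder contents. How the library itself writes its state file is not documented here, so the serialization shown is an assumption.

```python
import pickle
from typing import Any, NamedTuple

import jax
import jax.numpy as jnp


class DDPGStateSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented DDPGState fields."""

    pi_params: Any
    q_params: Any
    pi_opt_state: Any
    q_opt_state: Any


# Placeholder contents; real parameters and optimizer states will differ.
state = DDPGStateSketch(
    pi_params={"w": jnp.ones((3, 1))},
    q_params={"w": jnp.zeros((4,))},
    pi_opt_state={"step": 0},
    q_opt_state={"step": 0},
)

# Named tuples are JAX pytrees, so device_get copies every array leaf to host.
host_state = jax.device_get(state)

# Pickle the plain tuple of fields, then rebuild the named tuple on load.
blob = pickle.dumps(tuple(host_state))
restored = DDPGStateSketch(*pickle.loads(blob))
```

Pickling the plain tuple of fields rather than the class instance keeps the saved bytes independent of where the state class is defined.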