DQN Replay Dataset

Submitted: Jan 27, 2021
By: Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
Resource Type:
Data
License:
CC BY-SA 4.0
Language:
not code
Data Format:
Images, Other

Description

The DQN Replay Dataset was generated by training DQN agents on 60 Atari 2600 games for 200 million frames each, with sticky actions enabled (a 25% probability that the agent's previous action is executed instead of the current one) to make the problem more challenging. For each of the 60 games, we trained 5 DQN agents with different random initializations and stored every (state, action, reward, next state) tuple encountered during training, yielding 5 replay datasets per game and 300 datasets in total. The DQN Replay Dataset can be used to train offline RL agents without any interaction with the environment during training. Each game's replay dataset is approximately 3.5 times larger than ImageNet and contains samples from all of the intermediate policies encountered during the optimization of the online DQN agent.