torchrecurrent.benchmarks.copy_memory#

torchrecurrent.benchmarks.copy_memory(seq_len, n_samples, num_classes=10, **kwargs)[source]#

Generate data for the copy memory benchmark.

The copy memory task is a synthetic sequence learning problem where a model must memorize and reproduce an input sequence after a long delay. Each sample consists of:

  • A random sequence of integers (the content to be memorized).

  • A delimiter symbol marking the end of the input.

  • A sequence of zeros acting as distractors.

  • A corresponding target sequence, which requires the model to output padding up to and including the delimiter position, then reproduce the original random sequence.
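
The sample layout described above can be sketched as follows. This is a minimal illustration, not torchrecurrent's actual implementation; it assumes the delimiter token takes the value num_classes, as stated in the parameter description below.

```python
import torch

def make_copy_memory_sample(seq_len: int, num_classes: int = 10):
    """Build one (input_seq, target_seq) pair for the copy memory task."""
    # Random content to memorize: integers in [0, num_classes).
    content = torch.randint(0, num_classes, (seq_len,))
    delimiter = torch.tensor([num_classes])            # delimiter token = num_classes
    blanks = torch.zeros(seq_len, dtype=torch.long)    # distractor zeros

    # Input: content, then the delimiter, then distractors.
    input_seq = torch.cat([content, delimiter, blanks])
    # Target: padding until the delimiter, the delimiter itself,
    # then the original content the model must reproduce.
    target_seq = torch.cat([blanks, delimiter, content])
    return input_seq, target_seq

x, y = make_copy_memory_sample(5)
print(x.shape, y.shape)  # both torch.Size([11]), i.e. 2 * seq_len + 1
```

Both sequences have length 2 * seq_len + 1, matching the shapes documented in the Returns section.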

Parameters:
  • seq_len (int) – Length of the random sequence to memorize.

  • n_samples (int) – Number of samples to generate.

  • num_classes (int, optional) – Number of distinct classes used for the random sequence. Defaults to 10. The delimiter token uses the value num_classes.

  • **kwargs – Additional keyword arguments passed to torch.utils.data.DataLoader (e.g. batch_size, shuffle).

Returns:

A DataLoader yielding batches of (input_seq, target_seq). The underlying dataset tensors have the following layout (each yielded batch has batch_size rows of the same width):

  • input_seq has shape (n_samples, 2 * seq_len + 1) and contains the random sequence, followed by the delimiter token, followed by distractor zeros.

  • target_seq has shape (n_samples, 2 * seq_len + 1) and contains padding followed by the delimiter token, then the original random sequence.

Return type:

torch.utils.data.DataLoader
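
The documented return value can be mimicked with a hypothetical stand-in: build the full (n_samples, 2 * seq_len + 1) tensors and wrap them in a torch.utils.data.DataLoader. This sketch only illustrates the documented shapes and keyword forwarding (e.g. batch_size, shuffle); it is not torchrecurrent's implementation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

seq_len, n_samples, num_classes = 5, 32, 10

# Assemble the dataset-level tensors described in the Returns section.
content = torch.randint(0, num_classes, (n_samples, seq_len))
delim = torch.full((n_samples, 1), num_classes)                   # delimiter column
blanks = torch.zeros(n_samples, seq_len, dtype=torch.long)        # distractor zeros

inputs = torch.cat([content, delim, blanks], dim=1)               # (32, 11)
targets = torch.cat([blanks, delim, content], dim=1)              # (32, 11)

# **kwargs such as batch_size and shuffle are forwarded to the DataLoader.
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # each batch: torch.Size([8, 11])
```

A model trained on such a loader sees the content only before the delimiter in input_seq, yet must emit it after the delimiter position in target_seq, which is what makes the task a long-range memory benchmark.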