torchrecurrent.benchmarks.copy_memory#

torchrecurrent.benchmarks.copy_memory(seq_len, n_samples, num_classes=10, **kwargs)[source]#

Generate data for the copy memory benchmark.

The copy memory task is a synthetic sequence learning problem where a model must memorize and reproduce an input sequence after a long delay. Each sample consists of:

  • A random sequence of integers (the content to be memorized).

  • A delimiter symbol marking the end of the input.

  • A sequence of zeros acting as distractors.

  • A corresponding target sequence, which requires the model to output padding up to and including the delimiter position, then reproduce the original random sequence.
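
The sample layout described above can be sketched as follows. This is a minimal illustration, not torchrecurrent's actual implementation; it assumes the delimiter token takes the value num_classes, as stated in the parameter description below.

```python
import torch

def make_copy_memory_sample(seq_len: int, num_classes: int = 10):
    """Build one (input_seq, target_seq) pair for the copy memory task."""
    # Random content to memorize: integers in [0, num_classes).
    content = torch.randint(0, num_classes, (seq_len,))
    delimiter = torch.tensor([num_classes])            # delimiter token = num_classes
    blanks = torch.zeros(seq_len, dtype=torch.long)    # distractor zeros

    # Input: content, then the delimiter, then distractors.
    input_seq = torch.cat([content, delimiter, blanks])
    # Target: padding until the delimiter, the delimiter itself,
    # then the original content the model must reproduce.
    target_seq = torch.cat([blanks, delimiter, content])
    return input_seq, target_seq

x, y = make_copy_memory_sample(5)
print(x.shape, y.shape)  # both torch.Size([11]), i.e. 2 * seq_len + 1
```

Both sequences have length 2 * seq_len + 1, matching the shapes documented in the Returns section.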

Parameters:
  • seq_len (int) – Length of the random sequence to memorize.

  • n_samples (int) – Number of samples to generate.

  • num_classes (int, optional) – Number of distinct classes used for the random sequence. Defaults to 10. The delimiter token uses the value num_classes.

  • **kwargs – Additional keyword arguments passed to torch.utils.data.DataLoader (e.g. batch_size, shuffle).

Returns:

A DataLoader yielding batches of (input_seq, target_seq). The underlying dataset tensors have the following layout (each yielded batch has batch_size rows of the same width):

  • input_seq has shape (n_samples, 2 * seq_len + 1) and contains the random sequence, followed by the delimiter token, followed by distractor zeros.

  • target_seq has shape (n_samples, 2 * seq_len + 1) and contains padding followed by the delimiter token, then the original random sequence.

Return type:

torch.utils.data.DataLoader
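
The documented return value can be mimicked with a hypothetical stand-in: build the full (n_samples, 2 * seq_len + 1) tensors and wrap them in a torch.utils.data.DataLoader. This sketch only illustrates the documented shapes and keyword forwarding (e.g. batch_size, shuffle); it is not torchrecurrent's implementation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

seq_len, n_samples, num_classes = 5, 32, 10

# Assemble the dataset-level tensors described in the Returns section.
content = torch.randint(0, num_classes, (n_samples, seq_len))
delim = torch.full((n_samples, 1), num_classes)                   # delimiter column
blanks = torch.zeros(n_samples, seq_len, dtype=torch.long)        # distractor zeros

inputs = torch.cat([content, delim, blanks], dim=1)               # (32, 11)
targets = torch.cat([blanks, delim, content], dim=1)              # (32, 11)

# **kwargs such as batch_size and shuffle are forwarded to the DataLoader.
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # each batch: torch.Size([8, 11])
```

A model trained on such a loader sees the content only before the delimiter in input_seq, yet must emit it after the delimiter position in target_seq, which is what makes the task a long-range memory benchmark.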