Datasets ======== |backendall_full| .. contents:: :depth: 1 :local: Dataset declarations allow specifications to reference external data at scale, without having to replicate it manually in the specification. Basics ------ Datasets are declared as follows using the ``@dataset`` annotation: .. code-block:: agda @dataset myDataset : Tensor Real [28, 28] Datasets can be any type :code:`t` that can be constructed from the following grammar: .. code-block:: agda t ::= List r | Vector r n | Tensor s ns s ::= Index n | Nat | Int | Real r ::= t | s where :code:`n` is a known constant and :code:`ns` is a list of known constants. Once declared, datasets can be used as any other named ``List``, ``Vector`` or ``Tensor`` would be, e.g. .. code-block:: agda forall x in myDataset . robustAround x Supported formats ----------------- The actual implementation of the dataset is provided when using the specification during training or verification. At the moment only the ``IDX`` format is supported. A full description of this format can be found at the bottom of this `page `_. There are numerous libraries for converting datasets into this format: - Python - https://github.com/ivanyu/idx2numpy - C++ - https://github.com/JohnnyVM/idx-library - Haskell - https://hackage.haskell.org/package/mnist-idx If you would be interested in other formats being supported, please get in touch. Multiple, related datasets -------------------------- Notice that the type restrictions on datasets means that you can't currently have heterogeneously-typed data within a single dataset. However, a common use case for heterogenous datasets is to import a training dataset with both training examples and training labels. An explanation of how to achieve this with inferrable parameters is given in the parameters documentation.