Seeds

Seed configs declare existing data used as input during generation. A SeedConfig combines a seed source with optional row sampling and selection settings. Seed source objects declare where seed data comes from; the engine reads them through seed readers.

Use these objects with DataDesignerConfigBuilder.with_seed_dataset(). Related pages: Seed Datasets and seed readers.

Built-in seed sources include local files, Hugging Face paths, in-memory DataFrames, directories, file contents, and agent rollout traces. Plugin seed sources can extend the same discriminated union through the plugin system.

Seed Config

Classes:

Name	Description
`SeedConfig`	Configuration for sampling data from a seed dataset.

`SeedConfig`

Bases: ConfigBase

Configuration for sampling data from a seed dataset.

Attributes:

Name	Type	Description
`source`	`SeedSourceT`	A SeedSource that defines where the seed data is located.
`sampling_strategy`	`SamplingStrategy`	Strategy for how to sample rows from the dataset. `ORDERED` reads rows sequentially in their original order. `SHUFFLE` randomly shuffles rows before sampling; when used with `selection_strategy`, shuffling occurs within the selected range or partition.
`selection_strategy`	`IndexRange \| PartitionBlock \| None`	Optional strategy to select a subset of the dataset. `IndexRange` selects a specific range of indices (e.g., rows 100-200). `PartitionBlock` selects one partition after splitting the dataset into N equal parts; partition indices are zero-based (index=0 is the first partition, index=1 is the second, and so on).
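
To make the partition arithmetic concrete, the following is a minimal sketch of how a zero-based partition could map to an inclusive row range. It is not the library's implementation: `resolve_partition` is a hypothetical helper, and the choice to let the last partition absorb any remainder rows is an assumption, since this page does not say how uneven splits are handled.

```python
def resolve_partition(index: int, num_partitions: int, total_rows: int) -> tuple[int, int]:
    """Return the inclusive (start, end) row range for a zero-based partition.

    Hypothetical helper for illustration only. Assumes contiguous blocks of
    floor(total_rows / num_partitions) rows, with the last partition taking
    any leftover rows.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if not 0 <= index < num_partitions:
        raise ValueError("index must satisfy 0 <= index < num_partitions")
    if total_rows < num_partitions:
        raise ValueError("need at least one row per partition")

    size = total_rows // num_partitions
    start = index * size
    # Assumption: the final partition extends to the end of the dataset.
    end = total_rows - 1 if index == num_partitions - 1 else start + size - 1
    return start, end
```

Under these assumptions, partition 2 of 5 over a 1,000-row dataset covers rows 400 through 599, which is the 20% slice described for `PartitionBlock(index=2, num_partitions=5)`.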

Examples:

Read rows sequentially from start to end:

    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.ORDERED
    )

Read rows in random order:

    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.SHUFFLE
    )

Read specific index range (rows 100-199):

    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.ORDERED,
        selection_strategy=IndexRange(start=100, end=199)
    )

Read random rows from a specific index range (shuffles within rows 100-199):

    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.SHUFFLE,
        selection_strategy=IndexRange(start=100, end=199)
    )

Read from partition 2 (3rd partition, zero-based) of 5 partitions (20% of dataset):

    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.ORDERED,
        selection_strategy=PartitionBlock(index=2, num_partitions=5)
    )

Read shuffled rows from partition 0 of 10 partitions (shuffles within the partition):

    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.SHUFFLE,
        selection_strategy=PartitionBlock(index=0, num_partitions=10)
    )
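
The interaction between sampling and selection in the examples above can be sketched in plain Python. This is an illustration of the documented behavior (selection first, then ordering or shuffling within the selected rows), not the engine's code. `sample_indices` and its `seed` parameter are hypothetical, and the inclusive `end` bound is inferred from the example that describes `IndexRange(start=100, end=199)` as rows 100-199.

```python
import random
from typing import Optional


def sample_indices(start: int, end: int, shuffle: bool, seed: Optional[int] = None) -> list[int]:
    """Return row indices for an inclusive [start, end] selection.

    Hypothetical helper: with shuffle=False the rows keep their original
    order (ORDERED); with shuffle=True they are permuted, but only within
    the selected range (SHUFFLE).
    """
    if start > end:
        raise ValueError("start must not exceed end")
    indices = list(range(start, end + 1))
    if shuffle:
        # A dedicated Random instance keeps the permutation reproducible
        # for a given seed without touching global random state.
        random.Random(seed).shuffle(indices)
    return indices


ordered = sample_indices(100, 199, shuffle=False)
shuffled = sample_indices(100, 199, shuffle=True, seed=7)
```

Both calls return the same 100 row indices; only the order differs, and no index outside 100-199 ever appears in the shuffled result.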

Built-In Seed Sources

Classes:

Name	Description
`FileSystemSeedSource`	Base class for seed sources backed by a directory of files.
`SeedSource`	Base class for seed dataset configurations.

`FileSystemSeedSource`

Bases: SeedSource, ABC

Base class for seed sources backed by a directory of files.

Use this base class when a seed reader needs to enumerate files under a directory on disk and turn each file (or each group of files) into seed rows. Concrete plugin configs declare a Literal seed_type and pair with a FileSystemSeedReader implementation.

Attributes:

Name	Type	Description
`path`	`str`	Directory containing seed artifacts. Relative paths are resolved from the current working directory when the config is loaded, not from the config file location.
`file_pattern`	`str`	Case-sensitive filename pattern used to match files under the provided directory. Patterns match basenames only, not relative paths. Defaults to `'*'`.
`recursive`	`bool`	Whether to search nested subdirectories under the provided directory for matching files. Defaults to `True`.
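
The matching rules described by `file_pattern` and `recursive` can be approximated with the standard library. The sketch below is an assumption about how such matching might be done, not the code a FileSystemSeedReader runs: `match_seed_files` is a hypothetical function, and it uses `fnmatch.fnmatchcase` to get case-sensitive matching against basenames only.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def match_seed_files(path: str, file_pattern: str = "*", recursive: bool = True) -> list[Path]:
    """List files under `path` whose basename matches `file_pattern`.

    Hypothetical helper mirroring the documented attributes:
    - the pattern is compared to the basename only, never the relative path
    - matching is case-sensitive on every platform (fnmatchcase)
    - nested subdirectories are searched only when recursive=True
    """
    # A relative `path` resolves against the current working directory.
    root = Path(path)
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        p for p in candidates
        if p.is_file() and fnmatchcase(p.name, file_pattern)
    )
```

With this sketch, a pattern such as `*.json` would match `a.json` and `nested/c.json` but not `B.JSON`, and a pattern containing a directory component would match nothing because only basenames are compared.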

`SeedSource`

Bases: BaseModel, ABC

Base class for seed dataset configurations.

All subclasses must define a seed_type field with a Literal value. This field serves as the discriminator for the seed source discriminated union.

Attributes:

Name	Type	Description
`seed_type`	`str`	Discriminator field that identifies the specific seed source type. Subclasses must override this field with a `Literal` value.
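
The role of `seed_type` as a discriminator can be illustrated without the library. The sketch below uses standard-library dataclasses and a lookup table rather than Pydantic, so it is a stand-in for the idea and not how SeedSource is implemented. The class names, the `"local"` and `"directory"` values, and `parse_seed_source` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LocalFileSource:
    """Hypothetical source; seed_type is fixed per class."""
    path: str
    seed_type: str = "local"


@dataclass
class DirectorySource:
    """Hypothetical directory-backed source."""
    path: str
    file_pattern: str = "*"
    seed_type: str = "directory"


# Maps each discriminator value to the class that should parse the payload.
REGISTRY = {"local": LocalFileSource, "directory": DirectorySource}


def parse_seed_source(data: dict[str, Any]) -> Any:
    """Pick the concrete class from data["seed_type"] and build it."""
    kind = data.get("seed_type")
    try:
        cls = REGISTRY[kind]
    except KeyError:
        raise ValueError(f"unknown seed_type: {kind!r}") from None
    return cls(**data)


source = parse_seed_source({"seed_type": "directory", "path": "traces"})
```

Because every variant carries a distinct `seed_type`, a serialized config can be mapped back to exactly one class. In this sketch a plugin would take part by adding an entry to `REGISTRY`; the library's plugin system is described as extending the discriminated union instead.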
