Seed Readers
Seed readers are engine-side adapters that turn a configured seed source into tabular seed rows. The engine attaches a SeedSource and secret resolver, asks the reader for column names and dataset size, then streams batches into generation.
Related pages: seeds, Seed Datasets, and Build Your Own.
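The call order described above can be sketched with a toy in-memory reader. This is a minimal illustration, not the engine's code: InMemoryReader, get_dataset_size, iter_batches, and drive are hypothetical names chosen for the example, and only attach and get_column_names appear in the reference below.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


class InMemoryReader:
    """Hypothetical reader; method names other than attach and
    get_column_names are illustrative assumptions."""

    def attach(self, source: List[Row], secret_resolver: Any) -> None:
        # The engine hands over the source and resolver after construction.
        self._rows = source
        self._resolver = secret_resolver

    def get_column_names(self) -> List[str]:
        return list(self._rows[0]) if self._rows else []

    def get_dataset_size(self) -> int:
        return len(self._rows)

    def iter_batches(self, batch_size: int) -> Iterator[List[Row]]:
        # Yield consecutive slices; the last batch may be shorter.
        for start in range(0, len(self._rows), batch_size):
            yield self._rows[start : start + batch_size]


def drive(reader: InMemoryReader, source: List[Row], batch_size: int):
    """Toy stand-in for the engine: attach, inspect, then stream."""
    reader.attach(source, secret_resolver=None)
    columns = reader.get_column_names()
    size = reader.get_dataset_size()
    batches = list(reader.iter_batches(batch_size))
    return columns, size, batches


rows = [{"id": i, "text": f"row {i}"} for i in range(5)]
columns, size, batches = drive(InMemoryReader(), rows, batch_size=2)
print(columns, size, [len(b) for b in batches])  # ['id', 'text'] 5 [2, 2, 1]
```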
Core Contracts
SeedReader
Bases: ABC, Generic[SourceT]
Base class for reading a seed dataset.
Seeds are read using duckdb. Reader implementations define duckdb connection setup details
and how to get a URI that can be queried with duckdb (i.e. "... FROM
The Data Designer engine automatically supplies the appropriate SeedSource
and a SecretResolver to use for any secret fields in the config via
attach(...). Subclasses that need per-attachment setup can override
on_attach(...) without needing to call super().
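The attach(...) and on_attach(...) relationship resembles a template-method hook. The sketch below shows one way such a hook could be wired, assuming the base on_attach is a no-op; ToySeedReader and TokenSeedReader are hypothetical stand-ins, and the real SeedReader may differ in signatures and stored state.

```python
from abc import ABC
from typing import Callable, Generic, Optional, TypeVar

SourceT = TypeVar("SourceT")


class ToySeedReader(ABC, Generic[SourceT]):
    """Hypothetical stand-in for SeedReader; not the real class."""

    def __init__(self) -> None:
        self.source: Optional[SourceT] = None
        self.secret_resolver: Optional[Callable[[str], str]] = None

    def attach(self, source: SourceT, secret_resolver: Callable[[str], str]) -> None:
        # Store the engine-supplied objects, then give subclasses their hook.
        self.source = source
        self.secret_resolver = secret_resolver
        self.on_attach()

    def on_attach(self) -> None:
        # Deliberately a no-op, so overrides never need to call super().
        pass


class TokenSeedReader(ToySeedReader[dict]):
    """Hypothetical reader that resolves a secret once per attachment."""

    def on_attach(self) -> None:
        assert self.source is not None and self.secret_resolver is not None
        self.token = self.secret_resolver(self.source["token_ref"])


reader = TokenSeedReader()
reader.attach({"token_ref": "HF_TOKEN"}, lambda name: f"resolved:{name}")
print(reader.token)  # resolved:HF_TOKEN
```

Because the base hook does nothing, a subclass can replace it outright, and the constructor stays free of source and secret arguments.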
Methods:
| Name | Description |
|---|---|
| attach | Attach a source and secret resolver to the instance. |
| create_filesystem_context | Create a rooted filesystem context for directory-backed seed readers. |
| get_column_names | Returns the seed dataset's column names. |
| get_seed_type | Return the seed_type of the source class this reader is generic over. |
| on_attach | Hook for subclasses that need per-attachment setup. |
attach(source, secret_resolver)
Attach a source and secret resolver to the instance.
This is called internally by the engine so that these objects do not need to be provided in the reader's constructor.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
create_filesystem_context(root_path)
Create a rooted filesystem context for directory-backed seed readers.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
get_column_names()
Returns the seed dataset's column names.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
get_seed_type()
Return the seed_type of the source class this reader is generic over.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
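One way a reader can discover the source class it is generic over is to inspect its parameterized base classes at runtime. The sketch below uses the standard typing helpers get_origin and get_args; ToySource, ToyReader, and MyReader are hypothetical, and treating get_seed_type as a classmethod is an assumption. The actual implementation in seed_reader.py may work differently.

```python
from typing import ClassVar, Generic, TypeVar, get_args, get_origin

SourceT = TypeVar("SourceT")


class ToySource:
    """Hypothetical source class carrying a seed_type discriminator."""

    seed_type: ClassVar[str] = "toy"


class ToyReader(Generic[SourceT]):
    """Hypothetical stand-in for SeedReader."""

    @classmethod
    def get_seed_type(cls) -> str:
        # Walk the MRO and look for a base written as ToyReader[SomeSource].
        for klass in cls.__mro__:
            for base in getattr(klass, "__orig_bases__", ()):
                if get_origin(base) is ToyReader:
                    (source_cls,) = get_args(base)
                    return source_cls.seed_type
        raise TypeError(f"{cls.__name__} does not parameterize ToyReader")


class MyReader(ToyReader[ToySource]):
    pass


print(MyReader.get_seed_type())  # toy
```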
on_attach()
Hook for subclasses that need per-attachment setup.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
FileSystemSeedReader
Bases: SeedReader[FileSystemSourceT], ABC
Base class for filesystem-derived seed readers.
Plugin authors implement build_manifest(...) to describe the cheap logical
rows available under the configured filesystem root. Readers that need
expensive enrichment can optionally override hydrate_row(...) to emit one
record dict or an iterable of record dicts per manifest row. When emitted
records change the manifest schema, output_columns must declare the exact
hydrated output schema for each emitted record. The framework owns
attachment-scoped filesystem context reuse, manifest sampling, partitioning,
randomization, batching, and DuckDB registration details.
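The split between a cheap manifest and optional hydration can be illustrated with plain functions. This is a sketch under stated assumptions: the real build_manifest(...) and hydrate_row(...) are methods whose signatures are not shown on this page, and the column names used here (relative_path, size_bytes, line_number, text) are invented for the example.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def build_manifest(root: Path) -> List[Row]:
    # Cheap pass: list files and stat them without reading their contents.
    return [
        {"relative_path": p.relative_to(root).as_posix(), "size_bytes": p.stat().st_size}
        for p in sorted(root.rglob("*.txt"))
    ]


def hydrate_row(root: Path, row: Row) -> Iterator[Row]:
    # Expensive pass: read the file and emit one record per non-empty line.
    text = (root / row["relative_path"]).read_text(encoding="utf-8")
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            yield {
                "relative_path": row["relative_path"],
                "line_number": line_number,
                "text": line,
            }


# Hydration changes the schema (size_bytes is dropped, line_number and text
# are added), so the hydrated columns are declared explicitly.
output_columns = ["relative_path", "line_number", "text"]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("alpha\nbeta\n", encoding="utf-8")
    (root / "b.txt").write_text("gamma\n", encoding="utf-8")
    manifest = build_manifest(root)
    records = [rec for row in manifest for rec in hydrate_row(root, row)]

# Every emitted record matches the declared output schema exactly.
assert all(list(rec) == output_columns for rec in records)
```

Here two manifest rows expand into three hydrated records, which is the one-to-many case the docstring allows.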
SeedReaderFileSystemContext
SeedReaderBatch
Bases: Protocol
SeedReaderBatchReader
Bases: Protocol
PandasSeedReaderBatch
create_seed_reader_output_dataframe
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
Built-In Readers
LocalFileSeedReader
Bases: SeedReader[LocalFileSeedSource]
HuggingFaceSeedReader
Bases: SeedReader[HuggingFaceSeedSource]
DataFrameSeedReader
Bases: SeedReader[DataFrameSeedSource]
DirectorySeedReader
Bases: FileSystemSeedReader[DirectorySeedSource]
FileContentsSeedReader
Bases: FileSystemSeedReader[FileContentsSeedSource]
AgentRolloutSeedReader
Bases: FileSystemSeedReader[AgentRolloutSeedSource]
Registry and Errors
SeedReaderRegistry
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
SeedReaderError
Bases: DataDesignerError