Column Generators
Column generators produce column values in the Data Designer engine. A generator receives the upstream data needed for its task, returns row or batch data with generated values added, and reports the generation strategy the scheduler should use.
Related pages: column_configs, Build Your Own, Using Models in Plugins, and Custom Columns.
Configuration
User-facing column configs inherit from SingleColumnConfig and define a unique column_type discriminator. During compilation, the engine may group related configs into multi-column configs for generators that create sampler or seed columns together.
Generation strategy
Column generator base classes return GenerationStrategy values to tell the engine whether they run per row or over a full batch.
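A minimal sketch of how a scheduler might branch on the reported strategy. It does not import the engine: the enum `SketchStrategy`, its member names, the method name `get_generation_strategy`, and the list-of-dicts batch are all assumptions made for illustration, and the engine's actual `GenerationStrategy` values and DataFrame handling may differ.

```python
from enum import Enum


class SketchStrategy(Enum):
    """Hypothetical stand-in for GenerationStrategy; member names are assumed."""

    CELL_BY_CELL = "cell_by_cell"
    FULL_COLUMN = "full_column"


class UppercaseName:
    """Toy row-oriented generator (hypothetical)."""

    def get_generation_strategy(self) -> SketchStrategy:  # assumed method name
        return SketchStrategy.CELL_BY_CELL

    def generate(self, data: dict) -> dict:
        return {**data, "name_upper": data["name"].upper()}


class BatchSize:
    """Toy batch-oriented generator (hypothetical)."""

    def get_generation_strategy(self) -> SketchStrategy:  # assumed method name
        return SketchStrategy.FULL_COLUMN

    def generate(self, data: list[dict]) -> list[dict]:
        return [{**row, "batch_size": len(data)} for row in data]


def run_generator(generator, batch: list[dict]) -> list[dict]:
    # Row-oriented generators are called once per row;
    # batch-oriented generators are called once with the whole batch.
    if generator.get_generation_strategy() is SketchStrategy.CELL_BY_CELL:
        return [generator.generate(row) for row in batch]
    return generator.generate(batch)


batch = [{"name": "ada"}, {"name": "grace"}]
batch = run_generator(UppercaseName(), batch)
batch = run_generator(BatchSize(), batch)
```

The point of the strategy value is that the caller, not the generator, decides how to iterate: the same `run_generator` entry point serves both kinds of generator.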
Implementation bases
Generators that operate on a full batch can inherit from ColumnGeneratorFullColumn. Row-oriented non-model generators can inherit from ColumnGeneratorCellByCell. Generators that create initial rows use FromScratchColumnGenerator. Model-backed plugin generators should use ColumnGeneratorWithModelRegistry or ColumnGeneratorWithModel; see Using Models in Plugins for authoring guidance.
ColumnGenerator
Bases: ConfigurableTask[TaskConfigT], ABC
Methods:
| Name | Description |
|---|---|
| agenerate | Async generate; delegates to sync generate() via thread pool. |
| generate | Sync generate; overridden by most concrete generators. |
| log_pre_generation | A shared method to log info before the generator's generate method is called. |
Attributes:
| Name | Type | Description |
|---|---|---|
| is_llm_bound | bool | Whether this generator makes model/API calls during generation. |
| is_order_dependent | bool | Whether this generator's output depends on prior row-group calls. |
Source code in packages/data-designer-engine/src/data_designer/engine/configurable_task.py
is_llm_bound
property
Whether this generator makes model/API calls during generation.
is_order_dependent
property
Whether this generator's output depends on prior row-group calls.
Example: SeedDatasetColumnGenerator tracks its position in the seed dataset, so row group N must complete before N+1 starts.
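To illustrate why a position-tracking generator is order dependent, here is a small sketch. `SeedCursorSketch`, its list-based seed data, and the wrap-around behavior are hypothetical and are not taken from `SeedDatasetColumnGenerator`'s implementation.

```python
class SeedCursorSketch:
    """Hypothetical generator that reads seed rows sequentially."""

    # Each call's output depends on how many rows earlier calls consumed.
    is_order_dependent = True

    def __init__(self, seed_rows: list[dict]) -> None:
        self._seed_rows = seed_rows
        self._position = 0  # advances with every row group

    def generate_from_scratch(self, num_records: int) -> list[dict]:
        # Assumption for this sketch: wrap around when the seed data runs out.
        rows = [
            self._seed_rows[(self._position + i) % len(self._seed_rows)]
            for i in range(num_records)
        ]
        self._position += num_records
        return rows


generator = SeedCursorSketch([{"id": 0}, {"id": 1}, {"id": 2}])
group_0 = generator.generate_from_scratch(2)  # reads seed rows 0 and 1
group_1 = generator.generate_from_scratch(2)  # reads seed row 2, then wraps to 0
```

If the two calls ran in the opposite order, each row group would receive different seed rows, which is why a scheduler would need to serialize them.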
agenerate(data)
async
agenerate(data: dict) -> dict
agenerate(data: pd.DataFrame) -> pd.DataFrame
Async generate — delegates to sync generate() via thread pool.
Subclasses with native async support (e.g. ColumnGeneratorWithModelChatCompletion) should override this with a direct async implementation.
Source code in packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py
generate(data)
generate(data: dict) -> dict
generate(data: pd.DataFrame) -> pd.DataFrame
Sync generate — overridden by most concrete generators.
Default bridges to agenerate() for async-first subclasses that only
implement agenerate(). Raises NotImplementedError if neither
generate() nor agenerate() is overridden.
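The sync/async bridging described for generate() and agenerate() can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the engine's code: `BridgedGeneratorSketch` is a hypothetical class, and the use of `asyncio.run` and `asyncio.to_thread` is one possible way to implement the two bridges.

```python
import asyncio


class BridgedGeneratorSketch:
    """Hypothetical base: each default method bridges to the other one."""

    def generate(self, data: dict) -> dict:
        # If agenerate() was not overridden either, there is nothing to bridge to.
        if type(self).agenerate is BridgedGeneratorSketch.agenerate:
            raise NotImplementedError("Override generate() or agenerate().")
        # Note: asyncio.run() cannot be called from inside a running event loop.
        return asyncio.run(self.agenerate(data))

    async def agenerate(self, data: dict) -> dict:
        # Run the sync implementation in a worker thread so the loop stays free.
        return await asyncio.to_thread(self.generate, data)


class AsyncOnly(BridgedGeneratorSketch):
    async def agenerate(self, data: dict) -> dict:
        await asyncio.sleep(0)  # placeholder for real async work
        return {**data, "greeting": f"hello {data['name']}"}


class SyncOnly(BridgedGeneratorSketch):
    def generate(self, data: dict) -> dict:
        return {**data, "greeting": f"hello {data['name']}"}


async_only_row = AsyncOnly().generate({"name": "ada"})
sync_only_row = asyncio.run(SyncOnly().agenerate({"name": "ada"}))
```

Checking which method was overridden before bridging is what prevents the two defaults from calling each other forever when a subclass implements neither.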
Source code in packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py
log_pre_generation()
A shared method to log info before the generator's generate method is called.
The idea is for dataset builders to call this method for all generators before calling their
generate method. This is to avoid logging the same information multiple times when running
generators in parallel.
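A rough sketch of the calling pattern described above, assuming a hypothetical builder function `build_rows` and a toy generator that records its log lines in a list. The engine's dataset builders and logging setup are not shown in this page and may work differently.

```python
from concurrent.futures import ThreadPoolExecutor


class LoggingGeneratorSketch:
    """Hypothetical generator that records pre-generation log lines."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log_lines: list[str] = []

    def log_pre_generation(self) -> None:
        self.log_lines.append(f"starting {self.name}")

    def generate(self, data: dict) -> dict:
        # Adds one column holding the number of upstream keys in the row.
        return {**data, self.name: len(data)}


def build_rows(generator: LoggingGeneratorSketch, rows: list[dict]) -> list[dict]:
    # Log once, before fanning out, so parallel calls do not repeat the message.
    generator.log_pre_generation()
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(generator.generate, rows))


generator = LoggingGeneratorSketch("n_keys")
rows = build_rows(generator, [{"a": 1}, {"a": 2, "b": 3}])
```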
Source code in packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py
ColumnGeneratorFullColumn
Bases: ColumnGenerator[TaskConfigT], ABC
Base class for column generators that transform a full batch at once.
Override generate to return the complete batch DataFrame after adding
generated values. Use this base when generation is vectorizable or when an
external API accepts batched input more efficiently than per-row calls.
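A minimal sketch of a vectorized full-batch generate, written against pandas only. `FullNameSketch` and the column names are hypothetical; a real generator would subclass ColumnGeneratorFullColumn and read its settings from its config, which is omitted here.

```python
import pandas as pd


class FullNameSketch:
    """Hypothetical full-column generator (does not subclass the engine base)."""

    def generate(self, data: pd.DataFrame) -> pd.DataFrame:
        # One vectorized string concatenation over the whole batch.
        # assign() returns a new DataFrame with the extra column appended.
        return data.assign(full_name=data["first"] + " " + data["last"])


batch = pd.DataFrame({"first": ["Ada", "Grace"], "last": ["Lovelace", "Hopper"]})
result = FullNameSketch().generate(batch)
```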
Methods:
| Name | Description |
|---|---|
| generate | Generate an entire batch of row outputs. |
Source code in packages/data-designer-engine/src/data_designer/engine/configurable_task.py
generate(data)
abstractmethod
Generate an entire batch of row outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | DataFrame containing the upstream columns this generator depends on. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | DataFrame containing the input columns plus the new column and any side-effect columns. When […] the input; when it is […] |
Source code in packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py
ColumnGeneratorCellByCell
Bases: ColumnGenerator[TaskConfigT], ABC
Base class for column generators invoked once per row.
Override generate to return the complete row mapping after adding the
generated value. The engine calls the generator once per row and may run
calls concurrently. Use this base when generation is independent per row
(e.g. an LLM call per row, a per-row transform).
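A minimal sketch of the per-row contract: take the current row mapping and return it with the new value added. `WordCountSketch` and the column names are hypothetical; a real generator would subclass ColumnGeneratorCellByCell and take the output column name from its config.

```python
class WordCountSketch:
    """Hypothetical per-row generator (does not subclass the engine base)."""

    def generate(self, data: dict) -> dict:
        # Preserve every existing key and add the generated value.
        return {**data, "word_count": len(data["text"].split())}


row = WordCountSketch().generate({"id": 7, "text": "column generators add values"})
```

Because the method only reads its own row and builds a new mapping, calls for different rows do not share state and can safely run concurrently.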
Methods:
| Name | Description |
|---|---|
| generate | Generate one row's output from a single row's upstream values. |
Source code in packages/data-designer-engine/src/data_designer/engine/configurable_task.py
generate(data)
abstractmethod
Generate one row's output from a single row's upstream values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Current row mapping containing the upstream values available to this column. | required |
Returns:
| Type | Description |
|---|---|
| dict | Complete row mapping with existing keys preserved and the new column value added. Include declared side-effect columns when the config creates them. |
Source code in packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py
FromScratchColumnGenerator
Bases: ColumnGenerator[TaskConfigT], ABC
Methods:
| Name | Description |
|---|---|
| agenerate_from_scratch | Async wrapper; wraps sync generate_from_scratch() in a thread. |
Source code in packages/data-designer-engine/src/data_designer/engine/configurable_task.py
agenerate_from_scratch(num_records)
async
Async wrapper — wraps sync generate_from_scratch() in a thread.
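A small sketch of the wrapper pattern, assuming `asyncio.to_thread` as the threading mechanism and a list of dicts as the return value. Both are assumptions for illustration: this page does not state the return type of generate_from_scratch(), and `CounterFromScratchSketch` is hypothetical.

```python
import asyncio


class CounterFromScratchSketch:
    """Hypothetical from-scratch generator that creates initial rows."""

    def generate_from_scratch(self, num_records: int) -> list[dict]:
        # Create the initial rows without any upstream data.
        return [{"row_id": i} for i in range(num_records)]

    async def agenerate_from_scratch(self, num_records: int) -> list[dict]:
        # Run the sync implementation in a worker thread.
        return await asyncio.to_thread(self.generate_from_scratch, num_records)


rows = asyncio.run(CounterFromScratchSketch().agenerate_from_scratch(3))
```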
Source code in packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py
ColumnGeneratorWithModelRegistry
Bases: ColumnGenerator[TaskConfigT], ABC
Source code in packages/data-designer-engine/src/data_designer/engine/configurable_task.py
ColumnGeneratorWithModel
Bases: ColumnGeneratorWithModelRegistry[TaskConfigT], ABC
Source code in packages/data-designer-engine/src/data_designer/engine/configurable_task.py