Using Models in Plugins
Model access belongs in column generator implementations, not config objects. Keep the config declarative by asking users for model aliases, then resolve those aliases at runtime through the model registry.
Do not construct model clients in plugin configs, read API keys in configs, or bypass Data Designer's model providers. The engine already builds a ResourceProvider for each generator, and that provider exposes the model registry at:
self.resource_provider.model_registry
Access the registry
Use a model-aware column generator base whenever your plugin needs the registry:
| Need | Base class | Registry access |
|---|---|---|
| Primary model alias | ColumnGeneratorWithModel |
Use self.model, self.model_config, and self.inference_parameters. |
| Multiple aliases or provider inspection | ColumnGeneratorWithModelRegistry |
Use self.get_model(alias), self.get_model_config(alias), and self.get_model_provider_name(alias). |
ColumnGeneratorWithModel is a convenience subclass of ColumnGeneratorWithModelRegistry. It expects the config to have a model_alias field and resolves that one alias for you. For independent model calls, return GenerationStrategy.CELL_BY_CELL so the runtime can fan out rows like the built-in LLM, embedding, and image generators. Use full-column generation only when your plugin intentionally calls a batched API for the whole DataFrame.
from __future__ import annotations
from data_designer.config.column_configs import GenerationStrategy
from data_designer.engine.column_generators.generators.base import ColumnGeneratorWithModel
from data_designer.engine.models.parsers.errors import ParserException
from data_designer_sentiment_label.config import SentimentLabelColumnConfig
def parse_sentiment_label(response: str) -> str:
label = response.strip().lower()
if label not in {"positive", "neutral", "negative"}:
raise ParserException("Expected exactly one of: positive, neutral, negative.", source=response)
return label
class SentimentLabelColumnGenerator(ColumnGeneratorWithModel[SentimentLabelColumnConfig]):
@staticmethod
def get_generation_strategy() -> GenerationStrategy:
return GenerationStrategy.CELL_BY_CELL
async def agenerate(self, data: dict) -> dict:
label, _ = await self.model.agenerate(
prompt=f"Classify the sentiment of this text: {data[self.config.source_column]}",
system_prompt="Return exactly one label: positive, neutral, or negative.",
parser=parse_sentiment_label,
max_correction_steps=self.resource_provider.run_config.max_conversation_correction_steps,
max_conversation_restarts=self.resource_provider.run_config.max_conversation_restarts,
purpose=f"running generation for column '{self.config.name}'",
)
data[self.config.name] = label
return data
The matching config should include model_alias: str as a normal user-facing field:
from __future__ import annotations
from typing import Literal
from data_designer.config.base import SingleColumnConfig
class SentimentLabelColumnConfig(SingleColumnConfig):
column_type: Literal["sentiment-label"] = "sentiment-label"
source_column: str
model_alias: str
@property
def required_columns(self) -> list[str]:
return [self.source_column]
@property
def side_effect_columns(self) -> list[str]:
return []
Users set that alias from default model settings or from DataDesignerConfigBuilder(model_configs=...).
Use multiple models
If your plugin uses multiple model aliases, inherit from ColumnGeneratorWithModelRegistry and resolve each alias explicitly with self.get_model(...).
The config should keep a primary model_alias field because startup health checks collect that field from model-generated column configs. A config for this pattern might also define judge_model_alias, critic_model_alias, or another task-specific alias.
Validate additional alias fields in _validate() or _initialize() with get_model_config(...) so missing aliases fail before generation starts. This checks that the alias exists; only the primary model_alias is included in the standard startup health check.
from __future__ import annotations
from data_designer.config.column_configs import GenerationStrategy
from data_designer.engine.column_generators.generators.base import ColumnGeneratorWithModelRegistry
from data_designer.engine.models.parsers.errors import ParserException
from data_designer_pairwise_judge.config import PairwiseJudgeColumnConfig
def parse_score(response: str) -> int:
text = response.strip()
if text not in {"1", "2", "3", "4", "5"}:
raise ParserException("Expected an integer score from 1 to 5.", source=response)
return int(text)
class PairwiseJudgeColumnGenerator(ColumnGeneratorWithModelRegistry[PairwiseJudgeColumnConfig]):
@staticmethod
def get_generation_strategy() -> GenerationStrategy:
return GenerationStrategy.CELL_BY_CELL
def _validate(self) -> None:
self.get_model_config(self.config.model_alias)
self.get_model_config(self.config.judge_model_alias)
async def agenerate(self, data: dict) -> dict:
generator_model = self.get_model(self.config.model_alias)
judge_model = self.get_model(self.config.judge_model_alias)
retry_kwargs = {
"max_correction_steps": self.resource_provider.run_config.max_conversation_correction_steps,
"max_conversation_restarts": self.resource_provider.run_config.max_conversation_restarts,
}
draft, _ = await generator_model.agenerate(
prompt=f"Draft an answer for: {data['question']}",
purpose=f"drafting an answer for column '{self.config.name}'",
**retry_kwargs,
)
score, _ = await judge_model.agenerate(
prompt=f"Score this answer from 1 to 5: {draft}",
system_prompt="Return exactly one integer from 1 to 5.",
parser=parse_score,
purpose=f"judging an answer for column '{self.config.name}'",
**retry_kwargs,
)
data[self.config.name] = {"draft": draft, "score": score}
return data
What the registry returns
get_model(...) returns a ModelFacade. Call the facade based on the modality your plugin needs:
- Chat completion aliases use
model.generate(...)orawait model.agenerate(...)and return(parsed_output, trace). - Embedding aliases use
model.generate_text_embeddings(...)orawait model.agenerate_text_embeddings(...)and returnlist[list[float]]. - Image aliases use
model.generate_image(...)orawait model.agenerate_image(...)and returnlist[str]of base64-encoded image data.
Choose a model alias whose ModelConfig.inference_parameters.generation_type matches the facade method you call. The facade merges the alias's configured inference parameters into each request.
Pass runtime context such as prompt, system_prompt, parser, tool_alias, multi_modal_context, max_correction_steps, max_conversation_restarts, and purpose at the call site. Parser functions should raise ParserException for invalid model responses; that is what allows ModelFacade.generate(...) and ModelFacade.agenerate(...) to run correction turns and conversation restarts.
Prefer implementing agenerate(...) for model-backed plugins. The base generate(...) method can bridge to agenerate(...) for sync runs when the subclass only implements async generation. If your plugin has a sync-specific path, implement both generate(...) and agenerate(...), as the built-in generators do.
Health checks and scheduling
The model-aware bases mark the generator as LLM-bound, so the async scheduler treats the work like other model calls.
Plugin discovery treats column generator implementations that inherit from ColumnGeneratorWithModelRegistry as model-generated column types for startup model health checks. The standard health-check collection expects a primary model_alias field on the config. Additional alias fields should be validated by the plugin implementation.
Built-in patterns
The built-in model-backed generators use these same hooks:
LLMTextCellGenerator,LLMCodeCellGenerator,LLMStructuredCellGenerator, andLLMJudgeCellGeneratorinherit through a chat-completion base that usesColumnGeneratorWithModel. They render prompts from row data, callself.model.generate(...)orself.model.agenerate(...), pass parsers into theModelFacade, and store optional trace side-effect columns.EmbeddingCellGeneratorusesColumnGeneratorWithModelbut calls the facade's embedding methods instead of chat completion.ImageCellGeneratorusesColumnGeneratorWithModel, renders a prompt, calls the facade's image methods, and writes generated media through the artifact storage supplied by the sameResourceProvider.CustomColumnGeneratoris the inline-function counterpart: when users declaremodel_aliases, it builds amodelsdict fromresource_provider.model_registry. Packaged plugins usually useColumnGeneratorWithModelorColumnGeneratorWithModelRegistrydirectly instead of recreating that dict.
See Column Generators for the full base-class API and Custom Model Settings for configuring model aliases.