Build Your Own
Data Designer supports three plugin types: column generators, seed readers, and processors. All three share the same package shape: a config class, an implementation class, and a Plugin object registered through a data_designer.plugins entry point.
Use this page as the implementation checklist for plugin packages. Each tab below shows the core files for one plugin type.
Package shape
Use the same structure for each plugin package:
```
data-designer-my-plugin/
|-- pyproject.toml
`-- src/
    `-- data_designer_my_plugin/
        |-- __init__.py
        |-- config.py
        |-- impl.py
        `-- plugin.py
```
Implementation patterns
This index-multiplier plugin adds a custom column whose value is the row index multiplied by a configurable integer.
Model-backed generators
If your column generator interacts with models, include at least one model_alias field in the config and use the model registry from the implementation. See Using Models in Plugins for the registry access pattern.
Full-column vs cell-by-cell generators
The example below uses ColumnGeneratorFullColumn because it can fill the whole batch from the DataFrame index. Use ColumnGeneratorCellByCell when each row can be generated independently from its upstream values and your generate method should receive and return a row dictionary. Cell-by-cell generation is especially useful for independent LLM calls because the async engine can run rows concurrently; the built-in LLM completion generators are good examples. Prefer ColumnGeneratorFullColumn for vectorized pandas operations, batched external APIs, or logic that needs to inspect or update the full batch at once.
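To make the full-column behavior concrete before the plugin files, here is a standalone sketch in plain pandas (no data_designer imports; the function name and column name are hypothetical) of what a full-column generate step does: compute an entire batch column in one vectorized operation over the DataFrame index.

```python
import pandas as pd


def fill_index_multiplier(data: pd.DataFrame, name: str, multiplier: int) -> pd.DataFrame:
    # The whole column is produced at once from the batch's index,
    # mirroring the generate() method of the plugin below.
    data[name] = data.index * multiplier
    return data


batch = pd.DataFrame({"seed": ["a", "b", "c"]})
result = fill_index_multiplier(batch, "tripled_index", 3)
print(result["tripled_index"].tolist())  # [0, 3, 6]
```

A cell-by-cell generator would instead receive one row dictionary at a time; the vectorized form above is why ColumnGeneratorFullColumn fits this example.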
config.py:

```python
from __future__ import annotations

from typing import Literal

from data_designer.config.base import SingleColumnConfig


class IndexMultiplierColumnConfig(SingleColumnConfig):
    column_type: Literal["index-multiplier"] = "index-multiplier"
    multiplier: int = 2

    @staticmethod
    def get_column_emoji() -> str:
        return "✖️"

    @property
    def required_columns(self) -> list[str]:
        return []

    @property
    def side_effect_columns(self) -> list[str]:
        return []
```
impl.py:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

from data_designer.engine.column_generators.generators.base import ColumnGeneratorFullColumn
from data_designer_index_multiplier.config import IndexMultiplierColumnConfig

if TYPE_CHECKING:
    import pandas as pd


class IndexMultiplierColumnGenerator(ColumnGeneratorFullColumn[IndexMultiplierColumnConfig]):
    def generate(self, data: pd.DataFrame) -> pd.DataFrame:
        data[self.config.name] = data.index * self.config.multiplier
        return data
```
plugin.py:

```python
from __future__ import annotations

from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    config_qualified_name="data_designer_index_multiplier.config.IndexMultiplierColumnConfig",
    impl_qualified_name="data_designer_index_multiplier.impl.IndexMultiplierColumnGenerator",
    plugin_type=PluginType.COLUMN_GENERATOR,
)
```
Entry point:

```toml
[project.entry-points."data_designer.plugins"]
index-multiplier = "data_designer_index_multiplier.plugin:plugin"
```
For the generator implementation contract, see Column Generators. For inline custom functions, see Custom Columns.
This prefixed-text-files plugin loads text files from a directory and emits a seed dataset with prefixed file contents.
config.py:

```python
from __future__ import annotations

from typing import Literal

from data_designer.config.seed_source import FileSystemSeedSource


class PrefixedTextSeedSource(FileSystemSeedSource):
    seed_type: Literal["prefixed-text-files"] = "prefixed-text-files"
    prefix: str = "plugin"
```
impl.py:

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

import data_designer.lazy_heavy_imports as lazy
from data_designer.engine.resources.seed_reader import (
    FileSystemSeedReader,
    SeedReaderFileSystemContext,
)
from data_designer_prefixed_text_seed_reader.config import PrefixedTextSeedSource


class PrefixedTextSeedReader(FileSystemSeedReader[PrefixedTextSeedSource]):
    output_columns = ["relative_path", "file_name", "prefixed_content"]

    def build_manifest(
        self,
        *,
        context: SeedReaderFileSystemContext,
    ) -> lazy.pd.DataFrame | list[dict[str, str]]:
        matched_paths = self.get_matching_relative_paths(
            context=context,
            file_pattern=self.source.file_pattern,
            recursive=self.source.recursive,
        )
        return [
            {
                "relative_path": relative_path,
                "file_name": Path(relative_path).name,
            }
            for relative_path in matched_paths
        ]

    def hydrate_row(
        self,
        *,
        manifest_row: dict[str, Any],
        context: SeedReaderFileSystemContext,
    ) -> dict[str, str]:
        relative_path = str(manifest_row["relative_path"])
        with context.fs.open(relative_path, "r", encoding="utf-8") as handle:
            content = handle.read().strip()
        return {
            "relative_path": relative_path,
            "file_name": str(manifest_row["file_name"]),
            "prefixed_content": f"{self.source.prefix}:{content}",
        }
```
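The reader above splits work into two phases: build_manifest collects cheap metadata for every matching file, and hydrate_row reads a file's contents only when that row is materialized. Here is a standalone stdlib sketch of that two-phase pattern (file names, prefix value, and the hydrate helper are all hypothetical; no data_designer imports):

```python
import tempfile
from pathlib import Path

# Set up a throwaway directory with two text files to stand in for the seed source.
root = Path(tempfile.mkdtemp())
(root / "a.txt").write_text("alpha\n", encoding="utf-8")
(root / "b.txt").write_text("beta\n", encoding="utf-8")

prefix = "plugin"

# Phase 1: the manifest carries only lightweight metadata, never file contents.
manifest = [
    {"relative_path": p.name, "file_name": p.name}
    for p in sorted(root.glob("*.txt"))
]

# Phase 2: hydration opens the file and applies the configured prefix.
def hydrate(row: dict) -> dict:
    content = (root / row["relative_path"]).read_text(encoding="utf-8").strip()
    return {**row, "prefixed_content": f"{prefix}:{content}"}

rows = [hydrate(r) for r in manifest]
print(rows[0]["prefixed_content"])  # plugin:alpha
```

Keeping contents out of the manifest is the point of the split: the engine can plan and sample rows without paying the cost of reading every file.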
plugin.py:

```python
from __future__ import annotations

from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    config_qualified_name="data_designer_prefixed_text_seed_reader.config.PrefixedTextSeedSource",
    impl_qualified_name="data_designer_prefixed_text_seed_reader.impl.PrefixedTextSeedReader",
    plugin_type=PluginType.SEED_READER,
)
```
Entry point:

```toml
[project.entry-points."data_designer.plugins"]
prefixed-text-files = "data_designer_prefixed_text_seed_reader.plugin:plugin"
```
For the engine API behind this example, see Seed Readers.
This regex-filter plugin filters rows whose column value matches a regular expression.
config.py:

```python
from __future__ import annotations

from typing import Literal

from pydantic import Field

from data_designer.config.base import ProcessorConfig


class RegexFilterProcessorConfig(ProcessorConfig):
    processor_type: Literal["regex-filter"] = "regex-filter"
    column: str = Field(description="Column to match against.")
    pattern: str = Field(description="Regex pattern to match.")
    invert: bool = Field(default=False, description="If True, keep rows that do not match.")
```
impl.py:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

from data_designer.engine.processing.processors.base import Processor
from data_designer_regex_filter.config import RegexFilterProcessorConfig

if TYPE_CHECKING:
    import pandas as pd


class RegexFilterProcessor(Processor[RegexFilterProcessorConfig]):
    def process_before_batch(self, data: pd.DataFrame) -> pd.DataFrame:
        mask = data[self.config.column].astype(str).str.contains(self.config.pattern, regex=True)
        if self.config.invert:
            mask = ~mask
        return data[mask].reset_index(drop=True)
```
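The masking logic above can be tried in isolation with plain pandas (the sample data and pattern are hypothetical; no data_designer imports needed):

```python
import pandas as pd

df = pd.DataFrame({"text": ["error: disk full", "ok", "error: timeout"]})
pattern = r"^error:"

# Same mask the processor builds: coerce to str, then regex-match each value.
mask = df["text"].astype(str).str.contains(pattern, regex=True)

kept = df[mask].reset_index(drop=True)      # default behavior: keep matches
dropped = df[~mask].reset_index(drop=True)  # invert=True behavior: keep non-matches

print(kept["text"].tolist())  # ['error: disk full', 'error: timeout']
```

Note the reset_index(drop=True): downstream steps expect a clean 0..n-1 index on the filtered batch.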
plugin.py:

```python
from __future__ import annotations

from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    config_qualified_name="data_designer_regex_filter.config.RegexFilterProcessorConfig",
    impl_qualified_name="data_designer_regex_filter.impl.RegexFilterProcessor",
    plugin_type=PluginType.PROCESSOR,
)
```
Entry point:

```toml
[project.entry-points."data_designer.plugins"]
regex-filter = "data_designer_regex_filter.plugin:plugin"
```
For callback selection and processor execution details, see Processors. For the engine API behind this example, see Engine Processors code reference.
Install and use locally
Install any plugin package in editable mode from the package directory:
```bash
uv pip install -e .
```
The editable install registers the data_designer.plugins entry point so Data Designer can discover the plugin.
Restart your kernel after installing
Data Designer caches the plugin registry on first import, so an import data_designer that already happened in your Python process — typical in a notebook — won't pick up a freshly installed plugin. After uv pip install -e ., restart the kernel (or interpreter) so the next import rebuilds the registry.
Validate plugins
Data Designer provides a testing utility for common plugin structure checks:
```python
from data_designer.engine.testing.utils import assert_valid_plugin
from data_designer_index_multiplier.plugin import plugin

assert_valid_plugin(plugin)
```
assert_valid_plugin checks that the plugin's config inherits from ConfigBase and that the implementation class inherits from the appropriate base for its plugin type (ConfigurableTask for column generators, SeedReader for seed readers).
For published plugins, add at least one functional test that runs the plugin through DataDesigner.preview(...). This catches packaging and entry point issues that a direct implementation test can miss.
Multiple plugins in one package
A single Python package can register multiple plugins by defining multiple Plugin objects and entry points:
```toml
[project.entry-points."data_designer.plugins"]
my-column-generator = "my_package.plugins.column_generator.plugin:column_generator_plugin"
my-seed-reader = "my_package.plugins.seed_reader.plugin:seed_reader_plugin"
my-processor = "my_package.plugins.processor.plugin:processor_plugin"
```
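One possible layout for such a package, mirroring the per-plugin file structure shown earlier (directory names here are illustrative, matching the module paths in the entry points above):

```
my-package/
|-- pyproject.toml
`-- src/
    `-- my_package/
        `-- plugins/
            |-- column_generator/
            |   |-- config.py
            |   |-- impl.py
            |   `-- plugin.py
            |-- seed_reader/
            |   |-- config.py
            |   |-- impl.py
            |   `-- plugin.py
            `-- processor/
                |-- config.py
                |-- impl.py
                `-- plugin.py
```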