
Dataset Creation Results

DatasetCreationResults is returned by DataDesigner.create(). It provides access to persisted creation artifacts (the generated dataset, profiling analysis, processor outputs, task traces, and dataset metadata) and supports uploading the dataset to the Hugging Face Hub.

In contrast, preview generation uses the in-memory data_designer.config.preview_results.PreviewResults object returned by DataDesigner.preview(); persisted dataset creation returns DatasetCreationResults.
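As a hedged sketch of the access pattern (the `results` object below is a `SimpleNamespace` stand-in; only the attribute and method names mirror this page, and the data is fabricated for illustration):

```python
from types import SimpleNamespace

import pandas as pd

# Stand-in for the object returned by DataDesigner.create(); in real use this
# would be `results = data_designer.create(config, num_records=...)`.
results = SimpleNamespace(
    load_dataset=lambda: pd.DataFrame({"text": ["a", "b"], "label": [0, 1]}),
    task_traces=[],  # populated by the async scheduler in a real run
)

df = results.load_dataset()
print(f"{len(df)} records, columns: {list(df.columns)}")
```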

DatasetCreationResults

Bases: WithRecordSamplerMixin

Results container for a Data Designer dataset creation run.

This class provides access to the generated dataset, profiling analysis, and visualization utilities. It is returned by the DataDesigner.create() method and implements ResultsProtocol of the DataDesigner interface.

Creates a new instance with results based on a dataset creation run.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `artifact_storage` | `ArtifactStorage` | Storage manager for accessing generated artifacts. | *required* |
| `analysis` | `DatasetProfilerResults` | Profiling results for the generated dataset. | *required* |
| `config_builder` | `DataDesignerConfigBuilder` | Configuration builder used to create the dataset. | *required* |
| `dataset_metadata` | `DatasetMetadata` | Metadata about the generated dataset (e.g., seed column names). | *required* |
| `task_traces` | `list[TaskTrace] \| None` | Optional list of `TaskTrace` objects from the async scheduler. | `None` |

Methods:

| Name | Description |
|------|-------------|
| `get_path_to_processor_artifacts` | Get the path to the artifacts generated by a processor. |
| `load_analysis` | Load the profiling analysis results for the generated dataset. |
| `load_dataset` | Load the generated dataset as a pandas DataFrame. |
| `load_processor_dataset` | Load the dataset generated by a processor. |
| `push_to_hub` | Push dataset to HuggingFace Hub. |

Source code in packages/data-designer/src/data_designer/interface/results.py
```python
def __init__(
    self,
    *,
    artifact_storage: ArtifactStorage,
    analysis: DatasetProfilerResults,
    config_builder: DataDesignerConfigBuilder,
    dataset_metadata: DatasetMetadata,
    task_traces: list[TaskTrace] | None = None,
):
    """Creates a new instance with results based on a dataset creation run.

    Args:
        artifact_storage: Storage manager for accessing generated artifacts.
        analysis: Profiling results for the generated dataset.
        config_builder: Configuration builder used to create the dataset.
        dataset_metadata: Metadata about the generated dataset (e.g., seed column names).
        task_traces: Optional list of TaskTrace objects from the async scheduler.
    """
    self.artifact_storage = artifact_storage
    self._analysis = analysis
    self._config_builder = config_builder
    self.dataset_metadata = dataset_metadata
    self.task_traces: list[TaskTrace] = task_traces or []
```

get_path_to_processor_artifacts(processor_name)

Get the path to the artifacts generated by a processor.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `processor_name` | `str` | The name of the processor to load the artifact from. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `Path` | The path to the artifacts. |

Source code in packages/data-designer/src/data_designer/interface/results.py
```python
def get_path_to_processor_artifacts(self, processor_name: str) -> Path:
    """Get the path to the artifacts generated by a processor.

    Args:
        processor_name: The name of the processor to load the artifact from.

    Returns:
        The path to the artifacts.
    """
    if not self.artifact_storage.processors_outputs_path.exists():
        raise ArtifactStorageError(f"Processor {processor_name} has no artifacts.")
    return self.artifact_storage.processors_outputs_path / processor_name
```
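The returned value is a plain `pathlib.Path`, so standard filesystem operations apply. A minimal sketch of the layout implied by the code above (the directory and processor names are assumptions for illustration, not part of the API):

```python
from pathlib import Path

# Artifacts live under <processors_outputs_path>/<processor_name>; the
# concrete names here are illustrative stand-ins.
processors_outputs_path = Path("my-run/artifacts/processors")
processor_name = "dedupe"

artifact_dir = processors_outputs_path / processor_name
# e.g., list Parquet batch files once the directory exists on disk
files = sorted(artifact_dir.glob("*.parquet")) if artifact_dir.exists() else []
print(artifact_dir)
```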

load_analysis()

Load the profiling analysis results for the generated dataset.

Returns:

| Type | Description |
|------|-------------|
| `DatasetProfilerResults` | `DatasetProfilerResults` containing statistical analysis and quality metrics for configured columns in the generated dataset. |

Source code in packages/data-designer/src/data_designer/interface/results.py
```python
def load_analysis(self) -> DatasetProfilerResults:
    """Load the profiling analysis results for the generated dataset.

    Returns:
        DatasetProfilerResults containing statistical analysis and quality metrics
            for configured columns in the generated dataset.
    """
    return self._analysis
```

load_dataset()

Load the generated dataset as a pandas DataFrame.

Returns:

| Type | Description |
|------|-------------|
| `DataFrame` | A pandas DataFrame containing the full generated dataset. |

Source code in packages/data-designer/src/data_designer/interface/results.py
```python
def load_dataset(self) -> pd.DataFrame:
    """Load the generated dataset as a pandas DataFrame.

    Returns:
        A pandas DataFrame containing the full generated dataset.
    """
    return self.artifact_storage.load_dataset()
```
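Because `load_dataset()` returns a standard pandas `DataFrame`, the usual inspection tools apply directly. A sketch using a fabricated frame as a stand-in for the generated dataset:

```python
import pandas as pd

# Stand-in for `df = results.load_dataset()`; the columns are fabricated.
df = pd.DataFrame(
    {
        "prompt": ["Summarize X.", "Translate Y."],
        "response": ["X is ...", "Y in French is ..."],
    }
)

print(df.shape)  # (2, 2): two records, two columns
summary = df.describe(include="all")  # count/unique/top/freq for text columns
```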

load_processor_dataset(processor_name)

Load the dataset generated by a processor.

This only works for processors that write their artifacts in Parquet format.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `processor_name` | `str` | The name of the processor to load the dataset from. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `DataFrame` | A pandas DataFrame containing the dataset generated by the processor. |

Source code in packages/data-designer/src/data_designer/interface/results.py
```python
def load_processor_dataset(self, processor_name: str) -> pd.DataFrame:
    """Load the dataset generated by a processor.

    This only works for processors that write their artifacts in Parquet format.

    Args:
        processor_name: The name of the processor to load the dataset from.

    Returns:
        A pandas DataFrame containing the dataset generated by the processor.
    """
    return self.artifact_storage.load_processor_dataset(processor_name)
```
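A processor's output is likewise a plain `DataFrame`, so it can be filtered or joined with the main dataset. A sketch with fabricated data (the `"quality_scorer"` name and `score` column are assumptions for illustration, not part of the API):

```python
import pandas as pd

# Stand-in for `scores = results.load_processor_dataset("quality_scorer")`.
scores = pd.DataFrame({"record_id": [0, 1, 2], "score": [0.9, 0.4, 0.7]})

# Keep only records whose score clears a threshold.
kept = scores[scores["score"] >= 0.5].reset_index(drop=True)
print(len(kept))  # 2
```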

push_to_hub(repo_id, description, *, token=None, private=False, tags=None)

Push dataset to HuggingFace Hub.

Uploads all artifacts, including:

- Main parquet batch files (data subset)
- Processor output batch files ({processor_name} subsets)
- Configuration (builder_config.json)
- Metadata (metadata.json)
- Auto-generated dataset card (README.md)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `repo_id` | `str` | HuggingFace repo ID (e.g., `"username/my-dataset"`). | *required* |
| `description` | `str` | Custom description text for the dataset card. Appears after the title. | *required* |
| `token` | `str \| None` | HuggingFace API token. If `None`, the token is automatically resolved from the `HF_TOKEN` environment variable or cached credentials from `hf auth login`. | `None` |
| `private` | `bool` | If `True`, create the repository as private. | `False` |
| `tags` | `list[str] \| None` | Additional custom tags for the dataset. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `str` | URL to the uploaded dataset. |

Example:

```python
>>> results = data_designer.create(config, num_records=1000)
>>> description = "This dataset contains synthetic conversations for training chatbots."
>>> results.push_to_hub("username/my-synthetic-dataset", description, tags=["chatbot", "conversation"])
'https://huggingface.co/datasets/username/my-synthetic-dataset'
```

Source code in packages/data-designer/src/data_designer/interface/results.py
```python
def push_to_hub(
    self,
    repo_id: str,
    description: str,
    *,
    token: str | None = None,
    private: bool = False,
    tags: list[str] | None = None,
) -> str:
    """Push dataset to HuggingFace Hub.

    Uploads all artifacts including:
    - Main parquet batch files (data subset)
    - Processor output batch files ({processor_name} subsets)
    - Configuration (builder_config.json)
    - Metadata (metadata.json)
    - Auto-generated dataset card (README.md)

    Args:
        repo_id: HuggingFace repo ID (e.g., "username/my-dataset")
        description: Custom description text for the dataset card.
            Appears after the title.
        token: HuggingFace API token. If None, the token is automatically
            resolved from HF_TOKEN environment variable or cached credentials
            from `hf auth login`.
        private: Create private repo
        tags: Additional custom tags for the dataset.

    Returns:
        URL to the uploaded dataset

    Example:
        >>> results = data_designer.create(config, num_records=1000)
        >>> description = "This dataset contains synthetic conversations for training chatbots."
        >>> results.push_to_hub("username/my-synthetic-dataset", description, tags=["chatbot", "conversation"])
        'https://huggingface.co/datasets/username/my-synthetic-dataset'
    """
    client = HuggingFaceHubClient(token=token)
    return client.upload_dataset(
        repo_id=repo_id,
        base_dataset_path=self.artifact_storage.base_dataset_path,
        private=private,
        description=description,
        tags=tags,
    )
```