Dataset Creation Results
DatasetCreationResults is returned by DataDesigner.create(). It provides access to persisted creation artifacts, including the generated dataset, profiling analysis, processor outputs, task traces, dataset metadata, and Hugging Face Hub upload support.
Preview generation uses the in-memory data_designer.config.preview_results.PreviewResults object returned by DataDesigner.preview(). Persisted dataset creation uses DatasetCreationResults.
DatasetCreationResults
Bases: WithRecordSamplerMixin
Results container for a Data Designer dataset creation run.
This class provides access to the generated dataset, profiling analysis, and visualization utilities. It is returned by the DataDesigner.create() method and implements ResultsProtocol of the DataDesigner interface.
Creates a new instance with results based on a dataset creation run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `artifact_storage` | `ArtifactStorage` | Storage manager for accessing generated artifacts. | required |
| `analysis` | `DatasetProfilerResults` | Profiling results for the generated dataset. | required |
| `config_builder` | `DataDesignerConfigBuilder` | Configuration builder used to create the dataset. | required |
| `dataset_metadata` | `DatasetMetadata` | Metadata about the generated dataset (e.g., seed column names). | required |
| `task_traces` | `list[TaskTrace] \| None` | Optional list of TaskTrace objects from the async scheduler. | `None` |
Methods:
| Name | Description |
|---|---|
| `get_path_to_processor_artifacts` | Get the path to the artifacts generated by a processor. |
| `load_analysis` | Load the profiling analysis results for the generated dataset. |
| `load_dataset` | Load the generated dataset as a pandas DataFrame. |
| `load_processor_dataset` | Load the dataset generated by a processor. |
| `push_to_hub` | Push dataset to HuggingFace Hub. |
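The methods listed above can be combined into a small post-run summary. The sketch below is not part of Data Designer: `summarize_run` and the stand-in class `_StubResults` are hypothetical names, and the stub exists only so the snippet runs without a real creation run. It relies solely on the documented method names and uses `len()`, which returns the row count for a pandas DataFrame.

```python
from pathlib import Path
from typing import Any


def summarize_run(results: Any, processor_names: list[str]) -> dict[str, Any]:
    """Collect a few facts about a creation run using only the documented methods."""
    dataset = results.load_dataset()    # a pandas DataFrame on a real run
    analysis = results.load_analysis()  # a DatasetProfilerResults on a real run
    return {
        "num_records": len(dataset),
        "analysis_type": type(analysis).__name__,
        "processor_paths": {
            name: results.get_path_to_processor_artifacts(name)
            for name in processor_names
        },
    }


class _StubResults:
    """Hypothetical stand-in for DatasetCreationResults (illustration only)."""

    def load_dataset(self) -> list[dict[str, int]]:
        return [{"id": 1}, {"id": 2}, {"id": 3}]

    def load_analysis(self) -> dict[str, str]:
        return {"note": "placeholder analysis"}

    def get_path_to_processor_artifacts(self, processor_name: str) -> Path:
        return Path("artifacts") / processor_name


summary = summarize_run(_StubResults(), ["my_processor"])
print(summary["num_records"])
```

On a real run, pass the object returned by `DataDesigner.create()` in place of the stub, along with the names of the processors you configured.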
Source code in packages/data-designer/src/data_designer/interface/results.py
get_path_to_processor_artifacts(processor_name)
Get the path to the artifacts generated by a processor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `processor_name` | `str` | The name of the processor to load the artifact from. | required |
Returns:
| Type | Description |
|---|---|
| `Path` | The path to the artifacts. |
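Because the method returns a plain `Path`, the standard `pathlib` API can be used to inspect what a processor wrote. The following is a minimal sketch under stated assumptions: `list_processor_files` and `_StubResults` are hypothetical helpers, the file names are made up, and the code handles both a directory and a single file because the layout of the artifacts is not specified here.

```python
import tempfile
from pathlib import Path
from typing import Any


def list_processor_files(results: Any, processor_name: str) -> list[str]:
    """Return artifact file paths relative to the processor's artifact location."""
    root = Path(results.get_path_to_processor_artifacts(processor_name))
    if root.is_file():
        return [root.name]
    # Walk the directory tree and keep regular files only.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


class _StubResults:
    """Hypothetical stand-in that maps processor names to sub-directories."""

    def __init__(self, base_dir: Path) -> None:
        self._base_dir = base_dir

    def get_path_to_processor_artifacts(self, processor_name: str) -> Path:
        return self._base_dir / processor_name


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway artifact directory with made-up file names.
    processor_dir = Path(tmp) / "my_processor"
    (processor_dir / "nested").mkdir(parents=True)
    (processor_dir / "batch_00000.parquet").write_bytes(b"")
    (processor_dir / "nested" / "notes.txt").write_text("example")
    files = list_processor_files(_StubResults(Path(tmp)), "my_processor")

print(files)
```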
Source code in packages/data-designer/src/data_designer/interface/results.py
load_analysis()
Load the profiling analysis results for the generated dataset.
Returns:
| Type | Description |
|---|---|
| `DatasetProfilerResults` | DatasetProfilerResults containing statistical analysis and quality metrics for configured columns in the generated dataset. |
Source code in packages/data-designer/src/data_designer/interface/results.py
load_dataset()
Load the generated dataset as a pandas DataFrame.
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A pandas DataFrame containing the full generated dataset. |
Source code in packages/data-designer/src/data_designer/interface/results.py
load_processor_dataset(processor_name)
Load the dataset generated by a processor.
This only works for processors that write their artifacts in Parquet format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `processor_name` | `str` | The name of the processor to load the dataset from. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A pandas DataFrame containing the dataset generated by the processor. |
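Since loading only works for Parquet output, one possible pattern is to try the DataFrame loader first and fall back to the raw artifact path. This is a sketch, not documented behavior: `load_or_locate` and `_StubResults` are hypothetical, and the exception raised for non-Parquet processors is not stated in this reference, so the example catches `Exception` broadly as an assumption.

```python
from pathlib import Path
from typing import Any


def load_or_locate(results: Any, processor_name: str) -> tuple[str, Any]:
    """Try the DataFrame loader first; fall back to the artifact path."""
    try:
        return "dataset", results.load_processor_dataset(processor_name)
    except Exception:  # assumption: the exact exception type is not documented here
        return "path", results.get_path_to_processor_artifacts(processor_name)


class _StubResults:
    """Hypothetical stand-in: only 'parquet_processor' has a loadable dataset."""

    def load_processor_dataset(self, processor_name: str) -> list[dict[str, str]]:
        if processor_name != "parquet_processor":
            raise ValueError(f"{processor_name} did not write Parquet output")
        return [{"text": "example"}]

    def get_path_to_processor_artifacts(self, processor_name: str) -> Path:
        return Path("artifacts") / processor_name


kind_a, value_a = load_or_locate(_StubResults(), "parquet_processor")
kind_b, value_b = load_or_locate(_StubResults(), "jsonl_processor")
print(kind_a, kind_b)
```

In real code, narrowing the `except` clause to the error actually raised would avoid hiding unrelated failures.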
Source code in packages/data-designer/src/data_designer/interface/results.py
push_to_hub(repo_id, description, *, token=None, private=False, tags=None)
Push dataset to HuggingFace Hub.
Uploads all artifacts, including:

- Main parquet batch files (data subset)
- Processor output batch files ({processor_name} subsets)
- Configuration (builder_config.json)
- Metadata (metadata.json)
- Auto-generated dataset card (README.md)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `repo_id` | `str` | HuggingFace repo ID (e.g., "username/my-dataset") | required |
| `description` | `str` | Custom description text for the dataset card. Appears after the title. | required |
| `token` | `str \| None` | HuggingFace API token. If None, the token is automatically resolved from HF_TOKEN environment variable or cached credentials from | `None` |
| `private` | `bool` | Create private repo | `False` |
| `tags` | `list[str] \| None` | Additional custom tags for the dataset. | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | URL to the uploaded dataset |
Example

    results = data_designer.create(config, num_records=1000)
    description = "This dataset contains synthetic conversations for training chatbots."
    results.push_to_hub("username/my-synthetic-dataset", description, tags=["chatbot", "conversation"])
    # returns 'https://huggingface.co/datasets/username/my-synthetic-dataset'
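In the signature, `token`, `private`, and `tags` follow the bare `*`, so they must be passed by keyword. The sketch below wraps the call to upload to a private repo by default. `publish_private` and `_StubResults` are hypothetical names; the stub records the call instead of contacting the Hub, and its returned URL simply mirrors the format shown in the example above.

```python
from typing import Any, Optional


def publish_private(
    results: Any,
    repo_id: str,
    description: str,
    tags: Optional[list[str]] = None,
) -> str:
    """Upload to a private repo; token, private, and tags are passed by keyword."""
    unique_tags = sorted(set(tags)) if tags else None
    # token=None leaves token resolution to push_to_hub, as documented.
    return results.push_to_hub(
        repo_id, description, token=None, private=True, tags=unique_tags
    )


class _StubResults:
    """Hypothetical stand-in that records the call instead of uploading anything."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def push_to_hub(self, repo_id, description, *, token=None, private=False, tags=None) -> str:
        self.calls.append(
            {"repo_id": repo_id, "description": description,
             "token": token, "private": private, "tags": tags}
        )
        return f"https://huggingface.co/datasets/{repo_id}"


stub = _StubResults()
url = publish_private(
    stub,
    "username/my-synthetic-dataset",
    "Synthetic conversations.",
    tags=["chatbot", "conversation", "chatbot"],
)
print(url)
```

Deduplicating tags is a choice made for this sketch only; nothing in this reference says whether `push_to_hub` does so itself.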
Source code in packages/data-designer/src/data_designer/interface/results.py