Quick start: Synthetic data
Synthetic data is artificially generated data that mimics the structure and statistical properties of real data, but contains no actual personal or sensitive information.
Use a generator to create synthetic data based on your requirements. Follow the generator quick start guide if you want to create your own generator.
Step 1: Generate synthetic data
- Navigate to the generator that you wish to use by clicking Generators in the left-side navigation menu and selecting from the available generators.
- On the generator page, click Generate data.
- Configure the generation parameters based on your requirements:
Parameter | Description |
---|---|
Sample size | Define the number of rows to generate for your synthetic data. |
Conditional simulation | Use a seed dataset as context for conditional simulation. |
Sampling controls | Adjust how closely your synthetic data mirrors the original distribution. |
Fairness | Apply statistical parity to a target column with respect to selected sensitive columns, ensuring statistical independence. |
Rebalancing | Rebalance the distribution in a categorical column based on specified categories and their new percentages. |
Imputation | Replace missing values in a column with statistically coherent values from the same column. |
Data report | Control data report generation for this model. Disable it to speed up the process of finalizing synthetic dataset generation. |
- Click Start generation.
Follow progress in the Generation status section on the synthetic dataset page.
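To make the Fairness parameter more concrete, here is a minimal sketch of what statistical parity means: the share of rows with a given target value is the same in every group of the sensitive column. This is not the platform's implementation. The function names, column names (`gender`, `income`), and toy rows are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def positive_rates(rows, target, sensitive, positive):
    """Return the share of rows where target == positive, per sensitive group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[sensitive]
        totals[group] += 1
        if row[target] == positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Largest difference in positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy rows; real synthetic data would come from a downloaded file.
rows = [
    {"gender": "F", "income": "high"},
    {"gender": "F", "income": "low"},
    {"gender": "M", "income": "high"},
    {"gender": "M", "income": "low"},
]

rates = positive_rates(rows, target="income", sensitive="gender", positive="high")
gap = parity_gap(rates)  # a gap near 0 indicates statistical parity
```

A gap close to zero suggests the target column is statistically independent of the sensitive column. How small a gap counts as acceptable depends on your use case.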
What’s next
When the generation of synthetic data completes, you can download the data in CSV, Parquet, or XLSX format.
You can also make the synthetic dataset public so others can access its page, review the Data report, Data samples, Generation steps, and Configuration, or download the data in their preferred format.
Step 2: Download synthetic data
- With the synthetic dataset open, click Download synthetic dataset.
- Select to download as CSV, Parquet, or Excel.
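After downloading, you may want to confirm that the file matches what you configured, for example that the row count equals the Sample size and that columns you selected for Imputation contain no empty values. The following is a minimal sketch using only the Python standard library. The helper name `summarize_csv` and the file name in the usage note are assumptions, not part of the product.

```python
import csv


def summarize_csv(path):
    """Return (row_count, missing_per_column) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # One counter of empty cells per column named in the header.
        missing = {name: 0 for name in (reader.fieldnames or [])}
        row_count = 0
        for row in reader:
            row_count += 1
            for name in missing:
                value = row[name]
                # Short rows yield None; empty cells yield "".
                if value is None or value.strip() == "":
                    missing[name] += 1
    return row_count, missing
```

For example, `summarize_csv("synthetic_dataset.csv")` (a hypothetical file name) returns the number of data rows and a dictionary that maps each column name to its count of empty cells.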
Step 3: Make the synthetic dataset public
- With the synthetic dataset open, click Share.
- Select Public from the Visibility dropdown.
- Click Save.
Now you can copy and share the URL of the synthetic dataset.
What’s next
Use the synthetic dataset to analyze the data freely, without privacy-related concerns.
Share the dataset URL so others can do the same.