
Quick start: Synthetic data

Synthetic data is artificially generated data that mimics the structure and statistical properties of real data, but contains no actual personal or sensitive information.

Use a generator to create synthetic data based on your requirements. If you want to create your own generator, follow the generator quick start guide.

Step 1: Generate synthetic data

  1. Navigate to the generator that you wish to use by clicking Generators in the left-side navigation menu and selecting from the available generators.
  2. On the generator page, click Generate data.
  3. Configure the generation parameters based on your requirements:
     | Parameter              | Description |
     |------------------------|-------------|
     | Sample size            | Number of rows to generate for your synthetic data. |
     | Conditional simulation | Use a seed dataset as context for conditional simulation. |
     | Sampling controls      | Adjust how closely your synthetic data mirrors the original distribution. |
     | Fairness               | Apply statistical parity to a target column with respect to selected sensitive columns, so that the target column is statistically independent of the sensitive columns. |
     | Rebalancing            | Rebalance the distribution of a categorical column by specifying categories and their new percentages. |
     | Imputation             | Replace missing values in a column with statistically coherent values drawn from the same column. |
     | Data report            | Control whether a data report is generated for this model. Disable it to finalize the synthetic dataset faster. |
  4. Click Start generation.

Follow progress in the Generation status section on the synthetic dataset page.
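The Fairness and Rebalancing parameters describe properties of the generated distribution. The following is a minimal sketch, assuming the generated rows are available as a list of dicts (for example, read with Python's csv.DictReader from a downloaded CSV file), of how you might check those properties yourself. The helper names and the gender/income columns are hypothetical; this is not the platform's API or its internal fairness algorithm.

```python
from collections import Counter, defaultdict


def category_shares(rows, column):
    """Return each category's share (0.0 to 1.0) of the rows in `column`.

    Useful for checking whether a rebalanced column matches the
    percentages you requested.
    """
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def parity_gap(rows, target, sensitive, positive):
    """Measure statistical parity of `target` across groups of `sensitive`.

    Returns (gap, rates), where rates maps each sensitive group to
    P(target == positive | group) and gap is max(rate) - min(rate).
    A gap of 0.0 means every group has the same positive rate.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[sensitive]].append(row)
    rates = {
        group: sum(r[target] == positive for r in members) / len(members)
        for group, members in groups.items()
    }
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical example rows with a sensitive column "gender" and target "income".
rows = [
    {"gender": "f", "income": ">50K"},
    {"gender": "f", "income": "<=50K"},
    {"gender": "m", "income": ">50K"},
    {"gender": "m", "income": "<=50K"},
]
print(category_shares(rows, "gender"))                 # {'f': 0.5, 'm': 0.5}
print(parity_gap(rows, "income", "gender", ">50K"))    # (0.0, {'f': 0.5, 'm': 0.5})
```

A gap close to zero is what statistical parity aims for; on real synthetic data, expect small sampling noise rather than an exact 0.0.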

What’s next

When the generation of synthetic data completes, you can download the data in CSV, Parquet, or XLSX format.

You can also make the synthetic dataset public so others can access its page, review the Data report, Data samples, Generation steps, and Configuration, or download the data in their preferred format.

Step 2: Download synthetic data

  1. With the synthetic dataset open, click Download synthetic dataset.
  2. Select CSV, Parquet, or Excel (XLSX) as the download format.
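After downloading, you might load the file in Python to inspect it. The following is a minimal sketch using only the standard library csv module, assuming a CSV download and assuming the row count should equal the Sample size you configured. The file name and helper names are hypothetical. For Parquet or Excel downloads, a third-party reader such as pandas.read_parquet or pandas.read_excel would be needed instead.

```python
import csv


def load_synthetic_csv(path):
    """Read a downloaded synthetic CSV file into a list of dicts, one per row.

    Note: csv.DictReader returns every value as a string.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def check_sample_size(rows, expected):
    """Raise ValueError if the row count differs from the requested Sample size."""
    if len(rows) != expected:
        raise ValueError(f"expected {expected} rows, got {len(rows)}")
    return True


# Usage (hypothetical file name and sample size):
# rows = load_synthetic_csv("synthetic_dataset.csv")
# check_sample_size(rows, 10_000)
```

Because CSV values come back as strings, convert numeric columns (for example with int() or float()) before computing statistics on them.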

Step 3: Make the synthetic dataset public

  1. With the synthetic dataset open, click Share.
  2. Select Public from the Visibility dropdown.
  3. Click Save.

Now you can copy and share the URL of the synthetic dataset.

What’s next

Use the synthetic dataset to analyze the data freely, without the privacy concerns that come with real personal data.

Share the dataset URL so others can do the same.