
Train a new generator

With generators, MOSTLY AI makes it easy to use generative AI to train your own AI models on single-table and multi-table datasets. You can then use a trained generator to create as much high-quality, privacy-protected synthetic data as you need.

You can quickly train a new generator with a single tabular data file.

You can also train generators with two-table and multi-table datasets. For more information, see Set table relationships.
  1. On the MOSTLY AI platform, open Generators from the left-side navigation menu.
  2. There are four ways to create a new generator:
| Method | Description |
| --- | --- |
| Start from a connector | Use an existing connector to train a new generator. |
| Upload your data | Upload a CSV, TSV, or Parquet file from your local file system to train a new generator. |
| Use the SDK | Go to the Synthetic Data SDK repository. |
| Import a generator | Upload a configured generator file. |
  3. After selecting your training method and uploading any required files, click Configure models.
  4. Each connected or uploaded table has its own configuration. Expand a table to customize its model settings:
| Setting | Description |
| --- | --- |
| Model | The model your generator uses to create synthetic data. |
| Compute | The compute resources used to train the generator. |
| Training parameters | Model-level parameters that control the training process. The platform describes each parameter in a tooltip. |
| Differential privacy | Use differential privacy when you need a mathematical privacy guarantee. The parameter epsilon (ε) sets an upper bound on how much any single individual can influence the trained model. |
| Flexible generation | Enabled by default. Flexible generation lets you apply smart imputation, data rebalancing, seeded generation, and fairness when you generate synthetic datasets with the model. |
| Value protection | Value protection guards against membership inference by replacing rare categories and removing extreme values from your dataset. |
| Model report | Enabled by default. The model report provides metrics and charts for gauging model quality. These include accuracy, similarity, and distance metrics between original and synthetic samples. The report also includes correlation, univariate, and bivariate distribution charts that compare the original and synthetic data. |
If you don't want to configure individual parameters, choose one of the three training presets in the Model configuration section header: Accuracy, Speed, or Turbo.
  5. Optionally, set Random State in the Model configuration section header. Random State is a seed value that makes training results reproducible. If you leave it empty, a new random seed is used each time.
  6. After completing the configuration, click Start training to begin the training process.

Follow progress in the Training status section on the generator page.
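The Differential privacy setting is based on the standard notion of ε-differential privacy. The textbook definition is shown below for reference. It states what ε bounds, not how the platform's training achieves that bound, which this page does not describe. Practical training methods may also use a variant of this definition.

```latex
% M: randomized training mechanism
% D, D': any two datasets that differ only in one individual's records
% S: any set of possible outputs (trained models)
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
```

A smaller ε gives a tighter bound. At ε = 1, the probability of any outcome can change by a factor of at most e ≈ 2.72 when one individual's records are added or removed. At ε = 0.1, the factor is at most about 1.11.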
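To make the Value protection setting more concrete, here is a minimal sketch of the two ideas it names, written in plain Python. It is an illustration under stated assumptions, not MOSTLY AI's algorithm. The `min_count` threshold, the `_RARE_` placeholder label, and the percentile bounds are all hypothetical choices. The sketch also clips values to a percentile range instead of dropping extreme values outright.

```python
import statistics
from collections import Counter

RARE_TOKEN = "_RARE_"  # hypothetical placeholder label, not the platform's


def protect_categories(values: list[str], min_count: int = 5) -> list[str]:
    """Replace categories seen fewer than min_count times with RARE_TOKEN."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else RARE_TOKEN for v in values]


def clip_extremes(values: list[float], lower_pct: int = 1, upper_pct: int = 99) -> list[float]:
    """Clip values to the [lower_pct, upper_pct] percentile range (hypothetical bounds)."""
    if not 1 <= lower_pct < upper_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= lower_pct < upper_pct <= 99")
    # 99 cut points; cuts[i] is the (i + 1)-th percentile of the data
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


cities = ["Vienna"] * 6 + ["Graz"] * 5 + ["Hallstatt"]
print(protect_categories(cities)[-1])  # _RARE_ (Hallstatt appears only once)
```

Replacing a rare value with a shared placeholder means a unique record no longer stands out. Clipping numeric values pulls outliers back toward the bulk of the distribution. Both steps reduce what a trained model could reveal about any single record.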
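The Random State setting behaves like any pseudorandom seed. The sketch below uses Python's `random` module as a stand-in for whatever randomness training involves. Shuffling rows before batching is a hypothetical example, not a description of the platform's training loop. It shows that the same seed reproduces the same result, while an empty seed produces a fresh result each run.

```python
import random
from typing import Optional


def shuffle_rows(rows: list[int], seed: Optional[int] = None) -> list[int]:
    """Return a shuffled copy of rows, as a training loop might before batching.

    seed=None seeds the generator from the OS randomness source (or the
    current time), so the order differs from call to call.
    """
    rng = random.Random(seed)  # private generator; leaves the global random state untouched
    shuffled = list(rows)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled


rows = list(range(10))
first_run = shuffle_rows(rows, seed=42)
second_run = shuffle_rows(rows, seed=42)
print(first_run == second_run)  # True: same seed, same order
```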