Improve model accuracy

The default model configuration that the MOSTLY AI platform applies during synthetic data generation works well for most standard use cases. If you need to reach specific benchmarks, you can customize the configuration of each model.

Use pre-configured presets

To change the default configuration quickly, select one of the pre-configured presets.

MOSTLY AI offers the following pre-configured presets:

| Preset | Description | Behavior |
| --- | --- | --- |
| Accuracy | Runs training until the model reaches the highest possible accuracy (lowest validation loss). Takes the longest. | Full training with maximum epochs and sample size for best accuracy. |
| Speed | Shortcut to reduce maximum training epochs and sample size. | Faster training while still achieving high accuracy. |
| Turbo | Minimal training with reduced epochs and sample size. | Fastest option; best for quick sense checks where lower accuracy is tolerable. |

Use a custom configuration

Each table in the subject dataset has a dedicated AI model trained to generate downstream synthetic data. Each model has a number of parameters which can be customized to your use case.

Customizable model parameters are:

| Parameter | Description | Default |
| --- | --- | --- |
| Max sample size | Maximum percentage of the dataset to be used. See Training sample size for more information. | All available rows |
| Max training time | Maximum duration allowed for training, in minutes | 10 |
| Max training epochs | Maximum number of training iterations | 100 |
| Model | MOSTLY AI model to train. See Model size for more information. | MOSTLY_AI/Medium |
| Physical batch size | Number of samples processed per training step | Auto |
| Gradient accumulation steps | Number of forward and backward passes performed before a single optimizer step is taken | Auto |
| Max sequence window (linked tables & Text) | Maximum rows considered for linked/text sequences | 100 |
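
For readers who keep their settings in scripts, the parameter table above can be mirrored in a small Python structure. The sketch below is illustrative only: `ModelConfig` and its field names are hypothetical and are not taken from a MOSTLY AI API. The defaults are copied from the table, with `None` standing in for "All available rows" and "Auto". It also assumes the common convention that the effective batch size per optimizer step is the physical batch size multiplied by the gradient accumulation steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """Hypothetical container mirroring the documented parameters and defaults."""

    max_sample_size: Optional[int] = None  # None = all available rows
    max_training_time: int = 10  # minutes
    max_epochs: int = 100
    model: str = "MOSTLY_AI/Medium"
    batch_size: Optional[int] = None  # None = Auto
    gradient_accumulation_steps: Optional[int] = None  # None = Auto
    max_sequence_window: int = 100

    def __post_init__(self) -> None:
        # Reject values that cannot describe a valid training run.
        for name in ("max_training_time", "max_epochs", "max_sequence_window"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        for name in ("max_sample_size", "batch_size", "gradient_accumulation_steps"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive when set")

    def effective_batch_size(self) -> Optional[int]:
        """Samples contributing to one optimizer step, if both values are set."""
        if self.batch_size is None or self.gradient_accumulation_steps is None:
            return None  # left on Auto, so the value is not known up front
        return self.batch_size * self.gradient_accumulation_steps


defaults = ModelConfig()
tuned = ModelConfig(max_training_time=60, batch_size=256, gradient_accumulation_steps=4)
print(defaults.model, defaults.effective_batch_size())
print(tuned.effective_batch_size())
```

With a physical batch size of 256 and 4 accumulation steps, each optimizer step in this sketch sees 1024 samples while only 256 need to fit in memory at a time.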

Training sample size

By default, MOSTLY AI uses all records in a table to train the model for that table. If your model configuration already limits training to a percentage or number of rows smaller than the original table, you can raise that limit for higher model accuracy.

You can configure the training sample size for each table from the Model configuration page of a generator.

Steps

  1. With an untrained generator open, go to the Model configuration page by clicking Configure models.
  2. Click a model to expand its settings.
  3. Set Max sample size as a number of rows.
    • For subject tables, you set the number of rows from the original subject table to be used for training.
    • For linked tables, you set the number of sequences to be used for training (the number of sequences equals the number of subjects).
      💡
      The training size of linked tables is defined by the number of sequences. A sequence is defined as one subject (sample) from the subject table and all of its related samples from the linked table. In other words, the training size for a linked table is the number of subjects whose related rows are pulled in for training.
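
To make the sequence definition concrete, here is a minimal sketch in plain Python. It only illustrates how the counting works and is not a description of how the platform samples internally. The function name, the `id` and `customer_id` fields, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def sample_training_sequences(subjects, linked_rows, fk, max_sample_size, seed=0):
    """Pick up to max_sample_size subjects and gather each one's linked rows.

    Returns a dict mapping subject id -> list of linked rows (one sequence each).
    """
    rng = random.Random(seed)
    ids = [s["id"] for s in subjects]
    # The sample size counts subjects (sequences), not linked rows.
    chosen = ids if max_sample_size >= len(ids) else rng.sample(ids, max_sample_size)

    by_subject = defaultdict(list)
    for row in linked_rows:
        by_subject[row[fk]].append(row)

    # A subject without linked rows still forms a (possibly empty) sequence.
    return {sid: by_subject.get(sid, []) for sid in chosen}


customers = [{"id": 1}, {"id": 2}, {"id": 3}]
purchases = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 1, "amount": 35},
    {"customer_id": 2, "amount": 5},
]

sequences = sample_training_sequences(customers, purchases, "customer_id", max_sample_size=2)
all_sequences = sample_training_sequences(customers, purchases, "customer_id", max_sample_size=10)
print(len(sequences), "sequences selected")
```

Setting the sample size to 2 here yields two sequences regardless of how many purchase rows each selected customer has, which is why the number of linked rows used for training can vary for the same setting.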

Model size

Model size defines the number of internal parameters that the AI model uses to learn from your data. A larger model has more parameters available to capture patterns in your data. Three model sizes are available.

| Model | Description |
| --- | --- |
| MOSTLY_AI/Small | Uses fewer parameters, takes less memory and time, at the cost of accuracy. |
| MOSTLY_AI/Medium | Uses optimal parameters and is best for most use cases. |
| MOSTLY_AI/Large | Can further improve accuracy for large datasets, but requires significantly more memory and compute time. |

You can configure the model size from the Model configuration page of a generator.

Steps

  1. With an untrained generator open, go to the Model configuration page by clicking Configure models.
  2. Click a model to expand its settings.
  3. For Model, select the model size you wish to use.
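
If you manage generator settings as code rather than through the UI, the model size choice could be captured in a per-table configuration dictionary. The sketch below is an assumption-laden illustration: the helper `table_config` is hypothetical, and the key names `tabular_model_configuration` and `model` are assumed rather than confirmed by this page, so verify them against the documentation of whichever client you use. Only the three model identifiers come from the table above.

```python
# Model identifiers listed in the Model size table.
ALLOWED_MODELS = ("MOSTLY_AI/Small", "MOSTLY_AI/Medium", "MOSTLY_AI/Large")


def table_config(name, model="MOSTLY_AI/Medium"):
    """Build a per-table config dict (hypothetical layout, assumed key names)."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model size: {model!r}")
    return {
        "name": name,
        # Assumed key names; check them against your client's documentation.
        "tabular_model_configuration": {"model": model},
    }


small = table_config("customers", model="MOSTLY_AI/Small")
default = table_config("customers")
print(small["tabular_model_configuration"]["model"])
```

Validating the identifier up front catches typos such as `MOSTLY_AI/small` before a long training run is started.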