Improve model accuracy
The default model configuration used during synthetic data generation on the MOSTLY AI platform works well for most standard use cases. If you need to meet specific benchmarks, MOSTLY AI lets you customize each model's configuration.
Use pre-configured presets
The quickest way to adjust the default configuration is to select one of the following pre-configured presets:
Preset | Description | Behavior |
---|---|---|
Accuracy | Runs training until the model reaches the highest possible accuracy (lowest validation loss). Takes the longest. | Full training with maximum epochs and sample size for best accuracy. |
Speed | Shortcut to reduce maximum training epochs and sample size. | Faster training while still achieving high accuracy. |
Turbo | Minimal training with reduced epochs and sample size. | Fastest option; best for quick sense checks where lower accuracy is tolerable. |
Use a custom configuration
Each table in the subject dataset has a dedicated AI model trained to generate downstream synthetic data. Each model exposes a number of parameters that you can customize for your use case:
Parameter | Description | Default |
---|---|---|
Max sample size | Maximum number of rows (or percentage of the dataset) used for training; see Training sample size for details | All available rows |
Max training time | Maximum duration allowed for training in minutes | 10 |
Max training epochs | Maximum number of training iterations | 100 |
Model | MOSTLY AI model to train, see Model size for more information | MOSTLY_AI/Medium |
Physical batch size | Number of samples processed per training step | Auto |
Gradient accumulation steps | Number of forward and backward passes performed before a single optimizer step is taken | Auto |
Max sequence window (linked tables & Text) | Maximum rows considered for linked/text sequences | 100 |
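If you configure models programmatically rather than through the UI, these parameters appear as fields of a per-table model configuration. The sketch below builds such a configuration as a plain dictionary; the key names are assumptions modeled on the MOSTLY AI Python SDK's table configuration schema, so verify them against the SDK reference before use.

```python
# Sketch of a per-table model configuration (key names are assumptions --
# verify against the MOSTLY AI SDK reference).
model_config = {
    "max_sample_size": 50_000,            # cap training at 50k rows (default: all rows)
    "max_training_time": 10,              # minutes (default: 10)
    "max_epochs": 100,                    # maximum training iterations (default: 100)
    "model": "MOSTLY_AI/Medium",          # model size (default)
    "batch_size": None,                   # None -> platform chooses ("Auto")
    "gradient_accumulation_steps": None,  # None -> "Auto"
    "max_sequence_window": 100,           # linked tables & text only (default: 100)
}

generator_config = {
    "name": "my-generator",  # hypothetical generator name
    "tables": [
        {
            "name": "customers",  # hypothetical subject table
            "tabular_model_configuration": model_config,
        }
    ],
}
```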
Training sample size
By default, MOSTLY AI uses all records in a table to train that table's model. If your model configuration limits training to a percentage or number of rows that is less than what the original table contains, you can raise that limit for higher model accuracy.
You can configure the training sample size for each table from the Model configuration page of a generator.
Steps
- With an untrained generator open, go to the Model configuration page by clicking Configure models.
- Click a model to expand its settings.
- Set the Max sample size as number of rows.
- For subject tables, you set the number of rows to be used for training from the original subject table.
- For linked tables, you set the number of sequences to be used for training.
💡 The training size of linked tables is measured in sequences. A sequence is one subject (sample) from the subject table together with all of its related rows in the linked table, so the number of sequences equals the number of subjects used for training.
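To make the sequence definition concrete, the snippet below counts sequences for a hypothetical customers (subject) and orders (linked) table pair: one sequence per subject, regardless of how many linked rows each subject has.

```python
# One sequence = one subject plus all of its rows in the linked table.
# Hypothetical subject table (customers) and linked table (orders).
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 1, "amount": 25},
    {"customer_id": 2, "amount": 7},
]

# Group the linked rows by their subject key.
sequences = {c["id"]: [] for c in customers}
for row in orders:
    sequences[row["customer_id"]].append(row)

# The linked table's training size is measured in sequences, i.e. the
# number of subjects -- here 3, even though customer 3 has no orders
# and customer 1 has two.
num_sequences = len(sequences)
```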
Model size
Model size defines the number of internal parameters the AI model uses to learn from your data. A larger model analyzes and trains on your data with more parameters. Three model sizes are available:
Model | Description |
---|---|
MOSTLY_AI/Small | Uses fewer parameters, takes less memory and time, at the cost of accuracy. |
MOSTLY_AI/Medium | Balances parameter count against memory and training time; best for most use cases. |
MOSTLY_AI/Large | Can further improve accuracy for large datasets, but requires significantly more memory and compute time. |
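As a rough illustration of this trade-off (not an official sizing rule), a helper could map a training priority to one of the model identifiers above:

```python
def pick_model_size(priority: str) -> str:
    """Map a training priority to a MOSTLY AI model identifier.

    This mapping is an illustrative assumption, not official guidance:
    Small saves memory and time at some cost in accuracy, Large can
    improve accuracy on large datasets at significantly higher cost.
    """
    sizes = {
        "speed": "MOSTLY_AI/Small",
        "balanced": "MOSTLY_AI/Medium",
        "accuracy": "MOSTLY_AI/Large",
    }
    # Fall back to Medium, the platform default.
    return sizes.get(priority, "MOSTLY_AI/Medium")
```

For example, `pick_model_size("accuracy")` returns `"MOSTLY_AI/Large"`, while any unrecognized priority falls back to the default `"MOSTLY_AI/Medium"`.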
You can configure the model size from the Model configuration page of a generator.
Steps
- With an untrained generator open, go to the Model configuration page by clicking Configure models.
- Click a model to expand its settings.
- For Model, select the model size you wish to use.