Generators

With MOSTLY AI, your journey to synthetic data starts with training a Generator. You can train Generators on both tabular and textual data.

What is a Generator?

A Generator bundles the training of generative AI models with the definition of metadata about your data (table schemas, table relationships, and data types). Together, these let you generate brand-new synthetic datasets.
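
To make the metadata part more concrete, here is a minimal sketch of the kind of information a Generator keeps about your data: tables with typed columns, primary keys, and foreign-key relationships. The class names (`Column`, `ForeignKey`, `GeneratorMetadata`), the data type labels, and the `customers`/`orders` tables are hypothetical illustrations, not the MOSTLY AI configuration format or SDK API.

```python
from dataclasses import dataclass, field

# Hypothetical model of Generator metadata. All names here are illustrative
# assumptions, not MOSTLY AI's actual configuration schema.

@dataclass
class Column:
    name: str
    dtype: str  # illustrative label, e.g. "key", "categorical", "numeric", "datetime"

@dataclass
class ForeignKey:
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str

@dataclass
class GeneratorMetadata:
    tables: dict[str, list[Column]] = field(default_factory=dict)
    primary_keys: dict[str, str] = field(default_factory=dict)
    foreign_keys: list[ForeignKey] = field(default_factory=list)

    def column_names(self, table: str) -> set[str]:
        return {c.name for c in self.tables[table]}

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the metadata is consistent."""
        problems = []
        # Every declared primary key must be a column of its table.
        for table, pk in self.primary_keys.items():
            if table not in self.tables or pk not in self.column_names(table):
                problems.append(f"primary key {table}.{pk} is not a declared column")
        for fk in self.foreign_keys:
            # Both ends of a relationship must be declared columns.
            for table, col in ((fk.child_table, fk.child_column),
                               (fk.parent_table, fk.parent_column)):
                if table not in self.tables or col not in self.column_names(table):
                    problems.append(f"foreign key references unknown column {table}.{col}")
            # A foreign key should point at the parent table's primary key.
            if self.primary_keys.get(fk.parent_table) != fk.parent_column:
                problems.append(f"{fk.parent_table}.{fk.parent_column} is not the parent's primary key")
        return problems

meta = GeneratorMetadata(
    tables={
        "customers": [Column("customer_id", "key"), Column("country", "categorical")],
        "orders": [Column("order_id", "key"), Column("customer_id", "key"),
                   Column("amount", "numeric"), Column("ordered_at", "datetime")],
    },
    primary_keys={"customers": "customer_id", "orders": "order_id"},
    foreign_keys=[ForeignKey("orders", "customer_id", "customers", "customer_id")],
)
print(meta.validate())  # []
```

Keeping schema, types, and relationships as explicit data like this is what lets a tool check them up front, before any model training starts.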

A trained Generator can produce high-quality tabular and textual synthetic data that retains the correlations and features of your original data and, in multi-table scenarios, maintains referential integrity between tables.
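
Referential integrity means that every foreign key in a child table points to an existing primary key in its parent table. As a minimal sketch, assuming tables are plain lists of dicts and using hypothetical `customers` and `orders` tables, you could check a multi-table dataset (original or synthetic) for orphaned child rows like this. This is an illustrative check, not the one MOSTLY AI runs internally.

```python
# Minimal referential-integrity check on tables stored as lists of dicts.
# Table and column names are hypothetical examples.

def orphaned_rows(parent_rows, parent_key, child_rows, foreign_key):
    """Return child rows whose foreign key does not match any parent primary key."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_ids]

customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},  # no customer 3: breaks referential integrity
]

print(orphaned_rows(customers, "customer_id", orders, "customer_id"))
# [{'order_id': 12, 'customer_id': 3}]
```

An empty result means every order refers to an existing customer, which is the property a multi-table synthetic dataset is expected to preserve.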

Features

Feature	Description
Generative AI for tabular synthetic data	Train generative AI models on your original tabular data to generate high-quality, privacy-safe synthetic data.
Support for LLMs to train on text data	Choose from an extensive list of language models to train on your text data.
Multi-table datasets	Train Generators on multi-table datasets and configure them to retain intra- and inter-table correlations and referential integrity.
Multi-source data	Train a Generator on tabular data from multiple data sources: files (`CSV`, `Parquet`), databases, cloud buckets, and pandas `DataFrame` objects (with the Synthetic Data SDK).
Support for multiple data types	Configure the data type of each table column so that your Generator correctly captures the data types of your original data: Categorical, Numeric, Character, Datetime, Geo-location, and Language/Text.
Time-series and events data	Train models on sequential data and retain the event patterns and coherence of your original data.
AI training settings	Configure AI training speed and accuracy settings.
Export and import	Export and import Generators between air-gapped and wider-audience deployments of MOSTLY AI.
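
Sequential (time-series or events) data usually takes the form of an events table linked to a subject by a key, where the order of events per subject carries meaning. The sketch below uses a hypothetical `events` table with `user_id` and `ts` columns and shows how such rows group into chronologically ordered per-subject sequences, which are the patterns a sequential model has to learn. It illustrates the data shape only, not how MOSTLY AI models sequences internally.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events table: one row per event, linked to a subject by "user_id".
events = [
    {"user_id": "u1", "ts": "2024-01-03T09:00", "event": "purchase"},
    {"user_id": "u2", "ts": "2024-01-01T12:00", "event": "login"},
    {"user_id": "u1", "ts": "2024-01-01T08:30", "event": "login"},
    {"user_id": "u1", "ts": "2024-01-02T10:15", "event": "view"},
]

def to_sequences(rows, subject_key, time_key):
    """Group event rows by subject and sort each group chronologically."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[subject_key]].append(row)
    return {
        subject: sorted(group, key=lambda r: datetime.fromisoformat(r[time_key]))
        for subject, group in grouped.items()
    }

sequences = to_sequences(events, "user_id", "ts")
print({user: [r["event"] for r in rows] for user, rows in sequences.items()})
# {'u1': ['login', 'view', 'purchase'], 'u2': ['login']}
```

Sorting by a parsed timestamp rather than by row position matters because source rows are rarely stored in event order, as the shuffled `u1` rows above show.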