Models configuration

Unstructured text generation in MOSTLY AI is available via the configuration of any text generation model available on Hugging Face as well as via the built-in LSTM model provided by MOSTLY AI.

You can use any text generation model available on Hugging Face. To do so, you can first acquire a Hugging Face token.

Create a Hugging Face token

You need to sign up or use an existing account to obtain a Hugging Face token. The token must have at least Read permissions. This document demonstrates the creation of a Read token. You can also use a Fine-grained token with the required read permissions.

Steps

Log in to Hugging Face, navigate to the profile menu, and select Access Tokens.
Click + Create new token.
In Token type selector, choose Read.
Provide a name for the token in the Token name field.
Click Create token.
Copy the token key.

Add Hugging Face language models

You can select any language model available on Hugging Face and define it as a model for synthetic text in MOSTLY AI.

Prerequisites

You must be logged in as a Super Admin to be able to add or remove Hugging Face models.

Steps

In MOSTLY AI, select Models from the profile menu.
For HuggingFace token, paste your HuggingFace token.
Under Available models, list the models you want to make available.
Use the convention below. Place each model on a new line, without separators.
```
provider/Model-Name
provider/Model-Name
```
Click Save.

Result

The models are now available for selection whenever someone at your organization needs to train a new generator.

How are Hugging Face models cached?

After a Super Admin adds a model, that model appears in the Model drop-down (Model configuration page) for the next generator that a user configures. Any user can then select the new model and use it to fine-tune the model with new text data.

After the generator training starts, MOSTLY AI connects to Hugging Face to download the model. At this point, the model is cached and is available for further use without the need to download the model again.

Manual model download and installation

To install Hugging Face models on airgapped platforms, follow the official Hugging Face documentation for downloading models locally. For gated models, ensure you generate an access token and configure it using HF_TOKEN, HF_TOKEN_PATH, or by passing it explicitly, as described in their docs.

Once downloaded, copy the model files from your local HuggingFace cache (typically in ~/.cache/huggingface/hub, unless overridden by environment variables) to the Shared Storage, in the models/hub/ subdirectory (e.g. ${BUCKET}/models/hub/).

For a code example, refer to the Synthetic Data SDK documentation.