Quick start: Datasets
Datasets are collections of data that can be used to train generators and create synthetic data.
Follow these instructions to create a new dataset. You can share a dataset with other MOSTLY AI users by making it public, so that they can use it to create synthetic data.
Step 1: Create a dataset
- On the MOSTLY AI platform, open Datasets from the left-side navigation menu.
- Click New dataset.
- Configure the dataset parameters.
Parameter | Description |
---|---|
Name | Human-readable name which identifies the dataset on the platform. |
Description | Instructions and information to guide the Assistant and data consumers. A dataset can consist of a description alone, for example one that links to a data resource and tells the Assistant how to download the data found at that location. |
File upload | Upload any files to include in your dataset. You can also include files that are not data in the strict sense but are useful during analysis, such as geographical shapefiles. |
Connector | Connect to external data sources that have been configured in the platform. |
- Click Save.
The dataset is now available on the Datasets page.
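If you want to plan a dataset before opening the UI, the parameters in Step 1 can be pictured as a small record. The sketch below is plain local Python, not the MOSTLY AI SDK or API. The class name `DatasetDraft` and its field names are hypothetical and only mirror the parameter table, and the check that a name is required is an assumption that the steps above do not state.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDraft:
    """Hypothetical local record mirroring the parameter table; not a platform object."""
    name: str
    description: str = ""
    files: list[str] = field(default_factory=list)       # paths of files to upload
    connectors: list[str] = field(default_factory=list)  # names of configured connectors

    def problems(self) -> list[str]:
        """Return issues to fix before clicking Save (empty list if none)."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")  # assumption: a name is required
        if not (self.description.strip() or self.files or self.connectors):
            issues.append("add a description, a file, or a connector")
        return issues


# A description-only dataset, which the Description parameter allows.
draft = DatasetDraft(
    name="City bike trips",
    description="Download the CSV from the linked resource and load it as a table.",
)
print(draft.problems())  # prints []
```

An empty list means the draft has everything this sketch checks for. The platform itself may apply different or additional rules.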
What’s next
After you’ve created a new dataset, you can use it to train a generator and generate synthetic data. You can also transfer it to an organization.
Step 2: Share the dataset with your organization
- Open the dataset from the Datasets page.
- Click Share in the upper-right corner.
- Use the Owner dropdown to select the organization to share the dataset with, and click Save.
The dataset is now available to all members of your organization, who can use it to create generators or other artifacts.
What’s next
You can make the dataset public so that it is available to all logged-in users on the platform.
Step 3: Make the dataset public
If your dataset might be of interest to other users, you can make it public. Do not make a dataset public if it might contain sensitive information.
- Open the dataset from the Datasets page.
- Click Share.
- Select Public from the Visibility dropdown and click Save.
Result
An organization’s public datasets are listed on the organization’s profile page.
They are also available on the Datasets page for all logged-in users.
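The effect of Steps 2 and 3 on who can use a dataset can be summarized as a small rule. The sketch below is an illustration in plain Python, not the platform's implementation. The `User` and `Dataset` classes, the `can_use` function, and the `"private"` visibility value are all hypothetical; only the Owner and Visibility settings and the Public option come from the steps above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    organization: Optional[str] = None
    logged_in: bool = True


@dataclass
class Dataset:
    owner: str                    # a user name or an organization name (Owner dropdown)
    visibility: str = "private"   # "public" mirrors the Visibility dropdown; "private" is assumed


def can_use(dataset: Dataset, user: User) -> bool:
    """Hypothetical access rule summarizing Steps 2 and 3."""
    if not user.logged_in:
        return False
    if dataset.visibility == "public":
        return True  # public datasets are available to all logged-in users
    # Otherwise only the owning user or members of the owning organization.
    return dataset.owner in (user.name, user.organization)


alice = User("alice", organization="acme")
bob = User("bob", organization="acme")
carol = User("carol")

personal = Dataset(owner="alice")
shared = Dataset(owner="acme")
public = Dataset(owner="acme", visibility="public")

print(can_use(personal, bob))  # prints False
print(can_use(shared, bob))    # prints True
print(can_use(shared, carol))  # prints False
print(can_use(public, carol))  # prints True
```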
What’s next
To create a generator, see Quickstart: Generators.