Public datasets

Even if you don’t have your own dataset yet, you can still get started right away. MOSTLY AI hosts a repository of public datasets that you can use to explore the platform, and it provides several of them out of the box. Other users can also create their own datasets and make them public, which gives all MOSTLY AI users access to them.

US Census Income dataset

This dataset is taken from the Adult Dataset from UC Irvine’s Machine Learning Repository.

It is an extraction from the 1994 US Census database and contains 48,842 records and 13 columns of data, with a mix of data types.

Click here to download the .csv.gz file.

us-census-income.csv.gz
       age         workclass  fnlwgt  education      marital-status         occupation  ...                race     sex hours-per-week  native-country capital  income
0       39         State-gov   77516  Bachelors       Never-married       Adm-clerical  ...               White    Male             40   United-States    2174   <=50K
1       50  Self-emp-not-inc   83311  Bachelors  Married-civ-spouse    Exec-managerial  ...               White    Male             13   United-States       0   <=50K
2       38           Private  215646    HS-grad            Divorced  Handlers-cleaners  ...               White    Male             40   United-States       0   <=50K
3       53           Private  234721       11th  Married-civ-spouse  Handlers-cleaners  ...               Black    Male             40   United-States       0   <=50K
4       28           Private  338409  Bachelors  Married-civ-spouse     Prof-specialty  ...               Black  Female             40            Cuba       0   <=50K
...    ...               ...     ...        ...                 ...                ...  ...                 ...     ...            ...             ...     ...     ...
48837   39           Private  215419  Bachelors            Divorced     Prof-specialty  ...               White  Female             36   United-States       0   <=50K
48838   64                 ?  321403    HS-grad             Widowed                  ?  ...               Black    Male             40   United-States       0   <=50K
48839   38           Private  374983  Bachelors  Married-civ-spouse     Prof-specialty  ...               White    Male             50   United-States       0   <=50K
48840   44           Private   83891  Bachelors            Divorced       Adm-clerical  ...  Asian-Pac-Islander    Male             40   United-States    5455   <=50K
48841   35      Self-emp-inc  182148  Bachelors  Married-civ-spouse    Exec-managerial  ...               White    Male             60   United-States       0    >50K
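
To illustrate working with a gzip-compressed CSV such as this one, here is a minimal sketch that uses only the Python standard library. It assumes that `?` marks a missing value, as the preview suggests, and it reads a small in-memory sample built from three of the preview columns rather than the real download. The helper name `read_gzipped_csv` is hypothetical.

```python
import csv
import gzip
import io

# A few rows copied from the preview above, reduced to three columns.
SAMPLE_CSV = (
    "age,workclass,occupation\n"
    "39,State-gov,Adm-clerical\n"
    "50,Self-emp-not-inc,Exec-managerial\n"
    "64,?,?\n"
)


def read_gzipped_csv(source):
    """Read a gzip-compressed CSV into a list of dicts.

    `source` can be a file path or a binary file object.
    Assumption: the placeholder "?" stands for a missing value.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        return [
            {key: (None if value == "?" else value) for key, value in row.items()}
            for row in reader
        ]


# Compress the sample in memory so the sketch runs without the real file.
buffer = io.BytesIO(gzip.compress(SAMPLE_CSV.encode("utf-8")))
rows = read_gzipped_csv(buffer)

missing_workclass = sum(1 for row in rows if row["workclass"] is None)
print(len(rows), "rows,", missing_workclass, "with missing workclass")
```

With the downloaded file, the same helper could be called with the local path to us-census-income.csv.gz instead of the in-memory buffer.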

Baseball dataset

This dataset is taken from the Sean Lahman Baseball Database.

Click here to download the .zip file. It includes the players.csv and seasons.csv files.

players.csv
                 id country   birthDate   deathDate nameFirst   nameLast  weight  height bats throws
0      00020a493f3b    P.R.  1993-02-10         NaN     Jorge      Lopez   195.0    75.0    R      R
1      000492168bd5     USA  1945-10-12  1970-12-14    Herman       Hill   190.0    74.0    L      R
2      0007b3925736     USA  1890-12-24  1956-09-12       Tod      Sloan   175.0    72.0    L      R
3      000f9b5832e6     USA  1886-03-06  1948-05-26      Bill    Sweeney   175.0    71.0    R      R
4      00148e917757     USA  1959-09-10         NaN     Bruce    Robbins   190.0    73.0    L      L
...             ...     ...         ...         ...       ...        ...     ...     ...  ...    ...
18995  fff2e8e0ccff    P.R.  1953-04-02         NaN    Hector       Cruz   170.0    71.0    R      R
18996  fff3d8297c46     USA  1917-05-19  1993-06-07    Skippy    Roberge   185.0    71.0    R      R
18997  fff913eb4437  Panama  1976-06-20         NaN    Carlos        Lee   270.0    74.0    R      R
18998  fffa11996763    Cuba  1965-10-11         NaN   Orlando  Hernandez   210.0    74.0    R      R
18999  fffa80049d40    P.R.  1990-02-18         NaN       Joe      Colon   180.0    72.0    R      R

seasons.csv
          players_id  year team league   G  AB  R  H  HR  RBI   SB   CS  BB    SO
0       00020a493f3b  2015  MIL     NL   2   2  0  0   0  0.0  0.0  0.0   0   2.0
1       00020a493f3b  2017  MIL     NL   1   0  0  0   0  0.0  0.0  0.0   0   0.0
2       00020a493f3b  2018  MIL     NL  10   2  1  1   0  2.0  0.0  0.0   0   1.0
3       00020a493f3b  2018  KCA     AL   7   0  0  0   0  0.0  0.0  0.0   0   0.0
4       000492168bd5  1969  MIN     AL  16   2  4  0   0  0.0  1.0  2.0   0   1.0
...              ...   ...  ...    ...  ..  .. .. ..  ..  ...  ...  ...  ..   ...
103573  fffa11996763  2005  CHA     AL  24   3  0  1   0  0.0  0.0  0.0   0   1.0
103574  fffa11996763  2006  ARI     NL   9  11  0  3   0  0.0  0.0  0.0   1   0.0
103575  fffa11996763  2006  NYN     NL  20  35  4  5   0  2.0  1.0  0.0   0  10.0
103576  fffa11996763  2007  NYN     NL  28  48  1  8   0  3.0  2.0  0.0   0  18.0
103577  fffa80049d40  2016  CLE     AL  11   0  0  0   0  0.0  0.0  0.0   0   0.0
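
The two tables appear to be linked through `players_id` in seasons.csv, which matches `id` in players.csv for the rows shown in the previews. The sketch below rests on that assumption and on the usual reading of `G` as games played. It totals games per player using a handful of preview rows instead of the full files.

```python
from collections import defaultdict

# Rows copied from the previews above, reduced to the columns used here.
players = [
    {"id": "00020a493f3b", "nameFirst": "Jorge", "nameLast": "Lopez"},
    {"id": "000492168bd5", "nameFirst": "Herman", "nameLast": "Hill"},
]
seasons = [
    {"players_id": "00020a493f3b", "year": 2015, "team": "MIL", "G": 2},
    {"players_id": "00020a493f3b", "year": 2017, "team": "MIL", "G": 1},
    {"players_id": "00020a493f3b", "year": 2018, "team": "MIL", "G": 10},
    {"players_id": "00020a493f3b", "year": 2018, "team": "KCA", "G": 7},
    {"players_id": "000492168bd5", "year": 1969, "team": "MIN", "G": 16},
]

# Sum the games column per player key (assumed to reference players.id).
games_by_player = defaultdict(int)
for season in seasons:
    games_by_player[season["players_id"]] += season["G"]

# Look up each player's total through the shared identifier.
career_games = {
    f'{player["nameFirst"]} {player["nameLast"]}': games_by_player[player["id"]]
    for player in players
}
print(career_games)
```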

CDNOW dataset

This dataset contains a CRM table and the entire purchase history up to the end of June 1998 of 23,570 customers who made their first-ever purchase at CDNOW in the first quarter of 1997.

Click here to download the .csv.gz file.

CDNOW_CRM_table.csv.gz
      first_name last_name       state gender   birthdate
0          Bobby  Thompson      Oregon      M  1972-07-19
1           John      Wood  New Jersey      M  1962-02-08
2        Michael  Griffith   Minnesota      M  1981-03-22
3           Eric    Walker    Michigan      M  1942-10-07
4         Austin    Levine  New Jersey      M  1952-05-23
...          ...       ...         ...    ...         ...
23565       Luis     Braun     Florida      M  1954-05-09
23566   Nicholas   Aguilar     Indiana      M  1950-10-01
23567     Alison    Larson  New Jersey      F  1954-06-08
23568     Joseph      Cook        Utah      M  1935-06-11
23569     Debbie    Zamora    Illinois      F  1977-06-02

Click here to download the .zip file. It includes the customers.csv and purchases.csv tables.

customers.csv
          id      zone       state gender age_category  age
0          1   Pacific      Oregon      M        young   26
1          2   Eastern  New Jersey      M       medium   36
2          3   Central   Minnesota      M        young   17
3          4   Eastern    Michigan      M       medium   56
4          5   Eastern  New Jersey      M       medium   46
...      ...       ...         ...    ...          ...  ...
23565  23566   Eastern     Florida      M       medium   44
23566  23567   Eastern     Indiana      M       medium   48
23567  23568   Eastern  New Jersey      F       medium   44
23568  23569  Mountain        Utah      M          old   63
23569  23570   Central    Illinois      F        young   21

purchases.csv
       users_id        date  cds    amt
0             1  1997-01-01    1  11.77
1             2  1997-01-12    1  12.00
2             2  1997-01-12    5  77.00
3             3  1997-01-02    2  20.76
4             3  1997-03-30    2  20.76
...         ...         ...  ...    ...
69654     23568  1997-04-05    4  83.74
69655     23568  1997-04-22    1  14.99
69656     23569  1997-03-25    2  25.74
69657     23570  1997-03-25    3  51.12
69658     23570  1997-03-26    2  42.96
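
The cohort definition above (first-ever purchase in the first quarter of 1997) can be checked against the purchase history. The following is a rough sketch on a few preview rows, assuming that `users_id` in purchases.csv refers to `id` in customers.csv. It finds each customer's earliest purchase date and lists any customer whose first purchase falls outside Q1 1997.

```python
from datetime import date

# Rows copied from the purchases.csv preview above.
purchases = [
    {"users_id": 1, "date": date(1997, 1, 1), "cds": 1, "amt": 11.77},
    {"users_id": 2, "date": date(1997, 1, 12), "cds": 1, "amt": 12.00},
    {"users_id": 2, "date": date(1997, 1, 12), "cds": 5, "amt": 77.00},
    {"users_id": 3, "date": date(1997, 1, 2), "cds": 2, "amt": 20.76},
    {"users_id": 3, "date": date(1997, 3, 30), "cds": 2, "amt": 20.76},
]

# Track the earliest purchase date seen for each customer.
first_purchase = {}
for purchase in purchases:
    user_id = purchase["users_id"]
    if user_id not in first_purchase or purchase["date"] < first_purchase[user_id]:
        first_purchase[user_id] = purchase["date"]

# Customers whose first purchase is not in the first quarter of 1997.
q1_start, q1_end = date(1997, 1, 1), date(1997, 3, 31)
outside_cohort = [
    user_id
    for user_id, first_date in first_purchase.items()
    if not (q1_start <= first_date <= q1_end)
]
print(first_purchase)
print("outside Q1 1997:", outside_cohort)
```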

Netflix Prize dataset

This sequence dataset is an excerpt from the original Netflix Prize dataset. It contains 500,000+ ratings from 10,000 users.

Click here to download the .zip file.

users.csv
           id
0         495
1         840
2        1374
3        1522
4        1619
...       ...
9995  2648416
9996  2648568
9997  2648678
9998  2648907
9999  2649207

ratings.csv
        users_id        date                                 movie  rating
0            495  2003-10-08                         A Mighty Wind       4
1            495  2003-10-24                          On the Beach       4
2            495  2003-11-17                         Seven Samurai       5
3            495  2003-11-26                       Midnight Cowboy       4
4            495  2003-12-04                               Yojimbo       5
...          ...         ...                                   ...     ...
501283   2649207  2005-02-08                     Napoleon Dynamite       5
501284   2649207  2005-02-08       The Importance of Being Earnest       4
501285   2649207  2005-06-08                   Friday Night Lights       2
501286   2649207  2005-06-16  The Hitchhiker's Guide to the Galaxy       1
501287   2649207  2005-08-14                                   Ray       3
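
Because this is a sequence dataset, each user can be viewed as an ordered list of ratings. As a small sketch, assuming `users_id` in ratings.csv refers to `id` in users.csv, the code below builds one chronologically ordered sequence per user from a few preview rows that are shuffled on purpose.

```python
from itertools import groupby
from operator import itemgetter

# Rows copied from the ratings.csv preview above, deliberately out of order.
ratings = [
    {"users_id": 2649207, "date": "2005-06-08", "movie": "Friday Night Lights", "rating": 2},
    {"users_id": 495, "date": "2003-10-24", "movie": "On the Beach", "rating": 4},
    {"users_id": 495, "date": "2003-10-08", "movie": "A Mighty Wind", "rating": 4},
    {"users_id": 2649207, "date": "2005-02-08", "movie": "Napoleon Dynamite", "rating": 5},
    {"users_id": 495, "date": "2003-11-17", "movie": "Seven Samurai", "rating": 5},
]

# Sort by user, then by date. ISO-formatted dates sort chronologically as strings.
ordered = sorted(ratings, key=itemgetter("users_id", "date"))

# groupby needs its input sorted by the grouping key, which the sort guarantees.
sequences = {
    user_id: [(row["movie"], row["rating"]) for row in group]
    for user_id, group in groupby(ordered, key=itemgetter("users_id"))
}

for user_id, sequence in sequences.items():
    print(user_id, sequence)
```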

Meteostat weather dataset

This dataset provides global historical weather observations and 30-year climate normals, including daily temperature, precipitation, wind, and sunshine data from 1948 to 2025. There are stations all over the world.

Click here to use it on the MOSTLY AI platform.

Use the stations table to identify which station you’d like to use:

stations.csv
  station_name country region wmo_id icao_id      lat       lon  elev_m  hourly_from  hourly_to daily_from   daily_to monthly_from monthly_to
0  Holden Agdm      CA     AB  71227    CXHD  53.1900 -112.2500   688.0  2020-01-01 2024-12-07 2002-11-01 2024-03-13   2003-01-01 2022-01-01
1  Athabasca 1      CA     AB   <NA>    <NA>  54.7200 -113.2900   515.0         NaT        NaT 2000-01-01 2022-07-12   2000-01-01 2010-01-01
2    Jan Mayen      NO   <NA>  01001    ENJA  70.9333   -8.6667    10.0  1931-01-01 2025-03-20 1921-12-31 2025-07-21   1922-01-01 2022-01-01
3     Grahuken      NO     SJ  01002    <NA>  79.7833   14.4667     0.0  1986-11-09 2025-03-20 2010-10-07 2020-08-17          NaT        NaT
4     Hornsund      NO   <NA>  01003    <NA>  77.0000   15.5000    10.0  1985-06-01 2025-03-20 2009-11-26 2020-08-31   2016-01-01 2017-01-01
...         ...     ...       ...        ...        ...        ...        ...        ...
4995     Jharsuguda      IN     OR  42886    VEJH  21.9167  84.0833   228.0  1957-01-02 2025-06-27 1949-08-11 2025-07-23   1949-01-01 2021-01-01
4996  Keongjhargarh      IN     OR  42891    <NA>  21.6167  85.5167   461.0  2001-09-17 2025-06-15        NaT        NaT          NaT        NaT
4997       Baripada      IN     OR  42894    <NA>  21.9333  86.7667    53.0  2009-02-05 2025-06-11        NaT        NaT          NaT        NaT
4998       Balasore      IN     OR  42895    <NA>  21.5167  86.9333    18.0  1944-01-01 2025-06-15 1901-01-01 2025-07-18   1901-01-01 2021-01-01
4999         Contai      IN     WB  42900    <NA>  21.7833  87.7500    10.0  2002-03-14 2025-06-15        NaT        NaT          NaT        NaT

lhr-weather-sample.csv
           station country       date  avg_temp_c  min_temp_c  max_temp_c  precip_mm  pressure_hpa
0  London Heathrow      GB 2022-07-30        21.2        15.9        27.0        0.0        1018.3
1  London Heathrow      GB 2022-07-31        22.4        19.8        27.6        0.0        1016.2
2  London Heathrow      GB 2022-08-01        21.9        18.2        27.8        0.0        1018.0
3  London Heathrow      GB 2022-08-02        21.7        18.3        26.9        0.0        1015.3
4  London Heathrow      GB 2022-08-03        21.5        17.8        27.2        0.0        1013.5
...         ...     ...         ...        ...        ...        ...        ...        ...
1090  London Heathrow      GB 2025-07-24        18.2        15.4        20.5        0.2        1017.5
1091  London Heathrow      GB 2025-07-25        21.6        16.1        26.3        0.0        1018.9
1092  London Heathrow      GB 2025-07-26        20.5        16.5        24.7        0.0        1017.4
1093  London Heathrow      GB 2025-07-27        19.2        15.4        23.1        0.2        1018.2
1094  London Heathrow      GB 2025-07-28        19.3        14.5        23.9        0.0        1020.6
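
One way to pick a station from the stations table is to filter by country and by the period for which daily data exists. The sketch below assumes that `daily_from` and `daily_to` give the first and last dates with daily observations, and that `NaT` means no daily data is available (represented here as `None`). The function name `stations_with_daily_data` is hypothetical, and the rows are a subset of the preview.

```python
from datetime import date

# Rows copied from the stations.csv preview above; None stands in for NaT.
stations = [
    {"station_name": "Holden Agdm", "country": "CA",
     "daily_from": date(2002, 11, 1), "daily_to": date(2024, 3, 13)},
    {"station_name": "Jan Mayen", "country": "NO",
     "daily_from": date(1921, 12, 31), "daily_to": date(2025, 7, 21)},
    {"station_name": "Grahuken", "country": "NO",
     "daily_from": date(2010, 10, 7), "daily_to": date(2020, 8, 17)},
    {"station_name": "Baripada", "country": "IN",
     "daily_from": None, "daily_to": None},
]


def stations_with_daily_data(rows, country, start, end):
    """Return names of stations in `country` whose daily data covers start..end."""
    return [
        row["station_name"]
        for row in rows
        if row["country"] == country
        and row["daily_from"] is not None
        and row["daily_to"] is not None
        and row["daily_from"] <= start
        and row["daily_to"] >= end
    ]


norway = stations_with_daily_data(stations, "NO", date(2022, 1, 1), date(2024, 12, 31))
print(norway)
```

In this sample only Jan Mayen qualifies for Norway, because Grahuken's daily data ends in 2020.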

Yahoo financial market data

This dataset provides daily historical financial market data, including closing prices, for global equities and indices via Yahoo! Finance’s API.

Click here to use it on the MOSTLY AI platform.

          date      GOOGL      NVDA      AAPL      MSFT      AMZN      META      TSLA
  2021-01-04     85.79     13.08    126.24    209.62    159.33    267.47    243.26
  2021-01-05     86.48     13.37    127.80    209.82    160.93    269.49    245.04
  2021-01-06     85.63     12.58    123.50    204.38    156.92    261.87    251.99
  2021-01-07     88.19     13.31    127.71    210.19    158.11    267.27    272.01
  2021-01-08     89.36     13.24    128.82    211.48    159.13    266.11    293.34
...         ...        ...        ...        ...        ...        ...        ...
  2025-07-22    190.10    171.38    212.48    510.06    229.30    712.97    328.49
  2025-07-23    191.34    167.03    214.40    505.27    227.47    704.81    332.11
  2025-07-24    190.23    170.78    214.15    505.87    228.29    713.58    332.56
  2025-07-25    192.17    173.74    213.76    510.88    232.23    714.80    305.30
  2025-07-26    193.18    173.50    213.88    513.71    231.44    712.68    316.06