# Dataset Download Inventory

- Generated: 2026-07-01T17:49:18.976847+00:00
- Catalog: `datasets/catalog.json`
- Raw data directory: `data/raw`
- Downloaded/updated: 33
- Skipped: 5
- Failed: 0
- Downloaded local footprint: 8.31 GiB

| Dataset | Stream | Kind | Status | Remote Size | Local Size | Reason | Source |
| --- | --- | --- | --- | ---: | ---: | --- | --- |
| openforesight | dataset_construction | hf_dataset | downloaded | 456.21 MiB | 455.83 MiB |  | nikhilchandak/OpenForesight |
| kalshibench_v2 | dataset_construction | hf_dataset | downloaded | 200.90 KiB | 198.91 KiB |  | 2084Collective/kalshibench-v2 |
| kalshibench_v1 | dataset_construction | hf_dataset | downloaded | 161.72 KiB | 159.72 KiB |  | 2084Collective/kalshibench-v1 |
| forecastbench | dataset_construction | hf_dataset | downloaded | 116.39 MiB | 116.23 MiB |  | forecastingresearch/forecastbench-datasets |
| futurex_past | dataset_construction | hf_dataset | downloaded | 252.17 KiB | 250.17 KiB |  | futurex-ai/Futurex-Past |
| futurex_online | dataset_construction | hf_dataset | downloaded | 27.62 KiB | 25.62 KiB |  | futurex-ai/Futurex-Online |
| prophet_arena_100 | dataset_construction | hf_dataset | downloaded | 2.05 MiB | 2.05 MiB |  | prophetarena/Prophet-Arena-Subset-100 |
| prophet_arena_1200 | dataset_construction | hf_dataset | downloaded | 8.61 MiB | 8.60 MiB |  | prophetarena/Prophet-Arena-Subset-1200 |
| metaculus_binary_chandak | dataset_construction | hf_dataset | downloaded | 3.38 MiB | 3.37 MiB |  | nikhilchandak/metaculus-binary |
| metaculus_binary_jijivski | dataset_construction | hf_dataset | downloaded | 89.97 KiB | 88.20 KiB |  | jijivski/metaculus_binary |
| forecast_snapshots_metaculus_large | social_forecasting | hf_dataset | downloaded | 5.91 MiB | 5.90 MiB |  | chestnutforty/forecast-snapshots-metaculus-6f1cdfd9b3 |
| forecast_snapshots_metaculus_small | social_forecasting | hf_dataset | downloaded | 594.83 KiB | 593.03 KiB |  | chestnutforty/forecast-snapshots-metaculus-2cc65706d0 |
| ir_event_forecasting_sample | dataset_construction | hf_dataset | downloaded | 88.16 KiB | 86.29 KiB |  | EventForecasting/IR_event_forecasting_sample |
| kalshi_markets | market_data | hf_dataset | downloaded | 1.37 MiB | 1.37 MiB |  | thomaswmitch/kalshi-prediction-markets-markets |
| kalshi_trades_wmitch | market_data | hf_dataset | downloaded | 264.05 MiB | 264.05 MiB |  | thomaswmitch/kalshi-prediction-markets-betting |
| kalshi_trades_trevorjs | market_data | hf_dataset | downloaded | 5.29 GiB | 5.29 GiB |  | TrevorJS/kalshi-trades |
| forecast_snapshots_kalshi | market_data | hf_dataset | downloaded | 12.50 MiB | 12.50 MiB |  | chestnutforty/forecast-snapshots-kalshi_events-768472771c |
| kalshi_filtered | market_data | hf_dataset | downloaded | 176.08 KiB | 174.08 KiB |  | dzorlu/kalshi-filtered |
| kalshi_prop_closes | market_data | hf_dataset | downloaded | 29.81 MiB | 29.81 MiB |  | mvpeav/kalshi-prop-closes |
| kalshi_rfq_momentum | market_data | hf_dataset | downloaded | 3.64 KiB | 0 B |  | mvpeav/kalshi-rfq-momentum |
| mlb_polymarket_kalshi_matched_sample | market_data | hf_dataset | downloaded | 249.69 KiB | 152.92 KiB |  | Coyevans/mlb-polymarket-kalshi-matched-book-sample |
| polymarket_kalshi_scoresync_sample | market_data | hf_dataset | downloaded | 188.31 KiB | 186.46 KiB |  | Coyevans/polymarket-kalshi-scoresync-orderbook-sample |
| polymarket_10000 | market_data | hf_dataset | downloaded | 4.53 MiB | 4.53 MiB |  | CK0607/polymarket_10000 |
| polymarket_clean | market_data | hf_dataset | downloaded | 83.01 KiB | 81.01 KiB |  | CK0607/polymarket_clean |
| closed_polymarket_2025h1 | market_data | hf_dataset | downloaded | 2.55 MiB | 2.55 MiB |  | CK0607/closed-polymarket-2025H1 |
| polymarket_dataset_bbasavar | market_data | hf_dataset | downloaded | 90.32 MiB | 90.32 MiB |  | bbasavar/PolymarketDataset |
| polymarket_5min_crypto_updown | market_data | hf_dataset | downloaded | 691.44 MiB | 691.44 MiB |  | kachoio/polymarket-5-minute-crypto-up-down-markets |
| polymarket_minute_parquet | market_data | hf_dataset | downloaded | 1.33 GiB | 1.33 GiB |  | Mithilss/polymarket_minute_parquet |
| polymarket_full_sii | market_data | hf_dataset | skipped | 159.11 GiB | 0 B | large_opt_in | SII-WANGZJ/Polymarket_data |
| polymarket_crypto_derivatives | market_data | hf_dataset | skipped | 17.90 GiB | 0 B | large_opt_in | trentmkelly/polymarket_crypto_derivatives |
| polymarket_crypto_updown | market_data | hf_dataset | skipped | 27.08 GiB | 0 B | large_opt_in | aliplayer1/polymarket-crypto-updown |
| polymarket_onchain_v1 | market_data | hf_dataset | skipped | 118.45 GiB | 0 B | large_opt_in | moose-code/polymarket-onchain-v1 |
| autocast | dataset_construction | git_repo | cloned |  | 5.19 MiB |  | https://github.com/andyzoujm/autocast.git |
| mirai | dataset_construction | git_repo | cloned |  | 8.77 MiB |  | https://github.com/yecchen/MIRAI.git |
| polybench | dataset_construction | git_repo | cloned |  | 2.48 MiB |  | https://github.com/PolyBench/PolyBench.git |
| prophet | dataset_construction | git_repo | cloned |  | 66.31 KiB |  | https://github.com/TZWwww/PROPHET.git |
| openforecast | dataset_construction | manual | skipped |  | 0 B | manual_or_disabled | https://github.com/miaomiao1215/Openforecast |
| halawi_llm_forecasting | social_forecasting | git_repo | cloned |  | 10.53 MiB |  | https://github.com/dannyallover/llm_forecasting.git |

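Datasets whose remote size exceeds the configured size policy are cataloged but skipped by default; they appear in the table with reason `large_opt_in`. Run `python3 scripts/download_datasets.py --include-large` to fetch them. The `openforecast` entry is skipped for a different reason (`manual_or_disabled`): its kind is `manual`, so it is presumably not fetched automatically.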
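
The size-policy check described above can be illustrated with a minimal sketch. This is not the code in `scripts/download_datasets.py`, which may work differently; the helper names `parse_size` and `decide` and the 10 GiB limit are assumptions made for illustration. The limit is merely consistent with the table, where the largest downloaded dataset is 5.29 GiB and the smallest skipped one is 17.90 GiB.

```python
# Hypothetical sketch of a size-policy check; the actual script may differ.

# Binary unit multipliers matching the size strings used in the table.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Assumed threshold (10 GiB); not taken from datasets/catalog.json.
ASSUMED_LIMIT = 10 * 1024**3


def parse_size(text: str) -> int:
    """Convert a size string such as '17.90 GiB' into a byte count."""
    value, unit = text.split()
    return round(float(value) * UNITS[unit])


def decide(remote_size: str, max_bytes: int, include_large: bool = False) -> tuple[str, str]:
    """Return a (status, reason) pair for one catalog entry."""
    if not include_large and parse_size(remote_size) > max_bytes:
        return ("skipped", "large_opt_in")
    return ("downloaded", "")


if __name__ == "__main__":
    # Sizes taken from the kalshi_trades_trevorjs and
    # polymarket_crypto_derivatives rows of the table.
    print(decide("5.29 GiB", ASSUMED_LIMIT))
    print(decide("17.90 GiB", ASSUMED_LIMIT))
    print(decide("17.90 GiB", ASSUMED_LIMIT, include_large=True))
```

Under these assumptions, passing `include_large=True` plays the role of the `--include-large` flag: the size comparison is bypassed and the entry is treated like any other download.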
