# Dataset Schema Profile

Raw directory: `data/raw`

Datasets profiled: 33

| Dataset | Files | Data Size | Sample Columns |
| --- | ---: | ---: | --- |
| autocast | 1 | 1.92 MiB |  |
| closed_polymarket_2025h1 | 1 | 2.55 MiB | condition_id, market_slug, end_date_iso, category, minimum_tick_size, minimum_order_size, liquidity, volume |
| forecast_snapshots_kalshi | 3 | 12.50 MiB | market_source, market_id, question, question_type, unit, url, description, resolution_criteria, categories, tags, snapshot_time, snapshot_datetime, ... |
| forecast_snapshots_metaculus_large | 3 | 5.90 MiB | market_source, market_id, question, question_type, unit, url, description, resolution_criteria, categories, tags, snapshot_time, snapshot_datetime, ... |
| forecast_snapshots_metaculus_small | 3 | 590.84 KiB | market_source, market_id, question, question_type, unit, url, description, resolution_criteria, categories, tags, snapshot_time, snapshot_datetime, ... |
| forecastbench | 41 | 116.20 MiB |  |
| futurex_online | 1 | 23.17 KiB | id, prompt, end_time, level, en_title |
| futurex_past | 1 | 244.11 KiB | id, prompt, end_time, level, title, ground_truth |
| halawi_llm_forecasting | 0 | 0 B |  |
| ir_event_forecasting_sample | 2 | 84.96 KiB | original_data, original_question, original_answer, solution, candidates, reasoning_content, question, model_answer |
| kalshi_filtered | 1 | 172.27 KiB | cutoff_time, week_id, ticker, event_ticker, title, subtitle, category, rules_primary, price_at_cutoff, last_price, last_price_dollars, yes_bid, ... |
| kalshi_markets | 1 | 1.37 MiB | ticker, event_ticker, market_type, title, subtitle, yes_sub_title, no_sub_title, open_time, close_time, expected_expiration_time, expiration_time, latest_expiration_time, ... |
| kalshi_prop_closes | 22 | 29.80 MiB | src, sport, league, gid, mid, teams, start, cutoff, type, period, units, desc, ... |
| kalshi_rfq_momentum | 0 | 0 B |  |
| kalshi_trades_trevorjs | 20 | 5.29 GiB | ticker, event_ticker, market_type, title, yes_sub_title, no_sub_title, status, yes_bid, yes_ask, no_bid, no_ask, last_price, ... |
| kalshi_trades_wmitch | 2 | 264.04 MiB | trade_id, ticker, count, created_time, yes_price, no_price, taker_side, market_ticker |
| kalshibench_v1 | 1 | 158.74 KiB | id, question, description, category, close_time, ground_truth, market_probability, series_ticker, source |
| kalshibench_v2 | 1 | 197.93 KiB | id, question, description, category, close_time, ground_truth, market_probability, series_ticker, source |
| metaculus_binary_chandak | 1 | 3.37 MiB | date_resolve_at, date_begin, extracted_urls, question_type, url, background, resolution_criteria, is_resolved, date_close, question, data_source, resolution, ... |
| metaculus_binary_jijivski | 2 | 87.69 KiB | question, possibilities, label, description, created_time, resolve_time |
| mirai | 2 | 104.50 KiB |  |
| mlb_polymarket_kalshi_matched_sample | 2 | 129.73 KiB | ts, date, game, team, poly_bid, poly_ask, poly_mid, kalshi_yes_bid, kalshi_yes_ask, kalshi_mid, xvenue_spread, winner, ... |
| openforesight | 8 | 455.82 MiB | qid, question_title, background, resolution_criteria, answer, answer_type, url, article_title, article_description, article_maintext, article_publish_date, article_modify_date, ... |
| polybench | 0 | 0 B |  |
| polymarket_10000 | 1 | 4.53 MiB | enable_order_book, active, closed, archived, accepting_orders, accepting_order_timestamp, minimum_order_size, minimum_tick_size, condition_id, question_id, question, description, ... |
| polymarket_5min_crypto_updown | 14 | 691.43 MiB | condition_id, event_id, slug, market_start, market_end, recorded_at, token_up, token_down, volume, liquidity, outcome, n_ticks |
| polymarket_clean | 1 | 80.01 KiB | id, amount, shares, userId, outcome, dpmShares, probAfter, contractId, probBefore, createdTime |
| polymarket_dataset_bbasavar | 1 | 90.32 MiB | instruction, input, output |
| polymarket_kalshi_scoresync_sample | 3 | 182.58 KiB | timestamp, ticker, title, game_id, home_team, away_team, home_score, away_score, score_diff, period, time_remaining, game_state, ... |
| polymarket_minute_parquet | 18 | 1.33 GiB | timestamp, price, token_id, side, market_id, event_id, question |
| prophet | 0 | 0 B |  |
| prophet_arena_100 | 1 | 2.02 MiB | event_ticker, title, category, markets, close_time, market_outcome, sources, market_info, snapshot_time, submission_id, submission_created_at |
| prophet_arena_1200 | 1 | 8.57 MiB | submission_id, event_ticker, title, snapshot_time, close_time, market_data, market_outcome, category, markets, augmented_title, rules, sources |
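
The `Files` and `Data Size` columns above could be reproduced with a short script along the following lines. This is a minimal sketch, not the tool that generated this report: the function names (`format_size`, `format_columns`, `profile_dataset`, `render_rows`) are hypothetical, and it assumes that each dataset is a direct subdirectory of the raw directory, that files are counted recursively, that sizes use binary units with two decimals, and that column lists are cut off after 12 names, as the rows above suggest.

```python
from pathlib import Path


def format_size(num_bytes: int) -> str:
    """Render a byte count in binary units, e.g. 1536 -> '1.50 KiB'."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KiB", "MiB", "GiB", "TiB"):
        size /= 1024
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if size < 1024 or unit == "TiB":
            return f"{size:.2f} {unit}"


def format_columns(columns: list[str], limit: int = 12) -> str:
    """Join column names, appending '...' when the list is truncated.

    The limit of 12 is an assumption inferred from the table above.
    """
    shown = ", ".join(columns[:limit])
    return f"{shown}, ..." if len(columns) > limit else shown


def profile_dataset(dataset_dir: Path) -> tuple[int, int]:
    """Return (file count, total bytes) for all files under dataset_dir."""
    files = [p for p in dataset_dir.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def render_rows(raw_dir: Path) -> list[str]:
    """Build one Markdown table row per dataset subdirectory, sorted by name.

    The last cell is left empty because column sampling depends on the
    file format and is not covered by this sketch.
    """
    rows = []
    for dataset_dir in sorted(p for p in raw_dir.iterdir() if p.is_dir()):
        n_files, n_bytes = profile_dataset(dataset_dir)
        rows.append(f"| {dataset_dir.name} | {n_files} | {format_size(n_bytes)} |  |")
    return rows
```

Calling `render_rows(Path("data/raw"))` would return the body rows of a table in the same layout. The totals may differ from the report if the original profiler skipped hidden files or metadata, which this sketch does not.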
