Sample datasets
a few realistic CSVs for testing the tools on this site · updated 4 May 2026
Every tool here works just as well with your own CSV, but sometimes you
just want something to drop in and see what the output looks like. The
files below are small, realistic, and cover the obvious edge cases,
including an intentionally broken file for testing the
validators. Each link saves the file to your machine; nothing is uploaded
anywhere when you then drop it into a tool.
Browse the full collection on GitHub →
The cards below are a curated subset of tinytoolkit-org/csv-datasets.
The repository has more files (Titanic, wine quality, OHLCV stock data, NYC taxi trips, 50k server metrics) plus reproducible Python generators. Licence: CC0 1.0.
Small starter files
iris.csv
The classic 1936 Fisher iris dataset — clean, well-formed, three species, four numeric columns. The Hello World of CSV.
1.0 KB · 30 rows × 5 columns · download ↓
products-with-quoting.csv
Product catalog exercising every RFC 4180 corner: embedded commas, escaped double quotes, mixed types, multi-value tags.
2.0 KB · 14 rows × 6 columns · download ↓
sales-with-bom-and-mixed-types.csv
Order data with leading-zero ZIP codes, phone numbers, ISO dates, and currency — the things tools love to silently mangle.
1.5 KB · 20 rows × 7 columns · download ↓
broken-csv.csv
Intentionally malformed: an unescaped quote, an unterminated quoted string, a stray comma in an email, a blank row, and a non-date date.
0.6 KB · 10 rows × 5 columns (some broken) · download ↓
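If you want to check a parser of your own against the quoting and leading-zero cases above, here is a minimal Python sketch using only the standard library. The inline sample and its column names (sku, name, zip) are made up for illustration and are not the real schema of any file on this page. The points it shows: decode as utf-8-sig so a leading BOM is stripped, and keep every field as a string so ZIP codes keep their zeros.

```python
import csv
import io

# Hypothetical sample: BOM at the start, an embedded comma, an escaped
# double quote, and ZIP codes with a leading zero. Not the real file schema.
RAW = (
    "\ufeffsku,name,zip\r\n"
    '1001,"Widget, large",02134\r\n'
    '1002,"5"" bracket",90210\r\n'
)


def read_rows(data: bytes) -> list[dict[str, str]]:
    # utf-8-sig removes a leading byte-order mark if one is present.
    text = data.decode("utf-8-sig")
    # newline="" leaves line endings alone so the csv module can handle them.
    # Every value stays a str, so "02134" is never turned into 2134.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_rows(RAW.encode("utf-8"))
for row in rows:
    print(row)
```

Converting a column to a number is then an explicit step you take per column, rather than something the reader guesses for you.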
Larger datasets
Bigger files for benchmarking, testing memory usage, and seeing how the tools handle anywhere from hundreds to a hundred thousand rows. They're synthetic but schema-realistic and generated with a stable seed, so the same file is reproducible.
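The stable-seed idea can be sketched in a few lines of Python. This is not the actual generator from the repository; the function name generate_orders, the seed value, and the three columns are assumptions made for the example. It only shows why a fixed seed gives the same file every time.

```python
import csv
import io
import random


def generate_orders(n_rows: int, seed: int = 42) -> str:
    """Return CSV text with n_rows synthetic orders (hypothetical schema)."""
    # A private Random instance keeps the output independent of any other
    # code that touches the global random state.
    rng = random.Random(seed)
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["order_id", "quantity", "unit_price"])
    for order_id in range(1, n_rows + 1):
        quantity = rng.randint(1, 5)
        unit_price = f"{rng.uniform(1, 100):.2f}"
        writer.writerow([order_id, quantity, unit_price])
    return buf.getvalue()


first = generate_orders(100)
second = generate_orders(100)
print(first == second)
```

Same seed and same row count in, same text out. Changing either one gives a different file.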
ecommerce-orders-10k.csv
Ten thousand orders flattened to a single row each — quotes, commas in product names, mixed types, ISO timestamps. Good for testing CSV → JSON, dedupe, and split.
689 KB · 10,000 rows × 10 columns · download ↓
server-logs-50k.csv
Fifty thousand HTTP access log entries with realistic status-code distribution and millisecond response times — exercises streaming and filtering.
6.5 MB · 50,000 rows × 9 columns · download ↓
iot-sensor-100k.csv
A hundred thousand IoT sensor readings across ~200 devices and five metrics, with monotonically increasing timestamps — ideal for stress-testing the viewer.
5.2 MB · 100,000 rows × 5 columns · download ↓
github-issues-25k.csv
Twenty-five thousand GitHub-style issue records with semicolon-joined labels and assignees, reaction counts, and a body excerpt — wide rows with text-y data.
4.9 MB · 25,000 rows × 14 columns · download ↓
broken-csv-large.csv
Five hundred rows with ~20% intentionally broken in ten different ways: wrong delimiters, unterminated quotes, embedded NULs, BOMs mid-file, and bad dates.
27 KB · 500 rows · ~100 broken · download ↓
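For the larger files it can help to stream rows instead of loading everything at once. Below is a rough Python sketch of that pattern, filtering for 5xx responses. The column name status and the inline sample rows are assumptions for the example; check the real header of server-logs-50k.csv before reusing it.

```python
import csv
from typing import Iterable, Iterator


def server_errors(
    lines: Iterable[str], status_field: str = "status"
) -> Iterator[dict[str, str]]:
    """Yield rows whose status code starts with 5, one row at a time."""
    # csv.DictReader pulls lines lazily, so only the current row is in memory.
    for row in csv.DictReader(lines):
        if row[status_field].startswith("5"):
            yield row


# Hypothetical sample standing in for an open file handle.
sample = [
    "timestamp,path,status\n",
    "2026-01-01T00:00:00Z,/,200\n",
    "2026-01-01T00:00:01Z,/api/items,503\n",
    "2026-01-01T00:00:02Z,/login,404\n",
]
errors = list(server_errors(sample))
print(errors)
```

With a downloaded file, pass the handle from open(path, newline="", encoding="utf-8") in place of the sample list.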
How to use these
Click any card to download. Then drop or paste the file into whichever
tool fits — CSV to JSON for the clean
ones, the viewer for a quick look, or any
of the others. The intentionally broken files (broken-csv.csv and
broken-csv-large.csv) are useful for trying parsers and seeing how they recover.
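To see what recovery can look like outside the tools here, this is a minimal Python sketch that flags blank rows and rows with the wrong number of fields. The function name find_bad_rows and the sample text are made up for illustration. It only covers those two problems; unterminated quotes and bad dates need separate checks.

```python
import csv
import io


def find_bad_rows(text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for blank or mis-sized rows."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return []
    problems = []
    for row in reader:
        if not row:
            # The csv module yields an empty list for a blank line.
            problems.append((reader.line_num, "blank row"))
        elif len(row) != len(header):
            problems.append(
                (reader.line_num, f"expected {len(header)} fields, got {len(row)}")
            )
    return problems


# Hypothetical sample: one blank row and one stray comma in an email.
sample = (
    "id,email,joined\n"
    "1,a@example.com,2026-01-02\n"
    "\n"
    "2,b@example,com,2026-01-03\n"
    "3,c@example.com,2026-01-04\n"
)
bad = find_bad_rows(sample)
print(bad)
```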
Want one added?
If you have a real-world file you'd love to see as a sample (anonymised,
of course), email [email protected] and
I'll add it. The smaller and weirder the better — the goal is to have
files that exercise the tools, not pretty demos.
— S., [email protected]