Sample datasets

a few realistic CSVs for testing the tools on this site · updated 4 May 2026

Every tool here works with your own CSV, but sometimes you want something to drop in and see what the output looks like. The files below are realistic and cover the common edge cases, including intentionally broken ones for testing the validators. Each link saves the file to your machine; nothing is uploaded anywhere when you then drop it into a tool.

Browse the full collection on GitHub → The cards below are a curated subset of tinytoolkit-org/csv-datasets. The repository has more files (Titanic, wine quality, OHLCV stock data, NYC taxi trips, 50k server metrics) plus reproducible Python generators. Licensed CC0 1.0.

Small starter files

iris.csv

A 30-row sample of the classic 1936 Fisher iris dataset: clean, well-formed, three species, four numeric columns. The Hello World of CSV.

1.0 KB · 30 rows × 5 columns · download ↓

products-with-quoting.csv

Product catalog exercising every RFC 4180 corner: embedded commas, escaped double quotes, mixed types, multi-value tags.

2.0 KB · 14 rows × 6 columns · download ↓

sales-with-bom-and-mixed-types.csv

Order data with a byte order mark (BOM), leading-zero ZIP codes, phone numbers, ISO dates, and currency: the things tools love to silently mangle.

1.5 KB · 20 rows × 7 columns · download ↓

broken-csv.csv

Intentionally malformed: an unescaped quote, an unterminated quoted string, a stray comma in an email, a blank row, and a non-date date.

0.6 KB · 10 rows × 5 columns (some broken) · download ↓
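
If you'd rather poke at these edge cases in code than in the browser, here is a minimal Python sketch using only the standard-library csv module. The inline rows and column names are made up for illustration (they are not copied from the files above), and it assumes, as the filename suggests, that the sales file starts with a UTF-8 BOM.

```python
import csv
import io

# Hypothetical rows in the style of products-with-quoting.csv (not the real contents).
QUOTED = 'sku,name,tags\r\nA-1,"Widget, large","red;blue"\r\nA-2,"12"" ruler",office\r\n'

# Hypothetical bytes in the style of sales-with-bom-and-mixed-types.csv:
# a UTF-8 byte order mark, then a ZIP code with a leading zero.
WITH_BOM = b'\xef\xbb\xbfzip,total\r\n02134,19.99\r\n'


def parse_quoted(text):
    """Parse CSV text; the csv module handles embedded commas and doubled quotes."""
    return list(csv.reader(io.StringIO(text, newline="")))


def parse_with_bom(raw):
    """Decode with utf-8-sig so the BOM is not glued onto the first header name."""
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = parse_quoted(QUOTED)
print(rows[1][1])  # Widget, large
print(rows[2][1])  # 12" ruler

orders = parse_with_bom(WITH_BOM)
print(orders[0]["zip"])  # 02134 (still a string, so the leading zero survives)
```

The csv module returns every field as a string, which is why the ZIP code keeps its zero. The mangling usually happens later, when something converts the column to a number.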

Larger datasets

Bigger files for benchmarking, testing memory usage, and seeing how the tools handle up to a hundred thousand rows. They're synthetic but schema-realistic, and each one is generated from a fixed seed, so the same file can be reproduced exactly.
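
For a sense of what a seeded, reproducible file means in practice, here is a rough Python sketch. It is not the generator from the repository: the function name, the columns, the product names, and the seed value 42 are all assumptions made up for this example.

```python
import csv
import io
import random


def generate_orders(n_rows, seed=42):
    """Return CSV text with n_rows synthetic orders; same seed, same text."""
    rng = random.Random(seed)  # private RNG, independent of the global random state
    out = io.StringIO(newline="")
    writer = csv.writer(out)  # default dialect: \r\n line endings, minimal quoting
    writer.writerow(["order_id", "product", "quantity", "unit_price"])
    products = ["Mug", "Notebook, A5", 'Poster "Sunset"']  # hypothetical names
    for order_id in range(1, n_rows + 1):
        writer.writerow([
            order_id,
            rng.choice(products),
            rng.randint(1, 5),
            f"{rng.uniform(1, 100):.2f}",
        ])
    return out.getvalue()


first = generate_orders(100)
second = generate_orders(100)
print(first == second)  # True
```

Using a dedicated random.Random instance rather than the module-level functions keeps the output stable even if other code draws random numbers in between.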

ecommerce-orders-10k.csv

Ten thousand orders flattened to a single row each — quotes, commas in product names, mixed types, ISO timestamps. Good for testing CSV → JSON, dedupe, and split.

689 KB · 10,000 rows × 10 columns · download ↓

server-logs-50k.csv

Fifty thousand HTTP access log entries with realistic status-code distribution and millisecond response times — exercises streaming and filtering.

6.5 MB · 50,000 rows × 9 columns · download ↓

iot-sensor-100k.csv

A hundred thousand IoT sensor readings across ~200 devices and five metrics, with monotonically increasing timestamps — ideal for stress-testing the viewer.

5.2 MB · 100,000 rows × 5 columns · download ↓

github-issues-25k.csv

Twenty-five thousand GitHub-style issue records with semicolon-joined labels and assignees, reaction counts, and a body excerpt — wide rows with text-y data.

4.9 MB · 25,000 rows × 14 columns · download ↓

broken-csv-large.csv

Five hundred rows with ~20% intentionally broken in ten different ways: wrong delimiters, unterminated quotes, embedded NULs, BOMs mid-file, and bad dates.

27 KB · 500 rows · ~100 broken · download ↓
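
If you want to try the streaming-and-filtering idea outside the browser, one way to sketch it in Python is to read a row at a time and keep only what matches. The column name status is an assumption (check the header of the file you downloaded), and the function names are invented for this example.

```python
import csv


def iter_server_errors(path, status_column="status"):
    """Yield rows whose HTTP status starts with 5, reading one row at a time."""
    # utf-8-sig also reads plain UTF-8; newline="" is what the csv module expects.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            if (row.get(status_column) or "").startswith("5"):
                yield row


def count_server_errors(path, status_column="status"):
    """Count 5xx rows without holding the whole file in memory."""
    return sum(1 for _ in iter_server_errors(path, status_column))
```

Calling count_server_errors("server-logs-50k.csv") would then walk the file once. Because the rows are yielded lazily, memory use stays roughly flat whether the file has fifty rows or fifty thousand.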

How to use these

Click any card to download, then drop or paste the file into whichever tool fits: CSV to JSON for the clean ones, the viewer for a quick look, or any of the others. The intentionally broken files (broken-csv.csv and broken-csv-large.csv) are useful for trying parsers and seeing how they recover.
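
As a rough idea of what "seeing how a parser recovers" can look like, here is a small Python sketch that flags rows whose field count does not match the header. It is not how the validators on this site work; the function name and the sample rows are invented for illustration.

```python
import csv
import io


def find_ragged_rows(text):
    """Return (line_number, field_count) for rows that do not match the header width."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return []
    problems = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the number of source lines read so far.
            problems.append((reader.line_num, len(row)))
    return problems


SAMPLE = (
    "id,name,email\r\n"
    "1,Ada,ada@example.com\r\n"
    "2,Bob,bob,smith@example.com\r\n"  # stray comma: 4 fields instead of 3
    "\r\n"                             # blank row: 0 fields
    "3,Cy,cy@example.com\r\n"
)
print(find_ragged_rows(SAMPLE))  # [(3, 4), (4, 0)]
```

A width check like this only catches part of what the broken files contain. An unterminated quote, for instance, tends to swallow the following lines into one long field, so it is worth comparing what different parsers report for the same file.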

Want one added?

If you have a real-world file you'd love to see as a sample (anonymised, of course), email [email protected] and I'll add it. The smaller and weirder the better — the goal is to have files that exercise the tools, not pretty demos.

— S., [email protected]