Remove Duplicates From CSV
CSV Dedupe Toolkit
Remove duplicate rows from a CSV by any column or combination of columns. Leave the column field empty for whole-row dedupe. Choose whether to keep the first or last occurrence of each duplicate, and optionally ignore case or trim whitespace during comparison. Comparison rules are explicit, so nothing in the data is changed silently. Runs entirely in the browser.
Before you start
You need to know:
- What "duplicate" means for your data. Exact row duplicates? Duplicate by email? Duplicate within the same country? The answer decides which columns go into the key.
- Which row to keep when there are duplicates — the first occurrence in the file (typical for "canonical" data) or the last (typical for "latest wins" exports).
- Whether "Alice" and "alice " should be considered the same (casing + whitespace). These are off by default; turn them on when you know the source is dirty.
How to use it
- Paste or drop your CSV in the left pane.
- In Cols, type the column names that define a duplicate, comma-separated, e.g. email or email, country. Leave blank for whole-row dedupe.
- Toggle ignore case if casing shouldn't matter.
- Toggle trim if leading/trailing whitespace shouldn't matter.
- Pick Keep first (default) or last.
- Click Dedupe. The status bar reports how many duplicates were removed.
- Copy or Download .csv.
Options explained
Cols (the dedupe key)
Each row is reduced to a key built from the listed columns, joined with a separator. Two rows with the same key are duplicates. Leaving the field blank uses every column — equivalent to whole-row dedupe.
Examples: email (one row per email); email, country (same email in
different countries is allowed); first_name, last_name, dob (dedupe by identity tuple).
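As a rough illustration of how such a key could be built, here is a minimal Python sketch. It is not the tool's source: the function names, the choice of separator, and the sample rows (including the example.com addresses) are assumptions made for the example.

```python
# Assumed separator (ASCII unit separator); the tool's actual separator is not documented.
SEP = "\x1f"


def parse_cols(field: str) -> list[str]:
    """Split the Cols field on commas; a blank field yields an empty list."""
    return [name.strip() for name in field.split(",") if name.strip()]


def make_key(row: dict[str, str], cols: list[str]) -> str:
    """Join the listed columns into one key; no columns listed means every column."""
    names = cols or list(row)
    return SEP.join(row[name] for name in names)


# Hypothetical sample rows: same email, different country.
rows = [
    {"email": "a@example.com", "country": "DE", "name": "Ann"},
    {"email": "a@example.com", "country": "FR", "name": "Ann"},
]

same_by_email = make_key(rows[0], parse_cols("email")) == make_key(rows[1], parse_cols("email"))
same_by_pair = make_key(rows[0], parse_cols("email, country")) == make_key(rows[1], parse_cols("email, country"))
same_whole_row = make_key(rows[0], parse_cols("")) == make_key(rows[1], parse_cols(""))
```

With these rows, keying on email alone treats the two as duplicates, while keying on email, country or on the whole row keeps both.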
ignore case & trim
ignore case lowercases the key before comparing: Alice and
alice collapse. trim removes leading/trailing whitespace from each
key component: "alice@x " and "alice@x" collapse. Both operate on the
comparison key only — the output row preserves the original values.
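The key-only behaviour can be pictured with a small Python sketch. The function name and parameters are hypothetical, and the order in which trimming and lowercasing are applied is an assumption; for these two operations the order does not change the result.

```python
def normalize(value: str, ignore_case: bool = False, trim: bool = False) -> str:
    """Return the form of a key component used for comparison only."""
    if trim:
        value = value.strip()   # drop leading/trailing whitespace
    if ignore_case:
        value = value.lower()   # lowercase, as described above
    return value


original = "Alice@x "
key = normalize(original, ignore_case=True, trim=True)
# `key` is what gets compared; `original` is what would be written to the output.
```

Because the function returns a new string, the original cell value is left untouched and can be emitted unchanged.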
Keep first / last
Which duplicate survives. First preserves the original file order and keeps the earliest occurrence. Last keeps the most recent — useful when the source is an append-only log and the newest row has the canonical state.
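A minimal Python sketch of the two modes, assuming a single key column and rows already parsed into dictionaries. The function name and the sample data are hypothetical, and this sketch emits survivors in the order each key first appeared; the tool's own output ordering may differ.

```python
def dedupe(rows: list[dict[str, str]], col: str, keep: str = "first") -> list[dict[str, str]]:
    """Keep one row per key: the earliest ("first") or the latest ("last")."""
    survivors: dict[str, dict[str, str]] = {}
    for row in rows:
        key = row[col]
        if key not in survivors or keep == "last":
            # Overwriting an existing dict entry keeps its original position,
            # so survivors stay in the order their key was first seen.
            survivors[key] = row
    return list(survivors.values())


# Hypothetical append-only log: id 1 appears twice, the later row is newer.
log = [
    {"id": "1", "state": "new"},
    {"id": "2", "state": "new"},
    {"id": "1", "state": "done"},
]

first = dedupe(log, "id", keep="first")
last = dedupe(log, "id", keep="last")
```

Here `first` keeps the "new" row for id 1, while `last` keeps the "done" row, which is the behaviour you want for "latest wins" data.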
Example
Input (duplicates by email):
email,name,last_login
[email protected],Alice,2024-01-01
[email protected],Bob,2024-01-02
[email protected],Alice (updated),2024-02-09
Dedupe on email, ignore case on, Keep last:
email,name,last_login
[email protected],Bob,2024-01-02
[email protected],Alice (updated),2024-02-09
Tips & common pitfalls
- "Near duplicates" aren't caught. [email protected] and [email protected] are different keys: no fuzzy matching, on purpose. Normalise upstream if that's the problem.
- Whole-row dedupe on big files is expensive. If you can pick a key column, do; it's dramatically faster and uses less memory.
- Empty cells are real keys. Every row with an empty email is a duplicate of every other row with an empty email. If you want to keep those unique, filter them out first.
- Column names are case-sensitive. email and Email are different headers. Check the CSV if the tool complains.
- Keep: Last reads the whole file before emitting anything, so it uses slightly more memory than Keep: First. Usually negligible.
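For the empty-cell case, one way to filter such rows out before deduping is sketched below in Python. The function name and sample rows are hypothetical, and treating whitespace-only cells as empty is an assumption you may not want.

```python
def split_on_empty(rows: list[dict[str, str]], col: str) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Separate rows that have a value in `col` from rows where it is empty."""
    keyed = [row for row in rows if row[col].strip()]
    empty = [row for row in rows if not row[col].strip()]
    return keyed, empty


# Hypothetical sample: two rows without an email that should both be kept.
rows = [
    {"email": "a@example.com", "name": "Ann"},
    {"email": "", "name": "Ben"},
    {"email": "", "name": "Cho"},
]

keyed, empty = split_on_empty(rows, "email")
# Dedupe `keyed` only, then append `empty` back to the result unchanged.
```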
Troubleshooting
"Column not found" on a column that's clearly in the file.
Casing mismatch or an invisible character (like a BOM on the first column). Try copying the header name straight from the file rather than typing it.
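If you want to confirm an invisible character outside the tool, a short Python check along these lines can help. The function names and the sample text are made up for illustration; only the standard csv and io modules are used.

```python
import csv
import io


def read_header(text: str) -> list[str]:
    """Return the first CSV row exactly as stored, invisible characters included."""
    return next(csv.reader(io.StringIO(text)))


def clean_header(header: list[str]) -> list[str]:
    """Strip a leading byte order mark and surrounding whitespace from each name."""
    return [name.lstrip("\ufeff").strip() for name in header]


# Hypothetical file content that starts with a BOM (U+FEFF).
raw = "\ufeffemail,name\nx@example.com,Ann\n"

header = read_header(raw)
for name in header:
    print(repr(name))  # repr() shows escapes for characters you cannot see

cleaned = clean_header(header)
```

The first header looks like email on screen but does not equal "email" until the BOM is stripped. When reading a file from disk in Python, opening it with encoding="utf-8-sig" removes a UTF-8 BOM for you.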
The row count went down more than expected.
Turn off trim and ignore case to see the count with exact comparison, then re-enable them one by one. The difference tells you how dirty the source is.
I want to see which rows were removed, not which survived.
That's not a current option. Workaround: dedupe the file, then use CSV Diff to compare the original and the result.
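If you prefer to do that comparison offline, the idea can be sketched in a few lines of Python. This is not how CSV Diff works internally; it is a hypothetical helper that assumes both files are already parsed into tuples of cell values.

```python
from collections import Counter


def removed_rows(original: list[tuple[str, ...]], deduped: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Return rows present in `original` that no longer appear in `deduped`."""
    remaining = Counter(deduped)      # how many copies of each row survived
    removed = []
    for row in original:
        if remaining[row] > 0:
            remaining[row] -= 1       # account for one surviving copy
        else:
            removed.append(row)       # no surviving copy left: this row was dropped
    return removed


# Hypothetical data: deduped by the first column, keeping the first occurrence.
original = [("a", "1"), ("b", "2"), ("a", "3")]
deduped = [("a", "1"), ("b", "2")]

dropped = removed_rows(original, deduped)
```

Counting copies rather than using a plain set matters for whole-row dedupe, where the removed row is identical to the one that survived.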
Frequently asked questions
Can I dedupe on multiple columns?
Yes. Enter a comma-separated list, e.g. email, country. A row is a duplicate only if the combination of those columns matches.
Does it upload my file?
No. All deduping runs in your browser. The "Removed N duplicates" feedback is computed locally. See the privacy policy.
Will the original file order be preserved?
Yes. Output rows appear in the order they were first seen in the input. If you set Keep: last, the surviving row for each key is the last-seen one, but the surviving rows still come out in first-seen order.
Can it handle a million rows?
Yes, as long as it fits in your browser's RAM. A million thin rows is fine on a laptop; wide rows with long text will use a lot more memory.