FreeFileConverter
Data

Cleaning Messy CSV Before You Convert: A Practical Checklist

May 20265 min read

CSV is deceptively simple. Comma-separated values — how hard can it be? In practice, CSV files exported from different systems carry a surprising number of inconsistencies that break parsers, corrupt conversions, and cause mysterious errors downstream. Before converting any CSV to JSON, TSV, or XML, run through these checks.

Common CSV problems and how to fix them

Missing or duplicate header row
CSV converters use the first row as column names. If there's no header row, every row becomes data including your intended headers — all columns get generic names like Column1, Column2. Duplicate headers cause the second column to silently overwrite the first during JSON conversion.
Fix: add a clean header row
Open in a text editor or spreadsheet. Ensure row 1 contains unique, descriptive column names with no spaces or special characters (use first_name not First Name for cleaner JSON keys). Remove any duplicate column names.
Inconsistent column counts
A row with 5 columns when the header has 4 means a value contains an unescaped comma. A row with 3 columns means a value is missing or a field was accidentally split. Both cause parsing errors or silently misaligned data.
Fix: quote fields containing commas
Any field value containing a comma must be wrapped in double quotes: "Smith, John" not Smith, John. Most spreadsheet applications do this automatically on export — but manual or programmatic CSV generation often doesn't.
Mixed line endings
Windows uses \r\n (CRLF), Unix/Mac uses \n (LF). A CSV exported from Excel on Windows and edited on a Mac can end up with mixed line endings. Some parsers handle this gracefully; others split rows incorrectly or include \r as part of the last field value.
Fix: normalize line endings
Open in VS Code and check the bottom-right corner for "CRLF" or "LF". Click it to switch. Or use a text editor's Find & Replace to replace all \r\n with \n. Save as UTF-8.
Wrong encoding (not UTF-8)
Excel exports CSV in Windows-1252 encoding by default, not UTF-8. This causes accented characters, em dashes, curly quotes, and any non-ASCII text to appear as garbled characters (’ instead of ') after conversion.
Fix: save as UTF-8 CSV from Excel
In Excel: File → Save As → CSV UTF-8 (with BOM). In Google Sheets, export as CSV — it's always UTF-8. Our converter reads files as UTF-8; if your source file is in a different encoding, re-save it first.
Empty rows and trailing whitespace
Empty rows at the end of the file (common in Excel exports) produce empty objects in JSON output. Leading/trailing spaces in field values become part of the value — Alice and Alice become different JSON values.
Fix: strip empty rows and trim whitespace
Open the file in a text editor and delete any blank lines at the end. For whitespace in values, a find-replace of , with , handles most cases for unquoted fields. For quoted fields, manual cleanup or a script is more reliable.
Quick sanity check: Before converting, open your CSV in a spreadsheet and scan for rows where a column suddenly shifts right — that's almost always an unescaped comma in a field value. Fix those first.

Convert your cleaned CSV

Once your CSV is clean — consistent headers, properly quoted fields, UTF-8 encoding, no empty rows — conversion to JSON, TSV, or XML is reliable and lossless. Drop it into the Data Files panel and download your converted file.

Convert CSV → JSON, TSV, or XML
Free, browser-based, no upload required.
Convert now →
← Back to blog