csv doctor

工作里被 CSV 折腾的次数太多了——日文系统导出的 Shift_JIS 编码乱码、坐标列藏在表头深处、看不见的零宽空格混在数据里、重复的列名、各种边角情况。每次都用一两个一次性脚本处理,写完就扔。
于是顺手做了 csv doctor 把这些处理流程固化成一个浏览器工具。
它做的事很具体:
- 自动检测 Shift_JIS / CP932 / UTF-8 编码并修复
- Unicode 标准化、移除不可见字符、清理空白
- 自动识别坐标列(中英日)
- 如果有坐标列,导出 GeoJSON / CZML / KML
- 50 MB / 100,000 行内的文件本地处理
所有处理都在浏览器里完成,文件不会上传到服务器——这点对于工作中处理敏感数据很重要。
技术:React 19 + TypeScript + Vite,PapaParse 解析 CSV,encoding-japanese 处理日文编码。
这个工具是用 Claude Code 协作搭起来的早期版本,后续会继续打磨。
I got tired of CSV pain at work: Shift_JIS exports from legacy Japanese systems that open as mojibake, coordinate columns buried at unpredictable positions in the header row, invisible zero-width spaces inside cells, duplicate headers, and all the other edge cases. Each time I wrote a one-off script or two and threw it away afterwards.
So I built csv doctor to fold those fixes into a single browser tool.
What it does:
- Auto-detects Shift_JIS / CP932 / UTF-8 and repairs garbled text
- Normalizes Unicode, strips invisible characters, cleans up whitespace
- Auto-detects coordinate columns named in English, Chinese, or Japanese
- Exports to GeoJSON / CZML / KML when coordinate columns are present
- Handles files up to 50 MB / 100,000 rows, all processed locally
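To make the cleanup item in the list concrete, here is a rough sketch of what a per-cell cleaner could look like. It is not lifted from the csv doctor source: the name `cleanCell`, the exact set of invisible characters, and the NFKC default are all assumptions made for illustration.

```typescript
// Hypothetical helper, not taken from the csv doctor source.
// Zero-width space, zero-width (non-)joiners, word joiner, and BOM.
// Note: stripping U+200D also breaks emoji ZWJ sequences, so a real tool
// may want to be more selective than this.
const INVISIBLE = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

type NormalForm = "NFC" | "NFD" | "NFKC" | "NFKD";

// Assumption: NFKC as the default form, since it folds full-width
// letters and digits into their ASCII equivalents.
function cleanCell(value: string, form: NormalForm = "NFKC"): string {
  return value
    .normalize(form)        // Unicode normalization
    .replace(INVISIBLE, "") // remove invisible characters
    .replace(/\s+/g, " ")   // collapse runs of whitespace into one space
    .trim();                // drop leading and trailing whitespace
}

const cleaned = cleanCell("\u200B ＡＢＣ\u3000 １２３\uFEFF ");
console.log(JSON.stringify(cleaned));
```

Running every cell through one function like this keeps the cleanup order fixed: normalize first, then strip, then tidy whitespace, so a full-width space turned into a plain space by NFKC still gets collapsed.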
Everything runs in-browser. Files never leave your machine — important when you're handling sensitive data at work.
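Because nothing leaves the page, the coordinate export can be a plain in-memory transform over the parsed rows. The sketch below shows one way that might work for GeoJSON. The header alias lists, the `toGeoJSON` name, and the row shape are hypothetical, not what csv doctor actually ships; the only fixed point is that GeoJSON stores positions as `[longitude, latitude]`.

```typescript
// Hypothetical alias tables: illustrative guesses, not csv doctor's real list.
const LAT_NAMES = ["lat", "latitude", "纬度", "緯度"];
const LON_NAMES = ["lon", "lng", "long", "longitude", "经度", "経度"];

type Row = Record<string, string>;

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: Row;
}

interface FeatureCollection {
  type: "FeatureCollection";
  features: PointFeature[];
}

// Return the first header whose trimmed, lowercased form matches an alias.
function findColumn(headers: string[], names: string[]): string | undefined {
  return headers.find((h) => names.includes(h.trim().toLowerCase()));
}

// Returns null when no latitude/longitude pair of columns is found.
function toGeoJSON(headers: string[], rows: Row[]): FeatureCollection | null {
  const latCol = findColumn(headers, LAT_NAMES);
  const lonCol = findColumn(headers, LON_NAMES);
  if (latCol === undefined || lonCol === undefined) return null;

  const features: PointFeature[] = [];
  for (const row of rows) {
    const latText = (row[latCol] ?? "").trim();
    const lonText = (row[lonCol] ?? "").trim();
    const lat = Number(latText);
    const lon = Number(lonText);
    // Skip rows with blank, non-numeric, or out-of-range coordinates.
    const valid =
      latText !== "" && lonText !== "" &&
      Number.isFinite(lat) && Number.isFinite(lon) &&
      Math.abs(lat) <= 90 && Math.abs(lon) <= 180;
    if (!valid) continue;

    // Keep the non-coordinate columns as feature properties.
    const properties: Row = {};
    for (const [key, value] of Object.entries(row)) {
      if (key !== latCol && key !== lonCol) properties[key] = value;
    }
    features.push({
      type: "Feature",
      geometry: { type: "Point", coordinates: [lon, lat] }, // GeoJSON order: lon, lat
      properties,
    });
  }
  return { type: "FeatureCollection", features };
}
```

Returning `null` instead of an empty collection lets the UI tell "no coordinate columns" apart from "coordinate columns found, but no valid rows".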
Stack: React 19 + TypeScript + Vite, [PapaParse](https://www.papaparse.com/) for CSV, [encoding-japanese](https://github.com/polygonplanet/encoding.js) for Japanese encodings.
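For a feel of what the encoding step involves, here is a minimal stand-in that uses the standard `TextDecoder` API instead of encoding-japanese. It is a simplification, not how csv doctor does it: the function name `decodeCsvBytes` is made up, and it assumes that any input which is not valid UTF-8 must be Shift_JIS.

```typescript
// Stand-in sketch using the standard TextDecoder API, not encoding-japanese.
interface Decoded {
  encoding: "utf-8" | "shift_jis";
  text: string;
}

function decodeCsvBytes(bytes: Uint8Array): Decoded {
  try {
    // fatal: true makes the decoder throw on bytes that are not valid UTF-8.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { encoding: "utf-8", text };
  } catch {
    // Assumption: anything that fails strict UTF-8 decoding is Shift_JIS.
    // The "shift_jis" label in the Encoding Standard also covers the
    // Windows code page 932 extensions.
    const text = new TextDecoder("shift_jis").decode(bytes);
    return { encoding: "shift_jis", text };
  }
}

// "日本語" encoded as Shift_JIS bytes.
const sjisBytes = new Uint8Array([0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea]);
console.log(decodeCsvBytes(sjisBytes));
```

Trying strict UTF-8 first works because Shift_JIS text containing Japanese characters almost never forms valid UTF-8 byte sequences. A dedicated detector would be the safer choice once more than two candidate encodings are in play.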
This is an early version, built in collaboration with [Claude Code](https://www.claude.com/product/claude-code). I'll keep refining it.
---
layout: lab.njk
title_zh: csv doctor
title_en: csv doctor
year: 2026
cover: /images/lab/csv-doctor/cover.png
description: "Browser-based CSV cleaner & WebGIS converter. Auto-detects encodings, normalizes Unicode, exports to GeoJSON / CZML / KML. All processing local — files never leave your browser."
url: https://csv-doctor.surreal.tools
github: https://github.com/lavalse/csv-doctor
workTags:
  - lab
  - software
permalink: /lab/csv-doctor/
---