open - excel save as utf-8 csv
Excel to CSV with UTF8 encoding (20)
I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to be coming from the original user creating the Excel file in Mac.
Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.
This will fix the corrupted CSV file saved by Excel and re-save it in the proper encoding.
- Export CSV from Excel
- Load into Notepad++
- Fix encoding
Excel saves in CP-1252 / Windows-1252. Open the CSV file in Notepad++. Select
Encoding > Character Sets > Western European > Windows-1252
Encoding > Convert to UTF-8 File > Save
First tell Notepad++ the encoding, then convert. Some of these other answers are converting without setting the proper encoding first, mangling the file even more. They would turn what should be
達. If your character does not fit into CP-1252 then it was already lost when it was saved as CSV. Use another answer for that.
Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF16 (note, not in UTF8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.
(optional) Unless you can use a TAB delimited file for import, use your favorite text editor and replace the TAB characters with commas ",".
Import your *.txt file in the target application. Make sure it can accept UTF16 format.
If UTF-16 has been properly implemented with support for non-BMP code points, that you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.
I use this procedure to import data from Excel to Moodle.
"nevets1219" is right about Google docs, however if you simply "import" the file it often does not convert it to UTF-8.
But if you import the CSV into an existing Google spreadsheet it does convert to UTF-8.
Here's a recipe:
- On the main Docs (or Drive) screen click the "Create" button and choose "Spreadsheet"
- From the "File" menu choose "Import"
- Click "Choose File"
- Choose "Replace spreadsheet"
- Choose whichever character you are using as a Separator
- Click "Import"
- From the "File" menu choose "Download as" -> CSV (current sheet)
The resulting file will be in UTF-8
easiest way: no need Open office and google docs
- Save your file as "Unicode text file";
- now you have an unicode text file
- open it with "notepad" and "Save as" it with selecting "utf-8" or other code page that you want
- rename file extension from "txt" to "csv"
dont open it with Ms-office anyway!!! Now you have a tab delimited CSV file.
A simple workaround is to use Google Spreadsheet. Paste (values only if you have complex formulas) or import the sheet then download CSV. I just tried a few characters and it works rather well.
NOTE: Google Sheets does have limitations when importing. See here.
NOTE: Be careful of sensitive data with Google Sheets.
EDIT: Another alternative - basically they use VB macro or addins to force the save as UTF8. I have not tried any of these solutions but they sound reasonable.
Another one I've found useful: "Numbers" allows encoding-settings when saving as CSV.
Assuming an Windows environment, save and work with the file as usual in Excel but then open up the saved Excel file in Gnome Gnumeric (free). Save Gnome Gnumeric's spreadsheet as CSV which - for me anyway - saves it as UTF-8 CSV.
Came across the same problem and googled out this post. None of the above worked for me. At last I converted my Unicode .xls to .xml (choose Save as ... XML Spreadsheet 2003) and it produced the correct character. Then I wrote code to parse the xml and extracted content for my use.
Encoding -> Convert to Ansi will encode it in ANSI/UNICODE. Utf8 is a subset of Unicode. Perhaps in ANSI will be encoded correctly, but here we are talking about UTF8, @SequenceDigitale.
There are faster ways, like exporting as csv ( comma delimited ) and then, opening that csv with Notepad++ ( free ), then Encoding > Convert to UTF8. But only if you have to do this once per file. If you need to change and export fequently, then the best is LibreOffice or GDocs solution.
Excel typically saves a csv file as ANSI encoding instead of utf8.
One option to correct the file is to use Notepad or Notepad++:
- Open the .csv with Notepad or Notepad++.
- Copy the contents to your computer clipboard.
- Delete the contents from the file.
- Change the encoding of the file to utf8.
- Paste the contents back from the clipboard.
- Save the file.
I have also came across the same problem but there is an easy solution for this.
- Open your xlsx file in Excel 2016 or higher.
- In "Save As" choose this option: "(CSV UTF-8(Comma Delimited)*.csv)"
It works perfectly and a csv file is generated which can be imported in any software. I imported this csv file in my SQLITE database and it works perfectly with all unicode characters intact.
I have the same problem and come across this add in , and it works perfectly fine in excel 2013 beside excel 2007 and 2010 which it is mention for.
I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.
Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.
Fortunately, I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.
Please give it a try and see if it works for you. Good luck.
I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.
So I finally had to give up on VBA, bit the bullet, and learned AppleScript. It wasn't nearly as bad as I had thought.
Solution is described here: http://talesoftech.blogspot.com/2011/05/excel-on-mac-goodbye-vba-hello.html
Microsoft Excel has an option to export spreadsheet using Unicode encoding. See following screenshot.
Save Dialog > Tools Button > Web Options > Encoding Tab
Under Excel 2016, we have a CSV export option dedicated to UTF-8 format.
What about using Powershell.
Get-Content 'C:\my.csv' | Out-File 'C:\my_utf8.csv' -Encoding UTF8
After saving as CSV under Excel in the command line put:
iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv
(remember to replace cp1250 with your encoding).
Works fast and great for big files like post codes database, which cannot be imported to GoogleDocs (400.000 cells limit).
another solution is to open the file by winword and save it as txt and then reopen it by excel and it will work ISA