sql-server - nchar - nvarchar vs varchar performance
What is the difference between varchar and nvarchar? (12)
varchar: Variable-length, non-Unicode character data. The database collation determines which code page the data is stored using.
nvarchar: Variable-length Unicode character data. Dependent on the database collation for comparisons.
Armed with this knowledge, use whichever one matches your input data (ASCII v. Unicode).
Is it just that
nvarchar supports multibyte characters? If that is the case, is there really any point, other than storage concerns, to using
nvarchar is safe to use compared to
varchar in order to make our code error free (type mismatching) because
nvarchar allows unicode characters also.
When we use
where condition in SQL Server query and if we are using
= operator, it will throw error some times. Probable reason for this is our mapping column will be difined in
varchar. If we defined it in
nvarchar this problem my not happen. Still we stick to
varchar and avoid this issue we better use
LIKE key word rather than
nvarchar column can store any Unicode data. A
varchar column is restricted to an 8-bit codepage. Some people think that
varchar should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatabilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.
All modern operating systems and development platforms use Unicode internally. By using
nvarchar rather than
varchar, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.
If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you're having to maintain, even while enjoying some of the benefits of full Unicode storage.
Follow Difference Between Sql Server VARCHAR and NVARCHAR Data Type. Here you could see in a very descriptive way.
In generalnvarchar stores data as Unicode, so, if you're going to store multilingual data (more than one language) in a data column you need the N variant.
I always use nvarchar as it allows whatever I'm building to withstand pretty much any data I throw at it. My CMS system does Chinese by accident, because I used nvarchar. These days, any new applications shouldn't really be concerned with the amount of space required.
I had a look at the answers and many seem to recommend to use
varchar, because space is not a problem anymore, so there is no harm in enabling Unicode for little extra storage. Well, this is not always true when you want to apply an index over your column. SQL Server has a limit of 900 bytes on the size of the field you can index. So if you have a
varchar(900) you can still index it, but not
nvarchar, the number of characters is halved, so you can index up to
nvarchar(450). So if you are confident you don't need
nvarchar, I don't recommend using it.
In general, in databases, I recommend sticking to the size you need, because you can always expand. For example, a colleague at work once thought that there is no harm in using
nvarchar(max) for a column, as we have no problem with storage at all. Later on, when we tried to apply an index over this column, SQL Server rejected this. If, however, he started with even
varchar(5), we could have simply expanded it later to what we need without such a problem that will require us to do a field migration plan to fix this problem.
I would say, it depends.
If you develop a desktop application, where the OS works in Unicode (like all current Windows systems) and language does natively support Unicode (default strings are Unicode, like in Java or C#), then go nvarchar.
If you develop a web application, where strings come in as UTF-8, and language is PHP, which still does not support Unicode natively (in versions 5.x), then varchar will probably be a better choice.
If a single byte is used to store a character, there are 256 possible combinations, and thereby you can save 256 different characters. Collation is the pattern which defines the characters and the rules by which they are compared and sorted.
1252, which is the Latin1 (ANSI), is the most common. Single-byte character sets are also inadequate to store all the characters used by many languages. For example, some Asian languages have thousands of characters, so they must use two bytes per character.
When systems using multiple code pages are used in a network, it becomes difficult to manage communication. To standardize things, the ISO and Unicode consortium introduced the Unicode. Unicode uses two bytes to store each character. That is 65,536 different characters can be defined, so almost all the characters can be covered with Unicode. If two computers use Unicode, every symbol will be represented in the same way and no conversion is needed - this is the idea behind Unicode.
SQL Server has two categories of character datatypes:
- non-Unicode (char, varchar, and text)
- Unicode (nchar, nvarchar, and ntext)
If we need to save character data from multiple countries, always use Unicode.
Mainly nvarchar stores Unicode characters and varchar stores non-Unicode characters.
"Unicodes" means 16-bit character encoding scheme allowing characters from lots of other languages like Arabic, Hebrew, Chinese, Japanese, to be encoded in a single character set.
That means unicodes is using 2 bytes per character to store and nonunicodes uses only one byte per character to store. Which means unicodes need double capacity to store compared to non-unicodes.
My two cents
Indexes can fail when not using the correct datatypes:
In SQL Server: When you have an index over a VARCHAR column and present it a Unicode String, SQL Server does not make use of the index. The same thing happens when you present a BigInt to a indexed-column containing SmallInt. Even if the BigInt is small enough to be a SmallInt, SQL Server is not able to use the index. The other way around you do not have this problem (when providing SmallInt or Ansi-Code to an indexed BigInt ot NVARCHAR column).
Datatypes can vary between different DBMS's (DataBase Management System):
Know that every database has slightly different datatypes and VARCHAR does not means the same everywhere. While SQL Server has VARCHAR and NVARCHAR, an Apache/Derby database has only VARCHAR and there VARCHAR is in Unicode.
nvarchar stores Unicode data while
varchar stores single-byte character data. Other than storage differences (
nvarchar requires twice the storage space as
varchar), which you already mentioned, the main reason for preferring
varchar would be internationalization (i.e. storing strings in other languages).
nVarchar will help you to store Unicode characters. It is the way to go if you want to store localized data.