sql - example - varchar vs string
What are the use cases for selecting CHAR over VARCHAR in SQL? (13)
I realize that CHAR is recommended if all my values are fixed-width. But, so what? Why not just pick VARCHAR for all text fields just to be safe.
CHAR takes up less storage space than VARCHAR if all your data values in that field are the same length. Now perhaps in 2009 a 800GB database is the same for all intents and purposes as a 810GB if you converted the VARCHARs to CHARs, but for short strings (1 or 2 characters), CHAR is still a industry "best practice" I would say.
Now if you look at the wide variety of data types most databases provide even for integers alone (bit, tiny, int, bigint), there ARE reasons to choose one over the other. Simply choosing bigint every time is actually being a bit ignorant of the purposes and uses of the field. If a field simply represents a persons age in years, a bigint is overkill. Now it's not necessarily "wrong", but it's not efficient.
But its an interesting argument, and as databases improve over time, it could be argued CHAR vs VARCHAR does get less relevant.
Char is a little bit faster, so if you have a column that you KNOW will be a certain length, use char. For example, storing (M)ale/(F)emale/(U)nknown for gender, or 2 characters for a US state.
Fragmentation. Char reserves space and VarChar does not. Page split can be required to accommodate update to varchar.
Generally pick CHAR if all rows will have close to the same length. Pick VARCHAR when the length varies significantly. CHAR may also be a bit faster because all the rows are of the same length.
It varies by DB implementation, but generally VARCHAR uses one or two more bytes of storage (for length or termination) in addition to the actual data. So (assuming you are using a one byte character set) storing the word "FooBar"
- CHAR(6) = 6 bytes (no overhead)
- VARCHAR(10) = 8 bytes (2 bytes of overhead)
- CHAR(10) = 10 bytes (4 bytes of overhead)
Bottom line is CHAR can be faster and more space efficient for data of relatively the same length (within two characters length difference).
Note: Microsoft SQL has 2 bytes of overhead for a VARCHAR. This may vary from DB to DB, but generally there is at least 1 byte of overhead needed to indicate length or EOL on a VARCHAR.
As was pointed out by Gaven in the comments, if you are using a multi-byte, variable length character set like UTF8 then CHAR stores the maximum number of bytes necessary to store the number of characters. So if UTF8 needs at most 3 bytes to store a character, then CHAR(6) will be fixed at 18 bytes, even if only storing latin1 characters. So in this case VARCHAR becomes a much better choice.
I think in your case there is probably no reason to not pick Varchar. It gives you flexibility and as has been mentioned by a number of respondants, performance is such now that except in very specific circumstances us meer mortals (as opposed to Google DBA's) will not notice the difference.
An interesting thing worth noting when it comes to DB Types is the sqlite (a popular mini database with pretty impressive performance) puts everything into the database as a string and types on the fly.
I always use VarChar and usually make it much bigger than I might strickly need. Eg. 50 for Firstname, as you say why not just to be safe.
I would NEVER use chars. I’ve had this debate with many people and they always bring up the tired cliché that char is faster. Well I say, how much faster? What are we talking about here, milliseconds, seconds and if so how many? You’re telling me because someone claims its a few milliseconds faster, we should introduce tons of hard to fix bugs into the system?
So here are some issues you will run into:
Every field will be padded, so you end up with code forever that has RTRIMS everywhere. This is also a huge disk space waste for the longer fields.
Now let’s say you have the quintessential example of a char field of just one character but the field is optional. If somebody passes an empty string to that field it becomes one space. So when another application/process queries it, they get one single space, if they don’t use rtrim. We’ve had xml documents, files and other programs, display just one space, in optional fields and break things.
So now you have to ensure that you’re passing nulls and not empty string, to the char field. But that’s NOT the correct use of null. Here is the use of null. Lets say you get a file from a vendor
Name|Gender|City Bob||Los Angeles
If gender is not specified than you enter Bob, empty string and Los Angeles into the table. Now lets say you get the file and its format changes and gender is no longer included but was in the past.
Well now since gender is not included, I would use null. Varchars support this without issues.
Char on the other hand is different. You always have to send null. If you ever send empty string, you will end up with a field that has spaces in it.
I could go on and on with all the bugs I’ve had to fix from chars and in about 20 years of development.
If you're working with me and you're working with Oracle, I would probably make you use
varchar in almost every circumstance. The assumption that
char uses less processing power than
varchar may be true...for now...but database engines get better over time and this sort of general rule has the making of a future "myth".
Another thing: I have never seen a performance problem because someone decided to go with
varchar. You will make much better use of your time writing good code (fewer calls to the database) and efficient SQL (how do indexes work, how does the optimizer make decisions, why is
exists faster than
Final thought: I have seen all sorts of problems with use of
CHAR, people looking for '' when they should be looking for ' ', or people looking for 'FOO' when they should be looking for 'FOO (bunch of spaces here)', or people not trimming the trailing blanks, or bugs with Powerbuilder adding up to 2000 blanks to the value it returns from an Oracle procedure.
In addition to performance benefits,
CHAR can be used to indicate that all values should be the same length, e.g., a column for U.S. state abbreviations.
It's the classic space versus performance tradeoff.
In MS SQL 2005, Varchar (or NVarchar for lanuagues requiring two bytes per character ie Chinese) are variable length. If you add to the row after it has been written to the hard disk it will locate the data in a non-contigious location to the original row and lead to fragmentation of your data files. This will affect performance.
So, if space is not an issue then Char are better for performance but if you want to keep the database size down then varchars are better.
Many people have pointed out that if you know the exact length of the value using CHAR has some benefits. But while storing US states as CHAR(2) is great today, when you get the message from sales that 'We have just made our first sale to Australia', you are in a world of pain. I always send to overestimate how long I think fields will need to be rather than making an 'exact' guess to cover for future events. VARCHAR will give me more flexibility in this area.
There is a difference between early performance optimization and using a best practice type of rule. If you are creating new tables where you will always have a fixed length field, it makes sense to use CHAR, you should be using it in that case. This isn't early optimization, but rather implementing a rule of thumb (or best practice).
i.e. - If you have a 2 letter state field, use CHAR(2). If you have a field with the actual state names, use VARCHAR.
There is some small processing overhead in calculating the actual needed size for a column value and allocating the space for a Varchar, so if you are definitely sure how long the value will always be, it is better to use Char and avoid the hit.
when using varchar values SQL Server needs an additional 2 bytes per row to store some info about that column whereas if you use char it doesn't need that so unless you