sql - warehouse - when to use surrogate key

What is the difference between a primary key and a surrogate key? (4)

A surrogate key is a made up value with the sole purpose of uniquely identifying a row. Usually, this is represented by an auto incrementing ID.

Example code:

    SurrogateKey INT IDENTITY(1,1) -- A surrogate key that increments automatically

A primary key is the identifying column or set of columns of a table. Can be surrogate key or any other unique combination of columns (for example a compound key). MUST be unique for any row and cannot be NULL.

Example code:

    PrimaryKey INT PRIMARY KEY -- A primary key is just an unique identifier

I googled a lot, but I did not find the exact straight forward answer with an example.

Any example for this would be more helpful.

A surrogate key is typically a numeric value. Within SQL Server, Microsoft allows you to define a column with an identity property to help generate surrogate key values.

The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys must contain UNIQUE values. A primary key column cannot contain NULL values. Most tables should have a primary key, and each table can have only ONE primary key.


I think Michelle Poolet describes it in a very clear way:

A surrogate key is an artificially produced value, most often a system-managed, incrementing counter whose values can range from 1 to n, where n represents a table's maximum number of rows. In SQL Server, you create a surrogate key by assigning an identity property to a column that has a number data type.


It usually helps you use a surrogate key when you change a composite key with an identity column.

The primary key is a unique key in your table that you choose that best uniquely identifies a record in the table. All tables should have a primary key, because if you ever need to update or delete a record you need to know how to uniquely identify it.

A surrogate key is an artificially generated key. They're useful when your records essentially have no natural key (such as a Person table, since it's possible for two people born on the same date to have the same name, or records in a log, since it's possible for two events to happen such they they carry the same timestamp). Most often you'll see these implemented as integers in an automatically incrementing field, or as GUIDs that are generated automatically for each record. ID numbers are almost always surrogate keys.

Unlike primary keys, not all tables need surrogate keys, however. If you have a table that lists the states in America, you don't really need an ID number for them. You could use the state abbreviation as a primary key code.

The main advantage of the surrogate key is that they're easy to guarantee as unique. The main disadvantage is that they don't have any meaning. There's no meaning that "28" is Wisconsin, for example, but when you see 'WI' in the State column of your Address table, you know what state you're talking about without needing to look up which state is which in your State table.