Should be simple, but it's not...
So, in order to understand SQL Server Backups, there are somethings to know beforehand, such as Transaction Log, Recovery Model, checkpoints, MinLSN, etc...
Recovery Model
I'll point out this first since it's very important: if the database is set to FULL
recovery, you must run transaction log backups to clear it out, otherwise it will keep growing! For SIMPLE
, it gets truncated automatically after a checkpoint. SQL Server Express by default has SIMPLE
recovery model. Other editions, such as Standard, has FULL
recovery model as default.
Which recovery model and backup strategy you choose will depend on your RPO (Recovery Point Objective) and RTO (Recovery Time Objective).
Here are different types of recovery models:
- Simple: Use for simple cases, where you can avoid additional management of the database, such as the transaction log files.
- Full: Most flexible at the expense of additional management overhead. Gives you point-in-time recovery. You must perform transaction log backups to clear out the space, otherwise it will keep growing.
- Bulk-logged: Similar to Full, but bulk operations are minimally logged.
The recovery model is not set as part of the CREATE DATABASE
. It must be done using the ALTER
statement after the database has been created. To change the defaults for new databases, update the model
database.
To view the current recovery models of the databases, use the following:
SELECT name, recovery_model_desc
FROM sys.databases
What is Transaction Log?
(Imagine if you were writing your own database server. How would you maintain ACID properties? Transaction Log is one way to accomplish that.)
BEGIN TRANSACTION?
No, it's not just a log of when you issue BEGIN TRANSACTION
. It's a log of records that SQL needs in order to maintain ACID (Atomic, Consistent, Isolaeted, Durable) properties. This includes implicit transactions.
Informational log file that we can use when we need to debug?
No, it should not be confused with typical log files that an application will create. The Transaction Log is really part of the SQL data, and not an ancillary information. It's not merely a description of what's been done. For example, we might have code that writes to a log file in our applications, such as "INFO: Wrote 1 record to Person table", but for SQL, the log would contain the actual data needed to recover the database, such as "INFO: Wrote 'Riker' to the last name column of Person database where row ID = 1, when it was 'Picard' before."
Write-Ahead Transaction Log
The Transaction Log is implemented using write-ahead logging. This means that log is always written to the disk first before the actual data.
Rollback
How do you think SQL Server does a rollback of a transaction? Transaction Log. It's also used for other recovery scenarios. The official documentation has a good overview.
Tail-Log Backups
A tail-log is the Transaction Log data that hasn't been backed up since the last back up. What's interesting is that it can be backed up even if the database fails to start, or even when the database is damaged. This goes to show again that Transaction Log is as important as the actual data, if not more so.
Checkpoints and Recovery Interval
Checkpoint: Writes data held in memory to disk, and brings data files up-to-date with transaction log. There are several different types you can configure for checkpoints.
The recovery interval specifies the time it would take to bring SQL Server database online after a system crash. There are two places where it can be set, and several types, but generally it's about a minute. It's not guaranteed, so say you had a transaction running for two hours (maybe you're updating millions of rows) and then the server crashes, so you reboot the server, and when the SQL Server starts, it will now need to undo that transaction before making the database available, which most likely would take more than a minute.
As per documentation, Checkpoint performs:
- Writes to Transaction Log about the checkpoint, such as that it is starting, and the
MinLSN
(see below).
- If the database is using
SIMPLE
recovery model, all spaces before the MinLSN
is marked for reuse
- Writes all dirty log and data pages to disk (note that log is written to disk first, before data).
- Writes to Transaction Log that the checkpoint has ended.
- Writes the
LSN
of the start of this change to the database boot page.
MinLSN
Each record in the Transaction Log has a Log Sequence Number (LSN). MinLSN is the Minimum Recovery LSN. It is the minimum of the:
- LSN of the start of the checkpoint.
- LSN of the start of the oldest active transaction
- LSN of the start of the oldest replication transaction that has not yet been delivered to the distribution database.
SQL Backup Files
With the complexities of backups, SQL Server has features to support and manage multiple devices and options (such as striped, mirrored) for storing the actual backup data.
At the highest level is the Media Set. In each Media Set, you can have multiple Backup Sets, which can be from same or different databases. Backup Sets are usually appended in the Media Set, or can overwrite existing ones.
Is that all?
No, there are so many other topics related to backups, such as Log Shipping, Always on Availability, Mirroring, Replication, Clusters, Snapshots, Azure Blob Backups, etc...
Is the database usable during the backup?
This is the one last thing that I was curious about, and yes, database is usable while a backup is being performed, but obviously it may suffer performance wise, and you can't do database file related operations.
Additional Resources: