May 17, 2018

Implementing a Queue using SQL Server

Sometimes an application needs a queue backed by a data store (i.e., not a transient in-memory data structure). One such scenario comes up when two applications running on different servers need to coordinate with each other via the queue. If the applications already share a database, an obvious choice is to use a database table as the data store for the queue. Intuitively, this sounds simple, and it mostly is, but whenever multiple processes read from and update the same resource, care is needed to avoid deadlocks and to ensure each item is dequeued only once.

While researching this topic, I came across an old but excellent article by Remus Rusanu, which this post is based on.

No duplicates, please

One thing we want to make sure of is that each item is dequeued only once. We don't want two separate processes to dequeue the same item (unless, of course, that's a desired requirement). We could use traditional transactions or sub-queries, but with newer versions of SQL Server (2005 and up), we can use a Common Table Expression (CTE) together with the OUTPUT clause. The gist is that the dequeue operation is done as a single atomic statement. The execution plan is also simpler than the one for the sub-query version.
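For reference, the delete-on-dequeue shape can be sketched roughly as follows. This is a minimal sketch under my own assumptions, not a copy of the article's code; the table and procedure names (QueueForDeleteStrategy, usp_DequeueWithDelete) are made up for illustration.

```sql
-- Hypothetical queue table for the delete-on-dequeue variant (names are illustrative).
CREATE TABLE QueueForDeleteStrategy (
    Id BIGINT NOT NULL IDENTITY(1,1),
    Payload NVARCHAR(500)
);
GO

CREATE CLUSTERED INDEX cdxQueueForDeleteStrategy ON QueueForDeleteStrategy (Id);
GO

-- Dequeue by deleting the oldest row and returning its payload in the same statement.
CREATE PROCEDURE usp_DequeueWithDelete
AS
    SET NOCOUNT ON;
    WITH QueueCTE AS (
        SELECT TOP (1) Payload
        FROM QueueForDeleteStrategy WITH (ROWLOCK, READPAST)
        ORDER BY Id
    )
    DELETE FROM QueueCTE
    OUTPUT DELETED.Payload;
GO
```

The OUTPUT clause hands the deleted row back to the caller, so no separate SELECT is needed. READPAST lets a concurrent caller skip a row that another session has already locked instead of waiting on it.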

Delete or Update?

The example in the article deletes the row from the queue when dequeuing. If you need to update instead of delete, here's an example:

-- Create a queue table, where each item will be updated instead of being deleted after a dequeue operation.
CREATE TABLE QueueForUpdateStrategy (
    Id BIGINT NOT NULL IDENTITY(1,1),
    CreateDateTime DATETIME NOT NULL DEFAULT GETDATE(),
    UpdateDateTime DATETIME NOT NULL DEFAULT GETDATE(),
    Payload NVARCHAR(500),
    IsDequeued BIT NOT NULL DEFAULT 0 -- Oh, SQL Server/TSQL, why no love for boolean?
);
GO

CREATE CLUSTERED INDEX cdxQueueForUpdateStrategy ON QueueForUpdateStrategy (Id);
GO


-- Stored procedure to enqueue an item
CREATE PROCEDURE usp_Enqueue
    @Payload NVARCHAR(500)
AS
    SET NOCOUNT ON
    INSERT QueueForUpdateStrategy (Payload) VALUES (@Payload)
GO

-- Stored procedure to dequeue an item, using UPDATE.
CREATE PROCEDURE usp_Dequeue
AS
    SET NOCOUNT ON
    ;WITH QueueCTE AS (
        SELECT TOP (1) *
        FROM QueueForUpdateStrategy WITH (ROWLOCK, READPAST)
        WHERE IsDequeued = 0
        ORDER BY Id -- match the column's declared casing so this also works under case-sensitive collations
    )
    UPDATE QueueCTE SET IsDequeued = 1, UpdateDateTime = GETDATE()
    OUTPUT
        DELETED.Payload,
        DELETED.CreateDateTime,
        INSERTED.UpdateDateTime
GO



-- Insert some items to the queue
EXEC usp_Enqueue 'Test A'
EXEC usp_Enqueue 'Test B'
EXEC usp_Enqueue 'Test C'


-- Dequeue an item
EXEC usp_Dequeue

Curious why there's a ; before the WITH? T-SQL uses the WITH keyword for other things too, such as table hints, so the WITH that starts a CTE must follow a terminated statement. Adding a leading semicolon guarantees that the prior statement is terminated and avoids a syntax error.
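An alternative to the leading semicolon is to terminate every statement explicitly. A tiny sketch (the CTE name Numbers is just an illustration):

```sql
SET NOCOUNT ON;  -- terminated here, so the WITH below can only start a CTE

WITH Numbers AS (
    SELECT 1 AS N
)
SELECT N FROM Numbers;
```

Either style works. The leading ;WITH is simply a defensive habit for when you can't be sure the preceding statement was terminated.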

Obviously, with this method the table will keep growing, so we'll need to clean up dequeued rows at some point.
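One possible cleanup approach is a periodic job that removes old, already-dequeued rows in small batches. This is only a sketch: the 7-day retention window and the batch size of 1000 are assumptions of mine, not recommendations.

```sql
-- Assumption: rows dequeued more than 7 days ago are safe to remove.
DECLARE @Cutoff DATETIME = DATEADD(DAY, -7, GETDATE());
DECLARE @BatchSize INT = 1000;   -- assumed batch size; tune for your workload
DECLARE @Deleted INT = 1;

-- Delete in batches so each statement holds its locks only briefly.
WHILE @Deleted > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM QueueForUpdateStrategy
    WHERE IsDequeued = 1
      AND UpdateDateTime < @Cutoff;

    SET @Deleted = @@ROWCOUNT;
END
```

Small batches should keep the cleanup from blocking enqueue and dequeue calls for long, at the cost of more round trips through the loop.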

Sub-Query

If we didn't have CTEs, we might have considered using a sub-query, such as:

UPDATE QueueForUpdateStrategy SET IsDequeued = 1, UpdateDateTime = GETDATE()
OUTPUT DELETED.Payload, DELETED.CreateDateTime, DELETED.UpdateDateTime, INSERTED.UpdateDateTime
WHERE Id = (
    SELECT TOP (1) Id FROM QueueForUpdateStrategy
    WHERE IsDequeued = 0
    ORDER BY Id
)

[Figure: execution plans of the CTE and sub-query versions compared]

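Looks reasonable, but is it safe under concurrency? The statement itself is atomic, but the inner SELECT has no locking hints, so I can't rule out two sessions picking the same Id before either update lands. It will probably behave in most cases, but "probably" is not what we want from a queue.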
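If the sub-query form is the only option, one way to make the locking intent explicit is to put hints on the inner SELECT. This is an untested sketch based on my understanding of the UPDLOCK and READPAST table hints, not something taken from the article:

```sql
UPDATE QueueForUpdateStrategy SET IsDequeued = 1, UpdateDateTime = GETDATE()
OUTPUT DELETED.Payload, DELETED.CreateDateTime, INSERTED.UpdateDateTime
WHERE Id = (
    -- UPDLOCK: take an update lock on the chosen row while reading it.
    -- READPAST: skip rows that another session has already locked.
    SELECT TOP (1) Id
    FROM QueueForUpdateStrategy WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE IsDequeued = 0
    ORDER BY Id
)
```

The idea is that a second session running the same statement should skip the locked row rather than read it, but I would still test this under real concurrent load before trusting it.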

Verifying Locks Obtained

I tried to look into what locks are obtained, and when, for the Common Table Expression vs. the sub-query, but with limited time and my limited SQL Server knowledge, I couldn't really confirm anything. The estimated execution plan does not show what locks are used, though it looks like there are other ways to extract that information. A lot of articles and techniques focus on finding out why a process is slow because it is waiting for a lock, which is more relevant for production environments. I also tried running SQL Trace with all of the lock events turned on, but didn't have enough time to dig through the trace results.
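One of those other ways, as far as I can tell, is to run the dequeue inside an open transaction and query the sys.dm_tran_locks dynamic management view before rolling back. A rough sketch, assuming the usp_Dequeue procedure above and an account with permission to read the view (typically VIEW SERVER STATE):

```sql
BEGIN TRANSACTION;

EXEC usp_Dequeue;

-- Locks still held by this session; they are released at ROLLBACK/COMMIT.
SELECT resource_type,
       resource_description,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
  AND resource_database_id = DB_ID();

ROLLBACK TRANSACTION;  -- undo the dequeue so the test is repeatable
```

This only shows the locks that survive until the end of the statement (for example, exclusive locks on the updated row), not short-lived locks taken and released while the statement runs, so it is a partial picture at best.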

SQL Server has built-in Queue!

Note that SQL Server already has a built-in queue object, but it is part of SQL Server Service Broker, which is designed for messaging between services and databases through the Database Engine rather than as a general-purpose queue table. See the article for more information.

Message Queueing

Depending on requirements, it might be better to use a dedicated message queuing service, such as RabbitMQ, Microsoft Message Queuing (MSMQ), IBM WebSphere MQ, etc.

Why do it yourself?

Better yet, just use a managed cloud queue service, such as Amazon SQS.