July 8, 2018

Login as Service Principal (Application) in Azure Active Directory with JWT

Here's a quick post about how to log in as a Service Principal in Azure for a non-interactive login scenario.

Say you have an API service, protected by Azure Active Directory (AAD). It's being used by a web application, and it works fine with an interactive login – for users in the AD.

Now you're building a headless application, such as a long-running service, and it also needs to access the APIs. You could create a regular user in AD and use it from the application with username/password authentication, but you know that's not a clean solution. Azure must have something, and it does, in the form of a Service Principal. It's also known as an Application: when you create a service principal, you're actually registering an application. Like many things in Azure, they seem to love confusing terms.

More info about registering an application can be found in Microsoft's documentation.

Benefits

One benefit of using a service principal is that you can create multiple keys (passwords) and use them from your application. This can be useful if you want to control access to the APIs by, say, different clients of the application.

How many keys would Azure let you create? It doesn't seem to be documented anywhere...

Also, you may not have permission to create a user in the Active Directory (perhaps it's managed by a different team in your company), but you can register an application (as long as the App Registrations option is set in the Active Directory).

High-level Diagram

Here's a quick diagram that I created to illustrate the flow:

Note: Steps 4 and 5 are cached by Azure SDK.

Sample Code

Simple but functional demo code can be found in my GitHub repository.

On the Wire

Here's what the request looks like for getting the token from AAD from the client application:

TO URL: https://login.microsoftonline.com/

POST /[YourADTenantName].onmicrosoft.com/oauth2/token HTTP/1.1
Host: login.microsoftonline.com
Content-Type: application/x-www-form-urlencoded
Cache-Control: no-cache

client_id=[YourServicePrincipalID]&client_secret=[OneOfTheSecretKeys]&grant_type=client_credentials&resource=[YourServiceAppId or URI]
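In a real application you'd likely use the Azure SDK (ADAL/MSAL) rather than crafting the request by hand, but as a minimal sketch, here's how the request above could be constructed with just the Python standard library (the tenant name, IDs, and resource URL are placeholders):

```python
from urllib.parse import urlencode

# Placeholder values -- substitute your own AD tenant, service principal ID,
# secret key, and target resource.
TENANT = "contoso.onmicrosoft.com"
TOKEN_URL = f"https://login.microsoftonline.com/{TENANT}/oauth2/token"

def build_token_request(client_id, client_secret, resource):
    """Build the URL and x-www-form-urlencoded body for the
    client_credentials grant shown above."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "resource": resource,
    })
    return TOKEN_URL, body

url, body = build_token_request("my-sp-id", "one-of-the-keys",
                                "https://my-api.example.com")
```

POSTing that body with a `Content-Type: application/x-www-form-urlencoded` header returns a JSON response containing the bearer token.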

Azure Management API

Service Principals can access Azure resources and can be assigned roles as well. One scenario where this is useful is monitoring Azure metrics. A sample can be found in Azure's official repository.

Can you add the Service Principal to AAD Group?

Yes, but at the time of writing, not via the Portal. That should give more flexibility when setting up authorization policies, as you can use group-based claims.

June 18, 2018

SQL Server Backup, Transaction Log, and Checkpoint

Should be simple, but it's not...

So, in order to understand SQL Server backups, there are some things to know beforehand, such as the Transaction Log, recovery models, checkpoints, MinLSN, etc.

Recovery Model

I'll point this out first since it's very important: if the database is set to the FULL recovery model, you must run transaction log backups to clear out the log, otherwise it will keep growing! With SIMPLE, the log gets truncated automatically after a checkpoint. SQL Server Express has the SIMPLE recovery model by default; other editions, such as Standard, have FULL as the default.

Which recovery model and backup strategy you choose will depend on your RPO (Recovery Point Objective) and RTO (Recovery Time Objective).

Here are different types of recovery models:

  • Simple: Use for simple cases, where you can avoid additional management of the database, such as the transaction log files.
  • Full: Most flexible at the expense of additional management overhead. Gives you point-in-time recovery. You must perform transaction log backups to clear out the space, otherwise it will keep growing.
  • Bulk-logged: Similar to Full, but bulk operations are minimally logged.

The recovery model is not set as part of CREATE DATABASE. It must be changed with an ALTER DATABASE statement after the database has been created. To change the default for new databases, update the model database.
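For example, a sketch of changing the recovery model (MyDb is a placeholder database name):

```sql
-- Switch the hypothetical database MyDb to FULL recovery.
ALTER DATABASE MyDb SET RECOVERY FULL;

-- Or back to SIMPLE (remember: FULL requires transaction log backups).
ALTER DATABASE MyDb SET RECOVERY SIMPLE;
```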

To view the current recovery models of the databases, use the following:

SELECT name, recovery_model_desc  
FROM sys.databases  

What is Transaction Log?

(Imagine if you were writing your own database server. How would you maintain ACID properties? Transaction Log is one way to accomplish that.)

BEGIN TRANSACTION?

No, it's not just a log of when you issue BEGIN TRANSACTION. It's a log of the records SQL Server needs in order to maintain the ACID (Atomic, Consistent, Isolated, Durable) properties. This includes implicit transactions.

Informational log file that we can use when we need to debug?

No, it should not be confused with the typical log files an application creates. The Transaction Log is really part of the SQL data, not ancillary information. It's not merely a description of what's been done. For example, our application code might write to a log file something like "INFO: Wrote 1 record to Person table", but SQL Server's log contains the actual data needed to recover the database, more like "Wrote 'Riker' to the last name column of the Person table where row ID = 1, when it was 'Picard' before."

Write-Ahead Transaction Log

The Transaction Log is implemented using write-ahead logging. This means the log is always written to disk before the actual data.
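To make the idea concrete, here's a toy sketch of write-ahead logging in Python (everything here is invented for illustration; real SQL Server pages and log records look nothing like this):

```python
import os
import tempfile

class ToyWal:
    """Toy write-ahead log: the record describing a change is forced to
    disk before the data file itself is modified."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.txt")

    def update(self, old, new):
        # 1. Append the before/after record to the log and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(f"UPDATE old={old!r} new={new!r}\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then modify the data file. If we crash between steps 1
        #    and 2, the log still holds everything needed to redo (or undo)
        #    the change during recovery.
        with open(self.data_path, "w") as data:
            data.write(new)

with tempfile.TemporaryDirectory() as d:
    wal = ToyWal(d)
    wal.update(old="Picard", new="Riker")
    log_contents = open(wal.log_path).read()
    data_contents = open(wal.data_path).read()
```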

Rollback

How do you think SQL Server does a rollback of a transaction? Transaction Log. It's also used for other recovery scenarios. The official documentation has a good overview.

Tail-Log Backups

A tail-log is the Transaction Log data that hasn't been backed up since the last backup. What's interesting is that it can be backed up even if the database fails to start, or even when the database is damaged. This goes to show again that the Transaction Log is as important as the actual data, if not more so.

Checkpoints and Recovery Interval

Checkpoint: writes data held in memory to disk, bringing the data files up to date with the transaction log. There are several different types of checkpoints you can configure.

The recovery interval specifies the time it should take to bring a SQL Server database online after a system crash. There are two places where it can be set, and several types, but generally it's about a minute. It's not guaranteed: say you had a transaction running for two hours (maybe updating millions of rows) when the server crashed. After a reboot, SQL Server needs to undo that transaction before making the database available, which would most likely take more than a minute.
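As a sketch, the two places it can be set (MyDb is a placeholder database name):

```sql
-- Server-wide setting (an advanced sp_configure option), in minutes:
EXEC sp_configure 'recovery interval (min)', 1;
RECONFIGURE;

-- Per-database target for indirect checkpoints, in seconds:
ALTER DATABASE MyDb SET TARGET_RECOVERY_TIME = 60 SECONDS;
```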

As per documentation, Checkpoint performs:

  • Writes to Transaction Log about the checkpoint, such as that it is starting, and the MinLSN (see below).
  • If the database is using the SIMPLE recovery model, marks the log space before the MinLSN for reuse.
  • Writes all dirty log and data pages to disk (note that log is written to disk first, before data).
  • Writes to Transaction Log that the checkpoint has ended.
  • Writes the LSN of the start of this change to the database boot page.

MinLSN

Each record in the Transaction Log has a Log Sequence Number (LSN). MinLSN is the Minimum Recovery LSN. It is the minimum of:

  • The LSN of the start of the checkpoint.
  • The LSN of the start of the oldest active transaction.
  • The LSN of the start of the oldest replication transaction that has not yet been delivered to the distribution database.

SQL Backup Files

With the complexities of backups, SQL Server has features to support and manage multiple devices and options (such as striped, mirrored) for storing the actual backup data.

At the highest level is the Media Set. Each Media Set can contain multiple Backup Sets, which can be from the same or different databases. Backup Sets are usually appended to the Media Set, but they can also overwrite existing ones.

Is that all?

No, there are many other topics related to backups, such as Log Shipping, Always On Availability Groups, Mirroring, Replication, Clusters, Snapshots, Azure Blob backups, etc.

Is the database usable during the backup?

This is one last thing I was curious about, and yes, the database is usable while a backup is being performed, though performance may suffer, and you can't perform database file-related operations.


June 2, 2018

C# Null-Conditional and IL

So I was writing some code like the following:

if (service != null)
{
    service.Close();
}

Then I remembered I can use the null-conditional operator (yes, I know it's been around since C# 6.0; old habits are hard to break):

service?.Close();

Clean and simple, but I wondered, is it really the same? So I fired up sharplab.io, and confirmed it's the same...or is it?

Let's go one step further. I fired up dotnetfiddle.net and checked the disassembled IL (my comments on the IL below were added after skimming the Getting Started with IL documentation and some other sources).

Using null-conditional:

...
05. public static void Main()
06. {
07.     Service s = new Service();
08. 
09.     s?.Close();
10. }
...

IL:

...
  .method public hidebysig static void  Main() cil managed
  {
    // 
    .maxstack  1
    .locals init ([0] class Service s) // Create local variable '0' of type Service.
    .language '{3F5162F8-07C6-11D3-9053-00C04FA302A1}', '{994B45C4-E6E9-11D2-903F-00C04FA302A1}', '{5A869D0B-6611-11D3-BD2A-0000F80849BD}'
    .line 6,6 : 2,3 ''
    IL_0000:  nop
    .line 7,7 : 3,29 ''
    IL_0001:  newobj     instance void Service::.ctor() // Create new object, reference pushed to stack.
    IL_0006:  stloc.0    // Store to Local: Pop stack into local variable '0'.
    .line 9,9 : 9,20 ''
    IL_0007:  ldloc.0    // Load from Local: Push local variable '0' to stack.
    IL_0008:  brtrue.s   IL_000c  // If top of the stack is not zero, branch to IL_000c, pop stack.

    IL_000a:  br.s       IL_0013  // Branch to IL_0013

    IL_000c:  ldloc.0    // Load from Local: Push local variable '0' to stack.
    IL_000d:  call       instance void Service::Close()
    IL_0012:  nop
    .line 10,10 : 2,3 ''
    IL_0013:  ret        // Return.
  } // end of method Program::Main
...

Using if statement:

...
05. public static void Main()
06. {
07.     Service s = new Service();
08. 
09.     if (s != null)
10.     {
11.         s.Close();
12.     }
13. }
...

IL:

...
  .method public hidebysig static void  Main() cil managed
  {
    // 
    .maxstack  2
    .locals init ([0] class Service s, // Create local variable '0' of type Service.
             [1] bool V_1)
    .language '{3F5162F8-07C6-11D3-9053-00C04FA302A1}', '{994B45C4-E6E9-11D2-903F-00C04FA302A1}', '{5A869D0B-6611-11D3-BD2A-0000F80849BD}'
    .line 6,6 : 2,3 ''
    IL_0000:  nop
    .line 7,7 : 3,29 ''
    IL_0001:  newobj     instance void Service::.ctor() // Create new object, reference pushed to stack.
    IL_0006:  stloc.0    // Store to Local: Pop stack into local variable '0'.
    .line 9,9 : 9,23 ''
    IL_0007:  ldloc.0    // Load from Local: Push local variable '0' to stack.
    IL_0008:  ldnull     // Push null to stack.
    IL_0009:  cgt.un     // Pop two items.  If first (reference to service) is greater
                         //   than second (null), then push 1 to stack, otherwise push 0 to stack.
    IL_000b:  stloc.1    // Pop stack into local variable '1'
    .line 16707566,16707566 : 0,0 ''
    IL_000c:  ldloc.1    // Push local variable '1' to stack.
    IL_000d:  brfalse.s  IL_0018  // If top of the stack is 0, then branch to IL_0018

    .line 10,10 : 3,4 ''
    IL_000f:  nop
    .line 11,11 : 10,20 ''
    IL_0010:  ldloc.0    // Load from Local: Push local variable '0' to stack.
    IL_0011:  callvirt   instance void Service::Close()
    IL_0016:  nop
    .line 12,12 : 3,4 ''
    IL_0017:  nop
    .line 13,13 : 2,3 ''
    IL_0018:  ret        // Return.
  } // end of method Program::Main
...

So, even though Roslyn says they are the same, there are some differences at the IL level. I'm no expert at IL, but it looks like the null-conditional version has some optimizations: it doesn't actually compare the object to null; instead it branches directly on whether the object reference ("pointer") is non-zero. (Note also that the null-conditional version uses call, while the if version uses callvirt.)

Does this mean that the Roslyn result from sharplab.io is incorrect? Hmm... One more thing to try: I created a DLL of the code instead of using the online tools, and used dotPeek to see how it reconstructs the C# code from the compiled DLL. The result is, as I suspected, the same as what the IL is showing, i.e., it knows about the null-conditional operator.

I suppose I could investigate further on how sharplab.io is built or how Roslyn works under the hood, but good enough for now.

One additional note: just because it's doing a null check doesn't mean it's completely safe. For some objects, calling a method such as Close() will throw an exception if the object has already been disposed.

May 17, 2018

Implementing a Queue using SQL Server

Sometimes an application needs a queue backed by a data store (e.g., not a transient in-memory data structure). One such scenario might come up if two applications running on different servers need to coordinate with each other via the queue. If the applications are using a common database already, an obvious choice might be to use a database table as the data store for the queue. Intuitively, this sounds simple, and it mostly is, but whenever multiple processes are trying to update and read from the same resource, caution needs to be taken, such as avoiding deadlock issues and ensuring items are dequeued only once.

While researching this topic, I came across an old but excellent article by Remus Rusanu, which this post is based on.

No duplicates, please

One thing we want to make sure of is that when we dequeue an item, it's dequeued only once. We don't want two separate processes to dequeue the same item (unless, of course, that's a desired requirement). We could use traditional transactions or subqueries, but with newer versions of SQL Server (2008 and up), we can use a Common Table Expression with the OUTPUT clause. The gist of it is that the dequeue operation is done as a single atomic unit. The execution plan is also simpler compared to the one using a subquery.

Delete or Update?

The example in the article deletes the row from the queue when dequeuing. If you need to update instead of delete, here's an example:

-- Create a queue table, where each item will be updated instead of being deleted after a dequeue operation.
CREATE TABLE QueueForUpdateStrategy (
    Id BIGINT NOT NULL IDENTITY(1,1),
    CreateDateTime DATETIME NOT NULL DEFAULT GETDATE(),
    UpdateDateTime DATETIME NOT NULL DEFAULT GETDATE(),
    Payload NVARCHAR(500),
    IsDequeued BIT NOT NULL DEFAULT 0 -- Oh, SQL Server/TSQL, why no love for boolean?
);
GO

CREATE CLUSTERED INDEX cdxQueueForUpdateStrategy ON QueueForUpdateStrategy (Id);
GO


-- Stored procedure to enqueue an item
CREATE PROCEDURE usp_Enqueue
    @Payload NVARCHAR(500)
AS
    SET NOCOUNT ON
    INSERT QueueForUpdateStrategy (Payload) VALUES (@Payload)
GO

-- Stored procedure to dequeue an item, using UPDATE.
CREATE PROCEDURE usp_Dequeue
AS
    SET NOCOUNT ON
    ;WITH QueueCTE AS (
        SELECT Top(1) *
        FROM QueueForUpdateStrategy WITH (ROWLOCK, READPAST)
        WHERE IsDequeued = 0
        ORDER BY ID
    )
    UPDATE QueueCTE SET IsDequeued = 1, UpdateDateTime = GetDate()
    OUTPUT
        DELETED.Payload,
        DELETED.CreateDateTime,
        INSERTED.UpdateDateTime
GO



-- Insert some items to the queue
EXEC usp_Enqueue 'Test A'
EXEC usp_Enqueue 'Test B'
EXEC usp_Enqueue 'Test C'


-- Dequeue an item
EXEC usp_Dequeue

Curious why there's a ; before the WITH? T-SQL has other uses for the WITH keyword, such as when specifying hints, so the semicolon ensures that the prior statement is terminated and avoids unnecessary errors.

Obviously, with this method the table will keep growing, so we'll need to clean up at some point...
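One possible cleanup approach (a sketch using the table from the example above; the one-day retention period is arbitrary):

```sql
-- Periodically purge rows that were dequeued more than a day ago.
DELETE FROM QueueForUpdateStrategy
WHERE IsDequeued = 1
  AND UpdateDateTime < DATEADD(DAY, -1, GETDATE());
```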

Sub-Query

If we didn't have CTE, we might have considered using a sub-query, such as:

UPDATE QueueForUpdateStrategy SET IsDequeued = 1, UpdateDateTime = GETDATE()
OUTPUT DELETED.Payload, DELETED.CreateDateTime, DELETED.UpdateDateTime, INSERTED.UpdateDateTime
WHERE ID = (
    SELECT Top(1) ID FROM QueueForUpdateStrategy
    WHERE IsDequeued = 0
    ORDER BY ID
)

Execution plan compared:

Looks reasonable, but is it atomic? Probably, in most cases.

Verifying Locks Obtained

I tried to look into what, when, and how locks are obtained for the Common Table Expression vs. the subquery, but with limited time and the limits of my SQL Server knowledge, I couldn't really confirm. The estimated execution plan doesn't show which locks are used, though it looks like there are other ways to extract that information. Most articles and techniques focus on finding out why a process is taking a long time waiting for a lock, which is more relevant for production environments. I also tried running SQL Trace with all of the lock events turned on, but didn't have enough time to analyze the trace results.

SQL Server has built-in Queue!

Note that SQL Server already has a built-in queue concept, but it's intended to be used by the SQL Server Service Broker, and only applies to the specific use case of using Database Engine components to communicate between separate databases. See the article for more information.

Message Queueing

Depending on requirements, it might be better to use a dedicated message queuing service, such as RabbitMQ, Microsoft Message Queuing (MSMQ), IBM WebSphere MQ, etc.

Why do it yourself?

Better yet, just use cloud PaaS, such as Amazon SQS.

March 18, 2018

HTML Attributes vs. DOM Properties

Found a good summary of HTML Attributes vs. DOM Properties in Angular documentation:

Attributes are defined by HTML. Properties are defined by the DOM (Document Object Model).
  • A few HTML attributes have 1:1 mapping to properties. id is one example.
  • Some HTML attributes don't have corresponding properties. colspan is one example.
  • Some DOM properties don't have corresponding attributes. textContent is one example.
  • Many HTML attributes appear to map to properties ... but not in the way you might think! [such as disabled attribute]

Some more info on MDN's Content vs. IDL Attributes, and W3C Web IDL spec.

Additional notes:

  • Attributes are initial values. I wonder if that's how HTML form's reset functionality was implemented.
  • Aria attributes (used for accessibility) do not have corresponding properties.
  • The disabled attribute doesn't take a value; its mere presence disables the element. The corresponding DOM property, however, has a boolean value.
  • JavaScript/DOM has methods to specifically work on attributes, e.g., getAttribute().

March 17, 2018

HTML Specification

Why in the world are there two HTML specs?

This Wired article has some information on it. Basically, it seems WHATWG will be the new stuff that browsers add, then trickle down to W3C as a snapshot specification.

March 15, 2018

IDE Fonts

Courier New, anyone?

When I started coding in Windows, I liked the font in Turbo C++ a lot. Then when Andale Mono came out, I switched to that in VS from the default. Then to Consolas when that came out. Then I used Input Font, which is not a monospaced font. I could cram more text on the screen and it was still pretty and legible. I liked it except when I had to line up multi-line assignments. It did have a little quirk when installing on Windows; it's documented on their site.

Then, as I was learning React, I noticed a peculiar font that had italics for attribute names. Dan Abramov was using it. Later, I noticed John Papa also using it in VS Code for Angular. The font is called Operator Mono. So I googled it and found it to be a bit pricey... But I found an alternative that accomplishes a similar effect for free: the Fira Code and Script12 BT fonts. Fira Code also has ligatures for common programming operators, though your editor needs to support ligatures (VS Code and Atom both do).

Through this GitHub discussion, I found a font that combines Fira Code and Script12. So I installed that, along with a theme that supports italics. Here's what it looks like in my VS Code:

Not sure if I actually like the italics; I'll have to try it out for a bit.

Epilogue

I don't remember the name of the font used in Turbo C++ for Windows, so I tried to search for it. Alas, Google couldn't find it. The closest I came is OCR-A, which seems similar enough; perhaps it's even the font that was used, though being purpose-made for OCR, probably not...

March 7, 2018

npm Install

I've been using npm without giving much thought... Here's a quick post on a couple of options that I wanted to know more about.

npm install [package name] -g (or --global)

  • Installs a package in global mode, not under your local project/package. On Windows, that's %APPDATA%\npm (e.g., C:\Users\[Username]\AppData\Roaming\npm). Note that this is not where Node.js and npm are installed; that's usually C:\Program Files\nodejs.
  • It's mainly for packages with executables, which become available from everywhere, since the above directory is added to the PATH when you install Node.js (with default options).
  • It does not modify package.json of your current project.

An example – let's install Angular CLI:

  • npm install @angular/cli -g

After running the command, in the directory mentioned above:

  • ng (for bash) and ng.cmd (for Windows) will be added, so you can run ng from any command line window.
  • node_modules\@angular will be added.

npm install [package name] --save-dev (or -D)

The package will appear under devDependencies. This means that when you run npm install in production mode, the package will not be installed.

devDependencies are things you need during development and not necessarily in production, such as a unit testing framework.
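For illustration, here's roughly what the two sections look like in package.json (the package names and versions are just examples):

```json
{
  "dependencies": {
    "express": "^4.16.0"
  },
  "devDependencies": {
    "mocha": "^5.2.0"
  }
}
```

With the above, `npm install --production` would install express but skip mocha.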

You might come across npm install --save. This flag has been deprecated, since it's the default behavior now: npm adds the package under the dependencies section in addition to installing it under node_modules.

How do you run npm install in production mode?

  • npm install --production
  • If NODE_ENV is set to production, then simply npm install
  • To force it regardless of the NODE_ENV variable, npm install --only=prod

You can't have a package in both devDependencies and dependencies.

Production mode matters more for server-side apps, i.e., those using Node.js. For front-end code, if you're just publishing the files after running them through a bundler such as webpack, it should still be okay to use them as intended, though this may need further investigation...

Some other tidbits

  • If you run npm install, npm will traverse up to find the root of your project, i.e., the directory with package.json, and install under that directory's node_modules accordingly. If package.json is not found, it will just use the current directory as the root.
  • In npm lingo, prefix is the root directory of your project, or the global npm directory if -g is used.

Screenshot of Node.js installation step where you specify that npm global directory should be added to the path.

See also: npm docs - npm-folders, npm-install.

February 27, 2018

Touch Command on Windows

Windows still doesn't include a touch command... A quick Google search brings up several StackOverflow posts, and PowerShell seems to be the best option. With Microsoft now letting you run Linux natively on Windows, perhaps it will never be included in Windows itself.

Examples:

PS C:\> $(Get-Item ".\UseCurrentDateTime.txt").LastWriteTime=$(Get-Date)
PS C:\> $(Get-Item ".\UseSpecificDateTime.txt").LastWriteTime=$(Get-Date "2/26/2018 9:00 PM")

Use the tab key for auto-completion.

Other properties: CreationTime, LastAccessTime
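The one-liners above assume the file already exists. Here's a sketch of a small PowerShell function (the name Touch-File is my own invention) that also creates the file when it's missing, similar to what touch does:

```powershell
function Touch-File {
    param([string]$Path)
    if (Test-Path $Path) {
        # File exists: just bump the last-write timestamp.
        (Get-Item $Path).LastWriteTime = Get-Date
    } else {
        # File missing: create an empty one.
        New-Item -ItemType File -Path $Path | Out-Null
    }
}

Touch-File ".\Example.txt"
```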