December 15, 2020

Getting Repository Size in GitHub Enterprise

For some reason, GitHub doesn't make it easy to see the repository size on its website. For personal accounts, you can go to Settings (no, not the Repository Settings, but your user settings) then click on Repositories from the left navigation menu. This will list out the repositories and their corresponding sizes.

For GitHub Enterprise, there doesn't seem to be a similar list of repositories with their sizes. One way to get the repository size is to use the GitHub Repository API. First, unless your repository is public, create a Personal Access Token (PAT) with repo permission that will be used for the API. Then, use Postman, Insomnia, curl, etc., to call the API for your repository using the following GET request format and the authorization header:

https://api.github.com/repos/[Organization Name]/[Repository Name]

authorization: token [Personal Access Token]
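
As a quick sketch, here's what that call might look like with curl, pulling out the size field from the JSON response (the organization, repository, and token values below are placeholders):

# Hypothetical values – substitute your own org, repo, and PAT.
curl -s -H "authorization: token ghp_XXXXXXXXXXXX" https://api.github.com/repos/my-org/my-repo | grep '"size"'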

Here's an example from one of my repositories – note the size property:

Comparing the response with my repository list, it seems the size is reported in KB.

If you're allowed to install Chrome Extensions, you can also try github-repo-size.

What about LFS? Well, you can look at it at the organization level as a whole in the Billing setting, but at the repository level, you're out of luck, though there's an open feature request for it.

Of course, there are other ways to get repository and LFS sizes after you clone and such, but it would've been nice if GitHub made it easier to see them in the first place. Here's Bitbucket, on the main page for the repository settings:

(Note that there are two hosting options for GitHub Enterprise: cloud and on-premises. This post is about the cloud-hosted plan.)

December 10, 2020

Remap Caps Lock Key as Backspace Key

Here's a quick trick that I've been using for a while now to alleviate my wrist pain (which I suppose is an occupational hazard) – remapping the Caps Lock key as the Backspace key so that I don't have to extend my pinky finger too far. It may also be more efficient, since it's quicker to press the Caps Lock key than the Backspace key.

I've been using remapkey.exe from Windows Resource Kit 2003 (which still works on Windows 10), but sadly it's no longer available for download from Microsoft.

There are many tools to remap keys on Windows. The recent resurrection of PowerToys now includes Keyboard Manager that allows remapping keys, so that might be a good choice to try.
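
Another option, if you'd rather not install anything, is the Scancode Map registry value that Windows itself honors. Here's a rough sketch based on my assumption that Caps Lock is scancode 0x3A and Backspace is 0x0E; run it from an elevated shell and reboot for it to take effect:

# 8 zero bytes (header), entry count (2), one entry mapping 0x003A -> 0x000E, then a 4-byte terminator.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layout" /v "Scancode Map" /t REG_BINARY /d 0000000000000000020000000e003a0000000000 /f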

September 12, 2020

Git Bash Prompt Showing Date and Time on Windows

Sometimes I have Git Bash open for several days, and knowing when I executed a particular command is helpful. Here's a quick-and-dirty way of showing the date and time in the bash prompt, which came about from looking at C:\Program Files\Git\etc\profile.d\git-prompt.sh and doing echo $PS1.

  • Copy the following content into a text editor and save it as .profile in the %USERPROFILE% folder:
PS1="\[\033]0;$TITLEPREFIX:$PWD\007\]\n\[\033[94m\]\D{%m/%d %H:%M:%S} \[\033[32m\]\u@\h \[\033[35m\]$MSYSTEM \[\033[33m\]\w\[\033[36m\]"'`__git_ps1`'"\[\033[0m\]\n$ "
MSYS2_PS1="$PS1"
  • Run source .profile to reload the prompt.

For the full list of configurable colors, refer to ANSI Color Escape Codes.
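
To preview a color before putting it into the prompt, you can echo the escape sequence directly in Git Bash, e.g.:

# 94 is the bright blue foreground; \033[0m resets back to the default color.
echo -e "\033[94mbright blue\033[0m default"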

So how did I know to create the .profile file? There's a .bash_profile file in the %USERPROFILE% folder:

# generated by Git for Windows
test -f ~/.profile && . ~/.profile
test -f ~/.bashrc && . ~/.bashrc

Windows Terminal

Lately, I've been trying out Windows Terminal with oh-my-posh for PowerShell (based on the post by Scott Hanselman). You can also host cmd, bash (via wsl), etc., in Windows Terminal... I fear my days of using plain cmd and Git Bash on Windows will come to an end soon, though I'll surely miss the startup time of cmd.

August 23, 2020

SVN to Git Migration

Recently, we decided to migrate one of our "legacy" products from an SVN repository to git, hosted on Bitbucket. The full history had to be maintained. While researching this topic, I was surprised to find out that git has built-in support for SVN!

Steps

Note that I'm only migrating the trunk in this post, but you can also migrate the full SVN structure including branches and tags.

  1. Get the list of users that have committed to SVN (from PowerShell): PS C:\MySVNRepo> svn log --quiet | ? { $_ -notlike '-*' } | % { "{0} = {0} <{0}>" -f ($_ -split ' \| ')[1] } | Select-Object -Unique | Out-File 'authors-transform.txt' -Encoding utf8
  2. Open the authors-transform.txt generated above and add the email address of each user. This step is actually optional, but since git includes email addresses in commits, it will help map the users correctly when the repo is pushed to the remote.
  3. "Clone" the SVN repo as a git repo using the git svn clone command, e.g. in bash: git svn clone http://<svn repo URL>/trunk --prefix=svn/ --no-metadata --authors-file "authors-transform.txt" /c/repos/migrated-repo --username <svn user name>. (It will prompt for the SVN password.)
  4. Create an empty repository on the git server, such as Azure Repos/Bitbucket/GitHub, and get the clone URL.
  5. Add the clone URL as origin in the git repo: git remote add origin <clone URL>
  6. Push to the git server: git push -u origin master

If you plan to use LFS, you can run LFS migrate between steps 3 and 4 (see the sketch below).
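
As a sketch, assuming the large files are DLLs and EXEs (adjust the --include patterns to your repo), the migrate command would look something like this, run inside the cloned repo:

# Rewrite history so the matching files are stored as LFS pointers instead of blobs.
git lfs migrate import --include="*.dll, *.exe"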

Migration..?

Now, as you might know already, when you hear the word "migration", it never goes smoothly. Below are some of the issues that I had to deal with.

Git 2.27.0

At the time, I was using git 2.27.0, and of course, git svn is broken in that release. I used the 2.28 RC since 2.28 wasn't officially out yet; I suppose I could've downgraded as well.

Author Not Defined

In the middle of the process, it kicked out with a message, Author: [user] not defined in authors-transform.txt file. The [user] was the first user listed in the file. I had used the -Encoding utf8 option in the PowerShell command, and git didn't like the BOM. So I opened the file in Notepad++, converted it to UTF-8 without BOM, and that did the trick.

Hanging, Timeouts and Other Errors

The SVN repo had a lot of revisions, riddled with large binaries. After getting some revisions, it hung – no activity for a while. In the Task Manager, perl was taking up about 50% of the CPU, and the folder was not growing. So after canceling the command, I cd'ed into the folder and ran git svn fetch --authors-file "../authors-transform.txt" as per this article, and that seemed to make it continue from where it left off – it detected that the last retrieved revision was not complete, and started over again from that revision. All of the errors below were resolved the same way:

  • Connection timed out: Connection timed out at C:/Program Files/Git/mingw64/share/perl5/Git/SVN/Ra.pm line 312
  • 1 [main] perl 44975 cygwin_exception: Dumping stack trace to perl.exe.stackdump
  • Failed to commit, invalid old:
  • Name or service not known at C:/Program Files/Git/mingw64/share/perl5/Git/SVN/Ra.pm line 312.
    (This can happen if your DNS goes down, yes, it happened.)

Checksum Mismatch

This was a tricky one, and it occurred on random files. After many trials, I was able to avoid this error by running the git svn clone on the SVN server machine itself, e.g.:

git svn clone file:///c/SVNData/MyProject/trunk --prefix=svn/ --no-metadata --authors-file "authors-transform.txt" /c/gitrepo/my-project

I really don't know what the root cause is – maybe our internal network is unstable, or the SVN web server (CollabNet) had some issues. It was a large SVN repo, though; it took about 24 hours to complete.

New Changes in SVN

For this migration, I didn't have to worry about two-way support, since we were planning to retire SVN after migrating to git. But I did have to get some new changes from SVN after the initial migration to git.

I used git svn fetch as above to get the latest SVN changes into git. That's not enough, though – I now have a master branch, so I needed to bring those changes into it. This SO answer seems incorrect – running git svn rebase -l gave the following error message after running for a while:

$ git svn rebase -l
Unable to determine upstream SVN information from working tree history

Simply doing git merge remotes/svn/git-svn worked.

LFS

If you run LFS migrate, you may no longer be able to run git svn fetch again and merge to master to bring it up to date:

$ git merge remotes/svn/git-svn
fatal: refusing to merge unrelated histories

LFS migrate rewrites commits, hence new hashes will be created for the commits, and git will think the master branch and the svn remote branch are completely separate since they won't share a common ancestor (remember, LFS migrate rewrites the very first commit to add the .gitattributes file). There are ways around it, such as specifying the --allow-unrelated-histories option, but it may get ugly since all commits are technically different, and git will warn you about merging binaries as well:

$ git merge remotes/svn/git-svn --allow-unrelated-histories
warning: Cannot merge binary files: Libraries/MyLibrary.dll (HEAD vs. remotes/svn/git-svn)
warning: Cannot merge binary files: Libraries/MyLibrary.exe (HEAD vs. remotes/svn/git-svn)
...
...

One way to handle this is to make backups before each step; then you can go back to the point right before you ran LFS migrate, run git svn fetch, merge to master, and run LFS migrate again, as sketched below.
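
The flow might look something like this (the paths are the placeholders from the steps above):

# Back up the repo right before running LFS migrate.
cp -r /c/repos/migrated-repo /c/repos/migrated-repo-backup

# Later, when new SVN changes need to be pulled in: restore the backup, fetch, merge, then migrate again.
rm -rf /c/repos/migrated-repo
cp -r /c/repos/migrated-repo-backup /c/repos/migrated-repo
cd /c/repos/migrated-repo
git svn fetch --authors-file "../authors-transform.txt"
git merge remotes/svn/git-svn
git lfs migrate import --include="*.dll, *.exe"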

August 18, 2020

More on Git LFS

This is a follow-up to the Git LFS Basics post – some additional notes on LFS.

Partial Clone

If storage space is not a concern (e.g., you're not up against Bitbucket's 2 GB hard limit), then partial clone and sparse checkout may replace the need for LFS. They are still in early stages, though.
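
For reference, here's roughly what those look like (the repo URL and folder names are placeholders):

# Partial clone: skip downloading file contents (blobs) until a checkout actually needs them.
git clone --filter=blob:none https://bitbucket.org/my-org/my-repo.git
cd my-repo

# Sparse checkout: only materialize the listed directories in the working tree.
git sparse-checkout init --cone
git sparse-checkout set src docs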

Case Sensitivity

The Windows file system is case-insensitive, but git is case-sensitive, so there may be problems when specifying the tracking pattern. According to this open issue (which was based on an older issue), you can use character-class patterns, e.g., git lfs migrate import --include="*.[dD][lL][lL], *.[eE][xX][eE], [Bb]in/". Specifying it as "*.dll, *.DLL" doesn't work as expected.

LFS File Size Report

LFS has a built-in feature that will go through the history and report on the file types and sizes.

$ git lfs migrate info
migrate: Fetching remote refs: ..., done.
migrate: Sorting commits: ..., done.
migrate: Examining commits: 100% (1681/1681), done.
*.dll   5.6 GB  2013/2013 files(s)      100%
*.exe   3.6 GB    546/546 files(s)      100%
*.dat   1.9 GB        2/2 files(s)      100%
*.zip   1.7 GB      16/16 files(s)      100%
*.war   595 MB      17/17 files(s)      100%

The thing is, it defaults to showing only the top five. To show more, use the --top option, e.g., git lfs migrate info --top=100

Create .gitattributes Before or After Migrate?

Per the migrate import documentation, it will create .gitattributes for you (except in certain cases, based on the options passed in). One thing it doesn't tell you is where it's added – it adds it to the very first commit in the history, i.e., it rewrites the very first commit (which is not surprising, since rewriting history is one of the main tasks of LFS migrate).

Find Lingering Large Files After the Migration

After migrating to LFS, if you find that the repo is still too large, you may want to run git lfs migrate info again, but it won't show any files, presumably because the migrated files are now just small pointer files. Instead, you can use git ls-tree to find large files in the repository. For example: git ls-tree -r -l --abbrev --full-name HEAD | sort -n -r -k 4 | head -n 10

Viewing Differences

There's no git lfs diff, and git log will show differences in the "pointer" files, not the content. But in some cases, it might be useful to be able to view the differences. One way to do it is by using an external diff tool, e.g., git difftool HEAD^ HEAD Document.pdf (assuming you have difftool already configured). Note that in Bitbucket, if you browse to a source file tracked by LFS, it won't let you view the differences in the UI even if it's a "text" file; it just allows you to download the file.
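
In case difftool isn't configured yet, it only takes a couple of git config entries; here's a sketch using VS Code as the external diff tool (swap in whatever tool you actually use):

# Register VS Code as the diff tool; $LOCAL and $REMOTE are filled in by git.
git config --global diff.tool vscode
git config --global difftool.vscode.cmd 'code --wait --diff "$LOCAL" "$REMOTE"'
git config --global difftool.prompt false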

July 19, 2020

Git LFS Basics

In most cases, it's not the best practice to store large files in git. It's designed to store source code, not large files. However, for some legacy projects and processes where you don't want to invest a lot of time and effort, or perhaps a game project with large assets, or an ML project with a large set of training data, or if you're using Bitbucket Cloud, which has a hard 2 GB repo size limit, it may be necessary to do so, and Git LFS allows you to "store" large files with whatever git workflow you're using, without git's inherent "inefficiencies" with large files.

Beware that if you decide to go with LFS, you'll lose some of the distributed nature of git, since the content of some files has been moved to the LFS server and is no longer part of the repo, and when you clone the repo, it will only pull down the files needed to check out the master branch. Be sure to have a backup policy in place.

Here's a high level animation of how Git LFS works:

As for what LFS does, the man page explains it pretty well:

$ git lfs
Git LFS is a system for managing and versioning large files in
association with a Git repository.  Instead of storing the large files
within the Git repository as blobs, Git LFS stores special "pointer
files" in the repository, while storing the actual file contents on a
Git LFS server.  The contents of the large file are downloaded
automatically when needed, for example when a Git branch containing
the large file is checked out.
...
...

Let's go over some of the basics of Git LFS by doing the same operation with, and without, LFS.

Installation

If you're on Windows, Git LFS has been bundled with Git for Windows since version 2.12.0. If you have an older version of git and can't upgrade for some reason, you'll need to install Git LFS separately first.
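
A quick way to check whether LFS is already available in your installation:

# Prints the bundled Git LFS version if it's on the PATH.
git lfs version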

Initialize and Configure Git LFS

Let's start from two brand new repos in the remote (such as Azure Repos, Bitbucket, GitHub), and clone them into local folders. In the first repo, we'll enable LFS. In the second repo, we'll leave it alone so that we can compare normal git with LFS.

In the first local repo, let's initialize and configure Git LFS:

$ git lfs install
Updated git hooks.
Git LFS initialized.

As you can see from the message, it uses git hooks to perform LFS operations. It modifies 4 hook files – post-checkout, post-commit, post-merge, and pre-push. Here's what pre-push looks like:

#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || { echo >&2 "\nThis repository is configured for Git LFS but 'git-lfs' was not found on your path. If you no longer wish to use Git LFS, remove this hook by deleting .git/hooks/pre-push.\n"; exit 2; }
git lfs pre-push "$@"

Next, let's assume we have large DLL files that we need to keep in our repo, so we'll tell LFS to handle, or "track", DLL files.

$ git lfs track '*.dll'
Tracking "*.dll"

(Note that we're using quotes around *.dll to prevent the shell from expanding the actual files matching the pattern.)

The above will create .gitattributes with the following content:

$ cat .gitattributes
*.dll filter=lfs diff=lfs merge=lfs -text

According to Git LFS documentation, the pattern follows gitignore rules.

Commit and push the changes we made above.

Adding a Large File

Now, let's add a DLL to both repos, say a DLL called MyLibrary.dll which is about 1.6 MB. At this point, the two repos are about the same size, except for a few extra bytes to account for files such as .gitattributes.

                 With LFS           Without LFS
Folder Size      ~1.6 MB            ~1.6 MB
count-objects    count: 6           count: 3
                 size: 861 bytes    size: 540 bytes

(The Folder Size is from Windows Explorer, and count-objects is from the git count-objects -vH command.)

Stage the DLL file with git add.

                 With LFS           Without LFS
Folder Size      ~3.22 MB           ~2.37 MB
count-objects    count: 7           count: 4
                 size: 986 bytes    size: 774.87 KiB

Now things are getting interesting...

  • So why is the LFS repo bigger? Git compresses objects, but LFS doesn't. An issue was opened to address this in 2015, but there's still no implementation; it's still on their roadmap. (In a way it makes sense – if you have large files, they may already be compressed, so you might not want to waste time and resources compressing them again, which may even yield a bigger file; perhaps more granular control is needed here.)
  • Git's count-objects size is much smaller in LFS. This is the size of the push.
  • The 774 KiB for the non-LFS repo is about the size of the DLL compressed with zlib.
  • The file object is stored in .git/lfs/objects for LFS, not in .git/objects.

Let's commit and push to remote.

With LFS
$ git push
Uploading LFS objects: 100% (1/1), 1.7 MB | 89 KB/s, done.
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 437 bytes | 218.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://bitbucket.org/[organization]/lfs-demo.git
   09e450f..753cc3d  master -> master
Without LFS
$ git push
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 717.60 KiB | 6.71 MiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://bitbucket.org/[organization]/without-lfs-demo.git
   23d4243..b21091d  master -> master

Note that there's an extra step for uploading LFS objects in the LFS repo, and the push size to remote is much smaller in the LFS repo as we've seen from count-objects before.

Most git providers such as Azure Repos, Bitbucket, and GitHub have built-in support for LFS. You can also use a separate LFS server.

Cloning LFS Repo

One benefit of using LFS is that you can use git without needing to download the full git history data, as it downloads LFS files as needed. The LFS-tracked files git stores are actually references, or "pointer" files, to the LFS objects. Let's see a bit of how that works.

Let's overwrite the DLL with a much smaller one, say ~49 KB, then commit and push.

With LFS
$ git commit -m "Replaced with smaller DLL."
[master 7c9b2ab] Replaced with smaller DLL.
 1 file changed, 2 insertions(+), 2 deletions(-)
Without LFS
$ git commit -m "Replaced with smaller DLL"
[master 2286636] Replaced with smaller DLL
 1 file changed, 0 insertions(+), 0 deletions(-)
 rewrite MyLibrary.dll (99%)

Note that the messages are a bit different, and that's because, as far as git is concerned in the LFS repo, the file that changed is the pointer file. If you run git show, you will see something like below, which shows you the content of the pointer file and how it changed. The hash is how the file is tracked between git and the LFS object storage.

diff --git a/MyLibrary.dll b/MyLibrary.dll
index 91c5966..d8b338d 100644
--- a/MyLibrary.dll
+++ b/MyLibrary.dll
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3dce36d583ba1c741e95df1a265e47f0de581bef77ab48165dd67266be7a42ef
-size 1677824
+oid sha256:2b615798c36b1996093d44e77eb5306b4db9260546ce5aa2d3f7dde23476586b
+size 49664

Now, let's clone the repositories to new folders, and see what the sizes are.

                 With LFS           Without LFS
Folder Size      ~122 KB            ~872 KB
count-objects    count: 12          count: 9
                 size: 1.68 KiB     size: 801.93 KiB

So there you have it – the LFS clone is much smaller, as it only downloaded the DLL from the latest commit.

At this point, if you look at .git/lfs/objects/, there should be one folder, in my case 2b, and inside that a 61 folder. If you open that folder, there is a file, in my case 2b615798c36b1996093d44e77eb5306b4db9260546ce5aa2d3f7dde23476586b, sitting at 49 KB. This is the actual MyLibrary.dll file stored by LFS with git. Note how the folder names match the beginning of the file name, which is the hash from the pointer file.

To view which hash the file corresponds to, use git lfs ls-files:

$ git lfs ls-files
2b615798c3 * MyLibrary.dll

What would happen if we checkout the previous commit, the one that had the larger DLL?

In my case, the previous commit's hash is a4febdc, so if we checkout that commit with git checkout a4febdc, the folder size gets larger, about 3.27 MB. There's a new folder under the .git/lfs/objects folder, storing the DLL of this commit. In my case, .git/lfs/objects/3d/ce/3dce36d5...., at 1.6 MB.

If we go back to the latest commit that has the smaller DLL, will LFS delete the old one from .git/lfs/objects? No, but you can run git lfs prune, which will delete the file from .git/lfs/objects/3d/ce (though it doesn't seem to delete the folders, just the file).

$ git lfs prune
prune: 2 local object(s), 1 retained, done.
prune: Deleting objects: 100% (1/1), done.

Now you may wonder, when I cloned the repo that had LFS, do I need to run git lfs install again? The answer is no, because I'm running git version 2.27. As per this commit, LFS clone support is built into git as of git version 2.15 (released in October 2017), so it will update the hook files and such. Note that the documentation for git lfs clone has not been updated with this information.

Some Concerns

Azure Repos LFS Interface

In Azure Repos, there doesn't seem to be a UI to view and manage LFS objects as of now. There is a suggestion, but it's closed.

Bitbucket Cloud offers a dedicated UI to manage LFS objects, such as deleting them:

Maximum File Size

So LFS can support large files, but there might be a limit on the maximum size of a single file:

  • GitHub enforces a 4 GiB size limit on the Team plan, and 5 GiB on Enterprise.
  • Azure Repos LFS doesn't seem to have a documented limit.
  • Bitbucket doesn't have a limit, as long as you pay for storage capacity ($10 per 100 GB increments). As per their documentation: "Note that there's no limit on the LFS file size you can push to Bitbucket Cloud."
  • Note that in most cases, the storage limit applies across the organization/account, not per repo.

Push Size Limit

  • Some remotes may have a push size limit...? If you encounter errors while pushing to the remote, you may need to update some configuration, such as running git config http.version HTTP/1.1.

Pipelines

  • Some remotes may require special instructions when using LFS from a pipeline process/build, such as in Azure Pipelines.

Deleting LFS Objects from Remote

Since LFS is built to support git's workflow where all history is stored, it probably makes sense that you need to use caution if you want to delete LFS objects. For GitHub, and also for Azure Repos and Bitbucket, to reclaim LFS storage, you'll need to delete the entire repo. In Bitbucket, as seen above, you can use the UI to delete individual objects, but this will break your git history unless you also clean up the repo accordingly.

Other Limitations

Additional notes can be found in the follow-up post.

May 6, 2020

Enable Story Points Field in Jira

If you can't seem to enable/show the Story Points field in Jira, take a look at the Context setting.

Jira Issues Settings → Custom Fields → Story Points → click the three dots to bring up the menu → Contexts and default value.

Make sure that either the project you want or the Global context is selected.

This is also where you can set which issue types have the Story Points field. You can also define multiple contexts.

Story Points and Kanban Board in Jira

I had to add the Story Points field to a Jira project that's been using a Kanban board. Surprisingly, Jira doesn't support the Story Points field for Kanban boards. One of those "the tool vendor knows better than you" situations, though they are considering it as a future feature request.

There's value in having estimates for Kanban issues, and it's unfortunate that Jira decided not to support it. At least I can add it to the issue detail view, though I wasted some time due to the custom Context setting that this particular Jira Cloud instance was using. It's very possible that it was actually me who initially set it – yes, it's been a while.

Just to make the planning easier for my team, I created a new board based on Scrum, so we can see Story Points in the backlog list view.

Epilogue

This is a bit of a rant, really. Jira has a lot of customization features, with schemes and screens and such, though it's puzzling why they couldn't just enable Story Points in Kanban boards since they have it in Scrum boards. Like many software implementations that need to evolve, it's interesting to see how Jira is adding new features while keeping existing ones on a web product that can't go down and needs to serve long-time existing users while attracting brand new ones. First was the New Issue View that you can enable from the personal settings. It lacked a lot of features when it first came out, such as copy-and-paste for images, but now it has implemented enough features that I usually leave it on. Then there are the Next-Gen projects, which currently lack enough features to be usable for me, but are probably geared toward new people who are not accustomed to the somewhat steep learning curve of Jira configuration. It also doesn't help that built-in fields are configured through a feature named "Custom Fields".

Also, why are there two fields — Story Points and Story points estimation? Well, here's the answer — the first is used for classic projects, and the latter is used for Next-Gen projects. Now I'm beginning to wonder if Atlassian has it all together...is it time to leave Jira and look for something better?

February 18, 2020

MkDocs with Docker - Part 2: Azure Pipelines and ACR

In Part 1, we created a Docker image with MkDocs, and used the container to execute MkDocs. In this post, we will:

  • Improve the Docker image with additional MkDocs features.
  • Publish the image to Azure Container Registry.
  • Use the container in Azure Pipelines.
  • Use Azure Pipelines to publish the compiled documentation to Azure App Service.

Making Improvements

I'm not fond of MkDocs' default theme, and after trying out various themes, I've settled on Material for MkDocs. It seems to be the most feature-complete, with 3K GitHub stars and lots of extension support. (I also like the Read-the-Docs theme, but it seems there are some bugs, such as code blocks rendering in a single line, and even though there's a pull request for the fix, the developer has not merged it. That project has also been inactive for a couple of years.)

Let's change the Dockerfile to integrate the Material theme, and also to reduce some steps in preparation for Azure Pipelines. Refer to the sample GitHub repo for the project folder structure.

FROM python:3.8.1-alpine3.11

EXPOSE 8000

# Git is required for the git-revision-date plugin.
RUN apk add git

RUN pip install --no-cache-dir mkdocs && \
    pip install --no-cache-dir mkdocs-material && \
    pip install --no-cache-dir mkdocs-git-revision-date-localized-plugin && \
    pip install --no-cache-dir pymdown-extensions

# Note that /mnt/repo should be the git root folder, not the content folder,
# otherwise the git-revision plugin will complain during build.
CMD ["/bin/sh", "-c", "cd /mnt/repo/content && mkdocs build"]

I'm relatively new to Docker and python, so it's probably not the optimal Dockerfile, and I'll need to explore further, such as using multi-stage builds. I should also look into a requirements.txt file for pip.

Note the CMD command in the Dockerfile – it assumes certain conventions for mounting the host directory path, so we'll need to follow them when we run the container in Azure Pipelines. If we don't want to follow the convention, it can be overridden in the docker run command (see the example below).
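
For example, if the docs live at a different mount path, the baked-in command could be overridden like this (the paths and image tag here are placeholders):

# Mount the repo at a different path and override the image's CMD with our own build command.
docker run --rm --mount type=bind,source="C:\data\my-docs",target=/mnt/docs dusklight/mkdocs:0.1 /bin/sh -c "cd /mnt/docs/content && mkdocs build"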

Just out of curiosity, what shell does Docker use when it's executing the RUN command? Looks like it's /bin/sh.

Publish to Azure Container Registry

We can publish the docker image to Docker Hub, but we need to keep things private, so for this demo, let's set up an Azure Container Registry to host the docker images. Follow the quick start guide to set up an Azure Container Registry and push the image we've built above.
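
The push itself boils down to logging in, tagging the local image with the registry's login server name, and pushing it (a sketch – <acr-name> and the local image tag are placeholders):

# Log in, re-tag the locally built image for the registry, and push it.
docker login <acr-name>.azurecr.io
docker tag dusklight/mkdocs:0.1 <acr-name>.azurecr.io/tools/mkdocs:0.1
docker push <acr-name>.azurecr.io/tools/mkdocs:0.1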

Enable the admin user functionality, as we'll use that later in Azure Pipelines below. Note that enabling the admin user is not recommended, but it's good enough for this demo. Service principals should be used in production.

Azure Pipelines

To set up the pipeline, first add a secret variable, ACR-SECRET, to store the Azure Container Registry's admin user password. Then add the following to azure-pipelines.yml:

trigger:
- master

pool:
  vmImage: 'ubuntu-latest'

steps:
- script: docker login -u <acr-name> -p $(ACR-SECRET) <acr-name>.azurecr.io
  displayName: 'Login to Azure Container Registry'

- script: docker pull <acr-name>.azurecr.io/tools/mkdocs:0.1
  displayName: 'Pull MkDocs docker image'

- script: docker run --rm --mount type=bind,source=$(pwd),target=/mnt/repo <acr-name>.azurecr.io/tools/mkdocs:0.1
  displayName: Run MkDocs build through the docker container.

The above pipeline will log in to our container registry, pull down the image, and spin up a container of that image to execute MkDocs and build the documentation. One thing that may need to be investigated is whether there's a way to cache the docker image so we don't have to pull it down every time. We'll need to weigh which will be more cost-effective, since pulling from the ACR may incur cost. Also note that the Azure Pipelines build agents already have common docker images preinstalled, so we should try to use those as a base when creating the Dockerfile. We'll also need to weigh the speed of the build, since loading the cache in the pipeline can also take some time.

Now, of course, you don't have to use the docker container here. You can just install MkDocs and its plugins in the pipeline... However, since all of my team members will be using the same docker container to build and test their documentation, we can be sure that it will work the same way on this build. And yes, there are other ways of enforcing those standards even if we didn't use Docker, but let's leave it at that for now.

Publish to Azure App Service

So the pipeline above does the build but doesn't actually do anything with the output. Let's publish the compiled documentation site to Azure App Service. Since the site is static, I could also use Azure Storage static website hosting, but it seems that's always public. With App Service, I can put it behind our Active Directory authentication.

Follow the documentation to create the site. The documentation doesn't go into details on how to set up a static HTML site using the Portal, so I ended up just creating a Windows .NET Framework site for this demo.

Add the following to the azure-pipelines.yml from above:

- task: ArchiveFiles@2
  inputs:
    rootFolderOrFile: 'content/site'
    includeRootFolder: false
    archiveType: 'zip'
    archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
    replaceExistingArchive: true

- task: AzureRmWebAppDeployment@4
  inputs:
    ConnectionType: 'AzureRM'
    azureSubscription: '<Your Subscription Here>'
    appType: 'webApp'
    WebAppName: '<app-site-name>'
    packageForLinux: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'

There are two additional tasks – one to zip up the MkDocs build output, and the other to publish that zip file to Azure App Service.

We could also use Release Pipelines, but for this demo, it's enough to just let the build pipeline handle the deployment.

Now whenever the master branch is updated, the pipeline will do the build and publish the site.

Misc. Notes

The git-revision-date-localized plugin requires git to be installed in the container, since it uses git when MkDocs builds the site. It takes the date of the commit and adds it to the compiled document, and is supported by the Material theme.

Note that there are two sections for configuring MkDocs – plugins and markdown extensions. The git-revision-date-localized plugin is configured under the plugins section. Refer to the demo GitHub repo.

It's slightly concerning that MkDocs hasn't had a release since September 7th, 2018. There are some issues, such as favicon not working, where the fix has been merged but hasn't been released yet, though a workaround does exist — overwriting it after the site has been built, which I've implemented in the azure-pipelines.yml in the demo repo.

One of the big gripes when working with markdown is handling images — you can't simply paste in a screenshot, which we often need to do when creating technical documentation. Fortunately, Visual Studio Code has an extension, Markdown Paste, that seems to work well.

February 17, 2020

MkDocs with Docker - Part 1: Getting Started

I had a chance to explore some options for what tools my teams can use for documentation. I've considered Confluence, which I've used in the past and which my teams currently use for some technical documentation; I've also seen Read-the-Docs in the past. I like the idea of having the documents under git version control and using pull requests with code reviews to control the edits – a flow that my team members are very familiar with.

One constraint I have is that the documentation must be internal and can't be public, so I looked for static-file or local hosting options. After some searching, I settled on MkDocs for now. Read-the-Docs seemed promising, but local hosting is not supported officially, as they are focusing on their own cloud hosting offering. Some other tools lacked search functionality.

Creating a Docker Image for MkDocs

I wanted to keep my local dev machine clean, which means not installing MkDocs and its tooling (MkDocs is written in python), so let's use a docker image. This will also make it easier for other developers in my team to run it on their local machines. Here's the Dockerfile:

FROM python:3.8.1-alpine3.11

RUN pip install mkdocs
    

Yes, it's simple. It will be expanded in Part 2, but it's good enough for now. I'm using a specific tag – 3.8.1-alpine3.11 – for the base because I already have the alpine 3.11 image locally, so it won't take up the extra space that a different base image would.

Let's build the image:

C:\data\my-docs\docker-image>docker build -t dusklight/mkdocs:0.1 .
    

Running MkDocs through Docker Container

Let's assume that I have a C:\data\my-docs folder that will store the documentation. I want this folder to be accessible from the Docker container that I'll be running from the image created above. I also want MkDocs to serve the documentation from inside the container so I can access it from the host.

Share the Host Drive for Docker Containers

To make a folder on the host available to containers, in Docker Desktop for Windows, use the Settings to share the drive:

Start the Docker Container
C:\>docker run -it --rm -p 8888:8000 --mount type=bind,source="C:\data\my-docs",target=/mnt/my-docs dusklight/mkdocs:0.1 /bin/sh
    

I won't go into details about the command in this post, but basically it will start the container in interactive mode, expose the container's port 8000 as port 8888 on the host, and make the folder available to the container at /mnt/my-docs.

Note that if you haven't shared the C drive first, you may see the following error: docker: Error response from daemon: status code not OK but 500: {"Message":"Unhandled exception: Drive has not been shared"}.

Creating a Sample Project with MkDocs

After the container starts, let's create a sample MkDocs project:

/ # cd /mnt/my-docs
/ # mkdocs new .
INFO    -  Writing config file: ./mkdocs.yml
INFO    -  Writing initial docs: ./docs/index.md
    

The files are now created in C:\data\my-docs.

Serving Documents with MkDocs Dev Server

MkDocs comes with a dev server that has a live reloading feature, so we'll use that for now. When we are at a point where the documentation is ready and can be published to others, we'll build and publish with Azure DevOps, in Part 2 of this series.

By default, MkDocs binds to 127.0.0.1, so when it's run in the container, it's not accessible from the outside even with the right ports published on the docker command line. Open mkdocs.yml and add the following so that it binds to all interfaces:

dev_addr: '0.0.0.0:8000'

Now let's start the server:

mkdocs serve

From the host machine, browse to http://localhost:8888/ and MkDocs should come up.

In Part 2, we'll look at how to use the Docker image in a CI/CD process utilizing Azure DevOps and Azure Container Registry.

January 31, 2020

Docker: HNS failed with error

While trying to run a container with a port exposed/published to the host, I encountered the following error:

HNS failed with error : The process cannot access the file because it is being used by another process.

It makes it sound like Docker has trouble accessing a file because another process has an exclusive lock... but what is HNS? It stands for Host Networking Service.

It turned out I had mistakenly reversed the order of the port numbers. Since port 80 on my host was already taken by IIS, Docker wasn't able to start the container.

The port command should be:

docker run -p [host port]:[container port] <ImageName>
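
So in my case, with IIS already listening on the host's port 80, mapping the container's port 80 to a free host port such as 8080 would look like this:

# host port 8080 -> container port 80
docker run -p 8080:80 <ImageName>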