Using CMake with External Projects

The reason for this post is that I spent a considerable amount of time[1] over the last two days convincing CMake’s ExternalProject feature to do what I wanted. Without the help of the usual suspects (Stack Overflow, old mails, blog articles, code snippets, etc.), I might still not be done. So I thought I could pay back the general public by posting some notes on this topic 😉

CMake

I’ve been using CMake to build my personal projects for several years now and am actually quite happy with it: It’s a very powerful and useful tool.

One thing to consider, though, is that CMake has its own quirky scripting language and the official documentation is still pretty terrible: It’s not very good at giving a general overview or introduction, at explaining how everything plays together, or at recommending current best practices. And even if you only ever use it as a reference, it’s not very well structured or written for that task…

You’ll easily get started with CMake by looking at the usual “Hello, world!” tutorials: Create a CMakeLists.txt file, insert an add_executable() here and an add_library() there, and very soon you’ll have a running program.exe.

But as soon as you need some more advanced features (and you will need and should use them!), you’ll be spending a lot of time on Google, reading Stack Overflow threads or blog posts, and sifting through mailing list archives.

Adding to this confusion is the ‘recent’ evolution of best practices towards “Modern CMake”, meaning that a lot of the stuff you’ll find on the Internet may still work but is not really recommended anymore (to sum it up in one sentence: The new way is to build your CMakeLists around targets and their properties instead of around global variables).
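To make the difference between the variable-based and the target-based style a bit more tangible, here is a minimal sketch. The target names (mylib, myapp), the file names, the include directory and the USE_FEATURE_X definition are made up for illustration only:

```cmake
# Older, directory/variable-based style (shown commented out):
# the settings apply to every target defined in this directory.
#   include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)
#   add_definitions(-DUSE_FEATURE_X)

# Target-based style: settings are attached to one target and are
# passed on to consumers according to PRIVATE/PUBLIC/INTERFACE.
add_library(mylib STATIC mylib.cpp)
target_include_directories(mylib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_definitions(mylib PRIVATE USE_FEATURE_X)

add_executable(myapp main.cpp)
# myapp picks up mylib's PUBLIC include directory automatically.
target_link_libraries(myapp PRIVATE mylib)
```

The point of the second form is that everything a consumer needs travels with the target itself instead of living in variables that every CMakeLists.txt has to know about.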

But despite all that, as mentioned before, it’s a cool utility and one gets used to the weird stuff after a while.

Approaches

I started a new project that will be making use of SQLite and TagLib; so when building it, these two libraries must also be available.

There are several ways to make these dependencies available:

  1. Use pre-compiled binary versions of the libraries and store them externally or within the project itself

    Either way, I’m not a big fan of it: If you switch the compiler (be it just from MSVC20xx to MSVC20yy), you’ll need to remember to change the binaries also. For multiple supported toolsets, multiple copies of the binaries must remain in reach.

    If possible, building these dependencies yourself from source seems to be the better alternative.

  2. Download the source of the libraries, build it separately, store it somewhere external/central and point the project to that path

    I actually do that for a big dependency like the Qt framework: The Zip file of the source code alone is 700 MBytes, and after configuration it has to be built, which takes hours on my current PC.

    But since it is so big and I use the Qt toolkit in more than one project, I build it once and then put it in a designated place on my machine (like C:\devel\ext\Qt\...; currently, the path is hardcoded in the CMakeLists of the projects that use it. Since Windows has a different directory structure than Linux or macOS, I’m not sure if something like CMake’s find_package() may be of help there).

    The downside: You need to remember to do that and you may need to invest time to automate this for easier reproduction.

    Also, if you switch your working computer often, you may feel the pain at some point: “Damn, just wanted to work a bit on the project on my laptop while riding the train, but now I see that I’m missing my Qt library here.”

  3. Store the source code of the libraries in your repo (and then build it together with your project)

    Again, for a heavyweight like the Qt framework this is out of the question, but for smaller tools (like “SQLite” or “TagLib”) it might be a possibility.

    On the pro side: If you clone the project’s repository, all the stuff you need is right here.

    On the contra side: If you don’t need to customize these libraries’ code for your special requirements, it’ll just bloat your repository unnecessarily with read-only material.

  4. Git Submodules

    Submodules allow you to keep a Git repository as a subdirectory of another Git repository.
    This lets you clone another repository into your project and keep your commits separate.

    Oh well, sounds interesting at first sight, but after reading about it some more, it sounds a bit too fragile and error-prone for my liking at the moment.

    It also has the same issue as the previous point: Bloating your repository with code that you only want to use, not modify.

    And it only covers the availability of the source code; the actual building of it still has to happen.

  5. CMake ExternalProject module

    The ExternalProject_Add() function creates a custom target to drive download, update/patch, configure, build, install and test steps of an external project […]
    The function supports a large number of options which can be used to tailor the external project behavior.

    This is currently my favored approach: It integrates the procedures to download and build the sources of these libraries into the CMake workflow, subject to the state of the CMake target that will use them.

    All in all, it seems like a good solution for small to medium sized dependencies.
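Regarding the hardcoded Qt path in approach 2: one possible way around it might be to let find_package() do the lookup and to pass the install location from outside. This is only a sketch under assumptions that are not part of my actual setup: it assumes a Qt 5 build that ships its CMake package files, uses the Widgets component as an example, and the target name myapp is hypothetical. The install prefix stays a placeholder here.

```cmake
# The Qt install prefix is not hard-coded, but passed when configuring, e.g.:
#   cmake -DCMAKE_PREFIX_PATH=<qt-install-prefix> <path-to-source>
# find_package() searches the prefixes listed in CMAKE_PREFIX_PATH,
# which works the same way on Windows, Linux and macOS.
find_package(Qt5 COMPONENTS Widgets REQUIRED)

add_executable(myapp main.cpp)
# The imported target carries include paths and link settings with it.
target_link_libraries(myapp PRIVATE Qt5::Widgets)
```

That way only the command line (or a per-machine preset) differs between the computers, while the CMakeLists.txt stays the same.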

Practice

So this new project of mine will utilize two other libraries. At the moment, it’s in a very early stage and I just started off with a few (sub)directories and files below the project’s root:

ProjectX/
    src/
        cmake/
            ExternalProjects.cmake
        CMakeLists.txt
        main.cpp

We can mostly ignore the files CMakeLists.txt and main.cpp for now; I’ll explain briefly:

CMakeLists.txt has only one relevant part for us in the form of the include(cmake/ExternalProjects.cmake) statement, which is used to load and run CMake code from a file or module (in this case from my file ExternalProjects.cmake in the project’s cmake subdirectory; not the similarly named CMake module!). The other lines are the usual boilerplate to create a project.

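cmake_minimum_required(VERSION 3.11)              # add_executable() without sources needs 3.11+
project("ProjectX")
add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE main.cpp)  # target_sources() requires a scope keyword

include(cmake/ExternalProjects.cmake)             # <--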

It is of course not necessary to put this into a distinct *.cmake file and then include it; all the following lines could also be in the main CMakeLists.txt. It’s just my preference to follow a separation of concerns. It may be overkill in some smaller projects, but some of those tend to grow and then it becomes hard to untangle things again.

main.cpp is just a dummy source code file that prints out “Hello, world!” to the console.
At this time, its only reason for existence is so that I can use it in the CMakeLists.txt for the usual boilerplate statements to build a program later.

#include <iostream>

int main (int argc, char** argv)
{
    std::cout << "Hello, world!" << std::endl;
    return 0;
}

cmake/ExternalProjects.cmake

The interesting stuff is in the cmake/ExternalProjects.cmake file, at which we will now take a closer look.

Check for the Git module

Since one of the following two ExternalProject segments will download its sources from a GitHub repository, we first check if Git is available on our machine (the FindGit module that we load for this comes with CMake).

But note that you have to take care of installing Git yourself: Both include(FindGit) and find_package(Git) are just tools of CMake to detect the presence of the application on a machine, not installing it; that’s your job!

include(FindGit)
find_package(Git)

if (NOT Git_FOUND)
    message(FATAL_ERROR "Git not found!")
endif ()
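As a side note, the same check can probably be written more compactly. find_package(Git) already loads the FindGit module on its own, so the separate include(FindGit) is not strictly needed, and the REQUIRED keyword lets CMake abort by itself. A minimal sketch (the wording of the status message is just my choice):

```cmake
# REQUIRED makes CMake stop with an error if Git cannot be found,
# so the manual Git_FOUND check can be dropped.
find_package(Git REQUIRED)

# The FindGit module provides the path and the version of the executable.
message(STATUS "Using Git ${GIT_VERSION_STRING} at ${GIT_EXECUTABLE}")
```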

Include the required module

Next up, we will get the sources for SQLite and TagLib and build them.

Since this happens with CMake’s own ExternalProject module, we need to load it first with the appropriate include() statement:

include (ExternalProject)

I will not explain every setting I use in ExternalProject_Add() in detail: some are pretty self-explanatory (e.g. URL), and there are more that I don’t use at all. You can look them up on the ExternalProject help page[2].

Download a Zip file and build with custom commands

We will get the Zip file of the amalgamation version of the SQLite sources and build it then with a custom build command:

set (EP_SQLITE "SQLite")

ExternalProject_Add (
    ${EP_SQLITE}
    
    PREFIX            ${EP_SQLITE}
    URL               https://www.sqlite.org/2019/sqlite-amalgamation-3290000.zip
    URL_HASH          SHA1=a0eba79e5d1627946aead47e100a8a6f9f6fafff

    CONFIGURE_COMMAND ""
    UPDATE_COMMAND    ""
    INSTALL_COMMAND   ""
    
    BUILD_ALWAYS      OFF
    INSTALL_DIR       ${CMAKE_CURRENT_BINARY_DIR}/ext/${EP_SQLITE}

    BUILD_COMMAND     ${CMAKE_CXX_COMPILER} <SOURCE_DIR>/sqlite3.c /link /dll /out:sqlite3.dll
    COMMAND           ${CMAKE_COMMAND} -E copy <BINARY_DIR>/sqlite3.dll <INSTALL_DIR>/sqlite3.dll
)

The variable EP_SQLITE is set for convenience reasons and to satisfy my “One Definition Rule”.

PREFIX is the root name for the default directory structure.

ExternalProject_Add() will download the given Zip file from the SQLite URL, verify it against the given hash value using the specified algorithm (SHA-1), and then extract it into a default directory structure (which can be changed, see documentation). If CMake detects that the (extracted) files are already there, it will skip this part.

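Then we set several *_COMMAND options to an empty string, which effectively skips the corresponding step: Some of them have CMake-defined defaults that would get in our way here (the default configure step, for example, would run CMake on the extracted sources, but the amalgamation doesn’t come with a CMakeLists.txt).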

BUILD_ALWAYS OFF makes sure that the build step is not always run when we run our CMake project.

INSTALL_DIR modifies one value of the above mentioned default directory structure slightly, so that the built library files will end up in the right place (related: The COMMAND at the end).

Then we have our custom BUILD_COMMAND: The instruction is copied from the SQLite page as is and I just assume that the variable ${CMAKE_CXX_COMPILER} will point to cl.exe (not portable/multi-platform compatible, but good enough for this example).

Also noteworthy here is the strange syntax to reference the source directory path in the BUILD_COMMAND: <SOURCE_DIR>. This does not resolve a normal CMake variable (that would be ${SOURCE_DIR}), but uses a placeholder token[3] that ExternalProject_Add() will replace with the actual path when the command is processed.
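Outside of the ExternalProject_Add() call the placeholder tokens don’t exist, but the same paths can be queried with ExternalProject_Get_Property(), which is part of the ExternalProject module. The following is a sketch of how that might be used to connect SQLite to the main program; wiring it to ${PROJECT_NAME} this way is my assumption of a sensible next step and is not part of the project yet. It has to come after the ExternalProject_Add() call:

```cmake
# Query the directories that ExternalProject_Add() chose for SQLite.
# The values end up in variables named after the properties.
ExternalProject_Get_Property (${EP_SQLITE} SOURCE_DIR INSTALL_DIR)
message (STATUS "SQLite sources: ${SOURCE_DIR}")
message (STATUS "SQLite install: ${INSTALL_DIR}")

# Make sure SQLite is downloaded and built before the main program ...
add_dependencies (${PROJECT_NAME} ${EP_SQLITE})
# ... and let the program find sqlite3.h in the extracted sources.
target_include_directories (${PROJECT_NAME} PRIVATE ${SOURCE_DIR})
```

Note that the directories only get filled when the external project is actually built, not when CMake is configuring.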

Finally, a plain COMMAND uses CMake to copy the generated DLL file to wherever we want…
Any of the *_COMMAND options (like CONFIGURE_COMMAND, BUILD_COMMAND, INSTALL_COMMAND, etc.) can be followed by as many such additional COMMANDs as needed.
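An alternative to chaining COMMANDs might be a separate custom step, added with ExternalProject_Add_Step() from the same module. This is only a sketch: the step name copy_dll is made up, and it would replace the trailing COMMAND from above rather than complement it:

```cmake
# Hypothetical custom step "copy_dll", placed between the build and install steps.
ExternalProject_Add_Step (
    ${EP_SQLITE} copy_dll
    COMMAND    ${CMAKE_COMMAND} -E copy <BINARY_DIR>/sqlite3.dll <INSTALL_DIR>/sqlite3.dll
    COMMENT    "Copying sqlite3.dll to the install directory"
    DEPENDEES  build      # run after the build step ...
    DEPENDERS  install    # ... and before the (here empty) install step
)
```

The placeholder tokens are replaced in custom steps as well, and the copy shows up as its own step in the build output, which can make it easier to see what failed.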

Clone a Git repository and build with CMake

The second example is getting the TagLib sources by cloning the GitHub repository and then invoking CMake to build it (TagLib itself comes with its own CMakeLists.txt, etc.).

Remember that we checked above whether Git was installed on our machine: that was meant for this section…

set (EP_TAGLIB "TagLib")

ExternalProject_Add (
    ${EP_TAGLIB}
    
    PREFIX         ${EP_TAGLIB}
    GIT_REPOSITORY https://github.com/taglib/taglib
    GIT_TAG        v1.11.1
    GIT_SHALLOW    ON
    
    BUILD_ALWAYS   OFF
    INSTALL_DIR    ${CMAKE_CURRENT_BINARY_DIR}/ext/${EP_TAGLIB}
    
    CMAKE_CACHE_ARGS
        -DBUILD_SHARED_LIBS:BOOL=ON
        -DENABLE_STATIC_RUNTIME:BOOL=OFF
        -DBUILD_EXAMPLES:BOOL=ON
        -DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>

    BUILD_COMMAND     ${CMAKE_COMMAND} --build <BINARY_DIR> --config Release --target INSTALL
)

Again, the variable EP_TAGLIB and the PREFIX option have the same purpose as their counterparts in the previous example.

Then we specify the URL of the Git repository, which tag it should check out (GIT_TAG doesn’t need to be a literal tag name; it can also be a branch name or a commit hash), and whether we want the whole history or just a shallow clone of the specified tag (i.e. GIT_SHALLOW ON).

Again, BUILD_ALWAYS OFF and INSTALL_DIR serve the same purpose as mentioned in the previous example.

With CMAKE_CACHE_ARGS we set options for the following implicit(!) configuration of the external project’s own(!) CMake run.
There are several things to keep in mind for this:

  1. Multiple variants:

    • CMAKE_ARGS is the normal way for supplying options to the CMake command line for this external project.
    • CMAKE_CACHE_ARGS is for when you may hit a command line length limit: It works directly with the CMakeCache.txt file of that project; also, it forces these values!
    • CMAKE_CACHE_DEFAULT_ARGS is similar, except that it only sets the initial values in the cache file, which could later be overridden.
  2. Also, CMAKE_*_ARGS takes a whitespace-separated list of values, so each definition has to be written as one token: don’t use -D CMAKE_INSTALL_PREFIX:PATH = <INSTALL_DIR> but -DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>!

  3. Also necessary, at least for the two CMAKE_CACHE_*_ARGS options: One must specify the type: -D<var>:<type>=<value>
    Example: -DBUILD_SHARED_LIBS:BOOL=ON
    If you’re unsure about the type, just let CMake run one time and look into the generated CMakeCache.txt file.
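To put the variants side by side, here is a sketch of the same TagLib download using CMAKE_ARGS and CMAKE_CACHE_DEFAULT_ARGS instead of CMAKE_CACHE_ARGS. The target name TagLibViaArgs is made up, and the chosen option values are only meant as an illustration, not as a recommendation:

```cmake
include (ExternalProject)

ExternalProject_Add (
    TagLibViaArgs                       # hypothetical target name
    PREFIX         TagLibViaArgs
    GIT_REPOSITORY https://github.com/taglib/taglib
    GIT_TAG        v1.11.1

    # Passed as -D arguments on the command line of the external CMake run;
    # the :<type> part is optional here.
    CMAKE_ARGS
        -DBUILD_SHARED_LIBS=ON
        -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>

    # Written to an initial cache script instead; the :<type> part is required.
    # These are initial values only and don't overwrite existing cache entries.
    CMAKE_CACHE_DEFAULT_ARGS
        -DBUILD_EXAMPLES:BOOL=OFF
)
```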

Digression: A tale of woe

So far so good, but then an error threw me off the track for a while, until I figured out how this works/relates.

The first time I ran this portion of the script, CMake spat out an error about TagLib not being configured (or so I misinterpreted it at least).
In retrospect, that must have been something different, but I can’t trace back now what really caused it.

Anyway, me being clever thought: “Oh, so CMake complains that TagLib is not configured. Since TagLib itself uses CMake for building, that makes total sense and the documentation for ExternalProject_Add() even has a CONFIGURE_COMMAND, so I will use that!” Big mistake!

Because I spent the next hours scratching my head over why the options that I set in CMAKE_ARGS weren’t applied.

I found out that when I put the options directly into the CONFIGURE_COMMAND, all worked as it should.
But if that was the right thing to do, for what purposes do CMAKE_ARGS, CMAKE_CACHE_ARGS and CMAKE_CACHE_DEFAULT_ARGS exist?

After a lot of testing and reading, the lights went on:

  • If you use CONFIGURE_COMMAND, you override CMake’s default behavior and you’re on your own, because… you certainly know what you’re doing, right?!
  • That means also, CONFIGURE_COMMAND will not use any of the values set in CMAKE_*_ARGS, you must provide all yourself in the CONFIGURE_COMMAND section.

So, after removing the CONFIGURE_COMMAND lines again, everything suddenly worked fine: TagLib was automatically and implicitly configured by CMake with the (inherited from the parent CMake process) default values (generator, platform, etc.) and with the values I set in my CMAKE_ARGS section.
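For illustration, this is roughly what the “you’re on your own” variant could look like. It’s a sketch and not what I ended up using: the target name TagLibManualConfigure is made up, and the -S/-B options assume CMake 3.13 or newer:

```cmake
include (ExternalProject)

ExternalProject_Add (
    TagLibManualConfigure               # hypothetical target name
    PREFIX         TagLibManualConfigure
    GIT_REPOSITORY https://github.com/taglib/taglib
    GIT_TAG        v1.11.1

    # NOT applied, because the CONFIGURE_COMMAND below replaces
    # the implicit configure step that would normally use it:
    CMAKE_ARGS     -DBUILD_SHARED_LIBS=ON

    # Everything has to be spelled out by hand, including the generator
    # (and the platform/toolset, if the parent project uses them).
    CONFIGURE_COMMAND
        ${CMAKE_COMMAND}
            -G "${CMAKE_GENERATOR}"
            -S <SOURCE_DIR>
            -B <BINARY_DIR>
            -DBUILD_SHARED_LIBS=ON
            -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
)
```

Compared to that, leaving out CONFIGURE_COMMAND and only listing the options in CMAKE_*_ARGS is a lot less to type and to get wrong.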

The error that originally led me to believe that an explicit configuration step was missing for TagLib must have been fixed by me, accidentally, while testing and cursing 😄

Now, if all is so easy when using CMake’s ExternalProject_Add() on another CMake project, why do I still explicitly prepare a BUILD_COMMAND for it?

Because the implicit default command (at least in this case) builds the Debug configuration and the target All, while I wanted Release and Install to be built, that’s why.
(Note: Specifying Release via CMAKE_BUILD_TYPE in CMAKE_*_ARGS would not work for multi-configuration generators like the ones for Microsoft Visual C++, because they select the configuration at build time!)
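If the script should also work with single-configuration generators some day, one option might be to ask CMake which kind of generator is in use and to prepare the arguments accordingly. A sketch with made-up variable names (EP_IS_MULTI_CONFIG, EP_CONFIGURE_ARGS, EP_BUILD_ARGS); the GENERATOR_IS_MULTI_CONFIG property needs CMake 3.9 or newer, and it assumes the external project inherits the parent’s generator:

```cmake
# Ask CMake whether the current generator is a multi-configuration one.
get_property (EP_IS_MULTI_CONFIG GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)

if (EP_IS_MULTI_CONFIG)
    # Visual Studio, Xcode, ...: the configuration is chosen at build time.
    set (EP_CONFIGURE_ARGS "")
    set (EP_BUILD_ARGS     --config Release)
else ()
    # Makefiles, Ninja, ...: the configuration is chosen at configure time.
    set (EP_CONFIGURE_ARGS -DCMAKE_BUILD_TYPE:STRING=Release)
    set (EP_BUILD_ARGS     "")
endif ()

# Possible use inside ExternalProject_Add():
#   CMAKE_CACHE_ARGS  ${EP_CONFIGURE_ARGS} -DBUILD_SHARED_LIBS:BOOL=ON
#   BUILD_COMMAND     ${CMAKE_COMMAND} --build <BINARY_DIR> ${EP_BUILD_ARGS}
```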

The end (for now)…


  1. “Considerable amount of time” meaning more than the 10 minutes I expected it would take…
    Spread over the two days, it added up to hours for a seemingly simple task.

  2. I do wonder why the CMake gods named the main and most important part of the module ExternalProject_Add():
    Either a plain ExternalProject() to be in sync with Project(), or Add_ExternalProject(), to be in sync with Add_Executable() or Add_Library() would have been easier to remember. Oh well, probably because it’s a separate module and not a core feature…

  3. What’s irritating is that these placeholder tokens apparently can’t be used everywhere within the ExternalProject_Add() block:
    I tried it in some ways (not shown in this tutorial), where CMake bailed with an error about not being able to find/resolve/create <XXX>.