Using CMake with External Projects

Posted 2019-08-03 · Updated 2021-04-14

The reason for this post is that I spent a considerable amount of time¹ over the last two days convincing CMake’s ExternalProject feature to do what I wanted. And without the help of the usual suspects (Stack Overflow, old mails, blog articles, code snippets, etc.), I might still not be done. So I thought I could pay back the general public by posting some notes on this topic 😉

CMake

I’ve been using CMake to build my personal projects for several years now and am actually quite happy with it: It’s a very powerful and useful tool.

One thing to consider though is that CMake has its own quirky scripting language and the official documentation is still pretty terrible: It’s not very good in terms of general overview/introduction, or explaining how everything plays together, or recommending current best practices. And even if you only ever use it as a reference, it’s not very well structured or written for that task…

You’ll easily get started with CMake by looking at the usual “Hello, world!” tutorials: Create a CMakeLists.txt file, insert an add_executable() here and an add_library() there, and very soon you’ll have a running program.exe.

But as soon as you need some more advanced features (and you will need and should use them!), you’ll be spending a lot of time on Google, reading Stack Overflow threads or blog posts, and sifting through mailing list archives.

Adding to this confused state is the ‘recent’ evolution of best practices towards “Modern CMake”, meaning that a lot of the stuff you’ll find on the Internet may still work, but is not really recommended anymore (to sum it up in one sentence: The new way is to construct your CMakeLists.txt files in a target-based rather than a variable-based way).
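To make the “target-based instead of variable-based” idea a bit more tangible, here is a minimal sketch. The names mylib, app and USE_FEATURE_X as well as the file names are made up for illustration; they are not from any real project of mine.

```cmake
cmake_minimum_required(VERSION 3.11)
project(ModernVsClassic LANGUAGES CXX)

# Older, directory/variable-based style: these settings leak into every
# target that is defined in this directory and below.
# include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)
# add_definitions(-DUSE_FEATURE_X)

# Target-based style: properties are attached to the target that needs them.
add_library(mylib STATIC mylib.cpp)
target_include_directories(mylib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_definitions(mylib PRIVATE USE_FEATURE_X)

add_executable(app main.cpp)
# Linking against mylib also hands its PUBLIC include directory to app.
target_link_libraries(app PRIVATE mylib)
```

The point of the second variant is that app gets what it needs just by linking against mylib, without any global variables being passed around.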

But despite all that, as mentioned before, it’s a cool utility and one gets used to the weird stuff after a while.

Approaches

I started a new project that will be making use of SQLite and TagLib; so when building it, these two libraries must also be available.

There are several ways to make these dependencies work:

Use pre-compiled binary versions of the libraries and store them externally or within the project itself

Either way, I’m not a big fan of it: If you switch the compiler (be it just from MSVC20xx to MSVC20yy), you’ll need to remember to change the binaries too. For multiple supported toolsets, multiple copies of the binaries must remain within reach.

If possible, building these dependencies yourself from source seems to be the better alternative.

Download the source of the libraries, build it separately, install it and point the project there

I actually do that for a big dependency like the Qt framework: The Zip file of the source code alone is 700 MBytes and after configuration it has to be built, which takes hours on my current PC. But since it is so big and I use the Qt toolkit in more than one project, I build it once and then install it at a designated place on my machine (like C:\devel\ext\Qt\...).

My own CMake projects then use find_package() to include the Qt package, which can then be used as a normal target.
CMake is told where to look for Qt’s package configuration files (the xxxConfig.cmake files) by passing the Qt installation directory as a search prefix (via -DCMAKE_PREFIX_PATH=...).
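As a rough sketch of what such a consuming project could look like: I’m assuming Qt 5 and its Widgets component here purely for illustration (the Qt version and the component are not fixed by anything above), and app/main.cpp are placeholder names.

```cmake
cmake_minimum_required(VERSION 3.11)
project(QtConsumer LANGUAGES CXX)

# Looks for Qt5Config.cmake below the prefixes listed in CMAKE_PREFIX_PATH
# (assumption: Qt 5 with the Widgets component).
find_package(Qt5 REQUIRED COMPONENTS Widgets)

add_executable(app main.cpp)

# Qt5::Widgets is an imported target provided by the Qt package.
target_link_libraries(app PRIVATE Qt5::Widgets)
```

When configuring, the prefix is then handed over on the command line, e.g. cmake -DCMAKE_PREFIX_PATH=C:/devel/ext/Qt/... followed by the path to the source directory.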

The downside: You need to remember to do that and you may need to invest time to automate this for easier reproduction.
Also, if you switch your working computer often, you may feel the pain at some point: “Damn, just wanted to work a bit on the project on my laptop while riding the train, but now I see that I’m missing my Qt library here.”

Store the source code of the libraries in your repository (and then build it together with your project)

Again, for a heavyweight like the Qt framework this is out of the question, but for smaller tools (like “SQLite” or “TagLib”) it might be a possibility.

On the pro side: If you clone the project’s repository, all the stuff you need is right there.

On the contra side: If you don’t need to customize these libraries’ code to your special requirements, it’ll just bloat your repository unnecessarily with read-only material.

Git Submodules

Submodules allow you to keep a Git repository as a subdirectory of another Git repository.
This lets you clone another repository into your project and keep your commits separate.

Oh well, sounds interesting at first sight, but after reading about it some more, it sounds a bit too fragile and error-prone for my liking at the moment.

It also has the same issue as the previous point: Bloating your repository with code that you only want to use, not modify.

And it only covers the availability of the source code; the actual building of it still has to happen.

CMake’s ExternalProject module

The ExternalProject_Add() function creates a custom target to drive download, update/patch, configure, build, install and test steps of an external project […]
The function supports a large number of options which can be used to tailor the external project behavior.

It incorporates procedures into the CMake workflow to download and build the sources of these libraries, subject to the state of the CMake target that will use it.

This is good for so-called “superbuilds”, where the project may consist mostly of external projects.

One thing to consider here though is that getting the sources happens at build time, so you will not be able to use the targets provided by the external project earlier in your own CMake files: They do not yet exist at configuration time of your own CMakeLists.txt!

This is solved by the FetchContent module…

CMake’s FetchContent module

This module has been part of CMake since version 3.11, and uses the ExternalProject module’s functionality for some tasks, like the actual downloading.

The primary difference is the time when external projects are brought in to your own project:
At CMake’s configure time instead of the later build time.

This module enables populating content at configure time via any method supported by the ExternalProject module.
Whereas ExternalProject_Add() downloads at build time, the FetchContent module makes content available immediately, allowing the configure step to use the content in commands like add_subdirectory(), include() or file() operations.

So that seems like a good way when you have your own stuff going on in your project’s CMakeLists.txt, but also depend on some external projects/libraries, which you would like to add directly, or use as a dependency in your target_link_libraries(), or whose targets you want to use otherwise… (I originally also listed find_package() here, but as far as I can tell that expects an installed or exported package, which freshly fetched sources do not provide by themselves.)

And now some more details and examples for ExternalProject and FetchContent:

ExternalProject: Practice

So this new project of mine will utilize two other libraries. At the moment, it’s in a very early stage and I just started off with a few (sub)directories and files below the project’s root:

ProjectX/
	src/
		cmake/
			ExternalProjects.cmake
		CMakeLists.txt
		main.cpp

We can mostly ignore the files CMakeLists.txt and main.cpp for now; I’ll explain briefly:

CMakeLists.txt has only one relevant part for us in the form of the include(cmake/ExternalProjects.cmake) statement, which is used to load and run CMake code from a file or module (in this case from my file ExternalProjects.cmake in the project’s cmake subdirectory; not the similarly named CMake module!). The other lines are the usual boilerplate to create a project.

cmake_minimum_required(VERSION 3.11)              # add_executable() without sources needs 3.11+
project("ProjectX")
add_executable(${PROJECT_NAME})
target_sources(${PROJECT_NAME} PRIVATE main.cpp)  # A scope keyword is required here

include(cmake/ExternalProjects.cmake)             # <--

It is of course not necessary to put this into a distinct *.cmake file and then include it again; all the following lines could also be in the main CMakeLists.txt. It’s just my preference to follow a separation-of-concerns approach. It may be overkill in some smaller projects, but some of those tend to grow and then it becomes hard to untangle things again.

main.cpp is just a dummy source code file that prints out “Hello, world!” to the console.
At this time, its only reason for existence is so that I can use it in the CMakeLists.txt for the usual boilerplate statements to build a program later.

#include <iostream>

int main (int argc, char** argv)
{
	std::cout << "Hello, world!" << std::endl;
	return 0;
}

cmake/ExternalProjects.cmake

The interesting stuff is in the cmake/ExternalProjects.cmake file, at which we will now take a closer look.

Check for the Git module

Since one of the following two ExternalProject segments will download its sources from a GitHub repository, we first check if Git is available on our machine (the FindGit module that we load for this comes with CMake).

But note that you have to take care of installing Git yourself: find_package(Git) (which loads CMake’s FindGit module for you, so a separate include(FindGit) is redundant) only detects the presence of the application on a machine; it does not install it. That’s your job!

find_package(Git)   # Loads the FindGit module; sets Git_FOUND and GIT_EXECUTABLE.

if (NOT Git_FOUND)
	message(FATAL_ERROR "Git not found!")
endif ()

Include the required module

Next up, we will get the sources for SQLite and TagLib and build them.

Since this happens with CMake’s own ExternalProject module, we need to load it first with the appropriate include() statement:

 include (ExternalProject)

I will not explain every setting I use in ExternalProject_Add() in detail; some are pretty self-explanatory (like URL), and there are more that I don’t use. You can look them up on the ExternalProject help page².

Download a Zip file and build with custom commands

We will get the Zip file of the amalgamation version of the SQLite sources and then build it with a custom build command:


set (EP_SQLITE "SQLite")

ExternalProject_Add (
	${EP_SQLITE}
	
	PREFIX            ${EP_SQLITE}
	URL               https://www.sqlite.org/2019/sqlite-amalgamation-3290000.zip
	URL_HASH          SHA1=a0eba79e5d1627946aead47e100a8a6f9f6fafff

	CONFIGURE_COMMAND ""
	UPDATE_COMMAND    ""
	INSTALL_COMMAND   ""
	
	BUILD_ALWAYS      OFF
	INSTALL_DIR       ${CMAKE_CURRENT_BINARY_DIR}/ext/${EP_SQLITE}

	BUILD_COMMAND     ${CMAKE_CXX_COMPILER} <SOURCE_DIR>/sqlite3.c /link /dll /out:sqlite3.dll
	COMMAND           ${CMAKE_COMMAND} -E copy <BINARY_DIR>/sqlite3.dll <INSTALL_DIR>/sqlite3.dll
)

The variable EP_SQLITE is set for convenience reasons and to satisfy my “One Definition Rule”.

PREFIX is the root name for the default directory structure.

ExternalProject_Add() will download the given Zip file from the SQLite URL, verify it against the given hash value using the specified algorithm (SHA-1) and then extract it into a default directory structure (which can be changed, see documentation). If CMake detects that the (extracted) files are already there, it will skip this part.

Then we set several *_COMMAND options to an empty string, because some of those options have CMake-defined default values that may clash with our cause.

BUILD_ALWAYS OFF makes sure that the build step is not re-run every time we build our CMake project.

INSTALL_DIR modifies one value of the above mentioned default directory structure slightly, so that the built library files will end up in the right place (related: The COMMAND at the end).

Then we have our custom BUILD_COMMAND: The instruction is copied from the SQLite page as is and I just assume that the variable ${CMAKE_CXX_COMPILER} will point to cl.exe (not portable/multi-platform compatible, but good enough for this example).

Also noteworthy here is the strange syntax to reference the source directory path in the BUILD_COMMAND: <SOURCE_DIR>. This is not resolving a normal CMake variable (that would be ${SOURCE_DIR}), but using a placeholder token³ that ExternalProject_Add() will replace with the actual path when processed.

Finally, a plain COMMAND that uses CMake to copy the generated DLL file to wherever we want…
Any of the *_COMMANDs (like CONFIGURE_COMMAND, BUILD_COMMAND, INSTALL_COMMAND, etc.) can have as many of such additional COMMANDs following as needed.
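If you need one of those directories outside of the ExternalProject_Add() block, where the <...> placeholder tokens are not available, the ExternalProject module offers ExternalProject_Get_Property() for that. A minimal sketch, reusing the SQLite download from above but leaving out the build (the project name TokenDemo is made up, and all three steps are deliberately empty so that only the download/extract part remains):

```cmake
cmake_minimum_required(VERSION 3.11)
project(TokenDemo LANGUAGES NONE)

include(ExternalProject)

ExternalProject_Add(
	SQLite

	PREFIX            SQLite
	URL               https://www.sqlite.org/2019/sqlite-amalgamation-3290000.zip
	URL_HASH          SHA1=a0eba79e5d1627946aead47e100a8a6f9f6fafff

	# Nothing to configure, build or install in this sketch.
	CONFIGURE_COMMAND ""
	BUILD_COMMAND     ""
	INSTALL_COMMAND   ""
)

# Copies the named properties into ordinary variables of the same name.
ExternalProject_Get_Property(SQLite SOURCE_DIR BINARY_DIR INSTALL_DIR)

message(STATUS "SQLite source dir:  ${SOURCE_DIR}")
message(STATUS "SQLite binary dir:  ${BINARY_DIR}")
message(STATUS "SQLite install dir: ${INSTALL_DIR}")
```

Here ${SOURCE_DIR} really is a normal CMake variable, but only after the call to ExternalProject_Get_Property(); within the ExternalProject_Add() block itself, the <SOURCE_DIR> token remains the way to go.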

Clone a Git repository and build with CMake

The second example is getting the TagLib sources by cloning the GitHub repository and then invoking CMake to build it (TagLib itself comes with its own CMakeLists.txt, etc.).

Remember that we checked above whether Git was installed on our machine — that was meant for this section…

set (EP_TAGLIB "TagLib")

ExternalProject_Add (
	${EP_TAGLIB}
	
	PREFIX         ${EP_TAGLIB}
	GIT_REPOSITORY https://github.com/taglib/taglib
	GIT_TAG        v1.11.1
	GIT_SHALLOW    ON
	
	BUILD_ALWAYS   OFF
	INSTALL_DIR    ${CMAKE_CURRENT_BINARY_DIR}/ext/${EP_TAGLIB}
	
	CMAKE_CACHE_ARGS
		-DBUILD_SHARED_LIBS:BOOL=ON
		-DENABLE_STATIC_RUNTIME:BOOL=OFF
		-DBUILD_EXAMPLES:BOOL=ON
		-DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>

	BUILD_COMMAND     ${CMAKE_COMMAND} --build <BINARY_DIR> --config Release --target INSTALL
)

Again, the variable EP_TAGLIB and the PREFIX option serve the same purpose as their counterparts in the previous example.

Then we specify the URL of the Git repository, which tag it should check out (GIT_TAG doesn’t need to be a literal tag name; it can also be a branch name or a commit hash) and whether we want the whole history or just the part attached to the specified tag (i.e. GIT_SHALLOW ON).

Again, BUILD_ALWAYS OFF and INSTALL_DIR serve the same purpose as mentioned in the previous example.

With CMAKE_CACHE_ARGS we set options for the following implicit(!) configuration of the external project’s own(!) CMake run.
There are several things to keep in mind for this:

Multiple variants:
- CMAKE_ARGS is the normal way for supplying options to the CMake command line for this external project.
- CMAKE_CACHE_ARGS is for when you may hit a command line length limit: Instead of going through the command line, the values are written into that project’s cache (CMakeCache.txt) via a pre-load script; also, it forces these values!
- CMAKE_CACHE_DEFAULT_ARGS is similar, except that it sets the initial values in the cache file, which could later be overridden.

Also, CMAKE_*_ARGS constructs a list of space separated values, so don’t use -D CMAKE_INSTALL_PREFIX:PATH = <INSTALL_DIR> but -DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>!

Also necessary, at least for the two CMAKE_CACHE_*_ARGS options: One must specify the type: -D<var>:<type>=<value>
Example: -DBUILD_SHARED_LIBS:BOOL=ON
If you’re unsure about the type, just let CMake run one time and look into the generated CMakeCache.txt file.
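To put the three variants side by side, here is a small sketch. The target name TagLibArgsDemo and the particular distribution of the options over the three keywords are arbitrary choices of mine, only meant to show the syntax; they are not a recommendation.

```cmake
cmake_minimum_required(VERSION 3.11)
project(ArgsDemo LANGUAGES NONE)

include(ExternalProject)

ExternalProject_Add(
	TagLibArgsDemo                        # hypothetical target name

	GIT_REPOSITORY https://github.com/taglib/taglib
	GIT_TAG        v1.11.1

	# Passed on the command line of the external project's CMake run.
	CMAKE_ARGS
		-DBUILD_SHARED_LIBS=ON

	# Written to the external project's cache and forced; type is required.
	CMAKE_CACHE_ARGS
		-DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>

	# Only initial cache values; can be overridden later. Type is required.
	CMAKE_CACHE_DEFAULT_ARGS
		-DBUILD_EXAMPLES:BOOL=OFF
)
```

Note that every entry is one token without spaces around the -D or the equals sign.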

Digression: A tale of woe

So far so good, but then an error threw me off the track for a while, until I figured out how this works/relates.

The first time I ran this portion of the script, CMake spit out an error about TagLib not being configured (or so I misinterpreted it at least).
In retrospect, that must have been something different, but I can no longer trace what really caused it.

Anyways, me being clever thought: “Oh, so CMake complains that TagLib is not configured. Since TagLib itself uses CMake for building, that makes total sense and the documentation for ExternalProject_Add() even has a CONFIGURE_COMMAND, so I will use that!” Big mistake!

Because I spent the next hours scratching my head over why the options that I set in CMAKE_ARGS weren’t applied.

I found out that when I put the options directly into the CONFIGURE_COMMAND, all worked as it should.
But if that was the right thing to do, for what purposes do CMAKE_ARGS, CMAKE_CACHE_ARGS and CMAKE_CACHE_DEFAULT_ARGS exist?

After a lot of testing and reading, the lights went on:

If you use CONFIGURE_COMMAND, you override CMake’s default behavior and you’re on your own, because… you certainly know what you’re doing, right?!
That also means that CONFIGURE_COMMAND will not use any of the values set in CMAKE_*_ARGS; you must provide everything yourself in the CONFIGURE_COMMAND section.

So, after removing the CONFIGURE_COMMAND lines again, everything suddenly worked fine: TagLib was automatically and implicitly configured by CMake with the (inherited from the parent CMake process) default values (generator, platform, etc.) and with the values I set in my CMAKE_ARGS section.

The error that originally led me to believe that an explicit configuration step was missing for TagLib must have been fixed by me, accidentally, while testing and cursing 😄
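For the record, this is roughly what the trap looks like. It is a reconstruction for illustration, not my original script; the target name is made up, and the explicit configure line is just one plausible way to spell it out.

```cmake
cmake_minimum_required(VERSION 3.11)
project(ConfigureCommandDemo LANGUAGES NONE)

include(ExternalProject)

ExternalProject_Add(
	TagLibExplicitConfigure               # hypothetical target name

	GIT_REPOSITORY https://github.com/taglib/taglib
	GIT_TAG        v1.11.1

	# NOT applied, because an explicit CONFIGURE_COMMAND is given below:
	CMAKE_ARGS
		-DBUILD_SHARED_LIBS=ON

	# The configure step runs inside <BINARY_DIR>. Everything the external
	# project needs (generator, options, source path) must be listed here;
	# further settings like the platform would have to be added as well.
	CONFIGURE_COMMAND ${CMAKE_COMMAND}
		-G ${CMAKE_GENERATOR}
		-DBUILD_SHARED_LIBS:BOOL=ON
		-DCMAKE_INSTALL_PREFIX:PATH=<INSTALL_DIR>
		<SOURCE_DIR>
)
```

Dropping the CONFIGURE_COMMAND block is what brings the CMAKE_ARGS back into play.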

Now, if all is so easy when using CMake’s ExternalProject_Add() on another CMake project, why do I still explicitly prepare a BUILD_COMMAND for it?

Because the implicit default command (at least in this case) is to build the Debug configuration and the target All, while I wanted Release and Install to be built, that’s why.
(Note: Using CMAKE_BUILD_TYPE for specifying Release in CMAKE_*_ARGS would not work for multi-configuration generators like Microsoft Visual C++!)
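If the same script should also behave with single-configuration generators (Makefiles, Ninja), one possible approach is to check the GENERATOR_IS_MULTI_CONFIG global property and only then add CMAKE_BUILD_TYPE. This is a sketch I have not used in the project described here; the target and variable names are made up, and the install step is switched off to keep the example focused on the configuration.

```cmake
cmake_minimum_required(VERSION 3.11)
project(BuildTypeDemo LANGUAGES NONE)

include(ExternalProject)

# Global property, available since CMake 3.9.
get_property(IS_MULTI_CONFIG GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)

set(EP_CACHE_ARGS -DBUILD_SHARED_LIBS:BOOL=ON)
if (NOT IS_MULTI_CONFIG)
	# Single-configuration generators choose the configuration at configure time.
	list(APPEND EP_CACHE_ARGS -DCMAKE_BUILD_TYPE:STRING=Release)
endif ()

ExternalProject_Add(
	TagLibRelease                         # hypothetical target name

	GIT_REPOSITORY   https://github.com/taglib/taglib
	GIT_TAG          v1.11.1

	CMAKE_CACHE_ARGS ${EP_CACHE_ARGS}

	# Multi-configuration generators choose the configuration at build time;
	# single-configuration generators ignore the --config option.
	BUILD_COMMAND    ${CMAKE_COMMAND} --build <BINARY_DIR> --config Release
	INSTALL_COMMAND  ""
)
```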

FetchContent: Practice

This is a bit easier: We simply fetch the content of another CMake-based project and add its sources to our own project’s source tree at CMake’s configuration time(!), so that we can depend on and build its targets like one of ours.

The getting (downloading) is internally handled by the same options that ExternalProject_Add() offers, so you can use a Git or Subversion repository, a URL to a Zip file and the like.

Here’s a very simple example; note that for features like the use of namespaces (XXXLib::Library), additional steps need to be done in the project you’re fetching – but that is related to the Art of Packaging and shall not interest us now.

(The differences and options of FetchContent_Declare(), FetchContent_Populate() and FetchContent_MakeAvailable() are documented in the official documentation of the FetchContent module, so I will not go into that here.)

cmake_minimum_required(VERSION 3.11)   # The FetchContent module is only available since CMake 3.11.

project(test)                          # Declare the project before pulling in any subdirectories.

include(FetchContent)

FetchContent_Populate(
    XXXLib                             # Recommendation: Stick close to the original name.
    GIT_REPOSITORY https://.../xxx.git
    SOURCE_DIR     xxxlib              # (Relative) path within the build directory.
)

# And now you can already add and use it, like it's a part/target of your own project!
add_subdirectory(${xxxlib_SOURCE_DIR}/src xxxlib/build)
    # IMPORTANT: Lowercase name! See below for more on this...

add_executable(foo bar.cpp)
target_link_libraries(foo PRIVATE XXXLib::Library)

Two gotchas I had when I tested it with a library project of mine:

In that library I used a certain directory layout, and in its CMakeLists.txt I used the variable CMAKE_SOURCE_DIR as a starting point for some relative paths, to find files and scripts – that did not work anymore when I embedded this library via add_subdirectory() in another project:
Because my library had dropped down some levels in the overall file hierarchy, CMAKE_SOURCE_DIR now pointed to the other project’s top-level directory.

Since the relative paths didn’t work anymore, I had to re-arrange some file locations/variables and paths in my library…
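The effect described above can be made visible with a few message() calls in the library’s own CMakeLists.txt. This is only a sketch; MyLibrary and MYLIB_SCRIPT_DIR are made-up names, and whether PROJECT_SOURCE_DIR or CMAKE_CURRENT_SOURCE_DIR fits better depends on your layout.

```cmake
# CMakeLists.txt of the (hypothetical) library
cmake_minimum_required(VERSION 3.11)
project(MyLibrary LANGUAGES NONE)

# Top-level directory of whatever project is being configured; this is the
# consuming project's directory once the library is embedded.
message(STATUS "CMAKE_SOURCE_DIR:         ${CMAKE_SOURCE_DIR}")

# Directory of the most recent project() call, i.e. this library.
message(STATUS "PROJECT_SOURCE_DIR:       ${PROJECT_SOURCE_DIR}")

# Directory of the CMakeLists.txt that is currently being processed.
message(STATUS "CMAKE_CURRENT_SOURCE_DIR: ${CMAKE_CURRENT_SOURCE_DIR}")

# Fragile when embedded:
# set(MYLIB_SCRIPT_DIR ${CMAKE_SOURCE_DIR}/cmake)

# Stays inside the library, standalone or embedded:
set(MYLIB_SCRIPT_DIR ${PROJECT_SOURCE_DIR}/cmake)
```

Built standalone, all three print the same path; pulled in via add_subdirectory(), only the first one changes.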

Just keep it in mind: Later in its life, your project may become just another cog in someone else’s machine 😉

The project name in the FetchContent variables must be written in lowercase, regardless of how it’s written elsewhere, otherwise it won’t work!

So, do not write ${XXXLib_SOURCE_DIR} or ${XXXLib_BINARY_DIR} or if (XXXLib_POPULATED), but ${xxxlib_SOURCE_DIR} or ${xxxlib_BINARY_DIR} or if (xxxlib_POPULATED) instead; see also https://cmake.org/pipermail/cmake/2018-March/067184.html!

Yeah, that’s some fucked up shit, for sure…
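For completeness, here is how the lowercase rule shows up in the declare/populate pattern that the FetchContent documentation describes for CMake 3.11. It is a sketch with the same placeholder library as above; the URL is left elided on purpose, so it will not run until you put a real one there.

```cmake
cmake_minimum_required(VERSION 3.11)
project(LowercaseDemo LANGUAGES CXX)

include(FetchContent)

# The name may be written in any case here...
FetchContent_Declare(
	XXXLib
	GIT_REPOSITORY https://.../xxx.git    # placeholder URL, elided on purpose
)

# ...but the variables that come back are always prefixed in lowercase.
FetchContent_GetProperties(XXXLib)
if (NOT xxxlib_POPULATED)
	FetchContent_Populate(XXXLib)
	add_subdirectory(${xxxlib_SOURCE_DIR} ${xxxlib_BINARY_DIR})
endif ()
```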

¹ Considerable amount of time meaning more than the 10 minutes I expected it would take…
Split over the days, it has been hours for a seemingly simple task.

² I do wonder why the CMake gods named the main and most important part of the module ExternalProject_Add():
Either a plain ExternalProject() to be in sync with Project(), or Add_ExternalProject(), to be in sync with Add_Executable() or Add_Library() would have been easier to remember. Oh well, probably because it’s a separate module and not a core feature…

³ Irritating is the fact that you can’t use these placeholder tokens everywhere within the ExternalProject_Add() block, it seems:
I tried it in some ways (not shown in this tutorial), where CMake bailed with an error about not being able to find/resolve/create <XXX>.

Feedback: comment@saoe.net · Categories: Development, How To, On Software · Tags: EN, CMake
