Deploy using git archive » andreas.heigl.org

When deploying code, I now almost always use git archive, no matter whether it is library code or actual product code that gets deployed onto a (web) server.

Recently I had a chat with someone who hadn’t heard of that so far, so I realized that perhaps it’s time to write about it.

What is it

git archive is a tool that exports the content of a git repository into an archive file. While doing so, it also uses information from a .gitattributes file to decide whether to include a file in the archive and whether to modify it.

A lot of people already know about the .gitattributes file as it makes sure that certain files are removed from the archive that is created by github when creating a release.

The documentation for .gitattributes has (amongst a huge amount of other information and awesome things that can be done via that file, but that’s for some other time) a special part about creating archives that talks about two attributes:

export-ignore

export-ignore removes files that are tracked in the git-repository but are not to be part of the distribution archive.

Why is that of interest? As a PHP developer, when I deploy code I also run composer install --no-dev --prefer-dist, as I want no development tools in my production setup and I prefer the distribution-ready code of my dependencies. That fetches the archive files of the respective releases and adds them to my vendor folder. By removing files that are not relevant for production from my archive, I can reduce the overall size of a deployment.

So what I usually add to my .gitattributes file is something like this:

tests export-ignore
phpunit.xml.dist export-ignore
phpstan.neon export-ignore
.editorconfig export-ignore
.gitattributes export-ignore
.gitignore export-ignore

All files and folders with the export-ignore attribute will not be part of the archive that is created, and will therefore not be deployed to production.
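To see the effect without touching a real project, here is a minimal sketch that builds a throwaway repository and lists what git archive would export. The file names and the "Demo" commit identity are made up for the demo. Note that the .gitattributes file has to be committed, because git archive reads the attributes from the tree that is being archived.

```shell
set -eu

# Throwaway repository so the demo does not touch any real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet .

# One production file plus a few development-only files (hypothetical names).
mkdir src tests
echo '<?php echo "hello";' > src/index.php
echo '<?php // a test' > tests/ExampleTest.php
echo '<phpunit/>' > phpunit.xml.dist

# Mark the development-only paths as export-ignore.
cat > .gitattributes <<'EOF'
tests export-ignore
phpunit.xml.dist export-ignore
.gitattributes export-ignore
EOF

git add .
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet -m "Initial commit"

# List the entries that end up in the archive.
archived_files="$(git archive --format=tar HEAD | tar -tf -)"
echo "$archived_files"
```

Only the src folder and its content should show up in the listing; the tests folder, phpunit.xml.dist and .gitattributes itself are left out.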

export-subst

The other attribute that we can set is export-subst. It took me quite a while to discover it. But having come from SVN, where substitutions were *a thing*, it was easy to understand.

Whenever we run an export, git will replace certain placeholders in files with respective values. For that to work, the file with the placeholder will need to have the export-subst attribute set though.

Wait.. What?

Imagine you want to tell users which version of your code they are actually using. That is easy when they cloned your git repo, as the information is readily available. But not so when they downloaded an archive from your repo. We lose that information.

Therefore it can make sense to have a file that contains the hash and the date of the last commit along with the name of the last committer. That way people know which commit in git this archive maps to.

Then you can add the following into a file (for example your README.md):

$Format:%h$ - committed on $Format:%cD$ by $Format:%cN$

On export, this will be replaced with something like this:

fb235d2 - committed on Thu, 18 Jul 2024 22:25:22 +0200 by Andreas Heigl

For more information on which placeholders can be used, check out the placeholders section of the pretty formats in the git log docs.
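The substitution can be tried out in a throwaway repository. This is only a sketch: VERSION.txt is a hypothetical file name and the commit identity is made up. It assumes the placeholder syntax from the gitattributes documentation, where the placeholders are wrapped as $Format:PLACEHOLDERS$.

```shell
set -eu

# Throwaway repository so the demo does not touch any real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet .

# VERSION.txt is a hypothetical file name chosen for this demo.
# Single quotes keep the shell from touching the $Format:...$ placeholders.
echo 'Built from $Format:%h$ by $Format:%cN$' > VERSION.txt

# Only files with the export-subst attribute get their placeholders expanded.
echo 'VERSION.txt export-subst' > .gitattributes

git add .
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add version file"

short_hash="$(git rev-parse --short HEAD)"

# Export the commit and print VERSION.txt straight from the archive.
exported="$(git archive --format=tar HEAD | tar -xOf - VERSION.txt)"
echo "$exported"
```

The file in the working copy still contains the raw placeholders; only the exported copy carries the short hash and the committer name.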

Caveat

One thing that bugs me a lot is that one can actually use $Format:%(describe)$, which outputs either the tag associated with the current commit or the last tag, the number of commits since that tag and the current short hash.

So it’s either 0.1.0 when the current commit is associated with a tag or it’s 0.1.0-2-g7265c97 meaning the current hash 7265c97 is 2 commits further than the last tag 0.1.0.
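Both shapes of that output can be reproduced with plain git describe in a throwaway repository. This sketch uses the git describe command rather than the $Format placeholder, an annotated tag, and empty commits with a made-up identity; the commit helper is just a convenience for the demo.

```shell
set -eu

# Throwaway repository so the demo does not touch any real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet .

# Demo-only helper: create an empty commit with a fixed identity.
commit() {
    git -c user.name="Demo" -c user.email="demo@example.com" \
        commit --quiet --allow-empty -m "$1"
}

commit "First commit"
# git describe only considers annotated tags unless --tags is given.
git -c user.name="Demo" -c user.email="demo@example.com" \
    tag -a 0.1.0 -m "Release 0.1.0"

# The current commit is the tagged one, so only the tag name is printed.
on_tag="$(git describe)"

commit "Second commit"
commit "Third commit"

# Two commits after the tag: tag, commit count and "g" plus the short hash.
after_tag="$(git describe)"

echo "$on_tag"
echo "$after_tag"
```

The first line printed is 0.1.0, the second one has the form 0.1.0-2-g followed by the abbreviated hash of the newest commit.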

This is awesome! I can use that to actually add the release version to all my files on exporting! 🎉

Well. No! As its calculation might be resource-hungry (it has to walk back through previous commits), someone made the decision to expand the (describe) placeholder only once per archive.

Which means, if you want that in multiple files, you will have to come up with some really ugly bash-scripting code in your CI pipeline when deploying. More on that in a moment. And it won’t work out of the box when using the archive that is automatically created by GitHub or GitLab.

But all in all a pretty cool feature!

Deployments

But what has all that to do with deployments, you might ask yourself.

A lot!

When I deploy code I do not want to deploy unnecessary code. On the one hand to reduce the amount of data transferred. On the other hand to reduce the number of exploitable files.

git archive can help me with that as it allows me to specify which files I actually want to have in a distribution. So I can remove all the config files for CI tools, my complete test folder and whatever else might be unnecessary directly from my production distribution by adding all those files to the .gitattributes file and then using git archive to get the files I want to deploy.

My deployment script usually looks something like this:

# Remove a possibly existing extraction folder
rm -rf extract
# Now that we are sure it's not there, create an empty extraction folder
mkdir extract
# Create an archive from the repository based on the given tag
# and extract that into the just created extraction folder.
git archive --prefix="./" --format=tar "${CI_COMMIT_TAG}" . | tar xv -C extract/
# Do some shell magic to replace occurrences of the string '%release-tag%'
# with the current release tag in all files within the extraction folder
find extract/ -type f -exec sed -i "s/%release-tag%/${CI_COMMIT_TAG}/g" {} \;
# Move into the extraction folder
cd extract
# Call composer install to add all your dependencies, prefer the
# distribution ones and create an authoritative and optimized autoloader
composer install --no-dev --prefer-dist -a
# Go back one level
cd ..
# Create the actual archive that you want to deploy
tar cvzf archive.tgz -C extract/ .
# Clean up the extraction folder
rm -rf extract

Now I have a (hopefully) fully working production environment with replaced placeholders and all dependencies required for production in one archive! All that is left for me to do is move that archive to the production server and extract it there.
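The placeholder replacement step of the script can be tried in isolation. The following sketch makes a few assumptions: release_tag stands in for ${CI_COMMIT_TAG}, the two files are invented for the demo, and sed -i is used the GNU way (BSD/macOS sed expects an extra argument after -i).

```shell
set -eu

# Scratch folder that stands in for the extraction folder of the script.
work_dir="$(mktemp -d)"
release_tag="1.2.3"   # stand-in for ${CI_COMMIT_TAG}

# Two hypothetical files that both carry the placeholder.
mkdir -p "$work_dir/extract/src"
echo 'Version: %release-tag%' > "$work_dir/extract/README.md"
echo "const VERSION = '%release-tag%';" > "$work_dir/extract/src/version.php"

# Replace every occurrence of the placeholder in every file.
find "$work_dir/extract" -type f \
    -exec sed -i "s/%release-tag%/${release_tag}/g" {} \;

readme_line="$(cat "$work_dir/extract/README.md")"
version_line="$(cat "$work_dir/extract/src/version.php")"
echo "$readme_line"
echo "$version_line"
```

One thing to keep in mind: a tag that contains a slash or an ampersand would need escaping before it is handed to sed, as both characters have a special meaning in the replacement expression.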

In most of my projects I either have a script deploy.sh that does all that, along with actually moving the files onto the production server and finalizing the deployment, or I have a CI pipeline that does that whenever I push a tag.

In either case I do not have to worry about “How do I do a deployment again”? I just tag and push. And then check whether there is a deploy.sh. If so I run that…

Pros

No unnecessary files on production (No phpunit for example)
Small file to transfer to production
No git required on production (some people use it to check out there)
No composer required on production
Replace Placeholders with metadata from the git-archive
Replace tags within multiple files

Cons

Requires a bit more work initially
… ? 🤷
what else?