Commit new content in a GitLab CI/CD pipeline the easy way

A brass medallion with an image of a traditional Cornish Piskie. Photo by chrisinplymouth.

I really like using GitLab CI/CD. Every time I push a commit to the cloud repo, a magical Cornish piskie does the hard work for me (it must be magic, right?).

For example, I use GitLab CI/CD to publish the blog you are reading.

But how do you push new content from the CI pipeline back to the project repo? It’s easier than you might think.

In this post I’m going to assume you understand the basics of Git and GitLab CI/CD (I’m just going to say CI from now on). You should also note that this is written and tested on Docker containers executing on the GitLab cloud. If you use local runners you may need to use a different approach.

Usually, when you build some content inside a CI runner, it’s listed as one or more artifacts for use in later test or deployment jobs, and there is no further need to preserve those artifacts in the git repo.

However, sometimes you do want to add or update content in the actual project repo (i.e. you want to commit changes generated in the pipeline back to the repo). Examples include release notes, project README files, or images used in the documentation.

There are two obstacles. The CI pipeline is not configured with write access to the project repo, and the repo is cloned with a detached HEAD (HEAD refers to a commit, not a branch).

Your mileage may vary of course depending on your specific requirements, but at the very least this should give a starting point.

Setup

Before we can start pushing changes from our CI jobs some setup is required:

  1. Create a project access token (PAT). Follow this process to create a PAT, making sure it has the write_repository scope and the developer role.

    Note that from GitLab 16.0 all PATs will have an expiry date, so they will need to be refreshed regularly.

    Copy the PAT immediately as you cannot recover it later.

  2. Now use the value of the PAT to create a CI Project Variable. When creating the variable make sure “mask” is selected. Masking the PAT makes sure the value is not accidentally exposed and reduces any security risk.

    In this example I have assumed the variable is called ACCESS_TOKEN.

  3. Make sure that the developer role can commit to any protected branches that will be the target of your git-push during the CI job ($CI_DEFAULT_BRANCH, which is main, in this example).
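
If the variable is missing at run time (for example because it was marked as protected and the job runs on an unprotected branch), the push fails later with an authentication error. As a rough sketch, a job could check for the variable up front. The helper name require_var is made up for this example; only ACCESS_TOKEN comes from the setup above.

```shell
#!/bin/sh
# Sketch: fail early when a required CI variable is empty or unset.
set -eu

# require_var is a hypothetical helper, not something GitLab provides.
# $1 is the NAME of the variable to check.
require_var() {
  eval "value=\${$1:-}"   # indirect lookup that works in plain POSIX sh
  if [ -z "$value" ]; then
    echo "error: CI variable $1 is empty or unset" >&2
    return 1
  fi
}

if require_var ACCESS_TOKEN; then
  echo "ACCESS_TOKEN is available"
else
  echo "ACCESS_TOKEN is missing, so the push step would fail"
fi
```

The check only looks at whether the variable has a value; it never prints the token itself.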

Making file changes

The first thing you need in your CI script is a content creation job. We’ll do something trivial, i.e. append the current date and time to the repo README.

makechanges:
  image: busybox

  rules:
  - if: $CI_COMMIT_BRANCH                    == $CI_DEFAULT_BRANCH ||
        $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == $CI_DEFAULT_BRANCH

  artifacts:
    paths:   # You must list all files you want committed to the project repo
    - README.md
    expire_in: 5 minutes # They don't need to hang around long

  script:
  - printf "$(date)  \n" >> README.md

The important thing to note is that you have to explicitly pass the modified files as artifacts to the next job. Depending on your project that may not be ideal and we’ll come back to alternatives later.

Doing the actual commit

We need a Docker image with Git installed. The alpine/git image is a handy prebuilt option, but its ENTRYPOINT needs to be overridden.

updaterepo:
  image:
    name: alpine/git:latest
    entrypoint: [""] # Alpine/git image needs entrypoint to be overwritten

Before doing any real work it’s worth checking if there are any changes that need to be committed:

  script:
  - '[ -z "$(git status --porcelain)" ] && exit 0'

(If there are no untracked files and no modified files then git status --porcelain prints an empty string, and the script exits.)
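
To see how this check behaves, here is a small sketch that can be run locally. It assumes Git is installed; the has_changes helper and the scratch repo are only there for illustration and are not part of the pipeline.

```shell
#!/bin/sh
# Sketch: detect whether a working tree has anything to commit.
set -eu

# Succeeds (exit 0) when the repo in the current directory has
# modified or untracked files. Hypothetical helper for this example.
has_changes() {
  [ -n "$(git status --porcelain)" ]
}

scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet .

if has_changes; then fresh_repo="dirty"; else fresh_repo="clean"; fi

echo "new content" > README.md   # an untracked file counts as a change
if has_changes; then after_edit="dirty"; else after_edit="clean"; fi

echo "fresh repo: ${fresh_repo}"
echo "after edit: ${after_edit}"
```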

Once we know we need to commit something, the first “trick” is to give the git remote a URL that provides write access to our project repo. On GitLab, remotes can use HTTP Basic Authentication, where a username and password are included in the URL. Such a URL looks like this:

https://username:password@host/path/

As GitLab provides all the values needed to form the URL we don’t have to hard code any part of the URL into the script.

Further, in GitLab the username can be any string (it must be at least one character long); I use the value of CI_PROJECT_NAME just to be tidy. The password should be the PAT, contained in the CI variable ACCESS_TOKEN (see above). Setting the remote becomes:

  - git remote set-url origin
      "https://${CI_PROJECT_NAME}:${ACCESS_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_NAMESPACE}/${CI_PROJECT_NAME}.git"
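
The URL assembly can also be tried outside CI. The following is a minimal sketch: build_push_url is a made-up helper, and the default values are placeholders so that the script runs on a laptop. In a real job the CI_* variables and the masked ACCESS_TOKEN are supplied by GitLab.

```shell
#!/bin/sh
# Sketch: assemble the authenticated push URL from GitLab CI variables.
set -eu

# build_push_url is a hypothetical helper, not part of GitLab or Git.
# Arguments: username, token, host, namespace, project name.
build_push_url() {
  printf 'https://%s:%s@%s/%s/%s.git\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values, used only when the real variables are not set.
CI_PROJECT_NAME="${CI_PROJECT_NAME:-git-in-gitlab-ci}"
CI_PROJECT_NAMESPACE="${CI_PROJECT_NAMESPACE:-alecthegeek}"
CI_SERVER_HOST="${CI_SERVER_HOST:-gitlab.com}"
ACCESS_TOKEN="${ACCESS_TOKEN:-example-token}"

push_url="$(build_push_url "$CI_PROJECT_NAME" "$ACCESS_TOKEN" \
  "$CI_SERVER_HOST" "$CI_PROJECT_NAMESPACE" "$CI_PROJECT_NAME")"

# Avoid echoing the real URL in a job log; show a redacted form instead.
build_push_url "$CI_PROJECT_NAME" "****" \
  "$CI_SERVER_HOST" "$CI_PROJECT_NAMESPACE" "$CI_PROJECT_NAME"
```

The value in push_url is what would be handed to git remote set-url.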

We should also make sure the user email and name are correct:

  - git config user.email "${GITLAB_USER_EMAIL}"
  - git config user.name "${GITLAB_USER_NAME}"

This sets up the identity of the user running the CI pipeline. If you want to be extra fancy you can set it to the identity of the commit author instead (they are probably the same person most of the time):

  - git config user.email "$(echo "${CI_COMMIT_AUTHOR}" | sed -Ee 's/^[^<]+<([^>]*)>$/\1/')"
  - git config user.name "${CI_COMMIT_AUTHOR% <*}"
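
CI_COMMIT_AUTHOR has the form "Name <email>", so both parts can also be pulled out with plain shell parameter expansion. This is a sketch of an alternative, not what the job above does; split_author and the sample author are invented for the example.

```shell
#!/bin/sh
# Sketch: split a "Name <email>" string using only POSIX parameter expansion.
set -eu

# split_author is a hypothetical helper. It sets two variables:
#   author_name  - the text before " <"
#   author_email - the text between "<" and ">"
split_author() {
  author_name="${1% <*}"          # drop the trailing " <email>"
  author_email="${1##*<}"         # drop everything up to the last "<"
  author_email="${author_email%>}" # drop the closing ">"
}

# Placeholder author so the sketch runs outside CI.
split_author "${CI_COMMIT_AUTHOR:-Jane Example <jane@example.com>}"

echo "name:  ${author_name}"
echo "email: ${author_email}"
```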

Now we can add and commit any changes.

  - git add . # Any new changes, i.e. any artifacts from previous jobs
  - git commit --no-verify --message "Commit via a GitLab CI job"

Note that I use --no-verify to skip hook scripts, as running hooks inside a CI job usually doesn’t make sense. By default hook scripts are not replicated when a repo is cloned, but I like to be sure, because technically it would be possible to set up hooks in a previous part of the CI job (I might explain how in a future post).

The result of that is:

$ git commit --no-verify --message "Commit via a GitLab CI job"
[detached HEAD f54fe80] Commit via a GitLab CI job
 1 file changed, 1 insertion(+), 1 deletion(-)

If you look at the job log above you’ll notice the commit is made on a detached HEAD. If you don’t know what that is, don’t worry about it for now, but it’s the result of the way GitLab clones the repo, viz:

Initialized empty Git repository in /builds/alecthegeek/git-in-gitlab-ci/.git/
Created fresh repository.
Checking out 65ff8728 as detached HEAD (ref is main)...

When we push we need to make sure that the new local detached commit (represented by HEAD) is applied to the tip of the default branch in the repo with HEAD:$CI_DEFAULT_BRANCH (but change the destination branch if needed by your project).

  - git push --push-option=ci.skip --no-verify
      origin HEAD:$CI_DEFAULT_BRANCH

The other important thing to notice is --push-option=ci.skip which stops the CI pipeline running again when we update the project repo via the CI pipeline job.

And finally we are skipping any hook scripts again (--no-verify).
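
The HEAD:branch form of the push can be tried out locally without touching a real project. In this sketch a bare repo in a temporary directory stands in for the GitLab project, and the identity and file contents are placeholders. The ci.skip push option is left out because it only has meaning on a GitLab server.

```shell
#!/bin/sh
# Sketch: push a commit made on a detached HEAD to a branch on a remote.
# A local bare repo stands in for the GitLab project; all names are illustrative.
set -eu

work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"

git init --quiet "$work/clone"
cd "$work/clone"
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$work/remote.git"

# Seed the remote with one commit on main.
git commit --quiet --allow-empty --message "initial commit"
git push --quiet origin HEAD:refs/heads/main

# Detach HEAD, as the CI checkout does, then commit a change.
git checkout --quiet --detach HEAD
date >> README.md
git add README.md
git commit --quiet --no-verify --message "Commit via a GitLab CI job"

# HEAD:main means "update main on the remote to the commit HEAD points at".
git push --quiet --no-verify origin HEAD:main

local_commit="$(git rev-parse HEAD)"
remote_commit="$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)"
echo "local:  ${local_commit}"
echo "remote: ${remote_commit}"
```

After the push both commit IDs should match, even though the local repo never had main checked out when the change was committed.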

Other Ways to Organise the Pipeline

The solution above is neat and tidy, and if it’s all your pipeline does then the approach works well. You can just include the job script (for the second job) from another repo (for example https://gitlab.com/alecthegeek/git-in-gitlab-ci/-/blob/main/.gitlab-ci-updaterepo.yml) – making sure the CI variable is named correctly.

However, your CI pipeline may be generating many other artifacts that you don’t want committed. Instead they are used during subsequent test and deploy jobs, and so still need to be listed as artifacts.

The simplest approach is to just add the Git logic into the single job that’s creating your content, but you do need to make sure that Git is available in the image used to run the job (because you are running Git in the script section). This is certainly an option, but you will need to manage this additional image.

As GitLab supports running Docker in Docker, there is a third option.

In this approach the job is run in the official docker:git image. Furthermore, the Docker in Docker service is enabled.

This means that any other image, including the image needed to make file updates, can be run in the same job. The Git working directory is shared between the two containers.

No artifacts are created, because it all happens in one job; and no additional images need to be managed.

Here is an example:

include:
- project: alecthegeek/git-in-gitlab-ci
  file: .gitlab-ci-git-commit.yml

makechanges:

  image: docker:git

  services:
  - docker:dind

  variables:
    RUN_IMAGE: busybox  
    FF_NETWORK_PER_BUILD: "true"
    RUN_CMD: 'docker run --rm --mount "type=bind,src=$(pwd),dst=$(pwd)" --workdir "$(pwd)" --network=host ${RUN_IMAGE}'

  before_script:
  - !reference [.git_commit_and_push, script]

  script:
  # Make a change by running a docker image
  - ${RUN_CMD} printf "$(date)  \n" >> README.md
  # Commit and push the change using function defined in include file
  - git_commit_and_push

Finally

  • Avoid unnecessary commits to the repo by judicious use of a rules section so that the job only runs as needed. As this will be very specific to your project, I have only included a simple example:

      rules:
      - if: $CI_COMMIT_BRANCH                    == $CI_DEFAULT_BRANCH ||
            $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == $CI_DEFAULT_BRANCH
    

To make your life easier I have set up a repo with all this content and an example job. Feel free to clone it:

https://gitlab.com/alecthegeek/git-in-gitlab-ci/
