For our day-to-day product deployment, we use Docker containers. Whenever new code ships to production, our CI/CD process builds several Docker images and pushes them to our private registry, a standard deployment process.
In the spirit of “shifting left”, we wanted to scan our own Docker images, and the third-party images we use in our architecture, as part of our pipeline, and to prevent merging to master if they contain any critical vulnerabilities. For this mission we decided to try the open source clair-scanner, which is based on Clair's 'analyze-local-images' by CoreOS.
In our environment we use GitLab as our code repository and CI.
GitLab Ultimate provides an integration with clair-scanner results: when merging branches to master, it analyzes the vulnerability report and displays the results in the merge request.
GitLab also recommends running the scanner inside a Docker executor that attaches to the host's Docker socket.
Please refer to the following articles for more info about integrating container scanning in Gitlab CI:
Both of the above presented an obstacle for us, since we use GitLab shell executors and don't have an Ultimate license.
In addition, we wanted to be more proactive about the vulnerabilities found.
Here is how we implemented a solution that requires neither upgrading our license nor switching executors from shell to Docker, and that creates an issue for each vulnerable image, listing its vulnerabilities for our team to analyze.
What needs to be done:
- Run the arminc/clair-db and arminc/clair-local-scan:v2.0.1 containers, linked to each other
- Download the latest script for Clair scanner
- Run the script to scan the desired images
- Report an issue in GitLab if vulnerabilities are found
To avoid the overhead of starting the Clair bundle (scanner and DB) and waiting for it to “warm up” during each CI execution, we decided to run the bundle continuously on the executors.
The issues we needed to tackle:
- When using the docker run command to start the Clair scanner, we noticed that it created numerous ‘git-remote-http’ processes without killing them. This left us with a bunch of zombie processes on the CI executor until the container was restarted.
- The Clair DB image is updated frequently and needs to be pulled and restarted, but there is no reason to recreate the container if the image wasn't updated.
Docker-compose to the rescue!
To overcome both of the above, we decided to run the scanner and DB using docker-compose instead of writing maintenance scripts or adding logic around the docker run commands.
To fix the first issue we’ve added ‘init: true’, which forwards signals and reaps processes.
Please refer to this issue to find more info about zombie processes in CoreOS Clair.
For the second issue we run docker-compose pull and then docker-compose up. This recreates a container only if its image is newer than the one currently running, and simply starts the containers from the latest images if they are not yet running.
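The pull-then-up sequence can be sketched as a small shell function. This is a minimal sketch, not our exact script: the function name `refresh_clair` and the `COMPOSE_BIN` variable are assumptions made for illustration.

```shell
# Sketch: refresh the Clair bundle without needlessly recreating containers.
# COMPOSE_BIN and refresh_clair are illustrative names, not part of the original setup.
COMPOSE_BIN="${COMPOSE_BIN:-docker-compose}"

refresh_clair() {
    compose_file="$1"
    # Fetch newer images, if any, for every service in the file.
    "$COMPOSE_BIN" -f "$compose_file" pull || return 1
    # Start missing containers; recreate only those whose image or config changed.
    "$COMPOSE_BIN" -f "$compose_file" up -d
}
```

It would be called with the path to the Clair compose file, for example `refresh_clair /opt/clair/docker-compose.yml` (the path is also an assumption).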
The docker-compose file is quite simple, with the addition of init to clair scanner:
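As a rough idea of what such a file can look like, the snippet below writes a minimal compose file from the shell. It is a sketch under assumptions: the `CLAIR_DIR` location, the file format version, the restart policy, and the `postgres` service name (chosen so the scanner container can reach the DB under that hostname) are not taken from our real file. Only the two image names and `init: true` come from the text above.

```shell
# Writes a sketch of the Clair bundle's compose file; CLAIR_DIR is an assumed location.
CLAIR_DIR="${CLAIR_DIR:-./clair}"
mkdir -p "$CLAIR_DIR"

cat > "$CLAIR_DIR/docker-compose.yml" <<'EOF'
version: "2.4"          # assumed; `init` needs file format 2.2+ or 3.7+
services:
  postgres:             # assumed name, so Clair can reach the DB at host "postgres"
    image: arminc/clair-db:latest
    restart: unless-stopped
  clair:
    image: arminc/clair-local-scan:v2.0.1
    init: true          # init process as PID 1: forwards signals, reaps zombies
    restart: unless-stopped
    depends_on:
      - postgres
    ports:
      - "6060:6060"     # Clair API port that clair-scanner talks to
EOF
```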
TAKE ME TO YOUR SCANNER!
GitLab CI uses a .gitlab-ci.yml file to define the stages. Our first step was to add “container-scanning” as a new stage in the process.
Then we created the stage itself:
Let’s break it down a bit:
The stage is limited to run only on master. Its script then does the following:
- Copies the folder that holds essential files for clair docker-compose (docker-compose.yml and any env files you may want to use with it) to /opt/
- Pulls the newest versions of images specified in docker-compose.yml
- Executes docker-compose up to boot/update the docker containers for Clair scanner and DB
- Downloads the scanner script
- Gives the script executable permissions
- Searches for images in docker-compose.yml, which defines our applications’ architecture (this is not the same as the Clair docker-compose.yml), and passes each image as an argument to our custom script (scan-docker.sh). This script calls the clair-scanner executable downloaded in the previous steps, parses the output, and automatically opens or updates GitLab tickets based on the results.
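The last step, finding the images and handing each one to the scan script, could look roughly like this. It is a sketch only: `list_images` and `scan_all` are hypothetical helper names, and the simple line-based matching assumes every image is declared on its own `image:` line.

```shell
# Sketch: list the images referenced by a compose file and scan each one.
# list_images and scan_all are hypothetical helpers; ./scan-docker.sh is the custom
# script mentioned above, assumed here to take one image per invocation.

list_images() {
    # Print the value of every "image:" key, unquoted, one per line, deduplicated.
    sed -n 's/^[[:space:]]*image:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" |
        tr -d "\"'" | sort -u
}

scan_all() {
    list_images "$1" | while read -r image; do
        # Redirect stdin so the script cannot swallow the rest of the image list.
        ./scan-docker.sh "$image" < /dev/null
    done
}
```

Deduplicating the list avoids scanning the same image twice when several services share it.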
The whole process is quite simple and quick overall.
The real magic is in our scan-docker.sh script. (I’ll replace most of the code with pseudocode for easier reading but keep the Clair parts intact.)
Feel free to contact us for the full code.
The first thing we do is run the scan and save the output. We also keep the JSON file so it can later be attached to the GitLab ticket.
vuln_file_name=$(basename "$IMAGE" | sed 's/:/_/g')
echo "Scanning $IMAGE"
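For readers who want to see the whole step in one place, here is a minimal sketch of how the scan could be invoked. The variables `CLAIR_SCANNER`, `CLAIR_URL`, `SCANNER_IP` and `THRESHOLD`, their default values, and the `.log` file are assumptions for illustration. The `--ip`, `--clair`, `--threshold` and `--report` options are clair-scanner's own.

```shell
# Sketch of the scan step. CLAIR_SCANNER, CLAIR_URL, SCANNER_IP and THRESHOLD are
# assumed settings, not values from the original script.
CLAIR_SCANNER="${CLAIR_SCANNER:-./clair-scanner}"
CLAIR_URL="${CLAIR_URL:-http://127.0.0.1:6060}"
SCANNER_IP="${SCANNER_IP:-127.0.0.1}"
THRESHOLD="${THRESHOLD:-Critical}"

scan_image() {
    image="$1"
    # "registry/app/web:1.2" -> "web_1.2", safe to use as a file name.
    vuln_file_name=$(basename "$image" | sed 's/:/_/g')
    echo "Scanning $image"
    # Keep the console output for parsing and the JSON report for the ticket.
    # The function returns the scanner's own exit status.
    "$CLAIR_SCANNER" --ip="$SCANNER_IP" --clair="$CLAIR_URL" \
        --threshold="$THRESHOLD" --report="$vuln_file_name.json" \
        "$image" > "$vuln_file_name.log" 2>&1
}
```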
We then take the output, parse it, and strip the formatting (like shell colors) and any unneeded text.
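Stripping the colors can be done with a single sed expression. The helper below is a sketch, not our exact code: `strip_formatting` is a hypothetical name, and dropping blank lines stands in for whatever "unneeded text" a given setup wants to remove.

```shell
# Sketch: remove ANSI escape sequences (colors etc.) and blank lines from stdin.
# strip_formatting is a hypothetical helper name.
strip_formatting() {
    esc=$(printf '\033')   # the literal ESC character that starts each sequence
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" -e '/^[[:space:]]*$/d'
}
```

It reads from standard input, so it can be placed directly after the scanner in a pipeline or fed a saved log file.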
According to the result, we decide what we want to do:
if [ "any vulnerabilities found" ]; then
To submit or update an issue, we use the GitLab API.
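Creating the issue boils down to one authenticated POST to the project's issues endpoint. The function below is a minimal sketch of that call with curl. The variable names `GITLAB_URL`, `GITLAB_PROJECT_ID` and `GITLAB_TOKEN`, the issue title wording, and the `security` label are assumptions, not our actual values.

```shell
# Sketch: open a GitLab issue for a vulnerable image through the REST API (v4).
# GITLAB_URL, GITLAB_PROJECT_ID and GITLAB_TOKEN are assumed variable names.
create_issue() {
    image="$1"
    details="$2"
    curl --silent --fail --request POST \
        --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        --data-urlencode "title=Vulnerabilities found in $image" \
        --data-urlencode "description=$details" \
        --data-urlencode "labels=security" \
        "$GITLAB_URL/api/v4/projects/$GITLAB_PROJECT_ID/issues"
}
```

Updating an existing issue follows the same pattern, with a PUT request to the same endpoint followed by the issue's IID.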
Voila! A totally free vulnerability scanner integrated in our pipeline, with reporting capabilities!