CI/CD pipeline notes: It starts with requirements

Martin Fowler explains the concepts of Continuous Integration (CI) and Continuous Delivery (CD) in two articles, Continuous Integration and Continuous Delivery (*). These articles already contain some hints about requirements; in this blog post I want to give an overview of the requirements I have identified for our CI/CD pipeline.
If you visualize the CI/CD process, you will see that it involves several main topics. You will also notice that a CI/CD pipeline is not a straight line with a beginning and an end. It is a continuous, cyclic process in which all aspects have a 'continuous' character.

(*)
Do not confuse Continuous Delivery with Continuous Deployment. Continuous Deployment goes one step further. In Continuous Deployment, all changes that pass the (automated) tests are also automatically deployed to production.

The requirements of a CI/CD pipeline cover all of these aspects. It is up to you which requirements you find important. Articles on the Internet usually highlight the most obvious ones.

In my pursuit of a 'complete' set of requirements, I came up with the list below. Some of the topics are still underexposed. For example, we looked into automating the 'Design' and came up with tools such as 'CA Agile Requirements Designer', but ultimately decided to focus on other automation topics. 'Operate' also deserves a bit more attention. We already implemented the ITIL processes and supporting tools, but it was difficult to define more day-to-day requirements for 'Continuous Operation' (e.g. think of automated server restarts after an incident).

The requirements list so far:

General requirements

Make the CI/CD Pipeline the Only Way to deploy to Production
CI/CD Pipeline as code: With “CI/CD pipeline as Code”, your code, your automation and your orchestration are committed to source code management (e.g. Git). The pipeline then has the exact same versioned lifecycle as the code, helping you to ensure long-term maintainability
The CI/CD Pipeline must be highly available, because it must be possible to create and deploy fixes 24x7
Use the same mechanism to deploy to every environment, incl. production
Use short feedback loops - break the process as soon as a certain quality threshold is not met
Make small releases
Release often
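
The "short feedback loops" requirement can be illustrated with a minimal sketch. It assumes each pipeline stage can be modelled as a function that reports whether its quality threshold was met; the stage names and the run_pipeline helper are hypothetical and not tied to any particular CI tool.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check() returns True when the
# stage's quality threshold is met.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failing stage.

    Returns the names of the stages that were actually executed.
    """
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            # Break the process here: later stages are never started.
            break
    return executed


# Hypothetical stages, for illustration only.
example_stages: List[Stage] = [
    ("compile", lambda: True),
    ("unit-tests", lambda: False),  # threshold not met
    ("integration-tests", lambda: True),
]

print(run_pipeline(example_stages))
```

In this sketch the integration tests never start, because the unit-test stage already failed.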

Design

Designs are versioned
Create workitems that refer to the design (e.g. using Jira, VSTS, ...)

Code

Keep source code in a Git repository (e.g. Bitbucket)

Build

Webhooks are used to automatically start a build process after a commit/push
Keep binaries (dependencies) and artefacts in an artefact repository (e.g. Nexus, Artifactory)
Build artefacts are automatically uploaded into an artefact repository
Release artefacts are built only once
Builds are made by means of a CI tool (e.g. Jenkins)

Build Automation QA

Perform static code analysis using Sonar, Fortify and Sonatype CLM
Unit tests (JUnit, Cucumber) are performed as part of the build automation
Integration tests are (partly) performed as part of the build automation (e.g. using Docker)
Code coverage must be defined; set up a Quality Gate to fail builds if test coverage drops below a certain threshold
Sonar and Fortify checks must not report high/critical defects
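
As a rough illustration of the coverage Quality Gate, the sketch below turns a coverage measurement into a process exit code. Tools such as Sonar provide this out of the box; the function names, the line-count inputs and the 80% threshold here are assumptions made for the example only.

```python
def coverage_gate(covered_lines: int, total_lines: int, threshold_pct: float) -> bool:
    """Return True when line coverage meets or exceeds the threshold."""
    if total_lines <= 0:
        # Nothing measurable: fail instead of passing silently.
        return False
    coverage_pct = 100.0 * covered_lines / total_lines
    return coverage_pct >= threshold_pct


def gate_exit_code(covered_lines: int, total_lines: int, threshold_pct: float = 80.0) -> int:
    """Map the gate result to an exit code: 0 passes the build, 1 fails it."""
    return 0 if coverage_gate(covered_lines, total_lines, threshold_pct) else 1


# Hypothetical measurement: 790 of 1000 lines covered, threshold 80%.
print(gate_exit_code(790, 1000))
```

A build step could pass this value to sys.exit so that the CI tool marks the build as failed.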

Provision

The CI/CD pipeline is stored as code; the CI/CD pipeline can be recreated in an acceptable timeframe
Test environments are created on-demand (e.g. by means of Ansible); infrastructure is configured as code (infrastructure-as-code)
Requests for a test environment, including middleware and connections, are made by means of REST APIs
(Test) Environments can be used for both (very) short and longer terms

Deploy

Source of a deployment is an artefact from the artefact repository
Deploy an application by means of a deployment tool (e.g. XL Deploy, Cloud Foundry CLI, UrbanCode Deploy)
Deploy database scripts by means of a deployment tool
Stubs and drivers are deployed by means of a deployment tool
Automatically roll back a deployment gone bad
Implement blue/green deployment to support zero downtime deployment (ZDD)
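
The idea behind blue/green deployment with automatic rollback can be sketched as follows. This is a simplified model, not how XL Deploy or Cloud Foundry implement it: the BlueGreenRouter class and the deploy and healthy callbacks are hypothetical stand-ins for a real router and real health checks.

```python
from typing import Callable


class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""

    def __init__(self, live: str = "blue") -> None:
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, deploy: Callable[[str], None], healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle environment and switch only if it is healthy."""
        target = self.idle
        deploy(target)
        if healthy(target):
            self.live = target  # users move over without downtime
            return True
        # Unhealthy: traffic was never switched, so nothing to undo.
        return False

    def rollback(self) -> None:
        """Send traffic back to the other (previous) environment."""
        self.live = self.idle


router = BlueGreenRouter()
router.release(deploy=lambda env: None, healthy=lambda env: True)
print(router.live)
```

Because the old environment stays untouched until the next release, rolling back is only a traffic switch.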

Test

System and regression tests are automated
User acceptance tests are automated (e.g. UFT, Cucumber, Postman/Newman, SoapUI)
Load and stress tests are automated
Penetration tests are automated (e.g. OWASP ZAP)
Automated tests are triggered/initiated from the CI/CD pipeline
Drivers and stubs, needed for different test environments, are developed in parallel with the development of a new feature
Tests are reusable (for subsequent test cycles/regression)
Tests can be executed remotely
Tests, test data, drivers and stubs are versioned (e.g. in Git)
Deployment to production is only possible when all OTA tests have passed successfully; automatically check that the version in Production is not higher than the highest version in OTA
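
The version check in the last requirement could look roughly like this. The sketch assumes purely numeric, dot-separated version strings (e.g. 1.10.2); the function names are hypothetical, and real release versions may need a more tolerant parser.

```python
from typing import Iterable, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def production_version_allowed(production: str, ota_versions: Iterable[str]) -> bool:
    """True when the production version is not higher than the highest OTA version."""
    parsed = [parse_version(v) for v in ota_versions]
    if not parsed:
        # Nothing has been through OTA, so nothing may go to production.
        return False
    return parse_version(production) <= max(parsed)


# Hypothetical versions, for illustration only.
print(production_version_allowed("1.9.0", ["1.2.0", "1.10.0"]))
print(production_version_allowed("2.0.0", ["1.2.0", "1.10.0"]))
```

Comparing tuples of integers avoids the string-comparison trap where "1.9.0" would sort above "1.10.0".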

Release

Release management is automated (e.g. XL Release, VSTS)
Release builds can be promoted; this depends on the workflow. If a workflow only builds release artefacts, the artefact repository should have a staging function
Releases are started and registered in an orchestration tool (e.g. XL Release, VSTS)

Operate

Maintenance scripts are versioned in Git

Monitor

The CI/CD pipeline is monitored (e.g. insufficient disk space is automatically detected and the team is informed)
The CI/CD pipeline automatically restarts in case the server restarts
Standard monitoring tools are used (e.g. Splunk)
Monitor the application in production (e.g. use Dynatrace, Splunk, Oracle Enterprise Manager). Monitor:

performance
number of requests per second
throughput
CPU usage (application server, database)
Memory usage

Monitoring results are shared with the devops team, so they can judge what to improve (e.g. by means of a dashboard)
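
The disk space example from the first monitoring requirement can be sketched with the Python standard library. In practice a monitoring tool would do this; the function names and the 10% threshold are assumptions, and informing the team is reduced to returning a message.

```python
import shutil
from typing import Optional


def free_fraction(path: str) -> float:
    """Fraction of the disk holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def disk_alert(path: str, min_free_fraction: float = 0.10) -> Optional[str]:
    """Return an alert message when free space is below the threshold, else None."""
    fraction = free_fraction(path)
    if fraction < min_free_fraction:
        return f"Low disk space on {path}: {fraction:.0%} free"
    return None


# Check the current working directory; a real setup would check the
# build server's workspace and notify the team instead of printing.
print(disk_alert("."))
```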

Orchestration

All steps in the CI/CD pipeline are orchestrated by the appropriate tool (e.g. overall orchestration by VSTS or XL Release in combination with Jenkins)

Security

Make sure two or more reviewers have approved a Pull Request, and that there are no failing builds or test runs associated with that commit
Do not allow manual upload of artefacts in the deployment tool
Uploading artefacts in the artefact repository is done by means of a non-personal account
Deployments are done by means of a non-personal account
Refine access to specific features by setting permissions for a user or group:

Release OTA
Release Prod
Approve critical steps
SCM; arrange permissions in such a way that only certain people can make changes in specific repos and/or branches, and such that nobody can make changes directly in production repos
Dashboards
Define team admin role

Store sensitive information (such as database credentials) in a vault or an encrypted dictionary (e.g. CyberArk, XL Deploy dictionary, CredHub, HashiCorp Vault)
LDAP access to tools (e.g. Jenkins) is mandatory
Tools must have fine-grained/matrix authorization
User communication with tools is by means of HTTPS (TLS with server side authentication)
Non-personal accounts, SSH key pairs and TLS with mutual authentication are used to communicate between tools
Only grant accounts temporary (high) access to start deployment to a Production environment
Use a 4-eyes principle to grant accounts higher access
Higher authorizations are delegated to selected members of a devops team, who issue underlying rights to team members
Audit trail: Automatically log build, test, and deploy results. Then complete the circle by linking in commits, reviews, and issues
Traceability: Make sure code changes, code reviews, build results, deploys, and issues are all linked together (as in, you can get from one to the other with a single mouse click). Track:

Designs - a design must be linked to a workitem
Workitems; epic, story or task (e.g. Jira workitems) - a workitem is associated with a Git feature branch / code commit hash
Git commit - associated with the committer (developer)
Pull request and approvals by reviewers - are associated with a Git feature branch / commit hash
Buildjob and buildnumber - are associated with a Git feature branch / commit hash
Artefact - associated with a Git feature branch / commit hash
Release version/identification (release version preferably is the Git tag), including approval - associated with an artefact in the artefact repository
Artefact in production - associated with an artefact in the artefact repository
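
One way to picture the traceability chain above is a single record per commit that carries all the links, plus a check for the ones still missing. This is only a sketch of the data model: the TraceRecord fields and all the example values are hypothetical, and a real setup would derive these links from the tools themselves.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceRecord:
    """Links that should be reachable from one Git commit."""
    commit_hash: str
    committer: str
    workitem: str
    design: str
    build_number: Optional[int] = None
    artefact: Optional[str] = None
    release_tag: Optional[str] = None


def missing_links(record: TraceRecord) -> List[str]:
    """Names of the links that are not filled in yet."""
    return [name for name, value in asdict(record).items() if value in (None, "")]


# Hypothetical values, for illustration only.
record = TraceRecord(
    commit_hash="abc1234",
    committer="dev@example.com",
    workitem="PROJ-1",
    design="design-v2",
    build_number=17,
)
print(missing_links(record))
```

A release step could refuse to continue while missing_links still returns anything.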

Metrics, reporting, aggregation and notification

Metrics must be made available; provide a dashboard to see the throughput of each feature and queues or bottlenecks in the process
Service descriptions are automatically generated from code and uploaded to a service repository; this can also be something like Confluence
Test reports are automatically generated
Test reports are aggregated in one dashboard (e.g. Hygieia)
Test results are automatically fed into an issue tracker (e.g. Jira)
Developers are informed (by means of email):

of a breaking feature branch build (only the committer receives the mail to prevent email overloading)
of a successful feature branch build (only the committer receives the mail to prevent email overloading)
of any breaking release or snapshot build (all developers)
of a successful release or snapshot build (all developers)
of a deployment to production (stakeholders to be determined by the team)
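
The build notification rules above come down to a small routing decision, sketched below. The build_mail_recipients function and the build kinds are hypothetical names; production deployment mails are left out because their stakeholders are still to be determined by the team.

```python
from typing import List


def build_mail_recipients(build_kind: str, committer: str, all_developers: List[str]) -> List[str]:
    """Decide who is mailed about a build result (breaking or successful).

    Feature branch builds go to the committer only, to prevent email
    overload; release and snapshot builds go to all developers.
    """
    if build_kind == "feature":
        return [committer]
    if build_kind in ("release", "snapshot"):
        return list(all_developers)
    raise ValueError(f"unknown build kind: {build_kind}")


# Hypothetical team, for illustration only.
team = ["alice@example.com", "bob@example.com"]
print(build_mail_recipients("feature", "alice@example.com", team))
print(build_mail_recipients("release", "alice@example.com", team))
```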

Governance

The devops team keeps track of CI/CD development by means of a scorecard
The devops team actively shares successes and knowledge with other project teams
