Continuous integration
NOTE: Supported SOC2/GN-105 and SOC2/GN-106
Sourcegraph uses a continuous integration (CI) and delivery tool, Buildkite, to help ensure a consistent build, test and deploy process. Software changes are systematically required to complete all steps within the continuous integration tool workflow prior to production deployment, in addition to being peer reviewed.
Sourcegraph also maintains a variety of tooling on GitHub Actions for continuous integration and repository maintenance purposes.
NOTE: To learn more about testing in particular, see our testing principles.
Buildkite pipelines
An introduction to Sourcegraph's Buildkite pipelines.
Development
Contribute changes to Sourcegraph's Buildkite pipelines.
Run sg ci docs to see documentation about the CI pipeline steps.
Buildkite pipelines
Tests are automatically run in our various Buildkite pipelines when you push your changes (i.e. when you run git push) to the sourcegraph/sourcegraph GitHub repository.
Pipeline steps are generated on the fly using the pipeline generator. A complete reference of all available pipeline types and steps is available from sg ci docs. To keep the repository tidy, consider deleting the branch after the pipeline has completed. The build results will be available even after the branch is deleted.
To see what checks will get run against your current branch, use sg:
sg ci preview
You can also request builds manually for your branch using sg ci build. The summary video below covers some useful sg ci * commands and shows how to move fast when interacting with CI:
To learn about making changes to our Buildkite pipelines, see Pipeline development.
Pipeline steps
Soft failures (SOC2/GN-106)
Many steps in Sourcegraph's Buildkite pipelines allow for soft failures, which means that even if they fail, they do not cause the entire build to fail.
In the Buildkite UI, soft failures are currently shown with a triangular warning sign (not to be mistaken for a hard failure!).
We use soft failures for the following reasons only:
- Steps that determine whether a subsequent step should run, where soft failures are the only technical way to communicate that a later step should be skipped in this manner using Buildkite.
- Regular analysis tasks, where soft failures serve as a monitoring indicator to warn the team responsible for fixing issues.
  - Examples: image vulnerability scanning, linting tasks for catching deprecation warnings
- Temporary exceptions to accommodate experimental or in-progress work.
You can find all usages of soft failures with the following queries:
All other failures are hard failures.
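The difference between soft and hard failures can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the pipeline generator's actual code: the stepResult type and the buildPassed function are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// stepResult is a hypothetical record of one finished pipeline step.
type stepResult struct {
	name     string
	exitCode int
	softFail bool // true if the step is allowed to fail softly
}

// buildPassed reports whether the build as a whole passes: a non-zero exit
// code only fails the build when the step is not marked as a soft failure.
func buildPassed(steps []stepResult) bool {
	for _, s := range steps {
		if s.exitCode != 0 && !s.softFail {
			return false
		}
	}
	return true
}

func main() {
	steps := []stepResult{
		{name: "unit tests", exitCode: 0},
		{name: "image vulnerability scan", exitCode: 1, softFail: true},
	}
	// The soft failure on its own does not fail the build.
	fmt.Println(buildPassed(steps))
}
```

In this model a soft-failed step is still visible in the results, so the owning team can follow up on it, while the overall build outcome is decided by hard failures only.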
Image vulnerability scanning
Our CI pipeline uses Trivy to scan our Docker images for security vulnerabilities.
Refer to sg ci docs to see what pipelines Trivy checks run in.
If a Docker image contains any HIGH or CRITICAL severity vulnerabilities that have a known fix:
- The CI pipeline will create an annotation that contains links to reports that describe the vulnerabilities
- The Trivy scanning step will soft fail. Note that soft failures do not fail builds or block deployments. They simply highlight the failing step for further analysis.
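The rule above (HIGH or CRITICAL severity, with a known fix) can be expressed as a small filter. The sketch below is illustrative only: the vulnerability type, its fields, and the actionable function are assumptions made for this example and do not reflect Trivy's report format or the pipeline's real implementation.

```go
package main

import "fmt"

// vulnerability is a hypothetical, simplified view of one scanner finding.
type vulnerability struct {
	id           string
	severity     string // e.g. "LOW", "MEDIUM", "HIGH", "CRITICAL"
	fixedVersion string // empty when no fix is known
}

// actionable returns the findings that match the rule described above:
// HIGH or CRITICAL severity with a known fix.
func actionable(findings []vulnerability) []vulnerability {
	var out []vulnerability
	for _, v := range findings {
		severe := v.severity == "HIGH" || v.severity == "CRITICAL"
		if severe && v.fixedVersion != "" {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	findings := []vulnerability{
		{id: "EXAMPLE-1", severity: "HIGH", fixedVersion: "1.2.3"},
		{id: "EXAMPLE-2", severity: "CRITICAL"},                   // no known fix
		{id: "EXAMPLE-3", severity: "LOW", fixedVersion: "2.0.0"}, // below the severity threshold
	}
	hits := actionable(findings)
	// A non-empty result corresponds to the case where the step soft fails
	// and an annotation linking to the reports is created.
	fmt.Println(len(hits))
}
```

Trivy itself offers the --severity and --ignore-unfixed options for this kind of filtering; whether the pipeline relies on them is not stated here.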
NOTE: Our vulnerability management process (including this workflow) is under active development and in its early stages. All of the above is subject to change. See https://github.com/sourcegraph/sourcegraph/pull/25756 for more context.
We also run separate vulnerability scans for our infrastructure.
Pipeline health
Maintaining Buildkite pipeline health is a critical part of ensuring we ship a stable product. Changes that make it to the main branch may be deployed to various Sourcegraph instances, and having a reliable and predictable pipeline is crucial to ensuring bugs do not make it to production environments.
To enable this, we address flakes as they arise and mitigate the impacts of pipeline instability with branch locks.
NOTE: Sourcegraph teammates should refer to the CI incidents playbook for help managing issues with pipeline health.
Branch locks
WARNING: A red main build is not okay and must be fixed. Learn more about our main branch policy in Testing principles: Failures on the main branch.
buildchecker is a tool that responds to periods of consecutive build failures on the main branch of the Sourcegraph Buildkite pipeline. If it detects a series of failures on the main branch, merges to main will be restricted to members of the Sourcegraph team who authored the failing commits until the issue is resolved. This is referred to as a "branch lock". When a build passes on main again, buildchecker will automatically unlock the branch.
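The locking behaviour described above can be sketched as follows. This is not buildchecker's source code: the build and lockState types, the checkBranch function, and the failure threshold are all hypothetical and exist only to illustrate the idea.

```go
package main

import "fmt"

// build is a hypothetical summary of one build on the main branch.
type build struct {
	author string
	passed bool
}

// lockState describes whether main is locked and who may still merge.
type lockState struct {
	locked  bool
	allowed []string // authors of the failing commits
}

// checkBranch inspects builds ordered newest first. The threshold (how many
// consecutive failures trigger a lock) is an assumed parameter.
func checkBranch(builds []build, threshold int) lockState {
	failures := 0
	seen := map[string]bool{}
	var authors []string
	for _, b := range builds {
		if b.passed {
			break // a passing build ends the failure streak
		}
		failures++
		if !seen[b.author] {
			seen[b.author] = true
			authors = append(authors, b.author)
		}
	}
	if failures >= threshold {
		return lockState{locked: true, allowed: authors}
	}
	return lockState{}
}

func main() {
	builds := []build{
		{author: "alice", passed: false},
		{author: "bob", passed: false},
		{author: "alice", passed: false},
		{author: "carol", passed: true},
	}
	state := checkBranch(builds, 3)
	fmt.Println(state.locked, state.allowed)
}
```

Because the streak is counted from the newest build, a single passing build at the head of main yields an unlocked state, which matches the automatic unlock described above.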
Authors of the most recent failed builds are responsible for investigating failures. Please refer to the Continuous integration playbook for step-by-step guides on what to do in various scenarios.
Flakes
A flake is defined as a test or script that is unreliable or non-deterministic, i.e. it exhibits both a passing and a failing result with the same code. In other words: something that sometimes fails, but if you retry it enough times, it passes, eventually.
Tests are not the only things that can be flaky: flakes can also encompass sporadic infrastructure issues and unreliable steps.
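The definition above (both a passing and a failing result with the same code) can be checked mechanically over a history of runs. The following is a minimal sketch, assuming a hypothetical run record keyed by name and commit; it is not a tool that exists in the repository.

```go
package main

import "fmt"

// run is a hypothetical record of one execution of a test or step.
type run struct {
	name   string // test or step name
	commit string // commit the run executed against
	passed bool
}

// flaky returns the names that both passed and failed on the same commit,
// in the order in which the conflicting results were first observed.
func flaky(runs []run) []string {
	type key struct{ name, commit string }
	passed := map[key]bool{}
	failed := map[key]bool{}
	reported := map[string]bool{}
	var out []string
	for _, r := range runs {
		k := key{r.name, r.commit}
		if r.passed {
			passed[k] = true
		} else {
			failed[k] = true
		}
		if passed[k] && failed[k] && !reported[r.name] {
			reported[r.name] = true
			out = append(out, r.name)
		}
	}
	return out
}

func main() {
	runs := []run{
		{name: "TestA", commit: "abc", passed: false},
		{name: "TestA", commit: "abc", passed: true}, // same code, different result
		{name: "TestB", commit: "abc", passed: false},
		{name: "TestB", commit: "def", passed: true}, // different code, so not a flake
	}
	fmt.Println(flaky(runs))
}
```

A result that changes between commits is treated as a genuine fix or regression here; only conflicting results on identical code count as a flake.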
Flaky tests
WARNING: We do not tolerate flaky tests of any kind. Learn more about our flaky test policy in Testing principles: Flaky tests.
Typical reasons why a test may be flaky:
- Race conditions or timing issues
- Caching or inconsistent state between tests
- Unreliable test infrastructure (such as CI)
- Reliance on third-party services that are inconsistent
If a flaky test is discovered:
- Immediately use language-specific functionality to skip the test and open a PR to disable it:
  - Go: testing.T.Skip
  - Typescript: .skip()
  If the language or framework allows for a skip reason, include a link to the issue tracking re-enabling the test, or leave a docstring with a link.
- Open an issue to investigate the flaky test (use the flaky test issue template), and assign it to the most likely owner.
Flaky steps
If a step is flaky, we need to get the build back to a reliable state as soon as possible. If there is not already a discussion in #buildkite-main, create one and link the steps you take. Here are the recommended approaches in order:
- Revert the PR if a recent change introduced the instability. Ping the author.
- Use the Skip StepOpt when creating the step. Include a reason and a link to context. This will still show the step on builds so we don't forget about it.
An example use of Skip:
--- a/dev/ci/internal/ci/operations.go
+++ b/dev/ci/internal/ci/operations.go
@@ -260,7 +260,9 @@ func addGoBuild(pipeline *bk.Pipeline) {
 func addDockerfileLint(pipeline *bk.Pipeline) {
     pipeline.AddStep(":docker: Lint",
         bk.Cmd("go run ./dev/sg lint -annotations docker"),
+        bk.Skip("2021-09-29 example message https://github.com/sourcegraph/sourcegraph/issues/123"),
     )
 }
NOTE: If it's hard to make sure that the flake is fixed, another approach is to monitor the step without breaking the build. See How to allow a CI step to fail without breaking the build and still receive a notification.
Assessing flaky client steps
See more information on how to assess flaky client steps here.
Flaky infrastructure
If the build or test infrastructure itself is flaky, then open an issue with the team/devx label and notify the Developer Experience team.
Also see Buildkite infrastructure.
Flaky linters
Linters are all run through sg lint, with linters defined in dev/sg/linters.
If a linter is flaky, you can modify the dev/sg/linters package to disable the specific linter (or entire category of linters) with the Enabled: disabled(...) helper:
 {
     Name:        "svg",
     Description: "Check svg assets",
+    Enabled:     disabled("reported as unreliable"),
     Checks: []*linter{
         checkSVGCompression(),
     },
 },
Pipeline development
See Pipeline development to get started with contributing to our Buildkite pipelines!
Deployment notifications
When a pull request is deployed, an automated notification will be posted in #deployments-cloud. Notifications include a list of the pull requests that were shipped as well as a list of the services that were rolled out.
If you want to be explicitly notified (through a Slack ping) when your pull request reaches dotcom production, add the label notify-on-deploy.
GitHub Actions
buildchecker
buildchecker, our branch lock management tool, runs in GitHub Actions. See the workflow specification.
To learn more about buildchecker, refer to the buildchecker source code and documentation.
pr-auditor
pr-auditor, our PR audit tool, runs in GitHub Actions. See the workflow specification.
To learn more about pr-auditor, refer to the pr-auditor source code and documentation.
Third-party licenses
We use the license_finder tool to check third-party dependencies for their licenses. It runs as a GitHub Action on pull requests, which will fail if one of the following occurs:
- If the license for a dependency cannot be inferred. To resolve:
  - Use license_finder licenses add <dep> <license> to set the license manually
- If the license for a new or updated dependency is not on the list of approved licenses. To resolve, either:
  - Remove the dependency
  - Use license_finder ignored_dependencies add <dep> --why="Some reason" to ignore it
  - Use license_finder permitted_licenses add <license> --why="Some reason" to allow the offending license
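The two failure conditions above, together with the effect of ignoring a dependency or permitting a license, can be modelled roughly as follows. This sketch is an assumption-laden illustration and not how license_finder is implemented: checkDependency and its parameters are hypothetical, and an empty license string stands in for "could not be inferred".

```go
package main

import "fmt"

// checkDependency mirrors the two failure conditions for a single dependency.
// approved and ignored are hypothetical stand-ins for the recorded decisions
// about permitted licenses and ignored dependencies.
func checkDependency(dep, license string, approved, ignored map[string]bool) error {
	if ignored[dep] {
		return nil // ignored dependencies are not checked at all
	}
	if license == "" {
		return fmt.Errorf("%s: license could not be inferred", dep)
	}
	if !approved[license] {
		return fmt.Errorf("%s: license %q is not on the approved list", dep, license)
	}
	return nil
}

func main() {
	approved := map[string]bool{"MIT": true, "Apache-2.0": true}
	ignored := map[string]bool{"internal-tool": true}

	fmt.Println(checkDependency("left-pad", "MIT", approved, ignored))
	fmt.Println(checkDependency("mystery-lib", "", approved, ignored))
	fmt.Println(checkDependency("copyleft-lib", "GPL-3.0", approved, ignored))
	fmt.Println(checkDependency("internal-tool", "", approved, ignored))
}
```

Each resolution listed above maps onto one input of this model: setting a license manually supplies the license value, ignoring a dependency adds it to the ignored set, and permitting a license adds it to the approved set.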
The license_finder tool can be installed using gem install license_finder. You can run the script locally using:
# updates ThirdPartyLicenses.csv
./dev/licenses.sh
# runs the same check as the one used in CI, returning status 1
# if there are any unapproved dependencies ('action items')
LICENSE_CHECK=true ./dev/licenses.sh
The ./dev/licenses.sh script will also output some license_finder configuration for debugging purposes. This configuration is based on the doc/dependency_decisions.yml file, which tracks decisions made about licenses and dependencies.
For more details, refer to the license_finder documentation.