Developing batch changes
Getting started
Welcome, new batch change developer! This section will give you a rough overview of what Batch Changes is and how it works.
NOTE: Never hesitate to ask in #batch-changes-internal for help!
What are batch changes?
Before diving into the technical part of batch changes, make sure to read up on what batch changes are, what they're not and what we want them to be:
- Look at the batch changes product page.
- Watch the 2min demo video
Next: create your first batch change!
Creating a batch change locally
NOTE: Make sure your local development environment is set up by going through the "Getting started" guide.
- Since Batch Changes is an enterprise-only feature, make sure to start your local environment with `sg start` (which defaults to `sg start enterprise`).
- Go through the Quickstart for Batch Changes to create a batch change in your local environment. See "Testing repositories" for a list of repositories in which you can safely publish changesets.
- Now combine what you just did with some background information by reading the following:
Code walkthrough
To give you a rough overview of where each part of the code lives, let's take a look at which code gets executed when you:
- run `src batch preview -f your-batch-spec.yaml`
- click on the preview link
- click Apply to publish changesets on the code hosts
It starts in `src-cli`:
- `src batch preview` starts the "preview" command in `src-cli`.
- That executes your batch spec, which means it parses it, validates it, resolves the namespace, prepares the Docker images, and checks which workspaces are required.
- Then, for each repository (or workspace in each repository), it runs the `steps` in the batch spec by downloading a repository archive, creating a workspace in which to execute the `steps`, and then starting the Docker containers.
- If changes were produced in a repository, these changes are turned into a `ChangesetSpec` (a specification of what a changeset should look like on the code host: title, body, commit, etc.) and uploaded to the Sourcegraph instance.
- The last step of `src batch preview` is then to create a `BatchSpec` on the Sourcegraph instance, which is a collection of the `ChangesetSpec`s that you can then preview or apply.
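The relationship between step results, changeset specs and the batch spec can be sketched roughly as follows. This is a minimal illustration, not the `src-cli` implementation: the types `changesetSpec`, `batchSpec` and `stepResult` and the function `buildBatchSpec` are hypothetical stand-ins, and the only behavior taken from the text above is that a changeset spec is produced only for repositories in which the steps produced changes.

```go
package main

import "fmt"

// changesetSpec is a hypothetical, trimmed-down stand-in: it describes what
// one changeset should look like on the code host.
type changesetSpec struct {
	Repo  string
	Title string
	Body  string
	Diff  string
}

// batchSpec is a hypothetical stand-in: a named collection of changeset specs.
type batchSpec struct {
	Name           string
	ChangesetSpecs []changesetSpec
}

// stepResult is a hypothetical record of what running the steps in one
// repository produced. An empty Diff means the steps changed nothing.
type stepResult struct {
	Repo string
	Diff string
}

// buildBatchSpec turns step results into changeset specs, skipping
// repositories without changes, and collects them in a batch spec.
func buildBatchSpec(name, title, body string, results []stepResult) batchSpec {
	spec := batchSpec{Name: name}
	for _, r := range results {
		if r.Diff == "" {
			continue // no changes produced, so no changeset spec
		}
		spec.ChangesetSpecs = append(spec.ChangesetSpecs, changesetSpec{
			Repo: r.Repo, Title: title, Body: body, Diff: r.Diff,
		})
	}
	return spec
}

func main() {
	spec := buildBatchSpec("hello-world", "Add greeting", "Adds a greeting.", []stepResult{
		{Repo: "github.com/example/a", Diff: "+Hello World"},
		{Repo: "github.com/example/b", Diff: ""},
	})
	fmt.Printf("%s: %d changeset spec(s)\n", spec.Name, len(spec.ChangesetSpecs))
}
```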
When you then click the "Preview the batch change" link that `src-cli` printed, you'll land on the preview page in the web frontend:
- The `BatchChangePreviewPage` component then sends a GraphQL request to the backend to query the `BatchSpecByID`.
- Once you hit the Apply button, the component uses `applyBatchChange` to apply the batch spec and create a batch change.
- You're then redirected to the `BatchChangeDetailsPage` component that shows you your newly created batch change.
In the backend, all Batch Changes related GraphQL queries and mutations start in the `Resolver` package:
- The `CreateChangesetSpec` and `CreateBatchSpec` mutations that `src-cli` called to create the changeset and batch specs are defined here.
- When you clicked Apply, the `ApplyBatchChange` resolver was executed to create the batch change.
- Most of that doesn't happen in the resolver layer, but in the service layer: here is the `(*Service).ApplyBatchChange` method that talks to the database to create an entry in the `batch_changes` table.
- The most important thing that happens in `(*Service).ApplyBatchChange` is that it calls the `rewirer` to wire the entries in the `changesets` table to the correct `changeset_specs`.
- Once that is done, the `changesets` are created or updated to point to the new `changeset_specs` that you created with `src-cli`.
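To make the rewiring step more concrete, here is a simplified sketch. It is not the real `rewirer`: the types, the `rewire` function and, in particular, the matching key (repository plus branch) are assumptions made for illustration. The sketch also leaves out what happens to existing changesets that no longer have a matching spec.

```go
package main

import "fmt"

// changesetSpec and changeset are hypothetical, trimmed-down stand-ins for
// rows in the changeset_specs and changesets tables.
type changesetSpec struct {
	ID     int
	Repo   string
	Branch string
}

type changeset struct {
	ID             int // 0 means "not yet in the database"
	Repo           string
	Branch         string
	CurrentSpecID  int
	PreviousSpecID int
}

// rewire points each existing changeset at its matching new spec and creates
// a fresh changeset for every spec without a match.
// Assumption: a spec matches a changeset if repository and branch are equal.
func rewire(existing []changeset, specs []changesetSpec) []changeset {
	type key struct{ repo, branch string }
	byKey := make(map[key]changeset, len(existing))
	for _, c := range existing {
		byKey[key{c.Repo, c.Branch}] = c
	}

	out := make([]changeset, 0, len(specs))
	for _, s := range specs {
		if c, ok := byKey[key{s.Repo, s.Branch}]; ok {
			// Existing changeset: remember the old spec, attach the new one.
			c.PreviousSpecID = c.CurrentSpecID
			c.CurrentSpecID = s.ID
			out = append(out, c)
			continue
		}
		// No match: a new changeset that still has to be published.
		out = append(out, changeset{Repo: s.Repo, Branch: s.Branch, CurrentSpecID: s.ID})
	}
	return out
}

func main() {
	existing := []changeset{{ID: 1, Repo: "a", Branch: "hello", CurrentSpecID: 10}}
	specs := []changesetSpec{
		{ID: 20, Repo: "a", Branch: "hello"},
		{ID: 21, Repo: "b", Branch: "hello"},
	}
	for _, c := range rewire(existing, specs) {
		fmt.Printf("%+v\n", c)
	}
}
```

Keeping the previous spec around is what later allows a comparison of "what the changeset used to look like" with "what it should look like now".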
After that you can look at your new batch change in the UI while the rest happens asynchronously in the background:
- In a background process (which is started in [`enterprise/cmd/repo-updater`](https://github.com/sourcegraph/sourcegraph/blob/e7f26c0d7bc965892669a5fc9835ec65211943aa/enterprise/cmd/repo-updater/main.go#L58)) a `worker` is running that monitors the `changesets` table.
- Once a `changeset` has been rewired to a new `changeset_spec` and reset, this worker, called the `Reconciler`, fetches the changeset from the database and "reconciles" its current state (not published yet) with its desired state ("published on code host X, with this diff, that title and this body").
- To do that, the `Reconciler` looks at the changeset's current and previous `ChangesetSpec` to determine a plan for what it should do ("publish", "push a commit", "update title", etc.).
- Once it has the plan, it hands over to the `Executor`, which executes the plan.
- To push a commit to the code host, the `Executor` sends a request to the `gitserver` service.
- To create or update a pull request or merge request on the code host, it builds a `ChangesetSource`, which is a wrapper around the GitHub, Bitbucket Server, Bitbucket Data Center and GitLab HTTP clients.
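The idea of deriving a plan from the previous and current spec can be sketched like this. It is an illustration under stated assumptions rather than the reconciler's actual logic: the `spec` type, the `operation` names and the rules in `determinePlan` are hypothetical and cover only the three example operations mentioned above.

```go
package main

import "fmt"

// spec is a hypothetical, trimmed-down changeset spec.
type spec struct {
	Title string
	Body  string
	Diff  string
}

// operation is one step of a plan. The names are illustrative only.
type operation string

const (
	opPush    operation = "push a commit"
	opPublish operation = "publish"
	opUpdate  operation = "update title/body"
)

// determinePlan compares the previous and the current spec of a changeset
// and returns the operations needed to reach the desired state.
// Assumption: an unpublished changeset always needs a push and a publish.
func determinePlan(previous *spec, current spec, published bool) []operation {
	if !published {
		return []operation{opPush, opPublish}
	}
	var plan []operation
	if previous == nil {
		return plan // nothing to compare against, nothing to do
	}
	if previous.Diff != current.Diff {
		plan = append(plan, opPush)
	}
	if previous.Title != current.Title || previous.Body != current.Body {
		plan = append(plan, opUpdate)
	}
	return plan
}

func main() {
	old := spec{Title: "Add greeting", Body: "v1", Diff: "+Hello"}
	updated := spec{Title: "Add friendly greeting", Body: "v1", Diff: "+Hello"}

	fmt.Println(determinePlan(nil, old, false))     // first apply, not published yet
	fmt.Println(determinePlan(&old, updated, true)) // only the title changed
}
```

Separating "determine the plan" from "execute the plan" keeps the comparison logic free of side effects, which makes it straightforward to test without talking to a code host.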
While that is going on in the background, the `BatchChangeDetailsPage` component is polling the GraphQL API to get the current state of the batch change and its changesets.
Once all instances of the `Reconciler` worker are done determining plans and executing them, you'll see that your changesets have been published on the code hosts.
Glossary
Batch changes introduce a lot of new names, GraphQL queries & mutations, and database tables. This section tries to explain the most common names and provide a mapping between the GraphQL types and their internal counterpart in the Go backend.
GraphQL type | Go type | Database table | Description |
---|---|---|---|
Changeset | batches.Changeset | changesets | A changeset is a generic abstraction for pull requests and merge requests. |
BatchChange | batches.BatchChange | batch_changes | A batch change is a collection of changesets. The central entity. |
BatchSpec | batches.BatchSpec | batch_specs | A batch spec describes the desired state of a single batch change. |
ChangesetSpec | batches.ChangesetSpec | changeset_specs | A changeset spec describes the desired state of a single changeset. |
ExternalChangeset | batches.Changeset | changesets | Changeset is the unified name for pull requests/merge requests/etc. on code hosts. |
ChangesetEvent | batches.ChangesetEvent | changeset_events | A changeset event is an event on a code host, e.g. a comment or a review on a pull request on GitHub. They are created by syncing the changesets from the code host on a regular basis and by accepting webhook events and turning them into changeset events. |
Structure of the Go backend code
The following is a list of Go packages in the `sourcegraph/sourcegraph` repository and short explanations of what each package does:
- `enterprise/internal/batches/types`: Type definitions of common `batches` types, such as `BatchChange`, `BatchSpec`, `Changeset`, etc. A few helper functions and methods, but no real business logic.
- `enterprise/internal/batches`: The hook `InitBackgroundJobs` injects Batch Changes code into `enterprise/repo-updater`. This is the "glue" in "glue code".
- `enterprise/internal/batches/background`: Another bit of glue code that starts background goroutines: the changeset reconciler, the stuck-reconciler resetter, the old-changeset-spec expirer.
- `enterprise/internal/batches/rewirer`: The `ChangesetRewirer` maps existing/new changesets to the matching `ChangesetSpecs` when a user applies a batch spec.
- `enterprise/internal/batches/state`: All the logic concerned with calculating a changeset's state at a given point in time, taking into account its current state, past events synced from regular code host APIs, and events received via webhooks.
- `enterprise/internal/batches/search`: Parsing text-field input for changeset searches and turning it into database-queryable structures.
- `enterprise/internal/batches/search/syntax`: The old Sourcegraph-search-query parser we inherited from the search team a week or two back (the plan is not to keep it, but to switch to the new one when we have time).
- `cmd/frontend/internal/batches/resolvers`: The GraphQL resolvers that are injected into the `enterprise/frontend` in `cmd/frontend/internal/batches/init.go`. They mostly concern themselves with input/argument parsing/validation, (bulk-)reading (and paginating) from the database via the `batches/store`, but delegate most business logic to `batches/service`.
- `cmd/frontend/internal/batches/resolvers/apitest`: A package that helps with testing the resolvers by defining types that match the GraphQL schema.
- `enterprise/internal/batches/testing`: Common testing helpers we use across `enterprise/internal/batches/*` to create test data in the database, verify test output, etc.
- `enterprise/internal/batches/reconciler`: The `reconciler` is what gets kicked off by the `workerutil.Worker` initialised in `batches/background` when a `changeset` is enqueued. It's the heart of the declarative model of batches: compares changeset specs, creates execution plans, executes those.
- `enterprise/internal/batches/syncer`: This contains everything related to "sync changeset data from the code host to Sourcegraph". The `Syncer` is started in the background, keeps state in memory (rate limit per external service), and syncs changesets either periodically (according to heuristics) or when directly enqueued from the `resolvers`.
- `enterprise/internal/batches/service`: This is what's often called the "service layer" in web architectures and contains a lot of the business logic: creating a batch change and validating whether the user can create one, applying new batch specs, calling the `rewirer`, deleting batch changes, closing batch changes, etc.
- `cmd/frontend/internal/batches/webhooks`: These `webhooks` endpoints are injected by `InitFrontend` into the `frontend` and implement the `cmd/frontend/webhooks` interfaces.
- `enterprise/internal/batches/store`: This is the batch changes `Store` that takes `enterprise/internal/batches/types` types and writes/reads them to/from the database. This contains everything related to SQL and database persistence, even some complex business logic queries.
- `enterprise/internal/batches/sources`: This package contains the abstraction layer of code host APIs that live in `internal/extsvc/*`. It provides a generalized interface `ChangesetSource` and implementations for each of our supported code hosts.
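As an illustration of what "a generalized interface with one implementation per code host" means in practice, here is a small sketch. The interface below is deliberately named `changesetSource` (lowercase) because its method set is an assumption and not the real `ChangesetSource` definition; `fakeSource` and `publishAndRetitle` are likewise hypothetical helpers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// changeset is a hypothetical, trimmed-down changeset.
type changeset struct {
	ExternalID string // ID assigned by the code host
	Title      string
	Body       string
}

// changesetSource is an illustrative interface that hides code host
// differences. The real interface has a different, larger method set.
type changesetSource interface {
	CreateChangeset(ctx context.Context, c *changeset) error
	UpdateChangeset(ctx context.Context, c *changeset) error
}

// fakeSource is an in-memory implementation, e.g. for tests.
type fakeSource struct {
	nextID int
	stored map[string]changeset
}

func newFakeSource() *fakeSource {
	return &fakeSource{nextID: 1, stored: map[string]changeset{}}
}

func (f *fakeSource) CreateChangeset(_ context.Context, c *changeset) error {
	c.ExternalID = fmt.Sprintf("%d", f.nextID)
	f.nextID++
	f.stored[c.ExternalID] = *c
	return nil
}

func (f *fakeSource) UpdateChangeset(_ context.Context, c *changeset) error {
	if _, ok := f.stored[c.ExternalID]; !ok {
		return errors.New("changeset not found on code host")
	}
	f.stored[c.ExternalID] = *c
	return nil
}

// publishAndRetitle only depends on the interface, so it works unchanged
// with any code host implementation.
func publishAndRetitle(src changesetSource, title, newTitle string) (*changeset, error) {
	ctx := context.Background()
	c := &changeset{Title: title}
	if err := src.CreateChangeset(ctx, c); err != nil {
		return nil, err
	}
	c.Title = newTitle
	if err := src.UpdateChangeset(ctx, c); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	c, err := publishAndRetitle(newFakeSource(), "Hello", "Hello World")
	if err != nil {
		panic(err)
	}
	fmt.Println(c.ExternalID, c.Title)
}
```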
Diving into the code as a backend developer
- Read through `./cmd/frontend/graphqlbackend/batches.go` to get an overview of the batch changes GraphQL API.
- Read through `./enterprise/internal/batches/types/*.go` to see all batch changes related type definitions.
- Compare that with the GraphQL definitions in `./cmd/frontend/graphqlbackend/batches.graphql`.
- Start reading through `./enterprise/internal/batches/resolvers/resolver.go` to see how the main mutations are implemented (look at `CreateBatchChange` and `ApplyBatchChange` to see how the two main operations are implemented).
- Then start from the other end, `enterprise/cmd/repo-updater/main.go`. `enterpriseInit()` creates two sets of batch change goroutines:
  - `batches.NewSyncRegistry` creates a pool of syncers to pull changes from code hosts.
  - `batches.RunWorkers` creates a set of reconciler workers to push changes to code hosts as batch changes are applied.
Testing repositories
Batch changes create changesets (PRs) on code hosts. For testing Batch Changes locally we recommend using the following repositories:
- The sourcegraph-testing GitHub organization contains testing repositories in which you can open pull requests.
- We have an `automation-testing` repository that exists on GitHub, Bitbucket Server, and GitLab.
- The GitHub user `sd9` was specifically created to be used for testing Batch Changes. See "GitHub testing account" for details.
If you're lacking permissions to publish changesets in one of these repositories, feel free to reach out to a team member.
GitHub testing account
To use the `sd9` GitHub testing account:
- Find the GitHub `sd9` user in 1Password.
- Copy the `Campaigns Testing Token`.
- Change your `dev-private/enterprise/dev/external-services-config.json` to only contain a GitHub config with the token, like this:
{
  "GITHUB": [
    {
      "authorization": {},
      "url": "https://github.com",
      "token": "<TOKEN>",
      "repositoryQuery": ["affiliated"]
    }
  ]
}
Batch Spec examples
Take a look at the following links to see some examples of batch changes and the batch specs that produced them:
Server-side execution
Database tables
There are currently (Sept '21) four tables at the heart of the server-side execution of batch specs:
`batch_specs`. These are the `batch_specs` we already have, but in server-side mode they are created through a special mutation that also creates a `batch_spec_resolution_job`, see below.
`batch_spec_resolution_jobs`. These are worker jobs that are created through the GraphQL API when a user wants to kick off a server-side execution. Once a `batch_spec_resolution_job` is created, a worker will pick it up, load the corresponding `batch_spec` and resolve its `on` part into `RepoWorkspaces`: a combination of repository, commit, path, steps, branch, etc. For each `RepoWorkspace` it creates a `batch_spec_workspace` in the database.
`batch_spec_workspace`. Each `batch_spec_workspace` represents a unit of work for a `src batch exec` invocation inside the executor. Once `src batch exec` has successfully executed, these `batch_spec_workspaces` will contain references to `changeset_specs` and those in turn will be updated to point to the `batch_spec` that kicked all of this off.
`batch_spec_workspace_execution_jobs`. These are the worker jobs that get picked up by the executor and lead to `src batch exec` being called. Each `batch_spec_workspace_execution_job` points to one `batch_spec_workspace`. This extra table lets us separate the workspace data from the execution of `src batch exec`. Separating these two tables is the result of us running into tricky concurrency problems where workers were modifying table rows that the GraphQL layer was reading (or even modifying).
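The flow between these four tables can be sketched with plain Go structs. This is only a model of the relationships described above, not the actual schema or worker code: all type, field and function names are hypothetical, and the `on` part of the batch spec is simplified to a list of already-known repository names.

```go
package main

import "fmt"

// batchSpec stands in for a row in batch_specs.
type batchSpec struct {
	ID int
	On []string // assumption: simplified to a list of repository names
}

// resolutionJob stands in for a row in batch_spec_resolution_jobs.
type resolutionJob struct {
	BatchSpecID int
}

// workspace stands in for a batch_spec_workspace: one unit of work.
type workspace struct {
	ID               int
	BatchSpecID      int
	Repo             string
	ChangesetSpecIDs []int // filled in after a successful execution
}

// executionJob stands in for a batch_spec_workspace_execution_job.
// It only points at a workspace, so job state and workspace data stay apart.
type executionJob struct {
	WorkspaceID int
}

// resolveWorkspaces models the resolution worker: it loads the batch spec a
// job points to and creates one workspace per resolved repository.
func resolveWorkspaces(job resolutionJob, spec batchSpec) ([]workspace, error) {
	if job.BatchSpecID != spec.ID {
		return nil, fmt.Errorf("job points to batch spec %d, got %d", job.BatchSpecID, spec.ID)
	}
	ws := make([]workspace, 0, len(spec.On))
	for i, repo := range spec.On {
		ws = append(ws, workspace{ID: i + 1, BatchSpecID: spec.ID, Repo: repo})
	}
	return ws, nil
}

// enqueueExecutions creates one execution job per workspace.
func enqueueExecutions(ws []workspace) []executionJob {
	jobs := make([]executionJob, 0, len(ws))
	for _, w := range ws {
		jobs = append(jobs, executionJob{WorkspaceID: w.ID})
	}
	return jobs
}

func main() {
	spec := batchSpec{ID: 1, On: []string{"github.com/example/a", "github.com/example/b"}}
	ws, err := resolveWorkspaces(resolutionJob{BatchSpecID: spec.ID}, spec)
	if err != nil {
		panic(err)
	}
	for _, j := range enqueueExecutions(ws) {
		fmt.Printf("execution job for workspace %d\n", j.WorkspaceID)
	}
}
```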
Here's a diagram of their relationship: