Commit 0251359

Merge pull request #792 from jieyouxu/bors-firefighting
bors: document how to properly resync bors queue
2 parents 10d9cd3 + d78ded7 commit 0251359

3 files changed

+150
-4
lines changed

src/SUMMARY.md

+1
@@ -88,6 +88,7 @@
8888
- [AWS regions](./infra/docs/aws-regions.md)
8989
- [Bastion server](./infra/docs/bastion.md)
9090
- [Bors](./infra/docs/bors.md)
91+
- [Fixing bors queue](./infra/docs/bors/queue-resync.md)
9192
- [CDN](./infra/docs/cdn.md)
9293
- [Crater agents](./infra/docs/crater-agents.md)
9394
- [Dev Desktops](./infra/docs/dev-desktop.md)

src/infra/docs/bors.md

+4-4
@@ -12,10 +12,10 @@ from the [rust-lang/homu] repository onto our [ECS cluster][ecs].
1212
### Fixing inconsistencies in the queue
1313

1414
Homu is quite buggy, and it might happen that the queue doesn't reflect the
15-
actual state in the repositories. This can be fixed by pressing the
16-
"Synchronize" button in the queue page. Note that the synchronization process
17-
itself is a bit buggy, and it might happen that PRs which were approved but
18-
failed are re-approved again on their own.
15+
actual state in the repositories.
16+
17+
See [Fixing inconsistencies in the bors queue](./bors/queue-resync.md) for
18+
instructions on how to do this properly.
1919

2020
### Adding a new repository to bors
2121

src/infra/docs/bors/queue-resync.md

+145
@@ -0,0 +1,145 @@
1+
# Fixing inconsistencies in the bors queue
2+
3+
bors queue page: <https://bors.rust-lang.org/queue/rust>.
4+
5+
<div class="warning">
6+
**WARNING**: You should only do this if you have bors `r+` permissions on the
7+
rust-lang/rust repo. Please do not synchronize if you do not have `r+` permissions
8+
even if you have write access to the repo, as you will be unable to perform the
9+
required cleanup steps.
10+
11+
This is a **destructive** operation. If someone syncs, they need to
12+
baby-sit the queue for around 45 minutes: around 30 minutes to wait for PRs to
13+
be recollected, and 15 minutes after that to kick out PRs that should not be in
14+
the tree.
15+
16+
**DO NOT CLICK THIS BUTTON IF YOU ARE NOT ABLE TO HANDLE THE CLEANUP.**
17+
</div>
18+
19+
Sometimes you have to do a bors queue sync for various reasons. This is not
20+
trivial and requires you to be very careful; otherwise, we may accidentally
21+
merge PRs to `master` (or even `beta`) that should not have been
22+
merged.
23+
24+
## Steps
25+
26+
### Step 0: Announce your intention
27+
28+
Let T-infra (and other reviewers) know that you plan to close the tree. Open a
29+
new [T-infra Zulip
30+
thread](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra) to let
31+
other contributors know about the bors queue resync.
32+
33+
### Step 1: Close the tree
34+
35+
Find a PR that's currently being tested (or any open PR in the queue really if
36+
the queue is *really* messed up).
37+
38+
Issue `@bors treeclosed=1000` along with some brief explanation for why you are
39+
closing the tree so other reviewers (especially people doing rollups) have some
40+
context.
41+
42+
Example:
43+
44+
```text
45+
Closing the tree due to a resync.
46+
cc <https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/try.20jobs.20not.20kicking.20off>.
47+
48+
@bors treeclosed=1000
49+
```
50+
51+
### Step 2: Click the "synchronize" button
52+
53+
As a courtesy, you can record which PRs had try jobs starting on them. After a
54+
sync, the distinction between regular jobs and try jobs will be lost, so you'll
55+
have to kick out all "pending" PRs.
56+
57+
Click the "synchronize" button in the [bors queue page][bors-queue]. Then,
58+
immediately start performing the next step.
59+
60+
### Step 3: Kick out all actively tested PRs
61+
62+
Find *all* actively tested PRs. This includes both "auto" builds (full CI),
63+
which show up as "pending", and "try" builds, which show up as "pending (try)"
64+
(although bors sometimes forgets the distinction, so try jobs can show up as
65+
"pending" too).
66+
67+
On each "pending" or "pending (try)" PR, write:
68+
69+
```text
70+
@bors retry r- (sync)
71+
```
72+
73+
to suspend the current job and take it out of the queue (bors can confuse try
74+
jobs with full CI jobs).
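The selection rule in this step can be sketched in a few lines of Python. This is only an illustration: the `queue` mapping and the PR numbers are hypothetical, standing in for statuses copied by hand from the queue page, and nothing here talks to bors or GitHub.

```python
from typing import Dict, List

# Command quoted from Step 3 of this document.
KICK_OUT_COMMAND = "@bors retry r- (sync)"

# Status labels as described in the text above.
ACTIVE_STATUSES = {"pending", "pending (try)"}


def prs_to_kick_out(queue: Dict[int, str]) -> List[int]:
    """Return the PR numbers whose status marks them as actively tested."""
    return sorted(
        pr for pr, status in queue.items()
        if status.strip().lower() in ACTIVE_STATUSES
    )


def kick_out_comments(queue: Dict[int, str]) -> Dict[int, str]:
    """Map each actively tested PR to the comment to post on it by hand."""
    return {pr: KICK_OUT_COMMAND for pr in prs_to_kick_out(queue)}


# Hypothetical snapshot of the queue page (PR number -> status label).
example_queue = {101: "pending", 102: "approved", 103: "pending (try)", 104: ""}
comments = kick_out_comments(example_queue)
```

Both "pending" and "pending (try)" get the same command on purpose, since the text notes that bors may have lost the distinction between them.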
75+
76+
### Step 4: Wait
77+
78+
Wait for around **30-45 minutes** to allow bors to recollect all the PRs. **Do
79+
not** reopen the tree beforehand, as it will cause bors to have an inconsistent
80+
view of the PRs, which will lead to unspecified behavior.
81+
82+
### Step 5: Kick out ineligible PRs
83+
84+
Check "approved" PRs in the queue. Some of them will actually not be eligible
85+
for merge, due to reasons such as:
86+
87+
- Merge conflicts
88+
- Significant changes since last review
89+
- Never having been approved
90+
91+
However, bors sometimes forgets these distinctions. You'll need to manually visit each
92+
of the "approved" PRs and check their eligibility.
93+
94+
For "approved" PRs that are not actually eligible, you should kick them out of
95+
the queue via `@bors r-`. Prefer to be cautious if you are not sure, and
96+
unapprove the PR in case of ambiguity.
97+
98+
```text
99+
@bors r- (sync)
100+
```
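The eligibility check in this step amounts to a small decision rule. The sketch below writes it out in Python under stated assumptions: the `ApprovedPr` record and its field names are hypothetical and are filled in by a human reviewer; bors does not expose such a structure.

```python
from dataclasses import dataclass


@dataclass
class ApprovedPr:
    """Facts a reviewer gathers by hand for one "approved" queue entry.

    All field names are hypothetical and exist only for this illustration.
    """
    number: int
    has_merge_conflicts: bool
    significant_changes_since_review: bool
    was_ever_approved: bool
    unsure: bool = False  # the reviewer could not decide


def should_unapprove(pr: ApprovedPr) -> bool:
    """Return True if the PR should receive `@bors r- (sync)`.

    Any of the three reasons listed in this step disqualifies the PR, and
    so does doubt, following the "prefer to be cautious" advice.
    """
    return (
        pr.has_merge_conflicts
        or pr.significant_changes_since_review
        or not pr.was_ever_approved
        or pr.unsure
    )


# Hypothetical examples.
clean = ApprovedPr(201, False, False, True)
conflicted = ApprovedPr(202, True, False, True)
never_approved = ApprovedPr(203, False, False, False)
ambiguous = ApprovedPr(204, False, False, True, unsure=True)
```

Treating `unsure` as a reason to unapprove keeps the rule conservative: a wrongly unapproved PR only needs a fresh approval, while a wrongly approved one may get merged.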
101+
102+
### Step 6: Double-check approved PRs
103+
104+
Do another review pass of "approved" PRs in the queue, to make sure all approved
105+
PRs are actually eligible for merge.
106+
107+
### Step 7: Re-open the tree on the same PR where you closed the tree
108+
109+
Reopen the tree on the same PR where you issued the `treeclosed` command:
110+
111+
```text
112+
@bors treeclosed-
113+
```
114+
115+
Closely monitor bors' behavior for around 5 minutes, to ensure that bors is
116+
correctly testing a PR that's eligible for merge. Update the relevant T-infra
117+
Zulip thread as appropriate.
118+
119+
### Step 8: Re-queue try jobs
120+
121+
If you had to kick out try jobs in Step 3, you can requeue them on the PRs
122+
that you recorded in Step 2 as having try jobs started on them.
123+
124+
Use the normal try-job command:
125+
126+
```text
127+
@bors try
128+
```
129+
130+
and not `@bors retry`.
131+
132+
### Step 9: Edit your `treeclosed` commands to prevent bors from picking them up
133+
134+
Edit the `@bors treeclosed=xxxx` command and `@bors treeclosed-` command like
135+
136+
```text
137+
~~@/bors treeclosed=xxxx~~
138+
139+
EDIT(ferris): edited to prevent bors from picking up command in a future sync
140+
```
141+
142+
AFTER the tree has been reopened to prevent bors from picking them up in a
143+
future sync.
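The edit shown above is mechanical, so it can be illustrated with a short Python helper that rewrites a comment body into that form. This is a sketch, not part of bors: the function name is hypothetical, and it only handles commands that start a line, as in the examples in this document. The result still has to be pasted into the comment by hand.

```python
import re

# Matches a line that starts with a bors command, e.g. "@bors treeclosed=1000".
_BORS_LINE = re.compile(r"^@bors\b(.*)$", re.MULTILINE)

EDIT_NOTE = "edited to prevent bors from picking up command in a future sync"


def neutralize_bors_commands(comment: str, editor: str) -> str:
    """Strike through each line starting with "@bors" and append an edit note.

    The comment is returned unchanged if it contains no such line.
    """
    struck, count = _BORS_LINE.subn(r"~~@/bors\1~~", comment)
    if count == 0:
        return comment
    return f"{struck.rstrip()}\n\nEDIT({editor}): {EDIT_NOTE}"


original = "Closing the tree due to a resync.\n\n@bors treeclosed=1000\n"
edited = neutralize_bors_commands(original, "ferris")
```

Running the helper on an already edited comment leaves it unchanged, because the struck-through line starts with `~~` rather than `@bors`.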
144+
145+
[bors-queue]: https://bors.rust-lang.org/queue/rust
