Commit ef93c5a

docs: Document latest crawl (#2613)
Follows #2603

## Changes

- Updates documentation on "Latest Crawl" tab
- Fixes extra fetch in workflow detail page
- Reverts workflow detail labels from "Duration" back to "Run Duration" and "Pages" back to "Pages Crawled"
1 parent c134b57 commit ef93c5a

4 files changed, 16 additions and 11 deletions


frontend/docs/docs/user-guide/crawl-workflows.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ After deciding what type of crawl you'd like to run, you can begin to set up you

 Run a crawl workflow by clicking _Run Crawl_ in the actions menu of the workflow in the crawl workflow list, or by clicking the _Run Crawl_ button on the workflow's details page.

-While crawling, the **Watch Crawl** section displays a list of queued URLs that will be visited, and streams the current state of the browser windows as they visit pages from the queue. You can [modify the crawl live](./running-crawl.md) by adding URL exclusions or changing the number of crawling instances.
+While crawling, the **Latest Crawl** section streams the current state of the browser windows as they visit pages. You can [modify the crawl live](./running-crawl.md) by adding URL exclusions or changing the number of crawling instances.

 Re-running a crawl workflow can be useful to capture a website as it changes over time, or to run with an updated [crawl scope](workflow-setup.md#crawl-scope-options).

frontend/docs/docs/user-guide/overview.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ The crawling panel lists the number of currently running and waiting crawls, as
 For organizations with a set execution minute limit, the crawling panel displays a graph of how much execution time has been used and how much is currently remaining. Monthly execution time limits reset on the first of each month at 12:00 AM GMT.

 ??? Question "How is execution time calculated?"
-    Execution time is the total runtime of scaled by the [_Browser Windows_](workflow-setup.md/#browser-windows) setting increment value during a crawl. Like elapsed time, this is tracked as the crawl runs so changing the amount of _Browser Windows_ while a crawl is running may change the amount of execution time used in a given time period.
+    Execution time is the total runtime of a crawl scaled by the [_Browser Windows_](workflow-setup.md/#browser-windows) value during a crawl. Like elapsed time, this is tracked while the crawl runs. Changing the amount of _Browser Windows_ while a crawl is running may change the amount of execution time used in a given time period.

 ## Collections
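The execution-time definition in the hunk above can be illustrated with a short sketch. It assumes execution time is elapsed time multiplied directly by the _Browser Windows_ value in effect during each stretch of the crawl. The docs only say the runtime is "scaled by" that value, so the exact factor is an assumption, and `CrawlSegment` and `executionSeconds` are hypothetical names that do not appear in the codebase.

```typescript
// Hypothetical shape: one entry per stretch of a crawl during which the
// Browser Windows value stayed constant.
interface CrawlSegment {
  elapsedSeconds: number;
  browserWindows: number;
}

// Sum each segment's elapsed time scaled by the Browser Windows value in
// effect at that point (assumed here to be a plain multiplication).
function executionSeconds(segments: CrawlSegment[]): number {
  return segments.reduce(
    (total, segment) => total + segment.elapsedSeconds * segment.browserWindows,
    0,
  );
}

// 10 minutes with 2 windows, then 5 minutes after scaling up to 4 windows.
const segments: CrawlSegment[] = [
  { elapsedSeconds: 600, browserWindows: 2 },
  { elapsedSeconds: 300, browserWindows: 4 },
];

console.log(executionSeconds(segments) / 60); // 40 execution minutes for 15 elapsed minutes
```

Under this assumption, raising _Browser Windows_ mid-crawl increases the execution time consumed per elapsed minute from that point on, which matches the note that changing the value while a crawl is running may change the execution time used in a given period.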

frontend/docs/docs/user-guide/running-crawl.md

Lines changed: 7 additions & 3 deletions
@@ -1,6 +1,6 @@
 # Modifying Running Crawls

-Running crawls can be modified from the crawl workflow **Watch Crawl** tab. You may want to modify a runnning crawl if you find that the workflow is crawling pages that you didn't intend to archive, or if you want a boost of speed.
+Running crawls can be modified from the crawl workflow **Latest Crawl** tab. You may want to modify a runnning crawl if you find that the workflow is crawling pages that you didn't intend to archive, or if you want a boost of speed.

 ## Crawl Workflow Status

@@ -15,17 +15,21 @@ A crawl workflow that is in progress can be in one of the following states:
 | <span class="status-waiting">:btrix-status-dot: Finishing Crawl</span> | The workflow has finished crawling and data is being packaged into WACZ files.|
 | <span class="status-waiting">:btrix-status-dot: Uploading WACZ</span> | WACZ files have been created and are being transferred to storage.|

+## Watch Crawl
+
+You can watch the current state of the browser windows as the crawler visit pages in the **Watch** tab of **Latest Crawl**. A list of queued URLs are displayed below in the **Upcoming Pages** section.
+
 ## Live Exclusion Editing

 While [exclusions](workflow-setup.md#exclude-pages) can be set before running a crawl workflow, sometimes while crawling the crawler may find new parts of the site that weren't previously known about and shouldn't be crawled, or get stuck browsing parts of a website that automatically generate URLs known as ["crawler traps"](https://en.wikipedia.org/wiki/Spider_trap).

-If the crawl queue is filled with URLs that should not be crawled, use the _Edit Exclusions_ button on the Watch Crawl page to instruct the crawler what pages should be excluded from the queue.
+If the crawl queue is filled with URLs that should not be crawled, use the _Edit Exclusions_ button in the **Watch** tab to instruct the crawler what pages should be excluded from the queue.

 Exclusions added while crawling are applied to the same exclusion table saved in the workflow's settings and will be used the next time the crawl workflow is run unless they are manually removed.

 ## Changing the Number of Browser Windows

-Like exclusions, the number of [browser windows](workflow-setup.md#browser-windows) can also be adjusted while crawling. On the **Watch Crawl** tab, press the _Edit Browser Windows_ button, and set the desired value.
+Like exclusions, the number of [browser windows](workflow-setup.md#browser-windows) can also be adjusted while crawling. On the **Watch** tab, press the **+/-** button next to the _Running in_ N _browser windows_ text and set the desired value.

 Unlike exclusions, this change will not be applied to future workflow runs.
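The effect of a live exclusion on the crawl queue, as described in the "Live Exclusion Editing" text above, might be sketched as follows. This is only an illustration: it assumes each exclusion behaves like a regular expression tested against every queued URL, and `applyExclusions`, the sample URLs, and the pattern are all hypothetical rather than taken from the crawler.

```typescript
// Hypothetical helper: keep only the queued URLs that match none of the
// exclusion patterns. Patterns are created without the `g` flag so that
// RegExp.test() stays stateless between calls.
function applyExclusions(queue: string[], exclusions: RegExp[]): string[] {
  return queue.filter((url) => !exclusions.some((pattern) => pattern.test(url)));
}

// A calendar that generates a new URL for every month is a typical crawler trap.
const queue = [
  "https://example.com/about",
  "https://example.com/calendar?month=2031-01",
  "https://example.com/calendar?month=2031-02",
];
const exclusions = [/\/calendar\?month=/];

console.log(applyExclusions(queue, exclusions)); // [ 'https://example.com/about' ]
```

Because the docs state that exclusions added while crawling are saved to the workflow's settings, a pattern like this one would also apply on the next run unless it is removed manually.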

frontend/src/pages/org/workflow-detail.ts

Lines changed: 7 additions & 6 deletions
@@ -162,13 +162,14 @@ export class WorkflowDetail extends BtrixElement {
     ) {
       void this.fetchWorkflow();
       void this.fetchSeeds();
+      void this.fetchCrawls();
+    } else if (changedProperties.has("workflowTab")) {
+      void this.fetchDataForTab();
     }
+
     if (changedProperties.has("isEditing") && this.isEditing) {
       this.stopPoll();
     }
-    if (changedProperties.has("workflowTab")) {
-      void this.fetchDataForTab();
-    }
   }

   private async fetchDataForTab() {
@@ -829,7 +830,7 @@ export class WorkflowDetail extends BtrixElement {
                 class="underline hover:no-underline"
                 @click=${this.navigate.link}
               >
-                ${msg("Watch Running Crawl")}
+                ${msg("Watch Crawl")}
               </a>
             </btrix-alert>
           </div>`,
@@ -1071,7 +1072,7 @@ export class WorkflowDetail extends BtrixElement {

     return html`
       <btrix-desc-list horizontal>
-        ${this.renderDetailItem(msg("Duration"), (workflow) =>
+        ${this.renderDetailItem(msg("Run Duration"), (workflow) =>
           this.lastCrawlStartTime
             ? this.localize.humanizeDuration(
                 (workflow.lastCrawlTime && !workflow.isCrawlRunning
@@ -1081,7 +1082,7 @@ export class WorkflowDetail extends BtrixElement {
               )
             : skeleton,
         )}
-        ${this.renderDetailItem(msg("Pages"), pages)}
+        ${this.renderDetailItem(msg("Pages Crawled"), pages)}
         ${this.renderDetailItem(msg("Size"), (workflow) =>
           this.localize.bytes(workflow.lastCrawlSize || 0, {
             unitDisplay: "narrow",
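
The "extra fetch" fix in the first `workflow-detail.ts` hunk comes down to turning two independent `if` blocks into one `if ... else if` chain. The standalone sketch below models that control flow only. The first branch's real condition is not visible in the diff, so `"workflowId"` is a hypothetical stand-in, and `fetchesBefore` and `fetchesAfter` are illustrative names that record which methods would be called rather than calling anything.

```typescript
// Before the commit: the tab check ran independently, so a single update that
// changed both properties triggered the workflow fetches and the tab fetch.
function fetchesBefore(changed: Set<string>): string[] {
  const calls: string[] = [];
  if (changed.has("workflowId")) {
    // "workflowId" is a hypothetical stand-in for the elided condition.
    calls.push("fetchWorkflow", "fetchSeeds");
  }
  if (changed.has("workflowTab")) {
    calls.push("fetchDataForTab");
  }
  return calls;
}

// After the commit: crawls are fetched together with the workflow, and the
// tab fetch only runs when the first branch did not.
function fetchesAfter(changed: Set<string>): string[] {
  const calls: string[] = [];
  if (changed.has("workflowId")) {
    calls.push("fetchWorkflow", "fetchSeeds", "fetchCrawls");
  } else if (changed.has("workflowTab")) {
    calls.push("fetchDataForTab");
  }
  return calls;
}

const bothChanged = new Set(["workflowId", "workflowTab"]);
console.log(fetchesBefore(bothChanged)); // includes "fetchDataForTab"
console.log(fetchesAfter(bothChanged)); // does not include "fetchDataForTab"
```

Switching tabs alone still reaches `fetchDataForTab` in both versions. Only the case where both properties change in the same update behaves differently.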
