4 changes: 3 additions & 1 deletion api-reference/workflow/jobs.mdx
@@ -12,7 +12,9 @@ To use the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to
the `GET` method to call the `/jobs/<job-id>/details` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#get-processing-details-for-a-job).
- To get the list of any failed files for a job and why those files failed, use the `UnstructuredClient` object's `jobs.get_failed_files` function (for the Python SDK) or
the `GET` method to call the `/jobs/<job-id>/failed-files` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#get-failed-file-details-for-a-job).
- To run an _on-demand job_ (a job whose temporary workflow takes only local files as input and exists only for the duration of
that job's run), use the `POST` method to call the `/jobs/` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#run-an-on-demand-job).
- A job is created automatically whenever a workflow runs on a schedule; see [Create a workflow](/api-reference/workflow/workflows#create-a-workflow).
A job is also created whenever you run a workflow manually; see [Run a workflow](/api-reference/workflow/overview#run-a-workflow).
A job is also created automatically whenever you run a workflow manually; see [Run a workflow](/api-reference/workflow/overview#run-a-workflow).
- To cancel a running job, use the `UnstructuredClient` object's `jobs.cancel_job` function (for the Python SDK) or
the `POST` method to call the `/jobs/<job-id>/cancel` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#cancel-a-job).
111 changes: 97 additions & 14 deletions api-reference/workflow/overview.mdx
@@ -201,8 +201,17 @@ Unstructured offers a [Postman collection](https://learning.postman.com/docs/col

4. In the **Paste cURL, Raw text or URL** box, enter the following URL, and then press `Enter`:

```
For all workflow operations:

```text
https://raw.githubusercontent.com/Unstructured-IO/docs/main/examplecode/codesamples/api/Unstructured-REST-API-Workflow-Endpoint.postman_collection.json
```

For [on-demand job](#run-an-on-demand-job) related operations only:

```text
https://raw.githubusercontent.com/Unstructured-IO/docs/main/examplecode/codesamples/api/Unstructured-REST-API-On-Demand-Jobs.postman_collection.json
```

5. On the sidebar, click **Collections**.
6. Expand **Unstructured REST API - Workflow Endpoint**.
@@ -1689,6 +1698,12 @@ the `GET` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or
To create a workflow, use the `UnstructuredClient` object's `workflows.create_workflow` function (for the Python SDK) or
the `POST` method to call the `/workflows` endpoint (for `curl` or Postman).

<Note>
The following instructions create a workflow that exists until it is explicitly deleted.
To create a temporary workflow that exists only for the duration of a single job run
and takes only local files as input, see [Run an on-demand job](#run-an-on-demand-job).
</Note>

In the `CreateWorkflow` object (for the Python SDK) or
the request body (for `curl` or Postman),
specify the settings for the workflow. For the specific settings to include, see
@@ -1835,6 +1850,12 @@ To run a workflow manually, use the `UnstructuredClient` object's `workflows.run
the `POST` method to call the `/workflows/<workflow-id>/run` endpoint (for `curl` or Postman), replacing
`<workflow-id>` with the workflow's unique ID. To get this ID, see [List workflows](#list-workflows).

<Note>
The following instructions run a workflow that was already created at some point in the past and still exists.
To run a temporary workflow that exists only for the duration of a single job run
and takes only local files as input (known as an _on-demand job_), see [Run an on-demand job](#run-an-on-demand-job).
</Note>

<AccordionGroup>
<Accordion title="Python SDK (remote source and remote destination)">
```python
@@ -2024,17 +2045,7 @@ the `POST` method to call the `/workflows/<workflow-id>/run` endpoint (for `curl

To upload multiple files, add additional `input_files` entries after this one, one entry per additional file to upload.

- **Key**: `filename`, **Text**, **Value**: Type the name of the file that you just uploaded.

To upload multiple files, add additional `filename` entries after this one, one entry per additional file to upload. Make sure the order of these
`filename` entries matches the order of the `input_files` entries, respectively.

- **Key**: `type`, **Text**, **Value**: `<local-file-media-type>`

To upload multiple files, add additional `type` entries after this one, one entry per additional file to upload. Make sure the order of these
`type` entries matches the order of the `input_files` entries, respectively.

For a list of available media types, such as `application/pdf`, see [Media Types](https://www.iana.org/assignments/media-types/media-types.xhtml).
For a list of available media types, such as `application/pdf`, see [Media Types](https://www.iana.org/assignments/media-types/media-types.xhtml).

5. Click **Send**.

@@ -2270,15 +2281,87 @@ the `DELETE` method to call the `/workflows/<workflow-id>` endpoint (for `curl`

## Jobs

You can [list](#list-jobs),
You can [run on demand](#run-an-on-demand-job), [list](#list-jobs),
[get](#get-a-job),
and [cancel](#cancel-a-job) jobs.

A job is created automatically whenever a workflow runs on a schedule; see [Create a workflow](#create-a-workflow).
A job is also created whenever you run a workflow; see [Run a workflow](#run-a-workflow).
A job is also created automatically whenever you run a workflow; see [Run a workflow](#run-a-workflow).

For general information, see [Jobs](/ui/jobs).

### Run an on-demand job

To run an _on-demand job_, use the `POST` method to call the `/jobs/` endpoint (for `curl` or Postman).
An on-demand job's temporary workflow takes one or more local files as its only input and exists only for the duration of that job's run.

<Note>
To run a workflow that was already created at some point in the past and still exists, see [Run a workflow](#run-a-workflow).
</Note>

<AccordionGroup>
<Accordion title="curl">
In the following command, replace:

- `</full/path/to/local/filename.extension>` with the full path to the local file to upload.
- `<filename.extension>` with the filename of the local file to upload.
- `<local-file-media-type>` with the local file's media type. For a list of available media types, such as `application/pdf`, see [Media Types](https://www.iana.org/assignments/media-types/media-types.xhtml).
- To upload multiple files, add additional `--form` entries, one per file.
- For additional replacements, see the end of this section.

```bash
curl --request 'POST' --location \
"$UNSTRUCTURED_API_URL/jobs/" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--form "request_data={\"job_type\":\"<job-type>\",\"template_id\":\"<workflow-template-id>\",\"job_nodes\":[<job-node-settings>]}" \
--form "input_files=@</full/path/to/local/filename.extension>;filename=<filename.extension>;type=<local-file-media-type>" \
--form "input_files=@</full/path/to/local/filename.extension>;filename=<filename.extension>;type=<local-file-media-type>" # For each additional file to be uploaded.

```

To access the processed files' data, [download a processed local file](#download-a-processed-local-file-from-a-job) from the job's run.
</Accordion>
<Accordion title="Postman">
1. In the method drop-down list, select **POST**.
2. In the address box, enter the following URL:

```text
{{UNSTRUCTURED_API_URL}}/jobs/
```

3. On the **Headers** tab, enter the following headers:

- **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
- **Key**: `accept`, **Value**: `application/json`

4. On the **Body** tab, select **form-data**, and specify the settings for the on-demand job, as follows:

- **Key**: `input_files`, **File**, **Value**: Click the **Value** box, then click **New file from local machine**, and select the file to upload.

To upload multiple files, add additional `input_files` entries after this one, one entry per additional file to upload.

- **Key**: `request_data`, **Text**, **Value**: Specify the settings for the on-demand job, as follows:

```json
{"job_type":"<job-type>","template_id":"<workflow-template-id>","job_nodes":[<job-node-settings>]}
```
5. Click **Send**.

To access the processed files' data, [download a processed local file](#download-a-processed-local-file-from-a-job) from the job's run.
</Accordion>
</AccordionGroup>

Replace the preceding placeholders as follows:

- `<job-type>` - The type of job to run. Available values include:

- `template` - This job's workflow nodes are specified by a workflow template.
- `ephemeral` - This job's workflow nodes are manually specified.

- `<workflow-template-id>` - If `<job-type>` is set to `template`, the unique ID of the workflow template to use for this job's workflow nodes. For instructions, see [List workflow templates](#list-workflow-templates) and [Get a workflow template](#get-a-workflow-template).
- `<job-node-settings>` - If `<job-type>` is set to `ephemeral`, the settings for the job's workflow nodes. For instructions, see [Custom workflow DAG nodes](/api-reference/workflow/workflows#custom-workflow-dag-nodes).
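
As a rough illustration of the placeholders above, the following Python sketch assembles the JSON string for the `request_data` form field. The helper name `build_request_data` is hypothetical and is not part of the Unstructured Python SDK. The sketch also assumes that the key that does not apply to the chosen job type can be omitted; the `curl` example above shows all three keys together, so check the exact shape against the API before relying on it.

```python
import json


def build_request_data(job_type, template_id=None, job_nodes=None):
    """Build the JSON string for the `request_data` form field.

    Hypothetical helper for illustration only; not part of any SDK.
    """
    payload = {"job_type": job_type}
    if job_type == "template":
        # A template job takes its workflow nodes from a workflow template.
        if not template_id:
            raise ValueError("job_type 'template' requires a template_id")
        payload["template_id"] = template_id
    elif job_type == "ephemeral":
        # An ephemeral job specifies its workflow nodes directly.
        if not job_nodes:
            raise ValueError("job_type 'ephemeral' requires job_nodes")
        payload["job_nodes"] = list(job_nodes)
    # Assumption: keys that do not apply to the job type can be omitted.
    return json.dumps(payload)


# The ID below is a placeholder, not a real workflow template ID.
print(build_request_data("template", template_id="<workflow-template-id>"))
```

The returned string can be used as the value of the `request_data` form field, alongside one `input_files` entry per local file.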

### List jobs

To list jobs, use the `UnstructuredClient` object's `jobs.list_jobs` function (for the Python SDK) or
91 changes: 83 additions & 8 deletions api-reference/workflow/workflows.mdx
@@ -16,6 +16,10 @@ To use the [Unstructured Workflow Endpoint](/api-reference/workflow/overview) to
the `PUT` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or Postman). [Learn more](#update-a-workflow).
- To delete a workflow, use the `UnstructuredClient` object's `workflows.delete_workflow` function (for the Python SDK) or
the `DELETE` method to call the `/workflows/<workflow-id>` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#delete-a-workflow).
- To get a list of available workflow templates, use the `GET` method to call the `/templates` endpoint (for `curl` or Postman).
[Learn more](/api-reference/workflow/overview#list-workflow-templates).
- To get information about a workflow template, use the `GET` method to call the `/templates/<template-id>` endpoint (for `curl` or Postman).
[Learn more](/api-reference/workflow/overview#get-a-workflow-template).

The following examples assume that you have already met the [requirements](/api-reference/workflow/overview#requirements) and
understand the [basics](/api-reference/workflow/overview#basics) of working with the Unstructured Workflow Endpoint.
@@ -72,7 +76,8 @@ specify the settings for the workflow, as follows:
name="<name>",
source_id="<source-connector-id>",
destination_id="<destination-connector-id>",
workflow_type=WorkflowType.<TYPE>,
workflow_type=WorkflowType.<workflow-type>,
template_id="<workflow-template-id>",
workflow_nodes=[
workflow_node,
another_workflow_node
@@ -317,7 +322,8 @@ specify the settings for the workflow, as follows:
name="<name>",
source_id="<source-connector-id>",
destination_id="<destination-connector-id>",
workflow_type=WorkflowType.<TYPE>,
workflow_type=WorkflowType.<workflow-type>,
template_id="<workflow-template-id>",
workflow_nodes=[
workflow_node,
another_workflow_node
@@ -540,7 +546,8 @@ specify the settings for the workflow, as follows:
"name": "<name>",
"source_id": "<source-connector-id>",
"destination_id": "<destination-connector-id>",
"workflow_type": "<type>",
"workflow_type": "<workflow-type>",
"template_id": "<workflow-template-id>",
"workflow_nodes": [
{
"name": "<node-name>",
@@ -579,6 +586,7 @@ specify the settings for the workflow, as follows:
'{
"name": "<name>",
"workflow_type": "custom",
"template_id": "<workflow-template-id>",
"workflow_nodes": [
{
"name": "<node-name>",
@@ -617,6 +625,7 @@ specify the settings for the workflow, as follows:
"name": "<name>",
"destination_id": "<destination-connector-id>",
"workflow_type": "custom",
"template_id": "<workflow-template-id>",
"workflow_nodes": [
{
"name": "<node-name>",
@@ -653,7 +662,8 @@ specify the settings for the workflow, as follows:
"name": "<name>",
"source_id": "<source-connector-id>",
"destination_id": "<destination-connector-id>",
"workflow_type": "<type>",
"workflow_type": "<workflow-type>",
"template_id": "<workflow-template-id>",
"workflow_nodes": [
{
"name": "<node-name>",
@@ -703,6 +713,7 @@ specify the settings for the workflow, as follows:
{
"name": "<name>",
"workflow_type": "custom",
"template_id": "<workflow-template-id>",
"workflow_nodes": [
{
"name": "<node-name>",
@@ -751,6 +762,7 @@ specify the settings for the workflow, as follows:
{
"name": "<name>",
"workflow_type": "custom",
"template_id": "<workflow-template-id>",
"workflow_nodes": [
{
"name": "<node-name>",
@@ -781,10 +793,11 @@ Replace the preceding placeholders as follows:
- `<destination-connector-id>` (_required_) - The ID of the target destination connector. To get the ID,
use the `UnstructuredClient` object's `destinations.list_destinations` function (for the Python SDK) or
the `GET` method to call the `/destinations` endpoint (for `curl` or Postman). [Learn more](/api-reference/workflow/overview#list-destination-connectors).
- `<TYPE>` (for the Python SDK) or `<type>` (for `curl` or Postman) (_required_) - The workflow type. Available values include `CUSTOM` (for the Python SDK) and `custom` (for `curl` or Postman).

If `<TYPE>` is set to `CUSTOM` (for the Python SDK), or if `<type>` is set to `custom` (for `curl` or Postman), you must add a `workflow_nodes` array. For instructions, see [Custom workflow DAG nodes](#custom-workflow-dag-nodes).
- `<workflow-type>` (_required_) - The workflow type. Available values include:

- `CUSTOM` (for the Python SDK) and `custom` (for `curl` or Postman) - This workflow's nodes are manually specified in the `workflow_nodes` array. For instructions, see [Custom workflow DAG nodes](#custom-workflow-dag-nodes).
- `TEMPLATE` (for the Python SDK) and `template` (for `curl` or Postman) - This workflow's nodes are specified by a workflow template. In this case, do not add a `workflow_nodes` array. Instead, specify the workflow template's unique ID in the `template_id` field. For instructions, see [List workflow templates](#list-workflow-templates) and [Get a workflow template](#get-a-workflow-template).

<Note>
The previously-available workflow optimization types `ADVANCED`, `BASIC`, and `PLATINUM` (for the Python SDK) and
`advanced`, `basic`, and `platinum` (for `curl` or Postman) are non-operational and planned to be fully removed in a future release.
@@ -1942,4 +1955,66 @@ Allowed values for `subtype` and `model_name` include:
- `"model_name": "voyage-finance-2"`
- `"model_name": "voyage-law-2"`
- `"model_name": "voyage-code-2"`
- `"model_name": "voyage-multimodal-3"`
- `"model_name": "voyage-multimodal-3"`

## List workflow templates

To list workflow templates, use the `GET` method to call the `/templates` endpoint for `curl` or Postman.

<AccordionGroup>
<Accordion title="curl">
```bash
curl --request 'GET' --location \
"$UNSTRUCTURED_API_URL/templates" \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'accept: application/json'
```
</Accordion>
<Accordion title="Postman">
1. In the method drop-down list, select **GET**.
2. In the address box, enter the following URL:

```text
{{UNSTRUCTURED_API_URL}}/templates
```

3. On the **Headers** tab, enter the following headers:

- **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
- **Key**: `accept`, **Value**: `application/json`

4. Click **Send**.
</Accordion>
</AccordionGroup>

## Get a workflow template

To get information about a workflow template, use the `GET` method to call the
`/templates/<template-id>` endpoint for `curl` or Postman, replacing `<template-id>` with the
workflow template's unique ID. To get this ID, see [List workflow templates](#list-workflow-templates).

<AccordionGroup>
<Accordion title="curl">
```bash
curl --request 'GET' --location \
"$UNSTRUCTURED_API_URL/templates/<template-id>" \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'accept: application/json'
```
</Accordion>
<Accordion title="Postman">
1. In the method drop-down list, select **GET**.
2. In the address box, enter the following URL:

```text
{{UNSTRUCTURED_API_URL}}/templates/<template-id>
```

3. On the **Headers** tab, enter the following headers:

- **Key**: `unstructured-api-key`, **Value**: `{{UNSTRUCTURED_API_KEY}}`
- **Key**: `accept`, **Value**: `application/json`

4. Click **Send**.
</Accordion>
</AccordionGroup>
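
For callers who prefer Python over `curl` or Postman, the two template requests above might be prepared with the standard library as in the following sketch. The helper name `build_templates_request` and the example URL are hypothetical, and the sketch only builds the request without sending it; it assumes the same headers as the `curl` examples and nothing about the shape of the response.

```python
import urllib.parse
import urllib.request


def build_templates_request(api_url, api_key, template_id=None):
    """Prepare (but do not send) a GET request for workflow templates.

    Hypothetical helper for illustration only; not part of any SDK.
    """
    url = api_url.rstrip("/") + "/templates"
    if template_id is not None:
        # Percent-encode the ID so that it is safe to use as a path segment.
        url += "/" + urllib.parse.quote(template_id, safe="")
    headers = {
        "unstructured-api-key": api_key,
        "accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; substitute your own API URL and key.
request = build_templates_request("https://example.invalid/api/v1", "<api-key>")
print(request.get_method(), request.full_url)
```

Omitting `template_id` targets the list endpoint, and passing it targets a single template. To send the prepared request, pass it to `urllib.request.urlopen`.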