NUTCH-2931 Create OpenAPI specification for Nutch 1.x REST API #896

Open
lewismc wants to merge 6 commits into apache:master from lewismc:NUTCH-2932

@lewismc (Member) commented Feb 14, 2026

This is a WIP I had on an old laptop and never pushed to my public Nutch mirror. My notes say that the OpenAPI spec didn't comply with IBM/openapi-validator, which is what I use(d) to lint and validate the OpenAPI specifications I work(ed) on over the years. I didn't have enough time, or was sidetracked by something (or lots of other somethings), and ultimately forgot about this branch altogether. I was surprised to find it.
I remember using the existing Java annotations and the Wiki documentation as resources for the OpenAPI. A few ex-colleagues also assisted in developing this branch.
It needs to be peer reviewed for completeness and updated to OpenAPI 3.1.x; however, it is a good foundation.
You can view the OpenAPI documentation if you copy and paste the content into https://editor.swagger.io/.
The goal would be to use the OpenAPI Generator project to generate a server implementation we could package either with Nutch or separately altogether.
I'm open to suggestions.

@lewismc lewismc self-assigned this Feb 14, 2026
@lewismc lewismc marked this pull request as draft February 14, 2026 01:29
@lewismc (Member, Author) commented Feb 14, 2026

I pushed a commit to add a new step to lint and validate openapi.yaml. I knew this would fail but it demonstrates where I got to last time around! Addressing the issues is non-trivial work.

  Total number of errors   : 89
  Total number of warnings : 315

In an API-first development model, these items would be addressed before we do anything further.
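
For readers who want to reproduce the lint run, a CI step along the following lines would do it. This is only a sketch, not the workflow from this PR: the workflow name, job name, Node version, and the assumption that openapi.yaml sits at the repository root are all illustrative. It relies on the `ibm-openapi-validator` npm package, which installs the `lint-openapi` command.

```yaml
# Hypothetical standalone workflow; the PR itself adds a step to the existing CI build.
name: openapi-lint
on:
  pull_request:
    branches: [master]

jobs:
  lint-openapi:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'   # assumed; any Node version supported by the validator
      - name: Install IBM OpenAPI Validator
        run: npm install -g ibm-openapi-validator
      - name: Lint and validate openapi.yaml
        # Assumes the spec is at the repository root; adjust the path if it lives elsewhere.
        run: lint-openapi openapi.yaml
```

The validator exits non-zero when it reports errors, so a step like this fails the build until the errors are resolved.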

@lewismc (Member, Author) commented Mar 9, 2026

I put some more work into this PR and feel it is now ready for review. Here's a summary of the changes:

  • Add the OpenAPI 3.1.2 specification (openapi.yaml) describing all 24 REST endpoints of the Nutch service, with HTTP/HTTPS server URL variables, HTTP Basic Authentication, and fully typed request/response schemas
  • Integrate the IBM OpenAPI Validator into the master CI build (.github/workflows/master-build.yml) with a dorny/paths-filter gate so it runs only when openapi.yaml or .spectral.yaml changes
  • Add a .spectral.yaml ruleset override to disable IBM-specific convention rules that conflict with the existing Nutch wire format (camelCase properties, UPPER_CASE enums, bare-array responses). This keeps code changes to the deprecated Nutch service to a minimum; I saw reworking it as unproductive effort if we are just going to remove it in a future version. Should we wish to create a new service based on the OpenAPI specification, we could likely reuse some of this code, but that is an entirely separate effort
  • Deprecate all Java classes in org.apache.nutch.service and its sub-packages with @Deprecated annotations and @deprecated Javadoc tags indicating the service will be removed in a future version of Nutch
  • Add a deprecation warning to the startserver command in src/bin/nutch and mark it as deprecated in the usage text
  • Mark Docker BUILD_MODE=1 (server) and BUILD_MODE=2 (server + webapp) as deprecated in docker/Dockerfile with build-time warnings
  • Annotate the 6 service-only dependencies in ivy/ivy.xml (5 Apache CXF artifacts and jackson-jaxrs-json-provider) with XML comments noting they will be removed with the service
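
As an illustration of the ruleset override described above, a .spectral.yaml along these lines extends the IBM ruleset and switches off individual rules. This is a sketch rather than the file from this PR: the three rule names are my best mapping of the conventions mentioned (property casing, enum casing, array responses) onto the IBM ruleset, and should be checked against the installed version of `@ibm-cloud/openapi-ruleset`.

```yaml
# Hypothetical .spectral.yaml; the rule selection is an assumption, not the PR's actual file.
extends: '@ibm-cloud/openapi-ruleset'
rules:
  # Nutch uses camelCase property names; the IBM convention expects snake_case.
  ibm-property-casing-convention: 'off'
  # Nutch enum values are UPPER_CASE; the IBM convention expects snake_case.
  ibm-enum-casing-convention: 'off'
  # Some Nutch endpoints return a bare JSON array as the top-level response body.
  ibm-no-array-responses: 'off'
```

Disabling rules in the ruleset, rather than renaming properties in the service, leaves the existing wire format unchanged for current clients.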

@lewismc lewismc marked this pull request as ready for review March 9, 2026 19:54
@sonarqubecloud (bot) commented Mar 9, 2026

Quality Gate failed

Failed conditions
56.3% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
