[FEATURE] Add AGENTS.md to guide AI coding agents contributing to Kyuubi #7445

@wangzhigang1999

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the feature

Add an AGENTS.md at the repository root that documents project conventions, build/test commands, module boundaries, and AI-assisted contribution policy in a format optimized for AI coding agents (Claude Code, Codex, Cursor, Copilot, Gemini, etc.).

AGENTS.md is an emerging cross-tool convention (see https://agents.md) that complements CONTRIBUTING.md — it's terser, command-oriented, and explicitly addresses agent failure modes (silent compliance, scope creep, fabricated APIs, undisclosed AI authorship).

Motivation

AI-assisted contributions to Kyuubi are already happening, as recent merged PRs show. Without an in-repo guide, agents miss project-specific conventions: the [KYUUBI #xxxx][COMPONENT] PR title format, our pre-commit/style checks, the engine isolation between externals/kyuubi-*-sql-engine modules, and the JVM-version matrices for the Spark/Flink/Hive/Trino profiles.

Reviewers end up flagging the same things repeatedly.

The convention has reached critical mass in the Apache ecosystem. As of May 2026, 60+ Apache repos have committed an AGENTS.md to root, including direct neighbors of Kyuubi:

  • Apache Spark, Apache Flink (both engines that Kyuubi acts as a gateway for)
  • Apache Iceberg, Apache Paimon, Apache Polaris, Apache Gravitino (table format / catalog peers)
  • Apache Doris, Apache Pinot, Apache Druid (OLAP peers)
  • Apache Airflow, Apache Superset, Apache Zeppelin, Apache ShardingSphere

Active proposals exist in Cassandra (CASSANDRA-21301), Arrow R, and Fluss.

Describe the solution

I propose adding an AGENTS.md (~150–200 lines, ~8 KB) at the repo root, modeled on the Spark/Iceberg/Flink style. Tentative outline:

  1. Pre-flight Checks: remote setup, fetching upstream, branch selection (Spark-style)
  2. Build and Test: Maven single-module / single-class / single-method commands, the dev format/lint scripts, the -Pspark-3.5 / -Pflink-1.18 profile pattern
  3. Repository Structure & Module Boundaries: kyuubi-server, kyuubi-common, externals/, extensions/; engine modules must not depend on each other; server must not depend on engine modules
  4. High-Sensitivity Areas: Thrift IDL under kyuubi-common/src/main/thrift/, OperationManager/SessionManager, Kubernetes operations
  5. Coding Standards: Checkstyle/Scalastyle expectations; AssertJ for tests
  6. PR & Commit Conventions: [KYUUBI #xxxx][COMPONENT] title, link the issue, fill in the PR template, the existing AI-disclosure checkbox
  7. AI-assisted contributions: use a Generated-by: trailer rather than a Co-Authored-By: trailer; no AI self-references in code, comments, or messages (Flink/Polaris pattern)
  8. Investigating CI failures: use gh api check-run annotations rather than downloading full logs (Spark pattern)
  9. Boundaries: explicit Never / Ask first lists (never: direct push to apache/kyuubi, mixed PRs, secrets, breaking Thrift changes; ask first: new dependencies, public API changes, cross-engine refactors)
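
To make the module-boundary rules in item 3 concrete, here is a minimal sketch of how they could be checked mechanically. It is not an existing Kyuubi tool: the `boundary_violations` helper, the hand-written dependency map, and the specific engine module names are illustrative assumptions, and a real check would have to read the dependencies from the Maven build instead.

```python
# Sketch only: the dependency map below is hypothetical and hand-written,
# not extracted from Kyuubi's actual pom.xml files.
from fnmatch import fnmatchcase

ENGINE_PATTERN = "externals/kyuubi-*-sql-engine"  # pattern quoted in this issue
SERVER_MODULE = "kyuubi-server"


def is_engine(module: str) -> bool:
    """Return True if the module path matches the engine naming pattern."""
    return fnmatchcase(module, ENGINE_PATTERN)


def boundary_violations(deps: dict[str, set[str]]) -> list[str]:
    """List dependency edges that break the two rules from item 3.

    `deps` maps a module path to the set of module paths it depends on.
    """
    violations = []
    for module in sorted(deps):
        for dep in sorted(deps[module]):
            if not is_engine(dep):
                continue  # only edges pointing at an engine module are restricted
            if module == SERVER_MODULE:
                violations.append(
                    f"{module} -> {dep}: server must not depend on engine modules"
                )
            elif is_engine(module) and module != dep:
                violations.append(
                    f"{module} -> {dep}: engine modules must not depend on each other"
                )
    return violations


# Hypothetical input: the Flink engine wrongly depends on the Spark engine.
example_deps = {
    "kyuubi-server": {"kyuubi-common"},
    "externals/kyuubi-spark-sql-engine": {"kyuubi-common"},
    "externals/kyuubi-flink-sql-engine": {
        "kyuubi-common",
        "externals/kyuubi-spark-sql-engine",
    },
}

for line in boundary_violations(example_deps):
    print(line)
```

With this sample map, only the Flink-to-Spark edge is reported; dependencies on kyuubi-common pass, since neither rule restricts them.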

Additional context

Reference implementations from sibling Apache projects:

Happy to draft the file in a follow-up PR once the community aligns on scope.

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • No. I cannot submit a PR at this time.
