[SPARK-56074][INFRA] Improve AGENTS.md with inline build/test commands, PR workflow, and dev notes #54899
Open: cloud-fan wants to merge 11 commits into apache:master from cloud-fan:ai
+75 −7
Commits
a1d5b08 [MINOR] Improve AGENTS.md with inline build/test commands, PR workflo… (cloud-fan)
dd06027 [MINOR] Address review comments: remove [MINOR] from PR title format,… (cloud-fan)
30fa4ba [MINOR] Add instruction to ask user for JIRA ticket ID (cloud-fan)
2e4f1b0 [MINOR] Ask user to create/provide JIRA ticket, require approval befo… (cloud-fan)
1efa612 [MINOR] Simplify project overview in AGENTS.md (cloud-fan)
f4652af [MINOR] Refine AGENTS.md: improve pre-flight checks, use <module> pla… (cloud-fan)
0047038 [MINOR] Polish wording in Development Notes and PR Workflow sections (cloud-fan)
80eeac2 [MINOR] Refine pre-flight checks and PR workflow rules in AGENTS.md (cloud-fan)
a6ed93e [MINOR] Refine pre-flight checks and PR workflow rules in AGENTS.md (cloud-fan)
3599583 [MINOR] Use project-local .venv for PySpark virtual environment (cloud-fan)
a14c16a [MINOR] Add note to confirm edits are done before running tests (cloud-fan)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,14 +1,81 @@ | ||
| # Apache Spark | ||
|
|
||
| This file provides context and guidelines for AI coding assistants working with the Apache Spark codebase. | ||
| ## Before Making Changes | ||
|
|
||
| Before the first edit in a session, ensure a clean working environment. DO NOT skip these checks: | ||
|
|
||
| 1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). | ||
| 2. If the latest commit on `<upstream>/master` is more than a day old (check with `git log -1 --format="%ci" <upstream>/master`), run `git fetch <upstream> master`. | ||
| 3. If the current branch has uncommitted changes (check with `git status`) or commits not in upstream `master` (check with `git log <upstream>/master..HEAD`), ask the user to pick one: | ||
| - Create a new git worktree from `<upstream>/master` (recommended) and work from there. | ||
| - For uncommitted changes: stash them. For unmerged commits: create and switch to a new branch from `<upstream>/master`. | ||
| 4. Otherwise, proceed on the current branch. | ||
|
|
||
| ## Development Notes | ||
|
|
||
| SQL golden file tests are managed by `SQLQueryTestSuite` and its variants. Read the class documentation before running or updating these tests. DO NOT edit the generated golden files (`.sql.out`) directly. Always regenerate them when needed, and carefully review the diff to make sure it's expected. | ||
|
|
||
| Spark Connect protocol is defined in proto files under `sql/connect/common/src/main/protobuf/`. Read the README there before modifying proto definitions. | ||
|
|
||
| ## Build and Test | ||
|
|
||
| Prefer building in sbt over maven: | ||
| Build and tests can take a long time. Before running tests, ask the user if they have more changes to make. | ||
|
|
||
| Prefer SBT over Maven for faster incremental compilation. Module names are defined in `project/SparkBuild.scala`. | ||
|
|
||
| Compile a single module: | ||
|
|
||
| build/sbt <module>/compile | ||
|
|
||
| Compile test code for a single module: | ||
|
|
||
| build/sbt <module>/Test/compile | ||
|
|
||
| Run test suites by wildcard or full class name: | ||
|
|
||
| build/sbt '<module>/testOnly *MySuite' | ||
| build/sbt '<module>/testOnly org.apache.spark.sql.MySuite' | ||
|
|
||
| Run test cases matching a substring: | ||
|
|
||
| build/sbt '<module>/testOnly *MySuite -- -z "test name"' | ||
|
|
||
| For faster iteration, keep SBT open in interactive mode: | ||
|
|
||
| build/sbt | ||
| > project <module> | ||
| > testOnly *MySuite | ||
|
|
||
| ### PySpark Tests | ||
|
|
||
| PySpark tests require building Spark with Hive support first: | ||
|
|
||
| build/sbt -Phive package | ||
|
|
||
| Activate the virtual environment specified by the user, or default to `.venv`: | ||
|
|
||
| source <venv>/bin/activate | ||
|
|
||
| If the default venv does not exist, create it: | ||
|
|
||
| python3 -m venv .venv | ||
| source .venv/bin/activate | ||
| pip install -r dev/requirements.txt | ||
|
|
||
| Run a single test suite: | ||
|
|
||
| python/run-tests --testnames pyspark.sql.tests.arrow.test_arrow | ||
|
|
||
| Run a single test case: | ||
|
|
||
| python/run-tests --testnames "pyspark.sql.tests.test_catalog CatalogTests.test_current_database" | ||
|
|
||
| ## Pull Request Workflow | ||
|
|
||
| PR title requires a JIRA ticket ID (e.g., `[SPARK-xxxx][SQL] Title`). Ask the user to create a new ticket or provide an existing one if not given. Follow the template in `.github/PULL_REQUEST_TEMPLATE` for the PR description. | ||
|
|
||
| DO NOT push to the upstream repo. Always push to the personal fork. Open PRs against `master` on the upstream repo. | ||
|
|
||
| - **Building Spark**: [docs/building-spark.md](docs/building-spark.md) | ||
| - SBT build instructions: See the ["Building with SBT"](docs/building-spark.md#building-with-sbt) section | ||
| - SBT testing: See the ["Testing with SBT"](docs/building-spark.md#testing-with-sbt) section | ||
| - Running individual tests: See the ["Running Individual Tests"](docs/building-spark.md#running-individual-tests) section | ||
| DO NOT force push or use `--amend` on pushed commits unless the user explicitly asks. If the remote branch has new commits, fetch and rebase before pushing. | ||
|
|
||
| - **PySpark Testing**: [python/docs/source/development/testing.rst](python/docs/source/development/testing.rst) | ||
| Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. | ||
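
Step 2 of "Before Making Changes" in the diff above (fetch only when `<upstream>/master` is more than a day old) could be scripted roughly as follows. This is a minimal sketch and not part of the PR: the function names `is_stale` and `refresh_upstream` are hypothetical, the remote is assumed to be named `upstream` unless one is passed in, and it uses git's `%ct` format (committer date as a Unix timestamp) rather than the `%ci` format shown in the diff, so that the age can be compared numerically.

```shell
# Hypothetical helper: succeeds (exit 0) when a commit is older than the limit.
#   $1: commit time in epoch seconds
#   $2: current time in epoch seconds
#   $3: optional maximum age in seconds (defaults to one day, 86400)
is_stale() {
  [ $(( $2 - $1 )) -gt "${3:-86400}" ]
}

# Hypothetical helper: fetch <remote>/master only when the local copy is stale.
# The remote name is an assumption; it defaults to "upstream".
# The body runs in a subshell so its variables do not leak into the caller.
refresh_upstream() (
  remote="${1:-upstream}"
  # %ct prints the committer date as a Unix timestamp.
  last="$(git log -1 --format=%ct "$remote/master")" || exit 1
  if is_stale "$last" "$(date +%s)"; then
    git fetch "$remote" master
  fi
)
```

Calling `refresh_upstream` with no arguments from inside a Spark checkout would check `upstream/master`; a different remote name can be passed as the first argument.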
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| AGENTS.md | ||
|
Member
Do we need this? AGENTS.md is now like an open standard for project harness
Contributor (Author)
seems not well supported: anthropics/claude-code#34235
also cc @gaogaotiantian @Yicong-Huang for the pyspark test part.