From a1d5b08971645781e82840f1832477f0ba623cc2 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Thu, 19 Mar 2026 13:26:01 +0800 Subject: [PATCH 01/16] [MINOR] Improve AGENTS.md with inline build/test commands, PR workflow, and dev notes Rewrite AGENTS.md to follow industry best practices: inline actionable commands instead of links, add project overview, git pre-flight checks, PySpark venv setup, development notes for SQLQueryTestSuite and Spark Connect protos, and PR workflow guidelines. Add CLAUDE.md as a symlink so Claude Code also picks up the instructions. Generated-by: Claude Opus 4.6 Co-authored-by: Isaac --- AGENTS.md | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++----- CLAUDE.md | 1 + 2 files changed, 79 insertions(+), 7 deletions(-) create mode 120000 CLAUDE.md diff --git a/AGENTS.md b/AGENTS.md index d463bca60b070..0c346263daed2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,14 +1,85 @@ # Apache Spark -This file provides context and guidelines for AI coding assistants working with the Apache Spark codebase. +Apache Spark is a multi-language engine for large-scale data processing and analytics, primarily written in Scala and Java. It provides SQL and DataFrame APIs for both batch and streaming workloads, with Spark Connect as an optional server-client protocol. + +## Before Making Changes + +Before the first edit in a session, check the git state: +1. If there are uncommitted changes, ask the user whether to continue editing or stash/commit first. +2. If the branch is `master`, or has new commits compared to `master` (check with `git log master..HEAD`), create a new branch from `master` before editing. Inform the user of the new branch name. +3. Otherwise, proceed on the current branch. + +## Development Notes + +SQL golden file tests are managed by `SQLQueryTestSuite` and its variants. Read the class documentation before running or updating these tests. + +Spark Connect protocol is defined in proto files under `sql/connect/common/src/main/protobuf/`. Read the README there before modifying proto definitions. ## Build and Test -Prefer building in sbt over maven: +Prefer SBT over Maven for day-to-day development (faster incremental compilation). Replace `sql` below with the target module (e.g., `core`, `catalyst`, `connect`). + +Compile a single module: + + build/sbt sql/compile + +Compile test code for a single module: + + build/sbt sql/Test/compile + +Run test suites by wildcard or full class name: + + build/sbt "sql/testOnly *MySuite" + build/sbt "sql/testOnly org.apache.spark.sql.MySuite" + +Run test cases matching a substring: + + build/sbt "sql/testOnly *MySuite -- -z \"test name\"" + +For faster iteration, keep SBT open in interactive mode: + + build/sbt + > project sql + > testOnly *MySuite + +### PySpark Tests + +PySpark tests require building Spark with Hive support first: + + build/sbt -Phive package + +Set up and activate a virtual environment (Python 3.10+): + + if [ ! -d .venv ]; then + python3 -c "import sys; assert sys.version_info >= (3, 10)" || { echo "Python 3.10+ required. Ask the user to install a compatible version."; exit 1; } + python3 -m venv .venv + source .venv/bin/activate + pip install -r dev/requirements.txt + else + source .venv/bin/activate + fi + +Run a single test suite: + + python/run-tests --testnames pyspark.sql.tests.arrow.test_arrow + +Run a single test case: + + python/run-tests --testnames "pyspark.sql.tests.test_catalog CatalogTests.test_current_database" + +## Pull Request Workflow + +### PR Title + +Format: `[SPARK-xxxx][COMPONENT] Title` or `[MINOR] Title` for trivial changes. +For follow-up fixes: `[SPARK-xxxx][COMPONENT][FOLLOWUP] Title`. + +### PR Description + +Follow the template in `.github/PULL_REQUEST_TEMPLATE`. + +### GitHub Workflow -- **Building Spark**: [docs/building-spark.md](docs/building-spark.md) - - SBT build instructions: See the ["Building with SBT"](docs/building-spark.md#building-with-sbt) section - - SBT testing: See the ["Testing with SBT"](docs/building-spark.md#testing-with-sbt) section - - Running individual tests: See the ["Running Individual Tests"](docs/building-spark.md#running-individual-tests) section +Contributors push their feature branch to their personal fork and open PRs against `master` on the upstream Apache Spark repo. Run `git remote -v` to identify which remote is the fork and which is upstream (`apache/spark`). If the remotes are unclear, ask the user to set them up following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). -- **PySpark Testing**: [python/docs/source/development/testing.rst](python/docs/source/development/testing.rst) +Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000000000..47dc3e3d863cf --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file From dd0602780d050027b8468ed9270237585f1013ba Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Thu, 19 Mar 2026 13:49:55 +0800 Subject: [PATCH 02/16] [MINOR] Address review comments: remove [MINOR] from PR title format, drop Python version check Co-authored-by: Isaac --- AGENTS.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 0c346263daed2..7b938d2e9cef5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -48,10 +48,9 @@ PySpark tests require building Spark with Hive support first: build/sbt -Phive package -Set up and activate a virtual environment (Python 3.10+): +Set up and activate a virtual environment: if [ ! -d .venv ]; then - python3 -c "import sys; assert sys.version_info >= (3, 10)" || { echo "Python 3.10+ required. Ask the user to install a compatible version."; exit 1; } python3 -m venv .venv source .venv/bin/activate pip install -r dev/requirements.txt @@ -71,7 +70,7 @@ Run a single test case: ### PR Title -Format: `[SPARK-xxxx][COMPONENT] Title` or `[MINOR] Title` for trivial changes. +Format: `[SPARK-xxxx][COMPONENT] Title`. For follow-up fixes: `[SPARK-xxxx][COMPONENT][FOLLOWUP] Title`. ### PR Description From 30fa4babf4894de45923d5051f798c2a5767ffad Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Thu, 19 Mar 2026 13:52:54 +0800 Subject: [PATCH 03/16] [MINOR] Add instruction to ask user for JIRA ticket ID Co-authored-by: Isaac --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 7b938d2e9cef5..44c5a00623127 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -70,7 +70,7 @@ Run a single test case: ### PR Title -Format: `[SPARK-xxxx][COMPONENT] Title`. +Format: `[SPARK-xxxx][COMPONENT] Title` where `SPARK-xxxx` is the JIRA ticket ID. Ask the user for the ticket ID if not provided. For follow-up fixes: `[SPARK-xxxx][COMPONENT][FOLLOWUP] Title`. ### PR Description From 2e4f1b0f4cc5383f3df48bb0df441e31e1e8e259 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Thu, 19 Mar 2026 13:55:20 +0800 Subject: [PATCH 04/16] [MINOR] Ask user to create/provide JIRA ticket, require approval before push Co-authored-by: Isaac --- AGENTS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 44c5a00623127..d37a969ebe7e1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -70,7 +70,7 @@ Run a single test case: ### PR Title -Format: `[SPARK-xxxx][COMPONENT] Title` where `SPARK-xxxx` is the JIRA ticket ID. Ask the user for the ticket ID if not provided. +Format: `[SPARK-xxxx][COMPONENT] Title` where `SPARK-xxxx` is the JIRA ticket ID. Ask the user to create a new ticket or provide an existing one if not given. For follow-up fixes: `[SPARK-xxxx][COMPONENT][FOLLOWUP] Title`. ### PR Description @@ -81,4 +81,4 @@ Follow the template in `.github/PULL_REQUEST_TEMPLATE`. Contributors push their feature branch to their personal fork and open PRs against `master` on the upstream Apache Spark repo. Run `git remote -v` to identify which remote is the fork and which is upstream (`apache/spark`). If the remotes are unclear, ask the user to set them up following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). -Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. +Always get user approval before pushing commits or creating PRs. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. From 1efa6122f767a534aaf72734a91ed904e7c71ff4 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Thu, 19 Mar 2026 13:59:10 +0800 Subject: [PATCH 05/16] [MINOR] Simplify project overview in AGENTS.md Co-authored-by: Isaac --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index d37a969ebe7e1..8610202f59326 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,6 +1,6 @@ # Apache Spark -Apache Spark is a multi-language engine for large-scale data processing and analytics, primarily written in Scala and Java. It provides SQL and DataFrame APIs for both batch and streaming workloads, with Spark Connect as an optional server-client protocol. +Apache Spark is a multi-language engine for large-scale data processing and analytics, primarily written in Scala and Java. It supports both batch and streaming workloads, with Spark Connect as an optional server-client protocol. ## Before Making Changes From f4652af7c21862874ea8d1dda09ec752081ee7f0 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Thu, 19 Mar 2026 18:22:27 +0800 Subject: [PATCH 06/16] [MINOR] Refine AGENTS.md: improve pre-flight checks, use placeholders, simplify PR section Co-authored-by: Isaac --- AGENTS.md | 57 +++++++++++++++++++++++-------------------------------- 1 file changed, 24 insertions(+), 33 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 8610202f59326..4c7ffaff80c69 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,13 +1,13 @@ # Apache Spark -Apache Spark is a multi-language engine for large-scale data processing and analytics, primarily written in Scala and Java. It supports both batch and streaming workloads, with Spark Connect as an optional server-client protocol. - ## Before Making Changes -Before the first edit in a session, check the git state: -1. If there are uncommitted changes, ask the user whether to continue editing or stash/commit first. -2. If the branch is `master`, or has new commits compared to `master` (check with `git log master..HEAD`), create a new branch from `master` before editing. Inform the user of the new branch name. -3. Otherwise, proceed on the current branch. +Before the first edit in a session, ensure a clean working environment: + +1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). +2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. +3. If the current branch has uncommitted changes or commits not in upstream `master` (check with `git log /master..HEAD`), suggest creating a new git worktree from `/master` (recommended), or ask the user to clean up by creating a new branch from `/master` or stashing changes. +4. Otherwise, proceed on the current branch. ## Development Notes @@ -17,29 +17,29 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m ## Build and Test -Prefer SBT over Maven for day-to-day development (faster incremental compilation). Replace `sql` below with the target module (e.g., `core`, `catalyst`, `connect`). +Prefer SBT over Maven for faster incremental compilation. Module names are defined in `project/SparkBuild.scala`. Compile a single module: - build/sbt sql/compile + build/sbt /compile Compile test code for a single module: - build/sbt sql/Test/compile + build/sbt /Test/compile Run test suites by wildcard or full class name: - build/sbt "sql/testOnly *MySuite" - build/sbt "sql/testOnly org.apache.spark.sql.MySuite" + build/sbt '/testOnly *MySuite' + build/sbt '/testOnly org.apache.spark.sql.MySuite' Run test cases matching a substring: - build/sbt "sql/testOnly *MySuite -- -z \"test name\"" + build/sbt '/testOnly *MySuite -- -z "test name"' For faster iteration, keep SBT open in interactive mode: build/sbt - > project sql + > project > testOnly *MySuite ### PySpark Tests @@ -48,15 +48,15 @@ PySpark tests require building Spark with Hive support first: build/sbt -Phive package -Set up and activate a virtual environment: +Activate the virtual environment specified by the user, or default to `~/.virtualenvs/pyspark`: + + source /bin/activate + +If the default venv does not exist, create it: - if [ ! -d .venv ]; then - python3 -m venv .venv - source .venv/bin/activate - pip install -r dev/requirements.txt - else - source .venv/bin/activate - fi + python3 -m venv ~/.virtualenvs/pyspark + source ~/.virtualenvs/pyspark/bin/activate + pip install -r dev/requirements.txt Run a single test suite: @@ -68,17 +68,8 @@ Run a single test case: ## Pull Request Workflow -### PR Title - -Format: `[SPARK-xxxx][COMPONENT] Title` where `SPARK-xxxx` is the JIRA ticket ID. Ask the user to create a new ticket or provide an existing one if not given. -For follow-up fixes: `[SPARK-xxxx][COMPONENT][FOLLOWUP] Title`. - -### PR Description - -Follow the template in `.github/PULL_REQUEST_TEMPLATE`. - -### GitHub Workflow +PR title requires a JIRA ticket ID (e.g., `[SPARK-xxxx][SQL] Title`). Ask the user to create a new ticket or provide an existing one if not given. Follow the template in `.github/PULL_REQUEST_TEMPLATE` for the PR description. -Contributors push their feature branch to their personal fork and open PRs against `master` on the upstream Apache Spark repo. Run `git remote -v` to identify which remote is the fork and which is upstream (`apache/spark`). If the remotes are unclear, ask the user to set them up following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). +Contributors push their feature branch to their personal fork and open PRs against `master` on the upstream Apache Spark repo. -Always get user approval before pushing commits or creating PRs. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. +Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. From 0047038d219c0e52cf8b9350899312f96081e93f Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Fri, 20 Mar 2026 18:48:47 +0800 Subject: [PATCH 07/16] [MINOR] Polish wording in Development Notes and PR Workflow sections Co-authored-by: Isaac --- AGENTS.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 4c7ffaff80c69..bcd0d712cadae 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -2,7 +2,7 @@ ## Before Making Changes -Before the first edit in a session, ensure a clean working environment: +Before the first edit in a session, ensure a clean working environment. DO NOT skip these checks: 1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. @@ -11,7 +11,7 @@ Before the first edit in a session, ensure a clean working environment: ## Development Notes -SQL golden file tests are managed by `SQLQueryTestSuite` and its variants. Read the class documentation before running or updating these tests. +SQL golden file tests are managed by `SQLQueryTestSuite` and its variants. Read the class documentation before running or updating these tests. DO NOT edit the generated golden files (`.sql.out`) directly. Always regenerate them when needed, and carefully review the diff to make sure it's expected. Spark Connect protocol is defined in proto files under `sql/connect/common/src/main/protobuf/`. Read the README there before modifying proto definitions. @@ -70,6 +70,6 @@ Run a single test case: PR title requires a JIRA ticket ID (e.g., `[SPARK-xxxx][SQL] Title`). Ask the user to create a new ticket or provide an existing one if not given. Follow the template in `.github/PULL_REQUEST_TEMPLATE` for the PR description. -Contributors push their feature branch to their personal fork and open PRs against `master` on the upstream Apache Spark repo. +DO NOT push to the upstream repo. Always push to the personal fork. Open PRs against `master` on the upstream repo. Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. From 80eeac2fa0b71a17fc6623fcd317b8917f89037e Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 02:07:19 +0800 Subject: [PATCH 08/16] [MINOR] Refine pre-flight checks and PR workflow rules in AGENTS.md Co-authored-by: Isaac --- AGENTS.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index bcd0d712cadae..644a2d52f8e2c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -6,7 +6,9 @@ Before the first edit in a session, ensure a clean working environment. DO NOT s 1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. -3. If the current branch has uncommitted changes or commits not in upstream `master` (check with `git log /master..HEAD`), suggest creating a new git worktree from `/master` (recommended), or ask the user to clean up by creating a new branch from `/master` or stashing changes. +3. If the current branch has uncommitted changes (check with `git status`) or commits not in upstream `master` (check with `git log /master..HEAD`), ask the user to pick one: + - Create a new git worktree from `/master` (recommended) and work from there. + - For uncommitted changes: stash them. For unmerged commits: create and switch to a new branch from `/master`. 4. Otherwise, proceed on the current branch. ## Development Notes @@ -72,4 +74,6 @@ PR title requires a JIRA ticket ID (e.g., `[SPARK-xxxx][SQL] Title`). Ask the us DO NOT push to the upstream repo. Always push to the personal fork. Open PRs against `master` on the upstream repo. +DO NOT force push unless the user explicitly asks. Avoid `--amend` on commits that have been pushed. If the remote branch has new commits, fetch and rebase before pushing. + Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. From a6ed93efcaac59df638efcd9d0cd2616e43dcd03 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 02:11:45 +0800 Subject: [PATCH 09/16] [MINOR] Refine pre-flight checks and PR workflow rules in AGENTS.md Co-authored-by: Isaac --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 644a2d52f8e2c..1e604ce398809 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -74,6 +74,6 @@ PR title requires a JIRA ticket ID (e.g., `[SPARK-xxxx][SQL] Title`). Ask the us DO NOT push to the upstream repo. Always push to the personal fork. Open PRs against `master` on the upstream repo. -DO NOT force push unless the user explicitly asks. Avoid `--amend` on commits that have been pushed. If the remote branch has new commits, fetch and rebase before pushing. +DO NOT force push or use `--amend` on pushed commits unless the user explicitly asks. If the remote branch has new commits, fetch and rebase before pushing. Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI. From 35995832daf6565e2644664c446f17df3eedc924 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 02:15:27 +0800 Subject: [PATCH 10/16] [MINOR] Use project-local .venv for PySpark virtual environment Co-authored-by: Isaac --- AGENTS.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 1e604ce398809..6860434f12f81 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -50,14 +50,14 @@ PySpark tests require building Spark with Hive support first: build/sbt -Phive package -Activate the virtual environment specified by the user, or default to `~/.virtualenvs/pyspark`: +Activate the virtual environment specified by the user, or default to `.venv`: source /bin/activate If the default venv does not exist, create it: - python3 -m venv ~/.virtualenvs/pyspark - source ~/.virtualenvs/pyspark/bin/activate + python3 -m venv .venv + source .venv/bin/activate pip install -r dev/requirements.txt Run a single test suite: From a14c16a19a269d873c51a5cdfea3c587a6b55b93 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 04:12:49 +0800 Subject: [PATCH 11/16] [MINOR] Add note to confirm edits are done before running tests Co-authored-by: Isaac --- AGENTS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/AGENTS.md b/AGENTS.md index 6860434f12f81..6344fa0053279 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -19,6 +19,8 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m ## Build and Test +Build and tests can take a long time. Before running tests, ask the user if they have more changes to make. + Prefer SBT over Maven for faster incremental compilation. Module names are defined in `project/SparkBuild.scala`. Compile a single module: From b5e63165c35d3408533d87328041288d645632cd Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 22:12:03 +0800 Subject: [PATCH 12/16] [MINOR] Add PR branch workflow and fix step numbering in Before Making Changes Co-authored-by: Isaac --- AGENTS.md | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 6344fa0053279..20e1af80ddf99 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -6,10 +6,19 @@ Before the first edit in a session, ensure a clean working environment. DO NOT s 1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. -3. If the current branch has uncommitted changes (check with `git status`) or commits not in upstream `master` (check with `git log /master..HEAD`), ask the user to pick one: +3. If there are uncommitted changes (check with `git status`), ask the user to stash or commit them before proceeding. + +Then, depending on the task: + +If working on an existing PR: +- 4a. Find the local branch for the PR by matching branch name and shared commits (one can fast-forward to the other). If not found, ask the user whether to fetch the PR to a new local branch or identify the correct local branch. +- 4b. Switch to that branch. + +If starting new work: +- 4a. If the current branch has commits not in upstream `master` (check with `git log /master..HEAD`), ask the user to pick one: - Create a new git worktree from `/master` (recommended) and work from there. - - For uncommitted changes: stash them. For unmerged commits: create and switch to a new branch from `/master`. -4. Otherwise, proceed on the current branch. + - Create and switch to a new branch from `/master`. +- 4b. Otherwise, proceed on the current branch. ## Development Notes From 82c786d0ca10a41b5a8935fd5e481e0264a7f6d5 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 22:30:38 +0800 Subject: [PATCH 13/16] [MINOR] Restructure pre-flight checks: add PR branch workflow, clarify branch selection Co-authored-by: Isaac --- AGENTS.md | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 20e1af80ddf99..d25455d686944 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,24 +1,16 @@ # Apache Spark -## Before Making Changes +## Pre-flight Checks -Before the first edit in a session, ensure a clean working environment. DO NOT skip these checks: +Before the first code read, edit, or test in a session, ensure a clean working environment. DO NOT skip these checks: 1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. 3. If there are uncommitted changes (check with `git status`), ask the user to stash or commit them before proceeding. - -Then, depending on the task: - -If working on an existing PR: -- 4a. Find the local branch for the PR by matching branch name and shared commits (one can fast-forward to the other). If not found, ask the user whether to fetch the PR to a new local branch or identify the correct local branch. -- 4b. Switch to that branch. - -If starting new work: -- 4a. If the current branch has commits not in upstream `master` (check with `git log /master..HEAD`), ask the user to pick one: - - Create a new git worktree from `/master` (recommended) and work from there. - - Create and switch to a new branch from `/master`. -- 4b. Otherwise, proceed on the current branch. +4. Switch to the appropriate branch: + - **Existing PR**: find the local branch by matching branch name and shared commits (one can fast-forward to the other). If the branch is checked out in another worktree, work from that worktree. If not found, ask the user whether to fetch the PR to a new local branch or identify the correct local branch. + - **New work**: if the current branch has commits not in upstream `master` (check with `git log /master..HEAD`), ask the user to choose: create a new git worktree from `/master` (recommended), or create and switch to a new branch from `/master`. + - **Otherwise**: use `/master`. ## Development Notes From 41d0dce0d46cd7bd345767c80e6160f0aa96dbd8 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Sat, 21 Mar 2026 22:59:53 +0800 Subject: [PATCH 14/16] [MINOR] Always create new branch for edits, separate read/test path in pre-flight checks Co-authored-by: Isaac --- AGENTS.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index d25455d686944..a5d7cd6582bb3 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -6,11 +6,11 @@ Before the first code read, edit, or test in a session, ensure a clean working e 1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`). 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. -3. If there are uncommitted changes (check with `git status`), ask the user to stash or commit them before proceeding. +3. If there are uncommitted changes (check with `git status`), ask the user to stash them before proceeding. 4. Switch to the appropriate branch: - **Existing PR**: find the local branch by matching branch name and shared commits (one can fast-forward to the other). If the branch is checked out in another worktree, work from that worktree. If not found, ask the user whether to fetch the PR to a new local branch or identify the correct local branch. - - **New work**: if the current branch has commits not in upstream `master` (check with `git log /master..HEAD`), ask the user to choose: create a new git worktree from `/master` (recommended), or create and switch to a new branch from `/master`. - - **Otherwise**: use `/master`. + - **New edits**: ask the user to choose: create a new git worktree from `/master` and work from there (recommended), or create and switch to a new branch from `/master`. + - **Reading code or running tests**: use `/master`. ## Development Notes From 138fba12b71a3680212c41e90135152829ee34d6 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Mon, 23 Mar 2026 10:57:19 +0800 Subject: [PATCH 15/16] [MINOR] Simplify existing PR branch detection in pre-flight checks Co-authored-by: Isaac --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index a5d7cd6582bb3..bbbccf1f2cce1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -8,7 +8,7 @@ Before the first code read, edit, or test in a session, ensure a clean working e 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. 3. If there are uncommitted changes (check with `git status`), ask the user to stash them before proceeding. 4. Switch to the appropriate branch: - - **Existing PR**: find the local branch by matching branch name and shared commits (one can fast-forward to the other). If the branch is checked out in another worktree, work from that worktree. If not found, ask the user whether to fetch the PR to a new local branch or identify the correct local branch. + - **Existing PR**: look for a local branch matching the PR branch name and confirm with the user. If found, switch to it (if it is checked out in another worktree, work from that worktree). If not found, ask the user whether to fetch the PR to a new local branch or whether there is a local branch under a different name. - **New edits**: ask the user to choose: create a new git worktree from `/master` and work from there (recommended), or create and switch to a new branch from `/master`. - **Reading code or running tests**: use `/master`. From 854997d80129969e8049444c6f94ad8f80766e64 Mon Sep 17 00:00:00 2001 From: Wenchen Fan Date: Mon, 23 Mar 2026 11:05:19 +0800 Subject: [PATCH 16/16] [MINOR] Simplify existing PR branch detection wording Co-authored-by: Isaac --- AGENTS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index bbbccf1f2cce1..3c0cbb4177750 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -8,7 +8,7 @@ Before the first code read, edit, or test in a session, ensure a clean working e 2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`. 3. If there are uncommitted changes (check with `git status`), ask the user to stash them before proceeding. 4. Switch to the appropriate branch: - - **Existing PR**: look for a local branch matching the PR branch name and confirm with the user. If found, switch to it (if it is checked out in another worktree, work from that worktree). If not found, ask the user whether to fetch the PR to a new local branch or whether there is a local branch under a different name. + - **Existing PR**: look for a local branch matching the PR branch name. If found, switch to it and inform the user. If not found, ask whether to fetch it or if there is a local branch under a different name. - **New edits**: ask the user to choose: create a new git worktree from `/master` and work from there (recommended), or create and switch to a new branch from `/master`. - **Reading code or running tests**: use `/master`.