diff --git a/AGENTS.md b/AGENTS.md
index d463bca60b070..3c0cbb4177750 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,14 +1,82 @@
 # Apache Spark
 
-This file provides context and guidelines for AI coding assistants working with the Apache Spark codebase.
+## Pre-flight Checks
+
+Before the first code read, edit, or test in a session, ensure a clean working environment. DO NOT skip these checks:
+
+1. Run `git remote -v` to identify the personal fork and upstream (`apache/spark`). If unclear, ask the user to configure their remotes following the standard convention (`origin` for the fork, `upstream` for `apache/spark`).
+2. If the latest commit on `/master` is more than a day old (check with `git log -1 --format="%ci" /master`), run `git fetch master`.
+3. If there are uncommitted changes (check with `git status`), ask the user to stash them before proceeding.
+4. Switch to the appropriate branch:
+   - **Existing PR**: look for a local branch matching the PR branch name. If found, switch to it and inform the user. If not found, ask whether to fetch it or if there is a local branch under a different name.
+   - **New edits**: ask the user to choose: create a new git worktree from `/master` and work from there (recommended), or create and switch to a new branch from `/master`.
+   - **Reading code or running tests**: use `/master`.
+
+## Development Notes
+
+SQL golden file tests are managed by `SQLQueryTestSuite` and its variants. Read the class documentation before running or updating these tests. DO NOT edit the generated golden files (`.sql.out`) directly. Always regenerate them when needed, and carefully review the diff to make sure it's expected.
+
+Spark Connect protocol is defined in proto files under `sql/connect/common/src/main/protobuf/`. Read the README there before modifying proto definitions.
 
 ## Build and Test
 
-Prefer building in sbt over maven:
+Build and tests can take a long time. Before running tests, ask the user if they have more changes to make.
+
+Prefer SBT over Maven for faster incremental compilation. Module names are defined in `project/SparkBuild.scala`.
+
+Compile a single module:
+
+    build/sbt /compile
+
+Compile test code for a single module:
+
+    build/sbt /Test/compile
+
+Run test suites by wildcard or full class name:
+
+    build/sbt '/testOnly *MySuite'
+    build/sbt '/testOnly org.apache.spark.sql.MySuite'
+
+Run test cases matching a substring:
+
+    build/sbt '/testOnly *MySuite -- -z "test name"'
+
+For faster iteration, keep SBT open in interactive mode:
+
+    build/sbt
+    > project
+    > testOnly *MySuite
+
+### PySpark Tests
+
+PySpark tests require building Spark with Hive support first:
+
+    build/sbt -Phive package
+
+Activate the virtual environment specified by the user, or default to `.venv`:
+
+    source /bin/activate
+
+If the default venv does not exist, create it:
+
+    python3 -m venv .venv
+    source .venv/bin/activate
+    pip install -r dev/requirements.txt
+
+Run a single test suite:
+
+    python/run-tests --testnames pyspark.sql.tests.arrow.test_arrow
+
+Run a single test case:
+
+    python/run-tests --testnames "pyspark.sql.tests.test_catalog CatalogTests.test_current_database"
+
+## Pull Request Workflow
+
+PR title requires a JIRA ticket ID (e.g., `[SPARK-xxxx][SQL] Title`). Ask the user to create a new ticket or provide an existing one if not given. Follow the template in `.github/PULL_REQUEST_TEMPLATE` for the PR description.
+
+DO NOT push to the upstream repo. Always push to the personal fork. Open PRs against `master` on the upstream repo.
 
-- **Building Spark**: [docs/building-spark.md](docs/building-spark.md)
-  - SBT build instructions: See the ["Building with SBT"](docs/building-spark.md#building-with-sbt) section
-  - SBT testing: See the ["Testing with SBT"](docs/building-spark.md#testing-with-sbt) section
-  - Running individual tests: See the ["Running Individual Tests"](docs/building-spark.md#running-individual-tests) section
+DO NOT force push or use `--amend` on pushed commits unless the user explicitly asks. If the remote branch has new commits, fetch and rebase before pushing.
 
-- **PySpark Testing**: [python/docs/source/development/testing.rst](python/docs/source/development/testing.rst)
+Always get user approval before external operations such as pushing commits, creating PRs, or posting comments. Use `gh pr create` to open PRs. If `gh` is not installed, generate the GitHub PR URL for the user and recommend installing the GitHub CLI.
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 120000
index 0000000000000..47dc3e3d863cf
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
\ No newline at end of file
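
Note on pre-flight check 2: the "more than a day old" test is easier to script with epoch seconds than with the human-readable `%ci` date the patch suggests. The sketch below is only an illustration and is not part of the patch. It swaps in `git log --format=%ct` (committer date as a UNIX timestamp). The function names `is_stale` and `fetch_if_stale` are hypothetical, and the remote name passed in is an assumption based on the `upstream` convention described in check 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): succeeds when the commit
# timestamp is more than one day (86400 seconds) older than "now".
# Both arguments are UNIX epoch seconds.
is_stale() {
  commit_epoch=$1
  now_epoch=$2
  [ $((now_epoch - commit_epoch)) -gt 86400 ]
}

# Hypothetical wrapper: fetches master from the given remote only when the
# local copy of <remote>/master is stale. The remote name is an assumption;
# use whatever `git remote -v` shows for apache/spark.
fetch_if_stale() {
  remote=$1
  # %ct prints the committer date as epoch seconds.
  commit_epoch=$(git log -1 --format=%ct "$remote/master") || return 1
  if is_stale "$commit_epoch" "$(date +%s)"; then
    git fetch "$remote" master
  fi
}
```

Assuming the remotes follow the convention from check 1, a session could call `fetch_if_stale upstream` once before reading code or running tests. Keeping the age comparison in its own function makes the threshold easy to check without touching a repository.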