Skip to content

Use a short SPDX license header for LLM-centered files#1489

Open
Dev-iL wants to merge 3 commits intoapache:mainfrom
SummitSG-LLC:2602/spdx
Open

Use a short SPDX license header for LLM-centered files#1489
Dev-iL wants to merge 3 commits intoapache:mainfrom
SummitSG-LLC:2602/spdx

Conversation

@Dev-iL
Copy link
Collaborator

@Dev-iL Dev-iL commented Feb 22, 2026

Following the approach from apache/airflow#62073 and apache/airflow#62145, files intended for LLM/agent consumption (not distributed in releases) now use a minimal SPDX license identifier instead of the full Apache 2.0 header - for LLM token efficiency.

See also:
https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53

Changes

  • Mark the .github folder as export-ignore.
  • Add a short and long license templates.
  • Add pre-commit hooks to ensure the right license header exists in every file.
  • Add missing license headers to two PR templates.

How I tested this

  • Hooks pass locally.

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Comment on lines -26 to -28
- name: Check for missing Apache 2 license headers
run: python3 scripts/check_license_headers.py
working-directory: ${{ github.workspace }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove?

Copy link
Collaborator Author

@Dev-iL Dev-iL Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This answers all similar comments:

After the proposed change, license headers are being enforced by pre-commit hooks. This approach has several benefits:

  1. Contributors can tell there's an issue before getting to ci
  2. Coverage isn't lost since hooks should run on ci anyway as part of static checks
  3. No need to maintain license enforcement scripts
  4. Hooks were more thorough and detected missing licenses that the ci missed

Whereas the main downside is it's somewhat harder to customize if a specific file requires special treatment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue is people submit without running pre-commit hooks :(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption was that the hooks run on CI too. It's just that instead of a custom script we use a hook - which is more standard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add that to CI then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already runs as part of the "Unit Tests" job's "check linting with prek" stage. I'll move this to a separate "Static checks" job for better visibility.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@Dev-iL Dev-iL force-pushed the 2602/spdx branch 3 times, most recently from 8f6cdd6 to c1f2647 Compare March 6, 2026 18:39
NOTICE Outdated
Comment on lines +7 to +14
This product includes test code vendored from pygls
(https://github.com/openlawlibrary/pygls):
Copyright (c) Open Law Library. All rights reserved.
Licensed under the Apache License, Version 2.0.

Original work in conftest.py:
Copyright 2017 Palantir Technologies, Inc.
Licensed under the MIT License.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be here? or under the LSP package notice? I think the LSP package, since this is related to the source distribution of that, no?

Copy link
Collaborator Author

@Dev-iL Dev-iL Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the ASF guidelines,

LICENSE and NOTICE files belong at the top level of the source tree.

I'm not sure how this applies to monorepos though. Will investigate...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

Dev-iL added 2 commits March 8, 2026 11:55
- Mark .github as export-ignore in .gitattributes
- Add short and long license templates for pre-commit hooks
- Add pre-commit hooks to enforce license headers (replaces CI scripts)
- Delete scripts/add_license_headers.py and scripts/check_license_headers.py
- Remove CI license check step from hamilton-lsp workflow
- Fix inconsistent license header indentation in several files
- Add missing license headers to PR templates
- Add vendored code attributions (Open Law Library, Palantir) to NOTICE file
- Exclude contrib/docs/ from markdown license hook (Docusaurus frontmatter)
# distributed under the License is distributed on an "AS IS" BASIS, #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #
# See the License for the specific language governing permissions and #
# limitations under the License. #
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't remove license headers from 3rd party origin files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reverted and we should still update https://github.com/SummitSG-LLC/hamilton/blob/8fa698904c772a6738f7531c7fa56f62753fa325/dev_tools/language_server/LICENSE to mention this conftest.py file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me how this file should end up - do we keep only the original copyright notice or have both one under the other?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't remove any 3rd party code header - it's basically a rule of the ASF - you need to come up with a strong justification for it - there is an open issue for this

#1363

And that issue can't continue to be ignored. We need to at least update our LICENSE file to mention any 3rd party code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I kept the old header and excluded this file from the hook.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - if we agree the code does not contain anything that requires the old Palantir copyright then it is ok to remove it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 7 to +8
---
<!-- SPDX-License-Identifier: Apache-2.0 -->
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk I configured the hooks to accept the license being below the frontmatter using:

# .pre-commit-config.yaml
      - id: insert-license
          ...
          - --detect-license-in-X-top-lines
          - '8'

In case this comes in handy in Airflow too...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants