Skip to content

Source Links Fix#440

Merged
MaximilianSoerenPollak merged 34 commits intoeclipse-score:mainfrom
MaximilianSoerenPollak:MSP_fix_sourcelinks_2
Mar 26, 2026
Merged

Source Links Fix#440
MaximilianSoerenPollak merged 34 commits intoeclipse-score:mainfrom
MaximilianSoerenPollak:MSP_fix_sourcelinks_2

Conversation

@MaximilianSoerenPollak
Copy link
Contributor

@MaximilianSoerenPollak MaximilianSoerenPollak commented Mar 11, 2026

Changes the working of source_code_linker.
It now correctly identifies the repository the source link & test links came from, either in local or combo build.
Should redirect to the correct place.

=> KNOWN_GOOD_JSON is required in combo builds now
=> In combo builds it will take the URL from the KNOWN_GOOD_JSON.


Done:

  • Achieve same or better test coverage as currently (see last comment)
    • Adapt all current tests to work with new structure
    • Add new tests to enable testing of new functionality
  • Update Documentation if needed
  • Clean up implementation (naming / debug prints etc. )
  • Proof it works in ref-integration & locally (docs-as-code & other local repos)

📌 Description

🚨 Impact Analysis

  • This change does not violate any tool requirements and is covered by existing tool requirements
  • This change does not violate any design decisions
  • Otherwise I have created a ticket for new tool qualification

✅ Checklist

  • Added/updated documentation for new or changed features
  • Added/updated tests to cover the changes
  • Followed project coding standards and guidelines

@github-actions
Copy link

github-actions bot commented Mar 11, 2026

License Check Results

🚀 The license check job ran with the Bazel command:

bazel run --lockfile_mode=error //src:license-check

Status: ⚠️ Needs Review

Click to expand output
[License Check Output]
Extracting Bazel installation...
Starting local Bazel server (8.3.0) and connecting to it...
INFO: Invocation ID: 6644a1c9-e7e3-4de2-984e-84014dbb9ebb
Computing main repo mapping: 
Computing main repo mapping: 
Computing main repo mapping: 
Loading: 
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
    currently loading: src
Loading: 0 packages loaded
    currently loading: src
Analyzing: target //src:license-check (1 packages loaded, 0 targets configured)
Analyzing: target //src:license-check (1 packages loaded, 0 targets configured)

Analyzing: target //src:license-check (66 packages loaded, 9 targets configured)

Analyzing: target //src:license-check (69 packages loaded, 9 targets configured)

Analyzing: target //src:license-check (118 packages loaded, 246 targets configured)

Analyzing: target //src:license-check (128 packages loaded, 2477 targets configured)

Analyzing: target //src:license-check (130 packages loaded, 2518 targets configured)

Analyzing: target //src:license-check (140 packages loaded, 2576 targets configured)

Analyzing: target //src:license-check (140 packages loaded, 2576 targets configured)

Analyzing: target //src:license-check (140 packages loaded, 2576 targets configured)

Analyzing: target //src:license-check (144 packages loaded, 4589 targets configured)

INFO: Analyzed target //src:license-check (145 packages loaded, 4715 targets configured).
[8 / 16] Creating runfiles tree bazel-out/k8-opt-exec-ST-d57f47055a04/bin/external/score_tooling+/dash/tool/formatters/dash_format_converter.runfiles [for tool]; 0s local
[13 / 16] JavaToolchainCompileClasses external/rules_java+/toolchains/platformclasspath_classes; 0s disk-cache, processwrapper-sandbox
[15 / 16] [Prepa] Building src/license.check.license_check.jar ()
INFO: Found 1 target...
Target //src:license.check.license_check up-to-date:
  bazel-bin/src/license.check.license_check
  bazel-bin/src/license.check.license_check.jar
INFO: Elapsed time: 23.295s, Critical Path: 2.67s
INFO: 16 processes: 12 internal, 3 processwrapper-sandbox, 1 worker.
INFO: Build completed successfully, 16 total actions
INFO: Running command line: bazel-bin/src/license.check.license_check src/formatted.txt <args omitted>
usage: org.eclipse.dash.licenses.cli.Main [-batch <int>] [-cd <url>]
       [-confidence <int>] [-ef <url>] [-excludeSources <sources>] [-help] [-lic
       <url>] [-project <shortname>] [-repo <url>] [-review] [-summary <file>]
       [-timeout <seconds>] [-token <token>]

@github-actions
Copy link

The created documentation from the pull request is available at: docu-html

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reworks how “source code links” are generated and injected into Sphinx needs, aiming to support both local builds (derive Git hash from the current repo) and combo/ref-integration builds (use per-module metadata such as repo URL + commit hash).

Changes:

  • Add metadata support (module name / repo URL / commit hash) to NeedLink JSON handling and Bazel CLI generation/merge scripts.
  • Introduce grouping of needs by module and a new module-grouped cache JSON.
  • Update Sphinx extension flow to read the new caches and generate GitHub links using either git-derived or metadata-derived refs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
src/helper_lib/additional_functions.py Changes GitHub link generation API to require module metadata.
src/extensions/score_source_code_linker/needlinks.py Extends NeedLink with metadata fields; adds new metadata-aware JSON format/load path.
src/extensions/score_source_code_linker/need_source_links.py Moves group_by_need into this module.
src/extensions/score_source_code_linker/module_source_links.py New module-grouped cache format and grouping logic.
src/extensions/score_source_code_linker/metadata.py New TypedDict + TypeGuard for metadata records.
src/extensions/score_source_code_linker/generate_source_code_links_json.py Refactors extraction helper signature and logging (currently inconsistent call sites).
src/extensions/score_source_code_linker/init.py Adds module-linker stage and changes injection to use module-grouped cache + metadata-based link generation.
scripts_bazel/merge_sourcelinks.py Merges per-module sourcelinks and enriches with known-good repo/hash metadata.
scripts_bazel/generate_sourcelinks_cli.py Emits sourcelinks JSON with a leading metadata dict and uses updated extraction helper signature.
docs.bzl Adds optional known_good wiring into the sourcelinks merge genrule and the public docs() macro.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +106 to +108
store_source_code_links_with_metadata_json(
file=args.output, metadata=metadata, needlist=all_need_references
)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switches the generated JSON format from a plain list of NeedLinks to a list whose first element is a metadata dict. Any existing consumers/tests that expect the old schema will now fail. Consider either keeping the old format as default (with an opt-in flag for metadata), or updating all in-repo consumers and tests in the same PR to avoid a partially-migrated state.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-zw Valid issue here .

Do you think it would be better to rename this specific cache that comes from here a bit so it is clear it is with metadata?
Like 'scl_metadata_cache.json' or whatever? SO that the name makes it clear to use the metadata reader?

Comment on lines +30 to +39
metadata: moduleInfo,
link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
if link is None:
link = DefaultNeedLink()
if not metadata.hash:
# Local path (//:docs)
return get_github_link_from_git(link)
# Ref-Integration path (//:docs_combo..)
return get_github_link_from_json(metadata, link)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_github_link now requires a metadata argument, but there are existing call sites in the repo that still call get_github_link(link) (e.g. in tests and xml_parser.py). This is a breaking API change that will raise TypeError. Consider keeping backwards compatibility by making metadata optional with a sensible default (e.g. infer local/git behavior when metadata is omitted) or providing a new function name for the metadata-based behavior.

Suggested change
metadata: moduleInfo,
link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
if link is None:
link = DefaultNeedLink()
if not metadata.hash:
# Local path (//:docs)
return get_github_link_from_git(link)
# Ref-Integration path (//:docs_combo..)
return get_github_link_from_json(metadata, link)
metadata: moduleInfo
| NeedLink
| DataForTestLink
| DataOfTestCase
| None = None,
link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
"""
Return a GitHub link for the given location.
Backwards compatible calling conventions:
- New style: get_github_link(metadata, link)
- Old style: get_github_link(link) # metadata omitted
"""
# Distinguish between new-style and old-style calls.
if isinstance(metadata, moduleInfo):
actual_metadata: moduleInfo | None = metadata
actual_link = link
else:
# Old-style: first argument is actually the link (or None)
actual_metadata = None
# If both metadata and link are provided but metadata is not a moduleInfo,
# prefer the explicitly provided link argument.
actual_link = link if link is not None else metadata
if actual_link is None:
actual_link = DefaultNeedLink()
# If no metadata is available or metadata.hash is falsy, fall back to git-based link.
if actual_metadata is None or not getattr(actual_metadata, "hash", None):
# Local path (//:docs)
return get_github_link_from_git(actual_link)
# Ref-Integration path (//:docs_combo..)
return get_github_link_from_json(actual_metadata, actual_link)

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.


You can also share your feedback on Copilot code review. Take the survey.

@MaximilianSoerenPollak
Copy link
Contributor Author

Just realized that testlinks also need the module name earlier than source_code_linker extension end state.
As well as the edgecase that it could be that source_code_links are empty but testlinks exists. Then we have 0 idea about the module state.

I need to re-think this approach a bit and see what I can adapt to make this more convenient. I think I have an idea, but unsure.

@MaximilianSoerenPollak
Copy link
Contributor Author

Still a bit to do here, but the architecture is 'working', though not pretty.
Will work on fixing the tests soon, and fixing the finaly bug in the testlinks that the actuall github link is put together wrong

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.


You can also share your feedback on Copilot code review. Take the survey.

@MaximilianSoerenPollak
Copy link
Contributor Author

MaximilianSoerenPollak commented Mar 16, 2026

Current TestCoverage report of main:

Goal => Do not fall under current coverage after changes.

image

After just converting the current tests to new structure this is the new test coverage:
image


Progress after adding 'module_source_links', 'needlinks', 'testlinks' tests, and cleaning up some tests.
image

logger = logging.getLogger(__name__)


def clean_external_prefix(path: Path) -> Path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this still here? With our new design, this should not be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neccessary. Otherwise the filename will be wrong that is passed to the extraction function.

return f"{base_url}/blob/{current_hash}/{link.file}#L{link.line}"


def parse_module_name_from_path(path: Path) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not need this anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need this.
The only place we get the actual model this sourcelink file belongs to is via the path.
And we need that to make the correct json in the end.

Is there a different idea that I missed where we would have the module somehow somewhere else?

"url": "",

Else it will parse the module_name e.g. `score_docs_as_code`
match this in the known_good_json and grab the accompanying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This describes the old way of doing it. Why is it still here?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +74 to +85
known_good_json = os.environ.get("KNOWN_GOOD_JSON")
module_name = parse_module_name_from_path(filepath)
md: MetaData = {
"module_name": module_name,
"hash": "",
"url": "",
}
if module_name != "local_module" and known_good_json:
md["hash"], md["url"] = parse_info_from_known_good(
Path(known_good_json), module_name
)
return md
pyproject.toml Outdated
".venv*/**",
]
[tool.pytest.ini_options]
addopts = ["-v", "-s"]
assert result == expected


# Done to appease the LSP Gods
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change comment:

Suggested change
# Done to appease the LSP Gods
# Done to ensure type hinting throughout.

test_type="requirements-based",
derivation_technique="requirements-analysis",
)
def test_get_github_link_with_real_repo(git_repo: Path) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems correct & useful.
Must have been accidentally delted:

=> reinstate this.

@@ -1,5 +1,5 @@
# *******************************************************************************
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted all / most property decorators as I wanted to understand how they should be used (content wise).
Once this is understood, will reinstate & add decorators to all test functions where applicable.


# ─────────────────[ NeedLink Dataclass Tests ]─────────────────


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we 'custom made' these two functions and rely on them working correctly during encoding / decoding I figured we should probably test them.



def test_store_and_load_source_code_links(tmp_path: Path):
"""Happy path: Store and load without metadata"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should add an actual comparision of the two jsons in every roundtrip tests that was made.
To fully test if saved => loaded

Comment on lines +671 to +673
module_name="mod",
url="url",
hash="hash",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check if it would be nicer in implementation to have the default be 'local_module' instead.

AlexanderLanin
AlexanderLanin previously approved these changes Mar 26, 2026
@MaximilianSoerenPollak
Copy link
Contributor Author

Just for informations sake .

PR was reviewed multiple times with @a-zw , the latest changes were gone over in a call to ensure understanding and clearing up questions etc.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 4 comments.

Fixing assertion
Fixing wrong split in path
Fixing pytest global arguments
@MaximilianSoerenPollak
Copy link
Contributor Author

Consumer tests are failing with the expected issue only:

image

This issue is hinged upon this PR being merged: eclipse-score/tooling#130

Once that is the case and a git override is added to the newest main commit of tooling, I would expect all consumer tests to pass without issue

@MaximilianSoerenPollak
Copy link
Contributor Author

@AlexanderLanin @a-zw

I addressed & fixed the copilot issues as well (those that were actual issues).

It should now be good to go from my side.

@MaximilianSoerenPollak MaximilianSoerenPollak merged commit 6763560 into eclipse-score:main Mar 26, 2026
25 of 27 checks passed
@MaximilianSoerenPollak MaximilianSoerenPollak deleted the MSP_fix_sourcelinks_2 branch March 26, 2026 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

6 participants