Improve license extraction, SPDX detection, and consolidation #955

@juanjemdIos

Description

We need to review and improve how licenses are extracted and handled across the system.
• Licenses should be consolidated, since different sources may produce duplicate or inconsistent entries.
• The extraction process should be reviewed, since the license identifier is sometimes malformed, with the SPDX URL prefix repeated (e.g. "identifier": "https://spdx.org/licenses/https://spdx.org/licenses/MIT").
• The SPDX detector does not always work properly with file dump inputs, which can lead to missing or wrong license information. In the following example, spdx_id is missing:
{
  "result": {
    "value": "Copyright (c) 2018 The Python Packaging Authority\n\n,..... ",
    "type": "File_dump"
  },
  "confidence": 1,
  "technique": "file_exploration",
  "source": "https://raw.githubusercontent.com/KnowledgeCaptureAndDiscovery/somef/master/LICENSE"
},
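
The first two points could be approached roughly as in the sketch below. It is only an illustration, not the project's actual code: the function names `normalize_spdx_identifier` and `consolidate_licenses` are hypothetical, and it assumes that `identifier` lives inside each entry's `result` object next to `value` and `type`, which the issue does not confirm.

```python
SPDX_PREFIX = "https://spdx.org/licenses/"


def normalize_spdx_identifier(identifier: str) -> str:
    """Return the identifier with exactly one SPDX URL prefix (hypothetical helper)."""
    value = identifier.strip()
    # Strip every leading copy of the prefix, then add it back once.
    while value.startswith(SPDX_PREFIX):
        value = value[len(SPDX_PREFIX):]
    return SPDX_PREFIX + value if value else ""


def consolidate_licenses(entries):
    """Merge entries that share a normalized identifier (hypothetical helper).

    For each identifier, the entry with the highest confidence is kept.
    Entries without an identifier (for example raw file dumps) are passed
    through unchanged, because there is nothing to match them on.
    """
    best = {}
    unidentified = []
    for entry in entries:
        # Assumption: the identifier is stored under entry["result"].
        raw = entry.get("result", {}).get("identifier")
        if not raw:
            unidentified.append(entry)
            continue
        key = normalize_spdx_identifier(raw)
        current = best.get(key)
        if current is None or entry.get("confidence", 0) > current.get("confidence", 0):
            merged = dict(entry)
            merged["result"] = dict(entry["result"], identifier=key)
            best[key] = merged
    return list(best.values()) + unidentified
```

Keeping the highest-confidence entry is just one possible merge policy. Another option would be to keep every source and attach them all to a single consolidated record.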
