15 changes: 11 additions & 4 deletions README.md
@@ -23,13 +23,20 @@ Given a readme file (or a GitHub/Gitlab repository) SOMEF will extract the follo
- Family name: Last name of an author
- Email: email of author
- URL: website or ORCID associated with the author
- **Application type**: type of software (command line application, notebook, ontology, scientific workflow, etc.)
- **Build file**: Build file(s) of the project. For example, files used to create a Docker image for the target software, package files, etc.
- **Citation**: Preferred citation as the authors have stated in their readme file. SOMEF recognizes Bibtex, Citation File Format files and other means by which authors cite their papers (e.g., by in-text citation). We aim to recognize the following properties:
- **Citation**: Preferred citation as the authors have stated in their readme file. SOMEF recognizes Bibtex, Citation File Format files and other means by which authors cite their papers (e.g., by in-text citation).
Collaborator
See my previous addition and comment above for this file

For CITATION.cff files, SOMEF now generates two separate entries: one for the software and another for the preferred citation (is_preferred_citation: True). This ensures metadata like DOI or version is correctly assigned to each entity.
We aim to recognize the following properties:
- Title: Title of the publication
- Author: list of author names in the publication
- URL: URL of the publication
- DOI: Digital object identifier of the publication
- Date published
- Version: Software version (if applicable)
- Journal: Journal name where the paper was published
- Year: Year of publication
- Pages: Page range in the journal
- **Code of conduct**: Link to the code of conduct of the project
- **Code repository**: Link to the GitHub/GitLab repository used for the extraction
- **Contact**: Contact person responsible for maintaining a software component
@@ -48,13 +55,14 @@ Given a readme file (or a GitHub/Gitlab repository) SOMEF will extract the follo
- **Forks url**: Links to forks made of the project
- **Full name**: Name + owner (owner/name)
- **Full title**: If the repository is a short name, we will attempt to extract the longer version of the repository name
- **Funding**: Funding information associated with the project. **Note**: Currently, this information is only extracted from existing `codemeta.json` files within the repository.
- **Identifier**: Identifier associated with the software (if any), such as Digital Object Identifiers and Software Heritage identifiers (SWH). DOIs associated with publications will also be detected.
- **Images**: Images used to illustrate the software component
- **Installation instructions**: A set of instructions that indicate how to install a target repository
- **Invocation**: Execution command(s) needed to run a scientific software component
- **Issue tracker**: Link where to open issues for the target repository
- **Keywords**: set of terms used to commonly identify a software component
- **License**: License and usage terms of a software component
- **License**: License and usage terms of a software component. The license is now also extracted from `CITATION.cff`.
- **Logo**: Main logo used to represent the target software component
- **Maintainer**: Individuals or teams responsible for maintaining the software component, extracted from the CODEOWNERS file
- **Name**: Name identifying a software component
@@ -77,12 +85,11 @@ Given a readme file (or a GitHub/Gitlab repository) SOMEF will extract the follo
- **Repository status**: Repository status as it is described in [repostatus.org](https://www.repostatus.org/).
- **Requirements**: Pre-requisites and dependencies needed to execute a software component
- **Run**: Running instructions of a software component. It may be wider than the `invocation` category, as it may include several steps and explanations.
- **Runtime platform**: specifies runtime platform or script interpreter dependencies required to run the project..
- **Runtime platform**: specifies the runtime environment or script interpreter dependencies (e.g., Python, Java).
- **Script files**: Bash script files contained in the repository
- **Stargazers count**: Total number of stargazers of the project
- **Support**: Guidelines and links of where to obtain support for a software component
- **Support channels**: Help channels one can use to get support about the target software component
- **Type**: type of software (command line application, notebook, ontology, scientific workflow, etc.)
- **Usage examples**: Assumptions and considerations recorded by the authors when executing a software component, or examples on how to use it
- **Workflows**: URL and path to the computational workflow files present in the repository
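
The **Citation** entry above says that a `CITATION.cff` file now yields two entries, one for the software and one flagged with `is_preferred_citation: True`. The following is a minimal sketch of how a consumer might separate them. It assumes the entries sit under a top-level `citation` key and that the flag lives inside `result`. The helper name `split_citations` and every value in `example_output` are hypothetical placeholders, not real SOMEF output.

```python
def split_citations(somef_output):
    """Split citation entries into (preferred, other) lists.

    Assumption: each entry is a dict with a `result` dict, and the
    preferred citation carries `is_preferred_citation` set to True
    (either as a boolean or as the string "True").
    """
    preferred, other = [], []
    for entry in somef_output.get("citation", []):
        flag = entry.get("result", {}).get("is_preferred_citation", False)
        if str(flag).lower() == "true":
            preferred.append(entry)
        else:
            other.append(entry)
    return preferred, other


# Hypothetical data, invented for illustration only.
example_output = {
    "citation": [
        {"result": {"title": "Example software", "version": "1.0.0"}},
        {"result": {"title": "Example paper", "is_preferred_citation": True}},
    ]
}

preferred, software = split_citations(example_output)
print([e["result"]["title"] for e in preferred])
print([e["result"]["title"] for e in software])
```

Keeping the two lists apart lets a caller read the DOI or version from the entity it actually belongs to.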

44 changes: 41 additions & 3 deletions docs/codemetajson.md
@@ -28,8 +28,9 @@ These fields are defined in the [Codemeta specification](https://github.com/code
| development_status | development_status[i].result.value | developmentStatus |
| download_url | download_url[i].result.value | downloadUrl |
| has_package_file | has_package_file[i].result.value | URL of the codemeta.json file |
| funding - funder | funding[i].result.funder | funding.funder or funding.funder.name |
| funding - funding | funding[i].result.funding | String.fundingIdentifier |
| funding - funder | funding[i].result.funder | funder.@id or funder.name *(1)*|
| funding - funding | funding[i].result.funding | funding *(1)*|
| funding - value | funding[i].result.value | funding string or funder.name *(1)*|
| identifier | identifier[i].result.value | identifier |
| issue_tracker | issue_tracker[i].result.value | issueTracker |
| keywords | keywords[i].result.value | keywords |
@@ -49,4 +50,41 @@ These fields are defined in the [Codemeta specification](https://github.com/code
| version | version[i].result.value | softwareVersion or version |



---

*(1)*

- SOMEF json result:

```
"funding": [
{
"result": {
"value": "1549758; Codemeta: A Rosetta Stone for Metadata in Scientific Software",
"type": "String",
"funder": {
"@id": "https://doi.org/10.13039/100000001",
"@type": "Organization",
"name": "National Science Foundation"
},
"funding": "1549758; Codemeta: A Rosetta Stone for Metadata in Scientific Software"
},
"confidence": 1,
"technique": "code_parser",
"source": "https://raw.githubusercontent.com/.../codemeta.json"
}
]
```

- CODEMETA output:
```
"funder": {
"@id": "https://doi.org/10.13039/100000001",
"@type": "Organization",
"name": "National Science Foundation"
},
"funding": "1549758; Codemeta: A Rosetta Stone for Metadata in Scientific Software",
```
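
The mapping in note *(1)* can be sketched in a few lines of Python. This is an illustration of the table rows only, not SOMEF's actual exporter: the function name `funding_to_codemeta` is hypothetical, and when several funding entries are present this sketch simply keeps the last one.

```python
def funding_to_codemeta(somef_output):
    """Map SOMEF `funding` entries to Codemeta `funder` and `funding` keys."""
    codemeta = {}
    for entry in somef_output.get("funding", []):
        result = entry.get("result", {})
        # funding[i].result.funder -> funder (@id, @type, name)
        if "funder" in result:
            codemeta["funder"] = result["funder"]
        # funding[i].result.funding -> funding
        if "funding" in result:
            codemeta["funding"] = result["funding"]
    return codemeta


# Input taken from the SOMEF json result shown above.
somef_output = {
    "funding": [
        {
            "result": {
                "value": "1549758; Codemeta: A Rosetta Stone for Metadata in Scientific Software",
                "type": "String",
                "funder": {
                    "@id": "https://doi.org/10.13039/100000001",
                    "@type": "Organization",
                    "name": "National Science Foundation",
                },
                "funding": "1549758; Codemeta: A Rosetta Stone for Metadata in Scientific Software",
            },
            "confidence": 1,
            "technique": "code_parser",
            "source": "https://raw.githubusercontent.com/.../codemeta.json",
        }
    ]
}

print(funding_to_codemeta(somef_output))
```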



4 changes: 2 additions & 2 deletions docs/condaenvironment.md
@@ -35,15 +35,15 @@ dependencies:
"value": "python=3.8.5",
"name": "python",
"version": "3.8.5",
"type": "Software_application",
"type": "SoftwareDependency",
"dependency_type": "runtime",
"dependency_resolver": "conda"
},
"result": {
"value": "albumentations==0.4.3",
"name": "albumentations",
"version": "0.4.3",
"type": "Software_application",
"type": "SoftwareDependency",
"dependency_type": "runtime",
"dependency_resolver": "pip"
},
2 changes: 1 addition & 1 deletion docs/gemspec.md
@@ -79,7 +79,7 @@ spec.requirements = [

Result: add_dependency -> type runtime; add_development_dependency -> type dev
```
[{'result': {'value': 'railties: >= 3.0', 'name': 'railties', 'version': '>= 3.0', 'type': 'Software_application', 'dependency_type': 'runtime', 'dependency_resolver': 'bundler'}, 'confidence': 1, 'technique': 'code_parser', 'source': 'https://example.org/bootstrap-datepicker-rails.gemspec'}, {'result': {'value': 'bundler: >= 1.0', 'name': 'bundler', 'version': '>= 1.0', 'type': 'Software_application', 'dependency_type': 'dev','dependency_resolver': 'bundler'}, 'confidence': 1, 'technique': 'code_parser', 'source': 'https://example.org/bootstrap-datepicker-rails.gemspec'}]
[{'result': {'value': 'railties: >= 3.0', 'name': 'railties', 'version': '>= 3.0', 'type': 'SoftwareDependency', 'dependency_type': 'runtime', 'dependency_resolver': 'bundler'}, 'confidence': 1, 'technique': 'code_parser', 'source': 'https://example.org/bootstrap-datepicker-rails.gemspec'}, {'result': {'value': 'bundler: >= 1.0', 'name': 'bundler', 'version': '>= 1.0', 'type': 'SoftwareDependency', 'dependency_type': 'dev','dependency_resolver': 'bundler'}, 'confidence': 1, 'technique': 'code_parser', 'source': 'https://example.org/bootstrap-datepicker-rails.gemspec'}]
```
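
The rule above (`add_dependency` gives `runtime`, `add_development_dependency` gives `dev`) can be illustrated with a small regex-based sketch. It is not SOMEF's gemspec parser: the pattern, the function name `parse_gemspec_dependencies`, and the sample gemspec lines are assumptions made for illustration, and only simple quoted `name, version` calls are handled.

```python
import re

# Matches e.g.  s.add_dependency "railties", ">= 3.0"
DEP_PATTERN = re.compile(
    r"""\.add_(?P<kind>development_dependency|dependency)\s*\(?\s*
        ["'](?P<name>[^"']+)["']
        (?:\s*,\s*["'](?P<version>[^"']+)["'])?""",
    re.VERBOSE,
)


def parse_gemspec_dependencies(text, source):
    """Return dependency entries shaped like the result shown above."""
    entries = []
    for match in DEP_PATTERN.finditer(text):
        name = match.group("name")
        version = match.group("version") or ""
        is_dev = match.group("kind") == "development_dependency"
        entries.append({
            "result": {
                "value": f"{name}: {version}" if version else name,
                "name": name,
                "version": version,
                "type": "SoftwareDependency",
                "dependency_type": "dev" if is_dev else "runtime",
                "dependency_resolver": "bundler",
            },
            "confidence": 1,
            "technique": "code_parser",
            "source": source,
        })
    return entries


# Hypothetical gemspec fragment, written to match the result above.
sample = '''
  s.add_dependency "railties", ">= 3.0"
  s.add_development_dependency "bundler", ">= 1.0"
'''

for dep in parse_gemspec_dependencies(
    sample, "https://example.org/bootstrap-datepicker-rails.gemspec"
):
    print(dep["result"]["name"], dep["result"]["dependency_type"])
```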


17 changes: 12 additions & 5 deletions docs/index.md
@@ -15,6 +15,7 @@ Given a readme file (or a GitHub repository) SOMEF will extract the following ca

- **Acknowledgement**: Text acknowledging funding sources or contributors
- **Application domain**: The application domain of the repository. This may be related to the research area of a software component (e.g., Astrophysics) or the general domain/functionality of the tool (e.g., machine learning projects)<sup>[1](#myfootnote1)</sup>
- **Application type**: type of software (command line application, notebook, ontology, scientific workflow, etc.)
- **Assets**: files attached to the release
- url: URL of the publication of the file
- name: name of the file
@@ -31,12 +32,18 @@ Given a readme file (or a GitHub repository) SOMEF will extract the following ca
- URL: website or ORCID associated with the author
- Affiliation: name of organization or affiliation
- **Build file**: Build file(s) of the project. For example, files used to create a Docker image for the target software, package files, etc.
- **Citation**: Preferred citation as the authors have stated in their readme file. SOMEF recognizes Bibtex, Citation File Format files and other means by which authors cite their papers (e.g., by in-text citation). We aim to recognize the following properties:
- **Citation**: Preferred citation(s) as the authors have stated in their readme file. SOMEF recognizes Bibtex, Citation File Format files and other means by which authors cite their papers (e.g., by in-text citation).
For CITATION.cff files, SOMEF now generates two separate entries: one for the software and another for the preferred citation (is_preferred_citation: True). This ensures metadata like DOI or version is correctly assigned to each entity.
We aim to recognize the following properties:
- Title: Title of the publication
- Author: list of author names in the publication
- URL: URL of the publication
- DOI: Digital object identifier of the publication
- Date published:
- Date published
- Version: Software version (if applicable, i.e., the main citation is a software deposit)
- Journal: Journal name where the paper was published
- Year: Year of publication
- Pages: Page range in the journal
- **Code of conduct**: Link to the code of conduct of the project
- **Code repository**: Link to the GitHub/GitLab repository used for the extraction
- **Contact**: Contact person responsible for maintaining a software component
@@ -55,14 +62,15 @@ Given a readme file (or a GitHub repository) SOMEF will extract the following ca
- **Forks url**: Links to forks made of the project
- **Full name**: Name + owner (owner/name)
- **Full title**: If the repository is a short name, we will attempt to extract the longer version of the repository name
- **Funding**: Funding information associated with the project. **Note**: Currently, this information is only extracted from existing `codemeta.json` files within the repository.
- **Homepage**: URL of the item.
- **Identifier**: Identifier associated with the software (if any), such as Digital Object Identifiers and Software Heritage identifiers (SWH). DOIs associated with publications will also be detected.
- **Images**: Images used to illustrate the software component
- **Installation instructions**: A set of instructions that indicate how to install a target repository
- **Invocation**: Execution command(s) needed to run a scientific software component
- **Issue tracker**: Link where to open issues for the target repository
- **Keywords**: set of terms used to commonly identify a software component
- **License**: License and usage terms of a software component
- **License**: License and usage terms of a software component.
- **Logo**: Main logo used to represent the target software component
- **Name**: Name identifying a software component
- **Ontologies**: URL and path to the ontology files present in the repository
@@ -85,12 +93,11 @@ Given a readme file (or a GitHub repository) SOMEF will extract the following ca
- **Repository status**: Repository status as it is described in [repostatus.org](https://www.repostatus.org/).
- **Requirements**: Pre-requisites and dependencies needed to execute a software component
- **Run**: Running instructions of a software component. It may be wider than the `invocation` category, as it may include several steps and explanations.
- **Runtime platform**: specifies runtime platform or script interpreter dependencies required to run the project.
- **Runtime platform**: specifies the runtime environment or script interpreter dependencies (e.g., Python, Java).
- **Script files**: Bash script files contained in the repository
- **Stargazers count**: Total number of stargazers of the project
- **Support**: Guidelines and links of where to obtain support for a software component
- **Support channels**: Help channels one can use to get support about the target software component
- **Type**: type of software (command line application, notebook, ontology, scientific workflow, etc.)
- **Usage examples**: Assumptions and considerations recorded by the authors when executing a software component, or examples on how to use it
- **Workflows**: URL and path to the computational workflow files present in the repository
