Add script for HAL pdf download. #5

Open
ClaireHzl wants to merge 1 commit into main from datacollection-hal

Conversation

@ClaireHzl
Collaborator

This script downloads an article's PDF from its DOI via the HAL API.
It requires the requests package (pip install requests).
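
For readers without the diff at hand, the overall flow could look roughly like the sketch below. It is not the code from this PR: the function names are hypothetical, and the search endpoint and the doiId_s and uri_s field names are assumptions about the public HAL search API that should be checked against the HAL documentation. Only the "{uri}/document" suffix comes from the diff excerpt in this conversation.

```python
from typing import Any, Optional

# Assumed public HAL search endpoint; verify against the HAL API documentation.
HAL_SEARCH_URL = "https://api.archives-ouvertes.fr/search/"


def build_query_params(doi: str) -> dict:
    """Build search parameters that look up a record by DOI (hypothetical helper)."""
    # doiId_s and uri_s are assumed HAL field names.
    return {"q": f'doiId_s:"{doi}"', "fl": "uri_s", "wt": "json"}


def get_document_url(payload: dict) -> Optional[str]:
    """Extract the PDF URL from a parsed search response, or None if there is no match."""
    docs: list = payload.get("response", {}).get("docs", [])
    if not docs:
        return None
    uri: Any = docs[0].get("uri_s")
    # The "/document" suffix mirrors the return statement shown in the diff.
    return f"{uri}/document" if uri else None


def download_pdf(doi: str, output_path: str) -> bool:
    """Download the PDF for a DOI; return False when HAL has no matching record."""
    import requests  # third-party dependency: pip install requests

    response = requests.get(HAL_SEARCH_URL, params=build_query_params(doi), timeout=30)
    response.raise_for_status()
    url = get_document_url(response.json())
    if url is None:
        return False
    pdf = requests.get(url, timeout=60)
    pdf.raise_for_status()
    with open(output_path, "wb") as handle:
        handle.write(pdf.content)
    return True
```

Keeping the response parsing in its own function means it can be tested without network access.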

@Pamplemousse Pamplemousse left a comment

Very nice!

Here are a couple of things that could also be helpful:

  • structure a subfolder to organise pieces of code related to article collection (akin to ingestion/parsing), for example ingestion/article_collection
    • with a README.md that has a little bit of documentation (with how to install, use)
  • specify the required dependencies in the root's pyproject.toml

return f"{uri}/document"


def set_output_file(doi: str, output_dir: str = "pdf") -> str:

The return type could be Path, which is probably better suited than str to represent a filesystem location.
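
A minimal sketch of what this suggestion could look like, assuming the function builds a file name from the DOI inside output_dir. The function body is not visible in this excerpt, so the directory creation and the replacement of "/" with "_" are illustrative assumptions, not the PR's actual logic.

```python
from pathlib import Path


def set_output_file(doi: str, output_dir: str = "pdf") -> Path:
    """Return the path where the PDF for the given DOI should be written."""
    directory = Path(output_dir)
    # Assumption: the output directory is created on demand.
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: "/" in the DOI is replaced so the name is a single path component.
    safe_name = doi.replace("/", "_")
    return directory / f"{safe_name}.pdf"
```

Callers can still pass the result to open() or call write_bytes() on it directly, and str(path) remains available wherever a plain string is needed.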

2 participants