Commit 56d7239

Author: Ang
fix: Semantic Scholar 429 (#25), auto-detect .bib (#9), fix docs (#6 #8 #11 #12 #15 #18)
- #25: return [] silently on 429 instead of logging WARNING
- #9: auto-detect input_type='bib' when file has .bib extension
- #8: remove false claim that keywords route to specific data sources
- #18: clarify entries must be blank-line separated in basic_usage docs
- #11: update api/core.rst with use_google_scholar param
- #12: remove nonexistent requirements-dev.txt from CONTRIBUTING.md
- #15: remove false claim about black formatting from CONTRIBUTING.md
- #6: update data sources description to reflect actual behaviour
- docs: document --google-scholar, direct string input, and stdin

Closes #25
Closes #9
Closes #8
Closes #11
Closes #12
Closes #15
Closes #18
1 parent 98695be commit 56d7239

8 files changed

Lines changed: 65 additions & 39 deletions

CONTRIBUTING.md

Lines changed: 0 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -42,10 +42,6 @@ This project adheres to the Contributor Covenant [Code of Conduct](CODE_OF_CONDU
4242
```bash
4343
# Install the package in editable mode with development dependencies
4444
pip install -e ".[dev]"
45-
46-
# Or install from requirements files
47-
pip install -r requirements.txt
48-
pip install -r requirements-dev.txt
4945
```
5046

5147
### Verify Installation
@@ -151,14 +147,6 @@ We follow [PEP 8](https://pep8.org/) style guide. Key points:
151147
- Use descriptive variable and function names
152148
- Add docstrings to all public functions and classes
153149

154-
### Code Formatting
155-
156-
We use `black` for code formatting:
157-
158-
```bash
159-
black onecite tests
160-
```
161-
162150
### Type Hints
163151

164152
Use type hints where possible:

docs/advanced_usage.rst

Lines changed: 2 additions & 22 deletions
@@ -51,29 +51,9 @@ Process multiple files sequentially::
5151
Working with Different Data Sources
5252
------------------------------------
5353

54-
OneCite routes queries to data sources based on content type:
54+
OneCite queries multiple data sources (CrossRef, PubMed, arXiv, Semantic Scholar, Google Books, and others) and selects the best match. All sources are tried for every reference — you do not need to configure routing manually::
5555

56-
**For Biomedical Literature**
57-
58-
Add search terms related to medicine, biology, or health::
59-
60-
onecite process medical_refs.txt
61-
62-
This will prioritize PubMed when available.
63-
64-
**For Computer Science**
65-
66-
Add search terms related to CS topics::
67-
68-
onecite process cs_refs.txt
69-
70-
This will prioritize DBLP and arXiv.
71-
72-
**For General Academic Work**
73-
74-
Mixed references will use CrossRef and Semantic Scholar::
75-
76-
onecite process general_refs.txt
56+
onecite process references.txt
7757

7858
Custom Templates
7959
----------------

docs/api/core.rst

Lines changed: 6 additions & 4 deletions
@@ -18,16 +18,18 @@ The primary function for processing citations.
1818
input_type: str,
1919
template_name: str,
2020
output_format: str,
21-
interactive_callback: Callable[[List[Dict]], int]
21+
interactive_callback: Callable[[List[Dict]], int],
22+
use_google_scholar: bool = False,
2223
) -> Dict[str, Any]
2324
2425
**Parameters:**
2526
2627
- ``input_content`` (str): The reference content to process
27-
- ``input_type`` (str): Type of input - "txt" or "bib" (required)
28-
- ``template_name`` (str): Template name to use (e.g., "journal_article_full") (required)
29-
- ``output_format`` (str): Output format - "bibtex", "apa", or "mla" (required)
28+
- ``input_type`` (str): Type of input - ``"txt"`` or ``"bib"`` (required)
29+
- ``template_name`` (str): Template name to use (e.g., ``"journal_article_full"``) (required)
30+
- ``output_format`` (str): Output format - ``"bibtex"``, ``"apa"``, or ``"mla"`` (required)
3031
- ``interactive_callback`` (Callable): Function to handle ambiguous matches. Takes a list of candidate dicts and returns the selected index (0-based), or -1 to skip (required)
32+
- ``use_google_scholar`` (bool): Enable Google Scholar as an additional data source. Requires the optional ``scholarly`` package. Default is ``False``.
3133

3234
**Returns:**
3335
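
The `interactive_callback` contract documented in this hunk (take a list of candidate dicts, return a 0-based index or -1 to skip) can be illustrated with a small non-interactive callback. This is only a sketch: the function name `pick_first_with_doi` and the `"doi"` key are assumptions for illustration, since the diff does not show which fields the candidate dicts carry.

```python
from typing import Any, Dict, List


def pick_first_with_doi(candidates: List[Dict[str, Any]]) -> int:
    """Return the index of the first candidate with a DOI, or -1 to skip.

    Hypothetical helper: the "doi" key is an assumed field name, not one
    confirmed by the OneCite documentation shown in this commit.
    """
    for index, candidate in enumerate(candidates):
        if candidate.get("doi"):
            return index
    # No candidate looked trustworthy enough, so signal "skip this entry".
    return -1
```

A callback like this would be passed as the `interactive_callback` argument when a script should never prompt the user.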

docs/basic_usage.rst

Lines changed: 28 additions & 1 deletion
@@ -13,12 +13,19 @@ Supported Input Formats
1313

1414
**Plain Text (.txt)**
1515

16-
A simple text file with one reference per line or separated by blank lines::
16+
A text file where each reference is separated by a **blank line**::
1717

1818
10.1038/nature14539
19+
1920
Vaswani et al., 2017, Attention is all you need
21+
2022
Smith (2020) Neural Architecture Search
2123

24+
.. note::
25+
26+
Each entry must be separated by at least one blank line. Adjacent lines
27+
within the same block are treated as a single entry.
28+
2229
**BibTeX (.bib)**
2330

2431
Standard BibTeX format files::
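
The blank-line rule added to the docs in this hunk can be sketched as a small splitter. This is not OneCite's parser, which this diff does not show. `split_entries` is a hypothetical name, and joining the lines of one block with a single space is an assumption about how "treated as a single entry" is realised.

```python
import re
from typing import List


def split_entries(text: str) -> List[str]:
    """Split plain-text input into entries separated by blank lines."""
    # Normalise Windows line endings, then split on one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    entries = []
    for block in blocks:
        # Adjacent lines in the same block form one entry (assumed: joined by a space).
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            entries.append(joined)
    return entries
```

Under this sketch, a reference that wraps onto a second line stays one entry, while two references on adjacent lines without a blank line between them would be merged, which is the behaviour the note warns about.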
@@ -108,6 +115,25 @@ Suppress verbose output::
108115

109116
onecite process input.txt --quiet
110117

118+
**Google Scholar (--google-scholar)**
119+
120+
Enable Google Scholar as an additional data source (requires the optional ``scholarly`` package)::
121+
122+
onecite process input.txt --google-scholar
123+
124+
**Direct String Input**
125+
126+
Pass a reference string directly instead of a file::
127+
128+
onecite process "10.1038/nature14539"
129+
onecite process "Attention is all you need, Vaswani et al., NIPS 2017"
130+
131+
**Stdin Input**
132+
133+
Read from standard input using ``-``::
134+
135+
echo "10.1038/nature14539" | onecite process -
136+
111137
**Help (--help)**
112138

113139
Display help information::
@@ -126,6 +152,7 @@ Example 1: Process a BibTeX File
126152
onecite process my_references.bib -o clean_references.bib --quiet
127153

128154
This will read ``my_references.bib``, enhance the entries, and save to ``clean_references.bib``.
155+
The ``--input-type`` flag is optional for ``.bib`` files — OneCite detects the format automatically.
129156

130157
Example 2: Convert to APA Format
131158
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

onecite/cli.py

Lines changed: 2 additions & 0 deletions
@@ -131,6 +131,8 @@ def process_command(args: "argparse.Namespace") -> int:
131131
elif os.path.exists(args.input_file):
132132
with open(args.input_file, 'r', encoding='utf-8') as f:
133133
input_content = f.read()
134+
if args.input_type == 'txt' and args.input_file.lower().endswith('.bib'):
135+
args.input_type = 'bib'
134136
else:
135137
input_content = args.input_file
136138

onecite/pipeline.py

Lines changed: 4 additions & 0 deletions
@@ -931,6 +931,10 @@ def _search_semantic_scholar(self, query: str, limit: int = 5) -> List[Dict]:
931931

932932
response = requests.get(url, params=params, timeout=10)
933933

934+
if response.status_code == 429:
935+
self.logger.debug("Semantic Scholar rate-limited (429); skipping for this query.")
936+
return []
937+
934938
if response.status_code == 200:
935939
data = response.json()
936940
papers = data.get('data', [])
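
The 429 handling added in this hunk can be exercised in isolation with an injected HTTP getter. The sketch below is not the project's `_search_semantic_scholar`: the function name `search_papers`, the logger name, the endpoint URL, and the query parameters are assumptions, because the diff shows only the status-code checks.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("onecite.sketch")  # hypothetical logger name

# Assumed endpoint; the URL used by OneCite is not visible in this diff.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def search_papers(query: str, get: Callable[..., Any], limit: int = 5) -> List[Dict]:
    """Query the search endpoint; return [] quietly when rate-limited."""
    response = get(SEARCH_URL, params={"query": query, "limit": limit}, timeout=10)
    if response.status_code == 429:
        # Rate limiting is expected on the shared API, so log at DEBUG
        # and let the caller fall back to other data sources.
        logger.debug("Semantic Scholar rate-limited (429); skipping for this query.")
        return []
    if response.status_code == 200:
        return response.json().get("data", [])
    # Any other status is treated as "no results" in this sketch.
    return []
```

Passing `requests.get` as `get` would give the live behaviour, while a stub returning a fake response keeps tests offline, much like the `DummyResponse` patch used in `tests/test_pipeline_unit.py`.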

tests/test_cli.py

Lines changed: 15 additions & 0 deletions
@@ -94,6 +94,21 @@ def _ns(**overrides):
9494

9595
# -- Missing / bad input --------------------------------------------------
9696

97+
def test_bib_file_auto_detected(self, tmp_path, capsys):
98+
"""fix #9: .bib extension should auto-set input_type to 'bib'."""
99+
inf = tmp_path / "refs.bib"
100+
inf.write_text("@article{A, title={T}}", encoding="utf-8")
101+
captured = {}
102+
103+
def _fake(*, input_type, **kw):
104+
captured['input_type'] = input_type
105+
return {"results": ["OK"], "report": {"total": 1, "succeeded": 1, "failed_entries": []}}
106+
107+
with patch("onecite.cli.process_references", side_effect=_fake):
108+
cli.process_command(self._ns(input_file=str(inf), quiet=True))
109+
110+
assert captured['input_type'] == 'bib'
111+
97112
def test_google_scholar_flag_passed_through(self, capsys):
98113
"""fix #10: --google-scholar flag must be forwarded to process_references."""
99114
captured = {}

tests/test_pipeline_unit.py

Lines changed: 8 additions & 0 deletions
@@ -691,6 +691,14 @@ def test_strip_html(self):
691691
assert e._strip_html_tags("Human-level <i>control</i> &amp; learning") == \
692692
"Human-level control & learning"
693693

694+
def test_semantic_scholar_429_returns_empty(self):
695+
"""fix #25: 429 from Semantic Scholar must return [] without raising."""
696+
ident = IdentifierModule()
697+
resp = DummyResponse(status_code=429, json_data={})
698+
with patch("onecite.pipeline.requests.get", return_value=resp):
699+
result = ident._search_semantic_scholar("attention is all you need")
700+
assert result == []
701+
694702
def test_crossref_request_has_user_agent_and_mailto(self):
695703
"""fix #21: _get_crossref_metadata must send User-Agent and mailto."""
696704
e = EnricherModule(use_google_scholar=False)
