Conversation
Why are these changes being introduced:
* The example lambda function and test files are no longer needed and are being removed to clean up the codebase.
Why are these changes being introduced:
* Adds a health check/ping endpoint to the tokenizer lambda, which can be used by AWS to determine if the lambda is healthy and ready to receive traffic.
* This is important for ensuring high availability and reliability of the service.
* This also allows us to keep the lambda warm, which can improve performance by reducing cold start latency.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-431

How does this address that need:
* Adds a check for a "ping" key in the event; if present, the handler returns a simple response indicating the service is healthy.

Document any side effects to this change:
* We cannot use the "ping" key in the event for any other purpose, as it is now reserved for health checks.
These tests are not strictly necessary, but I was nervous that we never actually tested the tokenizer handler with a real tokenizer. These tests load a real tokenizer and verify that the handler can generate tokens for a simple query. They are marked as "integration" tests since they load real models from disk and are slower than typical unit tests. They are enabled by default, but can be skipped with `pytest -m "not integration"`. The total test time does not increase enough to justify skipping these tests by default at this time, but we can change the default in the future if needed.
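For reference, registering a pytest marker is a small config addition; a minimal sketch of the `pyproject.toml` entry, assuming the existing `[tool.pytest.ini_options]` table shown later in this PR:

```toml
[tool.pytest.ini_options]
log_level = "INFO"
markers = [
    "integration: loads real models from disk; deselect with -m \"not integration\"",
]
```

Registering the marker keeps `pytest --strict-markers` (if ever enabled) from rejecting `@pytest.mark.integration` as a typo.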
Pull request overview
Adds a lightweight “ping” health-check path to the tokenizer Lambda so AWS (and/or internal warmers) can verify readiness and keep the function warm, plus expands the test suite accordingly.
Changes:
- Add `ping` event handling to `tokenizer_handler.lambda_handler`.
- Add unit + integration tests for tokenizer behavior and handler behavior.
- Remove the unused example `my_function` Lambda and its test; register an `integration` pytest marker.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `lambdas/tokenizer_handler.py` | Adds ping/health-check short-circuit behavior in the Lambda handler. |
| `tests/test_tokenizer_handler.py` | Adds unit tests for ping behavior and an integration handler test. |
| `tests/test_query_tokenizer.py` | Adds integration tests that load the real tokenizer/IDF from disk. |
| `pyproject.toml` | Registers the `integration` pytest marker. |
| `lambdas/my_function.py` | Removes unused sample Lambda implementation. |
| `tests/test_my_function.py` | Removes tests for the deleted sample Lambda. |
```diff
@@ -28,6 +28,9 @@ exclude = ["tests/"]

 [tool.pytest.ini_options]
 log_level = "INFO"
```
The new integration tests are marked, but nothing here configures pytest to exclude them by default. Given the repo's default test command (`make test`) runs plain `pytest`, these slow disk/model-loading tests will run on every CI/local unit-test run unless callers remember `-m "not integration"`. Consider adding a default `addopts` that excludes integration tests (and documenting how to opt in), or otherwise ensuring CI/test tooling explicitly selects/filters markers.
```diff
 log_level = "INFO"
+addopts = "-m 'not integration'"
```
My local testing shows the tests didn't add significant time so I did not change the defaults.
```python
@pytest.mark.integration
def test_integration_tokenize_query_returns_nonempty_dict():
    """Real tokenizer and IDF: tokenize_query returns a non-empty dict."""
    qt = QueryTokenizer()
    result = qt.tokenize_query("machine learning")
    assert isinstance(result, dict)
    assert len(result) > 0


@pytest.mark.integration
def test_integration_tokenize_query_values_are_positive_floats():
    """Real tokenizer and IDF: all weights are positive floats."""
    qt = QueryTokenizer()
    result = qt.tokenize_query("machine learning")
    for token, weight in result.items():
```
These integration tests each instantiate QueryTokenizer() separately, which can be noticeably slow because it loads tokenizer assets from disk each time. Since they’re all validating behavior of the same real tokenizer, consider using a module-scoped fixture (or caching) so the tokenizer is loaded once per test session/module.
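The comment's caching suggestion could be sketched like this. The `QueryTokenizer` below is a lightweight stand-in so the sketch is self-contained; in the repo, the real class (whose slow part is loading assets from disk) would be imported instead:

```python
from functools import lru_cache


class QueryTokenizer:
    """Stand-in for the real tokenizer; construction is the expensive step."""

    loads = 0  # count constructions to demonstrate the caching

    def __init__(self):
        QueryTokenizer.loads += 1  # the real class loads tokenizer/IDF assets here

    def tokenize_query(self, query: str) -> dict:
        return {token: 1.0 for token in query.split()}


@lru_cache(maxsize=1)
def get_tokenizer() -> QueryTokenizer:
    """Build the tokenizer once and reuse it across all tests."""
    return QueryTokenizer()


# Repeated calls share one instance, so assets load from disk only once.
first = get_tokenizer()
second = get_tokenizer()
```

A module-scoped `@pytest.fixture(scope="module")` returning `QueryTokenizer()` would achieve the same thing inside the test suite.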
This currently does not seem to be an issue. Runs that skip the integration tests and runs that include them take roughly the same time, most of which is pytest startup.
ghukill left a comment
Approved!
We've talked out-of-band of this PR about a possibly more robust (and complicated!) ping handler that might handle ping requests the same across all lambdas. I think it's a discussion still worth having, particularly for lambdas that field HTTP request invocations, but this is clearly a very effective and lightweight way to support ping behavior right now.
Full support for going this route for now; even if we pivot to a decorated lambda that standardizes our ping behavior, it would only require pulling out a couple of lines here.
Also, big fan of the integration checks as well.
Looking good to me.
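The "decorated lambda" idea mentioned above could look something like this sketch. All names and the response shape are illustrative assumptions, not the repo's actual API:

```python
from functools import wraps


def with_ping(handler):
    """Hypothetical decorator that answers "ping" events uniformly across lambdas."""

    @wraps(handler)
    def wrapper(event, context):
        # Short-circuit health checks before the wrapped handler runs.
        if isinstance(event, dict) and event.get("ping"):
            return {"response": "pong"}
        return handler(event, context)

    return wrapper


@with_ping
def lambda_handler(event, context):
    # Normal invocation path for this particular lambda.
    return {"handled": True}
```

One caveat relevant to this PR: a decorator short-circuits before the handler body, so any per-invocation warm-up the handler does (like the tokenizer initialization below) would need to move to module import time for pings to still keep the function warm.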
```python
# We add this after query_tokenizer initialization to ensure the tokenizer is
# initialized during cold start, even for pings
```
Really appreciated this comment. I got concerned that the tokenizer was re-initialized for each invocation of the lambda, but then (re)remembered cached _get_tokenizer() pattern. So I'm digging the comment as it provides just enough friction to slow down and think about why that's placed there.
Yeah, I initially had the ping condition earlier in the handler because I wanted it to short-circuit all the heavy lifting, then realized that would defeat the "keep warm" aspect of our health checks if it didn't actually happen after the tokenizer initialization.
```python
# ---------------------------------------------------------------------------


@pytest.mark.integration
```
+1 to the integration mark, just in case it could be helpful.
Note
There are two not-strictly-necessary commits included in this PR. One removes the example lambda that was not used. The second adds some non-entirely-necessary-but-made-me-feel-better integration tests. Mocks are great, but I get nervous when everything is a mock.
Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
Code review