Skip to content

SPOC-471: fix Codacy-reported security concerns#17

Open
danolivo wants to merge 1 commit intomainfrom
spoc-471
Open

SPOC-471: fix Codacy-reported security concerns#17
danolivo wants to merge 1 commit intomainfrom
spoc-471

Conversation

@danolivo
Copy link
Copy Markdown

Summary

Fix three Codacy-reported security issues in the pgedge-vectorizer extension.

CWE-20 / CWE-120 — unbounded API key file reads (provider_openai.c, provider_voyage.c)

load_api_key() used fgetc() in an unbounded loop backed by a StringInfoData buffer that grows without limit. A large or malicious key file could exhaust server memory. The fix checks st.st_size against MAX_API_KEY_FILE_SIZE (4096 bytes) before fopen(), using the stat() result that was already available for the permissions check. The constant is defined once in pgedge_vectorizer.h and shared by both providers.

CWE-807 — untrusted HOME environment variable (provider_openai.c, provider_voyage.c)

expand_tilde() resolved ~ using getenv("HOME"), which is fully attacker-controllable via the process environment. Replaced with getpwuid(geteuid()), which reads the home directory from the system user database and cannot be overridden by unprivileged callers.

CWE-120 — strncpy without guaranteed null termination (worker.c)

strncpy(dbname, db_name, NAMEDATALEN - 1) followed by a manual dbname[NAMEDATALEN - 1] = '\0' is error-prone and was accompanied by now-incorrect flawfinder suppression comments. Replaced with strlcpy(dbname, db_name, NAMEDATALEN), which always null-terminates and needs no fixup line.

@danolivo danolivo requested a review from mason-sharp April 13, 2026 09:32
@danolivo danolivo self-assigned this Apr 13, 2026
@danolivo danolivo added the enhancement New feature or request label Apr 13, 2026
Reject API key files larger than MAX_API_KEY_FILE_SIZE (4096 bytes)
before opening, preventing unbounded memory allocation (CWE-20).
Move the constant into pgedge_vectorizer.h so both providers share
a single definition.

Replace getenv("HOME") with getpwuid(geteuid()) in expand_tilde()
of both providers: the environment variable is attacker-controllable
and must not be trusted for path resolution (CWE-807).

Replace strncpy + manual null-termination in worker.c with strlcpy,
which always null-terminates and does not require a separate fixup
line (CWE-120).
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

The changes introduce several modifications across multiple source files. A new preprocessor constant MAX_API_KEY_FILE_SIZE set to 4096 is added to src/pgedge_vectorizer.h. Two provider files (src/provider_openai.c and src/provider_voyage.c) are updated to include <pwd.h>, check API key file sizes before loading, and resolve home directories using getpwuid() instead of environment variables. Additionally, src/worker.c replaces strncpy() with strlcpy() for copying database names.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing Codacy-reported security concerns, which aligns with all four files modified in the changeset.
Description check ✅ Passed The description comprehensively details the three security issues fixed (CWE-20/CWE-120, CWE-807, CWE-120) with technical explanations, directly corresponding to all changes in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spoc-471

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 13, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 duplication

Metric Results
Duplication 0

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/provider_openai.c`:
- Around line 290-297: The pre-open stat-based size check (st.st_size >
MAX_API_KEY_FILE_SIZE) has a TOCTOU and lacks regular-file validation and
runtime bounds during read; fix by verifying S_ISREG(st.st_mode) before
proceeding, then replace fopen() with open(path, O_RDONLY|O_CLOEXEC) followed by
fstat(fd) to re-check size and S_ISREG on the opened descriptor, convert the fd
to a FILE* with fdopen(), and in the fgetc() loop introduce a bytes_read counter
that stops and returns an error if it exceeds MAX_API_KEY_FILE_SIZE; apply the
identical changes to the Voyage provider implementation as well.

In `@src/provider_voyage.c`:
- Around line 292-299: The filesize/safety check using stat() then fopen() is
vulnerable to TOCTOU and non-regular files; change the logic in the API-key
loading routine to open the file with open() (include <fcntl.h>), call fstat()
on the opened fd and verify S_ISREG(st_mode) and st_size <=
MAX_API_KEY_FILE_SIZE before reading, and then read using read()/fread() with a
running bytes_read counter to enforce a hard cap of MAX_API_KEY_FILE_SIZE
(reject if exceeded) to prevent unbounded reads; apply the same
open+fstat+S_ISREG+read-cap pattern to the analogous code in provider_openai.c.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 057f79fc-036a-4381-acf8-96d893414f0f

📥 Commits

Reviewing files that changed from the base of the PR and between 481eaa7 and 2d7996b.

📒 Files selected for processing (4)
  • src/pgedge_vectorizer.h
  • src/provider_openai.c
  • src/provider_voyage.c
  • src/worker.c

Comment on lines +290 to +297
/* Reject unreasonably large files before reading (CWE-20) */
if (st.st_size > MAX_API_KEY_FILE_SIZE)
{
*error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes",
MAX_API_KEY_FILE_SIZE);
pfree(expanded_path);
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the file and check its contents around the specified lines
find . -name "provider_openai.c" -type f

Repository: pgEdge/pgedge-vectorizer

Length of output: 91


🏁 Script executed:

# Once file is located, examine the code around lines 290-297 and the fgetc loop
cat -n src/provider_openai.c | sed -n '280,320p'

Repository: pgEdge/pgedge-vectorizer

Length of output: 1279


🏁 Script executed:

# Also check if there's a Voyage provider for reference pattern
find . -name "*voyage*" -type f | grep -i provider

Repository: pgEdge/pgedge-vectorizer

Length of output: 91


🏁 Script executed:

# Check the Voyage provider implementation
cat -n src/provider_voyage.c | sed -n '280,320p'

Repository: pgEdge/pgedge-vectorizer

Length of output: 1349


🏁 Script executed:

# Look for the load_api_key function in Voyage provider to see full pattern
rg -A 50 "static.*load_api_key" src/provider_voyage.c | head -80

Repository: pgEdge/pgedge-vectorizer

Length of output: 1221


🏁 Script executed:

# Check if load_api_key is shared or defined separately in each provider
grep -n "MAX_API_KEY_FILE_SIZE" src/provider_openai.c src/provider_voyage.c

Repository: pgEdge/pgedge-vectorizer

Length of output: 319


🏁 Script executed:

# Look for function definition to see if it's shared or duplicated
rg "^static char \*load_api_key" src/

Repository: pgEdge/pgedge-vectorizer

Length of output: 245


🏁 Script executed:

# Check if load_api_key is in a header or shared file
find . -name "*.h" -type f | xargs grep -l "load_api_key" 2>/dev/null

Repository: pgEdge/pgedge-vectorizer

Length of output: 52


🏁 Script executed:

# Find where MAX_API_KEY_FILE_SIZE is defined
grep -r "define MAX_API_KEY_FILE_SIZE" src/

Repository: pgEdge/pgedge-vectorizer

Length of output: 126


🏁 Script executed:

# Get the full load_api_key implementation from OpenAI provider
rg -A 60 "^static char \*\nlload_api_key\(const char" src/provider_openai.c | head -70

Repository: pgEdge/pgedge-vectorizer

Length of output: 259


🏁 Script executed:

# Compare implementations - extract full function from both files
sed -n '/^static char \*$/,/^}$/p' src/provider_openai.c | grep -A 60 "load_api_key" | head -65

Repository: pgEdge/pgedge-vectorizer

Length of output: 1485


Add post-open re-validation and bounds enforcement during file read.

The pre-open size check at line 291 has a TOCTOU window: the file can be replaced between stat() and fopen() (line 300). Additionally, there's no validation that the opened file is a regular file (could be a device/FIFO), and the fgetc loop lacks runtime bounds enforcement.

To fix:

  1. Add S_ISREG(st.st_mode) check before opening
  2. Use open() with O_RDONLY | O_CLOEXEC followed by fstat() and fdopen() to reduce TOCTOU window and re-validate post-open
  3. Add a bytes_read counter in the fgetc loop (lines 311-315) that halts if it exceeds MAX_API_KEY_FILE_SIZE

Note: The Voyage provider has the same vulnerability and should receive identical hardening.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/provider_openai.c` around lines 290 - 297, The pre-open stat-based size
check (st.st_size > MAX_API_KEY_FILE_SIZE) has a TOCTOU and lacks regular-file
validation and runtime bounds during read; fix by verifying S_ISREG(st.st_mode)
before proceeding, then replace fopen() with open(path, O_RDONLY|O_CLOEXEC)
followed by fstat(fd) to re-check size and S_ISREG on the opened descriptor,
convert the fd to a FILE* with fdopen(), and in the fgetc() loop introduce a
bytes_read counter that stops and returns an error if it exceeds
MAX_API_KEY_FILE_SIZE; apply the identical changes to the Voyage provider
implementation as well.

Comment on lines +292 to +299
/* Reject unreasonably large files before reading (CWE-20) */
if (st.st_size > MAX_API_KEY_FILE_SIZE)
{
*error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes",
MAX_API_KEY_FILE_SIZE);
pfree(expanded_path);
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine src/provider_voyage.c
git ls-files src/provider_voyage.c

Repository: pgEdge/pgedge-vectorizer

Length of output: 89


🏁 Script executed:

# Get the file size to determine if we can read it entirely
wc -l src/provider_voyage.c

Repository: pgEdge/pgedge-vectorizer

Length of output: 93


🏁 Script executed:

# Read the load_api_key function to understand the current implementation
# First, find the function definition
rg -n "load_api_key" src/provider_voyage.c | head -20

Repository: pgEdge/pgedge-vectorizer

Length of output: 265


🏁 Script executed:

# Read the load_api_key function starting at line 260
sed -n '260,330p' src/provider_voyage.c

Repository: pgEdge/pgedge-vectorizer

Length of output: 1673


🏁 Script executed:

# Search for MAX_API_KEY_FILE_SIZE definition
rg -n "MAX_API_KEY_FILE_SIZE" src/provider_voyage.c | head -10

Repository: pgEdge/pgedge-vectorizer

Length of output: 149


🏁 Script executed:

# Check the top of the file for includes and defines
head -50 src/provider_voyage.c

Repository: pgEdge/pgedge-vectorizer

Length of output: 1344


🏁 Script executed:

# Find and search header files
fd -name "*.h" | head -20

Repository: pgEdge/pgedge-vectorizer

Length of output: 298


🏁 Script executed:

# Search for MAX_API_KEY_FILE_SIZE in all header files
rg -n "MAX_API_KEY_FILE_SIZE" --type c --type h

Repository: pgEdge/pgedge-vectorizer

Length of output: 381


TOCTOU and non-regular file vulnerabilities allow unbounded reads.

The current implementation is vulnerable to three security issues:

  1. TOCTOU (Time-of-Check-Time-of-Use): The window between stat() (line 283) and fopen() (line 304) allows an attacker to replace the file with a symlink, pipe, or oversized file.

  2. Non-regular file bypass: No S_ISREG() check means symlinks, named pipes, character/block devices, and sockets bypass the st_size check. Reading from pipes or devices can cause indefinite blocking; symlinks can point to arbitrarily large files.

  3. Unbounded read loop: The fgetc() loop (line 312+) lacks a runtime size cap. Even if the initial stat() check passes, the loop can read more than MAX_API_KEY_FILE_SIZE if the file grows or if a non-regular file is read.

The proposed fix (adding #include <fcntl.h>, using open() + fstat() after opening, enforcing S_ISREG(), and adding a bytes_read counter in the loop) correctly addresses all three vulnerabilities. The same issues exist in provider_openai.c (line 291+).

Proposed hardening (open+fstat+regular-file check+read cap)
+#include <fcntl.h>
...
 static char *
 load_api_key(const char *filepath, char **error_msg)
 {
 	FILE *fp;
+	int fd;
 	char *expanded_path;
 	StringInfoData key_buf;
 	int c;
+	size_t bytes_read = 0;
 	struct stat st;
...
-	/* Reject unreasonably large files before reading (CWE-20) */
-	if (st.st_size > MAX_API_KEY_FILE_SIZE)
+	/* Require regular files and enforce max size */
+	if (!S_ISREG(st.st_mode) || st.st_size < 0 || st.st_size > MAX_API_KEY_FILE_SIZE)
 	{
 		*error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes",
 							  MAX_API_KEY_FILE_SIZE);
 		pfree(expanded_path);
 		return NULL;
 	}
 
-	/* Open and read file */
-	fp = fopen(expanded_path, "r");
-	if (fp == NULL)
+	fd = open(expanded_path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0 || fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) ||
+		st.st_size < 0 || st.st_size > MAX_API_KEY_FILE_SIZE)
 	{
-		*error_msg = psprintf("Failed to open API key file: %s", expanded_path);
+		if (fd >= 0)
+			close(fd);
+		*error_msg = psprintf("Invalid API key file: %s", expanded_path);
 		pfree(expanded_path);
 		return NULL;
 	}
+	fp = fdopen(fd, "r");
+	if (fp == NULL)
+	{
+		close(fd);
+		*error_msg = psprintf("Failed to open API key file: %s", expanded_path);
+		pfree(expanded_path);
+		return NULL;
+	}
...
 	while ((c = fgetc(fp)) != EOF)
 	{
+		if (++bytes_read > MAX_API_KEY_FILE_SIZE)
+		{
+			*error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes",
+								  MAX_API_KEY_FILE_SIZE);
+			fclose(fp);
+			pfree(expanded_path);
+			pfree(key_buf.data);
+			return NULL;
+		}
 		if (c != '\n' && c != '\r' && c != ' ' && c != '\t')
 			appendStringInfoChar(&key_buf, c);
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/provider_voyage.c` around lines 292 - 299, The filesize/safety check
using stat() then fopen() is vulnerable to TOCTOU and non-regular files; change
the logic in the API-key loading routine to open the file with open() (include
<fcntl.h>), call fstat() on the opened fd and verify S_ISREG(st_mode) and
st_size <= MAX_API_KEY_FILE_SIZE before reading, and then read using
read()/fread() with a running bytes_read counter to enforce a hard cap of
MAX_API_KEY_FILE_SIZE (reject if exceeded) to prevent unbounded reads; apply the
same open+fstat+S_ISREG+read-cap pattern to the analogous code in
provider_openai.c.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant