Conversation
Reject API key files larger than MAX_API_KEY_FILE_SIZE (4096 bytes)
before opening, preventing unbounded memory allocation (CWE-20).
Move the constant into pgedge_vectorizer.h so both providers share
a single definition.
Replace getenv("HOME") with getpwuid(geteuid()) in expand_tilde()
of both providers: the environment variable is attacker-controllable
and must not be trusted for path resolution (CWE-807).
Replace strncpy + manual null-termination in worker.c with strlcpy,
which always null-terminates and does not require a separate fixup
line (CWE-120).
📝 WalkthroughWalkthroughThe changes introduce several modifications across multiple source files. A new preprocessor constant 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/provider_openai.c`:
- Around line 290-297: The pre-open stat-based size check (st.st_size >
MAX_API_KEY_FILE_SIZE) has a TOCTOU and lacks regular-file validation and
runtime bounds during read; fix by verifying S_ISREG(st.st_mode) before
proceeding, then replace fopen() with open(path, O_RDONLY|O_CLOEXEC) followed by
fstat(fd) to re-check size and S_ISREG on the opened descriptor, convert the fd
to a FILE* with fdopen(), and in the fgetc() loop introduce a bytes_read counter
that stops and returns an error if it exceeds MAX_API_KEY_FILE_SIZE; apply the
identical changes to the Voyage provider implementation as well.
In `@src/provider_voyage.c`:
- Around line 292-299: The filesize/safety check using stat() then fopen() is
vulnerable to TOCTOU and non-regular files; change the logic in the API-key
loading routine to open the file with open() (include <fcntl.h>), call fstat()
on the opened fd and verify S_ISREG(st_mode) and st_size <=
MAX_API_KEY_FILE_SIZE before reading, and then read using read()/fread() with a
running bytes_read counter to enforce a hard cap of MAX_API_KEY_FILE_SIZE
(reject if exceeded) to prevent unbounded reads; apply the same
open+fstat+S_ISREG+read-cap pattern to the analogous code in provider_openai.c.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 057f79fc-036a-4381-acf8-96d893414f0f
📒 Files selected for processing (4)
src/pgedge_vectorizer.hsrc/provider_openai.csrc/provider_voyage.csrc/worker.c
| /* Reject unreasonably large files before reading (CWE-20) */ | ||
| if (st.st_size > MAX_API_KEY_FILE_SIZE) | ||
| { | ||
| *error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes", | ||
| MAX_API_KEY_FILE_SIZE); | ||
| pfree(expanded_path); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find the file and check its contents around the specified lines
find . -name "provider_openai.c" -type fRepository: pgEdge/pgedge-vectorizer
Length of output: 91
🏁 Script executed:
# Once file is located, examine the code around lines 290-297 and the fgetc loop
cat -n src/provider_openai.c | sed -n '280,320p'Repository: pgEdge/pgedge-vectorizer
Length of output: 1279
🏁 Script executed:
# Also check if there's a Voyage provider for reference pattern
find . -name "*voyage*" -type f | grep -i providerRepository: pgEdge/pgedge-vectorizer
Length of output: 91
🏁 Script executed:
# Check the Voyage provider implementation
cat -n src/provider_voyage.c | sed -n '280,320p'Repository: pgEdge/pgedge-vectorizer
Length of output: 1349
🏁 Script executed:
# Look for the load_api_key function in Voyage provider to see full pattern
rg -A 50 "static.*load_api_key" src/provider_voyage.c | head -80Repository: pgEdge/pgedge-vectorizer
Length of output: 1221
🏁 Script executed:
# Check if load_api_key is shared or defined separately in each provider
grep -n "MAX_API_KEY_FILE_SIZE" src/provider_openai.c src/provider_voyage.cRepository: pgEdge/pgedge-vectorizer
Length of output: 319
🏁 Script executed:
# Look for function definition to see if it's shared or duplicated
rg "^static char \*load_api_key" src/Repository: pgEdge/pgedge-vectorizer
Length of output: 245
🏁 Script executed:
# Check if load_api_key is in a header or shared file
find . -name "*.h" -type f | xargs grep -l "load_api_key" 2>/dev/nullRepository: pgEdge/pgedge-vectorizer
Length of output: 52
🏁 Script executed:
# Find where MAX_API_KEY_FILE_SIZE is defined
grep -r "define MAX_API_KEY_FILE_SIZE" src/Repository: pgEdge/pgedge-vectorizer
Length of output: 126
🏁 Script executed:
# Get the full load_api_key implementation from OpenAI provider
rg -A 60 "^static char \*\nlload_api_key\(const char" src/provider_openai.c | head -70Repository: pgEdge/pgedge-vectorizer
Length of output: 259
🏁 Script executed:
# Compare implementations - extract full function from both files
sed -n '/^static char \*$/,/^}$/p' src/provider_openai.c | grep -A 60 "load_api_key" | head -65Repository: pgEdge/pgedge-vectorizer
Length of output: 1485
Add post-open re-validation and bounds enforcement during file read.
The pre-open size check at line 291 has a TOCTOU window: the file can be replaced between stat() and fopen() (line 300). Additionally, there's no validation that the opened file is a regular file (could be a device/FIFO), and the fgetc loop lacks runtime bounds enforcement.
To fix:
- Add
S_ISREG(st.st_mode)check before opening - Use
open()withO_RDONLY | O_CLOEXECfollowed byfstat()andfdopen()to reduce TOCTOU window and re-validate post-open - Add a
bytes_readcounter in thefgetcloop (lines 311-315) that halts if it exceedsMAX_API_KEY_FILE_SIZE
Note: The Voyage provider has the same vulnerability and should receive identical hardening.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/provider_openai.c` around lines 290 - 297, The pre-open stat-based size
check (st.st_size > MAX_API_KEY_FILE_SIZE) has a TOCTOU and lacks regular-file
validation and runtime bounds during read; fix by verifying S_ISREG(st.st_mode)
before proceeding, then replace fopen() with open(path, O_RDONLY|O_CLOEXEC)
followed by fstat(fd) to re-check size and S_ISREG on the opened descriptor,
convert the fd to a FILE* with fdopen(), and in the fgetc() loop introduce a
bytes_read counter that stops and returns an error if it exceeds
MAX_API_KEY_FILE_SIZE; apply the identical changes to the Voyage provider
implementation as well.
| /* Reject unreasonably large files before reading (CWE-20) */ | ||
| if (st.st_size > MAX_API_KEY_FILE_SIZE) | ||
| { | ||
| *error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes", | ||
| MAX_API_KEY_FILE_SIZE); | ||
| pfree(expanded_path); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine src/provider_voyage.c
git ls-files src/provider_voyage.cRepository: pgEdge/pgedge-vectorizer
Length of output: 89
🏁 Script executed:
# Get the file size to determine if we can read it entirely
wc -l src/provider_voyage.cRepository: pgEdge/pgedge-vectorizer
Length of output: 93
🏁 Script executed:
# Read the load_api_key function to understand the current implementation
# First, find the function definition
rg -n "load_api_key" src/provider_voyage.c | head -20Repository: pgEdge/pgedge-vectorizer
Length of output: 265
🏁 Script executed:
# Read the load_api_key function starting at line 260
sed -n '260,330p' src/provider_voyage.cRepository: pgEdge/pgedge-vectorizer
Length of output: 1673
🏁 Script executed:
# Search for MAX_API_KEY_FILE_SIZE definition
rg -n "MAX_API_KEY_FILE_SIZE" src/provider_voyage.c | head -10Repository: pgEdge/pgedge-vectorizer
Length of output: 149
🏁 Script executed:
# Check the top of the file for includes and defines
head -50 src/provider_voyage.cRepository: pgEdge/pgedge-vectorizer
Length of output: 1344
🏁 Script executed:
# Find and search header files
fd -name "*.h" | head -20Repository: pgEdge/pgedge-vectorizer
Length of output: 298
🏁 Script executed:
# Search for MAX_API_KEY_FILE_SIZE in all header files
rg -n "MAX_API_KEY_FILE_SIZE" --type c --type hRepository: pgEdge/pgedge-vectorizer
Length of output: 381
TOCTOU and non-regular file vulnerabilities allow unbounded reads.
The current implementation is vulnerable to three security issues:
-
TOCTOU (Time-of-Check-Time-of-Use): The window between
stat()(line 283) andfopen()(line 304) allows an attacker to replace the file with a symlink, pipe, or oversized file. -
Non-regular file bypass: No
S_ISREG()check means symlinks, named pipes, character/block devices, and sockets bypass thest_sizecheck. Reading from pipes or devices can cause indefinite blocking; symlinks can point to arbitrarily large files. -
Unbounded read loop: The
fgetc()loop (line 312+) lacks a runtime size cap. Even if the initialstat()check passes, the loop can read more thanMAX_API_KEY_FILE_SIZEif the file grows or if a non-regular file is read.
The proposed fix (adding #include <fcntl.h>, using open() + fstat() after opening, enforcing S_ISREG(), and adding a bytes_read counter in the loop) correctly addresses all three vulnerabilities. The same issues exist in provider_openai.c (line 291+).
Proposed hardening (open+fstat+regular-file check+read cap)
+#include <fcntl.h>
...
static char *
load_api_key(const char *filepath, char **error_msg)
{
FILE *fp;
+ int fd;
char *expanded_path;
StringInfoData key_buf;
int c;
+ size_t bytes_read = 0;
struct stat st;
...
- /* Reject unreasonably large files before reading (CWE-20) */
- if (st.st_size > MAX_API_KEY_FILE_SIZE)
+ /* Require regular files and enforce max size */
+ if (!S_ISREG(st.st_mode) || st.st_size < 0 || st.st_size > MAX_API_KEY_FILE_SIZE)
{
*error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes",
MAX_API_KEY_FILE_SIZE);
pfree(expanded_path);
return NULL;
}
- /* Open and read file */
- fp = fopen(expanded_path, "r");
- if (fp == NULL)
+ fd = open(expanded_path, O_RDONLY | O_CLOEXEC);
+ if (fd < 0 || fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) ||
+ st.st_size < 0 || st.st_size > MAX_API_KEY_FILE_SIZE)
{
- *error_msg = psprintf("Failed to open API key file: %s", expanded_path);
+ if (fd >= 0)
+ close(fd);
+ *error_msg = psprintf("Invalid API key file: %s", expanded_path);
pfree(expanded_path);
return NULL;
}
+ fp = fdopen(fd, "r");
+ if (fp == NULL)
+ {
+ close(fd);
+ *error_msg = psprintf("Failed to open API key file: %s", expanded_path);
+ pfree(expanded_path);
+ return NULL;
+ }
...
while ((c = fgetc(fp)) != EOF)
{
+ if (++bytes_read > MAX_API_KEY_FILE_SIZE)
+ {
+ *error_msg = psprintf("API key file exceeds maximum allowed size of %d bytes",
+ MAX_API_KEY_FILE_SIZE);
+ fclose(fp);
+ pfree(expanded_path);
+ pfree(key_buf.data);
+ return NULL;
+ }
if (c != '\n' && c != '\r' && c != ' ' && c != '\t')
appendStringInfoChar(&key_buf, c);
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/provider_voyage.c` around lines 292 - 299, The filesize/safety check
using stat() then fopen() is vulnerable to TOCTOU and non-regular files; change
the logic in the API-key loading routine to open the file with open() (include
<fcntl.h>), call fstat() on the opened fd and verify S_ISREG(st_mode) and
st_size <= MAX_API_KEY_FILE_SIZE before reading, and then read using
read()/fread() with a running bytes_read counter to enforce a hard cap of
MAX_API_KEY_FILE_SIZE (reject if exceeded) to prevent unbounded reads; apply the
same open+fstat+S_ISREG+read-cap pattern to the analogous code in
provider_openai.c.
Summary
Fix three Codacy-reported security issues in the pgedge-vectorizer extension.
CWE-20 / CWE-120 — unbounded API key file reads (
provider_openai.c,provider_voyage.c)load_api_key()usedfgetc()in an unbounded loop backed by aStringInfoDatabuffer that grows without limit. A large or malicious key file could exhaust server memory. The fix checksst.st_sizeagainstMAX_API_KEY_FILE_SIZE(4096 bytes) beforefopen(), using thestat()result that was already available for the permissions check. The constant is defined once inpgedge_vectorizer.hand shared by both providers.CWE-807 — untrusted
HOMEenvironment variable (provider_openai.c,provider_voyage.c)expand_tilde()resolved~usinggetenv("HOME"), which is fully attacker-controllable via the process environment. Replaced withgetpwuid(geteuid()), which reads the home directory from the system user database and cannot be overridden by unprivileged callers.CWE-120 —
strncpywithout guaranteed null termination (worker.c)strncpy(dbname, db_name, NAMEDATALEN - 1)followed by a manualdbname[NAMEDATALEN - 1] = '\0'is error-prone and was accompanied by now-incorrect flawfinder suppression comments. Replaced withstrlcpy(dbname, db_name, NAMEDATALEN), which always null-terminates and needs no fixup line.