-
Notifications
You must be signed in to change notification settings - Fork 42
feat: output security reports as JSON when requested #136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @jajanet, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the security analysis tool by introducing the ability to output findings in a structured JSON format. This change is crucial for enabling automated processing, integration with external systems like CI/CD pipelines, and improving the overall utility of the security reports. The implementation involves adding a new parsing utility, integrating it as a server tool, and updating the workflow configuration to support conditional JSON generation. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
🤖 Hi @jajanet, I've received your request, and I'm working on it now! You can track my progress in the logs for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable feature to output security reports in JSON format, which will greatly improve programmatic consumption of the results. The implementation includes a new markdown parser, tests, and a server tool to handle the conversion.
My review focuses on improving the robustness and correctness of the new parser and its integration. I've identified a critical bug in the parsing logic that could lead to incorrect JSON output, along with a few medium-severity issues related to error handling, a typo in a prompt, and code consistency. The provided code suggestions aim to resolve these issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
📋 Review Summary
This pull request introduces a valuable feature to output security reports in JSON format. The implementation is well-executed, with a robust markdown parser and comprehensive test coverage. The code is clean, and the new functionality is a great addition for enabling programmatic use of the security reports.
🔍 General Feedback
- I've left a few minor suggestions for improving type safety and code readability, but overall this is a solid contribution.
| } | ||
| }) as any | ||
| ); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| findings.push({ | ||
| vulnerability: extract("Vulnerability"), | ||
| vulnerabilityType: extract("Vulnerability Type"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const rawSink = extract("Sink Location"); | ||
|
|
||
| let lineContent = extract("Line Content"); | ||
| if (lineContent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request introduces a new feature to output security reports in JSON format. The implementation adds a new tool to the MCP server and a parser to convert the markdown report to JSON. The code is well-structured and includes tests for the new parser.
🔍 General Feedback
- The new feature is a valuable addition that will improve the usability of the security extension.
- The code is clean and easy to understand.
- One potential security vulnerability was identified in the new parser.
| const extract = (label: string): string | null => { | ||
| const fieldNames = 'Vulnerability Type|Severity|Source Location|Sink Location|Data Type|Line Content|Description|Recommendation'; | ||
| const patternStr = `(?:-?\\s*\\**)?${label}\\**:\\s*([\\s\\S]*?)(?=\\n(?:-?\\s*\\**)?(?:${fieldNames})|$)`; | ||
| const pattern = new RegExp(patternStr, 'i'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regular expression used to parse the markdown report is vulnerable to Regular Expression Denial of Service (ReDoS). An attacker can craft a malicious markdown file that will cause the regex engine to backtrack excessively, leading to a denial of service. The vulnerable part is ([\\s\\S]*?) which can cause catastrophic backtracking.
| const pattern = new RegExp(patternStr, 'i'); | |
| const fieldNames = 'Vulnerability Type|Severity|Source Location|Sink Location|Data Type|Line Content|Description|Recommendation'; | |
| const patternStr = `(?:-?\\\\s*\\\\**)?${label}\\\\**:\\\\s*([\\\\s\\\\S]*?)(?=\\\\n(?:-?\\\\s*\\\\**)?(?:${fieldNames})|$)`; | |
| const pattern = new RegExp(patternStr, 'i'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix this?
QuanZhang-William
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
Left some questions and I think we should also update the README.md
shrishabh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
I question that I was wondering - here we are deterministically parsing the markdown into json using regexes. Another option could have been using LLMs to adjust their outputs so that they conform to this particular format. I think deterministically parsing is a great first step, but we might want to keep an eye out for failure cases and potentially move to the alternative if we see lots of errors.
| const extract = (label: string): string | null => { | ||
| const fieldNames = 'Vulnerability Type|Severity|Source Location|Sink Location|Data Type|Line Content|Description|Recommendation'; | ||
| const patternStr = `(?:-?\\s*\\**)?${label}\\**:\\s*([\\s\\S]*?)(?=\\n(?:-?\\s*\\**)?(?:${fieldNames})|$)`; | ||
| const pattern = new RegExp(patternStr, 'i'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix this?
| vulnerabilityType: extract("Vulnerability Type"), | ||
| severity: extract("Severity"), | ||
| dataType: extract("Data Type"), | ||
| sourceLocation: parseLocation(rawSource), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there is an error in any of these extraction methods? It might be worth considering failing gracefully here as the rest of the extraction might still produce a reasonable report?
QuanZhang-William
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm to me.
Please address other feedbacks before merge :)
This PR adds the ability to output security report results in JSON. This enables programmatic parsing for accuracy checks, standardization, and integration with SCM tools and CI/CD pipelines (e.g., GitHub Actions, Jenkins)
Example of original markdown report and corresponding JSON:
turns into
[ { "vulnerability": "Path Traversal and Command Injection", "vulnerabilityType": "Security", "severity": "Critical", "extension": { "sourceLocation": { "File": "lib/router.js", "startLine": null, "endLine": null }, "sinkLocation": { "File": null, "startLine": null, "endLine": null }, "dataType": null }, "lineContent": "`full_path = \"\" + dispatch.static_route + (unescape(pathname));`", "description": "The `pathname` variable, derived from the URL, is not sanitized before being used to construct a file path. An attacker can use URLencoded characters like `../` to traverse the file system and access arbitrary files. This vulnerability is further escalated to command injection because the `full_path` is used in a `spawn` call, allowing an attacker to execute arbitrary commands on the system.", "recommendation": "Sanitize the `pathname` variable by removing any directory traversal characters before using it to construct a file path. Use `path.normalize()` or a similar function to resolve the path and ensure it stays within the intended directory." }, ... ]Fields are optional and written as null if not present, as the tool assumes that the
DRAFT_SECURITY_REPORT.mdfile is well-formed and has the expected fieldsThis is an initial implementation to help improve processes, and we may iterate using Vertex calls in the future. There is an upcoming PR on adding a subfield for code changes sometimes present under
recommendationas well