Fix accumulating memory issue#1297
Conversation
InvestigationRunning the above code snippet, we can see that the memory usage does in fact increase with each call: But there does seem to be some freeing of memory (approximately 20 or so MB) every ~20 calls. I also checked both Pylint and Astroid's changelogs but I couldn't find any fixes related to any memory leak issues. |
|
Thanks @Zain-Mahmoud. I did a bit of digging and found this pylint option: https://pylint.pycqa.org/en/latest/user_guide/configuration/all-options.html#clear-cache-post-run. Could you try to figure out how we can run PythonTA with this option enabled, and whether that fixes the issue? |
|
Hi @david-yz-liu |
|
Alright thanks @Zain-Mahmoud, I think we'll need to do some more research then. There are two tools that you can use to help with memory profiling in Python: tracemalloc, which is a Python built-in module, and memray, which is a third-party package. Please try using those to investigate which parts of the codebase may be resulting in issues. BTW, you could also do some experimentation like commenting out particular components to see if that makes a difference. |
|
Hi @david-yz-liu, apologies for the delay. Here's the investigation report. I've also pushed the version of the code where I was clearing the InvestigationRunner functionMemory tracingUsing Commenting out these functions caused the memory usage reported by SnapshotsWhen using an alternative approach to recording the memory usage. I used The line causing the memory difference across the snapshots: Commenting out this function caused the memory usage reported by MyPySince the line was calling an external MyPy function, The output: This implies that it wasn't the library function that was causing the increased memory usage, rather how PyTA was using it. A potential solution to this was to replace the |
for more information, see https://pre-commit.ci
|
Hi @Zain-Mahmoud, thanks for the updated report. The two code changes you made are good. In the testing scripts you used:
Please make these changes then update your report. Your report should then include the output of the |
|
Thanks @david-yz-liu. I’ve updated the report below: UpdatesMypyAfter starting The output: This explains why the Comparing snapshotsFollowing the above advice, I tried running The output: Again, most of these were external library functions, but now there were no library imports appearing in the comparison. However, there were still many functions related to the MessageDefinitionStore in this list even after clearing the global linter MessageDefinitionStore cache after each call to python_ta.check_all(). MarkupsafeAnother library function call that was high on this list is I had to remove the conversion to MB in trace_memory_runs since it was reporting 0 otherwise (was being rounded down to 0). Also when running the snapshots, we get this output: This shows that the constructor call is in fact using some memory, although in this script, it’s almost negligible and isn't accumulating across calls. |
|
@Zain-Mahmoud okay this is looking better, but please also call |
|
@david-yz-liu I've updated the runner script with |
|
Thanks @Zain-Mahmoud! So I think we are ready to wrap this up. To turn this into a mergeable PR, please remove the |
Pull Request Test Coverage Report for Build 23830640442Details
💛 - Coveralls |
|
|
||
| from __future__ import annotations | ||
|
|
||
| import gc |
There was a problem hiding this comment.
Oh, we shouldn't actually call gc.collect in here, as it isn't necessary for the standard single-use of PythonTA. It could go in the runner script for testing, but not in the code itself.
| autoformat=autoformat, | ||
| on_verify_fail=on_verify_fail, | ||
| ) | ||
| reporter.linter.msgs_store.get_message_definitions.cache_clear() |
There was a problem hiding this comment.
This part is good, but move it into the _check helper function. This way it'll affect both check_all and check_errors.
david-yz-liu
left a comment
There was a problem hiding this comment.
Nice work, @Zain-Mahmoud!
Proposed Changes
This PR aims to fix the issue where repeated calls to the PyTA
check_all()function cause the memory usage of the process to increase with each call (issue #646)....
Final runner script used
runnerscript.pyScreenshots of your changes (if applicable)
Snippet of the runner script's output:Type of Change
(Write an
Xor a brief description next to the type or types that best describe your changes.)Checklist
(Complete each of the following items for your pull request. Indicate that you have completed an item by changing the
[ ]into a[x]in the raw text, or by clicking on the checkbox in the rendered description on GitHub.)Before opening your pull request:
After opening your pull request:
Questions and Comments
(Include any questions or comments you have regarding your changes.)