
Fix accumulating memory issue #1297

Merged
david-yz-liu merged 14 commits into pyta-uoft:master from Zain-Mahmoud:issue-646
Apr 1, 2026

Conversation

@Zain-Mahmoud
Contributor

@Zain-Mahmoud Zain-Mahmoud commented Feb 10, 2026

Proposed Changes

This PR fixes the issue where repeated calls to PyTA's check_all() function cause the process's memory usage to increase with each call (issue #646).

...

Final runner script used

runner_script.py

import io
import gc
import tracemalloc
import python_ta

def run_runner():
    buf = io.StringIO()
    python_ta.check_all(output=buf)
    gc.collect()

tracemalloc.start()

def snapshot_runs():
    run_runner()
    s1 = tracemalloc.take_snapshot()
    run_runner()
    s2 = tracemalloc.take_snapshot()
    stats = s2.compare_to(s1, "traceback")
    for stat in stats[:10]:
        print(f"{stat.traceback.format()}")
        print(f"size_diff={stat.size_diff} | count_diff={stat.count_diff}")
    print("-" * 50)

def trace_memory_runs():
    for i in range(100):
        run_runner()
        current = tracemalloc.get_traced_memory()
        print(f" {i} | {int(current[0] / 1024 ** 2)} MB")

trace_memory_runs()
Screenshots of your changes (if applicable)

Snippet of the runner script's output: [screenshot "Screenshot 2026-03-30 at 3 20 57 PM"]

Type of Change

(Write an X or a brief description next to the type or types that best describe your changes.)

Type | Applies?
🚨 Breaking change (fix or feature that would cause existing functionality to change) |
New feature (non-breaking change that adds functionality) |
🐛 Bug fix (non-breaking change that fixes an issue) | X
♻️ Refactoring (internal change to codebase, without changing functionality) |
🚦 Test update (change that only adds or modifies tests) |
📚 Documentation update (change that only updates documentation) |
📦 Dependency update (change that updates a dependency) |
🔧 Internal (change that only affects developers or continuous integration) |

Checklist

(Complete each of the following items for your pull request. Indicate that you have completed an item by changing the [ ] into a [x] in the raw text, or by clicking on the checkbox in the rendered description on GitHub.)

Before opening your pull request:

  • I have performed a self-review of my changes.
    • Check that all changed files included in this pull request are intentional changes.
    • Check that all changes are relevant to the purpose of this pull request, as described above.
  • I have added tests for my changes, if applicable.
    • This is required for all bug fixes and new features.
  • I have updated the project documentation, if applicable.
    • This is required for new features.
  • I have updated the project Changelog (this is required for all changes).
  • If this is my first contribution, I have added myself to the list of contributors.

After opening your pull request:

  • I have verified that the pre-commit.ci checks have passed.
  • I have verified that the CI tests have passed.
  • I have reviewed the test coverage changes reported by Coveralls.
  • I have requested a review from a project maintainer.

Questions and Comments

(Include any questions or comments you have regarding your changes.)

@Zain-Mahmoud
Contributor Author

Investigation

import io
import tracemalloc
import python_ta

def run_runner():
    buf = io.StringIO()
    python_ta.check_all(output=buf)

tracemalloc.start()

for i in range(5000):
    run_runner()
    current = tracemalloc.get_traced_memory()
    print(f"{i} | {int(current[0] / 1024**2)} MB")

Running the above code snippet, we can see that the memory usage does in fact increase with each call:

0 | 42 MB
1 | 44 MB
2 | 46 MB
3 | 48 MB
4 | 49 MB
5 | 51 MB
6 | 53 MB
7 | 55 MB
8 | 56 MB
9 | 58 MB
...
91 | 121 MB
92 | 122 MB
93 | 124 MB
94 | 125 MB
95 | 127 MB
96 | 128 MB
97 | 130 MB
98 | 132 MB
99 | 133 MB
...

There does, however, seem to be some freeing of memory (roughly 20 MB) every ~20 calls.

I also checked both pylint's and astroid's changelogs, but couldn't find any fixes related to memory leak issues.

@david-yz-liu
Contributor

Thanks @Zain-Mahmoud. I did a bit of digging and found this pylint option: https://pylint.pycqa.org/en/latest/user_guide/configuration/all-options.html#clear-cache-post-run. Could you try to figure out how we can run PythonTA with this option enabled, and whether that fixes the issue?
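For reference, the clear-cache-post-run flag mentioned above is a real pylint option; a minimal sketch of how it might be passed when invoking pylint directly (the helper function and its wiring are illustrative, not PythonTA's actual code):

```python
def build_pylint_args(files, clear_cache=True):
    # Hypothetical helper: assemble CLI arguments for a pylint run.
    args = list(files)
    if clear_cache:
        # Ask pylint to clear its internal caches after each run; the option
        # is intended for long-running processes that invoke pylint repeatedly.
        args.append("--clear-cache-post-run=y")
    return args

# The resulting argument list could then be handed to pylint.lint.Run, e.g.:
#     from pylint.lint import Run
#     Run(build_pylint_args(["my_module.py"]), exit=False)
```

As the thread below shows, enabling this option alone did not resolve the leak.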

@Zain-Mahmoud
Contributor Author

Zain-Mahmoud commented Feb 12, 2026

Hi @david-yz-liu
I tried using the clear-cache-post-run option after instantiating the linter but it doesn't seem like the issue has been fixed. I've also pushed the updated code I'm using.

@david-yz-liu
Contributor

Alright thanks @Zain-Mahmoud, I think we'll need to do some more research then. There are two tools that you can use to help with memory profiling in Python: tracemalloc, which is a Python built-in module, and memray, which is a third-party package. Please try using those to investigate which parts of the codebase may be resulting in issues.

BTW, you could also do some experimentation like commenting out particular components to see if that makes a difference.

@Zain-Mahmoud
Contributor Author

Zain-Mahmoud commented Mar 22, 2026

Hi @david-yz-liu, apologies for the delay. Here's the investigation report. I've also pushed the version of the code where I was clearing the linter.msgs_store.get_message_definitions cache and the scripts that I was running.

Investigation

Runner function

def run_runner():
    buf = io.StringIO()
    python_ta.check_all(output=buf)

Memory tracing

Using tracemalloc.get_traced_memory(), I was able to narrow down the possible functions causing the memory leak:

check(): line 108 in helpers.py
generate_reports(): line 190 in __init__.py
load_reporter_by_class(): line 288 in helpers.py
load_default_plugins(): line 277 in helpers.py
load_plugin_modules(): line 280 in helpers.py

Commenting out these functions caused the memory usage reported by tracemalloc.get_traced_memory() to decrease, implying that these functions were part of the issue.
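The measurement technique used above can be sketched as a small stdlib-only helper (the helper name is hypothetical; it is not part of the scripts pushed with this PR):

```python
import tracemalloc

def traced_delta(func, *args, **kwargs):
    # Report the net change in traced memory (in bytes) across one call,
    # the same kind of reading used to narrow down the functions above.
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = func(*args, **kwargs)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, after - before
```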

Snapshots

As an alternative approach to recording the memory usage, I used tracemalloc.take_snapshot() to take a snapshot of the memory before and after the python_ta.check_all() call, and then used snapshot.compare_to() to compare the two snapshots. Surprisingly, many of the functions were astroid functions and other built-in library functions; none of them were the pylint functions listed above, nor were they called by those functions. The returned list of Statistic objects reported that most of the difference in memory usage between the two snapshots came from line 91 of python_ta/checkers/static_type_checker.py.

python-ta/src/python_ta/checkers/static_type_checker.py:91 | size_diff=20747126  | count_diff=214488

The line causing the memory difference across the snapshots:

result, _, _ = api.run([filename] + mypy_options)

Commenting out this call caused the memory usage reported by tracemalloc.get_traced_memory() to decrease slightly compared to commenting out the pylint functions above.

MyPy

Since the line was calling an external MyPy function, api.run, I tried running the function with the same arguments separately in an isolated file to see if the library function was causing the problem.

import tracemalloc
from mypy import api

for i in range(100):
    api.run(["testmypy.py", "--ignore-missing-imports", "--follow-imports=skip"])
    current = tracemalloc.get_traced_memory()
    print(f" {i} | {int(current[0] / 1024 ** 2)} MB")

The output:

 0 | 0 MB
 1 | 0 MB
 2 | 0 MB
 3 | 0 MB
 4 | 0 MB
 5 | 0 MB
 6 | 0 MB
 7 | 0 MB
 8 | 0 MB
 9 | 0 MB
 10 | 0 MB
 11 | 0 MB
 12 | 0 MB
 13 | 0 MB
 14 | 0 MB
 15 | 0 MB

This implied that it wasn't the library function itself causing the increased memory usage, but rather how PyTA was using it.

A potential solution to this was to replace the api.run call with a call to Python's subprocess.run. This seems to achieve the same effect as commenting out the api.run call in terms of reducing the memory reported, and it no longer shows up in the snapshot differences.
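A minimal sketch of the subprocess-based approach (the helper names here are illustrative; the actual change in this PR may be wired differently):

```python
import subprocess
import sys

def build_mypy_command(filename, options):
    # Hypothetical helper: construct a mypy CLI invocation mirroring the
    # arguments previously passed to mypy's api.run.
    return [sys.executable, "-m", "mypy", filename, *options]

def run_mypy_subprocess(filename, options):
    # Running mypy in a child process means all of its internal caches are
    # released when that process exits, so nothing accumulates in the
    # parent process across repeated checks.
    result = subprocess.run(
        build_mypy_command(filename, options),
        capture_output=True,
        text=True,
    )
    return result.stdout, result.stderr, result.returncode
```

The trade-off is the overhead of spawning a new interpreter per check, in exchange for the child process's memory being fully reclaimed by the OS.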

@Zain-Mahmoud Zain-Mahmoud requested review from david-yz-liu and removed request for david-yz-liu March 23, 2026 19:28
@david-yz-liu
Contributor

Hi @Zain-Mahmoud, thanks for the updated report. The two code changes you made are good.

In the testing scripts you used:

  1. Your mypy runner didn't call tracemalloc.start(), which is required to actually trace the memory usage (to see non-zero results). Please make that change and re-run the test you made to update your report.
  2. In runner_script.py snapshot_runs, since the largest memory usage is from the first call to python_ta.check_all, you should call run_runner once first before using tracemalloc, and then start comparing snapshots. This will let you zoom in on the increases in memory usage from run to run, which is more realistic to address. (This will reveal that the changes you made actually are working to decrease this memory usage.)

Please make these changes then update your report. Your report should then include the output of the runner_script.py, which will allow us to better analyse the actual parts that are causing increased memory usage.
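The warm-up-then-compare pattern suggested in point 2 might look like this (a sketch, not the exact script used in this PR):

```python
import gc
import tracemalloc

def compare_after_warmup(func, warmups=1, top=10):
    # Run func once (or more) first so one-time allocations -- module
    # imports, caches populated on first use -- don't dominate the diff.
    for _ in range(warmups):
        func()
    tracemalloc.start()
    snapshot_before = tracemalloc.take_snapshot()
    func()
    gc.collect()  # drop collectable garbage so the diff shows real growth
    snapshot_after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each StatisticDiff shows where memory grew between the two runs.
    return snapshot_after.compare_to(snapshot_before, "traceback")[:top]
```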

@Zain-Mahmoud
Contributor Author

Zain-Mahmoud commented Mar 26, 2026

Thanks @david-yz-liu. I’ve updated the report below:

Updates

Mypy

After calling tracemalloc.start(), we now see that the api.run() call was in fact consuming additional memory after each call.

import tracemalloc
from mypy import api

tracemalloc.start()

for i in range(100):
    api.run(["testmypy.py", "--ignore-missing-imports", "--follow-imports=skip"])
    current = tracemalloc.get_traced_memory()
    print(f" {i} | {int(current[0] / 1024 ** 2)} MB")

The output:

 0 | 8 MB
 1 | 8 MB
 2 | 8 MB
 3 | 8 MB
 4 | 9 MB
 5 | 9 MB
 6 | 9 MB
 7 | 9 MB
 8 | 9 MB
 9 | 9 MB
 10 | 10 MB
 11 | 10 MB
 12 | 10 MB
 13 | 10 MB
 14 | 10 MB
 15 | 10 MB
 16 | 11 MB
 17 | 11 MB
 18 | 11 MB
 19 | 11 MB
 20 | 11 MB
 21 | 11 MB
 22 | 12 MB

This explains why the api.run() call was showing up in the snapshot differences and why commenting out this line (and replacing it with Python’s subprocess functions) reported a decrease in memory usage.

Comparing snapshots

Following the above advice, I ran run_runner() once first, took a snapshot, ran run_runner() again, took a second snapshot, and compared the two.

The output:

['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pylint/reporters/base_reporter.py", line 46', '    print(string, file=self.out)']
size_diff=197173 | count_diff=1
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pylint/checkers/base_checker.py", line 207', '    return MessageDefinition(self, msgid, msg, descr, symbol, **options)']
size_diff=145960 | count_diff=877
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/markupsafe/__init__.py", line 129', '    return super().__new__(cls, object)']
size_diff=141099 | count_diff=1730
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/tracemalloc.py", line 558', '    traces = _get_traces()']
size_diff=125776 | count_diff=2196
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/astroid/rebuilder.py", line 62', '    self._data = data.split("\\n") if data else None']
size_diff=-67668 | count_diff=-763
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/argparse.py", line 1465', '    action = action_class(**kwargs)']
size_diff=61288 | count_diff=336
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pylint/message/message_id_store.py", line 157', '    ids = self.__old_names.get(msgid, [msgid])']
size_diff=54464 | count_diff=1702
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pylint/message/message_definition.py", line 47', '    self.old_names: list[tuple[str, str]] = []']
size_diff=47600 | count_diff=850
['  File "<string>", line 1']
size_diff=39800 | count_diff=453
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pylint/message/message_id_store.py", line 137', '    msgid = msgid_or_symbol.upper()']
size_diff=31878 | count_diff=693

Again, most of these were external library functions, but now there were no library imports appearing in the comparison.

However, there were still many functions related to the MessageDefinitionStore in this list even after clearing the global linter MessageDefinitionStore cache after each call to python_ta.check_all().
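The cache being cleared here is a functools.lru_cache; as a generic illustration of why such a cache retains memory between runs (the function below is a toy stand-in, not PyTA code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def message_definitions(msgid):
    # Toy stand-in for a cached lookup like
    # linter.msgs_store.get_message_definitions: every distinct argument
    # keeps an entry (and its result) alive in the cache.
    return [msgid] * 100

for i in range(50):
    message_definitions(i)
assert message_definitions.cache_info().currsize == 50

# Clearing the cache releases the retained entries, which is what calling
# cache_clear() after each check_all() run is meant to achieve.
message_definitions.cache_clear()
assert message_definitions.cache_info().currsize == 0
```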

Markupsafe

Another library call high on this list was MarkupSafe. It seemed that the Markup constructor was responsible, invoked each time we create an instance of the ExtendedMarkupsafe class for the messages. I wrote a script similar to the mypy one (remembering to call tracemalloc.start() this time).

import tracemalloc
from markupsafe import Markup

tracemalloc.start()

def snapshot_runs():
    m = Markup("hello")
    s1 = tracemalloc.take_snapshot()
    m = Markup("hi")
    s2 = tracemalloc.take_snapshot()
    stats = s2.compare_to(s1, "traceback")
    for stat in stats[:10]:
        print(f"{stat.traceback.format()}")
        print(f"size_diff={stat.size_diff} | count_diff={stat.count_diff}")
    print("-" * 50)

def trace_memory_runs():
    for i in range(100):
        m = Markup("hi")
        current = tracemalloc.get_traced_memory()
        print(f" {i} | {int(current[0])} ")

trace_memory_runs()

# for i in range(10):
#     snapshot_runs()

I removed the conversion to MB in trace_memory_runs, since the value was otherwise rounded down to 0.
The output:

 0 | 443 
 1 | 507 
 2 | 507 
 3 | 507 
 4 | 507 
 5 | 507 
 6 | 507 
 7 | 507 
 8 | 507 
 9 | 507 
 10 | 507 
 11 | 507 
 12 | 507 
 13 | 507 
 14 | 507 
 15 | 507 

Also when running the snapshots, we get this output:

['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/tracemalloc.py", line 560', '    return Snapshot(traces, traceback_limit)']
size_diff=328 | count_diff=1
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/tracemalloc.py", line 423', '    self.traces = _Traces(traces)']
size_diff=328 | count_diff=1
['  File "/Users/zain/SDS/pyta-fork/packages/python-ta/src/python_ta/check/mypy_runner.py", line 10', '    m = Markup("hi")']
size_diff=48 | count_diff=1
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/markupsafe/__init__.py", line 129', '    return super().__new__(cls, object)']
size_diff=45 | count_diff=1
['  File "/Users/zain/SDS/pyta-fork/packages/python-ta/src/python_ta/check/mypy_runner.py", line 21', '    def trace_memory_runs():']
size_diff=0 | count_diff=0
['  File "/Users/zain/SDS/pyta-fork/packages/python-ta/src/python_ta/check/mypy_runner.py", line 7', '    def snapshot_runs():']
size_diff=0 | count_diff=0
['  File "/Users/zain/SDS/pyta-fork/packages/python-ta/src/python_ta/check/mypy_runner.py", line 30', '    for i in range(10):']
size_diff=0 | count_diff=0
--------------------------------------------------
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/tracemalloc.py", line 558', '    traces = _get_traces()']
size_diff=2616 | count_diff=49
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/tracemalloc.py", line 560', '    return Snapshot(traces, traceback_limit)']
size_diff=312 | count_diff=1
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/tracemalloc.py", line 423', '    self.traces = _Traces(traces)']
size_diff=312 | count_diff=1
['  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/re/_parser.py", line 585', '    code1 = LITERAL, _ord(this)']
size_diff=-280 | count_diff=-5

This shows that the constructor call is in fact using some memory, although in this script, it’s almost negligible and isn't accumulating across calls.

@Zain-Mahmoud Zain-Mahmoud requested review from david-yz-liu and removed request for david-yz-liu March 26, 2026 05:26
@david-yz-liu
Contributor

@Zain-Mahmoud okay this is looking better, but please also call gc.collect() at the end of run_runner in order to force garbage collection, which will help ensure the snapshots are picking up truly accumulated memory.
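Why forcing a collection matters: objects caught in reference cycles stay allocated until the cyclic collector runs, so snapshots taken without gc.collect() can report memory that isn't truly leaked. A minimal illustration (toy classes, not PyTA code):

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

def make_cycle():
    # a and b reference each other, so their refcounts never reach zero
    # when this function returns; only the cyclic collector reclaims them.
    a, b = Node(), Node()
    a.ref, b.ref = b, a

make_cycle()
collected = gc.collect()  # number of unreachable objects found
print(f"collected {collected} objects")
```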

@Zain-Mahmoud
Contributor Author

Zain-Mahmoud commented Mar 30, 2026

@david-yz-liu I've updated the runner script with a gc.collect() call. The memory consumption now seems to be plateauing:

 0 | 51 MB
 1 | 51 MB
 2 | 52 MB
 3 | 52 MB
 4 | 52 MB
 5 | 52 MB
 6 | 52 MB
 7 | 52 MB
 8 | 52 MB
 9 | 52 MB
 10 | 52 MB
 11 | 52 MB
 12 | 53 MB
 13 | 53 MB
 14 | 53 MB
 15 | 53 MB
 16 | 53 MB
 17 | 53 MB
 18 | 53 MB
 19 | 53 MB

@Zain-Mahmoud Zain-Mahmoud requested review from david-yz-liu and removed request for david-yz-liu March 30, 2026 02:58
@david-yz-liu
Contributor

Thanks @Zain-Mahmoud! So I think we are ready to wrap this up. To turn this into a mergeable PR, please remove the runner_script.py file from your branch, and instead include its full contents directly in the PR description (perhaps under a <details> so it doesn't take up space by default). Then just tidy up your changes and update the Changelog with the final description.

@Zain-Mahmoud Zain-Mahmoud requested review from david-yz-liu and removed request for david-yz-liu March 30, 2026 22:53
@Zain-Mahmoud Zain-Mahmoud marked this pull request as ready for review March 31, 2026 19:19
@coveralls
Collaborator

coveralls commented Mar 31, 2026

Pull Request Test Coverage Report for Build 23830640442

Details

  • 5 of 5 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.008%) to 89.995%

Totals Coverage Status
Change from base Build 23057700087: 0.008%
Covered Lines: 3481
Relevant Lines: 3868

💛 - Coveralls


from __future__ import annotations

import gc
Contributor


Oh, we shouldn't actually call gc.collect in here, as it isn't necessary for the standard single-use of PythonTA. It could go in the runner script for testing, but not in the code itself.

autoformat=autoformat,
on_verify_fail=on_verify_fail,
)
reporter.linter.msgs_store.get_message_definitions.cache_clear()
Contributor


This part is good, but move it into the _check helper function. This way it'll affect both check_all and check_errors.

Contributor

@david-yz-liu david-yz-liu left a comment


Nice work, @Zain-Mahmoud!

@david-yz-liu david-yz-liu merged commit 1b8900a into pyta-uoft:master Apr 1, 2026
30 checks passed