
Collision Performance #4

Open

AmityWilder wants to merge 10 commits into anadon:master from AmityWilder:collision-perf

Conversation

@AmityWilder commented Feb 8, 2026

  • Copy element Sets to temporary arrays and ArrayLists, since they will be iterated over far more than they will be searched within the overlap method.
  • Implement a bounding box for the selection and remove any elements not overlapping the bounding box from the temporary unselected element ArrayList before any further checks.
    • Use swap-removal since order doesn't matter, making the removal worst case $O(1)$ instead of $O(n)$
  • Crunch the $O(n^3)$ section in the "else" case for simple intersections down to $O(n^2)$. That part appears to check every element against every element for each selected element, despite only needing to involve the selected element and all other elements, not the selected element and all pairs of other elements.
    i.e.

```java
for (Element sel : selected) {          // O(n)
  for (Element el : elementsArr) {      // O(n^2)
    if (intersecting) { ... }
    else {
      for (Element elm : elementsArr) { // O(n^3)
        if (sel ? elm) // no mention of `el`
```

$$\large\Downarrow$$

```java
for (Element sel : selected) {       // O(n)
  for (Element el : elementsArr) {   // O(n^2)
    if (intersecting) { ... }
  }
  if (none intersected) {
    for (Element el : elementsArr) { // O(n^2)
      if (sel ? el) // ...
```
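The same restructuring can be shown as a self-contained, runnable sketch. The `Box` record and its `intersects` method are stand-ins for illustration, not JLS's actual `Element` API:

```java
import java.util.List;

public class OverlapSketch {
    // Hypothetical stand-in for JLS's Element: an axis-aligned box.
    record Box(int x, int y, int w, int h) {
        boolean intersects(Box o) {
            return x < o.x + o.w && o.x < x + w
                && y < o.y + o.h && o.y < y + h;
        }
    }

    // Restructured check: the "else" scan runs once per selected element
    // (O(n^2) total) instead of once per element pair (O(n^3) total).
    static int countSelectedWithNoIntersection(List<Box> selected, List<Box> elements) {
        int lonely = 0;
        for (Box sel : selected) {
            boolean anyIntersected = false;
            for (Box el : elements) {
                if (sel != el && sel.intersects(el)) {
                    anyIntersected = true;
                    break; // short-circuit once any intersection is found
                }
            }
            if (!anyIntersected) {
                lonely++; // previously, a third nested loop re-scanned all elements here
            }
        }
        return lonely;
    }
}
```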

Related to #3

Collections was imported for swap remove, before realizing swapping wasn't needed and copying would suffice.
…loop

I'm not sure that "for (sel : selected) { for (el : elements) { for (elm : elements) { /* only acknowledging sel and elm*/ } } }" makes sense. Perhaps at some point in development, the "sel.intersects(el)" may have been mistaken for a bulk test of all the elements?
The way it was written prior to this commit, it would check all elements against all elements for every selected element, turning an O(n^2) operation into an O(n^3) operation with n duplicated passes (if I am understanding the intent of the code correctly).
@AmityWilder (Author)

Apologies for my IDE removing all the trailing whitespace in the files I edited, I didn't notice it had done that until creating this PR.

Also, feel free to remove (or ask me to remove, if that's easier) the edits to the README.md; I understand that may be out of scope for this.

@anadon (Owner) commented Feb 8, 2026 via email

@AmityWilder (Author) commented Feb 9, 2026

My Computer Organization and Algorithm Engineering professor, Dr. Lalejini, put me in contact with Dr. Kurmas over email after I offered to look into the collision performance. JLS was running rather slowly when I tried building a 16-bit adder without using subcircuits.

I promise I did not use AI (I personally hate the stuff and 100% understand your apprehension), but I don't exactly have a 5hr recording of myself writing the code. How would you like me to prove this wasn't AI generated? I can explain every change and every step of my process in depth if that would work.

@anadon (Owner) commented Feb 9, 2026 via email

@bsiever commented Feb 10, 2026

@anadon FWIW, I created a fork of your work, made some updates, and continue to use JLS in classes (https://github.com/bsiever/JLS/) --- your work definitely wasn't for naught!

```java
int i = 0;
while (i < elementsArr.size()) {
  if (selected.contains(elementsArr.get(i))) {
    // "swap" remove - slightly cheaper because we don't care about order
```
@anadon (Owner):

If you're very worried about performance for this operation, a linked list has better cache locality.

@AmityWilder (Author) Feb 23, 2026:

I'm a bit confused how that can be true...

An array has all of its elements next to each other in memory, so the entire surrounding chunk of memory can be pulled into memory at once and contain those adjacent elements.

A linked list has pointers to adjacent elements, meaning it couldn't be cached without dereferencing those pointers--which in addition to being extra operations (and slow ones at that: loading data from RAM) may be unsafe to do speculatively since the CPU can't be certain those pointers belong to the program, or are even valid memory addresses.

Am I incorrect?

@anadon (Owner):

You'd be almost right if this were C; less so for Java. In a traditional singly or doubly linked list, each node has one to three pointers: one for the next node, optionally one for the previous node, and the value either embedded or behind a pointer to it. However, optimized implementations keep these nodes densely packed in an array or something more exotic.

So you're comparing a long jump, needed to cross 2+ GB changes in virtual or physical address (we're skipping the kernel considerations for now), against a standard jump in a correctly optimized implementation. Those long jumps are, practically, cache-busters. Intel has some fancy stuff in its pre-fetcher, but relying on that from a programmer's perspective is faulty at best, meaning you should count on missing L1, L2, L3 (and sometimes L4) cache, hitting RAM, and killing the performance.

Now, when swapping to the end of a dense array, you're busting this cache on every swap just by virtue of touching the end of the array while you were just working on the beginning. With Java's presumably optimized implementation, however, these removals only ever touch bytes within one or two words of each other, so the cache characteristics are great. Even if you're performing 5x the operations, the fact that the CPU sees you working on adjacent memory allows simpler pre-fetch algorithms to fetch that memory ahead of time and avoid the cache misses in the first place. And this is without the extremely optimized data structures which are probably actually in use, which are going to have even better characteristics. For these reasons, trust the standard library. Neither of us is going to out-smart it.

Once you hit system organization, operating systems, and compilers you'll formally be introduced to these topics if your profs are good.

One tip for compilers -- learn how to use stack machines. They aren't always covered but they allow you to entirely skip register management and it makes your life immensely easier.
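The removal trade-off being debated above can be checked empirically with a crude timing sketch. This is a naive harness (no JMH warmup or dead-code control, so treat the numbers as a rough signal only), and `swapRemove` is an assumed helper, not JLS code:

```java
import java.util.ArrayList;

public class RemovalBench {
    // Swap-remove: move the tail element into slot i, then drop the tail.
    // O(1) per call, at the cost of element order.
    static <T> void swapRemove(ArrayList<T> list, int i) {
        T tail = list.remove(list.size() - 1);
        if (i < list.size()) list.set(i, tail);
    }

    // Crude wall-clock timing; a JMH benchmark would handle warmup and
    // JIT effects properly. This only gives a ballpark figure.
    static long timeNanos(Runnable body) {
        long t0 = System.nanoTime();
        body.run();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 20_000;
        ArrayList<Integer> a = new ArrayList<>(), b = new ArrayList<>();
        for (int i = 0; i < n; i++) { a.add(i); b.add(i); }
        // Shift-remove from the front is O(n) per call; swap-remove is O(1).
        long shift = timeNanos(() -> { while (!a.isEmpty()) a.remove(0); });
        long swap  = timeNanos(() -> { while (!b.isEmpty()) swapRemove(b, 0); });
        System.out.println("shift-remove: " + shift + " ns, swap-remove: " + swap + " ns");
    }
}
```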

@AmityWilder (Author) Feb 24, 2026:

I see! I hadn't realized at all that accessing the end is what causes the slowdown (and I had NO idea about any of those linked-list optimizations), but it makes perfect sense now that you've explained it!

I'll try benchmarking this against an alternative linked-list implementation and keep whichever is faster. However, I'm not entirely sure how to benchmark this code without its speed being influenced by the speed of my own movements. As far as I can tell, there isn't a good way of isolating the overlap code and running it with a variety of specific circuits; I would have to trigger the overlap method manually. Do you have any suggestions?
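One possible shape for such an isolated benchmark: generate deterministic synthetic "circuits" of increasing size and time the check against each, removing mouse input from the picture entirely. The `Box` type and the `Consumer` wrapper are assumptions for the sketch; the real harness would wrap a call into JLS's actual overlap method:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

public class OverlapBench {
    // Stand-in for an element's footprint.
    record Box(int x, int y, int w, int h) {}

    // Deterministic pseudo-random "circuit" of n boxes: the fixed seed
    // makes every run repeatable, independent of user interaction.
    static List<Box> randomBoxes(int n, long seed) {
        Random rng = new Random(seed);
        List<Box> boxes = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            boxes.add(new Box(rng.nextInt(1000), rng.nextInt(1000), 10, 10));
        }
        return boxes;
    }

    // Time `check` against growing inputs; the real overlap method would
    // be invoked inside the Consumer.
    static void sweep(Consumer<List<Box>> check) {
        for (int n : new int[]{100, 1_000, 10_000}) {
            List<Box> boxes = randomBoxes(n, 42);
            long t0 = System.nanoTime();
            check.accept(boxes);
            long dt = System.nanoTime() - t0;
            System.out.println("n=" + n + ": " + dt + " ns");
        }
    }
}
```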

```java
// so it makes sense to use a temporary array for cache locality and O(n) traversal.
Element[] selectedArr = selected.toArray(Element[]::new);
ArrayList<Element> elementsArr = new ArrayList<>(circuit.getElements());
// elementsArr.removeAll(selected);
```
@anadon (Owner):

These top level functions often have significant hidden optimizations to them. I'd benchmark before doing this and claiming this is the part that improves performance.

```java
int ymax = Integer.MIN_VALUE;

// build up a bounding box
for (Element sel : selectedArr) {
```
@anadon (Owner):

This is incorrect for concave geometries.

@anadon (Owner):

Further on this point: say a student wants to work on a component involving several parts of a CPU pipeline in a ring-like layout on the diagram, and then they want to rejoin it to the full CPU area. It is most likely that the lines do validly connect, but with this, the rectangularized area of each will indicate a false positive for overlap and reject the drop.

Rather than this approach, you should be looking at partitioning algorithms over non-zero-area elements. You're likely to find something which generates a layout in O(n log(n)) time. If you then couple this with a de-bouncer and an early-termination signal for a separate thread, you'll be able to move the selected elements freely and only allow placement after the thread for the current position terminates and indicates success.

@AmityWilder (Author):

I have encountered such a false-negative on this branch myself, so I can confirm with 100% certainty that you are correct.

For now, I have removed the bounding box part of the code in 9275399. I can try working on your suggested alternative to a bounding box once I'm sure everything else is correct.

@AmityWilder (Author):

I apologize: you were talking about a false positive, and I encountered a false negative and mistook it for what you were saying.

The false negative I encountered was that, when selecting wires, the user could place them in such a way that they overlap a wire or pin that does not overlap the selection bounding box.

Either way, I believe the bounding box check may be a bust and/or need work.

@anadon (Owner) commented Feb 17, 2026

Thinking further, I think the move-and-drop implementation should take advantage of the fact that elements are voxels in 2D space, and detect their collisions by generating a sorted list for each of: the non-moving elements' X-values, the moving elements' X-values, the non-moving elements' Y-values, and the moving elements' Y-values. Use counting sort. Have two threads with kill logic handle the X and Y dimensions. Use a transform of the granularized cursor coordinates to detect movements and launch threads with the correct parameters.

If the line segments for the elements in each merged array overlap for the same element, there is a collision. Segment collision detection is trivial. These sorted lists will need to be keyed on the integer coordinate value, with a value of a list containing references to element objects. This should yield O(n) performance.
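The counting sort mentioned above can be sketched generically over non-negative integer grid coordinates; this is a textbook version, not tied to JLS's element lists:

```java
public class CountingSortSketch {
    // Counting sort over coordinates in [0, k): O(n + k), stable.
    static int[] countingSort(int[] values, int k) {
        int[] count = new int[k + 1];
        for (int v : values) count[v + 1]++;                  // histogram, offset by one
        for (int i = 0; i < k; i++) count[i + 1] += count[i]; // prefix sums = start offsets
        int[] out = new int[values.length];
        for (int v : values) out[count[v]++] = v;             // place each value at its slot
        return out;
    }
}
```

For keyed records (coordinate plus element reference), the same prefix-sum placement applies; only the final pass copies whole records instead of bare ints.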

@anadon (Owner) commented Feb 19, 2026

@AmityWilder Are you about?

@AmityWilder (Author) commented Feb 20, 2026

Sorry, I've been getting pulled in a lot of directions lately and I wanted to research your response before giving a yes or no.

@AmityWilder (Author) commented Feb 20, 2026

My immediate thought is whether two $O(n \log(n))$ sortings (one for $x$ and one for $y$) replacing the $O(1)$ insertion and $O(1)$ translation in exchange for an $O(n)$ overlap detection would be more performant on average than an $O(n^2)$ overlap detection.

I imagine, given the way students are encouraged to prefer smaller circuits containing subcircuits, most might not exceed a value of $n$ that makes an $O(n^2)$ overlap detection that significantly more expensive than an $O(n \log(n))+O(n)$ translation+overlap detection.

Plus, the bigger part from my perspective (as the person who will possibly be implementing this):
Getting the existing algorithm down from $O(n^3)$ to $O(n^2)$ was a surprisingly small refactor with no tradeoffs. It only involved noticing that a particular chunk of code was in the wrong place. It didn't involve changing any data structures or trading complexity between two things--just a direct reduction of complexity with minimal changes to existing code.

Your suggestion for changing it from $O(n^2)$ to $O(n)$ will certainly require modifying at least one data structure (to store the sorted lists; creating them on the spot in overlap() would make the complexity at minimum that of the sorting algorithm, if it is more complex than the overlap detection itself). It will also likely require a full audit to make sure there are no cases where Elements can be added, removed, or transformed without the sorted lists being updated (which I can tell you from experience will make maintenance harder, since you can't just audit once and trust that future changes will never cause the cached position map to desync). And that's on top of essentially having to implement a wholly different algorithm, without unit tests (due to the lack of a framework), rather than just tweaking the algorithm that's already there.

I'm not saying no, I just want to make sure it's worth it before spending what could be an entire week on what could very well amount to a minimal, possibly even negative performance change.
I may be 100% wrong. Maybe it would triple performance in all cases. Maybe I'm misinterpreting you, or maybe it would be a rather trivial change with great value. I just don't know.

That's why I want to give it some extra thought before jumping in headfirst. It just happens that I'm having to wait for the right time to research it: we're approaching the 7th week of this 14-week semester, I have 5 classes, I've been doing bug fixes on Bill Siever's repo, and Dr. Kurmas has suggested some additional projects for me.

I will absolutely look into this when I have the chance. I apologize for not answering sooner, your previous reply was automatically marked as read when I read it but I had to leave for class before I could get a response back to you, and then I forgot to reply when I got home.

@AmityWilder (Author) commented Feb 20, 2026

My bad, I misremembered parts of your response when I made mine. You suggested using specifically counting sort ($O(n+k)$) and I was only thinking of sort in general (often $O(n\log(n))$).

However, I think there is then the problem of how to sort and store segments. I've run into this problem before: they aren't single values, they're a combination of a start and an end. Is a segment that starts earlier but ends later "less than" or "greater than" a segment that starts later but ends earlier? Does it "eat" the contained segment? Does a segment that starts and ends earlier get merged with a segment whose start intersects it but which ends later?

And how do we handle the fact that there isn't just one $x$ lane and one $y$ lane, but one for every gridspace? Do we store them in a 2D array, or reuse two 1D arrays for each moving element? Do we sort the entire non-moving set for each moving element? Filter the ones overlapping their axes? That could be incredibly expensive: transforming an $O(n^2)$ algorithm into an $O(2n(2n+k))$ one.

```
for each element a:
  select elements b
    where b.xrange overlaps a.xrange
    order by b.xrange
  select elements c
    where c.yrange overlaps a.yrange
    order by c.yrange
  if any b or c overlaps a:
    do remaining tests
  else:
    do other tests
```

It might be a better idea to use a quad tree. I have yet to implement one successfully, especially when wires are involved, but I think it would be more likely to improve performance than sorting segments every tick, since a quad tree has inherent locality and positional sorting, and there is empirical evidence that they boost performance in video games. It would be just as much work to implement, however.

@AmityWilder (Author) commented Feb 20, 2026

I've been avoiding your suggestion regarding multithreading, because

  1. I'm not familiar enough with Java to use multithreading reliably.
  2. With a small enough quantity of Elements, asking the OS for extra threads and waiting for one to free up could actually slow the algorithm down. Since JLS is not "industry accepted", as you mentioned in your first reply to this PR, I believe most users will be working with hundreds to thousands of elements, not hundreds of thousands. (I'm not saying this definitively; it's another point I will need to research.)

@anadon (Owner) commented Feb 20, 2026

Segment overlap is actually shockingly easy! The gist is that each start pushes onto a stack, and each end marks a collision from the top of the stack to the related start. Then remove that start and continue pushing, marking, and removing.
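A self-contained sketch of a 1D segment-overlap sweep along these lines, using an active list rather than a strict stack (a common variant of the push/mark/remove scheme described; the `Seg` type is an assumption for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentSweep {
    record Seg(int id, int start, int end) {}

    // Sweep over sorted endpoints with an active set: every segment that
    // starts while others are active overlaps each of them. O(n log n + m)
    // for n segments and m reported overlap pairs.
    static List<int[]> overlaps(List<Seg> segs) {
        record Event(int pos, boolean isStart, Seg seg) {}
        List<Event> events = new ArrayList<>();
        for (Seg s : segs) {
            events.add(new Event(s.start(), true, s));
            events.add(new Event(s.end(), false, s));
        }
        // Sort by position; on ties, process ends before starts so touching
        // endpoints don't count as overlap (flip the order if they should).
        events.sort((a, b) -> a.pos() != b.pos()
                ? Integer.compare(a.pos(), b.pos())
                : Boolean.compare(a.isStart(), b.isStart()));
        List<Seg> active = new ArrayList<>();
        List<int[]> result = new ArrayList<>();
        for (Event e : events) {
            if (e.isStart()) {
                for (Seg other : active) result.add(new int[]{other.id(), e.seg().id()});
                active.add(e.seg());
            } else {
                active.remove(e.seg());
            }
        }
        return result;
    }
}
```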

Quad trees are plausible, but I'm not that familiar with their details. I understand the complications with multi-threading. While they have some great benefits with responsiveness, I'd drop them until you have a more optimal single-threaded solution.

Lastly, I understand that these are more work, but the false rejection on concave geometries is a blocking bug.

If you need some help, guidance, or to be taught some of these algorithms let me know! I understand your class-load. Been there.

@AmityWilder (Author)

I could simply remove the bounding box part for now, if that's what you mean by "false rejection on concave geometries". The bigger speed boost came from the $O(n^3) \to O(n^2)$ change.

If that's not what you mean, could you please clarify?

@AmityWilder (Author) commented Feb 20, 2026

The details of a quad tree are that it's much like a binary tree, but in 2D. You split the usable area into quadrants, then separate elements by "above/below the y-midpoint" and "left/right of the x-midpoint". You keep splitting the space until each leaf holds only a manageable number of elements, leaving you with $O(\log(n))$ complexity when searching for nearby elements ($O(n\log(n))$ to visit each element and find the elements within touching range, as opposed to $O(n^2)$ to filter out all non-overlapping elements).

The trouble I have with them is that wires have a tendency of occupying multiple leaves at once...
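A minimal point quadtree along the lines described, as a hedged sketch (points only; elements with extent, such as wires, would need something like a loose quadtree or storage at the smallest enclosing node, which is exactly the complication mentioned above):

```java
import java.util.ArrayList;
import java.util.List;

public class QuadTree {
    static final int CAPACITY = 4;  // max points per leaf before splitting
    record Pt(int x, int y) {}

    final int x, y, w, h;           // this node's region: [x, x+w) x [y, y+h)
    final List<Pt> points = new ArrayList<>();
    QuadTree[] children;            // null until subdivided

    QuadTree(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }

    boolean contains(Pt p) { return p.x() >= x && p.x() < x + w && p.y() >= y && p.y() < y + h; }

    boolean insert(Pt p) {
        if (!contains(p)) return false;
        if (children == null && points.size() < CAPACITY) { points.add(p); return true; }
        if (children == null) subdivide();
        for (QuadTree c : children) if (c.insert(p)) return true;
        return false;
    }

    // Split into four quadrants and push existing points down.
    void subdivide() {
        int hw = w / 2, hh = h / 2;
        children = new QuadTree[]{
            new QuadTree(x, y, hw, hh),          new QuadTree(x + hw, y, w - hw, hh),
            new QuadTree(x, y + hh, hw, h - hh), new QuadTree(x + hw, y + hh, w - hw, h - hh)};
        for (Pt p : points) for (QuadTree c : children) if (c.insert(p)) break;
        points.clear();
    }

    // Collect points inside the query rectangle, skipping non-overlapping quadrants.
    void query(int qx, int qy, int qw, int qh, List<Pt> out) {
        if (qx >= x + w || x >= qx + qw || qy >= y + h || y >= qy + qh) return;
        for (Pt p : points)
            if (p.x() >= qx && p.x() < qx + qw && p.y() >= qy && p.y() < qy + qh) out.add(p);
        if (children != null) for (QuadTree c : children) c.query(qx, qy, qw, qh, out);
    }
}
```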

@anadon (Owner) commented Feb 24, 2026

> I could simply remove the bounding box part for now, if that's what you mean by "false rejection on concave geometries". The bigger speed boost came from the $O(n^3) \to O(n^2)$.
>
> If that's not what you mean, could you please clarify?

I'll try to properly respond to this later in the week. I'm cognitively loaded myself at the moment and can't give you a satisfying and meaningful response.

@anadon (Owner) commented Feb 24, 2026 via email
