Very clearly infeasible SMILES string generated during batch inference

When I run batch inference on a list of image files using the method `predict_images`, some clearly invalid SMILES strings occasionally appear in the results. The following is an example of such a case:

`CC1C(c2ccccc2)=C([Si](C)(C)C)CC1=C123(C)CC456789%10%11%12%13%14%15%16%17%18%19%20%21C%22C%23C4%24CC%2354C5C6C%22%2446%23%25%26%27%28%29%30%31%32%33C57CC684C5C4678C4%22%24(C%34%35%36(C%37%38%39%40C%41%42%43%44(C4%34%37%239C49%23%34C%35%38%41%25%10%37C%10%25%35%38%41C%394%42%26%11%45C%374%11%26%39%42C9%43%10%27%12%37%46C%459%10%12%27%43%47C%23%44%254%28%13%45%48%49C%374%13%23%25%28%44C%34%35%119%29%14%37%50%51%52C%46%459%11%14%29%34%35%53C4%45%46%54%55%56%57%58%59%60%61C%48%374%62%63%64%65%66%67%68C9%37%48%69%70%71%72%73%74%75%76C%49%50%459%77%78%79%80%81%82%83C%38%26%10%13%114%30%15%45%49%50C%51%46%374%10%11%13%15%26%30%38%84C%12%23%14%629%37%46%51%85%86%87%88C%54%489%12%14%23%62%89%90%91%92%93C%29%63%774%48%54%94%95%96%97%98%99C%52%55%69%45%374%29%63%77%(100)%(101)%(102)%(103)C%78%109%37%45%52%55%69%(104)%(105)%(106)%(107)C9%10%78%(108)%(109)%(110)%(111)(C%(112)%(113)%(114)%(115)C9%(116)%(117)%(118)C%(112)%109%(119)C%(113)%(116)%78%10C%(114)%(117)9%(108)C%(115)%(118)%(119)%10%(109))C%1249%10%78%(108)%(109)%(112)%(113)%(114)%(115)%(116)C%14%374%12%(117)%(118)%(119)%(120)%(121)%(122)%(123)%(124)C%64%79%11%23%29%14%37%(125)%(126)%(127)%(128)%(129)C%56%70%49%46%48%459%11%23%29%64%79%(130)C%65%80%13%624%639%45%46%48%49%56C%57%71%51%54%52%14%12%104%13%62%63%65C%89%94%55%379%(117)%78%10%12%14%51%52%54C%58%72%85%95%69%(125)%45%(118)%(108)9%37%55%57C%(110)%15%90%(119)%(109)%77%114%10%45%58%69%70C%(111)%26%91%(120)%(112)%(100)%23%139%124%10%11C%59%73%86%96%(104)%(126)%46%(121)%(113)%14%459%12%13C%41%39%27%25%34%66%81%30%(101)%29%16%14C%60%74%87%97%(105)%(127)%48%(122)%(114)%51%584%15%16C%42%43%28%35%67%82%38%(102)%64%62%379%17C%61%75%50%88%98%(106)%(128)%49%(123)%(115)%14%52%69%10(C%47%44%53%68%83%84%92%(103)%79%63%55%12%15%18)C%76%93%99%(107)%(129)%56%(124)%(116)%(130)%65%57%54%70%11%13%16)C6%22%36%401%31%19)C7%242%32%20)C583%33%21`

The invalid predictions always follow the same pattern: a valid-looking SMILES prefix followed by a long run of `C%(n)` ring-closure tokens. When I rerun with the same input list of images, the previously invalid SMILES string becomes valid, but other previously valid strings may become invalid in the same way. When I take the invalid cases and test each image individually with `predict_image_file`, the error never occurs. I wonder why this kind of invalid SMILES string is not filtered out in postprocessing.
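For illustration, here is a minimal sketch of the kind of postprocessing check I have in mind. It is a standard-library heuristic, not a real validity check and not part of the library: the helper names `ring_closure_stats` and `looks_degenerate` and the threshold of 10 open rings are my own assumptions. It flags a string when ring-closure labels are left unclosed or when an implausible number of rings are open at once, which is the case for the example above (more than 20 labels are opened in a row). A proper parse with a cheminformatics toolkit would be stricter.

```python
import re

# Match bracket atoms first so digits inside them (isotopes, charges, H counts)
# are skipped; otherwise capture a ring-closure label: %(123), %12, or one digit.
_TOKEN = re.compile(r"\[[^\]]*\]|%\((\d+)\)|%(\d{2})|(\d)")


def ring_closure_stats(smiles):
    """Return (max ring labels open at once, labels still open at the end)."""
    open_labels = set()
    max_open = 0
    for match in _TOKEN.finditer(smiles):
        label = match.group(1) or match.group(2) or match.group(3)
        if label is None:
            continue  # bracket atom, carries no ring-closure label
        label = int(label)
        if label in open_labels:
            open_labels.remove(label)  # second occurrence closes the ring
        else:
            open_labels.add(label)  # first occurrence opens the ring
            max_open = max(max_open, len(open_labels))
    return max_open, len(open_labels)


def looks_degenerate(smiles, max_open_rings=10):
    """Heuristic: unclosed labels or too many simultaneously open rings."""
    max_open, unclosed = ring_closure_stats(smiles)
    return unclosed > 0 or max_open > max_open_rings
```

Predictions flagged this way could then be retried one image at a time, since the single-image path has not produced the error for me.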


Very clearly infeasible SMILES string generated during batch inference #42
