When I run batch inference on a list of image files using the method predict_images, some clearly invalid SMILES strings occasionally appear in the results. The following is an example:
CC1C(c2ccccc2)=C([Si](C)(C)C)CC1=C123(C)CC456789%10%11%12%13%14%15%16%17%18%19%20%21C%22C%23C4%24CC%2354C5C6C%22%2446%23%25%26%27%28%29%30%31%32%33C57CC684C5C4678C4%22%24(C%34%35%36(C%37%38%39%40C%41%42%43%44(C4%34%37%239C49%23%34C%35%38%41%25%10%37C%10%25%35%38%41C%394%42%26%11%45C%374%11%26%39%42C9%43%10%27%12%37%46C%459%10%12%27%43%47C%23%44%254%28%13%45%48%49C%374%13%23%25%28%44C%34%35%119%29%14%37%50%51%52C%46%459%11%14%29%34%35%53C4%45%46%54%55%56%57%58%59%60%61C%48%374%62%63%64%65%66%67%68C9%37%48%69%70%71%72%73%74%75%76C%49%50%459%77%78%79%80%81%82%83C%38%26%10%13%114%30%15%45%49%50C%51%46%374%10%11%13%15%26%30%38%84C%12%23%14%629%37%46%51%85%86%87%88C%54%489%12%14%23%62%89%90%91%92%93C%29%63%774%48%54%94%95%96%97%98%99C%52%55%69%45%374%29%63%77%(100)%(101)%(102)%(103)C%78%109%37%45%52%55%69%(104)%(105)%(106)%(107)C9%10%78%(108)%(109)%(110)%(111)(C%(112)%(113)%(114)%(115)C9%(116)%(117)%(118)C%(112)%109%(119)C%(113)%(116)%78%10C%(114)%(117)9%(108)C%(115)%(118)%(119)%10%(109))C%1249%10%78%(108)%(109)%(112)%(113)%(114)%(115)%(116)C%14%374%12%(117)%(118)%(119)%(120)%(121)%(122)%(123)%(124)C%64%79%11%23%29%14%37%(125)%(126)%(127)%(128)%(129)C%56%70%49%46%48%459%11%23%29%64%79%(130)C%65%80%13%624%639%45%46%48%49%56C%57%71%51%54%52%14%12%104%13%62%63%65C%89%94%55%379%(117)%78%10%12%14%51%52%54C%58%72%85%95%69%(125)%45%(118)%(108)9%37%55%57C%(110)%15%90%(119)%(109)%77%114%10%45%58%69%70C%(111)%26%91%(120)%(112)%(100)%23%139%124%10%11C%59%73%86%96%(104)%(126)%46%(121)%(113)%14%459%12%13C%41%39%27%25%34%66%81%30%(101)%29%16%14C%60%74%87%97%(105)%(127)%48%(122)%(114)%51%584%15%16C%42%43%28%35%67%82%38%(102)%64%62%379%17C%61%75%50%88%98%(106)%(128)%49%(123)%(115)%14%52%69%10(C%47%44%53%68%83%84%92%(103)%79%63%55%12%15%18)C%76%93%99%(107)%(129)%56%(124)%(116)%(130)%65%57%54%70%11%13%16)C6%22%36%401%31%19)C7%242%32%20)C583%33%21
The invalid predictions always have the same form: a valid-looking SMILES prefix followed by a long run of ring-closure tokens such as C%(n). When I rerun with the same input list of images, the previously invalid SMILES strings become valid, but other, previously valid strings may become invalid in the same way. When I take out the affected images and test each one individually with predict_image_file, this error never occurs. Why are invalid SMILES strings of this kind not filtered out in postprocessing?
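As a stopgap on my side, the failure pattern described above can be caught with a simple heuristic check on the predicted strings. The following is only a sketch using the standard library: the helper names `max_ring_label` and `looks_degenerate` and the threshold of 30 are my own assumptions, not part of the library, and a proper validity check would parse the string with a cheminformatics toolkit instead.

```python
import re

# Matches parenthesised (%(123)) and two-digit (%12) ring-closure labels.
RING_LABEL = re.compile(r"%\((\d+)\)|%(\d{2})")


def max_ring_label(smiles: str) -> int:
    """Return the largest %nn or %(n) ring-closure label, or 0 if none occur."""
    labels = [int(a or b) for a, b in RING_LABEL.findall(smiles)]
    return max(labels, default=0)


def looks_degenerate(smiles: str, max_label: int = 30) -> bool:
    """Heuristic (assumed threshold): flag strings with very high ring labels.

    The runaway outputs keep opening new ring closures, so their labels
    climb far beyond what ordinary molecules need.
    """
    return max_ring_label(smiles) > max_label


# Example: given a plain list of predicted SMILES strings, collect the
# indices that should be re-run one image at a time.
predictions = ["c1ccccc1", "CC1=C123(C)CC45%10%11C%(100)%(101)C%(130)"]
suspect = [i for i, s in enumerate(predictions) if looks_degenerate(s)]
```

Since the error never occurs for single images, the flagged indices could then be re-run individually with predict_image_file.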