Conversation
|
@yuki24 could we please merge this? |
|
We should warn it before raising an exception. This change breaks the user application. |
|
I agree with @hsbt. The current functionality has been like this forever. Besides that, I don't think this is the correct place to fix it. These stream handlers are used for building ASTs. I think it would be more appropriate to fix this where the AST is converted to a Ruby object. Would you mind fixing it in that place? Here's a starting point: diff --git a/lib/psych/visitors/to_ruby.rb b/lib/psych/visitors/to_ruby.rb
index a922f90..9ca2d9a 100644
--- a/lib/psych/visitors/to_ruby.rb
+++ b/lib/psych/visitors/to_ruby.rb
@@ -348,6 +348,8 @@ module Psych
end
val = accept(v)
+ raise Psych::Exception if hash.key? key
+
if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
case v
when Nodes::Alias, Nodes::MappingThe above patch makes the test in this PR pass, but there is another test in the suite that fails and I'm not sure why. |
|
Btw, adding this check is going to have a performance impact on loading hashes. It might be nice if there is a way we can let people who care more about performance and less about being true to the spec. Maybe introduce a |
|
Shouldn't this be enforced at the libyaml level? For the JRuby extension, we could enable this strict behavior by setting options to the SnakeYAML library we use: https://www.javadoc.io/doc/org.yaml/snakeyaml/1.20/org/yaml/snakeyaml/LoaderOptions.html It seems odd to enforce this at the Ruby level when it's part of the YAML spec. |
|
@headius does that actually impact the AST, or just the loader? I don't know SnakeYaml very well, but it doesn't look like we're using We could also do this while the AST is being created, but this PR seems to do post-processing on the AST and that is definitely not the right place to do it regardless. |
|
@tenderlove That's a good point... it appears the LoaderOptions are applied when loading YAML into an in-memory graph of Java objects (maps, lists, etc). From what I can read in the code, it's applied at the point the MappingNode gets processed into a Map. They don't do a separate scan of the node's contents. Instead they check each key inserted into the resulting map to see if something else was there already. |
Ok. So they're doing basically what I suggested to do. IOW the events and AST don't care that there are duplicate keys, but the thing that creates objects does. |
|
@tenderlove so are the next steps…
? What about @hsbt’s concern? Strict mode seems reasonable, but might expand the scope of this PR a lot. I’m happy to do some more work on this but I could use some more design direction since it’s my first time working in this repo. |
|
@getaaron yep, I think those are the next steps. The initialize method for the |
|
I forgot to this for Psych 5. It's good time to show deprecate warnings. |
The YAML spec is extremely clear that duplicate keys are forbidden. A few examples:
and:
and:
and:
For these reasons, psych should raise an exception when trying to parse YAML with duplicate keys. It should be considered a bug that this ever worked, so I don't think we need to worry about backwards compatibility to support out-of-spec YAML parsing.
This PR adds functionality to raise an exception on duplicates, and a test for it. Closes #79.