- 
                Notifications
    You must be signed in to change notification settings 
- Fork 2.2k
Description
Describe the Bug
We observe the following error across ~5% of all Puppet agent runs in our organization against our Puppet 7.x masters when we set multithreaded: true, with any value for max-active-instances (even 2):
ERROR [qtp735649759-289] [puppetserver] Puppet Server Error: Evaluation Error: Error while evaluating a Type-Name, Internal Error: Attempt to redefine loader named '<foo> private' (file: /path/to/our/modules/foo/manifests/init.pp, line: 12, column: 18) on node clientnode.example.com
The error disappears if we disable threading, or if we hand-insert thread synchronization into the offending method or its caller. It is not consistent to any particular client node, and nodes on which agent runs throw the error will work fine the other ~95% of the time.
The source of the error, add_loader_by_name in puppet/lib/ruby/vendor_ruby/puppet/pops/loaders.rb, is fairly straightforward, adding a new key to an instance variable and throwing the error we observe if the key is already present:
  def add_loader_by_name(loader)
    name = loader.loader_name
    if @loaders_by_name.include?(name)
      raise Puppet::ParseError, _("Internal Error: Attempt to redefine loader named '%{name}'") % { name: name }
    end
    @loaders_by_name[name] = loader
  end
Interestingly, adding Puppet.info or Puppet.warn output before the if statement, to debug the call stack and thread state, causes the issue to disappear. I'd theorize that whatever output synchronization the logging system uses adds sort of back-door thread safety. Adding logging inside the if statement doesn't do this, which makes sense.
The calling method in all instances of this error we've logged to date has been []. If we add naïve Ruby-native synchronization to that method as below, our problem disappears:
  def [](loader_name)
    loader = @loaders_by_name[loader_name]
    if loader.nil?
      # Unable to find the module private loader. Try resolving the module
      @my_semaphore.synchronize {
        loader = private_loader_for_module(loader_name[0..-9]) if loader_name.end_with?(' private')
        raise Puppet::ParseError, _("Unable to find loader named '%{loader_name}'") % { loader_name: loader_name } if loader.nil?
      }
    end
    loader
  end
That's presumably not a correct-for-this-codebase fix by itself, but does point in the general direction of "it's threads stepping on each other, add a sempahore somewhere."
Environment
- Puppet Server 7.x (various, up to and including 7.15.0)
- CentOS 7.9.2009, AlmaLinux 8.9
- Sanitized puppet.conf and puppetserver.conf attached
Additional Context
It's not clear why our ecosystem in particular triggers this issue. While we're a medium-large organization, our Puppet masters are spread around to various regional collections of nodes. The one on which we did the above debugging manages only about a dozen nodes. The issue manifests on all of them.
Our module ecosystem is old growth forest, with a lot of interwoven dependency. It's possible that the trigger here is multiple references to the same dependent object being resolved in parallel.
It's of note that we have a number of modules with names of the form *_private (in a search path distinct from our "main" module repo). But this seems to occur with any module and does not occur with threading disabled, so is unlikely to be some kind of namespace collision.