Rare ConnectionError causing playbook fails #367

Open
ghost opened this issue Jul 9, 2024 · 3 comments

@ghost

ghost commented Jul 9, 2024

ISSUE TYPE
  • Bug Report
SOFTWARE VERSIONS
pynautobot

pip freeze | grep pynautobot
pynautobot==2.2.0

Ansible:

ansible --version
ansible [core 2.16.0]
config file = /root/flexgrid-netbuild/ansible.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True

Nautobot:

2.2.6

Collection:

ansible-galaxy collection list | grep nautobot
networktocode.nautobot 5.2.1

SUMMARY

We have runs that make hundreds of API calls to Nautobot. Very rarely, one of these calls fails with a ConnectionError, which makes the entire run appear to have failed and causes reporting issues.

STEPS TO REPRODUCE
- name: Creating prefix records in Nautobot
  when: not temporaryskiprun and upstream_agg_switch is defined
  networktocode.nautobot.prefix:
    api_version: "2.1"
    prefix: "{{ wansubnetusuallyslash29 }}/29"
    location:
      name: "{{ nautobot_site }}"
    namespace: "INTERNET.inet.0"
    state: present
    status: "{{ 'Active' if not awaitingfirstprovision else 'Reserved' }}"
    tenant: "{{ nautobot_tenant }}"
    token: "{{ nautobot_read_write_token }}"
    type: Network
    url: https://{{ nautobot_api_ip }}
    validate_certs: false  # TODO - sort the certs on Nautobot so this isn't required
    vlan:
      name: "{{ inventory_hostname | replace(\"-\", \".\") | lower }}.inner"
      site: "{{ hostvars[upstream_agg_switch].nautobot_site }}"
      tenant: "{{ nautobot_tenant }}"
  register: result_for_tagging
EXPECTED RESULTS
ok: [cpe.xxx.xxx.xxx]

(this is what I see almost every time)

ACTUAL RESULTS

Error scenario (roughly 1 in 1000 calls):

      fatal: [cpe.xxx.xxx.xxx]: FAILED! =>
        msg: |-
          Traceback (most recent call last):
            File "/usr/local/lib/python3.10/dist-packages/ansible/module_utils/connection.py", line 210, in send
              response = recv_data(sf)
            File "/usr/local/lib/python3.10/dist-packages/ansible/module_utils/connection.py", line 79, in recv_data
              d = s.recv(header_len - len(data))
          ConnectionResetError: [Errno 104] Connection reset by peer

          During handling of the above exception, another exception occurred:

          Traceback (most recent call last):
            File "/usr/local/lib/python3.10/dist-packages/ansible/cli/scripts/ansible_connection_cli_stub.py", line 315, in main
              conn.set_options(direct=options)
            File "/usr/local/lib/python3.10/dist-packages/ansible/module_utils/connection.py", line 194, in __rpc__
              response = self._exec_jsonrpc(name, *args, **kwargs)
            File "/usr/local/lib/python3.10/dist-packages/ansible/module_utils/connection.py", line 155, in _exec_jsonrpc
              out = self.send(data)
            File "/usr/local/lib/python3.10/dist-packages/ansible/module_utils/connection.py", line 214, in send
              raise ConnectionError(
          ansible.module_utils.connection.ConnectionError: unable to connect to socket /root/.ansible/pc/c317ea50ac. See the socket path issue category in Network Debug and Troubleshooting Guide

          During handling of the above exception, another exception occurred:

          Traceback (most recent call last):
            File "/usr/local/bin/ansible-connection", line 8, in <module>
              sys.exit(main())
            File "/usr/local/lib/python3.10/dist-packages/ansible/cli/scripts/ansible_connection_cli_stub.py", line 318, in main
              raise ConnectionError('Unable to decode JSON from response set_options. See the debug log for more information.')
          ansible.module_utils.connection.ConnectionError: Unable to decode JSON from response set_options. See the debug log for more information.

Running the exact same command again will result in a successful run.

This happens regardless of whether the API call was going to make any actual change.

@joewesch
Contributor

joewesch commented Jul 9, 2024

I believe using ansible retries should work. Can you try that?

We also have a retries arg on pynautobot, but we don't expose it (outside of the lookup plugin) in lieu of the built-in ansible retry option.
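
For reference, a minimal sketch of what task-level retries could look like on a trimmed version of the task from the report. The `until`, `retries` and `delay` keywords are standard Ansible task keywords; the values 3 and 5 are illustrative assumptions, not recommendations from this thread. Whether the retry loop catches this particular connection-layer failure has not been confirmed here.

```yaml
- name: Creating prefix records in Nautobot
  networktocode.nautobot.prefix:
    url: "https://{{ nautobot_api_ip }}"
    token: "{{ nautobot_read_write_token }}"
    prefix: "{{ wansubnetusuallyslash29 }}/29"
    namespace: "INTERNET.inet.0"
    status: Active
    state: present
  register: result_for_tagging
  # Repeat the task while the registered result is marked as failed.
  until: result_for_tagging is not failed
  retries: 3  # illustrative value (assumption)
  delay: 5    # seconds to wait between attempts (assumption)
```

When every attempt fails, the task still ends as failed, so a genuine outage would continue to surface in reporting.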

@ghost
Author

ghost commented Jul 10, 2024

Thanks. It's undesirable to add this to every single play (as any of them could be vulnerable to the issue), but I'll investigate if we can somehow do it globally.

@joewesch
Contributor

One possible solution would be to let pynautobot read the number of retries from an environment variable (e.g. PYNAUTOBOT_RETRIES).
