Skip to content

PYTHON-5306 - Fix use of public MongoClient attributes before connection #2285

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 21, 2025

Conversation

NoahStapp
Copy link
Contributor

@NoahStapp NoahStapp commented Apr 15, 2025

The public attributes of MongoClient are topology_description, nodes, address, primary, secondaries, and arbiters. Of these, only topology_description and nodes do not already perform network I/O. The rest can safely ensure connection as part of their execution without modifying their previous behavior.

@@ -1205,6 +1206,18 @@ def topology_description(self) -> TopologyDescription:

.. versionadded:: 4.0
"""
if self._topology is None:
Copy link
Contributor Author

@NoahStapp NoahStapp Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the client has not connected yet, return a TopologyDescription with placeholder servers derived from the seeds passed to the client's constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure the returned TopologyDescription has the same _topology_id. Otherwise it could be confusing because _topology_id will change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you give an example of what topology_description looks like in the connect=False srv case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<TopologyDescription id: 67ffb68a6e12ca176991f521, topology_type: Unknown, servers: [<ServerDescription (('test1.test.build.10gen.cc', None), 27017) server_type: Unknown, rtt: None>]>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include an example of this new/old behavior in the Jira ticket when closing.

@@ -1218,6 +1231,8 @@ def nodes(self) -> FrozenSet[_Address]:
to any servers, or a network partition causes it to lose connection
to all servers.
"""
if self._topology is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the docstring defines, we expect nodes to return an empty set if the client has not yet connected.

@@ -1205,6 +1206,18 @@ def topology_description(self) -> TopologyDescription:

.. versionadded:: 4.0
"""
if self._topology is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure the returned TopologyDescription has the same _topology_id. Otherwise it could be confusing because _topology_id will change.

{("unknown", None): ServerDescription(("unknown", None))},
)
await client.aconnect()
self.assertEqual(await client.address, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a tests for address and all the helper helpers without calling aconnect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was mistakenly included: our test SRV URIs aren't real servers, so we can't actually connect to them.

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a larger restriction: the new test needs to be run against a real SRV server, or have accurate responses mocked out, to ensure correct behavior.

For all non-SRV URIs, we immediately create a Topology, making this test on such connections useless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have real SRV addresses which resolve to localhost:

>>> client = MongoClient("mongodb+srv://test1.test.build.10gen.cc", tlsInsecure=True)
>>> client.topology_description
<TopologyDescription id: 68000f362bab7f34cc9902c2, topology_type: ReplicaSetWithPrimary, servers: [<ServerDescription ('localhost', 27017) server_type: RSPrimary, rtt: 0.001482416999351699>]>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot that SRV implies replica + SSL, thanks!

# arbiters causes client to block until connected
await c.arbiters
self.assertIsNotNone(c._topology)
c = await self.async_rs_or_single_client(connect=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this kind of test it's a good idea to close each client once it's no longer needed to avoid building up unclosed clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async_rs_or_single_client already automatically closes all clients at the end of the method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know but I'm saying it's better to close them explicitly to avoid building up unclosed clients.

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't currently use that pattern in our testcases, relying on the helper client creation methods to clean up clients for us. If it's better to avoid relying on those, should we rethink our approach?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying for tests that specifically open multiple/many clients, it's good practice to add an explicit call to close().

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not trying to pushback stubbornly here, just want to keep our codebase consistent where possible since we don't currently explicitly close clients created with the helpers. Do we have a measurable improvement or benefit from explicitly closing clients that will be cleaned up by the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not strictly a performance issue. It's also about easier debugging. Leaving open clients around unnecessarily like this makes debugging harder because each client has many threads/tasks causing noise.

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense! I'll open a ticket to make sure we follow the same pattern elsewhere too.

@@ -1205,6 +1206,18 @@ def topology_description(self) -> TopologyDescription:

.. versionadded:: 4.0
"""
if self._topology is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can you give an example of what topology_description looks like in the connect=False srv case?

@NoahStapp NoahStapp requested a review from ShaneHarvey April 16, 2025 14:20
None,
None,
None,
TopologySettings(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing certain settings that are publicly accessible, like TopologyDescription.heartbeat_frequency.

In the ticket I suggested making the MongoClient constructor always create the Topology instance, and then mutate it later once the SRV info is resolved. Did you consider that approach? It would avoid the problem of attempting to mock out these attributes and avoid the type-ignores since everything would always be non-None.

(It could be that this approach is a better fit, I just want to know if an alternative was considered.)

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with the approach you suggested, but switched to the current implementation for a few reasons:

  1. topology_description is the only public attribute that requires changes of meaningful size. The rest either already do IO or are nodes, which has a trivial workaround.
  2. We don't currently mutate Topology instances or TopologySettings instances directly. To do so, we'd need to make more code changes and be confident that we haven't overlooked any problematic edge cases.
  3. I foresee more potential bugs/confusion caused by introducing a new state specifically for SRV Topologies where we know that the Topology we currently have is not actually correct, and will be mutated upon SRV resolution.

This approach has fewer changes that could cause errors, especially since it isolates changes to resolving the specific issue at hand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, could you update TopologySettings() here to be more accurate then?

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TopologySettings sets defaults for each attribute, is passing no arguments to it invalid?

Copy link
Member

@ShaneHarvey ShaneHarvey Apr 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said above, client.topology_description.heartbeat_frequency is one example of an attribute we need to set here to avoid heartbeat_frequency appearing to change from one value to another. There may be others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

>>> client = MongoClient(..., connect=False, heartbeatFrequencyMS=99999)
>>> client.topology_description.heartbeat_frequency
10
>>> client.admin.command("ping")
...
>>> client.topology_description.heartbeat_frequency
99.999

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood what you meant. The example makes it clear, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we always initialize a TopologySettings in the constructor. If SRV is enabled, we create a new one after resolution but keep the existing topology id.

@@ -849,6 +849,54 @@ async def test_init_disconnected_with_auth(self):
with self.assertRaises(ConnectionFailure):
await c.pymongo_test.test.find_one()

@async_client_context.require_replica_set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can remove require_replica_set.

Copy link
Contributor Author

@NoahStapp NoahStapp Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. "mongodb+srv://test1.test.build.10gen.cc" returns 2 hosts so it can't connect to standalone. It can connect to Mongos though so we should run this test on both replica set and sharded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require_no_standalone, then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Load balancers probably dont work here so it would need to be @require_no_standalone+@require_no_load_balancer+@require_no_load_balancer

# arbiters causes client to block until connected
await c.arbiters
self.assertIsNotNone(c._topology)
c = await self.async_rs_or_single_client(connect=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying for tests that specifically open multiple/many clients, it's good practice to add an explicit call to close().

@@ -767,6 +768,7 @@ def __init__(
self._timeout: float | None = None
self._topology_settings: TopologySettings = None # type: ignore[assignment]
self._event_listeners: _EventListeners | None = None
self._initial_topology_id: Optional[ObjectId] = None # type: ignore[assignment]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove the type ignore here?

@NoahStapp NoahStapp requested a review from ShaneHarvey April 17, 2025 14:19
@@ -1205,6 +1206,18 @@ def topology_description(self) -> TopologyDescription:

.. versionadded:: 4.0
"""
if self._topology is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include an example of this new/old behavior in the Jira ticket when closing.

None,
None,
None,
TopologySettings(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, could you update TopologySettings() here to be more accurate then?

@NoahStapp NoahStapp requested a review from ShaneHarvey April 18, 2025 14:33
ShaneHarvey
ShaneHarvey previously approved these changes Apr 18, 2025
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a changelog entry and call it a day!

sleepyStick
sleepyStick previously approved these changes Apr 21, 2025
@@ -10,6 +10,7 @@ Version 4.12.1 is a bug fix release.
- Fixed a bug that could raise ``UnboundLocalError`` when creating asynchronous connections over SSL.
- Fixed a bug causing SRV hostname validation to fail when resolver and resolved hostnames are identical with three domain levels.
- Fixed a bug that caused direct use of ``pymongo.uri_parser`` to raise an ``AttributeError``.
- Fixed a bug that could cause public ``pymongo.MongoClient`` and ``pymongo.AsyncMongoClient`` attributes to raise errors when accessed before a connection was established.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's describe it a bit more specifically, something like:

Fixed a bug where clients created with connect=False and a "mongodb+srv://" connection string
could cause public ``pymongo.MongoClient`` and ``pymongo.AsyncMongoClient`` attributes (topology_description,
nodes, address, primary, secondaries, arbiters) to incorrectly return a Database, leading to type
errors such as: "NotImplementedError: Database objects do not implement truth value testing or bool()".

- Fixed a bug where clients created with connect=False and a "mongodb+srv://" connection string
could cause public ``pymongo.MongoClient`` and ``pymongo.AsyncMongoClient`` attributes (topology_description,
nodes, address, primary, secondaries, arbiters) to incorrectly return a Database, leading to type
errors such as: "NotImplementedError: Database objects do not implement truth value testing or bool().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing trailing " for "NotImplementedError.

@NoahStapp NoahStapp requested a review from ShaneHarvey April 21, 2025 19:47
@NoahStapp NoahStapp merged commit 412d000 into mongodb:master Apr 21, 2025
29 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants