Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort out mapgen map dependency violation mess #80157

Open
PatrikLundell opened this issue Mar 14, 2025 · 6 comments
Open

Sort out mapgen map dependency violation mess #80157

PatrikLundell opened this issue Mar 14, 2025 · 6 comments
Labels
<Suggestion / Discussion> Talk it out before implementing

Comments

@PatrikLundell
Copy link
Contributor

Is your feature request related to a problem? Please describe.

Code used by mapgen* is using code intended for exclusive reality bubble map usage, resulting in bugs. This needs to be investigated, corrected, and controlled.

Rumors and hints indicate the lead designers are firmly against supporting any map activity outside "mapgen" that doesn't operate fully on THE reality bubble map, and that all code designed to operate on the reality bubble map can and should assume this is the map used, and thus can freely call such operations without any care as to the map used, as any map usage will fetch the reality bubble wherever needed.

However, the current code base is a spaghetti tangle of call chains where "mapgen" calls end up calling reality bubble dependent code, resulting in more or less subtle bugs.

mapgen*: Code that generates new OMT maps, as well as code that modify existing OMTs. The former is basically all generation of OMTs, typically triggered by the reality bubble touching and partially overlapping with the OMT to be generated as the reality bubble moves (full overlap only happens on teleportation, game creation, and loading of saves). The latter is things like camp upgrades (player and NPC ones alike), which may or may not overlap the reality bubble (and may overlap only partially if there is overlap).

It can be noted that the comparatively new explosion handling that only partially (or not at all) overlaps the reality bubble violates the rumored lead designer objectives.

Solution you would like.

This is what I believe will have to be done to realize the rumored lead designer directives, not what I would like:

  • Perform a full trace of all call chains originating in any mapgen scenario, be they JSON or code derived, and note every operation involved.
  • Go through the list generated by the previous step and determine whether operations use the mapgen map or the reality bubble map (or none).
  • For the operations incorrectly using the reality bubble map, determine whether its usage should be banned from usage in the mapgen context, or whether it should be updated to use the correct map (possibly through a split into a map aware and a reality bubble version).
  • Perform all the required changes.
  • Devise some kind of documentation of which operations are permitted to be used in the mapgen context, how to deal with new mapgen functionality desires for usage of reality bubble only functionality (including transitive dependence).
  • Devise and implement a strategy for periodic checks that the call chains are actually using the correct map throughout, including handing of violations.

What I would actually like is for the whole code base to be map aware so all that mess could be avoided.
I will not hide that it would also allow for future expansion of the game to allow for more or less temporary additional reality bubbles, the latter of which I have until very recent rumors thought was desirable for the lead designers.
However, I still think a code base without easily broken implicit assumptions is less error prone and easier to maintain than one building on implicit contracts you're supposed to somehow know.

Describe alternatives you have considered.

Doing nothing except trying to track down and patch visible bugs (while missing most of the non obvious ones, such as things not happening because the functionality is cut short because the target location is outside the reality bubble bounds).

Additional context

I'm fairly sure parts of the creature generation code is broken, because it's reality bubble based despite almost all creature generation is performed by mapgen. That's an example of the things that should be addressed.

@PatrikLundell PatrikLundell added the <Suggestion / Discussion> Talk it out before implementing label Mar 14, 2025
@kevingranade
Copy link
Member

I'm sorry I let this linger, I have not had sufficient time/attention to weigh in on this recently, but it's gone on long enough I should have MADE time before now.

My primary objection to passing map pointers around to indicate which map to operate on is that it seems wildly error-prone to determine and maintain that chain of calls, as we saw previously, I believe when dealing with logic to place and merge vehicles. In a more hollistic sense, making all code that acts on the map responsible for being aware of which map to act on is creating a much larger opportunity for random parts of the code to get it wrong. We want as few pieces of code as possible to be aware of this distinction and just do their little thing.

At the limit you're talking about essentially removing the concept of a map singleton entirely and plumbing map pointers into every piece of code that needs to invoke a map instance and churning massive amounts of code, in order to support a relatively peripheral use case, though at least at that ultimate point it becomes somewhat less error prone, only as error prone as is implied by manually passing around a critical pointer in several thousand places in the code.

A potentially less invasive solution is to allow applying and removing overrides to the value returned by get_map() so that when a piece of code wants to act on not-the-current-map it does:

push_current_map( some_map_ptr );
do_stuff_to_map();
pop_current_map();

Then in get_map() we just reference the top of a stack of map pointers instead of grabbing the singleton map pointer. This would be nasty if we were worrying about threading, but we aren't.
This can be made more robust through constructs like:

act_on_new_map( targetX, targetY, [](){ do_stuff_to_map() } );

to avoid manual push/pop semantics, so please don't get bogged down in implementation details.

A larger scope of issue here is that the map class is wildly overscoped and in dire need of splitting into multiple classes in order to make any kind of multiple-maps-in-existence features feasible, I referenced this in #22721 and I could swear I had a more detailed writeup elsewhere, but I'm not finding it at the moment. As long as this issue is unaddressed, it is extraordinarily expensive to instantiate and maintain maps other than the main one, and the changes to fragment the map class are massive.

I will not hide that it would also allow for future expansion of the game to allow for more or less temporary additional reality bubbles,

I reflexively push back on this because it's frequently suggested in contexts where it doesn't scale, like "just leave a reality bubble running to process a fire" or "have a reality bubble track NPCs as they move around on the map". It's not out of the question but it needs a good and feasible use case, which hasn't come up.

@PatrikLundell
Copy link
Contributor Author

Thanks for weighing in.

Essentially redirecting the current map pointer for the duration of a call chain would work, as long as it can essentially be guaranteed this is restored no matter what (throwing exceptions, early returns, etc.). It can be noted that this can be done where the differing map context is set up, so it probably shouldn't be that many places.

I completely agree the map class operations have far too large a reach and that the logic ought to be restructured by someone capable (which I don't think is me), and they definitely should just use the current get_map() down the call chain if we want the functionality to be reasonably safe from failing when modified.

It can be noted that we already have the expense of maintaining multiple maps concurrently when mapgen is present, although these maps are (fortunately) small/tinymaps, so I don't really see a map redirection logic changing much in that department: as far as I can see it should cost about as much as it does now, or marginally less, if we need to juggle fewer map references (although we'd have to fetch them instead). When it comes to full size maps, yes, they're quite expensive to set up, as the explosion handling shows.

I completely agree that adding a lot of active reality bubbles will have to be out of the question for at least the medium term, due to the amount of processing required, and how we already have cases where the processing gets bogged down by things such as fire processing.
The use case I do have that might be worth aiming for in the medium term are various forms of telepresence, in which case we'd need two full sized maps operating for the duration of the session. What I mean by telepresence is situations where the PC is in one location, but acting in another through magical or technological means, such as scrying (passive observation from a potentially invisible, but typically movable focus), or remote operation acting in an area (e.g. magic possession (which adds additional issues if it's not a supported shape), or remotely operation of a "vehicle" which, again, may cause shape issues). The reason you can't just move the reality bubble is that this would make the actual PC invulnerable for the session, as the things that would normally cause the PC to snap out of the immersion aren't processed (and it's a fairly common trope that mages have to be guarded during such sessions, as well as similar to real life sniper situations, where they need a team to protect them and keep track of the situation while their focus is at the other end of a scope).

The fire argument for an extra reality bubble doesn't work, because fire ends up reaching OMT boundaries, with weird consequences if you try to handle it, up to and including the no extra bubble situation where the PC getting burnt to death when the reality bubble hits the boundary of a fire and it then suddenly tries to catch up the lost time and sweeps through the meadow the character is in.

One issue I haven't mentioned that needs to be handled regardless and probably isn't sufficiently handled currently is map cache synchronization. Most mapgen processing avoids things that affect caches, but not all of it. There's a need to ensure cache affecting actions are propagated to all overlapping maps (currently this would be a mapgen map partially overlapping with the reality bubble map, which is the situation we typically have when the reality bubble shifts around). I think this part of the issue needs a fair bit of reviewing as well.

@kevingranade
Copy link
Member

Essentially redirecting the current map pointer for the duration of a call chain would work, as long as it can essentially be guaranteed this is restored no matter what (throwing exceptions, early returns, etc.). It can be noted that this can be done where the differing map context is set up, so it probably shouldn't be that many places.

Yea ideally most use cases can be wrapped in a helper that handles this in just one place and the exceptions that don't fit into that model need to be extra careful.

The use case I do have that might be worth aiming for in the medium term are various forms of telepresence, in which case we'd need two full sized maps operating for the duration of the session. What I mean by telepresence is situations where the PC is in one location, but acting in another through magical or technological means, such as scrying (passive observation from a potentially invisible, but typically movable focus), or remote operation acting in an area (e.g. magic possession (which adds additional issues if it's not a supported shape), or remotely operation of a "vehicle" which, again, may cause shape issues).

So I actually did a version of this in #6951 and it was gross and bad and leaked memory, but it worked surprisingly well. It just needs probably tens of thousands of lines of cleanup to make the couple hundred line prototype not break everything. That ALSO patched into the renderer to overlay "here" and "there" onto the same display.

One issue I haven't mentioned that needs to be handled regardless and probably isn't sufficiently handled currently is map cache synchronization.

Yes this is hairy and bad, and is part of that "break map into a dozen pieces" thing where what we really want is e.g. a flavor of map-like thing that just does data access without the caches so we can just edit some map data and wrap up.

@LyranRenegade
Copy link
Contributor

The game has become notably more unstable during last half year in which I haven't touched it. Random crashes without debug messages that don't seem to repeat themselves usually on exit but sometimes on game-start.. saving, etc..
I cannot offer more than my moral support for this effort but you guys surely have it.

@PatrikLundell
Copy link
Contributor Author

@LyranRenegade: Unfortunately, the issues here are probably not related to the recent issues to any large extent. Instead, those are probably caused by recent changes to other functionality, where unanticipated side effects cause bugs.

  • One recent set of crashes is caused by EoCs (fairly new functionality that's exploding in use) that fail to find either a subject or an object creature (alpha and beta talker, respectively) and then attempts to use a null pointer to call a function of this null object. One example is an EoC that intends to do something when a creature damages another, but then blows up when the damage is caused by e.g. an electric field rather than a creature. This kind of errors can be handled by updating each failing EoC individually to check for the presence of the creature before accessing its information, but that's a churn. It might be possible to devise some way of catching the failure case so it fails rather than blows up in a segfault crash, but that hasn't been done (yet).
  • Another set of crashes is caused by "nowhere" item locations, in particular regarding item wielding, where code blows up when trying to process a nullitem rather than a null pointer which the code expects. I assume this is caused by recent changes somewhere.
  • The weird crashes and failures is relation to saving/loading is probably cause by recent changes, but I'm not aware of causes having been tracked down.

It's the nature of this project to get new/updated functionality followed by crashes and weird behavior when things settle. It's not desirable, but it's something you get when a large number of people with varying skill levels work in parallel and without much coordination on such a large code base.
(I have been the cause of this kind of issues as a result of expanding mapgen functionality to support 3D, as an example of changes that can cause this).

@PatrikLundell
Copy link
Contributor Author

I've made a hack (since I haven't seen any activity from someone qualified), and the results are rather horrible:

I created a swap_map class whose task is to take a "temporary" map and a context string and swap out the current map for this one until the swap_map object expires. I then had this object make use of a current_map class whose only instance was created in the game object and had this object store the current map. The swap_map objects then reference the instance in the game object. I then had the game object set up the current_map object to contain the 'm' map object together with the context string "Official reality bubble" as part of the initiation, and finally had game::get_map() return the current_map object instead of 'm'.

This seemed to work.

I then used a swap_map object in map::loadn where the mapgen smallmap is created and added debug output to the swap_map creator and destructor to trace when maps were swapped in and out, which worked as expected (at least appeared to do so), i.e. a lot of mapgen was done as the PC moved in a vehicle.

Note that this is far from the only place where we'd need such an object, but it was just the most obvious place for a test. In a real implementation it would be needed in most locations where map objects are created.

And then to the horrible part, namely adding debug logging to get_map(), where I ended up filtering out the official reality bubble ones, due to the enormous volume. Despite that, I had almost 4000 get_map calls for the loadn mapgen map in between the swap to it and the swap back. Note that these calls currently use the reality bubble map when they probably shouldn't (but I haven't seen anything obviously different).

It can be noted that there are a number of get_map() calls that are explicitly intended to get the reality bubble map, so those would have to be identified and shunted to a separate operation (these are to check if the map used is the reality bubble one in order to do mapgen specific processing or to carry over map cache changes to the reality bubble. There are also a number of cases of checks against the reality bubble to determine whether to generate sound and visual effects (which are far from the majority of the cases where it's relevant and should be checked),

Another problematic issue is that there are many hundreds of usages of 'm' directly, and these should all need to be rerouted to either get_map() or a new get_official_map() operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
<Suggestion / Discussion> Talk it out before implementing
Projects
None yet
Development

No branches or pull requests

3 participants