Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing PathHelpers functionality to RelativePath and AbsolutePath in a standardized way. #10

Open
Al12rs opened this issue Aug 17, 2023 · 6 comments

Comments

@Al12rs
Copy link
Contributor

Al12rs commented Aug 17, 2023

Currently IPath is a very anemic interface:

namespace NexusMods.Paths;
/// <summary>
/// Abstracts an individual path.
/// </summary>
public interface IPath
{
/// <summary>
/// Gets the extension of this path.
/// </summary>
Extension Extension { get; }
/// <summary>
/// Gets the file name of this path.
/// </summary>
RelativePath FileName { get; }
}

It covers AbsolutePath and RelativePath. The objective is to add common functionality and then implement the interface also on GamePath from NexusMods.App project.

The goal is to allow common path manipulations on all of these. In particular for Nexus-Mods/NexusMods.App#553

@Al12rs Al12rs self-assigned this Aug 17, 2023
@Al12rs Al12rs added this to MVP Aug 17, 2023
@Al12rs Al12rs moved this to In Progress in MVP Aug 17, 2023
@erri120
Copy link
Member

erri120 commented Aug 17, 2023

I'm guessing you want to add common operations like Combine and GetParent?

@Al12rs
Copy link
Contributor Author

Al12rs commented Aug 17, 2023

Walk, Parts, DropFirst, RelativeTo, etc...

@erri120
Copy link
Member

erri120 commented Aug 17, 2023

Depending on the methods you want to add, it's important to choose the correct return type. If we take Combine as an example:

struct AbsolutePath
{
  AbsolutePath Combine(RelativePath part);
}
struct RelativePath
{
  RelativePath Combine(RelativePath part);
}

If you want to add Combine to the interface, you'd have two options:

  1. use generics
interface IPath<TPath> where TPath : IPath, struct
{
  TPath Combine(RelativePath part);
}
  1. return the interface
interface IPath
{
  IPath Combine(RelativePath part);
}

The first option would result in identical runtime characteristics as the original code. However, the downside is having to deal with generics, which are infectious by design and require every dependent to have the same restrictions.

The second option doesn't have generics and seems like the way to go. However, our path types are all value types. A conversion from AbsolutePath to IPath requires boxing of the value type. Our path code is designed to be a low-cost abstraction over raw strings, and I'd like to keep it that way.

@Al12rs Al12rs changed the title Consolidate IPath abstraction, add common methods Add missing PathHelpers functionality to RelativePath and AbsolutePath in a standardized way. Aug 23, 2023
@Al12rs
Copy link
Contributor Author

Al12rs commented Aug 23, 2023

Using paths was a pain the last two times I had to deal with them, there is lots of missing useful functionality and lots of unexpected differences between AbsolutePath and RelativePath.

There are 3 levels of things that can be done here:

1. Adding missing functionality from PathHelpers to both

This can be done without breaking existing API, simply adding new methods.
Ideally using a common interface, like IPath<TPath>, so common functionality is defined in a shared space and functionality is accessible from FileTreeNode.

This is minimum effort high reward, but to not break existing api some inconsistencies will remain (stuff like FileName returning string instead of RelativePath and coexisting with Name).

2. API breaking changes to cleanup inconsistencies:

  • Fix inconsistencies between the two paths classes
  • Remove some methods that use strings (old StartsWith, EndsWith etc.)
  • General cleanup

3. Change how AbsolutePath is implemented internally

Currently AbsolutePath internally stores two strings, a Directory and a FileName.
This is only done for AbsolutePath, RelativePath or GamePath don't work like this at all.

A potentially better approach could be to instead have it implemented as:
RootPart + RelativePath. This is how GamePath and HashRelativePath also work (GameFolderType + RelativePath, Hash + RelativePath .

One of the main differences and difficulties with AbsolutePath is the fact that it requires FileSystem.OS to be able to correctly handle the Root part, depending on the current OS.
Keeping the root component separate could help in this regard.

It would also allow a lot of the common functionality of paths to easily be reused since the majority of the AbsolutePath is already a RelativePath.

If we want the performance advantages of keeping FileName separate, then we can still to that, simply in RelativePath, which would in turn propagate the change to all other path types since they are all implemented as RootComponent+ RelativePath.

@Al12rs
Copy link
Contributor Author

Al12rs commented Aug 23, 2023

From @Sewer56 on discord regarding point 3.

It's designed that way to maximize runtime efficiency.

Initially the path system used was from Wabbajack, where in more extreme cases, there was a need to work with 250,000+ file paths at a given moment.

The App had some additional needs that WJ did not originally have; especially the need to handle paths cross platform (e.g. a Windows tool can return a path with backslashes, but Linux uses forward slashes).

I've also uncovered (on Slack), that the design we used at the time was not very efficient speed wise; while it would save on memory to some degree, many operations were marginally slower.

So Tim, Seb and I discussed and created a spec for efficient handling of paths in a cross-platform manner, which is here https://github.com/Nexus-Mods/NexusMods.App/blob/main/docs/decisions/backend/0003-paths.md.

At the time; we didn't yet know what kinds of file counts we would be handling (as we now know; not as many); so I've optimized the paths code to support simultaneous handling of as many file paths as possible. Namely a workflow where we scan for files on disk; and want to do something with them as fast as possible in an efficient manner.

The reason the paths are split into Directory+FileName is down to how the Operating Systems' search APIs work (e.g. NtQueryDirectoryFileEx on Windows). You query a directory, and you get a list of files and/or directories contained within that folder.

Therefore, you can share a single string instance for all files within 1 directory, and only the file names have their own string instances; so if there's 100 files in 1 folder, the folder string is only stored once.

@Sewer56
Copy link
Member

Sewer56 commented Aug 23, 2023

One of the main differences and difficulties with AbsolutePath is the fact that it requires FileSystem.OS to be able to correctly handle the Root part, depending on the current OS.

As a quick note.

Originally (before you joined onboard) we did actually have AbsolutePath contain just the directory+filename tuple in its structure.

There was a concern however (think @halgari raised it), that if an API consumer makes use of those paths, the operation should ideally run on the 'true' source of the path. More specifically, there existed a risk that someone could create an AbsolutePath meant to be used with e.g. InMemoryFileSystem (we use this for testing), pass it around some methods and try to consume it with RealFileSystem. Such an error, could in some situations or contexts be potentially hard to catch.

While at the current moment we use RealFileSystem pretty much exclusively in the App; there's a very likely chance that for the likes of e.g. Running Tools on Linux; we might wind up alternative instances of IFileSystem that offer a view inside of a WINEPREFIX, for example.

As @halgari was working with code that relied heavily on paths; they were swinging both ways, as there are benefits to both a design where IFileSystem is separate (from an efficiency standpoint), and one where it is part of the structure. They tried a little bit of both ways and we wound up including it as part of the struct.

I somehow didn't think of it at the time; but in retrospective, I can't see why we can't have both to reap benefits of both designs. Much like the Base Class Libraries have IEnumerable and IEnumarable<T>, I don't see why we can't have AbsolutePath and AbsolutePath<T>.

@Al12rs Al12rs removed this from MVP Apr 8, 2024
@Al12rs Al12rs removed their assignment Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants