URL Scheme Proposal #3520
Comments
Do I understand correctly that with the new system, when someone changes the name of a book, chapter, etc., BookStack will still be able to retrieve (or redirect to?) the proper address with the new trail, thanks to the unchangeable ID? If so, it seems reasonable, as does replacing the integer ID with a more unpredictable one. But then I wonder how such an ID will be generated, because it seems kind of too short for true pseudo-randomization, unless it will perform an additional check that it is not already taken. The visual value of the separator is hard for me to assess, but such a binary question (separator, yay or nay?) could be investigated with a simple survey, or even a survey with multiple possible separators. |
I personally don't see an issue with the current URL scheme itself - I actually use something very similar myself in an open-source project. The first URL component lists the item type, followed by the path to the item - for example:
My biggest pet peeve with the system in BookStack is that a change to a page name or book name changes the URLs used to access it. In a system designed for documentation, where links directly to content are pretty much expected, this is a big issue. My suggestions would be to either:
a) Keep the existing scheme but generate a separate "permalink" based on an identifier (not the DB ID). When a page/chapter/shelf is created, generate a pseudo-random token of suitable length (e.g. ta19en2uan) and allow users to link to this permalink, e.g. /permalink/ta19en2uan, while the rest of BookStack would still use the current scheme. The link would be a flat structure so you don't have to worry about nested pages/chapters etc. When a browser visits a permalink, BookStack could retrieve the item based on the identifier (you could maybe include the item type in the link so you know what type it is, e.g. /permalink/book/ta19en2uan) and then redirect to whatever the current (non-permanent) URL of the item is, so search engines don't see it as duplicate content. This should be a temporary redirect so browsers do not cache the non-permanent link in place of the permanent link. E.g. 302 redirect /permalink/book/ta19en2uan --> /book/my-cool-book
b) Alternatively, keep the existing scheme but whenever a page title changes, store a history of the URLs it has been accessible under. If accessing a URL would return a 404, check the history table to see if it was previously used and redirect to the current URL of the item it was used for. E.g.
|
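As a rough illustration of option (a) in the comment above, the permalink lookup and temporary redirect might look something like the following. This is a minimal in-memory sketch in Python, not BookStack code: the `PermalinkStore` class, its method names, the token alphabet and the token length are all assumptions made up for the example.

```python
import secrets
import string

# Assumed token alphabet and length, chosen for illustration only.
ALPHABET = string.ascii_lowercase + string.digits
TOKEN_LENGTH = 10


class PermalinkStore:
    """Hypothetical in-memory stand-in for a permalink table."""

    def __init__(self):
        # (item_type, token) -> current slug-based URL of the item
        self._current_url = {}

    def create(self, item_type, current_url):
        """Register an item and return its permalink path."""
        while True:
            token = "".join(secrets.choice(ALPHABET) for _ in range(TOKEN_LENGTH))
            if (item_type, token) not in self._current_url:
                break  # token is unused for this item type
        self._current_url[(item_type, token)] = current_url
        return f"/permalink/{item_type}/{token}"

    def rename(self, permalink, new_url):
        """Record the item's new slug-based URL; the permalink never changes."""
        key = self._key(permalink)
        if key not in self._current_url:
            raise KeyError(permalink)
        self._current_url[key] = new_url

    def resolve(self, permalink):
        """Return (status, location); 302 so browsers do not cache the target."""
        key = self._key(permalink)
        if key not in self._current_url:
            return 404, None
        return 302, self._current_url[key]

    @staticmethod
    def _key(permalink):
        """Split '/permalink/<type>/<token>' into (type, token), else None."""
        parts = permalink.strip("/").split("/")
        if len(parts) != 3 or parts[0] != "permalink":
            return None
        return parts[1], parts[2]
```

With this shape, renaming an item only updates the stored target, so a previously shared permalink keeps resolving to wherever the item currently lives.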
Personally, I really like the proposed change. Sure, the URLs may be slightly ugly but it solves the very real problem of breaking links if and when the name of a document is updated. It's a URL, not a work of art. It doesn't have to be beautiful - I mean have you seen SharePoint URLs? I think |
Does it though? What happens to the URL “/p-4b72a/:/books/my-awesome-book/pages/my-cool-page” if you rename “My Cool Page” to “My Other Cool Page”? Yes the unique ID is still the same, but the page name is still part of the URL, so would that change to “/p-4b72a/:/books/my-awesome-book/pages/my-other-cool-page”, in which case you’ve got the same problem - the previous URL is now broken. Or are we saying that Bookstack would only use the UID part of the URL to find the matching content? In which case, the URL “/p-4b72a/:/books/my-awesome-book/pages/my-cool-page” and “/p-4b72a/:/something-completely-random” would arrive at the same piece of content? In which case, what is the purpose/benefit of the trail? EDIT: just seen Stack Overflow does exactly this. “/questions/1234/whatever” will get you to question ID 1234, whatever you put in place of “whatever”. However it does redirect to the correct URL of “/questions/1234/the-question-title” to avoid duplicate content issues arising with search engines.
I sure have! They’re horrendous. |
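The Stack Overflow-style behaviour described in the EDIT above (find the content by ID alone, then redirect to the current URL if the rest of the path is stale) could be sketched as below. This is an assumption-laden Python illustration rather than anything from BookStack: the `resolve` function, the `CANONICAL` table and the exact UID pattern are hypothetical, with the UID shape and the `/:/` separator taken from the example URLs quoted in this thread.

```python
import re

# Assumed UID shape: a type letter, a hyphen, then at least five
# alphanumeric characters. Anything after the UID is ignored for lookup.
UID_PATTERN = re.compile(r"^/([a-z]-[a-z0-9]{5,})(?:/.*)?$", re.IGNORECASE)

# Hypothetical lookup table: UID -> current canonical URL of the item.
CANONICAL = {
    "p-4b72a": "/p-4b72a/:/books/my-awesome-book/pages/my-other-cool-page",
}


def resolve(path):
    """Return ("ok", url), ("redirect", url) or ("not_found", None)."""
    match = UID_PATTERN.match(path)
    if match is None:
        return "not_found", None
    uid = match.group(1).lower()  # UIDs are treated as case-insensitive
    canonical = CANONICAL.get(uid)
    if canonical is None:
        return "not_found", None
    if path == canonical:
        return "ok", canonical
    # Stale or arbitrary trail: send the client to the current URL so
    # only one address is ever served for the content.
    return "redirect", canonical
```

Under this sketch, both the pre-rename URL and `/p-4b72a/:/something-completely-random` end up at the same canonical address, which is what keeps old links working while avoiding duplicate content.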
Hi, I like the proposed scheme as well and would prefer a page/chapter lookup via UID. In my use case I am consuming BookStack data exclusively via the API and handling the navigation between pages myself. That means I'm parsing the target URLs from content and rewriting them with the page ID. This only works as long as the page is not renamed. As soon as that happens, the link breaks, and since revision data is not available via the API I have no means to look up or fix those links automatically. For this reason I would prefer the proposed URL scheme and using the UID by default in page content when cross-linking to another page. I'm not sure why the API interface needs to change though; keeping the integer-based ID for all operations would still be fine for me, i.e. api/pages/{id}, as long as the new UID is returned with the page object. And yes, the separator feels a little bit ugly in the URL; maybe it could be defined differently or removed altogether. Just some suggestions: To be honest, even keeping the int-based ID would be fine. The main issue for me is preventing broken links after a page is renamed, so some kind of fixed, convention-based and parsable identifier in the URL is needed, and for that even the current int IDs would be all right, IMHO. Thanks. |
Thanks all for your input so far. Some feedback on the responses:
Yes, that is correct.
Yeah, we'd scan the DB to ensure uniqueness. We already do this with content slugs.
We already have these available (albeit rather unused) for pages. The main thing I don't want is non-alignment between a permalink system and actual browser/resulting URLs. If a user can't have the same benefits from copying the URL from the browser URL bar I see that as a fail, and hence why I've been hesitant to expand the current ID-based system.
Yeah, this is another approach on the table, but I wanted to explore the changes to the URL system first to see if a wider set of issues could be solved, beyond just the linking aspect.
End placement gets more complex to parse out, especially where we're allowing a lot of customizability in the trail part. Not impossible, but probably not worth the move to the end given the complexity it could add compared with having stable base URL patterns.
Yes, BookStack would only use the UID part to identify the content. As per the existing slugs in content URLs, it allows context to be provided in the URL alone. This is mentioned in the "History & Purpose" section of the proposal. As reflected in the proposal, you could configure the trail empty to have clean UID-only urls if desired.
The change away from the current integer ids would be to help avoid current potential security considerations. Jumps in id, or lack of access to certain id, can indicate existence of hidden content. Not an issue in many environments but will be a consideration in some. In addition, these new ids could set-up for future migration to having the different content types in shared table-space in the database (Longer term thinking though). Just to confirm though, if it helps, you can still currently use
Personal Thoughts - Updated
I'm still unsure about this overall, probably now less convinced than before. I think this may be mass-optimizing for too many problems that hardly actually are problematic in reality. Additional targeted addressing of URL changes would likely solve 90% of actual problems this whole proposal addresses, without causing a painful migration. |
Hi @ssddanbrown, thanks for the reply.
The problem is that I don't have any control over the content. The editors/maintainers will simply use what's most convenient for them, meaning they will either use the built-in URL picker element, which doesn't produce the mentioned "permalink", or just copy the URL from the address bar; they won't bother selecting some text snippets to create permalinks. If this were configurable (i.e. the URL picker would create permalinks instead of slug-based URLs), that would mitigate my current problem, although there would still be a small issue with manually pasted URLs. |
I'm not a fan of the proposed change. The only issue we've faced with the current URLs is that if you move a page, the pretty URL changes. However, we mitigated that by using the page's permalink by hovering over the page title and clicking the pop-over copy button to get the permalink. A sizeable portion of our documentation in BookStack is linked to from an external system to specific pages via their permalink. Changing the unique ID from an integer as it is today to a different value would be a breaking change that would be a monumental effort to adjust all of our links or risk them not working at some undetermined time in the future. I tend to think that using a UID instead of the plain old database incrementing IDs is security through obscurity and not worth the added complexity here. Yes, it's possible for someone to deduce that if the page ID in their URL is 123 that they could try to access page 122, but the real effort would be ensuring the permission system is working appropriately to deny them access if they shouldn't be able to see that page in the first place. Our instance of BookStack is for internal-only access for our documentation. What could someone possibly have to gain from knowing that we have thousands of pages based on the ID? To me, it's not that useful to know. I'd be curious to know how many people are actually using BookStack in a truly public manner (not just internal company documentation as we do) and how much obscurity would really benefit them. |
Would it be possible to review the permalink visibility when addressing this? Users currently have to highlight a portion of the page content to get the UID/permalink but I think there should be an option to display this on the page automatically from the settings. We find that users typically copy the URL from the browser's address bar but this breaks when pages are moved or renamed. |
@cbbaaron I agree. Even if it was another button with the other actions on the right side of the page that when clicked copied the permalink to the clipboard that would be helpful. |
I think overall the URL scheme should be improved but the current proposed scheme feels awkward to me. You could use something like Hashids to generate reversible unique IDs from the database ID for each entity type, which avoids having to run a large migration across the whole database. Then use each URL prefix (or just
use Hashids\Hashids;

// First argument is the salt, second is the minimum hash length.
$hashids = new Hashids('', 10);
echo $hashids->encode(1); // 1 => 'VolejRejNm'

// A different salt per entity type yields a different hash for the same database ID.
$shelfHash = new Hashids('s', 10);
echo $shelfHash->encode(1); // 1 => 'x9JgqK6emq'
$bookHash = new Hashids('b', 10);
echo $bookHash->encode(1); // 1 => 'WM6epYXdKv'
$pageHash = new Hashids('p', 10);
echo $pageHash->encode(1); // 1 => 'xm0kebQ7Vq'

Another suggestion I propose is reducing the URL prefixes for entities from plural to singular, while leaving the plural names (e.g.
|
As of v22.09 the main pain-point that this proposal would have addressed (breaking of internal cross-links) should now be much less of an issue. Therefore I think it'd be especially not worthwhile now to move ahead with this proposal, or a variation of it, as I don't think it would address enough of a fundamental need for the cost and confusion it would require. Thanks everyone for your input, it has been very valuable to guide my thoughts and understand the actual needs & desires at play here. |
Hi @ssddanbrown, the problem still exists that when someone shares a URL to a BookStack instance and the URL changes, the link just breaks. I didn't even know about permalinks until today. Since your biggest concern about this change (if I understand correctly) was that existing databases would break, I wonder if now, with the .zip import feature making migrations super easy, this could come to the table again? On the other hand, like @ghaberek mentioned above, you could try to stick to the existing database scheme with the DB ID, and just hash it for the URL scheme. Could you please state your thoughts on this one more time? |
@whoamiafterall I would not reconsider, and I'd be more set on keeping it as-is (rather than a full overhaul) due to the relative infrequency I hear of issues around the scheme. The ZIP exports/imports don't really have an impact regarding this (they're not a backup format to be used to fix potential issues, and the concerns were with maintaining compatibility in general, not for specific database breakages). If there are common issues, I'd be open to improving the old URL resolution (by properly tracking old slugs/paths rather than the semi-hack of using revisions like we do now) as a main focused approach to that issue. |
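The idea of tracking old slugs/paths directly, as mentioned in the reply above and suggested earlier in the thread as a history table, might be sketched like this. It is a minimal in-memory Python illustration only: the `SlugHistory` class and its methods are hypothetical, and the choice of a 302 for retired paths is an assumption rather than something stated in the discussion.

```python
class SlugHistory:
    """Hypothetical in-memory stand-in for a table of previously used paths."""

    def __init__(self):
        self._current = {}  # item id -> current path
        self._old = {}      # retired path -> item id

    def set_path(self, item_id, new_path):
        """Set an item's path, remembering the one it replaces."""
        old_path = self._current.get(item_id)
        if old_path is not None and old_path != new_path:
            self._old[old_path] = item_id
        # A path that is live again must not also be listed as retired.
        self._old.pop(new_path, None)
        self._current[item_id] = new_path

    def resolve(self, path):
        """Return (status, location) for a requested path."""
        if path in self._current.values():
            return 200, path
        # The path would 404, so check whether it was used previously.
        item_id = self._old.get(path)
        if item_id is not None and item_id in self._current:
            return 302, self._current[item_id]  # assumed redirect status
        return 404, None
```

Because retired paths map to an item rather than to another path, a chain of several renames still resolves in a single redirect to the item's current location.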
Thanks for your reply. That makes sense, and your suggestion seems like a good solution. In addition to that, a lot of people will link to the content on our instance by copying the URL from the address bar when they share it with their group or even include it in documents. That is why we would be very happy if this issue was addressed :) |
@whoamiafterall I've now opened an issue/request focused on that in #5411. |
History & Purpose
Currently BookStack uses a fixed system for URLs of content within the system which uses "slugs" generated from the name of content. On top of this there's also a fairly hidden ID-based system for pages. Examples of existing content URLs:
The original idea behind using slugs is to provide the user an indication of the likely destination content from the URL alone. A user observing the URI /books/frogs would instantly know the link will likely lead to a book about frogs.
Over the years since building BookStack a number of cases have arisen which have indicated this scheme is not ideal in some scenarios. This proposal/discussion puts forward a new scheme, to address these scenarios, to use as default going forward.
This is an open discussion to gain feedback, details below are not at all final. Comments are very much welcomed. This is not assured to go ahead, especially if the impact looks to be greater than actual long-term benefits.
Targets
Achieve more "Permanent" URLs by default
Right now URLs use slugs which are generated based upon the name of an item.
Changes to name can cause changes to the URL which can break URL references. We do have a system to help handle these scenarios, by referring to the revisions system, but this does not cover every change and revisions may be pruned.
We do have the /link/<page_id> permalinks but these use the database's incrementing integers, which can leak detail regarding system content (a gap in IDs may indicate hidden pages).
There are additional ways we could improve the current URL handling on changes but I think it may be better to use a more reliable base system than apply patches.
Allow flexibility of the content URL
While the existing URLs provide good indication of content, this breaks down when other languages are used. The URL path sections between slugs are hard-coded in English, which may not be understandable to the reader. In addition, we attempt some conversion to Latin characters for some slugs, which can completely remove the original content name for some languages.
We've also seen both requests to have a minimal length URL, and a longer, more descriptive, URL.
Proposed Scheme
The proposed new URL scheme is that shown below. This reflects what would be the new default for a "Page" item URL. The default configuration is intended to closely align with the appearance of the existing default URLs. The components of this scheme are broken down in sections below.
Examples for existing item types
UID
This acts as a unique identifier for an item within BookStack. It will be a flexible-length case-insensitive (defaulting to lower case) alpha-numeric string defaulting to minimum 5 characters. This is used instead of a UUID-like ID to keep this short and usable in URLs with little impact. It's case-insensitive for compatibility with case-insensitive systems.
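A UID generator matching the description above might look roughly like the following Python sketch. The `generate_uid` function and its parameters are hypothetical, the `existing` set stands in for the database uniqueness scan mentioned in the comments, and growing the length after repeated collisions is only one possible reading of "flexible-length".

```python
import secrets
import string

# Lower-case letters and digits only, so UIDs are safe on
# case-insensitive systems.
ALPHABET = string.ascii_lowercase + string.digits
MIN_LENGTH = 5


def generate_uid(existing, length=MIN_LENGTH, max_attempts=20):
    """Return a random UID that is not already in `existing`.

    `existing` is any container supporting `in`; here it stands in for
    a database lookup of UIDs already taken.
    """
    if length < MIN_LENGTH:
        raise ValueError("length must be at least MIN_LENGTH")
    while True:
        for _ in range(max_attempts):
            candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if candidate not in existing:
                return candidate
        # Too many collisions at this length: grow the UID by one character.
        length += 1
```

At five characters from a 36-character alphabet there are about 60 million possible values, so the collision check rarely triggers on small instances, but it is what makes such a short ID workable at all.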
The UID is prefixed with a type letter followed by a hyphen. The type letter represents the content type (p = page), which allows type identification from the UID alone while ensuring UIDs are unique across content types. The hyphen separates the type indicator while providing a pattern to match upon, preventing confusion with other systems' URLs at the same path prefix.
Separator
The separator simply exists to separate the UID from the configurable trail portion of the URL.
This allows us to still use the core ID-based URL for system endpoints (for example /b-7itu3/create-page) while having clear separation from the configurable trail to prevent conflicts.
Configurable Trail
This portion of the URL would be system-admin configurable as required. It would default to match the current BookStack slug scheme, but we would provide an interface to allow per-content-type (book/page/chapter/shelf) configuration of this trail using static and dynamic-placeholder elements.
The available dynamic-placeholder elements would initially be as follows:
{{name}} - URL encoded version of the item name.
{{slug}} - Name-generated slug, as we provide now.
{{book_slug}} - Name-generated slug, as we provide now. Chapters and pages only.
This could be expanded on in the future, but the initial implementation goal would be to match existing options.
The trail could be made empty if desired, which would cleanly generate item URLs with no trail and no separator components.
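To make the trail mechanics concrete, here is a small Python sketch of how a configured trail template could be expanded into an item URL. The `render_trail` and `build_url` functions and the item dictionary are hypothetical, and the `:` separator is an assumption taken from the example URL quoted in the comments (`/p-4b72a/:/books/my-awesome-book/pages/my-cool-page`), not a confirmed part of the design.

```python
import re
from urllib.parse import quote

SEPARATOR = ":"  # assumed from the example URL quoted in the comments
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_trail(template, values):
    """Replace {{placeholder}} elements; unknown placeholders raise KeyError."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def build_url(uid, template, item):
    """Build an item URL from its UID and an admin-configured trail template."""
    values = {
        "name": quote(item["name"], safe=""),  # URL encoded item name
        "slug": item["slug"],
        "book_slug": item.get("book_slug", ""),
    }
    trail = render_trail(template, values).strip("/")
    if not trail:
        return f"/{uid}"  # empty trail: no separator component either
    return f"/{uid}/{SEPARATOR}/{trail}"


page = {
    "name": "My Cool Page",
    "slug": "my-cool-page",
    "book_slug": "my-awesome-book",
}
print(build_url("p-4b72a", "books/{{book_slug}}/pages/{{slug}}", page))
print(build_url("p-4b72a", "", page))
```

The first call reproduces a default-style URL that mirrors the current slug scheme, while the second shows the UID-only form that results from an empty trail.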
Considerations
/books endpoint would remain as-is, which may limit the flexibility benefits in scenarios where users want to get away from the default BookStack terms, although this isn't a core goal here.
Personal Thoughts
I'm not fully sure on the proposed scheme. The separator especially makes it look a little ugly, but it's the closest I could think of when considering the technical handling and attempting to have a default aligning with the current URL scheme.
After writing this out, it's very hard to assess if such a change would be worth it. The benefits right now are very edge-case based but these may amplify long-term.