Search: Return Results for URLs #5936

StephenB87 · 2023-08-31T18:46:34Z

Context

A user should be able to search for a URL and get results that match that URL. For example, if I am searching for https://example1.abc.com/, the results should match the entire URL. Currently, with the default settings for the search plugin, they do not.

Description

When searching for a URL, the results that are returned should match the entire URL string. For example, if I am searching for https://example1.abc.com/, the results should match the entire URL, not https, example1, xyz com. See screenshot above for context.

Use Cases

The site search plugin could be further enhanced to return matching results for URLs. Many organizations have documentation that contains multiple URLs. Someone might not know what a URL is for, but if they can search for the URL, they can find the relevant documentation.

Visuals

No response

Before submitting

I have read and followed the change request guidelines.
I have verified that my idea is a change request and not a bug report.
I have ensured that, to the best of my knowledge, my idea will benefit the entire community.
I have included relevant links to the documentation, related issues, and discussions to underline the need for my idea.

MaximilianKohler · 2023-09-01T15:38:25Z

Yeah, I've been needing this as well. #4384 (comment)

I figured out a few things:

Separator (docs) https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/?h=separator#special-characters

Their default is:

plugins:
  - search:
      separator: '[\s\-,:!=\[\]()"/]+|(?!\b)(?=[A-Z][a-z])|\.(?!\d)|&[lg]t;'

This part: (?!\b)(?=[A-Z][a-z]) results in no results for PubPeer, only pubpeer. So I removed it.

This part: \.(?!\d) results in pubpeer.com returning all results for all words with com in them. So I removed it.

Some parts of this are needed to return any results for https://example.com:

[\s\-,:!=\[\]()"`/]+

But it also returns results for all instances of https.

Removing / gives results for https://example.com but incomplete results for example.com.

Removing : screws it up completely. So that's where I'm stuck at.
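
To make the trade-offs above easier to see, here is a minimal sketch that applies the separator to a few inputs with a plain string split. It only illustrates tokenization and is not how the search plugin or lunr.js is wired up internally. The `tokenize` helper and the `reducedSeparator` variant (the default with the two removed alternatives dropped) are names made up for this illustration, and the `/` is escaped only because the pattern is written as a regex literal.

```typescript
// Default separator quoted above, written as a regex literal ("/" escaped).
const defaultSeparator = /[\s\-,:!=\[\]()"\/]+|(?!\b)(?=[A-Z][a-z])|\.(?!\d)|&[lg]t;/;

// Hypothetical variant: the default without (?!\b)(?=[A-Z][a-z]) and \.(?!\d).
const reducedSeparator = /[\s\-,:!=\[\]()"\/]+|&[lg]t;/;

// Hypothetical helper: split on the separator and drop empty tokens.
function tokenize(text: string, separator: RegExp): string[] {
  return text.split(separator).filter((token) => token.length > 0);
}

console.log(tokenize("PubPeer", defaultSeparator));             // [ 'Pub', 'Peer' ]
console.log(tokenize("pubpeer.com", defaultSeparator));         // [ 'pubpeer', 'com' ]
console.log(tokenize("https://example.com", defaultSeparator)); // [ 'https', 'example', 'com' ]

console.log(tokenize("PubPeer", reducedSeparator));             // [ 'PubPeer' ]
console.log(tokenize("https://example.com", reducedSeparator)); // [ 'https', 'example.com' ]
```

Even with the reduced separator, `https` remains a token of its own, which is consistent with results showing up for every instance of https.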

squidfunk · 2023-09-02T12:34:21Z

Thanks for suggesting. I'm not sure that many users need this, but we can definitely let it sit here for a while to collect some feedback. We might consider shipping it with our new search functionality.

MaximilianKohler · 2023-09-02T13:18:24Z

@squidfunk does it need to be a big deal? Can't it be something as "simple" as changing options/code like I listed above?

squidfunk · 2023-09-02T13:32:05Z

I don't think so. From what you write in #5936 (comment), you want to match:

1. example.com
2. https://example.com/foo/bar

I don't think you can achieve it by just changing the search separator, because currently, you can have either 1. or 2., but not both. You will also match instances of https, which are then incorporated into the ranking of documents, and you will match URLs with other domains. It's just how lunr.js currently works. What you essentially want is for exact matches to rank higher than (or outcompete) partial matches, plus span rankings, i.e., words that occur together should rank higher than those that don't. I've fiddled around with lunr.js for a long time, and judging from what I know about its architecture, it's just not possible.

This is one of the reasons why I'm currently rewriting search from scratch.
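
The "either 1. or 2., but not both" point can be sketched with a toy model. This is not lunr.js: lunr.js ranks documents, whereas the sketch below simply says a page matches when every query token is one of the page's tokens. The two separators, the helper names, and the sample pages are all assumptions made up for the illustration.

```typescript
// Toy model only: a page "matches" when every query token is one of its tokens.
function splitTokens(text: string, separator: RegExp): string[] {
  return text.split(separator).filter((token) => token.length > 0);
}

function matchesAll(pageText: string, query: string, separator: RegExp): boolean {
  const pageTokens = splitTokens(pageText, separator);
  return splitTokens(query, separator).every(
    (token) => pageTokens.indexOf(token) !== -1
  );
}

const splitUrls = /[\s:\/]+/; // hypothetical separator that breaks URLs apart
const keepUrls = /\s+/;       // hypothetical separator that keeps URLs whole

const pageA = "See https://example.com/foo/bar for details";
const pageB = "Visit https://other.org/foo/bar or ask example.com support";
const fullUrl = "https://example.com/foo/bar";

// Splitting URLs: the bare domain is found, but the full URL also
// matches a page that only contains a URL with another domain.
console.log(matchesAll(pageA, "example.com", splitUrls)); // true
console.log(matchesAll(pageB, fullUrl, splitUrls));       // true

// Keeping URLs whole: the full URL matches precisely,
// but the bare domain is no longer found.
console.log(matchesAll(pageA, fullUrl, keepUrls));        // true
console.log(matchesAll(pageB, fullUrl, keepUrls));        // false
console.log(matchesAll(pageA, "example.com", keepUrls));  // false
```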

squidfunk · 2023-10-30T20:07:00Z

Thanks again for suggesting. While I'm not sure whether the general case of adding URLs to indexes is something many users want, I'm confident that I found a good design that allows for defining different separators for different fields:

const config: Config<Document> = {
  schemas: [
    {
      kind: "term",
      data: {
        separator: whitespace, // "foo bar baz" -> "foo", "bar", "baz"
        fields: [
          { name: "foo", from: ({ foo }) => foo },
          { name: "baz", from: ({ bar }) => bar?.baz }
        ]
      }
    },
    {
      kind: "term",
      data: {
        separator: none, // "http://example.com" -> "http://example.com"
        fields: [
          { name: "url", from: ({ url }) => url }
        ]
      }
    }
  ]
}

This will also allow indexing the same terms in different ways, including camel- and pascal-case terms, e.g. MkDocs as mkdocs, as well as mk + docs, which was asked for a few times. I hope to finish the new implementation soon, working hard on it!
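
As a rough illustration of indexing one term in several ways, the sketch below derives mkdocs, mk, and docs from MkDocs. It is not taken from the new search implementation; `caseVariants` is a hypothetical helper, and the split pattern is borrowed from the camel-case part of the current default separator.

```typescript
// Hypothetical helper: return the lowercased term plus its camel/Pascal-case parts.
function caseVariants(term: string): string[] {
  const parts = term
    .split(/(?!\b)(?=[A-Z][a-z])/) // split before an uppercase letter followed by a lowercase one
    .map((part) => part.toLowerCase());
  const whole = term.toLowerCase();
  // Only add the parts when the term actually splits into more than one piece.
  return parts.length > 1 ? [whole].concat(parts) : [whole];
}

console.log(caseVariants("MkDocs")); // [ 'mkdocs', 'mk', 'docs' ]
console.log(caseVariants("search")); // [ 'search' ]
```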

squidfunk · 2023-11-03T10:06:48Z

A further thing that came to my mind: the approach I mentioned above will work when URLs are provided as metadata, i.e., as a separate field, but not if they are contained in the text. No tokenization separator can cover that. However, we could essentially add a preprocessing step to the text to extract URLs and then index them accordingly. I'm not sure we will offer this functionality from the start, but as long as you specify URLs as metadata to documents (or use the actual URL of the document for indexing), it should work in the first iteration of the new search.
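
A preprocessing step of that kind could look roughly like the sketch below. It assumes a deliberately simple URL pattern (http or https followed by non-space characters), not a full URL grammar, and `preprocess` is a hypothetical name rather than part of any released API. The extracted URLs could then go into a field that is indexed without a separator, while the remaining words are tokenized as usual.

```typescript
// Assumption: a deliberately simple URL pattern, not a full URL grammar.
const urlPattern = /https?:\/\/[^\s<>"')\]]+/g;

// Hypothetical helper: pull URLs out of the text and tokenize the rest.
function preprocess(text: string): { urls: string[]; words: string[] } {
  const found: string[] = text.match(urlPattern) || [];
  // Drop sentence punctuation that was swallowed at the end of a URL.
  const urls = found.map((url) => url.replace(/[.,;:!?]+$/, ""));
  // Blank out the URLs, then split the remaining text into words.
  const words = text
    .replace(urlPattern, " ")
    .split(/[\s.,;:!?]+/)
    .filter((word) => word.length > 0);
  return { urls, words };
}

const result = preprocess(
  "Docs at https://example.com/foo/bar, mirror at http://example.org."
);
console.log(result.urls);  // [ 'https://example.com/foo/bar', 'http://example.org' ]
console.log(result.words); // [ 'Docs', 'at', 'mirror', 'at' ]
```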

squidfunk · 2023-11-07T14:11:06Z

Please see the announcement in #6307.

squidfunk · 2023-11-20T15:05:32Z

I invite you to try the 2nd search research preview. I think this should solve the issue at hand.

If you add characters to the separator that are contained in URLs, the URLs will be tokenized as well, but that should not matter, since they should now be found when they match correctly. Additionally, tokenizing gives you the opportunity to search for path parts.

squidfunk · 2024-02-16T11:40:55Z

@StephenB87 @MaximilianKohler did any of you check out the research preview? Does it improve results?

MaximilianKohler · 2024-02-16T20:11:55Z

I have been following your progress in these issues but I haven't installed any beta/preview versions. Your screenshots look like the new versions are a full solution though.

squidfunk · 2024-02-17T01:59:02Z

It would be good to receive some feedback, especially on #5936 (comment), which I got no reaction to. It's one thing to raise feature requests or report things that can be improved, but it's another to provide feedback on the solutions that we propose to address those issues 😉 Otherwise, when we release the new version and it does not fully solve what was requested here, it might be too late. That's why we try to get early feedback.

MaximilianKohler · 2024-02-17T03:24:26Z

I didn't quite understand that comment. Do you mean "naked link" (https://squidfunk.github.io/mkdocs-material) vs contained in text?

Naked links would be the most important for me, but contained-in-text would be great if possible.

I didn't try any of the PRs/betas/tests because I'm not too familiar with switching back and forth between them and the master branch, and I think I read something about it being difficult or problematic. Oh, it might have been this #6372, including your note to not use it in production.

Looking at the installation docs I'm not too sure how switching back and forth works. I'm a novice, on Windows, and managed to get the master installed with pip, but I don't understand where it's been installed, how installing a PR would work or replace or conflict with the existing install, where the PR would be and how to use it, etc.

That reminds me, I was curious how you made the front page with this and couldn't find the source. Here's why I was curious. I posted that to a few other places but didn't find an answer.

squidfunk · 2024-02-17T05:11:34Z

> I didn't try any of the PRs/betas/tests because I'm not too familiar with switching back and forth between them and the master branch, and I think I read something about it being difficult or problematic. Oh, it might have been this #6372, including your note to not use it in production.

I added instructions in the research preview:

pip install git+https://github.com/squidfunk/mkdocs-material.git@spike/search-preview-2

Additionally, somebody asked how to switch back in the same issue:

pip install mkdocs-material

Regardless, we'll keep working on this. If the solution we come up with doesn't entirely meet your requirements, you can always customize or fork the theme to get it exactly to your taste ☺️ I was just hoping for feedback, and think it is a fair ask, given that I try to solve the problem reported here, but I get that installing branches might be too much of an ask.

MaximilianKohler · 2024-02-17T05:57:58Z

> I added instructions in the #6372:

That's just one command. I realize that command fetches the PR but it doesn't answer my other questions.

> Additionally, somebody asked #6372 (comment) in the same issue:

Yes, I saw. It doesn't answer my other questions.

kamilkrzyskow · 2024-03-09T02:58:51Z

@MaximilianKohler

> I'm not too sure how switching back and forth works. I'm a novice, on Windows, and managed to get the master installed with pip, but I don't understand where it's been installed, how installing a PR would work or replace or conflict with the existing install, where the PR would be and how to use it, etc.

Installing a package with the same name should override the previous one. If pip detects that there is no need to install the version and skips the installation, you can add the --force-reinstall flag at the end of the command.
You can also use pip show mkdocs-material to see the Location where it's currently installed.
You can (and should) use a Virtual Environment to separate the package installations:
https://docs.python.org/3/tutorial/venv.html
This is mentioned in the reproduction guide:
https://squidfunk.github.io/mkdocs-material/guides/creating-a-reproduction/

You can also watch a whole guide on how to set up a development environment if you want to be extra thorough:
https://www.youtube.com/@coreyms/search?query=visual%20code (I'm not affiliated with him, nor did I use those guides, but he provides Mac and Windows guides, and from the key points they seem detailed enough. Despite being 4 years old, the guides shouldn't be too outdated ✌️)

Most of the questions you asked are one or two google/chatgpt searches away, and I'm not sure if they are in scope of the material theme's documentation.

> That reminds me, I was curious how you made the front page with this and couldn't find the source. Here's why I was curious. I posted that to a few other places but didn't find an answer.

Searching the discussions board (including closed discussions) for parallax or landing page will lead you to answers:

The landing page is a custom addition for the theme's documentation. The source code can be viewed by sponsors with access to Insiders. The images are under a special licence and can't be reused iirc. It's not a supported feature, just a customization, so there are no easy configuration settings for it.

EDIT: Fixup for the "It's not a supported feature". The custom home page is supported, with custom templates, and an example is provided in the community version:

https://squidfunk.github.io/mkdocs-material/reference/#setting-the-page-template
https://github.com/squidfunk/mkdocs-material/blob/master/src/overrides/home.html
Only the parallax version isn't public.

squidfunk added the "needs investigation" label (Issue must be investigated by the maintainers) Aug 31, 2023

squidfunk added the "change request" label (Issue requests a new feature or improvement) and removed the "needs investigation" label Sep 2, 2023

squidfunk mentioned this issue Nov 7, 2023

Towards better documentation search #6307
