Towards better documentation search #6307

https://github.com/squidfunk/mkdocs-material/issues/6307

squidfunk opened on Nov 7, 2023 · edited by squidfunk (Owner)

Background

As you may have read in one of my recent comments, we’re currently revising our search implementation. The current search is based on Lunr.js, which is also the search engine MkDocs has been using since Material for MkDocs started in 2016. In the beginning, we felt that this was a good fit, as Lunr.js allows searching in the browser without the need for an external service. This makes deploying documentation much simpler, and search is, and should always be, a central component of every good documentation site.

In the past years, we’ve invested hundreds of hours into making search better. With the help of our awesome sponsors, we were able to ship rich search previews, support for more sophisticated tokenizers, support for Chinese, and better highlighting. Additionally, we made search almost twice as fast. However, in order to progress and solve the many open issues related to search, we decided to throw out Lunr.js. There are several reasons for that, the most important being that it has been unmaintained since 2020. Additionally, Lunr.js only allows ranking with BM25, which is a good basis, but almost all issues related to weird rankings are caused by the fact that BM25 is not ideal for stable typeahead search. It was designed for full-word retrieval and is almost impossible to tame for the many different use cases we’ve seen in the wild. Again, we’ve invested a lot of time to improve the situation, but we’ve reached a point where this doesn’t make sense anymore.

This is why we’re currently releasing so few new features: we’re putting all our energy into finishing the new search implementation. We’re already almost on par with Lunr.js’ functionality, but now have an entirely modular architecture, which will allow us to swap out everything. Yes, I mean everything: the ranking algorithm, wildcard matching, the inverted index implementation, yada, yada, yada. Solving the documentation search problem is a personal affair for me. I really hate that there’s not yet a solution that works reliably, can run anywhere, and is modular so it can be easily customised.

This is what we’re building.

As you may already suspect, this is a pretty big project, which is why it is taking so long. We feel it is the perfect moment to venture into this problem, because we’ve gathered a lot of use cases that we can now balance and optimise for. However, please understand that this takes time, so I kindly ask you to be a little more patient. After all, 99% of the development on this project is done by me, @squidfunk, and we’re rewriting something that millions of users use every day. That needs care.

Where we’re currently at

First of all: search will be a separate, new project! This means you will be able to use the same engine in your other projects as well. Additionally, here’s a non-exhaustive list of things we’re planning to ship in the first version:

Modular engines: Search should not only allow searching for text in an inverted index, but also support new use cases like nearest-neighbour search on vector embeddings. We designed the new search so that multiple engines can be configured for the same set of documents, e.g. store text and title in an inverted index, and store embeddings in a vector store, all from the same document. They should then be searched and ranked together. Additionally, document fields can be tokenised differently, and the tokeniser can be based on a regular expression or a function, allowing for maximum flexibility.

Search: Return Results for URLs #5936
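To make per-field tokenisation and multiple engines per document concrete, here is a minimal TypeScript sketch. The `FieldConfig` shape, the engine names `"inverted"` and `"vector"`, and the `route` helper are hypothetical illustrations, not the new search’s actual API: each field names the engine it goes to and, optionally, a tokeniser that is either a separator regex or a function.

```typescript
// Hypothetical configuration: a tokeniser is either a separator regex
// or an arbitrary function from text to tokens.
type Tokenizer = RegExp | ((text: string) => string[]);

interface FieldConfig {
  name: string;
  engine: "inverted" | "vector"; // which engine receives this field
  tokenizer?: Tokenizer; // omitted: pass the raw text (e.g. to be embedded)
}

type Doc = Record<string, string>;

function tokenize(text: string, tokenizer: Tokenizer): string[] {
  if (tokenizer instanceof RegExp) {
    // Treat the regex as a separator, drop empty fragments, lowercase
    return text.split(tokenizer).filter((t) => t.length > 0).map((t) => t.toLowerCase());
  }
  return tokenizer(text);
}

// Route every configured field of one document to its engine:
// engine -> field -> tokens
function route(doc: Doc, fields: FieldConfig[]): Map<string, Map<string, string[]>> {
  const byEngine = new Map<string, Map<string, string[]>>();
  for (const field of fields) {
    const value = doc[field.name];
    if (value === undefined) continue;
    const tokens = field.tokenizer ? tokenize(value, field.tokenizer) : [value];
    if (!byEngine.has(field.engine)) byEngine.set(field.engine, new Map());
    byEngine.get(field.engine)!.set(field.name, tokens);
  }
  return byEngine;
}

// The same "text" field feeds both engines, tokenised differently
const fields: FieldConfig[] = [
  { name: "title", engine: "inverted", tokenizer: /[\s\-]+/ },
  { name: "text", engine: "inverted", tokenizer: (t) => t.toLowerCase().match(/\w+/g) ?? [] },
  { name: "text", engine: "vector" },
];
```

Both engines can then be queried and their results ranked together; how that merge works is left open here.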

Powerful plugin system: Plugins are first-class citizens! The new search is completely modularised. For example, the inverted index itself does not compute scores; scoring is implemented as a plugin. This means alternative ranking plugins can be implemented. The plugin architecture is dead simple, but insanely powerful. To my current knowledge, there’s nothing that could not be implemented as a plugin.

Search: customise behaviour with hooks #4980
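The idea of an inverted index that only finds matches, while scoring is left to a plugin, can be sketched like this. The `ScoringPlugin` interface and both example plugins are hypothetical and only illustrate the separation of concerns; they are not the new search’s plugin API:

```typescript
// A posting: which document matched a term, and how often
interface Posting { doc: string; tf: number }

// Hypothetical plugin contract: the index finds matches, the plugin scores them
interface ScoringPlugin {
  score(term: string, posting: Posting, docFreq: number, docCount: number): number;
}

class InvertedIndex {
  private postings = new Map<string, Posting[]>();
  private docs = new Set<string>();

  add(doc: string, tokens: string[]): void {
    this.docs.add(doc);
    const counts = new Map<string, number>();
    for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
    for (const [term, tf] of counts) {
      if (!this.postings.has(term)) this.postings.set(term, []);
      this.postings.get(term)!.push({ doc, tf });
    }
  }

  // No ranking logic here: every score comes from the plugin
  search(terms: string[], plugin: ScoringPlugin): [string, number][] {
    const scores = new Map<string, number>();
    for (const term of terms) {
      const list = this.postings.get(term) ?? [];
      for (const p of list) {
        const s = plugin.score(term, p, list.length, this.docs.size);
        scores.set(p.doc, (scores.get(p.doc) ?? 0) + s);
      }
    }
    return [...scores].sort((a, b) => b[1] - a[1]);
  }
}

// Two interchangeable ranking plugins
const tfIdf: ScoringPlugin = { score: (_t, p, df, n) => p.tf * Math.log(1 + n / df) };
const countOnly: ScoringPlugin = { score: (_t, p) => p.tf };
```

Swapping `tfIdf` for `countOnly` (or for a BM25 plugin) changes the ranking without touching the index.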

Document metadata – authors should be able to configure which parts of a document’s metadata are included with the documents, so that documents can be indexed with custom metadata. Currently, only text, title, location and tags are included. The new search should allow authors to configure how each field is indexed, i.e., how it should render in search results and whether it should render at all (think keywords or aliases). This would also allow slicing the search into different sections, e.g. for the blog, API reference, etc., by allowing the author to render those as tabs in the search bar.

Search: index document by custom metadata #3174
Search: facet/filter by metadata, e.g. tags or categories #4983
How to restrict search scope to specific page? #4965
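Slicing search by metadata boils down to filtering results on facet values and counting those values, e.g. to label tabs. A minimal sketch, assuming a hypothetical `meta` record on each indexed document (the field names `section` and `tags` are examples only):

```typescript
interface IndexedDoc {
  location: string;
  title: string;
  meta: Record<string, string[]>; // e.g. { section: ["blog"], tags: ["release"] }
}

// Keep only documents whose metadata matches every selected facet value
function filterByFacets(docs: IndexedDoc[], selected: Record<string, string>): IndexedDoc[] {
  return docs.filter((d) =>
    Object.entries(selected).every(([key, value]) => (d.meta[key] ?? []).includes(value)),
  );
}

// Count documents per facet value, e.g. to show "Blog (2)" as a tab
function facetCounts(docs: IndexedDoc[], key: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const d of docs) {
    for (const v of d.meta[key] ?? []) counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  return counts;
}
```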

Better accuracy – the current implementation uses Lunr.js, which combines terms with OR. This is not ideal for document search, as users have repeatedly reported that they expect entering more terms to narrow down the search results. The new search will make it easy to switch to AND as the default combinator.
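The difference between the two combinators is union versus intersection of the documents matching each term. A minimal sketch over a hypothetical term-to-documents map:

```typescript
type Postings = Map<string, Set<string>>; // term -> documents containing it

function combine(index: Postings, terms: string[], mode: "and" | "or"): Set<string> {
  const sets = terms.map((t) => index.get(t) ?? new Set<string>());
  if (sets.length === 0) return new Set();
  if (mode === "or") {
    // Union: every document containing at least one term, so more terms = more results
    return new Set(sets.flatMap((s) => [...s]));
  }
  // Intersection: start from the smallest set to minimise work;
  // more terms can only keep or shrink the result
  const [smallest, ...rest] = [...sets].sort((a, b) => a.size - b.size);
  return new Set([...smallest].filter((doc) => rest.every((s) => s.has(doc))));
}
```

With AND, each additional term narrows the result set, which matches what users reported expecting.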

Detect misspellings – if a typo is entered, e.g. instlal, the engine should detect the typo and correct it to install. Many engines support this, so we should find a way to do the same.
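One common way to do this is an edit distance that also counts swapped adjacent letters, since a typo like instlal is a single transposition away from install (plain Levenshtein distance would count it as 2). A minimal sketch using the optimal string alignment variant and a brute-force scan over a hypothetical vocabulary; a real engine would need something faster for large vocabularies:

```typescript
// Optimal string alignment distance: insertions, deletions, substitutions
// and transpositions of adjacent characters each cost 1
function osaDistance(a: string, b: string): number {
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      // Adjacent transposition, e.g. "la" <-> "al"
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Suggest the closest vocabulary term within maxDistance, if any
function correct(term: string, vocabulary: string[], maxDistance = 2): string | undefined {
  let best: string | undefined;
  let bestDist = maxDistance + 1;
  for (const word of vocabulary) {
    const dist = osaDistance(term, word);
    if (dist < bestDist) {
      best = word;
      bestDist = dist;
    }
  }
  return best;
}
```

For example, `correct("instlal", ["install", "instance", "plugin"])` returns `"install"`.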

Offline-first – it goes without saying that one of the highest priorities is that search will keep working offline. The new search implementation will, of course, still not need a server.

Span queries – for searches like “single page application”, results should rank higher when those words appear together. This removes the need for exact search within quotes, which is something many non-technical users don’t even know is possible in search engines like Google or Bing. The goal is that entering a few words should be enough; no special syntax should be needed.
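One way to reward terms that appear together is to find the smallest window of tokens that contains all query terms and turn its length into a bonus. A minimal sketch; the sliding-window approach and the `distinct terms / span` bonus formula are assumptions for illustration, not the new search’s ranking:

```typescript
// Smallest window (in tokens) containing every query term, or Infinity
function minimalSpan(tokens: string[], query: string[]): number {
  const needed = new Set(query);
  const counts = new Map<string, number>();
  let covered = 0;
  let best = Infinity;
  let left = 0;
  for (let right = 0; right < tokens.length; right++) {
    const t = tokens[right];
    if (!needed.has(t)) continue;
    const c = (counts.get(t) ?? 0) + 1;
    counts.set(t, c);
    if (c === 1) covered++;
    // Shrink from the left while all terms are still covered
    while (covered === needed.size) {
      best = Math.min(best, right - left + 1);
      const l = tokens[left];
      if (needed.has(l)) {
        const lc = counts.get(l)! - 1;
        counts.set(l, lc);
        if (lc === 0) covered--;
      }
      left++;
    }
  }
  return best;
}

// 1 when the terms are adjacent, smaller as they drift apart, 0 if missing
function proximityBonus(tokens: string[], query: string[]): number {
  const span = minimalSpan(tokens, query);
  return span === Infinity ? 0 : new Set(query).size / span;
}
```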

Compound words – the current search allows indexing words like PascalCase as Pascal and Case by using clever lookaheads, but this also means that searching for the entire term in lowercase, pascalcase, will not return any results. This should be fixed so that both can be found.

Search: find PascalCase, pascalcase and case at the same time #6632
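A straightforward way to find both is to index the whole term and its parts. A minimal sketch; the split pattern below is an assumption that handles simple PascalCase/camelCase and leading acronyms, not the separator the search actually ships with:

```typescript
// Index "PascalCase" as "pascalcase", "pascal" and "case", so the whole
// term, its lowercase form and each part all match
function compoundTokens(word: string): string[] {
  // Split before an uppercase letter that starts a new word:
  // "PascalCase" -> ["Pascal", "Case"], "HTMLParser" -> ["HTML", "Parser"]
  const parts = word.split(/(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/);
  const tokens = new Set<string>([word.toLowerCase()]);
  if (parts.length > 1) for (const p of parts) tokens.add(p.toLowerCase());
  return [...tokens];
}
```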

Document hierarchy – the search index should be organised hierarchically, so that the explicit navigation structure and implicit table of contents hierarchy yield more context to search results, helping to disambiguate repetitive documentation.
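Keeping the hierarchy in the index can be as simple as storing each section’s ancestry with its text, so two sections that share a title can be told apart in the results. A minimal sketch with a hypothetical `Section` tree shape:

```typescript
// A node in the navigation / table-of-contents tree
interface Section {
  title: string;
  text?: string;
  children?: Section[];
}

interface Entry { path: string[]; text: string }

// Flatten the tree into index entries that remember their ancestry,
// so a result can be shown as "Setup > Search > Offline"
function flatten(section: Section, ancestors: string[] = []): Entry[] {
  const path = [...ancestors, section.title];
  const own: Entry[] = section.text ? [{ path, text: section.text }] : [];
  return own.concat(...(section.children ?? []).map((c) => flatten(c, path)));
}
```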

Search: breadcrumbs to signal context of a document #3787

Here’s a list of ideas, partially based on open change requests, which we will implement after the first version is out and has reached a stable state. We believe all of those features will be great additions:

Stemming and segmentation – of course, search should be multilingual and support language-specific stemming, as well as text segmentation for languages like Japanese and Chinese, which don’t separate words with spaces. We should check whether we can use browser-based APIs for text segmentation, or, if they’re not available, fall back to a polyfill. Alternatively (or ideally?), segmentation could be done at build time, so that the payload shipped to the user is even smaller. Additionally, authors should be able to provide their own stopwords. Here’s an interesting stemmer implementation.
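The standard browser-side API for this is `Intl.Segmenter` with word granularity, which both modern browsers and Node.js provide. A minimal sketch combining it with author-provided stopwords; the function name and the stopword handling are illustrative assumptions:

```typescript
// Segment text into words with Intl.Segmenter, then drop stopwords
function segmentWords(text: string, locale: string, stopwords: Set<string> = new Set()): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words: string[] = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation
    if (!isWordLike) continue;
    const word = segment.toLocaleLowerCase(locale);
    if (!stopwords.has(word)) words.push(word);
  }
  return words;
}
```

`segmentWords("Install the plugin", "en", new Set(["the"]))` yields `["install", "plugin"]`. For Chinese or Japanese input (e.g. `segmentWords(text, "zh")`), word boundaries come from the runtime’s dictionary data, so results may vary between browsers, which is one argument for the build-time option.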

Compact summaries – the current search indexes the HTML and divides it into top-level blocks. If a block contains a long list and a single word in that list matches, the entire list is rendered as part of the search results. This is not ideal, since the user has to scroll through a lot of irrelevant content. The new search should provide an intelligent summarisation algorithm, possibly with a configurable way to detect the ends of sentences and paragraphs.

Search: compact summaries for long code blocks and lists #4278
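A simple form of compact summary keeps only the segments (sentences or list items) that contain a query term, plus some neighbouring context, and marks the gaps. A minimal sketch; the default boundary pattern (sentence ends or line breaks) is a hypothetical stand-in for the configurable detection mentioned above:

```typescript
// Keep matching segments plus `context` neighbours on each side
function compactSummary(
  block: string,
  terms: string[],
  boundary: RegExp = /(?<=[.!?])\s+|\n+/, // assumed default: sentence ends or line breaks
  context = 0,
): string {
  const segments = block.split(boundary).filter((s) => s.trim().length > 0);
  const lower = terms.map((t) => t.toLowerCase());
  const keep = new Set<number>();
  segments.forEach((s, i) => {
    if (lower.some((t) => s.toLowerCase().includes(t))) {
      const from = Math.max(0, i - context);
      const to = Math.min(segments.length - 1, i + context);
      for (let j = from; j <= to; j++) keep.add(j);
    }
  });
  // Join kept segments, marking skipped content with an ellipsis
  const out: string[] = [];
  let last = -1;
  for (const i of [...keep].sort((a, b) => a - b)) {
    if (last !== -1 && i !== last + 1) out.push("…");
    out.push(segments[i].trim());
    last = i;
  }
  return out.join(" ");
}
```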

Federated search – it should be possible to federate search with other sites built with the same engine, so that a single federated search can be built from multiple MkDocs projects. This could also be applied to different versions. The author must be able to influence the rendering of federated results.

Search: cross-site search federation #5230
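A key difficulty in federation is that raw scores from different indexes are not comparable. A minimal sketch that min-max normalises each site’s scores before merging, with optional per-site weights as one hypothetical way for authors to influence the results; this is an assumption, not how the new search federates:

```typescript
interface Hit { site: string; location: string; score: number }

// Normalise each site's scores to [0, 1], apply an optional weight, then merge
function federate(resultsBySite: Map<string, Hit[]>, weights: Map<string, number> = new Map()): Hit[] {
  const merged: Hit[] = [];
  for (const [site, hits] of resultsBySite) {
    if (hits.length === 0) continue;
    const scores = hits.map((h) => h.score);
    const min = Math.min(...scores);
    const max = Math.max(...scores);
    const weight = weights.get(site) ?? 1; // lets the author favour a site
    for (const h of hits) {
      const normalised = max === min ? 1 : (h.score - min) / (max - min);
      merged.push({ ...h, score: normalised * weight });
    }
  }
  // Array sort is stable, so ties keep their site order
  return merged.sort((a, b) => b.score - a.score);
}
```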

Caching – since we’re re-architecting the entire search implementation, we can leverage caching, so that the index can be fully persisted and later restored from the cache, without the need to rebuild it every time.

Search: caching for faster startup #5391
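Persisting an index mostly means serialising it together with a version marker, so a stale cache is rebuilt instead of used. A minimal sketch with a hypothetical term-to-locations index; in a browser the string could be stored with `localStorage.setItem` (whose `getItem` returns `string | null`) or in IndexedDB:

```typescript
type Index = Map<string, string[]>; // term -> locations

// Maps don't survive JSON.stringify, so store them as entry arrays
function serialize(index: Index, version: string): string {
  return JSON.stringify({ version, entries: [...index] });
}

// Restore a cached index, or return undefined if there is no cache or it
// was built for another version of the site and must be rebuilt
function restore(cached: string | null, version: string): Index | undefined {
  if (cached === null) return undefined;
  const data = JSON.parse(cached) as { version: string; entries: [string, string[]][] };
  return data.version === version ? new Map(data.entries) : undefined;
}
```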

Deep linking – the entire search state must be serializable to a URL query string, so that the query the user entered, as well as all selected filters, can be linked to directly.
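The standard `URLSearchParams` API already handles encoding and repeated keys. A minimal round-trip sketch; the parameter name `q` and the `SearchState` shape are hypothetical:

```typescript
interface SearchState {
  query: string;
  filters: Record<string, string[]>; // e.g. { tags: ["plugins"] }
}

// Encode query and filters, repeating a key for multiple values
function toQueryString(state: SearchState): string {
  const params = new URLSearchParams();
  params.set("q", state.query);
  for (const [key, values] of Object.entries(state.filters)) {
    for (const v of values) params.append(key, v);
  }
  return params.toString();
}

function fromQueryString(search: string): SearchState {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const filters: Record<string, string[]> = {};
  for (const key of new Set(params.keys())) {
    if (key !== "q") filters[key] = params.getAll(key);
  }
  return { query: params.get("q") ?? "", filters };
}
```

`toQueryString({ query: "offline search", filters: { tags: ["plugins", "search"] } })` produces `q=offline+search&tags=plugins&tags=search`.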

Adaptive rendering – the search result list should be much smaller than it currently is, only including text when the search has only a few results, adapting to what the user expects. When the user only enters a few characters, a lot of documents will be returned. The more characters are entered, the fewer results will be returned, and at some threshold, the document text should be shown. This threshold should be configurable and tunable.
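The threshold rule described above fits in a few lines. A minimal sketch, where the default of 10 results is an arbitrary placeholder for the configurable value:

```typescript
interface Result { title: string; text: string }
interface Rendered { title: string; text?: string }

// Titles only while the list is long; include text once it is short enough
function render(results: Result[], threshold = 10): Rendered[] {
  const showText = results.length <= threshold;
  return results.map((r) => (showText ? { title: r.title, text: r.text } : { title: r.title }));
}
```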

Fuzzy finder – as opposed to the common tokenisation and ranking with BM25, search should support indexing data like file paths, class names and attribute names with a fuzzy finder approach, similar to what IDEs like VS Code do when you’re using autocomplete.

Can Search support fuzzy finder on filepath? #4466
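The core of a fuzzy finder is subsequence matching: every query character must appear in the candidate in order, with bonuses for consecutive characters and for matches at the start of a path segment. A minimal sketch; the scoring weights are arbitrary, and the matching is greedy (leftmost), which is simpler than the optimal matching IDEs typically compute:

```typescript
// Returns a score, or null if the query is not a subsequence of the candidate
function fuzzyScore(query: string, candidate: string): number | null {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  let score = 0;
  let qi = 0;
  let prev = -2;
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] !== q[qi]) continue;
    score += 1;
    if (ci === prev + 1) score += 2; // consecutive characters
    if (ci === 0 || "/_-. ".includes(c[ci - 1])) score += 3; // start of a segment
    prev = ci;
    qi++;
  }
  return qi === q.length ? score : null;
}

function fuzzyFind(query: string, candidates: string[]): string[] {
  return candidates
    .map((c) => ({ c, s: fuzzyScore(query, c) }))
    .filter((x): x is { c: string; s: number } => x.s !== null)
    .sort((a, b) => b.s - a.s)
    .map((x) => x.c);
}
```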

Search as a component in Markdown – allow the user to embed search bars at arbitrary locations, possibly with a different configuration.

Search bar location in main page #6858

Rich results – not only code blocks should be renderable, but also Mermaid diagrams and code annotations.

Recommend term removal – if a search matches no results, recommend to the user which term can be removed.
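With AND as the combinator, a zero-result query can be diagnosed by dropping each term in turn and checking which removal brings results back. A minimal sketch over a hypothetical term-to-documents map:

```typescript
type Postings = Map<string, Set<string>>; // term -> documents containing it

function andSearch(index: Postings, terms: string[]): Set<string> {
  const [first, ...rest] = terms.map((t) => index.get(t) ?? new Set<string>());
  if (!first) return new Set();
  return new Set([...first].filter((d) => rest.every((s) => s.has(d))));
}

// If the full query finds nothing, suggest the single term whose removal
// brings back the most results
function suggestRemoval(index: Postings, terms: string[]): { remove: string; hits: number } | undefined {
  if (terms.length < 2 || andSearch(index, terms).size > 0) return undefined;
  let best: { remove: string; hits: number } | undefined;
  for (const term of terms) {
    const hits = andSearch(index, terms.filter((t) => t !== term)).size;
    if (hits > 0 && (!best || hits > best.hits)) best = { remove: term, hits };
  }
  return best;
}
```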

Search history – we could allow preserving the search history, so that users have an easy way to go back to previous search results without having to re-enter their queries. Entries in the search history could be cleared by the user one by one.
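A minimal most-recent-first history sketch with one-by-one removal; the `SearchHistory` class and its default limit of 10 are hypothetical, and persisting `list()` (e.g. via `localStorage`) is left out:

```typescript
class SearchHistory {
  private entries: string[] = [];
  private readonly limit: number;

  constructor(limit = 10) {
    this.limit = limit;
  }

  // Re-running a query moves it to the top instead of duplicating it
  add(query: string): void {
    const q = query.trim();
    if (!q) return;
    this.entries = [q, ...this.entries.filter((e) => e !== q)].slice(0, this.limit);
  }

  // Lets the user clear a single entry
  remove(query: string): void {
    this.entries = this.entries.filter((e) => e !== query);
  }

  list(): readonly string[] {
    return this.entries;
  }
}
```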

Synonyms – authors should be allowed to provide synonyms for specific words. We need to think of a good way to signal to the user that a synonym was found, or we just replace the word with the synonym in search results.
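One option is query expansion: each query term becomes a list of alternatives, with the original term first, so the UI can still tell which alternative actually matched. A minimal sketch assuming authors provide synonym groups as arrays of words (a hypothetical format):

```typescript
// Every word in a group can stand in for the others
function buildSynonyms(groups: string[][]): Map<string, Set<string>> {
  const map = new Map<string, Set<string>>();
  for (const group of groups) {
    const words = group.map((w) => w.toLowerCase());
    for (const w of words) {
      const set = map.get(w) ?? new Set<string>();
      for (const other of words) if (other !== w) set.add(other);
      map.set(w, set);
    }
  }
  return map;
}

// Each query term becomes a list of alternatives, the term itself first
function expandQuery(terms: string[], synonyms: Map<string, Set<string>>): string[][] {
  return terms.map((t) => {
    const lower = t.toLowerCase();
    return [lower, ...(synonyms.get(lower) ?? [])];
  });
}
```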

Index non-Markdown sources – It should be possible to index other contents alongside Markdown, including HTML, PDFs, etc., possibly with the help of plugins.

Arbitrary sections – authors should be allowed to add custom sections like Admonitions or tabs (or whatever) to the search, in order to provide an even more flexible structure.

Search separator testing – we should provide a method for authors to easily test the search separator on their site.
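Such a test can be as small as applying the separator pattern to sample text and showing the resulting tokens. A minimal sketch; `previewSeparator` is a hypothetical helper and the patterns below are examples:

```typescript
// Show how a separator pattern splits sample text into tokens
function previewSeparator(separator: string, sample: string): string[] {
  const re = new RegExp(separator);
  return sample.split(re).filter((token) => token.length > 0);
}
```

`previewSeparator("[\\s\\-]+", "single-page application")` gives `["single", "page", "application"]`, and adding a lookahead alternative such as `(?=[A-Z][a-z])` makes it visible that MkDocs is split into Mk and Docs.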

This list is far from complete. We have so many more ideas, which we’ll share when the time has come. We’ll keep this issue updated, so feel free to subscribe or check back from time to time. We hope to push out the first candidate before the end of this year! Thank you for your patience and for your trust in Material for MkDocs.

Sub-issues (11 of 11 completed):

Search: caching for faster startup #5391
Search: facet/filter by metadata, e.g. tags or categories #4983
Search: breadcrumbs to signal context of a document #3787
Search: compact summaries for long code blocks and lists #4278
Search: customise behaviour with hooks #4980
Search: Return Results for URLs #5936
Search: cross-site search federation #5230
Search: index document by custom metadata #3174
Search: highlight the first result to indicate current selection #6333
Search: improve search highlighting in hidden components (tabs, details, …) #4125
Search: find PascalCase, pascalcase and case at the same time #6632

squidfunk added the announcement label (Issue announces news or new features) on Nov 7, 2023

squidfunk pinned this issue on Nov 7, 2023

squidfunk mentioned this on Nov 8, 2023 Search: facet/filter by metadata, e.g. tags or categories #4983 Search: breadcrumbs to signal context of a document #3787 Search: compact summaries for long code blocks and lists #4278 Search: customise behaviour with hooks #4980 Search: Return Results for URLs #5936 Search: caching for faster startup #5391 Search: cross-site search federation #5230 Search: index document by custom metadata #3174 Upgrade to Mermaid 10.6.1 #6265

HonkingGoose mentioned this on Nov 8, 2023: Weird search result order renovatebot/renovatebot.github.io#337

strausmann commented on Nov 9, 2023

Great ideas and great features for search. The most important thing for us is that search continues to work completely offline and without a web server. We use MkDocs for documentation, and it has to work offline on the plane or on the ship.

squidfunk (Owner, Author) commented on Nov 9, 2023

@strausmann that is our priority. It will definitely work offline (our prototype already does), but we’ll also add interesting new features like search federation (merging search indexes with other MkDocs sites) for which you obviously need to be online. All of those are optional and will degrade gracefully when offline, of course.

strausmann commented on Nov 9, 2023

Search federation is of course one of the most interesting features for us too. Our documentation also runs on a web server, with several MkDocs instances running side by side for different topics. If the search on one MkDocs instance also returned the contents of the other instances, that would be brilliant. Of course, these instances then run on a web server in a closed environment.

RonGros commented on Jul 2, 2025

Hi @squidfunk - really love Material for MkDocs, but the search is probably the biggest pain point… I know it’s in the works, but is there a tentative timeline for when we can expect to see a first version?

Thanks!

squidfunk mentioned this on Jul 23, 2025 Use of Math.random() in wordcut.js – Potential cryptographic weakness #8350

nealkruis mentioned this on Aug 20, 2025 Mkdocs search issues cse-sim/cse#567

StevenMaude mentioned this on Aug 29, 2025 Spike: how to improve the docs search functionality opensafely-core/job-server#5231

squidfunk mentioned this on Sep 3, 2025: Quick Search for URLs/File Names #2316

ZnPdCo commented on Sep 25, 2025 · edited by ZnPdCo

> @strausmann that is our priority. It will definitely work offline (our prototype already does), but we’ll also add interesting new features like search federation (merging search indexes with other MkDocs sites) for which you obviously need to be online. All of those are optional and will degrade gracefully when offline, of course.

I’m really happy to see such an exciting feature 😍. Although federated search might not be necessary for one of my small projects, could you support configuring a self-hosted search server API — so that when online it uses my own API, and when offline it gracefully degrades? Some of our users have limited devices, and loading a 20+ MB index can be a bit heavy.

Our custom fork of Material that we modified still has a few small issues, for example:

- The input box has no debouncing, so typing a bit fast triggers a lot of API calls (especially with Chinese pinyin input).
- Some users habitually finish typing a query and press Enter immediately. The server responds more slowly than the user presses Enter, and Enter selects the best match by default, but the content shown at that moment is completely wrong (it’s the previous search result), so users often land on the wrong page.
- The search server is asynchronous: when typing something like ssr, it may return results for both ss and ssr. Sometimes the response for ssr arrives first and the one for ss comes later, which causes the UI to display incorrect results.

I know these bugs are due to my crude and incomplete changes, but could someone teach me how to handle these problems? Or is there a plan to add a “use self-hosted search server” configuration in a future release? Thanks🙏🏼🌟

Edit: I saw a similar discussion in #2799. The advantages of server-side search are mentioned here.
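The three problems listed in this comment (no debouncing, Enter acting on stale results, out-of-order responses) are usually handled together on the client. A minimal sketch, assuming a hypothetical `fetcher` that calls the self-hosted search API and resolves to result URLs; this is not part of Material for MkDocs or Zensical, and error handling is omitted:

```typescript
type Fetcher = (query: string) => Promise<string[]>;

class SearchClient {
  private seq = 0; // id of the most recent request
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly fetcher: Fetcher;
  private readonly delay: number;
  shown: string[] = []; // results currently displayed

  constructor(fetcher: Fetcher, delay = 200) {
    this.fetcher = fetcher;
    this.delay = delay;
  }

  // On every keystroke: restart the timer, so a burst of typing sends one request
  input(query: string): void {
    clearTimeout(this.timer);
    this.timer = setTimeout(() => void this.run(query), this.delay);
  }

  // On Enter: skip the debounce and wait for results of exactly this query
  async submit(query: string): Promise<string | undefined> {
    clearTimeout(this.timer);
    const results = await this.run(query);
    return results[0];
  }

  private async run(query: string): Promise<string[]> {
    const id = ++this.seq;
    const results = await this.fetcher(query);
    // A newer request started meanwhile: a late "ss" must not overwrite "ssr"
    if (id === this.seq) this.shown = results;
    return results;
  }
}
```

The request counter is what fixes the out-of-order problem: only the response to the latest request may update the list, regardless of arrival order.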

squidfunk (Owner, Author) commented on Sep 25, 2025

> I’m really happy to see such an exciting feature 😍. Although federated search might not be necessary for one of my small projects, could you support configuring a self-hosted search server API — so that when online it uses my own API, and when offline it gracefully degrades? Some of our users have limited devices, and loading a 20+ MB index can be a bit heavy.

@ZnPdCo Thanks for asking. Yes, client-side as well as server-side search will be supported, and any mix between them. Many projects are small enough so that they don’t need a dedicated server, but as projects grow, this eventually becomes a requirement. We’ll also provide a self-hostable server in the future that is compatible with our client side search.

Regarding your other questions, you’re better off creating a discussion to ask the community for help.

squidfunk (Owner, Author) commented on Sep 25, 2025 · edited by squidfunk

We’ve posted an update on the status of our foundational work, which has kept us busy over the last 16 months. The new search will be part of the upcoming initial release, which will be out before the end of the year: #8461

squidfunk (Owner, Author) commented on Nov 6, 2025 · edited by squidfunk

I’m happy to announce the release of the new search – Disco!

We released Disco yesterday as part of Zensical, a modern static site generator that we’ve written from scratch over the past 16 months, designed and developed with the experience of maintaining Material for MkDocs for 10 years. Zensical represents our path forward, especially with regard to the MkDocs maintenance situation.

→ Read the full announcement on our blog

Disco

Here’s how the new search looks – you can try it out on Zensical’s documentation!

Disco’s modular architecture (as previously discussed in this thread) is extremely flexible. The initial release includes few configuration options besides the separator, which now has an improved default. Over the coming weeks, we’ll be gathering user feedback and adding configuration options accordingly.

In early 2026, we’ll release Disco as a standalone OSS project, so other SSGs and projects can benefit as well. You can take a look at the roadmap for Disco to see our planned enhancements.

Going forward

Our priority is now achieving feature parity in Zensical to ensure the smoothest possible transition path for users. Compatibility with Material for MkDocs is our number one goal.

Warning

Regarding Material for MkDocs integration: Disco will not be backported to Material for MkDocs. Given that MkDocs is unmaintained and locked off, with no MkDocs team member triaging or merging proposed PRs, and the constraints this places on us and the ecosystem, we’ve made the strategic decision to focus our efforts on Zensical. If you want to learn more: we’ve shared our perspective in our update on our foundational work.

squidfunk mentioned this on Nov 6, 2025: Ability to open search bar using cmd+k #8486

do-me commented on Nov 6, 2025

Thanks for the update. Given the circumstances with MkDocs, this looks like a reasonable way forward. Best of luck to you folks with Zensical, and thanks for the years of effort you put into Material for MkDocs!

MateuszKubuszok commented on Nov 6, 2025

Looking forward to it! (Currently blocked by mkdocs-macros, but as soon as the new components system lets me reimplement my macros, I will migrate.)

provisota mentioned this on Nov 12, 2025: Break search plugin out into separate package provisota/mkdocs#59

squidfunk (Owner, Author) commented on Nov 27, 2025

Update: We recently benchmarked Material for MkDocs’ search (Lunr.js) against Zensical’s (Disco): https://fosstodon.org/@squidfunk/115616586114803206

Result:

- Indexing is 4x faster
- Querying is up to 20x faster

feasgal mentioned this on Dec 1, 2025 exact phrase matching in search. zensical/zensical#161
