Integrate Pagefind's Search with Astro: A Complete Setup Guide

Murtuzaali Surti

• 7 min read

Pagefind's take on search is quite simple: index your site at build time and host the index alongside your static site. The search index lives next to your site's files, and it is never loaded in full upfront.

As mentioned in this HN answer, it only loads relevant search data when you start typing, and you can also load the js client (pagefind.js) conditionally when going for a custom implementation.

With Pagefind, very little is loaded until you type in a search term. Once you start typing, a chunk of the search index is loaded containing your search word(s) and is then queried. This is what drives the performance you're seeing, which is predominantly the time for the data to load after you start typing — the query step itself is near-instant. - liambigelow on Hacker News

With that being said, let's look at how you can integrate pagefind with your Astro site to implement static, site-wide search.

Installing Pagefind

Install it from a package manager such as npm.

npm i pagefind -D

Building Search Index

The search index needs to be built at build time so that it can be queried later. You can do that in your build script itself, or you can use a postbuild script in package.json (npm runs it automatically after the build script). I prefer to put it in the build script itself.

"scripts": {
    "build": "astro build && npx pagefind --site dist"
}

Pagefind also supports a config file named pagefind.yml, which you can use to set configuration options instead of passing them in the cli command. The site option here specifies the build directory of your project, and the glob option restricts which files are parsed to build the search index.

# pagefind.yml
site: dist
glob: "**/*.{html}"

The build script can be simplified as:

"scripts": {
    "build:astro": "astro build",
    "build:pagefind": "npx pagefind",
    "build": "npm-run-all -s build:astro build:pagefind"
}

The npm-run-all package is a great tool for running cli commands either sequentially or in parallel in a cross-platform way.

This will build the pagefind search index whenever you build your site. But what about dev mode? Astro doesn't serve files from your dist folder in dev mode, so you need to copy the pagefind files from the dist folder into your public directory, which is used for static files (got the idea from this post).

So, you can build the pagefind index, copy it from dist to the public directory, and then run astro dev. This way you get the latest search index, but it still won't update on hot reload (this is a limitation).

GOTCHA: If you are starting from scratch, i.e. you don't have a previous build, then you must build the site first and then run it in dev mode. That's because pagefind needs a built site to build an index from.

"scripts": {
    "copy:pagefind:dev": "npx shx cp -r dist/pagefind public/",
    "dev:astro": "astro dev",
    "dev": "npm-run-all -s build:pagefind copy:pagefind:dev dev:astro"
}

Initializing Pagefind

There are two ways in which you can use pagefind:

  1. With pre-built UI
  2. Custom implementation using Pagefind API

If you want to quickly implement search without worrying too much about the layout, you can go with the default UI provided by pagefind. For that, you need to link the js and css files as shown below:

<!-- this will work in both dev and prod environments as we have copied the pagefind directory locally -->
<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
<script is:inline src="/pagefind/pagefind-ui.js"></script>

Then, initialize PagefindUI:

<script is:inline>
    window.addEventListener('DOMContentLoaded', (event) => {
        new PagefindUI({ element: "#searchContainer" });
    });
</script>

Although this seems quick and efficient, I always prefer having more control of the search layout and data. And that's why I prefer using the Search API provided by pagefind which lets you implement search however you want.

Using the API

In order to use the API, you must import a file named pagefind.js from the pagefind directory. This file acts as a js client for the API. It's beneficial to import this file only once when the user focuses on the search input element and store it in a variable for future use.

let pagefind;
document.querySelector("#search").addEventListener("focus", async (e) => {
    if (!pagefind) {
        pagefind = await import("/pagefind/pagefind.js");
        pagefind.init();
    }
})

Wrap the above code in a <script defer> tag so that it is treated as a plain client-side script: Astro won't process or bundle it, and it will be shipped to the client as is. You can also use the is:inline attribute.

The init() method invoked above loads core dependencies and metadata about the site. Calling it is optional: if you skip it, it is invoked automatically when a search method is first called after the user starts typing. Calling it when the element gains focus simply pre-loads those dependencies a little earlier.
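One subtlety with the `if (!pagefind)` guard is that the variable is only set once the import has resolved, so a second focus event arriving in the meantime starts the same work again. A minimal sketch of one way around that is to cache the promise rather than the module. `createPagefindLoader` and `importer` are hypothetical names introduced here for illustration, not part of Pagefind; `importer` stands in for the dynamic import shown above.

```javascript
// Hypothetical helper (not part of Pagefind): wraps an importer function so the
// client is requested and initialized at most once, even if the input gains
// focus again before the first import has finished.
function createPagefindLoader(importer) {
    let pending = null; // the in-flight or completed load, shared by all callers

    return function loadPagefind() {
        if (!pending) {
            pending = Promise.resolve(importer())
                .then(async (client) => {
                    await client.init(); // optional pre-load, as described above
                    return client;
                })
                .catch((error) => {
                    pending = null; // allow a later focus event to retry
                    throw error;
                });
        }
        return pending;
    };
}

// In the browser, the importer would be the dynamic import from the article:
// const loadPagefind = createPagefindLoader(() => import("/pagefind/pagefind.js"));
// document.querySelector("#search").addEventListener("focus", () => loadPagefind());
```

Any later handler can then `await loadPagefind()` to get the same client without triggering another import.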

You can also specify search options before you initialize the pagefind instance:

document.querySelector("#search").addEventListener("focus", async (e) => {
    if (!pagefind) {
        pagefind = await import("/pagefind/pagefind.js");
        await pagefind.options({
            ranking: {
                // Decreasing the pageLength parameter is a good way to suppress very short pages that are undesirably ranking higher than longer pages. (max: 1, min: 0)
                pageLength: 0.5,
            },
        });
        pagefind.init();
    }
})

Then, when the user starts typing, you can call the search method as shown below and get the search results:

document.querySelector("#search").addEventListener("keydown", async (e) => {
    const { results } = await pagefind.search(e.target.value);
    for (const result of results) {
        const data = await result.data();
        console.log(data, data.meta.title, data.excerpt);
        // do required DOM manipulation
    }
})
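Each entry in `results` only fetches its content when `data()` is called, so a long result list can be trimmed before anything is loaded. The following is a minimal sketch of that idea; `loadTopResults` and the default limit of 5 are assumptions made for illustration, not part of Pagefind.

```javascript
// Hypothetical helper: load data for only the first `limit` results.
// `results` is the array returned by pagefind.search(); each item exposes an
// async data() method, as used in the loop above.
function loadTopResults(results, limit = 5) {
    // slice() first, so data() is never called for results that won't be shown;
    // Promise.all() then waits for the selected results to load in parallel.
    return Promise.all(results.slice(0, limit).map((result) => result.data()));
}

// Usage inside the event handler:
// const { results } = await pagefind.search(e.target.value);
// const pages = await loadTopResults(results, 5);
```

Loading in parallel instead of awaiting inside a `for` loop also means the slowest result, rather than the sum of all of them, determines how long the user waits.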

There's a catch: you are firing pagefind.search on every keydown event, which is quite expensive because search results are fetched for nearly every letter or substring while the user is still typing a word. The solution to this problem is debouncing. You delay the execution of the event handler and restart the delay whenever a new event arrives, so the handler only runs once no further event has fired for a set amount of time. In short, the search method won't be fired until the user has stopped typing for a moment.
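To make the idea concrete, here is a minimal sketch of a debounce helper written from scratch. It is only an illustration of the technique, not the implementation used by lodash or Pagefind, and `runSearch` in the usage comment is a hypothetical function standing in for the search-and-render logic.

```javascript
// Minimal debounce sketch: returns a wrapper that postpones `fn` until
// `waitMs` milliseconds have passed without another call.
function debounce(fn, waitMs) {
    let timer; // handle of the currently scheduled call, if any

    return function debounced(...args) {
        clearTimeout(timer); // cancel the call scheduled by the previous event
        timer = setTimeout(() => fn.apply(this, args), waitMs);
    };
}

// Usage: only the last event within a 300 ms pause triggers a search.
// `runSearch` is a hypothetical function that calls pagefind.search().
// input.addEventListener("keydown", debounce((e) => runSearch(e.target.value), 300));
```

Every call resets the timer, so a burst of keystrokes collapses into a single call carrying the arguments of the last one.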

I personally use the lodash library for debouncing, but pagefind provides a native way to run a debounced search, namely pagefind.debouncedSearch, and you should definitely check that out.

Pre-processing the Script

If you want Astro to bundle and process the script, or you want to import any npm packages in it, then you will have to make a couple of changes in the script as well as the build process.

Firstly, Astro processes all <script> tags unless you opt out of script processing by adding an attribute such as is:inline or defer. So, you will want to remove any of those attributes for the script to be processed.

Secondly, the dynamic import won't work as written. TypeScript will complain because it can't resolve /pagefind/pagefind.js: the file doesn't exist yet, since it is only copied to the public directory after the build. So, you need to append a ?url param.

let pagefind: any;
const searchField = document.querySelector("#search") as HTMLInputElement;

searchField.addEventListener("focus", async () => {
    if (!pagefind) {
        pagefind = await import("/pagefind/pagefind.js?url"); // appending `?url` because typescript will complain about the module not existing
        pagefind.init();
    }
});

Lastly, if you don't have a previous build (i.e. if you are starting from scratch), the dynamic /pagefind/pagefind.js?url import will throw an error when you run npm run build. Rollup can't resolve the file because it doesn't exist yet: the pagefind index is built and copied to /public only after the site build. To get around that, you need to declare the dynamic import as an external dependency in the rollup config.

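// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  // ...
  vite: {
    build: {
      rollupOptions: {
        // don't try to resolve this import at build time
        external: '/pagefind/pagefind.js?url'
      }
    }
  },
});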

Also, if you are running the site in dev mode without having a previous build, make sure to build the site once for the first time. You can add a new script for that in package.json file: "dev:build": "npm-run-all -s build dev".

Now, you can import and use lodash or any npm package in your script.

import lodash from 'lodash';
// ...
const searchResults = document.querySelector("#searchResults") as HTMLElement;
searchField.addEventListener("keydown", lodash.debounce(async () => {
    const results = await (await pagefind.debouncedSearch(searchField.value, {}, 700))?.results;
    if (results) {
        for (const result of results) {
            const data = await result.data();
            console.log(data, data.meta.title, data.excerpt);
            // DOM manipulation
        }
    }
}, 700));
// ...
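The DOM manipulation step is left open above. As a rough sketch of what it might look like, the helper below turns one loaded result into a list item. `resultToHtml` and `escapeHtml` are hypothetical names, and the sketch assumes that `data.excerpt` is already safe HTML produced by Pagefind (it wraps matched terms in `<mark>` tags), which is why the excerpt is inserted unescaped.

```javascript
// Escape text before interpolating it into HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the markup for one result from the object
// returned by result.data(), using its url, excerpt, and meta.title fields.
function resultToHtml(data) {
    // Fall back to the URL when no title metadata was indexed for the page.
    const title = escapeHtml(data.meta?.title ?? data.url);
    const url = escapeHtml(data.url);
    // Assumption: the excerpt is trusted HTML containing <mark> highlights.
    return `<li><a href="${url}">${title}</a><p>${data.excerpt}</p></li>`;
}

// Usage inside the loop, with `searchResults` cleared before each new search:
// searchResults.insertAdjacentHTML("beforeend", resultToHtml(data));
```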

That was all about using pagefind to implement search in a static site built using astro.

Bonus

Pagefind has some defaults for selecting metadata about a page. For example, it returns the content of the first h1 element on your page as the title of the page. To override that, you can use the data-pagefind-meta attribute and set its value to title on the element you wish to be returned as the title.

<title data-pagefind-meta="title">Page</title>

Syntackle itself uses Pagefind for global search. Press Ctrl+K or click the search button to see it in action.

