Adding Search to Gatsby via Lunr.js

gatsby hack

When this blog migrated to a static site generator, the search feature was removed because there were no longer any backend services to power it. I looked into Search-as-a-Service offerings such as Algolia and Swiftype, but the popularity of this site can hardly justify the extra cost.

Another option is client-side search: the index is stored in the browser, and queries are served on the client side with no interaction with a backend. I played with Lunr this weekend, and it worked relatively well. I’d like to share some technical details, and I hope they are useful if you are in the same camp.

Pre-build the indexes

Lunr takes time to build the index for a large number of documents, so we can pre-build it in the createPages function in gatsby-node.js:

const path = require("path");
const lunr = require("lunr");

exports.createPages = async ({ graphql, actions, reporter }) => {
  const { createPage } = actions;
  // ... ...

  const index = lunr(function () {
    this.ref("uri");
    this.field("title");
    // ... ... other fields

    result.data.allMarkdownRemark.edges.forEach((edge) => {
      this.add({
        uri: edge.node.fields.uri,
        title: edge.node.frontmatter.title,
        // ... ...
      });
    });
  });

  createPage({
    path: "/search/",
    component: path.resolve("src/templates/search.jsx"),
    context: {
      index,
    },
  });
};
  1. We first configure the index to use uri as the reference key, and then declare the fields to index, such as title.
  2. The GraphQL query results are mapped to the expected document shape.
  3. We pass the index as the page context when creating the search page.
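
Step 2 can be pulled out into a small helper, which keeps the Lunr builder short and makes the mapping easy to check on its own. This is only a sketch: toSearchDocuments is a hypothetical name, the sample data is made up, and the field paths simply follow the snippet above.

```javascript
// Hypothetical helper: flatten the GraphQL result into plain documents for Lunr.
// The `fields.uri` and `frontmatter.title` paths follow the createPages snippet.
function toSearchDocuments(result) {
  return result.data.allMarkdownRemark.edges.map(({ node }) => ({
    uri: node.fields.uri,
    title: node.frontmatter.title,
  }));
}

// Made-up input shaped like the GraphQL response used in createPages.
const sampleResult = {
  data: {
    allMarkdownRemark: {
      edges: [
        {
          node: {
            fields: { uri: "/posts/hello/" },
            frontmatter: { title: "Hello" },
          },
        },
      ],
    },
  },
};

const documents = toSearchDocuments(sampleResult);
```

Inside the builder function, the loop would then become documents.forEach((doc) => this.add(doc)).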

Hydrate the index

Because the search page needs to maintain its own state, namely query and queryResults, we build the Search component as a traditional class component:

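import React, { Component } from "react";
import { Link } from "gatsby";
import lunr from "lunr";
// Layout is the site's own layout component; import it from wherever it lives.

class Search extends Component {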
  constructor(props) {
    super(props);

    const { index } = props.pageContext;
    this.state = {
      query: '',
      engine: lunr.Index.load(index),
      queryResults: [],
    };
    this.handleChange = this.handleChange.bind(this);
  }
  // ... ...
}

In the constructor, we invoke lunr.Index.load to load the index from the page context. This may seem redundant, but remember that the index is pre-built at build time while the engine runs in the browser, so the index is implicitly serialized to JSON on its way to the browser and has to be turned back into a lunr.Index there.
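
To see why the load step is needed, here is a toy stand-in for an index class. TinyIndex is hypothetical and is not Lunr; it only mirrors the toJSON and load pair that lunr.Index provides, and shows that a JSON round trip keeps the data but drops the methods.

```javascript
// Toy stand-in for an index class; this is NOT Lunr itself.
class TinyIndex {
  constructor(entries) {
    this.entries = entries; // maps a term to a list of refs
  }

  search(term) {
    return this.entries[term] || [];
  }

  // JSON.stringify calls toJSON automatically, so only plain data is emitted.
  toJSON() {
    return { entries: this.entries };
  }

  // Rebuild a usable instance from the plain data.
  static load(serialized) {
    return new TinyIndex(serialized.entries);
  }
}

const built = new TinyIndex({ gatsby: ["/posts/search/"] });

// Roughly what survives the trip from the build step to the browser.
const wire = JSON.parse(JSON.stringify(built));
const hasSearchMethod = typeof wire.search === "function"; // methods are gone

// Loading turns the plain object back into something searchable.
const engine = TinyIndex.load(wire);
const hits = engine.search("gatsby");
```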

The Search component will render an input element and the query results using the Controlled Components pattern:

  handleChange(e) {
    const query = e.target.value;
    const {engine} = this.state;
    this.setState({
      query,
      queryResults: query ? engine.search(query) : [],
    });
  }

  render() {
    const {query, queryResults} = this.state;

    return (
      <Layout>
        {/* onSubmit belongs on the form; it never fires on the input itself */}
        <form onSubmit={e => e.preventDefault()}>
          <input
            value={query}
            onChange={this.handleChange}
          />
        </form>
        <ul>
          {
            queryResults.map(item => {
              return (
                <li key={item.ref}>
                  <Link to={item.ref}>
                    {item.ref}
                  </Link>
                </li>
              )
            })
          }
        </ul>
      </Layout>
    );
  }
};
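
One thing to watch with search-as-you-type: Lunr's query syntax gives special meaning to characters such as : and ~, and a half-typed query can make search throw a lunr.QueryParseError. A minimal sketch of a guard follows; safeSearch and fakeEngine are hypothetical names, and the fake engine only imitates a parse failure for illustration.

```javascript
// Hypothetical helper: run a search, but treat a query that fails to parse
// as "no results" instead of letting the error break rendering.
function safeSearch(engine, query) {
  if (!query) return [];
  try {
    return engine.search(query);
  } catch (err) {
    // Assumption: errors thrown here come from parsing a half-typed query.
    return [];
  }
}

// Stand-in engine for illustration only; in the component this would be
// the index returned by lunr.Index.load.
const fakeEngine = {
  search(query) {
    if (query.endsWith(":")) throw new Error("parse error");
    return [{ ref: "/posts/search/" }];
  },
};

const okResults = safeSearch(fakeEngine, "gatsby");
const brokenResults = safeSearch(fakeEngine, "title:");
const emptyResults = safeSearch(fakeEngine, "");
```

In handleChange, queryResults could then be set to safeSearch(engine, query).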

Challenges

We have barely scratched the surface of site search. Many technical challenges remain for better usability:

  • Tokenize the HTML page, and reconstruct the HTML elements for highlighting.
  • Trim the indexes to decrease the memory usage.
  • Add custom stemmers for dates, tags and other metadata.
  • Fine-tune the boost coefficients to improve relevance.
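
As a starting point for the highlighting item, here is a rough sketch that works on plain text only; it does not attempt to tokenize or reconstruct HTML. The highlight helper is hypothetical. It matches by prefix on the assumption that the terms Lunr reports for a match are stems, so "search" should also mark "Searching".

```javascript
// Escape text so it can be placed into HTML safely.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap words that start with any given term in <mark>.
function highlight(text, terms) {
  const cleaned = terms.filter(Boolean).map(escapeRegExp);
  if (cleaned.length === 0) return escapeHtml(text);

  const pattern = new RegExp(`\\b(?:${cleaned.join("|")})\\w*`, "gi");
  let out = "";
  let last = 0;
  for (const match of text.matchAll(pattern)) {
    // Escape the untouched text before the match, then wrap the match.
    out += escapeHtml(text.slice(last, match.index));
    out += `<mark>${escapeHtml(match[0])}</mark>`;
    last = match.index + match[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

const snippet = highlight("Searching <Gatsby> sites", ["search"]);
```

The terms could come from the keys of a result's matchData.metadata, and the text from a title or excerpt passed along in the page context.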