Adding Search to Gatsby via Lunr.js
Since this blog was migrated to a static site generator, the search feature has been removed for lack of backend services. I investigated Search-as-a-Service offerings such as Algolia and SwiftType, but the popularity of this site can hardly justify the extra cost.
Another option is client-side search: the index is stored in the browser, and queries are served on the client side without any interaction with a backend. I played with Lunr this weekend, and it worked relatively well. I'd like to share some technical details, and I hope they are useful if you are in the same camp.
Pre-build the indexes
Lunr takes time to build an index over a large number of documents, so we can pre-build the index in the createPages function in gatsby-node.js:
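const path = require("path");
const lunr = require("lunr");

exports.createPages = async ({ graphql, actions, reporter }) => {
  const { createPage } = actions;
  // ... ... (elided: run the GraphQL query and assign it to `result`)
  const index = lunr(function () {
    // Each search hit is identified by the page's uri
    this.ref("uri");
    this.field("title");
    // ... ... other fields
    result.data.allMarkdownRemark.edges.forEach((edge) => {
      this.add({
        uri: edge.node.fields.uri,
        title: edge.node.frontmatter.title,
        // ... ...
      });
    });
  });
  // Hand the pre-built index to the search page through its context
  createPage({
    path: "/search/",
    component: path.resolve("src/templates/search.jsx"),
    context: {
      index,
    },
  });
};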
- We first configure the index using uri as the key, and then index the fields, such as title.
- The GraphQL query result is mapped to the expected document type.
- We pass the index as the page context to create the search page.
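The mapping step in the second bullet can be factored into a small helper. This is only a sketch: toDocuments and the sample data are invented for illustration, and the result shape is assumed from the fields accessed in the snippet above.

```javascript
// Hypothetical helper: flatten the GraphQL result into plain documents whose
// keys match the ref ("uri") and the field ("title") declared on the index.
function toDocuments(result) {
  return result.data.allMarkdownRemark.edges.map(({ node }) => ({
    uri: node.fields.uri,
    title: node.frontmatter.title,
  }));
}

// Sample data invented for this example.
const sampleResult = {
  data: {
    allMarkdownRemark: {
      edges: [
        { node: { fields: { uri: "/posts/hello/" }, frontmatter: { title: "Hello" } } },
        { node: { fields: { uri: "/posts/lunr/" }, frontmatter: { title: "Search with Lunr" } } },
      ],
    },
  },
};

const documents = toDocuments(sampleResult);
```

Inside the Lunr configuration function, the documents could then be added with documents.forEach((doc) => this.add(doc)); the arrow function keeps this pointing at the builder.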
Rehydrate the index
The search page needs to maintain its own state, namely query and queryResults, so we build the Search component in a more traditional fashion, as a class component:
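import React, { Component } from "react";
import { Link } from "gatsby";
import lunr from "lunr";
// The Layout import path is an assumption; adjust it to your project.
import Layout from "../components/layout";

class Search extends Component {
  constructor(props) {
    super(props);
    const { index } = props.pageContext;
    this.state = {
      query: '',
      // Rebuild a live index from the serialized page context
      engine: lunr.Index.load(index),
      queryResults: [],
    };
    this.handleChange = this.handleChange.bind(this);
  }
  // ... ...
}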
In the constructor, we invoke lunr.Index.load to load the index from the page context. It seems redundant, but remember that the engine is consumed in the browser while the index is pre-built at build time, so it is implicitly serialized to JSON on its way to the browser. What arrives in the page context is plain data rather than a live lunr.Index, and lunr.Index.load turns it back into one.
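The implicit serialization can be illustrated without Lunr at all. The sketch below assumes the page context travels as JSON, as described above; FakeIndex is a stand-in invented for this example, and it only mimics the toJSON and load pair that lunr.Index provides.

```javascript
// Stand-in for a pre-built index. The class and its contents are invented for
// illustration; it is not Lunr's actual data structure.
class FakeIndex {
  constructor(terms) {
    this.terms = terms; // a Set of indexed terms
  }

  // JSON.stringify calls toJSON automatically when an object defines it.
  toJSON() {
    return { version: "fake-1", terms: Array.from(this.terms) };
  }

  // Counterpart of lunr.Index.load: rebuild an instance from plain data.
  static load(serialized) {
    return new FakeIndex(new Set(serialized.terms));
  }

  has(term) {
    return this.terms.has(term);
  }
}

const built = new FakeIndex(new Set(["gatsby", "lunr"]));

// Roughly what happens to the page context between the build and the browser
const wire = JSON.stringify({ index: built });
const pageContext = JSON.parse(wire);

// pageContext.index is now a plain object without methods, so it must be loaded
const restored = FakeIndex.load(pageContext.index);
```

After the round trip, pageContext.index has no has method of its own, which is why the explicit load step is needed before searching.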
The Search component renders an input element and the query results using the Controlled Components pattern:
  handleChange(e) {
    const query = e.target.value;
    const { engine } = this.state;
    this.setState({
      query,
      // An empty query clears the result list
      queryResults: query ? engine.search(query) : [],
    });
  }

  render() {
    const { query, queryResults } = this.state;
    return (
      <Layout>
        {/* onSubmit belongs on a form; an input never fires it */}
        <form onSubmit={(e) => e.preventDefault()}>
          <input value={query} onChange={this.handleChange} />
        </form>
        <ul>
          {queryResults.map((item) => (
            <li key={item.ref}>
              <Link to={item.ref}>{item.ref}</Link>
            </li>
          ))}
        </ul>
      </Layout>
    );
  }
}

// Gatsby page templates need a default export
export default Search;
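One caveat with searching on every keystroke: Lunr's query syntax gives special meaning to characters such as ":", "~" and "^", so a half-typed query can fail to parse and throw. Below is a minimal sketch of a guard, assuming the error carries the name QueryParseError as Lunr's parse errors do. safeSearch and stubEngine are hypothetical names; the stub stands in for the loaded index so the example runs on its own.

```javascript
// Hypothetical wrapper around engine.search. `engine` is anything with a
// search(query) method, such as the index returned by lunr.Index.load.
function safeSearch(engine, query) {
  const trimmed = query.trim();
  if (!trimmed) return [];
  try {
    return engine.search(trimmed);
  } catch (err) {
    // Treat a malformed, half-typed query as "no results yet"
    if (err && err.name === "QueryParseError") return [];
    throw err; // anything else is a real bug and should surface
  }
}

// Stub engine, invented for this example: it rejects a trailing colon the way
// a query parser might reject a half-typed field prefix.
const stubEngine = {
  search(query) {
    if (query.endsWith(":")) {
      const err = new Error("expecting term, found nothing");
      err.name = "QueryParseError";
      throw err;
    }
    return [{ ref: "/posts/lunr/", score: 1 }];
  },
};
```

In handleChange, queryResults could then be computed as safeSearch(engine, query), which also covers the empty-query case.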
Challenges
We have barely scratched the surface of site search. Many technical challenges remain for better usability:
- Tokenize the HTML page, and reconstruct the HTML elements for highlighting.
- Trim the indexes to decrease the memory usage.
- Custom stemmers for date, tags and other metadata.
- Fine-tune the boost coefficients to improve relevance.
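As a starting point for the highlighting item, here is a rough sketch for plain text only; it does not attempt the harder part of tokenizing and rebuilding HTML. It assumes match positions are available as [start, length] pairs, which, as far as I understand Lunr's documentation, is the shape recorded in a result's matchData when "position" is added to the builder's metadataWhitelist. The highlight and escapeHtml helpers are hypothetical.

```javascript
// Hypothetical helper: wrap each matched range of a plain-text string in <mark>.
// `positions` is a list of [start, length] pairs, in any order.
function highlight(text, positions) {
  const sorted = [...positions].sort((a, b) => a[0] - b[0]);
  let out = "";
  let cursor = 0;
  for (const [start, length] of sorted) {
    if (start < cursor) continue; // skip ranges overlapping an earlier match
    out += escapeHtml(text.slice(cursor, start));
    out += "<mark>" + escapeHtml(text.slice(start, start + length)) + "</mark>";
    cursor = start + length;
  }
  return out + escapeHtml(text.slice(cursor));
}

// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

const marked = highlight("Search with Lunr & Gatsby", [[12, 4], [0, 6]]);
// marked === "<mark>Search</mark> with <mark>Lunr</mark> &amp; Gatsby"
```

Storing positions makes the index larger, so this pulls against the goal of trimming the index; it is a trade-off rather than a free improvement.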