Highlighting Results from a Full-Text Search Query with PostgreSQL

|4 min read

When searching using Google, it usually returns fragments of pages that match your search query. I’ve been working on a full-text search example and wanted to add a similar feature after querying my database.

Duckduckgo search results

One way of going about it would be to create a function that uses a regular expression to find all matches of the query, truncates the text, and adds the <mark> HTML element.

The <mark> HTML element represents text that’s marked or highlighted. You can read further here.

What if I told you Postgres already does that?

Postgres provides a function called ts_headline that accepts the following arguments:

  • Configuration(Optional) – define your config or Postgres uses the default_text_search_config

  • The text being queried

  • Search query – to_tsquery

  • Options (Optional) – extend defaults such as MaxWords and MinWords

to_tsquery is a function that converts text to the tsquery data type. tsquery is the text that stop words have been removed, stemmed, and lexically analyzed.

In this example, the search queries are foundation and model.

SELECT
	ts_headline('english',
	'Database management systems are typically designed to organize data according to a specific pattern.
	 These patterns, called database types or database models, are the logical and structural foundations that determine how individual pieces of data are stored and managed.
	 There are many different database types, each with their own advantages and limitations.
	 The relational model, which organizes data into cross-referenced tables, rows, and columns, is often considered to be the default paradigm.',
	 to_tsquery('english', 'foundation & model')
);

Response:

<b>models</b>, are the logical and structural <b>foundations</b> that determine
how individual pieces of data are stored

You can execute the above query against your database without having any tables or data setup.

Postgres sets the following default options for you:

  • MaxWords, MinWords (integers) – number of characters to be truncated before and after the text. Default is 35 and 15, respectively
  • ShortWord (integers) – words shorter than this length are removed from the result if it’s part of the beginning or end. Default: eliminates common English articles
  • HighlightAll (boolean) – default: false. Postgres returns only one document with the first match
  • MaxFragments (integer) – defines the number of fragments to be returned
  • StartSel, StopSel (strings) – specifies the string or HTML element used to highlight the matched text. Default is <b></b> HTML element
  • FragmentDelimiter(string) – When more than one fragment is returned, the fragments are separated by ”…”

In this example, the search queries are data and type. The Options are in string format and you can extend it as follows:

SELECT
	ts_headline('english',
	'Database management systems are typically designed to organize data according to a specific pattern.
	 These patterns, called database types or database models, are the logical and structural foundations that determine how individual pieces of data are stored and managed.
	 There are many different database types, each with their own advantages and limitations.
	 The relational model, which organizes data into cross-referenced tables, rows, and columns, is often considered to be the default paradigm.',
	 to_tsquery('english', 'data & type'),
	 'HighlightAll=true, StartSel=<mark>, StopSel=</mark>, MaxFragments=2,FragmentDelimiter=💩'
);

Response:

Database management systems are typically designed to organize
<mark>data</mark> according to a specific pattern. These patterns, called
database <mark>types</mark> or database models, are the logical and structural
foundations that determine how individual pieces of <mark>data</mark>💩stored
and managed. There are many different database <mark>types</mark>, each with
their own advantages and limitations. The relational model, which organizes
<mark>data</mark> into cross-referenced tables, rows, and columns, is often
considered

You can read further in the Postgres docs.

Go forth and query thine database. 🤺