When searching using Google, it usually returns fragments of pages that match your search query. I’ve been working on a full-text search example ↗ and wanted to add a similar feature after querying my database.
One way of going about it would be to create a function that uses a regular expression to find all matches of the query, truncates the text, and adds the <mark>
HTML element.
The
<mark>
HTML element represents text that’s marked or highlighted. You can read further here ↗.
What if I told you Postgres already does that?
Postgres provides a function called ts_headline
that accepts the following arguments:
-
Configuration(Optional) – define your config or Postgres uses the
default_text_search_config
↗ -
The text being queried
-
Search query –
to_tsquery
-
Options (Optional) – extend defaults such as
MaxWords
andMinWords
to_tsquery
is a function that converts text to thetsquery
data type.tsquery
is the text that stop words have been removed, stemmed, and lexically analyzed.
In this example, the search queries are foundation and model.
SELECT
ts_headline('english',
'Database management systems are typically designed to organize data according to a specific pattern.
These patterns, called database types or database models, are the logical and structural foundations that determine how individual pieces of data are stored and managed.
There are many different database types, each with their own advantages and limitations.
The relational model, which organizes data into cross-referenced tables, rows, and columns, is often considered to be the default paradigm.',
to_tsquery('english', 'foundation & model')
);
Response:
<b>models</b>, are the logical and structural <b>foundations</b> that determine
how individual pieces of data are stored
You can execute the above query against your database without having any tables or data setup.
Postgres sets the following default options for you:
MaxWords
,MinWords
(integers) – number of characters to be truncated before and after the text. Default is 35 and 15, respectivelyShortWord
(integers) – words shorter than this length are removed from the result if it’s part of the beginning or end. Default: eliminates common English articlesHighlightAll
(boolean) – default:false
. Postgres returns only one document with the first matchMaxFragments
(integer) – defines the number of fragments to be returnedStartSel
,StopSel
(strings) – specifies the string or HTML element used to highlight the matched text. Default is<b></b>
HTML elementFragmentDelimiter
(string) – When more than one fragment is returned, the fragments are separated by ”…”
In this example, the search queries are data and type. The Options are in string format and you can extend it as follows:
SELECT
ts_headline('english',
'Database management systems are typically designed to organize data according to a specific pattern.
These patterns, called database types or database models, are the logical and structural foundations that determine how individual pieces of data are stored and managed.
There are many different database types, each with their own advantages and limitations.
The relational model, which organizes data into cross-referenced tables, rows, and columns, is often considered to be the default paradigm.',
to_tsquery('english', 'data & type'),
'HighlightAll=true, StartSel=<mark>, StopSel=</mark>, MaxFragments=2,FragmentDelimiter=💩'
);
Response:
Database management systems are typically designed to organize
<mark>data</mark> according to a specific pattern. These patterns, called
database <mark>types</mark> or database models, are the logical and structural
foundations that determine how individual pieces of <mark>data</mark>💩stored
and managed. There are many different database <mark>types</mark>, each with
their own advantages and limitations. The relational model, which organizes
<mark>data</mark> into cross-referenced tables, rows, and columns, is often
considered
You can read further in the Postgres docs ↗.
Go forth and query thine database. 🤺