Remove 'db' and 'postgres_search' search backends

pull/7941/head
Matt Westcott 2022-02-08 17:26:27 +00:00 committed by Matt Westcott
parent 43edd0c187
commit 00582ba35a
21 changed files with 4 additions and 1551 deletions

View file

@@ -59,18 +59,6 @@ Add a `WAGTAIL_SITE_NAME` - this will be displayed on the main dashboard of the
WAGTAIL_SITE_NAME = 'My Example Site'
```
<!--- RemovedInWagtail217Warning (wagtail.search.backends.database will be made the default and will not need to be added explicitly here) -->
Add the `WAGTAILSEARCH_BACKENDS` setting to enable full-text searching:
```python
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.search.backends.database',
    }
}
```
Various other settings are available to configure Wagtail's behaviour - see [Settings](/reference/settings).
## URL configuration

View file

@@ -13,7 +13,6 @@ Wagtail ships with a variety of extra optional modules.
frontendcache
routablepage
modeladmin/index
postgres_search
searchpromotions
simple_translation
table_block

View file

@@ -1,130 +0,0 @@
.. _postgres_search:
========================
PostgreSQL search engine
========================
.. warning::
| This search backend is deprecated, and has been replaced by ``wagtail.search.backends.database``. See :ref:`wagtailsearch_backends`.
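For reference, migrating typically means pointing the ``BACKEND`` setting at the new
module and reindexing with :ref:`update_index`, as a minimal sketch:

.. code-block:: python

    WAGTAILSEARCH_BACKENDS = {
        'default': {
            'BACKEND': 'wagtail.search.backends.database',
        },
    }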
This contrib module provides a search engine backend using
`PostgreSQL full-text search capabilities <https://www.postgresql.org/docs/current/static/textsearch.html>`_.
.. warning::
| You can only use this module to index data from a PostgreSQL database.
**Features**:
- It supports all the search features available in Wagtail.
- It is easy to install and adds no external dependencies or services.
- It performs excellently for sites with up to 200,000 pages, and stays decent for sites of up to a million pages.
- It is faster to reindex than Elasticsearch, if you use PostgreSQL 9.5 or higher.
**Drawbacks**:
- Partial matching (``SearchField(partial_match=True)``) is not supported.
- ``SearchField(boost=…)`` is only partially respected, as PostgreSQL supports only four distinct weights.
  If you use five or more distinct boost values on your site, slight ranking inaccuracies may occur (see the sketch after this list).
- When :ref:`wagtailsearch_specifying_fields`, the index is not used,
  so it will be slow on huge sites.
- Also when :ref:`wagtailsearch_specifying_fields`, you cannot search
  on a specific method.
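To illustrate the boost drawback above, here is a hypothetical model (the
``BookPage`` class and its fields are invented for this sketch) declaring five
distinct boost values; with only four PostgreSQL weights available, two of
those values end up sharing a weight:

.. code-block:: python

    from django.db import models

    from wagtail.core.models import Page
    from wagtail.search import index


    class BookPage(Page):
        subtitle = models.CharField(max_length=255)
        summary = models.TextField()
        body = models.TextField()
        appendix = models.TextField()
        footnotes = models.TextField()

        # Five distinct boosts, but PostgreSQL only offers weights A-D,
        # so two of these values are collapsed into the same weight.
        search_fields = Page.search_fields + [
            index.SearchField('subtitle', boost=10),
            index.SearchField('summary', boost=5),
            index.SearchField('body', boost=2),
            index.SearchField('appendix', boost=1),
            index.SearchField('footnotes', boost=0.5),
        ]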
Installation
============
Add ``'wagtail.contrib.postgres_search',`` anywhere in your ``INSTALLED_APPS``:
.. code-block:: python
    INSTALLED_APPS = [
        ...
        'wagtail.contrib.postgres_search',
        ...
    ]
Then configure Wagtail to use it as a search backend.
Give it the alias ``'default'`` if you want it to be the default search backend:
.. code-block:: python
    WAGTAILSEARCH_BACKENDS = {
        'default': {
            'BACKEND': 'wagtail.contrib.postgres_search.backend',
        },
    }
After installing the module, run ``python manage.py migrate`` to create the necessary ``postgres_search_indexentry`` table.
You then need to populate the index using
the :ref:`update_index` command. You can rerun this command whenever
you want, but it should rarely be needed after the first run, since
the index is updated automatically whenever data is modified.
To disable this behaviour, see :ref:`wagtailsearch_backends_auto_update`.
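For example, the initial setup boils down to two commands:

.. code-block:: sh

    python manage.py migrate
    python manage.py update_index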
Configuration
=============
Language / PostgreSQL search configuration
------------------------------------------
Use the additional ``'SEARCH_CONFIG'`` key to define which PostgreSQL
search configuration should be used. For example:
.. code-block:: python
    WAGTAILSEARCH_BACKENDS = {
        'default': {
            'BACKEND': 'wagtail.contrib.postgres_search.backend',
            'SEARCH_CONFIG': 'english',
        }
    }
A PostgreSQL search configuration defines the text-processing rules
for a given language, English in this case. A search configuration consists
of a set of algorithms (parsers and analysers)
and language specifications (stop words, stems, dictionaries, synonyms,
thesauruses, and so on).
A few search configurations are already defined by default in PostgreSQL.
You can list them using ``sudo -u postgres psql -c "\dF"`` in a Unix shell
or by using this SQL query: ``SELECT cfgname FROM pg_catalog.pg_ts_config``.
These predefined search configurations are decent, but they're basic
compared to commercial search engines.
If you want better support for your language, you will have to create
your own PostgreSQL search configuration. See the PostgreSQL documentation for
`an example <https://www.postgresql.org/docs/current/static/textsearch-configuration.html>`_,
`the list of parsers <https://www.postgresql.org/docs/current/static/textsearch-parsers.html>`_,
and `a guide to use dictionaries <https://www.postgresql.org/docs/current/static/textsearch-dictionaries.html>`_.
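As a minimal sketch (the ``english_unaccent`` configuration name is hypothetical,
and the ``unaccent`` extension must already be installed in the database), a custom
configuration can be created from a Django data migration using ``RunSQL``:

.. code-block:: python

    from django.db import migrations


    class Migration(migrations.Migration):

        dependencies = []  # point this at your app's latest migration

        operations = [
            migrations.RunSQL(
                # Copy the built-in 'english' configuration, then run word
                # tokens through the 'unaccent' dictionary before stemming.
                'CREATE TEXT SEARCH CONFIGURATION english_unaccent '
                '(COPY = english); '
                'ALTER TEXT SEARCH CONFIGURATION english_unaccent '
                'ALTER MAPPING FOR hword, hword_part, word '
                'WITH unaccent, english_stem;',
                'DROP TEXT SEARCH CONFIGURATION english_unaccent;',
            ),
        ]

Setting ``'SEARCH_CONFIG': 'english_unaccent'`` then makes the backend use it.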
Atomic rebuild
--------------
Like the Elasticsearch backend, this backend supports
:ref:`wagtailsearch_backends_atomic_rebuild`:
.. code-block:: python
    WAGTAILSEARCH_BACKENDS = {
        'default': {
            'BACKEND': 'wagtail.contrib.postgres_search.backend',
            'ATOMIC_REBUILD': True,
        }
    }
This is rarely needed with this backend. In Elasticsearch, all data
is removed before rebuilding the index, but in this PostgreSQL backend,
only objects no longer in the database are removed; the index is then
updated progressively, so there is never a moment when it is empty.
However, if you want to be extra sure that nothing goes wrong while updating
the index, you can use atomic rebuild. The index will be rebuilt, but nobody
will have access to it until reindexing is complete. If any error occurs during
the operation, all changes to the index are reverted
as if reindexing had never started.

View file

@@ -1,8 +0,0 @@
import django
if django.VERSION >= (3, 2):
# The declaration is only needed for older Django versions
pass
else:
default_app_config = 'wagtail.contrib.postgres_search.apps.PostgresSearchConfig'

View file

@@ -1,35 +0,0 @@
import warnings
from django.apps import AppConfig
from django.core.checks import Error, Tags, register
from wagtail.utils.deprecation import RemovedInWagtail217Warning
from .utils import get_postgresql_connections, set_weights
class PostgresSearchConfig(AppConfig):
name = 'wagtail.contrib.postgres_search'
default_auto_field = 'django.db.models.AutoField'
def ready(self):
warnings.warn(
"The wagtail.contrib.postgres_search backend is deprecated and has been replaced by "
"wagtail.search.backends.database. "
"See https://docs.wagtail.org/en/stable/releases/2.15.html#database-search-backends-replaced",
category=RemovedInWagtail217Warning
)
@register(Tags.compatibility, Tags.database)
def check_if_postgresql(app_configs, **kwargs):
if get_postgresql_connections():
return []
return [Error('You must use a PostgreSQL database '
'to use PostgreSQL search.',
id='wagtail.contrib.postgres_search.E001')]
set_weights()
from .models import IndexEntry
IndexEntry.add_generic_relations()

View file

@@ -1,710 +0,0 @@
import warnings
from collections import OrderedDict
from functools import reduce
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from django.db import DEFAULT_DB_ALIAS, NotSupportedError, connections, transaction
from django.db.models import Avg, Count, F, Manager, Q, TextField, Value
from django.db.models.constants import LOOKUP_SEP
from django.db.models.functions import Cast, Length
from django.db.models.sql.subqueries import InsertQuery
from django.utils.encoding import force_str
from django.utils.functional import cached_property
from wagtail.search.backends.base import (
BaseSearchBackend, BaseSearchQueryCompiler, BaseSearchResults, FilterFieldError)
from wagtail.search.index import AutocompleteField, RelatedFields, SearchField, get_indexed_models
from wagtail.search.query import And, Boost, MatchAll, Not, Or, Phrase, PlainText
from wagtail.search.utils import ADD, MUL, OR
from wagtail.utils.deprecation import RemovedInWagtail217Warning
from .models import IndexEntry
from .query import Lexeme
from .utils import (
get_content_type_pk, get_descendants_content_types_pks, get_postgresql_connections,
get_sql_weights, get_weight)
warnings.warn(
"The wagtail.contrib.postgres_search backend is deprecated and has been replaced by "
"wagtail.search.backends.database. "
"See https://docs.wagtail.org/en/stable/releases/2.15.html#database-search-backends-replaced",
category=RemovedInWagtail217Warning
)
EMPTY_VECTOR = SearchVector(Value('', output_field=TextField()))
class ObjectIndexer:
"""
Responsible for extracting data from an object to be inserted into the index.
"""
def __init__(self, obj, backend):
self.obj = obj
self.search_fields = obj.get_search_fields()
self.config = backend.config
self.autocomplete_config = backend.autocomplete_config
def prepare_value(self, value):
if isinstance(value, str):
return value
elif isinstance(value, list):
return ', '.join(self.prepare_value(item) for item in value)
elif isinstance(value, dict):
return ', '.join(self.prepare_value(item)
for item in value.values())
return force_str(value)
def prepare_field(self, obj, field):
if isinstance(field, SearchField):
yield (field, get_weight(field.boost),
self.prepare_value(field.get_value(obj)))
elif isinstance(field, AutocompleteField):
# AutocompleteField does not define a boost parameter, so use a base weight of 'D'
yield (field, 'D', self.prepare_value(field.get_value(obj)))
elif isinstance(field, RelatedFields):
sub_obj = field.get_value(obj)
if sub_obj is None:
return
if isinstance(sub_obj, Manager):
sub_objs = sub_obj.all()
else:
if callable(sub_obj):
sub_obj = sub_obj()
sub_objs = [sub_obj]
for sub_obj in sub_objs:
for sub_field in field.fields:
yield from self.prepare_field(sub_obj, sub_field)
def as_vector(self, texts, for_autocomplete=False):
"""
Converts an array of strings into a SearchVector that can be indexed.
"""
texts = [(text.strip(), weight) for text, weight in texts]
texts = [(text, weight) for text, weight in texts if text]
if not texts:
return EMPTY_VECTOR
search_config = self.autocomplete_config if for_autocomplete else self.config
return ADD([
SearchVector(Value(text, output_field=TextField()), weight=weight, config=search_config)
for text, weight in texts
])
@cached_property
def id(self):
"""
Returns the value to use as the ID of the record in the index
"""
return force_str(self.obj.pk)
@cached_property
def title(self):
"""
Returns all values to index as "title". This is the value of all SearchFields that have the field_name 'title'
"""
texts = []
for field in self.search_fields:
for current_field, boost, value in self.prepare_field(self.obj, field):
if isinstance(current_field, SearchField) and current_field.field_name == 'title':
texts.append((value, boost))
return self.as_vector(texts)
@cached_property
def body(self):
"""
Returns all values to index as "body". This is the value of all SearchFields excluding the title
"""
texts = []
for field in self.search_fields:
for current_field, boost, value in self.prepare_field(self.obj, field):
if isinstance(current_field, SearchField) and not current_field.field_name == 'title':
texts.append((value, boost))
return self.as_vector(texts)
@cached_property
def autocomplete(self):
"""
Returns all values to index as "autocomplete". This is the value of all AutocompleteFields
"""
texts = []
for field in self.search_fields:
for current_field, boost, value in self.prepare_field(self.obj, field):
if isinstance(current_field, AutocompleteField):
texts.append((value, boost))
return self.as_vector(texts, for_autocomplete=True)
class Index:
def __init__(self, backend, db_alias=None):
self.backend = backend
self.name = self.backend.index_name
self.db_alias = DEFAULT_DB_ALIAS if db_alias is None else db_alias
self.connection = connections[self.db_alias]
if self.connection.vendor != 'postgresql':
raise NotSupportedError(
'You must select a PostgreSQL database '
'to use PostgreSQL search.')
# Whether to allow adding items via the faster upsert method available in Postgres >=9.5
self._enable_upsert = (self.connection.pg_version >= 90500)
self.entries = IndexEntry._default_manager.using(self.db_alias)
def add_model(self, model):
pass
def refresh(self):
pass
def _refresh_title_norms(self, full=False):
"""
Refreshes the value of the title_norm field.
This needs to be set to 'lavg/ld' where:
- lavg is the average length of titles in all documents (also in terms)
- ld is the length of the title field in this document (in terms)
"""
lavg = self.entries.annotate(title_length=Length('title')).filter(title_length__gt=0).aggregate(Avg('title_length'))['title_length__avg']
if full:
# Update the whole table
# This is the most accurate option but requires a full table rewrite
# so we can't do it too often as it could lead to locking issues.
entries = self.entries
else:
# Only update entries where title_norm is 1.0
# This is the default value set on new entries.
# It's possible that other entries could have this exact value but there shouldn't be too many of those
entries = self.entries.filter(title_norm=1.0)
entries.annotate(title_length=Length('title')).filter(title_length__gt=0).update(title_norm=lavg / F('title_length'))
def delete_stale_model_entries(self, model):
existing_pks = (
model._default_manager.using(self.db_alias)
.annotate(object_id=Cast('pk', TextField()))
.values('object_id')
)
content_types_pks = get_descendants_content_types_pks(model)
stale_entries = (
self.entries.filter(content_type_id__in=content_types_pks)
.exclude(object_id__in=existing_pks)
)
stale_entries.delete()
def delete_stale_entries(self):
for model in get_indexed_models():
# We don't need to delete stale entries for non-root models,
# since we already delete them by deleting roots.
if not model._meta.parents:
self.delete_stale_model_entries(model)
def add_item(self, obj):
self.add_items(obj._meta.model, [obj])
def add_items_upsert(self, content_type_pk, indexers):
compiler = InsertQuery(IndexEntry).get_compiler(connection=self.connection)
title_sql = []
autocomplete_sql = []
body_sql = []
data_params = []
for indexer in indexers:
data_params.extend((content_type_pk, indexer.id))
# Compile title value
value = compiler.prepare_value(IndexEntry._meta.get_field('title'), indexer.title)
sql, params = value.as_sql(compiler, self.connection)
title_sql.append(sql)
data_params.extend(params)
# Compile autocomplete value
value = compiler.prepare_value(IndexEntry._meta.get_field('autocomplete'), indexer.autocomplete)
sql, params = value.as_sql(compiler, self.connection)
autocomplete_sql.append(sql)
data_params.extend(params)
# Compile body value
value = compiler.prepare_value(IndexEntry._meta.get_field('body'), indexer.body)
sql, params = value.as_sql(compiler, self.connection)
body_sql.append(sql)
data_params.extend(params)
data_sql = ', '.join([
'(%%s, %%s, %s, %s, %s, 1.0)' % (a, b, c)
for a, b, c in zip(title_sql, autocomplete_sql, body_sql)
])
with self.connection.cursor() as cursor:
cursor.execute("""
INSERT INTO %s (content_type_id, object_id, title, autocomplete, body, title_norm)
(VALUES %s)
ON CONFLICT (content_type_id, object_id)
DO UPDATE SET title = EXCLUDED.title,
title_norm = 1.0,
autocomplete = EXCLUDED.autocomplete,
body = EXCLUDED.body
""" % (IndexEntry._meta.db_table, data_sql), data_params)
self._refresh_title_norms()
def add_items_update_then_create(self, content_type_pk, indexers):
ids_and_data = {}
for indexer in indexers:
ids_and_data[indexer.id] = (indexer.title, indexer.autocomplete, indexer.body)
index_entries_for_ct = self.entries.filter(content_type_id=content_type_pk)
indexed_ids = frozenset(
index_entries_for_ct.filter(object_id__in=ids_and_data.keys()).values_list('object_id', flat=True)
)
for indexed_id in indexed_ids:
title, autocomplete, body = ids_and_data[indexed_id]
index_entries_for_ct.filter(object_id=indexed_id).update(title=title, autocomplete=autocomplete, body=body)
to_be_created = []
for object_id in ids_and_data.keys():
if object_id not in indexed_ids:
title, autocomplete, body = ids_and_data[object_id]
to_be_created.append(IndexEntry(
content_type_id=content_type_pk,
object_id=object_id,
title=title,
autocomplete=autocomplete,
body=body
))
self.entries.bulk_create(to_be_created)
self._refresh_title_norms()
def add_items(self, model, objs):
search_fields = model.get_search_fields()
if not search_fields:
return
indexers = [ObjectIndexer(obj, self.backend) for obj in objs]
# TODO: Delete unindexed objects while dealing with proxy models.
if indexers:
content_type_pk = get_content_type_pk(model)
update_method = (
self.add_items_upsert if self._enable_upsert
else self.add_items_update_then_create)
update_method(content_type_pk, indexers)
def delete_item(self, item):
item.index_entries.using(self.db_alias).delete()
def __str__(self):
return self.name
class PostgresSearchQueryCompiler(BaseSearchQueryCompiler):
DEFAULT_OPERATOR = 'and'
LAST_TERM_IS_PREFIX = False
TARGET_SEARCH_FIELD_TYPE = SearchField
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
local_search_fields = self.get_search_fields_for_model()
# Due to a Django bug, arrays are not automatically converted
# when we use WEIGHTS_VALUES.
self.sql_weights = get_sql_weights()
if self.fields is None:
# search over the fields defined on the current model
self.search_fields = local_search_fields
else:
# build a search_fields set from the passed definition,
# which may involve traversing relations
self.search_fields = {
field_lookup: self.get_search_field(field_lookup, fields=local_search_fields)
for field_lookup in self.fields
}
def get_config(self, backend):
return backend.config
def get_search_fields_for_model(self):
return self.queryset.model.get_searchable_search_fields()
def get_search_field(self, field_lookup, fields=None):
if fields is None:
fields = self.search_fields
if LOOKUP_SEP in field_lookup:
field_lookup, sub_field_name = field_lookup.split(LOOKUP_SEP, 1)
else:
sub_field_name = None
for field in fields:
if isinstance(field, self.TARGET_SEARCH_FIELD_TYPE) and field.field_name == field_lookup:
return field
# Note: Searching on a specific related field using
# `.search(fields=…)` is not yet supported by Wagtail.
# This method anticipates that feature by implementing it already.
if isinstance(field, RelatedFields) and field.field_name == field_lookup:
return self.get_search_field(sub_field_name, field.fields)
def build_tsquery_content(self, query, config=None, invert=False):
if isinstance(query, PlainText):
terms = query.query_string.split()
if not terms:
return None
last_term = terms.pop()
lexemes = Lexeme(last_term, invert=invert, prefix=self.LAST_TERM_IS_PREFIX)
for term in terms:
new_lexeme = Lexeme(term, invert=invert)
if query.operator == 'and':
lexemes &= new_lexeme
else:
lexemes |= new_lexeme
return SearchQuery(lexemes, search_type='raw', config=config)
elif isinstance(query, Phrase):
return SearchQuery(query.query_string, search_type='phrase')
elif isinstance(query, Boost):
# Not supported
msg = "The Boost query is not supported by the PostgreSQL search backend."
warnings.warn(msg, RuntimeWarning)
return self.build_tsquery_content(query.subquery, config=config, invert=invert)
elif isinstance(query, Not):
return self.build_tsquery_content(query.subquery, config=config, invert=not invert)
elif isinstance(query, (And, Or)):
# If this part of the query is inverted, we swap the operator and
# pass down the inversion state to the child queries.
# This works thanks to De Morgan's law.
#
# For example, the following query:
#
# Not(And(Term("A"), Term("B")))
#
# Is equivalent to:
#
# Or(Not(Term("A")), Not(Term("B")))
#
# It's simpler to code it this way as we only need to store the
# invert status of the terms rather than all the operators.
subquery_lexemes = [
self.build_tsquery_content(subquery, config=config, invert=invert)
for subquery in query.subqueries
]
is_and = isinstance(query, And)
if invert:
is_and = not is_and
if is_and:
return reduce(lambda a, b: a & b, subquery_lexemes)
else:
return reduce(lambda a, b: a | b, subquery_lexemes)
raise NotImplementedError(
'`%s` is not supported by the PostgreSQL search backend.'
% query.__class__.__name__)
def build_tsquery(self, query, config=None):
return self.build_tsquery_content(query, config=config)
def build_tsrank(self, vector, query, config=None, boost=1.0):
if isinstance(query, (Phrase, PlainText, Not)):
rank_expression = SearchRank(
vector,
self.build_tsquery(query, config=config),
weights=self.sql_weights
)
if boost != 1.0:
rank_expression *= boost
return rank_expression
elif isinstance(query, Boost):
boost *= query.boost
return self.build_tsrank(vector, query.subquery, config=config, boost=boost)
elif isinstance(query, And):
return MUL(
1 + self.build_tsrank(vector, subquery, config=config, boost=boost)
for subquery in query.subqueries
) - 1
elif isinstance(query, Or):
return ADD(
self.build_tsrank(vector, subquery, config=config, boost=boost)
for subquery in query.subqueries
) / (len(query.subqueries) or 1)
raise NotImplementedError(
'`%s` is not supported by the PostgreSQL search backend.'
% query.__class__.__name__)
def get_index_vectors(self, search_query):
return [
(F('postgres_index_entries__title'), F('postgres_index_entries__title_norm')),
(F('postgres_index_entries__body'), 1.0),
]
def get_fields_vectors(self, search_query):
return [
(SearchVector(
field_lookup,
config=search_query.config,
), search_field.boost)
for field_lookup, search_field in self.search_fields.items()
]
def get_search_vectors(self, search_query):
if self.fields is None:
return self.get_index_vectors(search_query)
else:
return self.get_fields_vectors(search_query)
def _build_rank_expression(self, vectors, config):
rank_expressions = [
self.build_tsrank(vector, self.query, config=config) * boost
for vector, boost in vectors
]
rank_expression = rank_expressions[0]
for other_rank_expression in rank_expressions[1:]:
rank_expression += other_rank_expression
return rank_expression
def search(self, config, start, stop, score_field=None):
# TODO: Handle MatchAll nested inside other search query classes.
if isinstance(self.query, MatchAll):
return self.queryset[start:stop]
elif isinstance(self.query, Not) and isinstance(self.query.subquery, MatchAll):
return self.queryset.none()
search_query = self.build_tsquery(self.query, config=config)
vectors = self.get_search_vectors(search_query)
rank_expression = self._build_rank_expression(vectors, config)
combined_vector = vectors[0][0]
for vector, boost in vectors[1:]:
combined_vector = combined_vector._combine(vector, '||', False)
queryset = self.queryset.annotate(_vector_=combined_vector).filter(_vector_=search_query)
if self.order_by_relevance:
queryset = queryset.order_by(rank_expression.desc(), '-pk')
elif not queryset.query.order_by:
# Adds a default ordering to avoid issue #3729.
queryset = queryset.order_by('-pk')
rank_expression = F('pk')
if score_field is not None:
queryset = queryset.annotate(**{score_field: rank_expression})
return queryset[start:stop]
def _process_lookup(self, field, lookup, value):
lhs = field.get_attname(self.queryset.model) + '__' + lookup
return Q(**{lhs: value})
def _connect_filters(self, filters, connector, negated):
if connector == 'AND':
q = Q(*filters)
elif connector == 'OR':
q = OR([Q(fil) for fil in filters])
else:
return
if negated:
q = ~q
return q
class PostgresAutocompleteQueryCompiler(PostgresSearchQueryCompiler):
LAST_TERM_IS_PREFIX = True
TARGET_SEARCH_FIELD_TYPE = AutocompleteField
def get_config(self, backend):
return backend.autocomplete_config
def get_search_fields_for_model(self):
return self.queryset.model.get_autocomplete_search_fields()
def get_index_vectors(self, search_query):
return [(F('postgres_index_entries__autocomplete'), 1.0)]
def get_fields_vectors(self, search_query):
return [
(SearchVector(
field_lookup,
config=search_query.config,
weight='D',
), 1.0)
for field_lookup, search_field in self.search_fields.items()
]
class PostgresSearchResults(BaseSearchResults):
def get_queryset(self, for_count=False):
if for_count:
start = None
stop = None
else:
start = self.start
stop = self.stop
return self.query_compiler.search(
self.query_compiler.get_config(self.backend),
start,
stop,
score_field=self._score_field
)
def _do_search(self):
return list(self.get_queryset())
def _do_count(self):
return self.get_queryset(for_count=True).count()
supports_facet = True
def facet(self, field_name):
# Get field
field = self.query_compiler._get_filterable_field(field_name)
if field is None:
raise FilterFieldError(
'Cannot facet search results with field "' + field_name + '". Please add index.FilterField(\''
+ field_name + '\') to ' + self.query_compiler.queryset.model.__name__ + '.search_fields.',
field_name=field_name
)
query = self.query_compiler.search(self.query_compiler.get_config(self.backend), None, None)
results = query.values(field_name).annotate(count=Count('pk')).order_by('-count')
return OrderedDict([
(result[field_name], result['count'])
for result in results
])
class PostgresSearchRebuilder:
def __init__(self, index):
self.index = index
def start(self):
self.index.delete_stale_entries()
return self.index
def finish(self):
self.index._refresh_title_norms(full=True)
class PostgresSearchAtomicRebuilder(PostgresSearchRebuilder):
def __init__(self, index):
super().__init__(index)
self.transaction = transaction.atomic(using=index.db_alias)
self.transaction_opened = False
def start(self):
self.transaction.__enter__()
self.transaction_opened = True
return super().start()
def finish(self):
self.index._refresh_title_norms(full=True)
self.transaction.__exit__(None, None, None)
self.transaction_opened = False
def __del__(self):
# TODO: Implement a cleaner way to close the connection on failure.
if self.transaction_opened:
self.transaction.needs_rollback = True
self.finish()
class PostgresSearchBackend(BaseSearchBackend):
query_compiler_class = PostgresSearchQueryCompiler
autocomplete_query_compiler_class = PostgresAutocompleteQueryCompiler
results_class = PostgresSearchResults
rebuilder_class = PostgresSearchRebuilder
atomic_rebuilder_class = PostgresSearchAtomicRebuilder
def __init__(self, params):
super().__init__(params)
self.index_name = params.get('INDEX', 'default')
self.config = params.get('SEARCH_CONFIG')
# Use 'simple' config for autocomplete to disable stemming
# A good description for why this is important can be found at:
# https://www.postgresql.org/docs/9.1/datatype-textsearch.html#DATATYPE-TSQUERY
self.autocomplete_config = params.get('AUTOCOMPLETE_SEARCH_CONFIG', 'simple')
if params.get('ATOMIC_REBUILD', False):
self.rebuilder_class = self.atomic_rebuilder_class
def get_index_for_model(self, model, db_alias=None):
return Index(self, db_alias)
def get_index_for_object(self, obj):
return self.get_index_for_model(obj._meta.model, obj._state.db)
def reset_index(self):
for connection in get_postgresql_connections():
IndexEntry._default_manager.using(connection.alias).delete()
def add_type(self, model):
pass # Not needed.
def refresh_index(self):
pass # Not needed.
def add(self, obj):
self.get_index_for_object(obj).add_item(obj)
def add_bulk(self, model, obj_list):
if obj_list:
self.get_index_for_object(obj_list[0]).add_items(model, obj_list)
def delete(self, obj):
self.get_index_for_object(obj).delete_item(obj)
SearchBackend = PostgresSearchBackend

View file

@@ -1,46 +0,0 @@
# -*- coding: utf-8 -*-
# Generated by Django 1.10.1 on 2017-03-22 14:53
import django.db.models.deletion
from django.db import migrations, models
import django.contrib.postgres.fields.jsonb
import django.contrib.postgres.search
from ..models import IndexEntry
table = IndexEntry._meta.db_table
class Migration(migrations.Migration):
initial = True
dependencies = [
('contenttypes', '0002_remove_content_type_name'),
]
operations = [
migrations.CreateModel(
name='IndexEntry',
fields=[
('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('object_id', models.TextField()),
('body_search', django.contrib.postgres.search.SearchVectorField()),
('content_type', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='contenttypes.ContentType')),
],
options={
'verbose_name_plural': 'index entries',
'verbose_name': 'index entry',
},
),
migrations.AlterUniqueTogether(
name='indexentry',
unique_together=set([('content_type', 'object_id')]),
),
migrations.RunSQL(
'CREATE INDEX {0}_body_search ON {0} '
'USING GIN(body_search);'.format(table),
'DROP INDEX {}_body_search;'.format(table),
),
]

View file

@@ -1,49 +0,0 @@
# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-10-19 14:53
from __future__ import unicode_literals
import django.contrib.postgres.indexes
import django.contrib.postgres.search
from django.db import migrations
from ..models import IndexEntry
table = IndexEntry._meta.db_table
class Migration(migrations.Migration):
dependencies = [
('postgres_search', '0001_initial'),
]
operations = [
migrations.RunSQL(
'DROP INDEX {}_body_search;'.format(table),
'CREATE INDEX {0}_body_search ON {0} '
'USING GIN(body_search);'.format(table),
),
migrations.RenameField(
model_name='indexentry',
old_name='body_search',
new_name='body',
),
migrations.AddField(
model_name='indexentry',
name='autocomplete',
field=django.contrib.postgres.search.SearchVectorField(default=''),
preserve_default=False,
),
migrations.AddIndex(
model_name='indexentry',
index=django.contrib.postgres.indexes.GinIndex(
fields=['autocomplete'],
name='postgres_se_autocom_ee48c8_gin'),
),
migrations.AddIndex(
model_name='indexentry',
index=django.contrib.postgres.indexes.GinIndex(
fields=['body'],
name='postgres_se_body_aaaa99_gin'),
),
]

View file

@@ -1,30 +0,0 @@
# Generated by Django 3.0.6 on 2020-04-24 13:00
import django.contrib.postgres.indexes
import django.contrib.postgres.search
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
('postgres_search', '0002_add_autocomplete'),
]
operations = [
migrations.AddField(
model_name='indexentry',
name='title',
field=django.contrib.postgres.search.SearchVectorField(default=''),
preserve_default=False,
),
migrations.AddIndex(
model_name='indexentry',
index=django.contrib.postgres.indexes.GinIndex(fields=['title'], name='postgres_se_title_b56f33_gin'),
),
migrations.AddField(
model_name='indexentry',
name='title_norm',
field=models.FloatField(default=1.0),
),
]

View file

@@ -1,22 +0,0 @@
# Generated by Django 3.1.3 on 2020-11-13 16:55
from django.db import migrations
from wagtail.contrib.postgres_search.models import IndexEntry
table = IndexEntry._meta.db_table
class Migration(migrations.Migration):
dependencies = [
('postgres_search', '0003_title'),
]
operations = [
migrations.RunSQL(
'CREATE INDEX {0}_title_body_concat_search ON {0} '
'USING GIN(( title || body));'.format(table),
'DROP INDEX IF EXISTS {0}_title_body_concat_search;'.format(table),
),
]

View file

@@ -1,90 +0,0 @@
from django import VERSION as DJANGO_VERSION
from django.apps import apps
from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation
from django.contrib.contenttypes.models import ContentType
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models
from django.db.models.functions import Cast
from django.db.models.sql.where import WhereNode
from django.utils.translation import gettext_lazy as _
from wagtail.search.index import class_is_indexed
from .utils import get_descendants_content_types_pks
class TextIDGenericRelation(GenericRelation):
auto_created = True
def get_content_type_lookup(self, alias, remote_alias):
field = self.remote_field.model._meta.get_field(
self.content_type_field_name)
return field.get_lookup('in')(
field.get_col(remote_alias),
get_descendants_content_types_pks(self.model))
def get_object_id_lookup(self, alias, remote_alias):
from_field = self.remote_field.model._meta.get_field(
self.object_id_field_name)
to_field = self.model._meta.pk
return from_field.get_lookup('exact')(
from_field.get_col(remote_alias),
Cast(to_field.get_col(alias), from_field))
if DJANGO_VERSION >= (4, 0):
def get_extra_restriction(self, alias, remote_alias):
cond = WhereNode()
cond.add(self.get_content_type_lookup(alias, remote_alias), 'AND')
cond.add(self.get_object_id_lookup(alias, remote_alias), 'AND')
return cond
else:
def get_extra_restriction(self, where_class, alias, remote_alias):
cond = where_class()
cond.add(self.get_content_type_lookup(alias, remote_alias), 'AND')
cond.add(self.get_object_id_lookup(alias, remote_alias), 'AND')
return cond
def resolve_related_fields(self):
return []
class IndexEntry(models.Model):
content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
# We do not use an IntegerField since primary keys are not always integers.
object_id = models.TextField()
content_object = GenericForeignKey()
# TODO: Add per-object boosting.
autocomplete = SearchVectorField()
title = SearchVectorField()
# This field stores the "Title Normalisation Factor"
# This factor is multiplied onto the rank of the title field.
# This allows us to apply a boost to results with shorter titles
# elevating more specific matches to the top.
title_norm = models.FloatField(default=1.0)
body = SearchVectorField()
class Meta:
unique_together = ('content_type', 'object_id')
verbose_name = _('index entry')
verbose_name_plural = _('index entries')
# An additional computed GIN index on 'title || body' is created in a SQL migration;
# it covers the default case of PostgresSearchQueryCompiler.get_index_vectors.
indexes = [GinIndex(fields=['autocomplete']),
GinIndex(fields=['title']),
GinIndex(fields=['body'])]
def __str__(self):
return '%s: %s' % (self.content_type.name, self.content_object)
@property
def model(self):
return self.content_type.model
@classmethod
def add_generic_relations(cls):
for model in apps.get_models():
if class_is_indexed(model):
TextIDGenericRelation(cls).contribute_to_class(model,
'postgres_index_entries')

View file

@@ -1,87 +0,0 @@
# Originally from https://github.com/django/django/pull/8313
# Resubmitted in https://github.com/django/django/pull/12727
# If that PR gets merged, we should be able to replace this with the version in Django.
from django.contrib.postgres.search import SearchQueryField
from django.db.models.expressions import Expression, Value
class LexemeCombinable(Expression):
BITAND = '&'
BITOR = '|'
def _combine(self, other, connector, reversed, node=None):
if not isinstance(other, LexemeCombinable):
raise TypeError(
'Lexeme can only be combined with other Lexemes, '
'got {}.'.format(type(other))
)
if reversed:
return CombinedLexeme(other, connector, self)
return CombinedLexeme(self, connector, other)
# On Combinable, these are not implemented to reduce confusion with Q. In
# this case we are actually (ab)using them to do logical combination so
# it's consistent with other usage in Django.
def bitand(self, other):
return self._combine(other, self.BITAND, False)
def bitor(self, other):
return self._combine(other, self.BITOR, False)
def __or__(self, other):
return self._combine(other, self.BITOR, False)
def __and__(self, other):
return self._combine(other, self.BITAND, False)
class Lexeme(LexemeCombinable, Value):
_output_field = SearchQueryField()
def __init__(self, value, output_field=None, *, invert=False, prefix=False, weight=None):
self.prefix = prefix
self.invert = invert
self.weight = weight
super().__init__(value, output_field=output_field)
def as_sql(self, compiler, connection):
param = "'%s'" % self.value.replace("'", "''").replace("\\", "\\\\")
template = "%s"
label = ''
if self.prefix:
label += '*'
if self.weight:
label += self.weight
if label:
param = '{}:{}'.format(param, label)
if self.invert:
param = '!{}'.format(param)
return template, [param]
class CombinedLexeme(LexemeCombinable):
_output_field = SearchQueryField()
def __init__(self, lhs, connector, rhs, output_field=None):
super().__init__(output_field=output_field)
self.connector = connector
self.lhs = lhs
self.rhs = rhs
def as_sql(self, compiler, connection):
value_params = []
lsql, params = compiler.compile(self.lhs)
value_params.extend(params)
rsql, params = compiler.compile(self.rhs)
value_params.extend(params)
combined_sql = '({} {} {})'.format(lsql, self.connector, rsql)
combined_value = combined_sql % tuple(value_params)
return '%s', [combined_value]

View file

@@ -1,151 +0,0 @@
from django.test import TestCase
from wagtail.search.tests.test_backends import BackendTests
from wagtail.tests.search import models
from ..utils import BOOSTS_WEIGHTS, WEIGHTS_VALUES, determine_boosts_weights, get_weight
class TestPostgresSearchBackend(BackendTests, TestCase):
backend_path = 'wagtail.contrib.postgres_search.backend'
def test_weights(self):
self.assertListEqual(BOOSTS_WEIGHTS,
[(10, 'A'), (2, 'B'), (0.5, 'C'), (0.25, 'D')])
self.assertListEqual(WEIGHTS_VALUES, [0.025, 0.05, 0.2, 1.0])
self.assertEqual(get_weight(15), 'A')
self.assertEqual(get_weight(10), 'A')
self.assertEqual(get_weight(9.9), 'B')
self.assertEqual(get_weight(2), 'B')
self.assertEqual(get_weight(1.9), 'C')
self.assertEqual(get_weight(0), 'D')
self.assertEqual(get_weight(-1), 'D')
self.assertListEqual(determine_boosts_weights([1]),
[(1, 'A'), (0, 'B'), (0, 'C'), (0, 'D')])
self.assertListEqual(determine_boosts_weights([-1]),
[(-1, 'A'), (-1, 'B'), (-1, 'C'), (-1, 'D')])
self.assertListEqual(determine_boosts_weights([-1, 1, 2]),
[(2, 'A'), (1, 'B'), (-1, 'C'), (-1, 'D')])
self.assertListEqual(determine_boosts_weights([0, 1, 2, 3]),
[(3, 'A'), (2, 'B'), (1, 'C'), (0, 'D')])
self.assertListEqual(determine_boosts_weights([0, 0.25, 0.75, 1, 1.5]),
[(1.5, 'A'), (1, 'B'), (0.5, 'C'), (0, 'D')])
self.assertListEqual(determine_boosts_weights([0, 1, 2, 3, 4, 5, 6]),
[(6, 'A'), (4, 'B'), (2, 'C'), (0, 'D')])
self.assertListEqual(determine_boosts_weights([-2, -1, 0, 1, 2, 3, 4]),
[(4, 'A'), (2, 'B'), (0, 'C'), (-2, 'D')])
def test_search_tsquery_chars(self):
"""
Checks that tsquery characters are correctly escaped
and do not generate a PostgreSQL syntax error.
"""
# Single quotes should be escaped inside each tsquery term.
results = self.backend.search("L'amour piqué par une abeille",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.search("'starting quote",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.search("ending quote'",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.search("double quo''te",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.search("triple quo'''te",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now suffixes.
results = self.backend.search("Something:B", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.search("Something:*", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.search("Something:A*BCD", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the AND operator.
results = self.backend.search("first & second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the OR operator.
results = self.backend.search("first | second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the NOT operator.
results = self.backend.search("first & !second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the phrase operator.
results = self.backend.search("first <-> second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
def test_autocomplete_tsquery_chars(self):
"""
Checks that tsquery characters are correctly escaped
and do not generate a PostgreSQL syntax error.
"""
# Single quotes should be escaped inside each tsquery term.
results = self.backend.autocomplete("L'amour piqué par une abeille",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.autocomplete("'starting quote",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.autocomplete("ending quote'",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.autocomplete("double quo''te",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.autocomplete("triple quo'''te",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Backslashes should be escaped inside each tsquery term.
results = self.backend.autocomplete("backslash\\",
models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now suffixes.
results = self.backend.autocomplete("Something:B", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.autocomplete("Something:*", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
results = self.backend.autocomplete("Something:A*BCD", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the AND operator.
results = self.backend.autocomplete("first & second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the OR operator.
results = self.backend.autocomplete("first | second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the NOT operator.
results = self.backend.autocomplete("first & !second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
# Now the phrase operator.
results = self.backend.autocomplete("first <-> second", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [])
def test_index_without_upsert(self):
# Test the add_items code path for Postgres 9.4, where upsert is not available
self.backend.reset_index()
index = self.backend.get_index_for_model(models.Book)
index._enable_upsert = False
index.add_items(models.Book, models.Book.objects.all())
results = self.backend.search("JavaScript", models.Book)
self.assertUnsortedListEqual([r.title for r in results], [
"JavaScript: The good parts",
"JavaScript: The Definitive Guide"
])

View file

@@ -1,43 +0,0 @@
import unittest
from django.conf import settings
from django.db import connection
from django.test import TestCase
from wagtail.search.backends import get_search_backend
from wagtail.tests.search import models
class TestPostgresStemming(TestCase):
def setUp(self):
backend_name = "wagtail.contrib.postgres_search.backend"
for conf in settings.WAGTAILSEARCH_BACKENDS.values():
if conf['BACKEND'] == backend_name:
break
else:
raise unittest.SkipTest("Only for %s" % backend_name)
self.backend = get_search_backend(backend_name)
def test_ru_stemming(self):
with connection.cursor() as cursor:
cursor.execute(
"SET default_text_search_config TO 'pg_catalog.russian'"
)
ru_book = models.Book.objects.create(
title="Голубое сало", publication_date="1999-05-01",
number_of_pages=352
)
self.backend.add(ru_book)
results = self.backend.search("Голубое", models.Book)
self.assertEqual(list(results), [ru_book])
results = self.backend.search("Голубая", models.Book)
self.assertEqual(list(results), [ru_book])
results = self.backend.search("Голубой", models.Book)
self.assertEqual(list(results), [ru_book])
ru_book.delete()

View file

@@ -1,113 +0,0 @@
from itertools import zip_longest
from django.apps import apps
from django.db import connections
from wagtail.search.index import Indexed, RelatedFields, SearchField
def get_postgresql_connections():
return [connection for connection in connections.all()
if connection.vendor == 'postgresql']
def get_descendant_models(model):
"""
Returns all descendants of a model, including the model itself.
"""
descendant_models = {other_model for other_model in apps.get_models()
if issubclass(other_model, model)}
descendant_models.add(model)
return descendant_models
def get_content_type_pk(model):
# We import it locally because this file is loaded before apps are ready.
from django.contrib.contenttypes.models import ContentType
return ContentType.objects.get_for_model(model).pk
def get_ancestors_content_types_pks(model):
"""
Returns the content type ids of this model's ancestors, excluding the model itself.
"""
from django.contrib.contenttypes.models import ContentType
return [ct.pk for ct in
ContentType.objects.get_for_models(*model._meta.get_parent_list())
.values()]
def get_descendants_content_types_pks(model):
"""
Returns the content type ids of this model's descendants, including the model itself.
"""
from django.contrib.contenttypes.models import ContentType
return [ct.pk for ct in
ContentType.objects.get_for_models(*get_descendant_models(model))
.values()]
def get_search_fields(search_fields):
for search_field in search_fields:
if isinstance(search_field, SearchField):
yield search_field
elif isinstance(search_field, RelatedFields):
for sub_field in get_search_fields(search_field.fields):
yield sub_field
WEIGHTS = 'ABCD'
WEIGHTS_COUNT = len(WEIGHTS)
# These are filled when apps are ready.
BOOSTS_WEIGHTS = []
WEIGHTS_VALUES = []
def get_boosts():
boosts = set()
for model in apps.get_models():
if issubclass(model, Indexed):
for search_field in get_search_fields(model.get_search_fields()):
boost = search_field.boost
if boost is not None:
boosts.add(boost)
return boosts
def determine_boosts_weights(boosts=()):
if not boosts:
boosts = get_boosts()
boosts = list(sorted(boosts, reverse=True))
min_boost = boosts[-1]
if len(boosts) <= WEIGHTS_COUNT:
return list(zip_longest(boosts, WEIGHTS, fillvalue=min(min_boost, 0)))
max_boost = boosts[0]
boost_step = (max_boost - min_boost) / (WEIGHTS_COUNT - 1)
return [(max_boost - (i * boost_step), weight)
for i, weight in enumerate(WEIGHTS)]
def set_weights():
BOOSTS_WEIGHTS.extend(determine_boosts_weights())
weights = [w for w, c in BOOSTS_WEIGHTS]
min_weight = min(weights)
if min_weight <= 0:
if min_weight == 0:
min_weight = -0.1
weights = [w - min_weight for w in weights]
max_weight = max(weights)
WEIGHTS_VALUES.extend([w / max_weight
for w in reversed(weights)])
def get_weight(boost):
if boost is None:
return WEIGHTS[-1]
for max_boost, weight in BOOSTS_WEIGHTS:
if boost >= max_boost:
return weight
return weight
def get_sql_weights():
return '{' + ','.join(map(str, WEIGHTS_VALUES)) + '}'

View file

@@ -18,8 +18,7 @@ def get_search_backend_config():
# Make sure the default backend is always defined
search_backends.setdefault('default', {
# RemovedInWagtail217Warning - will switch to wagtail.search.backends.database
'BACKEND': 'wagtail.search.backends.db',
'BACKEND': 'wagtail.search.backends.database',
})
return search_backends

View file

@@ -14,14 +14,9 @@ from wagtail.search.utils import AND, OR
# This file implements a database search backend using basic substring matching, and no
# database-specific full-text search capabilities. It will be used in the following cases:
# * The current default database is SQLite <3.19, or something other than PostgreSQL, MySQL or
# SQLite
# * The current default database is SQLite <3.19, or SQLite built without fulltext
# extensions, or something other than PostgreSQL, MySQL or SQLite
# * 'wagtail.search.backends.database.fallback' is specified directly as the search backend
# * The deprecated 'wagtail.search.backends.db' backend is active; this is the default when no
# WAGTAILSEARCH_BACKENDS setting is present.
#
# RemovedInWagtail217Warning - the default will be switched to wagtail.search.backends.database
# and wagtail.search.backends.db will be dropped.
MATCH_ALL = "_ALL_"

View file

@@ -1,13 +0,0 @@
from warnings import warn
from wagtail.search.backends.database.fallback import ( # noqa
DatabaseSearchBackend, DatabaseSearchQueryCompiler, DatabaseSearchResults, SearchBackend)
from wagtail.utils.deprecation import RemovedInWagtail217Warning
warn(
"The wagtail.search.backends.db search backend is deprecated and has been replaced by "
"wagtail.search.backends.database. "
"See https://docs.wagtail.org/en/stable/releases/2.15.html#database-search-backends-replaced",
category=RemovedInWagtail217Warning
)

View file

@@ -196,9 +196,8 @@ else:
WAGTAIL_USER_CUSTOM_FIELDS = ['country', 'attachment']
if os.environ.get('DATABASE_ENGINE') == 'django.db.backends.postgresql':
INSTALLED_APPS += ('wagtail.contrib.postgres_search',)
WAGTAILSEARCH_BACKENDS['postgresql'] = {
'BACKEND': 'wagtail.contrib.postgres_search.backend',
'BACKEND': 'wagtail.search.backends.database',
'AUTO_UPDATE': False,
'SEARCH_CONFIG': 'english'
}