Cache `Site.get_site_root_paths` in the request scope (#3354)

2016-11-29 15:10:54 +00:00 · 2016-11-29 15:10:54 +00:00 · 1f8edc8984
commit 1f8edc8984
--- a/CHANGELOG.txt
+++ b/CHANGELOG.txt
@ -5,6 +5,7 @@ Changelog
 ~~~~~~~~~~~~~~~~~

 * Added a new page explorer menu built with the admin API and React (Karl Hobley, Josh Barr, Thibaud Colas, Janneke Janssen, Rob Moorman, Maurice Bartnig, Jonny Scholes, Matt Westcott, Sævar Öfjörð Magnússon, Eirikur Ingi Magnusson, Harris Lapiroff, Hugo van den Berg, Olly Willans, Andy Babic, Ben Enright, Bertrand Bordage)
+ * Optimised page URL generation by caching site paths in the request scope (Tobias McNulty, Matt Westcott)
 * Fix: Unauthenticated AJAX requests to admin views now return 403 rather than redirecting to the login page (Karl Hobley)
 * Fix: `TableBlock` options `afterChange`, `afterCreateCol`, `afterCreateRow`, `afterRemoveCol`, `afterRemoveRow` and `contextMenu` can now be overridden (Loic Teixeira)

--- a/docs/releases/1.11.rst
+++ b/docs/releases/1.11.rst
@ -44,6 +44,8 @@ This work is the product of ``4`` Wagtail sprints, and the efforts of ``16`` peo
 Other features
 ~~~~~~~~~~~~~~

+ * Optimised page URL generation by caching site paths in the request scope (Tobias McNulty, Matt Westcott)
+

 Bug fixes
 ~~~~~~~~~
@ -59,3 +61,45 @@ Browser requirements for the new explorer menu
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The new explorer menu does not support IE8, IE9, and IE10. The fallback experience is a link pointing to the explorer pages.
+
+
+Caching of site-level URL information throughout the request cycle
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``get_url_parts`` and ``relative_url`` methods on ``Page`` now accept an optional ``request`` keyword argument.
+Additionally, two new methods have been added, ``get_url`` (analogous to the ``url`` property) and ``get_full_url``
+(analogous to the ``full_url``) property. Whenever possible, these methods should be used instead of the property
+versions, and the request passed to each method. For example:
+
+.. code-block:: python
+
+    page_url = my_page.url
+
+would become:
+
+.. code-block:: python
+
+    page_url = my_page.get_url(request=request)
+
+This enables caching of underlying site-level URL information throughout the request cycle, thereby significantly
+reducing the number of cache or SQL queries your site will generate for a given page load. A common use case for these
+methods is any custom template tag your project may include for generating navigation menus. For more information,
+please refer to :ref:`page_urls`.
+
+Furthermore, if you have overridden ``get_url_parts`` or ``relative_url`` on any of your page models, you will need to
+update the method signature to support this keyword argument; most likely, this will involve changing the line:
+
+.. code-block:: python
+
+    def get_url_parts(self):
+
+to:
+
+.. code-block:: python
+
+    def get_url_parts(self, *args, **kwargs):
+
+and passing those through at the point where you are calling ``get_url_parts`` on ``super`` (if applicable).
+
+See also: :meth:`wagtail.wagtailcore.models.Page.get_url_parts`, :meth:`wagtail.wagtailcore.models.Page.get_url`,
+:meth:`wagtail.wagtailcore.models.Page.get_full_url`, and :meth:`wagtail.wagtailcore.models.Page.relative_url`
--- a/docs/topics/pages.rst
+++ b/docs/topics/pages.rst
@ -192,6 +192,49 @@ By default, any page type can be created under any page type and it is not neces

 Setting ``parent_page_types`` to an empty list is a good way of preventing a particular page type from being created in the editor interface.

+.. _page_urls:
+
+Page URLs
+---------
+
+The most common method of retrieving page URLs is by using the ``{% pageurl %}`` template tag. Since it's called from a template, ``pageurl`` automatically includes the optimizations mentioned below. For more information, see :ref:`pageurl_tag`.
+
+Page models also include several low-level methods for overriding or accessing page URLs.
+
+Customising URL patterns for a page model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``Page.get_url_parts(request)`` method will not typically be called directly, but may be overriden to define custom URL routing for a given page model. It should return a tuple of ``(site_id, root_url, page_path)``, which are used by ``get_url`` and ``get_full_url`` (see below) to construct the given type of page URL.
+
+When overriding ``get_url_parts()``, you should accept ``*args, **kwargs``:
+
+.. code-block:: python
+
+    def get_url_parts(self, *args, **kwargs):
+
+and pass those through at the point where you are calling ``get_url_parts`` on ``super`` (if applicable), e.g.:
+
+.. code-block:: python
+
+    super(MyPageModel, self).get_url_parts(*args, **kwargs)
+
+While you could pass only the ``request`` keyword argument, passing all arguments as-is ensures compatibility with any
+future changes to these method signatures.
+
+For more information, please see :meth:`wagtail.wagtailcore.models.Page.get_url_parts`.
+
+Obtaining URLs for page instances
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``Page.get_url(request)`` method can be called whenever a page URL is needed. It defaults to returning local URLs (not including the protocol or domain) if it can detect that the page is on current site (via ``request.site``); otherwise, a full URL including the protocol and domain is returned. Whenever possible, the optional ``request`` argument should be included to enable per-request caching of site-level URL information and facilitate the generation of local URLs.
+
+A common use case for ``get_url(request)`` is in any custom template tag your project may include for generating navigation menus. When writing a such a
+custom template tag, ensure it includes ``takes_context=True`` and use ``context.get('request')`` to safely pass the
+request or ``None`` if no request exists in the context.
+
+For more information, please see :meth:`wagtail.wagtailcore.models.Page.get_url`.
+
+In the event a full URL (including the protocol and domain) is needed, ``Page.get_full_url(request)`` can be used instead. Whenever possible, the optional ``request`` argument should be included to enable per-request caching of site-level URL information. For more information, please see :meth:`wagtail.wagtailcore.models.Page.get_full_url`.

 Template rendering
 ==================
--- a/wagtail/wagtailcore/models.py
+++ b/wagtail/wagtailcore/models.py
@ -2,6 +2,7 @@ from __future__ import absolute_import, unicode_literals

 import json
 import logging
+import warnings
 from collections import defaultdict
 from django import VERSION as DJANGO_VERSION

@ -30,12 +31,13 @@ from modelcluster.models import ClusterableModel, get_all_child_relations
 from treebeard.mp_tree import MP_Node

 from wagtail.utils.compat import user_is_authenticated
+from wagtail.utils.deprecation import RemovedInWagtail113Warning
 from wagtail.wagtailcore.query import PageQuerySet, TreeQuerySet
 from wagtail.wagtailcore.signals import page_published, page_unpublished
 from wagtail.wagtailcore.sites import get_site_for_hostname
 from wagtail.wagtailcore.url_routing import RouteResult
 from wagtail.wagtailcore.utils import (
-    WAGTAIL_APPEND_SLASH, camelcase_to_underscore, resolve_model_string)
+    WAGTAIL_APPEND_SLASH, accepts_kwarg, camelcase_to_underscore, resolve_model_string)
 from wagtail.wagtailsearch import index

 logger = logging.getLogger('wagtail.core')
@ -542,6 +544,18 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable
                )
            )

+        if not accepts_kwarg(cls.relative_url, 'request'):
+            warnings.warn(
+                "%s.relative_url should accept a 'request' keyword argument. "
+                "See http://docs.wagtail.io/en/v1.11/reference/pages/model_reference.html#wagtail.wagtailcore.models.Page.relative_url" % cls,
+                RemovedInWagtail113Warning)
+
+        if not accepts_kwarg(cls.get_url_parts, 'request'):
+            warnings.warn(
+                "%s.get_url_parts should accept a 'request' keyword argument. "
+                "See http://docs.wagtail.io/en/v1.11/reference/pages/model_reference.html#wagtail.wagtailcore.models.Page.get_url_parts" % cls,
+                RemovedInWagtail113Warning)
+
        return errors

    def _update_descendant_url_paths(self, old_url_path, new_url_path):
@ -730,7 +744,32 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable
        """
        return (not self.is_leaf()) or self.depth == 2

-    def get_url_parts(self):
+    def _get_site_root_paths(self, request=None):
+        """
+        Return ``Site.get_site_root_paths()``, using the cached copy on the
+        request object if available.
+        """
+        # if we have a request, use that to cache site_root_paths; otherwise, use self
+        cache_object = request if request else self
+        try:
+            return cache_object._wagtail_cached_site_root_paths
+        except AttributeError:
+            cache_object._wagtail_cached_site_root_paths = Site.get_site_root_paths()
+            return cache_object._wagtail_cached_site_root_paths
+
+    def _safe_get_url_parts(self, request=None):
+        """
+        Backwards-compatibility method to safely call get_url_parts without
+        the new ``request`` kwarg (added in Wagtail 1.11), if needed.
+        """
+        # RemovedInWagtail113Warning - this accepts_kwarg test can be removed when we drop support
+        # for get_url_parts methods which omit the `request` kwarg
+        if accepts_kwarg(self.get_url_parts, 'request'):
+            return self.get_url_parts(request=request)
+        else:
+            return self.get_url_parts()
+
+    def get_url_parts(self, request=None):
        """
        Determine the URL for this page and return it as a tuple of
        ``(site_id, site_root_url, page_url_relative_to_site_root)``.
@ -740,8 +779,14 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable
        and ``get_site`` properties and methods; pages with custom URL routing
        should override this method in order to have those operations return
        the custom URLs.
+
+        Accepts an optional keyword argument ``request``, which may be used
+        to avoid repeated database / cache lookups. Typically, a page model
+        that overrides ``get_url_parts`` should not need to deal with
+        ``request`` directly, and should just pass it to the original method
+        when calling ``super``.
        """
-        for (site_id, root_path, root_url) in Site.get_site_root_paths():
+        for (site_id, root_path, root_url) in self._get_site_root_paths(request):
            if self.url_path.startswith(root_path):
                page_path = reverse('wagtail_serve', args=(self.url_path[len(root_path):],))

@ -753,10 +798,9 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable

                return (site_id, root_url, page_path)

-    @property
-    def full_url(self):
+    def get_full_url(self, request=None):
        """Return the full URL (including protocol / domain) to this page, or None if it is not routable"""
-        url_parts = self.get_url_parts()
+        url_parts = self._safe_get_url_parts(request=request)

        if url_parts is None:
            # page is not routable
@ -766,8 +810,9 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable

        return root_url + page_path

-    @property
-    def url(self):
+    full_url = property(get_full_url)
+
+    def get_url(self, request=None, current_site=None):
        """
        Return the 'most appropriate' URL for referring to this page from the pages we serve,
        within the Wagtail backend and actual website templates;
@ -775,8 +820,18 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable
        (i.e. we know that whatever the current page is being served from, this link will be on the
        same domain), and the full URL (with domain) if not.
        Return None if the page is not routable.
+
+        Accepts an optional but recommended ``request`` keyword argument that, if provided, will
+        be used to cache site-level URL information (thereby avoiding repeated database / cache
+        lookups) and, via the ``request.site`` attribute, determine whether a relative or full URL
+        is most appropriate.
        """
-        url_parts = self.get_url_parts()
+        # ``current_site`` is purposefully undocumented, as one can simply pass the request and get
+        # a relative URL based on ``request.site``. Nonetheless, support it here to avoid
+        # copy/pasting the code to the ``relative_url`` method below.
+        if current_site is None and request is not None:
+            current_site = getattr(request, 'site', None)
+        url_parts = self._safe_get_url_parts(request=request)

        if url_parts is None:
            # page is not routable
@ -784,30 +839,25 @@ class Page(six.with_metaclass(PageBase, AbstractPage, index.Indexed, Clusterable

        site_id, root_url, page_path = url_parts

-        if len(Site.get_site_root_paths()) == 1:
-            # we're only running a single site, so a local URL is sufficient
+        if (current_site is not None and site_id == current_site.id) or len(self._get_site_root_paths(request)) == 1:
+            # the site matches OR we're only running a single site, so a local URL is sufficient
            return page_path
        else:
            return root_url + page_path

-    def relative_url(self, current_site):
+    url = property(get_url)
+
+    def relative_url(self, current_site, request=None):
        """
        Return the 'most appropriate' URL for this page taking into account the site we're currently on;
        a local URL if the site matches, or a fully qualified one otherwise.
        Return None if the page is not routable.
+
+        Accepts an optional but recommended ``request`` keyword argument that, if provided, will
+        be used to cache site-level URL information (thereby avoiding repeated database / cache
+        lookups).
        """
-        url_parts = self.get_url_parts()
-
-        if url_parts is None:
-            # page is not routable
-            return
-
-        site_id, root_url, page_path = url_parts
-
-        if site_id == current_site.id:
-            return page_path
-        else:
-            return root_url + page_path
+        return self.get_url(request=request, current_site=current_site)

    def get_site(self):
        """
--- a/wagtail/wagtailcore/templatetags/wagtailcore_tags.py
+++ b/wagtail/wagtailcore/templatetags/wagtailcore_tags.py
@ -8,6 +8,7 @@ from django.utils.safestring import mark_safe
 from wagtail import __version__
 from wagtail.wagtailcore.models import Page
 from wagtail.wagtailcore.rich_text import RichText, expand_db_html
+from wagtail.wagtailcore.utils import accepts_kwarg

 register = template.Library()

@ -24,7 +25,16 @@ def pageurl(context, page):
        # request.site not available in the current context; fall back on page.url
        return page.url

-    return page.relative_url(current_site)
+    # RemovedInWagtail113Warning - this accepts_kwarg test can be removed when we drop support
+    # for relative_url methods which omit the `request` kwarg
+    if accepts_kwarg(page.relative_url, 'request'):
+        # Pass page.relative_url the request object, which may contain a cached copy of
+        # Site.get_site_root_paths()
+        # This avoids page.relative_url having to make a database/cache fetch for this list
+        # each time it's called.
+        return page.relative_url(current_site, request=context.get('request'))
+    else:
+        return page.relative_url(current_site)


@register.simple_tag(takes_context=True)
@ -32,7 +42,10 @@ def slugurl(context, slug):
    """Returns the URL for the page that has the given slug."""
    page = Page.objects.filter(slug=slug).first()

-    if not page:
+    if page:
+        # call pageurl() instead of page.relative_url() here so we get the ``accepts_kwarg`` logic
+        return pageurl(context, page)
+    else:
        return None

    try:
--- a/wagtail/wagtailcore/tests/test_page_model.py
+++ b/wagtail/wagtailcore/tests/test_page_model.py
@ -331,6 +331,34 @@ class TestRouting(TestCase):
        with self.assertRaises(Http404):
            homepage.route(request, ['events', 'tentative-unpublished-event'])

+    # Override CACHES so we don't generate any cache-related SQL queries (tests use DatabaseCache
+    # otherwise) and so cache.get will always return None.
+    @override_settings(CACHES={'default': {'BACKEND': 'django.core.cache.backends.dummy.DummyCache'}})
+    def test_request_scope_site_root_paths_cache(self):
+        homepage = Page.objects.get(url_path='/home/')
+        christmas_page = EventPage.objects.get(url_path='/home/events/christmas/')
+
+        # without a request, get_url should only issue 1 SQL query
+        with self.assertNumQueries(1):
+            self.assertEqual(homepage.get_url(), '/')
+        # subsequent calls with the same page should generate no SQL queries
+        with self.assertNumQueries(0):
+            self.assertEqual(homepage.get_url(), '/')
+        # subsequent calls with a different page will still generate 1 SQL query
+        with self.assertNumQueries(1):
+            self.assertEqual(christmas_page.get_url(), '/events/christmas/')
+
+        # with a request, the first call to get_url should issue 1 SQL query
+        request = HttpRequest()
+        with self.assertNumQueries(1):
+            self.assertEqual(homepage.get_url(request=request), '/')
+        # subsequent calls should issue no SQL queries
+        with self.assertNumQueries(0):
+            self.assertEqual(homepage.get_url(request=request), '/')
+        # even if called on a different page
+        with self.assertNumQueries(0):
+            self.assertEqual(christmas_page.get_url(request=request), '/events/christmas/')
+

 class TestServeView(TestCase):
    fixtures = ['test.json']
--- a/wagtail/wagtailcore/tests/test_utils.py
+++ b/wagtail/wagtailcore/tests/test_utils.py
@ -4,7 +4,7 @@ from __future__ import absolute_import, unicode_literals
 from django.test import TestCase
 from django.utils.text import slugify

-from wagtail.wagtailcore.utils import cautious_slugify
+from wagtail.wagtailcore.utils import accepts_kwarg, cautious_slugify


 class TestCautiousSlugify(TestCase):
@ -36,3 +36,19 @@ class TestCautiousSlugify(TestCase):

        for (original, expected_result) in test_cases:
            self.assertEqual(cautious_slugify(original), expected_result)
+
+
+class TestAcceptsKwarg(TestCase):
+    def test_accepts_kwarg(self):
+        def func_without_banana(apple, orange=42):
+            pass
+
+        def func_with_banana(apple, banana=42):
+            pass
+
+        def func_with_kwargs(apple, **kwargs):
+            pass
+
+        self.assertFalse(accepts_kwarg(func_without_banana, 'banana'))
+        self.assertTrue(accepts_kwarg(func_with_banana, 'banana'))
+        self.assertTrue(accepts_kwarg(func_with_kwargs, 'banana'))
--- a/wagtail/wagtailcore/utils.py
+++ b/wagtail/wagtailcore/utils.py
@ -1,6 +1,8 @@
 from __future__ import absolute_import, unicode_literals

+import inspect
 import re
+import sys
 import unicodedata

 from django.apps import apps
@ -94,3 +96,20 @@ def cautious_slugify(value):
    # mark_safe); this will also strip out the backslashes from the 'backslashreplace'
    # conversion
    return slugify(value)
+
+
+def accepts_kwarg(func, kwarg):
+    """
+    Determine whether the callable `func` has a signature that accepts the keyword argument `kwarg`
+    """
+    if sys.version_info >= (3, 3):
+        signature = inspect.signature(func)
+        try:
+            signature.bind_partial(**{kwarg: None})
+            return True
+        except TypeError:
+            return False
+    else:
+        # Fall back on inspect.getargspec, available on Python 2.7 but deprecated since 3.5
+        argspec = inspect.getargspec(func)
+        return (kwarg in argspec.args) or (argspec.keywords is not None)