# Rich text internals

At first glance, Wagtail's rich text capabilities appear to give editors direct control over a block of HTML content. In reality, it's necessary to give editors a representation of rich text content that is several steps removed from the final HTML output, for several reasons:

- The editor interface needs to filter out certain kinds of unwanted markup; this includes malicious scripting, font styles pasted from an external word processor, and elements which would break the validity or consistency of the site design (for example, pages will generally reserve the `
` element", since various components of Wagtail - both client and server-side - need to agree on how to handle that feature, including how it should be exposed in the editor interface, how it should be represented within the database, and (if appropriate) how it should be translated when rendered on the front-end.

The components involved in Wagtail's rich text handling are described below.

## Data format

Rich text data (as handled by [RichTextField](rich-text), and `RichTextBlock` within [StreamField](../topics/streamfield.rst)) is stored in the database in a format that is similar, but not identical, to HTML. For example, a link to a page might be stored as:

```html
<p><a linktype="page" id="3">Contact us</a> for more information.</p>
```

Here, the `linktype` attribute identifies a rule that is used to rewrite the tag. When rendered on a template through the `|richtext` filter (see [rich text filter](rich_text_filter)), this is converted into valid HTML:

```html
<p><a href="/contact-us/">Contact us</a> for more information.</p>
```

In the case of `RichTextBlock`, the block's value is a `RichText` object which performs this conversion automatically when rendered as a string, so the `|richtext` filter is not necessary.

Likewise, an image inside rich text content might be stored as:

```html
<embed embedtype="image" id="10" alt="A pied wagtail" format="left" />
```

which is converted into an `img` element when rendered:

```html
<img alt="A pied wagtail" class="richtext-image left" height="294" src="/media/images/pied-wagtail.width-500_ENyKffb.jpg" width="500">
```

Again, the `embedtype` attribute identifies a rule that is used to rewrite the tag. All tags other than `<a linktype="...">` and `<embed embedtype="..." />` are left unchanged in the converted HTML.

A number of additional constraints apply to `<a linktype="...">` and `<embed embedtype="..." />` tags, to allow the conversion to be performed efficiently via string replacement:

- The tag name and attributes must be lower-case
- Attribute values must be quoted with double-quotes
- `embed` elements must use XML self-closing tag syntax (i.e. end in `/>` instead of a closing `</embed>` tag)
- The only HTML entities permitted in attribute values are `&lt;`, `&gt;`, `&amp;` and `&quot;`

## The feature registry

Any app within your project can define extensions to Wagtail's rich text handling, such as new `linktype` and `embedtype` rules. An object known as the _feature registry_ serves as a central source of truth about how rich text should behave. This object can be accessed through the [Register Rich Text Features](register_rich_text_features) hook, which is called on startup to gather all definitions relating to rich text:

```python
# my_app/wagtail_hooks.py
from wagtail import hooks


@hooks.register('register_rich_text_features')
def register_my_feature(features):
    # add new definitions to 'features' here
    pass
```

(rich_text_rewrite_handlers)=

## Rewrite handlers

Rewrite handlers are classes that know how to translate the content of rich text tags like `<a linktype="...">` and `<embed embedtype="..." />` into front-end HTML. For example, the `PageLinkHandler` class knows how to convert the rich text tag `<a linktype="page" id="123">` into the HTML tag `<a href="/path/to/page/123">`.

Rewrite handlers can also provide other useful information about rich text tags.
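
The rewriting step can be sketched without Wagtail itself. The following is a minimal illustration under stated assumptions, not Wagtail's actual implementation: `PAGE_URLS`, `LINK_RULES`, `expand_page_link` and `rewrite_links` are hypothetical names, and a plain dictionary stands in for the page lookup that a real handler would perform against the database.

```python
import re
from html import escape

# Hypothetical lookup table standing in for a database query on page IDs.
PAGE_URLS = {3: "/contact-us/"}

# Match <a ...> opening tags, and attributes written as name="value"
# (lower-case names and double-quoted values, per the constraints listed earlier).
FIND_A_TAG = re.compile(r"<a(\b[^>]*)>")
FIND_ATTRS = re.compile(r'([\w-]+)="([^"]*)"')


def expand_page_link(attrs):
    """Build the front-end opening tag for a page link (illustrative only)."""
    url = PAGE_URLS.get(int(attrs["id"]))
    if url is None:
        # Unknown page: emit a bare anchor rather than a broken href.
        return "<a>"
    return '<a href="%s">' % escape(url)


# Map each linktype identifier to the function that rewrites it.
LINK_RULES = {"page": expand_page_link}


def rewrite_links(html):
    """Rewrite every <a> tag whose linktype has a registered rule."""

    def replace_tag(match):
        attrs = dict(FIND_ATTRS.findall(match.group(1)))
        rule = LINK_RULES.get(attrs.get("linktype"))
        # Tags without a known linktype are left unchanged.
        return rule(attrs) if rule else match.group(0)

    return FIND_A_TAG.sub(replace_tag, html)


stored = '<p><a linktype="page" id="3">Contact us</a> for more information.</p>'
print(rewrite_links(stored))
```

Because the stored tags follow the constraints listed earlier, a pair of regular expressions is enough for this sketch and no HTML parser is needed. Keying the rules by identifier mirrors the idea that each `linktype` value selects one rewrite rule.
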
Given an appropriate tag, `PageLinkHandler` can, for example, be used to extract which page is being referred to. This can be useful for downstream code that may want information about objects being referenced in rich text.

You can create custom rewrite handlers to support your own new `linktype` and `embedtype` tags. New handlers must be Python classes that inherit from either `wagtail.rich_text.LinkHandler` or `wagtail.rich_text.EmbedHandler`. Your new classes should override at least some of the following methods (listed here for `LinkHandler`, although `EmbedHandler` has an identical signature):

```{eval-rst}
.. class:: LinkHandler

    .. attribute:: identifier

        Required. The ``identifier`` attribute is a string that indicates which rich text tags should be handled by this handler.

        For example, ``PageLinkHandler.identifier`` is set to the string ``"page"``, indicating that any rich text tags with ``<a linktype="page">`` should be handled by it.

    .. method:: expand_db_attributes(attrs)

        Required. The ``expand_db_attributes`` method is expected to take a dictionary of attributes from a database rich text ``<a>`` tag (``