Merge branch 'ap-processing-improvements' into 'master'

Content processing improvements.

See merge request jaywink/federation!177
fix-url-regex
Alain St-Denis 2023-09-04 21:38:47 +00:00
commit add80e0f6c
No known key found for this signature in the database
16 changed files with 357 additions and 400 deletions

View file

@@ -22,7 +22,7 @@
 * For inbound payload, a cached dict of all the defined AP extensions is merged with each incoming LD context.
-* Better handle conflicting property defaults by having `get_base_attributes` return only attributes that
-  are not empty (or bool). This helps distinguishing between `marshmallow.missing` and empty values.
+* Better handle conflicting property defaults by having `get_base_attributes` return only attributes that
+  are not empty (or bool). This helps distinguish between `marshmallow.missing` and empty values.
 * JsonLD document caching now set in `activitypub/__init__.py`.
@@ -45,6 +45,10 @@
 * In fetch_document: if response.encoding is not set, default to utf-8.
+* Fix process_text_links that would crash on `a` tags with no `href` attribute.
+* Ignore relayed AP retractions.
 ## [0.24.1] - 2023-03-18
 ### Fixed
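The `get_base_attributes` change above can be sketched as follows. This is a hypothetical, simplified stand-in (the real function lives in the federation codebase and works against marshmallow fields); `MISSING` here is a local sentinel standing in for `marshmallow.missing`:

```python
MISSING = object()  # stand-in sentinel for marshmallow.missing

def get_base_attributes(obj, keep=()):
    """Return only attributes that are non-empty, but always keep booleans
    so that False is not mistaken for an unset (missing) value."""
    attrs = {}
    for name, value in vars(obj).items():
        if name in keep or isinstance(value, bool):
            attrs[name] = value
        elif value not in (None, "", [], {}, MISSING):
            attrs[name] = value
    return attrs
```

The point of the bool carve-out: `False` is falsy, so a plain emptiness check would drop it and it could no longer be distinguished from an attribute that was never set.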

View file

@@ -4,9 +4,8 @@ Protocols
 Currently three protocols are being focused on.
 * Diaspora is considered to be stable with most of the protocol implemented.
-* ActivityPub support should be considered as alpha - all the basic
-  things work but there are likely to be a lot of compatibility issues with other ActivityPub
-  implementations.
+* ActivityPub support should be considered as beta - all the basic
+  things work and we are fixing incompatibilities as they are identified.
 * Matrix support cannot be considered usable as of yet.
 For example implementations in real life projects check :ref:`example-projects`.
@@ -69,20 +68,21 @@ Content media type
 The following keys will be set on the entity based on the ``source`` property existing:
 * if the object has an ``object.source`` property:
-  * ``_media_type`` will be the source media type
-  * ``_rendered_content`` will be the object ``content``
+  * ``_media_type`` will be the source media type (only text/markdown is supported).
+  * ``rendered_content`` will be the object ``content``
   * ``raw_content`` will be the source ``content``
 * if the object has no ``object.source`` property:
   * ``_media_type`` will be ``text/html``
-  * ``_rendered_content`` will be the object ``content``
-  * ``raw_content`` will object ``content`` run through a HTML2Markdown renderer
+  * ``rendered_content`` will be the object ``content``
+  * ``raw_content`` will be empty
 The ``contentMap`` property is processed but content language selection is not implemented yet.
 For outbound entities, ``raw_content`` is expected to be in ``text/markdown``,
-specifically CommonMark. When sending payloads, ``raw_content`` will be rendered via
-the ``commonmark`` library into ``object.content``. The original ``raw_content``
-will be added to the ``object.source`` property.
+specifically CommonMark. The client applications are expected to provide the
+rendered content for protocols that require it (e.g. ActivityPub).
+When sending payloads, ``object.contentMap`` will be set to ``rendered_content``
+and ``raw_content`` will be added to the ``object.source`` property.
 Medias
 ......
@@ -98,6 +98,19 @@ support from client applications.
 For inbound entities we do this automatically by not including received image attachments in
 the entity ``_children`` attribute. Audio and video are passed through the client application.
+Hashtags and mentions
+.....................
+For outbound payloads, client applications must add/set the hashtag/mention value to
+the ``class`` attribute of rendered content linkified hashtags/mentions. These will be
+used to help build the corresponding ``Hashtag`` and ``Mention`` objects.
+For inbound payloads, if a markdown source is provided, hashtags/mentions will be extracted
+through the same method used for Diaspora. If only HTML content is provided, the ``a`` tags
+will be marked with a ``data-[hashtag|mention]`` attribute (based on the provided Hashtag/Mention
+objects) to facilitate the ``href`` attribute modifications client applications might
+wish to make. This should ensure links can be replaced regardless of how the HTML is structured.
 .. _matrix:
 Matrix
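The outbound content rules in the docs above can be illustrated with a minimal payload sketch. This is not the library's serializer, just an illustrative dict showing where the client-supplied HTML and CommonMark end up; the single `"orig"` key in `contentMap` is an assumption for illustration (real keys are language tags):

```python
def build_note_payload(rendered_content: str, raw_content: str) -> dict:
    """Sketch: client supplies rendered_content (HTML) and raw_content
    (CommonMark); the AP payload carries HTML in content/contentMap and
    the markdown original in object.source."""
    return {
        "type": "Note",
        "content": rendered_content,
        "contentMap": {"orig": rendered_content},
        "source": {"content": raw_content, "mediaType": "text/markdown"},
    }
```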

View file

@@ -2,7 +2,7 @@ from cryptography.exceptions import InvalidSignature
 from django.http import JsonResponse, HttpResponse, HttpResponseNotFound
 from federation.entities.activitypub.mappers import get_outbound_entity
-from federation.protocols.activitypub.signing import verify_request_signature
+from federation.protocols.activitypub.protocol import Protocol
 from federation.types import RequestType
 from federation.utils.django import get_function_from_config
@@ -23,9 +23,11 @@ def get_and_verify_signer(request):
                       body=request.body,
                       method=request.method,
                       headers=request.headers)
+    protocol = Protocol(request=req, get_contact_key=get_public_key)
     try:
-        return verify_request_signature(req)
-    except ValueError:
+        protocol.verify()
+        return protocol.sender
+    except (ValueError, KeyError, InvalidSignature) as exc:
         return None
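The shape of the new verification flow can be shown in isolation. `DummyProtocol` and the local `InvalidSignature` class below are hypothetical stand-ins (the real ones come from `federation.protocols.activitypub.protocol` and `cryptography.exceptions`); only the try/except pattern mirrors the view code:

```python
class InvalidSignature(Exception):
    """Stand-in for cryptography.exceptions.InvalidSignature."""

class DummyProtocol:
    """Hypothetical stand-in mimicking the Protocol API used in the view."""
    def __init__(self, sender=None, error=None):
        self.sender = sender
        self._error = error

    def verify(self):
        if self._error:
            raise self._error

def get_sender_or_none(protocol):
    # Mirror of the view logic: verify the request signature and return the
    # sender id, or None on any of the expected verification failures.
    try:
        protocol.verify()
        return protocol.sender
    except (ValueError, KeyError, InvalidSignature):
        return None
```

Catching `KeyError` and `InvalidSignature` in addition to `ValueError` is the substantive change: a malformed header or a bad signature now degrades to an unauthenticated request instead of a 500.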

View file

@@ -113,10 +113,11 @@ class LdContextManager:
         if 'python-federation"' in s:
             ctx = json.loads(s.replace('python-federation', 'python-federation#', 1))
-        # some platforms have http://joinmastodon.com/ns in @context. This
-        # is not a json-ld document.
+        # Some platforms reference invalid json-ld documents in @context.
+        # Remove those.
+        for url in ['http://joinmastodon.org/ns', 'http://schema.org']:
             try:
-                ctx.pop(ctx.index('http://joinmastodon.org/ns'))
+                ctx.pop(ctx.index(url))
             except ValueError:
                 pass
@@ -137,12 +138,17 @@ class LdContextManager:
         # Merge all defined AP extensions to the inbound context
         uris = []
         defs = {}
-        # Merge original context dicts in one dict
+        # Merge original context dicts in one dict, taking into account nested @context
+        def parse_context(ctx):
             for item in ctx:
                 if isinstance(item, str):
                     uris.append(item)
                 else:
+                    if '@context' in item:
+                        parse_context([item['@context']])
+                        item.pop('@context')
                     defs.update(item)
+        parse_context(ctx)
         for item in self._merged:
             if isinstance(item, str) and item not in uris:
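The nested-`@context` handling added above can be exercised standalone. This sketch is a simplified, self-contained version of the recursion (the real code updates `uris`/`defs` that live in the surrounding method; here they are returned, and the input dicts are copied rather than mutated):

```python
def merge_context_items(ctx: list):
    """Flatten a JSON-LD @context list into URI strings and term
    definitions, descending into nested @context values as the new
    parse_context helper does."""
    uris, defs = [], {}

    def parse_context(items):
        for item in items:
            if isinstance(item, str):
                uris.append(item)
            else:
                if '@context' in item:
                    # recurse into the nested context first
                    parse_context([item['@context']])
                    item = {k: v for k, v in item.items() if k != '@context'}
                defs.update(item)

    parse_context(ctx)
    return uris, defs
```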

View file

@@ -75,8 +75,8 @@ def verify_ld_signature(payload):
     obj_digest = hash(obj)
     digest = (sig_digest + obj_digest).encode('utf-8')
-    sig_value = b64decode(signature.get('signatureValue'))
     try:
+        sig_value = b64decode(signature.get('signatureValue'))
         verifier.verify(SHA256.new(digest), sig_value)
         logger.debug('ld_signature - %s has a valid signature', payload.get("id"))
         return profile.id
@@ -99,6 +99,6 @@ class NormalizedDoubles(jsonld.JsonLdProcessor):
             item['@value'] = math.floor(value)
         obj = super()._object_to_rdf(item, issuer, triples, rdfDirection)
         # This is to address https://github.com/digitalbazaar/pyld/issues/175
-        if obj.get('datatype') == jsonld.XSD_DOUBLE:
+        if obj and obj.get('datatype') == jsonld.XSD_DOUBLE:
            obj['value'] = re.sub(r'(\d)0*E\+?(-)?0*(\d)', r'\1E\2\3', obj['value'])
         return obj
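The exponent-normalizing regex in `NormalizedDoubles` is easy to misread, so here it is in isolation. The function name is for illustration only; the pattern and replacement are exactly those in the code above (note that Python's `re.sub` substitutes an empty string for the optional, unmatched sign group):

```python
import re

XSD_DOUBLE_RE = re.compile(r'(\d)0*E\+?(-)?0*(\d)')

def normalize_double(value: str) -> str:
    """Strip the '+' sign, leading zeros of the exponent, and trailing
    zeros of the mantissa right before 'E', matching the canonical
    xsd:double form the RDF normalization expects."""
    return XSD_DOUBLE_RE.sub(r'\1E\2\3', value)
```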

View file

@@ -1,12 +1,16 @@
 import copy
 import json
 import logging
+import re
+import traceback
 import uuid
-from datetime import timedelta
+from operator import attrgetter
 from typing import List, Dict, Union
-from urllib.parse import urlparse
+from unicodedata import normalize
+from urllib.parse import unquote, urlparse
 import bleach
+from bs4 import BeautifulSoup
 from calamus import fields
 from calamus.schema import JsonLDAnnotation, JsonLDSchema, JsonLDSchemaOpts
 from calamus.utils import normalize_value
@@ -31,10 +35,10 @@ from federation.utils.text import with_slash, validate_handle
 logger = logging.getLogger("federation")
-def get_profile_or_entity(fid):
-    obj = get_profile(fid=fid)
-    if not obj:
-        obj = retrieve_and_parse_document(fid)
+def get_profile_or_entity(**kwargs):
+    obj = get_profile(**kwargs)
+    if not obj and kwargs.get('fid'):
+        obj = retrieve_and_parse_document(kwargs['fid'])
     return obj
@@ -57,6 +61,7 @@ as2 = fields.Namespace("https://www.w3.org/ns/activitystreams#")
 dc = fields.Namespace("http://purl.org/dc/terms/")
 diaspora = fields.Namespace("https://diasporafoundation.org/ns/")
 ldp = fields.Namespace("http://www.w3.org/ns/ldp#")
+lemmy = fields.Namespace("https://join-lemmy.org/ns#")
 litepub = fields.Namespace("http://litepub.social/ns#")
 misskey = fields.Namespace("https://misskey-hub.net/ns#")
 ostatus = fields.Namespace("http://ostatus.org#")
@@ -241,8 +246,8 @@ class Object(BaseEntity, metaclass=JsonLDAnnotation):
                           metadata={'ctx':[{ 'alsoKnownAs':{'@id':'as:alsoKnownAs','@type':'@id'}}]})
     icon = MixedField(as2.icon, nested='ImageSchema')
     image = MixedField(as2.image, nested='ImageSchema')
-    tag_objects = MixedField(as2.tag, nested=['HashtagSchema','MentionSchema','PropertyValueSchema','EmojiSchema'], many=True)
-    attachment = fields.Nested(as2.attachment, nested=['ImageSchema', 'AudioSchema', 'DocumentSchema','PropertyValueSchema','IdentityProofSchema'],
+    tag_objects = MixedField(as2.tag, nested=['NoteSchema', 'HashtagSchema','MentionSchema','PropertyValueSchema','EmojiSchema'], many=True)
+    attachment = fields.Nested(as2.attachment, nested=['LinkSchema', 'NoteSchema', 'ImageSchema', 'AudioSchema', 'DocumentSchema','PropertyValueSchema','IdentityProofSchema'],
                                many=True, default=[])
     content_map = LanguageMap(as2.content)  # language maps are not implemented in calamus
     context = fields.RawJsonLD(as2.context)
@@ -250,7 +255,7 @@ class Object(BaseEntity, metaclass=JsonLDAnnotation):
     generator = MixedField(as2.generator, nested=['ApplicationSchema','ServiceSchema'])
     created_at = fields.DateTime(as2.published, add_value_types=True)
     replies = MixedField(as2.replies, nested=['CollectionSchema','OrderedCollectionSchema'])
-    signature = MixedField(sec.signature, nested = 'SignatureSchema',
+    signature = MixedField(sec.signature, nested = 'RsaSignature2017Schema',
                            metadata={'ctx': [CONTEXT_SECURITY,
                                              {'RsaSignature2017':'sec:RsaSignature2017'}]})
     start_time = fields.DateTime(as2.startTime, add_value_types=True)
@@ -333,6 +338,20 @@ class Object(BaseEntity, metaclass=JsonLDAnnotation):
             data['@context'] = context_manager.merge_context(ctx)
         return data
+    # The JSON-LD spec states type names are case sensitive.
+    # Ensure type names for which we have an implementation have the proper case
+    # for platforms that ignore the spec.
+    @pre_load
+    def patch_types(self, data, **kwargs):
+        def walk_payload(payload):
+            for key, val in copy.copy(payload).items():
+                if isinstance(val, dict):
+                    walk_payload(val)
+                if key == 'type':
+                    payload[key] = MODEL_NAMES.get(val.lower(), val)
+            return payload
+        return walk_payload(data)
     # A node without an id isn't true json-ld, but many payloads have
     # id-less nodes. Since calamus forces random ids on such nodes,
     # this removes it.
@@ -567,7 +586,8 @@ class Person(Object, base.Profile):
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
-        self._allowed_children += (PropertyValue, IdentityProof)
+        self._required += ['url']
+        self._allowed_children += (Note, PropertyValue, IdentityProof)
     # Set finger to username@host if not provided by the platform
     def post_receive(self):
@@ -576,12 +596,15 @@ class Person(Object, base.Profile):
             self.finger = profile.finger
         else:
             domain = urlparse(self.id).netloc
-            finger = f'{self.username.lower()}@{domain}'
+            finger = f'{self.username}@{domain}'
             if get_profile_id_from_webfinger(finger) == self.id:
                 self.finger = finger
         # multi-protocol platform
         if self.finger and self.guid is not missing and self.handle is missing:
             self.handle = self.finger
+        # Some platforms don't set this property.
+        if self.url is missing:
+            self.url = self.id
     def to_as2(self):
         self.followers = f'{with_slash(self.id)}followers/'
@@ -716,15 +739,19 @@ class Note(Object, RawContentMixin):
     _cached_raw_content = ''
     _cached_children = []
+    _soup = None
     signable = True
     def __init__(self, *args, **kwargs):
         self.tag_objects = []  # mutable objects...
         super().__init__(*args, **kwargs)
-        self._allowed_children += (base.Audio, base.Video)
+        self.raw_content  # must be "primed" with source property for inbound payloads
+        self.rendered_content  # must be "primed" with content_map property for inbound payloads
+        self._allowed_children += (base.Audio, base.Video, Link)
+        self._required.remove('raw_content')
+        self._required += ['rendered_content']
     def to_as2(self):
-        self.sensitive = 'nsfw' in self.tags
         self.url = self.id
         edited = False
@@ -752,8 +779,8 @@ class Note(Object, RawContentMixin):
     def to_base(self):
         kwargs = get_base_attributes(self, keep=(
-            '_mentions', '_media_type', '_rendered_content', '_source_object',
-            '_cached_children', '_cached_raw_content'))
+            '_mentions', '_media_type', '_source_object',
+            '_cached_children', '_cached_raw_content', '_soup'))
         entity = Comment(**kwargs) if getattr(self, 'target_id') else Post(**kwargs)
         # Plume (and maybe other platforms) send the attrbutedTo field as an array
         if isinstance(entity.actor_id, list): entity.actor_id = entity.actor_id[0]
@@ -764,6 +791,7 @@ class Note(Object, RawContentMixin):
     def pre_send(self) -> None:
         """
         Attach any embedded images from raw_content.
+        Add Hashtag and Mention objects (the client app must define the class tag/mention property)
         """
         super().pre_send()
         self._children = [
@@ -774,133 +802,136 @@
             ) for image in self.embedded_images
         ]
-        # Add other AP objects
-        self.extract_mentions()
-        self.content_map = {'orig': self.rendered_content}
-        self.add_mention_objects()
-        self.add_tag_objects()
-
-    def post_receive(self) -> None:
-        """
-        Make linkified tags normal tags.
-        """
-        super().post_receive()
-        if not self.raw_content or self._media_type == "text/markdown":
-            # Skip when markdown
-            return
-        hrefs = []
-        for tag in self.tag_objects:
-            if isinstance(tag, Hashtag):
-                if tag.href is not missing:
-                    hrefs.append(tag.href.lower())
-                elif tag.id is not missing:
-                    hrefs.append(tag.id.lower())
-        # noinspection PyUnusedLocal
-        def remove_tag_links(attrs, new=False):
-            # Hashtag object hrefs
-            href = (None, "href")
-            url = attrs.get(href, "").lower()
-            if url in hrefs:
-                return
-            # one more time without the query (for pixelfed)
-            parsed = urlparse(url)
-            url = f'{parsed.scheme}://{parsed.netloc}{parsed.path}'
-            if url in hrefs:
-                return
-            # Mastodon
-            rel = (None, "rel")
-            if attrs.get(rel) == "tag":
-                return
-            # Friendica
-            if attrs.get(href, "").endswith(f'tag={attrs.get("_text")}'):
-                return
-            return attrs
-        self.raw_content = bleach.linkify(
-            self.raw_content,
-            callbacks=[remove_tag_links],
-            parse_email=False,
-            skip_tags=["code", "pre"],
-        )
-        if getattr(self, 'target_id'): self.entity_type = 'Comment'
-
-    def add_tag_objects(self) -> None:
-        """
-        Populate tags to the object.tag list.
-        """
-        try:
-            from federation.utils.django import get_configuration
-            config = get_configuration()
-        except ImportError:
-            tags_path = None
-        else:
-            if config["tags_path"]:
-                tags_path = f"{config['base_url']}{config['tags_path']}"
-            else:
-                tags_path = None
-        for tag in self.tags:
-            _tag = Hashtag(name=f'#{tag}')
-            if tags_path:
-                _tag.href = tags_path.replace(":tag:", tag)
-            self.tag_objects.append(_tag)
-
-    def add_mention_objects(self) -> None:
-        """
-        Populate mentions to the object.tag list.
-        """
-        if len(self._mentions):
-            mentions = list(self._mentions)
-            mentions.sort()
-            for mention in mentions:
-                if validate_handle(mention):
-                    profile = get_profile(finger=mention)
-                    # only add AP profiles mentions
-                    if getattr(profile, 'id', None):
-                        self.tag_objects.append(Mention(href=profile.id, name='@'+mention))
-                        # some platforms only render diaspora style markdown if it is available
-                        self.source['content'] = self.source['content'].replace(mention, '{' + mention + '}')
+        # Add Hashtag objects
+        for el in self._soup('a', attrs={'class':'hashtag'}):
+            self.tag_objects.append(Hashtag(
+                href = el.attrs['href'],
+                name = el.text
+            ))
+            if el.text == '#nsfw': self.sensitive = True
+        self.tag_objects = sorted(self.tag_objects, key=attrgetter('name'))
+        # Add Mention objects
+        mentions = []
+        for el in self._soup('a', attrs={'class':'mention'}):
+            mentions.append(el.text.lstrip('@'))
+        mentions.sort()
+        for mention in mentions:
+            if validate_handle(mention):
+                profile = get_profile(finger__iexact=mention)
+                # only add AP profiles mentions
+                if getattr(profile, 'id', None):
+                    self.tag_objects.append(Mention(href=profile.id, name='@'+mention))
+                    # some platforms only render diaspora style markdown if it is available
+                    self.source['content'] = self.source['content'].replace(mention, '{' + mention + '}')
+
+    def post_receive(self) -> None:
+        """
+        Mark linkified tags and mentions with a data-{mention, tag} attribute.
+        """
+        super().post_receive()
+        if self._media_type == "text/markdown":
+            # Skip when markdown
+            return
+        self._find_and_mark_hashtags()
+        self._find_and_mark_mentions()
+        if getattr(self, 'target_id'): self.entity_type = 'Comment'
+
+    def _find_and_mark_hashtags(self):
+        hrefs = set()
+        for tag in self.tag_objects:
+            if isinstance(tag, Hashtag):
+                if tag.href is not missing:
+                    hrefs.add(tag.href.lower())
+                # Some platforms use id instead of href...
+                elif tag.id is not missing:
+                    hrefs.add(tag.id.lower())
+        for link in self._soup.find_all('a', href=True):
+            parsed = urlparse(unquote(link['href']).lower())
+            # remove the query part and trailing garbage, if any
+            path = parsed.path
+            trunc = re.match(r'(/[\w/\-]+)', parsed.path)
+            if trunc:
+                path = trunc.group()
+            url = f'{parsed.scheme}://{parsed.netloc}{path}'
+            # convert accented characters to their ascii equivalent
+            normalized_path = normalize('NFD', path).encode('ascii', 'ignore')
+            normalized_url = f'{parsed.scheme}://{parsed.netloc}{normalized_path.decode()}'
+            links = {link['href'].lower(), unquote(link['href']).lower(), url, normalized_url}
+            if links.intersection(hrefs):
+                tag = re.match(r'^#?([\w\-]+$)', link.text)
+                if tag:
+                    link['data-hashtag'] = tag.group(1).lower()
+
+    def _find_and_mark_mentions(self):
+        mentions = [mention for mention in self.tag_objects if isinstance(mention, Mention)]
+        # There seems to be consensus on using the profile url for
+        # the link and the profile id for the Mention object href property,
+        # but some platforms will set mention.href to the profile url, so
+        # we check both.
+        for mention in mentions:
+            hrefs = []
+            profile = get_profile_or_entity(fid=mention.href, remote_url=mention.href)
+            if profile and not profile.url:
+                # This should be removed when we are confident that the remote_url property
+                # has been populated for most profiles on the client app side.
+                profile = retrieve_and_parse_profile(profile.id)
+            if profile:
+                hrefs.extend([profile.id, profile.url])
+            for href in hrefs:
+                links = self._soup.find_all(href=href)
+                for link in links:
+                    link['data-mention'] = profile.finger
+                    self._mentions.add(profile.finger)
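The URL-matching inside `_find_and_mark_hashtags` builds several normalized variants of each link's `href` before comparing against the `Hashtag` hrefs. The helper below is a hypothetical extraction of just that step (the function name is invented; the normalization logic mirrors the loop body above):

```python
import re
from unicodedata import normalize
from urllib.parse import unquote, urlparse

def candidate_hashtag_urls(href: str) -> set:
    """Return the set of normalized variants of a tag link href:
    lowercased, percent-decoded, query-stripped, and accent-folded."""
    parsed = urlparse(unquote(href).lower())
    # remove the query part and trailing garbage, if any
    path = parsed.path
    trunc = re.match(r'(/[\w/\-]+)', parsed.path)
    if trunc:
        path = trunc.group()
    url = f'{parsed.scheme}://{parsed.netloc}{path}'
    # convert accented characters to their ascii equivalent
    ascii_path = normalize('NFD', path).encode('ascii', 'ignore').decode()
    normalized_url = f'{parsed.scheme}://{parsed.netloc}{ascii_path}'
    return {href.lower(), unquote(href).lower(), url, normalized_url}
```

A link to `/tags/Caf%C3%A9?src=feed` thus matches a Hashtag href given as either `/tags/café` or `/tags/cafe`, regardless of casing, percent-encoding, or query strings.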
     def extract_mentions(self):
         """
-        Extract mentions from the source object.
+        Attempt to extract mentions from raw_content if available
         """
-        super().extract_mentions()
-        if getattr(self, 'tag_objects', None):
-            #tag_objects = self.tag_objects if isinstance(self.tag_objects, list) else [self.tag_objects]
-            for tag in self.tag_objects:
-                if isinstance(tag, Mention):
-                    profile = get_profile_or_entity(fid=tag.href)
-                    handle = getattr(profile, 'finger', None)
-                    if handle: self._mentions.add(handle)
+        if self.raw_content:
+            super().extract_mentions()
+            return
 
     @property
-    def raw_content(self):
-        if self._cached_raw_content: return self._cached_raw_content
+    def rendered_content(self):
+        if self._soup: return str(self._soup)
+        content = ''
         if self.content_map:
             orig = self.content_map.pop('orig')
             if len(self.content_map.keys()) > 1:
                 logger.warning('Language selection not implemented, falling back to default')
-                self._rendered_content = orig.strip()
+                content = orig.strip()
             else:
-                self._rendered_content = orig.strip() if len(self.content_map.keys()) == 0 else next(iter(self.content_map.values())).strip()
+                content = orig.strip() if len(self.content_map.keys()) == 0 else next(iter(self.content_map.values())).strip()
             self.content_map['orig'] = orig
+        # to allow for posts/replies with medias only.
+        if not content: content = "<div></div>"
+        self._soup = BeautifulSoup(content, 'html.parser')
+        return str(self._soup)
+
+    @rendered_content.setter
+    def rendered_content(self, value):
+        if not value: return
+        self._soup = BeautifulSoup(value, 'html.parser')
+        self.content_map = {'orig': value}
+
+    @property
+    def raw_content(self):
+        if self._cached_raw_content: return self._cached_raw_content
         if isinstance(self.source, dict) and self.source.get('mediaType') == 'text/markdown':
             self._media_type = self.source['mediaType']
             self._cached_raw_content = self.source.get('content').strip()
         else:
             self._media_type = 'text/html'
-            self._cached_raw_content = self._rendered_content
-        # to allow for posts/replies with medias only.
-        if not self._cached_raw_content: self._cached_raw_content = "<div></div>"
+            self._cached_raw_content = ""
         return self._cached_raw_content
 
     @raw_content.setter
@@ -917,12 +948,13 @@ class Note(Object, RawContentMixin):
         if isinstance(getattr(self, 'attachment', None), list):
             children = []
             for child in self.attachment:
-                if isinstance(child, Document):
-                    obj = child.to_base()
-                    if isinstance(obj, Image):
-                        if obj.inline or (obj.image and obj.image in self.raw_content):
+                if isinstance(child, (Document, Link)):
+                    if hasattr(child, 'to_base'):
+                        child = child.to_base()
+                    if isinstance(child, Image):
+                        if child.inline or (child.image and child.image in self.raw_content):
                             continue
-                    children.append(obj)
+                    children.append(child)
             self._cached_children = children
         return self._cached_children
@@ -1010,7 +1042,7 @@ class Video(Document, base.Video):
                 self.actor_id = new_act[0]
             entity = Post(**get_base_attributes(self,
-                keep=('_mentions', '_media_type', '_rendered_content',
-                      '_cached_children', '_cached_raw_content', '_source_object')))
+                keep=('_mentions', '_media_type', '_soup',
+                      '_cached_children', '_cached_raw_content', '_source_object')))
             set_public(entity)
             return entity
@@ -1019,7 +1051,7 @@ class Video(Document, base.Video):
         return self
-class Signature(Object):
+class RsaSignature2017(Object):
     created = fields.DateTime(dc.created, add_value_types=True)
     creator = IRI(dc.creator)
     key = fields.String(sec.signatureValue)
@@ -1174,6 +1206,7 @@ class Retraction(Announce, base.Retraction):
 class Tombstone(Object, base.Retraction):
     target_id = fields.Id()
+    signable = True
     def to_as2(self):
         if not isinstance(self.activity, type): return None
@@ -1294,7 +1327,7 @@ def extract_receivers(entity):
     profile = None
     # don't care about receivers for payloads without an actor_id
     if getattr(entity, 'actor_id'):
-        profile = get_profile_or_entity(entity.actor_id)
+        profile = get_profile_or_entity(fid=entity.actor_id)
     if not isinstance(profile, base.Profile):
         return receivers
@@ -1314,14 +1347,16 @@ def extract_and_validate(entity):
     entity._source_protocol = "activitypub"
     # Extract receivers
     entity._receivers = extract_receivers(entity)
-    # Extract mentions
-    if hasattr(entity, "extract_mentions"):
-        entity.extract_mentions()
     if hasattr(entity, "post_receive"):
         entity.post_receive()
     if hasattr(entity, 'validate'): entity.validate()
+    # Extract mentions
+    if hasattr(entity, "extract_mentions"):
+        entity.extract_mentions()
 def extract_replies(replies):
@@ -1373,6 +1408,9 @@ def element_to_objects(element: Union[Dict, Object], sender: str = "") -> List:
         logger.error("Failed to validate entity %s: %s", entity, ex)
         return []
     except InvalidSignature as exc:
+        if isinstance(entity, base.Retraction):
+            logger.warning('Relayed retraction on %s, ignoring', entity.target_id)
+            return []
         logger.info('%s, fetching from remote', exc)
         entity = retrieve_and_parse_document(entity.id)
         if not entity:
@@ -1396,6 +1434,7 @@ def model_to_objects(payload):
         entity = model.schema().load(payload)
     except (KeyError, jsonld.JsonLdError, exceptions.ValidationError) as exc:  # Just give up for now. This must be made robust
         logger.error("Error parsing jsonld payload (%s)", exc)
+        traceback.print_exception(exc)
         return None
     if isinstance(getattr(entity, 'object_', None), Object):
@@ -1417,3 +1456,9 @@ CLASSES_WITH_CONTEXT_EXTENSIONS = (
     PropertyValue
 )
 context_manager = LdContextManager(CLASSES_WITH_CONTEXT_EXTENSIONS)
+
+MODEL_NAMES = {}
+for key, val in copy.copy(globals()).items():
+    if type(val) == JsonLDAnnotation and issubclass(val, (Object, Link)):
+        MODEL_NAMES[key.lower()] = key
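Together, `MODEL_NAMES` and the `patch_types` pre-load hook added above normalize type-name casing in incoming payloads. The self-contained sketch below uses an illustrative subset of the mapping (the real one is built from the module's schema classes) with the same recursive walk:

```python
# Illustrative subset; the real mapping is derived from the module globals.
MODEL_NAMES = {"note": "Note", "person": "Person", "hashtag": "Hashtag"}

def walk_payload(payload: dict) -> dict:
    """Recursively fix the case of 'type' values for types we implement,
    leaving unknown types untouched, as the patch_types hook does."""
    for key, val in dict(payload).items():
        if isinstance(val, dict):
            walk_payload(val)
        if key == 'type':
            payload[key] = MODEL_NAMES.get(val.lower(), val)
    return payload
```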

View file

@@ -4,12 +4,13 @@ import re
import warnings
from typing import List, Set, Union, Dict, Tuple

+from bs4 import BeautifulSoup
from commonmark import commonmark
from marshmallow import missing

from federation.entities.activitypub.enums import ActivityType
from federation.entities.utils import get_name_for_profile, get_profile
-from federation.utils.text import process_text_links, find_tags
+from federation.utils.text import find_elements, find_tags, MENTION_PATTERN


class BaseEntity:
@@ -22,6 +23,7 @@ class BaseEntity:
    _source_object: Union[str, Dict] = None
    _sender: str = ""
    _sender_key: str = ""
+    _tags: Set = None
    # ActivityType
    activity: ActivityType = None
    activity_id: str = ""
@@ -205,7 +207,7 @@ class CreatedAtMixin(BaseEntity):
class RawContentMixin(BaseEntity):
    _media_type: str = "text/markdown"
    _mentions: Set = None
-    _rendered_content: str = ""
+    rendered_content: str = ""
    raw_content: str = ""

    def __init__(self, *args, **kwargs):
@@ -231,59 +233,22 @@ class RawContentMixin(BaseEntity):
            images.append((groups[1], groups[0] or ""))
        return images

-    @property
-    def rendered_content(self) -> str:
-        """Returns the rendered version of raw_content, or just raw_content."""
-        try:
-            from federation.utils.django import get_configuration
-            config = get_configuration()
-            if config["tags_path"]:
-                def linkifier(tag: str) -> str:
-                    return f'<a class="mention hashtag" ' \
-                           f' href="{config["base_url"]}{config["tags_path"].replace(":tag:", tag.lower())}" ' \
-                           f'rel="noopener noreferrer">' \
-                           f'#<span>{tag}</span></a>'
-            else:
-                linkifier = None
-        except ImportError:
-            linkifier = None
-        if self._rendered_content:
-            return self._rendered_content
-        elif self._media_type == "text/markdown" and self.raw_content:
-            # Do tags
-            _tags, rendered = find_tags(self.raw_content, replacer=linkifier)
-            # Render markdown to HTML
-            rendered = commonmark(rendered).strip()
-            # Do mentions
-            if self._mentions:
-                for mention in self._mentions:
-                    # Diaspora mentions are linkified as mailto
-                    profile = get_profile(finger=mention)
-                    href = 'mailto:'+mention if not getattr(profile, 'id', None) else profile.id
-                    rendered = rendered.replace(
-                        "@%s" % mention,
-                        f'@<a class="h-card" href="{href}"><span>{mention}</span></a>',
-                    )
-            # Finally linkify remaining URL's that are not links
-            rendered = process_text_links(rendered)
-            return rendered
-        return self.raw_content
-
+    # Legacy. Keep this until tests are reworked
    @property
    def tags(self) -> List[str]:
-        """Returns a `list` of unique tags contained in `raw_content`."""
        if not self.raw_content:
            return []
-        tags, _text = find_tags(self.raw_content)
-        return sorted(tags)
+        return sorted(find_tags(self.raw_content))

    def extract_mentions(self):
-        if self._media_type != 'text/markdown': return
-        matches = re.findall(r'@{?[\S ]?[^{}@]+[@;]?\s*[\w\-./@]+[\w/]+}?', self.raw_content)
-        if not matches:
+        if not self.raw_content:
            return
-        for mention in matches:
+        mentions = find_elements(
+            BeautifulSoup(
+                commonmark(self.raw_content, ignore_html_blocks=True), 'html.parser'),
+            MENTION_PATTERN)
+        for ns in mentions:
+            mention = ns.text
            handle = None
            splits = mention.split(";")
            if len(splits) == 1:
@@ -297,6 +262,7 @@ class RawContentMixin(BaseEntity):

class OptionalRawContentMixin(RawContentMixin):
    """A version of the RawContentMixin where `raw_content` is not required."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._required.remove("raw_content")
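The reworked `extract_mentions` above finds mention strings with `MENTION_PATTERN` (rendered markdown parsed through BeautifulSoup in the real code) and then splits off the optional `Name;` prefix. A self-contained sketch of that matching and cleanup logic, applied directly to plain text (`extract_handles` is a hypothetical helper, not part of the library):

```python
import re

# Same shape as the MENTION_PATTERN this change introduces: matches both
# @user@example.tld and the Diaspora-style @{Display Name; user@example.tld}.
MENTION_PATTERN = re.compile(
    r'(@\{?(?:[\w\-. \u263a-\U0001f645]*; *)?[\w]+@[\w\-.]+\.[\w]+}?)', re.UNICODE)

def extract_handles(text: str) -> set:
    """Return the bare user@domain handles mentioned in text."""
    handles = set()
    for mention in MENTION_PATTERN.findall(text):
        # Strip the @ / @{...} wrapping, then drop any "Display Name;" prefix,
        # mirroring the split(";") handling in extract_mentions.
        cleaned = mention.strip('@{}')
        if ';' in cleaned:
            cleaned = cleaned.split(';')[-1].strip()
        handles.add(cleaned)
    return handles
```

This keeps only the `user@domain` part, which is what the entity stores in `_mentions` after this change (finger-style handles rather than profile URLs).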


@@ -49,6 +49,11 @@ class Protocol:
    sender = None
    user = None

+    def __init__(self, request=None, get_contact_key=None):
+        # this is required for calls to verify on GET requests
+        self.request = request
+        self.get_contact_key = get_contact_key
+
    def build_send(self, entity: BaseEntity, from_user: UserType, to_user_key: RsaKey = None) -> Union[str, Dict]:
        """
        Build POST data for sending out to remotes.
@@ -112,7 +117,7 @@ class Protocol:
        self.sender = signer.id if signer else self.actor
        key = getattr(signer, 'public_key', None)
        if not key:
-            key = self.get_contact_key(self.actor) if self.get_contact_key else ''
+            key = self.get_contact_key(self.actor) if self.get_contact_key and self.actor else ''
        if key:
            # fallback to the author's key the client app may have provided
            logger.warning("Failed to retrieve keyId for %s, trying the actor's key", sig.get('keyId'))


@@ -1,3 +1,4 @@
+import commonmark
import pytest
from unittest.mock import patch
from pprint import pprint
@@ -63,8 +64,12 @@ class TestEntitiesConvertToAS2:
            'published': '2019-04-27T00:00:00',
        }

+    # Now handled by the client app
+    @pytest.mark.skip
    def test_comment_to_as2__url_in_raw_content(self, activitypubcomment):
        activitypubcomment.raw_content = 'raw_content http://example.com'
+        activitypubcomment.rendered_content = process_text_links(
+            commonmark.commonmark(activitypubcomment.raw_content).strip())
        activitypubcomment.pre_send()
        result = activitypubcomment.to_as2()
        assert result == {
@@ -118,6 +123,7 @@
        }

    def test_post_to_as2(self, activitypubpost):
+        activitypubpost.rendered_content = commonmark.commonmark(activitypubpost.raw_content).strip()
        activitypubpost.pre_send()
        result = activitypubpost.to_as2()
        assert result == {
@@ -191,6 +197,15 @@
        }

    def test_post_to_as2__with_tags(self, activitypubpost_tags):
+        activitypubpost_tags.rendered_content = '<h1>raw_content</h1>\n' \
+                                                '<p><a class="hashtag" ' \
+                                                'href="https://example.com/tag/foobar/" rel="noopener ' \
+                                                'noreferrer nofollow" ' \
+                                                'target="_blank">#<span>foobar</span></a>\n' \
+                                                '<a class="hashtag" ' \
+                                                'href="https://example.com/tag/barfoo/" rel="noopener ' \
+                                                'noreferrer nofollow" ' \
+                                                'target="_blank">#<span>barfoo</span></a></p>'
        activitypubpost_tags.pre_send()
        result = activitypubpost_tags.to_as2()
        assert result == {
@@ -204,11 +219,11 @@
            'url': 'http://127.0.0.1:8000/post/123456/',
            'attributedTo': 'http://127.0.0.1:8000/profile/123456/',
            'content': '<h1>raw_content</h1>\n'
-                       '<p><a class="mention hashtag" '
+                       '<p><a class="hashtag" '
                       'href="https://example.com/tag/foobar/" rel="noopener '
                       'noreferrer nofollow" '
                       'target="_blank">#<span>foobar</span></a>\n'
-                       '<a class="mention hashtag" '
+                       '<a class="hashtag" '
                       'href="https://example.com/tag/barfoo/" rel="noopener '
                       'noreferrer nofollow" '
                       'target="_blank">#<span>barfoo</span></a></p>',
@@ -235,6 +250,7 @@
        }

    def test_post_to_as2__with_images(self, activitypubpost_images):
+        activitypubpost_images.rendered_content = '<p>raw_content</p>'
        activitypubpost_images.pre_send()
        result = activitypubpost_images.to_as2()
        assert result == {
@@ -274,6 +290,7 @@
        }

    def test_post_to_as2__with_diaspora_guid(self, activitypubpost_diaspora_guid):
+        activitypubpost_diaspora_guid.rendered_content = '<p>raw_content</p>'
        activitypubpost_diaspora_guid.pre_send()
        result = activitypubpost_diaspora_guid.to_as2()
        assert result == {
@@ -418,17 +435,6 @@ class TestEntitiesPostReceive:
            "public": False,
        }]

-    @patch("federation.entities.activitypub.models.bleach.linkify", autospec=True)
-    def test_post_post_receive__linkifies_if_not_markdown(self, mock_linkify, activitypubpost):
-        activitypubpost._media_type = 'text/html'
-        activitypubpost.post_receive()
-        mock_linkify.assert_called_once()
-
-    @patch("federation.entities.activitypub.models.bleach.linkify", autospec=True)
-    def test_post_post_receive__skips_linkify_if_markdown(self, mock_linkify, activitypubpost):
-        activitypubpost.post_receive()
-        mock_linkify.assert_not_called()

class TestEntitiesPreSend:
    def test_post_inline_images_are_attached(self, activitypubpost_embedded_images):


@@ -4,6 +4,9 @@ from unittest.mock import patch, Mock, DEFAULT
import json

import pytest

+from federation.entities.activitypub.models import Person
#from federation.entities.activitypub.entities import (
#    models.Follow, models.Accept, models.Person, models.Note, models.Note,
#    models.Delete, models.Announce)
@@ -70,9 +73,7 @@ class TestActivitypubEntityMappersReceive:
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.raw_content == '<p><span class="h-card"><a class="u-url mention" ' \
-                                   'href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                   '@<span>jaywink</span></a></span> boom</p>'
+        assert post.raw_content == ''
        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">' \
                                        '@<span>jaywink</span></a></span> boom</p>'
        assert post.id == "https://diaspodon.fr/users/jaywink/statuses/102356911717767237"
@@ -87,40 +88,44 @@ class TestActivitypubEntityMappersReceive:
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.raw_content == '<p>boom #test</p>'
+        assert post.raw_content == ''
+        assert post.rendered_content == '<p>boom <a class="mention hashtag" data-hashtag="test" href="https://mastodon.social/tags/test" rel="tag">#<span>test</span></a></p>'

-    # TODO: fix this test
-    @pytest.mark.skip
-    def test_message_to_objects_simple_post__with_mentions(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev3.jasonrobinson.me", url="https://dev3.jasonrobinson.me/u/jaywink/"))
+    def test_message_to_objects_simple_post__with_mentions(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_POST_WITH_MENTIONS, "https://mastodon.social/users/jaywink")
        assert len(entities) == 1
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
        assert len(post._mentions) == 1
-        assert list(post._mentions)[0] == "https://dev3.jasonrobinson.me/u/jaywink/"
+        assert list(post._mentions)[0] == "jaywink@dev3.jasonrobinson.me"

-    def test_message_to_objects_simple_post__with_source__bbcode(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev.jasonrobinson.me", url="https://dev.jasonrobinson.me/u/jaywink/"))
+    def test_message_to_objects_simple_post__with_source__bbcode(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_POST_WITH_SOURCE_BBCODE, "https://diaspodon.fr/users/jaywink")
        assert len(entities) == 1
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                        '@<span>jaywink</span></a></span> boom</p>'
-        assert post.raw_content == '<p><span class="h-card"><a class="u-url mention" ' \
-                                   'href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                   '@<span>jaywink</span></a></span> boom</p>'
+        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" data-mention="jaywink@dev.jasonrobinson.me" href="https://dev.jasonrobinson.me/u/jaywink/">' \
+                                        '@<span>jaywink</span></a></span> boom</p>'
+        assert post.raw_content == ''

-    def test_message_to_objects_simple_post__with_source__markdown(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev.jasonrobinson.me", url="https://dev.robinson.me/u/jaywink/"))
+    def test_message_to_objects_simple_post__with_source__markdown(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_POST_WITH_SOURCE_MARKDOWN, "https://diaspodon.fr/users/jaywink")
        assert len(entities) == 1
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.rendered_content == '<p><span class="h-card"><a href="https://dev.jasonrobinson.me/u/jaywink/" ' \
-                                        'class="u-url mention">@<span>jaywink</span></a></span> boom</p>'
-        assert post.raw_content == "@jaywink boom"
+        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" ' \
+                                        'href="https://dev.jasonrobinson.me/u/jaywink/">@<span>jaywink</span></a></span> boom</p>'
+        assert post.raw_content == "@jaywink@dev.jasonrobinson.me boom"
        assert post.id == "https://diaspodon.fr/users/jaywink/statuses/102356911717767237"
        assert post.actor_id == "https://diaspodon.fr/users/jaywink"
        assert post.public is True
@@ -145,15 +150,18 @@ class TestActivitypubEntityMappersReceive:
        assert photo.guid == ""
        assert photo.handle == ""

-    def test_message_to_objects_comment(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev.jasonrobinson.me", url="https://dev.jasonrobinson.me/u/jaywink/"))
+    def test_message_to_objects_comment(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_COMMENT, "https://diaspodon.fr/users/jaywink")
        assert len(entities) == 1
        comment = entities[0]
        assert isinstance(comment, models.Note)
        assert isinstance(comment, Comment)
-        assert comment.raw_content == '<p><span class="h-card"><a class="u-url mention" ' \
-                                      'href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                      '@<span>jaywink</span></a></span> boom</p>'
+        assert comment.rendered_content == '<p><span class="h-card"><a class="u-url mention" data-mention="jaywink@dev.jasonrobinson.me" ' \
+                                           'href="https://dev.jasonrobinson.me/u/jaywink/">' \
+                                           '@<span>jaywink</span></a></span> boom</p>'
+        assert comment.raw_content == ''
        assert comment.id == "https://diaspodon.fr/users/jaywink/statuses/102356911717767237"
        assert comment.actor_id == "https://diaspodon.fr/users/jaywink"
        assert comment.target_id == "https://dev.jasonrobinson.me/content/653bad70-41b3-42c9-89cb-c4ee587e68e4/"


@@ -123,6 +123,7 @@ class TestShareEntity:

class TestRawContentMixin:
+    @pytest.mark.skip
    def test_rendered_content(self, post):
        assert post.rendered_content == """<p>One more test before sleep 😅 This time with an image.</p>
<p><img src="https://jasonrobinson.me/media/uploads/2020/12/27/1b2326c6-554c-4448-9da3-bdacddf2bb77.jpeg" alt=""></p>"""


@@ -30,6 +30,7 @@ def activitypubcomment():
    with freeze_time("2019-04-27"):
        obj = models.Comment(
            raw_content="raw_content",
+            rendered_content="<p>raw_content</p>",
            public=True,
            provider_display_name="Socialhome",
            id=f"http://127.0.0.1:8000/post/123456/",
@@ -255,7 +256,8 @@ def profile():
        inboxes={
            "private": "https://example.com/bob/private",
            "public": "https://example.com/public",
-        }, public_key=PUBKEY, to=["https://www.w3.org/ns/activitystreams#Public"]
+        }, public_key=PUBKEY, to=["https://www.w3.org/ns/activitystreams#Public"],
+        url="https://example.com/alice"
    )


@@ -35,7 +35,7 @@ ACTIVITYPUB_COMMENT = {
    'contentMap': {'en': '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">@<span>jaywink</span></a></span> boom</p>'},
    'attachment': [],
    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
+            'href': 'https://dev.jasonrobinson.me/u/jaywink/',
             'name': '@jaywink@dev.jasonrobinson.me'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',
@@ -459,9 +459,9 @@ ACTIVITYPUB_POST_WITH_TAGS = {
    'conversation': 'tag:diaspodon.fr,2019-06-28:objectId=2347687:objectType=Conversation',
    'content': '<p>boom <a href="https://mastodon.social/tags/test" class="mention hashtag" rel="tag">#<span>test</span></a></p>',
    'attachment': [],
-    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
-            'name': '@jaywink@dev.jasonrobinson.me'}],
+    'tag': [{'type': 'Hashtag',
+            'href': 'https://mastodon.social/tags/test',
+            'name': '#test'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',
                'first': {'type': 'CollectionPage',
@@ -552,13 +552,13 @@ ACTIVITYPUB_POST_WITH_SOURCE_MARKDOWN = {
    'conversation': 'tag:diaspodon.fr,2019-06-28:objectId=2347687:objectType=Conversation',
    'content': '<p><span class="h-card"><a href="https://dev.jasonrobinson.me/u/jaywink/" class="u-url mention">@<span>jaywink</span></a></span> boom</p>',
    'source': {
-        'content': "@jaywink boom",
+        'content': "@{jaywink@dev.jasonrobinson.me} boom",
        'mediaType': "text/markdown",
    },
    'contentMap': {'en': '<p><span class="h-card"><a href="https://dev.jasonrobinson.me/u/jaywink/" class="u-url mention">@<span>jaywink</span></a></span> boom</p>'},
    'attachment': [],
    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
+            'href': 'https://dev.jasonrobinson.me/u/jaywink/',
             'name': '@jaywink@dev.jasonrobinson.me'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',
@@ -612,7 +612,7 @@ ACTIVITYPUB_POST_WITH_SOURCE_BBCODE = {
    'contentMap': {'en': '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">@<span>jaywink</span></a></span> boom</p>'},
    'attachment': [],
    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
+            'href': 'https://dev.jasonrobinson.me/u/jaywink/',
             'name': '@jaywink@dev.jasonrobinson.me'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',


@@ -60,7 +60,7 @@ class TestRetrieveAndParseDocument:
        entity = retrieve_and_parse_document("https://example.com/foobar")
        assert isinstance(entity, Follow)

-    @patch("federation.entities.activitypub.models.extract_receivers", return_value=[])
+    @patch("federation.entities.activitypub.models.get_profile_or_entity", return_value=None)
    @patch("federation.utils.activitypub.fetch_document", autospec=True, return_value=(
            json.dumps(ACTIVITYPUB_POST_OBJECT), None, None),
    )
@@ -80,7 +80,7 @@
            "/foobar.jpg"

    @patch("federation.entities.activitypub.models.verify_ld_signature", return_value=None)
-    @patch("federation.entities.activitypub.models.extract_receivers", return_value=[])
+    @patch("federation.entities.activitypub.models.get_profile_or_entity", return_value=None)
    @patch("federation.utils.activitypub.fetch_document", autospec=True, return_value=(
            json.dumps(ACTIVITYPUB_POST), None, None),
    )


@@ -1,4 +1,6 @@
-from federation.utils.text import decode_if_bytes, encode_if_text, validate_handle, process_text_links, find_tags
+import pytest
+
+from federation.utils.text import decode_if_bytes, encode_if_text, validate_handle, find_tags


def test_decode_if_bytes():
@@ -18,107 +20,49 @@ class TestFindTags:
    def test_all_tags_are_parsed_from_text(self):
        source = "#starting and #MixED with some #line\nendings also tags can\n#start on new line"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"starting", "mixed", "line", "start"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#starting/starting and #MixED/mixed with some #line/line\nendings also tags can\n" \
-                       "#start/start on new line"

    def test_code_block_tags_ignored(self):
        source = "foo\n```\n#code\n```\n#notcode\n\n    #alsocode\n"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"notcode"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "foo\n```\n#code\n```\n#notcode/notcode\n\n    #alsocode\n"

    def test_endings_are_filtered_out(self):
        source = "#parenthesis) #exp! #list] *#doh* _#bah_ #gah% #foo/#bar"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"parenthesis", "exp", "list", "doh", "bah", "gah", "foo", "bar"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#parenthesis/parenthesis) #exp/exp! #list/list] *#doh/doh* _#bah/bah_ #gah/gah% " \
-                       "#foo/foo/#bar/bar"

    def test_finds_tags(self):
        source = "#post **Foobar** #tag #OtherTag #third\n#fourth"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"third", "fourth", "post", "othertag", "tag"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#post/post **Foobar** #tag/tag #OtherTag/othertag #third/third\n#fourth/fourth"

    def test_ok_with_html_tags_in_text(self):
        source = "<p>#starting and <span>#MixED</span> however not <#>this</#> or <#/>that"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"starting", "mixed"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "<p>#starting/starting and <span>#MixED/mixed</span> however not <#>this</#> or <#/>that"

    def test_postfixed_tags(self):
        source = "#foo) #bar] #hoo, #hee."
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"foo", "bar", "hoo", "hee"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#foo/foo) #bar/bar] #hoo/hoo, #hee/hee."

    def test_prefixed_tags(self):
        source = "(#foo [#bar"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"foo", "bar"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "(#foo/foo [#bar/bar"

    def test_invalid_text_returns_no_tags(self):
        source = "#a!a #a#a #a$a #a%a #a^a #a&a #a*a #a+a #a.a #a,a #a@a #a£a #a(a #a)a #a=a " \
                 "#a?a #a`a #a'a #a\\a #a{a #a[a #a]a #a}a #a~a #a;a #a:a #a\"a #a’a #a”a #\xa0cd"
-        tags, text = find_tags(source)
-        assert tags == set()
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == source
+        tags = find_tags(source)
+        assert tags == {'a'}

    def test_start_of_paragraph_in_html_content(self):
        source = '<p>First line</p><p>#foobar #barfoo</p>'
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"foobar", "barfoo"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == '<p>First line</p><p>#foobar/foobar #barfoo/barfoo</p>'


-class TestProcessTextLinks:
-    def test_link_at_start_or_end(self):
-        assert process_text_links('https://example.org example.org\nhttp://example.org') == \
-            '<a href="https://example.org" rel="nofollow" target="_blank">https://example.org</a> ' \
-            '<a href="http://example.org" rel="nofollow" target="_blank">example.org</a>\n' \
-            '<a href="http://example.org" rel="nofollow" target="_blank">http://example.org</a>'
-
-    def test_existing_links_get_attrs_added(self):
-        assert process_text_links('<a href="https://example.org">https://example.org</a>') == \
-            '<a href="https://example.org" rel="nofollow" target="_blank">https://example.org</a>'
-
-    def test_code_sections_are_skipped(self):
-        assert process_text_links('<code>https://example.org</code><code>\nhttps://example.org\n</code>') == \
-            '<code>https://example.org</code><code>\nhttps://example.org\n</code>'
-
-    def test_emails_are_skipped(self):
-        assert process_text_links('foo@example.org') == 'foo@example.org'
-
-    def test_does_not_add_target_blank_if_link_is_internal(self):
-        assert process_text_links('<a href="/streams/tag/foobar">#foobar</a>') == \
-            '<a href="/streams/tag/foobar">#foobar</a>'
-
-    def test_does_not_remove_mention_classes(self):
-        assert process_text_links('<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">'
-                                  '@<span>jaywink</span></a></span> boom</p>') == \
-            '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/" ' \
-            'rel="nofollow" target="_blank">@<span>jaywink</span></a></span> boom</p>'


def test_validate_handle():


@@ -1,12 +1,18 @@
import re
-from typing import Set, Tuple
+from typing import Set, List
from urllib.parse import urlparse

-import bleach
-from bleach import callbacks
+from bs4 import BeautifulSoup
+from bs4.element import NavigableString
+from commonmark import commonmark

ILLEGAL_TAG_CHARS = "!#$%^&*+.,@£/()=?`'\\{[]}~;:\"’”—\xa0"
+TAG_PATTERN = re.compile(r'(#[\w\-]+)([)\]_!?*%/.,;\s]+\s*|\Z)', re.UNICODE)
+# This will match non matching braces. I don't think it's an issue.
+MENTION_PATTERN = re.compile(r'(@\{?(?:[\w\-. \u263a-\U0001f645]*; *)?[\w]+@[\w\-.]+\.[\w]+}?)', re.UNICODE)
+# based on https://stackoverflow.com/a/6041965
+URL_PATTERN = re.compile(r'((?:(?:https?|ftp)://|^|(?<=[("<\s]))+(?:[\w\-]+(?:(?:\.[\w\-]+)+))(?:[\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))',
+                         re.UNICODE)
def decode_if_bytes(text):
    try:
@ -22,67 +28,38 @@ def encode_if_text(text):
return text return text
def find_tags(text: str, replacer: callable = None) -> Tuple[Set, str]: def find_tags(text: str) -> Set[str]:
"""Find tags in text. """Find tags in text.
Tries to ignore tags inside code blocks. Ignore tags inside code blocks.
Optionally, if passed a "replacer", will also replace the tag word with the result Returns a set of tags.
of the replacer function called with the tag word.
Returns a set of tags and the original or replaced text.
""" """
found_tags = set() tags = find_elements(BeautifulSoup(commonmark(text, ignore_html_blocks=True), 'html.parser'),
# <br> and <p> tags cause issues in us finding words - add some spacing around them TAG_PATTERN)
new_text = text.replace("<br>", " <br> ").replace("<p>", " <p> ").replace("</p>", " </p> ") return set([tag.text.lstrip('#').lower() for tag in tags])
lines = new_text.splitlines(keepends=True)
final_lines = []
code_block = False def find_elements(soup: BeautifulSoup, pattern: re.Pattern) -> List[NavigableString]:
final_text = None """
# Check each line separately Split a BeautifulSoup tree strings according to a pattern, replacing each element
for line in lines: with a NavigableString. The returned list can be used to linkify the found
final_words = [] elements.
if line[0:3] == "```":
code_block = not code_block :param soup: BeautifulSoup instance of the content being searched
if line.find("#") == -1 or line[0:4] == " " or code_block: :param pattern: Compiled regular expression defined using a single group
# Just add the whole line :return: A NavigableString list attached to the original soup
final_lines.append(line) """
continue final = []
# Check each word separately for candidate in soup.find_all(string=True):
words = line.split(" ") if candidate.parent.name == 'code': continue
for word in words: ns = [NavigableString(r) for r in pattern.split(candidate.text) if r]
if word.find('#') > -1: found = [s for s in ns if pattern.match(s.text)]
candidate = word.strip().strip("([]),.!?:*_%/") if found:
if candidate.find('<') > -1 or candidate.find('>') > -1: candidate.replace_with(*ns)
# Strip html final.extend(found)
candidate = bleach.clean(word, strip=True) return final
# Now split with slashes
candidates = candidate.split("/")
to_replace = []
for candidate in candidates:
if candidate.startswith("#"):
candidate = candidate.strip("#")
if test_tag(candidate.lower()):
found_tags.add(candidate.lower())
to_replace.append(candidate)
if replacer:
tag_word = word
try:
for counter, replacee in enumerate(to_replace, 1):
tag_word = tag_word.replace("#%s" % replacee, replacer(replacee))
except Exception:
pass
final_words.append(tag_word)
else:
final_words.append(word)
else:
final_words.append(word)
final_lines.append(" ".join(final_words))
if replacer:
final_text = "".join(final_lines)
if final_text:
final_text = final_text.replace(" <br> ", "<br>").replace(" <p> ", "<p>").replace(" </p> ", "</p>")
return found_tags, final_text or text
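The new `find_elements` leans on a subtlety of `re.Pattern.split`: when the pattern contains capture groups, the captured text is kept in the result list, so the matched pieces can be picked back out and re-inserted as `NavigableString`s. A stdlib-only sketch of that split-and-match step on a single text node, with the BeautifulSoup tree handling omitted:

```python
import re

# TAG_PATTERN from the diff above; a plain string stands in for one
# BeautifulSoup text node (NavigableString).
TAG_PATTERN = re.compile(r'(#[\w\-]+)([)\]_!?*%/.,;\s]+\s*|\Z)', re.UNICODE)

node_text = "hello #world and #another"

# split() keeps capture-group contents in the output list, so nothing is lost:
parts = [p for p in TAG_PATTERN.split(node_text) if p]
print(parts)  # ['hello ', '#world', ' ', 'and ', '#another']

# match() then identifies which pieces are the tags to be linkified:
found = [p for p in parts if TAG_PATTERN.match(p)]
print(found)  # ['#world', '#another']
```

In the real function, each piece becomes a `NavigableString`, `candidate.replace_with(*ns)` splices them back into the tree, and the matched pieces are returned for linkification; skipping nodes whose parent is `code` is what preserves the "ignore tags inside code blocks" behaviour.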
 def get_path_from_url(url: str) -> str:
@@ -93,28 +70,6 @@ def get_path_from_url(url: str) -> str:
     return parsed.path
 
 
-def process_text_links(text):
-    """Process links in text, adding some attributes and linkifying textual links."""
-    link_callbacks = [callbacks.nofollow, callbacks.target_blank]
-
-    def link_attributes(attrs, new=False):
-        """Run standard callbacks except for internal links."""
-        href_key = (None, "href")
-        if attrs.get(href_key).startswith("/"):
-            return attrs
-
-        # Run the standard callbacks
-        for callback in link_callbacks:
-            attrs = callback(attrs, new)
-        return attrs
-
-    return bleach.linkify(
-        text,
-        callbacks=[link_attributes],
-        parse_email=False,
-        skip_tags=["code"],
-    )
-
-
 def test_tag(tag: str) -> bool:
     """Test a word whether it could be accepted as a tag."""
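With the bleach-based `process_text_links` removed from this module, the `URL_PATTERN` added at the top of the file presumably takes over plain-text link detection elsewhere. A quick stdlib check of what that pattern captures: it matches scheme-prefixed URLs, and also bare domains when preceded by whitespace or an opening bracket, while refusing to end a match on trailing punctuation:

```python
import re

# URL_PATTERN exactly as added in this merge request
# (based on https://stackoverflow.com/a/6041965).
URL_PATTERN = re.compile(r'((?:(?:https?|ftp)://|^|(?<=[("<\s]))+(?:[\w\-]+(?:(?:\.[\w\-]+)+))(?:[\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))',
                         re.UNICODE)

print(URL_PATTERN.findall("Check https://example.org/path and www.example.com!"))
# ['https://example.org/path', 'www.example.com']
```

The trailing `[\w@?^=%&\/~+#-]` class is what drops the final `!` (and similar sentence punctuation) from the second match.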