Merge branch 'ap-processing-improvements' into 'master'

Content processing improvements.

See merge request jaywink/federation!177
fix-url-regex
Alain St-Denis 2023-09-04 21:38:47 +00:00
commit add80e0f6c
No known key found for this signature in the database
16 changed files with 357 additions and 400 deletions

View file

@@ -22,7 +22,7 @@
 * For inbound payload, a cached dict of all the defined AP extensions is merged with each incoming LD context.
-* Better handle conflicting property defaults by having `get_base_attributes` return only attributes that
-  are not empty (or bool). This helps distinguishing between `marshmallow.missing` and empty values.
+* Better handle conflicting property defaults by having `get_base_attributes` return only attributes that
+  are not empty (or bool). This helps distinguish between `marshmallow.missing` and empty values.
 * JsonLD document caching now set in `activitypub/__init__.py`.
@@ -45,6 +45,10 @@
 * In fetch_document: if response.encoding is not set, default to utf-8.
+* Fix process_text_links that would crash on `a` tags with no `href` attribute.
+* Ignore relayed AP retractions.
 ## [0.24.1] - 2023-03-18
 ### Fixed
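The `get_base_attributes` change above can be sketched as follows. This is a hypothetical, simplified stand-in (the real function lives in the federation codebase and works against marshmallow fields); `MISSING` here is a local sentinel standing in for `marshmallow.missing`:

```python
MISSING = object()  # stand-in sentinel for marshmallow.missing

def get_base_attributes(obj, keep=()):
    """Return only attributes that are non-empty, but always keep booleans
    so that False is not mistaken for an unset (missing) value."""
    attrs = {}
    for name, value in vars(obj).items():
        if name in keep or isinstance(value, bool):
            attrs[name] = value
        elif value not in (None, "", [], {}, MISSING):
            attrs[name] = value
    return attrs
```

The point of the bool carve-out: `False` is falsy, so a plain emptiness check would drop it and it could no longer be distinguished from an attribute that was never set.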

View file

@@ -4,9 +4,8 @@ Protocols
 Currently three protocols are being focused on.
 * Diaspora is considered to be stable with most of the protocol implemented.
-* ActivityPub support should be considered as alpha - all the basic
-  things work but there are likely to be a lot of compatibility issues with other ActivityPub
-  implementations.
+* ActivityPub support should be considered as beta - all the basic
+  things work and we are fixing incompatibilities as they are identified.
 * Matrix support cannot be considered usable as of yet.
 For example implementations in real life projects check :ref:`example-projects`.
@@ -69,20 +68,21 @@ Content media type
 The following keys will be set on the entity based on the ``source`` property existing:
 * if the object has an ``object.source`` property:
-  * ``_media_type`` will be the source media type
-  * ``_rendered_content`` will be the object ``content``
+  * ``_media_type`` will be the source media type (only text/markdown is supported).
+  * ``rendered_content`` will be the object ``content``
   * ``raw_content`` will be the source ``content``
 * if the object has no ``object.source`` property:
   * ``_media_type`` will be ``text/html``
-  * ``_rendered_content`` will be the object ``content``
-  * ``raw_content`` will object ``content`` run through a HTML2Markdown renderer
+  * ``rendered_content`` will be the object ``content``
+  * ``raw_content`` will be empty
 The ``contentMap`` property is processed but content language selection is not implemented yet.
 For outbound entities, ``raw_content`` is expected to be in ``text/markdown``,
-specifically CommonMark. When sending payloads, ``raw_content`` will be rendered via
-the ``commonmark`` library into ``object.content``. The original ``raw_content``
-will be added to the ``object.source`` property.
+specifically CommonMark. The client applications are expected to provide the
+rendered content for protocols that require it (e.g. ActivityPub).
+When sending payloads, ``object.contentMap`` will be set to ``rendered_content``
+and ``raw_content`` will be added to the ``object.source`` property.
 Medias
 ......
@@ -98,6 +98,19 @@ support from client applications.
 For inbound entities we do this automatically by not including received image attachments in
 the entity ``_children`` attribute. Audio and video are passed through the client application.
+Hashtags and mentions
+.....................
+For outbound payloads, client applications must add/set the hashtag/mention value to
+the ``class`` attribute of rendered content linkified hashtags/mentions. These will be
+used to help build the corresponding ``Hashtag`` and ``Mention`` objects.
+For inbound payloads, if a markdown source is provided, hashtags/mentions will be extracted
+through the same method used for Diaspora. If only HTML content is provided, the ``a`` tags
+will be marked with a ``data-[hashtag|mention]`` attribute (based on the provided Hashtag/Mention
+objects) to facilitate the ``href`` attribute modifications client applications might
+wish to make. This should ensure links can be replaced regardless of how the HTML is structured.
 .. _matrix:
 Matrix
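The outbound content rules in the docs above can be illustrated with a minimal payload sketch. This is not the library's serializer, just an illustrative dict showing where the client-supplied HTML and CommonMark end up; the single `"orig"` key in `contentMap` is an assumption for illustration (real keys are language tags):

```python
def build_note_payload(rendered_content: str, raw_content: str) -> dict:
    """Sketch: client supplies rendered_content (HTML) and raw_content
    (CommonMark); the AP payload carries HTML in content/contentMap and
    the markdown original in object.source."""
    return {
        "type": "Note",
        "content": rendered_content,
        "contentMap": {"orig": rendered_content},
        "source": {"content": raw_content, "mediaType": "text/markdown"},
    }
```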

View file

@@ -2,7 +2,7 @@ from cryptography.exceptions import InvalidSignature
 from django.http import JsonResponse, HttpResponse, HttpResponseNotFound
 from federation.entities.activitypub.mappers import get_outbound_entity
-from federation.protocols.activitypub.signing import verify_request_signature
+from federation.protocols.activitypub.protocol import Protocol
 from federation.types import RequestType
 from federation.utils.django import get_function_from_config
@@ -23,9 +23,11 @@ def get_and_verify_signer(request):
                       body=request.body,
                       method=request.method,
                       headers=request.headers)
+    protocol = Protocol(request=req, get_contact_key=get_public_key)
     try:
-        return verify_request_signature(req)
-    except ValueError:
+        protocol.verify()
+        return protocol.sender
+    except (ValueError, KeyError, InvalidSignature) as exc:
         return None
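The shape of the new verification flow can be shown in isolation. `DummyProtocol` and the local `InvalidSignature` class below are hypothetical stand-ins (the real ones come from `federation.protocols.activitypub.protocol` and `cryptography.exceptions`); only the try/except pattern mirrors the view code:

```python
class InvalidSignature(Exception):
    """Stand-in for cryptography.exceptions.InvalidSignature."""

class DummyProtocol:
    """Hypothetical stand-in mimicking the Protocol API used in the view."""
    def __init__(self, sender=None, error=None):
        self.sender = sender
        self._error = error

    def verify(self):
        if self._error:
            raise self._error

def get_sender_or_none(protocol):
    # Mirror of the view logic: verify the request signature and return the
    # sender id, or None on any of the expected verification failures.
    try:
        protocol.verify()
        return protocol.sender
    except (ValueError, KeyError, InvalidSignature):
        return None
```

Catching `KeyError` and `InvalidSignature` in addition to `ValueError` is the substantive change: a malformed header or a bad signature now degrades to an unauthenticated request instead of a 500.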

View file

@@ -113,10 +113,11 @@ class LdContextManager:
         if 'python-federation"' in s:
             ctx = json.loads(s.replace('python-federation', 'python-federation#', 1))
-        # some platforms have http://joinmastodon.com/ns in @context. This
-        # is not a json-ld document.
+        # Some platforms reference invalid json-ld documents in @context.
+        # Remove those.
+        for url in ['http://joinmastodon.org/ns', 'http://schema.org']:
             try:
-                ctx.pop(ctx.index('http://joinmastodon.org/ns'))
+                ctx.pop(ctx.index(url))
             except ValueError:
                 pass
@@ -137,12 +138,17 @@ class LdContextManager:
         # Merge all defined AP extensions to the inbound context
         uris = []
         defs = {}
-        # Merge original context dicts in one dict
+        # Merge original context dicts in one dict, taking into account nested @context
+        def parse_context(ctx):
             for item in ctx:
                 if isinstance(item, str):
                     uris.append(item)
                 else:
+                    if '@context' in item:
+                        parse_context([item['@context']])
+                        item.pop('@context')
                     defs.update(item)
+        parse_context(ctx)
         for item in self._merged:
             if isinstance(item, str) and item not in uris:
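The nested-`@context` handling added above can be exercised standalone. This sketch is a simplified, self-contained version of the recursion (the real code updates `uris`/`defs` that live in the surrounding method; here they are returned, and the input dicts are copied rather than mutated):

```python
def merge_context_items(ctx: list):
    """Flatten a JSON-LD @context list into URI strings and term
    definitions, descending into nested @context values as the new
    parse_context helper does."""
    uris, defs = [], {}

    def parse_context(items):
        for item in items:
            if isinstance(item, str):
                uris.append(item)
            else:
                if '@context' in item:
                    # recurse into the nested context first
                    parse_context([item['@context']])
                    item = {k: v for k, v in item.items() if k != '@context'}
                defs.update(item)

    parse_context(ctx)
    return uris, defs
```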

View file

@@ -75,8 +75,8 @@ def verify_ld_signature(payload):
     obj_digest = hash(obj)
     digest = (sig_digest + obj_digest).encode('utf-8')
-    sig_value = b64decode(signature.get('signatureValue'))
     try:
+        sig_value = b64decode(signature.get('signatureValue'))
         verifier.verify(SHA256.new(digest), sig_value)
         logger.debug('ld_signature - %s has a valid signature', payload.get("id"))
         return profile.id
@@ -99,6 +99,6 @@ class NormalizedDoubles(jsonld.JsonLdProcessor):
             item['@value'] = math.floor(value)
         obj = super()._object_to_rdf(item, issuer, triples, rdfDirection)
         # This is to address https://github.com/digitalbazaar/pyld/issues/175
-        if obj.get('datatype') == jsonld.XSD_DOUBLE:
+        if obj and obj.get('datatype') == jsonld.XSD_DOUBLE:
            obj['value'] = re.sub(r'(\d)0*E\+?(-)?0*(\d)', r'\1E\2\3', obj['value'])
         return obj
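The exponent-normalizing regex in `NormalizedDoubles` is easy to misread, so here it is in isolation. The function name is for illustration only; the pattern and replacement are exactly those in the code above (note that Python's `re.sub` substitutes an empty string for the optional, unmatched sign group):

```python
import re

XSD_DOUBLE_RE = re.compile(r'(\d)0*E\+?(-)?0*(\d)')

def normalize_double(value: str) -> str:
    """Strip the '+' sign, leading zeros of the exponent, and trailing
    zeros of the mantissa right before 'E', matching the canonical
    xsd:double form the RDF normalization expects."""
    return XSD_DOUBLE_RE.sub(r'\1E\2\3', value)
```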

View file

@@ -1,12 +1,16 @@
 import copy
 import json
 import logging
+import re
+import traceback
 import uuid
-from datetime import timedelta
+from operator import attrgetter
 from typing import List, Dict, Union
-from urllib.parse import urlparse
+from unicodedata import normalize
+from urllib.parse import unquote, urlparse
 import bleach
+from bs4 import BeautifulSoup
 from calamus import fields
 from calamus.schema import JsonLDAnnotation, JsonLDSchema, JsonLDSchemaOpts
 from calamus.utils import normalize_value
@@ -31,10 +35,10 @@ from federation.utils.text import with_slash, validate_handle
 logger = logging.getLogger("federation")
-def get_profile_or_entity(fid):
-    obj = get_profile(fid=fid)
-    if not obj:
-        obj = retrieve_and_parse_document(fid)
+def get_profile_or_entity(**kwargs):
+    obj = get_profile(**kwargs)
+    if not obj and kwargs.get('fid'):
+        obj = retrieve_and_parse_document(kwargs['fid'])
     return obj
@@ -57,6 +61,7 @@ as2 = fields.Namespace("https://www.w3.org/ns/activitystreams#")
 dc = fields.Namespace("http://purl.org/dc/terms/")
 diaspora = fields.Namespace("https://diasporafoundation.org/ns/")
 ldp = fields.Namespace("http://www.w3.org/ns/ldp#")
+lemmy = fields.Namespace("https://join-lemmy.org/ns#")
 litepub = fields.Namespace("http://litepub.social/ns#")
 misskey = fields.Namespace("https://misskey-hub.net/ns#")
 ostatus = fields.Namespace("http://ostatus.org#")
@@ -241,8 +246,8 @@ class Object(BaseEntity, metaclass=JsonLDAnnotation):
                           metadata={'ctx':[{ 'alsoKnownAs':{'@id':'as:alsoKnownAs','@type':'@id'}}]})
     icon = MixedField(as2.icon, nested='ImageSchema')
     image = MixedField(as2.image, nested='ImageSchema')
-    tag_objects = MixedField(as2.tag, nested=['HashtagSchema','MentionSchema','PropertyValueSchema','EmojiSchema'], many=True)
-    attachment = fields.Nested(as2.attachment, nested=['ImageSchema', 'AudioSchema', 'DocumentSchema','PropertyValueSchema','IdentityProofSchema'],
+    tag_objects = MixedField(as2.tag, nested=['NoteSchema', 'HashtagSchema','MentionSchema','PropertyValueSchema','EmojiSchema'], many=True)
+    attachment = fields.Nested(as2.attachment, nested=['LinkSchema', 'NoteSchema', 'ImageSchema', 'AudioSchema', 'DocumentSchema','PropertyValueSchema','IdentityProofSchema'],
                                many=True, default=[])
     content_map = LanguageMap(as2.content)  # language maps are not implemented in calamus
     context = fields.RawJsonLD(as2.context)
@@ -250,7 +255,7 @@ class Object(BaseEntity, metaclass=JsonLDAnnotation):
     generator = MixedField(as2.generator, nested=['ApplicationSchema','ServiceSchema'])
     created_at = fields.DateTime(as2.published, add_value_types=True)
     replies = MixedField(as2.replies, nested=['CollectionSchema','OrderedCollectionSchema'])
-    signature = MixedField(sec.signature, nested = 'SignatureSchema',
+    signature = MixedField(sec.signature, nested = 'RsaSignature2017Schema',
                            metadata={'ctx': [CONTEXT_SECURITY,
                                              {'RsaSignature2017':'sec:RsaSignature2017'}]})
     start_time = fields.DateTime(as2.startTime, add_value_types=True)
@@ -333,6 +338,20 @@ class Object(BaseEntity, metaclass=JsonLDAnnotation):
             data['@context'] = context_manager.merge_context(ctx)
         return data
+    # The JSON-LD spec states type names are case sensitive.
+    # Ensure type names for which we have an implementation have the proper case
+    # for platforms that ignore the spec.
+    @pre_load
+    def patch_types(self, data, **kwargs):
+        def walk_payload(payload):
+            for key, val in copy.copy(payload).items():
+                if isinstance(val, dict):
+                    walk_payload(val)
+                if key == 'type':
+                    payload[key] = MODEL_NAMES.get(val.lower(), val)
+            return payload
+        return walk_payload(data)
     # A node without an id isn't true json-ld, but many payloads have
     # id-less nodes. Since calamus forces random ids on such nodes,
     # this removes it.
@@ -567,7 +586,8 @@ class Person(Object, base.Profile):
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
-        self._allowed_children += (PropertyValue, IdentityProof)
+        self._required += ['url']
+        self._allowed_children += (Note, PropertyValue, IdentityProof)
     # Set finger to username@host if not provided by the platform
     def post_receive(self):
@@ -576,12 +596,15 @@ class Person(Object, base.Profile):
             self.finger = profile.finger
         else:
             domain = urlparse(self.id).netloc
-            finger = f'{self.username.lower()}@{domain}'
+            finger = f'{self.username}@{domain}'
             if get_profile_id_from_webfinger(finger) == self.id:
                 self.finger = finger
         # multi-protocol platform
         if self.finger and self.guid is not missing and self.handle is missing:
             self.handle = self.finger
+        # Some platforms don't set this property.
+        if self.url is missing:
+            self.url = self.id
     def to_as2(self):
         self.followers = f'{with_slash(self.id)}followers/'
@@ -716,15 +739,19 @@ class Note(Object, RawContentMixin):
     _cached_raw_content = ''
     _cached_children = []
+    _soup = None
     signable = True
     def __init__(self, *args, **kwargs):
         self.tag_objects = []  # mutable objects...
         super().__init__(*args, **kwargs)
-        self._allowed_children += (base.Audio, base.Video)
+        self.raw_content  # must be "primed" with source property for inbound payloads
+        self.rendered_content  # must be "primed" with content_map property for inbound payloads
+        self._allowed_children += (base.Audio, base.Video, Link)
+        self._required.remove('raw_content')
+        self._required += ['rendered_content']
     def to_as2(self):
-        self.sensitive = 'nsfw' in self.tags
         self.url = self.id
         edited = False
@@ -752,8 +779,8 @@ class Note(Object, RawContentMixin):
     def to_base(self):
         kwargs = get_base_attributes(self, keep=(
-            '_mentions', '_media_type', '_rendered_content', '_source_object',
-            '_cached_children', '_cached_raw_content'))
+            '_mentions', '_media_type', '_source_object',
+            '_cached_children', '_cached_raw_content', '_soup'))
         entity = Comment(**kwargs) if getattr(self, 'target_id') else Post(**kwargs)
         # Plume (and maybe other platforms) send the attrbutedTo field as an array
         if isinstance(entity.actor_id, list): entity.actor_id = entity.actor_id[0]
@@ -764,6 +791,7 @@ class Note(Object, RawContentMixin):
     def pre_send(self) -> None:
         """
         Attach any embedded images from raw_content.
+        Add Hashtag and Mention objects (the client app must define the class tag/mention property)
         """
         super().pre_send()
         self._children = [
@@ -774,133 +802,136 @@
             ) for image in self.embedded_images
         ]
-        # Add other AP objects
-        self.extract_mentions()
-        self.content_map = {'orig': self.rendered_content}
-        self.add_mention_objects()
-        self.add_tag_objects()
-
-    def post_receive(self) -> None:
-        """
-        Make linkified tags normal tags.
-        """
-        super().post_receive()
-        if not self.raw_content or self._media_type == "text/markdown":
-            # Skip when markdown
-            return
-        hrefs = []
-        for tag in self.tag_objects:
-            if isinstance(tag, Hashtag):
-                if tag.href is not missing:
-                    hrefs.append(tag.href.lower())
-                elif tag.id is not missing:
-                    hrefs.append(tag.id.lower())
-        # noinspection PyUnusedLocal
-        def remove_tag_links(attrs, new=False):
-            # Hashtag object hrefs
-            href = (None, "href")
-            url = attrs.get(href, "").lower()
-            if url in hrefs:
-                return
-            # one more time without the query (for pixelfed)
-            parsed = urlparse(url)
-            url = f'{parsed.scheme}://{parsed.netloc}{parsed.path}'
-            if url in hrefs:
-                return
-            # Mastodon
-            rel = (None, "rel")
-            if attrs.get(rel) == "tag":
-                return
-            # Friendica
-            if attrs.get(href, "").endswith(f'tag={attrs.get("_text")}'):
-                return
-            return attrs
-        self.raw_content = bleach.linkify(
-            self.raw_content,
-            callbacks=[remove_tag_links],
-            parse_email=False,
-            skip_tags=["code", "pre"],
-        )
-        if getattr(self, 'target_id'): self.entity_type = 'Comment'
-
-    def add_tag_objects(self) -> None:
-        """
-        Populate tags to the object.tag list.
-        """
-        try:
-            from federation.utils.django import get_configuration
-            config = get_configuration()
-        except ImportError:
-            tags_path = None
-        else:
-            if config["tags_path"]:
-                tags_path = f"{config['base_url']}{config['tags_path']}"
-            else:
-                tags_path = None
-        for tag in self.tags:
-            _tag = Hashtag(name=f'#{tag}')
-            if tags_path:
-                _tag.href = tags_path.replace(":tag:", tag)
-            self.tag_objects.append(_tag)
-
-    def add_mention_objects(self) -> None:
-        """
-        Populate mentions to the object.tag list.
-        """
-        if len(self._mentions):
-            mentions = list(self._mentions)
-            mentions.sort()
-            for mention in mentions:
-                if validate_handle(mention):
-                    profile = get_profile(finger=mention)
-                    # only add AP profiles mentions
-                    if getattr(profile, 'id', None):
-                        self.tag_objects.append(Mention(href=profile.id, name='@'+mention))
-                        # some platforms only render diaspora style markdown if it is available
-                        self.source['content'] = self.source['content'].replace(mention, '{' + mention + '}')
+        # Add Hashtag objects
+        for el in self._soup('a', attrs={'class':'hashtag'}):
+            self.tag_objects.append(Hashtag(
+                href = el.attrs['href'],
+                name = el.text
+            ))
+            if el.text == '#nsfw': self.sensitive = True
+        self.tag_objects = sorted(self.tag_objects, key=attrgetter('name'))
+        # Add Mention objects
+        mentions = []
+        for el in self._soup('a', attrs={'class':'mention'}):
+            mentions.append(el.text.lstrip('@'))
+        mentions.sort()
+        for mention in mentions:
+            if validate_handle(mention):
+                profile = get_profile(finger__iexact=mention)
+                # only add AP profiles mentions
+                if getattr(profile, 'id', None):
+                    self.tag_objects.append(Mention(href=profile.id, name='@'+mention))
+                    # some platforms only render diaspora style markdown if it is available
+                    self.source['content'] = self.source['content'].replace(mention, '{' + mention + '}')
+
+    def post_receive(self) -> None:
+        """
+        Mark linkified tags and mentions with a data-{mention, tag} attribute.
+        """
+        super().post_receive()
+        if self._media_type == "text/markdown":
+            # Skip when markdown
+            return
+        self._find_and_mark_hashtags()
+        self._find_and_mark_mentions()
+        if getattr(self, 'target_id'): self.entity_type = 'Comment'
+
+    def _find_and_mark_hashtags(self):
+        hrefs = set()
+        for tag in self.tag_objects:
+            if isinstance(tag, Hashtag):
+                if tag.href is not missing:
+                    hrefs.add(tag.href.lower())
+                # Some platforms use id instead of href...
+                elif tag.id is not missing:
+                    hrefs.add(tag.id.lower())
+        for link in self._soup.find_all('a', href=True):
+            parsed = urlparse(unquote(link['href']).lower())
+            # remove the query part and trailing garbage, if any
+            path = parsed.path
+            trunc = re.match(r'(/[\w/\-]+)', parsed.path)
+            if trunc:
+                path = trunc.group()
+            url = f'{parsed.scheme}://{parsed.netloc}{path}'
+            # convert accented characters to their ascii equivalent
+            normalized_path = normalize('NFD', path).encode('ascii', 'ignore')
+            normalized_url = f'{parsed.scheme}://{parsed.netloc}{normalized_path.decode()}'
+            links = {link['href'].lower(), unquote(link['href']).lower(), url, normalized_url}
+            if links.intersection(hrefs):
+                tag = re.match(r'^#?([\w\-]+$)', link.text)
+                if tag:
+                    link['data-hashtag'] = tag.group(1).lower()
+
+    def _find_and_mark_mentions(self):
+        mentions = [mention for mention in self.tag_objects if isinstance(mention, Mention)]
+        # There seems to be consensus on using the profile url for
+        # the link and the profile id for the Mention object href property,
+        # but some platforms will set mention.href to the profile url, so
+        # we check both.
+        for mention in mentions:
+            hrefs = []
+            profile = get_profile_or_entity(fid=mention.href, remote_url=mention.href)
+            if profile and not profile.url:
+                # This should be removed when we are confident that the remote_url property
+                # has been populated for most profiles on the client app side.
+                profile = retrieve_and_parse_profile(profile.id)
+            if profile:
+                hrefs.extend([profile.id, profile.url])
+            for href in hrefs:
+                links = self._soup.find_all(href=href)
+                for link in links:
+                    link['data-mention'] = profile.finger
+                    self._mentions.add(profile.finger)
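The URL-matching inside `_find_and_mark_hashtags` builds several normalized variants of each link's `href` before comparing against the `Hashtag` hrefs. The helper below is a hypothetical extraction of just that step (the function name is invented; the normalization logic mirrors the loop body above):

```python
import re
from unicodedata import normalize
from urllib.parse import unquote, urlparse

def candidate_hashtag_urls(href: str) -> set:
    """Return the set of normalized variants of a tag link href:
    lowercased, percent-decoded, query-stripped, and accent-folded."""
    parsed = urlparse(unquote(href).lower())
    # remove the query part and trailing garbage, if any
    path = parsed.path
    trunc = re.match(r'(/[\w/\-]+)', parsed.path)
    if trunc:
        path = trunc.group()
    url = f'{parsed.scheme}://{parsed.netloc}{path}'
    # convert accented characters to their ascii equivalent
    ascii_path = normalize('NFD', path).encode('ascii', 'ignore').decode()
    normalized_url = f'{parsed.scheme}://{parsed.netloc}{ascii_path}'
    return {href.lower(), unquote(href).lower(), url, normalized_url}
```

A link to `/tags/Caf%C3%A9?src=feed` thus matches a Hashtag href given as either `/tags/café` or `/tags/cafe`, regardless of casing, percent-encoding, or query strings.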
     def extract_mentions(self):
         """
-        Extract mentions from the source object.
+        Attempt to extract mentions from raw_content if available
         """
-        super().extract_mentions()
-        if getattr(self, 'tag_objects', None):
-            #tag_objects = self.tag_objects if isinstance(self.tag_objects, list) else [self.tag_objects]
-            for tag in self.tag_objects:
-                if isinstance(tag, Mention):
-                    profile = get_profile_or_entity(fid=tag.href)
-                    handle = getattr(profile, 'finger', None)
-                    if handle: self._mentions.add(handle)
+        if self.raw_content:
+            super().extract_mentions()
+            return
 
     @property
-    def raw_content(self):
-        if self._cached_raw_content: return self._cached_raw_content
+    def rendered_content(self):
+        if self._soup: return str(self._soup)
+        content = ''
         if self.content_map:
             orig = self.content_map.pop('orig')
             if len(self.content_map.keys()) > 1:
                 logger.warning('Language selection not implemented, falling back to default')
-                self._rendered_content = orig.strip()
+                content = orig.strip()
             else:
-                self._rendered_content = orig.strip() if len(self.content_map.keys()) == 0 else next(iter(self.content_map.values())).strip()
+                content = orig.strip() if len(self.content_map.keys()) == 0 else next(iter(self.content_map.values())).strip()
             self.content_map['orig'] = orig
+        # to allow for posts/replies with medias only.
+        if not content: content = "<div></div>"
+        self._soup = BeautifulSoup(content, 'html.parser')
+        return str(self._soup)
+
+    @rendered_content.setter
+    def rendered_content(self, value):
+        if not value: return
+        self._soup = BeautifulSoup(value, 'html.parser')
+        self.content_map = {'orig': value}
+
+    @property
+    def raw_content(self):
+        if self._cached_raw_content: return self._cached_raw_content
         if isinstance(self.source, dict) and self.source.get('mediaType') == 'text/markdown':
             self._media_type = self.source['mediaType']
             self._cached_raw_content = self.source.get('content').strip()
         else:
             self._media_type = 'text/html'
-            self._cached_raw_content = self._rendered_content
-        # to allow for posts/replies with medias only.
-        if not self._cached_raw_content: self._cached_raw_content = "<div></div>"
+            self._cached_raw_content = ""
         return self._cached_raw_content
 
     @raw_content.setter
@@ -917,12 +948,13 @@ class Note(Object, RawContentMixin):
         if isinstance(getattr(self, 'attachment', None), list):
             children = []
             for child in self.attachment:
-                if isinstance(child, Document):
-                    obj = child.to_base()
-                    if isinstance(obj, Image):
-                        if obj.inline or (obj.image and obj.image in self.raw_content):
+                if isinstance(child, (Document, Link)):
+                    if hasattr(child, 'to_base'):
+                        child = child.to_base()
+                    if isinstance(child, Image):
+                        if child.inline or (child.image and child.image in self.raw_content):
                             continue
-                    children.append(obj)
+                    children.append(child)
             self._cached_children = children
         return self._cached_children
@@ -1010,7 +1042,7 @@ class Video(Document, base.Video):
                 self.actor_id = new_act[0]
             entity = Post(**get_base_attributes(self,
-                keep=('_mentions', '_media_type', '_rendered_content',
-                      '_cached_children', '_cached_raw_content', '_source_object')))
+                keep=('_mentions', '_media_type', '_soup',
+                      '_cached_children', '_cached_raw_content', '_source_object')))
             set_public(entity)
             return entity
@@ -1019,7 +1051,7 @@ class Video(Document, base.Video):
         return self
-class Signature(Object):
+class RsaSignature2017(Object):
     created = fields.DateTime(dc.created, add_value_types=True)
     creator = IRI(dc.creator)
     key = fields.String(sec.signatureValue)
@@ -1174,6 +1206,7 @@ class Retraction(Announce, base.Retraction):
 class Tombstone(Object, base.Retraction):
     target_id = fields.Id()
+    signable = True
     def to_as2(self):
         if not isinstance(self.activity, type): return None
@@ -1294,7 +1327,7 @@ def extract_receivers(entity):
     profile = None
     # don't care about receivers for payloads without an actor_id
     if getattr(entity, 'actor_id'):
-        profile = get_profile_or_entity(entity.actor_id)
+        profile = get_profile_or_entity(fid=entity.actor_id)
     if not isinstance(profile, base.Profile):
         return receivers
@@ -1314,14 +1347,16 @@ def extract_and_validate(entity):
     entity._source_protocol = "activitypub"
     # Extract receivers
     entity._receivers = extract_receivers(entity)
-    # Extract mentions
-    if hasattr(entity, "extract_mentions"):
-        entity.extract_mentions()
     if hasattr(entity, "post_receive"):
         entity.post_receive()
     if hasattr(entity, 'validate'): entity.validate()
+    # Extract mentions
+    if hasattr(entity, "extract_mentions"):
+        entity.extract_mentions()
 def extract_replies(replies):
@@ -1373,6 +1408,9 @@ def element_to_objects(element: Union[Dict, Object], sender: str = "") -> List:
         logger.error("Failed to validate entity %s: %s", entity, ex)
         return []
     except InvalidSignature as exc:
+        if isinstance(entity, base.Retraction):
+            logger.warning('Relayed retraction on %s, ignoring', entity.target_id)
+            return []
         logger.info('%s, fetching from remote', exc)
         entity = retrieve_and_parse_document(entity.id)
         if not entity:
@@ -1396,6 +1434,7 @@ def model_to_objects(payload):
         entity = model.schema().load(payload)
     except (KeyError, jsonld.JsonLdError, exceptions.ValidationError) as exc:  # Just give up for now. This must be made robust
         logger.error("Error parsing jsonld payload (%s)", exc)
+        traceback.print_exception(exc)
         return None
     if isinstance(getattr(entity, 'object_', None), Object):
@@ -1417,3 +1456,9 @@ CLASSES_WITH_CONTEXT_EXTENSIONS = (
     PropertyValue
 )
 context_manager = LdContextManager(CLASSES_WITH_CONTEXT_EXTENSIONS)
+
+MODEL_NAMES = {}
+for key, val in copy.copy(globals()).items():
+    if type(val) == JsonLDAnnotation and issubclass(val, (Object, Link)):
+        MODEL_NAMES[key.lower()] = key
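Together, `MODEL_NAMES` and the `patch_types` pre-load hook added above normalize type-name casing in incoming payloads. The self-contained sketch below uses an illustrative subset of the mapping (the real one is built from the module's schema classes) with the same recursive walk:

```python
# Illustrative subset; the real mapping is derived from the module globals.
MODEL_NAMES = {"note": "Note", "person": "Person", "hashtag": "Hashtag"}

def walk_payload(payload: dict) -> dict:
    """Recursively fix the case of 'type' values for types we implement,
    leaving unknown types untouched, as the patch_types hook does."""
    for key, val in dict(payload).items():
        if isinstance(val, dict):
            walk_payload(val)
        if key == 'type':
            payload[key] = MODEL_NAMES.get(val.lower(), val)
    return payload
```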

View file

@@ -4,12 +4,13 @@ import re
import warnings
from typing import List, Set, Union, Dict, Tuple

+from bs4 import BeautifulSoup
from commonmark import commonmark
from marshmallow import missing

from federation.entities.activitypub.enums import ActivityType
from federation.entities.utils import get_name_for_profile, get_profile
-from federation.utils.text import process_text_links, find_tags
+from federation.utils.text import find_elements, find_tags, MENTION_PATTERN


class BaseEntity:
@@ -22,6 +23,7 @@ class BaseEntity:
    _source_object: Union[str, Dict] = None
    _sender: str = ""
    _sender_key: str = ""
+    _tags: Set = None
    # ActivityType
    activity: ActivityType = None
    activity_id: str = ""
@@ -205,7 +207,7 @@ class CreatedAtMixin(BaseEntity):
class RawContentMixin(BaseEntity):
    _media_type: str = "text/markdown"
    _mentions: Set = None
-    _rendered_content: str = ""
+    rendered_content: str = ""
    raw_content: str = ""

    def __init__(self, *args, **kwargs):
@@ -231,59 +233,22 @@ class RawContentMixin(BaseEntity):
            images.append((groups[1], groups[0] or ""))
        return images

-    @property
-    def rendered_content(self) -> str:
-        """Returns the rendered version of raw_content, or just raw_content."""
-        try:
-            from federation.utils.django import get_configuration
-            config = get_configuration()
-            if config["tags_path"]:
-                def linkifier(tag: str) -> str:
-                    return f'<a class="mention hashtag" ' \
-                           f' href="{config["base_url"]}{config["tags_path"].replace(":tag:", tag.lower())}" ' \
-                           f'rel="noopener noreferrer">' \
-                           f'#<span>{tag}</span></a>'
-            else:
-                linkifier = None
-        except ImportError:
-            linkifier = None
-        if self._rendered_content:
-            return self._rendered_content
-        elif self._media_type == "text/markdown" and self.raw_content:
-            # Do tags
-            _tags, rendered = find_tags(self.raw_content, replacer=linkifier)
-            # Render markdown to HTML
-            rendered = commonmark(rendered).strip()
-            # Do mentions
-            if self._mentions:
-                for mention in self._mentions:
-                    # Diaspora mentions are linkified as mailto
-                    profile = get_profile(finger=mention)
-                    href = 'mailto:'+mention if not getattr(profile, 'id', None) else profile.id
-                    rendered = rendered.replace(
-                        "@%s" % mention,
-                        f'@<a class="h-card" href="{href}"><span>{mention}</span></a>',
-                    )
-            # Finally linkify remaining URL's that are not links
-            rendered = process_text_links(rendered)
-            return rendered
-        return self.raw_content
-
+    # Legacy. Keep this until tests are reworked
    @property
    def tags(self) -> List[str]:
-        """Returns a `list` of unique tags contained in `raw_content`."""
        if not self.raw_content:
            return []
-        tags, _text = find_tags(self.raw_content)
-        return sorted(tags)
+        return sorted(find_tags(self.raw_content))

    def extract_mentions(self):
-        if self._media_type != 'text/markdown': return
-        matches = re.findall(r'@{?[\S ]?[^{}@]+[@;]?\s*[\w\-./@]+[\w/]+}?', self.raw_content)
-        if not matches:
+        if not self.raw_content:
            return
-        for mention in matches:
+        mentions = find_elements(
+            BeautifulSoup(
+                commonmark(self.raw_content, ignore_html_blocks=True), 'html.parser'),
+            MENTION_PATTERN)
+        for ns in mentions:
+            mention = ns.text
            handle = None
            splits = mention.split(";")
            if len(splits) == 1:
@@ -297,6 +262,7 @@ class RawContentMixin(BaseEntity):

class OptionalRawContentMixin(RawContentMixin):
    """A version of the RawContentMixin where `raw_content` is not required."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._required.remove("raw_content")
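The reworked `extract_mentions` above finds mention strings with `MENTION_PATTERN` (rendered markdown parsed through BeautifulSoup in the real code) and then splits off the optional `Name;` prefix. A self-contained sketch of that matching and cleanup logic, applied directly to plain text (`extract_handles` is a hypothetical helper, not part of the library):

```python
import re

# Same shape as the MENTION_PATTERN this change introduces: matches both
# @user@example.tld and the Diaspora-style @{Display Name; user@example.tld}.
MENTION_PATTERN = re.compile(
    r'(@\{?(?:[\w\-. \u263a-\U0001f645]*; *)?[\w]+@[\w\-.]+\.[\w]+}?)', re.UNICODE)

def extract_handles(text: str) -> set:
    """Return the bare user@domain handles mentioned in text."""
    handles = set()
    for mention in MENTION_PATTERN.findall(text):
        # Strip the @ / @{...} wrapping, then drop any "Display Name;" prefix,
        # mirroring the split(";") handling in extract_mentions.
        cleaned = mention.strip('@{}')
        if ';' in cleaned:
            cleaned = cleaned.split(';')[-1].strip()
        handles.add(cleaned)
    return handles
```

This keeps only the `user@domain` part, which is what the entity stores in `_mentions` after this change (finger-style handles rather than profile URLs).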


@@ -49,6 +49,11 @@ class Protocol:
    sender = None
    user = None

+    def __init__(self, request=None, get_contact_key=None):
+        # this is required for calls to verify on GET requests
+        self.request = request
+        self.get_contact_key = get_contact_key
+
    def build_send(self, entity: BaseEntity, from_user: UserType, to_user_key: RsaKey = None) -> Union[str, Dict]:
        """
        Build POST data for sending out to remotes.
@@ -112,7 +117,7 @@ class Protocol:
        self.sender = signer.id if signer else self.actor
        key = getattr(signer, 'public_key', None)
        if not key:
-            key = self.get_contact_key(self.actor) if self.get_contact_key else ''
+            key = self.get_contact_key(self.actor) if self.get_contact_key and self.actor else ''
        if key:
            # fallback to the author's key the client app may have provided
            logger.warning("Failed to retrieve keyId for %s, trying the actor's key", sig.get('keyId'))


@@ -1,3 +1,4 @@
+import commonmark
import pytest
from unittest.mock import patch
from pprint import pprint
@@ -63,8 +64,12 @@ class TestEntitiesConvertToAS2:
            'published': '2019-04-27T00:00:00',
        }

+    # Now handled by the client app
+    @pytest.mark.skip
    def test_comment_to_as2__url_in_raw_content(self, activitypubcomment):
        activitypubcomment.raw_content = 'raw_content http://example.com'
+        activitypubcomment.rendered_content = process_text_links(
+            commonmark.commonmark(activitypubcomment.raw_content).strip())
        activitypubcomment.pre_send()
        result = activitypubcomment.to_as2()
        assert result == {
@@ -118,6 +123,7 @@
        }

    def test_post_to_as2(self, activitypubpost):
+        activitypubpost.rendered_content = commonmark.commonmark(activitypubpost.raw_content).strip()
        activitypubpost.pre_send()
        result = activitypubpost.to_as2()
        assert result == {
@@ -191,6 +197,15 @@
        }

    def test_post_to_as2__with_tags(self, activitypubpost_tags):
+        activitypubpost_tags.rendered_content = '<h1>raw_content</h1>\n' \
+                                                '<p><a class="hashtag" ' \
+                                                'href="https://example.com/tag/foobar/" rel="noopener ' \
+                                                'noreferrer nofollow" ' \
+                                                'target="_blank">#<span>foobar</span></a>\n' \
+                                                '<a class="hashtag" ' \
+                                                'href="https://example.com/tag/barfoo/" rel="noopener ' \
+                                                'noreferrer nofollow" ' \
+                                                'target="_blank">#<span>barfoo</span></a></p>'
        activitypubpost_tags.pre_send()
        result = activitypubpost_tags.to_as2()
        assert result == {
@@ -204,11 +219,11 @@
            'url': 'http://127.0.0.1:8000/post/123456/',
            'attributedTo': 'http://127.0.0.1:8000/profile/123456/',
            'content': '<h1>raw_content</h1>\n'
-                       '<p><a class="mention hashtag" '
+                       '<p><a class="hashtag" '
                       'href="https://example.com/tag/foobar/" rel="noopener '
                       'noreferrer nofollow" '
                       'target="_blank">#<span>foobar</span></a>\n'
-                       '<a class="mention hashtag" '
+                       '<a class="hashtag" '
                       'href="https://example.com/tag/barfoo/" rel="noopener '
                       'noreferrer nofollow" '
                       'target="_blank">#<span>barfoo</span></a></p>',
@@ -235,6 +250,7 @@
        }

    def test_post_to_as2__with_images(self, activitypubpost_images):
+        activitypubpost_images.rendered_content = '<p>raw_content</p>'
        activitypubpost_images.pre_send()
        result = activitypubpost_images.to_as2()
        assert result == {
@@ -274,6 +290,7 @@
        }

    def test_post_to_as2__with_diaspora_guid(self, activitypubpost_diaspora_guid):
+        activitypubpost_diaspora_guid.rendered_content = '<p>raw_content</p>'
        activitypubpost_diaspora_guid.pre_send()
        result = activitypubpost_diaspora_guid.to_as2()
        assert result == {
@@ -418,17 +435,6 @@ class TestEntitiesPostReceive:
            "public": False,
        }]

-    @patch("federation.entities.activitypub.models.bleach.linkify", autospec=True)
-    def test_post_post_receive__linkifies_if_not_markdown(self, mock_linkify, activitypubpost):
-        activitypubpost._media_type = 'text/html'
-        activitypubpost.post_receive()
-        mock_linkify.assert_called_once()
-
-    @patch("federation.entities.activitypub.models.bleach.linkify", autospec=True)
-    def test_post_post_receive__skips_linkify_if_markdown(self, mock_linkify, activitypubpost):
-        activitypubpost.post_receive()
-        mock_linkify.assert_not_called()

class TestEntitiesPreSend:
    def test_post_inline_images_are_attached(self, activitypubpost_embedded_images):


@@ -4,6 +4,9 @@ from unittest.mock import patch, Mock, DEFAULT
import json

import pytest

+from federation.entities.activitypub.models import Person
#from federation.entities.activitypub.entities import (
#    models.Follow, models.Accept, models.Person, models.Note, models.Note,
#    models.Delete, models.Announce)
@@ -70,9 +73,7 @@ class TestActivitypubEntityMappersReceive:
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.raw_content == '<p><span class="h-card"><a class="u-url mention" ' \
-                                   'href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                   '@<span>jaywink</span></a></span> boom</p>'
+        assert post.raw_content == ''
        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">' \
                                        '@<span>jaywink</span></a></span> boom</p>'
        assert post.id == "https://diaspodon.fr/users/jaywink/statuses/102356911717767237"
@@ -87,40 +88,44 @@ class TestActivitypubEntityMappersReceive:
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.raw_content == '<p>boom #test</p>'
+        assert post.raw_content == ''
+        assert post.rendered_content == '<p>boom <a class="mention hashtag" data-hashtag="test" href="https://mastodon.social/tags/test" rel="tag">#<span>test</span></a></p>'

-    # TODO: fix this test
-    @pytest.mark.skip
-    def test_message_to_objects_simple_post__with_mentions(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev3.jasonrobinson.me", url="https://dev3.jasonrobinson.me/u/jaywink/"))
+    def test_message_to_objects_simple_post__with_mentions(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_POST_WITH_MENTIONS, "https://mastodon.social/users/jaywink")
        assert len(entities) == 1
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
        assert len(post._mentions) == 1
-        assert list(post._mentions)[0] == "https://dev3.jasonrobinson.me/u/jaywink/"
+        assert list(post._mentions)[0] == "jaywink@dev3.jasonrobinson.me"

-    def test_message_to_objects_simple_post__with_source__bbcode(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev.jasonrobinson.me", url="https://dev.jasonrobinson.me/u/jaywink/"))
+    def test_message_to_objects_simple_post__with_source__bbcode(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_POST_WITH_SOURCE_BBCODE, "https://diaspodon.fr/users/jaywink")
        assert len(entities) == 1
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                        '@<span>jaywink</span></a></span> boom</p>'
-        assert post.raw_content == '<p><span class="h-card"><a class="u-url mention" ' \
-                                   'href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                   '@<span>jaywink</span></a></span> boom</p>'
+        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" data-mention="jaywink@dev.jasonrobinson.me" href="https://dev.jasonrobinson.me/u/jaywink/">' \
+                                        '@<span>jaywink</span></a></span> boom</p>'
+        assert post.raw_content == ''

-    def test_message_to_objects_simple_post__with_source__markdown(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev.jasonrobinson.me", url="https://dev.robinson.me/u/jaywink/"))
+    def test_message_to_objects_simple_post__with_source__markdown(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_POST_WITH_SOURCE_MARKDOWN, "https://diaspodon.fr/users/jaywink")
        assert len(entities) == 1
        post = entities[0]
        assert isinstance(post, models.Note)
        assert isinstance(post, Post)
-        assert post.rendered_content == '<p><span class="h-card"><a href="https://dev.jasonrobinson.me/u/jaywink/" ' \
-                                        'class="u-url mention">@<span>jaywink</span></a></span> boom</p>'
-        assert post.raw_content == "@jaywink boom"
+        assert post.rendered_content == '<p><span class="h-card"><a class="u-url mention" ' \
+                                        'href="https://dev.jasonrobinson.me/u/jaywink/">@<span>jaywink</span></a></span> boom</p>'
+        assert post.raw_content == "@jaywink@dev.jasonrobinson.me boom"
        assert post.id == "https://diaspodon.fr/users/jaywink/statuses/102356911717767237"
        assert post.actor_id == "https://diaspodon.fr/users/jaywink"
        assert post.public is True
@@ -145,15 +150,18 @@ class TestActivitypubEntityMappersReceive:
        assert photo.guid == ""
        assert photo.handle == ""

-    def test_message_to_objects_comment(self):
+    @patch("federation.entities.activitypub.models.get_profile_or_entity",
+           return_value=Person(finger="jaywink@dev.jasonrobinson.me", url="https://dev.jasonrobinson.me/u/jaywink/"))
+    def test_message_to_objects_comment(self, mock_get):
        entities = message_to_objects(ACTIVITYPUB_COMMENT, "https://diaspodon.fr/users/jaywink")
        assert len(entities) == 1
        comment = entities[0]
        assert isinstance(comment, models.Note)
        assert isinstance(comment, Comment)
-        assert comment.raw_content == '<p><span class="h-card"><a class="u-url mention" ' \
-                                      'href="https://dev.jasonrobinson.me/u/jaywink/">' \
-                                      '@<span>jaywink</span></a></span> boom</p>'
+        assert comment.rendered_content == '<p><span class="h-card"><a class="u-url mention" data-mention="jaywink@dev.jasonrobinson.me" ' \
+                                           'href="https://dev.jasonrobinson.me/u/jaywink/">' \
+                                           '@<span>jaywink</span></a></span> boom</p>'
+        assert comment.raw_content == ''
        assert comment.id == "https://diaspodon.fr/users/jaywink/statuses/102356911717767237"
        assert comment.actor_id == "https://diaspodon.fr/users/jaywink"
        assert comment.target_id == "https://dev.jasonrobinson.me/content/653bad70-41b3-42c9-89cb-c4ee587e68e4/"


@@ -123,6 +123,7 @@ class TestShareEntity:

class TestRawContentMixin:
+    @pytest.mark.skip
    def test_rendered_content(self, post):
        assert post.rendered_content == """<p>One more test before sleep 😅 This time with an image.</p>
<p><img src="https://jasonrobinson.me/media/uploads/2020/12/27/1b2326c6-554c-4448-9da3-bdacddf2bb77.jpeg" alt=""></p>"""


@@ -30,6 +30,7 @@ def activitypubcomment():
    with freeze_time("2019-04-27"):
        obj = models.Comment(
            raw_content="raw_content",
+            rendered_content="<p>raw_content</p>",
            public=True,
            provider_display_name="Socialhome",
            id=f"http://127.0.0.1:8000/post/123456/",
@@ -255,7 +256,8 @@ def profile():
        inboxes={
            "private": "https://example.com/bob/private",
            "public": "https://example.com/public",
-        }, public_key=PUBKEY, to=["https://www.w3.org/ns/activitystreams#Public"]
+        }, public_key=PUBKEY, to=["https://www.w3.org/ns/activitystreams#Public"],
+        url="https://example.com/alice"
    )


@@ -35,7 +35,7 @@ ACTIVITYPUB_COMMENT = {
    'contentMap': {'en': '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">@<span>jaywink</span></a></span> boom</p>'},
    'attachment': [],
    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
+            'href': 'https://dev.jasonrobinson.me/u/jaywink/',
             'name': '@jaywink@dev.jasonrobinson.me'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',
@@ -459,9 +459,9 @@ ACTIVITYPUB_POST_WITH_TAGS = {
    'conversation': 'tag:diaspodon.fr,2019-06-28:objectId=2347687:objectType=Conversation',
    'content': '<p>boom <a href="https://mastodon.social/tags/test" class="mention hashtag" rel="tag">#<span>test</span></a></p>',
    'attachment': [],
-    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
-            'name': '@jaywink@dev.jasonrobinson.me'}],
+    'tag': [{'type': 'Hashtag',
+            'href': 'https://mastodon.social/tags/test',
+            'name': '#test'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',
                'first': {'type': 'CollectionPage',
@@ -552,13 +552,13 @@ ACTIVITYPUB_POST_WITH_SOURCE_MARKDOWN = {
    'conversation': 'tag:diaspodon.fr,2019-06-28:objectId=2347687:objectType=Conversation',
    'content': '<p><span class="h-card"><a href="https://dev.jasonrobinson.me/u/jaywink/" class="u-url mention">@<span>jaywink</span></a></span> boom</p>',
    'source': {
-        'content': "@jaywink boom",
+        'content': "@{jaywink@dev.jasonrobinson.me} boom",
        'mediaType': "text/markdown",
    },
    'contentMap': {'en': '<p><span class="h-card"><a href="https://dev.jasonrobinson.me/u/jaywink/" class="u-url mention">@<span>jaywink</span></a></span> boom</p>'},
    'attachment': [],
    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
+            'href': 'https://dev.jasonrobinson.me/u/jaywink/',
             'name': '@jaywink@dev.jasonrobinson.me'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',
@@ -612,7 +612,7 @@ ACTIVITYPUB_POST_WITH_SOURCE_BBCODE = {
    'contentMap': {'en': '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">@<span>jaywink</span></a></span> boom</p>'},
    'attachment': [],
    'tag': [{'type': 'Mention',
-            'href': 'https://dev.jasonrobinson.me/p/d4574854-a5d7-42be-bfac-f70c16fcaa97/',
+            'href': 'https://dev.jasonrobinson.me/u/jaywink/',
             'name': '@jaywink@dev.jasonrobinson.me'}],
    'replies': {'id': 'https://diaspodon.fr/users/jaywink/statuses/102356911717767237/replies',
                'type': 'Collection',


@@ -60,7 +60,7 @@ class TestRetrieveAndParseDocument:
        entity = retrieve_and_parse_document("https://example.com/foobar")
        assert isinstance(entity, Follow)

-    @patch("federation.entities.activitypub.models.extract_receivers", return_value=[])
+    @patch("federation.entities.activitypub.models.get_profile_or_entity", return_value=None)
    @patch("federation.utils.activitypub.fetch_document", autospec=True, return_value=(
            json.dumps(ACTIVITYPUB_POST_OBJECT), None, None),
    )
@@ -80,7 +80,7 @@
            "/foobar.jpg"

    @patch("federation.entities.activitypub.models.verify_ld_signature", return_value=None)
-    @patch("federation.entities.activitypub.models.extract_receivers", return_value=[])
+    @patch("federation.entities.activitypub.models.get_profile_or_entity", return_value=None)
    @patch("federation.utils.activitypub.fetch_document", autospec=True, return_value=(
            json.dumps(ACTIVITYPUB_POST), None, None),
    )


@@ -1,4 +1,6 @@
-from federation.utils.text import decode_if_bytes, encode_if_text, validate_handle, process_text_links, find_tags
+import pytest
+
+from federation.utils.text import decode_if_bytes, encode_if_text, validate_handle, find_tags


def test_decode_if_bytes():
@@ -18,107 +20,49 @@ class TestFindTags:
    def test_all_tags_are_parsed_from_text(self):
        source = "#starting and #MixED with some #line\nendings also tags can\n#start on new line"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"starting", "mixed", "line", "start"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#starting/starting and #MixED/mixed with some #line/line\nendings also tags can\n" \
-                       "#start/start on new line"

    def test_code_block_tags_ignored(self):
        source = "foo\n```\n#code\n```\n#notcode\n\n    #alsocode\n"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"notcode"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "foo\n```\n#code\n```\n#notcode/notcode\n\n    #alsocode\n"

    def test_endings_are_filtered_out(self):
        source = "#parenthesis) #exp! #list] *#doh* _#bah_ #gah% #foo/#bar"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"parenthesis", "exp", "list", "doh", "bah", "gah", "foo", "bar"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#parenthesis/parenthesis) #exp/exp! #list/list] *#doh/doh* _#bah/bah_ #gah/gah% " \
-                       "#foo/foo/#bar/bar"

    def test_finds_tags(self):
        source = "#post **Foobar** #tag #OtherTag #third\n#fourth"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"third", "fourth", "post", "othertag", "tag"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#post/post **Foobar** #tag/tag #OtherTag/othertag #third/third\n#fourth/fourth"

    def test_ok_with_html_tags_in_text(self):
        source = "<p>#starting and <span>#MixED</span> however not <#>this</#> or <#/>that"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"starting", "mixed"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "<p>#starting/starting and <span>#MixED/mixed</span> however not <#>this</#> or <#/>that"

    def test_postfixed_tags(self):
        source = "#foo) #bar] #hoo, #hee."
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"foo", "bar", "hoo", "hee"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "#foo/foo) #bar/bar] #hoo/hoo, #hee/hee."

    def test_prefixed_tags(self):
        source = "(#foo [#bar"
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"foo", "bar"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == "(#foo/foo [#bar/bar"

    def test_invalid_text_returns_no_tags(self):
        source = "#a!a #a#a #a$a #a%a #a^a #a&a #a*a #a+a #a.a #a,a #a@a #a£a #a(a #a)a #a=a " \
                 "#a?a #a`a #a'a #a\\a #a{a #a[a #a]a #a}a #a~a #a;a #a:a #a\"a #a’a #a”a #\xa0cd"
-        tags, text = find_tags(source)
-        assert tags == set()
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == source
+        tags = find_tags(source)
+        assert tags == {'a'}

    def test_start_of_paragraph_in_html_content(self):
        source = '<p>First line</p><p>#foobar #barfoo</p>'
-        tags, text = find_tags(source)
+        tags = find_tags(source)
        assert tags == {"foobar", "barfoo"}
-        assert text == source
-        tags, text = find_tags(source, replacer=self._replacer)
-        assert text == '<p>First line</p><p>#foobar/foobar #barfoo/barfoo</p>'


-class TestProcessTextLinks:
-    def test_link_at_start_or_end(self):
-        assert process_text_links('https://example.org example.org\nhttp://example.org') == \
-            '<a href="https://example.org" rel="nofollow" target="_blank">https://example.org</a> ' \
-            '<a href="http://example.org" rel="nofollow" target="_blank">example.org</a>\n' \
-            '<a href="http://example.org" rel="nofollow" target="_blank">http://example.org</a>'
-
-    def test_existing_links_get_attrs_added(self):
-        assert process_text_links('<a href="https://example.org">https://example.org</a>') == \
-            '<a href="https://example.org" rel="nofollow" target="_blank">https://example.org</a>'
-
-    def test_code_sections_are_skipped(self):
-        assert process_text_links('<code>https://example.org</code><code>\nhttps://example.org\n</code>') == \
-            '<code>https://example.org</code><code>\nhttps://example.org\n</code>'
-
-    def test_emails_are_skipped(self):
-        assert process_text_links('foo@example.org') == 'foo@example.org'
-
-    def test_does_not_add_target_blank_if_link_is_internal(self):
-        assert process_text_links('<a href="/streams/tag/foobar">#foobar</a>') == \
-            '<a href="/streams/tag/foobar">#foobar</a>'
-
-    def test_does_not_remove_mention_classes(self):
-        assert process_text_links('<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/">'
-                                  '@<span>jaywink</span></a></span> boom</p>') == \
-            '<p><span class="h-card"><a class="u-url mention" href="https://dev.jasonrobinson.me/u/jaywink/" ' \
-            'rel="nofollow" target="_blank">@<span>jaywink</span></a></span> boom</p>'


def test_validate_handle():


@@ -1,12 +1,18 @@
import re
-from typing import Set, Tuple
+from typing import Set, List
from urllib.parse import urlparse

-import bleach
-from bleach import callbacks
+from bs4 import BeautifulSoup
+from bs4.element import NavigableString
+from commonmark import commonmark

ILLEGAL_TAG_CHARS = "!#$%^&*+.,@£/()=?`'\\{[]}~;:\"’”—\xa0"
+TAG_PATTERN = re.compile(r'(#[\w\-]+)([)\]_!?*%/.,;\s]+\s*|\Z)', re.UNICODE)
+# This will match non matching braces. I don't think it's an issue.
+MENTION_PATTERN = re.compile(r'(@\{?(?:[\w\-. \u263a-\U0001f645]*; *)?[\w]+@[\w\-.]+\.[\w]+}?)', re.UNICODE)
+# based on https://stackoverflow.com/a/6041965
+URL_PATTERN = re.compile(r'((?:(?:https?|ftp)://|^|(?<=[("<\s]))+(?:[\w\-]+(?:(?:\.[\w\-]+)+))(?:[\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))',
+                         re.UNICODE)
def decode_if_bytes(text):
    try:
@ -22,67 +28,38 @@ def encode_if_text(text):
return text return text
def find_tags(text: str, replacer: callable = None) -> Tuple[Set, str]: def find_tags(text: str) -> Set[str]:
"""Find tags in text. """Find tags in text.
Tries to ignore tags inside code blocks. Ignore tags inside code blocks.
Optionally, if passed a "replacer", will also replace the tag word with the result Returns a set of tags.
of the replacer function called with the tag word.
Returns a set of tags and the original or replaced text.
""" """
found_tags = set() tags = find_elements(BeautifulSoup(commonmark(text, ignore_html_blocks=True), 'html.parser'),
# <br> and <p> tags cause issues in us finding words - add some spacing around them TAG_PATTERN)
new_text = text.replace("<br>", " <br> ").replace("<p>", " <p> ").replace("</p>", " </p> ") return set([tag.text.lstrip('#').lower() for tag in tags])
lines = new_text.splitlines(keepends=True)
final_lines = []
code_block = False def find_elements(soup: BeautifulSoup, pattern: re.Pattern) -> List[NavigableString]:
final_text = None """
# Check each line separately Split a BeautifulSoup tree strings according to a pattern, replacing each element
for line in lines: with a NavigableString. The returned list can be used to linkify the found
final_words = [] elements.
if line[0:3] == "```":
code_block = not code_block :param soup: BeautifulSoup instance of the content being searched
if line.find("#") == -1 or line[0:4] == " " or code_block: :param pattern: Compiled regular expression defined using a single group
# Just add the whole line :return: A NavigableString list attached to the original soup
final_lines.append(line) """
continue final = []
# Check each word separately for candidate in soup.find_all(string=True):
words = line.split(" ") if candidate.parent.name == 'code': continue
for word in words: ns = [NavigableString(r) for r in pattern.split(candidate.text) if r]
if word.find('#') > -1: found = [s for s in ns if pattern.match(s.text)]
candidate = word.strip().strip("([]),.!?:*_%/") if found:
if candidate.find('<') > -1 or candidate.find('>') > -1: candidate.replace_with(*ns)
# Strip html final.extend(found)
candidate = bleach.clean(word, strip=True) return final
# Now split with slashes
candidates = candidate.split("/")
to_replace = []
for candidate in candidates:
if candidate.startswith("#"):
candidate = candidate.strip("#")
if test_tag(candidate.lower()):
found_tags.add(candidate.lower())
to_replace.append(candidate)
if replacer:
tag_word = word
try:
for counter, replacee in enumerate(to_replace, 1):
tag_word = tag_word.replace("#%s" % replacee, replacer(replacee))
except Exception:
pass
final_words.append(tag_word)
else:
final_words.append(word)
else:
final_words.append(word)
final_lines.append(" ".join(final_words))
if replacer:
final_text = "".join(final_lines)
if final_text:
final_text = final_text.replace(" <br> ", "<br>").replace(" <p> ", "<p>").replace(" </p> ", "</p>")
return found_tags, final_text or text
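The new `find_elements` leans on a subtlety of `re.Pattern.split`: when the pattern contains capture groups, the captured text is kept in the result list, so the matched pieces can be picked back out and re-inserted as `NavigableString`s. A stdlib-only sketch of that split-and-match step on a single text node, with the BeautifulSoup tree handling omitted:

```python
import re

# TAG_PATTERN from the diff above; a plain string stands in for one
# BeautifulSoup text node (NavigableString).
TAG_PATTERN = re.compile(r'(#[\w\-]+)([)\]_!?*%/.,;\s]+\s*|\Z)', re.UNICODE)

node_text = "hello #world and #another"

# split() keeps capture-group contents in the output list, so nothing is lost:
parts = [p for p in TAG_PATTERN.split(node_text) if p]
print(parts)  # ['hello ', '#world', ' ', 'and ', '#another']

# match() then identifies which pieces are the tags to be linkified:
found = [p for p in parts if TAG_PATTERN.match(p)]
print(found)  # ['#world', '#another']
```

In the real function, each piece becomes a `NavigableString`, `candidate.replace_with(*ns)` splices them back into the tree, and the matched pieces are returned for linkification; skipping nodes whose parent is `code` is what preserves the "ignore tags inside code blocks" behaviour.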
 def get_path_from_url(url: str) -> str:
@@ -93,28 +70,6 @@ def get_path_from_url(url: str) -> str:
     return parsed.path
 
 
-def process_text_links(text):
-    """Process links in text, adding some attributes and linkifying textual links."""
-    link_callbacks = [callbacks.nofollow, callbacks.target_blank]
-
-    def link_attributes(attrs, new=False):
-        """Run standard callbacks except for internal links."""
-        href_key = (None, "href")
-        if attrs.get(href_key).startswith("/"):
-            return attrs
-
-        # Run the standard callbacks
-        for callback in link_callbacks:
-            attrs = callback(attrs, new)
-        return attrs
-
-    return bleach.linkify(
-        text,
-        callbacks=[link_attributes],
-        parse_email=False,
-        skip_tags=["code"],
-    )
-
-
 def test_tag(tag: str) -> bool:
     """Test a word whether it could be accepted as a tag."""
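With the bleach-based `process_text_links` removed from this module, the `URL_PATTERN` added at the top of the file presumably takes over plain-text link detection elsewhere. A quick stdlib check of what that pattern captures: it matches scheme-prefixed URLs, and also bare domains when preceded by whitespace or an opening bracket, while refusing to end a match on trailing punctuation:

```python
import re

# URL_PATTERN exactly as added in this merge request
# (based on https://stackoverflow.com/a/6041965).
URL_PATTERN = re.compile(r'((?:(?:https?|ftp)://|^|(?<=[("<\s]))+(?:[\w\-]+(?:(?:\.[\w\-]+)+))(?:[\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))',
                         re.UNICODE)

print(URL_PATTERN.findall("Check https://example.org/path and www.example.com!"))
# ['https://example.org/path', 'www.example.com']
```

The trailing `[\w@?^=%&\/~+#-]` class is what drops the final `!` (and similar sentence punctuation) from the second match.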