From c72ea4c6ca60e4974a05b8fff0827532bec481a6 Mon Sep 17 00:00:00 2001
From: Slatian
Date: Sat, 19 Nov 2022 01:11:31 +0100
Subject: [PATCH] Improved heuristics for English language text to skip over
 most fluff paragraphs to get better samples of sites

---
 data/heuristics.txt | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/data/heuristics.txt b/data/heuristics.txt
index 99d2993..d6f1c38 100644
--- a/data/heuristics.txt
+++ b/data/heuristics.txt
@@ -8,3 +8,16 @@ last edit
 (c)
 all rights reserved
 licensed under
+subscribe
+|
+generated by
+powered by
+this post was
+click here for
+click here to
+published on:
+published:
+posted:
+note:
+share this article
+estimated read time
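The patch does not say how the entries in data/heuristics.txt are consumed. What follows is a minimal Python sketch, assuming (hypothetically) that each line is a case-insensitive substring marker and that a paragraph containing any marker is skipped when choosing a sample of the page. The names `HEURISTICS_TEXT`, `load_markers`, `is_fluff`, and `first_content_paragraph` are illustrative, not the project's code, and `HEURISTICS_TEXT` holds only the 13 entries added by this patch, not the whole file.

```python
# Hypothetical sketch: one way a fluff-marker list like data/heuristics.txt
# might be applied. The matching rule (case-insensitive substring) is an
# assumption, not taken from the project.
from typing import List, Optional

# Only the 13 entries added by this patch; the real file has more lines.
HEURISTICS_TEXT = """\
subscribe
|
generated by
powered by
this post was
click here for
click here to
published on:
published:
posted:
note:
share this article
estimated read time
"""


def load_markers(text: str) -> List[str]:
    """Parse one marker per line, lowercased, skipping blank lines."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_fluff(paragraph: str, markers: List[str]) -> bool:
    """Treat a paragraph as fluff if it contains any marker (case-insensitive)."""
    lowered = paragraph.lower()
    return any(marker in lowered for marker in markers)


def first_content_paragraph(paragraphs: List[str], markers: List[str]) -> Optional[str]:
    """Return the first paragraph no marker matches, or None if all are fluff."""
    for paragraph in paragraphs:
        if not is_fluff(paragraph, markers):
            return paragraph
    return None


if __name__ == "__main__":
    markers = load_markers(HEURISTICS_TEXT)
    page = [
        "Published: 18 Nov 2022",
        "Estimated read time: 4 minutes",
        "Rust lifetimes let the compiler check that references stay valid.",
        "Share this article on social media!",
    ]
    # Skips the two metadata lines and prints the third paragraph.
    print(first_content_paragraph(page, markers))
```

Under this substring assumption, the bare `|` entry would flag any paragraph containing a pipe character, such as a navigation bar like `Home | Blog`, while short entries like `note:` match anywhere in a paragraph rather than only at its start.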