Commit graph

76 Commits (dbb76f3618db54d945b2bab7ee2a94b45c1a623b)

Author SHA1 Message Date
dgtlmoon 4ae27af511 Code cleanup - Browser Steps 2023-10-28 14:58:12 +02:00
dgtlmoon e1860549dc Fetching - Browser Step enabled watches should also identify 404/non-200 status situations (#1907) 2023-10-28 14:37:42 +02:00
dgtlmoon 349111eb35 Fetching/BrowserSteps - Going to a page was using slightly different logic to the main way - make them use the same methods (#1890) 2023-10-26 20:19:22 +02:00
Marcelo Alencar 0aef5483d9 Upgrade selenium to 4.14.0 (latest) (#1783) 2023-10-26 10:09:03 +02:00
dgtlmoon 7debccca73 Fetching - Clarifying how fetchers work with SOCKS5 proxies 2023-10-09 16:57:30 +02:00
dgtlmoon e30b17b8bc UI + Fetching - Be more helpful when a filter contains no text, suggest ways to deal with images in filters (#1819) 2023-09-26 13:59:59 +02:00
dgtlmoon 57de4ffe4f Page fetching - Fixed possible incorrect browser user-agent header in playwright/puppeteer/browserless fetchers (#1811) 2023-09-24 08:42:24 +02:00
dgtlmoon 7cb7eebbc5 Browser Steps - When cleaning up old screenshots, check the file exists 2023-07-11 10:44:54 +02:00
dgtlmoon f9387522ee Fetching - Be sure that content-type detection works when the headers are a mixed case (#1604) 2023-05-29 16:11:43 +02:00
dgtlmoon 1aeafef910 Fetcher - Puppeteer experimental fetcher wasn't returning the status-code (#1585) 2023-05-21 23:10:08 +02:00
dgtlmoon e4f6d54ae2 BrowserSteps - Refactored to re-use playwright context which should solve some errors 2023-05-12 15:38:55 +02:00
dgtlmoon d939882dde Fetcher - Experimental fetcher improvements (Code TidyUp, Improve tests, revert to old playwright when using BrowserSteps for now) (#1564) 2023-05-11 16:36:35 +02:00
dgtlmoon 5325918f29 Puppeteer fetcher, adding disk cache and other fixes (#1563) 2023-05-10 23:23:34 +02:00
dgtlmoon 316f28a0f2 Fetcher - Experimental fetcher fixes, now only enabled with 'USE_EXPERIMENTAL_PUPPETEER_FETCH' env var (default off) (#1561) 2023-05-07 13:49:53 +02:00
dgtlmoon 94f38f052e Fetcher - playwright/browserless - Use builtin node puppeteer handler in browserless, scales way better, and is faster (#1559) 2023-05-05 21:58:08 +02:00
dgtlmoon 6e71088cde New feature - Restock / stock / out of stock monitor option/mode 2023-03-18 20:36:26 +01:00
dgtlmoon 41856c4ed8 Re #1365 - Playwright - Browser "Service Workers" should be enabled by default but unset via env var PLAYWRIGHT_SERVICE_WORKERS=block (#1367) 2023-02-01 20:50:40 +01:00
dgtlmoon d47a25eb6d Playwright - Removing old bug fix where playwright needed screenshot called twice to make the full screen screenshot be actually fullscreen (#1356) 2023-01-28 15:02:53 +01:00
dgtlmoon fcfd1b5e10 Ability to configure extra proxies via the UI (#1235) 2022-12-19 21:48:01 +01:00
dgtlmoon 13c4121f52 PDF File change detection - Initial PDF fetcher support with basic text extraction (#1244) 2022-12-19 17:51:41 +01:00
dgtlmoon 0c380c170f Playwright - Better error reporting and re-try fetch on fail once (#1238) 2022-12-16 18:06:14 +01:00
dgtlmoon b76148a0f4 Fetcher - CPU usage - Skip processing if the previous checksum and the just fetched one was the same (#925) 2022-12-14 15:08:34 +01:00
dgtlmoon 93cc30437f Playwright+BrowserSteps - Fetch changes - Fetch simply after page starts rendering + delay seconds, disable service workers 2022-12-14 12:16:04 +01:00
dgtlmoon 69756f20f2 VisualSelector & BrowserSteps - Scraper improvements, remove duplicate code 2022-11-25 10:45:38 +01:00
dgtlmoon fde7b3fd97 Remove dupe xpath finder prep code 2022-11-25 09:25:05 +01:00
dgtlmoon 5b530ff61c Configurable "Browser Steps" when Playwright/Chrome is configured (enter text, scroll, wait for text, click button etc) (#478) 2022-11-24 20:53:01 +01:00
dgtlmoon df6e835035 Make VisualSelector show first available multiple selector, refactor to make more maintainable (#1132) 2022-11-17 11:52:48 +01:00
dgtlmoon 359fc48fb4 Filters can now accept a list/multiple filters (#1064) #623 2022-11-03 12:13:54 +01:00
dgtlmoon 669fd3ae0b Don't use default Requests `user-agent` and `accept` headers in playwright+selenium requests, breaks sites such as united.com. (#1004) 2022-10-09 18:25:36 +02:00
dgtlmoon 3ebb2ab9ba Selenium fetcher - screenshot should be taken after 'wait' time, not before #873 2022-09-25 11:05:07 +02:00
dgtlmoon 3705ce6681 Render Extract Configurable Delay Seconds should also apply after executing any JS #958 2022-09-24 23:48:03 +02:00
dgtlmoon f7ea99412f Re #958 - remove change screensize, should be in 1280x720 default, was causing "Unable to retrieve content because the page is navigating and changing the content." on some sites 2022-09-19 14:02:32 +02:00
dgtlmoon 1193a7f22c Playwright - Support proxy auth mechanisms (#859) 2022-08-18 09:46:28 +02:00
dgtlmoon e461c0b819 Playwright fetcher didn't report low level HTTP errors correctly (like Connection Refused) (#852) 2022-08-17 13:25:08 +02:00
dgtlmoon 9942107016 Massive improvements to error handling - show separate output for non HTTP 200 status replies 2022-08-15 18:56:53 +02:00
dgtlmoon 1eb5726cbf Execute JS should happen after waiting seconds 2022-08-15 11:27:04 +02:00
dgtlmoon e6173357a9 Visual Selector direct element finder fix 2022-07-28 09:19:10 +02:00
dgtlmoon fae1164c0b Ability to specify JS before running change-detection (#744) 2022-07-10 13:56:01 +02:00
dgtlmoon 169c293143 Playwright - log console errors to output 2022-07-10 13:55:29 +02:00
dgtlmoon 6553980cd5 Playwright - Use HTTP Request Headers override (Cookie, etc) 2022-06-25 23:42:48 +02:00
dgtlmoon 4a91505af5 Playwright screenshots - no need for high-res "bug workaround" screenshot, use lower quality/faster configurable image quality env var 2022-06-15 10:52:24 +02:00
dgtlmoon 82b900fbf4 Give more helpful error message when a page doesn't load 2022-06-14 08:16:22 +02:00
dgtlmoon 358a365303 Tweaks to playwright fetch code - better timeout handling 2022-06-13 23:39:43 +02:00
dgtlmoon 8294519f43 Content fetcher - Handle when a page doesn't load properly 2022-06-01 13:12:37 +02:00
dgtlmoon 8ba8a220b6 Playwright - Correctly close browser context/sessions on exceptions 2022-06-01 12:59:44 +02:00
dgtlmoon 5cefb16e52 Minor code cleanup 2022-05-25 15:38:40 +02:00
dgtlmoon 341ae24b73 Re #616 - content trigger - adding extra test (#620) 2022-05-25 15:31:59 +02:00
dgtlmoon 9d742446ab Playwright - ByPass CSP for more reliable JS scraping, disable accept downloads 2022-05-25 11:05:18 +02:00
dgtlmoon e3e022b0f4 VisualSelector - Better handling of filter targets that are no longer available in the HTML 2022-05-25 10:23:43 +02:00
dgtlmoon 7983675325 Visual Selector - be more resilient when sites interfere with the xPath scraping 2022-05-24 00:10:38 +02:00