Small mods to the Nim source

Signed-off-by: Jacob Torrey <jacob@thinkst.com>
pull/6/head
Jacob Torrey 2023-05-12 17:14:54 -06:00
parent ca3b13a790
commit 78da12adb4
3 changed files with 13 additions and 3 deletions


@@ -8,7 +8,7 @@ its training data to calculate the probability of each word given the preceeding
the more high-probability tokens are more likely to be AI-originated. Techniques and tools in this repo are looking for
faster approximation to be embeddable and more scalable.
-## LZMA compression detector (`lzma_detect.py`)
+## LZMA compression detector (`lzma_detect.py` and `nlzmadetect`)
This is the first attempt, using the LZMA compression ratios as a way to indirectly measure the perplexity of a text.
Compression ratios have been used in the past to [detect anomalies in network data](http://owncloud.unsri.ac.id/journal/security/ontheuse_compression_Network_anomaly_detec.pdf)
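
For readers of this hunk, the idea of using an LZMA compression ratio as a rough stand-in for perplexity can be sketched in Python as below. This is a minimal illustration and not the code from `lzma_detect.py`: the function name `compression_ratio` and the two sample strings are assumptions made for the example, and the preset value 2 is borrowed from the `COMPRESSION_PRESET` constant in the Nim hunk further down.

```python
import lzma
import random
import string


def compression_ratio(text: str, preset: int = 2) -> float:
    """Return compressed size divided by raw size for the UTF-8 bytes of text.

    Hypothetical helper for illustration; lower values mean more predictable text.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    compressed = lzma.compress(raw, preset=preset)
    return len(compressed) / len(raw)


# Highly repetitive text compresses well, so its ratio is low.
repetitive = "the cat sat on the mat. " * 200

# Pseudo-random letters are hard to predict, so the ratio is much higher.
rng = random.Random(0)
varied = "".join(rng.choice(string.ascii_letters) for _ in range(4800))

print("repetitive:", compression_ratio(repetitive))
print("varied:    ", compression_ratio(varied))
```

A lower ratio here only says the compressor found the text predictable. How that number maps to an "AI-generated" verdict is not shown in this diff.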


@@ -0,0 +1,10 @@
# Nim package to classify test as LLM-generated
This is a nim version of the LZMA detector written in Python.
## Instructions
Build with `nimble build` optionally passing `-d:release` for more optimized output.
Run `./nlzmadetect` with a filename to check (or multiple)
Test against the samples repository with `nimble test`


@@ -5,10 +5,10 @@ import strutils
when isMainModule:
  import std/[parseopt, os]
-const PRELUDE_FILE = "../ai-generated.txt"
+const PRELUDE_FILE = "../../ai-generated.txt"
const COMPRESSION_PRESET = 2.int32
const SHORT_SAMPLE_THRESHOLD = 350
-var PRELUDE_STR = readFile(PRELUDE_FILE).convert("us-ascii", "UTF-8").replace(re"[^\x00-\x7F]")
+const PRELUDE_STR = staticRead(PRELUDE_FILE)
proc compress_str(s : string, preset = COMPRESSION_PRESET): float64
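
The hunk above replaces a runtime `readFile` plus ASCII filtering with `staticRead`, which reads the prelude file at compile time and embeds it in the binary. Python has no direct equivalent, so the sketch below only illustrates how a prelude string might be combined with a sample and compressed. The scoring rule (the change in compression ratio after appending the sample) is an assumption, since the body of `compress_str` and the classification logic are not part of this diff. The names `to_ascii`, `ratio`, and `prelude_delta` are hypothetical.

```python
import lzma
import re

COMPRESSION_PRESET = 2  # mirrors the constant shown in the diff


def to_ascii(text: str) -> str:
    """Drop every non-ASCII character, like the regex in the removed line."""
    return re.sub(r"[^\x00-\x7F]", "", text)


def ratio(text: str, preset: int = COMPRESSION_PRESET) -> float:
    """Compressed size over raw size for ASCII-only text."""
    raw = text.encode("ascii")
    if not raw:
        raise ValueError("text must be non-empty")
    return len(lzma.compress(raw, preset=preset)) / len(raw)


def prelude_delta(prelude: str, sample: str) -> float:
    """Assumed score: how much appending the sample shifts the prelude's ratio.

    A sample that resembles the prelude adds little new information, so the
    combined text compresses about as well and the delta stays small.
    """
    prelude = to_ascii(prelude)
    sample = to_ascii(sample)
    return ratio(prelude + sample) - ratio(prelude)
```

Under this assumed scoring, a sample drawn from the same distribution as the prelude should produce a smaller delta than unrelated text. Any threshold separating the two would need to be tuned against real data.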