Small mods to the Nim source

Signed-off-by: Jacob Torrey <jacob@thinkst.com>
2023-05-12 17:14:54 -06:00 · commit 78da12adb4
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ its training data to calculate the probability of each word given the preceeding
 the more high-probability tokens are more likely to be AI-originated. Techniques and tools in this repo are looking for
 faster approximation to be embeddable and more scalable.

-## LZMA compression detector (`lzma_detect.py`)
+## LZMA compression detector (`lzma_detect.py` and `nlzmadetect`)

 This is the first attempt, using the LZMA compression ratios as a way to indirectly measure the perplexity of a text.
 Compression ratios have been used in the past to [detect anomalies in network data](http://owncloud.unsri.ac.id/journal/security/ontheuse_compression_Network_anomaly_detec.pdf)
--- a/nlzmadetect/README.md
+++ b/nlzmadetect/README.md
@@ -0,0 +1,10 @@
+# Nim package to classify text as LLM-generated
+
+This is a Nim version of the LZMA detector written in Python.
+
+## Instructions
+Build with `nimble build`, optionally passing `-d:release` for more optimized output.
+
+Run `./nlzmadetect` with one or more filenames to check.
+
+Test against the samples repository with `nimble test`.
--- a/nlzmadetect/src/nlzmadetect.nim
+++ b/nlzmadetect/src/nlzmadetect.nim
@@ -5,10 +5,10 @@ import strutils
 when isMainModule:
  import std/[parseopt, os]

-const PRELUDE_FILE = "../ai-generated.txt"
+const PRELUDE_FILE = "../../ai-generated.txt"
 const COMPRESSION_PRESET = 2.int32
 const SHORT_SAMPLE_THRESHOLD = 350
-var PRELUDE_STR = readFile(PRELUDE_FILE).convert("us-ascii", "UTF-8").replace(re"[^\x00-\x7F]")
+const PRELUDE_STR = staticRead(PRELUDE_FILE)

 proc compress_str(s : string, preset = COMPRESSION_PRESET): float64