mirror of https://github.com/thinkst/zippy
Small mods to the Nim source
Signed-off-by: Jacob Torrey <jacob@thinkst.com>
pull/6/head
parent ca3b13a790
commit 78da12adb4
@@ -8,7 +8,7 @@ its training data to calculate the probability of each word given the preceeding
 the more high-probability tokens are more likely to be AI-originated. Techniques and tools in this repo are looking for
 faster approximation to be embeddable and more scalable.

-## LZMA compression detector (`lzma_detect.py`)
+## LZMA compression detector (`lzma_detect.py` and `nlzmadetect`)

 This is the first attempt, using the LZMA compression ratios as a way to indirectly measure the perplexity of a text.
 Compression ratios have been used in the past to [detect anomalies in network data](http://owncloud.unsri.ac.id/journal/security/ontheuse_compression_Network_anomaly_detec.pdf)
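
The README text in the hunk above describes using LZMA compression ratios as an indirect measure of perplexity. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the repository's actual `lzma_detect.py`: the function names `compression_ratio` and `classify` are hypothetical, the default preset of 2 simply mirrors the `COMPRESSION_PRESET` constant visible in the Nim source in this commit, and the decision rule (compare the ratio of the prelude alone with the ratio of prelude plus sample) is an assumed reading of the approach.

```python
import lzma
from typing import Tuple


def compression_ratio(text: str, preset: int = 2) -> float:
    """Return compressed size divided by original size (lower = more compressible)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty text")
    return len(lzma.compress(raw, preset=preset)) / len(raw)


def classify(sample: str, prelude: str, preset: int = 2) -> Tuple[str, float]:
    """Hypothetical decision rule: label a sample by how it changes the prelude's ratio.

    `prelude` is assumed to be a corpus of known AI-generated text. If appending
    the sample makes the combined text at least as compressible as the prelude
    alone, the sample resembles the prelude and is labelled "AI".
    """
    prelude_ratio = compression_ratio(prelude, preset)
    combined_ratio = compression_ratio(prelude + "\n" + sample, preset)
    delta = prelude_ratio - combined_ratio
    return ("AI" if delta >= 0 else "Human", delta)
```

The sign of `delta` gives the label and its magnitude can serve as a rough confidence score. Any threshold handling for short samples is left out because the source does not show it.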
@@ -0,0 +1,10 @@
+# Nim package to classify test as LLM-generated
+
+This is a nim version of the LZMA detector written in Python.
+
+## Instructions
+Build with `nimble build` optionally passing `-d:release` for more optimized output.
+
+Run `./nlzmadetect` with a filename to check (or multiple)
+
+Test against the samples repository with `nimble test`
@@ -5,10 +5,10 @@ import strutils
 when isMainModule:
   import std/[parseopt, os]

-const PRELUDE_FILE = "../ai-generated.txt"
+const PRELUDE_FILE = "../../ai-generated.txt"
 const COMPRESSION_PRESET = 2.int32
 const SHORT_SAMPLE_THRESHOLD = 350
-var PRELUDE_STR = readFile(PRELUDE_FILE).convert("us-ascii", "UTF-8").replace(re"[^\x00-\x7F]")
+const PRELUDE_STR = staticRead(PRELUDE_FILE)

 proc compress_str(s : string, preset = COMPRESSION_PRESET): float64
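
The `readFile(...)` variant of `PRELUDE_STR` shown above loads the prelude at run time and strips every character outside the 7-bit ASCII range, while the `staticRead` variant embeds the file contents at compile time without that cleanup step. As a rough Python analogue of the ASCII cleanup only, here is a small sketch. The helper names `strip_non_ascii` and `load_prelude` are hypothetical, and decoding with `errors="ignore"` is an assumption rather than a faithful translation of Nim's `convert`.

```python
import re

# Same character class as the Nim regex: anything outside 0x00-0x7F.
_NON_ASCII = re.compile(r"[^\x00-\x7F]")


def strip_non_ascii(text: str) -> str:
    """Remove every non-ASCII character from text."""
    return _NON_ASCII.sub("", text)


def load_prelude(path: str) -> str:
    """Read a prelude file as UTF-8 and return its ASCII-only content."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return strip_non_ascii(fh.read())
```

For example, `strip_non_ascii("café ☕ ok")` returns `"caf  ok"`, with the two removed characters leaving their surrounding spaces in place.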