Update README.md

pull/6/head
Jacob Torrey 2023-05-12 17:17:31 -06:00
parent 78da12adb4
commit 0c9a712ad3
1 changed file with 3 additions and 2 deletions

@@ -1,7 +1,5 @@
# ai-detect: Fast methods to classify text as AI or human-generated
[![Classification accuracy testing](https://github.com/Tail-Pipe/ai-detect/actions/workflows/pytest.yml/badge.svg)](https://github.com/Tail-Pipe/ai-detect/actions/workflows/pytest.yml)
This is a research repo for fast AI detection methods as we experiment with different techniques.
While there are a number of existing LLM detection systems, they all use a large model trained on either an LLM or
its training data to calculate the probability of each word given the preceding words, then calculating a score where
@@ -10,6 +8,9 @@ faster approximation to be embeddable and more scalable.
## LZMA compression detector (`lzma_detect.py` and `nlzmadetect`)
[![Python classification accuracy testing](https://github.com/Tail-Pipe/ai-detect/actions/workflows/pytest.yml/badge.svg)](https://github.com/Tail-Pipe/ai-detect/actions/workflows/pytest.yml)
[![Nim classification accuracy testing](https://github.com/Tail-Pipe/ai-detect/actions/workflows/nimtest.yml/badge.svg)](https://github.com/Tail-Pipe/ai-detect/actions/workflows/nimtest.yml)
This is the first attempt, using the LZMA compression ratios as a way to indirectly measure the perplexity of a text.
Compression ratios have been used in the past to [detect anomalies in network data](http://owncloud.unsri.ac.id/journal/security/ontheuse_compression_Network_anomaly_detec.pdf)
for intrusion detection, so if perplexity is roughly a measure of anomalous tokens, it may be possible to use compression to detect low-perplexity text.
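
To make the idea concrete, here is a minimal sketch of a compression ratio used as a rough stand-in for perplexity. It is not the code in `lzma_detect.py`. The function names `compression_ratio` and `ratio_shift`, and the idea of comparing a sample against a reference text, are assumptions made for illustration only.

```python
import lzma


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size, both in bytes.

    A lower ratio means the text is more redundant, i.e. more predictable
    to the compressor.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    return len(lzma.compress(raw)) / len(raw)


def ratio_shift(reference: str, sample: str) -> float:
    """Hypothetical helper: change in ratio when `sample` follows `reference`.

    If the sample looks like the reference, the compressor can reuse what it
    has already seen and the ratio drops; unfamiliar text pushes it up.
    """
    return compression_ratio(reference + sample) - compression_ratio(reference)
```

With a reference made of known AI-generated text, a lower `ratio_shift` would suggest that the sample is more predictable given that reference. Any decision threshold would have to be tuned empirically, and very short samples are dominated by the fixed overhead of the LZMA container.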