OpenAI tool to detect AI-generated content

OpenAI has launched an “AI Classifier” tool that attempts to distinguish between human-written and AI-generated text.

The classifier has been trained to distinguish between text written by a human and text written by AI systems from a variety of providers, including ChatGPT and GPT-3. The current classifier is not fully reliable, though. OpenAI says it is impossible to reliably detect all AI-written text, and the classifier currently identifies only 26% of AI-written text correctly. Still, OpenAI argues that, used in tandem with other methods, the tool could help prevent AI text generators from being abused.
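
To make the 26% figure concrete, here is a rough back-of-the-envelope sketch in Python. Only the 26% detection rate comes from the text above; the false-positive rate and the share of AI-written documents are assumed values chosen purely for illustration, and the function name is hypothetical.

```python
def precision(recall: float, false_positive_rate: float, ai_share: float) -> float:
    """Share of flagged documents that really are AI-written (Bayes' rule)."""
    true_flags = recall * ai_share                        # AI-written and flagged
    false_flags = false_positive_rate * (1.0 - ai_share)  # human-written but flagged
    return true_flags / (true_flags + false_flags)


RECALL = 0.26            # from the article: AI-written text correctly identified
ASSUMED_FPR = 0.09       # ASSUMPTION: human-written text wrongly flagged
ASSUMED_AI_SHARE = 0.10  # ASSUMPTION: 1 in 10 submitted documents is AI-written

print(f"AI-written texts missed: {1.0 - RECALL:.0%}")
print(f"Flagged texts that are really AI-written: "
      f"{precision(RECALL, ASSUMED_FPR, ASSUMED_AI_SHARE):.1%}")
```

Under these assumed numbers, roughly three out of four AI-written texts slip through, and most flagged texts would in fact be human-written. Different assumptions give different results, but the sketch shows why the tool is better suited as one signal among several.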

Each document submitted is classified into one of five classes:

Very unlikely AI-generated,
Unlikely AI-generated,
Unclear if it is AI-generated,
Possibly AI-generated, or
Likely AI-generated.
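
One plausible way to arrive at five classes like these is to cut a single probability score into bands. The sketch below assumes the classifier yields a score between 0 and 1; the cut-off values and the function name are illustrative assumptions, not figures taken from this article.

```python
from bisect import bisect_right

# The five labels listed above, ordered from lowest to highest score.
LABELS = [
    "Very unlikely AI-generated",
    "Unlikely AI-generated",
    "Unclear if it is AI-generated",
    "Possibly AI-generated",
    "Likely AI-generated",
]

# ASSUMPTION: hypothetical cut-offs between neighbouring labels.
ASSUMED_THRESHOLDS = [0.10, 0.45, 0.90, 0.98]


def label_for(score: float) -> str:
    """Map a probability-like score in [0, 1] to one of the five labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # bisect_right counts how many cut-offs the score has reached or passed.
    return LABELS[bisect_right(ASSUMED_THRESHOLDS, score)]


for example_score in (0.05, 0.50, 0.99):
    print(example_score, "->", label_for(example_score))
```

With cut-offs like these, the top label is reserved for very high scores, which trades missed detections for fewer false accusations.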

According to OpenAI’s blog, because the classifier still has a number of limitations to be sorted out, it should complement other methods of determining the source of a text rather than serve as the primary decision-making tool.

OpenAI will continue its work on detecting AI-generated text and will use the currently released version to learn whether tools like this are useful.

Here are some of the current limitations, as listed by OpenAI:

The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.
Sometimes human-written text will be incorrectly but confidently labeled as AI-written by our classifier.
We recommend using the classifier only for English text. It performs significantly worse in other languages and it is unreliable on code.
Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.
AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long-term.
Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.
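
Some of these limitations can be checked before a text is submitted at all. The sketch below is a minimal pre-check: the 1,000-character limit comes from the list above, while the ASCII-letter test is only a crude stand-in for "is this English?" (it cannot tell English from, say, Dutch). The function name and the 0.9 cut-off are assumptions.

```python
MIN_CHARS = 1000               # from the list above: shorter texts are very unreliable
ASSUMED_MIN_ASCII_SHARE = 0.9  # ASSUMPTION: crude proxy for Latin-script, English-like text


def reliability_warnings(text: str) -> list:
    """Return reasons why a classifier verdict on this text deserves extra caution."""
    warnings = []
    if len(text) < MIN_CHARS:
        warnings.append(
            f"Only {len(text)} characters; results below {MIN_CHARS} are very unreliable."
        )
    letters = [ch for ch in text if ch.isalpha()]
    if letters:
        ascii_share = sum(ch.isascii() for ch in letters) / len(letters)
        if ascii_share < ASSUMED_MIN_ASCII_SHARE:
            warnings.append("Text does not look like English; results may be much worse.")
    return warnings


print(reliability_warnings("A short note."))
```

An empty list does not mean the verdict is trustworthy. It only means these two known weak spots were not triggered.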

In conclusion, on the positive side, this is a step in the right direction and will become an invaluable tool, especially for educators and educational institutions. Responsible AI has long been front of mind for many people, and OpenAI has been very open about its focus and due diligence regarding responsible AI. On the other hand, there is no easy way to solve the problems AI-generated text poses, and possibly there never will be.

Thanks for Reading. Stay Tuned!