VU#733789: ChatGPT-4o contains security bypass vulnerability through time and search functions called "Time Bandit"

Overview

ChatGPT-4o contains a jailbreak vulnerability called “Time Bandit” that allows an attacker to circumvent the safety guardrails of ChatGPT and instruct it to provide illicit or dangerous content. The jailbreak can be initiated in a variety of ways, but centrally requires the attacker to prompt the AI with questions about a specific time period in history. The jailbreak can be established in two ways: through the Search function, or by prompting the AI directly. Once this historical timeframe has been established in the ChatGPT conversation, the attacker can exploit timeline confusion and procedural ambiguity in subsequent prompts to circumvent the safety guidelines, resulting in ChatGPT generating illicit content. This information could be leveraged at scale by a motivated threat actor for malicious purposes.

Description

“Time Bandit” is a jailbreak vulnerability present in ChatGPT-4o that can be used to bypass safety restrictions within the chatbot and instruct it to generate content that breaks its safety guardrails. An attacker can exploit the vulnerability by beginning a session with ChatGPT and prompting it directly about a specific historical event or historical time period, or by instructing it to pretend it is assisting the user within a specific historical event. Once this has been established, the user can pivot the subsequent responses to various illicit topics through follow-up prompts. These prompts must be procedural, first instructing the AI to provide further details on the established time period before gradually pivoting toward illicit topics. All of these prompts must maintain the established timeframe for the conversation; otherwise, they may be detected as malicious and removed.

This jailbreak can also be achieved through the “Search” functionality. ChatGPT supports a Search feature, which allows a logged-in user to prompt ChatGPT with a question, after which it searches the web based on that prompt. By instructing ChatGPT to search the web for information surrounding a specific historical context, an attacker can continue the searches within that timeframe and eventually pivot to prompting ChatGPT directly about illicit subjects through the use of procedural ambiguity.

During testing, the CERT/CC was able to replicate the jailbreak, although ChatGPT removed the prompt provided and stated that it violated usage policies. Nonetheless, ChatGPT would then proceed to answer the removed prompt. This behavior was replicated several times in a row. The first variant, exploited through repeated direct prompts and procedural ambiguity, did not require authentication. The second, exploited through the Search function, requires the user to be authenticated. During testing, the jailbreak was more successful when using a timeframe within the 1800s or 1900s.

Impact

This vulnerability bypasses OpenAI's security and safety guidelines, allowing an attacker to abuse ChatGPT for instructions on, for example, how to make weapons or drugs, or for other malicious purposes. A jailbreak of this type exploited at scale by a motivated threat actor could enable a variety of malicious actions, such as the mass creation of phishing emails and malware. Additionally, the use of a legitimate service such as ChatGPT can act as a proxy for the attacker, hiding their malicious activity.

Solution

OpenAI has mitigated this vulnerability. An OpenAI spokesperson provided the following statement:
“It is very important to us that we develop our models safely. We don’t want our models to be used for malicious purposes. We appreciate you for disclosing your findings. We’re constantly working to make our models safer and more robust against exploits, including jailbreaks, while also maintaining the models’ usefulness and task performance.”

Acknowledgements

Thanks to Dave Kuszmar for reporting this vulnerability. This document was written by Christopher Cullen.