OpenAI acknowledges that prompt injections – text-based attacks on language models running in browsers – may never be completely eliminated. Still, the company says it’s ”optimistic” about reducing the risks over time.
OpenAI has released a security update for the browser agent in ChatGPT Atlas. The update includes a newly adversarially trained model and enhanced security measures, prompted by a new class of prompt injection attacks discovered through OpenAI’s internal automated red-teaming.
Källa: OpenAI admits prompt injection may never be fully solved, casting doubt on the agentic AI vision
