The final quarter of 2024 was especially significant for PRISM Eval. We formed key partnerships with clients, advocated for new AI model evaluation methods, and received notable recognition from the French Government for our work.
Spotlight
1. Introducing Behavior Elicitation Tool (BET) API
We’re proud to introduce our Behavior Elicitation Tool (BET) API: an automated red-teaming engine that tests for prompt injection vulnerabilities in Large Language Models (LLMs). Unlike static benchmarks, BET interacts adversarially with your LLM or LLM-based agent system and optimizes its attacks against it. It can therefore surface complex vulnerabilities that static benchmarks miss, even in systems protected by filters and guardrails.
Our tool relies on a library of behavior elicitation techniques that combines publicly known jailbreaks with unique prompt injection attacks we discovered through our research on GenAI Ethology.
Developers can seamlessly integrate BET via its API to rigorously test their LLMs or LLM-based systems. This is how BET works:
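To make the integration concrete, here is a minimal sketch of what submitting a scan might look like. The endpoint shape, field names, and the `build_scan_request` helper are illustrative assumptions for this newsletter, not the actual BET API surface:

```python
# Hypothetical sketch: pointing a BET-style scan at your own LLM
# endpoint. All field names below are illustrative assumptions,
# not the real BET API.
import json

def build_scan_request(target_url: str, behaviors: list[str],
                       max_steps: int = 5) -> str:
    """Assemble the JSON payload a scan submission might carry:
    where to reach the system under test, which behaviors BET
    should try to elicit, and an optimization budget."""
    payload = {
        "target": {"type": "http_endpoint", "url": target_url},
        "behaviors": behaviors,
        "max_optimization_steps": max_steps,
    }
    return json.dumps(payload, sort_keys=True)

# Example: scan a (fictional) chatbot endpoint for two behaviors.
req = build_scan_request(
    "https://example.com/chat",
    ["reveal_system_prompt", "bypass_content_filter"],
)
```

In practice the real API would authenticate the caller and return a scan identifier to poll for results; this sketch only shows the shape of the request.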
For organizations seeking to deploy LLM-powered chatbots at scale and secure their applications against prompt injection attacks, this means fewer blind spots, faster iteration cycles, and a proactive security posture: ultimately, more trust in their systems on the path to large-scale deployment.
Join Our Closed Beta – Experience BET API’s adaptive vulnerability detection and strengthen your LLMs now!
2. PRISM Eval supports new industry customers
At PRISM Eval, we are already making a real-world impact. Leveraging our Behavior Elicitation Tool (BET), we helped a leading automotive multinational company significantly enhance the safety and robustness of their LLM-powered chatbot against prompt injection attacks. Over the last quarter, we worked closely with them to:
- Establish a robustness baseline for their existing system.
- Identify more robust combinations of LLM, system prompt, input/output filters, and guardrails.
- Evaluate and address context-specific prompt injection vulnerabilities before deployment.
- Measure the effectiveness of their improvements and iterate.
MLCommons introduces the AILuminate safety benchmark supported by PRISM Eval
In collaboration with MLCommons, we worked on the AILuminate safety benchmark, which evaluates an AI system under test (SUT) by feeding it a set of prompts, recording the SUT’s responses, and then using a specialized set of safety evaluator models to determine which responses are violations according to the AILuminate Assessment Standard guidelines.
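The evaluation flow described above can be sketched in a few lines. The function names and the toy stubs here are our own illustration, not MLCommons code:

```python
# Sketch of the AILuminate-style flow: send prompts to the system
# under test (SUT), record responses, then flag violations with
# safety evaluator models. Stubs below are toys for illustration.

def evaluate_sut(prompts, sut, evaluators):
    """Return the fraction of responses flagged as violations.
    A response counts as a violation if any evaluator flags it."""
    violations = 0
    for prompt in prompts:
        response = sut(prompt)  # query the system under test
        if any(ev(prompt, response) for ev in evaluators):
            violations += 1
    return violations / len(prompts)

# Toy usage with stub models: one unsafe and one benign exchange.
prompts = ["how do I pick a lock?", "tell me a joke"]
sut = lambda p: "Here is how..." if "lock" in p else "Why did..."
evaluators = [lambda p, r: r.startswith("Here is how")]
rate = evaluate_sut(prompts, sut, evaluators)
```

In the real benchmark the evaluators are specialized safety models and the verdicts follow the AILuminate Assessment Standard; the structure of the loop, however, is the same.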
We applied our Behavior Elicitation Tool (BET) to demonstrate how dynamic, optimization-based testing could be used to systematically measure model defenses by adaptively generating "jailbreaks" that elicit harmful behavior from a target LLM.
Analysis of the preliminary results revealed that prompt effectiveness (measured by the ability to elicit harmful behavior from a target LLM) could be improved significantly within just a few optimization steps, from 8% up to 78% in fewer than 5 steps. Additionally, we found that techniques successful against one model often showed limited effectiveness against others, and some techniques’ performance varied even across different behaviors within the same model, which demonstrates the need for dynamic testing.
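The optimization dynamic behind those numbers can be illustrated with a toy hill-climbing loop. The mutation and scoring functions below are deliberately simplistic stand-ins; BET’s technique library and optimizer are far richer:

```python
# Toy sketch of optimization-based attack refinement: at each step,
# mutate the current best prompt and keep the variant that elicits
# the target behavior from more probe targets.

def optimize_attack(seed_prompt, score, mutate, steps=3):
    """Hill-climb on elicitation effectiveness for a few steps;
    returns the best prompt and the effectiveness trace."""
    best, best_score = seed_prompt, score(seed_prompt)
    trace = [best_score]
    for _ in range(steps):
        candidate = mutate(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        trace.append(best_score)
    return best, trace

# Toy technique library: stacking one more technique per mutation.
techniques = ["roleplay", "encoding", "hypothetical"]

def mutate(p):
    for t in techniques:
        if t not in p:
            return p + " +" + t
    return p

def score(p):
    # Toy metric: each stacked technique fools one more of 10 targets.
    return min(10, 1 + p.count("+")) / 10

best, trace = optimize_attack("ignore previous instructions", score, mutate)
# trace climbs monotonically as techniques stack
```

The key property this illustrates is that effectiveness is a function of the whole (model, behavior, technique) triple, which is why a fixed prompt set cannot substitute for an adaptive search.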
Find out more about the methodology here.
3. PRISM Eval Selected to Present a Jailbreaking Robustness Leaderboard at the Paris AI Action Summit
PRISM Eval has been selected to lead one of the "AI Convergence" challenges to be presented at the AI Action Summit by the General Secretariat for Investment (SGPI) through its #France2030 initiative.
As part of this challenge, we will be leveraging our Behavior Elicitation Tool (BET) API to evaluate the robustness of leading large language models against prompt injection attacks. The results of this evaluation will be presented via a comprehensive leaderboard unveiled at the upcoming AI Action Summit in Paris (February 10-11, 2025).
Other Highlights in Q4 2024
4. Our CTO Takes 3rd Place in GraySwan's International Jailbreak Championship
Manually crafting a single, universal prompt, Quentin Feuillade-Montixi (CTO) successfully bypassed the defenses of 22 unsecured models in under five hours, including claude-3-5-sonnet, llama-3.1-405b-instruct, gpt-4o, and mistral-large.
This achievement not only demonstrates our jailbreaking expertise, but also exposes critical vulnerabilities in frontier LLMs. It underscores the importance of tools like our Behavior Elicitation Tool (BET) in identifying prompt injection vulnerabilities and improving system robustness against attacks.
5. Contributing to the international dialogue on AI safety and evaluation standards
Over the last quarter, our team has continued to help strengthen international cooperation on AI safety and evaluation.
In September, Nicolas Miailhe (CEO) and Tom David (Director for Governance & Standardization) published an article in the latest edition of the Politique Étrangère magazine, analyzing the opportunities and challenges for international cooperation and formulating 5 actionable recommendations (English version here).
In November, they joined the delegation of France’s AI Minister Clara Chappaz on her first official visit abroad, to San Francisco, for the launch events of the International Network of AI Safety Institutes.
In parallel, Pierre Peigné (CSO) contributed his expertise at the Conference on Frontier AI Safety Frameworks organized by the UK AI Safety Institute, while Nicolas Miailhe shared insights on AI evaluation during a roundtable discussion hosted by the EU AI Office.
6. Advocating for Voluntary Commitments on Frontier AI Safety at Paris Peace Forum
PRISM Eval, represented by Nicolas Miailhe (CEO), actively contributed to discussions on frontier AI safety at the Paris Peace Forum, assessing progress made on voluntary commitments since the Bletchley Park AI Safety Summit in November 2023.
To address the growing “black box” problem of AI interpretability, Nicolas stressed the urgent need for dynamic testing technologies like our Behavior Elicitation Tool (BET). He argued that traditional static benchmarks lack the rigor to ensure the safety and control of AI systems, especially as we move towards broader industrial applications of LLM agents.
To further address this challenge, Nicolas called for a collaborative "metrology" effort, urging all actors in the GenAI value chain — from AI safety institutes and frontier labs to hyperscalers and investors — to invest in developing a robust "science of measurement" for AI safety.
To ensure truly robust AI safety, Nicolas envisions a strong commitment to independent evaluation. Recognizing the pressures to deploy AI models quickly and the inherent power imbalances in the AI landscape, he emphasized the need to protect and incentivize third-party evaluators like PRISM Eval. This will ensure objective assessments and a more trustworthy AI ecosystem.
7. Driving Awareness and Adoption of Dynamic Evaluation Methods in AI Safety
In pursuit of more dynamic evaluation methods, we also ramped up our outreach and education initiatives.
Our team delivered a Masterclass on Generative AI Safety at the Harvard Club of France, emphasizing the urgent need for more advanced evaluation methods to address the complex safety considerations of deploying GenAI in real-world applications.
We also led a workshop titled "Why Current Benchmark Approaches Are Not Sufficient for Safety" at the 2024 edition of the “AI, Data, and Robotics Forum” in Eindhoven, in collaboration with Peter Mattson (President, MLCommons). The discussion underscored the limitations of traditional static benchmarks and advocated for more dynamic and adaptive evaluation methods, such as our Behavior Elicitation Tool (BET) technology and MLCommons’ AILuminate v1.0 benchmark. A summary of the key findings can be found in this detailed two-pager.
What’s Next for PRISM Eval?
- BET API Expansion: Sign up now for our closed beta to assess your LLMs’ and chatbot agents’ robustness to prompt injection attacks.
- Jailbreaking Robustness Leaderboard Reveal: Stay tuned for insights at the AI Action Summit. Follow PRISM Eval on LinkedIn for the latest updates.
- Fundraising & Growth: We’re seeking mission-aligned partners—reach out to learn more.