ATLAS Research - Guiding Principles

At ATLAS, we aim to strengthen AI systems by rigorously probing their limitations. By identifying weaknesses before adversaries do, we work to stay ahead of potential threats while contributing to the improvement of current technologies. Our research spans both technical security challenges and broader conceptual questions in AI safety. We explore diverse perspectives to uncover new ways of understanding and addressing safety concerns. Our findings are shared openly through publications at both technical and theoretical conferences.

Technical Research

One Agent to Rule Them All: How One Malicious Agent Hijacks A2A System (2025)
BlackHat SECTOR 2025🎉

Authors: Adar Peleg, Dvir Alsheich, Shaked Adi, Amit LeVi, Rom Himelstein, Avi Mendelson and Stav Cohen
Equal contribution

Explore our publication that focuses on severe vulnerabilities in Google's A2A framework. Gain a deeper understanding of the attack surface with real-world examples on this new framework. Presented at BlackHat.

Impact of Jailbreak Attacks on Code Generation Inefficiencies in LLMs (2025)

Authors: Ori Suliman, Anat Gindin, Amit LeVi, Rom Himelstein and Avi Mendelson
Equal contribution *

This study investigates how jailbreak attacks affect the behavior and efficiency of code-generation LLMs by evaluating existing baselines and introducing novel attack methods that expose hidden vulnerabilities. Our goal is to map the attack surface of LLM-based coding tools and inform the design of more robust, security-aware systems.

Theoretical Research

Jailbreak Attack Initializations as Extractors of Compliance Directions (2025)

Authors: Amit LeVi, Rom Himelstein, Yaniv Nemkovsky, Chaim Baskin and Avi Mendelson
Equal contribution

Explore our latest work on jailbreak attack initializations. We reveal how certain initializations encode a shared compliance direction in safety-aligned LLMs - enabling faster, more generalizable jailbreaks. We introduce CRI, a novel initialization framework that leverages this insight to accelerate jailbreak attacks by up to 100×. Check out our paper and code.

Uncovering Hidden Biases in LLMs via Jailbreak Attacks (2025)

Authors: Amit LeVi, Rom Himelstein, Brit Youngmann., Avi Mendelson, Yaniv Nemkovsky
Equal contribution *

This work uses a novel method to uncover hidden biases in LLMs that standard prompts fail to reveal. By testing multiple models, we expose deeper discrepancies and introduce a benchmark, evaluation framework, and public leaderboard to track bias and fairness. Our goal is to improve transparency and support more informed use of LLMs in real-world applications.

Where Unlearning Fails: Insights From the Unlearning Direction (2025)

Authors: Ofir Shabat, Pol Fuentes Camacho, Rom Himelstein, Amit LeVi, Avi Mendelson, Liran Ringel
Equal contribution *

As data privacy demands grow, effective machine unlearning for LLMs is becoming essential. This work evaluates how well current unlearning methods truly erase sensitive information without compromising model performance. We introduce a rigorous framework to test and compare unlearning strategies, highlighting key trade-offs between utility and true data removal.

LLM-Based Recommendation Steering (2025)

Authors: Ohad Elgamil, Yoav Raichshtein, Yarin Bekor, Rom Himelstein, Amit LeVi, Avi Mendelson
Equal contribution *

LLM-based recommendation systems use language understanding to predict and personalize user preferences with high accuracy. In this work, we introduce a method to shift item representations within the model’s activation space, boosting recommendation rates for specified content - like new products or underexposed items - without requiring retraining or harming overall performance.

Semantic Reasoning via Masked Augmented Segments Training (2025)

Authors: Nitzan Ron, Amit LeVi, Avi Mendelson
Equal contribution *

Computer vision has made major strides in semantic segmentation, with self-supervised learning helping reduce the need for labeled data. Yet, current methods struggle to model contextual relationships between distinct semantic regions. Approaches like MaskFormer and SemMAE rely on patches or heuristics, limiting object-level reasoning. Vision Transformers excel at local perception but fall short on capturing semantic interplay across objects. Our work introduces a strategy aimed at improving semantic reasoning by leveraging the structure of annotated scenes—achieving noticeable gains on ADE20K without additional compute.

ATLAS Research - Guiding Principles

Technical Research

One Agent to Rule Them All: How One Malicious Agent Hijacks A2A System (2025) BlackHat SECTOR 2025🎉

Authors: Adar Peleg*, Dvir Alsheich*, Shaked Adi*, Amit LeVi, Rom Himelstein, Avi Mendelson and Stav Cohen Equal contribution *