top of page
Text Reflection on Glass

ATLAS Research - Guiding Principles

At ATLAS, we aim to strengthen AI systems by rigorously probing their limitations. By identifying weaknesses before adversaries do, we work to stay ahead of potential threats while contributing to the improvement of current technologies. Our research spans both technical security challenges and broader conceptual questions in AI safety. We explore diverse perspectives to uncover new ways of understanding and addressing safety concerns. Our findings are shared openly through publications at both technical and theoretical conferences.

Technical Research

One Agent to Rule Them All: How One Malicious Agent Hijacks A2A System (2025)
BlackHat SECTOR 2025🎉

Authors: Adar Peleg*, Dvir Alsheich*, Shaked Adi*,  Amit LeVi, Rom Himelstein, Avi Mendelson and Stav Cohen
Equal contribution *

Explore our publication that focuses on severe vulnerabilities in Google's A2A framework. Gain a deeper understanding of the attack surface with real-world examples on this new framework. Presented at BlackHat.

PHOTO-2025-06-25-18-35-46.jpg

Impact of Jailbreak Attacks on Code Generation Inefficiencies in LLMs (2025)

Authors: Ori Suliman*, Anat Gindin*,  Amit LeVi, Rom Himelstein and Avi Mendelson
Equal contribution *

This study investigates how jailbreak attacks affect the behavior and efficiency of code-generation LLMs by evaluating existing baselines and introducing novel attack methods that expose hidden vulnerabilities. Our goal is to map the attack surface of LLM-based coding tools and inform the design of more robust, security-aware systems.

image_2025-07-07_163556113.png

Theoretical Research

Jailbreak Attack Initializations as Extractors of Compliance Directions (2025)

Authors: Amit LeVi*, Rom Himelstein*, Yaniv Nemkovsky*, Chaim Baskin and Avi Mendelson
Equal contribution *

Explore our latest work on jailbreak attack initializations. We reveal how certain initializations encode a shared compliance direction in safety-aligned LLMs - enabling faster, more generalizable jailbreaks. We introduce CRI, a novel initialization framework that leverages this insight to accelerate jailbreak attacks by up to 100×. Check out our paper and code.

image_2025-07-07_164804533.png

Uncovering Hidden Biases in LLMs via Jailbreak Attacks (2025)

Authors: Amit LeVi*,  Rom Himelstein*, Brit Youngmann., Avi Mendelson, Yaniv Nemkovsky
Equal contribution *

This work uses a novel method to uncover hidden biases in LLMs that standard prompts fail to reveal. By testing multiple models, we expose deeper discrepancies and introduce a benchmark, evaluation framework, and public leaderboard to track bias and fairness. Our goal is to improve transparency and support more informed use of LLMs in real-world applications.

Screenshot 2025-06-25 at 13.19.38.png

Where Unlearning Fails: Insights From the Unlearning Direction (2025)

Authors: Ofir Shabat*, Pol Fuentes Camacho*, Rom Himelstein, Amit LeVi,  Avi Mendelson, Liran Ringel
Equal contribution *

As data privacy demands grow, effective machine unlearning for LLMs is becoming essential. This work evaluates how well current unlearning methods truly erase sensitive information without compromising model performance. We introduce a rigorous framework to test and compare unlearning strategies, highlighting key trade-offs between utility and true data removal.

PHOTO-2025-06-08-23-16-38_edited.jpg

LLM-Based Recommendation Steering (2025)

Authors: Ohad Elgamil*, Yoav Raichshtein*, Yarin Bekor,  Rom Himelstein, Amit LeVi,  Avi Mendelson
Equal contribution *

LLM-based recommendation systems use language understanding to predict and personalize user preferences with high accuracy. In this work, we introduce a method to shift item representations within the model’s activation space, boosting recommendation rates for specified content - like new products or underexposed items - without requiring retraining or harming overall performance.

image.png

Semantic Reasoning via Masked Augmented Segments Training (2025)

Authors: Nitzan Ron*, Amit LeVi*,  Avi Mendelson
Equal contribution *

Computer vision has made major strides in semantic segmentation, with self-supervised learning helping reduce the need for labeled data. Yet, current methods struggle to model contextual relationships between distinct semantic regions. Approaches like MaskFormer and SemMAE rely on patches or heuristics, limiting object-level reasoning. Vision Transformers excel at local perception but fall short on capturing semantic interplay across objects. Our work introduces a strategy aimed at improving semantic reasoning by leveraging the structure of annotated scenes—achieving noticeable gains on ADE20K without additional compute.

PHOTO-2025-05-20-21-11-02.jpg

Powered and secured by Wix

bottom of page