English Dictionary, Chinese Dictionary: Word104.com



  • Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs
    In this paper, we introduce a new black-box attack vector called the Sandwich attack: a multi-language mixture attack, which manipulates state-of-the-art LLMs into generating harmful and misaligned responses.
  • Sandwich Attack: Multi-language Mixture Adaptive Attack on LLMs - ACL Anthology
    In this paper, we introduce a new black-box attack vector called the Sandwich Attack: a multi-language mixture attack, which manipulates state-of-the-art LLMs into generating harmful and misaligned responses.
  • Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs : r/LocalLLaMA - Reddit
    The Sandwich attack is a black-box, multi-language mixture attack on LLMs that elicits harmful and misaligned responses from the model. In this attack, we use different low-resource languages to create a prompt of five questions and keep the adversarial question in the middle.
  • Vulnerabilities in Language Models: The Sandwich Attack
    In this paper, we introduce a new black-box attack vector called the Sandwich attack: a multi-language mixture attack, which manipulates state-of-the-art LLMs into generating harmful and misaligned responses.
  • sandwich-attack-multi-language-mixture-adaptive-attack-on-llms.md
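    This study examines the challenges that large language models (LLMs) face in widespread use, particularly in safety and multilingual capability. LLMs have made significant progress in understanding and generating multilingual content, but they also carry the risk of being manipulated by malicious actors into generating harmful content. These challenges include ensuring that LLM responses align with human values, preventing harmful content, and addressing the performance imbalance between languages with different resource levels. Although model providers have patched many similar attack vectors, making LLMs more robust to language-based manipulation, new attack methods remain, such as the "Sandwich attack" proposed in this paper, which can manipulate state-of-the-art LLMs into generating harmful and misaligned responses. Prior research has focused mainly on safety training methods that align LLM responses with human values in order to prevent harmful outputs. These methods include adversarial training, red teaming, reinforcement learning from human feedback (RLHF), and input/output filtering.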
  • Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs - arXiv.org
    A Sandwich attack is a multilingual mixture adaptive attack that creates a prompt with a series of five questions in different low-resource languages, hiding the adversarial question in the middle position.
  • ACL Anthology
    %T Sandwich attack: Multi-language Mixture Adaptive Attack on LLMs
    %A Upadhayay, Bibek
    %A Behzadan, Vahid
    %Y Ovalle, Anaelia
    %Y Chang, Kai-Wei
    %Y Cao, Yang Trista
    %Y Mehrabi, Ninareh
    %Y Zhao, Jieyu
    %Y Galstyan, Aram
    %Y Dhamala, Jwala
    %Y Kumar, Anoop
    %Y Gupta, Rahul
    %S Proceedings of the 4th Workshop on Trustworthy Natural Language Processing
  • probe: Adapt sandwich attack to auto-find effective languages
    The "sandwich attack" gives a few statements, each in a different language, to an LLM, with a malicious instruction in the middle.


















Chinese Dictionary - English Dictionary  2005-2009
