- GitHub - elder-plinius/OBLITERATUS: OBLITERATE THE CHAINS THAT BIND YOU
OBLITERATUS is the most advanced open-source toolkit for understanding and removing refusal behaviors from large language models, and every single run makes it smarter.
- elder-plinius OBLITERATUS | DeepWiki
OBLITERATUS (obliteratus Python package, v0.1.2) is an open-source toolkit for abliteration: the process of locating and surgically removing refusal behaviors from large language models through direct weight modification or inference-time steering, without retraining.
- OBLITERATUS/gemma-4-E4B-it-OBLITERATED · Hugging Face
Google built Gemma 4 with guardrails. We built OBLITERATUS to tear them off. They said their architecture was different. They were right: it broke every tool we threw at it. NaN activations, shared KV weights, thinking mode: Gemma 4 fought back harder than any model we've cracked. It still lost. 🐉 0% hard refusal
- INTRODUCING: OBLITERATUS!!! GUARDRAILS-BE-GONE! ⛓️ OBLITERATUS is . . .
But here's what truly sets it apart: OBLITERATUS is a crowd-sourced research experiment. Every time you run it with telemetry enabled, your anonymous benchmark data (refusal geometries, method comparisons, hardware profiles) feeds a growing community dataset at a scale no single lab could achieve.
- Obliteratus — OBLITERATUS: abliterate LLM refusals (diff-in-means . . .
OBLITERATUS: abliterate LLM refusals (diff-in-means). The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
- OBLITERATUS: Mapping the Geometry of Refusal Inside Large Language Models
OBLITERATUS is an open-source toolkit that uses mechanistic interpretability to locate and remove refusal directions in transformer weights, without retraining. Understanding how refusal works geometrically is the first step to building better AI safety.
- OBLITERATUS and the Science of AI Jailbreaking
OBLITERATUS is an experimental toolkit designed to analyze and modify refusal behaviors in open-weight LLMs. Modern language models often refuse to answer certain prompts due to safety training.
- OBLITERATUS Review: Open-Source Toolkit Uncensors 116 LLMs
This is a review of what OBLITERATUS actually ships, how it works technically, what it breaks, and why a small group of alignment researchers has been quietly sounding the alarm since the repo dropped.