gpt-oss-20b-GGUF - Hugging Face
I'm getting the error "ValueError: np.uint32(39) is not a valid GGMLQuantizationType" when trying to serve the quantized version with vLLM v0.11.1.
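A GGUF reader typically converts each tensor's raw type id into an enum. If the installed enum has no member for that id, Python's Enum lookup raises exactly this kind of ValueError. The "np.uint32(39)" part is just NumPy 2's repr of the raw value. The sketch below is a minimal illustration under stated assumptions: the GGMLQuantizationType class here is a hypothetical, trimmed stand-in for the one in the gguf Python package, and parse_tensor_type and find_unknown_types are hypothetical helpers. In recent ggml releases id 39 appears to correspond to MXFP4, the format gpt-oss weights use, but treat that mapping as an assumption to verify against your gguf version.

```python
from enum import IntEnum


# Hypothetical, trimmed stand-in for gguf's GGMLQuantizationType.
# Only a few members are listed; these ids follow ggml (F32=0, F16=1, Q4_0=2, Q8_0=8).
# An older reader simply lacks newer members such as id 39.
class GGMLQuantizationType(IntEnum):
    F32 = 0
    F16 = 1
    Q4_0 = 2
    Q8_0 = 8


def parse_tensor_type(raw: int) -> GGMLQuantizationType:
    """Map a raw type id (as read from a GGUF tensor header) to the enum.

    Raises ValueError("<raw> is not a valid GGMLQuantizationType") when the
    id is unknown to this enum, which is the failure reported above.
    """
    return GGMLQuantizationType(raw)


def find_unknown_types(raw_ids):
    """Return the raw ids this (hypothetical) reader cannot interpret."""
    unknown = []
    for raw in raw_ids:
        try:
            parse_tensor_type(raw)
        except ValueError:
            unknown.append(raw)
    return unknown


if __name__ == "__main__":
    print(find_unknown_types([0, 8, 39]))
```

Run as a script, this reports [39] as the only id the trimmed enum cannot decode. In practice, the fix direction is a reader (and serving stack) whose enum includes the type the file actually uses, not a runtime flag.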
MoE quantization - Quantization - vLLM Forums
This limitation is confirmed in recent vLLM issues and is not resolved by changing runtime flags or environment variables. The only workaround is to use a different model or quantization format that does not trigger this unsupported code path, or to wait for an upstream fix in vLLM that adds support for this feature to the Marlin MoE kernels.
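Analysis of the Gemma-3-27b-it QAT GGUF model loading issue in the Ollama project
A user tried to load Google's QAT (Quantization-Aware Training) quantized release of Gemma-3-27b-it through Ollama. Although the model appeared in the local model list, running it failed with a "model not found" or "file does not exist" error. Compared with an ordinary GGUF model, this QAT-quantized model should deliver better performance at the Q4 quantization level.
The root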