arXiv Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Name
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Home page
https://yiyibooks.cn/arxiv/2310.03693v1/index.html
Original URL
https://arxiv.org/abs/2310.03693
Description
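Optimizing large language models (LLMs) for downstream use cases often involves customizing pre-trained LLMs through further fine-tuning. Meta's open release of the Llama models and OpenAI's API for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice... ...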