jecc's documents

jecc

ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems

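Retrieval-augmented generation (RAG) systems built on large language models (LLMs) often produce inaccurate responses because they retrieve irrelevant or only loosely related information. Existing methods operate at the document level and cannot filter out such content effectively. We propose ChunkRAG, an LLM-driven chunk-filtering framework that improves RAG systems by evaluating and filtering retrieved information at the chunk level ...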

2024/11/05 arXiv:2410.19572v3 jecc
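The difference between document-level and chunk-level filtering can be illustrated with a toy sketch. This is not ChunkRAG's actual pipeline. It assumes a fixed-size word-window chunker, and it replaces the LLM relevance judgment with a hypothetical token-overlap score (`overlap_score`). The chunk size and threshold are illustrative values only.

```python
import re
from typing import Callable, List

def split_into_chunks(text: str, max_words: int = 8) -> List[str]:
    """Split a document into fixed-size word windows (a simplistic chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def overlap_score(query: str, chunk: str) -> float:
    """Hypothetical stand-in for an LLM relevance judgment: the fraction of
    query tokens that also appear in the chunk."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.lower()))
    return len(q & c) / len(q) if q else 0.0

def filter_chunks(query: str, documents: List[str],
                  score: Callable[[str, str], float] = overlap_score,
                  threshold: float = 0.5, max_words: int = 8) -> List[str]:
    """Keep only the chunks whose relevance score reaches the threshold,
    instead of keeping or dropping whole documents."""
    kept = []
    for doc in documents:
        for chunk in split_into_chunks(doc, max_words):
            if score(query, chunk) >= threshold:
                kept.append(chunk)
    return kept

docs = [
    "Paris is the capital of France. The Eiffel Tower is in Paris.",
    "Bananas are a yellow fruit.",
]
kept = filter_chunks("capital of France", docs)
print(kept)  # ['Paris is the capital of France. The Eiffel']
```

A document-level filter would keep the first document whole, including its unrelated second chunk ("Tower is in Paris."). The chunk-level filter discards that chunk and keeps only the relevant one.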

SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains

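Retrieval-augmented generation (RAG) improves the question-answering (QA) ability of large language models (LLMs) by incorporating external knowledge. However, adapting general-purpose RAG systems to specialized domains such as science and medicine is difficult because of distribution shift and limited access to domain-specific data. To address this, we propose SimRAG, a self-training approach that equips the LLM with the joint abilities of question answering and question generation for domain adaptation ...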

2024/10/24 arXiv:2410.17952v1 jecc

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

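Document layout analysis is essential for real-world document understanding systems, but it faces a difficult trade-off between speed and accuracy. Multimodal methods that use both textual and visual features achieve higher accuracy but incur significant latency. Unimodal methods that rely on visual features alone are faster but sacrifice accuracy. To resolve this dilemma, we introduce DocLayout-YOLO, a novel approach that improves accuracy while preserving the speed advantage through document-specific optimizations in both pre-training and model design. For robust document pre-training, we introduce the Mesh-candidate BestFit algorithm, which formulates document synthesis as a two-dimensional bin-packing problem and produces the large-scale, diverse DocSynth-300K dataset ...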

2024/10/21 arXiv:2410.12628v1 jecc

Baichuan-Omni Technical Report

Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

LLMs Will Always Hallucinate, and We Need to Live With This

PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis

Editing Large Language Models: Problems, Methods, and Opportunities

CompAct: Compressing Retrieved Documents Actively for Question Answering