Large Language Models Meet NLP: A Survey

Libo Qin  Qiguang Chen  Xiachong Feng  Yang Wu  Yongheng Zhang
Yinghui Li  Min Li  Wanxiang Che  Philip S. Yu
Central South University   Harbin Institute of Technology   University of Hong Kong
Tsinghua University   University of Illinois at Chicago
lbqin@csu.edu.cn,
{qgchen,car}@ir.hit.edu.cn
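Abstract

While large language models (LLMs) such as ChatGPT have shown impressive capabilities in natural language processing (NLP) tasks, a systematic investigation of their potential in this field remains largely unexplored. This study aims to address this gap by exploring the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been solved with LLMs? (3) What is the future of LLMs for NLP? To answer these questions, we take the first step to provide a comprehensive overview of LLMs in NLP. Specifically, we first introduce a unified taxonomy, including (1) parameter-frozen application and (2) parameter-tuning application, to offer a unified perspective for understanding the current progress of LLMs in NLP. Furthermore, we summarize the new frontiers and the associated challenges, aiming to inspire further groundbreaking advancements. We hope this work offers valuable insights into the potential and limitations of LLMs in NLP, while also serving as a practical guide for building effective LLMs in NLP.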



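1 Introduction

Recently, large language models (LLMs) have represented a significant breakthrough in AI through scaling up language models (Zhao et al., 2023a; Kaddour et al., 2023; Yang et al.; Hadi et al., 2023; Zhuang et al., 2023). Current studies on LLMs, such as the GPT series (Brown et al., 2020; Ouyang et al., 2022), the PaLM series (Chowdhery et al., 2022), OPT (Zhang et al., 2022a), and LLaMA (Touvron et al., 2023), have shown impressive zero-shot performance. In addition, LLMs bring emergent abilities, including instruction following (Wei et al., 2022a), chain-of-thought reasoning (Wei et al., 2022c), and in-context learning (Min et al., 2022), which attract increasing attention (Wei et al., 2022b).

With the advancement of LLMs, as shown in Figure 1, LLMs allow various natural language processing (NLP) tasks (e.g., zero-shot mathematical reasoning, text summarization, machine translation, information extraction, and sentiment analysis) to be performed through a unified generative paradigm, which has achieved remarkable success (Wei et al., 2022c, 2023a; Qin et al., 2023a; Wang et al., 2023a, d, h, j; Wan et al., 2023b; Peng et al., 2023; Huang et al., 2023a). Additionally, some LLMs in NLP work without any additional training data and can even surpass traditional models fine-tuned with supervised learning. This advancement has significantly contributed to the development of the NLP literature. As a result, the community has witnessed exponential growth of LLMs in NLP research, which motivates us to investigate the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been solved with LLMs? (3) What is the future of LLMs for NLP?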

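Figure 1: Examples of applying LLMs to NLP tasks (e.g., mathematical reasoning, machine translation, information extraction, and sentiment analysis).
Figure 2: Taxonomy of LLMs for NLP, including the parameter-frozen paradigm (a) and the parameter-tuning paradigm (b), where the blue module with an ice icon denotes that the parameters are kept unchanged, and the orange module with a fire icon denotes fine-tuning of all or selected parameters.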

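To answer the above questions, we make the first attempt to present a comprehensive and detailed analysis of LLMs for NLP. The overarching goal of this work is to explore the current developments of LLMs for NLP. To this end, we first introduce the relevant background and preliminaries. Furthermore, we introduce a unified paradigm of LLMs for NLP: (1) parameter-frozen application, including (i) zero-shot learning and (ii) few-shot learning; and (2) parameter-tuning application, containing (i) full-parameter tuning and (ii) parameter-efficient tuning, aiming to provide a unified perspective for understanding the current progress of LLMs for NLP:

  • Parameter-frozen application directly applies prompting approaches to LLMs for NLP tasks without any parameter tuning. This category includes zero-shot and few-shot learning, depending on whether few-shot demonstrations are required.

  • Parameter-tuning application refers to cases where the parameters of LLMs need to be tuned for NLP tasks. This category includes both full-parameter and parameter-efficient tuning, depending on whether all model parameters are fine-tuned.

Finally, we conclude by identifying potential frontier areas for future research, along with the associated challenges, to stimulate further exploration.

In summary, this work offers the following contributions: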

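  • (1) First survey: We present the first comprehensive survey of large language models (LLMs) for natural language processing (NLP) tasks.

  • (2) New taxonomy: We introduce a new taxonomy, including (1) parameter-frozen application and (2) parameter-tuning application, which provides a unified view for understanding LLMs for NLP tasks.

  • (3) New frontiers: We discuss emerging areas of research in LLMs for NLP and highlight the challenges associated with them, aiming to inspire future breakthroughs.

  • (4) Abundant resources: We create the first curated collection of LLM resources for NLP, including open-source implementations, relevant corpora, and a list of research papers. These resources are available at https://github.com/LightChen233/Awesome-LLM-for-NLP.

We expect this work to be a valuable resource for researchers and to spur further advancements in the field of LLM-based NLP.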

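2 Background

As shown in Figure 2, this section describes the background of the parameter-frozen paradigm (§2.1) and the parameter-tuning paradigm (§2.2).

2.1 Parameter-Frozen Paradigm

The parameter-frozen paradigm directly applies prompting to NLP tasks without any parameter tuning. As shown in Figure 2(a), this category encompasses zero-shot and few-shot learning (Brown et al., 2020; Kojima et al., 2022).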

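Zero-shot Learning

In zero-shot learning, LLMs leverage their instruction-following capabilities to solve NLP tasks based on a given instruction prompt, which is defined as:

$$\mathcal{P} = \mathrm{Prompt}(\mathcal{I}), \tag{1}$$

where $\mathcal{I}$ and $\mathcal{P}$ denote the input and output of prompting, respectively.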

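Few-shot Learning

Few-shot learning uses in-context learning capabilities to solve NLP tasks by imitating few-shot demonstrations. Formally, given some demonstrations $\mathcal{E}$ and the input $\mathcal{I}$, the process of few-shot learning is defined as:

$$\mathcal{P} = \mathrm{Prompt}(\mathcal{E}, \mathcal{I}). \tag{2}$$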
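As a minimal sketch of the two prompting formulations above, the Python functions below assemble a zero-shot prompt from an instruction and an input, and a few-shot prompt that additionally places demonstrations before the input. The function names, the `Input:`/`Output:` template, and the sentiment example are illustrative assumptions rather than a format prescribed by the surveyed work; the resulting string would then be passed to an LLM of choice.

```python
from typing import List, Tuple


def zero_shot_prompt(instruction: str, text: str) -> str:
    """Build a prompt from an instruction and the input only (no demonstrations)."""
    return f"{instruction}\n\nInput: {text}\nOutput:"


def few_shot_prompt(instruction: str, demos: List[Tuple[str, str]], text: str) -> str:
    """Build a prompt that shows (input, output) demonstrations before the actual input."""
    demo_block = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    return f"{instruction}\n\n{demo_block}\n\nInput: {text}\nOutput:"


if __name__ == "__main__":
    # Hypothetical sentiment-classification task used only for illustration.
    instruction = "Classify the sentiment of the input as positive or negative."
    demos = [("The plot was gripping.", "positive"), ("The service was slow.", "negative")]
    print(zero_shot_prompt(instruction, "I loved this phone."))
    print("---")
    print(few_shot_prompt(instruction, demos, "I loved this phone."))
```

The only difference between the two settings is the demonstration block; the model parameters are untouched in both cases.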

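2.2 Parameter-Tuning Paradigm

As shown in Figure 2(b), the parameter-tuning paradigm involves adjusting LLM parameters for NLP tasks, covering both full-parameter and parameter-efficient tuning.

Full-parameter Tuning

In the full-parameter tuning approach, all parameters of the model $\mathcal{M}$ are fine-tuned on the training dataset $\mathcal{D}$:

$$\hat{\mathcal{M}} = \mathrm{Fine\text{-}tune}(\mathcal{M} \mid \mathcal{D}), \tag{3}$$

where $\hat{\mathcal{M}}$ is the fine-tuned model with the updated parameters.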

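Parameter-efficient Tuning

Parameter-efficient tuning (PET) involves adjusting a set of existing parameters or incorporating additional tunable parameters (e.g., bottleneck adapters (Houlsby et al., 2019), Low-Rank Adaptation (LoRA) (Hu et al., 2021), prefix-tuning (Li and Liang, 2021a), and QLoRA (Dettmers et al., 2023)) to efficiently adapt models to specific NLP tasks. Formally, given the model $\mathcal{M}$ and the training dataset $\mathcal{D}$, parameter-efficient tuning first tunes a set of parameters $\mathcal{W}$, denoted as:

$$\hat{\mathcal{W}} = \mathrm{Fine\text{-}tune}(\mathcal{W} \mid \mathcal{D}, \mathcal{M}), \tag{4}$$

where $\hat{\mathcal{W}}$ stands for the trained parameters.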
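To make the parameter-efficient idea concrete, the following sketch illustrates the low-rank update used by LoRA: a frozen weight matrix W is combined with a trainable product BA scaled by alpha / r, so only the two small matrices A and B would receive gradients. This is a toy pure-Python illustration under assumed dimensions (a 64 x 64 weight matrix and rank 4); it does not reproduce any particular library's implementation, and no training loop is shown.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, b: Matrix, a: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A; W stays frozen, only A and B would be trained."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def num_params(m: Matrix) -> int:
    """Count the scalar entries of a matrix."""
    return sum(len(row) for row in m)


if __name__ == "__main__":
    random.seed(0)
    d_out, d_in, r = 64, 64, 4  # assumed sizes, chosen only for illustration
    w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
    a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
    b = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero-initialized
    w_eff = lora_effective_weight(w, b, a, alpha=8.0, r=r)
    # With B initialized to zero, the adapted layer starts identical to the frozen one.
    print("unchanged at init:", w_eff == w)
    print("full parameters:", num_params(w), "| LoRA parameters:", num_params(a) + num_params(b))
```

In this configuration the full matrix has 4096 entries while A and B together have 512, which is the source of the memory savings that motivate PET.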

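3 Natural Language Understanding

As shown in Figure 3, we first describe some typical NLP understanding tasks, including Sentiment Analysis (§3.1), Information Extraction (§3.2), Dialogue Understanding (§3.3), and Table Understanding (§3.4).

3.1 Sentiment Analysis

Sentiment analysis, a key function in natural language processing, identifies the emotional tone of a text, such as positive opinions or criticisms (Wankhade et al., 2022).

3.1.1 Parameter-Frozen Paradigm

Zero-shot Learning

With the help of instruction tuning, LLMs have been equipped with excellent zero-shot learning ability (Belkhir and Sadat, 2023). Recent work (Zhang et al., 2023g) finds that simple instructions can elicit ChatGPT's strong capabilities on a series of sentiment analysis tasks, such as sentiment classification and aspect-based sentiment analysis. Moreover, current mainstream LLMs possess multilingual understanding ability and can analyze the sentiment conveyed in different languages based on sentiment lexicons (Koto et al., 2024).

Few-shot Learning

Few-shot prompting not only elicits in-context learning in LLMs but also conveys the user's intent more clearly. According to the findings of previous studies (Zhang et al., 2023g; Zhao et al., 2023b; Xu et al., 2023c), incorporating exemplars into the prompt significantly boosts LLMs' performance on aspect-based sentiment analysis and emotion recognition tasks. Furthermore, Sun et al. (2023b) introduce few-shot learning for more complex procedures, incorporating a multi-LLM negotiation framework for sentiment analysis.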

Parameter-Frozen Paradigm Taxonomy

Understanding (§3)

  • Sentiment Analysis (§3.1): e.g., Zhang et al. (2023g), Koto et al. (2024), Zhao et al. (2023b), Xu et al. (2023c), Sun et al. (2023b)

  • Information Extraction (§3.2): e.g., Zhang et al. (2023c), Wei et al. (2023a), Xie et al. (2023), Li and Zhang (2023), Li et al. (2023e), Bi et al. (2023)

  • Dialogue Understanding (§3.3): e.g., Pan et al. (2023), He and Garner (2023), Hudeček and Dušek (2023), Heck et al. (2023), Gao et al. (2023a), Li et al. (2022b), Zhang et al. (2023i, h), Wu et al. (2023c), Das et al. (2023a), Chi et al. (2023), Hu et al. (2022b), King and Flanigan (2023), Addlesee et al. (2023), Chung et al. (2023), Lee et al. (2023), Lin et al. (2023), Cao (2023)

  • Table Understanding (§3.4): e.g., Singha et al. (2023), Patnaik et al. (2024), Ye et al. (2023, 2024), Sui et al. (2023a, b), Cheng et al. (2022), Zhang et al. (2023f, k, j, 2024), Chen (2023), Luo et al. (2023b), Li et al. (2023b), Jiang et al. (2023)

Generation (§4)

  • Summarization (§4.1): e.g., Goyal et al. (2022), Ravaut et al. (2023b), Bhaskar et al. (2022), Wang et al. (2023b), Zhang et al. (2023e, b), Adams et al. (2023), Tang et al. (2023)

  • Code Generation (§4.2): e.g., Chen et al. (2021), Nijkamp et al. (2022), Christopoulou et al. (2022), Luo et al. (2023c), Allal et al. (2023), Li et al. (2023f, g), Guo et al. (2024), Roziere et al. (2023), Zheng et al. (2023)

  • Machine Translation (§4.3): e.g., Wei et al. (2023b), Zhu et al. (2023a), Li et al. (2023a, c), Alves et al. (2023), Raunak et al. (2023), Lu et al. (2023b)

  • Mathematical Reasoning (§4.4): e.g., Wei et al. (2022c), Zhang et al. (2022b), Kojima et al. (2022), Wang et al. (2023g), Touvron et al. (2023), Lu et al. (2023d), Gao et al. (2023b)

Parameter-Tuning Paradigm Taxonomy

Understanding (§3)

  • Sentiment Analysis (§3.1): e.g., Wang et al. (2022), Varia et al. (2022), Yang and Li (2023), Zhao et al. (2016), Qiu et al. (2023)

  • Information Extraction (§3.2): e.g., Lu et al. (2023a), Gan et al. (2023), Sainz et al. (2023), Wang et al. (2023f), Das et al. (2023b), Liang et al. (2023)

  • Dialogue Understanding (§3.3): e.g., Xie et al. (2022), Zhao et al. (2022a), Gupta et al. (2022), Yu et al. (2022), Feng et al. (2023b), Liu et al. (2023b)

  • Table Understanding (§3.4): e.g., Li et al. (2023d), Xie et al. (2022), Xue et al. (2023), Zhang et al. (2023a), Zhu et al. (2024), Bai et al. (2023), Zhang et al. (2023d)

Generation (§4)

  • Summarization (§4.1): e.g., Pagnoni et al. (2022), Zhao et al. (2022b), Yuan et al. (2022), Feng et al. (2023a), Li and Liang (2021b), Ravaut et al. (2023a)

  • Code Generation (§4.2): e.g., Wang et al. (2021, 2023i), Le et al. (2022), Shojaee et al. (2023), Ayupov and Chirkova (2022), Zhuo et al. (2024), Weyssow et al. (2023)

  • Machine Translation (§4.3): e.g., Xu et al. (2023b, 2024), Iyer et al. (2023), Moslem et al. (2023), Ustun and Stickland (2022), Alves et al. (2023), Wu et al. (2023a, 2024)

  • Mathematical Reasoning (§4.4): e.g., Luo et al. (2023a), Yue et al. (2023), Ho et al. (2023), Schick et al. (2023), Hu et al. (2022a, 2023b)

Figure 3: Taxonomy of LLMs for NLP, including the parameter-frozen paradigm and the parameter-tuning paradigm.

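3.1.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Full-parameter instruction tuning has been shown to be an effective approach for bridging the gap between task-agnostic pre-training and task-specific inference. Specifically, Wang et al. (2022) design unified sentiment instructions for various aspect-based sentiment analysis tasks to elicit the LLMs. Varia et al. (2022) utilize task-specific sentiment instructions to fine-tune LLMs and capture inter-task dependency. Yang and Li (2023) transform the visual input into plain text during prompt construction for instruction tuning. These works demonstrate the potential of tuning LLMs for advanced sentiment analysis.

Parameter-efficient Tuning

Sentiment analysis techniques have numerous real-world applications, such as opinion mining (Zhao et al., 2016). Therefore, efficiency is a vital dimension for evaluating sentiment analysis methods. Qiu et al. (2023) utilize LoRA to tune LLMs on the empathetic multi-turn conversation dataset SMILECHAT to develop emotional support systems.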

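3.2 Information Extraction

Information extraction (IE) tasks aim to extract structured information from plain text, and typically include relation extraction (RE), named entity recognition (NER), and event extraction (EE) (Xu et al., 2023a).

3.2.1 Parameter-Frozen Paradigm

Zero-shot Learning

Inspired by the impressive capabilities of LLMs on various tasks, recent studies (Zhang et al., 2023c; Wei et al., 2023a) begin to explore zero-shot prompting methods that solve IE tasks by leveraging the knowledge embedded in LLMs. Wei et al. (2023a), Xie et al. (2023), and Zhang et al. (2023c) propose a series of methods that decompose question-answering tasks by breaking NER down into smaller, simpler subproblems, which improves the overall process. In addition, Xie et al. (2023) further introduce two methods, syntactic prompting and tool augmentation, to improve LLMs' performance by incorporating syntactic information.

Few-shot Learning

Considering the gap between sequence labeling and text generation, providing exemplars helps LLMs better understand the given task and follow the problem-solving steps. To select pertinent demonstrations, Li and Zhang (2023) deploy a retrieval module to retrieve the most suitable examples for a given test sentence. Instead of using natural language for structured output, Li et al. (2023e) and Bi et al. (2023) propose reformulating IE tasks as code with code-related LLMs such as Codex.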
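The idea of reformulating IE as code can be sketched as follows: the prompt shows a small class schema and an unfinished Python list, and the model's completion is parsed back into triples. The schema, the helper names `build_code_prompt` and `parse_relations`, and the sample completion are assumptions made for illustration and do not reproduce the prompt formats of the cited works; the completion is parsed with `ast` rather than executed.

```python
import ast
from typing import List, Tuple

# Hypothetical schema shown to the code LLM as context.
SCHEMA = '''class Relation:
    def __init__(self, head: str, relation: str, tail: str): ...
'''


def build_code_prompt(sentence: str) -> str:
    """Frame extraction as completing a Python list of Relation(...) objects."""
    return f"{SCHEMA}\n# Sentence: {sentence!r}\nrelations = ["


def parse_relations(completion: str) -> List[Tuple[str, str, str]]:
    """Parse a completion such as 'Relation("a", "r", "b")]' without executing it."""
    tree = ast.parse("[" + completion, mode="eval")
    triples: List[Tuple[str, str, str]] = []
    if not isinstance(tree.body, ast.List):
        return triples
    for node in tree.body.elts:
        is_relation = isinstance(node, ast.Call) and getattr(node.func, "id", None) == "Relation"
        if is_relation and len(node.args) == 3:
            head, relation, tail = (ast.literal_eval(arg) for arg in node.args)
            triples.append((head, relation, tail))
    return triples


if __name__ == "__main__":
    print(build_code_prompt("Marie Curie was born in Warsaw."))
    # Stand-in for the text a code LLM might return after the prompt.
    completion = 'Relation("Marie Curie", "born_in", "Warsaw")]'
    print(parse_relations(completion))
```

Because the output is syntactically constrained code, malformed extractions surface as parse errors instead of silently corrupting the structured result.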

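3.2.2 Parameter-Tuning Paradigm

Full-parameter Tuning

A common practice for customizing LLMs is to fine-tune them on collected datasets. Three tuning paradigms are typically adopted to enhance LLMs' abilities. The first tunes LLMs on a single dataset to strengthen a specific ability. The second standardizes data formats across all IE subtasks, enabling a single model to handle diverse tasks efficiently (Lu et al., 2023a; Gan et al., 2023). The last tunes LLMs on a mixed dataset and tests on unseen tasks (Sainz et al., 2023; Wang et al., 2023f), which is commonly used to improve the generalization ability of LLMs.

Parameter-efficient Tuning

Tuning the huge number of parameters in LLMs poses a significant challenge to both research and development. To address this challenge, Das et al. (2023b) propose a dynamic sparse fine-tuning method that focuses on a specific subset of parameters during the IE training process. This approach is particularly useful when dealing with limited data. Meanwhile, Liang et al. (2023) introduce Lottery Prompt Tuning (LPT), a method that efficiently tunes only a portion of the prompt vectors for lifelong information extraction. This technique optimizes both parameter efficiency and deployment efficiency.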

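3.3 Dialogue Understanding

Dialogue understanding typically consists of spoken language understanding (SLU) (Tur and De Mori, 2011; Qin et al., 2019, 2021) and dialogue state tracking (DST) (Sarikaya et al., 2016; Jacqmin et al., 2022).

3.3.1 Parameter-Frozen Paradigm

Zero-shot Learning

Recent studies highlight the effectiveness of LLMs in dialogue understanding through zero-shot prompting (Pan et al., 2023; He and Garner, 2023; Hudeček and Dušek, 2023; Heck et al., 2023). Gao et al. (2023a) and Addlesee et al. (2023) introduce zero-shot chain-of-thought prompting strategies for LLMs, deepening understanding through step-by-step reasoning. Moreover, Zhang et al. (2023i) and Wu et al. (2023c) treat SLU and DST as agent systems and code generation tasks to effectively improve task performance. Further, Chung et al. (2023), Chi et al. (2023), and Zhang et al. (2023h) extend the task to practical scenarios and understand dialogue through zero-shot prompting for efficient interaction and dialogue management.

Few-shot Learning

Limited by the instruction-following ability of LLMs, recent studies focus on improving model performance in dialogue understanding through relevant few-shot demonstrations (Hudeček and Dušek, 2023). To address the problem of "overfitting" to the given few-shot demonstrations, Hu et al. (2022b), King and Flanigan (2023), Das et al. (2023a), Li et al. (2022b), Lee et al. (2023), and Addlesee et al. (2023) further introduce methods for retrieving diverse few-shot demonstrations to improve understanding performance. Lin et al. (2023) and Cao (2023) integrate the DST task with an agent through in-context learning, enhancing dialogue understanding capabilities.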
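One way to picture the retrieval of diverse demonstrations is a greedy selection in the style of maximal marginal relevance: each step picks the candidate that is most similar to the test utterance while being least similar to the demonstrations already chosen. The sketch below uses word-overlap (Jaccard) similarity as a simple stand-in for the learned retrievers in the cited works; `select_demonstrations` and the `diversity` weight are hypothetical names introduced here.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens, used as a crude text representation."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_demonstrations(query: str, pool: List[str], k: int, diversity: float = 0.5) -> List[str]:
    """Greedily pick k demonstrations, trading off relevance against redundancy."""
    q = tokens(query)
    selected: List[str] = []
    candidates = list(pool)
    while candidates and len(selected) < k:
        def score(c: str) -> float:
            relevance = jaccard(q, tokens(c))
            redundancy = max((jaccard(tokens(c), tokens(s)) for s in selected), default=0.0)
            return (1 - diversity) * relevance - diversity * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    pool = [
        "book a cheap hotel in the centre",
        "book a cheap hotel in the centre please",
        "find a train to the north",
    ]
    print(select_demonstrations("book a cheap hotel in the north", pool, k=2))
```

With `diversity=0.0` the two near-duplicate hotel utterances are returned; with the default weight the second slot goes to the train utterance, which covers a different part of the query.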

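3.3.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Full-parameter tuning freezes no parameters and uses all of them to train dialogue understanding tasks (Yu et al., 2022). Specifically, Xie et al. (2022) and Zhao et al. (2022a) unify structured tasks into a textual format by training full parameters, demonstrating significant improvement and generalization. Gupta et al. (2022) use inputs with some demonstrations as a new DST representation format to train LLMs with full parameters and achieve strong results.

Parameter-efficient Tuning

Limited by the huge cost of full-parameter fine-tuning, many works focus more on parameter-efficient tuning (PET) for lower-cost training on dialogue understanding tasks. Specifically, Feng et al. (2023b) present LDST, a LLaMA-driven DST framework that leverages LoRA for parameter-efficient fine-tuning and achieves performance on par with ChatGPT. Liu et al. (2023b) provide a key-value pair soft-prompt pool and select soft prompts from the pool based on the dialogue history for better PET.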
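The key-value prompt pool idea can be sketched as a nearest-key lookup: each soft prompt is stored under a key vector, and the prompts whose keys are most similar to an encoding of the dialogue history are selected. The two-dimensional vectors, the prompt identifiers, and the cosine-similarity scoring below are assumptions for illustration only; in practice both keys and soft prompts would be learned embeddings, and the cited method may differ in its selection rule.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_soft_prompts(history_vec: Sequence[float],
                        pool: List[Tuple[Sequence[float], str]],
                        top_k: int) -> List[str]:
    """Return the ids of the top_k prompts whose keys best match the dialogue history."""
    ranked = sorted(pool, key=lambda item: cosine(history_vec, item[0]), reverse=True)
    return [prompt_id for _, prompt_id in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical (key vector, soft-prompt id) pairs.
    pool = [([1.0, 0.0], "p_hotel"), ([0.0, 1.0], "p_train"), ([0.9, 0.1], "p_restaurant")]
    print(select_soft_prompts([1.0, 0.0], pool, top_k=2))
```

Only the selected prompts (and their keys) would be updated during training, which keeps the number of tuned parameters small.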

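3.4 Table Understanding

Table understanding involves the comprehension and analysis of structured data presented in tables, focusing on interpreting and extracting meaningful information, as in table question answering (Jin et al., 2022).

3.4.1 Parameter-Frozen Paradigm

Zero-shot Learning

Recently, advances in LLMs have paved the way for exploring zero-shot learning capabilities in understanding and interpreting tabular data (Singha et al., 2023; Patnaik et al., 2024; Ye et al., 2024). Ye et al. (2023) and Sui et al. (2023a) focus on breaking large tables down into smaller segments to reduce interference from irrelevant data during table understanding. Further, Patnaik et al. (2024) introduce CABINET, a framework that includes a module for generating parsing statements to emphasize the data related to a given question. Sui et al. (2023b) develop TAP4LLM, which enhances LLMs' table understanding abilities by incorporating reliable information from external knowledge sources into prompts. Additionally, Ye et al. (2024) propose the DataFrameQA framework, which uses secure Pandas queries to address data leakage in table understanding. These efforts mark a substantial step toward more effective and efficient zero-shot learning with LLMs in table data comprehension.

Few-shot Learning

Few-shot learning has increasingly become a focus for researchers addressing the limitations of LLMs, particularly with respect to table understanding and instruction-following ability (Chen, 2023; Zhang et al., 2024). Luo et al. (2023b) propose a hybrid prompt strategy coupled with retrieval-of-thought to further improve example quality for table understanding tasks. Cheng et al. (2022) introduce Binder, which recasts table understanding as a coding task so that executing the code derives answers directly from the table. Furthermore, Li et al. (2023b), Jiang et al. (2023), and Zhang et al. (2023k, f) conceptualize table understanding as a more complex agent task that uses external tools to augment LLMs on table tasks. Building upon these developments, ReAcTable (Zhang et al., 2023j) integrates additional actions into the process, such as generating SQL queries, producing Python code, and directly answering questions, thereby further enriching the few-shot learning landscape for LLMs.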
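The table-as-code idea can be illustrated with a short sketch: instead of asking the model for the answer, the model is asked for a query, and the query is executed against the table. Here the SQL string is hard-coded as a stand-in for model output, the table name `t` and the helper `answer_with_sql` are hypothetical, and SQLite is used only for convenience; this is not the actual interface of Binder or ReAcTable.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def answer_with_sql(columns: Sequence[str],
                    rows: List[Tuple[Any, ...]],
                    sql: str) -> List[Tuple[Any, ...]]:
    """Load a small table into an in-memory SQLite database and run the given query."""
    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f"CREATE TABLE t ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    columns = ["city", "population"]
    rows = [("Oslo", 700000), ("Bergen", 285000), ("Trondheim", 210000)]
    # Question: "Which city has the largest population?"
    # Stand-in for a query that an LLM would generate from the question and table header.
    generated_sql = "SELECT city FROM t ORDER BY population DESC LIMIT 1"
    print(answer_with_sql(columns, rows, generated_sql))
```

Running generated queries against a throwaway in-memory database limits the damage a faulty query can do, although a production system would still need to validate what the model produces.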

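3.4.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Leveraging the existing capabilities of LLMs, full-parameter tuning optimizes these models for specific table understanding tasks. Li et al. (2023d) and Xie et al. (2022) adopt a large volume of table-related data for table instruction tuning, which leads to better generalization in table understanding tasks. Additionally, Xue et al. (2023) introduce DB-GPT, which enhances LLMs by fine-tuning them and integrating a retrieval-augmented generation component to better support table understanding.

Parameter-efficient Tuning

Xie et al. (2022) utilize prompt tuning for efficient fine-tuning within a unified framework of table representation instructions. Moreover, Zhang et al. (2023a), Zhu et al. (2024), and Bai et al. (2023) adopt Low-Rank Adaptation (LoRA) during instruction tuning for better table understanding and further table cleaning. Furthermore, Zhang et al. (2023d) address the challenges of long table inputs by applying LongLoRA, demonstrating its efficacy in managing long-context issues in table understanding tasks.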

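4 Natural Language Generation

This section presents LLMs for classic NLP generation tasks, including Summarization (§4.1), Code Generation (§4.2), Machine Translation (§4.3), and Mathematical Reasoning (§4.4), as illustrated in Figure 3.

4.1 Summarization

Summarization aims to distill the most essential information from a text document into a concise and coherent synopsis that preserves the primary themes of the original content (Shi et al., 2018).

4.1.1 Parameter-Frozen Paradigm

Zero-shot Learning

In the exploration of zero-shot learning for text summarization, LLMs such as GPT-3 have shown remarkably strong performance in generating concise and accurate summaries, challenging the need for traditional fine-tuning approaches (Goyal et al., 2022; Bhaskar et al., 2022; Wang et al., 2023b). Zhang et al. (2023e) highlight instruction tuning as pivotal to LLMs' summarization success. Ravaut et al. (2023b) scrutinize LLMs' context utilization and identify a bias toward initial document segments in summarization tasks. Together, these studies underscore both the versatility and the challenges of deploying LLMs for zero-shot summarization.

Few-shot Learning

For few-shot learning, LLMs such as ChatGPT are scrutinized for their summarization abilities. Zhang et al. (2023b) and Tang et al. (2023) demonstrate that leveraging in-context learning and a dialogue-like approach can enhance LLMs' extractive summarization, particularly in achieving summary faithfulness. Adams et al. (2023) introduce the "Chain of Density" prompting technique and reveal a preference for denser, entity-rich summaries over sparser ones. Together, these studies reveal evolving strategies for optimizing LLMs for summarization tasks.

4.1.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Full-parameter tuning for text summarization leverages the power of LLMs by optimizing them for specific summarization tasks. DIONYSUS (Li et al., 2022a) adapts to new domains through a novel pre-training strategy tailored for dialogue summarization. Socratic pre-training (Pagnoni et al., 2022) introduces a question-driven approach to improve the summarization process. These approaches allow models to be adapted easily to different summarization tasks, resulting in more controllable and relevant summaries.

Parameter-efficient Tuning

PET strategies have reshaped the adaptation of large pre-trained models to specific summarization tasks, demonstrating the power of fine-tuning with minimal parameter adjustments (Feng et al., 2023a). Zhao et al. (2022b) and Yuan et al. (2022) adapt prefix-tuning (Li and Liang, 2021b) for dialogue summarization, enhancing model knowledge and generalization across domains. Ravaut et al. (2023a) develop PromptSum, which combines prompt tuning with discrete entity prompts for controllable abstractive summarization. Together, these approaches show the efficacy of PET in enabling robust, domain-adaptive, and controllable summarization at minimal additional computational cost.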

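4.2 Code Generation

Code generation involves automatically creating executable code from natural language specifications, facilitating a more intuitive programming interface (Chen et al., 2021).

4.2.1 Parameter-Frozen Paradigm

Zero-shot Learning

Recent advances in code generation have been significantly propelled by the development of LLMs, with studies showcasing their ability to generate code in a zero-shot manner. Code LLMs, trained on both code and natural language, exhibit robust zero-shot learning capabilities for programming tasks (Nijkamp et al., 2022; Roziere et al., 2023). Moreover, CodeT5+ enriches the landscape by proposing a flexible encoder-decoder architecture and a suite of pre-training objectives, leading to notable improvements (Wang et al., 2023i). Together, these models push the boundary of what can be achieved in code generation and offer promising avenues for zero-shot learning.

Few-shot Learning

Few-shot learning allows models to produce precise code snippets by learning from a minimal number of examples (Lu et al., 2021). Chen et al. (2021), Allal et al. (2023), Li et al. (2023f), Luo et al. (2023c), and Christopoulou et al. (2022) illustrate the efficacy of few-shot learning, demonstrating code generation that surpasses their predecessors. The development of smaller yet powerful models (Li et al., 2023g; Guo et al., 2024) further improves the accessibility of few-shot code generation, making it a practical tool for modern developers.

4.2.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Full-parameter tuning is a pivotal strategy for enhancing code generation models, allowing comprehensive model optimization. Specifically, the CodeT5 series (Wang et al., 2021, 2023i) epitomizes this approach by combining code-specific pre-training tasks with architectural flexibility, excelling in both code understanding and generation. CodeRL (Le et al., 2022) and PPOCoder (Shojaee et al., 2023) introduce deep reinforcement learning, leveraging compiler feedback and execution-based strategies for model refinement, whereas StepCoder (Shojaee et al., 2023) advances this further by employing reinforcement learning, curriculum learning, and fine-grained optimization techniques. Together, these models show significant improvements across a spectrum of code-related tasks, reflecting the evolution of AI-driven programming aids.

Parameter-efficient Tuning

PET has emerged as a pivotal adaptation technique for code tasks, striking a balance between performance and computational efficiency (Weyssow et al., 2023). Studies exploring adapters and LoRA (Ayupov and Chirkova, 2022; Zhuo et al., 2024) demonstrate the viability of PET on code understanding and generation tasks, albeit with limitations in generative performance.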

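4.3 Machine Translation

Machine translation is a classic task that uses computers to automatically translate given information from one language into another, striving for accuracy and preserving the semantic essence of the original material (Bahdanau et al., 2014).

4.3.1 Parameter-Frozen Paradigm

Zero-shot Learning

In zero-shot learning, Zhu et al. (2023a) and Wei et al. (2023b) enhance LLMs' multilingual performance through cross-lingual and multilingual instruction tuning, significantly improving translation tasks. OpenBA contributes to the bilingual model space, demonstrating superior performance on Chinese-oriented tasks with a novel architecture (Li et al., 2023c). These advances highlight the potential of LLMs to align languages in zero-shot settings.

Few-shot Learning

In the exploration of few-shot learning for machine translation (MT), recent studies present innovative strategies to enhance the capabilities of LLMs (Li et al., 2023a; Huang et al., 2024). Lu et al. (2023b) introduce Chain-of-Dictionary Prompting (CoD) to improve the translation of rare words through in-context learning for low-resource languages. Raunak et al. (2023) investigate the impact of demonstration attributes on in-context learning and reveal the critical role of the output text distribution in translation quality. Together, these works illustrate the significant potential of few-shot learning and in-context strategies for advancing MT with LLMs.

4.3.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Full-parameter tuning of LLMs for machine translation represents a frontier for improving translation accuracy and adaptability (Xu et al., 2023b). Iyer et al. (2023) demonstrate the potential of LLMs to disambiguate polysemous words through in-context learning and fine-tuning on ambiguous datasets, achieving superior performance across multiple languages. Moslem et al. (2023) and Wu et al. (2024) explore fine-tuning methods that enhance real-time and context-aware translation capabilities. Xu et al. (2024) propose Contrastive Preference Optimization (CPO) to further refine translation quality and push LLMs toward better performance. These studies reveal the efficacy and necessity of fine-tuning approaches for realizing the full potential of LLMs on complex MT tasks.

Parameter-efficient Tuning

PET is emerging as an approach for integrating LLMs into machine translation (MT) that balances performance and efficiency. Ustun and Stickland (2022) empirically assess PET's efficacy across different languages and model sizes, highlighting the effectiveness of adapters given an adequate parameter budget. Alves et al. (2023) optimize the fine-tuning process with adapters, striking a balance between few-shot learning and fine-tuning efficiency. Together, these studies underline the potential of PET to make LLMs more adaptable and resource-efficient for MT.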

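4.4 Mathematical Reasoning

Mathematical reasoning tasks in NLP involve using NLP techniques to understand information in mathematical text, perform logical reasoning, and generate answers (Lu et al., 2023e).

4.4.1 Parameter-Frozen Paradigm

Zero-shot Learning

Mathematics serves as a testbed for investigating the reasoning capabilities of LLMs (OpenAI, 2023; Touvron et al., 2023). The vanilla prompting method asks LLMs to directly produce the final answer to a given mathematical problem. This is highly challenging, and the reasoning process is not transparent to humans. To address this, Kojima et al. (2022) develop a zero-shot chain-of-thought technique that uses the simple prompt "Let's think step by step" to elicit mathematical reasoning in LLMs. By doing so, the LLM can break the problem down into smaller, easier-to-solve pieces before arriving at a final answer. Further, Wang et al. (2023g) propose a new decoding strategy called self-consistency, which integrates the results of a series of prompts to boost mathematical performance.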
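The two zero-shot techniques above can be sketched in a few lines: a prompt builder that appends the step-by-step trigger, and a majority vote over the final answers of several sampled reasoning paths. The reasoning paths below are hard-coded stand-ins for sampled LLM outputs, and taking the last number in each path as its answer is a simplifying assumption made for this illustration.

```python
import re
from collections import Counter
from typing import List, Optional


def zero_shot_cot_prompt(question: str) -> str:
    """Append the step-by-step trigger phrase to a question."""
    return f"Q: {question}\nA: Let's think step by step."


def extract_answer(reasoning: str) -> Optional[str]:
    """Take the last number that appears in a reasoning path as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reasoning)
    return numbers[-1] if numbers else None


def self_consistent_answer(paths: List[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled reasoning paths."""
    answers = [a for a in map(extract_answer, paths) if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None


if __name__ == "__main__":
    print(zero_shot_cot_prompt("Tom has 3 apples and buys 4 more. How many apples does he have?"))
    # Stand-ins for reasoning paths sampled from an LLM at non-zero temperature.
    sampled_paths = [
        "He starts with 3 and adds 4, which gives 7",
        "3 + 4 = 7, so the answer is 7",
        "3 times 4 is 12",
    ]
    print(self_consistent_answer(sampled_paths))
```

The vote discards the one path that multiplied instead of adding, which is the intuition behind aggregating several reasoning paths rather than trusting a single one.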

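Few-shot Learning

Recent studies explore constructing more suitable exemplars for LLMs to improve mathematical reasoning. Wei et al. (2022c) introduce chain-of-thought prompting, which presents a few chain-of-thought demonstrations to teach LLMs to think step by step. However, manually constructing demonstrations for few-shot learning is time-consuming and labor-intensive. To address this, Zhang et al. (2022b) and Lu et al. (2023d) propose automatically selecting in-context examples. Even when given detailed examples, LLMs still struggle to compute numbers precisely. To address this issue, PAL (Gao et al., 2023b) directly generates programs as intermediate reasoning steps. These programs are then executed in a runtime environment, such as a Python interpreter, to obtain a better and more robust solution.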
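The program-aided approach can be sketched as follows: the model writes a short program whose variables mirror the quantities in the problem, and the interpreter, not the model, performs the arithmetic. The program string below is a hand-written stand-in for model output, and `run_generated_program` is a hypothetical helper; emptying `__builtins__` only narrows what the snippet can call and is not a real sandbox.

```python
from typing import Any, Dict


def run_generated_program(program: str, result_name: str = "answer") -> Any:
    """Execute a generated program in a fresh namespace and read back one variable."""
    # Assumption: the program stores its result in a variable called `answer`.
    namespace: Dict[str, Any] = {"__builtins__": {}}
    exec(program, namespace)
    return namespace[result_name]


if __name__ == "__main__":
    # Problem: 16 eggs are laid per day, 3 are eaten, 4 are used for baking,
    # and the rest are sold at 2 dollars each. How much is earned per day?
    generated_program = (
        "eggs_per_day = 16\n"
        "eaten = 3\n"
        "baked = 4\n"
        "price = 2\n"
        "answer = (eggs_per_day - eaten - baked) * price\n"
    )
    print(run_generated_program(generated_program))
```

Delegating the calculation to the interpreter removes arithmetic slips from the model's output, while any error in how the problem was translated into code remains the model's responsibility.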

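Figure 4: Future work and new frontiers for LLMs in NLP tasks.

4.4.2 Parameter-Tuning Paradigm

Full-parameter Tuning

Full-parameter tuning is a common way to specify LLMs' behavior on mathematical reasoning tasks. Luo et al. (2023a) apply their proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the mathematics domain to improve the mathematical reasoning abilities of LLMs. Yue et al. (2023) introduce the MathInstruct dataset to enhance the general mathematical problem-solving ability of LLMs through in-domain instruction tuning. Ho et al. (2023) teach small language models to perform mathematical reasoning by distilling intermediate rationales generated by large language models. Schick et al. (2023) present ToolFormer, which can use a calculator to perform simple numeric calculations when solving mathematical problems.

Parameter-efficient Tuning

Fine-tuning LLMs with full parameter updates incurs significant memory overhead, limiting accessibility for many users. Parameter-efficient tuning techniques, such as LoRA (Hu et al., 2022a), offer a promising alternative. Additionally, Hu et al. (2023b) propose a user-friendly framework for integrating various adapters into LLMs, enabling them to tackle tasks such as mathematical reasoning.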

Takeaways
(1) LLMs offer a unified generative solution paradigm for various NLP tasks. (2) LLMs still show a certain performance gap relative to smaller supervised learning models on NLP tasks. (3) Continuing to fine-tune LLMs on NLP tasks brings substantial improvements.

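5 Future Work and New Frontiers

In this section, as shown in Figure 4, we highlight some new frontiers, in the hope of spurring more breakthroughs in the future.

5.1 Multilingual LLMs for NLP

Despite the significant success of LLMs in English NLP tasks, there are over 7,000 languages worldwide. How to extend the success of English-centric LLMs to NLP tasks in other languages is an important research question (Qin et al., 2024). Motivated by this, recent research increasingly focuses on using multilingual LLMs to solve NLP tasks in multilingual scenarios (Xue et al., 2021; Workshop et al., 2022; Shi et al., 2022; Qin et al., 2023a; Winata et al., 2023).

Two main challenges in this direction are as follows: (1) Enhancing low-resource language performance: Because performance on low-resource languages is poor, how to build universal multilingual LLMs that achieve promising performance on NLP tasks across languages is a direction worth exploring. (2) Improving cross-lingual alignment: The key to multilingual LLMs is improving the alignment between English and other languages. Effectively achieving cross-lingual alignment in cross-lingual NLP tasks is a challenge.

5.2 Multi-modal LLMs for NLP

Current LLMs achieve excellent performance in the text modality. However, integrating more modalities is one of the key ways to achieve artificial general intelligence (AGI). Therefore, a large body of work has begun to explore multi-modal LLMs for multi-modal NLP tasks (Lu et al., 2022, 2023c; Yang et al., 2023a, b; Zhang et al., 2023l).

The primary challenges in this field are: (1) Complex multi-modal reasoning: Currently, most multi-modal LLMs focus on simple multi-modal reasoning, such as recognition (Wang et al., 2023e; Liu et al., 2023a), while neglecting complex multi-modal reasoning (Yang et al., 2023b; Lu et al., 2023c). Therefore, how to effectively explore complex multi-modal reasoning for NLP is a crucial topic. (2) Effective multi-modal interaction: Existing methods often simply add direct multi-modal projection or prompting to LLMs to bridge the multi-modality gap (Wang et al., 2023e; Liu et al., 2023a; Wu et al., 2023b; Mitra et al., 2023). Crafting a more effective multi-modal interaction mechanism in multi-modal LLMs to solve NLP tasks is an essential problem.

5.3 Tool Usage in LLMs for NLP

While LLMs have shown success in NLP tasks, they still face challenges when applied in real-world scenarios (Qin et al., 2023b). Therefore, a large body of work explores using LLMs as central controllers that use or construct tools and agents to solve more practical NLP tasks (Shinn et al., 2023; Wang et al., 2023c; Zhu et al., 2023b; Hu et al., 2023a).

The primary concerns are: (1) Appropriate tool usage: Current works mostly consider static tool usage and neglect how to choose appropriate tools. Identifying the correct tools and using them accurately is a key issue in solving NLP tasks efficiently. (2) Efficient tool planning: Current works still focus on using a single tool for NLP tasks. Motivated by this, there is a pressing need for NLP tasks to implement an efficient tool chain that leverages multiple tools in a coordinated manner. For example, when facing a task-oriented dialogue task, we can use three tools: booking flight tickets, booking train tickets, and booking bus tickets. How these tools collaborate to make the trip time as short as possible and the cost as low as possible is then a typical problem in effective tool planning.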
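The booking example can be made concrete with a toy planner that queries several tools and picks the cheapest option that fits a time budget. The three stub tools, their prices and durations, and the selection rule are all hypothetical; a realistic planner would also need to chain tools across multiple legs and handle tool failures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Option:
    tool: str
    hours: float
    cost: float


Tool = Callable[[str, str], List[Option]]


def plan_trip(tools: Dict[str, Tool], origin: str, dest: str, max_hours: float) -> Optional[Option]:
    """Query every tool, keep options within the time budget, and return the cheapest one."""
    options = [opt for tool in tools.values() for opt in tool(origin, dest)]
    feasible = [o for o in options if o.hours <= max_hours]
    return min(feasible, key=lambda o: (o.cost, o.hours), default=None)


# Stub tools with made-up values; real tools would call booking services.
def book_flight(origin: str, dest: str) -> List[Option]:
    return [Option("flight", hours=2.0, cost=300.0)]


def book_train(origin: str, dest: str) -> List[Option]:
    return [Option("train", hours=5.0, cost=120.0)]


def book_bus(origin: str, dest: str) -> List[Option]:
    return [Option("bus", hours=9.0, cost=60.0)]


TOOLS: Dict[str, Tool] = {"flight": book_flight, "train": book_train, "bus": book_bus}

if __name__ == "__main__":
    print(plan_trip(TOOLS, "A", "B", max_hours=6.0))
```

Tightening or relaxing `max_hours` changes which tool is selected, which illustrates why tool choice has to be planned jointly with the user's constraints rather than fixed in advance.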

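5.4 X-of-thought in LLMs for NLP

When LLMs solve complex NLP problems, they often cannot directly give correct answers and require complex thinking. Therefore, some works adopt X-of-thought (XoT) for advanced logical reasoning. XoT primarily aims to refine logical processing to obtain better solutions to NLP tasks (Kojima et al., 2022; Zhang et al., 2022b; Qin et al., 2023a; Yao et al., 2023; Chen et al., 2022; Lei et al., 2023).

Key challenges in this direction include: (1) Universal step decomposition: How to develop a universally applicable step decomposition method that generalizes LLMs to various NLP tasks is the core challenge of XoT. (2) Prompting knowledge integration: Diverse prompting can enhance model performance across various scenarios. How to better integrate the knowledge of different XoT variants to solve NLP problems is an important direction.

5.5 Hallucination in LLMs for NLP

When solving NLP tasks, LLMs inevitably suffer from hallucinations, in which they produce outputs that deviate from world knowledge (Muhlgay et al., 2023; Min et al., 2023), the user request (Adlakha et al., 2023), or the self-generated context (Liu et al., 2022). This deviation harms the reliability of LLMs in practical scenarios.

The primary challenges in hallucination are: (1) Efficient hallucination evaluation: How to find appropriate and unified evaluation benchmarks and metrics for LLMs across various NLP tasks is a key challenge. (2) Leveraging hallucinations for creativity: Hallucinations can often stimulate certain creative abilities. How to leverage hallucination to stimulate creativity and generate better innovative knowledge is an interesting topic.

5.6 Safety in LLMs for NLP

Applying large models to downstream NLP tasks also inevitably raises safety concerns, including copyright issues (Chang et al., 2023), hate and toxicity (Hartvigsen et al., 2022), social bias (Wan et al., 2023a; Dhamala et al., 2021), and psychological safety (Huang et al., 2023b). Motivated by this, a series of works focus on researching the safety of LLMs for diverse NLP tasks (Ganguli et al., 2022; Sun et al., 2023a).

The main challenges to safety in LLMs are: (1) Safety benchmark construction: Currently, there are few safety-related benchmarks for LLMs on various NLP tasks. Establishing effective safety benchmarks is a critical objective in this area. (2) Multilingual safety risks: LLMs face more safety risks across languages and cultures. Identifying and mitigating these risks in a multilingual context is a significant challenge.

6 Conclusion

In this work, we make the first attempt to offer a systematic overview of LLMs in NLP, introducing a unified taxonomy of parameter-frozen applications and parameter-tuning applications. In addition, we highlight new research frontiers and challenges, hoping to facilitate future research. We also maintain a publicly available resource website to track the latest developments in the literature. We hope this work provides valuable insights and resources for building effective LLMs in NLP.

References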

  • Adams et al. (2023) Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Eric Lehman, and Noémie Elhadad. 2023. From sparse to dense: Gpt-4 summarization with chain of density prompting. ArXiv, abs/2309.04269.
  • Addlesee et al. (2023) Angus Addlesee, Weronika Sieińska, Nancie Gunson, Daniel Hernández Garcia, Christian Dondrup, and Oliver Lemon. 2023. Multi-party goal tracking with llms: Comparing pre-training, fine-tuning, and prompt engineering. In Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue, pages 229–241.
  • Adlakha et al. (2023) Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, and Siva Reddy. 2023. Evaluating correctness and faithfulness of instruction-following models for question answering. arXiv preprint arXiv:2307.16877.
  • Allal et al. (2023) Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, et al. 2023. Santacoder: don’t reach for the stars! arXiv preprint arXiv:2301.03988.
  • Alves et al. (2023) Duarte M. Alves, Nuno M. Guerreiro, Joao Alves, José P. Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, and André Martins. 2023. Steering large language models for machine translation with finetuning and in-context learning. In Conference on Empirical Methods in Natural Language Processing.
  • Ayupov and Chirkova (2022) Shamil Ayupov and Nadezhda Chirkova. 2022. Parameter-efficient finetuning of transformers for source code. ArXiv, abs/2212.05901.
  • Bahdanau et al. (2014) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
  • Bai et al. (2023) Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, and Alan Ritter. 2023. Schema-driven information extraction from heterogeneous tables. arXiv preprint arXiv:2305.14336.
  • Belkhir and Sadat (2023) Ahmed Belkhir and Fatiha Sadat. 2023. Beyond information: Is chatgpt empathetic enough? In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 159–169.
  • Bhaskar et al. (2022) Adithya Bhaskar, Alexander R. Fabbri, and Greg Durrett. 2022. Prompted opinion summarization with gpt-3.5. In Annual Meeting of the Association for Computational Linguistics.
  • Bi et al. (2023) Zhen Bi, Jing Chen, Yinuo Jiang, Feiyu Xiong, Wei Guo, Huajun Chen, and Ningyu Zhang. 2023. Codekgc: Code language model for generative knowledge graph construction. arXiv preprint arXiv:2304.09048.
  • Brown et al. (2020) Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  • Cao (2023) Lang Cao. 2023. Diaggpt: An llm-based chatbot with automatic topic management for task-oriented dialogue. arXiv preprint arXiv:2308.08043.
  • Chang et al. (2023) Kent K Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman. 2023. Speak, memory: An archaeology of books known to chatgpt/gpt-4. arXiv preprint arXiv:2305.00118.
  • Chen et al. (2021) Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
  • Chen (2023) Wenhu Chen. 2023. Large language models are few (1)-shot table reasoners. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1090–1100.
  • Chen et al. (2022) Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W Cohen. 2022. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588.
  • Cheng et al. (2022) Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, et al. 2022. Binding language models in symbolic languages. In The Eleventh International Conference on Learning Representations.
  • Chi et al. (2023) Ryan A Chi, Jeremy Kim, Scott Hickmann, Siyan Li, Gordon Chi, Thanawan Atchariyachanvanit, Katherine Yu, Nathan A Chi, Gary Dai, Shashank Rammoorthy, et al. 2023. Dialogue distillery: Crafting interpolable, interpretable, and introspectable dialogue from llms. Alexa Prize SocialBot Grand Challenge, 5.
  • Chowdhery et al. (2022) Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2022. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311.
  • Christopoulou et al. (2022) Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, Guchun Zhang, Yinpeng Guo, Zhongqi Li, Qi Zhang, Meng Xiao, Bo Shen, Lin Li, et al. 2022. Pangu-coder: Program synthesis with function-level language modeling. arXiv preprint arXiv:2207.11280.
  • Chung et al. (2023) Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, and Pascale Fung. 2023. Instructtods: Large language models for end-to-end task-oriented dialogue systems. arXiv preprint arXiv:2310.08885.
  • Das et al. (2023a) Sarkar Snigdha Sarathi Das, Chirag Shah, Mengting Wan, Jennifer Neville, Longqi Yang, Reid Andersen, Georg Buscher, and Tara Safavi. 2023a. S3-dst: Structured open-domain dialogue segmentation and state tracking in the era of llms. arXiv preprint arXiv:2309.08827.
  • Das et al. (2023b) Sarkar Snigdha Sarathi Das, Haoran Zhang, Peng Shi, Wenpeng Yin, and Rui Zhang. 2023b. Unified low-resource sequence labeling by sample-aware dynamic sparse finetuning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6998–7010, Singapore. Association for Computational Linguistics.
  • Dettmers et al. (2023) Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314.
  • Dhamala et al. (2021) J. Dhamala, Tony Sun, Varun Kumar, Satyapriya Krishna, Yada Pruksachatkun, Kai-Wei Chang, and Rahul Gupta. 2021. Bold: Dataset and metrics for measuring biases in open-ended language generation. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.
  • Feng et al. (2023a) Xiachong Feng, Xiaocheng Feng, Xiyuan Du, MingSung Kan, and Bing Qin. 2023a. Adapter-based selective knowledge distillation for federated multi-domain meeting summarization. ArXiv, abs/2308.03275.
  • Feng et al. (2023b) Yujie Feng, Zexin Lu, Bo Liu, Liming Zhan, and Xiao-Ming Wu. 2023b. Towards llm-driven dialogue state tracking. arXiv preprint arXiv:2310.14970.
  • Gan et al. (2023) Chengguang Gan, Qinghao Zhang, and Tatsunori Mori. 2023. Giellm: Japanese general information extraction large language model utilizing mutual reinforcement effect. arXiv preprint arXiv:2311.06838.
  • Ganguli et al. (2022) Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. 2022. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858.
  • Gao et al. (2023a) Haoyu Gao, Ting-En Lin, Hangyu Li, Min Yang, Yuchuan Wu, Wentao Ma, and Yongbin Li. 2023a. Self-explanation prompting improves dialogue understanding in large language models. arXiv preprint arXiv:2309.12940.
  • Gao et al. (2023b) Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023b. Pal: Program-aided language models. In International Conference on Machine Learning, pages 10764–10799. PMLR.
  • Goyal et al. (2022) Tanya Goyal, Junyi Jessy Li, and Greg Durrett. 2022. News summarization and evaluation in the era of gpt-3. ArXiv, abs/2209.12356.
  • Guo et al. (2024) Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y Wu, YK Li, et al. 2024. Deepseek-coder: When the large language model meets programming–the rise of code intelligence. arXiv preprint arXiv:2401.14196.
  • Gupta et al. (2022) Raghav Gupta, Harrison Lee, Jeffrey Zhao, Yuan Cao, Abhinav Rastogi, and Yonghui Wu. 2022. Show, don’t tell: Demonstrations outperform descriptions for schema-guided task-oriented dialogue. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4541–4549.
  • Hadi et al. (2023) Muhammad Usman Hadi, Rizwan Qureshi, Abbas Shah, Muhammad Irfan, Anas Zafar, Muhammad Bilal Shaikh, Naveed Akhtar, Jia Wu, Seyedali Mirjalili, et al. 2023. Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Preprints.
  • Hartvigsen et al. (2022) Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. 2022. ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3309–3326, Dublin, Ireland. Association for Computational Linguistics.
  • He and Garner (2023) Mutian He and Philip N Garner. 2023. Can chatgpt detect intent? evaluating large language models for spoken language understanding. arXiv preprint arXiv:2305.13512.
  • Heck et al. (2023) Michael Heck, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Shutong Feng, Christian Geishauser, Hsien-Chin Lin, Carel van Niekerk, and Milica Gašić. 2023. Chatgpt for zero-shot dialogue state tracking: A solution or an opportunity? arXiv preprint arXiv:2306.01386.
  • Ho et al. (2023) Namgyu Ho, Laura Schmid, and Se-Young Yun. 2023. Large language models are reasoning teachers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14852–14882, Toronto, Canada. Association for Computational Linguistics.
  • Houlsby et al. (2019) Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799. PMLR.
  • Hu et al. (2021) Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2021. Lora: Low-rank adaptation of large language models. In International Conference on Learning Representations.
  • Hu et al. (2022a) Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022a. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations.
  • Hu et al. (2023a) Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, and Ping Luo. 2023a. Tree-planner: Efficient close-loop task planning with large language models. arXiv preprint arXiv:2310.08582.
  • Hu et al. (2022b) Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A Smith, and Mari Ostendorf. 2022b. In-context learning for few-shot dialogue state tracking. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2627–2643.
  • Hu et al. (2023b) Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Ka-Wei Lee. 2023b. Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models.
  • Huang et al. (2023a) Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, and Michael R. Lyu. 2023a. Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench. ArXiv:2308.03656 [cs].
  • Huang et al. (2023b) Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, and Michael R Lyu. 2023b. Emotionally numb or empathetic? evaluating how llms feel using emotionbench. ArXiv, abs/2308.03656.
  • Huang et al. (2024) Yi-Chong Huang, Xiaocheng Feng, Baohang Li, Chengpeng Fu, Wenshuai Huo, Ting Liu, and Bing Qin. 2024. Aligning translation-specific understanding to general understanding in large language models. ArXiv, abs/2401.05072.
  • Hudeček and Dušek (2023) Vojtěch Hudeček and Ondřej Dušek. 2023. Are llms all you need for task-oriented dialogue? arXiv preprint arXiv:2304.06556.
  • Iyer et al. (2023) Vivek Iyer, Pinzhen Chen, and Alexandra Birch. 2023. Towards effective disambiguation for machine translation with large language models. In Conference on Machine Translation.
  • Jacqmin et al. (2022) Léo Jacqmin, Lina M. Rojas Barahona, and Benoit Favre. 2022. “Do you follow me?”: A survey of recent approaches in dialogue state tracking. In Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 336–350, Edinburgh, UK. Association for Computational Linguistics.
  • Jiang et al. (2023) Jinhao Jiang, Kun Zhou, Zican Dong, Keming Ye, Xin Zhao, and Ji-Rong Wen. 2023. StructGPT: A general framework for large language model to reason over structured data. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9237–9251, Singapore. Association for Computational Linguistics.
  • Jin et al. (2022) Nengzheng Jin, Joanna Siebert, Dongfang Li, and Qingcai Chen. 2022. A survey on table question answering: recent advances. In China Conference on Knowledge Graph and Semantic Computing, pages 174–186. Springer.
  • Kaddour et al. (2023) Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, and Robert McHardy. 2023. Challenges and applications of large language models. arXiv preprint arXiv:2307.10169.
  • King and Flanigan (2023) Brendan King and Jeffrey Flanigan. 2023. Diverse retrieval-augmented in-context learning for dialogue state tracking. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5570–5585.
  • Kojima et al. (2022) Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213.
  • Koto et al. (2024) Fajri Koto, Tilman Beck, Zeerak Talat, Iryna Gurevych, and Timothy Baldwin. 2024. Zero-shot sentiment analysis in low-resource languages using a multilingual sentiment lexicon. arXiv preprint arXiv:2402.02113.
  • Le et al. (2022) Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, and Steven Chu Hong Hoi. 2022. Coderl: Mastering code generation through pretrained models and deep reinforcement learning. Advances in Neural Information Processing Systems, 35:21314–21328.
  • Lee et al. (2023) Chia-Hsuan Lee, Hao Cheng, and Mari Ostendorf. 2023. Orchestrallm: Efficient orchestration of language models for dialogue state tracking. arXiv preprint arXiv:2311.09758.
  • Lei et al. (2023) Bin Lei, Chunhua Liao, Caiwen Ding, et al. 2023. Boosting logical reasoning in large language models through a new framework: The graph of thought. arXiv preprint arXiv:2308.08614.
  • Li et al. (2023a) Chunyou Li, Mingtong Liu, Hongxiao Zhang, Yufeng Chen, Jinan Xu, and Ming Zhou. 2023a. Mt2: Towards a multi-task machine translation model with translation-specific in-context learning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8616–8627.
  • Li et al. (2023b) Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, and Zhaoxiang Zhang. 2023b. Sheetcopilot: Bringing software productivity to the next level through large language models. arXiv preprint arXiv:2305.19308.
  • Li et al. (2023c) Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Peiming Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, and M. Zhang. 2023c. Openba: An open-sourced 15b bilingual asymmetric seq2seq model pre-trained from scratch. ArXiv, abs/2309.10706.
  • Li and Zhang (2023) Mingchen Li and Rui Zhang. 2023. How far is language model from 100% few-shot named entity recognition in medical domain. arXiv preprint arXiv:2307.00186.
  • Li et al. (2023d) Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, and Surajit Chaudhuri. 2023d. Table-gpt: Table-tuned gpt for diverse table tasks. arXiv preprint arXiv:2310.09263.
  • Li et al. (2023e) Peng Li, Tianxiang Sun, Qiong Tang, Hang Yan, Yuanbin Wu, Xuanjing Huang, and Xipeng Qiu. 2023e. Codeie: Large code generation models are better few-shot information extractors. arXiv preprint arXiv:2305.05711.
  • Li et al. (2023f) Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. 2023f. Starcoder: may the source be with you! arXiv preprint arXiv:2305.06161.
  • Li and Liang (2021a) Xiang Lisa Li and Percy Liang. 2021a. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597.
  • Li and Liang (2021b) Xiang Lisa Li and Percy Liang. 2021b. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). ArXiv, abs/2101.00190.
  • Li et al. (2022a) Yu Li, Baolin Peng, Pengcheng He, Michel Galley, Zhou Yu, and Jianfeng Gao. 2022a. Dionysus: A pre-trained model for low-resource dialogue summarization. ArXiv, abs/2212.10018.
  • Li et al. (2023g) Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, and Yin Tat Lee. 2023g. Textbooks are all you need ii: phi-1.5 technical report. arXiv preprint arXiv:2309.05463.
  • Li et al. (2022b) Zekun Li, Wenhu Chen, Shiyang Li, Hong Wang, Jing Qian, and Xifeng Yan. 2022b. Controllable dialogue simulation with in-context learning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4330–4347.
  • Liang et al. (2023) Zujie Liang, Feng Wei, Yin Jie, Yuxi Qian, Zhenghong Hao, and Bing Han. 2023. Prompts can play lottery tickets well: Achieving lifelong information extraction via lottery prompt tuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 277–292, Toronto, Canada. Association for Computational Linguistics.
  • Lin et al. (2023) Eleanor Lin, James Hale, and Jonathan Gratch. 2023. Toward a better understanding of the emotional dynamics of negotiation with large language models. In Proceedings of the Twenty-fourth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 545–550.
  • Liu et al. (2023a) Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. 2023a. Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744.
  • Liu et al. (2023b) Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, and Junlan Feng. 2023b. Prompt pool based class-incremental continual learning for dialog state tracking. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 1–8. IEEE.
  • Liu et al. (2022) Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, and William B Dolan. 2022. A token-level reference-free hallucination detection benchmark for free-form text generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6723–6737.
  • Lu et al. (2023a) Di Lu, Shihao Ran, Joel Tetreault, and Alejandro Jaimes. 2023a. Event extraction as question generation and answering. arXiv preprint arXiv:2307.05567.
  • Lu et al. (2023b) Hongyuan Lu, Haoyang Huang, Dongdong Zhang, Haoran Yang, Wai Lam, and Furu Wei. 2023b. Chain-of-dictionary prompting elicits translation in large language models. ArXiv, abs/2305.06575.
  • Lu et al. (2023c) Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. 2023c. Mathvista: Evaluating mathematical reasoning of foundation models in visual contexts. arXiv preprint arXiv:2310.02255.
  • Lu et al. (2022) Pan Lu, Swaroop Mishra, Tanglin Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. Learn to explain: Multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems, 35:2507–2521.
  • Lu et al. (2023d) Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, and Ashwin Kalyan. 2023d. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. In The Eleventh International Conference on Learning Representations.
  • Lu et al. (2023e) Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, and Kai-Wei Chang. 2023e. A survey of deep learning for mathematical reasoning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14605–14631.
  • Lu et al. (2021) Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, et al. 2021. Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664.
  • Luo et al. (2023a) Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, and Dongmei Zhang. 2023a. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct.
  • Luo et al. (2023b) Tongxu Luo, Fangyu Lei, Jiahe Lei, Weihao Liu, Shihu He, Jun Zhao, and Kang Liu. 2023b. Hrot: Hybrid prompt strategy and retrieval of thought for table-text hybrid question answering. arXiv preprint arXiv:2309.12669.
  • Luo et al. (2023c) Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. 2023c. Wizardcoder: Empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568.
  • Min et al. (2023) Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2023. Factscore: Fine-grained atomic evaluation of factual precision in long form text generation. arXiv preprint arXiv:2305.14251.
  • Min et al. (2022) Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022. Rethinking the role of demonstrations: What makes in-context learning work? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11048–11064.
  • Mitra et al. (2023) Chancharik Mitra, Brandon Huang, Trevor Darrell, and Roei Herzig. 2023. Compositional chain-of-thought prompting for large multimodal models. arXiv preprint arXiv:2311.17076.
  • Moslem et al. (2023) Yasmin Moslem, Rejwanul Haque, and Andy Way. 2023. Fine-tuning large language models for adaptive machine translation. ArXiv, abs/2312.12740.
  • Muhlgay et al. (2023) Dor Muhlgay, Ori Ram, Inbal Magar, Yoav Levine, Nir Ratner, Yonatan Belinkov, Omri Abend, Kevin Leyton-Brown, Amnon Shashua, and Yoav Shoham. 2023. Generating benchmarks for factuality evaluation of language models. arXiv preprint arXiv:2307.06908.
  • Nijkamp et al. (2022) Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2022. Codegen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
  • OpenAI (2023) OpenAI. 2023. Gpt-4 technical report.
  • Ouyang et al. (2022) Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744.
  • Pagnoni et al. (2022) Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kryscinski, and Chien-Sheng Wu. 2022. Socratic pretraining: Question-driven pretraining for controllable summarization. ArXiv, abs/2212.10449.
  • Pan et al. (2023) Wenbo Pan, Qiguang Chen, Xiao Xu, Wanxiang Che, and Libo Qin. 2023. A preliminary evaluation of chatgpt for zero-shot dialogue understanding. arXiv preprint arXiv:2304.04256.
  • Patnaik et al. (2024) Sohan Patnaik, Heril Changwal, Milan Aggarwal, Sumita Bhatia, Yaman Kumar, and Balaji Krishnamurthy. 2024. Cabinet: Content relevance based noise reduction for table question answering. arXiv preprint arXiv:2402.01155.
  • Peng et al. (2023) Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, and Dacheng Tao. 2023. Towards making the most of chatgpt for machine translation. arXiv preprint arXiv:2303.13780.
  • Qin et al. (2019) Libo Qin, Wanxiang Che, Yangming Li, Haoyang Wen, and Ting Liu. 2019. A stack-propagation framework with token-level intent detection for spoken language understanding. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2078–2087, Hong Kong, China. Association for Computational Linguistics.
  • Qin et al. (2023a) Libo Qin, Qiguang Chen, Fuxuan Wei, Shijue Huang, and Wanxiang Che. 2023a. Cross-lingual prompting: Improving zero-shot chain-of-thought reasoning across languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2695–2709.
  • Qin et al. (2024) Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, and Philip S Yu. 2024. Multilingual large language model: A survey of resources, taxonomy and frontiers. arXiv preprint arXiv:2404.04925.
  • Qin et al. (2021) Libo Qin, Tianbao Xie, Wanxiang Che, and Ting Liu. 2021. A survey on spoken language understanding: Recent advances and new frontiers. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 4577–4584. International Joint Conferences on Artificial Intelligence Organization. Survey Track.
  • Qin et al. (2023b) Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023b. Toolllm: Facilitating large language models to master 16000+ real-world apis. arXiv preprint arXiv:2307.16789.
  • Qiu et al. (2023) Huachuan Qiu, Hongliang He, Shuai Zhang, Anqi Li, and Zhenzhong Lan. 2023. Smile: Single-turn to multi-turn inclusive language expansion via chatgpt for mental health support. arXiv preprint arXiv:2305.00450.
  • Raunak et al. (2023) Vikas Raunak, Hany Hassan Awadalla, and Arul Menezes. 2023. Dissecting in-context learning of translations in gpts. ArXiv, abs/2310.15987.
  • Ravaut et al. (2023a) Mathieu Ravaut, Hailin Chen, Ruochen Zhao, Chengwei Qin, Shafiq R. Joty, and Nancy F. Chen. 2023a. Promptsum: Parameter-efficient controllable abstractive summarization. ArXiv, abs/2308.03117.
  • Ravaut et al. (2023b) Mathieu Ravaut, Shafiq R. Joty, Aixin Sun, and Nancy F. Chen. 2023b. On context utilization in summarization with large language models.
  • Roziere et al. (2023) Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950.
  • Sainz et al. (2023) Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, and Eneko Agirre. 2023. Gollie: Annotation guidelines improve zero-shot information-extraction. arXiv preprint arXiv:2310.03668.
  • Sarikaya et al. (2016) Ruhi Sarikaya, Paul A Crook, Alex Marin, Minwoo Jeong, Jean-Philippe Robichaud, Asli Celikyilmaz, Young-Bum Kim, Alexandre Rochette, Omar Zia Khan, Xiaohu Liu, et al. 2016. An overview of end-to-end language understanding and dialog management for personal digital assistants. In 2016 IEEE Spoken Language Technology Workshop (SLT), pages 391–397. IEEE.
  • Schick et al. (2023) Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. In Thirty-seventh Conference on Neural Information Processing Systems.
  • Shi et al. (2022) Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, et al. 2022. Language models are multilingual chain-of-thought reasoners. In The Eleventh International Conference on Learning Representations.
  • Shi et al. (2018) Tian Shi, Yaser Keneshloo, Naren Ramakrishnan, and Chandan K. Reddy. 2018. Neural abstractive text summarization with sequence-to-sequence models. ACM Transactions on Data Science, 2:1–37.
  • Shinn et al. (2023) Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik R Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems.
  • Shojaee et al. (2023) Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, and Chandan K Reddy. 2023. Execution-based code generation using deep reinforcement learning. arXiv preprint arXiv:2301.13816.
  • Singha et al. (2023) Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, and Chris Parnin. 2023. Tabular representation, noisy operators, and impacts on table structure understanding tasks in llms. In NeurIPS 2023 Second Table Representation Learning Workshop.
  • Sui et al. (2023a) Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, and Dongmei Zhang. 2023a. Gpt4table: Can large language models understand structured table data? a benchmark and empirical study.
  • Sui et al. (2023b) Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, and Dongmei Zhang. 2023b. Tap4llm: Table provider on sampling, augmenting, and packing semi-structured data for large language model reasoning. arXiv preprint arXiv:2312.09039.
  • Sun et al. (2023a) Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, and Minlie Huang. 2023a. Safety assessment of chinese large language models. arXiv preprint arXiv:2304.10436.
  • Sun et al. (2023b) Xiaofei Sun, Xiaoya Li, Shengyu Zhang, Shuhe Wang, Fei Wu, Jiwei Li, Tianwei Zhang, and Guoyin Wang. 2023b. Sentiment analysis through llm negotiations. arXiv preprint arXiv:2311.01876.
  • Tang et al. (2023) Yuting Tang, Ratish Puduppully, Zhengyuan Liu, and Nancy Chen. 2023. In-context learning of large language models for controlled dialogue summarization: A holistic benchmark and empirical analysis. In Proceedings of the 4th New Frontiers in Summarization Workshop, pages 56–67, Singapore. Association for Computational Linguistics.
  • Touvron et al. (2023) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  • Tur and De Mori (2011) Gokhan Tur and Renato De Mori. 2011. Spoken language understanding: Systems for extracting semantic information from speech. John Wiley & Sons.
  • Ustun and Stickland (2022) A. Ustun and Asa Cooper Stickland. 2022. When does parameter-efficient transfer learning work for machine translation? In Conference on Empirical Methods in Natural Language Processing.
  • Varia et al. (2022) Siddharth Varia, Shuai Wang, Kishaloy Halder, Robert Vacareanu, Miguel Ballesteros, Yassine Benajiba, Neha Anna John, Rishita Anubhai, Smaranda Muresan, and Dan Roth. 2022. Instruction tuning for few-shot aspect-based sentiment analysis. arXiv preprint arXiv:2210.06629.
  • Wan et al. (2023a) Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael R. Lyu. 2023a. Biasasker: Measuring the bias in conversational ai system. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering.
  • Wan et al. (2023b) Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, and Sadao Kurohashi. 2023b. Gpt-re: In-context learning for relation extraction using large language models. arXiv preprint arXiv:2305.02105.
  • Wang et al. (2023a) Jiaan Wang, Yunlong Liang, Fandong Meng, Beiqi Zou, Zhixu Li, Jianfeng Qu, and Jie Zhou. 2023a. Zero-shot cross-lingual summarization via large language models.
  • Wang et al. (2023b) Jiaan Wang, Yunlong Liang, Fandong Meng, Beiqi Zou, Zhixu Li, Jianfeng Qu, and Jie Zhou. 2023b. Zero-shot cross-lingual summarization via large language models. In Proceedings of the 4th New Frontiers in Summarization Workshop, pages 12–23, Singapore. Association for Computational Linguistics.
  • Wang et al. (2023c) Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, et al. 2023c. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432.
  • Wang et al. (2023d) Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, and Zhaopeng Tu. 2023d. Document-level machine translation with large language models. arXiv preprint arXiv:2304.02210.
  • Wang et al. (2023e) Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, and Jie Tang. 2023e. Cogvlm: Visual expert for pretrained language models. ArXiv.
  • Wang et al. (2023f) Xiao Wang, Weikang Zhou, Can Zu, Han Xia, Tianze Chen, Yuansen Zhang, Rui Zheng, Junjie Ye, Qi Zhang, Tao Gui, et al. 2023f. Instructuie: Multi-task instruction tuning for unified information extraction. arXiv preprint arXiv:2304.08085.
  • Wang et al. (2023g) Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023g. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations.
  • Wang et al. (2023h) Yiming Wang, Zhuosheng Zhang, and Rui Wang. 2023h. Element-aware summarization with large language models: Expert-aligned evaluation and chain-of-thought method. arXiv preprint arXiv:2305.13412.
  • Wang et al. (2023i) Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi DQ Bui, Junnan Li, and Steven CH Hoi. 2023i. Codet5+: Open code large language models for code understanding and generation. arXiv preprint arXiv:2305.07922.
  • Wang et al. (2021) Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. 2021. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859.
  • Wang et al. (2022) Zengzhi Wang, Rui Xia, and Jianfei Yu. 2022. Unifiedabsa: A unified absa framework based on multi-task instruction tuning. arXiv preprint arXiv:2211.10986.
  • Wang et al. (2023j) Zengzhi Wang, Qiming Xie, Zixiang Ding, Yi Feng, and Rui Xia. 2023j. Is chatgpt a good sentiment analyzer? a preliminary study. arXiv preprint arXiv:2304.04339.
  • Wankhade et al. (2022) Mayur Wankhade, Annavarapu Chandra Sekhara Rao, and Chaitanya Kulkarni. 2022. A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55(7):5731–5780.
  • Wei et al. (2022a) Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V Le. 2022a. Finetuned language models are zero-shot learners. In International Conference on Learning Representations.
  • Wei et al. (2022b) Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. 2022b. Emergent abilities of large language models. Transactions on Machine Learning Research.
  • Wei et al. (2022c) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed H Chi, Quoc V Le, Denny Zhou, et al. 2022c. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems.
  • Wei et al. (2023a) Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, et al. 2023a. Zero-shot information extraction via chatting with chatgpt. arXiv preprint arXiv:2302.10205.
  • Wei et al. (2023b) Xiangpeng Wei, Hao-Ran Wei, Huan Lin, Tianhao Li, Pei Zhang, Xingzhang Ren, Mei Li, Yu Wan, Zhiwei Cao, Binbin Xie, Tianxiang Hu, Shangjie Li, Binyuan Hui, Yu Bowen, Dayiheng Liu, Baosong Yang, Fei Huang, and Jun Xie. 2023b. Polylm: An open source polyglot large language model. ArXiv, abs/2307.06018.
  • Weyssow et al. (2023) Martin Weyssow, Xin Zhou, Kisub Kim, David Lo, and Houari Sahraoui. 2023. Exploring parameter-efficient fine-tuning techniques for code generation with large language models. arXiv preprint arXiv:2308.10462.
  • Winata et al. (2023) Genta Winata, Alham Fikri Aji, Zheng Xin Yong, and Thamar Solorio. 2023. The decades progress on code-switching research in NLP: A systematic survey on trends and challenges. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2936–2978, Toronto, Canada. Association for Computational Linguistics.
  • Workshop et al. (2022) BigScience Workshop, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, et al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
  • Wu et al. (2023a) Bohong Wu, Fei Yuan, Hai Zhao, Lei Li, and Jingjing Xu. 2023a. Extrapolating multilingual understanding models as multilingual generators. In Conference on Empirical Methods in Natural Language Processing.
  • Wu et al. (2024) Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, and Gholamreza Haffari. 2024. Adapting large language models for document-level machine translation. ArXiv, abs/2401.06468.
  • Wu et al. (2023b) Yifan Wu, Pengchuan Zhang, Wenhan Xiong, Barlas Oguz, James C Gee, and Yixin Nie. 2023b. The role of chain-of-thought in complex vision-language reasoning task. arXiv preprint arXiv:2311.09193.
  • Wu et al. (2023c) Yuxiang Wu, Guanting Dong, and Weiran Xu. 2023c. Semantic parsing by large language models for intricate updating strategies of zero-shot dialogue state tracking. arXiv preprint arXiv:2310.10520.
  • Xie et al. (2022) Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I Wang, et al. 2022. Unifiedskg: Unifying and multi-tasking structured knowledge grounding with text-to-text language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 602–631.
  • Xie et al. (2023) Tingyu Xie, Qi Li, Jian Zhang, Yan Zhang, Zuozhu Liu, and Hongwei Wang. 2023. Empirical study of zero-shot NER with ChatGPT. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7935–7956, Singapore. Association for Computational Linguistics.
  • Xu et al. (2023a) Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, and Enhong Chen. 2023a. Large language models for generative information extraction: A survey. arXiv preprint arXiv:2312.17617.
  • Xu et al. (2023b) Haoran Xu, Young Jin Kim, Amr Sharaf, and Hany Hassan Awadalla. 2023b. A paradigm shift in machine translation: Boosting translation performance of large language models. ArXiv, abs/2309.11674.
  • Xu et al. (2024) Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, and Young Jin Kim. 2024. Contrastive preference optimization: Pushing the boundaries of llm performance in machine translation. ArXiv, abs/2401.08417.
  • Xu et al. (2023c) Xiancai Xu, Jia-Dong Zhang, Rongchang Xiao, and Lei Xiong. 2023c. The limits of chatgpt in extracting aspect-category-opinion-sentiment quadruples: A comparative analysis. arXiv preprint arXiv:2310.06502.
  • Xue et al. (2021) Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2021. mt5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498.
  • Xue et al. (2023) Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, et al. 2023. Db-gpt: Empowering database interactions with private large language models. arXiv preprint arXiv:2312.17449.
  • Yang and Li (2023) Bin Yang and Jinlong Li. 2023. Visual elements mining as prompts for instruction learning for target-oriented multimodal sentiment classification. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 6062–6075.
  • Yang et al. Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Shaochen Zhong, Bing Yin, and Xia Hu. Harnessing the power of llms in practice: A survey on chatgpt and beyond. ACM Transactions on Knowledge Discovery from Data.
  • Yang et al. (2023a) Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, and Lijuan Wang. 2023a. The dawn of LMMs: Preliminary explorations with GPT-4V(ision). arXiv preprint arXiv:2309.17421.
  • Yang et al. (2023b) Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, and Lijuan Wang. 2023b. Mm-react: Prompting chatgpt for multimodal reasoning and action. arXiv preprint arXiv:2303.11381.
  • Yao et al. (2023) Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.
  • Ye et al. (2024) Junyi Ye, Mengnan Du, and Guiling Wang. 2024. Dataframe qa: A universal llm framework on dataframe question answering without data exposure. arXiv preprint arXiv:2401.15463.
  • Ye et al. (2023) Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, and Yongbin Li. 2023. Large language models are versatile decomposers: Decomposing evidence and questions for table-based reasoning. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 174–184.
  • Yu et al. (2022) Dian Yu, Mingqiu Wang, Yuan Cao, Laurent El Shafey, Izhak Shafran, and Hagen Soltau. 2022. Knowledge-grounded dialog state tracking. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3428–3435.
  • Yuan et al. (2022) Ruifeng Yuan, Zili Wang, Ziqiang Cao, and Wenjie Li. 2022. Few-shot query-focused summarization with prefix-merging. ArXiv, abs/2211.16164.
  • Yue et al. (2023) Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. 2023. Mammoth: Building math generalist models through hybrid instruction tuning.
  • Zhang et al. (2024) Hangwen Zhang, Qingyi Si, Peng Fu, Zheng Lin, and Weiping Wang. 2024. Are large language models table-based fact-checkers? arXiv preprint arXiv:2402.02549.
  • Zhang et al. (2023a) Haochen Zhang, Yuyang Dong, Chuan Xiao, and Masafumi Oyamada. 2023a. Jellyfish: A large language model for data preprocessing. arXiv preprint arXiv:2312.01678.
  • Zhang et al. (2023b) Haopeng Zhang, Xiao Liu, and Jiawei Zhang. 2023b. Extractive summarization via chatgpt for faithful summary generation. In Conference on Empirical Methods in Natural Language Processing.
  • Zhang et al. (2023c) Kai Zhang, Bernal Jimenez Gutierrez, and Yu Su. 2023c. Aligning instruction tasks unlocks large language models as zero-shot relation extractors. ArXiv, abs/2305.11159.
  • Zhang et al. (2022a) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022a. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
  • Zhang et al. (2023d) Tianshu Zhang, Xiang Yue, Yifei Li, and Huan Sun. 2023d. Tablellama: Towards open large generalist models for tables. arXiv preprint arXiv:2311.09206.
  • Zhang et al. (2023e) Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori Hashimoto. 2023e. Benchmarking large language models for news summarization. Transactions of the Association for Computational Linguistics, 12:39–57.
  • Zhang et al. (2023f) Wenqi Zhang, Yongliang Shen, Weiming Lu, and Yueting Zhuang. 2023f. Data-copilot: Bridging billions of data and humans with autonomous workflow. arXiv preprint arXiv:2306.07209.
  • Zhang et al. (2023g) Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan, and Lidong Bing. 2023g. Sentiment analysis in the era of large language models: A reality check. arXiv preprint arXiv:2305.15005.
  • Zhang et al. (2023h) Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, and Helen Meng. 2023h. Sgp-tod: Building task bots effortlessly via schema-guided llm prompting. arXiv preprint arXiv:2305.09067.
  • Zhang et al. (2023i) Yichi Zhang, Jianing Yang, Keunwoo Yu, Yinpei Dai, Shane Storks, Yuwei Bao, Jiayi Pan, Nikhil Devraj, Ziqiao Ma, and Joyce Chai. 2023i. Seagull: An embodied agent for instruction following through situated dialog.
  • Zhang et al. (2023j) Yunjia Zhang, Jordan Henkel, Avrilia Floratou, Joyce Cahoon, Shaleen Deep, and Jignesh M Patel. 2023j. Reactable: Enhancing react for table question answering. arXiv preprint arXiv:2310.00815.
  • Zhang et al. (2023k) Zhehao Zhang, Xitao Li, Yan Gao, and Jian-Guang Lou. 2023k. CRT-QA: A dataset of complex reasoning question answering over tabular data. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2131–2153, Singapore. Association for Computational Linguistics.
  • Zhang et al. (2022b) Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2022b. Automatic chain of thought prompting in large language models. In The Eleventh International Conference on Learning Representations.
  • Zhang et al. (2023l) Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, and Alex Smola. 2023l. Multimodal chain-of-thought reasoning in language models. arXiv preprint arXiv:2302.00923.
  • Zhao et al. (2022a) Jeffrey Zhao, Raghav Gupta, Yuan Cao, Dian Yu, Mingqiu Wang, Harrison Lee, Abhinav Rastogi, Izhak Shafran, and Yonghui Wu. 2022a. Description-driven task-oriented dialog modeling. arXiv preprint arXiv:2201.08904.
  • Zhao et al. (2016) Jun Zhao, Kang Liu, and Liheng Xu. 2016. Sentiment analysis: mining opinions, sentiments, and emotions.
  • Zhao et al. (2022b) Lulu Zhao, Fujia Zheng, Weihao Zeng, Keqing He, Weiran Xu, Huixing Jiang, Wei Wu, and Yanan Wu. 2022b. Domain-oriented prefix-tuning: Towards efficient and generalizable fine-tuning for zero-shot dialogue summarization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4848–4862, Seattle, United States. Association for Computational Linguistics.
  • Zhao et al. (2023a) Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023a. A survey of large language models. arXiv preprint arXiv:2303.18223.
  • Zhao et al. (2023b) Weixiang Zhao, Yanyan Zhao, Xin Lu, Shilong Wang, Yanpeng Tong, and Bing Qin. 2023b. Is chatgpt equipped with emotional dialogue capabilities? arXiv preprint arXiv:2304.09582.
  • Zheng et al. (2023) Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, et al. 2023. Codegeex: A pre-trained model for code generation with multilingual evaluations on humaneval-x. arXiv preprint arXiv:2303.17568.
  • Zhu et al. (2024) Fengbin Zhu, Ziyang Liu, Fuli Feng, Chao Wang, Moxin Li, and Tat-Seng Chua. 2024. Tat-llm: A specialized language model for discrete reasoning over tabular and textual data. arXiv preprint arXiv:2401.13223.
  • Zhu et al. (2023a) Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, and Lei Li. 2023a. Extrapolating large language models to non-english by aligning languages. arXiv preprint arXiv:2308.04948.
  • Zhu et al. (2023b) Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, et al. 2023b. Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144.
  • Zhuang et al. (2023) Ziyu Zhuang, Qiguang Chen, Longxuan Ma, Mingda Li, Yi Han, Yushan Qian, Haopeng Bai, Zixian Feng, Weinan Zhang, and Ting Liu. 2023. Through the lens of core competency: Survey on evaluation of large language models. arXiv preprint arXiv:2308.07902.
  • Zhuo et al. (2024) Terry Yue Zhuo, Armel Zebaze, Nitchakarn Suppattarachai, Leandro von Werra, Harm de Vries, Qian Liu, and Niklas Muennighoff. 2024. Astraios: Parameter-efficient instruction tuning code large language models. arXiv preprint arXiv:2401.00788.