LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model

Zhi Zhou1†, Jiang-Xin Shi12†, Peng-Xiao Song1†, Xiao-Wen Yang12†,
Yi-Xuan Jin1, Lan-Zhe Guo13‡, Yu-Feng Li12‡
1National Key Laboratory for Novel Software Technology, Nanjing University
2School of Artificial Intelligence, Nanjing University
3School of Intelligence Science and Technology, Nanjing University
{zhouz,shijx,songpx,yangxw,jinyx,guolz,liyf}@lamda.nju.edu.cn
†Equal Contribution ‡Corresponding Author
Abstract

Large language models (LLMs), including both proprietary and open-source models, have showcased remarkable capabilities in addressing a wide range of downstream tasks. Nonetheless, when it comes to practical Chinese legal tasks, these models fail to meet the actual requirements. Proprietary models do not ensure data privacy for sensitive legal cases, while open-source models demonstrate unsatisfactory performance due to their lack of legal knowledge. To address this problem, we introduce LawGPT, the first open-source model specifically designed for Chinese legal applications. LawGPT comprises two key components: legal-oriented pre-training and legal supervised fine-tuning. Specifically, we employ large-scale Chinese legal documents for legal-oriented pre-training to incorporate legal domain knowledge. To further improve the model’s performance on downstream legal tasks, we create a knowledge-driven instruction dataset for legal supervised fine-tuning. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. Our code and resources are publicly available at https://github.com/pengxiao-song/LaWGPT and have received 5.7K stars on GitHub.

1 Introduction

Large language models (LLMs) (OpenAI, 2023b; Touvron et al., 2023b) have achieved remarkable success in various natural language processing (NLP) tasks, including natural language understanding (Dong et al., 2019), reasoning (Huang and Chang, 2023), and generation (Yu et al., 2022). Both proprietary and open-source LLMs exhibit strong generalization capabilities, enabling their application in diverse downstream scenarios such as medicine (Thirunavukarasu et al., 2023), finance (Yang et al., 2023b), and education (Gan et al., 2023). Recent studies (Fei et al., 2023; Nguyen, 2023) have demonstrated the preliminary effectiveness of existing general LLMs in legal tasks, including legal judgment prediction (Luo et al., 2017), legal document retrieval (Chen et al., 2013), and legal question answering (Zhong et al., 2020a).

Despite the preliminary effectiveness of LLMs in legal applications, two obstacles hinder their practical use in legal tasks. On the one hand, proprietary LLMs such as GPT-4 (OpenAI, 2023b) and GPT-3.5 Turbo (OpenAI, 2023a) can only be accessed through APIs, which do not guarantee data privacy in sensitive legal cases. On the other hand, open-source LLMs like LLaMA (Touvron et al., 2023a) and ChatGLM (Du et al., 2022) fail to achieve satisfactory performance due to their insufficient legal knowledge and incompatibility with downstream legal tasks. Therefore, it is necessary to develop an open-source LLM specifically designed for legal applications in order to overcome these obstacles.

In this paper, we introduce LawGPT, the first open-source Chinese legal knowledge-enhanced large language model. Because it is open-source, LawGPT can be self-hosted and accessed privately to ensure data privacy, unlike proprietary models. We then present legal-oriented pre-training, which uses our large-scale legal pre-training corpus to incorporate domain-specific legal knowledge into LawGPT, improving its foundational understanding, reasoning, and generation capabilities in legal tasks. Additionally, we propose legal supervised fine-tuning, which employs our knowledge-driven instruction dataset to further enhance LawGPT’s performance on downstream legal tasks. Experimental results demonstrate that LawGPT surpasses the open-source LLaMA 7B model on the majority of the evaluated legal tasks, shedding light on the development of a practical Chinese legal LLM.

In summary, our contributions are as follows:

  (a) We present the first open-source Chinese legal knowledge-enhanced large language model, LawGPT. The code and model are available on GitHub (https://github.com/pengxiao-song/LaWGPT) and have received 5.7K stars.

  (b) We construct a comprehensive legal pre-training corpus and propose a legal-oriented pre-training approach to enhance LawGPT’s foundational abilities in legal tasks by integrating domain-specific knowledge.

  (c) We create a knowledge-driven instruction dataset and utilize legal supervised fine-tuning to further adapt LawGPT to various legal tasks and improve its downstream performance.

  (d) Our experimental results demonstrate that LawGPT achieves better performance than the open-source LLaMA 7B model on the majority of the evaluated legal tasks, providing evidence for the effectiveness of our proposed model.

2 Related Work

In this section, we review existing work on addressing legal tasks using LLMs. We focus on general language models, legal language models, and legal benchmarks.

2.1 General Language Models

Recent LLMs, trained on extensive corpora, have demonstrated impressive performance across a variety of downstream tasks, including tasks in the legal domain. Proprietary LLMs, such as GPT-4 (OpenAI, 2023b), GPT-3.5-Turbo (OpenAI, 2023a), PaLM (Chowdhery et al., 2023), and PaLM2 (Anil et al., 2023), exhibit strong capabilities in handling legal tasks. Their impressive performance not only demonstrates the potential of LLMs in addressing legal tasks but also facilitates the low-cost, automated construction of high-quality datasets. Concurrently, open-source LLMs, such as LLaMA (Touvron et al., 2023a), LLaMA2 (Touvron et al., 2023b), MPT (Team, 2023), ChatGLM 2 (Du et al., 2022), and Baichuan 2 (Yang et al., 2023a), are available in various model scales. These open-source models facilitate the fine-tuning of legal-specific models using targeted legal datasets, potentially enhancing performance.

2.2 Legal Language Models

Legal language models are either fine-tuned from pre-trained language models or trained from scratch on legal data to improve their legal capabilities. Early research in this field used model architectures with millions of parameters and addressed individual legal tasks separately. These tasks include legal judgment prediction (Luo et al., 2017; Chalkidis et al., 2019; Yang et al., 2019), legal document and case retrieval (Chen et al., 2013; Shao et al., 2020; Li et al., 2023), legal reading comprehension (Duan et al., 2019), and legal question answering (Zhong et al., 2020a; Phi et al., 2020). Building on pre-trained models (Chalkidis et al., 2020; Cui et al., 2021), Lawformer (Xiao et al., 2021) combines three attention mechanisms to address the problem of long legal documents, covering a variety of legal tasks. Recent advances in LLMs have given rise to legal LLMs. HanFei (He et al., 2023), LexiLaw (Li et al., 2023), LawGPT-zh (Hongcheng et al., 2023), and LawGPT-1.0 (Nguyen, 2023) fine-tune foundational LLMs using a specially constructed or collected legal corpus to enhance their legal capabilities. To tackle the hallucination problem in legal tasks, LLMs such as ChatLaw (Cui et al., 2023a), Wisdom-Interrogatory (Wu et al., 2024), and Lawyer-LLaMA (Huang et al., 2023) incorporate a legal data retrieval method to improve the robustness of their responses. LawGiBa (Nguyen et al., 2023) builds a legal assistance system on top of the GPT-4 model. Fuzi-Mingcha (Deng et al., 2023) creates a legal syllogistic reasoning dataset for fine-tuning to ensure a logical format and accurate reasoning results.

2.3 Legal Benchmarks

With the emergence of many language models for legal tasks, several benchmarks have been proposed to evaluate them. LawBench (Fei et al., 2023) collects 20 legal tasks within three cognitive levels, i.e., legal knowledge memorization, understanding, and applying, to thoroughly evaluate the performance of existing models. LAiW (Dai et al., 2023) contains 14 tasks to evaluate the legal capabilities of LLMs at three levels, i.e., basic information retrieval, legal foundation inference, and complex legal application. SimuCourt (He et al., 2024) introduces a judicial decision-making task to evaluate the judicial analysis and decision-making power of LLMs.

3 Methodology

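Example 1: Legal Pre-training Corpus
上诉人*********(以下简称**学校)因与被上诉人************(以下简称**公司)装饰装修合同纠纷一案,不服*********人民法院(20xx)辽****民初****号民事判决,向本院提起上诉。本院依法组成合议庭审理了本案。本院认为 本院认为,一审判决程序违法。 1.**学校一审反诉请求解除装修合同及空调合同,一审仅判决解除装修合同,空调合同应否解除未予审理,属漏审漏判;一审双方当事人均未提出解除案涉补充协议,一审判决解除补充协议,超出当事人的诉请; 2.**学校一审反诉请求要求**公司按已付工程款数额开具发票,一审仅判决**公司给付欠付款项的发票,亦属漏审漏判; 3.**公司起诉状中明确了监控布线款为15600元,在**公司未提出变更诉讼请求的情况下,一审按照鉴定结论中的数额认定监控布线损失为32868.85元,亦属超出当事人的诉请。 一审判决认定事实不清。案涉工程两个施工合同均为固定总价合同,一审判决**公司给付**学校欠付工程款258449.56元依据的鉴定结论计算方式错误,且装修合同和空调合同应分别计算。另外,案涉已完工程造价鉴定中是否包含了已施工与图纸不符、质量不合格修复部位的工程造价?已施工与图纸不符、质量不合格修复费用的鉴定结论是否为修复到施工前的原始状态?重审时需补充鉴定。 一审重审时应围绕双方当事人的诉讼请求,合理分配举证责任,在查清事实的基础上依法裁判。 综上,依照《中华人民共和国民事诉讼法》第一百七十七条第一款第(三)、(四)项之规定,裁定如下:判决结果 一、撤销*********人民法院作出的(20xx)辽****民初****号民事判决; 二、本案发回*********人民法院重审。 上诉人*********学校预交的二审案件受理费6579元予以退回。
English translation: Appellant ********* (hereinafter "** School") appealed to this court against Civil Judgment (20xx) Liao **** Min Chu No. **** of the ********* People's Court in its decoration and renovation contract dispute with appellee ************ (hereinafter "** Company"). This court formed a collegial panel in accordance with the law and heard the case. [Court's opinion] This court holds that the first-instance judgment was procedurally unlawful. 1. In its first-instance counterclaim, ** School requested rescission of both the renovation contract and the air-conditioning contract; the first-instance court rescinded only the renovation contract and did not examine whether the air-conditioning contract should be rescinded, which is an omission in trial and judgment. Neither party requested rescission of the supplementary agreement at issue, yet the first-instance judgment rescinded it, exceeding the parties' claims. 2. In its first-instance counterclaim, ** School requested that ** Company issue invoices for the amount of project payments already made; the first-instance court only ordered ** Company to provide invoices for the outstanding payments, which is likewise an omission. 3. ** Company's complaint specified the surveillance wiring fee as 15,600 yuan; although ** Company never amended its claim, the first-instance court found the surveillance wiring loss to be 32,868.85 yuan based on the appraisal conclusion, which likewise exceeds the parties' claims. The first-instance judgment's findings of fact are unclear. Both construction contracts for the project at issue are fixed lump-sum contracts; the appraisal conclusion on which the first-instance court relied in ordering ** Company to pay ** School 258,449.56 yuan in outstanding project payments used an incorrect calculation method, and the renovation contract and the air-conditioning contract should be calculated separately. In addition, does the cost appraisal of the completed work include the cost of the parts that were built inconsistently with the drawings or that require repair for substandard quality? Does the appraisal conclusion on the repair costs for such parts assume restoration to the original pre-construction state? A supplementary appraisal is needed on retrial. On retrial, the first-instance court should focus on the parties' claims, allocate the burden of proof reasonably, and rule according to law after ascertaining the facts. In sum, pursuant to Article 177, Paragraph 1, Items (3) and (4) of the Civil Procedure Law of the People's Republic of China, this court rules as follows: [Judgment] 1. Civil Judgment (20xx) Liao **** Min Chu No. **** rendered by the ********* People's Court is vacated; 2. The case is remanded to the ********* People's Court for retrial. The second-instance case acceptance fee of 6,579 yuan prepaid by appellant ********* School shall be refunded.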
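Example 2: Legal-supervised Dataset - Subset (a)
Instruction: 请问我向借钱人要钱多次未果,向法院起诉,法院多久才立案
Output: 起诉的当日,法院就会立案的。
English translation: Instruction: I have repeatedly asked the borrower to repay me without success. If I sue in court, how long will it take the court to file the case? Output: The court will file the case on the day you sue.
Example 3: Legal-supervised Dataset - Subset (b)
Instruction: 根据《中华人民共和国海商法》,在海事关系的法律适用中,旗国法适用于下列哪些情形? (A) 船舶抵押权的设定 (B) 同国籍船舶在公海发生碰撞的损害赔偿 (C) 共同海损理算 (D) 海事赔偿责任限制.
Output: (A) 船舶抵押权的设定 (B) 同国籍船舶在公海发生碰撞的损害赔偿
English translation: Instruction: According to the Maritime Code of the People's Republic of China, in the application of law to maritime relations, to which of the following situations does the law of the flag state apply? (A) Creation of a ship mortgage (B) Compensation for damage from a collision on the high seas between ships of the same nationality (C) General average adjustment (D) Limitation of liability for maritime claims. Output: (A) Creation of a ship mortgage (B) Compensation for damage from a collision on the high seas between ships of the same nationality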
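Template 1: Prompt of ChatGPT for Augmentation
我希望你担任语言专家的角色。我会给你一段与法律问答文本,请你使用正式的文风润色它。要求:\n 1. 修正语法错误、标点符号错误,去掉特殊符号,必须使语句更通顺。 2. 使逻辑更清晰、格式更规范,比如向<answer>中换行符。 3. 使更礼貌,比如向<question>中加入“请问”等礼貌用语。 4. 不要写任何解释性语句。 5. <question>应该是问题,<answer>应该是答案。 这段对话是:\n<question>:{instruction} \n<answer>:{output} \n\n 以JSON格式返回结果:
English translation: I want you to act as a language expert. I will give you a piece of legal question-answering text; please polish it in a formal style. Requirements:\n 1. Correct grammatical and punctuation errors, remove special symbols, and make the sentences more fluent. 2. Make the logic clearer and the format more standard, for example by adding line breaks to <answer>. 3. Make it more polite, for example by adding courtesy phrases such as "请问" ("may I ask") to <question>. 4. Do not write any explanatory sentences. 5. <question> should be a question and <answer> should be an answer. The dialogue is:\n<question>: {instruction} \n<answer>: {output} \n\n Return the result in JSON format: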
Template 2: Alpaca Training Template
Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n ### Instruction:\n{instruction}\n\n ### Response: \n{output}
Template 3: Alpaca Testing Template
Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n ### Instruction:\n{instruction}\n\n ### Response: \n

In this section, we introduce our LawGPT, a large language model specifically designed for Chinese legal applications, aimed at effectively addressing various downstream legal tasks. LawGPT addresses the two major challenges in applying existing open-source general LLMs to legal tasks:

  (a) The lack of legal domain knowledge in open-source general LLMs, which is crucial for performing legal tasks effectively;

  (b) The insufficient training of open-source general LLMs on downstream legal tasks, resulting in suboptimal performance in legal applications.

We apply legal-oriented pre-training to LawGPT to incorporate legal domain knowledge within the open-source base model. Then, we conduct legal supervised fine-tuning to further enhance LawGPT’s performance on downstream legal tasks. Each component is elaborated as follows.

3.1 Legal-Oriented Pre-Training

General LLMs are typically pre-trained on large-scale general corpora, which may lack sufficient legal domain knowledge. Consequently, their understanding and reasoning abilities for legal tasks can be limited. To address this limitation, we propose the integration of Legal-oriented Pre-Training (LPT) into LawGPT, aiming to enhance its legal domain knowledge.

To incorporate legal domain knowledge into LawGPT, we collect a large-scale legal pre-training corpus $\mathcal{D}_{\mathrm{LPT}}$ consisting of 500K legal documents from various legal domains, including civil law, criminal law, and administrative law. Example 1 presents a civil-law legal document from the legal pre-training corpus. For each legal document, the tokenizer of the base model encodes the text into a token sequence $\boldsymbol{x}=(x_0,x_1,\ldots)$, and we perform legal-oriented pre-training on the base model $f_{\Theta}(\cdot)$ in an autoregressive manner using the following objective:

$$\mathcal{L}_{\mathrm{LPT}}(\Theta,\mathcal{D}_{\mathrm{LPT}})=\mathbb{E}_{\boldsymbol{x}\sim\mathcal{D}_{\mathrm{LPT}}}\left[-\sum_{i}\log f_{\Theta}(x_i\mid x_0,x_1,\ldots,x_{i-1})\right] \tag{1}$$

where $x_0,x_1,\ldots,x_{i-1}$ denote the context tokens, $x_i$ denotes the target token, and $\Theta$ denotes the parameters of the base model $f_{\Theta}(\cdot)$. We optimize the base model parameters $\Theta$ using $\mathcal{L}_{\mathrm{LPT}}$ to obtain the parameters of the legal-oriented pre-trained model, $\Theta_{\mathrm{LPT}}$.
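The objective in Equation (1) can be illustrated with a minimal, dependency-free sketch. It is not the training code of LawGPT: `next_token_logits` is a hypothetical stand-in for the base model, `uniform_model` is a toy model introduced only for illustration, and the sum is assumed to start at the second token because the first token has no context.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def autoregressive_nll(tokens, next_token_logits):
    """Compute -sum_i log f(x_i | x_0, ..., x_{i-1}) for one token sequence.

    `next_token_logits(prefix)` is a hypothetical model interface: it maps a
    prefix of token ids to unnormalized scores over the vocabulary.
    Assumption: the first token is treated as given, so the sum starts at i = 1.
    """
    total = 0.0
    for i in range(1, len(tokens)):
        log_probs = log_softmax(next_token_logits(tokens[:i]))
        total -= log_probs[tokens[i]]
    return total

def corpus_loss(corpus, next_token_logits):
    """Monte Carlo estimate of the expectation over documents in the corpus."""
    return sum(autoregressive_nll(doc, next_token_logits) for doc in corpus) / len(corpus)

def uniform_model(prefix, vocab_size=4):
    """Toy model (illustration only): every token is equally likely."""
    return [0.0] * vocab_size

if __name__ == "__main__":
    toy_corpus = [[0, 1, 2, 3, 0], [3, 2, 1]]
    print(corpus_loss(toy_corpus, uniform_model))
```

Under the uniform toy model each predicted token contributes log(4) to the loss, so the value depends only on sequence length; a trained model lowers the loss by placing more probability on the observed next token.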

3.2 Legal-Supervised Fine-Tuning

Although $f_{\Theta_{\mathrm{LPT}}}(\cdot)$ has been pre-trained with legal domain knowledge, it is not optimal for specific downstream legal tasks, as it cannot generate the desired responses by following instructions. To address this issue, we propose Legal-Supervised Fine-Tuning (LFT) to further adapt LawGPT to various downstream legal tasks. Specifically, we construct a 300K knowledge-driven instruction dataset, $\mathcal{D}_{\mathrm{LFT}}$, consisting of three subsets:

  (a) An open-source dataset (https://github.com/liuhuanyong/CrimeKgAssistant) with 200K samples, which includes crime type prediction and crime consultation tasks to fine-tune the model for better understanding of crime-related legal tasks and generating user-friendly responses;

  (b) The JEC-QA dataset (Zhong et al., 2020b) with 20K samples, which consists of legal question answering tasks to fine-tune the model for better adaptation to legal downstream tasks;

  (c) A legal dataset with 80K samples constructed by refining subsets (a) and (b) with ChatGPT (OpenAI, 2023a), which adds more high-quality legal QA samples and thereby enhances the generalizability of the model.

Subsets (a) and (b) are shown in Examples 2 and 3, respectively. Subset (c) is refined using the prompt template in Template 1 to augment the samples in subsets (a) and (b), where we replace {instruction} with real questions and {output} with the corresponding answers. We adopt the Stanford Alpaca template (Taori et al., 2023) in Template 2 to wrap the instruction and output in our dataset. Then, the parameters of our pre-trained model $\Theta_{\mathrm{LPT}}$ are fine-tuned on $\mathcal{D}_{\mathrm{LFT}}$ using the following objective:

$$\mathcal{L}_{\mathrm{LFT}}(\Theta,\mathcal{D}_{\mathrm{LFT}})=\mathbb{E}_{\boldsymbol{x}\sim\mathcal{D}_{\mathrm{LFT}}}\left[-\sum_{i\in\{\mathrm{output}\}}\log f_{\Theta_{\mathrm{LPT}}}(x_i\mid x_0,x_1,\ldots,x_{i-1})\right] \tag{2}$$

where $\Theta$ represents the optimized parameters, $\boldsymbol{x}=(x_0,x_1,\ldots)$ represents the tokenized input sequence drawn from dataset $\mathcal{D}_{\mathrm{LFT}}$ and wrapped by Template 2, and $\{\mathrm{output}\}$ represents the index set of the output tokens. We optimize the pre-trained parameters $\Theta_{\mathrm{LPT}}$ to obtain the parameters of LawGPT, $\Theta_{\mathrm{LFT}}$.
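The restriction of the sum in Equation (2) to output tokens can be sketched as follows. This is an illustrative sketch only: the template text follows the Alpaca training template as printed in this report (exact whitespace is an assumption), a character-level split stands in for the real tokenizer, and the helper names `wrap_example`, `output_mask`, and `masked_nll` are hypothetical.

```python
import math

# Alpaca-style training prompt; whitespace is an assumption based on the printed template.
TRAIN_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: \n"
)

def wrap_example(instruction, output):
    """Return (prompt, full_text), where full_text = prompt followed by the output."""
    prompt = TRAIN_PROMPT.format(instruction=instruction)
    return prompt, prompt + output

def output_mask(prompt, full_text):
    """Mark which positions belong to the output.

    Assumption: one character is one token, a stand-in for a real tokenizer.
    """
    return [i >= len(prompt) for i in range(len(full_text))]

def masked_nll(token_log_probs, mask):
    """Sum -log p over output positions only; prompt positions are ignored."""
    return -sum(lp for lp, keep in zip(token_log_probs, mask) if keep)

if __name__ == "__main__":
    prompt, full_text = wrap_example("What is a contract?", "An agreement.")
    mask = output_mask(prompt, full_text)
    # Pretend the model assigns probability 0.5 to every token.
    fake_log_probs = [math.log(0.5)] * len(full_text)
    print(sum(mask), masked_nll(fake_log_probs, mask))
```

Masking the prompt positions means the model is only penalized for the response it should produce, not for reproducing the fixed template or the user's question.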

3.3 Inference of LawGPT

When applying LawGPT to downstream tasks, we wrap the instruction using the Alpaca template in Template 3 and tokenize the text into $\boldsymbol{x}=(x_0,x_1,\ldots,x_n)$. We then feed the tokenized input sequence $\boldsymbol{x}$ into the fine-tuned model $f_{\Theta_{\mathrm{LFT}}}(\cdot)$ to generate the response in an autoregressive manner.
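The inference procedure can be sketched in a few lines. The sketch below is hedged: the prompt text follows the Alpaca testing template as printed in this report (exact whitespace is an assumption), `next_token` is a hypothetical callable that hides both the model and the decoding rule (the report does not state which decoding strategy is used), and `eos` and `max_new_tokens` are illustrative parameters.

```python
# Alpaca-style testing prompt; whitespace is an assumption based on the printed template.
TEST_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: \n"
)

def build_prompt(instruction):
    """Wrap a user instruction with the testing template (no output appended)."""
    return TEST_PROMPT.format(instruction=instruction)

def generate(prompt_tokens, next_token, eos, max_new_tokens=16):
    """Autoregressive decoding loop.

    `next_token(tokens)` is a hypothetical model call that returns the id of the
    next token given all tokens so far. Generation stops at `eos` or after
    `max_new_tokens` steps. Only the newly generated tokens are returned.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens[len(prompt_tokens):]

if __name__ == "__main__":
    print(build_prompt("How long does the court take to file a case?"))
    # Toy "model": counts down from the last token and emits 0 as end-of-sequence.
    countdown = lambda toks: toks[-1] - 1 if toks[-1] > 0 else 0
    print(generate([3], countdown, eos=0))
```

Each generated token is appended to the context before the next call, which is what "autoregressive" means in practice: the response is conditioned on the wrapped instruction and on everything generated so far.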

4 Experiments

4.1 Implementation Details

We trained LawGPT using 8 NVIDIA V100 GPUs, based on the Chinese-Alpaca-Plus 7B base model (Cui et al., 2023b), in two stages: legal-oriented pre-training and legal-supervised fine-tuning. For legal-oriented pre-training, we adopt our 500K legal pre-training corpus $\mathcal{D}_{\mathrm{LPT}}$ to train the base model using the LoRA technique (Hu et al., 2022). We set the LoRA rank to 16, alpha to 32, and dropout to 0.05. The learning rate was set to 0.0003, the batch size to 128, and the number of training epochs to 1. For legal-supervised fine-tuning, we adopt our 300K legal-supervised corpus $\mathcal{D}_{\mathrm{LFT}}$ to fine-tune our pre-trained model with the Alpaca template using the LoRA technique. We set the LoRA rank to 8, alpha to 16, and dropout to 0.05. We set the learning rate to 0.0003, the batch size to 64, and the number of training epochs to 20.
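The two sets of hyperparameters can be collected into a small configuration sketch. The `LoraStage` class and its methods are hypothetical helpers written for this illustration, not part of any library. They rely on the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in receives a low-rank update scaled by alpha / r, adding r * (d_in + d_out) trainable parameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoraStage:
    """Hyperparameters of one training stage as reported in the text."""
    name: str
    rank: int
    alpha: int
    dropout: float
    learning_rate: float
    batch_size: int
    epochs: int

    @property
    def scaling(self):
        """Multiplier alpha / r applied to the low-rank update B @ A."""
        return self.alpha / self.rank

    def adapter_params(self, d_out, d_in):
        """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
        return self.rank * (d_in + d_out)

LPT = LoraStage("legal-oriented pre-training", 16, 32, 0.05, 3e-4, 128, 1)
LFT = LoraStage("legal-supervised fine-tuning", 8, 16, 0.05, 3e-4, 64, 20)

if __name__ == "__main__":
    for stage in (LPT, LFT):
        # 4096 is the hidden size of LLaMA-7B; which matrices are adapted is
        # not stated in the text, so a square projection is an assumption.
        print(stage.name, stage.scaling, stage.adapter_params(4096, 4096))
```

Both stages keep the ratio alpha / r at 2, so the update scale is unchanged while the fine-tuning stage halves the number of adapter parameters per matrix.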

4.2 Performance Evaluation

Table 1: Performance comparison between LawGPT, proprietary models including GPT-3.5 Turbo (OpenAI, 2023a) and GPT-4 (OpenAI, 2023b), and the 7B open-source model LLaMA (Touvron et al., 2023a) in the zero-shot setting. The best performance among LawGPT and open-source models is marked with an asterisk (*). Columns #1 to #8 are tasks.

Models          #1      #2      #3      #4      #5      #6      #7      #8      Avg.
GPT-3.5 Turbo   29.5    31.3    35.5    78.7    76.8    27.4    61.2    17.4    44.7
GPT-4           52.5    27.5    42.0    82.6    81.9    48.6    77.6    19.6    54.0
LLaMA           1.0*    7.5     7.0     41.3    54.2*   0.2     14.4    7.8*    16.7
LawGPT          0.2     11.0*   15.7*   42.4*   40.8    6.2*    15.4*   7.6     17.4*

In this section, we conduct experiments to evaluate the performance of LawGPT on 8 legal applications (Fei et al., 2023), including fact-based article prediction (#1), scene-based article prediction (#2), charge prediction (#3), prison term prediction without article (#4), prison term prediction with article (#5), case analysis (#6), criminal damages calculation (#7), and consultation (#8), in a zero-shot setting. We compare the performance of LawGPT with proprietary models including GPT-3.5 Turbo (OpenAI, 2023a) and GPT-4 (OpenAI, 2023b), and with the 7B open-source model LLaMA (Touvron et al., 2023a). The results are shown in Table 1. LawGPT outperforms the LLaMA 7B model on five of the eight tasks (#2, #3, #4, #6, and #7), leading to a better average performance (17.4 versus 16.7). Despite the advantage of preserving data privacy, there is still a significant performance gap between LawGPT and proprietary models. This result motivates us and other researchers to further explore the potential of LawGPT in future work.
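The averages and the per-task comparison in Table 1 can be rechecked with a short script. The numbers are copied from the table; the helper names `mean` and `wins` are introduced here only for illustration.

```python
# Scores for tasks #1 to #8, copied from Table 1.
TABLE_1 = {
    "GPT-3.5 Turbo": [29.5, 31.3, 35.5, 78.7, 76.8, 27.4, 61.2, 17.4],
    "GPT-4": [52.5, 27.5, 42.0, 82.6, 81.9, 48.6, 77.6, 19.6],
    "LLaMA": [1.0, 7.5, 7.0, 41.3, 54.2, 0.2, 14.4, 7.8],
    "LawGPT": [0.2, 11.0, 15.7, 42.4, 40.8, 6.2, 15.4, 7.6],
}

# Averages as reported in the "Avg." column.
REPORTED_AVG = {"GPT-3.5 Turbo": 44.7, "GPT-4": 54.0, "LLaMA": 16.7, "LawGPT": 17.4}

def mean(scores):
    """Arithmetic mean of a list of task scores."""
    return sum(scores) / len(scores)

def wins(a, b):
    """Return the 1-based task numbers on which model a scores higher than model b."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x > y]

if __name__ == "__main__":
    for model, scores in TABLE_1.items():
        print(f"{model}: mean={mean(scores):.3f}, reported={REPORTED_AVG[model]}")
    print("LawGPT beats LLaMA on tasks:", wins(TABLE_1["LawGPT"], TABLE_1["LLaMA"]))
```

The recomputed means agree with the reported averages to one decimal place, and the task-level comparison shows where the gain over LLaMA comes from and where it does not (tasks #1, #5, and #8).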

5 Conclusion

In this technical report, we introduce LawGPT, a Chinese legal knowledge-enhanced large language model specifically designed for Chinese legal applications. We introduce the legal-oriented pre-training and legal supervised fine-tuning to incorporate legal domain knowledge and enhance the model’s performance on downstream legal tasks, respectively. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. We hope this technical report and LawGPT model can inspire future research on Chinese legal applications and contribute to the development of the legal AI community.

References

  • Anil et al. [2023] Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernández Ábrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan A. Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, et al. Palm 2 technical report. CoRR, abs/2305.10403, 2023.
  • Chalkidis et al. [2019] Ilias Chalkidis, Ion Androutsopoulos, and Nikolaos Aletras. Neural legal judgment prediction in english. In Proceedings of the 57th Conference of the Association for Computational Linguistics, pages 4317–4323, 2019.
  • Chalkidis et al. [2020] Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. LEGAL-BERT: "preparing the muppets for court". In Findings of the Association for Computational Linguistics, pages 2898–2904, 2020.
  • Chen et al. [2013] Yen-Liang Chen, Yi-Hung Liu, and Wu-Liang Ho. A text mining approach to assist the general public in the retrieval of legal documents. Journal of the American Society for Information Science and Technology, 64(2):280–290, 2013.
  • Chowdhery et al. [2023] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, 24:240:1–240:113, 2023.
  • Cui et al. [2023a] Jiaxi Cui, Zongjian Li, Yang Yan, Bohua Chen, and Li Yuan. Chatlaw: Open-source legal large language model with integrated external knowledge bases. CoRR, abs/2306.16092, 2023a.
  • Cui et al. [2021] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, and Ziqing Yang. Pre-training with whole word masking for chinese BERT. IEEE ACM Trans. Audio Speech Lang. Process., 29:3504–3514, 2021.
  • Cui et al. [2023b] Yiming Cui, Ziqing Yang, and Xin Yao. Efficient and effective text encoding for chinese llama and alpaca. arXiv preprint arXiv:2304.08177, 2023b.
  • Dai et al. [2023] Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, and Hao Wang. Laiw: A chinese legal large language models benchmark. CoRR, abs/2310.05620, 2023.
  • Deng et al. [2023] Wentao Deng, Jiahuan Pei, Keyi Kong, Zhe Chen, Furu Wei, Yujun Li, Zhaochun Ren, Zhumin Chen, and Pengjie Ren. Syllogistic reasoning for legal judgment analysis. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 13997–14009, 2023.
  • Dong et al. [2019] Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. Unified language model pre-training for natural language understanding and generation. In Advances in Neural Information Processing Systems, pages 13042–13054, 2019.
  • Du et al. [2022] Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. Glm: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 320–335, 2022.
  • Duan et al. [2019] Xingyi Duan, Baoxin Wang, Ziyue Wang, Wentao Ma, Yiming Cui, Dayong Wu, Shijin Wang, Ting Liu, Tianxiang Huo, Zhen Hu, Heng Wang, and Zhiyuan Liu. CJRC: A reliable human-annotated benchmark dataset for chinese judicial reading comprehension. In Proceedings of the 18th China National Conference on Chinese Computational Linguistics, volume 11856, pages 439–451, 2019.
  • Fei et al. [2023] Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, and Jidong Ge. Lawbench: Benchmarking legal knowledge of large language models. CoRR, abs/2309.16289, 2023.
  • Gan et al. [2023] Wensheng Gan, Zhenlian Qi, Jiayang Wu, and Jerry Chun-Wei Lin. Large language models in education: Vision and opportunities. In IEEE International Conference on Big Data, pages 4776–4785. IEEE, 2023.
  • He et al. [2023] Wanwei He, Jiabao Wen, Lei Zhang, Hao Cheng, Bowen Qin, Yunshui Li, Feng Jiang, Junying Chen, Benyou Wang, and Min Yang. Hanfei-1.0. https://github.com/siat-nlp/HanFei, 2023.
  • He et al. [2024] Zhitao He, Pengfei Cao, Chenhao Wang, Zhuoran Jin, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, and Jun Zhao. Simucourt: Building judicial decision-making agents with real-world judgement documents. arXiv preprint arXiv:2403.02959, 2024.
  • Hongcheng et al. [2023] Liu Hongcheng, Liao Yusheng, Meng Yutong, and Yuhao Wang. Lawgpt: Chinese legal large language model. https://github.com/LiuHC0428/LAW_GPT, 2023.
  • Hu et al. [2022] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. In Proceedings of the 10th International Conference on Learning Representations, 2022.
  • Huang and Chang [2023] Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey. In Findings of the Association for Computational Linguistics, pages 1049–1065, 2023.
  • Huang et al. [2023] Quzhe Huang, Mingxu Tao, Zhenwei An, Chen Zhang, Cong Jiang, Zhibin Chen, Zirui Wu, and Yansong Feng. Lawyer llama technical report. CoRR, abs/2305.15062, 2023. doi: 10.48550/ARXIV.2305.15062. URL https://doi.org/10.48550/arXiv.2305.15062.
  • Li et al. [2023] Haitao Li, Qingyao Ai, Jia Chen, Qian Dong, Yueyue Wu, Yiqun Liu, Chong Chen, and Qi Tian. SAILER: structure-aware pre-trained language model for legal case retrieval. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1035–1044, 2023.
  • Luo et al. [2017] Bingfeng Luo, Yansong Feng, Jianbo Xu, Xiang Zhang, and Dongyan Zhao. Learning to predict charges for criminal cases with legal basis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2727–2736, 2017.
  • Nguyen [2023] Ha-Thanh Nguyen. A brief report on lawgpt 1.0: A virtual legal assistant based on GPT-3. CoRR, abs/2302.05729, 2023.
  • Nguyen et al. [2023] Ha Thanh Nguyen, Randy Goebel, Francesca Toni, Kostas Stathis, and Ken Satoh. Lawgiba - combining gpt, knowledge bases, and logic programming in a legal assistance system. In Proceedings of the 36th Annual Conference on Legal Knowledge and Information Systems, pages 371–374, 2023.
  • OpenAI [2023a] OpenAI. Gpt-3.5 turbo, 2023a.
  • OpenAI [2023b] OpenAI. Gpt-4, 2023b.
  • Phi et al. [2020] Manh-Kien Phi, Ha-Thanh Nguyen, Ngo Xuan Bach, Vu D. Tran, Minh Le Nguyen, and Tu Minh Phuong. Answering legal questions by learning neural attentive text representation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 988–998. International Committee on Computational Linguistics, 2020.
  • Shao et al. [2020] Yunqiu Shao, Jiaxin Mao, Yiqun Liu, Weizhi Ma, Ken Satoh, Min Zhang, and Shaoping Ma. BERT-PLI: modeling paragraph-level interactions for legal case retrieval. In Proceedings of the 29th International Joint Conference on Artificial Intelligence, pages 3501–3507, 2020.
  • Taori et al. [2023] Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
  • Team [2023] MosaicML NLP Team. Introducing mpt-7b: A new standard for open-source, commercially usable llms. www.mosaicml.com/blog/mpt-7b, 2023.
  • Thirunavukarasu et al. [2023] Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. Large language models in medicine. Nature Medicine, 29:1930–1940, 2023.
  • Touvron et al. [2023a] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models. CoRR, abs/2302.13971, 2023a.
  • Touvron et al. [2023b] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton-Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurélien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023b.
  • Wu et al. [2024] Yiquan Wu, Yuhang Liu, Yifei Liu, Ang Li, Siying Zhou, and Kun Kuang. Wisdom interrogatory. https://github.com/zhihaiLLM/wisdomInterrogatory, 2024.
  • Xiao et al. [2021] Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, and Maosong Sun. Lawformer: A pre-trained language model for chinese legal long documents. AI Open, 2:79–84, 2021.
  • Yang et al. [2023a] Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, and Zhiying Wu. Baichuan 2: Open large-scale language models. CoRR, abs/2309.10305, 2023a.
  • Yang et al. [2023b] Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. Fingpt: Open-source financial large language models. CoRR, abs/2306.06031, 2023b.
  • Yang et al. [2019] Wenmian Yang, Weijia Jia, Xiaojie Zhou, and Yutao Luo. Legal judgment prediction via multi-perspective bi-feedback network. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 4085–4091, 2019.
  • Yu et al. [2022] Wenhao Yu, Chenguang Zhu, Zaitang Li, Zhiting Hu, Qingyun Wang, Heng Ji, and Meng Jiang. A survey of knowledge-enhanced text generation. ACM Computing Surveys, 54(11s):227:1–227:38, 2022.
  • Zhong et al. [2020a] Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. How does NLP benefit legal system: A summary of legal artificial intelligence. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5218–5230, 2020a.
  • Zhong et al. [2020b] Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. JEC-QA: A legal-domain question answering dataset. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, pages 9701–9708, 2020b.