Large language models (LLMs), including both proprietary and open-source models, have showcased remarkable capabilities in addressing a wide range of downstream tasks. Nonetheless, when it comes to practical Chinese legal tasks, these models fail to meet the actual requirements. Proprietary models do not ensure data privacy for sensitive legal cases, while open-source models demonstrate unsatisfactory performance due to their lack of legal knowledge. To address this problem, we introduce LawGPT, the first open-source model specifically designed for Chinese legal applications. LawGPT comprises two key components: legal-oriented pre-training and legal supervised fine-tuning. Specifically, we employ large-scale Chinese legal documents for legal-oriented pre-training to incorporate legal domain knowledge. To further improve the model’s performance on downstream legal tasks, we create a knowledge-driven instruction dataset for legal supervised fine-tuning. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. Our code and resources are publicly available on GitHub, where the project has received 5.7K stars.

1 Introduction

Large language models (LLMs) (OpenAI, 2023b; Touvron et al., 2023b) have achieved remarkable success in various natural language processing (NLP) tasks, including natural language understanding (Dong et al., 2019), reasoning (Huang and Chang, 2023), and generation (Yu et al., 2022). Both proprietary and open-source LLMs exhibit strong generalization capabilities, enabling their application in diverse downstream scenarios, such as medicine (Thirunavukarasu et al., 2023), finance (Yang et al., 2023b), and education (Gan et al., 2023). Recent studies (Fei et al., 2023; Nguyen, 2023) have demonstrated the preliminary effectiveness of existing general LLMs in legal tasks, including legal judgment prediction (Luo et al., 2017), legal document retrieval (Chen et al., 2013), and legal question answering (Zhong et al., 2020a).

Despite the preliminary effectiveness of LLMs in legal applications, two obstacles hinder their practical use in legal tasks. On the one hand, proprietary LLMs such as GPT-4 (OpenAI, 2023b) and GPT-3.5 Turbo (OpenAI, 2023a) can only be accessed through APIs, which do not guarantee data privacy in sensitive legal cases. On the other hand, open-source LLMs like LLaMA (Touvron et al., 2023a) and ChatGLM (Du et al., 2022) fail to achieve satisfactory performance due to their insufficient legal knowledge and incompatibility with downstream legal tasks. Therefore, it is necessary to develop an open-source LLM specifically designed for legal applications in order to overcome these obstacles.

In this paper, we introduce LawGPT, the first open-source Chinese legal knowledge-enhanced large language model. Because it is open-source, LawGPT can be self-hosted and accessed privately, which ensures data privacy in a way that proprietary models cannot. We then present legal-oriented pre-training, which utilizes our large-scale legal pre-training corpus to incorporate domain-specific legal knowledge into LawGPT, improving its foundational capabilities for understanding, reasoning, and generation in legal tasks. Additionally, we propose legal supervised fine-tuning, employing our knowledge-driven instruction dataset to further enhance LawGPT’s performance on downstream legal tasks. Experimental results demonstrate that LawGPT surpasses the open-source LLaMA 7B model in major legal tasks, shedding light on the development of a practical Chinese legal LLM. Our contributions are summarized as follows:

    We present the first open-source Chinese legal knowledge-enhanced large language model, LawGPT. The code and model are available on GitHub and have received 5.7K stars.

    We construct a comprehensive legal pre-training corpus and propose a legal-oriented pre-training approach to enhance LawGPT’s foundational abilities in legal tasks by integrating domain-specific knowledge.

    We create a knowledge-driven instruction dataset and utilize legal supervised fine-tuning to further adapt LawGPT to various legal tasks and improve its downstream performance.

    Our experimental results demonstrate that LawGPT achieves better performance than the open-source LLaMA 7B model across major legal tasks, providing strong evidence for the effectiveness of our proposed model.

2 Related Work

In this section, we review existing work on addressing legal tasks using LLMs. We focus on general language models, legal language models, and legal benchmarks.

2.1 General Language Models

Recent LLMs, trained on extensive corpora, have demonstrated impressive performance across a variety of downstream tasks, including tasks in the legal domain. Proprietary LLMs, such as GPT-4 (OpenAI, 2023b), GPT-3.5-Turbo (OpenAI, 2023a), PaLM (Chowdhery et al., 2023), and PaLM2 (Anil et al., 2023), exhibit strong capabilities in handling legal tasks. Their impressive performance not only demonstrates the potential of LLMs in addressing legal tasks but also facilitates the low-cost, automated construction of high-quality datasets. Concurrently, open-source LLMs, such as LLaMA (Touvron et al., 2023a), LLaMA2 (Touvron et al., 2023b), MPT (Team, 2023), ChatGLM 2 (Du et al., 2022), and Baichuan 2 (Yang et al., 2023a), are available in various model scales. These open-source models facilitate the fine-tuning of legal-specific models using targeted legal datasets, potentially enhancing performance.

2.2 Legal Language Models

Legal language models are either fine-tuned from pre-trained language models or trained from scratch on legal data in order to improve their legal capabilities. Early research in this field used model architectures with millions of parameters and addressed each legal task separately. These tasks include legal judgment prediction (Luo et al., 2017; Chalkidis et al., 2019; Yang et al., 2019), legal document and case retrieval (Chen et al., 2013; Shao et al., 2020; Li et al., 2023), legal reading comprehension (Duan et al., 2019), and legal question answering (Zhong et al., 2020a; Phi et al., 2020). Building on pre-trained models (Chalkidis et al., 2020; Cui et al., 2021), Lawformer (Xiao et al., 2021) combines three attention mechanisms to handle long legal documents, covering a variety of legal tasks. Recent advances in LLMs have given rise to work on legal LLMs. HanFei (He et al., 2023), LexiLaw (Li et al., 2023), LawGPT-zh (Hongcheng et al., 2023), and LawGPT-1.0 (Nguyen, 2023) fine-tune foundational LLMs using a specially constructed or collected legal corpus to enhance their legal capabilities. To tackle the hallucination problem in legal tasks, LLMs such as ChatLaw (Cui et al., 2023a), Wisdom-Interrogatory (Wu et al., 2024), and Lawyer-LLaMA (Huang et al., 2023) incorporate a legal data retrieval method to improve the robustness of their responses. LawGiBa (Nguyen et al., 2023), based on the GPT-4 model, has established a legal system. Fuzi-Mingcha (Deng et al., 2023) has created a legal syllogistic reasoning dataset for fine-tuning to ensure logical format and accurate reasoning results.

2.3 Legal Benchmarks

With the emergence of numerous language models for legal tasks, several benchmarks have been proposed to evaluate existing models. LawBench (Fei et al., 2023) collects 20 legal tasks within three cognitive levels, i.e., legal knowledge memorization, understanding, and applying, to thoroughly evaluate the performance of existing models. LAiW (Dai et al., 2023) contains 14 tasks to evaluate the legal capabilities of LLMs from three levels, i.e., basic information retrieval, legal foundation inference, and complex legal application. SimuCourt (He et al., 2024) introduces a judicial decision-making task to evaluate the judicial analysis and decision-making power of LLMs.

3 Methodology

In this section, we introduce our LawGPT, a large language model specifically designed for Chinese legal applications, aimed at effectively addressing various downstream legal tasks. LawGPT addresses the two major challenges in applying existing open-source general LLMs to legal tasks:

    The lack of legal domain knowledge in open-source general LLMs, which is crucial for performing legal tasks effectively;

    The insufficient training of open-source general LLMs on downstream legal tasks, resulting in suboptimal performance in legal applications.

We apply legal-oriented pre-training to LawGPT to incorporate legal domain knowledge within the open-source base model. Then, we conduct legal supervised fine-tuning to further enhance LawGPT’s performance on downstream legal tasks. Each component is elaborated as follows.

3.1 Legal-Oriented Pre-Training

General LLMs are typically pre-trained on large-scale general corpora, which may lack sufficient legal domain knowledge. Consequently, their understanding and reasoning abilities on legal tasks can be limited. To address this limitation, we integrate Legal-oriented Pre-Training (LPT) into LawGPT, aiming to enhance its legal domain knowledge.

To incorporate legal domain knowledge into LawGPT, we collect a large-scale legal pre-training corpus $\mathcal{D}_{\mathrm{LPT}}$ consisting of 500K legal documents from various legal domains, including civil law, criminal law, and administrative law. Example 3 presents a civil-law legal document from the legal pre-training corpus. For each legal document, the tokenizer of the base model encodes the text into a token sequence $\boldsymbol{x} = (x_0, x_1, \ldots)$, and we perform legal-oriented pre-training on the base model $f_{\Theta}(\cdot)$ in an autoregressive manner using the following objective:

$$\mathcal{L}_{\mathrm{LPT}}(\Theta, \mathcal{D}_{\mathrm{LPT}}) = \mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}_{\mathrm{LPT}}} \left[ -\sum_{i} \log f_{\Theta}(x_i \mid x_0, x_1, \ldots, x_{i-1}) \right] \tag{1}$$

where $x_0, x_1, \ldots, x_{i-1}$ denote the context tokens, $x_i$ denotes the target token, and $\Theta$ denotes the parameters of the base model $f_{\Theta}(\cdot)$. We optimize $\Theta$ by minimizing $\mathcal{L}_{\mathrm{LPT}}$ to obtain the parameters of the legal-oriented pre-trained model, $\Theta_{\mathrm{LPT}}$.
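
As a rough illustration of the autoregressive objective in Equation (1), the following sketch computes the summed negative log-likelihood of a token sequence in plain Python. The function `toy_model` is a hypothetical stand-in for the base model that returns a uniform next-token distribution, and `VOCAB_SIZE` is an arbitrary assumption; neither reflects the actual LawGPT implementation.

```python
import math
from typing import Callable, Sequence

# Hypothetical stand-in for f_Theta: maps a context (token ids so far)
# to a probability distribution over the next token.
NextTokenModel = Callable[[Sequence[int]], Sequence[float]]

VOCAB_SIZE = 4  # assumed toy vocabulary size


def toy_model(context: Sequence[int]) -> Sequence[float]:
    """Uniform next-token distribution, regardless of the context."""
    return [1.0 / VOCAB_SIZE] * VOCAB_SIZE


def autoregressive_nll(model: NextTokenModel, tokens: Sequence[int]) -> float:
    """Sum of -log f(x_i | x_0, ..., x_{i-1}) over all positions i."""
    loss = 0.0
    for i, target in enumerate(tokens):
        probs = model(tokens[:i])  # condition on the preceding tokens only
        loss -= math.log(probs[target])
    return loss


def corpus_loss(model: NextTokenModel, corpus: Sequence[Sequence[int]]) -> float:
    """Sample-mean estimate of the expectation over documents in the corpus."""
    return sum(autoregressive_nll(model, doc) for doc in corpus) / len(corpus)
```

With the uniform toy model, every token contributes log(VOCAB_SIZE) to the loss, which makes the function easy to sanity-check before swapping in a real model.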

3.2 Legal-Supervised Fine-Tuning

Although $f_{\Theta_{\mathrm{LPT}}}(\cdot)$ has been pre-trained with legal domain knowledge, it is not optimal for specific downstream legal tasks because it cannot yet generate the desired responses by following instructions. To address this issue, we propose Legal-Supervised Fine-Tuning (LFT) to further adapt LawGPT to various downstream legal tasks. Specifically, we construct a 300K knowledge-driven instruction dataset, $\mathcal{D}_{\mathrm{LFT}}$, consisting of three subsets:

    (a) An open-source dataset with 200K samples, which includes crime type prediction and crime consultation tasks to fine-tune the model for better understanding of crime-related legal tasks and for generating user-friendly responses;

    (b) The JEC-QA dataset (Zhong et al., 2020b) with 20K samples, which consists of legal question answering tasks to fine-tune the model for better adaptation to downstream legal tasks;

    (c) A constructed legal dataset with 80K samples, obtained by refining subsets (a) and (b) with ChatGPT (OpenAI, 2023a), which adds more high-quality legal QA samples and thereby enhances the generalizability of the model.

The subsets (a) and (b) are shown in Examples 3 and 3, respectively. The subset (c) is refined using the prompt template in Template 3 to augment the samples in subsets (a) and (b), where we replace <instruction> with real questions and <output> with the corresponding answer. We adopt the Stanford Alpaca template (Taori et al., 2023) in Template 3 to wrap the instruction and output in our dataset. Then, the parameters of our pre-trained model $\Theta_{\mathrm{LPT}}$ are fine-tuned on $\mathcal{D}_{\mathrm{LFT}}$ using the following objective:

$$\mathcal{L}_{\mathrm{LFT}}(\Theta, \mathcal{D}_{\mathrm{LFT}}) = \mathbb{E}_{\boldsymbol{x} \sim \mathcal{D}_{\mathrm{LFT}}} \left[ -\sum_{i \in \{\mathrm{output}\}} \log f_{\Theta}(x_i \mid x_0, x_1, \ldots, x_{i-1}) \right] \tag{2}$$

where $\Theta$ represents the parameters being optimized, initialized from $\Theta_{\mathrm{LPT}}$; $\boldsymbol{x} = (x_0, x_1, \ldots)$ represents the tokenized input sequence drawn from the dataset $\mathcal{D}_{\mathrm{LFT}}$ and wrapped by Template 3; and $\{\mathrm{output}\}$ represents the index set of the output tokens. We optimize our pre-trained parameters $\Theta_{\mathrm{LPT}}$ by minimizing $\mathcal{L}_{\mathrm{LFT}}$ to obtain the parameters of LawGPT, $\Theta_{\mathrm{LFT}}$.
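
To make the role of the output index set in Equation (2) concrete, the sketch below wraps an instruction and its answer in an Alpaca-style prompt and builds a mask that restricts the loss to the output tokens. The prompt wording is an assumption based on the no-input prompt of the Stanford Alpaca project and may differ from the template actually used for LawGPT; the whitespace tokenizer and the per-token probabilities are hypothetical stand-ins for the base model's tokenizer and predictions.

```python
import math
from typing import List, Sequence, Tuple

# Assumed wording: the no-input prompt from the Stanford Alpaca project.
# The exact template used for LawGPT may differ.
ALPACA_PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def toy_tokenize(text: str) -> List[str]:
    """Whitespace tokenizer, used only for illustration."""
    return text.split()


def build_example(instruction: str, output: str) -> Tuple[List[str], List[bool]]:
    """Return the token sequence and a mask that is True on output tokens."""
    prompt_tokens = toy_tokenize(ALPACA_PROMPT.format(instruction=instruction))
    output_tokens = toy_tokenize(output)
    tokens = prompt_tokens + output_tokens
    mask = [False] * len(prompt_tokens) + [True] * len(output_tokens)
    return tokens, mask


def masked_nll(token_probs: Sequence[float], mask: Sequence[bool]) -> float:
    """Sum -log p(x_i | x_<i) only over positions i in the output index set."""
    return -sum(math.log(p) for p, keep in zip(token_probs, mask) if keep)
```

Masking the prompt positions means the model is trained only to produce the answer, not to reproduce the instruction or the template text.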

3.3 Inference of LawGPT

When applying LawGPT to downstream tasks, we wrap the instruction using the Alpaca template in Template 3 and tokenize the text into $\boldsymbol{x} = (x_0, x_1, \ldots, x_n)$. We then feed the tokenized input sequence $\boldsymbol{x}$ into the fine-tuned model $f_{\Theta_{\mathrm{LFT}}}(\cdot)$ to generate the response in an autoregressive manner.
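
The autoregressive generation step can be sketched as a simple decoding loop. The example below uses greedy decoding, which is an assumption, since the decoding strategy is not specified here; `toy_score_fn`, `EOS_ID`, and the four-token vocabulary are hypothetical and stand in for the fine-tuned model and its tokenizer.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: given the token ids so far, return one score
# per vocabulary id for the next position.
ScoreFn = Callable[[Sequence[int]], Sequence[float]]

EOS_ID = 0  # assumed end-of-sequence token id


def greedy_generate(score_fn: ScoreFn, prompt_ids: Sequence[int],
                    max_new_tokens: int = 16) -> List[int]:
    """Append the highest-scoring token until EOS or the length limit."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        scores = score_fn(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == EOS_ID:
            break
        ids.append(next_id)  # the new token becomes part of the context
        generated.append(next_id)
    return generated


def toy_score_fn(ids: Sequence[int]) -> Sequence[float]:
    """Toy model over 4 ids: always prefers (last id + 1) modulo 4."""
    nxt = (ids[-1] + 1) % 4
    return [1.0 if i == nxt else 0.0 for i in range(4)]
```

In practice the prompt ids would come from tokenizing the template-wrapped instruction, and the generated ids would be decoded back to text.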

4 Experiments

4.1 Implementation Details

We train LawGPT on 8 NVIDIA V100 GPUs, starting from the Chinese-Alpaca-Plus 7B base model (Cui et al., 2023b), in two stages: legal-oriented pre-training and legal-supervised fine-tuning. For legal-oriented pre-training, we use our 500K legal pre-training corpus $\mathcal{D}_{\mathrm{LPT}}$ to train the base model with the LoRA technique (Hu et al., 2022). We set the LoRA rank to 16, alpha to 32, and dropout to 0.05. The learning rate is set to 0.0003, the batch size to 128, and the number of training epochs to 1. For legal-supervised fine-tuning, we use our 300K legal-supervised corpus $\mathcal{D}_{\mathrm{LFT}}$ to fine-tune our pre-trained model with the Alpaca template, again using the LoRA technique. We set the LoRA rank to 8, alpha to 16, and dropout to 0.05. We set the learning rate to 0.0003, the batch size to 64, and the number of training epochs to 20.
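
For reference, the hyperparameters of the two stages can be collected in a small configuration object. This is only a sketch: the class and field names are hypothetical and do not come from the LawGPT code base, and settings not reported above (for example, which modules receive LoRA adapters) are deliberately left out. The `lora_scaling` property uses the alpha / rank factor from the LoRA formulation (Hu et al., 2022).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage, as reported in Section 4.1."""
    name: str
    lora_rank: int
    lora_alpha: int
    lora_dropout: float
    learning_rate: float
    batch_size: int
    epochs: int

    @property
    def lora_scaling(self) -> float:
        # LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


# Field order: name, rank, alpha, dropout, learning rate, batch size, epochs.
LPT = StageConfig("legal-oriented pre-training", 16, 32, 0.05, 3e-4, 128, 1)
LFT = StageConfig("legal-supervised fine-tuning", 8, 16, 0.05, 3e-4, 64, 20)
```

Although the rank is halved in the second stage, alpha is halved as well, so the effective LoRA scaling factor is 2 in both stages.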

4.2 Performance Evaluation

Table 1: Performance comparison between LawGPT, proprietary models including GPT-3.5 Turbo (OpenAI, 2023a) and GPT-4 (OpenAI, 2023b), and 7B open-source model LLaMA (Touvron et al., 2023a) on the zero-shot setting. The best performance among LawGPT and open-source models is in bold.
Models \ Tasks  |  #1   |  #2   |  #3   |  #4   |  #5   |  #6   |  #7   |  #8   | Avg.
GPT-3.5 Turbo   | 29.5  | 31.3  | 35.5  | 78.7  | 76.8  | 27.4  | 61.2  | 17.4  | 44.7
GPT-4           | 52.5  | 27.5  | 42.0  | 82.6  | 81.9  | 48.6  | 77.6  | 19.6  | 54.0
LLaMA           |  1.0  |  7.5  |  7.0  | 41.3  | 54.2  |  0.2  | 14.4  |  7.8  | 16.7
LaWGPT          |  0.2  | 11.0  | 15.7  | 42.4  | 40.8  |  6.2  | 15.4  |  7.6  | 17.4

In this section, we conduct experiments to evaluate the performance of LawGPT on 8 legal applications (Fei et al., 2023), including fact-based article prediction (#1), scene-based article prediction (#2), charge prediction (#3), prison term prediction without article (#4), prison term prediction with article (#5), case analysis (#6), criminal damages calculation (#7), and consultation (#8), in a zero-shot setting. We compare the performance of LawGPT with proprietary models, including GPT-3.5 Turbo (OpenAI, 2023a) and GPT-4 (OpenAI, 2023b), and with the 7B open-source model LLaMA (Touvron et al., 2023a). The results are shown in Table 1. LawGPT outperforms the LLaMA 7B model on five of the eight tasks (#2, #3, #4, #6, and #7), leading to a better average score (17.4 vs. 16.7), while LLaMA remains ahead on tasks #1, #5, and #8. Despite the advantage of preserving data privacy, there is still a significant performance gap between LawGPT and the proprietary models. This gap motivates us and other researchers to further explore the potential of LawGPT in future work.
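
The averages in Table 1 and the per-task comparison with LLaMA can be re-derived with a few lines of Python. The scores below are copied from Table 1; the helper names `average` and `tasks_won` are illustrative only.

```python
# Scores copied from Table 1 (tasks #1 to #8, zero-shot setting).
SCORES = {
    "GPT-3.5 Turbo": [29.5, 31.3, 35.5, 78.7, 76.8, 27.4, 61.2, 17.4],
    "GPT-4": [52.5, 27.5, 42.0, 82.6, 81.9, 48.6, 77.6, 19.6],
    "LLaMA": [1.0, 7.5, 7.0, 41.3, 54.2, 0.2, 14.4, 7.8],
    "LawGPT": [0.2, 11.0, 15.7, 42.4, 40.8, 6.2, 15.4, 7.6],
}


def average(scores):
    """Mean over the eight tasks, rounded to one decimal as in the table."""
    return round(sum(scores) / len(scores), 1)


def tasks_won(model, baseline):
    """1-based task numbers on which `model` scores strictly higher."""
    pairs = zip(SCORES[model], SCORES[baseline])
    return [i + 1 for i, (a, b) in enumerate(pairs) if a > b]


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(name, average(scores))
    print("LawGPT beats LLaMA on tasks:", tasks_won("LawGPT", "LLaMA"))
```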

5 Conclusion

In this technical report, we introduce LawGPT, a Chinese legal knowledge-enhanced large language model specifically designed for Chinese legal applications. We introduce the legal-oriented pre-training and legal supervised fine-tuning to incorporate legal domain knowledge and enhance the model’s performance on downstream legal tasks, respectively. Our experimental results demonstrate that LawGPT outperforms the open-source LLaMA 7B model. We hope this technical report and LawGPT model can inspire future research on Chinese legal applications and contribute to the development of the legal AI community.


