Affective Computing for Healthcare: Recent Trends, Applications, Challenges, and Beyond

Yuanyuan Liu¹, Ke Wang¹, Lin Wei¹, Jingying Chen², Yibing Zhan³, Dapeng Tao⁴, Zhe Chen⁵

¹ School of Computer Science, China University of Geosciences, Wuhan, China
² Central China Normal University, China
³ JD Explore Academy, China
⁴ Yunnan University, China
⁵ The School of Computing, Engineering and Mathematical Sciences, La Trobe University, Australia.

Abstract

Affective computing, which aims to recognize, interpret, and understand human emotions, provides benefits in healthcare, such as improving patient care and enhancing doctor-patient communication. However, there is a noticeable absence of a comprehensive summary of recent advancements in affective computing for healthcare, which could pose difficulties for researchers entering this field. To address this, our paper aims to provide an extensive literature review of related studies published in the last five years. We begin by analyzing trends, benefits, and limitations of recent datasets and affective computing methods devised for healthcare. Subsequently, we highlight several healthcare application hotspots of current technologies that could be promising for real-world deployment. Through our analysis, we identify and discuss some ongoing challenges in the field as evidenced by the literature. Concluding with a thorough review, we further offer potential future research directions and hope our findings and insights could guide related researchers to make better contributions to the evolution of affective computing in healthcare.

Figure 1: Framework of learning-based affective computing for healthcare.

1 Introduction

Affective computing refers to the use of computer and artificial intelligence technologies to analyze and recognize human emotions Hossain and Muhammad (2019). Human emotions, the intuitive reflections of the body’s state, can convey useful information for treatment processes. By accurately identifying patient emotions through affective computing, doctors can obtain more informative patient statistics and provide more appropriate or customized treatments for patients. As a result, affective computing could play a pivotal role in aiding doctors with various tasks like mental health monitoring, personalized treatment, and patient support. Owing to these promising applications, affective computing for healthcare has attracted increasing attention Alelaiwi (2019); Yildirim-Celik et al. (2022); Chen and Luo (2023).

In particular, thanks to the significant modeling capabilities brought by deep learning advancements, affective computing has shown an outstanding capacity for representing and identifying various human-related information Rejaibi et al. (2019a); Smith et al. (2020). Diverse deep learning models, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Transformers, have been introduced to extract emotional features from either unimodal or multimodal data, resulting in accurate emotion recognition and analysis. Meanwhile, breakthroughs in affective computing have also led to impressive progress in complex healthcare scenarios and applications. Despite this progress, we found that existing reviews mostly summarize affective computing techniques in a relatively narrow application area such as depression recognition Giuntini et al. (2020). We believe that the lack of a comprehensive and insightful review of affective computing in healthcare may create unnecessary barriers for researchers seeking to enter or understand the field.

To bridge the gap mentioned above and summarize the progress achieved in recent deep learning-related advancements, we perform a comprehensive literature review of the recent progress in affective computing for healthcare in this paper. An overview of our work is shown in Figure 1. In general, we attempt to thoroughly review the recently published datasets and approaches, covering a broad area related to affective computing for healthcare. The contributions of this paper and differences from other surveys are as follows:

1) We identify 3 major research directions based on methodologies, including behavior-based, physiological-based, and behavior-physiological-based research. Under this taxonomy, we attempt to perform a comprehensive analysis of the recent development trends of datasets and approaches, illustrating their details, benefits, as well as their limitations.

2) We highlight the most frequently studied medical application directions of affective computing, i.e., depression diagnosis Othmani et al. (2021), autism recognition and intervention Li et al. (2021), pain level recognition Phan et al. (2023), and other related medical applications Ayata et al. (2020), establishing synergies between affective computing and clinical applications.

3) In addition to existing achievements, we present significant challenges that still pose difficulties for developing and applying affective computing in healthcare. Alongside these challenges, we also outline potential directions for future research, hoping to provide valuable insights for researchers who would like to contribute to this field.

Class	Datasets	#Subjects	#Samples	Modalities	Applications	Labels
Behaviour	FENP Yan et al. (2020)	106	11000	Visual	Pain	Pain, No Pain
	PersionSIChASD Alizadeh and Tabibian (2021)	38	418	Audio	Autism	Autistic, Normal, Phonetic units
	D4 Yao et al. (2022)	201	1339	Text	Depression	Risk, Non Risk
	MMDA Jiang et al. (2022)	962	1025	Text, Audio, Visual	Depre,Anx	Depre,Nondepre,Anx,Nonanx
	D-Vlog Yoon et al. (2022)	816	961	Audio, Visual	Depression	Depre, Nondepre
	SAD Alghifari et al. (2023)	64	64	Audio	Depression	Depre, Nondepre
	TIAD Melinda et al. (2023)	34	6120	Visual	Autism	Autistic, Normal
	PEMF Fernandes-Magalhaes et al. (2023)	68	272	Visual	Pain	Pain, No Pain
	CALMED Sousa et al. (2023)	4	57012	Audio, Visual	Autism	Autistic, Normal
Physiological	DDLES Mohammadi et al. (2019)	60	60	EEG	Depression	Minimal,Mild,Moderate,Severe
	SADR Zhu et al. (2019)	39	1170	EEG,EM	Depression	Depre, Nondepre
Behaviour-Physiological	EMCASD Duan et al. (2019)	28	300	Visual, Other	Autism	Autistic, Normal
	MMOD Cai et al. (2020b)	160	160	Audio, EEG	Depression	Depre, Nondepre
	PPAD Salekin et al. (2021)	58	58	Audio, Visual, Other	Pain	Pain, No Pain
	M-Ms Calabrò et al. (2021)	15	37127	Visual, EEG	Autism	Good, Poor

Table 1: The most recent affective computing datasets in the medical field. Depre, Nondepre, Anx, and Nonanx are abbreviations for depression, nondepression, anxiety, and nonanxiety, respectively.

2 Datasets and Approaches

We first perform a literature review on affective computing research, mainly covering the related datasets and approaches applied in healthcare in the last five years. Tables 1 and 2 give an overview of the datasets and typical approaches, respectively. We find that affective computing research varies according to the type of data utilized. Hence, we classify related research into three categories: behavior data-based, physiological data-based, and behavior-physiological data-based research. Each category of research is detailed below.

2.1 Behavior Data-based Research

In the realm of healthcare, behavior data-based research entails employing affective computing technology to analyze patients’ emotions through their behavior data, such as language, facial expressions, and medical records.

Datasets

Behavioral affective datasets can be discussed based on three categories: vision-based datasets, non-vision-based unimodal datasets, and multimodal datasets, depending on their popularity and diverse modalities.

Vision-based datasets are the mainstream dataset format, partly due to the widespread application of vision techniques. Current visual affective databases generally consist of patients’ facial expressions. TIAD collected thermal images of the faces of autistic children Melinda et al. (2023), FENP collected 11,000 facial images of newborns Yan et al. (2020), and PEMF consists of 272 micro-clips with facial images Fernandes-Magalhaes et al. (2023). Through the analysis of facial expressions, important emotional clues can be obtained to help doctors judge the emotional state of patients and improve the diagnosis of disease.

Non-vision unimodal datasets contain text or audio data. Text: D4 records the conversations between doctors and patients during depression diagnosis, together with the diagnosis results and symptom summaries given for each conversation Yao et al. (2022). Audio: SAD is an English-language depression audio dataset that contains 64 recordings of individuals with and without depression Alghifari et al. (2023).

Multimodal datasets primarily combine two or three of the text, audio, and visual modalities, providing richer cues to patients’ emotional behaviour and mitigating the emotional bias inherent in single-modality behavioral data. D-Vlog comprises 961 vlogs collected from YouTube that include both audio and video Yoon et al. (2022). CALMED consists of audio and video features extracted from conference transcript files of children diagnosed with autism Sousa et al. (2023). MMDA is the largest mental disorder dataset including visual, auditory, and textual data, where all subjects in MMDA are diagnosed by professional psychologists using de-identified original interview videos Jiang et al. (2022).

Approaches

As shown in Table 2, deep learning models have become the dominant approaches for affective computing in recent years and are gradually evolving from unimodal approaches towards multimodal approaches.

Vision-based affective approaches primarily employ CNN and attention methods to extract emotional features from facial images and videos, identifying emotion categories or scores. For example, Liu et al. (2023) built a facial expression recognition model based on the self-attention mechanism for depression detection. Lu et al. (2023) used a two-stream CNN with a cross-stream attention mechanism to integrate spatial and temporal information in newborn facial expression videos for pain level recognition.

Text-based unimodal affective approaches use natural language processing technology to analyze text data and identify emotions. For example, Dessai and Usgaonkar (2022) used CNN and Long Short-Term Memory (LSTM) to analyze Twitter users’ tweets and discover factors related to depression. In Ji et al. (2023), the authors utilized the interactivity and multitasking ability of Large Language Models (LLM) to investigate “hallucination” in medical question-answering systems, that is, cases where the model produces information that sounds reasonable but is unfaithful or nonsensical.

Audio-based unimodal affective approaches usually adopt CNN and LSTM to capture the long-term dependencies and high-level affective feature representations of audio sequences. Han et al. (2023) developed a self-supervised learning framework that combines causal and dilated convolutions to progressively enlarge the receptive field for capturing multi-scale emotion-contextual information, and employed a hierarchical contrastive loss to predict depression by exploring the long-range temporal emotion dependencies of audio.
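To make the idea of enlarging temporal context with causal, dilated convolutions concrete, the following is a minimal pure-Python sketch. It is not the architecture of Han et al. (2023); the kernel size, the dilation schedule, and the function names `receptive_field` and `causal_dilated_conv` are assumptions chosen only for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Time steps visible to one output of stacked dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(signal, weights, dilation):
    """One causal dilated 1-D convolution with implicit left zero-padding.

    Output y[t] depends only on x[t], x[t - d], x[t - 2d], ... and never on
    future samples.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start are treated as zeros
                acc += w * signal[idx]
        out.append(acc)
    return out


# Hypothetical configuration: kernel size 2, dilation doubling at each layer.
dilations = [1, 2, 4, 8]
rf = receptive_field(kernel_size=2, dilations=dilations)

# Push a unit impulse through the stack: the number of non-zero outputs
# equals the receptive field, since each layer doubles the visible context.
impulse = [1.0] + [0.0] * 31
response = impulse
for d in dilations:
    response = causal_dilated_conv(response, [1.0, 1.0], d)
affected = sum(1 for v in response if v != 0.0)
```

With this toy schedule, four layers already cover 16 time steps, whereas four undilated layers of the same kernel size would cover only 5. This is why dilation is attractive for long audio sequences.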

Multimodal affective approaches have emerged as a trend in healthcare affective computing. These approaches combine visual, audio, and text signals to identify patients’ emotions more accurately, surpassing the results of unimodal affective analysis. For example, Li et al. (2021) used pre-trained ResNet-18 and ResNet-50 to construct a two-stage network for child autism prediction from facial expression videos and audio. Wang et al. (2023b) first used the pre-trained models ALBERT and AGG16 to construct a visual-language model that extracts facial expression image features, text features, and behavioral features, and then used an early fusion method to combine these features into multimodal features for depression degree detection. Zhao and Wang (2022) proposed a Generative Adversarial Network based on a cross-modal attention mechanism, and used an attention-based fusion approach to integrate facial expression videos, text, and audio for automatic depression severity assessment.
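As a rough illustration of what attention-based fusion means, the sketch below weights per-modality feature vectors by their softmax-normalized similarity to a query vector and sums them. It is a generic toy example, not the model of Zhao and Wang (2022); the feature values, the query vector, and the name `attention_fusion` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(features, query):
    """Fuse same-sized modality vectors with scaled dot-product attention weights.

    features: dict mapping modality name -> feature vector
    query:    vector of the same dimension (learned in a real model)
    """
    names = list(features)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, features[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Hypothetical 3-dimensional embeddings for one patient sample.
features = {
    "visual": [0.2, 0.1, 0.4],
    "audio": [0.9, 0.8, 0.1],
    "text": [0.3, 0.3, 0.3],
}
query = [1.0, 1.0, 0.0]  # placeholder; a trained model would learn this
fused, weights = attention_fusion(features, query)
```

In this toy setting the audio vector aligns best with the query, so it receives the largest weight. A trained model would learn the query (or full query/key/value projections) from data.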

2.2 Physiological Data-based Research

Physiological data-based affective computing research focuses on integrating physiological signals into computational models for robust sentiment analysis. Physiological signals, such as Electroencephalography (EEG), record and measure an individual’s affective state, affective experience, or affective response, offering a more objective reflection of the genuine emotional state Bajestani et al. (2019).

Datasets

As shown in Table 1, physiological affective datasets mainly contain emotion-related physiological signals, i.e., EEG, Electrodermal Activity (EDA), and other related signals (e.g., Electrocardiogram (ECG), Electromyography (EMG), Heart Rate (HR), Eye-tracking (ET), Accelerometer (ACC), Inter-Beat Interval (IBI), Blood Volume Pulse (BVP), and Skin Temperature (ST)). They can also be divided into unimodal physiological datasets and multimodal physiological datasets. DDLES, an EEG-based depression recognition dataset, is a typical unimodal physiological dataset that contains EEG signals of 60 depressed and non-depressed individuals Mohammadi et al. (2019). SRDR is a multimodal physiological dataset that monitors specific physiological variables and synchronously collects EEG and EM signals from subjects to provide a more accurate detection dataset in a clinical environment Zhu et al. (2019). Since the acquisition of physiological signals requires expensive specialised medical equipment, most of the existing relevant datasets are small.

Approaches

Physiological-based affective analysis methods usually use CNN and RNN to process temporal physiological data, extracting the dynamic time-frequency features of physiological signals and facilitating a better understanding of dynamic affective states. Depending on the signal data, these methods can also be classified into two categories: unimodal approaches and multimodal approaches. Unimodal approaches focus on mining emotional states from one type of physiological signal. Xia et al. (2023) is a typical unimodal approach that uses a multi-head self-attention mechanism and a double-branch CNN to construct an end-to-end model for depression recognition from EEG signals alone. In contrast, multimodal approaches explore multiple types of physiological signals in the emotion recognition process, thus improving the prediction accuracy. Phan et al. (2023) first used CNN and BiLSTM to extract low-level features and temporal information from ECG and EDA signal sequences, and then combined the ECG and EDA features in an early fusion manner for pain recognition.
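Early fusion, in its simplest form, concatenates per-modality feature vectors into one vector before a single downstream classifier sees them. The sketch below illustrates only that step. The per-modality z-score normalization is an added assumption (a common way to keep one modality from dominating by scale), and the feature values and function names are hypothetical rather than taken from Phan et al. (2023).

```python
def zscore(vec):
    """Standardize one modality's features to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    std = var ** 0.5 or 1.0  # guard against constant vectors
    return [(v - mean) / std for v in vec]


def early_fusion(*modality_features):
    """Concatenate normalized feature vectors from several modalities."""
    fused = []
    for feats in modality_features:
        fused.extend(zscore(feats))
    return fused


# Hypothetical hand-picked features; a real system would use learned embeddings.
ecg_features = [72.0, 0.8, 55.0]  # e.g. heart rate and variability statistics
eda_features = [0.3, 1.2]         # e.g. tonic level and peak amplitude
fused = early_fusion(ecg_features, eda_features)
```

The fused vector has one entry per input feature (five here) and would be passed to a single classifier, in contrast to late fusion, where each modality gets its own classifier and only the decisions are combined.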

2.3 Behavior-Physiological Data-based Research

Given the directness and interpretability of behavioural data and the high objectivity of physiological data, research integrating both behavioral and physiological data is a natural next step for healthcare applications.

Datasets

Behavior-Physiological datasets integrate physiological data like EEG, EDA, and EMG with behavioral data such as facial expressions and audio. This integration aims to capture more comprehensive emotion-related cues of patients. M-MS is a multimodal autism emotion dataset, containing ECG and therapy videos to support the study of synchronization in autism recognition Calabrò et al. (2021). PPAD is the first multimodal neonatal pain dataset containing facial expression videos, sound, and physiological responses (vital signs and cortical activity), which was collected from 58 neonates during their hospitalization in the neonatal intensive care unit Salekin et al. (2021).

Approaches

Behavior-Physiological affective analysis needs to mine both the external manifestations of behavioural data and the internal changes of physiological signals, aiming to achieve more accurate affective analysis. For example, Han et al. (2022) proposed a multimodal diagnostic framework that uses an early fusion approach combining EEG and ET data for unsupervised training and supervised fine-tuning to identify children with Autism Spectrum Disorder (ASD). Recently, Qayyum et al. (2023) employed ViT, CNN, and LSTM models to extract audio and EEG features, and then concatenated these features by early fusion to improve depression diagnosis performance.

2.4 Discussion

Tendency

In general, the healthcare field shows a growing research trend away from unimodal technology and towards multimodal affective computing technology that analyzes both behavior and physiological signals. This allows for the exploration of both external manifestations of behavioral data and internal changes in physiological signals. Moreover, affective computing approaches have also shifted from early CNN-based models to Transformer-based models, and then to hybrid large-scale ones, which can better focus on sentiment changes across various modality sequences. Both trends facilitate a more comprehensive analysis of a patient’s emotional state, assisting doctors in tasks such as mental health monitoring, personalized treatment, and patient support.

Limitation

Despite the progress made, we find that current technology still has limitations. For datasets: behavioural datasets are easy to collect but subjective and potentially deceptive, whereas physiological datasets are highly reliable but costly and difficult to collect. As a result, existing multimodal fusion datasets remain limited in size, hindering their ability to fully support complex healthcare and medical applications. For approaches: as shown in Table 2, although current studies achieve higher recognition rates, most existing affective models are designed and trained independently on small, limited datasets. This lack of interoperability makes it challenging to reuse these models across different healthcare systems and applications.

Class	Methods	Technology	Modalities	Application	Dataset	Acc.	F1	Pre.	Rec.	MAE
B	Rejaibi et al. (2019b)	CNN	Audio	Depression	DAIC-WOZ	73.25	-	-	-	-
	Peng et al. (2020)	Multi-Scale DNN	Visual	Pain	UNBC Shoulder Pain	79.94	-	-	-	0.57
	Mallol-Ragolta et al. (2020)	RNN-LSTM	Visual	Pain	EmoPain	-	-	-	-	-
	Li et al. (2021)	ResNet	Audio, Visual	Autism	ASD-Affect	72.40	75.00	-	-	-
	Wang et al. (2021)	CNN-LSTM-Transformer	Audio,Text	Depression	Own	66.00	-	-	-	-
	Dessai and Usgaonkar (2022)	CNN-LSTM	Text	Depression	Own	92.00	93.00	93.00	94.00	-
	Mohan et al. (2022)	DCNN	Visual	Pain	Own	-	87.00	85.18	88.32	-
	Yuan et al. (2022)	SwinT	Visual	Pain	UNBC-McMaster	95.25	-	-	-	-
	Zhao and Wang (2022)	Cross-modal Attention-GAN	Text,Audio,Visual	Depression	AVEC2019	-	-	-	-	3.56
	Han et al. (2023)	Self-supervised Learning	Audio	Depression	AVEC2017	80.00	76.00	65.00	92.00	-
	Liu et al. (2023)	Self-attention Mechanism	Visual	Depression	AVEC2014	-	-	-	-	6.04
	Lu et al. (2023)	TS-ConvNet-CSA	Visual	Pain	DFEPN	66.20	-	-	-	-
	Wang et al. (2023b)	ALBERT-AGG	Text,Visual	Depression	Chinese Sina Weibo	93.82	91.00	92.52	88.58	-
	Anekar et al. (2023)	CNN-NLTK	Audio, Visual	Depression	FER2013	77.00	-	80.00	91.00	-
	Xu et al. (2023)	LSTM-CNN-Attention Mechanism	Text, Audio	Depression	E-DAIC	83.00	-	89.00	86.00	-
P	Bajestani et al. (2019)	KNN	EEG	Autism	Own	81.67	-	-	-	-
	Hadoush et al. (2019)	ANN	EEG	Autism	Own	97.20	-	-	-	-
	Zhu et al. (2019)	MDAE-SVM	EEG,EM	Depression	Own	83.42	-	-	-	-
	Pham et al. (2020)	LSDA-PNN	EEG	Autism	Own	98.70	-	-	-	-
	Baygin et al. (2021)	Lightweight deep networks-SVM	EEG	Autism	Own	96.44	-	-	-	-
	Wadhera and Kakkarl (2021)	VG-SVM	EEG	Autism	Own	94.19	-	-	-	-
	Abdolzadegan et al. (2020)	SVM	EEG	Autism	Own	90.57	-	-	-	-
	Han et al. (2022)	SDAE	EEG,ET	Autism	Own	95.56	-	-	-	-
	Xia et al. (2023)	MHSA-CNN	EEG	Depression	HUSM	91.06	-	-	-	-
	Torres et al. (2023)	CNN-ROAR	EEG	Autism	Own	93.40	93.40	93.50	93.30	-
	Phan et al. (2023)	CNN-BiLSTM	ECG,EDA	Pain	BioVid heat pain	84.80	-	-	-	-
	Lan et al. (2023)	ASGC	EEG	Depression	Own	84.27	84.22	-	-	-
	Shen et al. (2023)	FL	EEG	Depression	Own	75.00	-	-	-	-
	Zhang et al. (2023)	SVM	ECG	Depression	Own	70.00	61.54	85.71	-	-
B-P	Chen and Zhao (2019)	ResNet-LSTM	ET,Visual	Autism	Own	84.00	-	-	-	-
	Hamid et al. (2023)	LSTM	EEG,Visual	Depression	Own	96.80	-	-	-	-
	Qayyum et al. (2023)	ViT	EEG,Audio	Depression	MODMA	97.31	97.34	97.71	97.34	-
	Wang et al. (2023a)	Multimodal fusion-SVM	EEG,Audio	Depression	Own	86.78	-	-	-	-

Table 2: A technical overview and performance comparison of affective computing approaches in healthcare over the last five years. B, P and B-P represent behaviour, physiological and behavioural-physiological, respectively. Own represents an unpublished dataset constructed by authors. Due to space constraints, we only list representative methods.
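For readers less familiar with the performance columns of Table 2, the sketch below shows how such figures are typically computed, assuming that Acc., F1, Pre., and Rec. denote accuracy, F1 score, precision, and recall in percent for a binary task, and that MAE denotes the mean absolute error of a predicted severity score. The labels and scores are invented for illustration and do not come from any listed study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 (in percent) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    acc = correct / len(pairs)
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"Acc.": 100 * acc, "Pre.": 100 * pre, "Rec.": 100 * rec, "F1": 100 * f1}


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted continuous scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented example: 1 = depressed, 0 = non-depressed.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(labels, predictions)

# Invented example: questionnaire-style severity scores.
mae = mean_absolute_error([10, 14, 20], [12, 13, 17])
```

Because the studies in Table 2 use different datasets and label definitions, figures in the same column are not directly comparable across rows.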

3 Applications on Healthcare

This section presents affective approaches for healthcare applications. Affective computing holds significant potential in various healthcare applications, offering new techniques for diagnosis, therapy, and treatment of emotion-related diseases by identifying patients’ emotions. We collect the studies of affective computing in healthcare applications that have been proposed in the last five years, and Figure 2 provides the statistics. In general, the application of affective computing technology in healthcare involves depression diagnosis, autism recognition, pain level recognition, elderly dementia monitoring, stress monitoring, and intelligent medical systems. For simplicity, we review the three most popular healthcare applications related to brain disorders with representative methods: depression diagnosis, autism recognition, and pain level recognition. These applications cover the major populations affected by current brain disorders, including children, adults, and the elderly. After reviewing these three applications, we also provide a brief review of the remaining ones.

3.1 Depression Diagnosis

Depression, a widespread mental disorder, results from a complex interplay of social, psychological, and biological factors, leading to prolonged periods of low mood or reduced interest. Traditional self-report-based diagnosis is subjective and prone to inaccuracies, often resulting in delays in treatment. To tackle this, affective computing technology is applied for depression diagnosis, improving diagnosis objectivity and accuracy. Currently, affective computing-based depression diagnosis involves two categories: classification-based and regression-based depression diagnosis.

Affective classification-based depression diagnosis aids healthcare workers and doctors in distinguishing between individuals with and without depression. Alsharif et al. (2022) used mel-frequency cepstral coefficients to extract patients’ audio features and a CNN to build a classification model that detects depression in Arabic audio data. Cai et al. (2020a) employed KNN as a depression classification model to distinguish depression patients from normal people by integrating different EEG data obtained under neutral, negative, and positive audio stimulation. Wang et al. (2023b) proposed a multimodal depression detection model with an emotional knowledge graph, which integrated text, facial expressions, and other behaviors (e.g., the number of user posts and blog length) to address the depression detection task. Pan et al. (2023) used a Transformer to classify depression and non-depression from audio signals and facial expressions.

Since regression allows access to continuous affective states, affective regression-based depression diagnosis has been proposed to predict the degree of depression or the severity of depressive symptoms in patients. Zhou et al. (2020) proposed DepressNet, a deep regression network that predicts the degree of depression from facial data and significantly improved on the prior performance of vision-based depression recognition. Zhao and Wang (2022) employed a cross-modal affective regression model to facilitate the learning of more accurate multimodal representations from text, audio, and facial expression videos for automatic depression severity assessment.

Clinical Application With significant advancements in affective computing technology, depression diagnosis is now being employed in clinical monitoring. To prevent relapse in patients with depression, Yin et al. (2022) designed an intelligent monitoring system based on a hybrid affective computing model (namely CNN-LSTM), aiming to provide recurrence monitoring for patients within their home and daily environments. It can also be applied to assess new patients. The system includes user input, depression testing, intelligent monitoring, and connectivity to external wearables such as dedicated voice acquisition devices and EEG devices. It also supports communication with online doctors and integration with external systems.

Limitation Overall, depression diagnosis with affective computing is evolving from rough classification to continuous regression analyses, for dynamic clinical monitoring. However, it still faces challenges including subjectivity and heterogeneity of depression, longitudinal monitoring, and comprehensive assessment integrating genetic, psychosocial, and environmental factors. In addition, current affective computing technologies rely mainly on behavioural and physiological indicators while lacking clear biological indicators to support an objective diagnosis of depression, affecting diagnostic accuracy.

3.2 Autism Recognition and Intervention

Autism, a neurodevelopmental disorder, is typically characterized by emotional and social difficulties. Early diagnosis is crucial for facilitating intervention and treatment. Affective computing can aid experts in promptly assessing the emotional state of individuals with autism during interactions, enhancing diagnostic accuracy and positive emotional interventions. As a result, affective computing-based autism diagnosis and intervention are gaining increasing attention.

Currently, autism diagnosis primarily employs emotion classification models to identify Autism Spectrum Disorder (ASD). Negin et al. (2021) proposed a non-invasive visual assistance method with human action classification to facilitate the diagnosis of autism. Rahman and Booma (2022) employed MobileNet in a transfer learning manner to detect childhood autism from facial expressions. Wei et al. (2023) proposed a lightweight, conventional classification model to recognize autism-related behaviors in facial videos. Han et al. (2022) built a stacked denoising autoencoder to identify ASD in children from fused EEG and ET data. Li et al. (2021) combined audio and facial expression images to diagnose ASD in children, and used ResNet50 and ResNet18 as classifiers for a two-stage emotion classification, thus improving the accuracy of autism diagnosis.

Clinical Application In clinical applications of autism intervention, affective computing technology enhances virtual reality (VR) for human-computer interaction (HCI) scenarios, helping to address the shortage of autism treatment professionals. Manju et al. (2023) employed a VR-assisted system combined with wearable multi-modal sensing technologies to collect physiological signals and game performance data during HCI training. The system then employs a machine learning model to identify children with ASD, assessing the diagnosis, severity, social behavioral intervention, and treatment of ASD with multiple assessment scales.

Limitation Despite some progress in autism diagnosis and intervention, current approaches are still in their infancy and face challenges such as data privacy concerns for children, difficulties in data collection, assessment complexity, and emotion model bias in autism. In addition, autism may co-occur with other mental health disorders, such as attention deficit hyperactivity disorder (ADHD), which makes accurate diagnosis with affective computing models more difficult.

3.3 Pain Level Recognition

Pain is the body’s intricate physical and psychological emotional response to underlying injury or illness. Accurate identification of pain is crucial in medicine, enabling medical professionals to formulate effective treatment plans that enhance a patient’s quality of life. Research on pain level recognition through affective computing has been a prominent and challenging issue. The field employs affective computing techniques for classification and regression tasks, aimed at identifying patients in pain and estimating their pain intensity, respectively.

Affective classification-based pain level recognition can assist medical professionals in accurately determining the patient’s pain location, thereby significantly optimizing consultation time. For instance, Vallez et al. (2022) identified joint pain from facial expression images with the help of a pre-trained CNN classification model. Chen et al. (2022) used a multi-layer CNN to classify EEG signals in resting and pain states during daily activities. Lu et al. (2023) identified newborns’ pain levels based on their facial expression videos with the help of Softmax. Similar to Lu et al. (2023), Phan et al. (2023) employed Softmax to recognize pain levels from EDA and ECG signals.
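The Softmax step mentioned above turns a network's raw output scores (logits) into a probability distribution over pain levels, from which the most likely level is chosen. A minimal sketch follows. The four-level label set and the logit values are hypothetical and are not taken from Lu et al. (2023) or Phan et al. (2023).

```python
import math

# Hypothetical label set; the cited studies define their own pain levels.
PAIN_LEVELS = ["no pain", "mild", "moderate", "severe"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pain_level(logits):
    """Return the most probable pain level and the full distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PAIN_LEVELS[best], probs


# Invented logits standing in for the final layer of a trained network.
level, probs = predict_pain_level([0.2, 1.5, 3.1, 0.4])
```

Keeping the full probability vector, rather than only the top label, lets a clinician-facing system flag low-confidence predictions for manual review.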

To obtain a continuous numerical output for the pain degree of patients, pain level recognition based on affective regression is developed. Thiam et al. (2020) employed feed-forward neural networks as a regression model for discerning pain level intensity based on emotion-related physiological signals (i.e., EDA, EMG, and ECG). Besides, Jiang et al. (2024) used a non-linear neural network with Sigmoid for both classification and regression tasks, distinguishing between pain and non-pain in patients and detecting pain intensity based on ECG, EDA, and ECG.

Clinical Application In a clinical setting, pain analysis is critical to a patient’s recovery. Ghosh et al. (2023) proposed an emotion analysis system based on deep learning and statistical learning that analyzes patients’ facial expression images to detect their pain levels. In addition, the system can perform pain detection and recognition on resource-constrained devices, which provides strong support for intelligent healthcare.

Limitation Pain level recognition has developed from the initial classification of pain versus non-pain to the recognition of pain intensity, and from clinical diagnosis to real-time monitoring. However, pain recognition and assessment still face significant challenges such as individual differences, emotional differences, and standard calibration. In addition, some pain symptoms may lack distinct behavioural and physiological features, further complicating identification.

3.4 Other Related Healthcare Applications

Affective computing has also been applied to other clinical applications such as bipolar disorder Baki et al. (2022), elderly companionship and monitoring Meng et al. (2021), and smart medicine Ayata et al. (2020). For convenience, we give a combined overview of them in this section.

Bipolar disorder is a mental health disorder that causes mood swings ranging from depression to mania. Baki et al. (2022) created a multimodal decision system for three-level mania classification based on recordings of patients’ audio, text, and facial expression videos. Elderly monitoring and dementia diagnosis are crucial in an aging population, especially during illness. To accurately monitor the emotional state of the elderly, Meng et al. (2021) introduced an emotion-aware medicine monitoring system based on brain waves. Intelligent medicine systems offer one way to mitigate doctor shortages. Ayata et al. (2020) proposed an emotion recognition-based intelligent medicine system for emotional care by collecting and analyzing multiple physiological signals from patients.

4 Challenges and Opportunities

Despite recent breakthroughs, several challenges remain, and each of them also opens opportunities for future development.

4.1 Patient Data Privacy and Ethics

Challenge

Data privacy issues are well known in the big data era, and they are particularly important in the healthcare sector because patients have to share extremely sensitive information about their own bodies. As a result, the privacy of patient clinical data in affective computing is a crucial ethical concern, especially for children’s information. Ensuring confidentiality involves implementing robust measures in data transmission, storage, and usage. The challenge is to retain critical emotion-related information while adhering to ethical and moral regulations. Therefore, a proper balance needs to be found between privacy and the effectiveness of data analysis.

Opportunities

Potential future opportunities include the exploration of advanced privacy-protecting techniques such as federated learning Rieke et al. (2020) and secure multi-party computation Liu et al. (2020). Federated learning allows models to be trained without sharing raw data: each site trains locally, and only model updates are aggregated, so individual records are not exposed. Secure multi-party computation allows multiple parties to perform joint calculations while keeping each party’s data private. These technologies can help ensure that patients’ clinical data is adequately protected.
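As a minimal sketch of the federated idea described above, the following Python code trains a toy one-dimensional linear model across several simulated clients. Each client runs gradient descent on its own samples and returns only model weights, and a coordinator averages those weights in proportion to each client’s sample count (the federated averaging scheme). The function names, the linear model, and the learning-rate settings are illustrative assumptions, not the method of Rieke et al. (2020); a real deployment would also need secure communication and aggregation, which are omitted here.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature x, target y); hypothetical toy data


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Train y = w * x + b on one client's private data; return new weights only."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weights, weighting each client by its number of samples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_rounds(clients: Sequence[Sequence[Sample]], rounds: int = 50) -> List[float]:
    """Repeat: broadcast global weights, train locally, aggregate the updates."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two simulated hospitals whose data follow y = 2x + 1; raw samples never leave a client.
    hospitals = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0), (-1.0, -1.0)]]
    print(run_rounds(hospitals))
```

Only the weight lists cross the client boundary in this sketch, which is the property that makes the approach attractive for clinical emotion data.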

4.2 Emotion Bias and Fairness in Clinical Data

Challenge

In clinical and other healthcare settings, the collection and annotation of emotion data exhibit a natural bias due to the heterogeneity of emotional expression across populations and the subjectivity of annotators. Unlike data bias in common large datasets, healthcare data suffers from low data volume and a wider variety of biasing factors. For example, environment, age, occupation, and race can all affect the expression and the labelling of emotions Liu et al. (2022). Furthermore, preferences and concerns during data collection and labelling also vary significantly among doctors. These factors could lead to strong emotion biases and unfairness when training AI models. As a result, it is necessary to address and reduce these biases to achieve a fair understanding of emotions across diverse populations.
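One simple way to make such bias visible is to compare a model’s accuracy across demographic groups. The sketch below is an illustration under stated assumptions: the group names, labels, and predictions are made-up toy values, and the accuracy gap is only one of many possible fairness indicators, not a metric prescribed by the works cited here.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def per_group_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                       groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the fraction of correct predictions within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(acc_by_group: Dict[Hashable, float]) -> float:
    """Difference between the best-served and worst-served group."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Hypothetical labels and predictions for two age groups.
    y_true = ["sad", "sad", "happy", "happy", "sad", "happy"]
    y_pred = ["sad", "sad", "happy", "sad", "happy", "happy"]
    groups = ["young", "young", "young", "older", "older", "older"]
    acc = per_group_accuracy(y_true, y_pred, groups)
    print(acc, accuracy_gap(acc))
```

A large gap on held-out data suggests that the model serves some groups worse than others and that the training data or labels may need rebalancing.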

Opportunities

To reduce bias in affective computing when labelled data is limited, some research explores unsupervised and self-supervised learning algorithms that lessen the reliance on affective labels. These algorithms can learn emotional representations from unlabeled data, reducing the need for large-scale labeled datasets Han et al. (2023, 2022). Furthermore, domain adaptation techniques can improve the generality of affective computing models and mitigate affective biases between different cultures and demographic groups.

4.3 Fine-grained Health-related Emotions

Challenge

Most existing healthcare applications rely on single, simplistic affective models, such as six-class or three-class emotion classification models Ameer et al. (2023). These models fall short of capturing the rich emotions of real patients who may be undergoing complicated treatments, making it challenging for doctors to make accurate judgments. As a result, developing fine-grained health-related emotion models for clinical applications remains a key unresolved issue.

Opportunities

Recently, several researchers have proposed compound facial expression models based on linguistic descriptions Liu et al. (2022). We believe that this approach facilitates the description of subtle emotional changes, thereby helping doctors to make more informed medical diagnoses. Consequently, constructing multimodal fine-grained emotion models for healthcare applications emerges as a future development direction.

4.4 Real-time Diagnosis with Affective Computing

Challenge

Some medical applications, such as mental health monitoring or emergency response systems, demand real-time emotional analysis and diagnosis, which involves rapidly processing and analyzing large data sets while ensuring accurate emotion recognition. Developing efficient algorithms and infrastructure for real-time processing without compromising accuracy is a significant challenge.

Opportunities

Exploring adaptive algorithms and edge computing systems could enable real-time emotional analysis with minimal latency. Adaptive algorithms can be adjusted and optimized according to different needs to improve the efficiency and accuracy of emotion analysis. Edge computing systems can offload computing tasks to edge devices, reducing latency in data transmission and processing for faster real-time emotion analysis.
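The combination of adaptive processing and edge computing can be sketched as a confidence-based cascade: a small model on the edge device answers when it is confident, and only uncertain inputs are forwarded to a larger server-side model. The code below is a hypothetical illustration; `edge_model`, `server_model`, and the 0.8 threshold are assumptions introduced for this example and do not correspond to any system cited in this survey.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], List[float]]  # maps features to class logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def edge_first_predict(features: Sequence[float], edge_model: Model,
                       server_model: Model, threshold: float = 0.8) -> Tuple[int, str]:
    """Return (predicted class, where it was computed)."""
    probs = softmax(edge_model(features))
    confidence = max(probs)
    if confidence >= threshold:
        # Confident enough: answer locally and avoid a network round trip.
        return probs.index(confidence), "edge"
    # Uncertain: fall back to the larger, slower server-side model.
    probs = softmax(server_model(features))
    return probs.index(max(probs)), "server"


if __name__ == "__main__":
    # Toy stand-ins for a lightweight on-device model and a heavier remote model.
    tiny = lambda f: [f[0], -f[0]]
    large = lambda f: [f[0] + f[1], -(f[0] + f[1])]
    print(edge_first_predict([3.0, 0.0], tiny, large))
    print(edge_first_predict([0.1, -2.0], tiny, large))
```

Raising the threshold sends more inputs to the server and trades latency for accuracy, so it would need to be tuned for each clinical use case.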

4.5 Large Foundation Model-related Applications

Challenge

With the development of visual-linguistic large foundation models, such as GPT-4 Rathje et al. (2023), the significance of large foundation models has been demonstrated across various application domains. Consequently, constructing an affective large foundation model specifically for healthcare could benefit a wider range of clinical applications. However, acquiring and annotating the healthcare and clinical data that such specialized foundation models require remains challenging, posing a significant hurdle to the development of affective foundation models for healthcare.

Opportunities

Leveraging existing visual-language foundation models to construct large-scale affective models through transfer learning and cross-modal prompt learning could reduce the dependence on large amounts of training data, thus enhancing the reusability of these models for diverse application tasks Liu et al. (2024). This approach can not only improve the performance and generalization ability of affective models but also provide a common basis for emotion recognition across different domains and tasks.
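A simple form of the transfer learning mentioned above is to keep a pretrained encoder frozen and train only a small classification head on a handful of labelled examples. The sketch below assumes a toy `frozen_encoder` function as a stand-in for a real foundation model and a logistic-regression head; both are hypothetical and chosen only to keep the example self-contained. Prompt learning is not shown.

```python
import math
from typing import List, Sequence, Tuple


def frozen_encoder(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a pretrained model; its parameters are never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def train_head(samples: Sequence[Sequence[float]], labels: Sequence[int],
               lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a logistic-regression head on frozen features with plain SGD."""
    feats = [frozen_encoder(x) for x in samples]  # computed once, encoder stays fixed
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(x: Sequence[float], w: Sequence[float], b: float) -> int:
    """Classify a new input using the frozen encoder plus the trained head."""
    f = frozen_encoder(x)
    return int(sum(wi * fi for wi, fi in zip(w, f)) + b > 0)


if __name__ == "__main__":
    # Four labelled toy examples are enough because the encoder is reused as-is.
    xs = [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0), (-2.0, -1.0)]
    ys = [1, 1, 0, 0]
    w, b = train_head(xs, ys)
    print(predict((1.5, 0.5), w, b), predict((-1.5, -0.5), w, b))
```

Because only the head is trained, the number of learnable parameters stays small, which is why this style of reuse can work with limited clinical annotations.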

5 Conclusion

This paper provides a comprehensive survey of the application of affective computing in the field of healthcare. Specifically, we provide an overview of the developments in affective computing for healthcare, covering behavior data-based, psychological data-based, and behavior-psychological data-based datasets and approaches. Next, we introduce key healthcare applications, highlighting the three most frequently studied ones as well as other related applications. Finally, we summarize the most pressing challenges and the most promising opportunities in the development of affective computing in healthcare. We hope that this review gives academic and industrial researchers a comprehensive understanding of the latest advances in affective computing-based healthcare and offers them useful guidance.

References

Abdolzadegan et al. [2020] D. Abdolzadegan, MH. Moattar, and M. Ghoshuni. A robust method for early diagnosis of autism spectrum disorder from eeg signals based on feature selection and dbscan method. Biocybern Biomed Eng, 2020.
Alelaiwi [2019] A. Alelaiwi. Multimodal patient satisfaction recognition for smart healthcare. IEEE Access, 2019.
Alghifari et al. [2023] MF. Alghifari, TS. Gunawan, and M. Kartiwi. Development of sorrow analysis dataset for speech depression prediction. In I2MTC, 2023.
Alizadeh and Tabibian [2021] M. Alizadeh and S. Tabibian. A persian speaker-independent dataset to diagnose autism infected children based on speech processing techniques. ICSPIS, 2021.
Alsharif et al. [2022] Z. Alsharif, S. Elhag, and S. Alfakeh. Depression detection in arabic using speech language recognition. In CDMA, 2022.
Ameer et al. [2023] I. Ameer, N. Bölücü, MHF. Siddiqui, B. Can, et al. Multi-label emotion classification in texts using transfer learning. Expert Syst. Appl, 2023.
Anekar et al. [2023] D. Anekar, Y. Deshpande, R. Suryawanshi, R. Waman, V. Divekar, and R. Salunke. Exploring emotion and sentiment landscape of depression: A multimodal analysis approach. GCAT, 2023.
Ayata et al. [2020] D. Ayata, Y. Yaslan, and ME. Kamasak. Emotion recognition from multimodal physiological signals for emotion aware healthcare systems. J Med Biol Eng, 2020.
Bajestani et al. [2019] GS. Bajestani, M. Behrooz, AG. Khani, M. Nouri-Baygi, and A. Mollaei. Diagnosis of autism spectrum disorder based on complex network features. Comput Methods Programs Biomed, 2019.
Baki et al. [2022] P. Baki, H. Kaya, E. Çiftçi, and H. Güleç. A multimodal approach for mania level prediction in bipolar disorder. IEEE Transactions on Affective Computing, 2022.
Baygin et al. [2021] M. Baygin, S. Dogan, T. Tuncer, PD. Barua, O. Faust, N. Arunkumar, EW. Abdulhay, EE Palmer, and UR. Acharya. Automated asd detection using hybrid deep lightweight features extracted from eeg signals. Comput. Biol. Med, 2021.
Cai et al. [2020a] H. Cai, Z. Qu, Z. Li, Y. Zhang, X. Hu, and B. Hu. Feature-level fusion approaches based on multimodal eeg data for depression recognition. Inf Fusion, 2020.
Cai et al. [2020b] H. Cai, Z. Yuan, Y. Gao, S. Sun, N. Li, F. Tian, H. Xiao, J. Li, Z. Yang, X. Li, Q. Zhao, Z. Liu, Z. Yao, et al. A multi-modal open dataset for mental-disorder analysis. Sci. Data, 2020.
Calabrò et al. [2021] G. Calabrò, A. Bizzego, S. Cainelli, C. Furlanello, and P. Venuti. M-ms: A multi-modal synchrony dataset to explore dyadic interaction in asd. Progresses in Artificial Intelligence and Neural Systems, 2021.
Chen and Luo [2023] X. Chen and T. Luo. Catching elusive depression via facial micro-expression recognition. IEEE Commun. Mag., 2023.
Chen and Zhao [2019] S. Chen and Q. Zhao. Attention-based autism spectrum disorder screening with privileged modality. In ICCV, 2019.
Chen et al. [2022] D. Chen, H. Zhang, PT. Kavitha, FL. Loy, SH. Ng, C. Wang, KS. Phua, SY. Tjan, SY. Yang, and C. Guan. Scalp eeg-based pain detection using convolutional neural network. IEEE Trans. Neural Syst. Rehab. Eng., 2022.
Dessai and Usgaonkar [2022] S. Dessai and SS. Usgaonkar. Depression detection on social media using text mining. In INCET, 2022.
Duan et al. [2019] H. Duan, G. Zhai, X. Min, Z. Che, Y. Fang, X. Yang, J. Gutiérrez, and PL. Callet. A dataset of eye movements for the children with autism spectrum disorder. In ACM MM, 2019.
Fernandes-Magalhaes et al. [2023] R. Fernandes-Magalhaes, A. Carpio, D. Ferrera, D. Van Ryckeghem, I. Peláez, P. Barjola, et al. Pain emotion faces database (pemf): Pain-related micro-clips for emotion research. Behav Res Methods, 2023.
Ghosh et al. [2023] A. Ghosh, S. Umer, MK. Khan, RK. Rout, and BC. Dhara. Smart sentiment analysis system for pain detection using cutting edge techniques in a smart healthcare framework. Cluster Computing, 2023.
Giuntini et al. [2020] FT. Giuntini, MT. Cazzolato, MJD. dos Reis, et al. A review on recognizing depression in social networks: challenges and opportunities. Journal of Ambient Intelligence and Humanized Computing, 2020.
Hadoush et al. [2019] H. Hadoush, M. Alafeef, and E. Abdulhay. Eeg analysis using empirical mode decomposition and second order difference plot. Behavioural Brain Research, 2019.
Hamid et al. [2023] DSBA. Hamid, SB. Goyal, and P. Bedi. Integration of deep learning for improved diagnosis of depression using eeg and facial features. Mater. Today., 2023.
Han et al. [2022] J. Han, G. Jiang, G. Ouyang, and X. Li. A multimodal approach for identifying autism spectrum disorders in children. IEEE Trans. Neural Syst. Rehab. Eng., 2022.
Han et al. [2023] Z. Han, Y. Shang, Z. Shao, J. Liu, G. Guo, T. Liu, H. Ding, and Q. Hu. Spatial-temporal feature network for speech-based depression recognition. IEEE Trans Cogn Dev Syst, 2023.
Hossain and Muhammad [2019] MS. Hossain and G. Muhammad. Emotion recognition using secure edge and cloud computing. Information Sciences, 2019.
Ji et al. [2023] Z. Ji, T. Yu, Y. Xu, N. Lee, E. Ishii, and P. Fung. Towards mitigating llm hallucination via self reflection. In EMNLP, 2023.
Jiang et al. [2022] Y. Jiang, Z. Zhang, and X. Sun. Mmda: A multimodal dataset for depression and anxiety detection. In ICPR, 2022.
Jiang et al. [2024] M. Jiang, R. Rosio, S. Salanterä, AM. Rahmani, et al. Personalized and adaptive neural networks for pain detection from multi-modal physiological features. Expert Syst. Appl, 2024.
Lan et al. [2023] YT. Lan, D. Peng, W. Liu, Y. Luo, Z. Mao, WL. Zheng, and BL. Lu. Investigating emotion eeg patterns for depression detection with attentive simple graph convolutional network. In EMBC, 2023.
Li et al. [2021] J. Li, A. Bhat, and R. Barmaki. A two-stage multi-modal affect analysis framework for children with autism spectrum disorder. arXiv preprint arXiv:2106.09199, 2021.
Liu et al. [2020] J. Liu, Y. Tian, Y. Zhou, Y. Xiao, and N. Ansari. Privacy preserving distributed data mining based on secure multi-party computation. Computer Communications, 2020.
Liu et al. [2022] Y. Liu, W. Dai, C. Feng, W. Wang, G. Yin, J. Zeng, and S. Shan. Mafw: A large-scale, multi-modal, compound affective database for dynamic facial expression recognition in the wild. In Proc. 30th ACM Int. Conf. Multimedia, 2022.
Liu et al. [2023] Z. Liu, X. Yuan, Y. Li, Z. Shangguan, L. Zhou, et al. Pra-net: Part-and-relation attention network for depression recognition from facial expression. Comput. Biol. Med, 2023.
Liu et al. [2024] Z. Liu, K. Yang, T. Zhang, Q. Xie, Z. Yu, and S. Ananiadou. Emollms: A series of emotional large language models and annotation tools for comprehensive affective analysis. arXiv preprint arXiv:2401.08508, 2024.
Lu et al. [2023] G. Lu, H. Chen, J. Wei, X. Li, X. Zheng, H. Leng, Y. Lou, and J. Yan. Video-based neonatal pain expression recognition with cross-stream attention. Multimed. Tools. Appl, 2023.
Mallol-Ragolta et al. [2020] A. Mallol-Ragolta, S. Liu, N. Cummins, and B. Schuller. A curriculum learning approach for pain intensity recognition from facial expressions. In FG, 2020.
Manju et al. [2023] T. Manju, Magesh, S. Padmavathi, and Durairaj. Increasing the social interaction of autism child using virtual reality intervention (vri). TALLIP, 2023.
Melinda et al. [2023] M. Melinda, A. Ahmadiar, M. Oktiana, M. ShadiqAdiNugraha, MAL. Qadrillah, and Y. Yunidar. A novel autism spectrum disorder children dataset based on thermal imaging. In ICCCE, 2023.
Meng et al. [2021] W. Meng, Y. Cai, LT. Yang, and WY. Chiu. Hybrid emotion-aware monitoring system based on brainwaves for internet of medical things. IEEE Internet Things J, 2021.
Mohammadi et al. [2019] Y. Mohammadi, M. Hajian, and MH. Moradi. Discrimination of depression levels using machine learning methods on eeg signals. In ICEE, 2019.
Mohan et al. [2022] HM. Mohan, HCS. Kumara, SH. Mallikarjun, and AY. Prasad. Edge artificial intelligence-based facial pain recognition during myocardial infarction. JAMRIS, 2022.
Negin et al. [2021] F. Negin, B. Ozyer, S. Agahian, S. Kacdioglu, and GT. Ozyer. Vision-assisted recognition of stereotype behaviors for early diagnosis of autism spectrum disorders. Neurocomputing, 2021.
Othmani et al. [2021] A. Othmani, D. Kadoch, K. Bentounes, E. Rejaibi, R. Alfred, and A. Hadid. Towards robust deep neural networks for affect and depression recognition from speech. In Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part II, 2021.
Pan et al. [2023] Y. Pan, Y. Shang, Z. Shao, T. Liu, G. Guo, and H. Ding. Integrating deep facial priors into landmarks for privacy preserving multimodal depression recognition. IEEE Trans Affect Comput., 2023.
Peng et al. [2020] X. Peng, D. Huang, and H. Zhang. Pain intensity recognition via multi-scale deep network. IET Image Processing, 2020.
Pham et al. [2020] TH. Pham, J. Vicnesh, JKE. Wei, SL. Oh, N. Arunkumar, EW. Abdulhay, EJ. Ciaccio, and UR. Acharya. Autism spectrum disorder diagnostic system using hos bispectrum with eeg signals. International Journal of Environmental Research and Public Health, 2020.
Phan et al. [2023] KN. Phan, NK. Iyortsuun, S. Pant, HJ. Yang, and SH. Kim. Pain recognition with physiological signals using multi-level context information. IEEE Access, 2023.
Qayyum et al. [2023] A. Qayyum, I. Razzak, M. Tanveer, M. Mazher, and B. Alhaqbani. High-density electroencephalography and speech signal based deep framework for clinical depression diagnosis. IEEE ACM Trans Comput Bi, 2023.
Rahman and Booma [2022] LA. Rahman and PM. Booma. The early detection of autism within children through facial recognition; a deep transfer learning approach. In NTIC, 2022.
Rathje et al. [2023] S. Rathje, DM. Mirea, I. Sucholutsky, R. Marjieh, et al. Gpt is an effective tool for multilingual psychological text analysis. 2023.
Rejaibi et al. [2019a] E. Rejaibi, D. Kadoch, K. Bentounes, R. Alfred, M. Daoudi, et al. Clinical depression and affect recognition with emoaudionet. arXiv preprint arXiv:1911.00310, 2019.
Rejaibi et al. [2019b] E. Rejaibi, D. Kadoch, K. Bentounes, R. Alfred, M. Daoudi, et al. Clinical depression and affect recognition with emoaudionet. ArXiv, 2019.
Rieke et al. [2020] N. Rieke, J. Hancox, W. Li, F. Milletari, HR. Roth, et al. The future of digital health with federated learning. NPJ digital medicine, 2020.
Salekin et al. [2021] MS. Salekin, G. Zamzmi, J. Hausmann, and D. Goldgof…. Multimodal neonatal procedural and postoperative pain assessment dataset. Data in Brief, 2021.
Shen et al. [2023] J. Shen, Y. Zhang, H. Liang, Z. Zhao, K. Zhu, K. Qian, Q. Dong, X. Zhang, and B. Hu. Depression recognition from eeg signals using an adaptive channel fusion method via improved focal loss. IEEE J. Biomed. Health. Inf., 2023.
Smith et al. [2020] M. Smith, BJ. Dietrich, E. Bai, and HJ. Bockholt. Vocal pattern detection of depression among older adults. International journal of mental health nursing, 2020.
Sousa et al. [2023] A. Sousa, K. Young, M. D’aquin, M. Zarrouk, and J. Holloway. Introducing calmed: Multimodal annotated dataset for emotion detection in children with autism. In International Conference on Human-Computer Interaction, 2023.
Thiam et al. [2020] P. Thiam, HA. Kestler, and F. Schwenker. Multimodal deep denoising convolutional autoencoders for pain intensity classification based on physiological signals. In ICPRAM, 2020.
Torres et al. [2023] JMM. Torres, S. Medina-DeVilliers, T. Clarkson, et al. Evaluation of interpretability for deep learning algorithms in eeg emotion recognition: A case study in autism. Artificial Intelligence in Medicine, 2023.
Vallez et al. [2022] N. Vallez, J. Ruiz-Santaquiteria, O. Deniz, J. Hughes, S. Robertson, K. Hoti, and G. Bueno. Adults’ pain recognition via facial expressions using cnn-based au detection. In ICIAP, 2022.
Wadhera and Kakkarl [2021] T. Wadhera and D. Kakkarl. Social cognition and functional brain network in autism spectrum disorder: Insights from eeg graph-theoretic measures. Biomedical Signal Processing and Control, 2021.
Wang et al. [2021] X. Wang, S. Zhao, and Y. Wang. Bimodal emotion recognition for the patients with depression. In ICSIP, 2021.
Wang et al. [2023a] X. Wang, X. Wan, Z. Ning, Z. Qie, J. Li, and Y. Xiao. A multimodal fusion depression recognition assisted decision-making system based on eeg and speech signals. In CCCI, 2023.
Wang et al. [2023b] Z. Wang, B. Deng, X. Shu, and J. Shu. Multimodal depression detection model fusing emotion knowledge graph. In ICAIBD, 2023.
Wei et al. [2023] P. Wei, D. Ahmedt-Aristizabal, H. Gammulle, S. Denman, and MA. Armin. Vision-based activity recognition in children with autism-related behaviors. Heliyon, 2023.
Xia et al. [2023] M. Xia, Y. Zhang, Y. Wu, and X. Wang. An end-to-end deep learning model for eeg-based major depressive disorder classification. IEEE Access, 2023.
Xu et al. [2023] X. Xu, G. Zhang, Q. Lu, and X. Mao. Multimodal depression recognition that integrates audio and text. In ISCEIC, 2023.
Yan et al. [2020] J. Yan, G. Lu, X. Li, W. Zheng, C. Huang, Z. Cui, Y. Zong, M. Chen, Q. Hao, Y. Liu, J. Zhu, and H. Li. Fenp: a database of neonatal facial expression for pain analysis. IEEE Trans Affect Comput., 2020.
Yao et al. [2022] B. Yao, C. Shi, L. Zou, L. Dai, M. Wu, L. Chen, Z. Wang, and K. Yu. D4: a chinese dialogue dataset for depression-diagnosis-oriented chat. arXiv preprint arXiv:2205.11764, 2022.
Yildirim-Celik et al. [2022] H. Yildirim-Celik, S. Eroglu, and K. Oguz… Emotional context effect on recognition of varying facial emotion expression intensities in depression. Journal of Affective Disorders, 2022.
Yin et al. [2022] W. Yin, C. Yu, P. Wu, W. Jiang, Y. Liu, T. Ren, and W. Dai. An intelligent mobile system for monitoring relapse of depression. In CSCW, 2022.
Yoon et al. [2022] J. Yoon, C. Kang, S. Kim, and J. Han. D-vlog: Multimodal vlog dataset for depression detection. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
Yuan et al. [2022] X. Yuan, S. Zhang, C. Zhao, X. He, B. Ouyang, and S. Yang. Pain intensity recognition from masked facial expressions using swin-transformer. In ROBIO, 2022.
Zhang et al. [2023] F. Zhang, M. Wang, J. Qin, Y. Zhao, X. Sun, and W. Wen. Depression recognition based on electrocardiogram. ICCCS, 2023.
Zhao and Wang [2022] Z. Zhao and K. Wang. Unaligned multimodal sequences for depression assessment from speech. In EMBC, 2022.
Zhou et al. [2020] X. Zhou, K. Jin, Y. Shang, and G. Guo. Visually interpretable representation learning for depression recognition from facial images. IEEE Transactions on Affective Computing, 2020.
Zhu et al. [2019] J. Zhu, Y. Wang, R. La, J. Zhan, J. Niu, S. Zeng, and X. Hu. Multimodal mild depression recognition based on eeg-em synchronization acquisition network. IEEE Access, 2019.