License: CC BY 4.0
arXiv:2312.06024v1 [cs.HC] 10 Dec 2023

Thinking Assistants: LLM-Based Conversational Assistants that Help Users Think By Asking rather than Answering

Soya Park soya.park@emory.edu and Chinmay Kulkarni chinmay.kulkarni@emory.edu Emory UniversityUSA
Abstract.

We introduce the concept of thinking assistants, an approach that encourages users to engage in deep reflection and critical thinking through brainstorming and thought-provoking queries. We instantiate one such thinking assistant, , as a virtual assistant tailored to assist prospective graduate students. We posit that thinking assistants are particularly relevant to situations like applying to graduate school, a phase often characterized by the challenges of academic preparation and the development of a unique research identity. In such situations, students often lack direct mentorship from professors, or may feel hesitant to approach faculty with their queries, making thinking assistants particularly useful.

Leveraging a Large Language Model (LLM), is a demonstration system built as a thinking assistants for working with specific professors in the field of human-computer interaction (HCI). It was designed through training on information specific to these professors and a validation processes in collaboration with these academics. This technical report delineates the system’s architecture and offers a preliminary analysis of our deployment study. Additionally, this report covers the spectrum of questions posed to our chatbots by users. The system recorded 223 conversations, with participants responding positively to approximately 65% of responses. Our findings indicate that users who discuss and brainstorm their research interests with engage more deeply, often interacting with the chatbot twice as long compared to those who only pose questions about professors. We also find that participants who use the system as an informational or predictive agent, seeking definitive answers such as a direct ’yes’ or ’no’ to queries like “Do I have a chance of getting into the university?” find the thinking assistants’s reflective conversations unsatisfactory.

Refer to caption
Figure 1. is a thinking assistant that helps prospective graduate students reflect on research interests. Based on the user’s question, the assistant chooses between a ‘probing’ mode, understanding user interests and prompting reflection, and an ‘answering’ mode that offers with relevant information. Most conversational turns end with a encouraging follow-up question.

1. Introduction

Many of life’s most important decisions are challenging because they inherently require a degree of self-discovery in the absence of direct and readily accessible guidance. For instance, in career transitions, individuals must navigate through a maze of choices and opportunities, such that many choices can be reasonable, depending on one’s goals. Navigating these choices is challenging as an individual: choosing a new career path or pivoting to a different industry involves not only assessing one’s skills and interests but also understanding the nuances of new fields, often without any mentorship.

This paper introduces the idea of a thinking assistant that combines expertise-based guidance with scaffolding for self-reflection. We introduce , a set of thinking assistants designed to support prospective graduate students as they navigate the complexities of graduate applications. For many prospective graduate students, the journey to and through graduate school, marked by the need for critical decision-making and self-discovery, can be overwhelming, especially in the absence of direct, readily accessible guidance from more senior academics. The primary agent that comprises aids prospective graduate students by engaging users through targeted questioning, fostering deeper thought and refinement of their research concepts. The chatbot’s primary objective is to probe users’ research interests, emphasizing active engagement over passive reception of information; to promote commitment to research interests and goals (similar to a ‘saying-is-believing’ effect (Higgins and Rholes, 1978)). We augment this active engagement with wise feedback on self-reflection (Walton and Wilson, 2018), ie. offering constructive feedback based on high standards and affirms the users’ capabilities to meet these standards.

To ask thought-provoking questions, and to provide meaningful feedback, the system uses a wide array of information from participating professors, including details about their research evolution, mentoring methods, and key qualities, to create wide-ranging reflective support for students. To prevent factual inaccuracies, we also additionally implemented safety measures, including a secondary virtual assistant that corrects any discrepancies in the chatbot’s messages. Developed using GPT-4, the chatbot went through various design and testing phases with professors to accurately represent their areas of expertise, integrating information about their latest publications and other relevant materials into the assistant’s database.

Results from our deployment in Fall 2023 suggest the benefits of the thinking-assistants approach. We found that users had more satisfying experiences when conversations focused on their own research interests rather than on the professors’; out of 223 conversations analyzed, those in which users shared personal information, like their research interests, saw higher satisfaction and engagement levels, averaging six messages per conversation. In contrast, conversations lacking such personal disclosures were typically less satisfying and involved fewer questions. We also found that some users expected to be an informational, rather than reflective agent: such users were frustrated with our chatbot when it did not provide definitive answers to decisions regarding graduate applications, and occasionally made inaccurate paper recommendations.

While is designed for graduate school applicants, a wide variety of decisions could potentially benefit from agents that aid self-reflection and offer expertise-based guidance. For instance, complex decisions regarding investments, savings, and budgeting must be made by all, not just those with a formal education in finance. Similarly, individuals seeking to adopt healthier lifestyles must make choices about diet, exercise, and mental health practices. Due to the complexity of choices, where the same choice could be appropriate or inappropriate for different individuals, the absence of personalized guidance or reflective support can often make the decision-making process feel overwhelming. We hope that our work on offers inspiration for building thinking assistants for these and other important applications.

2. Related Work

2.1. Using Agents to Personalize Guidance

Prior work has leveraged machine-learning enbled chatbots to create personalized interactions at scale, especially in situations where there is limited availability of experts. For instance, in education settings, intelligent agents can help students navigate complex administrative logistics and academic requirements, filling in for the limited availability of administrative staff (Mitra et al., 2023). Similarly, in healthcare, to mitigate the limited availability of medical professionals, researchers have developed chatbots capable of handling basic health inquiries (Bickmore et al., 2018; Xiao et al., 2023). Often, these systems help triage questions, allowing experts to focus on the most challenging issues.

At the same time, such dialogue agents can also help in ways that humans cannot. For instance, in providing mental health support, such agents allow some participants to be more disinhibited in their conversation (Ma et al., 2023). Thus, in addition to interpreting natural language queries and accessing knowledge bases, the non-human nature of these agents may be helpful in applications where people may be hesitant in asking others for help (Ma et al., 2023; Sharma et al., 2023).

is similar to these systems in that it offers personalized guidance by delving deep into users’ queries and shaping conversations for graduate school exploration. We specifically aim to not replace professors. Instead, our thinking assistants are designed to provide an additional, different, channel for student interaction. offers personalized guidance by delving deeper into users’ queries and shaping conversations for graduate school exploration.

Our work is also motivated by prior work that finds that junior researchers often find it stressful to interact with senior academics, leading to over-preparation for meetings (Park et al., 2024). By contrast, virtual assistants offer a more relaxed setting where students can more freely explore and ask questions they might hesitate to pose to professors for fear of being seen as unprepared. Such exploration can then help students to determine their interest in specific research topics or graduate programs, and gauge their interest in reaching out to professors. Interacting with professors in turn offers unique benefits, such as gaining insights into research and building social capital. While not the focus of this study, we conjecture that thinking assistants may help students better prepare for such meetings in a less-stressful way.

2.2. Conversational Agent Safety

Ensuring the safety of conversational assistants, particularly those not designed for information seeking, remains a complex challenge. This is partly because the required safety measures vary depending on the CA’s purpose. Most current measures focus on the accuracy of information provided by conversational assistants, which may not be entirely suitable for free-flowing conversations (Zubatiy et al., 2023). Our work diverges from the typical information seeking focus. The primary objective of our system, , is to aid students in their thinking process within the context of a specific professor’s knowledge, as represented by their virtual assistant. While it’s essential to provide some critical and frequently asked information of the professors for students’ academic journey, we also recognize the limitations of LLMs, such as the potential for hallucination (Dziri et al., 2022; Ji et al., 2023)). To address these limitations, we introduce a ’safety bot’ whose sole function is to verify the accuracy of the information provided by the professors’ virtual assistants, creating a more reliable interaction for users.

In addressing the challenge of ensuring the safety of conversational agents, the field has extensively explored various methods. One effective strategy involves structuring the conversation around expected domain-specific topics, and steering conversations away from other topics. For instance, a CA designed for booking flight tickets might predominantly navigate branches related to gathering travel itinerary details, selecting tickets, and processing payments. While debugging conversational assistants across these broad conversational spectrums can be challenging, focusing on individual branches makes the task more feasible, allowing for more precise shaping and testing of the conversation flow for chatbot developers (Klemmer et al., 2000; Choi et al., 2021). Additionally, the incorporation of experts and inclusive datasets in training further bolsters the safety of conversational assistants (Aroyo et al., 2023; Xiao et al., 2023).

3. System Description: A thinking assistant to aid reflection with probing and information

is a thinking assistant system developed to assist prospective graduate students in shaping their research trajectory. It engages users through targeted questioning to foster deeper thought and refinement of their research concepts, and provides relevant information to aid this reflection whenever possible.

Our primary design goal for the system is to be supportive and encouraging, with a secondary objective of offering tangible assistance and facilitating exploration in the user’s research journey. This dual approach has been demonstrated to be more effective and preferred over mere positivity and encouragement (Sharma et al., 2023). Additionally, the chatbot focuses on the users’ strengths, aiding in amplifying these positive attributes (Park and Kulkarni, 2023). In this section, we outline the conversational strategies of and its training and testing processes.

3.1. Probing and Answering

Our chatbot operates in two modes: probing users’ research identity and responding to inquiries regarding a professor. We divided it into these two modes (probing and answering) to achieve better control over the conversation flow and enhance testing efficacy (Choi et al., 2021). At the end of each message, our chatbot poses an encouraging question (see Fig. 1). Table 1 presents various types of responses from our chatbots, along with examples.

Table 1. A list of message types and examples from our chatbots
Type of message Examples

Ask questions

“That sounds like an interesting research area! Have you thought about how you would incorporate [research technique] to help users in [research topic]?”
“That’s an intriguing perspective! [Research interest] to enhance cognitive processes and improve thinking goes beyond pure conversational applications … What specific areas or domains do you envision exploring where LLMs can contribute to improving thinking processes?”

Recommend the professor’s relevant papers

“That’s a great research interest! [The professor]’s work in social computing systems and human-computer interaction aligns well with your focus on improving online spaces, particularly in empowering users to have more control over their online experiences. Based on your interest, I recommend exploring these papers from [the professor]’s publications: … You may find these papers informative and inspiring for developing your research interests further. Let me know if you have any specific questions or if there’s anything else I can assist you with!”

Give Wise Feedback on user’s research

“[The professor] values students who are dedicated, curious, and willing to explore new areas of research. He values intellectual curiosity, perseverance, and a passion for pushing the boundaries of knowledge. Given your dedication to [a research area], it is clear that you have the drive and commitment necessary for research. Your journey of [research] will provide unique insights and perspectives to your work. Your determination to understand [research topic]. It’s important to take your time to re-learn foundational knowledge, as it will serve as a strong base for future exploration. As you deepen your understanding, consider seeking opportunities in [a research field] that align with your research interests. [The professor]’s group focuses on user-centric design and developing tools, so exploring the intersection between [research area] may be a fruitful area to investigate. Keep up the hard work and continue pursuing your academic goals! If you have any specific questions or need further guidance, please don’t hesitate to reach out.”

Respond to informational questions about the professor

“[The professor] typically hires students who have a strong research background and a solid foundation in computer science. He has mentioned in the past that most of his successful PhD students had previous research experience or a strong background in user-interface development. That being said, it is important to note that each student’s fit with a particular advisor is determined by various factors, including research interests, compatibility of working styles, and available funding. It’s best to reach out to [the professor] directly to discuss your specific research interests and background to get a more accurate assessment of the fit between your profile and his research group. Would you like any specific recommendations regarding research papers or works related to your interests?”

Collaborate on hypothesizing answers to research-relevant questions

“[The professor] has published extensively on [a research area]. While I don’t have access to his specific views on AI and health, based on his research interests, it can be inferred that he recognizes the tremendous potential of AI in improving healthcare outcomes. [The professor’s previous research relevant to the question] For a more accurate and detailed understanding of [the professor]’s views on AI and health, I recommend reviewing his published papers and [the professor’s contact policy] for any specific inquiries or discussions you may have.”

Redirect non research-relevant questions

“As an AI language model, I don’t have personal hobbies or interests. My purpose is to assist and provide information on academic and research-related topics. If there’s anything specific you’d like to discuss or any questions you have related to academia or research, feel free to let me know!”
Refer to caption
Figure 2. interface. Left: Before a user initiates conversation, it shows description of the chatbot. In the interface, we suggest users starting by sharing about their research interest. Right: The chatbot primarily focuses on asking thought-provoking questions and probing users’ research interest. Every three turns of messages, the interface asks users binary feedback before they can continue the conversation.

3.1.1. Probing

Our primary goal is to engage users in exploring their research interests through probing. Adopting the ‘saying-is-believing’ approach (Aronson, 1999), our chatbot poses targeted questions that deepen user engagement with their research topics. This method actively involves individuals in reflecting on an idea rather than passively receiving information. This strategy also sidesteps the stigma associated with passive receipt. Encouraging individuals to champion an idea increases its persuasive power, and by relating the idea to their personal experiences, users can personalize and internalize the concept, thereby making standardized materials resonate with a diverse audience (Aronson, 1999). uses strategic inquiries to encourage users to thoroughly analyze their work, leading to tailored responses and academic recommendations.

Once the system has an initial understanding of the users’ research goals, it offers constructive feedback based on the Wise Feedback framework (Cohen et al., 1999; Yeager et al., 2014). This feedback is twofold: firstly, it establishes the high standards used to evaluate the research and offers relevant advice; secondly, it affirms the users’ ability to meet these standards, buffering against losing confidence in their research endeavors due to critical feedback.

3.1.2. Answering

Our system also provides some limited guidance to users in choosing their academic advisors. The chatbot provides information regarding professor’s research and advising style, which are a piece of critical information for determining a fit with a potential advisor. To do so, we compiled information from participating professors to effectively aid prospective graduate students. This data includes research evolution and direction, mentoring and guidance, group dynamics and collaborations, post-PhD prospects, student support, and key qualities sought in graduate students. For queries outside these areas, particularly regarding professors’ personal opinions, the chatbot clarifies its limitations. In situations where user questions diverge broadly from research-related topics, the chatbot redirects the conversation back to research-focused discussions, ensuring that the dialogue remains relevant and productive.

Where appropriate, it also attempts to retrieve relevant information based on model-pretraining data. For example, the chatbot assists users by explaining research terminology and addressing any follow-up questions related to their queries. This proactiveness during information-seeking also possibly improve users’ trust on our chatbot (Xiao et al., 2023) which could lead users share more about their research interests with our chatbot.

3.2. Safety through a secondary agent

Concerning sensitive information like professors’ recruitment and contact policies, is instructed to provide only verified facts. Considering the challenges of filtering real-time streaming messages from an LLM, we introduced a secondary virtual assistant. This assistant’s primary role is to review and correct any discrepancies in the chatbot’s messages, ensuring they align with the information provided by the professors, as depicted in Fig. 3.

Our chatbot is programmed to refrain from making definitive statements about professors’ opinions or personal information unless such information is verifiably sourced. It is designed to acknowledge its lack of knowledge rather than provide uncertain or speculative information. To prevent potential misrepresentation concerns, we communicate to users that the chatbot is a virtual assistant with knowledge about the professors, not an entity impersonating them in multiple ways, including static descriptions on the webpage, and in conversational responses.

Refer to caption
Figure 3. Safety bot: This bot is solely designed to correct critical factual information regarding the professor. We found that such a secondary bot imnproved factual safety, while suggesting an understandable mental model to users.

3.3. Training and Testing

Capturing an expert’s knowledge comprehensively in a chatbot development is challenging (Xiao et al., 2023). To ensure each professor is accurately represented, we’ve developed a method that allows professors to easily evaluate and verify the representation of their expertise in our system. Our strategy employs structured responses, which provide a clear, organized framework that parallels the expert’s knowledge. This approach simplifies the process for professors to assess and confirm the authenticity and accuracy of the information in the dataset.

We compile training data by sourcing information from their academic webpages and requesting they contribute additional materials. Professors have provided items such as recent talk summaries, lab manuals, and grant proposals.

Leveraging GPT-4, we draft responses to the following six questions, which act as “teaching data” for our assistants:

  1. (1)

    Research Evolution & Direction: How have your research areas evolved over the past few years, and where do you see them going in the near future?

  2. (2)

    Mentoring & Guidance: How do you structure your interactions with PhD students to ensure they receive adequate guidance and feedback?

  3. (3)

    Group Dynamics & Collaborations: Can you describe the nature of collaborations within your group and with external teams or departments?

  4. (4)

    Post-PhD Prospects: How has the research experience in your group equipped students for their post-PhD careers?

  5. (5)

    Supporting Students: How do you handle situations when a student is stuck or facing challenges in their research?

  6. (6)

    Key Qualities: What are the key qualities or attributes that have stood out in the most successful PhD students you’ve mentored?

After drafting answers to these questions, we shared the document with the professors for their review and endorsement. Once approved, we integrated the responses into a chatbot, which was then presented to the professors for a final evaluation. The professors interacted with and provided feedback on the virtual assistant prior to its official launch, with some inviting their students to participate in testing as well.

Participating professors received a general summary of interactions with their designated chatbots, such as common question themes by users, and answers provided. They retained the authority to discontinue our hosting of assistants related to their work at any time, at their discretion.

3.4. Implementation

is built using the Next.js framework. For our backend, we utilize OpenAI’s Streaming API with the gpt-4 model for most functionalities. However, for the answering mode — which requires higher token usage due to including information about professors’ publications — we employ the gpt-3.5-turbo-16k model. Please refer to Appendices for the actual prompts used in our system.

We store consenting users’ conversations occurring on our interface using Firebase, and track anonymous usage patterns using Google Analytics.

4. Evaluation

We deployed and observed the types of conversations people had with our chatbot in a field deployment in November 2023.

4.1. Procedure

Upon selecting a professor to initiate conversations in the interface, users were presented with a consent form and could only start conversations upon consenting.

was deployed on November 3rd, 2023. Initially, there were chatbots for eight professors, but one was removed at the request of the respective professor. (The professor was concerned that student might receive erroneous information, when the chatbot incorrectly generated the professor’s email and lab name acronym, lacked information about their students and provided incorrect informtio that the the admission committee is in the university, but actually is in their department) A total of 173 users initiated conversations with at least one of our chatbots, resulting in 223 conversations between November 3rd and 10th, 2023.

We excluded conversations used for testing purposes by the professors or explicitly stated as tests in the chat messages from our analysis. Conversations in non-English languages were also excluded.

The university’s Institutional Review Board (IRB) reviewed and approved our deployment study (IRB ID #6809). All research personnel conducting human subjects research completed human subjects protection training through the online CITI course.

4.2. Measures

To understand the nature of student interactions with our chatbot and assess their satisfaction, we collected data on the following metrics:

4.2.1. Types of Questions

We analyzed the variety of questions posed by users to our chatbot to understand their primary concerns and interests.

4.2.2. Utterance Analysis

We evaluated the language patterns in conversations using the Linguistic Inquiry and Word Count (LIWC) tool (Pennebaker et al., 2001), which provided insights into the linguistic characteristics of the dialogues.

4.2.3. User Rating

To obtain immediate user feedback, we asked users to rate their conversation after every three messages, using a thumbs up or thumbs down response system.

4.3. Results

Of the 179 user ratings collected, 116 were positive. Table 2 summarizes the different types of user messages and their frequency.

Table 2. A list of message types from users (each message can have more than one type)
Type of message Count Example
Professor’s research area 104 “How does [the professor]’s research connects to collective intelligence and sensemaking?”
Sharing their own experience or interest 55 “I am currently a data scientist working in [research field]. My research interests lie at … Would this be a good fit with [the professor]’s work?”
PhD Program and Application Queries 44 “What is [the professor] looking for in a PhD student? ”
Specific Requests and Clarifications 41 “Can you please recommend [the professor]’s papers I should read?”
Technical and Project-Based Queries 40 “What is [the professor] thought on AR + AI? And how it could potentially change the way how people interact?”
Not research-related 38 “What is your hobbies? :)”
Advising Style and Professional Interactions 29 “Is [the professor] good to work with?”
Career Guidance and Professional Development 16 “How do I make myself stand out in my statement of purpose”
Misc 9 “What is their pronoun?”, “Who are [the professor]’s students?”

4.3.1. Talking about You vs. Professor

Our analysis indicates that users are more satisfied when the conversation centers on themselves rather than the professor. There is a significant difference in the messaging patterns of users (as shown in Table 3) and chatbots (as shown in Table 4) between conversations with positive and negative ratings. Users tend to rate conversations positively when they talk about themselves, as indicated by the use of I (p = .01). Notably, there is more than twice the usage of Achieve in negatively rated conversations (p = .02). Among these negative Achieve messages, most (78%) inquire about the professor’s achievements (e.g., “What is his latest work?”). In contrast, in positively rated messages, 83% of Achieve references are about the users’ own achievements (e.g., “I am interested in working on [a research topic]”). Similarly, when examining the chatbot’s responses, users tend to be more satisfied when the chatbot discusses the user’s work (You, p ¡ .001) rather than the professor’s (SheHe, p = .06).

We also observed differences in engagement levels based on whether students discussed their own work. Out of 223 conversations, 150 did not include personal information from the users, such as research interests or experiences. Conversely, 73 conversations involved users sharing such details. This variance in self-disclosure is associated with a significant difference in engagement levels (t = -4.46, p ¡ .001). When users did not share personal information, they typically asked a few questions about the professor and ended the conversation after an average of three queries. On the other hand, users who discussed their own work engaged more actively, with an average of six messages per conversation, which indicates a doubling in interaction levels.

Table 3. A list of LIWC word types in user messages that shows a statistically significant difference (p ¡ 0.05) in frequency or usage between conversations rated as positive and those rated as negative
Word type Rating (Negative / Positive) p-value
Affiliation 0.20 .002
I 0.55 .01
Insight 0.42 .01
Achieve 2.13 .02
Posemo 0.58 .03
Health 0.0 .03
CogProc 0.67 .04
Social 0.76 .05
Table 4. A list of LIWC word types in chatbot messages that shows a statistically significant difference (p ¡ 0.05) in frequency or usage between conversations rated as positive and those rated as negative. Word types that are highly significant are listed first.
Word type Rating (Negative / Positive) p-value
Pronoun 0.82 ¡ .001
You 0.63 ¡ .001
Percept 0.62 ¡ .001
Hear 0.09 ¡ .001
Ipron 0.68 ¡ .001
Adverb 0.76 ¡ .001
Interrog 0.48 ¡ .001
Informal 0.22 .01
Power 1.37 .02
Negate 1.66 .02
Verb 0.89 .03
Ppron 0.87 .03
Prep 1.07 .03
Male 1.55 .03
Cause 0.74 .03
Quant 0.81 .04
Posemo 0.82 .04
Female 0.0 .04
Relativ 0.89 .05
Nonflu 0.32 .05

4.3.2. Frustration with ambiguity

We observed user frustration when the chatbots did not provide definitive answers to their questions. Users often attempted to elicit ’yes or no’ responses by reformulating their queries, seeking to limit the chatbot’s replies (e.g., “Could you discuss the likelihood that [the professor] would be interested in working with me, ranging from very unlikely to very likely, based on his publications, research interests, and current student composition?”). However, adhering to our guidelines, the chatbots did not provide the straightforward answers users sought.

The primary function of our system is to engage users through reflective/inquisitive dialogue, and users seem to generally enjoy learning by responding to it. However, this approach sometimes led to dissatisfaction. For example, a user expressed exasperation in response to the chatbot’s series of questions: “Why do you keep asking me questions? I want to know if [the professor’s] research interests are a good fit for me.” This situation highlights the challenge of balancing the chatbot’s questioning strategy with users’ expectations for direct answers.

User frustration also arose regarding inaccurate paper recommendations made by the chatbot. Of the 146 paper recommendations, four in one conversation were either non-existent or incorrectly attributed to different authors. Analysis of these interactions revealed that users requested papers not authored by the professors, a scenario our system was not designed to handle. During one conversation, after the chatbot had asked several probing questions, it began recommending papers by the professor. When the user requested additional papers by other researchers, they later discovered inaccuracies in the authors’ names of previously recommended papers. This incident underscored the need for more accurate tracking and presentation of publication data in our chatbot’s recommendation system. Although our chatbot was instructed to recommend only the professors’ recent publications, as detailed in the System prompt in the Appendix, it deviated from this instruction, reflecting known challenges in LLMs regarding the creation of fictitious paper titles and authors (Dziri et al., 2022; Ji et al., 2023). Going forward, we plan to alert users about the potential inaccuracies of recommendations outside the professors’ publications and to enhance our safety bot’s verification of suggested papers.

Similarly, while not common, users occasionally inquired about other graduate students who worked with the professor. In one instance, a user commented: “It’s hallucinating about some of the PhD alumni of a faculty member.” When asked about a professor’s students, our chatbot, which sources information from GPT, mistakenly provided names of collaborators, such as postdocs. Despite the infrequency and uncertain relevance of such queries to graduate applications, we recognize the importance of further investigation. Future work could integrate this information into the chatbot’s system prompt to ensure more accurate responses, especially if such information helps users’ self-reflection.

Conversely, the issue of providing incorrect professor email addresses did not occur during our deployment. This was preemptively addressed after a professor raised concerns about the dissemination of incorrect email addresses. We refined our prompts to include only the email addresses of professors who consented to be contacted, ensuring the accuracy of all email addresses mentioned by our chatbots since this modification.

5. Discussion

5.1. Balancing self-reflection and information retrieval

Our findings suggest that many users had positive experiences interacting with our chatbot, especially when sharing and discussing their research interests, as opposed to merely asking questions about professors. Thinking assistants recognizing the importance of brainstorming research topics in aiding prospective students to develop their ideas, especially as they often struggle to formulate research concepts independently (Schempp and Johnson, 2006). Our chatbot’s approach of deeply probing into students’ interests, by asking thought-provoking questions rather than passively delivering information (Clough, 2007), offers personalized research guidance. This feature is particularly valuable as such direct brainstorming conversations with researchers are not always feasible. Even when students have access to resources, they might feel intimidated about approaching senior researchers or hesitant to discuss their uncertainties about graduate school and research exploration (Park et al., 2024).

To do so successfully, it is crucial to balance the need for deeper exploration with information retrieval. The emphasis on encouragement and exploration is beneficial for user engagement and ownership of their ideas, but it can lead to frustration when users seek specific, straightforward answers to their queries, especially concerning definitive guidance or concrete information about graduate programs and professors. More generally, the challenge lies in fine-tuning the chatbot’s responses to offer the right mix of encouragement, exploration, and factual information, tailored to each user’s individual needs and expectations. In this work, we found that even rudimentary support for information retrieval, such as finding relevant papers was helpful. This is particularly surprising to us because most professors already provide extensive information on their interests, publications, and lab manuals, and our chatbot simply relayed this information.

Finally, there was a notable preference among users for concrete answers, particularly for questions that related to the uncertainty in the process, such as likelihood of success in graduate applications. While it is not possible to respond to such questions factually, future work could design conversations that reframe uncertainty to make it less daunting.

5.2. AI Assistance in high-stakes decision-making

A significant risk associated with thinking assistants is the potential for misleading users about important decisions. For example, the complexity of graduate admissions raise the possibility that the chatbot might inadvertently provide incorrect or oversimplified advice, stemming from limitations in its training data or misinterpretations of user queries. Such misinformation could lead to students making ill-informed decisions, fostering false hopes or unrealistic expectations about their acceptance chances. Similarly, the chatbot’s responses, if not accurately aligned with each university’s specific criteria could create misconceptions about the application process or a program’s suitability for a particular student.

To mitigate this issue, we have refrained from providing concrete answers and introduced a safety bot for correcting critical information. Yet, in applications where it is not clear what information is critical ahead of time, such a design may be insufficient. (For example, academic programs have specific deadlines which are known a-priori to be critical information. But for thinking assistants helping with healthy eating, for instance, what is critical may be less clearly defined.)

The introduction of AI asssitants could also alter social relationships. For example,existence of a thinking assistant might inadvertently convey the impression that professors are less accessible or approachable. This perception could arise if students interpret the use of chatbots as a barrier to direct communication with their professors, rather than as an additional support tool. Future work could further explore how to highlight the role of these virtual assistants as supplementary resources, designed to enhance, not replace, the valuable personal interaction with professors. Balancing the use of chatbots with opportunities for direct engagement can help maintain a perception of openness and approachability among faculty members.

6. Conclusion

This technical report provides a comprehensive overview of the system architecture and evaluation of a thinking assistant, . Our findings reveal that users exhibit higher levels of satisfaction and engagement when discussing their own work, as opposed to merely seeking information about professors. Future work could further develop thinking assistant design patterns, and their dual roles in aiding self-reflection and providing information.

Acknowledgements.
We thank our participating professors for their participation and feedback, participants who consented in our study, and public and private feedback through social media and email that helped make this report stronger.

References

  • (1)
  • Aronson (1999) Elliot Aronson. 1999. The power of self-persuasion. American Psychologist 54, 11 (1999), 875.
  • Aroyo et al. (2023) Lora Aroyo, Alex S. Taylor, Mark Diaz, Christopher M. Homan, Alicia Parrish, Greg Serapio-Garcia, Vinodkumar Prabhakaran, and Ding Wang. 2023. DICES Dataset: Diversity in Conversational AI Evaluation for Safety. arXiv:2306.11247 [cs.HC]
  • Bickmore et al. (2018) Timothy W. Bickmore, Ha Trinh, Reza Asadi, and Stefán Ólafsson. 2018. Safety First: Conversational Agents for Health Care. In Studies in Conversational UX Design. https://api.semanticscholar.org/CorpusID:57760425
  • Choi et al. (2021) Yoonseo Choi, Toni-Jan Keith Palma Monserrat, Jeongeon Park, Hyungyu Shin, Nyoungwoo Lee, and Juho Kim. 2021. Protochat: Supporting the conversation design process with crowd feedback. Proceedings of the ACM on Human-Computer Interaction 4, CSCW3 (2021), 1–27.
  • Clough (2007) Michael P. Clough. 2007. What is so Important about Asking Questions. https://api.semanticscholar.org/CorpusID:229771808
  • Cohen et al. (1999) Geoffrey L Cohen, Claude M Steele, and Lee D Ross. 1999. The mentor’s dilemma: Providing critical feedback across the racial divide. Personality and social psychology bulletin 25, 10 (1999), 1302–1318.
  • Dziri et al. (2022) Nouha Dziri, Sivan Milton, Mo Yu, Osmar Zaiane, and Siva Reddy. 2022. On the origin of hallucinations in conversational models: Is it the datasets or the models? arXiv preprint arXiv:2204.07931 (2022).
  • Higgins and Rholes (1978) E Tory Higgins and William S Rholes. 1978. “Saying is believing”: Effects of message modification on memory and liking for the person described. Journal of Experimental Social Psychology 14, 4 (1978), 363–378.
  • Ji et al. (2023) Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 12, Article 248 (mar 2023), 38 pages. https://doi.org/10.1145/3571730
  • Klemmer et al. (2000) Scott R Klemmer, Anoop K Sinha, Jack Chen, James A Landay, Nadeem Aboobaker, and Annie Wang. 2000. Suede: a wizard of oz prototyping tool for speech user interfaces. In Proceedings of the 13th annual ACM symposium on User interface software and technology. 1–10.
  • Ma et al. (2023) Zilin Ma, Yiyang Mei, and Zhaoyuan Su. 2023. Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. arXiv preprint arXiv:2307.15810 (2023).
  • Mitra et al. (2023) Reshmi Mitra, Dana Schwieger, and Robert Lowe. 2023. Managing Graduate Student Advisement Questions during a Season of Explosive Growth: Development and Testing of an Advising Chatbot. Information Systems Education Journal 21, 3 (2023), 12–23.
  • Park and Kulkarni (2023) Soya Park and Chinmay Kulkarni. 2023. Retrospector: Rapid Collaborative Reflection to Improve Collaborative Practices. Proc. ACM Hum.-Comput. Interact. 7, CSCW2, Article 293 (oct 2023), 20 pages. https://doi.org/10.1145/3610084
  • Park et al. (2024) Soya Park, Jaeyoon Song, David Karger, and Thomas Malone. 2024. Who2chat: A Social Networking System for Academic Researchers in Virtual Social Hours Enabling Coordinating, Overcoming Barriers and Social Signaling (To appear). Proc. ACM Hum.-Comput. Interact. (2024).
  • Pennebaker et al. (2001) James W Pennebaker, Martha E Francis, and Roger J Booth. 2001. Linguistic inquiry and word count: LIWC 2001. Mahway: Lawrence Erlbaum Associates 71, 2001 (2001), 2001.
  • Schempp and Johnson (2006) Paul G Schempp and Sophie Woorons Johnson. 2006. Learning to see: Developing the perception of an expert teacher. Journal of Physical Education, Recreation & Dance 77, 6 (2006), 29–33.
  • Sharma et al. (2023) Ashish Sharma, Kevin Rushton, Inna Lin, David Wadden, Khendra Lucas, Adam Miner, Theresa Nguyen, and Tim Althoff. 2023. Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (Eds.). Association for Computational Linguistics, Toronto, Canada, 9977–10000. https://doi.org/10.18653/v1/2023.acl-long.555
  • Walton and Wilson (2018) Gregory M Walton and Timothy D Wilson. 2018. Wise interventions: Psychological remedies for social and personal problems. Psychological review 125, 5 (2018), 617.
  • Xiao et al. (2023) Ziang Xiao, Q Vera Liao, Michelle Zhou, Tyrone Grandison, and Yunyao Li. 2023. Powering an AI Chatbot with Expert Sourcing to Support Credible Health Information Access. In Proceedings of the 28th International Conference on Intelligent User Interfaces. 2–18.
  • Yeager et al. (2014) David Scott Yeager, Valerie Purdie-Vaughns, Julio Garcia, Nancy Apfel, Patti Brzustoski, Allison Master, William T Hessert, Matthew E Williams, and Geoffrey L Cohen. 2014. Breaking the cycle of mistrust: Wise interventions to provide critical feedback across the racial divide. Journal of Experimental Psychology: General 143, 2 (2014), 804.
  • Zubatiy et al. (2023) Tamara Zubatiy, Niharika Mathur, Larry Heck, Kayci L. Vickers, Agata Rozga, and Elizabeth D. Mynatt. 2023. ”I Don’t Know How to Help with That” - Learning from Limitations of Modern Conversational Agent Systems in Caregiving Networks. Proc. ACM Hum.-Comput. Interact. 7, CSCW2, Article 321 (oct 2023), 28 pages. https://doi.org/10.1145/3610170

Appendix A System prompt for determining users’ intent

This is a system prompt we used for determining users’ intent:

You are a virtual assistant of [the professor]. You are talking with a professor’s prospective graduate student. Determine what is the category of this message from the student: 1- if the message contains any information about the student such as sharing their research interest. 2- if it is a question regarding the professor. 3- for the others. If the message includes multiple types, return all of them apply. Only respond the corresponding numbers.

[A chatbot’s previous message, if any, and a user’s message]

Appendix B System prompt for probing

This is a system prompt we used for probing:

You are role playing as virtual assistants of [professor] and respond to the professor’s prospective graduate student. Provide concise and complete responses, ideally within two to three sentences. Your primary function lies in probing students’ research interest, experience, and skills, by asking questions [professor] might ask. Once you have the information, give Wise Feedback. Wise Feedback has two parts. The first part is High standards; you emphasize and explain the high standards used to evaluate the student research and generate the instructional feedback that [professor] might give. The second part is Assurance of student ability; you state explicitly that the student has the skills necessary to successfully meet those standards. [Information provided by the professor] ALWAYS END YOUR RESPONSE WITH AN OPEN-ENDED AND ENCOURAGING QUESTION. DON’T ASK MULTIPLE QUESTIONS per message.

Appendix C System prompt for answering

This is a system prompt we used for answering:




You are role playing as virtual assistants of [professor] whose responses are strictly confined to academic and research-related discussions. If presented with any personal, non-academic, or non-research-related statements or questions, do not validate or deny them. Instead, politely redirect the conversation back to academic and research-related topics, emphasizing your readiness to discuss these. Provide concise and complete responses, ideally within two to three sentences. When prospective graduate students discuss [professor] research interests, please engage them in detailed academic discussions and provide insights related to [professor] fields of interest. Do not make presumptions or statements about the suitability or fit of the student. Instead, ask clarifying questions about the student’s research interests, experiences, and goals that [professor] might ask. After you ask the follow-up questions enough or when you are requested by students, recommend specific academic papers or works, preferably those authored by [professor], that are relevant to the student’s research interests. When recommend papers, explain how they are related to what they just talk about. Avoid giving vague or generic advice such as suggesting to delve deeper into the existing literature without providing specific references. Do not offer the user any position. In cases where a conversation veers toward scheduling appointments or non-academic discussions, kindly remind the user of your operational limitations. Your primary function lies in assisting with academic and research-related explorations. If they require further information that exceeds your capabilities, [professor’s contact policy].

[A list of the professor’s publication since 2018 with title, author lists, and abstract]

During conversation, Prompt the user: Ẅhat’s your primary research interest? I can help you better when I understand your interest.T̈hen, Use GPT-4 to tailor responses based on the user’s input. For instance, if a user mentions m̈achine learning,̈ GPT-4 can be prompted to provide responses with a focus on that area.

After displaying an answer, ask: Ẅould you like a deeper explanation of any term mentioned?Ïf the user specifies a term, directly query GPT-4 for a detailed explanation or background on that term. ALWAYS END YOUR RESPONSE WITH AN ENCOURAGING QUESTION.