User Perceptions of Smart Replies
Despite a growing body of research on the design and use of chatbots, little is known about how smart agents are perceived and interacted with when they mediate communication between humans. In this generative research, we investigated user perceptions and potential effects of smart replies.
Methodology
We conducted in-lab experiments (N=72) and followed up with semi-structured interviews (N=5).
• Participants (N=72, 36 dyads) carried out one of three communication tasks in our lab, using either a traditional messaging app or a messaging app featuring smart replies. By varying the tasks, we hoped to uncover insights about communication across different contexts. Conversations were recorded for analysis.
• Five randomly selected participants from the smart reply group were asked to return for interviews. While viewing a screen recording of their conversation, they were asked to “think aloud”, elaborate on their thoughts in retrospect, and comment on the app itself and the smart replies. They were then asked some additional questions.
• I then transcribed the conversations, analyzed them linguistically, and thematically analyzed the interview data.
Main Findings
Smart replies were overwhelmingly positive.
• During an informal review, I noticed the overwhelmingly positive sentiment of the smart replies.
• I then formally classified the sentiment of all smart replies using Mechanical Turk (5 workers per response; 1,012 unique smart replies) and found that 43.8% of all smart replies were classified as having a positive sentiment, while only 3.95% were classified as having a negative sentiment.
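As an illustration of how crowd-sourced labels like these can be aggregated, the sketch below takes five worker votes per reply and reports the share of replies in each sentiment class. The data format, label names, and majority-vote rule are assumptions for illustration, not the actual analysis pipeline.

```python
# Minimal sketch (not the actual analysis script): aggregate five MTurk
# sentiment labels per smart reply by majority vote, then compute the
# share of replies falling into each sentiment class.
from collections import Counter

def majority_label(votes):
    # Most common label among the worker votes (ties broken arbitrarily).
    return Counter(votes).most_common(1)[0][0]

def sentiment_breakdown(worker_labels):
    # worker_labels maps each unique smart reply to its list of worker votes.
    final = [majority_label(votes) for votes in worker_labels.values()]
    counts = Counter(final)
    return {label: counts[label] / len(final)
            for label in ("positive", "neutral", "negative")}

# Made-up example labels:
labels = {
    "Sounds great!":     ["positive"] * 5,
    "I don't think so.": ["negative", "negative", "neutral", "negative", "negative"],
    "Ok":                ["neutral", "neutral", "positive", "neutral", "neutral"],
}
print(sentiment_breakdown(labels))
```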
Smart replies were seldom used and often did not make sense contextually.
• All conversations and smart replies were analyzed using Linguistic Inquiry and Word Count (LIWC), a dictionary-based text analysis tool that determines the percentage of words reflecting a range of linguistic processes, psychological processes, and personal concerns. A minimal sketch of this style of dictionary-based analysis appears after this list.
• The smart replies were hardly ever used. On average, a smart reply was chosen 6.24% of the time.
• The smart replies often did not reflect what participants actually wanted to express. Examples of this can be seen in the figure below, where users wanted to disagree or suggest an opposing viewpoint, but the smart replies only offered expressions of agreement and positive reinforcement.
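To make the LIWC step concrete, here is a minimal sketch of dictionary-based category counting in that style. The categories and word lists are invented for illustration; the real LIWC dictionaries are proprietary and far larger.

```python
import re

# Illustrative dictionary-based analysis in the style of LIWC.
# The categories and word lists below are made up for this sketch;
# they are not the actual LIWC dictionaries.
CATEGORIES = {
    "positive_emotion": {"nice", "great", "love", "agree", "happy"},
    "negative_emotion": {"bad", "hate", "disagree", "awful", "wrong"},
}

def category_percentages(text):
    # For each category, the percentage of words in `text` that match it.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    return {name: 100 * sum(w in wordset for w in words) / len(words)
            for name, wordset in CATEGORIES.items()}

print(category_percentages("That sounds great, I totally agree!"))
# {'positive_emotion': 33.33..., 'negative_emotion': 0.0}
```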
“Since the task was pretty specific, maybe not all suggestions were exactly useful, but there were some that were nice. In terms of general contact between a friend, I think that it would be helpful in terms of, ‘Hey how are you doing’, and stuff like that.”
- Participant 16

Participants suspected that smart replies might be influencing their conversations.
The AI “. . . would kind of guide me. It was what I was already thinking in my head, but then like okay, that’s also an appropriate thing to say.”
- Participant 8
“Let’s say if the other person asked for specific rank, and the prompts are all positive, and then I just agreed with it, I just went with the positive one just for the sake of completing the task. But if I hadn’t seen the prompts, maybe I would have opposed that question.”
- Participant 28
Recommendations
Each time responses are suggested, users should be presented with a mix of positive and negative options.
• Smart replies are currently overwhelmingly positive.
• Existing smart reply algorithms already acquire data from conversations, so the sentiment of the suggestions could be made to quantitatively correspond to the negative and positive makeup of conversations in the database.
• The reply-generating beam search could be amended so that it is biased towards paths leading to responses that reflect this sentiment makeup; one possible approach is sketched below.
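As a rough illustration of what such biasing could look like, the sketch below reranks finished candidate replies rather than modifying the search itself, which approximates the same effect: each candidate's language-model score receives a bonus when its sentiment is still under-represented relative to a target mix. The target distribution, scores, and sentiment labels are hypothetical placeholders, not the internals of any real smart reply system.

```python
# Hypothetical sketch: selecting smart reply suggestions so that the set
# better reflects a target sentiment mix (e.g. drawn from corpus statistics).
from collections import Counter

TARGET_MIX = {"positive": 0.45, "neutral": 0.50, "negative": 0.05}

def select_replies(candidates, k=3, weight=2.0):
    # candidates: list of (reply_text, lm_log_prob, sentiment_label).
    # Greedily pick k replies, boosting sentiments still under-represented
    # relative to TARGET_MIX.
    chosen, counts = [], Counter()
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        def score(cand):
            text, lm_log_prob, sentiment = cand
            total = sum(counts.values()) or 1
            gap = TARGET_MIX.get(sentiment, 0.0) - counts[sentiment] / total
            return lm_log_prob + weight * max(0.0, gap)
        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best[0])
        counts[best[2]] += 1
    return chosen

candidates = [
    ("Sounds great!", -1.0, "positive"),
    ("Sure, see you then.", -1.2, "neutral"),
    ("I'd rather not.", -2.5, "negative"),
    ("Love it!", -1.1, "positive"),
]
print(select_replies(candidates))
# ['Sounds great!', 'Sure, see you then.', 'Love it!']
```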