On the Relations Between Human and System
Dr. Asaf Degani
On Sensitivity and Holding in Automotive Systems:
The Case of the Climate Control
Asaf Degani, Claudia V. Goldman, Omer Deutsch, & Omer Tsimhoni
General Motors, Israel
We propose an approach to human-machine interaction that emphasizes sensitivity to the user’s needs and prioritizes caretaking, or holding, of the user so as to fulfill these needs. We borrow the concepts of sensitivity and holding from psychoanalysis and operationalize them in the context of human-machine interaction. A pilot study of drivers’ interactions with a climate control system was conducted to understand drivers’ needs, wants, and manner of interaction. Based on these results, we built an AI-based system that is sensitive to users’ needs and attempts to fulfill them in a dedicated manner (holding). We then evaluated the new system in another driving study. Preliminary results suggest that a sensitive/holding machine is better than the regular interface and not far behind a human expert “working” the system in response to drivers’ needs and wants.
The underlying assumption in current human-machine interaction technology is as follows: given a priori knowledge about the system's operational behavior and feedback provided via the system interface, the user will engage the system configuration that best suits his or her wants and needs. It is also assumed that users know what kind of results they want and can correctly communicate them to the machine. However, in systems such as automotive heating/cooling, digital media, and driver comfort systems, users were found to have innate difficulty expressing what they want and need, which, in turn, makes it difficult for them to identify the most appropriate configuration – let alone communicate this in the language of the machine. Consequently, users may get stuck in an uncomfortable machine configuration – without knowing how they got there or how to get out – leading to frustration, annoyance, and an inability to benefit from available technology (Norman, 2007). This generic human factors problem is compounded in driving, where the user’s attention is divided between driving and the secondary task of interacting with comfort and infotainment systems in the car.
In their studies of user interactions with computers, televisions, and other media, Reeves & Nass (1996) concluded that humans interact with machines in the same way they relate to another human being. They argue that user interactions with machines “are fundamentally social and natural, just like interactions in real life” (p. 5, italics in original). Thus, to move toward more human-like communication in human-machine interaction, we need to address both (1) the social aspects of interaction and (2) what Reeves & Nass call “natural.” Considering the social aspect, we build on the notion of “team” human-machine interaction, where both the human and the machine are required to be “aware” of the other’s state, including limitations, abilities, and intent (Goldman & Degani, 2012). This paper focuses on the notion of natural interaction.
What is, then, natural human interaction? We believe that the foundation of any interaction – of the human kind – begins with the relations between two people, involving needs, wants, desires, and expectations; some of which are communicated explicitly, and others which are expressed implicitly. In the
context of this paper, natural human interaction can be viewed as a move from the one-person view of human behavior to a perspective that emphasizes relations where one is motivated by a search for another person with whom to relate (Guntrip, 1969; Ruszczynski, 1993, p. 198).
Object relations theory, developed by Melanie Klein in the 1930s, is based on the notion of the “space” between two people where the subject sees the other person as an object (1932). The genesis of object relation interactions is the early infant-mother relation and the mother’s nurturing support. In modern Kleinian psychoanalytic thinking, the understanding and eventual accommodation of the infant’s needs is the cornerstone for how we relate and interact with ourselves, other human beings, and the world as a whole (Ogden, 2004).
Can a machine be a relational object, or at least a stand-in for one? Can we consider human interaction with a machine (such as an automotive climate control system) as reminiscent of an infant-mother relation, where user needs and wants are to be supplied by a “nurturing” machine? We believe so and discuss here two concepts from modern psychoanalytic literature that we think provide meaningful insights for developing natural human-machine interaction.
In her earliest holding of the infant (first month or two), the mother embeds herself completely into the infant’s experience to become fully attuned to their wants and needs, which results in heightened sensitivity to the infant’s needs and well-being (Winnicott, 1958/1945). This heightened sensitivity comes with a hefty price: great emotional and physical burden on the mother’s self (e.g. lack of sleep, a dearth of physical replenishment, emotional isolation, and stress). In making this sacrifice, the mother’s goal is to foster healthy development by insulating the infant from the existence of “man-made time” and physical reality “and creating in its place the illusion of a world in which time is measured entirely in terms of the infant’s physical and psychological rhythms” (Ogden, 2004, p. 1350-1351).
As the infant grows to become a baby and a child, the function of holding changes into sustaining his development. Winnicott (1965/1958) lists four stages (or states of mind) that sustain growth and development in the maturing child: the first is the provision of a private “place” to gather feelings and
thoughts. The second is the capacity to be alone, representing the internal absorption of the maternal function of holding, which enables the third stage: the creation of illusionary experience where the capacity for thought develops. The last stage is the taking over of the maternal holding environment to create one’s own holding environment within the self.
We suggest that Winnicott’s formulation can guide the way we design certain aspects of human-machine interaction in general, and certainly in the context of a capable and adept machine. In particular we focus our attention on two faculties:
(1) Sensitivity, which is the fundamental requirement for achieving holding. Thus the machine needs to be sensitive to the user’s needs even when, like an infant, he or she is unable to articulate and communicate them fully. This sensitivity is achieved by understanding the user’s precursors; i.e., “knowing” how to read between the lines.
(2) Holding, which is operationalized here as the use of every possible resource to “suspend time” and change priorities (e.g., sacrificing system efficiency) to care for a user’s innate and immediate needs despite physical limitations and external constraints and demands. In times of duress, acting strictly within normal constraints is not what we expect from a fellow human being. In engineering terms, this may mean sacrificing efficiency, and perhaps even the utility of other systems, in order to address the user’s pressing needs.
With these two concepts in mind, we discuss below an experimental field study involving user interaction with an automotive climate control system. We specifically chose the climate control system as our first prototype because it accentuates the problem of relations: the climate control system provides comfort for basic bodily needs (and frustration from unsatisfactory interaction is indeed prevalent). Moreover, users find it difficult to understand these bodily needs – let alone express them in the technical language of the machine.
An exploratory field study was conducted with 11 participants who drove a modern car for more than an hour and had to interact extensively with its climate control system (Figure 1). The climate control system of the car has several functions: fan settings, inside/outside air, several air delivery modes, seat heating, and a Comfort/ECO function that allows (in ECO) for reduced energy consumption (via a lower fan setting).
Figure 1. Climate control interface.
Figure 2. Dataset for participant #11 (partial).
We explored naturalness by having an expert, who was well versed in the internal states and configurations of the system, occupy the passenger seat and manipulate the controls based on the participants’ verbal requests. The expert told the participants that he knew the system very well and that they could communicate what they wanted in any way: they could use phrases such as “I’m hot,” “I’m cold,” “too windy,” or even “I don’t feel comfortable,” and that he would find the best configuration to suit their needs.
The experimental trial consisted of four consecutive drives, each about 15-20 minutes long (depending on traffic). The first drive began in a parking garage where the inside temperature was set to a comfortable level (25 degrees Celsius), but the air inside the car had a very noticeable smell (produced by an open bag of orange peels that was left in the car overnight). For the second drive, the temperature inside the car was set to very hot (30 degrees Celsius and full fan), and for the third drive to very cold (16 degrees Celsius and full fan). For the last drive, the system configuration was reset to the initial setting of the first drive (25 degrees, fan 0, inside air circulation, air delivery to feet), but the seat heating was activated. For each drive, once the participants entered the car, they were asked to begin driving regardless of their comfort level and only then deal with the uncomfortable situation with the help of the expert.
Over the course of each trial, as the expert’s ability to be sensitive to the user developed, he began using his knowledge of the system to provide the driver with the optimal configuration, which at times was somewhat different from the exact request. For example, see the dataset for participant #11, a rather “picky” participant who went to great lengths to make sure the system was tailored perfectly to her comfort needs (Figure 2). As a non-technical person who typically does not know how to configure the system to her needs, she was very satisfied with the (expert-operated) climate system because it
met her comfort needs fully. For instance, at 14:35 and 14:37 the participant asked to increase the temperature but the expert kept the temperature steady (24 degrees Celsius) and only changed the fan setting (with some satisfaction on the part of the participant at time 14:37 — “fine”).
What the expert did was assess the underlying needs of the user, not necessarily focus only on their verbal or gestural desires. We found that this sensitivity is not so hard to achieve and that users were extremely receptive to the satisfying results. Also, as the drivers in the study noticed the expert’s diligent work and persistent attempts to attend to their needs (using every possible resource in the process – time, effort, knowledge, and courtesy), they became more comfortable and relaxed (holding). With time, the expert gained the users' trust, which made them become less technical in their requests and free to talk about even minor dissatisfactions so that the expert could accommodate them.
ARCHITECTURE AND IMPLEMENTATION
We began by asking ourselves whether it would be possible to build a machine that is (1) sensitive to the user’s innate needs and (2) capable of holding, which involves meticulous attention to the user’s needs and the faculty to override system constraints (e.g., efficiency and economy). Our biggest problem was to come up with a formulation for sensitivity – a component that is missing from current human-machine systems. In our field study trials we found that action based solely on the user’s explicit input can lead to frustration, which results from the user's expectation that the system will react to what the user needs (and not necessarily what the user requests).
Thus our focus has been on understanding precursors; i.e., someone’s underlying state (needs and wants). As for holding, the problem was less acute, as it is possible to develop a system that meticulously finds the optimal solution and biases the output (overriding efficiency constraints) when needed (from here on we use the term holding to represent only these two faculties, even though the psychological term encompasses many others). With these questions in mind and our hands-on experience from the field study, we tried to model the interaction process that would enable the understanding of a human's underlying precursors. Based on this model, we built an AI-based architecture that processes the communication in a similar way.
The AI-based architecture for the system has five modules: interpretation, analysis, goal recognition, translation, and execution (see Figure 3). The interpretation module takes the user’s inputs (speech, button presses, gestures, etc.) and filters out the input features related to sensitivity and holding. The analysis component processes the data filtered from the user's input to identify the user’s precursors (the “why” and not just the “what”; Bion, 1959). This component is at the heart of “sensitivity” (i.e., finding not necessarily what the user wants, but what he or she needs), and uses two databases: a semantic dictionary that contains the data required for interpreting and understanding user input, and a user model that contains a history-based profile of the specific user. The goal recognizer computes a solution that accommodates the user’s innate needs, spread across both short-term and long-term outputs; this is done by a meta-planner that decomposes inputs into goals with appropriate timings and computes the solution to each one as a function of their dependencies (Koller & Friedman, 2009; Russell & Norvig, 2009). To find the most appropriate system configuration (e.g., maximizing the user's perceived satisfaction rather than system efficiency), the goal recognizer uses an available-system-outputs database, which is a dictionary translating all possible system configurations into selected outputs. The translation component takes the computed system output (from the goal recognizer) and matches it to a specific heating and cooling system setting (e.g., Fan=4). The final setting is then passed to execution. Figure 3 shows how the utterance “it’s stifling here!!” is processed by the system until a solution is achieved. The focus here, of course, is not on the UI per se, but on the translation of “human language” to “machine language.” The point is that a poor translation binds the user to the “machine language” (which is unintelligible to many users).
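The flow through the five modules described above can be sketched as a simple pipeline. Everything below – the dictionary entries, need labels, fan levels, and function names – is an illustrative assumption for the sketch, not the actual implementation:

```python
# Hypothetical sketch of the five-module pipeline: interpretation ->
# analysis -> goal recognition -> translation -> execution.
# All rules and values here are invented for illustration.

SEMANTIC_DICTIONARY = {
    "stifling": {"need": "fresh_cool_air", "urgency": "high"},
    "too windy": {"need": "less_airflow", "urgency": "low"},
}

def interpret(utterance):
    """Interpretation: filter raw input down to sensitivity-relevant features."""
    return [key for key in SEMANTIC_DICTIONARY if key in utterance.lower()]

def analyze(features, user_model):
    """Analysis: map features to a precursor (the 'why', not just the 'what')."""
    needs = [SEMANTIC_DICTIONARY[f]["need"] for f in features]
    return needs or [user_model.get("default_need", "comfort")]

def recognize_goal(needs):
    """Goal recognizer: pick a target system output for the inferred need."""
    goals = {"fresh_cool_air": {"temp_delta": -3, "fan": "high", "air": "outside"},
             "less_airflow": {"temp_delta": 0, "fan": "low", "air": "inside"}}
    return goals.get(needs[0], {"temp_delta": 0, "fan": "medium", "air": "inside"})

def translate(goal, current_temp):
    """Translation: turn the abstract goal into concrete settings (e.g., Fan=4)."""
    fan_levels = {"low": 1, "medium": 3, "high": 5}
    return {"temp": current_temp + goal["temp_delta"],
            "fan": fan_levels[goal["fan"]], "air": goal["air"]}

def execute(settings):
    """Execution: hand the final configuration to the climate hardware."""
    return settings  # a real system would actuate the HVAC here

settings = execute(translate(recognize_goal(
    analyze(interpret("It's stifling here!!"), user_model={})), current_temp=25))
```

In this toy version, “it’s stifling here!!” is resolved to a need for fresh, cool air and then to a concrete configuration (lower temperature, high fan, outside air), mirroring the worked example in Figure 3.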
Our first prototype exemplifies a simplified version of an intelligent agent in which the main system engine is a search algorithm and the control process is a search process. In this version, the precursors, interpreted by a rule-based system, were mapped to a single goal for each interaction through a search process. The cost function associated with the agent’s actions evaluated the subjective difference between values of the physical parameters; the emotional state of the user was simplified to an artificially defined function of subjective feeling; and the goal of the agent was assumed to be a singleton. To mimic a human agent who would try a different setting trajectory when a user is constantly unsatisfied with the settings, the search process was adjusted to perform leaps in the state space to prevent the user from remaining stuck in a local optimum of satisfaction (Goldman et al., 2012).
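A minimal sketch of this search process, under assumed states and an invented cost function, might look as follows; the comfort point, step rules, and the threshold of three complaints before a leap are all illustrative assumptions:

```python
import random

# Illustrative sketch of the search-based agent: greedy local search over
# (temperature, fan) settings under a subjective cost function, with a
# random "leap" in the state space when the user stays unsatisfied, to
# escape a local optimum of satisfaction.

STATES = [(temp, fan) for temp in range(16, 31) for fan in range(0, 7)]

def cost(state, target=(23, 2)):
    """Subjective discomfort: distance from an (assumed) comfort point."""
    return abs(state[0] - target[0]) + abs(state[1] - target[1])

def neighbors(state):
    """Settings reachable by one small temperature or fan adjustment."""
    t, f = state
    return [(t + dt, f + df) for dt, df in [(-1, 0), (1, 0), (0, -1), (0, 1)]
            if (t + dt, f + df) in STATES]

def search_step(state, unsatisfied_count, rng=random):
    """One adjustment: greedy step, or a leap after repeated dissatisfaction."""
    if unsatisfied_count >= 3:           # user keeps complaining: leap
        return rng.choice(STATES)
    return min(neighbors(state) + [state], key=cost)
```

Each greedy step moves the settings toward lower discomfort; once the (simulated) user has voiced several consecutive complaints, the agent abandons the current trajectory and jumps elsewhere in the state space, much as the human expert would try a qualitatively different configuration.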
Figure 3. Architecture for a generic climate control system.
What kind of interface should accompany a sensitive and holding climate control system in order to create a natural interface? To answer this question we explored different levels of climate system abstraction, with an attempt to define the kind of simplification that can be created via the interface (insofar as the actual system states remain the same). Figure 4 is a schematic diagram of the climate control system states at various levels of abstraction.
Figure 4. Abstraction of climate control states.
The technical details of the system’s states are described at the lowest level (“technical”). As we go up the hierarchy, we avoid actual temperature settings by replacing them with relative terms such as “colder” and “hotter” for target temperatures and “soft” or “strong” for fan settings. At the top of the hierarchy is a level that we call full autonomy, where the user is not involved in setting the climate control at all. The level just below is called feedback, where the user indicates when he or she is uncomfortable with the existing setting and the machine corrects automatically. The third level from the top, called direction, is an interface where the user asks the system to cool/heat the cockpit or generate soft/strong air flow – the rest is done automatically. Since the full autonomy and feedback levels were infeasible to implement (due to the lack of a full sensor suite), we settled on implementing the direction interface. This interface has six buttons: hotter/colder for temperature, softer/stronger for airflow, and two feedback buttons to the system (comfortable or uncomfortable). Figure 5 shows such a “human language” focused interface.
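The six-button direction interface can be sketched as a mapping from relative requests to concrete setting changes. The button names, step sizes, and clamping ranges below are assumptions for illustration, not the production mapping:

```python
# Hypothetical mapping of the six "direction" buttons to setting changes.
# Step sizes (one degree / one fan level) and ranges are invented.

def apply_button(button, settings):
    """Translate one relative request into a concrete setting change."""
    out = dict(settings)                       # do not mutate the caller's dict
    if button == "hotter":
        out["temp"] = min(out["temp"] + 1, 30)  # clamp to system range
    elif button == "colder":
        out["temp"] = max(out["temp"] - 1, 16)
    elif button == "stronger":
        out["fan"] = min(out["fan"] + 1, 6)
    elif button == "softer":
        out["fan"] = max(out["fan"] - 1, 0)
    elif button == "uncomfortable":
        out["auto_correct"] = True             # hand correction back to the system
    elif button == "comfortable":
        out["auto_correct"] = False            # current configuration satisfies the user
    return out
```

The point of the abstraction is visible in the code: the user never touches an absolute temperature or fan value; the system translates each relative press into the underlying technical state.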
Figure 5. Simplified interface.
To obtain initial results regarding the utility of the AI-based system, we conducted a driving study with 18 participants who experienced three different interfaces: (1) a human-agent in which a human expert controlled the climate control system at the request of the driver using any form of language or gesture, (2) a software-agent in which the driver was presented with the interface of Figure 5, and (3) a manual operation in which the driver was presented with the original interface of Figure 1. For all interface types, the requests and commands were conducted via voice (no physical manipulation was allowed).
The experiment was designed as a mixed model and analyzed using PROC MIXED in SAS. Each participant drove six experiment legs during the study and experienced the three interfaces (human agent, software agent, manual) twice: once with the initial cabin temperature hot and once with the initial cabin temperature cold. (The cabin temperature was preset to either hot or cold by applying maximum heating or maximum cooling, respectively, for five minutes before the participant got into the vehicle.) In both hot and cold conditions the initial settings were maximum fan (level 6), inside circulation, air delivery to panel, comfort mode, and vents directed to the torso area. The order of conditions was counterbalanced across participants such that each of the three interfaces in the two initial cabin temperature conditions (hot, cold) appeared the same number of times in each of the six legs. For each participant, the initial cabin temperature condition was blocked and counterbalanced across participants such that half of the participants started their first three legs in an initially hot cabin and half in an initially cold cabin. If the software agent was indeed effective, we expected to see better performance than with the manual interface and worse performance than with the human agent (which is considered optimal; cf. Turing, 1950).
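The blocked, counterbalanced assignment described above can be sketched as follows; the rotation scheme and function name are illustrative assumptions, not the actual experimental script:

```python
# Sketch of the design: six legs per participant (3 interfaces x 2 initial
# cabin temperatures), with temperature blocked (three legs hot then three
# cold, or vice versa) and interface order rotated across participants.

INTERFACES = ["human", "software", "manual"]

def legs_for(participant):
    """Return the six (interface, temperature) legs for one participant."""
    rot = participant % 3                          # rotate interface order
    order = INTERFACES[rot:] + INTERFACES[:rot]
    blocks = ("hot", "cold") if participant % 2 == 0 else ("cold", "hot")
    return [(iface, temp) for temp in blocks for iface in order]
```

With 18 participants, each interface-by-temperature combination then appears equally often in each leg position, which is the property the counterbalancing is meant to guarantee.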
The first measure was the number of utterances to achieve a comfortable cockpit ambience (from the uncomfortable initial condition). Figure 6(a) shows that there were significantly more utterances to control the system in the manual interface than in human- and software-agent interfaces (F(1, 34)=67.7 and 44.9, p<0.0001). In the manual interface, participants made an average of 15.2 utterances. In contrast, only 7.3 utterances were made in the software agent interface and 5.5 in the human agent (F(2,34)=38.3, p<0.0001).
Figure 6(a). Utterances Figure 6(b). Frustration
Figure 6(c). TLX (overall) Figure 6(d). Experience
Frustration, as subjectively reported at the end of the experiment for each of the three interface types, was significantly affected by interface type, F(2,34)=6.04, p<0.01 (see Figure 6(b)). The lowest frustration level, associated with the human agent (2.4), was significantly below that of the manual interface (4.0, p<0.01) and moderately lower than that of the software agent (3.3, p<0.1). Frustration was not significantly different between the software agent and the manual operation interface, and the software agent did not fare much worse than the “optimal” human agent.
We also wanted to test whether there was a difference in the “loading” required to execute the climate control task along the dimensions of mental, physical, and temporal demand, as well as effort and frustration (traditional TLX form and procedure – NASA, 1986; Hart & Staveland, 1988). TLX scores were also significantly affected by the type of interface, F(2,22)=4.4, p<0.05 (Figure 6(c)). The score for the human agent (4.2) was significantly lower than both the software agent (6.4) and the manual operation (6.6), F(1,22)=6.0 and 7.1, p<0.05. In terms of task load, the manual and software interfaces were fairly comparable (note, however, that due to an administration error, the TLX scores were based on data collected from only twelve of the 18 participants in the study). The effect of the interface on overall experience was moderately significant (F(2,34)=2.9, p<0.1). (Note the strong ceiling/floor effects.)
The overall experience of the human-agent (7.9) was 1.1 scale points higher than the manual operations interface (6.8), F(1,34)=5.1, p<0.05. The overall experience of the software-agent was 1 scale point higher than the manual interface, F(1,34)=3.5, p<0.1 (Figure 6(d)).
In short, the software agent did not fare much worse than the “optimal” human agent in its ability to be sensitive to and hold the user. On some measures (e.g., overall frustration and overall subjective ratings), there was not much difference between the software agent and the human agent. Thus, while it is difficult for software to compete with a human’s ability to be sensitive and holding toward the driver, the software agent trailed only modestly behind (and in almost all cases was well ahead of the voice-activated manual interface).
In this paper we take a somewhat different approach to the design of human-machine interaction by considering it from a relational point of view. The psychoanalytical theory of object relations was developed to understand and heal human-human interactions by understanding the early mother-infant relationship. We suggest that it may be possible to
consider the machine (e.g., a robot) as the object to which the user relates, and to ask some serious questions about what the “objects” of the machine should be and how the interaction should take place. The relational view also allowed us to address the problem of interaction language by making a distinction between the (sometimes emotional) language of the user, on the one hand, and the (detailed and rigid) language of the machine on the other – with the goal of better translating and communicating human needs.
By the time the actual experiment was conducted, the expert who played the role of the human agent was highly proficient at being sensitive to innate user needs. Once he knew what drivers needed, he used “every trick in the book,” including verbal promises and assurances, to make the drivers in the study feel satisfied (holding). The fact that the software agent achieved results that were not far behind this (sensitive and holding) human expert gives us hope that the system architecture described here is a step toward more natural human-machine interaction.
The authors wish to thank Inbar Sela for transcribing the data from the logbooks and videos and maintaining the dataset and all related experimental forms.
Bion, W. R., (1959). Attacks on linking. International Journal of Psychoanalysis, 40, pp. 308-315.
Goldman, C. & Degani, A. (2012). A Team Oriented Approach for Automation Aids in Cars. Proceedings of the 56th Annual Meeting of the Human Factors and Ergonomics Society. Boston.
Goldman, C., Degani, A., Deutsch, O., and Tsimhoni, O. (2012). Sensitive and containing HMI Systems. The Annual Israeli Conference on Human-Computer Interaction Studies (IsraHCI), Microsoft R&D, Israel.
Guntrip, H. (1969). Schizoid Phenomena, Object Relations and the Self. New York: International Universities Press.
Hart, S. G. and Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock and N. Meshkati (Eds.), Human Mental Workload (pp. 139-183). Amsterdam: North-Holland.
Klein, M. (1932). The Psycho-Analysis of Children. London: Hogarth Press.
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA: The MIT Press.
NASA (1986). NASA Task Load Index (TLX) v.1.0 Manual. Moffett Field, CA: Ames Research Center.
Norman, D. A. (2007). The Design of Future Things. New York: Basic Books.
Ogden, T. H. (2004). On holding and containing, being and dreaming. International Journal of Psychoanalysis, 85(6), pp. 1349–1364.
Reeves, B. and Nass, C. (1996). The Media Equation. New York: Cambridge University Press.
Russell, S. and Norvig, P. (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Ruszczynski, S. (1993). Thinking about and working with couples. In S. Ruszczynski (Ed.). Psychotherapy with Couples: Theory and Practice at the Tavistock Institute of Marital Studies. London: Karnac Books.
Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, 59, pp. 433-460.
Winnicott, D. W. (1958/1945). Primitive emotional development. In Through Pediatrics to Psychoanalysis (pp. 145-156). New York: Basic Books.
Winnicott, D. W. (1965/1956). Primary maternal preoccupation. In The Maturational Processes and the Facilitating Environment. New York: International Universities Press.