Multimodal deixis in media discourse: Film vs TV interview narratives

This research was supported by the Ministry for Education in Russia, project no. 075–03-2020–013/3 “Multimodal analysis of communicative behavior in different types of spoken discourse” and was carried out at the Centre for Socio-Cognitive Discourse Studies at Moscow State Linguistic University.

Problem statement

In this study, we regard media as a community formed by a set of institutions, cultural practices, industries and ways in which people interact with society [Masterman 1997; Couldry 2002], which appear in multimodal discourse formats, with film and television being the most popular. Contrastive studies across multimodal media discourse formats commonly explore the differences in their semantic, pragmatic, cognitive, semiotic, functional and social structure. These studies make it possible to model discourse as a social event and to identify the contrastive discursive roles of speech, image and sound as semiotic modalities [Dobrosklonskaya 2016; Djonov, Zhao 2013; Machin 2013]. Meanwhile, the contribution of communicative modalities such as speech and spontaneous gesture to film and television formats of media discourse is often neglected, although their semiotic nature in cinematic discourse has been extensively discussed in literary criticism (see, for instance: [Eisenstein 1964–1968; Smith 2010; Bugaeva 2016]). Still, multimodal behavior may be specific to different discourse formats and may even serve as a predictor of a media discourse format due to its staged performance character. In this study, we address two media discourse formats, film and TV interview, in the discourse genre of narrative, which is common to both formats, to contrast the multimodal behavior of a narrator as potentially dependent on the discourse format.

To proceed, the study develops a method for contrasting the narrator’s multimodal behavior in two media formats. The difficulty is that, although research on multimodal behavior in single media discourse genres is increasing [Iriskhanova 2019; Lavender 2021], the discursive strategies and discursive functions (with their discursive markers) which serve to explore the contingent gesture behavior in one discourse format may not appear in a different discourse format; therefore, they cannot be contrasted. Additionally, gestures are invariably multifunctional [Cienki, Mittelberg 2013; Iriskhanova, Cienki 2018], and consequently different gesture types can appear as contingent on the same discursive strategies and functions.

This accounts for the need to search for new methods of contrasting the multimodal communicative behavior of discourse participants in film and television narratives. In the study, we address the multimodal behavior in media discourse formats as stimulated by the discourse categories which shape any discourse type, with the categories of TIME and SPACE being the most influential [Kubryakova, Aleksandrova 1997]; these are often addressed integrally as DISCOURSE SPACE [Plotnikova 2011] and have been explored in film [Pronin 2018]. Notably, DISCOURSE SPACE construal in film and television narratives manifests significant differences since each of the two “specific mediums create particular storytelling parameters, constraining some options while enabling others <…> regarding plot structures and viewer engagement” [Mittell 2007: 156]. In the multimodal narrative, these discourse categories are commonly represented via multimodal deixis, expressed in deictic gestures synchronized (used together) with speech markers relating to either time or space or both. Although multimodal deixis research has gained popularity in discourse studies [Kita, Özyürek 2003; Pfeiffer 2010; Le Guen 2011; Mesh et al. 2021; Iriskhanova et al. 2022], it is still uncommon in contrastive analysis.

This paper proposes a multimodal method of revealing the discursive schemata of two media discourse formats, film and television featuring one discourse genre of narratives. Central to this paper is how the narrators’ preference for DISCOURSE SPACE construal in a specific discourse format affects the choice of speech and gesture discursive schemata, or the patterns of discourse components construal. Our study is performed with highly ranked actors (five males and five females) who produce the narratives in two media formats, film and television interview. In these narratives, multiple examples of gesture use are documented, with deictic gestures being one of the four functional gesture types [Cienki, Mittelberg 2013] contingent on DISCOURSE SPACE construal in speech [Le Guen 2011]. Presumably, DISCOURSE SPACE construal manifested via multimodal deixis in both speech and spontaneous gesture will serve to identify the differences in the multimodal behavior of actors performing narratives in two media discourse formats, film and television. We begin the paper by reviewing the theoretical and empirical background to research on two media formats as hypothetically manifesting different discursive schemata of DISCOURSE SPACE construal via multimodal deixis; and then introduce the method, design, and results of the contrastive study of the narrators’ multimodal behavior in DISCOURSE SPACE construal in film and TV narratives.

Theoretical framework

Since the study contrasts multimodal deixis in speech and spontaneous gesture in the narratives in two media formats, film and TV interview, we address three research areas. First, we explore the media discourse formats of film and TV interview to reveal the specifics of DISCOURSE SPACE construal in narratives via their “plot structures and viewer engagement” [Mittell 2007: 156]. Next, we consider the multimodal deictic specificity of DISCOURSE SPACE construal in gestures and speech. Finally, we explore the coordination patterns of deictic gestures and speech suggesting that there may be a research algorithm for representing multimodal deixis via discursive schemata in the contrastive perspective.

DISCOURSE SPACE in film and television narratives

TIME and SPACE are generally considered as two major categories of discourse construal. While their relations in discourse may be complex with the category of SPACE representing physical, communicative, genre relations, and TIME representing objective and subjective conceptualizations, following E. Kubryakova and O. Aleksandrova we consider discourse as “a spatial representation of time filled with articulation of speech production or being filled with its creation” [Kubryakova, Aleksandrova 1997: 23]; therefore, the discourse category of TIME is conceptualized as integrated into a more complex category of SPACE, or rather DISCOURSE SPACE. Developing the idea of DISCOURSE SPACE, S. Plotnikova defines it as “a logically organized medium incorporating discourses and discourse participants or the people producing discourses. Logical organization of a medium presupposes its abstract extension which is the continuum involving time as one of its coordinates” [Plotnikova 2011: 154]. The idea of discourse as a DISCOURSE SPACE agrees with A. Kibrik’s notion of discourse as “both a process developing in time and a structured object” [Kibrik 2009: 4]. DISCOURSE SPACE construal in film and television narratives manifests significant differences due to the fact that these media discourse formats represent a) different “plot structures” and b) different perspective or “viewer engagement” structures [Mittell 2007: 156].

Films as specific discourse formats have only recently become a focal point in discourse studies; however, they have been extensively explored in narratology, aesthetic semiotics and literary criticism. These approaches commonly appeal to two semiotic components of films, the actor’s performance and the film director’s performance. In narratology, films appear as manifesting various narrative stages, categories and signs [Chare, Watkins 2017], whereas aesthetic semiotics mostly addresses the poetics of cinema as a system of signs and means of aesthetically significant information [Deleuze 1986; Deleuze 1989; Auerbach 2007]. In critical studies, the major focus is on discourse categories such as time, space, perspective, emotivity, identity, etc. [Smith 2010; Bugaeva 2016]. In recent studies of film discourse, the main focus has shifted towards the discursive strategies and discourse categories in single cinematic subdiscourses [Zykova 2021] or in discourse types (narrative, descriptive, expository, and argumentative [Longacre 1996]). Since we address DISCOURSE SPACE construal in film narratives, we specify the plot structure and viewer engagement structure which appear in this media discourse format. Considering the film narrative plot structure explored in both semiotic and discourse studies, we can specify its major characteristics: 1) varying significance of the actor’s performance (often termed “gesture”) and the film director’s performance (termed “image”) in contributing to the narrative plot [Noys 2014; Harbord 2015; Agamben 2019]; 2) an enhanced role of embodiment appearing in accentuated gesturing and facial expression [Kristeva 2014]; 3) the effects of cultural and aesthetic experience and insight (in both the actor’s and the film director’s performance) [Eisenstein 1964–1968, vol. 3], termed “studium” and “punctum” in R. Barthes’s study [Barthes 1981]; 4) sensory experience foregrounding [Mulvey 1988]; 5) presenting fictive events with an emphasis on emotionality, imagery, pragmaticity and logic [Ivanov 1976; Bugaeva 2016].

In terms of the viewer engagement structure, the studies name the following characteristics: 1) multimodal alignment of speech and exaggerated gesturing of the “look into the camera” type [Ciccognani 2018]; 2) viewer engagement gesture use which stimulates discourse dynamicity [Iriskhanova 2018]; 3) the use of freeze frames and shot repetition to appeal to the viewer’s attention [Mulvey 2006].

TV interviews [Ilie 1999] may be categorized as live interviews, which are usually face-to-face interactions; phone interviews, in which the interlocutors do not see each other; and recorded interviews, which can be conducted either face-to-face or by phone but are recorded beforehand rather than held in real time. TV interviews are a type of journalistic interview, which is based on information acquisition (in contrast to social studies, where interviews are viewed as a means of polling). A TV interview is a type of TV discourse which occurs in the situation of interpersonal communication, conditioned by mass communication and the institutional status of the interviewee [Yashina 2007]. Since we address DISCOURSE SPACE construal in narratives which constitute a part of face-to-face TV interviews, we specify the plot structure and viewer engagement structure which appear in this media discourse format. Considering the interview plot structure explored in discursive studies, we can specify its characteristics: 1) distinct roles of interview participants [Milroy 1987; Wolfson 1976]; 2) hybrid discourse types which appear in the plot implementation (expository, rhetorical, and echo) [Ilie 1999]; 3) two distinct communication regimes, sharing and obtaining information [Scherbatyh 2016]; 4) a combination of staged character and spontaneity [Yashina 2007].

Concerning the viewer engagement structure, the studies name the following characteristics: 1) high dynamicity in answering and resisting the interviewer’s questions [Clayman, Heritage 2002]; 2) a significant power imbalance in favor of the interviewers, as they can choose topics and formulate questions, whereas the interviewees have to follow those questions and are obliged to answer them [Milroy 1987]; 3) a variety of cooperative strategies applied during the interaction, which are commonly studied following Grice’s cooperative principles [Molenaar, Smit 1996]; 4) multimodal alignment traced in both verbal and nonverbal behavior and explored via contextualization cues which allow one to interpret the interactional moves and semantic content [Gumperz 1982].

Overall, DISCOURSE SPACE construal in two media formats manifests several common characteristics; still, there are differences in both plot structure and viewer engagement structure which appear in speech and gesture use. To explore them, we address the ways multimodal deixis representing DISCOURSE SPACE can be expressed in speech and gesture.

Multimodal deixis in gesture and speech

In recent studies, gestures have been regarded not merely as random motions but as a fully functioning level of communication in discourse. They can be regarded as signs that people give to their interlocutors and themselves while producing speech [Bavelas et al. 2008; Chu, Kita 2011; Cienki 2017]. DISCOURSE SPACE construal is commonly explored via deictic semantics [Levinson 2003; Majid et al. 2004; Shusterman, Li 2016], which has developed research methods applicable to both gesture and speech; therefore, these methods make it possible to study multimodal deixis in both media discourse formats. Following Levinson (2003), we identify four major deictic coordinates which can stimulate the coordination of speech and gesture: 1) closer to the speaker, 2) farther from the speaker, 3) pointing to the discourse space of communication, 4) pointing to a discourse space not present in the discourse space of communication. Additionally, there are studies addressing speech deixis [Paducheva 2008; Borisova, Obchinnikova 2011; Apresjan 2014] which identify the discursive markers of deictic meanings and which may consequently be used to study speech and gesture alignment via the semantics of discursive markers contingent on gesture types. There are also studies which address deixis in gestures, most commonly in deictic gestures as contingent on discursive schemata, or the patterns of discourse components construal [Clark 2003; Enfield et al. 2007; Le Guen 2011; Cooperrider 2017].

These three approaches address different research questions: the first focuses on the variation of deictic coordinates across languages, discourses and cultures; the second, on gesture variation as dependent on discursive markers; and the third, on speech and gesture alignment as maintained by discursive schemata. Since in this study we explore DISCOURSE SPACE construal in two media discourse formats which display a) different “plot structures” and b) different perspective or “viewer engagement” structures via discursive schemata, we adopt the third approach and explore speech and gesture alignment in deictic gestures as coordinated (synchronized) with discursive schemata in the contrastive analysis of the plot structures and viewer engagement structures of film and TV narratives.

Film and TV interview narratives, therefore, manifest two types of discursive schemata, the first revealing the plot contents and the second revealing the viewer engagement perspective. Since the plot contents may be explored via the schemata representing the discourse types, in this study we address the discursive schemata of argumentation and description which can be contrasted in both film and TV interview narratives. Regarding argumentative discourse, the studies account for its potential in expressing opinions and beliefs [Amossy 2009], subjectivity and intersubjectivity [von Stutterheim, Klein 1989], as well as the argumentation schemata Example, Cause to Effect and Effect to Cause, Practical Reasoning, and Inconsistency [Cabrio et al. 2013]. The studies of descriptive discourse commonly address discourse themes [Merlo, Mansur 2004], discourse components and descriptive event types [Longacre 1996]. These studies make it possible to identify the discursive schemata which could be contrasted as contingent on deictic gestures in film and TV interview narratives.

Addressing viewer engagement structures, we appeal to the discursive schemata manifesting the rhetorical structure of discourse. Following the seminal paper of Mann and Thompson (1988) revealing these schemata and the study of Kibrik and Podlesskaya (2009) who integrate them into multimodal (speech and gesture as well as in other modalities) research, we consider these discursive schemata as modulating both film and TV interview narratives. These include Emphasizing opinion or assessment, Self-correcting, Specification, Generalization, Initializing communication, Chain of arguments and other schemata.

Coordination patterns of deictic gestures and deictic discursive schemata in speech

Deictic gestures, which are examined in the current paper, fall under Peirce’s notion of indices, dynamic signs which form a connection between the speaker (origo) and the target of the vector created by such gestures [Clark 2003]. These gestures are part of a deictic field that is formed around the interlocutors; hence we call such gestures deictic. Deictic gestures vary in form, palm orientation, movement and the position of the hands in space. These properties might depend on culture [Enfield 2001; Kita 2009] and on the information that is being conveyed [Enfield et al. 2007]. Deictic gestures can be divided into two categories according to the precision, the type of the referent, and the form and movement of the hand. The first type, pointing gestures, is regarded as the standard representative of deictic gestures. They are vectors created in space by some body part, a line that connects the speaker and the referent [Clark 2003; Kita 2003]. Their various forms are also context dependent. Index finger pointing is used in situations when some clarification is needed, while hand and thumb pointing occur when little precision is needed [Wilkins 2003]. Another factor which might influence the form is the distance between the gesturer and the referent: more distant objects are addressed with the whole hand, whereas closer objects might require precision in order to be distinguished from the surroundings, so the speaker might use an index finger. The same choice of a pointing gesture may hold for describing situations and events, depending on their proximity [Cochet, Vauclair 2014]. There are several typologies of pointing gestures, depending on the referent: they can be abstract or concrete [McNeill 1992] or entity-, place-, or action-referring [Cooperrider 2017]. In addition, pointing gestures might perform three functions. The first is imperative pointing, which expresses a request for an object or an action. The second is declarative expressive pointing, which is used to share interest in an object or event with the interlocutor, and the third is declarative informative pointing, which is used to provide information for the other party [Camaioni 1997; Tomasello et al. 2007; Cochet, Vauclair 2014]. However, pointing is not the only type of deictic gesture. Another type, touching, also termed placing (see: [Clark 2003]), is also a deictic gesture which can have various forms and occur in various contexts determining its meaning. Whereas the pointing gesture creates a vector in space, the touching gesture locates an object or notion in space [Clark 2003].

Still, pointing and touching gestures relate to the functional gesture use and do not directly relate to their discursive meanings. This explains the need to identify the discursive meanings of deictic gestures explicit via speech as manifested by their functional types. In the study performed by Le Guen (2011), three discursive schemata of deictic gesture meaning are presented. The first schema is Object reference type which can be direct, metonymic, and metaphoric. Direct pointing to actual places is manifested when the arm is oriented according to the accurate angle with respect to the target (i. e., the actual place occupied by the entity in the world) and when the target is available to speakers even in case it is not visible. Metonymic pointing is found when a speaker points, for instance, to an empty chair someone has just left to speak about this person. In Metaphorical pointing (or abstract pointing by McNeill (1992)), the target is metaphorically related to a person or place. The second discursive schema is Move type which comprises Path indication and Placement indication. In Path indication the speaker’s extended arm indicates the direction of the target with a focus on manner or direction of move. In Placement indication the target is the focal attention point which will be accentuated in speech. The third schema is Frame of reference type [Levinson 2003] which can refer to either Speech event or Narrated event. Narrated event is observed when the objects or participants are performing some actions which do not correspond to the speech event, for instance if the speaker is talking about a target located in a distant place or when the speaker describes the event which follows the event in speech but cannot occur in the existing setting [Fludernik 1996].

Presumably, while exploring the coordination of discursive schemata in speech and the discursive schemata manifested in deictic gestures, we will identify the specific patterns of multimodal deixis in film and TV interview narratives and contrast these patterns to reveal the specifics of two media formats.

Methods and procedure

In this section, we present the research data and the method used to explore the multimodal deixis, which is the multimodal analysis of discursive schemata coordination patterns in speech and deictic gesture occurring in film and TV interview narratives.

The research data are 10 film narratives by 10 highly ranked actors of the 1960s–1980s (5 actresses and 5 actors). These are A. Freindlikh (“Stalker”, 1979), L. Akhedzhakova (“Garage”, 1979), L. Gurchenko (“Love and Doves”, 1984), N. Mordyukova (“Relatives”, 1981), V. Alentova (“Time for thought”, 1982), S. Bondarchuk (“Man’s fate”, 1959), R. Bykov (“The Doorbell Rings, Open the Door”, 1961), V. Tikhonov (“Let Us Live Till Monday”, 1968), Yu. Nikulin (“They Fought for the Country”, 1975), A. Batalov (“Moscow Does not Believe in Tears”, 1980); and 10 TV interview narratives presented by the same actors and actresses. The interviews with R. Bykov, Yu. Nikulin, and V. Tikhonov were recorded in the studio, the interview with S. Bondarchuk was recorded in his study, and the interview with A. Batalov was recorded on the river embankment. As shown above, to explore multimodal deixis we identify the co-speech synchronization of two deictic gesture types, pointing and touching. Fig. 1 gives two examples of these gestures in film and TV interview.

In Fig. 1 a, R. Bykov, performing in “The Doorbell Rings, Open the Door”, is using a touching deictic gesture which is synchronized in speech with /я ошибаюсь/ (“I am mistaken”); the referent of deixis is the speaker himself. In Fig. 1 b, the same actor, performing in the TV interview, is using a pointing gesture synchronized in speech with /это была первая барменша бара/ (“that was the best barwoman of the bar”); the referent of deixis is a woman who is not a participant of the speech event.


Fig. 1. Deictic gestures in media narratives


In Fig. 2, we present the methodological framework developed for the study, based on the works exploring a) DISCOURSE SPACE construal in film and television interview narratives, b) multimodal deixis in gesture and speech, and c) coordination patterns of deictic gestures and deictic discursive schemata in speech presented in the previous section.


Fig. 2. Discursive schemata of multimodal DISCOURSE SPACE construal in media narratives


Hence, in the study we explore three types of deictic gesture schemata (Object reference, Move, and Frame of reference) as contingent on the speech schemata (Plot contents and Viewer engagement) in two media formats, film and TV interview.

In Table 1, we present the taxonomy of discursive schemata employed for studying speech behavior in film and TV interview narratives. Since the TV interviews employ two discourse types, argumentation and description, we give the discursive schemata for each of them separately.

Table 1. Discursive schemata of plot contents and viewer engagement in speech

| Argumentation | Description |
|---|---|
| Discursive schemata revealing plot contents | Discursive schemata revealing plot contents |
| (Emotional) assessment | Process |
| Stating reasons, consequences, conditions | State |
| Contrast | Accentuated subject |
| Accusation | Accentuated object |
| Agreement/disagreement | Accentuated action or state |
| Appeal to action | Accentuated characteristics |
| Promise | Accentuated time |
| Threat | Accentuated place |
| Appeal to power | |
| Discursive schemata revealing viewer engagement | Discursive schemata revealing viewer engagement |
| Emphasizing opinion or assessment | Emphasizing discourse component |
| Intersubjectivity | Chain of events |
| Appeal to attention | New event |
| Rhetorical communication | Appeal to attention |
| Initializing communication | Self-quote |
| Chain of arguments | Quoting others |
| Quoting others | |


To identify which discursive schema/schemata the actor is using in speech, we address the discursive markers appearing in every clause in speech.

In Table 2, we present the taxonomy of discursive schemata employed for studying deictic gesture behavior in film and TV interview narratives.

Table 2. Discursive schemata of Object reference, Move and Frame of reference in deictic gesture

| Object reference | Move | Frame of reference |
|---|---|---|
| Discursive schemata | Discursive schemata | Discursive schemata |
| Direct pointing | Path indication | Speech event |
| Metonymic pointing | Placement indication | Narrated event |
| Metaphoric pointing | | |


To identify which discursive schema/schemata the actor is using in gesture, we address the two deictic gesture types, pointing and touching, manifesting Object reference, Move and Frame of reference [Le Guen 2011]; the schemata were defined as modulated by the contents of speech.

Data analysis

In this section, we present the analysis algorithm of speech and gesture schemata identification in the recorded data and also describe the research steps to explore multimodal deixis in film and TV interview.

To identify which discursive schema/schemata the actor is using in speech, we address the discursive markers appearing in every clause in speech.

For instance, in Argumentation plot contents, the Opinion schema appears in /это чуть ли не единственный случай в мировой истории/ (“this might be the only case in world history”), (Emotional) assessment appears in /люблю свою работу/ (“I love my job”), and Accusation is found in /у вас сердца никакого нету/ (“you have no heart”). In Argumentation viewer engagement, the Emphasizing opinion or assessment schema appears in /конечно, я ему очень благодарен за это, безусловно/ (“of course I am very grateful to him for it, no doubt”), and the Self-correcting schema is found in /если я не ошибаюсь/ (“if I am not mistaken”).

In Description plot contents, the Achievement schema is identified in /а наши взяли этого Ганса и перекинули его через эту дорогу/ (“and our (troops) caught this Hans and threw him across this road”), the Process schema appears in /это мы ведь играли не их/ (“we were not playing them”), Accentuated subject is found in /так что не все хотят быть руководителями/ (“not everyone wants to be a director”), and Accentuated time appears in /год их не видел/ (“it has been a year since I last saw them”).

In Description viewer engagement, the Generalization schema can be illustrated with /ну вот так это было/ (“this is the way it happened”), the Chain of events schema with /хохотали лежали и плакали/ (“[we] were all on the floor in stitches”), and the Appeal to attention schema with /знаешь / он попятился попятился/ (“you know he moved back back”).

To give an example of deictic gesture use, in Fig. 3 a, b, we show the actors employing these gestures in the construal of DISCOURSE SPACE in film and TV interview narratives.


Fig. 3. Deictic gesture use in film and TV interview narratives


In Fig. 3 a (film narrative), actor V. Tikhonov is using a pointing gesture directed at the students. He is talking about the letters which he wants his students to turn to in /а потом были только письма / сотни писем / читайте их / они опубликованы/ (“And then there were only letters, hundreds of letters, read them, they are published”). Importantly, while pointing at the students, the actor is speaking about the action he wants them to perform, since the main phase of the gesture is synchronized with /читайте/ (“read”); therefore, this gesture is clearly Metonymic pointing. Meanwhile, the gesture does not indicate any path of action (the way the action should be performed) but indicates the addressees whom the actor wishes to perform this action; consequently, this is Placement indication. Since the actor is talking about an action which might start from the moment of the present event and involves the present event participants in its performance, this is a Speech event.

In Fig. 3 b, actress V. Alentova in a TV interview is using a touching gesture while specifying the reason for the event, and stresses it by saying /и это было то/ (“and it was something that”). Hence, this is Direct pointing. Still, the gesture does not indicate any path of action; therefore, this is Placement indication. The event the actress is pointing at is clearly not an event which might take place in the studio; consequently, this is a Narrated event.

The research procedure involved several steps.

At Step 1, we annotated the research corpus of film and TV interview narratives, adopting the inventory of discursive schemata of argumentation and description in speech and the discursive schemata embodied in the deictic gestures (we also annotated other functional gesture types, as shown in [Kiose et al. 2022]). Annotation was performed in ELAN on several tiers, which made it possible to identify the synchronization patterns of speech and gesture discursive schemata.
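The synchronization of tiers described at Step 1 can be sketched as a search for temporally overlapping intervals. The sketch below is only an illustration and not the procedure used in the study: the `Annotation` type, the `co_occurrences` function, the tier contents and the millisecond time stamps are all hypothetical and are not taken from the corpus or from any ELAN export.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One interval on an annotation tier (times in milliseconds)."""
    start: int
    end: int
    label: str


def co_occurrences(speech, gesture):
    """Pair every gesture annotation with each speech clause it overlaps in time."""
    pairs = []
    for g in gesture:
        for s in speech:
            # Two intervals overlap when each one starts before the other ends.
            if g.start < s.end and s.start < g.end:
                pairs.append((s.label, g.label))
    return pairs


# Hypothetical tiers, invented for illustration only.
speech_tier = [
    Annotation(0, 900, "Appeal to attention"),
    Annotation(900, 2600, "Specification"),
]
gesture_tier = [
    Annotation(100, 800, "Path indication"),
    Annotation(1000, 2400, "Placement indication"),
]

pairs = co_occurrences(speech_tier, gesture_tier)
for speech_schema, gesture_schema in pairs:
    print(speech_schema, "+", gesture_schema)
```

Each resulting pair links a speech schema to the gesture schema produced during the same clause. A gesture that spans two clauses would be paired with both of them.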

At Step 2, we identified the distribution of discursive schemata in speech and gesture in two subcorpora, film narratives and TV interview narratives, and also in the individual actor’s narrative. This allowed 1) to identify the significant differences in the use of discursive schemata in speech and gesture in film and TV narratives, 2) to contrast the use of co-speech deictic gesture as modulated by discursive schemata in argumentation and description in film and TV narratives, 3) to find the significant distinctions in pointing and touching gesture use with argumentation and description in film and TV narratives. A series of one-way ANOVA tests in Jamovi statistical software was run for the purpose.
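The distribution count at Step 2 can be sketched as a tally of schemata per media format and per actor. This is a minimal illustration under stated assumptions: the record layout, the actor identifiers and the counts are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

# Hypothetical annotation records: (media format, actor, gesture schema).
records = [
    ("film", "actor_1", "Direct pointing"),
    ("film", "actor_1", "Placement indication"),
    ("film", "actor_2", "Metaphoric pointing"),
    ("tv", "actor_1", "Direct pointing"),
    ("tv", "actor_2", "Direct pointing"),
    ("tv", "actor_2", "Narrated event"),
]


def schema_distribution(rows):
    """Count schemata per subcorpus and per individual actor's narrative."""
    by_format = Counter()
    by_actor = Counter()
    for media_format, actor, schema in rows:
        by_format[(media_format, schema)] += 1
        by_actor[(media_format, actor, schema)] += 1
    return by_format, by_actor


by_format, by_actor = schema_distribution(records)
print(by_format[("tv", "Direct pointing")])
print(by_actor[("tv", "actor_2", "Direct pointing")])
```

The per-format counts correspond to the subcorpus totals, while the per-actor counts give the individual values which a test across actors would take as input.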

At Step 3, we explored the distribution of discursive schemata in speech and gesture in film narratives and TV interview narratives as modulated by the Plot contents and Viewer engagement structure characteristics earlier disclosed as typical of film narratives and TV interview narratives.

Results and discussion

Step 1. To demonstrate the annotation procedure, we illustrate it with a fragment from V. Alentova’s film narrative (Fig. 4). To identify the gesture types and the discursive schemata in speech, we employed a coding system with deictic gestures coded as 301 (pointing) and 302 (touching).


Fig. 4. ELAN Annotation sample


We may observe that while producing each of the two clauses /значит смотри/ (“well, look”) and /у Сережи в субботу день рождения/ (“Serezha has a birthday on Saturday”), Alentova employs a Touching deictic gesture <302> produced by both hands. While uttering these clauses, Alentova enumerates the micro events which are to happen in the near future. Still, to introduce this enumeration, she uses her left hand as if manifesting the event which she wants the interlocutor to observe; therefore, this deictic gesture is clearly a Metaphorical pointing. At the same time, with the forefinger of her right hand Alentova is indicating the direction towards this palm-event which she wants the interlocutor to follow; this is then a Path indication schema. Since she is introducing this event-to-follow while pronouncing the clause appealing to the interlocutor’s attention in /значит смотри/ (“well, look”), we consider it a manifestation of Speech event. Still, in producing the second clause /у Сережи в субботу день рождения/ (“Serezha has a birthday on Saturday”), Alentova touches her left-hand finger with the forefinger of her right hand to start enumerating the microevents-to-follow. This deictic gesture is clearly a Metaphoric emblematic pointing where a finger represents a microevent. Still, in this case this is a Placement indication since the metaphoric microevent is held on her open palm. This microevent does not involve the participants and setting represented in speech; therefore, it can be referred to as Narrated event.

Alentova employs several argumentation and description schemata in speech. In / зна­чит смот­ри/ (“well, look”) she uses Appeal to action as a Plot contents schema and Appeal to attention as a Viewer engagement schema. In /у Сере­жи в суб­бо­ту день рож­де­ния/ (“Serezha has birthday on Saturday”) we find Stating reasons, consequences, conditions as a Plot contents schema and Specification as a Viewer engagement schema manifesting argumentation. Additionally, this clause manifests description discursive schemata of State and Accentuated subject as Plot contents schemata.

The same procedure was applied to annotate all the clauses in the actors’ speech with gestures. The total number of the clauses which co-occur with deictic gestures in film narratives was 54; the total number of the clauses appearing with deictic gestures in TV interview narratives was 59.
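The annotation scheme illustrated in Step 1 can be sketched as a simple data structure. The following is a minimal Python sketch and not the ELAN workflow actually used in the study: the class name, field names and the `tally` helper are hypothetical, while the gesture codes 301/302 and the schema labels for the two Alentova clauses are taken from the description above.

```python
from collections import Counter
from dataclasses import dataclass, field

# Gesture codes as reported in the study.
GESTURE_CODES = {301: "pointing", 302: "touching"}


@dataclass
class ClauseAnnotation:
    """One clause with its co-occurring deictic gesture (hypothetical layout)."""
    clause: str                  # English gloss of the clause
    media_format: str            # "film" or "tv_interview"
    gesture_code: int            # 301 or 302
    object_reference: str        # Direct / Metonymic / Metaphoric pointing
    move: str                    # Path indication / Placement indication
    frame_of_reference: str      # Speech event / Narrated event
    speech_schemata: list = field(default_factory=list)


# The two clauses from the Alentova fragment, labeled as in the text.
annotations = [
    ClauseAnnotation(
        "well, look", "film", 302,
        "Metaphoric pointing", "Path indication", "Speech event",
        ["Appeal to action", "Appeal to attention"],
    ),
    ClauseAnnotation(
        "Serezha's birthday is on Saturday", "film", 302,
        "Metaphoric pointing", "Placement indication", "Narrated event",
        ["Stating reasons, consequences, conditions", "Specification",
         "State", "Accentuated subject"],
    ),
]


def tally(items, media_format):
    """Count gesture schemata over all clauses of one media format."""
    counts = Counter()
    for a in items:
        if a.media_format == media_format:
            counts.update([a.object_reference, a.move, a.frame_of_reference])
    return counts


film_counts = tally(annotations, "film")
print(film_counts)
```

Applied to the full set of annotated clauses, a tally of this kind would yield the per-format totals of gesture schemata that Step 2 compares.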

Step 2. In Table 3, we present the total number of discursive schemata found in gesture in film and TV interview narratives.

Table 3. Discursive schemata of Object reference, Move and Frame of reference in media discourse

Discursive schemata          Film narrative          TV interview narrative
Object reference
   Direct pointing
   Metonymic pointing
   Metaphoric pointing
Move
   Path indication
   Placement indication
Frame of reference
   Speech event
   Narrated event

We may observe that the gesture discursive schemata in film and TV interview narratives differ across all types of schemata. Still, the paired samples t-test shows that these differences are statistically insignificant, with Student’s t(6) = –0.18 at p = 0.863. We presume that 1) the differences (although not significant) can arise from the various speech discursive schemata applied in the clauses, and 2) there may be individual distinctions in speech and gesture multimodal deixis. Therefore, we further addressed the individual participants’ gesture behavior to determine the gesture and speech contingencies.
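The paired comparison above can be illustrated with a short computation. This is a minimal sketch, assuming that the pairs are the seven gesture schemata, each with one total for film and one for TV interview, which gives the reported 6 degrees of freedom. The function name is hypothetical, and the counts are placeholders invented for illustration only; they are not the values of Table 3.

```python
import math
from statistics import mean, stdev


def paired_t(xs, ys):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# PLACEHOLDER counts for seven schemata (not the study's data).
film_counts = [12, 20, 22, 18, 36, 30, 24]
tv_counts = [15, 8, 36, 30, 24, 20, 39]

t_value, df = paired_t(film_counts, tv_counts)
print(f"t({df}) = {t_value:.2f}")
```

The sketch returns only the statistic and its degrees of freedom; obtaining the p-value additionally requires the Student’s t distribution, which a statistics package would supply.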

First, we contrasted the use of discursive schemata in speech and gesture in film and TV narratives. Two One-Way ANOVA (Welch’s) tests were performed for this purpose with a) speech schemata used with deictic gesture as determined by film or TV interview, b) gesture schemata as determined by film or TV interview.
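For readers unfamiliar with the test, Welch’s One-Way ANOVA can be sketched as follows. This is a minimal illustration rather than the authors’ actual analysis: it assumes that each clause is coded 1 if a given schema occurs in it and 0 otherwise, and that the clauses are grouped by media format. The function name and the 0/1 data are hypothetical; only the group sizes (54 and 59 clauses) come from the text.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA: return (F, df1, df2) for a list of samples."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    if any(v == 0 for v in variances):
        raise ValueError("a group has zero variance; weights are undefined")
    # Each group is weighted by n / s^2, so unequal variances are allowed.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_w = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_value = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_value, k - 1, df2


# HYPOTHETICAL 0/1 codings of one schema per clause (not the study's data).
film = [1] * 10 + [0] * 44          # 54 film clauses
tv_interview = [1] * 25 + [0] * 34  # 59 TV interview clauses

f_value, df1, df2 = welch_anova([film, tv_interview])
print(f"F({df1}, {df2:.1f}) = {f_value:.2f}")
```

With two groups, as here, the statistic reduces to the square of Welch’s t, so the test does not assume equal variances in the film and TV interview samples.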

There were 42 speech schemata tested as potentially contingent on either film or TV interview; still, only 6 schemata displayed significant differences in their use across the media formats. In argumentation, these include two discursive schemata revealing Plot contents, which are Stating reasons, consequences, conditions with F = 4.59 at p = 0.034 and Contrast with F = 4.43 at p = 0.039, and one discursive schema revealing Viewer engagement, which is Quoting others with F = 4.67 at p = 0.034. In description, these include one discursive schema revealing Plot contents, which is Accentuated place with F = 17.16 at p < 0.001, and two discursive schemata revealing Viewer engagement, which are Specification with F = 17.21 at p < 0.001 and Chain of events with F = 9.53 at p = 0.003.

The results show that the most significant differences are observed in description schemata, and mostly relate to specifying information and stressing the place; in both cases the schemata with deictic gestures are found more frequently in TV interviews.

There were 7 gesture schemata tested as potentially contingent on either film or TV interview. Notably, 6 of them manifested significant differences, the sole exception being Direct pointing. In Table 4, we present the One-Way ANOVA (Welch’s) test results for all gesture schemata.

Table 4. Differences in Object reference, Move and Frame of reference

Discursive schemata          Film vs TV interview (F; p)
Object reference
   Direct pointing           1.77; 0.187
   Metonymic pointing        10.39; 0.002
   Metaphoric pointing       10.49; 0.002
Move
   Path indication           16.76; < .001
   Placement indication      21.95; < .001
Frame of reference
   Speech event              14.28; < .001
   Narrated event            11.24; 0.001


As the results show, the strongest differences are found in the use of Move and Frame of reference schemata. The diagrams in Figs 5–7 show how these differences are distributed in film vs. TV interview.

Fig. 5 shows the distribution in gesture Object reference.


Fig. 5. Object reference


In Object reference, the differences were found in Metonymic and Metaphoric pointing. The diagrams show that while the numbers of Metaphoric and Metonymic pointing gestures in film are almost the same, they differ in TV interviews: Metonymic pointing is infrequent in interviews, whereas Metaphoric pointing is common.


Fig. 6. Move


In Move schemata, the tendency in the Path and Placement indication distribution is similar; still, Placement indication occurs significantly more often in films.

In Fig. 7, we present the distribution in gesture Frame of reference.


Fig. 7. Frame of reference


We determine that in film, Speech events synchronized with deictic gestures appear more frequently; presumably, this is due to the actors’ higher involvement in the situation of the film shot, whereas in the TV interview it is the Narrated event which the actors accentuate by deictic gesturing.

Next, we explored the use of co-speech deictic gesture as modulated by discursive schemata in argumentation and description in both film and TV narratives. To do so, we performed two One-Way ANOVA (Welch’s) tests with speech schemata as modulated by a) pointing gestures and b) touching gestures. Only one speech schema used with deictic gestures manifested significant differences in its use with pointing and touching gestures: Accentuated characteristics, with F = 4.72 at p = 0.045 for pointing gestures and F = 4.79 at p = 0.034 for touching gestures. The Accentuated characteristics schema appears significantly more frequently with pointing gestures than with touching gestures, with 21 and 2 cases in films and 15 and 7 cases in TV interviews, respectively.

Step 3. The obtained results describing multimodal deixis in film and interview helped specify the earlier described characteristics of these two media formats. Below, we will consider them separately.

Multimodal deixis in film manifested a preference for Metonymic pointing, Placement indication and Speech event construal. Importantly, we did not observe an enhanced role of embodiment in gesturing in films [Kristeva 2014] in terms of deictic gestures, as opposed to TV interviews. This might be explained by the fact that the interview participants were also actors; however, a stronger argument is that these differences will appear in other gesture types, for instance, in pragmatic or representational gestures [Cienki, Mittelberg 2013]. Another finding was that Metaphoric deictic gestures, which were expected to appear more often in films, did not manifest this tendency. Importantly, there were cases (see Fig. 8) of entrenched or emblematic multimodal metonymies and metaphors expressed in the speech and gesture behavior of actors performing in film.

In Fig. 8 a, Yu. Nikulin uses a pointing gesture to show the direction to God, who is commonly deemed to be somewhere up; hence, we observe a metonymic shift from the subject to the direction towards this subject. In Fig. 8 b, L. Akhedzhakova points at each of the event participants while touching one of the fingers of her right hand with the forefinger of her left hand, which is an emblematic gesture of enumeration; still, this is not only a case of enumeration but also a way of focusing attention on each particular participant. Therefore, there is a metaphoric shift from the finger she is touching to a participant.


Fig. 8. Emblematic multimodal metonymy and metaphor with deictic gesture in film


Presumably, the prevalence of Metonymic pointing is interconnected with another frequently appearing gesture schema, which is Speech event construal. Since the actor in the film shot is performing a fictive event with an emphasis on emotionality, imagery, pragmaticity and logic [Ivanov 1976], the components of the shot are located in close proximity to the speaker, and there is no need to point at them by forming an iconic view of them. Additionally, other gesture types (for instance, representational) might produce the exaggerated gesturing of the “look into the camera” type [Ciccognani 2018].

Multimodal deixis in TV interview displayed a significant prevalence of Metaphoric pointing, Path indication and Narrated event construal. In terms of speech schemata, we determined a significant prevalence of Specifying information and Accentuating the place. Since in the interview the speaker focuses on the Narrated event, we expected higher dynamicity in answering the interviewer’s questions [Clayman, Heritage 2002], which was found both in speech and in the use of Metaphoric deictic gesturing.

Fig. 9 shows an example of Metaphoric gesturing which clearly complies with the Narrated event.


Fig. 9. Metaphoric deictic gesture in TV interview


In Fig. 9, the actor points at a person who is not nearby. To intensify the argumentation, the gesture is used as if playing out a situation in which the individual being described were in the same room with the actor. These findings show that multimodal deixis is mediated by iconicity in the communicators’ gesturing (Metaphoric pointing), which suggests that it reflects two communication regimes of sharing and obtaining information [Scherbatyh 2016].

The prevalence of Path indication in deictic gesturing is evidence of the speakers’ high involvement in the Narrated event construal; presumably, this happens because the speakers are professional actors and produce a combination of staged character and spontaneity [Yashina 2007] with exaggerated staged effects. Importantly, acting out a narrated event might have influenced the choice of speech schemata. To bring the components of the plot implementation closer [Ilie 1999], the speakers focalize the details of the event in speech and accentuate the place of the event. This involvement is manifested in Fig. 10.


Fig. 10. Narrated event in TV interview


In Fig. 10, we observe the example of a touching gesture, where the actor abstractly “touches” the sides which the troops moved to. This quite frequent identification of the sides (right-left, west-east) is implemented by a pointing gesture, for instance pointing to the right while showing the orientation in space. The co-speech gesture chosen by the actor intensifies the Placement and Path indication of the movement of the object in the Narrated event.

Overall, multimodal deixis contributes to intensifying specific characteristics of plot contents and viewer engagement in two media formats, film and TV interview. Most notably, these are the characteristics related to their different narrative regimes and different combination of staged performance and spontaneity.


In this study, we addressed the problem of multimodal information construal by contrasting the two most common media formats, film and TV interviews, in short narratives produced by the same actors. We presumed that since these two formats display differences in their plot content and viewer engagement characteristics, the latter might affect the way the actors employ speech and gesture to present the discourse space. Using research data of 20 film and TV interview fragments with 10 highly rated Russian actors (5 men and 5 women), we developed a three-stage procedure to explore the possible contingency of discursive schemata presenting discourse space in speech and of discursive schemata presenting object of reference, move and frame of reference in gesture.

The results show that multimodal deixis contributes to intensifying both types of these characteristics. Importantly, the results helped specify several of them. For instance, we did not observe an enhanced role of embodiment in gesturing in films in terms of deictic gestures, as opposed to TV interviews. We also identified higher dynamicity in responsive narratives in TV interviews, which was found both in speech and in the use of metaphoric deictic gesturing. Statistical results show that while multimodal deixis in film manifests a significant preference for metonymic pointing, placement indication and speech event construal, in TV interview we observe the prevalence of metaphoric pointing, path indication and narrated event construal. Additionally, TV interviews display speech schemata which specify information and accentuate the place. Overall, the study shows that there are two basic types of discourse characteristics which produce greater differences between film and TV interview formats: different narrative regimes and different combinations of staged performance and spontaneity.

