Friday, 13 December
Institute “Higher School of Journalism and Mass Communications”, SPbU

Semantic “portraits” of the Russian “media center” and “media periphery”: Lexical-statistical and ideographic analysis

Problem statement

Despite the development of digital technologies, the increasing media activity of the audience, and the clear prospects for creating a unified media space [Korkonosenko 2014], spatial differences continue to matter in Russia. As the largest country in the world, Russia is shaped by national factors (geographical, demographic, climatic, multicultural, settlement-related, etc.) that determine intra-country differentiation.

Different geographical and climatic conditions determine the specifics of creating infrastructure and disseminating media technologies in the country. At the same time, the current gap in digital capabilities of different geographical regions persists even with the development of digitalization and with the attempts to reduce digital inequality [Gladkova, Vartanova, Ragnedda 2020]. This inequality demonstrates differences in media saturation of Russian regions — the differences that affect both the modes of journalistic production and the audience’s use of journalistic output. This, in turn, forms groups of media users who are united not only by certain social variables related to demographic and generational characteristics and specific media consumption, but also by the reproducibility of and demand for different semantic palettes. In this study, intra-country disproportions in media saturation are conceptualized through the media hierarchy of Russian territories: “media center”, “media semi-periphery” and “media periphery”.

The purpose of the study is to identify semantic “portraits” of the Russian “media center” and “media periphery” by studying the intentions of text authors — representatives of institutional and civic journalism.

The research focuses on verbal matter, since language, according to Yu. M. Lotman, is the primary modeling system [Lotman 1998]. The subject of research interest is the language of regional Russian mass media. This allows us to address, through a concrete case study, one of the topical tasks of medialinguistics: identifying differences between the language of large regional media and that of local media in Russia [Duskaeva 2018: 258]. The study focuses on “interpretive properties of media texts, their role in building an information picture of the world, culturally specific and ideological factors affecting the production and perception of mass media texts” [Dobrosklonskaya 2020: 113] by the audience of different Russian territories.

Historical background

In Russia, the spread of media technologies is determined by the spatial development of the country. In particular, the change in the technological paradigm — from “analog” to “digital” — is predetermined by the characteristics of the development of the domestic technosphere. The most significant changes were noted in 2011–2015, when the number of Internet users in Russia doubled annually [Vartanova 2020: 37]. At first, digital media technologies appeared in Russian megacities, which currently have advantages in the use of telecommunications. Later, these technologies appeared in other types of settlements: regional centers, small towns, urban-type settlements, and villages. Yuri Levada* noted that social time in Russia is measured by space: periods of history are materialized in the endless periphery [Levada* 2001]. In the periphery, a stable way of life and a closed communication environment lead to a delayed influence of information technologies. The influence is also not very transformative, since, according to E. Trubina, these territories reproduce the “tracks of previous development” [Trubina 2013]. At the same time, large Russian territorial entities concentrate resources (infrastructure, human and financial capital, development of formal education), which are fundamentally significant for the development of the territorial media environment.

Taking into account all the factors about media inequality in Russian regions, we propose to distinguish three territorial media levels: “media center”, “media semi-periphery”, “media periphery”. There are previous examples of such terminology in scientific discourse. In particular, scientists analyze “media centers”, media capitals as territories characterized by a high concentration of media resources and great opportunities for the audience to use media [Curtin 2003; Keane 2006]. There are examples of the use of the concepts of “information periphery”, “information semi-periphery”, “media semiperiphery” in national media systems [Barry 2012; Volkhin 2018]. These terms reflect the results of uneven distribution of information flows in a specific territory. However, these concepts are only partially understood in the context of the center-periphery model of regions, especially in the context of Russian territories.

We mark territories in the logical concept of “media center — media periphery” based on a four-component and multi-criteria structure of objective measurable indicators: demographic data, levels of economic sustainability and media saturation, as well as concentration of institutions providing educational programs for media professions. The media saturation of a territory is determined by analyzing the diversity and quantity of transnational, federal, regional, and local media outlets.

We propose to classify the largest cities with a population of 250 thousand people or more as Russian “media centers”¹. In this case, the “media center” includes approximately 7.08 % of Russian cities, which are home to 42.61 % of the country’s population. These territories are the core of the development of the domestic media industry: they spread development impulses to the “media semi-periphery”, have a mobilizing influence on it, and use (or take) its human resources potential.

We propose to classify predominantly large cities with a population of 100 to 250 thousand and medium-sized cities with a population of 50 to 100 thousand as “media semi-periphery”². In total, “media semi-periphery” includes 21.71 % of all Russian cities, where only 22.71 % of the Russian population lives. Applying the theory of regional development of J. Friedmann, we can assume that this is the most mobile media zone: its development dynamics can periodically move from the center to the periphery and vice versa. The “media semi-periphery” is, in a sense, a stabilizing basis for the development of the entire domestic media space: it acts as a link between the contrasting “media center” and “media periphery”. These media territories influence the development of the “media periphery”.

We propose to classify small Russian towns with a population of up to 50 thousand people (71.21 % of all Russian cities) and rural areas as “media periphery”. In total, 34.68 % of the Russian population, including residents of villages and rural settlements, live in a “media periphery”. As a rule, these are the least developed, under-urbanized territories, which sometimes show signs of degradation and marginalization due to insufficient socio-economic and media development, as well as unfavorable geographical position (remoteness from domestic markets and significant transport routes, etc.). As a result, these territories have infrastructural limitations and insufficient resources for the media production that meet the demands of the time. The following factors are also significant: institutional barriers (client relations between media and municipalities, low quality of professional qualifications of media specialists); population decline (due to regional migration and natural decline). The “media periphery”, being at the very bottom of the center-periphery hierarchy, is guided by the “media semi-periphery” and the “media center” in its development but prefers to stay away from them: it has its own traditions of media production and media use. These traditions are determined by the isolation of communication communities and the desire to preserve a local identity, which is significant for the territory’s survival.

Media inequality of Russian regions is the first level of the digital divide between the media audiences of different Russian territories. At the first level, inequality is expressed in the level of access to information technology [Norris 2001]. At the second level, the existing inequality of skills and competencies influences user communication in the digital media environment. At the third level, inequality affects the chances of social advancement and improvement of the quality of life [Ragnedda, Ruiu 2017]. After researching this issue in developed countries since 1995, scientists have concluded that the inherited digital inequality has moved from a technological to a social dimension, changing the fundamental way of social organization and impacting knowledge about society [Vartanova, Gladkova 2022].

The concept of media hierarchy in Russian regions was tested by one of the coauthors of the study (A. S. Sumskaya), who applied a four-component and multi-criteria structure of all measured indicators in 18 territories of six constituent entities of the Ural Federal District. This allowed us to differentiate territories of the Ural Federal District and identify them as media centers, media semi-peripheries, and media peripheries [Sumskaya 2024: 357].

In almost all cases (with rare exceptions), there is a linear hierarchy: the administrative center has a larger population and a more developed economy, which affects the levels of media saturation and media staffing, which, in turn, leads to its identification as the media center of the territory. Small regional cities have smaller populations (smaller media audience) and a smaller margin of economic sustainability, which affects media saturation and the need for highly qualified media staff. Therefore, these territories are predominantly defined as the media semi-periphery. Continuing this logic, small towns and rural areas in Russia have all the signs of the media periphery.

The distribution of territories according to the criteria of “media center” and “media periphery” differs from the official statistics on the urban and rural population of the region: according to those statistics, the population of the Ural Federal District is predominantly urbanized. Applying the concept of “media center–media periphery” to the marking of territories shows that official urban areas are not always characterized by high media saturation. In some constituent entities of the Ural Federal District, urbanized territories do not always meet the criteria of “media-central” territories, and this picture, in our opinion, better reflects the real situation.

However, important questions remain: what information shapes the media audience’s perception of the world in media centers and in the media periphery? Are there differences in semantic priorities, and what are they? Do the semantic fields of these territories intersect? Which significant meanings, if any, unite them?

To answer these questions, we turn to the possibilities of corpus linguistics.

Corpus linguistics is one of the most promising and fastest-growing areas of language research. For example, a Google Scholar query for “corpus research” yields 4,460,000 articles for the whole period since 2004, and 898,000 full-text publications for the last, still incomplete, five years (since 2020). In the Russian-language scientific field, the same query returns 19,600 articles since 2004, of which 13,800 have been published since 2020. The Russian-language keyword query “automatic semantic processing” returns 7,130 scientific articles since 2020, out of only 15,700 articles on this topic indexed over the search engine’s entire period of operation.

The history of the formation of corpus linguistics, which allows us to understand how automatic semantic analysis was and still is applied, is thoroughly described in the work of K. P. Chilingarian [Chilingarian 2021]. The indisputable advantage of computational linguistics lies in the ability to obtain more reliable and objective results in working with language matter, reaching conclusions based on statistical data, and not only on the intuition of the researcher or informant [Lavrent’ev 2004: 121].

The study uses the activity principle of studying the means of expressing meanings [Duskaeva 2019: 7]. This principle presents media speech as an objectification of social and professional activity [Duskaeva 2019: 15] and allows us to study arrays of linguistic means as a technical and linguistic toolkit used to form an information picture of the world of the media audience of a particular Russian territory. This is important because corpus linguistics, which is becoming increasingly popular among modern linguists, is still used to a limited extent for the needs of sociolinguistics [Markovina et al. 2022: 91].

In this paper, we focus on media texts posted on the Internet, understanding that the use of new networked digital technologies in current everyday practices provides grammaticalization that differs from the previous stable systems of traditional media [Zagidullina 2018: 282]. We are guided by the notion that “from a medialinguistic point of view, it is important that language can also be considered as a process of grammaticization: a certain flow is dissected and ordered” [Zagidullina 2018: 281], providing an analysis of the linguistic features of mass media texts, their linguomedia properties and sociocultural characteristics, in general the study of the functioning language in the media in the context of developed media technologies.

Research method and characteristics of the empirical base

This study used a semantic approach to characterize the vocabulary of media-central and media-peripheral journalism by exploring the intentions of text authors — representatives of institutional and civic journalism. The basic empirical method of the study is comparative lexical-statistical analysis, on the basis of which sociolinguistic interpretation of the data is carried out. The model of method realization is substantiated and described in the work of M. Y. Mukhin and A. I. Lozovskaia [Mukhin, Lozovskaia 2019].

The lexical-statistical method, applied to media vocabulary, allows us to identify thematic semantic fields and analyze them comprehensively [Polishchuk, Koklikov 2022: 64]; to detect the variability of communicative intents and the corresponding communicative structures in media texts [Ivolgin 2020: 63]; to reveal sociolinguistic features of text corpora, which manifest themselves not only through linguistic (lexical) features but also through the sociocultural priorities of media users [Mukhin, Lozovskaia 2019: 39]; to establish statistically reliable correlations between extralinguistic and linguistic features of texts and objects; to detect dependencies between the components of texts [Bogoiavlenskaia 2022: 60]; and to identify patterns.

In addition, the work uses ideographic analysis, which allows us to identify textual dominants, that is, thematic (or ideographic) groupings of priority vocabulary in the analyzed corpora.

On the basis of lexical-statistical and ideographic analyses, it is possible to identify the most characteristic features of text creators [Mukhin, Poliakova 2023: 41], their mental lexicon and speech habits [Kolmogorova, Liamzina, Nikol'skaia 2023: 39], and the worldview and value system of those to whom these messages are addressed, which can serve as a schematic basis for compiling their portraits. Semantic comparison of media texts aimed at different audiences yields the most significant semantic sets for each sociolinguistic group and the corresponding ideographic classifications. This, in turn, makes it possible to identify unifying priorities and similar features in the media audience’s perception of reality on the basis of linguistic dominants.

A comparative analysis of modern tools for corpus-based philological research has highlighted the corpus manager Sketch Engine³ (https://www.sketchengine.eu/) as one of the relevant tools [Paliichuk 2022: 77; Novikova 2020: 76]. This text-mining program was used to conduct the study. Despite the popularity of the tool in English-language scientific discourse, Russian researchers use it relatively rarely⁴. The program was developed by a team of authors at Masaryk University (Czech Republic) and opened for mass use in 2003 [Kilgarriff et al. 2014]. Also in 2003, a group of Russian linguists started work on the National Corpus of the Russian Language [Lavrent’ev 2004: 132], which is now very actively used by Russian researchers.

At present, Russian researchers are using Sketch Engine to study political discourse in the media of different countries, construct images of political figures [Belozerova 2020; Gornostaeva 2020; Gornostaeva 2020a], clarify linguistic and ethnocultural contexts in Russian and other linguocultures [Zagidullina, Ghodrati, Shafaghi 2023], reveal the linguistic picture of the world of users of different languages at different age life stages [Lenart, Markovina, Endrody 2023], study emerging terminosystems [Novikova 2020; Kuzmina 2020] and others.

The study implements the following sequence of actions:

  1. Analysis of the quantitative characteristics of the text corpora oriented to “media centers” and “media periphery” in comparison with the reference corpus.
  2. Analysis of the 100 most frequent lemmas in different contexts in each corpus using “Gender Lemma”.
  3. Identification and analysis of the Top-50 unique frequent lexemes in each analyzed corpus.
  4. Compilation of semantic “portraits” of the most frequent lemmas using the “Word Sketch” function in Sketch Engine. Such “portraits” are based on automatic analysis of the contexts in which the word occurs.
  5. Identification of keywords and analysis of their frequency and “strength” to establish the typicality of word usage in the analyzed corpora.
  6. Identification of the Top-50 unique frequent collocations with keywords in the analyzed corpora.
  7. Combination of the most frequent lemmas, “strong” keywords and collocations into lexical-semantic (denotative and ideographic) groups, and “portraiture” of the “media center” and “media periphery” by creating their images on the basis of the linguospecific lexicon of the territories.
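
Step 3 of this sequence can be illustrated with a short Python sketch. It is only an approximation of the idea under stated assumptions: the study itself worked in Sketch Engine, the function names are hypothetical, and the token lists are invented toy data rather than material from the corpora.

```python
from collections import Counter


def top_lemmas(lemmas, n):
    """Return the n most frequent lemmas from a list of lemmatised tokens."""
    return [lemma for lemma, _ in Counter(lemmas).most_common(n)]


def unique_frequent(top_a, top_b):
    """Keep the lemmas of top_a that are absent from top_b, preserving rank order."""
    shared = set(top_b)
    return [lemma for lemma in top_a if lemma not in shared]


# Invented toy data standing in for the two lemmatised corpora.
periphery = ["district", "work", "family", "district", "contest", "work", "district"]
center = ["person", "work", "video", "person", "driver", "work", "person"]

# The study uses the Top-100 lists; 3 is used here only to keep the toy example small.
top_periphery = top_lemmas(periphery, 3)
top_center = top_lemmas(center, 3)

unique_periphery = unique_frequent(top_periphery, top_center)
unique_center = unique_frequent(top_center, top_periphery)

print(unique_periphery)
print(unique_center)
```

The shared lemma (“work” in the toy data) drops out of both lists, which is the effect of excluding repeating lemmas before the unique lexemes are ranked.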

As can be seen, the planned result of the research is exactly the semantic “portrait” of media territories. Using this metaphorical expression, following M. V. Zagidullina, we focus on the possibilities of the Word Sketch method, which allows us to get a cross-section of the linguistic “behavior” of an individual word, to identify the dominant meanings, their possible “condensations” and, thus, to display the semantic “portrait” of the word [Zagidullina 2017: 86]. “Portraits” of individual words obtained by using Sketch Engine are described by M. V. Zagidullina in a number of publications [Zagidullina 2017, Zagidullina 2020: 67]. By the way, the National Corpus of the Russian Language also allows us to form a “portrait” of a word.

The methodology of portraiture of an array of texts of “media center” and “media periphery” is an extension of the logic of portraiture of a single word. Unlike program-machine portraiture, however, it is semantic rather than grammatical portraiture that is used here.

“Portraiture itself is an interpretation of the ‘motley vocabulary’, and its strength becomes precisely the selection of words — not so much on the principle of their mere frequency, but on their strength” [Zagidullina 2017: 112], on what distinguishes and differentiates the media texts of one particular array from others and from the “average” native language. Thus, it is possible to obtain not only a flexible number and content of lexical-semantic groups, a certain thematic repertoire, but also a generalized description of significant meanings for the media audience of the analyzed territories.

It is important to note that portraiture of the population of different territories using Sketch Engine is realized in other Russian studies [Markovina et al. 2022: 99]. It should be added that the result of portraiture of a single word or a whole array is the created images, so in this case the “portrait” of a separate studied phenomenon, object, person or group of persons through the analysis of media texts can be called a media image [Bogoiavlenskaia 2022; Mamonova 2023]. However, in our case, following the logic described above, we prefer to use the expression semantic “portrait”.

The study was carried out using a continuous sampling method in October–December 2022. These were the months following September 2022, when the first (partial) mobilization since World War II was announced in Russia. During that period the media provided the most extensive coverage of events related to the mobilization and of the response to those events. The analysis of texts from that period can therefore provide us with qualitative results in identifying semantic markers of the analyzed media territories. The sample included nine digital media outlets. The “media center” (the city of Yekaterinburg⁵) was represented by the websites E1.ru, Lenta.ru, “Typical Yekaterinburg”, and “URA.RU”, as well as the “OTV-Yekaterinburg” TV channel and the “Ural Worker” newspaper. The “media periphery” (the town of Nyazepetrovsk⁶) was represented by the newspaper “Nyazepetrovskie News”, the TV channel “Nyazepetrovsky Contour-TV”, and the online page “Overheard in Nyazepetrovsk”. All media texts are posted on the Russian social network VKontakte. The empirical base included 7,541 texts, of which 1,281 were addressed to the “media periphery” (48,532 words) and 6,260 to the “media center” (354,795 words). During the study, we recorded the repertoire of significant vocabulary and established semantic priorities: universal media topics and thematic dominants of the media-central and media-peripheral territories, based on the analysis of texts representing reality. This allowed us to formulate ideas about the ordered picture of the world of the population as the audience of the mass media of the Russian territories under study.

Results and discussion

We created corpora of “media periphery” and “media center” texts using Sketch Engine. First, we carried out a simple frequency analysis of the resulting text corpora in comparison with the reference corpus RuTenTen11⁷. Comparative data on the composition of the corpora are shown in Table 1.

Table 1. General information about corpora

| Categories | Corpus of “media periphery” | Corpus of “media center” | Nationwide corpus RuTenTen11 |
|---|---|---|---|
| Tokens⁸ | 62,745 | 453,078 | 18,280,486,876 |
| Words | 48,532 | 354,795 | 14,553,856,113 |
| Sentences | 3,146 | 28,245 | 1,016,579,568 |
| Number of lemmas (lemma_lc⁹) | 7,410 | 23,902 | 41,377,553 |
| Number of words per sentence | 15.4 | 12.6 | 14.5 |
| Lexical uniqueness coefficient (ratio of lemmas to words) | 0.2 | 0.07 | 0.003 |
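
The two derived rows of Table 1 for the study corpora can be recomputed from the raw counts. The Python sketch below is illustrative only: the class and method names are hypothetical, and the input figures are the word, sentence and lemma counts reported in the table.

```python
from dataclasses import dataclass


@dataclass
class CorpusStats:
    """Raw counts for one corpus, as reported in Table 1."""
    words: int
    sentences: int
    lemmas: int

    def words_per_sentence(self) -> float:
        # Average sentence length in words.
        return self.words / self.sentences

    def lexical_uniqueness(self) -> float:
        # Ratio of distinct lemmas to running words.
        return self.lemmas / self.words


periphery = CorpusStats(words=48_532, sentences=3_146, lemmas=7_410)
center = CorpusStats(words=354_795, sentences=28_245, lemmas=23_902)

for name, corpus in (("media periphery", periphery), ("media center", center)):
    print(name,
          round(corpus.words_per_sentence(), 1),
          round(corpus.lexical_uniqueness(), 2))
```

A higher lexical uniqueness coefficient in a small corpus is expected in general, since the number of distinct lemmas grows more slowly than the number of running words, so the coefficient is best read alongside corpus size.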

 

A review of the quantitative characteristics allowed us to conclude the following: the media periphery corpus was closer to the national reference corpus in the number of words per sentence, while the media center corpus was closer to it in the lexical uniqueness coefficient. It could be assumed that the texts of the media periphery were more formulaic and “correct” in the way they were presented. The degree of difference between the corpora of the media periphery and the media center was significant: 3.29¹⁰ (Table 2). This allowed us to identify the unique characteristics of each text array.

Table 2. Degree of difference between corpora

| | RuTenTen11 | Corpus of “media periphery” | Corpus of “media center” |
|---|---|---|---|
| RuTenTen11 | 1.00 | 3.06 | 2.61 |
| Corpus of “media periphery” | 3.06 | 1.00 | 3.29 |
| Corpus of “media center” | 2.61 | 3.29 | 1.00 |

Gender lemma¹¹ was used for a more thorough analysis. Analysis of the 100 most frequent lemmas in each corpus showed the similarities and differences between the vocabularies of the corpora. The frequent vocabulary included shared lemmas, but their frequencies in the two corpora (I.P.M.¹²) differed. The following ten lemmas were the most frequent in Yekaterinburg, the “media center” of the territory, in comparison with the media periphery (Nyazepetrovsk): person (I.P.M. 2553.64), Russia (I.P.M. 1767.91), rubles (I.P.M. 1617.82), time (I.P.M. 1514.09), place (I.P.M. 1306.62), thousand (I.P.M. 1138.88), center (I.P.M. 922.58), region (I.P.M. 759.25), history (I.P.M. 704.07), photo (I.P.M. 675.38).

The ten most frequent lemmas in the “media periphery” compared to the “media center” were: district (I.P.M. — 5817.20), region (I.P.M. — 2773.13), work (I.P.M. — 2757.19), days (I.P.M. — 2518.13), residents (I.P.M. — 2374.69), newest (I.P.M. — 2071.88), home (I.P.M. — 1928.44), child (I.P.M. — 1769.07), help (I.P.M. — 1641.57), family (I.P.M. — 1051.88).

We can conclude that the semantic center in the “media center” is a person as an individual. This person feels an attachment to Russia, is involved in financial activities, cares about time, is connected to a certain place, remembers history or is reminded of it, and most often uses photos to capture reality. The situation is different in the media periphery: residents of a particular area are most interested in the traditional patterns of communal settled life (home, family, work) and in the solution of social problems, without a pronounced emphasis on the individuality of a person.

Since the collection of material for this study was carried out during the mobilization campaign, we consider it appropriate to clarify that the frequency of the word mobilize in the media periphery was greater (I.P.M. — 1051.88) than in the Russian media center (I.P.M. — 995.41). And as we know today, the largest number of mobilized people were sent to the special military operation precisely from the Russian provincial areas.

By excluding repeating lemmas from the Top-100 list, we obtained a list of 50 unique frequent lexemes, which are presented in Table 3.

Table 3. Comparative frequency of lemmas

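| No. | “Media periphery” lemma | Absolute frequency | I.P.M. | “Media center” lemma | Absolute frequency | I.P.M. |
|---|---|---|---|---|---|---|
| 1. | Nyazepetrovsky | 189 | 3012.19 | Yekaterinburg | 1901 | 4195.75 |
| 2. | Chelyabinsk | 141 | 2247.19 | man | 819 | 1807.64 |
| 3. | projects | 104 | 1657.50 | video | 692 | 1527.33 |
| 4. | administration | 86 | 1370.63 | Sverdlovsk | 535 | 1180.81 |
| 5. | participation | 84 | 1338.75 | driver | 533 | 1176.40 |
| 6. | contest | 73 | 1163.44 | Ukraine | 506 | 1116.81 |
| 7. | celebration | 70 | 1115.63 | woman | 500 | 1103.56 |
| 8. | municipal | 67 | 1067.81 | street | 480 | 1059.42 |
| 9. | gift | 62 | 988.13 | car | 479 | 1057.21 |
| 10. | culture | 57 | 908.44 | word | 430 | 949.06 |
| 11. | governor | 57 | 908.44 | mobilization | 408 | 900.51 |
| 12. | center | 55 | 876.56 | case | 402 | 887.26 |
| 13. | citizens | 53 | 844.69 | police | 365 | 805.60 |
| 14. | guys | 53 | 844.69 | girl | 338 | 746.01 |
| 15. | regional | 53 | 844.69 | Vladimirovich | 328 | 723.94 |
| 16. | campaign | 52 | 828.75 | employee | 327 | 721.73 |
| 17. | frame | 52 | 828.75 | Putin | 322 | 710.69 |
| 18. | telephone | 49 | 780.94 | country | 319 | 704.07 |
| 19. | support | 48 | 765.00 | apartment | 318 | 701.87 |
| 20. | health | 47 | 749.06 | accident | 317 | 699.66 |
| 21. | point | 47 | 749.06 | service | 307 | 677.59 |
| 22. | program | 46 | 733.13 | AFU* | 301 | 664.34 |
| 23. | website | 45 | 717.19 | automobiles | 295 | 651.10 |
| 24. | reception | 43 | 685.31 | comment | 292 | 644.48 |
| 25. | meeting | 41 | 653.44 | special operation | 290 | 640.07 |
| 26. | department | 40 | 637.50 | company | 289 | 637.86 |
| 27. | groups | 40 | 637.50 | Russian | 284 | 626.82 |
| 28. | object | 40 | 637.50 | month | 278 | 613.58 |
| 29. | reputable | 40 | 637.50 | President | 275 | 606.96 |
| 30. | director | 39 | 621.5 | night | 268 | 591.51 |
| 31. | Kravtsov | 38 | 605.63 | information | 266 | 587.10 |
| 32. | best | 38 | 605.63 | detail | 255 | 562.82 |
| 33. | Alexeyevich | 37 | 589.69 | hour | 254 | 560.61 |
| 34. | garbage | 37 | 589.69 | hospital | 243 | 536.33 |
| 35. | vote | 36 | 573.75 | court | 242 | 534.12 |
| 36. | types | 36 | 573.75 | road accident | 236 | 520.88 |
| 37. | authorities | 35 | 557.81 | reader | 236 | 520.88 |
| 38. | veteran | 35 | 557.81 | Dmitrievich | 230 | 507.64 |
| 39. | budget | 35 | 557.81 | bus | 227 | 501.02 |
| 40. | specialist | 35 | 557.81 | shop | 216 | 476.74 |
| 41. | means | 34 | 541.88 | mom | 212 | 467.91 |
| 42. | doctor | 34 | 541.88 | side | 211 | 465.70 |
| 43. | attention | 34 | 541.88 | must | 210 | 463.50 |
| 44. | participant | 34 | 541.88 | explosion | 208 | 459.08 |
| 45. | activity | 33 | 525.94 | building | 203 | 448.05 |
| 46. | collection | 33 | 525.94 | eyewitness | 201 | 443.63 |
| 48. | measure | 33 | 525.94 | fighter | 200 | 441.43 |
| 49. | New Year’s | 32 | 510.00 | friend | 198 | 437.01 |
| 50. | line | 32 | 510.00 | Ukrainian | 195 | 430.39 |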

Note: * — Armed Forces of Ukraine.
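
The I.P.M. values in Table 3 appear consistent with the usual “instances per million” normalization: absolute frequency divided by corpus size and multiplied by one million. The Python sketch below assumes that the denominator is the token count of each corpus from Table 1 (62,745 and 453,078), an inference from the printed values rather than a statement made in the text; the function name is hypothetical.

```python
def ipm(absolute_frequency: int, corpus_tokens: int) -> float:
    """Normalize an absolute frequency to instances per million tokens."""
    return absolute_frequency / corpus_tokens * 1_000_000


# Token counts of the two study corpora, taken from Table 1.
PERIPHERY_TOKENS = 62_745
CENTER_TOKENS = 453_078

# First row of Table 3: Nyazepetrovsky (189 hits) and Yekaterinburg (1901 hits).
print(round(ipm(189, PERIPHERY_TOKENS), 2))
print(round(ipm(1901, CENTER_TOKENS), 2))
```

Normalization of this kind is what makes the two columns comparable at all, since the “media center” corpus is roughly seven times larger than the “media periphery” corpus.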

We can summarize that the media center covered the topic of the special military operation (SMO) more extensively and diversely, as evidenced by such lemmas as: Ukraine, mobilization, Armed Forces of Ukraine (AFU), special operation, fighter, Ukrainian. In addition, there are a lot of lemmas indicating official information from the President of the Russian Federation, as well as lemmas connecting the situation to specific people (man, woman, girl, mom, etc.).

On the surface, it appears that the media periphery has no corresponding frequent vocabulary associated with the SMO. In addition, words related to festive events are common: celebration, gift, contest. However, additional contextological analysis allowed us to classify the following lemmas as related to the SMO: point (collection point for dispatch to the SMO), collection (of funds for the SMO participants), measure (of support for families who sent family members to the SMO), line (of contact in the SMO zone). Moreover, festive events, which were organized by the heads of territories and ordinary citizens during this period, were mainly associated with patriotic support for the families of mobilized people (Mother’s Day, Father’s Day, National Unity Day). This is generally consistent with collectivist practices and is aimed at uniting citizens to resist Russia’s external challenges. In this case, the most frequent vocabulary in the “media center” is associated with the discussion of troubling issues related to the SMO, while the most frequent vocabulary of “media periphery” is associated with specific actions of dispatching mobilized people to the zone of the special military operation.

To create semantic “portraits” of the most frequent words with the aim of identifying possible differences between the “media center” and “media periphery” corpora, we used the Word Sketch¹³ function in Sketch Engine, which assists in creating “a draft of a dictionary entry”. Two lemmas were taken for analysis: “year” and “person”. Visualization of the results was created in the Word Art service and is presented in Figures 1–2.

By comparing the “portraits”, we can observe the repertoire of differences. The “year” lemma in the “media center” corpus has pronounced negative meanings, since the most frequent words include those related to the restriction of human freedoms. At the same time, the texts also mention places where people engage in active recreation and health-improving procedures: “spa center”, “fitness club”, etc.

Fig. 1. The most stable combinations with the word “year” in the arrays of “media center” (left) and “media periphery” (right)
Fig. 2. The most stable combinations with the word “person” in the arrays of “media center” (left) and “media periphery” (right)

 

Combinations with the “year” lemma in the “media periphery” corpus reflect the variety of human activities in everyday life, including festive events (concert) and memorial events (cemetery). In the “media periphery” array, the word “year” is combined with a larger number of adjectives and verbs than in the “media center” array, i.e., ideas about the “year” are more diverse there. The adjectives also differ between the corpora. In the materials from the “media periphery”, the adjectives associated with the word “year”, while diverse, most often correlate with the meaning of “overcoming” (difficult, long, etc.). In the “media center” corpus, by contrast, adjectives with neutral connotations are most frequent (last, academic, previous). Based on this, we can conclude that the audience of the “media periphery” has more difficult living conditions, although it does not describe the “year” with negative vocabulary alone.

The semantics of the word “person” and its contexts differ between the “media periphery” and “media center” corpora. In the materials of the “media center”, the following verbs stand out: live, suffer, die, fear. In the “media periphery” corpus, the characteristic verbs are understand, help, begin. The “media center” corpus also has such frequent verbs as be, be able to, write, speak, which are absent from the “media periphery” corpus. In this regard, we conclude that the “media center” discusses the issues that arise on the agenda and talks about emerging concerns, while the “media periphery” conveys executive discipline and an active position. The combinations of the lexeme “person” with adjectives show the most significant differences between the corpora. Young and old occur in both analyzed corpora, but the word young is frequent in the “media center” corpus, and the word elderly is frequent in the “media periphery” corpus. The “media center” corpus often uses words about the person in general, about his/her actions and personal qualities. Typical ideas about a person in the “media center” include the following characteristics: talented, creative, misunderstood, cheerful, cultured, peaceful.

Sketch Engine allows us to conduct another type of frequency analysis by identifying frequent keywords. “Keywords” is a tool for identifying unusually high or low frequency of words compared to reference corpora. This tool allows us to determine which words are unique in comparison and therefore most typical of their corpora. While comparing the keywords of the “media periphery” and “media center” corpora in relation to the RuTenTen11 reference corpus, we also analyzed the “strength”14 of the word — a parameter that allows us to identify the significance of the analyzed keyword for a given corpus in comparison with the main corpus of the Russian language.
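The keyword score can be made concrete with a short calculation. The sketch below assumes the "simple maths" keyness formula that Sketch Engine documents for its Keywords tool, with the smoothing parameter N left at 1; whether the study used this default is an assumption, and the counts and corpus sizes are hypothetical.

```python
def ipm(count: int, corpus_size: int) -> float:
    """Relative frequency: instances per million units of the corpus."""
    return count * 1_000_000 / corpus_size


def keyness(focus_count: int, focus_size: int,
            ref_count: int, ref_size: int, n: float = 1.0) -> float:
    """Simple-maths score: (ipm_focus + n) / (ipm_reference + n).

    The smoothing constant n avoids division by zero for words that are
    absent from the reference corpus; larger n favors more common words.
    """
    return (ipm(focus_count, focus_size) + n) / (ipm(ref_count, ref_size) + n)


# Hypothetical word: 50 hits in a 100,000-token focus corpus,
# no hits in a 1,000,000-token reference corpus.
print(keyness(50, 100_000, 0, 1_000_000))  # 501.0

# A word with the same relative frequency in both corpora scores 1.0.
print(keyness(10, 1_000, 100, 10_000))  # 1.0
```

Under this formula, a score far above 1 means that the word is much more frequent per million in the focus corpus than in the reference corpus, which is why local proper names that barely occur in a general web corpus receive the highest scores.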

The 10 “strongest” keywords in the “media periphery” corpus include: Teksler (governor of the Chelyabinsk region) (score 570.6), RDK (district’s cultural center) (score 451.8), Kravtsov (head of the municipal district) (score 392.9), Araslanovo (score 380.2), Ufaley (score 300.7), MKOU (municipal state educational institution) (score 292.1), Bunakov (chairman of the Assembly of Deputies of the Nyazepetrovsk municipal district) (score 288.4), Shemakha (score 265.3), regional (score 242.5), mobilize (score 232.1). Proper names identified in media texts create, according to I. A. Pushkareva, the flavor of the onomastic space of the city and region [Pushkareva 2017].

The 10 “strongest” keywords in the “media center” corpus include: AFU (Armed Forces of Ukraine) (score 567.8), special operation (score 309.7), mobilize (score 303.8), DPR (Donetsk People's Republic) (score 231.1), Yekaterinburg residents (score 184.8), LPR (Lugansk People's Republic) (score 165.3), Kherson (score 154.5), digest (score 129.8), social media (score 128.3), PMC (private military company) (score 122.9).

Keyword analysis also allows us to identify the lexemes that are least common in one corpus relative to the other, which gives insight into the lemmas typical of the corpora of different territories. The list of typical lemmas for the “media periphery” includes: Nyazepetrovsk, MKOU, South Ural residents, Helix, Ufaley, Shemakha, etc. The list of typical lemmas for the “media center” includes: digest, Zelensky, Navalny**, hostel, mayor, donor, Biden.

As can be seen, the media-peripheral territory is mainly focused on local issues and local management structures, while media-central territory is focused on socio-political issues at the national and international levels.

Sketch Engine also allows us to identify collocations15 with keywords. If we exclude repeated collocations in the arrays of “media periphery” and “media center” in relation to the RuTenTen11 corpus, we obtain a list of Top-50 collocations, which are presented in Table 4.

Table 4. Unique collocations in the texts of the “media periphery” and “media center”

| No. | Collocations of the “media periphery” in relation to the “media center” | Collocations of the “media center” in relation to the “media periphery” |
|---|---|---|
| 1 | Nyazepetrovsk municipal district | square meter |
| 2 | residents of Nyazepetrovsk district | Kherson region |
| 3 | head of the district | shopping mall |
| 4 | residents of Nyazepetrovsk | Russian military man |
| 5 | governor of the Chelyabinsk region | Yekaterinburg center |
| 6 | district’s cultural center | Nizhny Tagil |
| 7 | power outage | people’s award |
| 8 | National Unity Day | public transport |
| 9 | Administration of Nyazepetrovsk district | birthday |
| 10 | Christmas tree decoration | special operation zone |
| 11 | Verkhniy Ufaley | Russian news |
| 12 | mobilize citizens | time of the special operation |
| 13 | proactive budgeting | Ministry of Defense of the Russian Federation |
| 14 | assembly of deputies | short video |
| 15 | State Services portal | year of imprisonment |
| 16 | chairman of the assembly of deputies | Vladimirovich Putin |
| 17 | medical center doctor | Sverdlovsk region |
| 18 | district of Chelyabinsk region | Verkhnyaya Pyshma |
| 19 | chairman of the assembly | special operation participant |
| 20 | work of a medical team | credit holidays |
| 21 | Mother’s Day | head of the region |
| 22 | chief medical officer | ice town |
| 23 | team’s field work | year of jail |
| 24 | television department | local resident |
| 25 | local manager | Slavic brigade |
| 26 | central library | air raid alert |
| 27 | humanitarian aid | Siberian highway |
| 28 | reception of residents | New Year’s mood |
| 29 | medical team | Yekaterinburg City Hall |
| 30 | department of the Ministry of Internal Affairs | Zaporozhye region |
| 31 | long shelf life product | house residents |
| 32 | collection of humanitarian aid | Russian military serviceman |
| 33 | New Year’s toy | Serovsk highway |
| 34 | respected employee | Ukrainian military man |
| 35 | Prosecutor’s office of the Chelyabinsk region | martial law |
| 36 | regional council of veterans | gas explosion |
| 37 | the newest medical and midwifery station | father of many children |
| 38 | the newest house | the next day |
| 39 | local authorities | Mayakovsky Park |
| 40 | youth of the Nyazepetrovsk municipal district | compulsory military service |
| 41 | cultural mosaic | RF Armed Forces |
| 42 | Father Frost’s workshop | the newest region |
| 43 | support measure | announcement of partial mobilization |
| 44 | disabled person’s day | the strictest regime |
| 45 | respected resident of Nyazepetrovsk district | wave of mobilization |
| 46 | Prosecutor’s office of Nyazepetrovsk district | head of the city |
| 47 | newest podcast episode | the most typical Yekaterinburg |
| 48 | deputy of Nyazepetrovsk district | next year |
| 49 | mobilize the guys | AFU attack |
| 50 | budget proposal | beginning of partial mobilization |

 

The presented collocations, obtained as a result of statistical comparison, are actually “fragments of the vocabulary of lexical combinability” [Mukhin, Mukhin 2019: 13] of the studied text corpora of the “media center” and “media periphery” of Russia, since only characteristic combinations of words not found in the other analyzed corpus were extracted from the texts.
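The exclusion of repeated collocations can be sketched as a simple set operation. The example below is a minimal illustration under stated assumptions: both lists are already ranked by score, matching is case-insensitive, and the function name is hypothetical. The collocation "partial mobilization" is an invented shared entry used only to show the filtering; the other entries come from Table 4.

```python
def unique_collocations(ranked_a, ranked_b, top=50):
    """Drop collocations that occur in both ranked lists, keep the top N of the rest.

    ranked_a, ranked_b: collocations ordered from strongest to weakest.
    Returns two lists (unique to A, unique to B), each preserving the input order.
    """
    seen_a = {c.lower() for c in ranked_a}
    seen_b = {c.lower() for c in ranked_b}
    only_a = [c for c in ranked_a if c.lower() not in seen_b][:top]
    only_b = [c for c in ranked_b if c.lower() not in seen_a][:top]
    return only_a, only_b


# "partial mobilization" is a hypothetical shared collocation.
periphery = ["head of the district", "partial mobilization", "power outage"]
center = ["square meter", "partial mobilization", "Kherson region"]

only_periphery, only_center = unique_collocations(periphery, center)
print(only_periphery)  # ['head of the district', 'power outage']
print(only_center)     # ['square meter', 'Kherson region']
```

Filtering before truncating to the top 50 matters: truncating first could leave fewer than 50 unique items in each column.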

We combined the resulting lemmas and collocations into denotative-ideographic groups (Table 5) based on the classification of Russian vocabulary developed by the Ural semantic school [Babenko 2015].

Table 5. Lexico-semantic groups of the vocabulary of “media periphery” and “media center” (gender lemma, “strong” keywords and phrases)

| Denotative-ideographic groups | “Media periphery” | “Media center” |
|---|---|---|
| Live nature | | Fire |
| Person as a living being | Child, person, residents, veteran | Man, person, woman, girl |
| Locality | Nyazepetrovsk district, Verkhniy Ufaley, Shemakha village, Chelyabinsk region, district’s house, Aptryakova village, Unkurda village, Araslanovo village, Unkurdinsk settlement | District, region, Yekaterinburg, Nizhny Tagil, Sverdlovsk region, Verkhnyaya Pyshma, ice town, Zaporozhye region, Kherson, DPR, LPR, Yelan |
| Family and relationships | Family | Mom, family, son, father of many children, mobilize a husband |
| Person and his/her inner world | Support, help, best | Friend |
| Media | Television department, website, information, children’s television journalism studio, television program | Media information, Russian news, social media, digest, Telegram channel, podcast, Ukrainian media, commentary, photo |
| Public sphere (state, power, public order, politics, person in politics) | Voting, governing, mobilize, governor, Teksler, Bunakov, Kravtsov | AFU, court, President, service, Putin, military enlistment office, Zelensky, Biden, Ministry of Defense of the Russian Federation, Russian National Guard, RF Armed Forces, police |
| Economics/finance, financial activities | Rubles, proactive budgeting, share, budget, Single tax account | Thousand, rubles, credit holidays, money |
| Education | MKOU (school) | UrFU (university) |
| Culture and art | Cultural center, cultural mosaic, celebration | Photo |
| Social and state sphere. Military service | Mobilize | Military, special operation, mobilization, service, military enlistment office, fighter |
| Healthcare | Helix, medical center doctor, medical team, chief medical officer, health | Hospital |
| Service sector | | Spa center, fitness club |
| Transport | | Vehicles, car |
| Social sphere of a person’s life. A person at work | Boss, specialist, doctor, local manager, Bunakov, Lukoyanov, Kravtsov | Doctor, employee, police, driver, Musk, Merzlyakov |
| Law | | Court, imprisonment, year in jail |
| Perception of the environment. Periodization | Year, time, road, moment | Night, week, hour |
| Universal ideas, meanings and relationships | Respected, object, part, collection, measure, situation, problem | Information, life, question, part, problem |

 

The “strong” vocabulary of the “media periphery” has a greater focus on healthcare, the social sphere, finance, and traditional media. The “media center” focuses on family and relationships (which have become especially relevant in the context of a mobilization campaign), the public sphere, international relations, business activities, crime stories, digital media technologies, etc. The texts of the “media periphery” mention educational institutions (schools), while the texts of the “media center” mention university professional training (Ural Federal University). The vocabulary of the “media periphery” includes information about settlements of different sizes (cities, towns, villages), while the “media center” includes only large cities. The texts of the “media center” contain more diverse words related to social media.

The “strongest” and most frequent vocabulary paints the following semantic “portrait” of the “media center”:

The “media center” is focused on a person who is included in the media agenda related to the public sphere, business/finance, and law. This person is more focused on obtaining higher education and developing a professional career, is guided by the decisions of government agencies or those associated with them, knows about human rights and the reasons for limiting human freedoms. He/she strives to lead a healthy lifestyle, is mobile, and finds issues related to personal vehicles important. This person is sensitive to the international and national information agenda, including an active discussion of the prospects for mobilization to the zone of a special military operation. In this regard, the topic of family and relationships appears very relevant in the context of discussing the possible postponement of mobilization for family reasons. The person from the “media center” actively uses digital media technologies in his/her communication.

The image of the “media periphery” can be described through linguistically specific vocabulary in the following way:

The “media periphery” is focused on local residents and representatives of local authorities interacting with them. Most of the residents are no longer young, so they mainly care about healthcare, medical care, state support in solving social problems, as well as support of local and national Russian identity through various mass festive events. For residents, news related to school education is of great importance, since school is the place where their children and grandchildren study. Residents support each other in many ways; they participate in the activities of the Russian mobilization campaign; they send mobilized sons and grandsons to the special military operation zone; they regularly collect and send humanitarian aid to soldiers, as well as to people in new Russian territories. For them, family is an indivisible whole; it is a support and a regulator at the same time. Residents of the “media periphery” are mostly united by local television, which is represented on the social network “VKontakte”.

These are the semantic “portraits” of the Russian “media center” and “media periphery” obtained in the course of the study.

Conclusion

A comparison of “portraits” of “media center” and “media periphery” serves as the basis for discussing the uniqueness and general characteristics of linguistic matter.

The identity of the “media center”, based on its linguistically specific lexicon, is described through such universal media topics as business development, the activities of government bodies, and social and political activity. The specific thematic dominant in this case is the discussion of disturbing news in connection with the mobilization campaign and “the possibility of compensating for the loss of subjective well-being in conditions of turbulence” [Sumskaya 2023]. The texts of the “media center” contain more diverse words related to social media.

The “media periphery” focuses on universal media topics such as social issues, health, local authorities, and cultural celebrations. Perhaps the most important specific thematic dominants include the topic of state subsidies for the territory. A significant amount of information is related to local television.

There are significant differences in the perception of the world of the Ural media-central and media-peripheral media audiences. We can say that the semantic “portraits” of the “media center” and the “media periphery” reflect the social differences of the population in large cities and on the periphery of Russia.

The study showed that semantic fields intersect in the area of family and relationships, since family is the core of support for Russian people in difficult times. It can also be stated that the “media center” and the “media periphery” are meaningfully unified by the ideas of Russian statehood, as well as awareness and acceptance of government decisions.

* Recognized in the Russian Federation as a foreign agent.

** Included by Rosfinmonitoring in the register of terrorists and extremists.

1 According to the classification of Russian settlements by the Ministry of Regional Development.

2 Semi-periphery — an intermediate link between the center and the periphery, combining the characteristics of the center and the periphery, not equal to either of them.

3 With Sketch Engine you can analyze the frequency of words in more than 100 languages, create frequency dictionaries, group lexical units into lexical-semantic fields with internal clustering and indication of the strength of connection between lexemes, extract key words and terms, analyze contexts, form a kind of dictionary entries of a word based on the Word Sketch function (which is the basis for compiling a “portrait” of a word). The program allows you to form a distributive thesaurus of a word based on a search for words with similar meaning or appearing in the same or similar context. The corpus manager can identify typical and unique word uses in the analyzed corpora compared to the referential corpus and more.

4 For example, a Google Scholar query for “Sketch Engine” yields 458,000 articles for all time (since 2004) and 28,300 full-text publications for the last incomplete 5 years (since 2020). In the Russian-language scientific field, since 2004, 1,610 articles indicate the use of the Sketch Engine program, and since 2020, only 763 publications. Curiously and maybe even tellingly, the query “Sketch Engine, Russian, regional, media” has only 46 publications since 2020.

5 Yekaterinburg is the administrative center of the Sverdlovsk region and the Ural Federal District. Bears the unofficial title of the “capital of the Urals”.

6 Nyazepetrovsk is a Russian single-industry town in the Middle Urals. Order no. 1398‑р of the Government of the Russian Federation dated July 29, 2014 included the Nyazepetrovsk urban settlement in the category of “Single-industry municipalities of the Russian Federation (monotowns) with the most difficult socio-economic situation”.

7 RuTenTen11 is a Russian-language corpus of texts collected from the Internet (https://www.sketch-engine.eu/rutenten-russian-corpus/). Only linguistically valuable web content is included in the corpus. It contains more than 14 billion words obtained by integrating different databases (including texts of everyday communication, media, etc.). Despite the active development of the National Corpus of the Russian language, which currently includes more than two billion words (https://ruscorpora.ru/), the RuTenTen11 corpus is the largest collection of the Russian language used for communication.

8 A token is the smallest unit that makes up a corpus. Includes words and non-words, represents the sum of words and punctuation marks.

9 Lemma converted to lower case.

10 The difference coefficient shows the deviation from the absolute similarity of corpora. The coefficient is calculated automatically by Sketch Engine based on an analysis of vocabulary, keywords, and their compatibility (linguistic behavior). The higher the score, the more different the corpora are.

11 Gender lemma reflects terminology in the correct word form in languages that differentiate gender using adjectives and nouns.

12 I.P.M. (instances per million) — relative frequency is used to compare frequencies between corpora of different sizes. It indicates the frequency of a word per million words in a given corpus, i.e. it is reduced to numbers which are acceptable for comparing vocabulary of different absolute frequencies.

13 Word sketch is a tool for displaying word combinations. The Word sketch (in other words, “portrait” of the word) provides a complete description of an individual word and its linguistic behavior, that is, it allows us to understand how a word is combined with other words, how frequent these combinations are (the highest frequency may indicate linguistic cliches, template expressions), in what contexts the word and its combinations usually occur. The Word sketch typically displays phrases of nouns, adjectives, verbs, and adverbs only.

14 Score — strength of the corpus keyword compared to RuTenTen11.

15 Collocations are statistically stable phrases, syntactically and semantically integral units in which the choice of one word dictates the choice of another.

© St Petersburg State University, 2024

Received: February 7, 2024
Accepted: June 20, 2024