In recent decades, globalization has given rise to a need for more seamless multilingualism. As we increasingly inhabit diverse spaces and interact with people from different backgrounds, it is more important than ever to prioritize effective communication. This matters not solely for practical, productive purposes but also for personal expression. Our languages, both native and learned, constitute an important part of our identity. Being unable to use them fully not only limits our access to people and information but also places arbitrary constraints on how we conceive of ourselves. This applies to the digital sphere just as much as, if not more than, it applies to the real world. The way you communicate online and the content you consume shape your identity in significant ways. How different would that identity look if it incorporated the full extent of your linguistic abilities? The world is multilingual; the internet needs to catch up.
UNESCO is convinced that cultural diversity and multilingualism on the Internet have a key role to play in fostering pluralistic, equitable, open and inclusive knowledge societies. UNESCO encourages its Member States to develop comprehensive language-related policies, to allocate resources and use appropriate tools in order to promote and facilitate linguistic diversity and multilingualism, including the Internet and media. In this regard, the Organization supports the inclusion of new languages in the digital world, the creation and dissemination of content in local languages on the Internet and mass communication channels, and encourages multilingual access to digital resources in the cyberspace. (UNESCO: Linguistic diversity and multilingualism on the internet)
This statement by UNESCO summarizes the importance of diverse language representation online. Many people recognize the harm in preventing a person from being able to use their native language in their day-to-day life, yet fail to see how this applies to the digital sphere. The reality is that linguistic representation on the internet is far from equal.
In both number of internet users and participation online, English and Chinese dominate, according to Internet World Stats.
As of March 31, 2020, there were almost 1.2 billion English-speaking internet users and about 890 million Chinese-speaking internet users. Combined, around 2.1 billion internet users speak either English or Chinese, compared with the 1.5 billion who speak any of the next eight most-used languages.
The picture is similar in terms of internet participation.
Around 26% of all internet users speak English, and 19% speak Chinese. In other words, nearly half of all internet users speak either English or Chinese.
Of course, these statistics make sense given the large populations of English- and Chinese-speaking people. However, the online presence of these languages is not proportional to how many people actually speak them. In terms of number of native speakers, the lineup looks very different.
So, while native English speakers make up only around 5% of the world's population, English speakers account for 26% of all internet users. A UNESCO report from 2008 found that “98% of the internet’s web pages are published in just 12 languages, and more than half of them are in English” (Trevino). If the representation of languages on the internet matched the number of native speakers in the real world, there would be more online content in Spanish than in English.
The norms set around language representation online have important consequences for accessibility, the most significant of which is unequal access to information. Speaking in 2012, researcher Daniel Prado pointed out that while Google recognizes 30 European languages, it recognizes just one African language and no indigenous American or Pacific languages. Moreover, searching in dominant languages yields more results than searching in non-dominant languages. For instance, a Google search in English will deliver four to five times more results than the same search in Arabic (The digital language divide).
Let’s take another example: Wikipedia. English is the most frequently used language, and there is almost no content in many Asian and African languages. Even among articles in dominant languages, coverage is limited. One report found that “74% of concepts have articles in only one language and 95% of concepts are in fewer than six languages on Wikipedia. Even English – the largest and potentially most diverse edition – contains only 51% of the articles in the second-largest edition, German” (The digital language divide). Moreover, Wikipedia articles describing specific places are written largely in dominant languages. That means that, as researcher Mark Graham puts it, “rich countries largely get to define themselves and poor countries largely get defined by others” (The digital language divide).
In another study, Google Maps searches in different languages yielded different results. Those who searched for “restaurant” in English were offered different locations from those who searched for the same word in Arabic or Hebrew, even though all of the searches were done in the same neighborhood. This raises concerns about how the internet could exacerbate issues of segregation (The digital language divide).
Finally, the internet could accelerate the extinction of less-dominant languages. Researcher András Kornai predicts that “95% of all languages in use today will never gain traction online”. As our experience of the world relies increasingly on our interaction with the digital sphere, speakers of those languages face a difficult choice: adapt or be silenced. Adaptation would mean switching to one of the 5% of languages that are well represented, and the more people make that switch, the greater the chance that less-dominant languages will fall out of use (The digital language divide).
Language and Online Identity
Language and identity are inextricably linked. The languages you speak paint a picture not just of your ethnicity and nationality, but also of your loved ones, your history, your travels, your education, and your interests. To not be able to operate online using the full extent of your language abilities is to have a part of yourself rendered silent and invisible.
Studies have uncovered fascinating ways in which people express multilingual identity in the digital sphere, particularly on social media. Through the use of different languages, users demonstrate their membership in specific groups while also carving out new, unique identities. One example is code-switching, in which users alternate between multiple languages within the same conversation. Code-switching and other forms of language mixing allow people to demonstrate “hybrid identities” (Biro 40).
One study followed a group of bilingual students at a university in Hungary. The students were from a region of Romania with a large Hungarian population. As a result, all of the students spoke Hungarian and Romanian, as well as English, which they had learned in school. In examining the students’ posts on Facebook, researchers discovered a complex and creative mixing of all three languages depending on the context and audience of the posts. The study concluded that “with a multilingual linguistic repertoire, the speakers’ online linguistic identity appears to be more diverse, fluid, and complex. In the digital space, users are able to perform multiple identities with the mobilization of linguistic resources and with the help of a rich mix of semiotic modes” (Biro 44).
When writing to address a wider audience, users tended to switch to English. The students also used English to “self-fashionize,” that is, to align themselves with the cultural superiority attributed to English as a prestige language (Biro 44). They also very commonly mixed English and Hungarian words within the same sentence or post. When writing captions, the students would sometimes use a specific language in order to call on a certain audience. For instance, one student posted a picture that contained Romanian text but captioned the photo in Hungarian, implying that the post was meant to address speakers of both languages (Biro 45). In another example, a Hungarian-Romanian bilingual student re-posted a French music video with a French caption along with an English translation of that caption. The comments on the post were in Hungarian, Romanian, and English, producing a creative multilingual conversation (Biro 48). The mixing of the three languages, the researcher writes, “expresses self-fashioning while constructing multilingual identity in the globalized world” (Biro 49).
All in all, the study found that the ability to mix and switch fluidly between their languages allowed these students to demonstrate the full complexity of their identities. They used language to demonstrate membership in specific groups and also to connect with the wider English-speaking world. While the students in the study achieved this seamless integration of their language repertoire on social media, it remains difficult to do so across much of the internet. Many of the people I surveyed mentioned that they found it difficult to access online content in other languages.
One way to promote multilingual accessibility online is through effective translation tools. If you go to Instagram right now and find a post with a caption in another language, there is an option to see a translation. At best, you’ll get a passable equivalent. At worst, you’ll get a barely comprehensible jumble of words. On other websites, the option to translate content might not exist at all. The ineffectiveness of online translation not only inhibits access for speakers of non-dominant languages but also shrinks the world for speakers of dominant languages. Many of the people I surveyed wrote that they would be very interested in viewing foreign online content if it were translated effectively, and they felt it would both broaden their horizons and help them learn more about other people and cultures.
Luckily, some steps are being taken to improve online translation tools. In 2018, Facebook announced it would add 24 new languages to its automatic translation services (Expanding automatic machine translation to more languages). Then, in 2020, it introduced M2M-100, a multilingual machine translation model that can translate between any pair of 100 languages without relying on English. Most online translation tools use English as an intermediary language when translating from one language to another; M2M-100 instead aims to translate directly between languages, which helps to improve translation accuracy. According to Facebook, M2M-100 outperforms traditional English-centric translation systems, bringing the company closer to its goal of serving all its users equally, regardless of language (Introducing the First AI Model That Translates 100 Languages Without Relying on English).