manonymous

musings on OSINT and linguistics

ChatGPT and Machine Translation

Although gallons of ink have been spilled dissecting the ethical implications and practical uses of OpenAI’s ChatGPT, one use of particular interest to OSINT analysts and linguists has been neglected: ChatGPT is one of the most advanced machine translation tools ever developed. Unlike standard machine translation tools such as Google Translate, which rely on huge corpora of parallel texts, ChatGPT is trained on text from all over the web and generates wholly new text. The end result is that this tool is better able to handle poor spelling, missing words, slang, code switching, etc. In the blog post below (my first!) we’ll be exploring the potential uses of this tool for OSINT analysts and linguists.

This post is the first in a series exploring ChatGPT’s potential for translation and OSINT, and the best ways to use it. I’ll start by comparing ChatGPT’s output to Google Translate for French translation. Future blog posts will assess ChatGPT’s ability to translate Arabic dialects, identify unknown languages, “correct” non-standard spellings in social media posts, and translate low-resource languages. Along the way, we’ll also be looking at optimizing prompting to get the best possible outputs, as well as assessing the potential pitfalls of this tool.

ChatGPT and Google Translate: Showdown of the Giants

The easiest way for me personally to compare the capabilities of these tools was to translate texts in French, which I speak natively. French is among the best-supported non-English languages in most machine translation software because of the abundance of high-quality translated texts. In later blog posts, we’ll explore other languages that have very different grammatical structures, including Swahili and Arabic.

For this exercise, I used three texts: a literary text (an excerpt from the fabulous novel “L’Élégance du hérisson”), a media text, and the slang-filled lyrics of a popular song from the 1980s. For the literary and standard news texts, the outputs were largely similar, though ChatGPT seems to convert punctuation more accurately than Google Translate. The difference really manifested in the programs’ abilities to handle (outdated) slang and cultural references:

(CW: Mention of domestic violence)

Differences between the two translations are highlighted in yellow, and in every single case, ChatGPT is more accurate. Google Translate in some cases didn’t translate words (barbouze), mistranslated poor grammar or nonstandard spelling (sinon y cogne dessus should be spelled “sinon il cogne dessus,” which translates to “Otherwise he’ll hit her”), mistranslated words with alternate meanings (givre, chouraver), and completely missed idiomatic expressions (peigne cul).
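For anyone who wants to run this kind of side-by-side comparison on their own texts, here is a minimal Python sketch that lists the spots where two translations disagree, using the standard library’s difflib. It is only one possible approach and not necessarily how the highlighting above was produced. The function name differing_words and the two sample sentences are made up for illustration; they are not real output from either tool.

```python
import difflib


def differing_words(translation_a: str, translation_b: str) -> list[tuple[str, str]]:
    """Return (text in A, text in B) pairs for each place the translations differ."""
    a_words = translation_a.split()
    b_words = translation_b.split()
    # Compare the two translations word by word rather than character by character.
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            # One side may be empty if words were only inserted or only deleted.
            differences.append(
                (" ".join(a_words[a_start:a_end]), " ".join(b_words[b_start:b_end]))
            )
    return differences


# Hypothetical sentences, invented for this example only.
first = "his package is a blow"
second = "his absence is a blow"
print(differing_words(first, second))
```

A script like this only shows where two outputs differ. Deciding which one is actually correct still takes someone who knows the language.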

Similarly, ChatGPT did a much better job translating slang-heavy social media posts in French, in this case about the World Cup:

Google Translate did a decent job with the first two bullet points (though “absence” is certainly a better translation of “forfait” than “package” in this context) but completely fell apart in the third. Google missed common abbreviations like CDM for “Coupe du Monde” (World Cup), pcq for “parce que” (because), and tt for “tout” (each, every).
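To make the abbreviation problem concrete, here is a small Python sketch of a lookup table that spells out abbreviations before a text is sent to a translation tool. This is a simplified assumption of how such a pre-processing step could work, not something either tool is known to do. The table holds only the three abbreviations mentioned above, and the function name expand_abbreviations and the sample sentence are invented for illustration.

```python
import re

# Only the abbreviations discussed in this post; a real list would be much longer.
ABBREVIATIONS = {
    "cdm": "Coupe du Monde",
    "pcq": "parce que",
    "tt": "tout",  # simplification: "tt" can also stand for other forms such as "toute" or "tous"
}


def expand_abbreviations(text: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace each known abbreviation (case-insensitively) with its full form."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Words that are not in the table are returned unchanged.
        return table.get(word.lower(), word)

    return re.sub(r"\b\w+\b", replace, text)


# Hypothetical sentence, invented for this example only.
print(expand_abbreviations("On regarde la CDM pcq tt le monde en parle"))
```

A fixed table like this goes stale quickly and cannot tell apart the different meanings of one abbreviation, which is part of why a tool that handles slang in context is so useful.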

However, ChatGPT is not perfect. A better translation for “zehef” is annoying or frustrating, not shocking. And unlike Google Translate, where you can look up the translation of individual words, ChatGPT does not offer such a capability.

Conclusions

Google Translate’s major advantage over ChatGPT is its transparency; I suspect ChatGPT’s lack of sourcing will be a recurring issue throughout this blog series. However, ChatGPT handles slang and abbreviations MUCH better than Google Translate, making it an invaluable tool for intermediate translators or OSINT analysts who are familiar with the language but may not be up on the latest (or oldest) slang.

I’d love to hear any other thoughts you might have on translating using ChatGPT. Do you have other examples of use cases where it worked – or didn’t? Are there other types of texts or languages you’d like to see me break down? What about slang or nonstandard spellings from elsewhere in the Francophone world? Are screenshots an effective way to show the translation outputs? I’m hoping this blog will also provide a discussion space for these kinds of questions.
