Greynir Analyses Icelandic Linguistic Particles

Published July 14, 2020

Nico Borbely
Photo by Kristín Pétursdóttir

A herculean task in the development of Icelandic language software is underway in an office building in the Grandi district by the Reykjavik seashore, Morgunblaðið reports. The area is home to Miðeind, a company developing Greynir, a linguistic analysis tool and processor for the Icelandic language.

Greynir is the brainchild of programmer Vilhjálmur Þorsteinsson, who started working on it shortly after finishing Netskrafl, a popular website for playing a Scrabble-like crossword game exclusively with Icelandic words. While working on the game, he got to know the Database of Modern Icelandic Inflection (BÍN), put together by the Árni Magnússon Institute for Icelandic Studies. It became apparent to him that this database could be quite valuable to the future of the Icelandic language in the digital world, if utilized properly.

“A natural language processor is essentially software that reads Icelandic text and breaks it up into its linguistic components,” says Vilhjálmur. The Greynir program parses sentences and analyses their structure, identifies subjects, predicates, objects and other elements and places them in a sentence tree. 

The text Greynir has practiced on is taken from news articles and official documents. The program has already processed over 600,000 articles and documents containing more than 10 million sentences, and Vilhjálmur says that the analytic success rate has been around 90%. The software understands the vast majority of words in news content but struggles more with poetic or literary texts, where word order is often unusual, grammar rules are bent and sentences tend to be longer.

It Is Important That Icelandic Remains Relevant In The Digital World 

Vilhjálmur says that the project is driven first and foremost by idealism. The software is open source, and the solutions that will be developed will be available to the public free of charge. However, there may be some opportunities for revenue in the medium term, such as subscription options for frequent users. 

“The advantage of Greynir is probably reflected in things other than hard cash,” he says. “It is extremely important to work on artificial intelligence for the Icelandic language in order to improve the language’s position if it is to remain relevant in the digital age, such as in communication with voice recognition features on smart devices. Otherwise the usefulness of the language will be greatly limited in the future.” 

Iceland is not alone in this. Most national languages face challenges similar to those facing Icelandic, even those that have many more speakers. Vilhjálmur believes that Greynir’s work could be useful for other languages later on, though few such projects are as ambitious as Greynir.

Icelandic is one of the more difficult languages, with its complex grammar and rich morphology, says Vilhjálmur, and the software reflects that. Sentence structure and word order, for example, are quite flexible. Greynir’s sentence-parsing logic therefore has to investigate and choose between an exponentially large number of possible sentence forms.
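To give a rough sense of that growth, the short Python sketch below counts how many distinct binary tree shapes can be drawn over a sentence of a given length. It is an illustration only and not Greynir's method: it ignores grammar and word classes entirely, and the function name binary_bracketings is made up for this example. The count is the Catalan number, a standard result from combinatorics.

```python
from math import comb


def binary_bracketings(n_words: int) -> int:
    """Count the binary tree shapes over n_words leaves.

    This is the Catalan number C(n_words - 1). It ignores grammar,
    so it only illustrates how fast the search space can grow.
    """
    if n_words < 1:
        raise ValueError("a sentence needs at least one word")
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    for length in (5, 10, 20):
        print(length, binary_bracketings(length))
```

A five-word sentence already allows 14 shapes and a ten-word sentence 4,862. A practical parser has to rule out most candidates using grammatical constraints rather than list them all.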

The project has received funding from the government’s five-year language technology development program, and from the language technology fund at the Icelandic Centre for Research (Rannís). The company has six employees, soon to be seven, and is looking for an eighth. The team is a mix of linguists, programmers, mathematicians and philosophers, in addition to Vilhjálmur himself, who is fortunately on his home turf, having previously worked on translating programming languages into machine code. He says that this experience has helped him when it comes to moving on to translating spoken languages.

One problem that the Greynir staff struggled with in the beginning was verifying the quality of the software. For English and other more widely spoken languages, there exist so-called “gold standards,” large databases of sentences compiled by linguists that can be used to judge the quality of a natural language processor like Greynir. Measuring how often the processor's output matches the gold standard indicates how successful the software is so far and which challenges remain.
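The idea of scoring a processor against a gold standard can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation Miðeind actually runs: the bracketed parse strings, the example sentences and the function name exact_match_accuracy are all hypothetical, and real evaluations often give partial credit for partly correct trees instead of requiring an exact match.

```python
def exact_match_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Return the share of sentences whose parse equals the gold parse.

    Parses are compared as plain strings, which is an assumption made
    to keep the example small.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("the gold standard is empty")
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


if __name__ == "__main__":
    # Hypothetical bracketed parses, invented for illustration.
    gold = [
        "(S (NP Hún) (VP les (NP bók)))",
        "(S (NP Hann) (VP sefur))",
    ]
    predicted = [
        "(S (NP Hún) (VP les (NP bók)))",
        "(S (NP Hann sefur))",  # wrong structure
    ]
    print(exact_match_accuracy(predicted, gold))
```

Here one of the two predicted parses matches its gold counterpart, so the score is one half.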

There used to be no such databases for Icelandic, but the situation is improving. “A project is underway to manually review and correct sentences that have already been mechanically parsed by Greynir,” says Vilhjálmur. The project has to date delivered around 3,000 “gold standard” sentences but will expand to about 5,000 sentences before the end of this year. 

 

Endless Opportunities 

Grammar and spelling analysis will be among the project’s first concrete applications, but the opportunities are innumerable. The next step will be to translate between Icelandic and foreign languages, mainly English for the time being. Vilhjálmur says that children are growing accustomed to interacting with devices – in English, of course – before they even know how to read. There is a lot of content for children on the Internet, and they often surf the Internet on iPads, which are also in English. The younger generations, as well as the older ones, are starting to get better at English than Icelandic when it comes to vocabulary in certain areas. Tools that allow translation and subtitling in real time are needed to reverse this trend. 

“We are at a critical inflection point for our language, but the good news is that the technology to support it in the digital age is within reach – we just have to make sure that it is adapted, developed and applied.” With a concerted effort, it should be possible to ensure that Icelandic is on equal footing with other languages in a digital future.
