Different Word Forms Counting
Rig Raizon last edited by
I see that new forms of the same word are marked as 'new word'. For example 'risque' and 'risques'. Does the app also count them as different words?
It's a great app, but in my opinion that wouldn't be a good idea, since French has so many word forms.
carlosquintanillaa last edited by carlosquintanillaa
Hi Rig. I was going to write a longer post about this, but I might as well start writing it while I answer you. 'Risque' and 'risques' are two different words, but they share the same lemma. The same thing happens with verbs: 'vais', 'vas', 'va', 'allons', 'allez', 'vont'... are all different words with the same lemma: 'aller'.
I have literally been playing with words lately (text mining software: Rapidminer + Rosette; links below if you are interested). I downloaded all the French words taught in Lingvist, a total of 4795 at the time, and ran the analysis: I fed them to the lemmatizer provided by Rosette and got 3434 different lemmas. That is a ratio of about 1.4 words per lemma (4795 / 3434). Somewhere else I had read that the typical ratio is 1.6, so Lingvist is giving us more words.
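For anyone who wants to see the idea in code, here is a rough Python sketch of the grouping and the ratio. It does not use Rosette or Rapidminer: the lemma table `TOY_LEMMAS` and the helpers `group_by_lemma` and `words_per_lemma` are made up for illustration, and only cover the example words from this thread. The 4795 and 3434 figures are the counts reported above.

```python
from collections import defaultdict

# Hypothetical, hand-written lemma table (NOT Rosette output).
# It only covers the example words mentioned in this thread.
TOY_LEMMAS = {
    "risque": "risque",
    "risques": "risque",
    "vais": "aller",
    "vas": "aller",
    "va": "aller",
    "allons": "aller",
    "allez": "aller",
    "vont": "aller",
}


def group_by_lemma(words, lemma_of):
    """Group word forms under their lemma; unknown words act as their own lemma."""
    groups = defaultdict(set)
    for word in words:
        key = word.lower()
        groups[lemma_of.get(key, key)].add(key)
    return dict(groups)


def words_per_lemma(n_words, n_lemmas):
    """Average number of distinct word forms per lemma."""
    return n_words / n_lemmas


groups = group_by_lemma(TOY_LEMMAS.keys(), TOY_LEMMAS)
toy_ratio = words_per_lemma(len(TOY_LEMMAS), len(groups))

# Counts reported in this post: 4795 words, 3434 lemmas.
lingvist_ratio = words_per_lemma(4795, 3434)

print(sorted(groups))            # the two lemmas in the toy table
print(round(lingvist_ratio, 2))  # about 1.4
```

The toy table is deliberately lopsided (eight forms, two lemmas), so its ratio says nothing about French in general; a real lemmatizer is needed for that.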
I'm attaching the spreadsheet in case you are curious. It has two tabs: WordLingvist (the 4795 words found in Lingvist) and LemmasLingvist (the 3434 Lemmas found by Rosette).
Link to spreadsheet (I tried to upload the file as an attachment to this message, but it says I don't have that privilege):