Different Word Forms Counting

  • I see that new forms of the same word are marked as 'new word', for example 'risque' and 'risques'. Does the app also count them as different words?

    It's a great app, but in my opinion that wouldn't be a good idea, since French has so many forms of each word.

  • Hi Rig. I was going to write a longer post about this, but I might as well start it while I answer you. 'Risque' and 'risques' are two different words, but they share the same lemma. The same thing happens with verbs: 'vais', 'vas', 'va', 'allons', 'allez', 'vont'... are all different words with the same lemma: 'aller'.

    I have literally been playing with words lately (text mining software, RapidMiner + Rosette; links below if you are interested). I downloaded all the French words taught in Lingvist, a total of 4795 at the time, and fed them to the lemmatizer provided by Rosette, which returned 3434 distinct lemmas. That is a ratio of about 1.4 words per lemma (4795 / 3434 ≈ 1.40). I had read elsewhere that the typical ratio is 1.6, so Lingvist seems to be giving us more distinct lemmas (more "real" words) for the same number of word forms.

    I'm sharing the spreadsheet in case you are curious. It has two tabs: WordLingvist (the 4795 words found in Lingvist) and LemmasLingvist (the 3434 lemmas found by Rosette).

    Link to spreadsheet (I tried to upload the file as an attachment to this message, but it says I don't have this privilege):
    https://s3.amazonaws.com/mirlitus/Lingvist1.xlsx

    Software used:

    http://www.rapidminer.com
    https://www.rosette.com/rapidminer/
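
    For anyone who wants to try the kind of counting described above, here is a minimal Python sketch. It is not the RapidMiner + Rosette pipeline from the post: the lemmatizer is replaced by a small hypothetical lookup table (`TOY_LEMMAS`) that only covers the example forms, and `group_by_lemma` and `forms_per_lemma` are illustrative names. It groups word forms under their lemma, computes the words-per-lemma ratio, and applies the same arithmetic to the figures reported in the post.

```python
from collections import defaultdict

# Hypothetical toy lemma table standing in for a real lemmatizer such as
# Rosette; it only knows the French forms used as examples in the thread.
TOY_LEMMAS = {
    "risque": "risque",
    "risques": "risque",
    "vais": "aller",
    "vas": "aller",
    "va": "aller",
    "allons": "aller",
    "allez": "aller",
    "vont": "aller",
}


def group_by_lemma(words, lemma_of):
    """Group distinct word forms under their lemma.

    Words missing from the table are treated as their own lemma.
    """
    groups = defaultdict(set)
    for word in words:
        form = word.lower()
        groups[lemma_of.get(form, form)].add(form)
    return dict(groups)


def forms_per_lemma(word_count, lemma_count):
    """Average number of distinct word forms per lemma."""
    return word_count / lemma_count


words = ["risque", "risques", "vais", "vas", "va", "allons", "allez", "vont"]
groups = group_by_lemma(words, TOY_LEMMAS)
n_forms = sum(len(forms) for forms in groups.values())

print(sorted(groups))                                  # ['aller', 'risque']
print(round(forms_per_lemma(n_forms, len(groups)), 1))  # 4.0

# The counts reported in the post: 4795 words, 3434 lemmas.
print(round(forms_per_lemma(4795, 3434), 2))  # 1.4
# Lemmas one would expect from 4795 words at the quoted typical ratio of 1.6.
print(round(4795 / 1.6))                      # 2997
```

    With the post's numbers, 4795 / 3434 ≈ 1.40 forms per lemma, whereas the quoted typical ratio of 1.6 would have implied only about 2997 lemmas for the same 4795 words. A real run would swap `TOY_LEMMAS` for the output of an actual lemmatizer.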
